AI creates fake versions to protect documents

Sheila Zabeu

August 11, 2021

Just as valuable jewelry and paintings have forged copies, valuable documents are now getting convincing fakes of their own. The forger is an algorithm called WE-FORGE (Word Embedding-based Fake Online Repository Generation Engine), which generates decoys tailored to technical documents. According to the Dartmouth College developers who created the technique, however, it could be used to produce fake versions of any document that needs protection.

WE-FORGE uses artificial intelligence (AI) to apply an espionage method known as the “canary trap”, which spreads multiple versions of fake documents to conceal sensitive information. The technique can be used to sniff out information leaks or, as in the Second World War, to create distractions that hide valuable data. WE-FORGE, for its part, automatically generates fake documents to protect intellectual property such as drug formulations and military technology.
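The classic canary trap works by giving each recipient a slightly different copy of a sensitive text, so that a leaked copy points back to its source. The Python sketch below illustrates only that leak-attribution idea; it is not WE-FORGE's method, and the variant phrasings, recipient names, and the functions `make_copies` and `identify_leaker` are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Optional

# Hypothetical interchangeable phrasings. Each recipient's copy uses a
# different combination, which acts as that copy's fingerprint.
VARIANT_SLOTS: List[List[str]] = [
    ["The compound is stable", "The compound remains stable"],
    ["up to 80 degrees", "below 80 degrees"],
    ["in acidic conditions.", "under acidic conditions."],
]


def make_copies(recipients: List[str]) -> Dict[str, str]:
    """Give every recipient a distinct combination of phrasings."""
    total = math.prod(len(slot) for slot in VARIANT_SLOTS)
    if len(recipients) > total:
        raise ValueError(f"only {total} distinct copies are possible")
    combos = itertools.product(*VARIANT_SLOTS)
    return {r: " ".join(combo) for r, combo in zip(recipients, combos)}


def identify_leaker(leaked_text: str, copies: Dict[str, str]) -> Optional[str]:
    """Return the recipient whose copy matches the leaked text, if any."""
    for recipient, text in copies.items():
        if text == leaked_text:
            return recipient
    return None
```

Exact matching is the simplest case; a partial leak would instead be compared slot by slot against each recipient's combination.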

Cybersecurity experts already use this approach to set traps for potential attackers. WE-FORGE refines the idea with natural language processing and the insertion of random elements, which keep adversaries from easily identifying the original document. According to the researchers, a single patent, for example, can contain more than a thousand concepts, each with up to 20 possible substitutions, so WE-FORGE can consider millions of possibilities for a single technical document. The algorithm also lets the author of the original document make suggestions, so the combination of human and machine ingenuity makes the job of intellectual property thieves even harder.
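As a rough illustration of embedding-based substitution, the sketch below replaces each known concept with one of its nearest neighbours by cosine similarity, picked at random. It is not the WE-FORGE algorithm: the tiny hand-written vectors, the names `candidates` and `make_fake`, and the choice of `k` are all assumptions, whereas a real system would use embeddings trained on a large corpus.

```python
import math
import random
from typing import Dict, List, Sequence

# Toy 3-dimensional "embeddings" with made-up values; similar concepts
# are given nearby vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "aspirin":   [0.90, 0.10, 0.00],
    "ibuprofen": [0.85, 0.15, 0.05],
    "naproxen":  [0.80, 0.20, 0.05],
    "tablet":    [0.10, 0.90, 0.10],
    "capsule":   [0.15, 0.85, 0.10],
    "titanium":  [0.00, 0.10, 0.95],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def candidates(concept: str, k: int) -> List[str]:
    """Return the k concepts closest to `concept`, excluding itself."""
    others = [w for w in EMBEDDINGS if w != concept]
    others.sort(key=lambda w: cosine(EMBEDDINGS[concept], EMBEDDINGS[w]), reverse=True)
    return others[:k]


def make_fake(tokens: List[str], k: int, rng: random.Random) -> List[str]:
    """Swap every known concept for a randomly chosen close neighbour."""
    return [rng.choice(candidates(t, k)) if t in EMBEDDINGS else t for t in tokens]


fake = make_fake(["take", "aspirin", "as", "a", "tablet"], k=2, rng=random.Random(0))
print(" ".join(fake))
```

Because each concept independently takes one of its `k` candidates, the number of distinct fakes grows multiplicatively with the number of concepts replaced, which is why a document with many concepts yields a very large space of possible decoys.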

V. S. Subrahmanian, a cybersecurity researcher at Dartmouth College, said he thought of this project after reading that new types of cyberattacks take an average of 312 days to be discovered. “Criminals have almost a year to get away with all the documents, all the intellectual property. That’s enough time to steal almost everything,” he warns.

Historically, miners used canaries to detect toxic gases and so protect themselves from inhaling hazardous substances. In cybersecurity, a canary device generally impersonates another device in order to attract attackers. In the specific case of files, virtual canaries are designed to trigger alerts when attackers access documents.

This kind of protection can be a cost-effective way to gather evidence of threats, enabling IT teams to respond quickly to intrusions. Virtual canaries also serve as a continuous monitoring system for networks: when strategically placed, they can alert administrators to when and how attackers attempt to penetrate the network.

Canary tokens and honeypots have similar goals but take different approaches. A honeypot poses as an attractive target for cybercriminals. When an attacker takes the bait, IT administrators can study their behavior and gather important information about the nature of the threat.
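In its simplest form, a honeypot can be a fake network service that shows a convincing banner and records who connects and what they send. The sketch below is a low-interaction honeypot built on Python's standard library, assuming a made-up FTP-style banner and hypothetical helper names (`start_honeypot`, `probe`); a production honeypot would add isolation, richer emulation, and alerting.

```python
import socket
import socketserver
import threading
import time
from typing import List, Tuple

# Each entry: (UNIX timestamp, client IP address, first bytes the client sent).
CONNECTION_LOG: List[Tuple[float, str, bytes]] = []
LOG_LOCK = threading.Lock()


class FakeServiceHandler(socketserver.BaseRequestHandler):
    """Poses as a file server and records whatever the visitor sends first."""

    # Hypothetical bait banner; there is no real service behind it.
    BANNER = b"220 files.internal FTP server ready\r\n"

    def handle(self) -> None:
        self.request.settimeout(2.0)
        self.request.sendall(self.BANNER)
        try:
            data = self.request.recv(1024)
        except OSError:  # covers timeouts and connection resets
            data = b""
        with LOG_LOCK:
            CONNECTION_LOG.append((time.time(), self.client_address[0], data))


def start_honeypot(port: int = 0) -> socketserver.ThreadingTCPServer:
    """Run the honeypot in a background thread (port 0 picks a free port)."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), FakeServiceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host: str, port: int, payload: bytes) -> bytes:
    """Simulate an attacker: read the banner, send a payload, disconnect."""
    with socket.create_connection((host, port), timeout=2.0) as conn:
        banner = conn.recv(1024)
        conn.sendall(payload)
    return banner
```

Binding to 127.0.0.1 keeps the sketch local; a real deployment would listen on an address attackers can reach and forward the log to a monitoring system.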

But what is a canary token? A canary token is used to track the behavior of cybercriminals. It is embedded in an ordinary file, and when someone opens the file or runs a process, a message is sent to whoever deployed the token. When an attacker triggers the token, you receive the IP address, the name of the token, and the time the file was accessed. In short, a honeypot gives attackers a place to play, while a canary token gives them a toy to play with. With either solution, once attackers fall into the trap, you can gather valuable information about them.
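One common way to build a canary token is a unique URL embedded in a document, so that opening the document makes a request to a listener the defender controls. The sketch below, using only Python's standard library, records the token name, client IP address, and timestamp for each hit. The `/t/<name>` URL scheme and the function names are assumptions, and the actual notification step is left as a comment.

```python
import threading
import urllib.request
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, List

TOKEN_PREFIX = "/t/"  # hypothetical URL scheme: /t/<token-name>
ALERTS: List[Dict[str, str]] = []
ALERTS_LOCK = threading.Lock()


class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path.startswith(TOKEN_PREFIX):
            alert = {
                "token": self.path[len(TOKEN_PREFIX):],
                "ip": self.client_address[0],
                "time": datetime.now(timezone.utc).isoformat(),
            }
            with ALERTS_LOCK:
                ALERTS.append(alert)
            # A real deployment would notify the token's owner here.
        # Reply with an empty 204 response; no content is needed for the alert.
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence the default access log on stderr


def start_listener(port: int = 0) -> ThreadingHTTPServer:
    """Run the listener in a background thread (port 0 picks a free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), CanaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_listener()
    host, port = server.server_address[:2]
    # Simulate someone opening a document that embeds the token URL.
    urllib.request.urlopen(f"http://{host}:{port}/t/q3-roadmap.docx").close()
    server.shutdown()
    server.server_close()
    print(ALERTS[-1])
```

Each token name identifies the file it was planted in, so a single listener can watch many decoy documents at once.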