Machine learning in the hands of cybercrime

Sheila Zabeu -

September 09, 2021

Cybercrime has always been at the forefront of the technological race. As soon as a new technology emerges, it can be turned to malicious purposes, or new vulnerabilities appear that threaten the security of companies and individuals. And in most cases, these threats bring with them heavy financial losses, to say the least.

And with machine learning, it was no different. This technology, along with other branches of artificial intelligence (AI), has been used for years in fields ranging from medical diagnostics to autonomous vehicles. There is, however, a less benign strand, called adversarial machine learning, which exploits weaknesses in AI algorithms to achieve malicious goals: it makes subtle manipulations to input data that go unnoticed by the naked eye, but not by AI systems. In the cybersecurity world, the term “adversary” describes individuals or machines that attempt to hack or corrupt software, systems, or networks.

To understand the proposed new technique to protect against this type of attack, developed by researchers at Carnegie Mellon University and the KAIST Cybersecurity Research Center, it is worth understanding details of adversarial machine learning.

Perhaps the most emblematic case of adversarial machine learning involves the image of a panda that received a small, carefully calculated modification, to the point of being recognized by an artificial intelligence system as a gibbon, a species of ape. And worse, with high confidence. In practice, a similar technique has the potential to cause great damage. For example, traffic signs modified with stickers or paint can defeat the recognition systems of an autonomous vehicle and lead it to cause accidents.

Image: the perturbed panda example. Source: OpenAI

Creating adversarial examples that succeed from a cybercrime perspective is a process of trial and error: small changes are made to pixels and the image is submitted to the AI model to see how its confidence levels respond. This process can often be automated.
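The trial-and-error loop described above can be sketched in a few lines of Python. The classifier below is a hypothetical toy (a fixed linear softmax over a flattened 8x8 "image"), not a real vision model; the point is only the search procedure: nudge pixels at random and keep each change that lowers the model's confidence in its original prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained image classifier: a fixed linear model with
# a softmax over two classes. Weights are invented for illustration only.
W = rng.normal(size=(2, 64))           # 2 classes, 8x8 "image" flattened

def predict(x):
    """Return softmax confidences for the two classes."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.random(64)                     # the "clean" image
target = int(np.argmax(predict(x)))    # class the model currently picks

# Trial-and-error attack: nudge one pixel at a time, and keep the change
# whenever it lowers the model's confidence in the original class.
adv = x.copy()
eps = 0.05
for _ in range(500):
    i = rng.integers(64)
    step = eps * rng.choice([-1.0, 1.0])
    candidate = adv.copy()
    candidate[i] = np.clip(candidate[i] + step, 0.0, 1.0)
    if predict(candidate)[target] < predict(adv)[target]:
        adv = candidate

print("clean confidence:", predict(x)[target])
print("adv confidence  :", predict(adv)[target])
```

Against a real network an attacker would use gradients rather than random search, but the idea is the same: automate many tiny changes and watch the confidence levels move.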

Nor do these attacks apply only to visual data; they also work on text and audio. For example, the well-known Amazon Alexa and Apple Siri assistants use automated speech recognition systems to analyze voice commands. A YouTube video could therefore be modified to embed a specific malicious command that is not recognizable by human hearing. When that audio was played, the intelligent assistant's machine learning algorithm would execute the hidden command, with the results desired by the cybercriminal.

How to protect yourself

One way to guard against adversarial machine learning is precisely to train AI systems against this type of attack in supervised mode, making them more robust to perturbations in the input data. In general, a large batch of adversarial examples is generated and the system is tuned to classify them correctly. This training can be costly, both to create the examples and to run, and it usually sacrifices some of the model's everyday performance. Nor is it guaranteed to work against attack tactics it has not been trained on.
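As a rough illustration of this idea (not the researchers' procedure), the sketch below trains a simple logistic-regression model in plain NumPy, augmenting each training pass with fast-gradient-sign perturbations of the inputs. All data, names, and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data standing in for a real training set.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = (X @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, x, label, eps=0.1):
    """Fast-gradient-sign perturbation of one input.
    For logistic regression, d(loss)/d(x) = (sigmoid(x.w) - label) * w."""
    grad_x = (sigmoid(x @ w) - label) * w
    return x + eps * np.sign(grad_x)

def train(w, X, y, lr=0.1, epochs=50, adversarial=False):
    for _ in range(epochs):
        Xb, yb = X, y
        if adversarial:
            # Augment the batch with adversarial versions of each input,
            # so the model also learns to classify the perturbed cases.
            X_adv = np.array([fgsm(w, x, t) for x, t in zip(X, y)])
            Xb = np.vstack([X, X_adv])
            yb = np.concatenate([y, y])
        grad_w = Xb.T @ (sigmoid(Xb @ w) - yb) / len(yb)
        w = w - lr * grad_w
    return w

w_plain = train(np.zeros(10), X, y)
w_robust = train(np.zeros(10), X, y, adversarial=True)

def adv_accuracy(w):
    """Accuracy on inputs perturbed against this specific model."""
    X_adv = np.array([fgsm(w, x, t, eps=0.3) for x, t in zip(X, y)])
    return float(((sigmoid(X_adv @ w) > 0.5) == y).mean())

print("accuracy under attack, plain model :", adv_accuracy(w_plain))
print("accuracy under attack, robust model:", adv_accuracy(w_robust))
```

Note how the adversarial examples must be regenerated as the model changes, which is part of why this style of defense is expensive, and why it only covers the attack types actually used during training.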

A new technique, developed by researchers at Carnegie Mellon University and the KAIST Cybersecurity Research Center and recently presented at the Adversarial Machine Learning Workshop (AdvML), employs unsupervised learning and seeks to discover which input data may have undergone malicious modification.

Scientists have found a link between adversarial machine learning attacks and explainability, i.e., the ability of AI systems to explain their decisions. In many machine learning models, decisions are difficult to trace because of the large number of parameters involved in the inference process. However, researchers have developed various methods that can help make sense of the decisions machine learning models make.

Explainability techniques produce saliency maps that score the features of the input data according to their contribution to the final result. When an image has been modified with small perturbations, the new method developed by Carnegie Mellon and KAIST detects the anomaly by running the image through an explainability algorithm. In other words, the technique detects instances of adversarial machine learning based on their explainability maps.
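A minimal sketch of this intuition, assuming a toy softmax-linear classifier and a basic gradient saliency map; this is an illustration of the general idea, not the authors' actual method:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for a classifier: fixed softmax weights over a flattened
# 8x8 input (weights invented for illustration).
W = rng.normal(size=(3, 64))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def saliency(x):
    """Basic gradient saliency map: |d(top-class score)/d(pixel)|.
    For a softmax-linear model, dp_k/dx = p_k * (W_k - sum_j p_j W_j)."""
    p = softmax(W @ x)
    k = int(np.argmax(p))
    return np.abs(p[k] * (W[k] - p @ W))

x_clean = rng.random(64)
# A small perturbation of the kind an attacker might craft.
x_perturbed = np.clip(x_clean + 0.1 * np.sign(rng.normal(size=64)), 0, 1)

s_clean = saliency(x_clean)
s_perturbed = saliency(x_perturbed)

# The detection idea: a small input change can produce a disproportionately
# large change in the explanation, so the distance between saliency maps
# can serve as an anomaly score (real thresholds would be learned).
score = float(np.linalg.norm(s_perturbed - s_clean))
print("explanation change under perturbation:", score)
```

In the researchers' setting the score would be computed without access to the clean original, by learning what normal explanation maps look like; the sketch above only shows why perturbations leave a visible trace in the explanations at all.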

“Our recent work started with a simple observation that adding small noises to the inputs resulted in a big difference in explanations,” Gihyuk Ko, Ph.D., of Carnegie Mellon University, told the TechTalks website.

The scientists tested the method using MNIST, a dataset of handwritten digits commonly used to evaluate different machine learning techniques. According to the group of researchers, the unsupervised method was able to detect several instances of adversarial machine learning with equal or better performance than other known techniques.

In the future, the researchers intend to test the method with more complex datasets, such as CIFAR10/100 and ImageNet, and with more complicated attacks.
