Compiler flaw facilitates ‘supply chain’ attacks

Sheila Zabeu

November 03, 2021

Researchers at Cambridge University have identified a flaw that affects most compilers and allows vulnerabilities that are invisible to the human eye to be introduced into programs under development. These targeted vulnerabilities could pave the way for supply chain attacks, which became more widely known after the SolarWinds case surfaced in late 2020, with a string of compromised systems at private sector companies and major US government agencies.

In a paper entitled “Trojan Source: Invisible Vulnerabilities”, the researchers explain that software projects, such as open-source ones, rely on human review to detect malicious contributions from volunteers.

So how is it possible to trick compilers into generating binary code that does not match what appears to have been programmed, creating so-called Trojan horses in source code form? The researchers found ways to manipulate source code files so that human reviewers and compilers see different logic. According to the researchers, the attack works with the C, C++, C#, JavaScript, Java, Rust, Go, and Python languages, and is suspected to work with most other modern languages.

The weak point has to do with the Unicode standard and its bidirectional, or “Bidi”, algorithm. Unicode is the standard that assigns each character a unique number, understandable by computers regardless of platform, program, or language. The Bidi algorithm, in turn, handles differences in direction when text is displayed, for example when a sentence in English, which is read from left to right, is quoted in an Arabic newspaper, whose language is read from right to left.
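To make these two ideas concrete, here is a minimal Python sketch that uses the standard `unicodedata` module to show the number (code point) Unicode assigns to a character and the directional class that a Bidi implementation would consult. The helper name `describe` is hypothetical and only serves this illustration.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, Bidi class) for one character.

    `describe` is a hypothetical helper written for this illustration.
    """
    code_point = f"U+{ord(ch):04X}"             # the unique number assigned to the character
    name = unicodedata.name(ch)                 # the official Unicode character name
    bidi_class = unicodedata.bidirectional(ch)  # e.g. "L" (left-to-right), "AL" (Arabic letter)
    return code_point, name, bidi_class


if __name__ == "__main__":
    # A Latin letter, an Arabic letter, and a Bidi control character,
    # written as escapes so the listing itself contains no hidden characters.
    for ch in ["A", "\u0639", "\u202e"]:
        print(describe(ch))
```

The third character in the loop, U+202E, is one of the display order control characters: it has no visible glyph of its own, yet it changes the order in which the following text is shown.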

Source: Another paper by Cambridge researchers – Attack using reordering in a machine translation system. The red circle indicates the sequence of characters encoded in reverse order

This loophole had already been exploited more than 10 years ago to disguise the file extensions of malware files disseminated by email.

In the specific case of compilers and programming languages, the characters that control display order can be inserted freely, look innocuous, and escape the eyes of reviewers. Several nested layers of these control characters can reorder the text so that what appears on screen is effectively an anagram of the underlying logic. In general, compilers do not process formatting control characters, including Bidi codes, so the loophole can be exploited to craft instructions whose real order is not what a reviewer sees. Most well-designed programming languages do not allow arbitrary control characters in source code, as they are seen as capable of affecting the programmed logic, so placing Bidi control characters at random in the source will usually result in a syntax error. According to the paper, the attack gets around this by putting the characters inside comments and string literals, where compilers generally accept arbitrary text.

In the paper, the researchers point out that countermeasures can be taken at various levels, from language and compiler specifications to code repositories and development pipelines. They also believe that the long-term solution to the problem will be implemented in the compilers themselves. According to them, almost all compilers already defend against a related attack that creates deceptive function names using zero-width space characters. Some also raise errors in response to another trick that exploits homoglyphs, that is, visually similar characters, in function names.
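As an illustration of a check at the repository or pipeline level, the following Python sketch scans source text for the Unicode Bidi embedding, override, and isolate control characters and reports where they occur. It is a simplified assumption of what such a check could look like, not the researchers' tooling; the names `BIDI_CONTROLS` and `find_bidi_controls` are hypothetical.

```python
# Bidi control characters defined by Unicode, mapped to their usual abbreviations.
# Written as escapes so this listing contains no hidden characters itself.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, abbreviation) for every Bidi control in `source`.

    Lines and columns are 1-based. Hypothetical helper for illustration only.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, BIDI_CONTROLS[ch]))
    return findings


if __name__ == "__main__":
    # A made-up snippet with two control characters hidden in a string literal.
    sample = 'x = 1\nif level != "user\u202e \u2066# check":\n    pass\n'
    for line_no, col, kind in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {kind}")
```

A check like this could run as a pre-commit hook or a pipeline step and reject files with any findings. Projects that legitimately contain right-to-left text in strings or comments would need an allow-list rather than a blanket ban.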

In addition, the researchers say they made a responsible disclosure to all the companies and organizations behind the products in which the vulnerabilities were discovered. They say they offered a 99-day embargo after the first disclosure to allow the affected products to be patched. Responses varied, ranging from patch commitments and bug bounties to summary dismissal and references to legal policies.