Google to improve software supply chain security

Sheila Zabeu

November 03, 2022

Software supply chain security has been a pressing concern worldwide in recent months. Attacks of this kind happen when a malicious actor infiltrates a software vendor's network and compromises, for example, open-source components or update tools that large enterprises with critical infrastructure may later use. When these users unknowingly install or update the compromised software, they become victims and even propagators of the threat.

According to Sonatype's 8th annual State of the Software Supply Chain Report, attacks on open-source software repositories have increased by an average of 700% over the past three years. According to the research, malware-infected components hosted in these repositories are distributed widely and end up in applications trusted by businesses and consumers.

“Almost every modern enterprise relies on open source. Clearly, the use of open-source repositories as an entry point for cyberattacks shows no signs of slowing down, making early detection of known and unknown vulnerabilities more important than ever. Barring malicious components is critical to risk prevention and should be part of every conversation about protecting software supply chains,” says Brian Fox, co-founder and technical director at Sonatype.

Against this backdrop, Google announced a new open-source project called Graph for Understanding Artifact Composition, or GUAC. Still in its early stages, the project aims to change how the industry understands software supply chains by aggregating build, security and dependency metadata for various types of software and democratising this information by making it widely and freely accessible.

According to Google, data such as Software Bills of Materials (SBOMs), signed attestations about how software was built (for example, SLSA and Google Cloud Build), and vulnerability databases are already available and useful, but they are difficult to combine into a comprehensive view.

To solve this problem, Google has created a free tool that combines different sources of software security metadata. The intention is to build a high-fidelity graph database that normalises entity identities and maps the relationships between them. By querying this graph, organisations can generate high-level outputs for use in audits, policy, risk management and even developer support.

Conceptually, the GUAC tool occupies the “aggregation and synthesis” layer of the software supply chain transparency logic model. The four main functionalities of the GUAC tool are:

Collection: GUAC can be configured to connect to multiple sources of software security metadata. These can be open and public sources, primary sources (for example, internal repositories), or third-party proprietary sources.

Ingestion: From these sources, GUAC imports data about artefacts, projects, resources, vulnerabilities, repositories, and even developers.

Grouping: After ingesting raw metadata from different sources, GUAC puts it into a coherent graph, normalising entity identifiers, traversing the dependency tree and defining implicit relationships, e.g. between project and developer, between vulnerability and software version, between artefact and source repository and so on.

Query: In this graph, you can query the metadata associated with entities. Querying a particular artefact can return its SBOM, provenance, build chain, project scorecard, vulnerabilities and recent life cycle events.
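To make the four steps more concrete, here is a minimal Python sketch of the idea. It is not GUAC's actual implementation or API: the `MetadataGraph` class, the `normalise` rule, the package names and the vulnerability identifier are all hypothetical, chosen only to show how records from separate sources can be normalised, joined into one graph and then queried.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Hypothetical normalisation rule: trim, lowercase, treat '_' as '-'."""
    return name.strip().lower().replace("_", "-")


class MetadataGraph:
    """Toy in-memory graph; an illustration, not GUAC's data model."""

    def __init__(self):
        # (source node, relation) -> set of target nodes
        self.edges = defaultdict(set)

    def ingest_sbom(self, artefact, dependencies):
        """Ingest an SBOM-like record: one artefact and its direct dependencies."""
        for dep in dependencies:
            self.edges[(normalise(artefact), "depends_on")].add(normalise(dep))

    def ingest_vulnerabilities(self, records):
        """Ingest (package, vulnerability id) pairs from a vulnerability source."""
        for package, vuln_id in records:
            self.edges[(normalise(package), "affected_by")].add(vuln_id)

    def query(self, artefact):
        """Return transitive dependencies and every vulnerability reachable from them."""
        root = normalise(artefact)
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            for dep in self.edges.get((node, "depends_on"), ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        vulns = set()
        for node in seen | {root}:
            vulns |= self.edges.get((node, "affected_by"), set())
        return {"dependencies": sorted(seen), "vulnerabilities": sorted(vulns)}


graph = MetadataGraph()
# Two sources spell the same packages differently; normalisation joins them.
graph.ingest_sbom("Web_App", ["LibA", "libb"])
graph.ingest_sbom("liba", ["LibC"])
graph.ingest_vulnerabilities([("LIBC", "EXAMPLE-VULN-1")])

result = graph.query("web-app")
print(result)
```

In this sketch the vulnerability is recorded against `LIBC` by one source, while the SBOM calls the same package `LibC`. Only after both names are normalised does the query on `web-app` surface the issue, which is the kind of cross-source link the grouping step is meant to provide.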

In short, GUAC will aggregate and synthesise software security metadata at scale and make it relevant and accessible. With this information in hand, it will be possible to answer three categories of common questions about software supply chain security:

From a proactive perspective: What are the most commonly used critical components in the software supply chain ecosystem? Where are the weaknesses in the overall security posture? How can supply chain compromises be prevented before they occur? Where does the exposure to risky dependencies lie?

From an operational perspective: Is there evidence that the application about to be installed meets organisational policy? Can all binary files in production be traced back to a securely managed repository?

From a reactive perspective: Which parts of the organisation's inventory are affected by a particular vulnerability? If a suspicious event occurs in a project's life cycle, where have risks been introduced to the organisation? If an open-source project is being deprecated, how is the organisation affected?
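The first reactive question, which parts of the inventory are affected by a given vulnerability, amounts to walking the dependency graph in reverse. The sketch below illustrates that traversal in plain Python. The `affected_artefacts` function and the sample inventory are hypothetical and do not represent GUAC's query interface.

```python
from collections import defaultdict, deque


def affected_artefacts(depends_on, vulnerable_package):
    """Return every artefact that depends, directly or transitively,
    on vulnerable_package."""
    # Invert the edges: package -> artefacts that depend on it directly.
    dependants = defaultdict(set)
    for artefact, deps in depends_on.items():
        for dep in deps:
            dependants[dep].add(artefact)

    # Breadth-first walk upwards from the vulnerable package.
    affected, queue = set(), deque([vulnerable_package])
    while queue:
        current = queue.popleft()
        for parent in dependants.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return sorted(affected)


# Hypothetical inventory: artefact -> direct dependencies.
inventory = {
    "billing-service": ["http-client", "json-parser"],
    "http-client": ["tls-lib"],
    "reporting-tool": ["json-parser"],
    "tls-lib": [],
}

print(affected_artefacts(inventory, "tls-lib"))
# prints ['billing-service', 'http-client']
```

Here `reporting-tool` is not reported, because it never reaches `tls-lib`, whereas `billing-service` is flagged even though it only depends on the vulnerable library indirectly.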