Microsoft AI Research Unintentionally Leaks Terabytes of Sensitive Data

Nicholas September 18, 2023 3:52 PM

In an unfortunate incident, Microsoft AI researchers unintentionally exposed tens of terabytes of sensitive data, including private keys and passwords. The leak occurred when the team published what was meant to be a bucket of open-source training data on GitHub.

Accidental exposure of sensitive data

While intending to publish a bucket of open-source training data on GitHub, Microsoft's AI division ended up exposing a colossal amount of sensitive data, including private keys and passwords. The inadvertent leak highlights the security risks of handling and sharing large volumes of data in an open-source environment.

The GitHub repository, which offered open-source code and AI models for image recognition, inadvertently directed users to a misconfigured Azure Storage URL. The URL, discovered by cloud security startup Wiz, had been set up to grant permissions over the entire storage account rather than just the required data, accidentally exposing far more than intended. This highlights the importance of correctly scoping permissions in cloud storage to prevent unintentional leaks.
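As an illustration of the narrower alternative, the sketch below uses the azure-storage-blob Python SDK to issue a shared access signature (SAS) scoped to a single container rather than the whole storage account. The account, container, and key names are placeholders for this sketch, not details from the incident.

    # Minimal sketch: issue a SAS token limited to one container so a leaked
    # URL cannot expose anything else in the storage account.
    # All names below are hypothetical placeholders.
    from datetime import datetime, timedelta, timezone

    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    ACCOUNT_NAME = "exampleaccount"          # placeholder storage account
    CONTAINER_NAME = "public-training-data"  # placeholder container to share
    ACCOUNT_KEY = "<account-key>"            # load from a secret store, never commit

    sas_token = generate_container_sas(
        account_name=ACCOUNT_NAME,
        container_name=CONTAINER_NAME,
        account_key=ACCOUNT_KEY,
        permission=ContainerSasPermissions(read=True, list=True),  # read-only access
        expiry=datetime.now(timezone.utc) + timedelta(days=7),     # short lifetime
    )

    share_url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER_NAME}?{sas_token}"
    print(share_url)

Because the token names a single container and carries only read and list rights, anyone holding the URL can download the shared data but cannot reach other containers in the same account.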

'Full control' permission poses risks

The same URL, which had been exposing data since 2020, was not only misconfigured in scope but also set to allow 'full control' rather than 'read-only' access. According to Wiz, this posed an even greater risk: anyone who stumbled upon the URL and knew how to exploit it could potentially delete, replace, or inject malicious content into the exposed data. The incident underscores the crucial role of correct permission settings in securing data.
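To make the risk concrete, here is a hedged sketch (again using the azure-storage-blob SDK and a made-up SAS URL) of what write access allows: anyone holding a 'full control' token can overwrite blobs in the container, whereas the same call made with a read-only token would be rejected with an authorization error.

    # Hypothetical SAS URL granting full permissions (sp=racwdl); not the real one.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_container_url(
        "https://exampleaccount.blob.core.windows.net/models"
        "?sp=racwdl&se=2051-01-01T00:00:00Z&sv=2021-08-06&sr=c&sig=REDACTED"
    )

    # With write and delete rights, an existing file can simply be replaced
    # with attacker-controlled content.
    container.upload_blob("model.ckpt", b"tampered payload", overwrite=True)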

Upon discovering the issue, Wiz notified Microsoft of its findings on June 22nd. In response, Microsoft revoked the problematic SAS token just two days later. It then took until August 16th for Microsoft to complete its investigation into the potential organizational impact of the exposure, a sign of how seriously the company treated the situation.

Enhancement of GitHub's secret scanning service

Following Wiz's disclosure, Microsoft expanded GitHub's secret scanning service. The service, which monitors all changes to public open-source code for plaintext credentials and other secrets, now also flags SAS tokens with overly generous expiration times or privileges. The step underscores Microsoft's commitment to reducing the risk of future data exposure.
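The sketch below gives a rough idea of the kind of check such a scanner might perform: find Azure blob SAS URLs in committed text and flag tokens whose permissions (the sp parameter) go beyond read/list or whose expiry (the se parameter) lies far in the future. It is an assumption about the general approach, not GitHub's actual implementation, and the file path and lifetime threshold are arbitrary.

    # Rough sketch of a SAS-token check over committed text; not GitHub's code.
    import re
    from datetime import datetime, timedelta, timezone
    from urllib.parse import parse_qs

    SAS_URL_RE = re.compile(
        r"https://[a-z0-9]+\.blob\.core\.windows\.net/\S*\?(?:\S*&)?sig=\S+",
        re.IGNORECASE,
    )
    MAX_LIFETIME = timedelta(days=90)  # arbitrary policy threshold for this sketch

    def flag_risky_sas(text):
        findings = []
        for match in SAS_URL_RE.finditer(text):
            url = match.group(0)
            params = parse_qs(url.split("?", 1)[1])
            sp = params.get("sp", [""])[0]  # permission flags, e.g. "rl" or "racwdl"
            se = params.get("se", [""])[0]  # expiry, e.g. "2051-01-01T00:00:00Z"
            if any(flag in sp for flag in "wdac"):
                findings.append(f"token grants write/delete/create: {url[:60]}...")
            try:
                expiry = datetime.fromisoformat(se.replace("Z", "+00:00"))
                if expiry - datetime.now(timezone.utc) > MAX_LIFETIME:
                    findings.append(f"token is long-lived (expires {se}): {url[:60]}...")
            except ValueError:
                pass
        return findings

    # Example: scan a file from a repository checkout (path is hypothetical).
    with open("README.md") as f:
        for finding in flag_risky_sas(f.read()):
            print(finding)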
