Generative AI Security: Preventing Microsoft Copilot Data Exposure


Microsoft Copilot has been called one of the most powerful productivity tools on the planet.

Copilot is an AI assistant that lives inside each of your Microsoft 365 apps — Word, Excel, PowerPoint, Teams, Outlook, and so on. Microsoft’s dream is to take the drudgery out of daily work and let humans focus on being creative problem-solvers.

What makes Copilot a different beast from ChatGPT and other AI tools is that it has access to everything you've ever worked on in Microsoft 365. Copilot can instantly search and compile data from across your documents, presentations, email, calendar, notes, and contacts.

And therein lies the problem for information security teams. Copilot can access all the sensitive data that a user can access, which is often far too much. On average, 10% of a company’s M365 data is open to all employees.

Copilot can also rapidly generate net-new sensitive data that must be protected. Even before the AI revolution, humans' ability to create and share data far outpaced our capacity to protect it. Just look at data breach trends. Generative AI pours kerosene on this fire.

There is a lot to unpack when it comes to generative AI as a whole: model poisoning, hallucination, deepfakes, etc. In this post, however, I'm going to focus specifically on data security and how your team can ensure a safe Copilot rollout.

The use cases of generative AI with a collaboration suite like M365 are limitless. It’s easy to see why so many IT and security teams are clamoring to get early access and preparing their rollout plans. The productivity boosts will be enormous.

For example, you can open a blank Word document and ask Copilot to draft a proposal for a client based on a target data set, which can include OneNote pages, PowerPoint decks, and other Office documents. In a matter of seconds, you have a full-blown proposal.

Here are a few more examples Microsoft gave during their launch event:

Here’s a simple overview of how a Copilot prompt is processed:

With Microsoft, there is always an extreme tension between productivity and security.

This was on display during the coronavirus pandemic, when IT teams swiftly deployed Microsoft Teams without first fully understanding how the underlying security model worked or what shape their organization's M365 permissions, groups, and link policies were in.

Let’s take the bad news one by one.

Granting Copilot access to only what a user can access would be an excellent idea if companies were able to easily enforce least privilege in Microsoft 365.
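To make the point concrete, here is a minimal Python sketch of permission-trimmed retrieval, assuming a toy in-memory model in which each document carries a set of reader principals and a user's principals are their own name plus their group memberships. `Document`, `User`, `retrieve_for_prompt`, and the sample files are all hypothetical; this is not how Copilot or Microsoft Graph is implemented, only an illustration of why an assistant that faithfully respects user permissions still surfaces whatever a user has been over-granted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    name: str
    text: str
    # Principals (users or groups) granted read access; hypothetical toy model.
    readers: frozenset


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)

    def principals(self) -> set:
        # A user is identified by their own name plus every group they belong to.
        return {self.name} | self.groups


def retrieve_for_prompt(user: User, corpus: list, query: str) -> list:
    """Return documents matching the query that the user is allowed to read.

    Security trimming: anything the user can read is fair game for grounding,
    so the assistant's reach is exactly the user's reach, no more and no less.
    """
    allowed = [d for d in corpus if d.readers & user.principals()]
    return [d for d in allowed if query.lower() in d.text.lower()]


corpus = [
    Document("q3-plan.docx", "Q3 plan for Contoso", frozenset({"sales"})),
    Document("salaries.xlsx", "Salary bands and bonus plan", frozenset({"Everyone"})),
    Document("board-notes.docx", "Board plan, confidential", frozenset({"execs"})),
]

intern = User("intern", groups={"Everyone"})
print([d.name for d in retrieve_for_prompt(intern, corpus, "plan")])
# ['salaries.xlsx']
```

The intern never asked for salary data, but because `salaries.xlsx` is shared with an everyone-style group, it is a valid grounding source for any prompt that happens to match it. Tightening `readers` is the only lever in this model; the retrieval step itself has no notion of "should" versus "can."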

Microsoft states in its Copilot data security documentation:

“It’s important that you’re using the permission models available in Microsoft 365 services, such as SharePoint, to help ensure the right users or groups have the right access to the right content within your organization.”

Source: Data, Privacy, and Security for Microsoft 365 Copilot

We know empirically, however, that most organizations are about as far from least privilege as they can be. Just take a look at some of the stats from Microsoft’s own State of Cloud Permissions Risk report.

This picture matches what Varonis sees when we perform thousands of Data Risk Assessments for companies using Microsoft 365 each year. In our report, The Great SaaS Data Exposure, we found that the average M365 tenant has:

Why does this happen? Microsoft 365 permissions are extremely complex. Just think about all the ways in which a user can gain access to data:

To make matters worse, permissions are mostly in the hands of end users, not IT or security teams.
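Because access can arrive through several paths at once, a user who was never named on a file can still end up able to read it. The minimal Python sketch below assumes a toy tenant in which groups can nest and a file can carry both direct grants and sharing links. The group names, file names, and the `"organization"` link scope are illustrative, and the sketch deliberately simplifies an organization-wide link to mean that every internal user can read the file; real SharePoint and Entra ID permission evaluation has many more cases (broken inheritance, guest access, link expiry) than this models.

```python
from collections import deque

# Hypothetical tenant model: group -> set of direct members (users or groups).
GROUP_MEMBERS = {
    "Everyone except external users": {"All Staff"},
    "All Staff": {"Finance", "Marketing"},
    "Finance": {"alice"},
    "Marketing": {"bob", "Interns"},
    "Interns": {"carol"},
}

# Each file: direct grants plus any sharing links (scope names are illustrative).
FILES = {
    "budget.xlsx": {"grants": {"Finance"}, "links": []},
    "launch.pptx": {"grants": {"Marketing"}, "links": []},
    "m&a-memo.docx": {"grants": {"alice"}, "links": ["organization"]},
}


def groups_of(user: str) -> set:
    """Resolve every group a user belongs to, following nested membership."""
    found = set()
    queue = deque(g for g, members in GROUP_MEMBERS.items() if user in members)
    while queue:
        group = queue.popleft()
        if group in found:
            continue  # guards against membership cycles
        found.add(group)
        # Any group that contains this group also (transitively) contains the user.
        queue.extend(g for g, members in GROUP_MEMBERS.items() if group in members)
    return found


def can_read(user: str, file_name: str) -> bool:
    entry = FILES[file_name]
    # Simplification: an organization-wide link opens the file to every internal user.
    if "organization" in entry["links"]:
        return True
    return bool(entry["grants"] & ({user} | groups_of(user)))


def readable_files(user: str) -> list:
    return sorted(f for f in FILES if can_read(user, f))


print(sorted(groups_of("carol")))
# ['All Staff', 'Everyone except external users', 'Interns', 'Marketing']
print(readable_files("carol"))
# ['launch.pptx', 'm&a-memo.docx']
```

Carol, an intern, is named on nothing, yet she inherits Marketing's access through a nested group and the M&A memo through a single org-wide link that its owner may have created for one colleague. Multiply that by thousands of sites and millions of files and it becomes clear why nobody can eyeball effective access.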

Microsoft relies heavily on sensitivity labels to enforce DLP policies, apply encryption, and broadly prevent data leaks. In practice, however, getting labels to work is difficult, especially if you rely on humans to apply sensitivity labels.

Microsoft paints a rosy picture of labeling and blocking as the ultimate safety net for your data. Reality reveals a bleaker scenario. As humans create data, labeling frequently lags behind or becomes outdated.

Blocking or encrypting data can add friction to workflows, and labeling technologies are limited to specific file types. The more labels an organization has, the more confusing they can become for users. The problem is especially acute in larger organizations.

The efficacy of label-based data protection will surely degrade when we have AI generating orders of magnitude more data requiring accurate and auto-updating labels.
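One way to catch labels that lag behind content is to recompute the minimum label a file should carry from what it actually contains and compare it with the label it has. The Python sketch below does this for a single hypothetical rule: text containing a payment card number that passes the Luhn checksum requires at least "Confidential." The four-tier label ladder, the rule, and every function name are assumptions for illustration, not Microsoft Purview's classification engine.

```python
import re
from typing import Optional

# Hypothetical label ladder, lowest to highest sensitivity.
LABEL_RANK = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}

# Candidate payment card numbers: 13-19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to cut false positives on random digit runs."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def required_label(text: str) -> str:
    """Minimum label a document should carry, based on its content alone."""
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            return "Confidential"
    return "General"


def is_under_labeled(text: str, current_label: Optional[str]) -> bool:
    """True when a file has no label, or one below what its content requires."""
    if current_label is None:
        return True
    return LABEL_RANK[current_label] < LABEL_RANK[required_label(text)]


doc = "Invoice paid with card 4111 1111 1111 1111, thanks!"
print(required_label(doc))                                   # Confidential
print(is_under_labeled(doc, "General"))                      # True
print(is_under_labeled("Lunch menu for Friday", "General"))  # False
```

The check is only as good as its rules, and it has to be rerun every time content changes. That is exactly the workload that grows when an AI assistant can produce a new document, with data pulled from a dozen sources, in seconds.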

Varonis can validate and improve an organization’s Microsoft sensitivity labeling by scanning, discovering, and fixing:

AI can make humans lazy. Content generated by LLMs like GPT-4 is not just good, it's great. In many cases, its speed and quality far surpass what a human can do. As a result, people start to blindly trust AI to create safe and accurate responses.

We have already seen real-world scenarios in which Copilot drafts a proposal for a client and includes sensitive data belonging to a completely different client. The user hits “send” after a quick glance (or no glance), and now you have a privacy or data breach scenario on your hands.

It's critical to have a sense of your data security posture before your Copilot rollout. Now that Copilot is generally available, it is a great time to get your security controls in place.

Varonis protects thousands of Microsoft 365 customers with our Data Security Platform, which provides a real-time view of risk and the ability to automatically enforce least privilege.

We can help you address the biggest security risks with Copilot with virtually no manual effort. With Varonis for Microsoft 365, you can:

The best way to start is with a free risk assessment. It takes minutes to set up and within a day or two, you’ll have a real-time view of sensitive data risk.

This article originally appeared on the Varonis blog.
