GitHub Copilot Enterprise Code Graph: What It Indexes and What That Means for Your Code Security (2026)

When GitHub launched Copilot Enterprise in February 2024, the headline feature was the "code graph": a semantic index of your entire GitHub organization's codebase that lets Copilot answer questions about code the developer doesn't have open. For founders and engineering leads evaluating the upgrade, the immediate questions are practical: what exactly does this index contain, who can see it, and what happens if your repo has something in it you didn't intend to expose?

The short answer is that the code graph indexes more than most people expect, the data lives in Azure under Microsoft's custody, and the access controls are good but require deliberate configuration.

TL;DR

Copilot Enterprise builds a semantic index (function signatures, call graphs, dependency maps) across every repo you grant it access to. It stores that index in Azure under your Enterprise contract, not on GitHub's CDN. GitHub commits to no model training on Enterprise org data. The security risk is not that Microsoft will misuse your code; it's that the code graph will index files you forgot were committed, including hardcoded secrets or internal API endpoints. Configure repo access scope before enabling.

What the Code Graph Actually Builds

Standard GitHub Copilot reads the file you have open plus a rolling context window of adjacent code. When you type, it sends that window to a model and returns a suggestion. Nothing is indexed or persisted between sessions.

Copilot Enterprise adds a pre-built semantic index called the code graph. It is built from:

Function and method signatures from all files in accessible repositories
Class hierarchies and interface definitions
Cross-file call relationships (which functions call which)
Import and dependency graphs (what each module depends on)
Symbol tables (all named identifiers and where they're defined)
Natural-language embeddings of docstrings and comments

The index is generated asynchronously by a GitHub-controlled indexer that clones your repos, parses them, builds the graph, and stores the result in Azure Blob Storage under your org's tenant. The process runs nightly for repositories you've authorized.

The code graph does not index commit history, git blame, or binary files. It indexes the current state of the default branch for each repository it can access.
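Because the index reflects whatever sits on the default branch, listing that branch's tree gives a rough preview of what the indexer could parse. The following is a minimal sketch that assumes a local clone. The helper name `list_indexable_files` is hypothetical, and the extension filter is only a crude stand-in for binary detection, since how the real indexer identifies binary files is not documented here.

```shell
#!/bin/bash
# Sketch: list files tracked at a given ref, skipping common binary extensions.
# list_indexable_files is a hypothetical helper, not part of any GitHub tooling.
list_indexable_files() {
  local repo_dir="$1"
  local ref="${2:-HEAD}"   # pass e.g. origin/main to target the default branch explicitly
  git -C "$repo_dir" ls-tree -r --name-only "$ref" \
    | grep -Ev '\.(png|jpe?g|gif|pdf|zip|gz|jar|exe|so|dylib)$'
}
```

Running `list_indexable_files /path/to/clone origin/main` and skimming the output for files nobody remembers committing is a cheap first pass before any scanner runs.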

When a developer asks Copilot Enterprise a question like "Where is the payment processing logic?", it queries the code graph to find semantically relevant functions before constructing the prompt. This means the model sees context from files the developer never opened.

What the Code Graph Can See in Your Repos

This is the security-relevant part. The indexer will parse any file in a repository it has access to. That includes:

.env.example files (if committed with real values, which happens more than developers admit)
Configuration files with internal service URLs or API endpoints
Test fixtures that contain sample credentials
Comments with internal architecture details ("TODO: replace with prod DB at 10.0.1.45:5432")
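Any file that was accidentally committed and is still on the default branch. Deleting it later removes it from the next index build, since history and other branches are not indexed, but a copy left on a stale branch stays readable to anyone with repository access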
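Comments like the "10.0.1.45:5432" example above can be hunted down with a plain recursive grep. This is a rough sketch, not a replacement for a real scanner: the helper name `find_internal_addresses` is hypothetical, it only looks for RFC 1918 private IPv4 ranges, and it will also flag harmless look-alikes such as four-part version numbers.

```shell
#!/bin/bash
# Sketch: flag lines that mention private IPv4 addresses
# (10.x.x.x, 172.16-31.x.x, 192.168.x.x). Binary files and .git are skipped.
# find_internal_addresses is a hypothetical helper name.
find_internal_addresses() {
  local dir="$1"
  grep -rnIE --exclude-dir=.git \
    '(^|[^0-9.])(10\.[0-9]{1,3}|172\.(1[6-9]|2[0-9]|3[01])|192\.168)\.[0-9]{1,3}\.[0-9]{1,3}([^0-9]|$)' \
    "$dir"
}
```

Each hit is printed as `path:line:content`, so the output can be reviewed by hand or piped into a ticket.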

The code graph doesn't hand these values to developers verbatim in suggestions, but they become part of the indexed context the model draws on. A developer asking "What database connection string format does this project use?" could surface patterns that point toward sensitive configuration.

Run a secret scan on your repositories before enabling Copilot Enterprise's code graph. Any hardcoded credential that exists in a file on the default branch is a candidate for indexing. CheckYourVibe's scanner checks for this class of exposure in under two minutes.

Data Handling: What Microsoft's Contract Says

For GitHub Copilot Business and Enterprise subscribers, GitHub commits to several specific data handling behaviors that are not present in the individual plan:

No model training: Code submitted by Enterprise users (including code graph data) is not used to train GitHub's foundational models or any Microsoft product. This is a contractual commitment, not just a policy statement.

Data isolation: The code graph index for your org is stored in a tenant-scoped partition in Azure. It is not shared with other GitHub customers, even if they use the same underlying model infrastructure.

Retention: When you revoke Copilot access from a repository or delete the Copilot Enterprise subscription, GitHub commits to deleting the associated code graph index within 30 days.

Residency: As of early 2026, code graph data is processed in the United States regardless of where your GitHub organization is hosted. EU data residency for Copilot is on GitHub's roadmap but not yet generally available.

If your organization has data residency or compliance obligations (GDPR, commitments made under a SOC 2 Type II audit, or sector-specific rules), confirm the current Copilot Enterprise data processing addendum before enabling the code graph. The DPA is separate from the GitHub Enterprise Agreement and requires explicit acceptance.

Copilot Enterprise vs Standard Copilot: The Security Difference

Feature	Individual / Teams	Business	Enterprise
No model training guarantee	No	Yes	Yes
Org-scoped data isolation	No	Yes	Yes
Code graph (cross-repo context)	No	No	Yes
Audit logs	No	Partial	Full
Repo access controls	No	No	Yes
SAML/SSO enforcement	No	No	Yes
IP indemnity	No	No	Yes

The individual and Teams plans route your code through shared inference infrastructure with no per-customer data isolation. For any organization where code confidentiality matters, Business is the minimum; Enterprise adds the code graph and the access controls that let you scope which repos it touches.

Configuring the Code Graph Safely

Before enabling the code graph across your organization, three configuration decisions determine your exposure:

1. Scope repository access precisely

The code graph indexes every repository Copilot has access to. In GitHub's settings, you can configure Copilot Enterprise to access all repositories in the org, selected repositories, or no repositories (disabling the code graph while keeping inline suggestions).

Start with selected repositories. Include only the repos your engineers actively work in, not your internal tooling repos, deployment configs, or repositories that contain compliance-sensitive data.
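One way to draft the "selected repositories" list is to start from repos with recent pushes and then prune by hand. The sketch below assumes "actively worked in" means "pushed to since a cutoff date", which is this example's assumption rather than GitHub's definition. The helper name `select_active_repos` is hypothetical, and the `gh` command in the comment is one possible way to produce the input, with ORG as a placeholder.

```shell
#!/bin/bash
# Sketch: read "name<TAB>pushedAt" lines on stdin and print the names of repos
# pushed on or after the cutoff. ISO 8601 timestamps compare correctly as strings.
# Possible input source (assumes the GitHub CLI is installed and authenticated):
#   gh repo list ORG --limit 1000 --json name,pushedAt \
#     --jq '.[] | [.name, .pushedAt] | @tsv'
select_active_repos() {
  local cutoff="$1"   # e.g. 2026-01-01
  awk -F'\t' -v cutoff="$cutoff" '($2 "") >= (cutoff "") { print $1 }'
}
```

The output is only a starting point: internal tooling, deployment config, and compliance-sensitive repos should still be removed from the list manually, even if they are active.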

2. Audit before you index

Run your repositories through a secret scanner before the first code graph build. Look specifically for:

API keys and tokens committed to any file
Database connection strings in config files
Internal IP addresses or hostnames in comments
.env.example files with real values
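The last item is easy to check mechanically. As a heuristic sketch, the function below lists assignments in a `.env.example` file whose values do not look like obvious placeholders. The helper name `flag_env_example_values` and the placeholder patterns are assumptions chosen for illustration; expect false positives such as `PORT=3000`, and treat the output as a list for manual review.

```shell
#!/bin/bash
# Sketch: print "line:KEY=VALUE" for non-empty values that don't match common
# placeholder shapes (<...>, your_..., changeme, example, placeholder, xxx, todo).
# flag_env_example_values is a hypothetical helper name.
flag_env_example_values() {
  local file="$1"
  local placeholder="=[\"']?(<.*>|your[-_]|change[-_]?me|example|placeholder|x{3,}|todo)"
  grep -nE '^[A-Za-z_][A-Za-z0-9_]*=.+' "$file" \
    | grep -viE -e "$placeholder"
}
```

A dedicated secret scanner will catch far more than this; the point here is only that template files deserve the same scrutiny as real config.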

3. Plan around the nightly re-index

The code graph rebuilds nightly. If a developer commits a secret today, it will be in the code graph tomorrow. Pair Copilot Enterprise with GitHub's native secret scanning push protection (or a pre-commit hook that blocks secrets) so the code graph never ingests credentials in the first place.

Pre-commit hook: block secrets before commit

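#!/bin/bash
# Install: cp this to .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
# Requires: gitleaks (https://github.com/gitleaks/gitleaks)
# Note: newer gitleaks releases deprecate `protect`; check `gitleaks --help` for the current equivalent.

# Fail with a clear message if gitleaks is missing, rather than a bare "command not found".
if ! command -v gitleaks >/dev/null 2>&1; then
  echo "gitleaks not found. Install it before committing." >&2
  exit 1
fi

# Scan only the staged changes; a non-zero exit status means a leak was found.
if ! gitleaks protect --staged --no-banner; then
  echo "Secret detected. Commit blocked. Remove the credential and retry." >&2
  exit 1
fi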

Is the Code Graph a Security Risk?

The threat model is more mundane than "Microsoft reads your code." The realistic risk is internal: the code graph makes private information about your codebase more accessible to anyone with Copilot Enterprise access in your org.

Someone on your marketing team who has read access to a repository they'd never navigate manually can ask Copilot "What does the auth service do?" and get a detailed answer sourced from your authentication code. The code graph doesn't create new access at the filesystem level, but it dramatically lowers the effort required to explore code someone technically has permission to read.

For most engineering organizations this is a feature, not a bug. For organizations where code compartmentalization matters (financial services, defense contractors, healthcare with strict data segmentation), audit who in your GitHub org has Copilot Enterprise enabled and whether their repository access scope aligns with what the code graph can now synthesize for them.
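The audit described above boils down to intersecting two lists: who holds a Copilot seat, and who can read a given sensitive repository. A minimal sketch follows, with the hypothetical helper `users_with_seat_and_access` taking two files of logins, one per line. The `gh api` calls in the comments are one possible way to build those files; ORG and REPO are placeholders, and both calls need a token with sufficient organization permissions.

```shell
#!/bin/bash
# Sketch: print logins that appear in both files (one login per line in each).
# Possible input sources (assumes the GitHub CLI is installed and authenticated):
#   gh api --paginate /orgs/ORG/copilot/billing/seats --jq '.seats[].assignee.login' > seats.txt
#   gh api --paginate /repos/ORG/REPO/collaborators --jq '.[].login' > readers.txt
# users_with_seat_and_access is a hypothetical helper name.
users_with_seat_and_access() {
  local seats_file="$1"
  local readers_file="$2"
  # -F fixed strings, -x whole-line match, -f patterns from file
  grep -Fxf "$seats_file" "$readers_file" | sort -u
}
```

Anyone in the output can ask Copilot about that repository, so the list is worth comparing against who actually needs that visibility.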

What does GitHub Copilot Enterprise's code graph index?

The code graph indexes function signatures, class definitions, cross-file call relationships, and dependency graphs across every repository you grant Copilot access to. It does not perform real-time file reads during suggestions; it queries a pre-built semantic index stored in Microsoft Azure.

Does GitHub Copilot Enterprise store my code?

Yes. Code graph data (function signatures, call graphs, embeddings) is stored in Microsoft Azure under your GitHub Enterprise contract. GitHub commits to not using this data for model training for Enterprise subscribers. The data is scoped to your org and is not shared across customers.

Can GitHub Copilot Enterprise see my .env files or secrets?

If those files are committed to a repository Copilot has access to, yes: they are candidates for indexing. GitHub's secret scanning runs separately. It can flag known secret formats, and push protection can block them at push time, but a secret that is already on the default branch is not excluded from the code graph. Keep secrets out of your repository entirely.
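If a `.env` file is already tracked, the first step is to stop tracking it and ignore it going forward. The sketch below assumes the file is named `.env` at the repository root; the helper name `untrack_env_file` is hypothetical. It does not rewrite history, so any credential that was committed should still be rotated.

```shell
#!/bin/bash
# Sketch: remove .env from the index (the file stays on disk) and add it to .gitignore.
# untrack_env_file is a hypothetical helper name. The change still needs to be committed.
untrack_env_file() {
  local repo_dir="$1"
  git -C "$repo_dir" rm --cached --quiet --ignore-unmatch .env
  # Append the ignore rule only if it is not already present.
  grep -qxF '.env' "$repo_dir/.gitignore" 2>/dev/null \
    || echo '.env' >> "$repo_dir/.gitignore"
}
```

After committing the result, the file drops off the default branch and out of the next index build, while the old copy remains in git history for anyone with repository access.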

Is GitHub Copilot Enterprise safer than the individual plan for companies?

Yes for data isolation. Business and Enterprise both carry the no-training guarantee and org-scoped isolation; Enterprise adds full audit logs, SSO enforcement, and the ability to restrict which repositories Copilot can access. The individual plan has no org-scoped data commitments.

How is Copilot Enterprise's code graph different from standard Copilot?

Standard Copilot reads only the open file and adjacent context window. Copilot Enterprise's code graph can cross file boundaries, surface relevant functions from across your entire codebase, and answer questions about code the engineer doesn't have open.

Audit Your Repos Before Enabling Code Graph

CheckYourVibe scans for hardcoded secrets, exposed API keys, and insecure configuration files in under two minutes.