February 9, 2026 • 2 min read • Fateh Mohammed

Secret Detection in Pull Requests: Beyond Regex Patterns

Regex catches known secret formats but misses custom tokens and context-dependent credentials. Here is how layered detection closes the gap.

Engineering • Detectors • Secrets

Secret detection is one of the highest-value checks you can run on a pull request. A leaked API key or database credential in a public repository can be exploited within minutes.

Most secret scanners rely on regex patterns. That works for well-structured tokens like AWS access keys or GitHub personal access tokens. It does not work for everything.

Where regex falls short

Regex-based detection struggles with:

Custom tokens — internal API keys that do not follow a standard format.
Connection strings — database URLs embedded in configuration that vary by provider.
Context-dependent secrets — a base64 string that looks benign in isolation but is a signing key in context.
Multi-line secrets — PEM keys or certificates split across lines.

Layered detection

A robust secret detector uses multiple strategies:

1. Pattern matching

Start with regex for known formats. This catches most structured secrets with very few false positives, because fixed prefixes and lengths rarely collide with ordinary strings.
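As a rough illustration of this first layer, the Python sketch below scans text for two of the formats mentioned earlier. The pattern shapes (an AKIA or ASIA prefix plus 16 uppercase alphanumerics for AWS access key IDs, ghp_ plus 36 alphanumerics for classic GitHub personal access tokens) follow commonly used scanner rules and are not an exhaustive or official list. PATTERNS and find_known_secrets are names invented for this example.

```python
import re

# Illustrative patterns only; a real scanner would ship many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_known_secrets(text):
    """Return (pattern_name, matched_text) pairs for every match in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    print(find_known_secrets('aws_key = "AKIAIOSFODNN7EXAMPLE"'))
```

Note that the documentation placeholder above still matches, which is one reason pattern matching alone needs the later layers.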

2. Entropy analysis

High-entropy strings in assignment contexts (environment variables, config files, constructor arguments) are likely secrets even without a known format.
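A minimal sketch of this idea in Python, assuming Shannon entropy measured in bits per character as the randomness signal. The ASSIGNMENT regex, the 16-character minimum, and the 4.0 threshold are illustrative assumptions that would need tuning against real repositories; hashes and UUIDs also score high, so this layer is best treated as a signal and not as a verdict.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())


# Hypothetical, simplified matcher for NAME = "value" or name: 'value'.
ASSIGNMENT = re.compile(r"""[\w.-]+\s*[:=]\s*["']([^"'\s]{16,})["']""")


def high_entropy_assignments(text, threshold=4.0):
    """Return quoted values in assignment position whose entropy meets the threshold."""
    return [
        value
        for value in ASSIGNMENT.findall(text)
        if shannon_entropy(value) >= threshold
    ]
```

Restricting the check to assignment positions keeps the detector from flagging every long random-looking string in a diff, such as lockfile checksums in plain lists.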

3. Semantic analysis

Examine variable names, file paths, and surrounding code. A variable named db_password assigned a high-entropy string is a stronger signal than either indicator alone.
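One way to combine the two indicators is a simple signal count, sketched below in Python. The keyword list, the 12-character minimum, and the 3.5 bits-per-character threshold are assumptions chosen for illustration, and score_assignment is a hypothetical helper, not part of any particular scanner.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; extend it to match your own naming conventions.
SUSPICIOUS_NAME = re.compile(
    r"(password|passwd|secret|token|api_?key|private_?key)", re.IGNORECASE
)


def shannon_entropy(s):
    """Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())


def score_assignment(name, value, entropy_threshold=3.5):
    """Count independent signals: 0 = none, 1 = name or entropy, 2 = both."""
    name_signal = bool(SUSPICIOUS_NAME.search(name))
    entropy_signal = len(value) >= 12 and shannon_entropy(value) >= entropy_threshold
    return int(name_signal) + int(entropy_signal)
```

Under these assumptions, db_password assigned a 16-character random string scores 2, while the same variable set to a short placeholder, or a random string under an innocuous name, scores 1. A reviewer-facing tool might block only on a score of 2 and merely annotate a score of 1.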

4. Contextual validation

Check whether the file is a test fixture, documentation example, or production configuration. Test fixtures with fake credentials should not block a merge.
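A path-based classifier is one simple way to approximate this check. The Python sketch below is an assumption-laden example: the directory names, the documentation suffixes, and the policy of blocking only on production paths are all hypothetical and would need to reflect your repository layout.

```python
from pathlib import PurePosixPath

# Hypothetical conventions; adjust to your repository layout.
TEST_DIR_NAMES = {"test", "tests", "__tests__", "fixtures", "testdata"}
DOC_DIR_NAMES = {"docs", "doc", "examples"}
DOC_SUFFIXES = {".md", ".rst"}


def classify_path(path):
    """Classify a repository-relative path as 'test', 'docs', or 'production'."""
    p = PurePosixPath(path)
    # Only directory components count, so a file named tests.py is not a test dir.
    dirs = {part.lower() for part in p.parts[:-1]}
    if dirs & TEST_DIR_NAMES:
        return "test"
    if dirs & DOC_DIR_NAMES or p.suffix.lower() in DOC_SUFFIXES:
        return "docs"
    return "production"


def should_block_merge(path):
    """Block only when the finding sits in a production path."""
    return classify_path(path) == "production"
```

Findings in test or documentation paths are probably still worth surfacing as non-blocking warnings, since real credentials do occasionally get pasted into fixtures.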

Practical considerations

False positives in secret detection are especially costly because they erode trust quickly. Developers who see three false-positive secret alerts in a week will start ignoring all of them.

Balance sensitivity with precision. A detector that catches 95% of real secrets with 1% false positives is more useful than one that catches 99% with 20% false positives.
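To make the trade-off concrete, here is a small worked calculation in Python. It assumes the false-positive percentages above are read as the share of alerts that turn out to be false, and it uses a hypothetical pool of 100 real secrets; alert_breakdown is a helper invented for this example.

```python
def alert_breakdown(real_secrets, recall, false_alert_share):
    """Return (true_alerts, false_alerts) for a detector.

    recall: fraction of real secrets the detector catches.
    false_alert_share: fraction of all alerts that are false positives.
    """
    true_alerts = real_secrets * recall
    total_alerts = true_alerts / (1 - false_alert_share)
    return true_alerts, total_alerts - true_alerts


if __name__ == "__main__":
    for label, recall, share in [("precise", 0.95, 0.01), ("noisy", 0.99, 0.20)]:
        true_alerts, false_alerts = alert_breakdown(100, recall, share)
        print(f"{label}: {true_alerts:.0f} real, about {round(false_alerts)} false")
```

Under these assumptions the precise detector raises about 96 alerts with roughly 1 false, while the noisy one raises about 124 with roughly 25 false. Catching 4 more real secrets costs developers around 24 extra false alarms to triage.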