How to Prevent Secret Leaks in Your Repositories

Securing secrets exposed through Git repositories is crucial, since such exposure is one of the common ways of becoming vulnerable to security breaches. GitGuardian’s State of Secrets Sprawl 2023 report states that 1 in 10 authors exposed a secret while pushing code to GitHub, and that 10 million new secrets were detected in public GitHub commits in 2022 alone (an increase of 67% compared to 2021).

A first step in securing Git repositories is to enable secret scanning, so that secrets already committed can be identified and removed. Beyond scanning existing history, you can also check for secrets on each commit so that potential leaks are caught before they happen, in line with the “shift left” practice during code integration.

In this blog post, we will explore features in common Git hosting platforms that help avoid secret exposure. We will also explore an independent open source tool as an alternative.

How does secret scanning usually work?

Secret scanning relies on regular expressions (regex) to identify patterns associated with sensitive information in code repositories. Regular expressions are sequences of characters that define a search pattern and are widely used for text matching. The secret scanning regex patterns are crafted to recognize standard formats of sensitive data, such as API keys, passwords, and access tokens.

For instance, a regex pattern to catch any AWS access keys exposed might be something like AKIA[0-9A-Z]{16}. This regex pattern matches strings starting with “AKIA” followed by exactly 16 characters, each character being a digit or an uppercase letter. This could be used to identify AWS Access Key IDs, which are 20-character identifiers starting with “AKIA.”
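
As a minimal sketch of this idea, the Python snippet below applies the pattern from the paragraph above to a piece of text. The function name `find_aws_access_key_ids` and the sample string are assumptions made for illustration, and the key shown is the placeholder value used in AWS documentation, not a real credential. The lookarounds are an addition for this sketch so that the pattern does not match inside a longer token; real scanners combine many such patterns with further checks.

```python
import re

# Pattern from the text: "AKIA" followed by exactly 16 digits or uppercase letters.
# The lookbehind/lookahead (added for this sketch) reject matches that are
# embedded in a longer run of digits or uppercase letters.
AWS_ACCESS_KEY_ID = re.compile(r"(?<![0-9A-Z])AKIA[0-9A-Z]{16}(?![0-9A-Z])")


def find_aws_access_key_ids(text: str) -> list[str]:
    """Return every substring of `text` that looks like an AWS Access Key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"  # placeholder key from AWS docs'
print(find_aws_access_key_ids(sample))  # ['AKIAIOSFODNN7EXAMPLE']
```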

Securing secrets in Git hosting platforms with in-house features

Recognizing the growing number of secret exposures, many Git hosting platforms now provide in-house features to keep secrets from being exposed and, where a secret has already been exposed, to flag it proactively so that action can be taken. Simple settings on these platforms activate functionality such as secret scanning, alerts for secrets pushed in pull requests, and blocking of incoming pull requests that contain exposed secrets.

Some of the platforms also have vendor partnerships to identify tool-specific vulnerabilities and actively maintain a constantly updated list of patterns for recognizing secrets. Using artificial intelligence (AI) is another option to extend the capabilities of secret scanning.

Let’s have a brief look at the functionality the major Git hosting platforms provide, which may or may not fit your requirements for a secure repository.

1. Secret scanning with GitHub repository

GitHub can scan for secrets that already exist in your repository and can also restrict secrets from being pushed to the repository.

(Enabling GitHub secret scanning)

GitHub enables secret scanning for:

Owners of public repositories on GitHub.com.
Organizations with public repositories.
Organizations using GitHub Enterprise Cloud for public repositories (free) and private/internal repositories (with GitHub Advanced Security license).

When secret scanning identifies a secret in a commit, issue description, or comment, GitHub generates an alert. Alerts are sent to the registered email and can be viewed in the Security tab of the repository.

GitHub also lets you enable push protection in the repository settings, and as a user-level setting that stops you from pushing secrets to any repository, even if push protection is not enabled on that repository.

It is interesting to note that GitHub has a feature that uses AI to generate regular expressions, which comes with an enterprise account. It is still in the beta stage but could help catch more secrets through custom regular expressions. You can also read about troubleshooting secret scanning to understand why a scan might miss some of your secrets.

2. Secret scanning with Azure DevOps repository

Microsoft provides secret scanning for Azure DevOps repositories under the GitHub Advanced Security for Azure DevOps license. It includes secret scanning and push protection along with other features such as dependency scanning and code scanning.

(Enabling Azure DevOps secret scanning)

Secret scanning push protection and repository scanning are automatically enabled when you turn on Advanced Security. The scan details are displayed on the Azure DevOps Advanced Security page, where the user can take action on the alert.

Push protection stops commits containing secrets whether they are made from the command line or from the web interface. You can read the official documentation to learn more about Azure DevOps secret scanning.

3. Secret scanning with Bitbucket repository

Bitbucket scans your repositories for secrets and triggers notifications when leaked secrets are detected within new commits. Email notifications are sent to everyone involved in the commit history of the secret: the authors, committers, and the developer who pushed or merged the code containing secrets into the repositories.

(Enabling Bitbucket secret scanning)

Secret scanning is enabled by default in the Bitbucket instance, and both global and system admins can disable or enable it by modifying the configuration properties in the bitbucket.properties file. The functionality is provided at no additional cost. However, at the time of writing this blog post, Bitbucket does not provide a way to proactively prevent secrets from being committed to repositories.

You can read the official documentation to learn more about this Bitbucket feature.

4. Secret scanning with GitLab repository

GitLab proactively scans Git repositories to detect potential secrets (API keys, passwords, tokens, etc.) before they’re accidentally committed and exposed. Secret scanning in GitLab runs a job to scan for secrets in the repository once it is enabled. It can be configured by enabling Auto DevOps or editing .gitlab-ci.yml.

(Secret scanning with GitLab repository)

GitLab splits this functionality across its free and paid tiers. In the free tier, you can run a scan, but certain features, such as security dashboards, are not available.

You can read the official documentation to learn more about GitLab’s offering to secure secrets.

Open source tools

While we explored the built-in functionality provided by Git hosting platforms, there are scenarios where users cannot put end-to-end security in place to prevent secret exposure in the Git repository because of constraints such as missing features. End-to-end security means not only dealing with secrets already exposed in the Git repository but also adding a preventive step when users commit locally. Some platforms, such as GitHub and Azure DevOps, provide all these features, while others do not. Some platforms also require an additional license to enable these features. A crucial point to note is that if nothing stops the user from committing secrets locally, a secret can still be compromised even when commits containing secrets are rejected by the remote, since the secret may still leave the user’s system.

To address these, we will explore open source tools as an alternative. We will check out how you can leverage Gitleaks in different ways to enforce security around secrets. The steps used for Gitleaks are similar to some of the other tools. While there are multiple tools available, one can select the tool based on the exact requirement. Below is a summary of the common tools and their features:

| Feature | TruffleHog | Gitleaks | git-secrets | Infisical |
| --- | --- | --- | --- | --- |
| Language | Go (earlier versions were written in Python) | Go | Shell | Go |
| Secret Detection Methods | Regex patterns, keyword lists, custom detectors | Regex patterns, custom patterns, entropy checks | Regex patterns, custom patterns | AI-powered secret detection, pattern matching, anomaly detection |
| Ease of Use | Easy to set up and run | Easy to set up and run | Easy to set up and run | Requires minimal user configuration |
| Customization | Highly customizable with custom detectors and patterns | Moderately customizable with custom patterns | Limited customization options | Highly customizable with flexible policies and configuration options |
| Integrations | CI/CD pipelines, IDEs, security platforms | CI/CD pipelines, GitHub, GitLab | CI/CD pipelines, GitHub, GitLab | Extensive integrations with DevOps tools and platforms |
| Pricing | Open source (free) | Open source (free) | Open source (free) | Open source + paid (with additional features) |
| Overall Suitability | Suitable for developers and small teams | Suitable for developers and small teams | Suitable for developers and small teams | Suitable for enterprises and organizations with complex security needs |
| ⭐ on GitHub | 13K | 14K | 12K | 11K |

Secret scanning with Gitleaks

Gitleaks is an independent open source tool for detecting and preventing hardcoded secrets such as passwords, API keys, and tokens in Git repositories. It is an easy-to-use, all-in-one solution for detecting secrets, past or present, in your code. It is a good option when your hosting platform does not let you secure secrets end-to-end, from blocking commits on the user’s local system to scanning for secrets already pushed to repositories.

Gitleaks can be used to scan for secrets:

In a GitHub Actions pipeline.
As a pre-commit hook, every time a user runs a commit command.
In the user’s local repository, before and after a Git commit, with the Gitleaks CLI.

1. Using Gitleaks in a GitHub Actions pipeline

A simple example of using Gitleaks in a GitHub Actions workflow is as follows:

name: gitleaks
on:
  pull_request:
  push:
  workflow_dispatch:
  schedule:
    - cron: "0 4 * * *" # run once a day at 4 AM
jobs:
  scan:
    name: gitleaks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }} # Only required for Organizations, not personal accounts.

The GitHub Actions workflow will automate the process of running Gitleaks to scan a repository for sensitive information, and it can be triggered by various events such as pull requests, pushes, manual requests, or on a scheduled basis.

2. Using Gitleaks as a pre-commit

With Gitleaks set up as a pre-commit hook, whenever a user who has cloned the repository tries to commit a file containing a secret, the hook rejects the commit with a warning to remove the secret. Follow these steps to implement it:

Install pre-commit.

Create a .pre-commit-config.yaml file at the root of your repository with the following content:

  repos:
    - repo: https://github.com/gitleaks/gitleaks
      rev: v8.16.1
      hooks:
        - id: gitleaks

Auto-update the config to the latest repository version by executing pre-commit autoupdate.
Install with pre-commit install.

Now you’re all set!

3. Using Gitleaks in the user’s local system

Pre-commit - The protect command scans uncommitted changes in a Git repository. Use it on developer machines to follow the shift-left practice in security.

In your repository, run the below command, and it will let you know if any secrets are detected. You can use --staged along with the command to check for any secrets after adding new files to the staging area.
```
    gitleaks protect -v
```
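
Conceptually, scanning uncommitted changes means inspecting only the lines a commit would add. The sketch below illustrates that idea in Python on a unified diff such as the output of `git diff --cached`. It is not how Gitleaks is implemented; the function names, the single AWS-style pattern, and the sample diff are all assumptions made for illustration.

```python
import re

# Illustrative single rule; real scanners ship many patterns.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")


def added_lines(diff_text: str):
    """Yield (file, line) for each line added in a unified diff."""
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # A header such as "+++ b/config.py" names the file being changed.
            current_file = line[4:].removeprefix("b/")
        elif line.startswith("+"):
            yield current_file, line[1:]


def scan_diff(diff_text: str) -> list[tuple[str, str]]:
    """Return (file, matched secret) pairs found in added lines only."""
    findings = []
    for file_name, content in added_lines(diff_text):
        for match in SECRET_PATTERN.findall(content):
            findings.append((file_name, match))
    return findings


# Hypothetical staged change: one key is removed, a placeholder key is added.
SAMPLE_DIFF = """\
diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -1,2 +1,2 @@
 REGION = "us-east-1"
-OLD_KEY = "AKIAABCDEFGHIJKLMNOP"
+NEW_KEY = "AKIAIOSFODNN7EXAMPLE"
"""

print(scan_diff(SAMPLE_DIFF))  # [('config.py', 'AKIAIOSFODNN7EXAMPLE')]
```

Only the added line is reported; the removed key is ignored because it is not part of what the commit would introduce. In practice the diff text could come from `subprocess.run(["git", "diff", "--cached"], capture_output=True, text=True).stdout`.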

Post-commit - The detect command scans repositories, directories, and files for secrets that have already been committed.

In your repository, run the below command, and it will let you know if any secrets are detected.
```
    gitleaks detect -v
```

Gitleaks allows you to add custom configuration rules if you want to detect a particular pattern of secrets. It can be quite helpful to control false positives or to catch any particular pattern of secrets. For installation and to read more on the tool, check out the official GitHub documentation for Gitleaks.
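
To illustrate why custom rules and allowlists matter, here is a small Python sketch of the idea: a custom pattern catches an organization-specific token format, and an allowlist suppresses a known false positive. This is a conceptual model only, not the Gitleaks configuration format or its implementation; the `acme_` token format, the rule name, and the allowlisted placeholder are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A named detection pattern plus patterns for values that are never reported."""
    rule_id: str
    pattern: re.Pattern
    allowlist: list[re.Pattern] = field(default_factory=list)

    def findings(self, text: str) -> list[str]:
        hits = [m.group(0) for m in self.pattern.finditer(text)]
        # Drop hits that match an allowlist entry, e.g. documented placeholders.
        return [h for h in hits if not any(a.search(h) for a in self.allowlist)]


# Hypothetical internal token format: "acme_" followed by 24 hex characters.
acme_rule = Rule(
    rule_id="acme-api-token",
    pattern=re.compile(r"acme_[0-9a-f]{24}"),
    allowlist=[re.compile(r"^acme_0{24}$")],  # all-zero placeholder used in docs
)

PLACEHOLDER = "acme_" + "0" * 24
text = f'token = "acme_3f9a1c0b7d2e4a6f8b1c9d0e"\nexample = "{PLACEHOLDER}"\n'

print(acme_rule.findings(text))  # ['acme_3f9a1c0b7d2e4a6f8b1c9d0e']
```

The custom pattern finds both strings, but the allowlist removes the placeholder, so only the real-looking token is reported.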

Conclusion

Securing secrets can seem like a trivial task, but it is a crucial step toward a secure repository and toward avoiding unwanted security breaches. While some Git hosting platforms provide in-house features to scan your repository for secrets and to block commits that contain secrets, others do not. In those cases, you can leverage an independent open source tool like Gitleaks to prevent secret leaks in your repositories.

Furthermore, you might want to do some additional configuration work to catch secrets with particular patterns or to ignore matches that are false positives. While securing the repository is an important security measure, also consider implementing DevSecOps to secure your CI/CD pipeline, so that your organization is adequately protected from code integration through deployment and monitoring.

I hope you found this blog post informative and engaging. I’d love to hear your thoughts on this post. Let’s connect and start a conversation on LinkedIn.
