Find publicly exposed source code repositories
Repo Lookout is a large-scale security scanner, with a single purpose: Find source code repositories that have been inadvertently exposed to the public and report them to the domain’s technical contact.
Accidentally exposed source code repositories often contain highly sensitive information that can be used for downstream attacks, such as data leakage and ransomware extortion. While the problem has been known and extensively documented for years [1][2][3][4], our findings show that it is still prevalent.
Our goal is to combat this vulnerability by automatically detecting and reporting instances.
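A site operator can run a quick self-check on their own site. The sketch below uses only Python's standard library and shows one widely described signal: request `/.git/HEAD` and test whether the body looks like a Git HEAD file, which is either a symbolic ref such as `ref: refs/heads/main` or a bare commit hash. The function names `looks_like_git_head` and `check_site` are hypothetical. This is not Repo Lookout's detection logic, which this page does not describe.

```python
import urllib.request

HEX_DIGITS = set("0123456789abcdef")


def looks_like_git_head(body: str) -> bool:
    """Hypothetical heuristic: does `body` look like the contents of .git/HEAD?"""
    text = body.strip()
    if text.startswith("ref: refs/"):
        return True  # symbolic ref, e.g. "ref: refs/heads/main"
    # Detached HEAD: a SHA-1 (40 hex chars) or SHA-256 (64 hex chars) commit id.
    return len(text) in (40, 64) and set(text) <= HEX_DIGITS


def check_site(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if <base_url>/.git/HEAD is served and looks like a HEAD file."""
    url = base_url.rstrip("/") + "/.git/HEAD"
    try:
        request = urllib.request.Request(url, headers={"User-Agent": "git-exposure-self-check"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # A real HEAD file is tiny, so a few hundred bytes is plenty.
            body = response.read(512).decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # HTTP errors (4xx/5xx), network failures, timeouts, malformed URLs.
        return False
    return looks_like_git_head(body)
```

For example, `check_site("https://example.com")` returns `True` only if the server actually hands out something shaped like a HEAD file. A 404, a connection error, or a "soft 404" HTML page served with status 200 all return `False`, because the content check is applied to whatever comes back.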
URL index
The URL index for the scanning process is obtained from several sources:
- CommonCrawl builds and maintains an open repository of web crawling data.
- Tranco List is a research-oriented top site ranking hardened against manipulation.
- Chrome UX Report provides metrics for how real-world Chrome users experience the web.
Statistics
Including our latest security scan on February 28, 2023, we have scanned 5,045,968,136 URLs on 541,366,546 domains.
A total of 589,976 publicly exposed source code repositories have been found to date.
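To put these totals in proportion, here is a back-of-the-envelope calculation from the three figures above. It treats every found repository as if it sat on a distinct scanned domain, which the statistics do not state, so the last two numbers are approximate.

```python
# Figures as stated in the statistics above (cumulative through February 28, 2023).
urls_scanned = 5_045_968_136
domains_scanned = 541_366_546
repos_found = 589_976

urls_per_domain = urls_scanned / domains_scanned
# Assumption: one found repository per domain (not stated by the statistics).
repo_share_pct = 100 * repos_found / domains_scanned
domains_per_repo = domains_scanned / repos_found

print(f"URLs per domain:          {urls_per_domain:.2f}")   # 9.32
print(f"Repos per 100 domains:    {repo_share_pct:.3f}")    # 0.109
print(f"Domains per exposed repo: {domains_per_repo:.0f}")  # 918
```

In other words, roughly one exposed repository turned up for every nine hundred or so domains scanned.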
Frequently Asked Questions
How to prevent exposed repositories?
It’s recommended to configure the web server to deny access to all “dot folders” (i.e., folders whose names start with a “.”).[5] However, to prevent only Git repositories from being exposed, it’s sufficient to deny access to “.git” folders.
Exactly how to do this depends on the server software used, for example nginx, Apache, or Caddy.
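Normally this rule lives in the web server's own configuration. For an application served directly from Python, the same matching rule can be sketched as WSGI middleware. This is a minimal sketch using only the standard library. The names `is_dot_path` and `deny_dot_folders` are hypothetical, and answering with 404 rather than 403 is a design choice, not something this page prescribes.

```python
from typing import Callable, Iterable

# Loose WSGI type aliases so the sketch stays self-contained.
StartResponse = Callable[..., object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def is_dot_path(path: str) -> bool:
    """Return True if any segment of `path` starts with '.', except a leading
    '.well-known' segment (reserved by RFC 8615)."""
    segments = [s for s in path.split("/") if s]
    for index, segment in enumerate(segments):
        if index == 0 and segment == ".well-known":
            continue  # keep /.well-known/... reachable, e.g. for ACME challenges
        if segment.startswith("."):
            return True
    return False


def deny_dot_folders(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app so requests into dot folders are refused before reaching it."""
    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if is_dot_path(environ.get("PATH_INFO", "")):
            # 404 instead of 403, so the response does not confirm the folder exists.
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found\n"]
        return app(environ, start_response)
    return wrapped
```

With this wrapper, `/.git/HEAD` and `/blog/.env` are refused while `/.well-known/acme-challenge/...` still reaches the application. Replacing `segment.startswith(".")` with `segment == ".git"` gives the narrower rule that only blocks Git folders.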
How to opt out of being scanned?
The security scanner is designed to be a good network citizen, so requests are throttled and bandwidth usage is minimal. However, we do understand that not every website may welcome the scanning process.[6]
There are two ways to opt out of the scanning process:
- Send us an opt-out email with the domain name, IP address, IP range(s), or ASN. For IP ranges or ASNs, we will ask for log entries from a previous scan as a means of authentication.
- Deny all requests whose HTTP User-Agent header starts with “RepoLookoutBot”.
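For a Python application without a separate web server in front of it, the second option can be sketched as WSGI middleware that refuses any request whose User-Agent starts with the prefix named above. The name `deny_user_agent_prefix` is hypothetical, and the 403 status is an assumption; the page only asks that such requests be denied.

```python
from typing import Callable, Iterable

# Loose WSGI type aliases so the sketch stays self-contained.
StartResponse = Callable[..., object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]

SCANNER_UA_PREFIX = "RepoLookoutBot"  # prefix given in the opt-out instructions


def deny_user_agent_prefix(app: WSGIApp, prefix: str = SCANNER_UA_PREFIX) -> WSGIApp:
    """Wrap a WSGI app so any request whose User-Agent starts with `prefix` is refused."""
    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if environ.get("HTTP_USER_AGENT", "").startswith(prefix):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapped
```

A prefix match (`str.startswith`) rather than an exact comparison keeps the rule working if a version suffix or contact URL follows the bot name in the header.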
Sponsoring
To support this project, consider becoming a sponsor on Ko-fi. All tips will be used for the crawling and email infrastructure.
Thank you very much!
[1] How unprotected .git repositories compromise website security (German, 2015)
[2] Source code disclosure via exposed .git folder (English, 2018)
[3] Open .git global scan (English, 2018)
[4] Finding exposed .git repositories (English, 2020)
[5] With the exception of the “.well-known” folder, which is defined in RFC 8615.
[6] E.g., Fail2Ban is sometimes configured to trigger alerts on scans for “.git” folders.