A powerful OSINT tool for analyzing GitHub profiles and detecting suspicious activity patterns. It helps identify potential bot accounts, scammers, and fake developer profiles by analyzing various aspects of GitHub activity. It also includes a set of tools for scanning and extracting multiple types of metadata from a GitHub profile, organization, or repository.
NOTE: For a comprehensive solution that monitors your GitHub organization, analyzes contributors, and actively alerts on potential impersonation or other GitHub-related threats, contact SEAL911. SEAL operates the project-wide version of the software. This package is not optimized for speed; its main goal is to support individual security researchers.
NOTE: The project was made possible by a contribution from the Ethereum Ecosystem Support Program. All investigations conducted by the Ketman Project were made with the help of gh-fake-analyzer.
- Profile Analysis: Download and analyze complete GitHub profile data
- Commit Analysis: Detect copied commits and suspicious commit patterns
- Identity Detection: Track email/name variations and potential identity rotation
- Organization Scanning: Analyze contributors across entire organizations and repositories
- Activity Monitoring: Real-time monitoring of profile changes and activities
- Advanced Tools:
  - Commit author lookup
  - Activity checking
  - Search result dumping
  - Organization scanning
  - Repository scanning
  - Finding interesting files in repositories
  - Automatically flagging accounts against a list of your own IOCs
pip install gh-fake-analyzer
# or, if you are still using the old version
pip install gh-fake-analyzer --upgrade
- Python 3.7 or higher
- Git installed on your system (sudo apt install git)
You need a GitHub API token for full functionality. Set it up in one of these ways:
- Create a .env file with GH_TOKEN=<your_token>
- Use the --token <your_token> flag when running commands
- Set an environment variable: export GH_TOKEN=<your_token>
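When scripting around the tool, the three options above can be mirrored in a small helper. This is a minimal sketch, not the package's own lookup logic: the names `parse_dotenv` and `resolve_token` are hypothetical, and the precedence order (explicit value, then environment variable, then .env file) is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_token(
    cli_token: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    dotenv_path: str = ".env",
) -> Optional[str]:
    """Return the first token found, or None.

    Order (an assumption of this sketch): explicit value, GH_TOKEN
    environment variable, then GH_TOKEN in the .env file.
    """
    if cli_token:
        return cli_token
    environ = os.environ if environ is None else environ
    if environ.get("GH_TOKEN"):
        return environ["GH_TOKEN"]
    path = Path(dotenv_path)
    if path.is_file():
        return parse_dotenv(path.read_text()).get("GH_TOKEN")
    return None
```

Keeping the token out of shell history (the .env file or an exported variable) is generally preferable to passing it on the command line.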
NOTE: If you cloned the repository before the 1.0.0 release, re-download the whole package. The commit history was significantly rewritten to make the repository more lightweight.
# Clone the repository
git clone https://github.com/shortdoom/gh-fake-analyzer.git
cd gh-fake-analyzer
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate
# Install the package in development mode
pip install -e .
- Create a local config.ini file in your working directory:
[LIMITS]
MAX_FOLLOWING = 1000
MAX_FOLLOWERS = 1000
MAX_REPOSITORIES = 1000
CLONE_DEPTH = 100
CLONE_BARE = True
MONITOR_SLEEP = 10
REMOVE_REPO = True
- Set up your GitHub token in .env:
echo "GH_TOKEN=your_token_here" > .env
- Test the installation:
gh-analyze --help
The most common flow for using gh-fake-analyzer in CTI-related tasks is:
gh-analyze <username>
# or
gh-analyze --targets <path/to/newlinefile/targets>
# then, for a quick view (supply full path in place of <username> if report is not in the standard out/ path)
gh-analyze --parse <username> --summary
# another frequently used command extracts full contributor information from organizations; it also works with a list passed via --org-targets.
gh-analyze --tool scan_organization --scan-org <org_name>
# optionally, append --full-analysis to immediately perform full scan on each contributor
gh-analyze --tool scan_organization --scan-org <org_name> --full-analysis
# you can also scan an individual repository
gh-analyze --tool scan_repository --scan-repo owner/repo_name --full-analysis
# scan_organization and scan_repository run the `check_activity` tool in the background against all contributors found. For best results, list usernames and organizations in the target_list/ files so that any of them appearing in the contributor data is flagged.
gh-analyze --tool check_activity --targets <file>
It is good practice to update the target_list/ files with your own indicators (usernames and names of organizations). Avoid committing changes to those files publicly.
- USERNAMES: list of usernames you consider suspicious and would like to be informed about if a scanned account has any relation to them.
- ORGANIZATIONS: list of organizations you consider suspicious and would like to be informed about if a scanned account has any relation to them.
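To illustrate the idea behind these lists, here is a minimal sketch of matching contributor logins against an indicator list. It is not the package's implementation: `load_indicators` and `flag_matches` are hypothetical names, and the one-entry-per-line file layout is an assumption. The comparison is case-insensitive because GitHub logins are case-insensitive.

```python
from typing import Iterable, List, Set


def load_indicators(text: str) -> Set[str]:
    """Read one indicator per line, skip blank lines, normalize to lowercase."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def flag_matches(contributors: Iterable[str], indicators: Set[str]) -> List[str]:
    """Return the contributor logins present in the indicator set, in input order."""
    return [login for login in contributors if login.lower() in indicators]


if __name__ == "__main__":
    # Made-up example data; real lists would come from your own investigations.
    usernames = load_indicators("suspicious-dev\n\nFake-Recruiter\n")
    print(flag_matches(["alice", "fake-recruiter", "bob"], usernames))
```

A match only means the login appears on your own list; it still needs manual review before drawing conclusions.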
# Analyze a single user
gh-analyze <username>
# Analyze multiple users from a file (one username per line)
gh-analyze --targets <file>
# Custom output directory
gh-analyze <username> --out_path /path/to/dir
# Include forked repositories in analysis (default: off)
gh-analyze <username> --forks
# Only fetch basic profile data (no commits, followers, etc.)
gh-analyze <username> --only_profile
# Regenerate report from existing data without fetching from GitHub
gh-analyze <username> --regenerate
# Search for copied commits in a specific repository
gh-analyze <username> --commit_search <repo_name>
# Search for copied commits across all repositories
gh-analyze <username> --commit_search
# Monitor user activity in real-time
gh-analyze <username> --monitor
# Monitor multiple users from a file
gh-analyze --targets <file> --monitor
# Parse and display specific data from an existing report (the report for <username> must be in the default out/ directory; otherwise supply the full path)
gh-analyze --parse <username> --key <output_key>
# Display a summary of the profile (the report for <username> must be in the default out/ directory; otherwise supply the full path)
gh-analyze --parse <username> --summary
# Quick-dump specific data to a file
gh-analyze --parse <username> --key unique_emails >> dump.txt
# Scan a single organization
gh-analyze --tool scan_organization --scan-org <org_name>
# Scan multiple organizations from a file
gh-analyze --tool scan_organization --org-targets <file>
# Perform full analysis for each contributor (generates report.json file for each contributor)
gh-analyze --tool scan_organization --scan-org <org_name> --full-analysis
# Scan an individual repository
gh-analyze --tool scan_repository --scan-repo owner/repo_name
# Get detailed commit author information
gh-analyze --tool get_commit_author --commit-author <sha>
# Search GitHub users
gh-analyze --tool dump_search_results --search "<query>" --endpoint users
# Search GitHub code
gh-analyze --tool dump_search_results --search "<query>" --endpoint code
# Check activity patterns of multiple users; requires a targets/connections_filter/usernames file listing the target usernames to check activity against
gh-analyze --tool check_activity --targets <file>
# Find interesting files in user's repositories (.txt, .pdf, binary files etc.)
gh-analyze --tool find_interesting_files <username>
# Find interesting files for multiple users from a file (.txt, .pdf, binary files etc.)
gh-analyze --tool find_interesting_files --targets <file>
# Custom output directory
gh-analyze --tool find_interesting_files <username> --out_path /path/to/dir
# Disable logging to script.log
gh-analyze --logoff
You can develop your own tools by reusing the methods available in modules/analyze.py. Inspect the existing tools code for examples and inspiration.
The tool uses a configuration file at ~/.gh_fake_analyzer/config.ini. You can create a local config.ini to override settings:
[LIMITS]
MAX_FOLLOWING = 1000
MAX_FOLLOWERS = 1000
MAX_REPOSITORIES = 1000
CLONE_DEPTH = 100
CLONE_BARE = True # False if you want to save the source code
MONITOR_SLEEP = 10
REMOVE_REPO = True # False if you want to save the source code
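Scripts that need the same limits can read the file with Python's standard configparser module. This is a minimal sketch assuming the file layout shown above; how gh-fake-analyzer itself loads and merges the global and local files is not covered here, and `load_limits` is a hypothetical helper. Trailing comments such as `# False if you want to save the source code` are stripped only when `inline_comment_prefixes` is set.

```python
import configparser
from typing import Any, Dict

SAMPLE = """
[LIMITS]
MAX_FOLLOWING = 1000
CLONE_DEPTH = 100
CLONE_BARE = True # False if you want to save the source code
MONITOR_SLEEP = 10
REMOVE_REPO = True # False if you want to save the source code
"""


def load_limits(text: str) -> Dict[str, Any]:
    """Parse the [LIMITS] section into typed Python values."""
    # Without inline_comment_prefixes, "True # False if ..." would be read
    # as the whole string and getboolean() would raise ValueError.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    limits = parser["LIMITS"]
    return {
        "max_following": limits.getint("MAX_FOLLOWING"),
        "clone_depth": limits.getint("CLONE_DEPTH"),
        "clone_bare": limits.getboolean("CLONE_BARE"),
        "monitor_sleep": limits.getint("MONITOR_SLEEP"),
        "remove_repo": limits.getboolean("REMOVE_REPO"),
    }


if __name__ == "__main__":
    print(load_limits(SAMPLE))
```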
Analysis results are saved in the out directory with the following structure:
The report.json file contains comprehensive data about the analyzed GitHub profile:
- profile_info: Basic GitHub user profile data
  - login: GitHub username
  - name: Display name
  - location: User's location
  - bio: Profile bio
  - company: Company/organization
  - blog: Website/blog URL
  - email: Public email
  - created_at: Account creation date
  - updated_at: Last profile update
  - followers: Number of followers
  - following: Number of following
- original_repos_count: Number of original repositories
- forked_repos_count: Number of forked repositories
- repo_list: Names of all non-forked repositories
- forked_repo_list: Names of all forked repositories
- repos: Full repository data for every user repository (includes metadata, languages, stars, etc.)
- mutual_followers: List of accounts that follow and are followed by the user
- following: List of accounts the user follows
- followers: List of accounts following the user
- unique_emails: Emails and associated names extracted from commit data
- contributors: User's repositories and their contributors
- pull_requests_to_other_repos: List of PRs made to other repositories
- commits_to_other_repos: List of commits made to repositories not owned by the user
- duplicate_hashes_found: List of repositories with owner commits that do not belong to the owner
- commits: Full commit data for every user repository
- issues: List of issues opened by the user
- comments: List of comments made by the user
- recent_events: List of recent events on the analyzed account (last 90 days)
  - Stars
  - Pushes
  - Forks
  - Issues
  - Pull requests
  - Profile updates
- errors: List of repositories that failed to retrieve data
  - Network errors
  - DMCA takedowns
  - Access denied
  - Repository not found
- potential_copy: List of repositories with first commit date earlier than account creation
- commit_filter: List of repositories with similar/duplicated commit messages
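Two of the derived fields above, mutual_followers and potential_copy, can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: it assumes followers and following are available as lists of login strings and that timestamps are ISO 8601 strings such as `2020-01-31T12:00:00Z`. All function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def parse_github_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2020-01-31T12:00:00Z'."""
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def mutual_followers(followers: Iterable[str], following: Iterable[str]) -> List[str]:
    """Accounts that both follow the user and are followed by the user."""
    return sorted(set(followers) & set(following))


def is_potential_copy(first_commit_at: str, account_created_at: str) -> bool:
    """True when a repository's first commit predates the account itself."""
    return parse_github_time(first_commit_at) < parse_github_time(account_created_at)


if __name__ == "__main__":
    print(mutual_followers(["a", "b", "c"], ["c", "a", "d"]))
    print(is_potential_copy("2015-06-01T00:00:00Z", "2021-03-10T09:30:00Z"))
```

A positive result is only a signal: Git commit dates can be set arbitrarily, and legitimate accounts sometimes import projects that are older than the account.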
- User avatar downloaded to the output directory
- script.log: Detailed logging of the analysis process
- monitoring.log: Activity monitoring logs (when using --monitor)
- github_cache.sqlite: Created on the first run to speed up repeated downloads from the same endpoints within a 1-hour window
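The idea of a time-limited response cache can be sketched with the standard sqlite3 module. This is only an illustration of the concept: the README does not describe how github_cache.sqlite is actually structured or which caching library produces it, so the table name, columns, and function names below are hypothetical.

```python
import sqlite3
import time
from typing import Optional

TTL_SECONDS = 3600  # the 1-hour window mentioned above


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with a single responses table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "url TEXT PRIMARY KEY, fetched_at REAL NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def put(conn: sqlite3.Connection, url: str, body: str, now: Optional[float] = None) -> None:
    """Store a response body together with the time it was fetched."""
    now = time.time() if now is None else now
    conn.execute("INSERT OR REPLACE INTO responses VALUES (?, ?, ?)", (url, now, body))
    conn.commit()


def get(conn: sqlite3.Connection, url: str, now: Optional[float] = None) -> Optional[str]:
    """Return the cached body if it is younger than TTL_SECONDS, else None."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT fetched_at, body FROM responses WHERE url = ?", (url,)
    ).fetchone()
    if row is None or now - row[0] >= TTL_SECONDS:
        return None
    return row[1]
```

With a cache like this, a repeated request to the same endpoint within the hour is served locally instead of counting against the GitHub API rate limit.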
This tool is for reconnaissance purposes only. The confidence in detecting "malicious" GitHub profiles varies, and many regular user accounts may appear in analysis files. Do not make baseless accusations based on this content. All information is sourced from publicly available third-party sources.