shortdoom/gh-fake-analyzer

GitHub Profile Analyzer

A powerful OSINT tool for analyzing GitHub profiles and detecting suspicious activity patterns. It helps identify potential bot accounts, scammers, and fake developer profiles by analyzing various aspects of GitHub activity. It also ships with a set of handy tools for scanning and extracting multiple types of metadata from a GitHub profile, organization, or repository.

NOTE: For a comprehensive solution for monitoring your GitHub organization, analyzing contributors, and actively alerting on potential impersonation or other GitHub-related threats, contact SEAL911. SEAL operates the project-wide version of the software. This package is not optimized for speed; its main goal is to support individual security researchers.

NOTE: This project was made possible by a contribution from the Ethereum Ecosystem Support Program. All investigations conducted by Ketman Project were made with the help of gh-fake-analyzer.

Features

  • Profile Analysis: Download and analyze complete GitHub profile data
  • Commit Analysis: Detect copied commits and suspicious commit patterns
  • Identity Detection: Track email/name variations and potential identity rotation
  • Organization Scanning: Analyze contributors across entire organizations and repositories
  • Activity Monitoring: Real-time monitoring of profile changes and activities
  • Advanced Tools:
    • Commit author lookup
    • Activity checking
    • Search result dumping
    • Organization scanning
    • Repository scanning
    • Finding interesting files in repositories
    • Automatically flagging accounts against a list of your own IOCs

Installation

pip install gh-fake-analyzer

# or, if an older version is already installed

pip install gh-fake-analyzer --upgrade

Requirements

  • Python 3.7 or higher
  • Git installed on your system (for example, sudo apt install git on Debian/Ubuntu)

GitHub Token Setup

You need a GitHub API token for full functionality. Set it up in one of these ways:

  1. Create a .env file with GH_TOKEN=<your_token>
  2. Use --token <your_token> flag when running commands
  3. Set environment variable: export GH_TOKEN=<your_token>
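The three options above can be pictured as a simple lookup chain. The following is a minimal sketch only: the helper name resolve_token and the precedence order (flag, then environment variable, then .env file) are assumptions made for illustration, not the tool's documented behavior.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_token(cli_token: Optional[str] = None, env_file: str = ".env") -> Optional[str]:
    """Return the first token found. The order is an assumption of this sketch:
    explicit --token value, then the GH_TOKEN environment variable, then a .env file."""
    if cli_token:
        return cli_token

    env_token = os.environ.get("GH_TOKEN")
    if env_token:
        return env_token

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line.startswith("GH_TOKEN="):
                # Keep everything after the first "=" and drop optional quotes.
                value = line.split("=", 1)[1].strip().strip('"').strip("'")
                return value or None
    return None
```

If no token is found the function returns None, which a caller could treat as "run with reduced functionality".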

Local Installation

NOTE: If you cloned the repository before the 1.0.0 release, re-download the whole package. The commit history was significantly rewritten to make the repository more lightweight.

# Clone the repository
git clone https://github.com/shortdoom/gh-fake-analyzer.git
cd gh-fake-analyzer

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate

# Install the package in development mode
pip install -e .

Configuration for Development

  1. Create a local config.ini file in your working directory:
[LIMITS]
MAX_FOLLOWING = 1000
MAX_FOLLOWERS = 1000
MAX_REPOSITORIES = 1000
CLONE_DEPTH = 100
CLONE_BARE = True
MONITOR_SLEEP = 10
REMOVE_REPO = True
  2. Set up your GitHub token in .env:
echo "GH_TOKEN=your_token_here" > .env
  3. Test the installation:
gh-analyze --help

Usage

Quick Start Recipe

The most common flow for using gh-fake-analyzer in CTI-related tasks is:

gh-analyze <username>

# or

gh-analyze --targets <path/to/newlinefile/targets>

# then, for a quick view (supply full path in place of <username> if report is not in the standard out/ path)

gh-analyze --parse <username> --summary

# another frequently used command extracts full contributor information from organizations. it also works with a list passed via --org-targets.

gh-analyze --tool scan_organization --scan-org <org_name>

# optionally, append --full-analysis to immediately perform full scan on each contributor

gh-analyze --tool scan_organization --scan-org <org_name> --full-analysis

# you can also scan an individual repository

gh-analyze --tool scan_repository --scan-repo owner/repo_name --full-analysis

# scan_organization and scan_repository run the `check_activity` tool in the background against all contributors found. for best results, list usernames and organizations in the target_list/ files so the scan signals whether any of them appear in the contributor data.

gh-analyze --tool check_activity --targets <file>

It is good practice to update the target_list/ files with your own indicators (usernames and names of organizations). Avoid committing changes to those files in public.

  • USERNAMES - list of usernames you consider suspicious; you are informed if a scanned account has any relation to them.
  • ORGANIZATIONS - list of organizations you consider suspicious; you are informed if a scanned account has any relation to them.
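As an illustration of the idea behind these lists, the sketch below matches logins found during a scan against a set of indicators. It is not the tool's implementation: the function names are hypothetical, and the file format (one indicator per line) is an assumption. The comparison is case-insensitive because GitHub logins are case-insensitive.

```python
from typing import Iterable, List, Set


def load_indicators(lines: Iterable[str]) -> Set[str]:
    """Normalize an indicator list: trim whitespace, lowercase, skip blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def flag_matches(found: Iterable[str], indicators: Set[str]) -> List[str]:
    """Return the found logins that appear in the indicator set (case-insensitive)."""
    return sorted({name for name in found if name.lower() in indicators})


# Hypothetical data for illustration only.
indicators = load_indicators(["EvilDev\n", "\n", " bot-123 "])
hits = flag_matches(["alice", "evildev", "Bot-123"], indicators)
```

Here hits contains the two logins that match an indicator, in their original spelling.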

All Commands

Basic Profile Analysis

# Analyze a single user
gh-analyze <username>

# Analyze multiple users from a file (one username per line)
gh-analyze --targets <file>

# Custom output directory
gh-analyze <username> --out_path /path/to/dir

# Include forked repositories in analysis (default: off)
gh-analyze <username> --forks

# Only fetch basic profile data (no commits, followers, etc.)
gh-analyze <username> --only_profile

# Regenerate report from existing data without fetching from GitHub
gh-analyze <username> --regenerate

Advanced Analysis

# Search for copied commits in a specific repository
gh-analyze <username> --commit_search <repo_name>

# Search for copied commits across all repositories
gh-analyze <username> --commit_search

# Monitor user activity in real-time
gh-analyze <username> --monitor

# Monitor multiple users from a file
gh-analyze --targets <file> --monitor

# Parse and display specific data from an existing report (<username> needs to be in the default out/ directory, otherwise - supply full path)
gh-analyze --parse <username> --key <output_key>

# Display summary of profile (<username> needs to be in the default out/ directory, otherwise - supply full path)
gh-analyze --parse <username> --summary

# Quick-dump specific data to a file
gh-analyze --parse <username> --key unique_emails >> dump.txt
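For scripted post-processing, a report can also be read directly, since it is a JSON file. The sketch below mirrors the idea of --parse with --key. The helper name read_report_key is hypothetical, and the report path is passed explicitly because the exact layout of the out directory is not assumed here.

```python
import json
from pathlib import Path
from typing import Any


def read_report_key(report_path: str, key: str) -> Any:
    """Load a report.json file and return the value stored under one top-level key."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    if key not in report:
        # List the available keys so a typo is easy to spot.
        raise KeyError(f"{key!r} not in report; available keys: {sorted(report)}")
    return report[key]
```

A call such as read_report_key("path/to/report.json", "unique_emails") returns whatever the report stores under that key.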

Organization Analysis

# Scan a single organization
gh-analyze --tool scan_organization --scan-org <org_name>

# Scan multiple organizations from a file
gh-analyze --tool scan_organization --org-targets <file>

# Perform full analysis for each contributor (generates report.json file for each contributor)
gh-analyze --tool scan_organization --scan-org <org_name> --full-analysis

# Scan an individual repository
gh-analyze --tool scan_repository --scan-repo owner/repo_name

Advanced Tools

# Get detailed commit author information
gh-analyze --tool get_commit_author --commit-author <sha>

# Search GitHub users
gh-analyze --tool dump_search_results --search "<query>" --endpoint users

# Search GitHub code
gh-analyze --tool dump_search_results --search "<query>" --endpoint code

# Check activity patterns of multiple users (requires a targets/connections_filter/usernames file listing the usernames to check activity against)
gh-analyze --tool check_activity --targets <file>

# Find interesting files in user's repositories (.txt, .pdf, binary files etc.)
gh-analyze --tool find_interesting_files <username>

# Find interesting files for multiple users from a file (.txt, .pdf, binary files etc.)
gh-analyze --tool find_interesting_files --targets <file>

# Custom output directory
gh-analyze --tool find_interesting_files <username> --out_path /path/to/dir

# Disable logging to script.log
gh-analyze --logoff

It's possible to develop your own tools by re-using the methods available in modules/analyze.py. Inspect the code of the existing tools for examples and inspiration.

Configuration

The tool uses a configuration file at ~/.gh_fake_analyzer/config.ini. You can create a local config.ini to override settings:

[LIMITS]
MAX_FOLLOWING = 1000
MAX_FOLLOWERS = 1000
MAX_REPOSITORIES = 1000
CLONE_DEPTH = 100
CLONE_BARE = True # False if you want to save the source code
MONITOR_SLEEP = 10
REMOVE_REPO = True # False if you want to save the source code
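How the tool itself parses this file is not stated here. If you read the same file from your own scripts with Python's configparser, note that inline comments such as the ones above are not stripped by default, so the value would include the comment text. A minimal sketch that handles them (load_limits and the returned dictionary keys are hypothetical names):

```python
import configparser

SAMPLE = """
[LIMITS]
MAX_FOLLOWING = 1000
CLONE_DEPTH = 100
CLONE_BARE = True # False if you want to save the source code
MONITOR_SLEEP = 10
REMOVE_REPO = True # False if you want to save the source code
"""


def load_limits(text: str) -> dict:
    """Parse the [LIMITS] section, stripping inline '#' comments from values."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    limits = parser["LIMITS"]
    return {
        "max_following": limits.getint("MAX_FOLLOWING"),
        "clone_depth": limits.getint("CLONE_DEPTH"),
        "clone_bare": limits.getboolean("CLONE_BARE"),
        "monitor_sleep": limits.getint("MONITOR_SLEEP"),
        "remove_repo": limits.getboolean("REMOVE_REPO"),
    }


limits = load_limits(SAMPLE)
```

Without inline_comment_prefixes, getboolean would raise a ValueError on the commented lines.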

Output

Analysis results are saved in the out directory with the following structure:

report.json Structure

The report.json file contains comprehensive data about the analyzed GitHub profile:

Profile Information

  • profile_info: Basic GitHub user profile data
    • login: GitHub username
    • name: Display name
    • location: User's location
    • bio: Profile bio
    • company: Company/organization
    • blog: Website/blog URL
    • email: Public email
    • created_at: Account creation date
    • updated_at: Last profile update
    • followers: Number of followers
    • following: Number of accounts followed

Repository Statistics

  • original_repos_count: Number of original repositories
  • forked_repos_count: Number of forked repositories
  • repo_list: Names of all non-forked repositories
  • forked_repo_list: Names of all forked repositories
  • repos: Full repository data for every user repository (includes metadata, languages, stars, etc.)

Social Network Analysis

  • mutual_followers: List of accounts that follow and are followed by the user
  • following: List of accounts the user follows
  • followers: List of accounts following the user
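The relationship between these three keys is a set intersection: an account is a mutual follower when it appears in both followers and following. A minimal sketch, assuming each entry is either a login string or a GitHub user object with a "login" field (the exact entry format in report.json is an assumption, and the function names are hypothetical):

```python
from typing import Iterable, List, Union

Entry = Union[str, dict]


def _login(entry: Entry) -> str:
    """Extract the login from a plain string or a user object with a 'login' field."""
    return entry["login"] if isinstance(entry, dict) else entry


def mutual_followers(followers: Iterable[Entry], following: Iterable[Entry]) -> List[str]:
    """Return logins present in both lists, preserving the order of `followers`."""
    followed = {_login(entry).lower() for entry in following}
    return [_login(entry) for entry in followers if _login(entry).lower() in followed]
```

Logins are compared in lowercase because GitHub treats them as case-insensitive.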

Contribution Analysis

  • unique_emails: Emails and associated names extracted from commit data
  • contributors: User's repositories and their contributors
  • pull_requests_to_other_repos: List of PRs made to other repositories
  • commits_to_other_repos: List of commits made to repositories not owned by the user
  • duplicate_hashes_found: List of repositories with owner commits that do not belong to the owner
  • commits: Full commit data for every user repository
  • issues: List of issues opened by the user
  • comments: List of comments made by the user

Activity Tracking

  • recent_events: List of recent events on the analyzed account (last 90 days)
    • Stars
    • Pushes
    • Forks
    • Issues
    • Pull requests
    • Profile updates

Error Tracking

  • errors: List of repositories that failed to retrieve data
    • Network errors
    • DMCA takedowns
    • Access denied
    • Repository not found

Suspicious Activity Indicators

  • potential_copy: List of repositories with first commit date earlier than account creation
  • commit_filter: List of repositories with similar/duplicated commit messages
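The potential_copy indicator boils down to a date comparison. The sketch below shows that comparison in isolation; the function names and the input shape (a mapping from repository name to first commit timestamp) are assumptions for illustration, not the tool's internal data model.

```python
from datetime import datetime
from typing import Dict, List


def parse_github_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2023-05-01T00:00:00Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def repos_predating_account(account_created_at: str, first_commit_dates: Dict[str, str]) -> List[str]:
    """Return repositories whose first commit is older than the account itself."""
    created = parse_github_time(account_created_at)
    return sorted(
        repo
        for repo, first_commit in first_commit_dates.items()
        if parse_github_time(first_commit) < created
    )


# Hypothetical data for illustration only.
flagged = repos_predating_account(
    "2023-05-01T00:00:00Z",
    {"old-project": "2019-01-01T12:00:00Z", "new-project": "2024-02-02T08:30:00Z"},
)
```

A hit is only a lead for further review: commit dates are set by the committing client, and legitimate history imported from elsewhere produces the same pattern.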

Additional Output Files

  • User avatar downloaded to the output directory
  • script.log: Detailed logging of the analysis process
  • monitoring.log: Activity monitoring logs (when using --monitor)
  • github_cache.sqlite: Cache file created on the first run to speed up repeated downloads from the same endpoints within a 1h window

Disclaimer

This tool is for reconnaissance purposes only. The confidence in detecting "malicious" GitHub profiles varies, and many regular user accounts may appear in analysis files. Do not make baseless accusations based on this content. All information is sourced from publicly available third-party sources.
