Cowurl

Taming the Web. Inspired by the cowboy's dexterity in capturing wild horses, Cowurl offers the ability to capture and manage URLs from the vastness of cyberspace.

Use it when your automated CLI scraping tools are blocked by WAFs (web application firewalls) that only allow access from a real browser.

Key Features

Automatic URL Collection
Automatic Pagination
Real-time Output Cleaning
Persistent Storage
Scraping Control
Smart Auto-Stop
Easy Output Management

How to Use Cowurl

Installation

Download or clone this repository.
Open Chrome and navigate to chrome://extensions.
Enable "Developer mode" in the top right corner.
Click "Load unpacked".
Select the folder where you saved the Cowurl extension files.

Configuration

Start URL: Enter the URL where scraping should start. If the site uses pagination, make sure the URL includes the page parameter (e.g., https://example.com/search?page=1 or https://example.com/items?offset=SPG:0).
Max Page: Specify the maximum number of pages you wish to scrape.
Main Tag: Enter the primary HTML tag containing the elements you want to extract (e.g., div, a, li).
Lock Selector (attr/class/id): Use this selector to filter Main Tag elements.
- attr:attribute_name (e.g., attr:href to filter elements that have an href attribute)
- class:class_name (e.g., class:product-item to filter elements with the product-item class)
- id:id_name (e.g., id:main-content to filter elements with the main-content ID)
- Leave blank if no specific filter is needed.
Get Value From (attr/class/id): Define how values will be extracted from the filtered elements.
- attr:attribute_name (e.g., attr:src to get the value from the src attribute)
- class:class_name (e.g., class:title to get the text from a child element with the title class)
- id:id_name (e.g., id:product-link to get the text from a child element with the product-link ID)
Output Cleaner (regex): (Optional) Enter a regular expression to clean the extracted output. It is applied to each scraped item before the item is displayed and saved. Example: (?<=images/)(.*?)(?=\.png) extracts the file name, without the .png extension, from an image URL (for https://example.com/images/cat.png the result is cat).
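
To make the settings above more concrete, here is a minimal sketch of how a Main Tag, a Lock Selector, and an Output Cleaner could be interpreted. It is not taken from Cowurl's source: the function names parseSpec, toCssSelector, and cleanOutput are hypothetical, and two behaviors are assumptions (the cleaner keeps the whole regex match, and a value that does not match is left unchanged).

```javascript
// Hypothetical helpers for illustration only; not Cowurl's actual code.

// Split a "kind:name" setting such as "attr:href" into its two parts.
// A blank setting means "no filter" and yields null.
function parseSpec(spec) {
  const trimmed = (spec || "").trim();
  if (trimmed === "") return null;
  const idx = trimmed.indexOf(":");
  const kind = idx === -1 ? "" : trimmed.slice(0, idx);
  const name = idx === -1 ? "" : trimmed.slice(idx + 1);
  if (!["attr", "class", "id"].includes(kind) || name === "") {
    throw new Error(`Expected attr:, class: or id: followed by a name, got "${trimmed}"`);
  }
  return { kind, name };
}

// Build a CSS selector that combines the Main Tag with the Lock Selector.
function toCssSelector(mainTag, lockSpec) {
  const lock = parseSpec(lockSpec);
  if (lock === null) return mainTag;                        // no filter
  if (lock.kind === "attr") return `${mainTag}[${lock.name}]`;
  if (lock.kind === "class") return `${mainTag}.${lock.name}`;
  return `${mainTag}#${lock.name}`;                         // kind === "id"
}

// Apply the optional Output Cleaner regex to one scraped value.
// Assumption: keep the whole match; leave the value untouched if nothing matches.
function cleanOutput(value, pattern) {
  if (!pattern) return value;
  const match = value.match(new RegExp(pattern));
  return match ? match[0] : value;
}

console.log(toCssSelector("a", "attr:href"));
console.log(cleanOutput("https://example.com/images/cat.png", "(?<=images/)(.*?)(?=\\.png)"));
```

In a browser, the selector string could be passed to document.querySelectorAll to find the candidate elements. Names containing special characters would need escaping (for example with CSS.escape), which this sketch leaves out.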

Contributions

Contributions are welcome! Please feel free to open an issue or create a pull request.

Created with ❤️ by [xcapri/Tegalsec]

