Discovering hidden pathways and directories on a website is crucial for comprehensive security assessments and thorough website analysis. Dirsearch is a powerful command-line tool designed for this purpose, enabling users to brute-force web paths and uncover valuable resources that might not be immediately apparent. This guide provides an in-depth look at how to effectively use dirsearch to enhance your web reconnaissance efforts.
Dirsearch logo displayed in dark mode, showcasing the tool’s branding.
Getting Started with Dirsearch: Installation and Basic Usage
Before you can start using dirsearch to find hidden files and directories, you need to install it. Dirsearch is written in Python and requires Python 3.9 or higher. Here are several installation methods to get you up and running:
Installation Options:
- Git (Recommended): Cloning the repository using Git is the preferred method as it allows for easy updates:
git clone https://github.com/maurosoria/dirsearch.git --depth 1
- ZIP File: Alternatively, you can download dirsearch as a ZIP file from the official GitHub repository.
- Docker: For containerized environments, Docker images are available:
docker build -t "dirsearch:latest" .
Refer to the Docker documentation for detailed Docker usage.
- PyPi: Install directly using pip:
pip3 install dirsearch
or
pip install dirsearch
- Kali Linux (Deprecated): While available in Kali Linux repositories, it’s recommended to use a more up-to-date installation method:
sudo apt-get install dirsearch
Once installed, navigate to the dirsearch directory in your terminal if you installed via Git or ZIP. If you installed via PyPi, dirsearch should be directly accessible from your command line.
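If you chose the Docker route, the container can be invoked directly. The command below is a sketch that assumes the image built above and that the image's entrypoint runs dirsearch.py; adjust the target URL for your environment:
docker run -it --rm dirsearch:latest -u https://target.com  # assumes the entrypoint invokes dirsearch.py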
Basic Usage Example:
To start a simple scan, use the following command, replacing https://target.com with the actual target website:
python3 dirsearch.py -u https://target.com
This command initiates a basic directory brute-force scan against the target URL using default settings and wordlists.
Understanding Wordlists: The Key to Effective Web Path Discovery
Wordlists are fundamental to dirsearch’s operation. They are text files containing lists of common directory and file names that dirsearch uses to probe the target website. Understanding how dirsearch handles wordlists and extensions is crucial for customizing your scans.
Wordlist Essentials:
- Structure: A wordlist is simply a text file where each line represents a potential path.
- Extensions: Unlike some other tools, dirsearch specifically uses the keyword %EXT% in wordlists to denote where the extensions from the -e flag should be inserted.
- Force Extensions (-f | --force-extensions): For wordlists that don't use %EXT% (like those from SecLists), the -f or --force-extensions flag is essential. It appends the specified extensions to every entry in the wordlist, as well as a trailing slash / for directory probing.
- Overwrite Extensions (-O | --overwrite-extensions): If your wordlist contains entries with existing extensions and you want to replace them with your specified extensions, use the -O or --overwrite-extensions flag. Note that certain extensions (like .log, .json, .xml, and media file extensions) are typically excluded from being overwritten.
- Multiple Wordlists: You can use multiple wordlists by separating their paths with commas in the -w flag. Example: -w wordlist1.txt,wordlist2.txt.
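For reference, a small custom wordlist mixing plain paths with the %EXT% keyword might look like the following (entries are purely illustrative):
admin
backup
robots.txt
index.%EXT%
config.%EXT%
Such a file could then be passed alongside an existing list, for example -w custom.txt,/path/to/common.txt.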
Wordlist Examples:
- Normal Extensions:
Wordlist entry: index.%EXT%
Command: python3 dirsearch.py -e asp,aspx -u https://target.com -w wordlist.txt
Generated dictionary entries: index, index.asp, index.aspx
- Force Extensions:
Wordlist entry: admin
Command: python3 dirsearch.py -e php,html -f -u https://target.com -w wordlist.txt
Generated dictionary entries: admin, admin.php, admin.html, admin/
- Overwrite Extensions:
Wordlist entry: login.html
Command: python3 dirsearch.py -e jsp,jspa -O -u https://target.com -w wordlist.txt
Generated dictionary entries: login.html, login.jsp, login.jspa
Exploring Dirsearch Options: Customizing Your Scans
Dirsearch offers a wide array of options to fine-tune your web path discovery process. These options can be categorized into dictionary settings, general settings, request settings, connection settings, view settings, and output settings. Here’s a breakdown of some of the most commonly used and important options:
Usage: dirsearch.py [-u|--url] target [-e|--extensions] extensions [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
Mandatory:
-u URL, --url=URL Target URL(s), can use multiple flags
-l PATH, --urls-file=PATH
URL list file
--stdin Read URL(s) from STDIN
--cidr=CIDR Target CIDR
--raw=PATH Load raw HTTP request from file (use '--scheme' flag to set the scheme)
--nmap-report=PATH Load targets from nmap report (Ensure the inclusion of the -sV flag during nmap scan for comprehensive results)
-s SESSION_FILE, --session=SESSION_FILE
Session file
--config=PATH Path to configuration file (Default: 'DIRSEARCH_CONFIG' environment variable, otherwise 'config.ini')
Dictionary Settings:
-w WORDLISTS, --wordlists=WORDLISTS
Wordlist files or directories contain wordlists (separated by commas)
-e EXTENSIONS, --extensions=EXTENSIONS
Extension list separated by commas (e.g. php,asp)
-f, --force-extensions
Add extensions to the end of every wordlist entry. By default dirsearch only replaces the %EXT% keyword with extensions
-O, --overwrite-extensions
Overwrite other extensions in the wordlist with your extensions (selected via `-e`)
--exclude-extensions=EXTENSIONS
Exclude extension list separated by commas (e.g. asp,jsp)
--remove-extensions Remove extensions in all paths (e.g. admin.php -> admin)
--prefixes=PREFIXES Add custom prefixes to all wordlist entries (separated by commas)
--suffixes=SUFFIXES Add custom suffixes to all wordlist entries, ignore directories (separated by commas)
-U, --uppercase Uppercase wordlist
-L, --lowercase Lowercase wordlist
-C, --capital Capital wordlist
General Settings:
-t THREADS, --threads=THREADS
Number of threads
--async Enable asynchronous mode
-r, --recursive Brute-force recursively
--deep-recursive Perform recursive scan on every directory depth (e.g. api/users -> api/)
--force-recursive Do recursive brute-force for every found path, not only directories
-R DEPTH, --max-recursion-depth=DEPTH
Maximum recursion depth
--recursion-status=CODES
Valid status codes to perform recursive scan, support ranges (separated by commas)
--subdirs=SUBDIRS Scan sub-directories of the given URL[s] (separated by commas)
--exclude-subdirs=SUBDIRS
Exclude the following subdirectories during recursive scan (separated by commas)
-i CODES, --include-status=CODES
Include status codes, separated by commas, support ranges (e.g. 200,300-399)
-x CODES, --exclude-status=CODES
Exclude status codes, separated by commas, support ranges (e.g. 301,500-599)
--exclude-sizes=SIZES Exclude responses by sizes, separated by commas (e.g. 0B,4KB)
--exclude-text=TEXTS Exclude responses by text, can use multiple flags
--exclude-regex=REGEX
Exclude responses by regular expression
--exclude-redirect=STRING
Exclude responses if this regex (or text) matches redirect URL (e.g. '/index.html')
--exclude-response=PATH
Exclude responses similar to response of this page, path as input (e.g. 404.html)
--skip-on-status=CODES
Skip target whenever hit one of these status codes, separated by commas, support ranges
--min-response-size=LENGTH
Minimum response length
--max-response-size=LENGTH
Maximum response length
--max-time=SECONDS Maximum runtime for the scan
--exit-on-error Exit whenever an error occurs
Request Settings:
-m METHOD, --http-method=METHOD
HTTP method (default: GET)
-d DATA, --data=DATA HTTP request data
--data-file=PATH File contains HTTP request data
-H HEADERS, --header=HEADERS
HTTP request header, can use multiple flags
--headers-file=PATH File contains HTTP request headers
-F, --follow-redirects
Follow HTTP redirects
--random-agent Choose a random User-Agent for each request
--auth=CREDENTIAL Authentication credential (e.g. user:password or bearer token)
--auth-type=TYPE Authentication type (basic, digest, bearer, ntlm, jwt)
--cert-file=PATH File contains client-side certificate
--key-file=PATH File contains client-side certificate private key (unencrypted)
--user-agent=USER_AGENT
--cookie=COOKIE
Connection Settings:
--timeout=TIMEOUT Connection timeout
--delay=DELAY Delay between requests
-p PROXY, --proxy=PROXY
Proxy URL (HTTP/SOCKS), can use multiple flags
--proxies-file=PATH File contains proxy servers
--proxy-auth=CREDENTIAL
Proxy authentication credential
--replay-proxy=PROXY Proxy to replay with found paths
--tor Use Tor network as proxy
--scheme=SCHEME Scheme for raw request or if there is no scheme in the URL (Default: auto-detect)
--max-rate=RATE Max requests per second
--retries=RETRIES Number of retries for failed requests
--ip=IP Server IP address
--interface=NETWORK_INTERFACE
Network interface to use
Advanced Settings:
--crawl Crawl for new paths in responses
View Settings:
--full-url Full URLs in the output (enabled automatically in quiet mode)
--redirects-history Show redirects history
--no-color No colored output
-q, --quiet-mode Quiet mode
Output Settings:
-o PATH/URL, --output=PATH/URL
                        Output file or MySQL/PostgreSQL URL (Format: scheme://[username:password@]host[:port]/database-name)
--format=FORMAT Report format (Available: simple, plain, json, xml, md, csv, html, sqlite, mysql, postgresql)
--log=PATH Log file
Key Options Explained
- -e EXTENSIONS, --extensions=EXTENSIONS: Specifies file extensions to scan for (e.g., php,html,js).
- -w WORDLISTS, --wordlists=WORDLISTS: Defines the wordlist(s) to use for brute-forcing. You can provide a path to a single wordlist file or a comma-separated list of files or directories.
- -t THREADS, --threads=THREADS: Sets the number of threads to use, controlling the concurrency of requests. Higher thread counts can speed up scans but might also increase the risk of detection or server overload.
- -r, --recursive: Enables recursive brute-forcing. If a directory is found, dirsearch will automatically scan within that directory.
- --max-recursion-depth=DEPTH: Limits the depth of recursive scanning to prevent scans from going too deep.
- -x CODES, --exclude-status=CODES: Excludes specific HTTP status codes from the scan results (e.g., 404,500-599). This helps filter out irrelevant responses.
- --exclude-sizes=SIZES: Excludes responses based on their size (e.g., 0B,4KB).
- --exclude-text=TEXTS: Excludes responses containing specific text in their body.
- --proxy=PROXY: Routes requests through a proxy server (e.g., --proxy http://127.0.0.1:8080). Useful for anonymity or bypassing IP-based restrictions.
- --user-agent=USER_AGENT: Sets a custom User-Agent header for requests, allowing you to mimic different browsers or bots.
- -o PATH, --output=PATH: Saves the scan results to a file in a specified format.
- --format=FORMAT: Specifies the output format (e.g., plain, json, html).
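Putting several of these options together, a fairly typical scan might look like the following sketch; the target, wordlist path, and thread count are placeholders to adjust for your engagement:
python3 dirsearch.py -u https://example.com -e php,html,js -w /path/to/wordlist.txt -t 30 -r --max-recursion-depth 2 -x 404,500-599 -o results.json --format json  # illustrative combination of the options above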
Practical Examples of Dirsearch in Action
To illustrate how to use dirsearch effectively, let’s look at some practical examples:
Simple Scan with Specific Extensions
To scan https://example.com for PHP, HTML, and JavaScript files using a common wordlist:
python3 dirsearch.py -u https://example.com -e php,html,js -w /path/to/wordlist.txt
Replace /path/to/wordlist.txt with the actual path to your wordlist file.
Recursive Scan with Depth Limitation
To perform a recursive scan of https://example.com up to a depth of 3 directories, excluding 404 responses:
python3 dirsearch.py -u https://example.com -r --max-recursion-depth 3 -x 404
Scanning Multiple Subdirectories
To scan specific subdirectories of https://example.com, such as /admin/ and /api/:
python3 dirsearch.py -u https://example.com --subdirs /admin/,/api/
Using Proxies for Anonymity
To scan https://example.com through a proxy server:
python3 dirsearch.py -u https://example.com -p http://127.0.0.1:8080
Saving Output to a JSON File
To save the scan results of https://example.com to a JSON file named results.json:
python3 dirsearch.py -u https://example.com -o results.json --format json
Pausing and Resuming Scans
Dirsearch allows you to pause a scan by pressing CTRL+C. You will then be presented with options to:
- Save progress: Allows you to resume the scan later from where you left off.
- Skip current target: Moves to the next target if you are scanning multiple URLs.
- Skip current subdirectory: Skips the current subdirectory in a recursive scan.
This feature is incredibly useful for long scans or when you need to interrupt and resume your work.
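Progress saved this way goes to a session file, which can be loaded again through the -s/--session option listed earlier. The command below is a sketch; the session file name (here dirsearch_session) and the exact save prompt depend on your run and dirsearch version:
python3 dirsearch.py -s dirsearch_session  # session file name is an assumption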
Advanced Dirsearch Techniques
Beyond basic usage, dirsearch offers advanced features to refine your scans further:
Asynchronous Mode
Enable asynchronous mode using the --async flag for potentially improved performance and lower CPU usage. Asynchronous mode utilizes coroutines instead of threads for handling concurrent requests, which can be more efficient in certain scenarios.
python3 dirsearch.py -u https://example.com --async
Prefixes and Suffixes
Use --prefixes and --suffixes to add custom prefixes or suffixes to every wordlist entry. This can be useful for finding backup files, temporary files, or files with specific naming conventions.
python3 dirsearch.py -u https://example.com --prefixes .,admin,_ --suffixes ~,.bak
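To illustrate the effect, with the flags above a wordlist entry such as config would be expanded roughly along these lines (illustrative; per the options table, suffixes are not applied to directory entries):
Generated dictionary entries (in addition to config itself): .config, adminconfig, _config, config~, config.bak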
Blacklists
Dirsearch incorporates blacklist files located in the db/ folder. Paths listed in these files will be automatically filtered from scan results if they return the status code specified in the blacklist filename (e.g., db/403_blacklist.txt for 403 Forbidden responses). You can customize these blacklists to further reduce noise in your scans.
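A blacklist file is simply one path per line. As a sketch, appending entries such as the following (illustrative) to db/403_blacklist.txt would hide those paths from results whenever they respond with 403:
cgi-bin/
icons/
server-status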
Filters for Refined Results
Utilize various filter options like --exclude-sizes, --exclude-text, --exclude-regex, and --exclude-redirect to fine-tune your results and eliminate false positives.
python3 dirsearch.py -u https://example.com --exclude-text "Page not found" --exclude-text "Error" --exclude-regex "^[a-f0-9]{32}$"
Raw HTTP Requests
For advanced scenarios, you can import raw HTTP requests from a file using the --raw option. This allows you to craft highly customized requests and replay them with dirsearch. Remember to specify the scheme using --scheme if it's not explicitly defined in the raw request.
python3 dirsearch.py --raw request.txt --scheme https -w wordlist.txt
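The raw request file is just a plain HTTP request. A minimal illustrative request.txt might look like this (the host, path, and header values are placeholders):
GET /admin/ HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0
Cookie: session=abc123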
Wordlist Formats: Uppercase, Lowercase, Capitalization
Dirsearch supports different wordlist formats using the -U, -L, and -C flags to automatically convert your wordlist entries to uppercase, lowercase, or capitalized formats, respectively.
python3 dirsearch.py -u https://example.com -U -w wordlist.txt # Scan with uppercase wordlist
Excluding Extensions from Wordlists
The --exclude-extensions option allows you to remove entries from your wordlist that contain specific extensions.
python3 dirsearch.py -u https://example.com --exclude-extensions jsp -w wordlist.txt
Scanning Subdirectories Directly
The --subdirs option enables you to target scans directly at specific subdirectories of a URL.
python3 dirsearch.py -u https://example.com --subdirs /admin/,/uploads/ -w wordlist.txt
Proxy Lists for Enhanced Scanning
For large-scale scans or when dealing with rate limiting, use --proxies-file to provide a list of proxy servers. Dirsearch will automatically rotate through these proxies during the scan.
python3 dirsearch.py -u https://example.com --proxies-file proxies.txt
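A proxies file is a plain text list with one proxy URL (HTTP or SOCKS) per line; the addresses below are placeholders:
http://127.0.0.1:8080
http://10.0.0.5:3128
socks5://127.0.0.1:9050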
Comprehensive Reporting
Dirsearch supports various report formats, including plain, simple, json, xml, md, csv, html, sqlite, mysql, and postgresql. Choose the format that best suits your needs using the --format option and specify the output file with -o.
python3 dirsearch.py -u https://example.com --format html -o report.html
Tips for Effective Web Path Discovery with Dirsearch
- Bypass Request Limits: If you encounter request limits, consider using a proxy list (--proxies-file) to randomize your IP addresses and bypass these restrictions.
- Find Configuration and Backup Files: Utilize suffixes like ~ and prefixes like . (--suffixes ~ --prefixes .) to uncover potential configuration or backup files.
- Focus on Directories: To specifically target directories, combine --remove-extensions with --suffixes / to scan for directory paths without file extensions.
- Optimize CIDR Scans: When scanning CIDR ranges, reduce the timeout and retry settings (--timeout 3 --retries 1) to speed up the process and avoid hanging on unresponsive hosts.
- Skip High-Status Error Targets: Use --skip-on-status 429 to automatically skip targets that return 429 (Too Many Requests) errors, preventing unnecessary delays.
- Handle Large Files: If the server hosts large files that slow down scans, consider using the HEAD HTTP method instead of GET to retrieve only headers rather than entire file contents (example commands for this and the preceding tips are sketched after this list).
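The following commands sketch how a few of these tips translate into practice; the targets, CIDR range, and wordlist paths are placeholders, and all flags used here appear in the options table above:
python3 dirsearch.py -u https://example.com --remove-extensions --suffixes / -w wordlist.txt  # directories only
python3 dirsearch.py --cidr 192.168.1.0/24 -e php --timeout 3 --retries 1  # faster CIDR sweep
python3 dirsearch.py -u https://example.com -m HEAD --skip-on-status 429  # headers only, skip rate-limited targets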
Contribution and Support
Dirsearch is an actively developed open-source tool, and contributions are welcomed. You can find more information about contributing in the CONTRIBUTORS.md file in the repository. For support and community discussions, join the Discord server.
Dirsearch empowers you to effectively perform web path discovery, uncovering hidden areas of websites and enhancing your security assessments. By mastering its options and techniques, you can significantly improve your ability to find valuable information and potential vulnerabilities in web applications.