How to Use Dirsearch: A Comprehensive Guide to Web Path Discovery

Discovering hidden pathways and directories on a website is crucial for comprehensive security assessments and thorough website analysis. Dirsearch is a powerful command-line tool designed for this purpose, enabling users to brute-force web paths and uncover valuable resources that might not be immediately apparent. This guide provides an in-depth look at how to effectively use dirsearch to enhance your web reconnaissance efforts.



Getting Started with Dirsearch: Installation and Basic Usage

Before you can start using dirsearch to find hidden files and directories, you need to install it. Dirsearch is written in Python and requires Python 3.9 or higher. Here are several installation methods to get you up and running:

Installation Options:

  • Git (Recommended): Cloning the repository using Git is the preferred method as it allows for easy updates:
    git clone https://github.com/maurosoria/dirsearch.git --depth 1
  • ZIP File: Alternatively, you can download dirsearch as a ZIP file from the official GitHub repository.
  • Docker: For containerized environments, build an image from inside the cloned repository:
    docker build -t "dirsearch:latest" .

    Refer to the Docker documentation for detailed Docker usage.

  • PyPI: Install directly using pip:
    pip3 install dirsearch

    or

    pip install dirsearch
  • Kali Linux (Deprecated): While available in Kali Linux repositories, it’s recommended to use a more up-to-date installation method:
    sudo apt-get install dirsearch

Once installed, navigate to the dirsearch directory in your terminal if you installed via Git or ZIP. If you installed via PyPI, dirsearch should be directly accessible from your command line, so the examples below can be run as dirsearch instead of python3 dirsearch.py.

Basic Usage Example:

To start a simple scan, use the following command, replacing https://target.com with the actual target website:

python3 dirsearch.py -u https://target.com

This command initiates a basic directory brute-force scan against the target URL using default settings and wordlists.
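
When dirsearch is driven from a script, it can help to assemble the argument list programmatically rather than concatenating strings. The following is a minimal sketch, assuming dirsearch.py sits in the current directory; the helper name build_scan_command is hypothetical, and only flags described in this guide are used.

```python
import shlex


def build_scan_command(url, extensions=None, wordlists=None, script="dirsearch.py"):
    """Return an argv list for a basic scan (illustrative helper, not part of dirsearch)."""
    argv = ["python3", script, "-u", url]
    if extensions:
        # -e takes a comma-separated extension list
        argv += ["-e", ",".join(extensions)]
    if wordlists:
        # -w takes comma-separated wordlist paths
        argv += ["-w", ",".join(wordlists)]
    return argv


command = build_scan_command("https://target.com", extensions=["php", "html"])
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True), which avoids shell quoting problems in URLs or paths.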

Understanding Wordlists: The Key to Effective Web Path Discovery

Wordlists are fundamental to dirsearch’s operation. They are text files containing lists of common directory and file names that dirsearch uses to probe the target website. Understanding how dirsearch handles wordlists and extensions is crucial for customizing your scans.

Wordlist Essentials:

  • Structure: A wordlist is simply a text file where each line represents a potential path.
  • Extensions: Unlike some other tools, dirsearch specifically uses the keyword %EXT% in wordlists to denote where extensions from the -e flag should be inserted.
  • Force Extensions (-f | --force-extensions): For wordlists that don’t use %EXT% (like those from SecLists), the -f or --force-extensions flag is essential. This flag appends the specified extensions to every entry in the wordlist, as well as a trailing slash / for directory probing.
  • Overwrite Extensions (-O | --overwrite-extensions): If your wordlist contains entries with existing extensions, and you want to replace them with your specified extensions, use the -O or --overwrite-extensions flag. Note that certain extensions (like .log, .json, .xml, and media file extensions) are typically excluded from being overwritten.
  • Multiple Wordlists: You can use multiple wordlists by separating their paths with commas in the -w flag. Example: -w wordlist1.txt,wordlist2.txt.

Wordlist Examples:

  • Normal Extensions:

    Wordlist entry:

    index.%EXT%

    Command:

    python3 dirsearch.py -e asp,aspx -u https://target.com -w wordlist.txt

    Generated dictionary entries:

    index
    index.asp
    index.aspx
  • Force Extensions:

    Wordlist entry:

    admin

    Command:

    python3 dirsearch.py -e php,html -f -u https://target.com -w wordlist.txt

    Generated dictionary entries:

    admin
    admin.php
    admin.html
    admin/
  • Overwrite Extensions:

    Wordlist entry:

    login.html

    Command:

    python3 dirsearch.py -e jsp,jspa -O -u https://target.com -w wordlist.txt

    Generated dictionary entries:

    login.html
    login.jsp
    login.jspa
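
The three expansion modes can be reproduced with a small Python sketch. This is not dirsearch's actual code: the function name expand_entry and the PROTECTED_EXTENSIONS set are hypothetical, and the logic is written only to match the generated entries listed in the examples above.

```python
# Assumption: a partial stand-in for the extensions that are not overwritten.
PROTECTED_EXTENSIONS = {"log", "json", "xml"}


def expand_entry(entry, extensions, force=False, overwrite=False):
    """Expand one wordlist entry the way the documented examples describe."""
    generated = []

    def add(path):
        if path not in generated:
            generated.append(path)

    if "%EXT%" in entry:
        # Normal mode: the bare name, then one path per extension.
        add(entry.replace(".%EXT%", ""))
        for ext in extensions:
            add(entry.replace("%EXT%", ext))
        return generated

    add(entry)
    base, dot, current_ext = entry.rpartition(".")
    if overwrite and dot and base and current_ext.lower() not in PROTECTED_EXTENSIONS:
        # -O: swap the existing extension for each requested one.
        for ext in extensions:
            add(f"{base}.{ext}")
    elif force and not entry.endswith("/"):
        # -f: append every extension, plus a trailing slash for directories.
        for ext in extensions:
            add(f"{entry}.{ext}")
        add(entry + "/")
    return generated


print(expand_entry("index.%EXT%", ["asp", "aspx"]))
print(expand_entry("admin", ["php", "html"], force=True))
print(expand_entry("login.html", ["jsp", "jspa"], overwrite=True))
```

Running the sketch prints the same three dictionaries shown in the examples, which makes it a convenient way to preview how a wordlist will grow before starting a scan.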

Exploring Dirsearch Options: Customizing Your Scans

Dirsearch offers a wide array of options to fine-tune your web path discovery process. These options can be categorized into dictionary settings, general settings, request settings, connection settings, view settings, and output settings. Here’s a breakdown of some of the most commonly used and important options:

Usage: dirsearch.py [-u|--url] target [-e|--extensions] extensions [options]
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

Mandatory:
  -u URL, --url=URL   Target URL(s), can use multiple flags
  -l PATH, --urls-file=PATH
                        URL list file
  --stdin               Read URL(s) from STDIN
  --cidr=CIDR           Target CIDR
  --raw=PATH            Load raw HTTP request from file (use '--scheme' flag to set the scheme)
  --nmap-report=PATH    Load targets from nmap report (Ensure the inclusion of the -sV flag during nmap scan for comprehensive results)
  -s SESSION_FILE, --session=SESSION_FILE
                        Session file
  --config=PATH         Path to configuration file (Default: 'DIRSEARCH_CONFIG' environment variable, otherwise 'config.ini')

Dictionary Settings:
  -w WORDLISTS, --wordlists=WORDLISTS
                        Wordlist files or directories contain wordlists (separated by commas)
  -e EXTENSIONS, --extensions=EXTENSIONS
                        Extension list separated by commas (e.g. php,asp)
  -f, --force-extensions
                        Add extensions to the end of every wordlist entry. By default dirsearch only replaces the %EXT% keyword with extensions
  -O, --overwrite-extensions
                        Overwrite other extensions in the wordlist with your extensions (selected via `-e`)
  --exclude-extensions=EXTENSIONS
                        Exclude extension list separated by commas (e.g. asp,jsp)
  --remove-extensions   Remove extensions in all paths (e.g. admin.php -> admin)
  --prefixes=PREFIXES   Add custom prefixes to all wordlist entries (separated by commas)
  --suffixes=SUFFIXES   Add custom suffixes to all wordlist entries, ignore directories (separated by commas)
  -U, --uppercase       Uppercase wordlist
  -L, --lowercase       Lowercase wordlist
  -C, --capital         Capital wordlist

General Settings:
  -t THREADS, --threads=THREADS
                        Number of threads
  --async               Enable asynchronous mode
  -r, --recursive       Brute-force recursively
  --deep-recursive      Perform recursive scan on every directory depth (e.g. api/users -> api/)
  --force-recursive     Do recursive brute-force for every found path, not only directories
  -R DEPTH, --max-recursion-depth=DEPTH
                        Maximum recursion depth
  --recursion-status=CODES
                        Valid status codes to perform recursive scan, support ranges (separated by commas)
  --subdirs=SUBDIRS     Scan sub-directories of the given URL[s] (separated by commas)
  --exclude-subdirs=SUBDIRS
                        Exclude the following subdirectories during recursive scan (separated by commas)
  -i CODES, --include-status=CODES
                        Include status codes, separated by commas, support ranges (e.g. 200,300-399)
  -x CODES, --exclude-status=CODES
                        Exclude status codes, separated by commas, support ranges (e.g. 301,500-599)
  --exclude-sizes=SIZES  Exclude responses by sizes, separated by commas (e.g. 0B,4KB)
  --exclude-text=TEXTS  Exclude responses by text, can use multiple flags
  --exclude-regex=REGEX
                        Exclude responses by regular expression
  --exclude-redirect=STRING
                        Exclude responses if this regex (or text) matches redirect URL (e.g. '/index.html')
  --exclude-response=PATH
                        Exclude responses similar to response of this page, path as input (e.g. 404.html)
  --skip-on-status=CODES
                        Skip target whenever hit one of these status codes, separated by commas, support ranges
  --min-response-size=LENGTH
                        Minimum response length
  --max-response-size=LENGTH
                        Maximum response length
  --max-time=SECONDS    Maximum runtime for the scan
  --exit-on-error       Exit whenever an error occurs

Request Settings:
  -m METHOD, --http-method=METHOD
                        HTTP method (default: GET)
  -d DATA, --data=DATA  HTTP request data
  --data-file=PATH      File contains HTTP request data
  -H HEADERS, --header=HEADERS
                        HTTP request header, can use multiple flags
  --headers-file=PATH   File contains HTTP request headers
  -F, --follow-redirects
                        Follow HTTP redirects
  --random-agent        Choose a random User-Agent for each request
  --auth=CREDENTIAL     Authentication credential (e.g. user:password or bearer token)
  --auth-type=TYPE      Authentication type (basic, digest, bearer, ntlm, jwt)
  --cert-file=PATH     File contains client-side certificate
  --key-file=PATH      File contains client-side certificate private key (unencrypted)
  --user-agent=USER_AGENT
  --cookie=COOKIE

Connection Settings:
  --timeout=TIMEOUT     Connection timeout
  --delay=DELAY         Delay between requests
  -p PROXY, --proxy=PROXY
                        Proxy URL (HTTP/SOCKS), can use multiple flags
  --proxies-file=PATH   File contains proxy servers
  --proxy-auth=CREDENTIAL
                        Proxy authentication credential
  --replay-proxy=PROXY  Proxy to replay with found paths
  --tor                 Use Tor network as proxy
  --scheme=SCHEME       Scheme for raw request or if there is no scheme in the URL (Default: auto-detect)
  --max-rate=RATE       Max requests per second
  --retries=RETRIES     Number of retries for failed requests
  --ip=IP               Server IP address
  --interface=NETWORK_INTERFACE
                        Network interface to use

Advanced Settings:
  --crawl               Crawl for new paths in responses

View Settings:
  --full-url            Full URLs in the output (enabled automatically in quiet mode)
  --redirects-history   Show redirects history
  --no-color            No colored output
  -q, --quiet-mode      Quiet mode

Output Settings:
  -o PATH/URL, --output=PATH/URL
                        Output file or MySQL/PostgreSQL URL (Format: scheme://[username:password@]host[:port]/database-name)
  --format=FORMAT       Report format (Available: simple, plain, json, xml, md, csv, html, sqlite, mysql, postgresql)
  --log=PATH            Log file

Key Options Explained

  • -e EXTENSIONS, --extensions=EXTENSIONS: Specifies file extensions to scan for (e.g., php,html,js).
  • -w WORDLISTS, --wordlists=WORDLISTS: Defines the wordlist(s) to use for brute-forcing. You can provide a path to a single wordlist file or a comma-separated list of files or directories.
  • -t THREADS, --threads=THREADS: Sets the number of threads to use, controlling the concurrency of requests. Higher thread counts can speed up scans but might also increase the risk of detection or server overload.
  • -r, --recursive: Enables recursive brute-forcing. If a directory is found, dirsearch will automatically scan within that directory.
  • --max-recursion-depth=DEPTH: Limits the depth of recursive scanning to prevent scans from going too deep.
  • -x CODES, --exclude-status=CODES: Excludes specific HTTP status codes from the scan results (e.g., 404,500-599). This helps filter out irrelevant responses.
  • --exclude-sizes=SIZES: Excludes responses based on their size (e.g., 0B,4KB).
  • --exclude-text=TEXTS: Excludes responses containing specific text in their body.
  • -p PROXY, --proxy=PROXY: Routes requests through a proxy server (e.g., --proxy http://127.0.0.1:8080). Useful for anonymity or bypassing IP-based restrictions.
  • --user-agent=USER_AGENT: Sets a custom User-Agent header for requests, allowing you to mimic different browsers or bots.
  • -o PATH/URL, --output=PATH/URL: Saves the scan results to a file, or to a MySQL/PostgreSQL database when given a URL. The report format is chosen with --format.
  • --format=FORMAT: Specifies the output format (e.g., plain, json, html).
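
Several of these options accept status codes as a comma-separated list that may include ranges, such as 404,500-599. The sketch below shows one way such a specification could be interpreted; the function name parse_status_codes is hypothetical and this is not dirsearch's own parser.

```python
def parse_status_codes(spec):
    """Turn a string like '404,500-599' into a set of integer status codes."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as 500-599 is inclusive on both ends.
            start, end = (int(value) for value in part.split("-", 1))
            codes.update(range(start, end + 1))
        else:
            codes.add(int(part))
    return codes


excluded = parse_status_codes("404,500-599")
print(404 in excluded, 503 in excluded, 200 in excluded)  # True True False
```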

Practical Examples of Dirsearch in Action

To illustrate how to use dirsearch effectively, let’s look at some practical examples:

Simple Scan with Specific Extensions

To scan https://example.com for PHP, HTML, and JavaScript files using a common wordlist:

python3 dirsearch.py -u https://example.com -e php,html,js -w /path/to/wordlist.txt

Replace /path/to/wordlist.txt with the actual path to your wordlist file.

Recursive Scan with Depth Limitation

To perform a recursive scan of https://example.com up to a depth of 3 directories, excluding 404 errors:

python3 dirsearch.py -u https://example.com -r --max-recursion-depth 3 -x 404
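
To see what a depth limit does, recursion can be modeled as a queue of directories, each tagged with its depth. The sketch below is an illustration only: recursive_scan and the SITE set are hypothetical, no HTTP requests are made, and the exact way dirsearch counts depth is an assumption (here the initial scan is depth 0 and each descent adds 1).

```python
from collections import deque


def recursive_scan(exists, wordlist, max_depth):
    """Breadth-first scan that stops descending once max_depth is reached."""
    found = []
    queue = deque([("", 0)])  # (directory prefix, recursion depth)
    while queue:
        prefix, depth = queue.popleft()
        for word in wordlist:
            path = f"{prefix}{word}/"
            if exists(path):
                found.append(path)
                # Assumption: only descend while below the depth limit.
                if depth < max_depth:
                    queue.append((path, depth + 1))
    return found


# Hypothetical site layout standing in for real HTTP responses.
SITE = {"admin/", "admin/backup/", "admin/backup/old/"}
print(recursive_scan(SITE.__contains__, ["admin", "backup", "old"], max_depth=1))
# ['admin/', 'admin/backup/']
```

Raising max_depth to 2 in this model also reaches admin/backup/old/, which shows how each extra level multiplies the number of requests.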

Scanning Multiple Subdirectories

To scan specific subdirectories of https://example.com, such as /admin/ and /api/:

python3 dirsearch.py -u https://example.com --subdirs /admin/,/api/

Using Proxies for Anonymity

To scan https://example.com through a proxy server:

python3 dirsearch.py -u https://example.com -p http://127.0.0.1:8080

Saving Output to a JSON File

To save the scan results of https://example.com to a JSON file named results.json:

python3 dirsearch.py -u https://example.com -o results.json --format json

Pausing and Resuming Scans

Dirsearch allows you to pause a scan by pressing CTRL+C. You will then be presented with options to:

  • Save progress: Allows you to resume the scan later from where you left off.
  • Skip current target: Moves to the next target if you are scanning multiple URLs.
  • Skip current subdirectory: Skips the current subdirectory in a recursive scan.

This feature is useful for long scans or when you need to interrupt your work. A saved session can be loaded later with the -s SESSION_FILE (--session) option.

Advanced Dirsearch Techniques

Beyond basic usage, dirsearch offers advanced features to refine your scans further:

Asynchronous Mode

Enable asynchronous mode using the --async flag for potentially improved performance and lower CPU usage. Asynchronous mode utilizes coroutines instead of threads for handling concurrent requests, which can be more efficient in certain scenarios.

python3 dirsearch.py -u https://example.com --async
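
The difference between coroutines and threads can be illustrated with Python's asyncio. This is a minimal sketch, not dirsearch's implementation: the probe function fakes a request with asyncio.sleep, and the set of "existing" paths is hypothetical.

```python
import asyncio


async def probe(path, limiter):
    """Pretend to request a path; a real scanner would do network I/O here."""
    async with limiter:
        await asyncio.sleep(0.01)  # stand-in for waiting on the server
        status = 200 if path in {"admin", "login"} else 404
        return path, status


async def scan(paths, concurrency=10):
    # The semaphore caps in-flight requests, much like a thread count would.
    limiter = asyncio.Semaphore(concurrency)
    results = await asyncio.gather(*(probe(path, limiter) for path in paths))
    return [path for path, status in results if status != 404]


found = asyncio.run(scan(["admin", "backup", "login", "test"]))
print(found)  # ['admin', 'login']
```

All probes run on a single thread and yield while waiting, which is why this style can use less CPU than a large thread pool for I/O-bound work.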

Prefixes and Suffixes

Use --prefixes and --suffixes to add custom prefixes or suffixes to every wordlist entry. This can be useful for finding backup files, temporary files, or files with specific naming conventions.

python3 dirsearch.py -u https://example.com --prefixes '.,admin,_' --suffixes '~,.bak'
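
The effect of prefixes and suffixes on a wordlist can be previewed with a short sketch. The function apply_affixes is hypothetical. It assumes prefixes and suffixes are applied separately rather than combined, and it skips suffixes for entries ending in / because the option listing says suffixes ignore directories.

```python
def apply_affixes(entries, prefixes=(), suffixes=()):
    """Return each entry plus its prefixed and suffixed variants."""
    generated = []
    for entry in entries:
        generated.append(entry)
        for prefix in prefixes:
            generated.append(prefix + entry)
        # Suffixes are not applied to directory entries.
        if not entry.endswith("/"):
            for suffix in suffixes:
                generated.append(entry + suffix)
    return generated


print(apply_affixes(["tools"], prefixes=[".", "admin", "_"]))
# ['tools', '.tools', 'admintools', '_tools']
print(apply_affixes(["index.php", "internal/"], suffixes=["~"]))
# ['index.php', 'index.php~', 'internal/']
```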

Blacklists

Dirsearch incorporates blacklist files located in the db/ folder. Paths listed in these files will be automatically filtered from scan results if they return the status code specified in the blacklist filename (e.g., db/403_blacklist.txt for 403 Forbidden responses). You can customize these blacklists to further reduce noise in your scans.

Filters for Refined Results

Utilize various filter options like --exclude-sizes, --exclude-text, --exclude-regex, and --exclude-redirect to fine-tune your results and eliminate false positives. As the option listing shows, --exclude-text can be passed multiple times, once per string.

python3 dirsearch.py -u https://example.com --exclude-text "Page not found" --exclude-text "Error" --exclude-regex "^[a-f0-9]{32}$"
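
Conceptually, text and regex filters discard a response when its body matches. The following sketch shows that idea under stated assumptions: is_excluded is a hypothetical helper, it uses plain substring matching for text and re.search for the regular expression, and dirsearch's real matching rules may differ.

```python
import re


def is_excluded(body, exclude_texts=(), exclude_regex=None):
    """Return True if a response body should be dropped from the results."""
    # Text filters: any listed string appearing in the body excludes it.
    if any(text in body for text in exclude_texts):
        return True
    # Regex filter: a single pattern searched against the body.
    if exclude_regex and re.search(exclude_regex, body):
        return True
    return False


texts = ["Page not found", "Error"]
pattern = r"^[a-f0-9]{32}$"
print(is_excluded("<h1>Page not found</h1>", texts, pattern))           # True
print(is_excluded("d41d8cd98f00b204e9800998ecf8427e", texts, pattern))  # True
print(is_excluded("<h1>Welcome</h1>", texts, pattern))                  # False
```

The regex in this example only matches a body that consists of exactly 32 lowercase hex characters, so it targets pages that return nothing but a hash-like token.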

Raw HTTP Requests

For advanced scenarios, you can import raw HTTP requests from a file using the --raw option. This allows you to craft highly customized requests and replay them with dirsearch. Remember to specify the scheme using --scheme if it’s not explicitly defined in the raw request.

python3 dirsearch.py --raw request.txt --scheme https -w wordlist.txt

Wordlist Formats: Uppercase, Lowercase, Capitalization

Dirsearch supports different wordlist formats using the -U, -L, and -C flags to automatically convert your wordlist entries to uppercase, lowercase, or capitalized formats, respectively.

python3 dirsearch.py -u https://example.com -U -w wordlist.txt # Scan with uppercase wordlist
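
The three case conversions correspond to simple string operations. In this sketch the helper transform_case is hypothetical, and mapping -C to Python's str.capitalize (first letter uppercase, the rest lowercase) is an assumption about how "capital" is interpreted.

```python
def transform_case(entries, mode):
    """Convert every wordlist entry to the requested case."""
    converters = {
        "upper": str.upper,         # like -U
        "lower": str.lower,         # like -L
        "capital": str.capitalize,  # like -C (assumed behavior)
    }
    return [converters[mode](entry) for entry in entries]


print(transform_case(["admin", "Login.php"], "upper"))    # ['ADMIN', 'LOGIN.PHP']
print(transform_case(["admin", "Login.php"], "capital"))  # ['Admin', 'Login.php']
```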

Excluding Extensions from Wordlists

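The --exclude-extensions option allows you to remove entries from your wordlist that contain specific extensions. Some releases also accept a short -X form, but the option listing above shows only the long flag, so it is used here.

python3 dirsearch.py -u https://example.com --exclude-extensions jsp -w wordlist.txt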
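
Excluding extensions amounts to filtering the wordlist before any request is sent. A minimal sketch, assuming an entry is dropped when it ends with one of the listed extensions; the function name exclude_extensions is hypothetical.

```python
def exclude_extensions(entries, excluded):
    """Drop wordlist entries whose extension is in the excluded list."""
    suffixes = tuple("." + ext.lower() for ext in excluded)
    # str.endswith accepts a tuple, so one call checks every extension.
    return [entry for entry in entries if not entry.lower().endswith(suffixes)]


print(exclude_extensions(["admin.php", "test.jsp", "index"], ["jsp"]))
# ['admin.php', 'index']
```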

Scanning Subdirectories Directly

The --subdirs option enables you to target scans directly at specific subdirectories of a URL.

python3 dirsearch.py -u https://example.com --subdirs /admin/,/uploads/ -w wordlist.txt

Proxy Lists for Enhanced Scanning

For large-scale scans or when dealing with rate limiting, use --proxies-file to provide a list of proxy servers. Dirsearch will automatically rotate through these proxies during the scan.

python3 dirsearch.py -u https://example.com --proxies-file proxies.txt

Comprehensive Reporting

Dirsearch supports various report formats, including plain, simple, json, xml, md, csv, html, sqlite, mysql, and postgresql. Choose the format that best suits your needs using the --format option and specify the output file with -o.

python3 dirsearch.py -u https://example.com --format html -o report.html

Tips for Effective Web Path Discovery with Dirsearch

  • Bypass Request Limits: If you encounter request limits, consider using a proxy list (--proxies-file) to vary your source IP addresses and bypass these restrictions.
  • Find Configuration and Backup Files: Utilize suffixes like ~ and prefixes like . (--suffixes '~' --prefixes .) to uncover potential configuration or backup files. Quote the ~ so the shell does not expand it to your home directory.
  • Focus on Directories: To specifically target directories, combine --remove-extensions with --suffixes / to scan for directory paths without file extensions.
  • Optimize CIDR Scans: When scanning CIDR ranges, reduce timeout and retry settings (--timeout 3 --retries 1) so that unresponsive hosts are abandoned quickly. Note that very short timeouts can cause slow but live hosts to be missed.
  • Skip Rate-Limited Targets: Use --skip-on-status 429 to skip a target as soon as it returns 429 (Too Many Requests), rather than continuing to send requests that will be rejected.
  • Handle Large Files: If the server hosts large files that slow down scans, consider using the HEAD HTTP method instead of GET (-m HEAD) to retrieve only headers and not the entire file content.

Contribution and Support

Dirsearch is an actively developed open-source tool, and contributions are welcomed. You can find more information about contributing in the CONTRIBUTORS.md file in the repository. For support and community discussions, join the Discord server.

Dirsearch empowers you to effectively perform web path discovery, uncovering hidden areas of websites and enhancing your security assessments. By mastering its options and techniques, you can significantly improve your ability to find valuable information and potential vulnerabilities in web applications.
