Metadata-Version: 2.4
Name: dorkbot
Version: 1.9.1+git
Summary: Command-line tool to scan search results for vulnerabilities
Author-email: John Gordon <jgor@utexas.edu>
License: UT AUSTIN RESEARCH LICENSE (SOURCE CODE)
        The University of Texas at Austin has developed certain software and documentation that it desires to make available without charge to anyone for academic, research, experimental or personal use. This license is designed to guarantee freedom to use the software for these purposes. If you wish to distribute or make other use of the software, you may purchase a license to do so from the University of Texas.
        The accompanying source code is made available to you under the terms of this UT Research License (this “UTRL”). By clicking the “ACCEPT” button, or by installing or using the code, you are consenting to be bound by this UTRL. If you do not agree to the terms and conditions of this license, do not click the “ACCEPT” button, and do not install or use any part of the code.
        The terms and conditions in this UTRL not only apply to the source code made available by UT, but also to any improvements to, or derivative works of, that source code made by you and to any object code compiled from such source code, improvements or derivative works.
        1. DEFINITIONS.
        1.1 “Commercial Use” shall mean use of Software or Documentation by Licensee for direct or indirect financial, commercial or strategic gain or advantage, including without limitation: (a) bundling or integrating the Software with any hardware product or another software product for transfer, sale or license to a third party (even if distributing the Software on separate media and not charging for the Software); (b) providing customers with a link to the Software or a copy of the Software for use with hardware or another software product purchased by that customer; or (c) use in connection with the performance of services for which Licensee is compensated.
        1.2 “Derivative Products” means any improvements to, or other derivative works of, the Software made by Licensee.
        1.3 “Documentation” shall mean all manuals, user documentation, and other related materials pertaining to the Software that are made available to Licensee in connection with the Software.
        1.4 “Licensor” shall mean The University of Texas.
        1.5 “Licensee” shall mean the person or entity that has agreed to the terms hereof and is exercising rights granted hereunder.
        1.6 “Software” shall mean the computer program(s) referred to as “Dorkbot” made available under this UTRL in source code form, including any error corrections, bug fixes, patches, updates or other modifications that Licensor may in its sole discretion make available to Licensee from time to time, and any object code compiled from such source code.
        2. GRANT OF RIGHTS.
        Subject to the terms and conditions hereunder, Licensor hereby grants to Licensee a worldwide, non- transferable, non-exclusive license to (a) install, use and reproduce the Software for academic, research,
        The University of Texas at Austin Research License (Source Code) Page 1 of 4
         
        experimental and personal use (but specifically excluding Commercial Use); (b) use and modify the Software to create Derivative Products, subject to Section 3.2; and (c) use the Documentation, if any, solely in connection with Licensee’s authorized use of the Software.
        3. RESTRICTIONS; COVENANTS.
        3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer copies or rights to the Software (or any portion thereof) or the Documentation; (b) use the Software (or any portion thereof) or Documentation for Commercial Use, or for any other use except as described in Section 2; (c) copy the Software or Documentation other than for archival and backup purposes; or (d) remove any product identification, copyright, proprietary notices or labels from the Software and Documentation. This UTRL confers no rights upon Licensee except those expressly granted herein.
        3.2 Licensee hereby agrees that it will provide a copy of all Derivative Products to Licensor and that its use of the Derivative Products will be subject to all of the same terms, conditions, restrictions and limitations on use imposed on the Software under this UTRL. Licensee hereby grants Licensor a worldwide, non- exclusive, royalty-free license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute Derivative Products. Licensee also hereby grants Licensor a worldwide, non-exclusive, royalty-free patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Derivative Products under those patent claims licensable by Licensee that are necessarily infringed by the Derivative Products.
        4. PROTECTION OF SOFTWARE.
        4.1 Confidentiality. The Software and Documentation are the confidential and proprietary information of Licensor. Licensee agrees to take adequate steps to protect the Software and Documentation from unauthorized disclosure or use. Licensee agrees that it will not disclose the Software or Documentation to any third party.
        4.2 Proprietary Notices. Licensee shall maintain and place on any copy of Software or Documentation that it reproduces for internal use all notices as are authorized and/or required hereunder. Licensee shall include a copy of this UTRL and the following notice, on each copy of the Software and Documentation. Such license and notice shall be embedded in each copy of the Software, in the video screen display, on the physical medium embodying the Software copy and on any Documentation:
        Copyright © 2017, The University of Texas at Austin. All rights reserved.
        UNIVERSITY EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES CONCERNING THIS SOFTWARE AND DOCUMENTATION, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, NON- INFRINGEMENT AND WARRANTIES OF PERFORMANCE, AND ANY WARRANTY THAT MIGHT OTHERWISE ARISE FROM COURSE OF DEALING OR USAGE OF TRADE. NO WARRANTY IS EITHER EXPRESS OR IMPLIED WITH RESPECT TO THE USE OF THE SOFTWARE OR DOCUMENTATION. Under no circumstances shall University be liable for incidental, special, indirect, direct or consequential damages or loss of profits, interruption of business, or related expenses which may arise from use of Software or Documentation, including but not limited to those resulting from defects in Software and/or Documentation, or loss or inaccuracy of data of any kind.
        5. WARRANTIES.
        5.1 Disclaimer of Warranties. TO THE EXTENT PERMITTED BY APPLICABLE LAW, THE SOFTWARE AND DOCUMENTATION ARE BEING PROVIDED ON AN “AS IS” BASIS WITHOUT ANY
        The University of Texas at Austin Research License (Source Code) Page 2 of 4
         
        WARRANTIES OF ANY KIND RESPECTING THE SOFTWARE OR DOCUMENTATION, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF DESIGN, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
        5.2 Limitation of Liability. UNDER NO CIRCUMSTANCES UNLESS REQUIRED BY APPLICABLE LAW SHALL LICENSOR BE LIABLE FOR INCIDENTAL, SPECIAL, INDIRECT, DIRECT OR CONSEQUENTIAL DAMAGES OR LOSS OF PROFITS, INTERRUPTION OF BUSINESS, OR RELATED EXPENSES WHICH MAY ARISE AS A RESULT OF THIS LICENSE OR OUT OF THE USE OR ATTEMPT OF USE OF SOFTWARE OR DOCUMENTATION INCLUDING BUT NOT LIMITED TO THOSE RESULTING FROM DEFECTS IN SOFTWARE AND/OR DOCUMENTATION, OR LOSS OR INACCURACY OF DATA OF ANY KIND. THE FOREGOING EXCLUSIONS AND LIMITATIONS WILL APPLY TO ALL CLAIMS AND ACTIONS OF ANY KIND, WHETHER BASED ON CONTRACT, TORT (INCLUDING, WITHOUT LIMITATION, NEGLIGENCE), OR ANY OTHER GROUNDS.
        6. INDEMNIFICATION.
        Licensee shall indemnify, defend and hold harmless Licensor, the University of Texas System, their Regents, and their officers, agents and employees from and against any claims, demands, or causes of action whatsoever caused by, or arising out of, or resulting from, the exercise or practice of the license granted hereunder by Licensee, its officers, employees, agents or representatives.
        7. TERMINATION.
        If Licensee breaches this UTRL, Licensee’s right to use the Software and Documentation will terminate immediately without notice, but all provisions of this UTRL except Section 2 will survive termination and continue in effect. Upon termination, Licensee must destroy all copies of the Software and Documentation.
        8. GOVERNING LAW; JURISDICTION AND VENUE.
        The validity, interpretation, construction and performance of this UTRL shall be governed by the laws of the State of Texas. The Texas state courts of Travis County, Texas (or, if there is exclusive federal jurisdiction, the United States District Court for the Central District of Texas) shall have exclusive jurisdiction and venue over any dispute arising out of this UTRL, and Licensee consents to the jurisdiction of such courts. Application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded.
        9. EXPORT CONTROLS.
        This license is subject to all applicable export restrictions. Licensee must comply with all export and import laws and restrictions and regulations of any United States or foreign agency or authority relating to the Software and its use.
        10. U.S. GOVERNMENT END-USERS.
        The Software is a “commercial item,” as that term is defined in 48 C.F.R. 2.101, consisting of “commercial computer software” and “commercial computer software documentation,” as such terms are used in 48 C.F.R. 12.212 (Sept. 1995) and 48 C.F.R. 227.7202 (June 1995). Consistent with 48 C.F.R. 12.212, 48 C.F.R. 27.405(b)(2) (June 1998) and 48 C.F.R. 227.7202, all U.S. Government End Users acquire the Software with only those rights as set forth herein.
        The University of Texas at Austin Research License (Source Code) Page 3 of 4
         
        11. MISCELLANEOUS
        If any provision hereof shall be held illegal, invalid or unenforceable, in whole or in part, such provision shall be modified to the minimum extent necessary to make it legal, valid and enforceable, and the legality, validity and enforceability of all other provisions of this UTRL shall not be affected thereby. Licensee may not assign this UTRL in whole or in part, without Licensor’s prior written consent. Any attempt to assign this UTRL without such consent will be null and void. This UTRL is the complete and exclusive statement between Licensee and Licensor relating to the subject matter hereof and supersedes all prior oral and written and all contemporaneous oral negotiations, commitments and understandings of the parties, if any. Any waiver by either party of any default or breach hereunder shall not constitute a waiver of any provision of this UTRL or of any subsequent default or breach of the same or a different kind.
        END OF LICENSE
        
Project-URL: Homepage, http://dorkbot.io
Project-URL: GitHub, https://github.com/utiso/dorkbot
Classifier: Programming Language :: Python :: 3
Classifier: License :: Free for non-commercial use
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

![Image of Dorkbot](https://security.utexas.edu/sites/default/files/Artboard%203_0.png)

dorkbot
=======

Scan Google (or other) search results for vulnerabilities.

dorkbot is a modular command-line tool for performing vulnerability scans against sets of webpages returned by Google search queries or other supported sources. It is broken up into two sets of modules:

* *Indexers* - modules that return a list of targets
* *Scanners* - modules that perform a vulnerability scan against each target

Targets are stored in a database as they are indexed. Once scanned, a standard JSON report is produced containing any vulnerabilities found. Indexing and scanning processes can be run separately or combined in a single command (up to one of each).

Quickstart
==========
* Create a Google API credential via the [Developer Console](https://console.developers.google.com)
* Create a Google [Custom Search Engine](https://www.google.com/cse/) and note the search engine ID, e.g. 012345678901234567891:abc12defg3h
<pre>$ pip3 install dorkbot wapiti3</pre>
<pre>$ dorkbot -i google_api -o key=your_api_credential_here -o engine=your_engine_id_here -o query="filetype:php inurl:id"</pre>
<pre>$ dorkbot -s wapiti</pre>

Help
====
<pre>
 -h, --help            Show program (or specified module) help
</pre>
<pre>
  --show-defaults       Show default values in help output
</pre>

Usage
=====
<pre>
usage: dorkbot [-c CONFIG] [-r DIRECTORY] [--source [SOURCE]]
               [--show-defaults] [--count COUNT] [--random] [-h] [--log LOG]
               [-v] [-V] [-d DATABASE] [-u] [--drop-tables]
               [--retries RETRIES] [--retry-on RETRY_ON] [--show-stats] [-l]
               [-n] [--list-sources] [--add-target TARGET]
               [--delete-target TARGET] [--flush-targets] [-m] [-e]
               [-i INDEXER] [-o INDEXER_ARG] [-s SCANNER] [-p SCANNER_ARG]
               [-t] [-x] [--mark-unscanned MARK_UNSCANNED] [-g] [-f]
               [--fingerprint-max FINGERPRINT_MAX] [--list-blocklist]
               [--add-blocklist-item ITEM] [--delete-blocklist-item ITEM]
               [--flush-blocklist] [-b EXTERNAL_BLOCKLIST]

options:
  -c, --config CONFIG   Configuration file
  -r, --directory DIRECTORY
                        Dorkbot directory (default location of db, tools,
                        reports)
  --source [SOURCE]     Label associated with targets
  --show-defaults       Show default values in help output
  -h, --help            Show program (or specified module) help
  --log LOG             Path to log file
  -v, --verbose         Enable verbose logging (can be used multiple times to
                        increase verbosity)
  -V, --version         Print version

retrieval:
  --count COUNT         number of targets to retrieve (0/unset = all)
  --random              retrieve targets in random order

database:
  -d, --database DATABASE
                        Database file/uri
  -u, --prune           Apply fingerprinting and blocklist without scanning
  --drop-tables         Delete and recreate tables
  --retries RETRIES     Number of retries when an operation fails
  --retry-on RETRY_ON   Error strings that should result in a retry (can be
                        used multiple times)
  --show-stats          Show the total/unscanned target and fingerprint counts

targets:
  -l, --list-targets    List targets in database
  -n, --unscanned-only  Only include unscanned targets
  --list-sources        List sources in database
  --add-target TARGET   Add a url to the target database
  --delete-target TARGET
                        Delete a url from the target database
  --flush-targets       Delete all targets
  -m, --delete-on-match
                        Delete target if it matches blocklist item
  -e, --delete-on-error
                        Delete target if error encountered while processing it

indexing:
  -i, --indexer INDEXER
                        Indexer module to use
  -o, --indexer-arg INDEXER_ARG
                        Pass an argument to the indexer module (can be used
                        multiple times)

scanning:
  -s, --scanner SCANNER
                        Scanner module to use
  -p, --scanner-arg SCANNER_ARG
                        Pass an argument to the scanner module (can be used
                        multiple times)
  -t, --test            Fetch next scannable target but do not mark it scanned
  -x, --reset-scanned   Reset scanned status of all targets
  --mark-unscanned MARK_UNSCANNED
                        Reset scanned status of given target

fingerprints:
  -g, --generate-fingerprints
                        Generate fingerprints for all targets
  -f, --flush-fingerprints
                        Delete all generated fingerprints
  --fingerprint-max FINGERPRINT_MAX
                        Maximum matches per fingerprint before deleting new
                        matches

blocklist:
  --list-blocklist      List internal blocklist entries
  --add-blocklist-item ITEM
                        Add an ip/host/regex pattern to the internal blocklist
  --delete-blocklist-item ITEM
                        Delete an item from the internal blocklist
  --flush-blocklist     Delete all internal blocklist items
  -b, --external-blocklist EXTERNAL_BLOCKLIST
                        Supplemental external blocklist file/db (can be used
                        multiple times)
</pre>

Tools / Dependencies
=====
Database drivers:
* [psycopg](https://www.psycopg.org) (pip install "psycopg[binary]") (preferred)
* [psycopg2](https://pypi.org/project/psycopg2-binary/) (pip install psycopg2-binary)

Scanners:
* [Wapiti](http://wapiti.sourceforge.net/) (pip install wapiti3)
* [Codename SCNR](https://github.com/scnr/installer)
* [Arachni](https://github.com/Arachni/arachni) (deprecated)

As needed, dorkbot will search for tools in the following order:
* Directory specified via relevant module option
* Located in *tools* directory (within current directory, by default), with the subdirectory named after the tool
* Available in the user's PATH (e.g. installed system-wide)

Files
=====
All SQLite databases, tools, and reports are saved in the dorkbot directory, which by default is the current directory. You can force a specific directory with the --directory flag. Default file paths within this directory are as follows:

* SQLite database file: *dorkbot.db*
* External tools directory: *tools/*
* Scan report output directory: *reports/*

Configuration files are by default read from *~/.config/dorkbot/* (Linux / MacOS) or in the Application Data folder (Windows), honoring $XDG_CONFIG_HOME / %APPDATA%. Default file paths within this directory are as follows:

* Dorkbot configuration file: *dorkbot.ini*

Config File
===========
The configuration file (dorkbot.ini) can be used to prepopulate certain command-line flags.

Example dorkbot.ini:
<pre>
[dorkbot]
database=/opt/dorkbot/dorkbot.db
[dorkbot.indexers.wayback]
domain=example.com
[dorkbot.scanners.arachni]
path=/opt/arachni/bin
report_dir=/tmp/reports
</pre>

Database
========
The target database stores urls to be scanned and the sources where they came from. It tracks each url's scanned status by building a list of fingerprints for each unique page + parameter set and comparing new targets to existing fingerprints. Fingerprints only need to be generated once and will be generated on demand as needed. Fingerprints and scanned status may be reset independently.

Supported database addresses:
* postgresql://[connection_string]
* sqlite3:///path/to/sqlite_file.db
* /path/to/sqlite_file.db
* :memory:

Note that for SQLite target databases the protocol is optional (it is still required for external blocklists). Additionally, SQLite's in-memory feature can be used to avoid writing to disk entirely by specifying a database address of ":memory:".

Blocklist
=========
The blocklist is a list of ip addresses, hostnames, or regular expressions of url patterns that should *not* be scanned. If a target url matches any item in this list it will be skipped and removed from the database. The internal blocklist is maintained in the dorkbot database, but a separate file or database can be specified by passing the appropriate file path or connection uri to --external-blocklist. Targets are matched first against the internal blocklist and then optionally against any provided external blocklists.

Supported external blocklists:
* postgresql://[connection_string]
* sqlite3:///path/to/blocklist.db
* /path/to/blocklist.txt

Example blocklist items:
<pre>
regex:^[^\?]+$
regex:.*login.*
regex:^https?://[^.]*.example.com/.*
host:www.google.com
ip:127.0.0.1
</pre>

The first item will remove any target that doesn't contain a question mark, in other words any url that doesn't contain any GET parameters to test. The second attempts to avoid login functions, and the third blocklists all target urls on example.com. The fourth excludes targets with a hostname of www.google.com and the fifth excludes targets whose host resolves to 127.0.0.1.

Prune
=====
The prune flag iterates through all targets, computes the fingerprints, and marks subsequent matching targets as scanned. It also marks blocklist matches as scanned. The result is a database where --list-targets --unscanned-only returns only scannable urls. It honors the **random** flag to process fingerprints in random order and the **unscanned-only** flag to skip scanned targets which necessarily already have fingerprints. The blocklist check honors --delete-on-match to delete the target rather than marking it scanned, and --delete-on-error to delete any entry that fails to parse a valid hostname or resolve the hostname to ip address.

General Options
===============
These options are applicable regardless of module chosen:
<pre>
  --source [SOURCE]     Label associated with targets
  --count COUNT         number of urls to retrieve/scan, or -1 for all
  --random              retrieve urls in random order
</pre>

Additionally, blocklist-related options --delete-on-match / --delete-on-error will cause matching or erroneous targets to be skipped.

Indexer Modules
===============
### General Options ###
<pre>
  http-retries RETRIES     number of times to retry fetching results on error
  threads THREADS          number of concurrent requests to commoncrawl.org
</pre>

### bing_api ###
<pre>
  Searches bing.com

  key KEY             API key
  query QUERY         search query
</pre>

### commoncrawl ###
<pre>
  Searches commoncrawl.org crawl data

  domain DOMAIN       pull all results for given domain or subdomain
  index INDEX         search a specific index
  filter FILTER       query filter to apply to the search
  page-size PAGE_SIZE number of results to request per page
</pre>

### google_api ###
<pre>
  Searches google.com

  key KEY             API key
  engine ENGINE       CSE id
  query QUERY         search query
  domain DOMAIN       limit searches to specified domain
</pre>

### pywb ###
<pre>
  Searches a given pywb server's crawl data

  server SERVER       pywb server url
  domain DOMAIN       pull all results for given domain or subdomain
  cdx-api-suffix CDX_API_SUFFIX
                      suffix after index for index api
  index INDEX         search a specific index
  field FIELD         field (fl) to query
  filter FILTER       query filter to apply to the search
  page-size PAGE_SIZE
                      number of results to request per page
</pre>

### stdin ###
<pre>
  Accepts urls from stdin, one per line
</pre>

### wayback ###
<pre>
  Searches archive.org crawl data

  domain DOMAIN       pull all results for given domain or subdomain
  filter FILTER       query filter to apply to the search
  page-size PAGE_SIZE number of results to request per page
  from FROM           beginning timestamp
  to TO               end timestamp
</pre>

Scanner Modules
===============
### General Options ###
<pre>
  args ARGS           space-delimited list of additional arguments
  report-dir REPORT_DIR
                      directory to save report file
  report-filename REPORT_FILENAME
                      filename to save vulnerability report as
  report-append       append to report file if it exists
  report-indent REPORT_INDENT
                      indent level for vulnerability report json
  label LABEL         friendly name field to include in vulnerability report
</pre>

### arachni ###
<pre>
  Scans with the arachni command-line scanner

  path PATH           path to scanner binary
</pre>

### scnr ###
<pre>
  Scans with the scnr command-line scanner

  path PATH           path to scanner binary
</pre>

### wapiti ###
<pre>
  Scans with the wapiti command-line scanner

  path PATH           path to scanner binary
</pre>
