Crawling, also known as spidering, is the process of systematically browsing and indexing web pages. Every subpage and every link is visited, and the entire content of the website is indexed.
The goals of crawling are:
- Collecting information about a site's content, connections and structure
- Indexing pages for internet search engines such as Google or Bing
An important part of crawling is acting ethically and following the rules described in the robots.txt file.
Crawling in cybersecurity differs somewhat from the crawling done by internet search engines.
Crawling in cybersecurity is used for:
- Searching for hidden endpoints
- Identifying the technologies and frameworks in use
- Finding potential vulnerabilities
- Discovering sensitive data
- Increasing the known 'attack surface'
The more links and data we find, the larger our attack surface, and the greater our chance of finding vulnerabilities.
Crawlers are tools that help us aggregate more of the information we collect during reconnaissance, before the exploitation phase.
EyeWitness is a tool that doesn't 'walk around the structure of the web page', but it helps with one aspect of reconnaissance: getting a quick idea of what type of page we are dealing with.
In an earlier article we looked for the subdomains of a website and created a list of them. Now we need to check what they contain, but manual testing is time-consuming when the number of pages is large (in practice it can be hundreds or more). EyeWitness visits the domains and subdomains and takes screenshots that show us what is there. It then creates a report that we can browse quickly.
This allows us to prioritize targets, save time (manually browsing hundreds of web pages would take days or even weeks), and shortlist potential targets for our pentest. The time available for a pentest is usually limited, so saving it is the professional approach.
For example, if we notice a login panel, we may want to visit it first. The same goes for pages that display errors: it's worth checking these early, as they might contain easy-to-detect vulnerabilities, the so-called 'low-hanging fruit'.
Katana is a smart and fast web crawler that analyzes JavaScript, among other clues, when searching through a page's structure. It can be downloaded from GitHub: https://github.com/projectdiscovery/katana. Speed matters too, as we mentioned earlier.
Katana can work in one of two modes:
- standard: fast, doesn't render the DOM (Document Object Model) or JavaScript
- headless: slow, emulates a web browser and renders JavaScript
When using crawlers, it's important to:
- determine the depth of the search; a depth of 3-5 is usually optimal
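- limit the search to a selected scope (for example, with the -cs crawl-scope parameter in the katana web crawler; -sc is a different katana option that selects the locally installed Chrome)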
- set request limits (per second)
- filter the results
- choose the scanning mode
The details are available in each crawler's help, shown with the --help or -h parameter.
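To make the depth and scope settings more concrete, here is a minimal Python sketch of a breadth-first crawl over a hypothetical, in-memory link graph (no network requests are made). The names SITE_LINKS, in_scope and crawl, as well as the example.test URLs, are illustrative assumptions and are not part of katana or any other tool.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real HTTP responses.
SITE_LINKS = {
    "https://example.test/": ["https://example.test/login", "https://cdn.example.net/app.js"],
    "https://example.test/login": ["https://example.test/admin"],
    "https://example.test/admin": ["https://example.test/admin/backup"],
}

def in_scope(url, allowed_host):
    """Keep only URLs whose host matches the agreed scope."""
    return urlparse(url).hostname == allowed_host

def crawl(start, allowed_host, max_depth):
    """Breadth-first crawl that does not follow links from pages at max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached, do not follow links any further
        for link in SITE_LINKS.get(url, []):
            if link not in seen and in_scope(link, allowed_host):
                seen.add(link)
                queue.append((link, depth + 1))
    return sorted(seen)

print(crawl("https://example.test/", "example.test", max_depth=2))
```

With a depth of 2 the sketch never reaches /admin/backup, and the out-of-scope CDN link is skipped at any depth. This is the trade-off the settings control: a deeper crawl finds more, but takes longer and sends more requests.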
Gospider is another web crawler, easier to use than advanced tools like katana.
There are other tools for web crawling as well:
- Burp Suite with its 'Spider' module
- ZAP: an open-source alternative to Burp Suite that also has an equivalent of the 'Spider' module
Once we have mapped the website's structure, we can use web scrapers that 'scrape' the contents of its pages.
There are tools made just for this:
- Scrapy (written in Python)
- JSoup (written in Java)
Introduction to Network Scanning.
Before learning about scanning and enumeration, it's worth refreshing (or learning) the basics of networking models and protocols such as OSI and TCP/IP.
Nmap is a classic, legendary tool used by pentesters. During its lifetime, Nmap has evolved from a simple port scanner into an advanced tool for:
- Host discovery,
- Port scanning,
- Detecting services with version numbers,
- Detecting operating systems.
Additionally, the Nmap Scripting Engine allows us to write scripts that are useful for advanced scanning of specific services.
Nmap offers a variety of scanning techniques as well.
Scanning with Nmap, and specifically the basic SYN scan, is based on the three-way handshake mechanism that TCP uses to establish a connection.
1. Nmap sends a SYN message to a given IP address and port.
2. Nmap receives the host's response and analyzes it. If there is no response from the host, that is also information to interpret.
Nmap's SYN scan is relatively fast and hard to detect, because we never establish a full TCP connection. After receiving a SYN-ACK response from the host, we reply with an RST ('reset') message instead of completing the handshake.
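The interpretation step described above can be sketched in a few lines of Python. This is a simplified illustration, not Nmap's actual implementation: the function name classify_syn_probe and the string labels for the responses are assumptions made for this example.

```python
def classify_syn_probe(response):
    """Map the reply to a SYN probe onto a port state.

    response is a simplified label: "SYN-ACK", "RST" or None (no reply).
    """
    if response == "SYN-ACK":
        return "open"      # a service is listening; the scanner answers with RST
    if response == "RST":
        return "closed"    # the host is reachable, but nothing listens on the port
    if response is None:
        return "filtered"  # silence usually means a firewall dropped the probe
    return "unknown"

for reply in ("SYN-ACK", "RST", None):
    print(reply, "->", classify_syn_probe(reply))
```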
Example of Nmap use: nmap megaclinic.pl -sS --top-ports 10 -vv
-sS selects the basic scan type (SYN scan).
--top-ports 10 tells Nmap to scan the 10 most popular ports.
-vv enables very verbose mode, which prints more information.
It's worth learning and memorizing the 30-40 most popular ports and their associated services. Then, for example, if we see an open port 3306, we instantly know that it is most likely a MySQL service. If we see 1433, we know it is most likely MSSQL.
Links:
- List of popular ports,
- List of all ports registered in IANA.
Hosts Discovery in Computer Networks.
Often, when we agree to carry out a pentest, the scope of the test is defined up front. For example, we might need to test hosts in the IP range 10.0.0.1 - 10.0.0.255. But how many hosts are there in this range? We need to discover hosts before we can scan for open ports.
There are public and private IP addresses. Public IP addresses:
- Are available across the whole Internet
- Are reachable from anywhere
- Must be unique
Private IP addresses:
- Are used within a local network
- Are unreachable from the outside
Private address pools:
- 10.0.0.0 - 10.255.255.255 (over 16 million addresses)
- 172.16.0.0 - 172.31.255.255 (over 1 million addresses)
- 192.168.0.0 - 192.168.255.255 (over 65 thousand addresses)
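The sizes of these pools can be checked with Python's standard ipaddress module. The sketch below writes the three pools in CIDR notation (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); the PRIVATE_POOLS list and the is_private helper are names chosen only for this illustration.

```python
import ipaddress

# The three private pools listed above, written in CIDR notation.
PRIVATE_POOLS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

for cidr in PRIVATE_POOLS:
    net = ipaddress.ip_network(cidr)
    # net[0] and net[-1] are the first and last address of the pool
    print(f"{net[0]} - {net[-1]}: {net.num_addresses} addresses")

def is_private(address):
    """Return True if the address falls into one of the three pools above."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in PRIVATE_POOLS)
```

For instance, is_private("192.168.1.10") is true, while is_private("8.8.8.8") is false.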
Nmap can be used to discover active hosts within a subnet. For example, we can issue the command:
nmap 192.168.0.0/16 -sn , where:
- the trailing 16 is a 'mask' saying that the first 16 bits of the IP address stay fixed, so the scan covers the rest of the network (192.168.0.0 - 192.168.255.255). An IPv4 address is 32 bits long and consists of 4 octets of 8 bits each.
- the -sn parameter means that we only want to find active hosts and their IP addresses. We do not want to scan ports for now.
We can narrow the search scope by specifying the IP address range of the hosts to scan:
- nmap 192.168.0.1-15 -sn
- nmap 192.168.0.20,21,22,23 -sn
- nmap 192.168.0.20,21,22,23 192.168.0.50-100 -sn
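To see which hosts such a range covers, the notation can be expanded by hand. The Python sketch below handles only the simplified case used in the examples above, where the last octet is a comma-separated list of numbers or ranges; expand_last_octet is a hypothetical helper written for this illustration, and Nmap's own target parser accepts more forms than this.

```python
def expand_last_octet(spec):
    """Expand 'a.b.c.X', where X is a comma-separated list of numbers or ranges."""
    prefix, _, last = spec.rpartition(".")
    hosts = []
    for part in last.split(","):
        start, _, end = part.partition("-")
        end = end or start  # a single number is a range of length one
        hosts.extend(f"{prefix}.{n}" for n in range(int(start), int(end) + 1))
    return hosts

print(expand_last_octet("192.168.0.1-15"))
print(expand_last_octet("192.168.0.20,21,22,23"))
```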
We should keep the number of scans to a minimum, so it's useful to write the scan results to a file using the -oA parameter.
When executed with the -oA parameter, nmap generates 3 output files (.nmap, .gnmap and .xml). The .xml file can be taken as input by other tools, such as nikto. This makes it easier to integrate the tools.
Nmap's .xml output file can also be converted to a readable HTML file using the xsltproc tool.
Later, when we scan for open ports, we can use the .html files generated by nmap and xsltproc as part of the pentest report that we prepare at the end of the pentesting process.
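Because the .xml file is machine-readable, it can also be processed with our own scripts. Below is a minimal Python sketch that lists open ports from such a file. The SAMPLE_XML document is a hand-written, heavily shortened sample that only mimics the layout of Nmap's XML output (real files contain many more elements and attributes), and open_ports is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that mimics the layout of nmap's XML output (assumption).
SAMPLE_XML = """<nmaprun>
  <host>
    <status state="up"/>
    <address addr="192.168.0.20" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/><service name="ssh"/></port>
      <port protocol="tcp" portid="80"><state state="closed"/><service name="http"/></port>
    </ports>
  </host>
</nmaprun>"""

def open_ports(xml_text):
    """Return (address, port, service) tuples for every open port."""
    results = []
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address").get("addr")
        for port in host.iter("port"):
            if port.find("state").get("state") == "open":
                service = port.find("service")
                name = service.get("name") if service is not None else "unknown"
                results.append((addr, int(port.get("portid")), name))
    return results

print(open_ports(SAMPLE_XML))
```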
Another tool for host discovery, besides Nmap, is arp-scan.
Nmap's host discovery typically relies on ping and similar ICMP methods, whereas arp-scan is different: it uses the ARP protocol.
Arp-scan can be executed using the commands:
- sudo arp-scan --localnet (for basic scanning)
- sudo arp-scan --localnet --format='${IP}' (formatting the output to show only IP addresses)
- sudo arp-scan --localnet --format='${IP}' --plain > hosts-local.arpscan (to generate a file with the list of IP addresses: hosts-local.arpscan)
Nmap offers yet another way of specifying the IP addresses to scan: reading the list from an input file. This is done with the -iL parameter.
If we want to scan the 10 most popular ports (using the SYN scan method mentioned above) for a list of IP addresses contained in a file, we can issue a command such as:
- sudo nmap -sS --top-ports 10 -iL hosts-local.arpscan -oA hosts-top10-ss
Then we can use the xsltproc tool to generate a readable .html report file.
Netdiscover is another tool for discovering hosts in a network. It can work in a 'listen mode', which can be useful in Wi-Fi networks, for example.
It can be run in 'passive mode' using the -p parameter. Passive scanning won't reveal the whole list of hosts we want to discover, but it can be useful when we do not want to be noticed.
Sometimes we do not have access to a company's internal network, or to all of the advanced host discovery tools mentioned above. We can still use basic tools such as ping.
A 'ping sweep' one-liner is:
- for ip in $(seq 1 254); do ping 192.158.54.$ip -c1 -W1 & done | grep ttl
-c1 means a count of 1, so only 1 ping is sent to each scanned IP address
-W1 sets a short timeout (1 second on Linux), so we do not wait too long for a reply
grep ttl displays only the successful pings (lines that contain the pattern 'ttl')

To get a plain list of IP addresses, we can extract the information we need with the 'cut' command:
- for ip in $(seq 1 254); do ping 192.158.54.$ip -c1 -W1 & done | grep ttl | cut -d " " -f4 | cut -d ":" -f1
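The grep and cut steps of this pipeline can be followed one field at a time in the Python sketch below. SAMPLE_OUTPUT is made-up data in the format printed by the Linux ping command, and live_hosts is a name chosen for this illustration; the sketch only parses text and does not send any pings.

```python
# Made-up lines in the format printed by the Linux ping command.
SAMPLE_OUTPUT = """64 bytes from 192.158.54.1: icmp_seq=1 ttl=64 time=0.41 ms
64 bytes from 192.158.54.17: icmp_seq=1 ttl=128 time=1.02 ms
--- 192.158.54.3 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms"""

def live_hosts(ping_output):
    """Mimic: grep ttl | cut -d ' ' -f4 | cut -d ':' -f1"""
    hosts = []
    for line in ping_output.splitlines():
        if "ttl" in line:                      # grep ttl
            field = line.split(" ")[3]         # cut -d " " -f4 gives "192.158.54.1:"
            hosts.append(field.split(":")[0])  # cut -d ":" -f1 drops the colon
    return hosts

print(live_hosts(SAMPLE_OUTPUT))
```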

Nmap: Techniques & Scan Modes.
Nmap can be run in various modes and accepts many input parameters:
- With the -sS parameter, the full TCP connection is not established
- With the -sT parameter, the full TCP connection is established and then closed with a TCP RST packet
  - it's much 'noisier' and leaves traces in the scanned system
  - it can be executed without root privileges, because there is no need to craft special packets
  - it's the default scan mode when nmap runs without root privileges (with root privileges the default is -sS)
  - it's slower than the partial scan (-sS)
- With the -sX parameter, we execute an 'Xmas Scan', where the sent packet has 3 flags set: 'FIN', 'PSH' and 'URG', so it is lit up (and leaves traces) like a Christmas tree
  - it's a specific, unusual scan mode that can be used, for example, to test firewall settings
- With the -sN parameter, we execute a 'Null Scan', where the sent packets have no flags set
- With the -sA parameter, we execute an 'ACK Scan', where the sent packets have the 'ACK' flag set
- With the -sF parameter, we execute a 'FIN Scan', where the sent packets have the 'FIN' flag set
The details of the different scan modes can be examined with a network packet analyzer such as Wireshark.
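When reading such captures, it helps to know how the flags are encoded. The Python sketch below lists the flag bits of the TCP header and the combination sent by each of the scan modes described above. The SCAN_FLAGS table and the flag_names helper are illustrative names; -sT is left out because there the operating system, not Nmap, builds the packets.

```python
# TCP flag bits as they appear in the TCP header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

# Flags set in the probe packet for each scan mode described above.
SCAN_FLAGS = {
    "-sS": SYN,
    "-sX": FIN | PSH | URG,
    "-sN": 0,
    "-sA": ACK,
    "-sF": FIN,
}

def flag_names(value):
    """Translate a flag byte back into a list of flag names."""
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    return [name for bit, name in enumerate(names) if value & (1 << bit)]

for mode, flags in SCAN_FLAGS.items():
    print(mode, hex(flags), flag_names(flags))
```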
Ports to scan can be passed to Nmap in various ways:
- With the -p parameter followed by a port number
for example: nmap 192.168.0.0 -p 2121
- With the -p parameter followed by a port range
for example: nmap 192.168.0.0 -p 1-100
- With the -p parameter followed by a list of ports
for example: nmap 192.168.0.0 -p 21,22,80,777
- With the -p- parameter, meaning we scan all possible ports
  - it takes a long time, so we should first consider whether we really need to scan all ports
- With the --top-ports parameter followed by the number of top ports to scan
for example: nmap 192.168.0.0 --top-ports 10
Port numbers range from 0 to 65535, because a port number is stored in 16 bits.
Ports, like IP addresses, also have their 'pools':
- Well-known ports: 0 - 1023
- Registered ports (registered with IANA, the Internet Assigned Numbers Authority): 1024 - 49151
- Private ports: 49152 - 65535
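The pools above translate directly into a small lookup. The following Python sketch uses a hypothetical helper, port_pool, that returns the pool a given port number belongs to and rejects numbers that do not fit in 16 bits.

```python
def port_pool(port):
    """Return the name of the pool that a port number belongs to."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in 16 bits (0-65535)")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "private"

for port in (22, 3306, 1433, 50000):
    print(port, port_pool(port))
```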
Nmap also offers several levels of aggressiveness and scan speed. These are selected with the -T parameter, which ranges from 0 to 5. T0 is the 'paranoid' mode: it is very difficult to detect, but very, very slow. It scans only one port at a time and waits 5 minutes between sending packets. It can be used to evade IDS (Intrusion Detection Systems). T4 is called the 'aggressive' scan and is very fast: we don't wait more than 10 milliseconds between packets. The default Nmap mode is T3.
For example: nmap 192.168.0.0 -p 1-100 -T0
There are other techniques that can make a scan more difficult to detect:
- Fragmenting packets into smaller ones, with the -f parameter
for example: nmap -sS -Pn 192.168.0.0 -p 2121 -f
- Using a nonstandard packet length, with the --data-length parameter
for example: nmap -sS -Pn 192.168.0.0 -p 2121 --data-length 100
Finally, the --max-rate parameter can be used to limit the scan to n packets per second. For example, nmap -sS -Pn 192.168.0.0 -p 2121 --max-rate 50 limits the scan to 50 packets per second.
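The cost of these settings can be estimated with simple arithmetic. The Python sketch below is a rough lower bound under stated assumptions: one probe per port, no retransmissions, and no time spent on host discovery. The function names are chosen for this illustration and have nothing to do with Nmap's internals.

```python
def serial_scan_seconds(ports, delay_seconds):
    """Waiting time when probes are sent one at a time with a fixed delay."""
    return max(ports - 1, 0) * delay_seconds

def rate_limited_seconds(ports, packets_per_second):
    """Minimum time needed when at most packets_per_second probes are sent."""
    return ports / packets_per_second

# 100 ports with a 5-minute pause between probes, as with -p 1-100 -T0
print(serial_scan_seconds(100, 5 * 60) / 3600, "hours")
# all 65535 ports at 50 packets per second, as with -p- --max-rate 50
print(rate_limited_seconds(65535, 50) / 60, "minutes")
```

Under these assumptions, 100 ports in T0 take a little over 8 hours for a single host, while a full port scan capped at 50 packets per second takes roughly 22 minutes.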
Nmap: Services Enumeration.
Continuing with nmap, the -sV parameter makes nmap display the versions of the services running on the open ports we scan.
This is a slow and noisy operation, but in pentesting, where we hack legally, leaving traces in logs is usually not a big issue. The same goes for firewalls: during a pentest we can sometimes ask the admins to disable the firewall, because we test not only how tight the firewall is, but also scenarios in which an outside attacker has already got past it. All of these rules should, of course, be agreed with the customer during the pre-engagement phase of the pentest.
Nmap will not always provide detailed version information, however. When services introduce themselves to us, nmap uses the technique known as 'Banner Grabbing': it captures the services' introductions and presents them to us.
When services do not introduce themselves, nmap uses a variety of other techniques to identify them.
Why are the version numbers important?
If services run old, outdated versions, there is a bigger chance of finding vulnerabilities in them, or even ready-made exploits that we can use.
When we do not have access to nmap, we can use other tools, such as netcat. This goes beyond banner grabbing alone: it lets us talk to the service we connect to.
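Banner grabbing itself needs nothing more than a TCP connection. The Python sketch below shows the idea with the standard socket module; it is an illustration, not how nmap or netcat are implemented, and the function name grab_banner and the 3-second default timeout are assumptions made for this example. It should only be pointed at hosts that are within the agreed scope of the test.

```python
import socket

def grab_banner(host, port, timeout=3.0):
    """Connect to host:port and return whatever the service sends first."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)
        try:
            return conn.recv(1024).decode(errors="replace").strip()
        except socket.timeout:
            return ""  # the service waits for the client to speak first
```

Services such as SSH, FTP or SMTP usually send a banner right after the connection is opened. Others, such as HTTP, stay silent until the client sends a request, in which case this sketch returns an empty string.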
On the defensive side of cybersecurity, we remove the 'introduction banners' of the services running on open ports, or even make these services present false information, to mislead attackers.
As we can see, nmap is not a magical, always-right tool, and it can sometimes be misled. In practice, however, it is still very useful: it makes a pentester's work easier and faster, and it can automate scanning and banner grabbing. It can also use methods other than banner grabbing to determine service versions.
Nmap comes with NSE, the Nmap Scripting Engine. Its scripts, written in the Lua language, offer many ready-made operations: scanning, testing headers, testing for specific vulnerabilities, or even attempting to log in to services.
NSE scripts are divided into categories, including:
- auth, for testing authentication
- default, the scripts that are executed when nmap is given the -sC parameter
- discovery, for discovering additional resources and information
- exploit, for testing known vulnerabilities
- brute, for testing brute-force attacks
- safe, scripts that are not intrusive and do not affect the target
Some scripts can cause irreversible changes to the target, so they should be used with care.
In Kali Linux, many ready-made nmap scripts are located in the /usr/share/nmap/scripts folder.
When browsing the ready-made nmap scripts, we can use the Linux ls command with a grep filter.
The description of a script or of a whole category can be displayed with the --script-help parameter.
For example:
nmap --script-help default
nmap --script-help safe
Let's try the 'http-headers' script:

Nmap with the -sC parameter executes many scripts from the default category. It decides automatically which scripts to use.
It's common to combine the -sV and -sC parameters when scanning.