Command line crawling with Screaming Frog SEO Spider.

Use apt-get to install Screaming Frog

  1. Visit Screaming Frog's Check Updates page to identify the latest version number.

  2. Update apt-get

sudo apt-get update
  3. Download Screaming Frog (append the download URL for the version you identified in step 1)
wget -P /path/to/download/dir
  4. Install the package
sudo dpkg -i /path/to/download/dir/screamingfrogseospider_18.2_all.deb
  5. Verify installation
which screamingfrogseospider
  • If you're unsure where to download your package, you can always use /usr/local/bin. I've included a diagram below showing other common locations within the Ubuntu file system.
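The filename and verification steps can be scripted so the version number lives in one place. This is a minimal sketch: VERSION and DOWNLOAD_DIR are placeholders, and deb_name is a hypothetical helper, not part of Screaming Frog.

```shell
#!/usr/bin/env bash
# Sketch: derive the .deb filename from a version number and check the install.
# VERSION and DOWNLOAD_DIR are placeholders; adjust them to your setup.
VERSION="18.2"
DOWNLOAD_DIR="/path/to/download/dir"

# Hypothetical helper: build the package filename used in the dpkg step.
deb_name() {
  printf 'screamingfrogseospider_%s_all.deb\n' "$1"
}

DEB_PATH="${DOWNLOAD_DIR}/$(deb_name "$VERSION")"
echo "Package path: ${DEB_PATH}"

# Report whether the binary is on PATH after installation.
if command -v screamingfrogseospider >/dev/null 2>&1; then
  echo "Installed at: $(command -v screamingfrogseospider)"
else
  echo "screamingfrogseospider not found; try: sudo dpkg -i ${DEB_PATH}"
fi
```

If dpkg stops on missing dependencies, running sudo apt-get install -f afterwards usually resolves them.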


Add your paid license in headless mode.

Create a new licence.txt file (note the British spelling) within a hidden directory called .ScreamingFrogSEOSpider.

sudo nano ~/.ScreamingFrogSEOSpider/licence.txt

Paste your license.
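If you'd rather not open an editor on a remote box, the file can be written directly. This sketch assumes the layout I understand Screaming Frog expects (username on the first line, licence key on the second). SF_USER and SF_KEY are placeholders for your own details.

```shell
# Sketch: write the licence file without opening an editor.
# SF_USER and SF_KEY are placeholders, not real credentials.
SF_DIR="${SF_DIR:-$HOME/.ScreamingFrogSEOSpider}"
SF_USER="your-licence-username"
SF_KEY="YOUR-LICENCE-KEY"

mkdir -p "$SF_DIR"
# Assumed layout: username on line 1, licence key on line 2.
printf '%s\n%s\n' "$SF_USER" "$SF_KEY" > "$SF_DIR/licence.txt"
# Keep the key readable by your user only.
chmod 600 "$SF_DIR/licence.txt"
```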


Accept the EULA

Create a new spider.config file within the same directory.

sudo nano ~/.ScreamingFrogSEOSpider/spider.config

Paste this acceptance agreement.
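The acceptance line is a single key=value entry. The sketch below appends it from the shell. The value 15 is an assumption based on the EULA revision I believe applies to recent releases, so check the Screaming Frog documentation for the number that matches your version.

```shell
# Sketch: record EULA acceptance in spider.config.
SF_DIR="${SF_DIR:-$HOME/.ScreamingFrogSEOSpider}"
mkdir -p "$SF_DIR"

# Assumption: 15 is the EULA revision for your release; verify before use.
# Only append the line if no eula.accepted entry exists yet.
grep -q '^eula.accepted=' "$SF_DIR/spider.config" 2>/dev/null \
  || echo "eula.accepted=15" >> "$SF_DIR/spider.config"
```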


Choose Your Storage Mode

In-Memory Mode

If you want to change the amount of memory allocated to the crawler, create another configuration file.

sudo nano ~/.screamingfrogseospider

Suppose you want to increase your memory to 8GB. Here's the configuration detail.
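As far as I know, the file holds a standard Java heap option, so an 8GB limit would be written as -Xmx8g. Here is a minimal sketch that creates the file from the shell; SF_MEM_FILE is just a convenience variable I've introduced.

```shell
# Sketch: cap the crawler's Java heap at 8GB.
SF_MEM_FILE="${SF_MEM_FILE:-$HOME/.screamingfrogseospider}"

# -Xmx sets the maximum heap size; 8g means 8 gigabytes.
echo "-Xmx8g" > "$SF_MEM_FILE"
cat "$SF_MEM_FILE"
```

Leave a couple of gigabytes free for the operating system rather than allocating all available memory.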


If you're unsure of your available memory, try this command.

free -h

Database Mode

The default storage mode is in-memory, but you might want to switch to database storage if your crawls look like these.

Crawls < 200k URLs (8GB of RAM)
Crawls > 1M URLs (16GB of RAM)

If you use a database instead of in-memory, add this to spider.config.
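The setting I'm aware of for this is storage.mode=DB. Treat the key name as an assumption and confirm it against the documentation for your version. A sketch that appends it only when no storage mode is set:

```shell
# Sketch: switch the crawler to database storage mode.
SF_DIR="${SF_DIR:-$HOME/.ScreamingFrogSEOSpider}"
mkdir -p "$SF_DIR"

# Assumption: storage.mode=DB is the key/value your version expects.
grep -q '^storage.mode=' "$SF_DIR/spider.config" 2>/dev/null \
  || echo "storage.mode=DB" >> "$SF_DIR/spider.config"
```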


Disable the Embedded Browser

Since we're working in headless mode, we'll want to disable the embedded browser.
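This is another one-line entry in spider.config. The key below, embeddedBrowser.enable=false, is the one I've seen used for headless setups. Treat it as an assumption and check it against your version's documentation.

```shell
# Sketch: turn off the embedded browser for headless runs.
SF_DIR="${SF_DIR:-$HOME/.ScreamingFrogSEOSpider}"
mkdir -p "$SF_DIR"

# Assumption: embeddedBrowser.enable is the key your version reads.
grep -q '^embeddedBrowser.enable=' "$SF_DIR/spider.config" 2>/dev/null \
  || echo "embeddedBrowser.enable=false" >> "$SF_DIR/spider.config"
```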


Let's Start Crawling

  1. Create a directory for crawls
mkdir ~/crawls-2023wk08
  2. Minimalist example (https://www.example.com is a placeholder for the site you want to crawl).
screamingfrogseospider --crawl https://www.example.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output

About Command Line Options

The command line interface offers many flags. The ones below are required for a basic example.
--crawl is followed by the URL to crawl.
--headless is required for command line processes.
--save-crawl saves your data to a crawl.seospider file.
--output-folder is where you want to save your file.
--timestamped-output creates a timestamped folder for crawl.seospider, which helps prevent collisions with crawls from previous runs.

  3. Advanced example (with https://www.example.com as a placeholder URL)
screamingfrogseospider --crawl https://www.example.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output --create-images-sitemap

--create-images-sitemap creates an image sitemap from the completed crawl.
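If you crawl on a schedule, the week-stamped folder name can be generated instead of typed by hand. The sketch below is one way to do it. SITE_URL is a placeholder, and the script only runs the crawl if the binary is on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: build a week-stamped output folder and run a headless crawl.
# SITE_URL is a placeholder; replace it with the site you want to crawl.
SITE_URL="https://www.example.com"

# %G is the ISO year and %V the ISO week number, giving names like crawls-2023wk08.
OUT_DIR="${HOME}/crawls-$(date +%Gwk%V)"
mkdir -p "$OUT_DIR"

if command -v screamingfrogseospider >/dev/null 2>&1; then
  screamingfrogseospider --crawl "$SITE_URL" --headless --save-crawl \
    --output-folder "$OUT_DIR" --timestamped-output
else
  echo "screamingfrogseospider is not installed; output would go to ${OUT_DIR}"
fi
```

Because --timestamped-output adds its own subfolder, several runs in the same week land side by side inside the weekly directory.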




Cheatsheet regarding where to put your files