Installing Screaming Frog on Ubuntu via command line

Crawl millions of URLs via command line using Screaming Frog on Ubuntu.

Download and install Screaming Frog

  1. Visit Screaming Frog's Check Updates page to identify the latest version number.

  2. Update apt-get.

sudo apt-get update

  3. Download the .deb package (18.2 at the time of writing).

wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_18.2_all.deb -P /path/to/download/dir

  4. Install the package.

sudo dpkg -i /path/to/download/dir/screamingfrogseospider_18.2_all.deb

  5. Verify the installation.

which screamingfrogseospider

  • If you're unsure where to download the package, you can always use /usr/local/bin. See the cheatsheet under Resources below for other common locations within the Ubuntu file hierarchy.
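If you'd rather keep the version number in one place, the same steps can be scripted. This is a minimal sketch, not the official install method; the version (18.2) and download directory are assumptions you should adjust:

SF_VERSION=18.2            # assumed; check the Check Updates page for the latest
DOWNLOAD_DIR=~/Downloads   # assumed; any writable directory works
wget "https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_${SF_VERSION}_all.deb" -P "$DOWNLOAD_DIR"
sudo dpkg -i "$DOWNLOAD_DIR/screamingfrogseospider_${SF_VERSION}_all.deb"
sudo apt-get install -f    # resolves any missing dependencies dpkg reports
which screamingfrogseospider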

Configure

Headless mode has no GUI for entering your paid licence, so you'll add it through a text file instead.

Create a new licence.txt file (note the British spelling) within a hidden directory called .ScreamingFrogSEOSpider in your home folder. The directory may not exist yet, and sudo isn't needed for files in your own home directory.

mkdir -p ~/.ScreamingFrogSEOSpider
nano ~/.ScreamingFrogSEOSpider/licence.txt

Paste your username on the first line and your licence key on the second.

screaming_frog_username
XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX
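If you're provisioning a remote server and would rather not open an editor, a heredoc writes the same file in one step. A sketch; the username and key are the placeholders shown above:

mkdir -p ~/.ScreamingFrogSEOSpider
cat > ~/.ScreamingFrogSEOSpider/licence.txt <<'EOF'
screaming_frog_username
XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX
EOF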

Accept the EULA

Create a new spider.config file within the same directory.

nano ~/.ScreamingFrogSEOSpider/spider.config

Paste this line to accept the licence agreement.

eula.accepted=11
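If you're scripting the setup, the same line can be appended without opening an editor (the >> creates the file if it doesn't exist yet):

echo "eula.accepted=11" >> ~/.ScreamingFrogSEOSpider/spider.config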

Choose Your Storage Mode

In-Memory Mode

If you want to change the amount of memory allocated to the crawler, create another configuration file.

nano ~/.screamingfrogseospider

Suppose you want to increase the allocation to 8GB. Add this line:

-Xmx8g
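Or, as a one-liner (printf is used here instead of echo because the value starts with a dash, which some echo implementations treat as an option):

printf '%s\n' "-Xmx8g" > ~/.screamingfrogseospider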

If you're unsure of your available memory, try this command.

free -h
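If you want just the available figure, awk can pull it out. A sketch that assumes a modern procps free with an "available" column:

free -h | awk '/^Mem:/ {print $7}'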

Database Mode

In-memory is the default, but database storage is the better fit for larger crawls. As a rough guide:

  • In-memory mode with 8GB of RAM handles crawls under ~200k URLs.
  • Database mode with 16GB of RAM handles crawls of 1M+ URLs.

If you use a database instead of in-memory, add this to spider.config.

storage.mode=DB

Disable the Embedded Browser

Since we're working in headless mode, we'll want to disable the embedded browser. Add this line to spider.config as well.

embeddedBrowser.enable=false
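Putting the pieces together, a complete spider.config for a headless, database-mode setup holds all three lines. This sketch overwrites the file, so use it instead of (not after) the per-line edits above, and drop the storage.mode=DB line if you're staying in-memory:

cat > ~/.ScreamingFrogSEOSpider/spider.config <<'EOF'
eula.accepted=11
storage.mode=DB
embeddedBrowser.enable=false
EOF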

Let's Start Crawling

  1. Create a directory for crawls.

mkdir ~/crawls-2023wk08

  2. Run a minimal crawl.

screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output
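If you have more than one site to crawl, a plain shell loop reuses the same flags. A sketch; the URLs are placeholders:

for site in https://www.example.com https://www.example.org; do
  screamingfrogseospider --crawl "$site" --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output
done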

About Command Line Options

Screaming Frog supports a long list of flags; the ones below are all you need for a basic crawl.

--crawl is the URL to crawl.
--headless is required for command line runs.
--save-crawl saves your data to a file named crawl.seospider.
--output-folder is where you want to save your file.
--timestamped-output creates a timestamped folder for crawl.seospider, which helps prevent collisions with previous crawls.

  3. Advanced example.

screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output --create-images-sitemap

--create-images-sitemap generates an images sitemap from the completed crawl.

Resources

Diagrams

Cheatsheet showing where to put your files within the Ubuntu file hierarchy.