Installing Screaming Frog on Ubuntu via command line

Command line crawling with Screaming Frog SEO Spider.

Download and install Screaming Frog

  1. Visit Screaming Frog's Check Updates page to identify the latest version number.

  2. Update apt-get.

sudo apt-get update
  3. Download the package.
wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_18.2_all.deb -P /path/to/download/dir
  4. Install the package.
sudo dpkg -i /path/to/download/dir/screamingfrogseospider_18.2_all.deb 
  5. Verify the installation.
which screamingfrogseospider 
  • If you're unsure where to put the download, you can always use /usr/local/bin or another standard location within the Ubuntu file hierarchy.
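
If you'd rather not hardcode the version number twice, here's a minimal sketch that parameterizes it; the version and download directory are example values, so substitute your own.

VERSION="18.2"
DOWNLOAD_DIR="$HOME/Downloads"
wget "https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_${VERSION}_all.deb" -P "$DOWNLOAD_DIR"
sudo dpkg -i "$DOWNLOAD_DIR/screamingfrogseospider_${VERSION}_all.deb"
# If dpkg complains about missing dependencies, this standard fix pulls them in:
sudo apt-get install -f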

Configure

Add your paid license so the spider can run in headless mode.

Create a new licence.txt file within a hidden directory in your home folder called .ScreamingFrogSEOSpider (create the directory first if it doesn't exist).

nano ~/.ScreamingFrogSEOSpider/licence.txt

Paste your license details: your username on the first line and your license key on the second.

screaming_frog_username
XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX
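
If you prefer to script this step, here's an equivalent sketch that creates the hidden directory (in case it doesn't exist yet) and writes the file in one go; replace the placeholder values with your own details.

mkdir -p ~/.ScreamingFrogSEOSpider
cat > ~/.ScreamingFrogSEOSpider/licence.txt <<'EOF'
screaming_frog_username
XXXXXXXXXX-XXXXXXXXXX-XXXXXXXXXX
EOF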

Accept the EULA

Create a new spider.config file within the same directory.

nano ~/.ScreamingFrogSEOSpider/spider.config

Paste this line to accept the agreement.

eula.accepted=11
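
The same step as a one-liner, if you'd rather skip the editor:

echo "eula.accepted=11" >> ~/.ScreamingFrogSEOSpider/spider.config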

Choose Your Storage Mode

In-Memory Mode

If you want to change the amount of memory allocated to the crawler, create another configuration file.

nano ~/.screamingfrogseospider

Suppose you want to increase the allocation to 8GB. Here's the configuration line.

-Xmx8g

If you're unsure of your available memory, try this command.

free -h
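
Putting that together, here's a one-command sketch that writes the allocation. Note that it overwrites the file, which is fine if you haven't customized it otherwise, and 8g is just the example value; pick something below the available figure that free -h reports.

echo "-Xmx8g" > ~/.screamingfrogseospider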

Database Mode

Your default mode is in-memory, but you might want to switch to database storage once your crawls outgrow it. As a rough guide:

In-memory: crawls under 200k URLs (8GB of RAM)
Database: crawls of 1M+ URLs (16GB of RAM)

To use database storage instead of in-memory, add this line to spider.config.

storage.mode=DB

Disable the Embedded Browser

Since we're working in headless mode, we'll want to disable the embedded browser. Add this line to the same spider.config file.

embeddedBrowser.enable=false
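
If you've opted for database mode, here's a sketch that appends both settings to the existing spider.config in one command:

printf '%s\n' "storage.mode=DB" "embeddedBrowser.enable=false" >> ~/.ScreamingFrogSEOSpider/spider.config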

Let's Start Crawling

  1. Create a directory for your crawls.
mkdir ~/crawls-2023wk08
  2. Run a minimal example.
screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output
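
Once the crawl completes, each run lands in its own timestamped subfolder, so you can confirm the output like this:

find ~/crawls-2023wk08 -name crawl.seospider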

About Command Line Options

There is a long list of available flags; below are the ones used in the basic example above.

--crawl is the URL to crawl.
--headless is required for command line processes.
--save-crawl saves your data to a crawl.seospider file.
--output-folder is where you want to save your file.
--timestamped-output creates a timestamped folder for crawl.seospider, which prevents collisions with previous crawls.

  3. Advanced example.
screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder ~/crawls-2023wk08 --timestamped-output --create-images-sitemap

--create-images-sitemap creates an image sitemap from the completed crawl.
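
Since headless crawls are usually meant to run unattended, here's a hypothetical cron entry (added via crontab -e) that repeats the basic crawl every Monday at 2am; the schedule and output path are assumptions, so adjust them to taste.

# m h dom mon dow  command
0 2 * * 1  screamingfrogseospider --crawl https://www.chrisjmendez.com --headless --save-crawl --output-folder "$HOME/crawls-2023wk08" --timestamped-output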
