NickJS is a JavaScript wrapper for headless browsing. If you still have scripts written for PhantomJS or CasperJS, NickJS is your best tool moving forward. Used on its own, NickJS is also a pretty good simple web scraper.
Step 1 - Add an environment variable to your ~/.bash_profile
export CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
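Before running a scraper, it can help to confirm that the variable is actually visible to Node (open a new terminal or reload ~/.bash_profile first). The following is a minimal sketch, not part of NickJS: `checkChromePath` is a hypothetical helper name, and it only checks that `CHROME_PATH` is set and points at a file that exists.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of NickJS): reports whether CHROME_PATH
// is set and points at a file that exists on disk.
function checkChromePath(env = process.env) {
  const chromePath = env.CHROME_PATH;
  if (!chromePath) {
    return { ok: false, reason: "CHROME_PATH is not set" };
  }
  if (!fs.existsSync(chromePath)) {
    return { ok: false, reason: `No file found at ${chromePath}` };
  }
  return { ok: true, reason: `Using Chrome at ${chromePath}` };
}

// Print the result for the current shell environment.
console.log(checkChromePath().reason);
```

If the message says the variable is not set, the export line has probably not been loaded into the current shell yet.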
Step 2 - Create a Simple Scraper
This little script collects the title, URL and description of each post on this website's homepage. The key thing to understand is that you can inject jQuery into a page and then use it for DOM manipulation.
const Nick = require("nickjs");
const nick = new Nick();

(async () => {
  // Step 1: Do an action
  const tab = await nick.newTab()
  await tab.open("chrisjmendez.com")

  // Step 2: Wait for the action to have an effect
  await tab.untilVisible(".main-content-area") // Make sure we have loaded the page

  // Step 3: Use jQuery to scrape
  await tab.inject("https://code.jquery.com/jquery-3.2.1.min.js")
  const myLinks = await tab.evaluate( (arg, callback) => {
    // Here we're in the page context. It's like being in your browser's inspector tool
    const data = [];
    $(".content-area-wrap article").each((index, element) => {
      data.push({
        title: $.trim( $(element).find(".title").text() ),
        url: $(element).find(".title a").attr("href"),
        description: $.trim( $(element).find(".post-content").text() )
      });
    });
    // Hand the collected data back to the Node side
    callback(null, data);
  })

  console.log(myLinks);
  //console.log(JSON.stringify(myLinks, null, 2))
})()
.then( () => {
  console.log("Job done!")
  nick.exit()
})
.catch( (err) => {
  console.log(`Something went wrong: ${err}`)
  nick.exit(1)
})
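The script hands back a plain array of objects, so any clean-up can happen in ordinary Node code afterwards. As a rough sketch, assuming some of the scraped `href` values turn out to be relative or missing, the hypothetical `normalizeLinks` helper below resolves each `url` against a base URL and drops entries without a link. The sample data and the `https://example.com` base are made up for illustration.

```javascript
// Hypothetical post-processing helper (not part of NickJS): resolves each
// scraped href against a base URL and drops entries that have no link.
function normalizeLinks(items, baseUrl) {
  return items
    .filter((item) => typeof item.url === "string" && item.url.length > 0)
    .map((item) => ({
      ...item,
      url: new URL(item.url, baseUrl).href,
    }));
}

// Made-up sample with the same `title` and `url` fields the scraper collects.
const sample = [
  { title: "First post", url: "/first-post/" },
  { title: "No link", url: undefined },
];

// Pretty-print the cleaned list as JSON.
console.log(JSON.stringify(normalizeLinks(sample, "https://example.com"), null, 2));
```

In the scraper itself, the same function could be applied to `myLinks` before logging it.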
Resources
- PhantomBuster is the managed service for NickJS scripts.
- NickJS on GitHub