Ways to scrape data
There are many situations where you may need to scrape data. Data scraping is an art form of its own, and the complexity of a project can range from a giant aggregator, written to capture, parse and store data, to something small like a single function on a timer (a cron job) that pulls data from Craigslist.
This article is for new developers looking for "quick wins" from a few noteworthy tools. I will continue to update this with more info over time.
Narrow Google Searches
I use this trick all the time for my own blog. Let's say you are looking for an article about Ruby Rake tasks on my blog. If you really want to narrow the search, simply type this into Google.
rake task site:chrisjmendez.com
This returns only articles from my website, http://www.chrisjmendez.com, that mention "rake" or "tasks".
Explicit Google Searches
This example is really useful while scraping Twitter. Suppose your social media manager uses bit.ly to shorten links on Twitter. Let's say that during a Downtown LA campaign, she and her team posted a handful of Tweets with the hashtag "#dtla2017". Weeks later, you're trying to audit the tweets and you don't want to bug everyone with your inquiry. Here's how to search within Twitter for anything with a bit.ly URL and a reference to the keyword "#dtla2017".
site:twitter.com intext:bit.ly "#dtla2017 *"
site:twitter.com intext:bit.ly "classicalkusc *"
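If you run searches like these often, it can help to assemble the query strings in code. Here is a minimal Python sketch that does so. The helper names `build_query` and `search_url` are my own, made up for illustration, and the sketch only builds the query text and a search URL; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def build_query(terms="", site=None, intext=None, phrase=None):
    """Compose a Google query string from a few common search operators.

    Hypothetical helper for illustration; it only concatenates text.
    """
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append(f"site:{site}")      # restrict results to one domain
    if intext:
        parts.append(f"intext:{intext}")  # require this text in the page body
    if phrase:
        parts.append(f'"{phrase}"')       # exact phrase, quoted
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


blog_query = build_query("rake task", site="chrisjmendez.com")
audit_query = build_query(site="twitter.com", intext="bit.ly", phrase="#dtla2017 *")

print(blog_query)
print(audit_query)
print(search_url(audit_query))
```

The two queries printed here are the same ones typed by hand in the examples above, so you can paste either one straight into Google to check it.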
You can time-box your search (and feed) by adjusting the date parameter dateRestrict=.
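As a rough sketch, and assuming the date parameter here is the `dateRestrict` parameter of Google's Custom Search JSON API, a time-boxed request URL could be built like this in Python. The API key, the search engine ID and the value `w2` (the past two weeks) are placeholders I picked for illustration, and `custom_search_url` is a hypothetical helper name. The code only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# Endpoint of Google's Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def custom_search_url(query, api_key, engine_id, date_restrict=None):
    """Build a Custom Search request URL, optionally limited by date.

    date_restrict takes values such as "d7" (past 7 days), "w2" (past
    2 weeks), "m1" (past month) or "y1" (past year).
    """
    params = {"key": api_key, "cx": engine_id, "q": query}
    if date_restrict:
        params["dateRestrict"] = date_restrict
    return API_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials: replace with your own key and engine ID.
url = custom_search_url(
    'intext:bit.ly "#dtla2017"',
    api_key="YOUR_API_KEY",
    engine_id="YOUR_ENGINE_ID",
    date_restrict="w2",
)
print(url)
```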
Google Alerts is still a great way to get notifications based on keywords you specify. This is especially useful when you want to monitor a business competitor's moves or track your own name online.
Yahoo Pipes Clones
Yahoo Pipes was an incredible piece of software and although it's no longer in production, there are a few clones worth looking into.
You can use If This Then That (IFTTT) for eBay.
- Data scrape eBay
- Data scrape Craigslist
- Data scrape Twitter
- Data scrape SongKick
- Data scrape stock quotes
- Data scrape the Scoop.it feed focused on Artists Opportunities and publish it to Pocket through IFTTT.
Feedity provides a service to scrape web pages into feeds.
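Once a page is available as a feed, reading it takes very little code. Here is a minimal Python sketch, assuming the feed is plain RSS 2.0. The sample feed and the `parse_items` helper are made up for illustration; they are not Feedity's actual output or API. In practice you would download the feed from its URL first and pass the text to the same function.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed standing in for whatever a feed service returns.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return a list of {"title", "link"} dicts, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


for entry in parse_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```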
Google API RSS
The Google API RSS tool helps you create RSS feeds for Google search results.
You can go one step beyond the Google API and start screen scraping using Google Spreadsheets.