Chris Mendez in For Developers

How to Get Data with YQL (Yahoo Query Language)

Does Not Work Anymore

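This article is kept here for archival purposes only. Yahoo's YQL (Yahoo Query Language) service, which has since been shut down, was an excellent tool for data scraping and crawling because it let you query a web page with SQL-like statements. The examples below retrieved a USC library catalog search for "nirvana" and parsed out the search results using XPath.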
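Every query below was a YQL statement sent to Yahoo's web service over HTTP. Here is a minimal sketch of that step. It assumes the service's public endpoint, https://query.yahooapis.com/v1/public/yql, with its q and format parameters. This article does not mention either of them, and the endpoint no longer answers. buildYqlUrl is a hypothetical helper name. The statement only needs to be percent-encoded into the query string.

```javascript
// Sketch only: assumed public YQL endpoint from before Yahoo retired the
// service, so requests to it now fail.
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: percent-encode a YQL statement into a request URL.
// format=json asks for a JSON response.
function buildYqlUrl(statement, format = "json") {
  return YQL_ENDPOINT + "?q=" + encodeURIComponent(statement) +
    "&format=" + encodeURIComponent(format);
}

console.log(buildYqlUrl("select * from html"));
// https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html&format=json
```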

Get data from a specific table cell:

select * from html where url ="http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=nirvana" and xpath="/html/body/table[3]/tr/td[3]/div/form/table[2]/tr/td/table/tr/td"  
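Each html-table query has the same shape: a page URL plus an XPath expression. A small, hypothetical helper can assemble that statement. htmlTableQuery is not part of YQL. Both values sit inside double-quoted YQL strings, so this sketch rejects any value containing a double quote rather than assuming how YQL escaped one. That is also why the XPath expressions in these queries use single quotes for attribute values.

```javascript
// Hypothetical helper (not a YQL API): compose a query against the html table.
// url and xpath end up inside double-quoted YQL strings, so values containing
// a double quote are rejected instead of guessing at YQL's escaping rules.
function htmlTableQuery(url, xpath) {
  for (const [name, value] of [["url", url], ["xpath", xpath]]) {
    if (value.includes('"')) {
      throw new Error(name + " must not contain a double quote: " + value);
    }
  }
  return 'select * from html where url="' + url + '" and xpath="' + xpath + '"';
}

// Rebuilds the table-cell query above.
const cellQuery = htmlTableQuery(
  "http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=nirvana",
  "/html/body/table[3]/tr/td[3]/div/form/table[2]/tr/td/table/tr/td"
);
console.log(cellQuery);
```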

Get the number of results (from the search summary cell):

select * from html where url ="http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=nirvana" and xpath="//td[@class='searchsum']"

Get each result row, i.e. the parent row of every cell with class itemlisting or itemlisting2:

select * from html where url ="http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=nirvana" and xpath="//td[@class='searchsum']/table/tr/td[@class='itemlisting' or @class='itemlisting2']/.."
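Counting results then comes down to counting the rows the result-row query returns. The sketch below assumes a JSON response shaped like { query: { count, results: { tr: ... } } }. In that shape, tr is an array when there are several matches and a single object when there is exactly one. This is an assumption about the retired service, not something this article shows. The sample responses are made up for illustration, and resultRows is a hypothetical name.

```javascript
// Assumed (unconfirmed) response shape: { query: { count, results: { tr } } },
// where tr is an array for several matches and a bare object for exactly one.
function resultRows(response) {
  const results = response && response.query && response.query.results;
  if (!results || results.tr === undefined) return []; // no matches
  return Array.isArray(results.tr) ? results.tr : [results.tr];
}

// Hypothetical, trimmed-down responses for illustration only.
const many = { query: { count: 2, results: { tr: [{ td: "row 1" }, { td: "row 2" }] } } };
const one = { query: { count: 1, results: { tr: { td: "only row" } } } };
const none = { query: { count: 0, results: null } };

console.log(resultRows(many).length); // 2
console.log(resultRows(one).length);  // 1
console.log(resultRows(none).length); // 0
```

Normalizing to an array up front means the caller can always use .length for the result count, whichever form the row data arrived in.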

Get the call number:

//td[@class="searchsum"]/table[position()>1]/tr[1]/td[2]/strong

Get the title:

//td[@class="searchsum"]/table[position()>1]/tr[2]/td/label/strong/a
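The call-number and title expressions above match those fields for every result at once. To address a single result, a hypothetical helper can replace the position()>1 predicate with an explicit index. Because position()>1 skips the first table in the summary cell, the nth result (counting from 1) is the table at child position n + 1. The sketch writes the attribute test with single quotes so its output can go straight into a double-quoted YQL xpath string. resultFieldXPaths is an illustrative name only.

```javascript
// Hypothetical helper: per-result versions of the call-number and title XPaths.
// table[position()>1] skips the first table, so result n is table[n + 1].
// Single quotes keep the output embeddable in a double-quoted YQL string.
function resultFieldXPaths(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const table = "//td[@class='searchsum']/table[" + (n + 1) + "]";
  return {
    callNumber: table + "/tr[1]/td[2]/strong",
    title: table + "/tr[2]/td/label/strong/a",
  };
}

console.log(resultFieldXPaths(1).title);
// //td[@class='searchsum']/table[2]/tr[2]/td/label/strong/a
```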
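Build the result-row query in JavaScript so the search keyword can vary:

var keyword = "nirvana";
// encodeURIComponent keeps spaces, quotes, and ampersands in the keyword from breaking the catalog URL or the YQL string
var query = 'select * from html where url ="http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=' + encodeURIComponent(keyword) + '" and xpath="//td[@class=\'searchsum\']/table/tr/td[@class=\'itemlisting\' or @class=\'itemlisting2\']/.."';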

Add a limit to your search results:

select * from html where url ="http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=nirvana" and xpath="//td[@class='searchsum']/table[position()>1]" limit 6
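The keyword and the limit clause can be combined in one function. This is a sketch built from the query above. catalogQuery and CATALOG_SEARCH are hypothetical names, and the catalog URL is the one from this article, so it may no longer resolve. The keyword is percent-encoded so that multi-word searches still produce a valid URL.

```javascript
// Hypothetical helper built from the article's queries; the catalog URL is the
// one used above and, like YQL itself, may no longer respond.
const CATALOG_SEARCH = "http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=";

// Return a YQL statement selecting every result table for keyword, optionally
// capped with YQL's limit clause.
function catalogQuery(keyword, limit) {
  let statement =
    'select * from html where url="' + CATALOG_SEARCH + encodeURIComponent(keyword) +
    '" and xpath="//td[@class=\'searchsum\']/table[position()>1]"';
  if (limit !== undefined) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    statement += " limit " + limit;
  }
  return statement;
}

console.log(catalogQuery("nirvana", 6));
console.log(catalogQuery("foo fighters")); // keyword becomes foo%20fighters
```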