Web scraping for content auditing and project planning
Here at USC Radio Group, the new media team is currently working on a redesign of Classical KUSC and Classical KDFC. The two websites currently live on different content management systems, and one of our digital goals is to migrate them both to WordPress. We like WordPress: the plug-ins are plentiful, the community is strong, and there's no shortage of answers on Stack Overflow.
As any project manager will tell you, one of the first things any team should do during the planning stages of a redesign is conduct a content audit. Not only does it help managers understand the scope of the project, it also gives radio programming a chance to reflect on the content itself. There are many tools we can use to create a content audit, including a CSV export, a phpMyAdmin export, a MySQL dump, or even a hand-crafted Google Spreadsheet, but there's an even cooler class of tool that has been around almost as long as the Web itself: web scrapers. Recently, the team discovered Crawly, which proved to be the perfect time saver.
Crawly is brilliant because it both scrapes your data and presents it in a beautifully packaged JSON (or CSV) format. For a software developer, JSON is an ideal format for packaging data, and Crawly hands it over without any extra work.
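To give a sense of how that JSON feeds a content audit, here is a minimal sketch that flattens a list of scraped page records into a spreadsheet-ready CSV. The field names (`url`, `title`, `type`) are assumptions for illustration, not Crawly's actual output schema; in practice you would inspect the JSON Crawly returns and adjust the column list to match.

```python
import csv
import json

# Hypothetical sample of scraper output: a JSON array of page
# records. The field names here are assumptions, not Crawly's
# actual schema.
scraped = json.loads("""
[
  {"url": "https://www.kusc.org/article-1", "title": "Article One", "type": "post"},
  {"url": "https://www.kusc.org/article-2", "title": "Article Two", "type": "page"}
]
""")

# Flatten the JSON into a CSV that drops straight into a
# content-audit spreadsheet: one row per page, one column per field.
with open("content_audit.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "title", "type"])
    writer.writeheader()
    writer.writerows(scraped)
```

From there, the CSV can be imported into Google Sheets or Excel, where editors can tag each URL with a keep/rewrite/retire decision.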