Chris Mendez in For Developers, Tools for Software Engineers, Data Science

Datasets for machine learning, data mining, app mashups and research

As a market researcher, app producer and software entrepreneur, I use a lot of different data sets, either for research or to tell stories. Here are a few great repositories I use regularly:

Market Research

  • USA Tapestry data 2015 is great for cross-referencing where people live with the type of psychographic lifestyle they exhibit.
  • USA Demographic data 2015 is the foundation of marketing: age, sex, location and ethnicity.
  • General Social Survey from the National Opinion Research Center offers the most often used survey data on happiness in the U.S. since 1972.
  • Gallup poll and Gallup Analytics provide a variety of data sets focusing on:
    • economic confidence
    • employment
    • entrepreneurial energy
    • confidence in leadership
    • confidence in military and police
    • religion
    • food access
    • corruption
    • freedom of media
    • life evaluations

App mashups

Data sources for civic engagement

  1. City data: Collects and analyzes data from numerous sources to build detailed profiles of all U.S. cities.
  2. Envirofacts: Provides a single point of access to the environmental data held in U.S. EPA databases. State and local governments, EPA and other federal agencies, and individuals can search for information about environmental activities that may affect air, water, and land anywhere in the United States. You can search by address, ZIP Code, city, county, water body, or other geographic designation, across all sources or within specific subject areas such as Waste, Water, Toxics, Air, Radiation, and Land. Experienced users can use more sophisticated capabilities such as maps or customized reporting.
  3. U.S. Census: The American Community Survey 5-Year Data covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population.
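
For app mashups, the ACS 5-Year data is usually pulled over HTTP rather than downloaded by hand. Here is a minimal sketch of how that could look in Python. The endpoint pattern, the `B01003_001E` (total population) variable and the `state:*` geography are assumptions based on my understanding of the Census Data API, so check them against the official documentation before relying on them. The helper names are my own, and the sample rows are made-up placeholders, not real figures.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern for the ACS 5-Year data; verify against the Census API docs.
ACS5_BASE = "https://api.census.gov/data/{year}/acs/acs5"


def build_acs_url(year, variables, geography, api_key=None):
    """Build a query URL for a list of variables and a geography clause."""
    params = {"get": ",".join(variables), "for": geography}
    if api_key:
        params["key"] = api_key
    # Keep ',', ':' and '*' unescaped so the URL stays readable.
    return ACS5_BASE.format(year=year) + "?" + urlencode(params, safe=",:*")


def rows_to_records(rows):
    """Convert a header-first list of rows into a list of dicts.

    Assumes the response is a JSON array of arrays whose first row
    holds the column names.
    """
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


if __name__ == "__main__":
    print(build_acs_url(2015, ["NAME", "B01003_001E"], "state:*"))

    # Made-up placeholder rows in the assumed response shape (not real data).
    sample = [
        ["NAME", "B01003_001E", "state"],
        ["Example State", "12345", "99"],
    ]
    print(rows_to_records(sample))
```

To fetch live data you would pass the URL to `urllib.request.urlopen` (or `requests.get`) and feed the decoded JSON into `rows_to_records`. Keeping the URL builder separate from the parser makes both easy to test without a network connection.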

Crowdsourced city

  1. Foursquare: The Foursquare API gives you access to all of the data used by the Foursquare mobile applications and, in some cases, even more.
  2. Yelp: The Yelp v2.0 API returns more relevant search results that more closely match the results on Yelp. It lets you:
    • Find up to the 40 best results for a geographically oriented search
    • Sort results by the best match for the query, highest ratings, or distance
    • Limit results to those businesses offering a Yelp Deal, displaying information about the deal such as title, savings and purchase URL
    • Identify and display whether a business has been claimed on


  1. Dark Sky API lets you query for short-term precipitation forecast data at geographical points inside the United States.
  2. Weather API offers these data features: alerts, almanac, astronomy, conditions, currenthurricane, forecast, forecast10day, geolookup, history, hourly, hourly10day, planner, rawtide, satellite, tide, webcams, yesterday.


Here's my curated list of map APIs, which also includes geocoding and GIS.


Machine Learning

Even More

I found this post on Stack Exchange and it's very good.