Free Online Public Data Sources: An Annotated Bibliography

Free Online Health Data Sources: An Annotated Bibliography

By William Murakami-Brundage

1st edition, February 2012

Usable data sets and public health data are in short supply. Whether your interest is biomedical engineering, health informatics, data mining, or public health analysis, this annotated bibliography should contain something that will aid your search for knowledge. It has been my pleasure to compile this resource for you, and I hope that you find it as useful as I have during my work as a health informaticist and data scientist. Thank you for using this research in your work, and I wish you the best on your data endeavors.

For this first edition, the bibliography is arranged alphabetically. As this work grows, it will almost certainly take on a different shape. At the same time, the basic concept still holds true: keep it simple. If you are looking for a database, data set, visualization tool, or government health data fact, you can probably find it in one of these sources. Please feel free to write me at with any specific data requests or questions, and I will be happy to aid you if possible.

1.  caBIG Knowledge Center:

The caBIG Knowledge Center is a databank hosted by the National Cancer Institute. Under its umbrella are a wiki, a forum, and a whole host of databanks. These include: caGrid Knowledge Center, Clinical Trials Management Systems Knowledge Center, Data Sharing and Intellectual Capital Knowledge Center, Imaging Knowledge Center, Molecular Analysis Tools Knowledge Center, Tissue/Biospecimen Banking and Technology Tools Knowledge Center, Vocabulary Knowledge Center, and the Development Code Repository, a Subversion server dedicated to knowledge center development code.

2.  Centers for Disease Control and Prevention:

This repository organizes data and statistics by topic, including: aging, blood disorders, cancer, chronic diseases, deaths, diabetes, genomics, growth charts, heart disease, immunizations, life expectancy, MRSA, oral health, overweight & obesity, physical inactivity, reproductive health, smoking & tobacco, STDs, vital signs, and the workplace.

3.  Centers for Medicare and Medicaid Services, Data Compendium:

“The CMS Center for Strategic Planning produces an annual CMS Data Compendium to provide key statistics about CMS programs and national health care expenditures. The CMS Data Compendium contains historic, current, and projected data on Medicare enrollment and Medicaid recipients, expenditures, and utilization. Data pertaining to budget, administrative and operating costs, individual income, financing, and health care providers and suppliers are also included. National health expenditure data not specific to the Medicare or Medicaid programs is also included making the CMS Data Compendium one of the most comprehensive sources of information available on U.S. health care finance. This CMS report is published annually in electronic form and is available for each year from 2002 through present.”

4.  Community Health Profile:  National Aggregate of Urban Indian Health Organization Service Areas, December 2011:

This report contains statistical data from the Urban Indian Health Institute's research. Topics include sociodemographics, mortality, access to care, alcohol use, and environmental, heart, mental, and maternal/child health. The data are compiled from the national service areas located within the USA.


Includes data tools and data sets: for example, fiscal data for public schools and universities, common core data sets, educational progress, and primary/postsecondary data. Data sets also cover legal data, Federal resources, and trends in science and mathematics for students. Data sets are available in a variety of formats, including XML, CSV, and XLS.


“You’ve found a public resource designed to bring together high-value datasets, tools, and applications using data about health and health care to support your need for better knowledge and to help you to solve problems. These datasets and tools have been gathered from agencies across the Federal government with the goal of improving health for all Americans. Check back frequently because the site will be updated as more datasets and tools become available”

Key elements include a massive index of health data sets covering Medicare, geographic data, medical record system adoption, child welfare, and assisted reproduction. There is also a health apps repository/demo site, plus a small collection of other data sources worth a look, especially (1) California's health data and (2) the Gallup Poll Well-Being Index.

7.  Educational Data Partnership, California’s K-12 Schools:

Data for California's entire public school system, by state, county, district, and school. Also includes reports, teacher salaries, and data about charter schools.

8.  FastStats A to Z:

FastStats has data for nearly any illness or major life event that a U.S. resident might face. A small sample includes: American Indian or Alaskan Native health, assault/homicide, cancer, deaths/mortality, emergency department visits, immunizations, kidney disease, life expectancy, marriage, Mexican American health, obesity/overweight, pertussis, smoking, and teen pregnancy. If it is a life-changing event, chances are good that FastStats has at least basic data for it.

9.  Federal Government IT Dashboard:

“The IT Dashboard is a website enabling federal agencies, industry, the general public and other stakeholders to view details of federal information technology investments. The purpose of the Dashboard is to provide information on the effectiveness of government IT programs and to support decisions regarding the investment and management of resources. The Dashboard is now being used by the Administration and Congress to make budget and policy decisions.”

Note that the Dashboard offers analysis tools and data feeds rather than a conventional data set. The source code for the IT Dashboard is also available.

10.  Health and Human Services Open Data Initiative:

Includes details on the mHealth Initiative, Startup America, and health data competitions, as well as information on executive orders, records, and reports.

11.  Health Indicators Warehouse:

The Health Indicators Warehouse has data sets sorted by topic, geography, and initiative. Example data sets include: Chronic Diseases, Disabilities, Health Care, County data, Community Health Data Indicators, and CMS Community Indicators. Also, data sets are available for all 50 states and Washington, D.C.

12.  Healthy People 2020:

The Healthy People 2020 Initiative is dedicated to creating a healthy environment for everyone, and it offers data and publications that strive to meet this goal. It has a specific focus on health disparities and prevention efforts.

13.  Justice Department’s Open Data Initiative:

“Publishing high-value datasets that increase accountability and responsiveness improve public knowledge of the Department of Justice and our operations, create economic opportunity, and respond to need and demands of the public are a core component of our efforts to fulfill The Open Government Directive.”

Data sets available include jail data for numerous years, antitrust cases, jail census data, law enforcement data, forensic unit funding, state and Federal correctional facility data, Chapter 7 filings, Freedom of Information filings, hate crime statistics, and prosecutor data.

14.  Many Eyes:

Donated data sets, combined with an information visualization application, create real-time displays from an almost endless supply of data: everything from average Canadian household expenses to London's air quality to Kobe Bryant's game scoring, and quite a bit in between. The application is also relatively simple to use, so any given data set can be visualized with little effort.

15.  Massachusetts Open Data Initiative, Data Catalog:

A huge repository of open data sets from the state of Massachusetts: economic, education, geography, health, population, public safety, and technology are all covered, as well as quite a few other subjects.

16.  National Cancer Institute:

Statistical tools and data: SEER data, SEER*Stat software, health disparities calculator, Medicare-linked database, and analytic software. Also includes a bank of statistical methods for cancer, cancer survival, and geographic information systems.

17.  National Center for Health Statistics:

“Welcome to the National Center for Health Statistics’ website, a rich source of information about America’s health. As the Nation’s principal health statistics agency, we compile statistical information to guide actions and policies to improve the health of our people. We are a unique public resource for health information – a critical element of public health and health policy.” Data covers: diseases, health care and coverage, injuries, life stages, populations, lifestyle factors, and more.

18.  New York City’s Open Data Initiative:

Open data sets for everything from subway data to open-access WiFi networks, park maps, SAT scores, and filming locations. The collection is too much of a hodgepodge to define neatly: beyond the fact that everything relates to New York, there is no strict boundary or catalog.

19.  Open Data Initiative:

“The Open Data Initiative is a Web 2.0 site for disseminating public data.” It includes visualized data sets for suburb safety, Australian criminology tracking, and the Saudi Arabian census. It may bear further watching, or it may be transitory.

20.  Open Government Data Initiative, The:

“The Open Government Data Initiative (OGDI) is an initiative led by Microsoft Public Sector Developer Evangelism team. OGDI uses the Windows Azure Platform to make it easier to publish and use a wide variety of public data from government agencies. OGDI is also a free, open source ‘starter kit’ with code that can be used to publish data on the Internet in a Web-friendly format with easy-to-use, open API’s. OGDI-based web API’s can be accessed from a variety of client technologies such as Silverlight, Flash, JavaScript, PHP, Python, Ruby, mapping web sites, etc.”

Hosted on Microsoft's cloud servers, this data initiative displays visualized data sets and also has a section for data developers.


Database and compendium of government regulations and laws.

22.  VitalStats:

VitalStats includes data sets for: births, deaths, perinatal mortality, and other public use data files related to vital statistics and their usage in the USA.

23.  World Bank Data:

This is the mother lode of all data banks. It provides access to over 7,000 indicators for global statistics, including economic, health, education, and environmental indicators, by country, year, and topic. It also has a microdata library.
