I spend a considerable portion of my time convincing researchers of the benefits of publishing their data online in open repositories. My usual selling points are the reproducibility of research and the prospect of others using their original data sets to advance scholarship in their own field or another. Academics produce vast amounts of data with value well beyond the scope of their original projects. Government agencies, too, produce enormous amounts of data and information as they conduct their day-to-day business. Some products, like the U.S. Census or the American Community Survey, are obvious sources of useful information, but governments also rely on information in all sorts of formats to perform countless everyday tasks. For example, many local governments rely on spatial data about their infrastructure (roads, sewers, power lines) to set maintenance schedules or to select an ideal site for new residential development.
This semester, FSU became the newest consortial member of Atlanta’s Census Research Data Center. Thanks to funding primarily from the College of Social Sciences and the Office of Research, the Florida State community can now use Census microdata without paying lab fees, which can exceed $15,000 per project. There are currently 18 Census Research Data Centers in the United States, and outside of North Carolina’s Research Triangle, the only one in the southeastern United States is the one housed at the Federal Reserve Bank of Atlanta.
So, what is a Census Research Data Center? The Center for Economic Studies defines Census Research Data Centers (RDCs) as U.S. Census Bureau facilities, staffed by a Census Bureau employee, that meet all physical and computer security requirements for access to restricted-use data. At RDCs, qualified researchers with approved projects receive restricted access to selected non-public Census Bureau data files.
To understand the true value of doing research with non-public data from the RDC, it’s important to note the difference between microdata and macro data (often called aggregate data). When most of us use datasets for research or analysis, we’re looking at summary figures. For example, if you extract Census data for analysis, you’re typically looking at some sort of summary or aggregation for a specific geographic unit. These units range from states, counties, and cities down to much smaller areas such as census tracts and block groups. Regardless of the unit of analysis, the data itself is a summary of individual survey responses from participants in that specific area. Microdata, by contrast, are those individual responses themselves, one record per respondent.
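To make the distinction concrete, here is a minimal sketch of how individual-level records (microdata) get rolled up into the tract-level summaries most of us work with. The tract IDs, field names, and income values are hypothetical and invented purely for illustration; they are not real Census records or variable names.

```python
from collections import defaultdict
from statistics import median

# Hypothetical microdata: one record per survey respondent.
# Tract IDs and incomes are made up for illustration only.
microdata = [
    {"tract": "0101", "household_income": 42000},
    {"tract": "0101", "household_income": 58000},
    {"tract": "0101", "household_income": 61000},
    {"tract": "0102", "household_income": 35000},
    {"tract": "0102", "household_income": 47000},
]

def aggregate_by_tract(records):
    """Collapse individual responses into one summary row per tract."""
    incomes = defaultdict(list)
    for record in records:
        incomes[record["tract"]].append(record["household_income"])
    # Each tract keeps only a respondent count and a median income.
    return {
        tract: {"respondents": len(values), "median_income": median(values)}
        for tract, values in incomes.items()
    }

summary = aggregate_by_tract(microdata)
for tract, stats in sorted(summary.items()):
    print(tract, stats)
```

Once the records are collapsed this way, the summary rows cannot be turned back into the individual responses they came from, which is the kind of detail that only the underlying microdata preserve.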
Data science is all the rage lately. Harvard Business Review even called data scientist the sexiest job of the 21st century. Even though the term is rapidly gaining mind share, many people are still unsure what data science actually is. When you cut through the hype, the core of data science is actually pretty simple: it’s the study of data. What kind of data is being studied, how it is being studied, and what the individual data scientist is looking for all depend on the specific case. Data science is another field of study that uses digital methods, which places it firmly under the umbrella of Digital Scholarship.