Welcome to the Seattle Pacific University guide for Data.
If you are hoping to dive right into finding datasets or statistical information, please see the links on the left! The data repositories and datasets linked here are our most frequently used, but many more exist. If you need help finding particular statistics or data, please reach out to your liaison librarian for assistance.
"Although widely used, the definition of data is difficult to pin down. Often, data are conceptualized as a collection of numbers, or strings of zeroes and ones, or symbols. However, this conceptualization is too restrictive because it excludes many ways in which people and organizations learn about the world. Instead, the definition here is adapted from Borgman’s definition of data, “Entities used as evidence of a phenomenon for a particular purpose,” which was formulated for data used in scholarly research (Borgman 2015).
"This definition has two implications of particular interest. One is that it includes a wide range of entities that could become data, including text and images. A second implication is that an entity only becomes data in relation to its use, which ensures we focus on the actions and intentions of those using the data, and the effects of use of the data when considering data ethics.
"The term big data is widely used and much hyped. Recent years have seen rapid technological developments that allow for production, collection, aggregation, processing, and analysis of datasets on an increasingly large scale. The news media, governments, and businesses all hail the potential benefits that arise from big data analytics."
Source: Peter Darch, "Data Ethics," Foundations of Information Ethics, edited by John T. F. Burgess and Emily J. M. Knox (American Library Association, 2019), 77.
Data vs. Statistics
Data are the raw ingredients from which statistics are created. Statistics are useful when you just need a few numbers to support an argument (e.g., in 2003, 98.2% of American households had a television set, according to the Statistical Abstract of the United States). Statistics are usually presented in tables. Statistical analysis can be performed on data to show relationships among the variables collected. Through secondary data analysis, many different researchers can reuse the same data set for different purposes.
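To make the distinction concrete, here is a minimal sketch of turning raw data into a single statistic. The household records below are invented for illustration (they are not drawn from any real survey), and the field names `household_id` and `has_tv` are hypothetical. Each record is one observation; the statistic is a summary computed across all of them.

```python
# Hypothetical raw data: one record per surveyed household.
# These values are made up for illustration only.
households = [
    {"household_id": 1, "has_tv": True},
    {"household_id": 2, "has_tv": True},
    {"household_id": 3, "has_tv": False},
    {"household_id": 4, "has_tv": True},
]


def percent_with_tv(records):
    """Summarize raw records into one statistic: % of households with a TV."""
    with_tv = sum(1 for r in records if r["has_tv"])
    return 100 * with_tv / len(records)


print(f"{percent_with_tv(households):.1f}% of households had a TV")  # 75.0% of households had a TV
```

A published statistic like the Statistical Abstract figure is the end product of this kind of summary; the raw records are what a secondary analyst would need in order to ask new questions of the same data.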
Data Sets, Studies, and Series
In data archives like ICPSR, a data set or study is made up of the raw data file and any related files, usually the codebook and setup files. The codebook is your guide to making sense of the raw data. For survey data, the codebook usually contains the actual questionnaire and the values for the responses to each question. The setup files help statistical software packages such as SPSS, SAS, or Stata read the raw data file.
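The role of a codebook can be sketched in a few lines: raw survey files often store answers as numeric codes, and the codebook tells you what each code means. In this minimal sketch the variable name `Q1`, its question wording, and its value codes are all hypothetical; real ICPSR codebooks vary in layout and detail.

```python
# Hypothetical codebook entry: variable name, question text, and value labels.
codebook = {
    "Q1": {
        "question": "Do you have a television set in your home?",
        "values": {1: "Yes", 2: "No", 9: "Don't know / refused"},
    }
}

# Hypothetical raw data: each row holds one respondent's coded answers.
raw_rows = [{"Q1": 1}, {"Q1": 2}, {"Q1": 1}, {"Q1": 9}]


def decode(row, codebook):
    """Replace each numeric code with its label from the codebook."""
    return {
        var: codebook[var]["values"].get(code, f"unknown code {code}")
        for var, code in row.items()
    }


decoded = [decode(r, codebook) for r in raw_rows]
print(decoded[0])  # {'Q1': 'Yes'}
```

Without the codebook, the raw row `{"Q1": 9}` is just a number; with it, you can tell that 9 marks a non-answer rather than a valid response.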
ICPSR uses the term series to describe collections of studies that have been repeated over time. For example, the National Health Interview Survey is conducted annually. In the ICPSR archive, you will find a description of the series that provides an overview. You will also find individual descriptions of each study (e.g., National Health Interview Survey, 2004). The study number in ICPSR refers to the individual survey.
Types of Data
Cross-Sectional describes data that are collected only once, at a single point in time.
Time Series studies track the same variables over time. The National Health Interview Survey is an example of time series data because the questions generally remain the same over time, but the individual respondents vary.
Longitudinal Studies describe surveys that are conducted repeatedly, in which the same group of respondents is surveyed each time. This allows for examining changes over the life course. The Project on Human Development in Chicago Neighborhoods (PHDCN) Series contains a longitudinal component that tracks changes in the lives of individuals over time through interviews.
(Originally from Sue Erickson at Vanderbilt University)
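The three types of data described above differ mainly in how many times data are collected and whether the same people are asked each time. The sketch below encodes that distinction as a simplified heuristic. It assumes a hypothetical input format (a mapping from collection year to the set of respondent IDs surveyed that year) and uses this guide's terms. Because real longitudinal studies often lose some respondents between waves, it treats any respondent who reappears in a later wave as a sign of a longitudinal design.

```python
def classify_design(waves):
    """Classify a study by who was surveyed in each wave.

    waves: dict mapping a collection year to the set of respondent IDs
    surveyed that year (a hypothetical, simplified input format).
    """
    if not waves:
        raise ValueError("need at least one wave of data")
    if len(waves) == 1:
        return "cross-sectional"  # collected only once
    seen, repeated = set(), set()
    for ids in waves.values():
        repeated |= seen & ids  # IDs already surveyed in an earlier wave
        seen |= ids
    # Same people re-surveyed -> longitudinal; new people each wave -> time series
    return "longitudinal" if repeated else "time series"


print(classify_design({2004: {"r1", "r2"}}))                      # cross-sectional
print(classify_design({2003: {"r1", "r2"}, 2004: {"r3", "r4"}}))  # time series
print(classify_design({1995: {"r1", "r2"}, 2000: {"r1", "r2"}}))  # longitudinal
```

In practice you would determine a study's design from its documentation rather than by inspecting IDs, but the logic mirrors the definitions: one collection, repeated collections with new respondents, or repeated collections following the same respondents.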