Dataset Summaries: Interrogating Data, Structures, and Systems in Tabular Datasets with Pivot Tables
Module developed by Bob Gradeck and Liz Monk from the University of Pittsburgh Center for Social and Urban Research
Many people are familiar with using spreadsheets to create, manipulate, and carry-out mathematical operations on data. In this module, you’ll learn about how the pivot table functionality included in most spreadsheet software can also be used to quickly explore the underlying structure of tabular data, and gain insight into the underlying systems and processes used to create it.
This exercise will ask you to determine which Pittsburgh neighborhood has the largest number of food-producing trees. You will answer this question by downloading a dataset of street trees in Pittsburgh and using a spreadsheet to generate a pivot table. In the exercise below you will navigate to the Western Pennsylvania Regional Data Center’s open data portal, find the dataset of street trees, download this dataset and open it in Microsoft Excel, and use a pivot table to produce summary statistics on which neighborhood has the largest number of trees.
To complete this exercise you will need a computer with Microsoft Excel, an internet connection, and a sense of the types of trees that fruit squirrels find delicious. If you’ve never made a pivot table before, this exercise should take you between 10 and 15 minutes to complete.
Investigate New Dataset #1: 311 Non-Emergency Service Requests in the City of Pittsburgh.
Questions to explore:
- Do the type of requests included in this data suggest something about what data or system governance processes look like (so many nearly-identical categories!)?
- Are there differences by year or time of year in some requests – which are a few seasonal ones?
- What are the most-common requests? Are there differences in the most-common requests by neighborhood?
- Is the rate at which people make requests by phone changing over time, and which neighborhoods have the highest rate of requests submitted through the call center?
- Are there any requests that should concern the squirrel from “WATCH” – “Dead animals” and “dead trees” are two request types. There are others, too (hint- “dog”).
Investigate New Dataset #2: Allegheny County Dog License Data
Questions to explore:
- What are the most and least-common dog names? Which names are your favorite? Which are the most creative? How many puns do you see?
- Which breeds are most common?
- Are there differences between data in Fox Chapel (Zip Codes 15215 & 15238) and communities that aren’t as wealthy?
- How might you use this data and pivot tables if you wanted to start a business catering to dogs and dog owners?
- Creating frequency tables, crosstabs, and filtering data can provide insights about underlying data systems. Pivot tables can also be used to identify the kinds of things people that created the system found important or unimportant to include or capture, hard-coding organizational values in the structure of the data itself.
- Looking at classification systems and data quality through pivot tables can also reveal the use or lack of quality control and governance practices, and can be another way to explore what an organization finds important or values (thinking here about whether organizations collect data about race, ethnicity, gender, etc in a way that can be used to highlight inequities, and in a way that represents people in ways they want to be represented).
- Pivot tables are a quick way to explore your data. They work with a wide variety of tabular data formats. Any data that works with a spreadsheet can work with a pivot table. Many people have experience working with spreadsheets, making pivot tables a tool with a relatively low barrier to entry.
This work is licensed under a Creative Commons Attribution 4.0 International License.