Our Data
Collection process
The data for this project was collected throughout the first eight weeks of the Fall 2023 semester by our group and by other students in the Data Storytelling course at the University of Kansas. Each student was tasked with visiting and recording data on student parking lots (designated at KU with yellow signage) at least twice per week. Students could visit the same parking lot on both occasions or two different ones.
For the latter half of the data collection period, our group decided to visit only Lot 61, and only on Wednesdays. We made this decision because we wanted enough data points about a particular lot on a particular day to determine whether a pattern existed and whether that pattern mirrored the trend across all parking lots on all days of the week.
Data structure
Our data was housed in a shared Excel workbook that we accessed through Microsoft's desktop and mobile apps. The workbook was split into several labeled tabs, which allowed us to make transformations on one tab while keeping the original dataset intact on another. Each tab began with the same dataset, structured in the same way.
The original dataset contained 23 columns and 585 rows. After our data cleaning transformations, which are explained in detail later in this reflection, our dataset contains 14 columns and 574 rows. Each column is labeled succinctly, with no label exceeding six words.
Data transformations
After the eight-week period of data collection concluded, our group began cleaning the data by removing any data points that were not relevant to our research.
The first step of this process was to remove any observations that contained impossible values, such as a reported number of filled spots that exceeded the total number of spots available in that particular lot. We did this not because we believed any of our classmates were maliciously attempting to skew the data, but because we wanted to keep those errors from distorting our insights.
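For readers who prefer to see this step as code, it can be expressed as a short pandas sketch. We did this step by hand in Excel, so the file name and the column names below (spots_filled, total_spots) are hypothetical stand-ins for our actual headers.

```python
import pandas as pd

# Load the shared spreadsheet (file name and column names are hypothetical).
df = pd.read_excel("parking_data.xlsx")

# Keep only rows where the reported number of filled spots does not
# exceed the total number of spots in that lot.
df = df[df["spots_filled"] <= df["total_spots"]]
```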
We then decided to look only at data related to parking on Wednesdays at Lot 61 (Sunnyside & Illinois), a common parking lot for students because it is one of the larger lots on campus and is close to one of the main libraries.
To isolate this data, we used Excel's Filter feature to show only rows from this parking lot (Lot 61) on a particular day (Wednesday). We then deleted any columns with information not vital to our research: the number of vehicles with out-of-state license plates, the number of vehicles with at least one tire on a barrier line, and whether class was in session. We determined that this data was not relevant because we wanted to look solely at how time of day and weather affected lot capacity.
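Continuing the hypothetical pandas sketch above, the equivalent of Excel's Filter feature plus the column deletions would look roughly like this; the lot, day, and column names are again placeholders rather than our actual headers.

```python
# Restrict to observations from Lot 61 collected on Wednesdays
# (hypothetical column names and values).
lot61 = df[(df["lot_number"] == 61) & (df["day_of_week"] == "Wednesday")].copy()

# Drop the columns we judged irrelevant to our research question.
lot61 = lot61.drop(columns=[
    "out_of_state_plates",
    "tires_on_barrier_line",
    "class_in_session",
])
```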
After removing the columns irrelevant to our insights, we added columns that would allow us to recognize patterns within the dataset and create insights (a code sketch of these calculations follows the list below). These additional columns were:
- Hour of the day that data was collected
  - The exact time of collection was already available to us, but we created a new column with just the hour of the day to make it easier to derive insights
- A calculation of the percent lot capacity
  - The formula for this column was: =((# of spots filled / # of spots available) * 100)
- A calculation of the number of spots available when the data was collected
  - The formula for this column was: =(# of spots in lot - # of cars parked in the lot)
- The temperature at the time the data was collected, by the 10s
  - The exact temperature and feels-like temperature were already available to us, but we created a new column with the temperature broken into 10-degree increments to make it easier to derive insights
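In code form, the four derived columns could be computed as shown below, continuing the same hypothetical pandas sketch. In practice we built them with Excel formulas like the ones listed above, and the source column names here (collection_time, feels_like_temp, and so on) are assumptions.

```python
import pandas as pd

# Hour of the day, extracted from the recorded collection time.
lot61["hour"] = pd.to_datetime(lot61["collection_time"]).dt.hour

# Percent of the lot filled at the time of collection.
lot61["pct_capacity"] = lot61["spots_filled"] / lot61["total_spots"] * 100

# Number of spots still open at the time of collection.
lot61["spots_open"] = lot61["total_spots"] - lot61["spots_filled"]

# Feels-like temperature rounded down to the nearest 10 degrees.
lot61["temp_bucket"] = (lot61["feels_like_temp"] // 10) * 10
```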
Originally, we were only going to transform and analyze data from Lot 61 for our final project, but after discussions with one another and our professor, Chris Etheridge, we determined it would be best to also analyze the broader dataset. We came to this conclusion because we wanted an understanding of broader trends against which to compare the patterns we recognized in Lot 61. This change, made in the final three weeks of our project, challenged our ability to think and work flexibly.
After deciding to also work with the broader dataset, we applied the same changes we had made to the Lot 61 data to the broader Excel sheet that included all of the data collected throughout the eight weeks (a code sketch of this reuse follows the list below). These changes included:
- Removing data points irrelevant to our end goal
  - Number of vehicles with out-of-state license plates
  - Number of vehicles with at least one tire on a barrier line
  - Whether class was in session at the time the data was collected
- Adding new columns to allow us to visualize patterns within the data
  - Hour of the day the data was collected
  - A calculation of the percent lot capacity at the time the data was collected
  - A calculation of the number of spots available in the parking lot when the data was collected
  - The temperature at the time of collection, by the 10s
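Applying the same changes to both datasets is easiest to express as one reusable cleaning function. The sketch below is illustrative only (our actual workflow repeated the steps by hand in Excel), and every column name remains hypothetical.

```python
import pandas as pd

DROP_COLS = ["out_of_state_plates", "tires_on_barrier_line", "class_in_session"]

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning and derived-column steps described above."""
    df = raw[raw["spots_filled"] <= raw["total_spots"]].copy()  # drop impossible rows
    df = df.drop(columns=DROP_COLS)                             # drop irrelevant columns
    df["hour"] = pd.to_datetime(df["collection_time"]).dt.hour
    df["pct_capacity"] = df["spots_filled"] / df["total_spots"] * 100
    df["spots_open"] = df["total_spots"] - df["spots_filled"]
    df["temp_bucket"] = (df["feels_like_temp"] // 10) * 10
    return df

# The same function handles both the full dataset and the Lot 61 subset.
all_lots = clean(pd.read_excel("parking_data.xlsx"))
lot61_wed = all_lots[(all_lots["lot_number"] == 61) & (all_lots["day_of_week"] == "Wednesday")]
```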
After we completed the transformations on both of our datasets, we were ready to begin analyzing.
Data Analysis
Our analysis began with pivot tables, a feature available in Microsoft Excel. Our entire group was new to using pivot tables, so we watched videos that were part of the Data Storytelling curriculum, asked questions of our professor, and turned to Google for any other inquiries.
The first tables we created calculated the average lot capacity by hour so we could quickly visualize how capacity changes throughout the day. We created one pivot table with the data from Lot 61 on Wednesdays only, and another with the data from all of the parking lots for all days of the week.
Having both of these tables allowed us to compare lot capacity at Lot 61 (Sunnyside & Illinois) with capacity across all KU student parking lots, and those comparisons helped us recognize patterns within the data.
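The pivot tables themselves have a direct pandas counterpart. Assuming the hypothetical pct_capacity and hour columns from the earlier sketches, averaging capacity by hour for both datasets might look like this.

```python
# Average percent capacity by hour of day, for Lot 61 on Wednesdays
# and for all yellow lots on all days of the week.
lot61_by_hour = lot61_wed.pivot_table(values="pct_capacity", index="hour", aggfunc="mean")
all_by_hour = all_lots.pivot_table(values="pct_capacity", index="hour", aggfunc="mean")

# Put the two averages side by side for comparison.
comparison = lot61_by_hour.join(all_by_hour, lsuffix="_lot61", rsuffix="_all_lots")
print(comparison)
```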
After using pivot tables in Microsoft Excel to recognize patterns in our dataset, we created insights that reflected what we saw. These insights were:
- The average Yellow Lot capacity peaks around lunchtime each day, with the lowest capacities being early in the morning and late in the afternoon.
- People looking for parking for evening classes are going to have the easiest time finding spots. Only 5% of spots are filled after 5 p.m.
- From about 1 p.m. to 4 p.m., lot capacity stays relatively constant, with roughly two out of every three spots filled.
- Lot capacity is highest when the feels-like temperature is in the 70s, peaking at 7 out of every 10 spots filled. Capacity drops only slightly when the feels-like temperature is in the 90s.
- On the three occasions we observed parking lots when the feels-like temperature was below 40 degrees Fahrenheit, the lots were all less than 14% full.
- When comparing temperature and time of day, the lots are fullest between 10 a.m. and about 2 p.m., when the feels-like temperature rises above 70 degrees.
- As a general trend, our data shows that as rain increases, parking lot fullness drops dramatically.
After crafting our insights, we moved into Tableau and Flourish to create visualizations that represented our data and insights.
As none of us had ever used either of these applications before, we watched online tutorials to familiarize ourselves with them. We also had a virtual meeting with our professor to discuss different ways to use these tools.
In Tableau and Flourish, we created seven total visualizations, one for each insight. Kenna McNally used Tableau for three of our insights, while Lauren Danielski and Hailey Krumm completed the other four using Flourish. The process of creating these graphs was difficult at times, but it encouraged us to be creative and push ourselves to create the best possible product.
Project Assembly
After collecting, organizing, analyzing, and visualizing our data, it was time to create our final project. For this, we selected the best of the data visualizations we created using Tableau and Flourish and paired them with a standard-style news article that included interviews with students affected by parking at the University of Kansas.
We are still working on the assembly of our project, so this section will include more information after we receive feedback on our rough draft and work toward creating our final draft.
Credits:
Kenna McNally - Data reflection
Lauren Danielski - Anecdote in data story and data story
Hailey Krumm - Data story