Originally posted January 11, 2014 on "Cotidianity"
In a much-trumpeted affair in 2011, NYC opened a website with lots of data sets (currently "1100+" according to the website) from government agencies and other public service providers, ranging from the number of social media followers of government websites to the locations of acute health care hospitals, the dimensions of transportation facilities, school test scores, etc. across NYC. These are divided into 10 basic categories: Business, City Government, Education, Environment, Health, Housing & Development, Public Safety, Recreation, Social Services, and Transportation. Clearly, there are many different types of data that could open up new and helpful findings about city life. Although a little late to the game, we at the Center for Collective Computation & Complexity thought to look at this data to see what sorts of interesting things we could find and what questions we could investigate about the dynamics underlying this complex system.
|Categories of data on NYC OpenData.|
Unfortunately, the initial descriptions of this database are misleading. Although I mentioned that the database contains 1100+ data sets, in a search for "taxi" in the Transportation category, four of the ten data sets on the first results page are marked "no longer being updated" in their descriptions. There is no explicit mention of a historical archive of data sets and the titles do not suggest that sort of categorization. Three of these that share the same description--"This list contains the most recently published active medallion drivers list"--are titled differently as "JJ", "jEFFY," and "Medallion Drivers - Active," showing a lack of centralization. In fact, the first two data sets were created and uploaded by the same user "Jeffy" and now clog up the search results. Some data sets are in unhelpful formats like Microsoft Access Database (.mdb), and others are missing data points or have transcription errors.
Overall, however, the website is likely to be a good source of new research and otherwise fun! Anyone can download from it! A lot of the data sets are in useful formats (not to mention machine-readable in the first place), and the website provides a useful browser-based tool for mapping the data or producing charts with selected data points. NYC OpenData also provides an API for working with and querying the data (presumably for mobile app developers).
One of the data sets that stuck out to me by chance was a list of parking tickets: "A listing of all NYC parking tickets sorted by summons number, license plate, issue date, violation, fine amount, and other categories." Though the latest update is April 30, 2013, it only contains data from the month of March 2010. I thought I would write about some of the cool patterns we can find in this data set.
First, a few facts about this data set. There are 872,371 entries, which means that on average one ticket was handed out every three seconds. I show the breakdown of these tickets by state below. Not all of these tickets have license plates from a US state, and some are entered incorrectly or sit in a problematic row of the data file. These make up a small fraction (~1%) of the data, so we can pretty much ignore them ("Other" in the graph). The vast majority (77%) of tickets are for NY license plates. Unsurprisingly, NJ comes in second with about 10%, and PA follows with about 3%.
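The ticket rate quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 872,371 entries are spread over all 31 days of March 2010, around the clock, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the "one ticket every three seconds" figure.
# Assumption: the tickets cover all 31 days of March 2010, 24 hours a day.
TOTAL_TICKETS = 872_371
DAYS_IN_MARCH = 31
SECONDS_PER_DAY = 24 * 60 * 60

seconds_in_month = DAYS_IN_MARCH * SECONDS_PER_DAY
seconds_per_ticket = seconds_in_month / TOTAL_TICKETS
tickets_per_day = TOTAL_TICKETS / DAYS_IN_MARCH

print(f"{seconds_per_ticket:.2f} seconds per ticket")
print(f"{tickets_per_day:,.0f} tickets per day on average")
```

Averaging over the whole day understates the pace during enforcement hours, since few tickets are likely written overnight.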
We also know on which days tickets were handed out, so we can show them by day as below. We clearly see the weekly pulse, starting with a Monday on the first. Many more tickets are handed out during the week, with a sharp drop delineating weekends. The average number of tickets handed out on the most problematic weekday (Thursday, 36,662 tickets) is 4.6 times the number handed out on the average Sunday (7,980 tickets). This is, of course, because parking is enforced differently on Sunday than during the rest of the week (as Anh-Chi helpfully pointed out).
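A minimal sketch of how the per-weekday counts could be tallied from the issue dates, and how the Thursday-to-Sunday ratio follows from the two averages quoted above. The function name and the tiny `sample_dates` list are hypothetical stand-ins, not the actual data or the code used for the plots.

```python
from collections import Counter
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def tickets_per_weekday(issue_dates):
    """Count tickets by weekday name for an iterable of datetime.date values."""
    return Counter(WEEKDAYS[d.weekday()] for d in issue_dates)

# Hypothetical stand-in for the issue-date column: one entry per ticket.
sample_dates = [date(2010, 3, 1), date(2010, 3, 4),
                date(2010, 3, 4), date(2010, 3, 7)]
counts = tickets_per_weekday(sample_dates)

# Ratio of the two averages quoted in the text (Thursday vs. Sunday).
thursday_avg, sunday_avg = 36_662, 7_980
ratio = thursday_avg / sunday_avg

print(dict(counts))
print(f"Thursday/Sunday ratio: {ratio:.1f}x")
```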
Although the time period is short, we can see the impact of holidays and natural disasters on the number of tickets assigned. Since it looks like a weekly cycle is important, I show the number of tickets handed out on each day of the week by week below and on the right, and point out some of the important days in this month. For most days of the week, we find a rather steady number of tickets throughout the month, with a general decrease over the course of the month for some days. Mondays stand out in that tickets drop by 30% over the course of the month. This observation runs counter to the common wisdom that parking enforcers issue more tickets near the end of the month to reach quotas. I haven't looked into the enforcement system in NYC, so I don't know if that schedule holds there. Otherwise, this seems like a weird phenomenon, and I'm curious to know why this is the case...
|Parking tickets organized by day of the week.|
We see that some holidays have a depressing effect on the number of tickets handed out. One might consider Friday an almost-holiday, and it turns out that fewer tickets appear on Friday compared to Tuesday and Thursday, but not compared to Monday or Wednesday. Instead of being the "hump" day, Wednesday is actually surrounded by peaks on Tuesday and Thursday. Although it is not a hump here, it could be that people just make less effort to drive or that parking enforcers are less assiduous on Wednesday, something that would appear in other data sets.
The three holidays (or special days) are St. Patrick's Day, Passover, and Palm Sunday. St. Patrick's Day has the lowest number of tickets given out on a Wednesday. It is not really clear whether that outlier is special given that we only have 5 data points, but it's rather far out (1.3 standard deviations). We might expect that people are too busy reveling to leave home, or more optimistically that fewer people are driving in the evening after celebrating (or more cynically that parking enforcers are reveling on the job). Palm Sunday is notably normal compared to the other Sundays. Are Christians not especially motivated by Palm Sunday? If we had one more week, we would be able to compare this day to Easter, which is certainly an important Christian day.
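For readers who want to reproduce a "standard deviations from the mean" figure like the one above, here is a minimal sketch. The five Wednesday counts are placeholders, not the real values, and the post does not say whether the sample or population standard deviation was used, so the sample version below is an assumption.

```python
from statistics import mean, stdev

def z_score(value, sample):
    """Number of sample standard deviations `value` lies from the sample mean."""
    return (value - mean(sample)) / stdev(sample)

# Placeholder counts for the five Wednesdays in March 2010 (NOT the real data).
wednesday_counts = [34_000, 35_500, 30_000, 33_800, 34_700]

lowest = min(wednesday_counts)
z = z_score(lowest, wednesday_counts)
print(f"lowest Wednesday is {z:.2f} standard deviations from the mean")
```

With only five points, the sample z-score is bounded at about 1.79 in magnitude, which is one more reason to read a figure like 1.3 cautiously.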
|A multiplier effect?|
Passover, on the other hand, is most clearly an outlier, reflecting the impact of the large Jewish population in NYC. The UJA Federation of NY estimates that there were 1.538 million Jews in 2011 in the eight counties in and around NYC: Bronx, Brooklyn/Kings, Manhattan/New York, Queens, Staten Island/Richmond, Nassau, Suffolk, and Westchester. Combining US Census estimates for those counties in 2010, we get a total population of nearly 12 million. The Jewish population is large, accounting for about 13% of the total, but how could they account for a 41% drop in the number of parking tickets? What is going on?
Assuming that people drive more or less the same on Passover and that parking enforcers are doing their job equally well, this disparity might mean that a larger percentage of Jewish residents drive than of the rest of the population, so that if most Jews don't drive on Passover we get a large drop. Another possible explanation is that NYC streets are near capacity, such that a small drop in the number of cars means that many fewer people are stuck in situations that could land them a parking ticket. We've all been in situations where we have to circle repeatedly around a block to search for a parking spot. What if those parking spots were opening up at just the right rate? Then there would be fewer cars circling, less congestion, and fewer chances of collision in a self-reinforcing process. I think that this latter explanation would be particularly interesting and have implications for NYC transportation design, but I don't think it is answerable with this data set.
Another interesting data point is the impact of a large nor'easter, with heavy rain and wind, that hit NYC on March 13. There is a large drop in the number of tickets (30% or ~6,500 tickets compared to the average of the other three Saturdays), but it seems that things recover quite quickly afterwards. I wonder what this looked like after Hurricane Sandy.
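The two figures quoted for the storm (a roughly 30% drop, or about 6,500 tickets) together imply the underlying Saturday averages. The sketch below backs those out and shows the percent-drop calculation; the function name is illustrative, and the implied values are only approximate because both inputs are rounded.

```python
from statistics import mean

def percent_drop(observed, baseline_values):
    """Percent by which `observed` falls below the mean of `baseline_values`."""
    baseline = mean(baseline_values)
    return 100 * (baseline - observed) / baseline

# Figures quoted in the text: roughly a 30% drop, or about 6,500 tickets.
DROP_FRACTION = 0.30
DROP_TICKETS = 6_500

# Implied averages (approximate, since both inputs are rounded).
implied_baseline = DROP_TICKETS / DROP_FRACTION      # other three Saturdays
implied_storm_day = implied_baseline - DROP_TICKETS  # Saturday, March 13

print(f"implied baseline: {implied_baseline:,.0f} tickets")
print(f"implied storm day: {implied_storm_day:,.0f} tickets")
print(f"drop: {percent_drop(implied_storm_day, [implied_baseline] * 3):.0f}%")
```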
There are a number of explanations for why parking enforcement might fluctuate--maybe there are fewer people driving, fewer parking enforcers, or enforcers who decide to be lenient for the day, etc.--but it seems that even a quick look at this data can be informative about other processes occurring in the city.