On Data and Transportation. Q&A with Carlo Ratti

Q1. Are you involved in initiatives (Private, Public) in the area of Transportation where (Big) Data is used? Or can you refer us to initiatives in the area of Transportation that you are aware of, where (Big) Data is used?

Let me answer both questions together. Big Data means a better knowledge of the urban environment. In relation to mobility, let me share an example: at the MIT Senseable City Lab, we analyzed all taxi trips in New York City in a given year. The project, called HubCab, gathered 170 million taxi trips by over 13,000 Medallion taxis in New York City, with GPS coordinates of all pickup and drop off points and corresponding times. We then created a mathematical model to determine the potential impact of ride sharing applied to such a vast database. The project introduced the concept of “shareability networks”, which allows for efficient modelling and optimization of trip-sharing opportunities. Such an approach could lead to less traffic congestion, reduced operating costs, split fares, and a less polluted environment.
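
To make the idea of a shareability network more concrete, here is a minimal Python sketch. It is not the HubCab model itself: the trip format, the constant speed, the straight-line travel times, the delay threshold and the greedy pairing are all assumptions chosen for illustration. Trips are nodes, and two trips are linked when a single vehicle could serve both without delaying either drop off by more than a tolerated amount.

```python
from itertools import combinations
from math import hypot

SPEED = 10.0  # assumed constant vehicle speed, in metres per second


def travel_time(a, b):
    """Straight-line travel time in seconds between two (x, y) points in metres."""
    return hypot(a[0] - b[0], a[1] - b[1]) / SPEED


def shareable(t1, t2, max_delay):
    """True if one vehicle can serve both trips with every drop off at most
    max_delay seconds later than that passenger's solo arrival time."""
    for first, second in ((t1, t2), (t2, t1)):
        for drops in ((first, second), (second, first)):
            stops = [("pickup", first), ("pickup", second)]
            stops += [("drop", trip) for trip in drops]
            clock, position, feasible = first["time"], first["origin"], True
            for kind, trip in stops:
                target = trip["origin"] if kind == "pickup" else trip["dest"]
                clock += travel_time(position, target)
                position = target
                if kind == "pickup":
                    # The vehicle cannot leave before the passenger's request time.
                    clock = max(clock, trip["time"])
                else:
                    solo = trip["time"] + travel_time(trip["origin"], trip["dest"])
                    if clock > solo + max_delay:
                        feasible = False
                        break
            if feasible:
                return True
    return False


def shareability_edges(trips, max_delay):
    """Edges of the network: index pairs of trips that could share a ride."""
    return [(i, j) for i, j in combinations(range(len(trips)), 2)
            if shareable(trips[i], trips[j], max_delay)]


def greedy_pairs(edges):
    """Pair up trips greedily; a simple stand-in for a proper optimization step."""
    used, pairs = set(), []
    for i, j in edges:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


# Three hypothetical trips: the first two run along the same street.
trips = [
    {"time": 0, "origin": (0, 0), "dest": (1000, 0)},
    {"time": 10, "origin": (100, 0), "dest": (900, 0)},
    {"time": 0, "origin": (0, 5000), "dest": (0, 9000)},
]
edges = shareability_edges(trips, max_delay=60)
pairs = greedy_pairs(edges)
```

With these toy trips, only the first two end up linked, since the third starts five kilometres away. Once the network is built, the trip-sharing question becomes a question about pairing nodes in a graph.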

Q2. Do you know which Data Sources are typically used in these initiatives? Do you use any special software for managing data for Transportation?

Let me still refer to HubCab. The basis of the HubCab tool is a data set of over 170 million taxi trips of all 13,500 Medallion taxis in New York City in 2011.
The data set contains GPS coordinates of all pickup and drop off points and corresponding times. Other typical sources are data collected from cellphones, either in opportunistic ways (à la TomTom) or with special apps (think of Waze). Regarding software, we usually develop our own. In the case of HubCab, cartographic data of street shapes were obtained from OpenStreetMap.

The streets were cut into over 200,000 street segments of 40 m length each, using a Python script and the shapely Python library, and imported into a MongoDB database.
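
As an illustration of the cutting step, the sketch below splits a street polyline into pieces of at most 40 m. It is a dependency-free stand-in written in plain Python, not the original shapely-based script, and it assumes coordinates are already in a projected system measured in metres (OpenStreetMap data comes in degrees and would need reprojecting first). The function name and data layout are hypothetical.

```python
from math import hypot


def cut_polyline(coords, seg_len=40.0):
    """Split a polyline [(x, y), ...] given in metres into consecutive
    pieces of seg_len metres; the last piece may be shorter."""
    pieces = []
    current = [coords[0]]
    remaining = seg_len  # length still missing from the current piece
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        edge = hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed along this edge
        while edge - pos >= remaining:
            pos += remaining
            f = pos / edge
            cut = (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
            current.append(cut)
            pieces.append(current)
            current = [cut]
            remaining = seg_len
        if edge - pos > 0:
            current.append((x1, y1))
            remaining -= edge - pos
    if len(current) > 1:
        pieces.append(current)
    return pieces


def length(piece):
    """Total length of a piece in metres."""
    return sum(hypot(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(piece, piece[1:]))


# A hypothetical 100 m straight street and a 60 m street with a bend.
straight = cut_polyline([(0, 0), (100, 0)])
bent = cut_polyline([(0, 0), (30, 0), (30, 30)])
```

The straight street yields three pieces (40 m, 40 m and a 20 m remainder), and the bent one yields two, with the first piece running around the corner.
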
Pickup and drop off points were matched to the closest street segments. Street types unlikely to contain taxi drop offs or pickups, such as footpaths, trunk roads, service roads, etc., were not used in the matching process. Line widths of yellow and blue street segments on low zoom levels were styled on a logarithmic scale. The pickup and drop off points, represented as dots on the high zoom levels, were generated via an ArcPy script and placed randomly within a box around a given street segment, with the box width again following a logarithmic scale.
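
The matching and styling steps can be sketched as follows. This is only an illustration in plain Python: the segment records, the set of excluded OpenStreetMap highway tags, and the width range are assumptions, and a brute-force search stands in for whatever spatial index the real pipeline used.

```python
from math import hypot, log10

# Assumed OpenStreetMap "highway" values for streets that are skipped.
EXCLUDED = {"footway", "trunk", "service"}


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (all in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Projection of p onto the line, clamped to the segment's endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def closest_segment(point, segments):
    """Id of the nearest segment whose street type is not excluded."""
    candidates = [s for s in segments if s["highway"] not in EXCLUDED]
    best = min(candidates, key=lambda s: point_segment_distance(point, s["a"], s["b"]))
    return best["id"]


def line_width(count, min_w=0.5, max_w=8.0, max_count=1_000_000):
    """Map a trip count to a line width on a logarithmic scale."""
    share = log10(1 + count) / log10(1 + max_count)
    return min_w + (max_w - min_w) * min(share, 1.0)


# Hypothetical segments: the footway is closest to the point but is skipped.
segments = [
    {"id": 1, "highway": "residential", "a": (0, 0), "b": (40, 0)},
    {"id": 2, "highway": "footway", "a": (0, 5), "b": (40, 5)},
    {"id": 3, "highway": "residential", "a": (0, 50), "b": (40, 50)},
]
matched = closest_segment((10, 6), segments)
```

The logarithmic mapping keeps quiet streets visible while preventing the busiest ones from swamping the map: going from 10 to 100 trips widens a line by about as much as going from 1,000 to 10,000.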

GPX files of the dots were styled using Maperitive, then merged and amended for different zoom levels. The dots and street line files were layered together with MapBox, which is the platform that streams all the map content.
The data back end of HubCab runs on MongoDB, which contains all street segments and their coordinates, and all flows between each pair of street segments.
The number of all possible street segment pairs is over 40 billion (200,000 times 200,000) per map. Radius selection is dynamic, using MongoDB’s $near operator to obtain flows from all segments within the radius of the pickup marker to all segments within the radius of the drop off marker. With nine maps (one for the yearly data, eight for 3-hour time segments on all Fridays/Saturdays) and three selectable radii, there is a total of over one trillion flow combinations that can be explored with HubCab. Communication between MongoDB and the front end is realized via PHP scripts and JavaScript with JSONP.
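
The arithmetic behind these figures, and the shape of a radius query, can be sketched in a few lines of Python. The segment count is rounded down to 200,000, and the query is an assumption rather than HubCab's actual one: it uses the GeoJSON form of `$near`, and the field name `loc` is hypothetical.

```python
SEGMENTS = 200_000  # "over 200,000" street segments, rounded down
MAPS = 9            # one yearly map plus eight 3-hour time slices
RADII = 3           # selectable radii

pairs_per_map = SEGMENTS * SEGMENTS
total_combinations = pairs_per_map * MAPS * RADII


def near_query(lon, lat, radius_m, field="loc"):
    """Build a MongoDB filter for documents within radius_m metres of a point.

    The GeoJSON form of $near needs a 2dsphere index on the queried field;
    the field name "loc" is a placeholder, not HubCab's schema.
    """
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": radius_m,
            }
        }
    }


# Hypothetical pickup marker in Manhattan with a 100 m radius.
pickup_filter = near_query(-73.98, 40.75, 100)
```

This gives 40 billion pairs per map and 1.08 trillion combinations overall. A filter built this way could be passed to a driver call such as pymongo's `collection.find(pickup_filter)`, once for the pickup marker and once for the drop off marker, before looking up the flows between the two sets of segments.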

Q4. Do you know what the main Challenges are in managing data for Transportation?

I would say the same issues that one encounters in all Big Data projects.


Carlo Ratti is an architect and engineer, founder of the design and innovation firm CRA-Carlo Ratti Associati and director of the MIT Senseable City Lab.
