Traffic intensity in Medellín

Air quality is a concern in Medellín and in the wider Metropolitan Area of the Aburrá Valley. High concentrations of particulate matter (PM10 and PM2.5) are the chief problem, but levels of nitrogen oxides (NOx), sulfur oxides (SOx), and ozone (O3) matter too. The main cause, as usual, is human activity: fossil fuel combustion is the primary driver of most of these emissions.

Having experienced poor air quality in our first week in Medellín, we decided that air quality should be one of the factors taken into account when deciding where to live. The question is: how to do this? After an initial scoping exercise of existing literature and data, it seems that fine-grained (neighborhood-level) insights are hard to come by.

The valley's early warning system (Sistema de Alerta Temprana de Medellín y el Valle de Aburrá) lists a total of 32 air quality monitors. However, only six of them are within our area of interest, and these are so spread out that it is difficult to draw useful conclusions from their historical record.

While we investigate air quality data availability further, I decided to try a somewhat crude but hopefully informative exercise around traffic intensity. Given the high number of vehicles in Medellín and the strong link between pollutants and combustion, it seems useful to get a better idea of road density and traffic intensity. Even if pollution travels, it seems sensible to live in an area where the total number of vehicles driving around is relatively low.

Analyzing Google Maps and Google Traffic data

Google Maps seemed to be a useful starting point. Through the API, a map can be obtained that outlines the road network and traffic congestion. I focused the analysis on a 14x7 km area (about 100 km2). Within this area, I obtained the road network and congestion flows throughout a day (in this case a Monday). I did this at a relatively high zoom level, resulting in images measuring 6000x3000 pixels.

The next step was to divide this image into a grid of smaller areas. I chose tiles of 700 x 700 meters, which gives 200 tiles (20 columns by 10 rows), each covering 300 x 300 pixels of the image.
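As a rough sketch of how such a grid can be derived, the snippet below computes the pixel bounding box of every tile from the figures mentioned above (a 6000x3000 pixel image covering 14 x 7 km, cut into 700 meter tiles). The function name tile_boxes and the (left, top, right, bottom) box convention are my own choices for illustration, not something taken from the original analysis.

```python
# Figures from the text; names and box convention are illustrative assumptions.
IMAGE_W_PX, IMAGE_H_PX = 6000, 3000   # image size in pixels
AREA_W_M, AREA_H_M = 14_000, 7_000    # area covered, in meters
TILE_M = 700                          # tile edge length, in meters


def tile_boxes(image_w, image_h, area_w_m, area_h_m, tile_m):
    """Return (left, top, right, bottom) pixel boxes, ordered row by row."""
    cols = area_w_m // tile_m          # number of tiles across
    rows = area_h_m // tile_m          # number of tiles down
    tile_w = image_w // cols           # tile width in pixels
    tile_h = image_h // rows           # tile height in pixels
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


boxes = tile_boxes(IMAGE_W_PX, IMAGE_H_PX, AREA_W_M, AREA_H_M, TILE_M)
print(len(boxes))   # 200
print(boxes[0])     # (0, 0, 300, 300)
print(boxes[-1])    # (5700, 2700, 6000, 3000)
```

Each box can then be handed to whatever image library is in use to crop out the corresponding tile.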

Each of these tiles contains information about the road network. By default, Google Maps hides roads that are too small or that carry little traffic. That is ideal for my purposes, as those will have little impact on air quality. Furthermore, larger roads with multiple lanes are drawn with thicker lines, and complex traffic infrastructure, such as traffic circles, adds extra lines. In other words, every non-white pixel on this map can be read as a sign of traffic.

The first exercise was to map this road network, without yet looking at differences in congestion. I did this with a pixel count: how many non-white pixels are there in each tile? This should be a decent proxy for how many (somewhat significant) roads there are in that area. After doing this count, I added an overlay to the original map, showing the pixel count as a color gradient (the darker the tile, the higher the road pixel count). This resulted in the following map.
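A minimal sketch of that pixel count, using only the standard library and a toy in-memory image instead of the real map. The threshold of 250 for what counts as "white" is an assumption, as are the function names. The last step scales the counts to the 0 to 1 range, which is one simple way to drive a darker-is-busier overlay.

```python
WHITE_THRESHOLD = 250  # assumption: all channels at or above this means "white"


def is_road_pixel(pixel, threshold=WHITE_THRESHOLD):
    """True for any pixel that is not (near) white."""
    r, g, b = pixel
    return not (r >= threshold and g >= threshold and b >= threshold)


def count_road_pixels(pixels, box):
    """Count non-white pixels inside a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return sum(
        1
        for row in pixels[top:bottom]
        for pixel in row[left:right]
        if is_road_pixel(pixel)
    )


# Toy 4x4 "map": white background with a grey road down the left edge.
WHITE, GREY = (255, 255, 255), (128, 128, 128)
image = [[GREY, WHITE, WHITE, WHITE] for _ in range(4)]
image[0][1] = GREY  # one extra road pixel in the top-left tile

# Four 2x2 tiles, ordered row by row.
tiles = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4)]
counts = [count_road_pixels(image, box) for box in tiles]
print(counts)  # [3, 0, 2, 0]

# Scale to 0..1 so the busiest tile gets the darkest overlay shade.
max_count = max(counts) or 1
shades = [count / max_count for count in counts]
```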

The next step was to take into account the congestion that is present at different times of the day. I decided to look at the situation at 8 AM and at 5:30 PM, which seem to be rush hour moments. In order to "weigh" congestion colors, I first had a look at the different colors present in the image. For this kind of work, I used the never-disappointing image tool ImageMagick, in this case to extract a histogram of each tile, which is easy to do through the Wand Python package.
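To illustrate what such a histogram contains, here is a small stand-in that counts colors with Python's collections.Counter on a toy in-memory tile. This is not the Wand call used for the actual analysis; it only shows the shape of the result, a mapping from each distinct color to the number of pixels that have it. The RGB values are made up for the example.

```python
from collections import Counter


def color_histogram(pixels, box):
    """Map each distinct RGB color inside the box to its pixel count."""
    left, top, right, bottom = box
    return Counter(
        pixel
        for row in pixels[top:bottom]
        for pixel in row[left:right]
    )


# Toy 3x3 tile with made-up colors: a green road with one red pixel beside it.
WHITE, GREEN, RED = (255, 255, 255), (0, 200, 80), (220, 30, 30)
tile = [
    [WHITE, GREEN, WHITE],
    [WHITE, GREEN, RED],
    [WHITE, GREEN, WHITE],
]

hist = color_histogram(tile, (0, 0, 3, 3))
for color, count in hist.most_common():
    print(color, count)
```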

There were generally 11 different colors present in each tile (many of which were slightly different hues of the main colors used for the roads, such as green or red). All colors were mapped to what I believe was their primary color, and then grouped by congestion intensity (green for not congested, orange for somewhat congested, and red for highly congested). Each congestion level was given a multiplication factor: green, as the baseline, has a factor of 1, orange 2, and red 3. This means that red road pixels count three times as much toward the final score of each tile as green ones. The only remaining step was to multiply each tile's pixel count per congestion level by the corresponding factor, sum the results, and generate a new map showing the weighted congestion score, tile by tile.
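The weighting can be sketched as follows. Two things here are assumptions rather than the method actually used: the reference RGB values are placeholders, not colors sampled from Google's palette, and the nearest-color rule (smallest squared RGB distance) is just one possible way to automate the mapping of hues to a primary color. Only the factors 1, 2, and 3 come from the text above; the background is given a weight of 0 so that it does not contribute.

```python
# Placeholder reference colors (assumed, not sampled from the real map).
REFERENCE_COLORS = {
    "background": (255, 255, 255),
    "green": (0, 200, 80),
    "orange": (255, 150, 0),
    "red": (220, 30, 30),
}

# Factors from the text; background weight of 0 is an added assumption.
WEIGHTS = {"background": 0, "green": 1, "orange": 2, "red": 3}


def nearest_level(color):
    """Name of the reference color closest to `color` in RGB space."""
    return min(
        REFERENCE_COLORS,
        key=lambda name: sum(
            (a - b) ** 2 for a, b in zip(color, REFERENCE_COLORS[name])
        ),
    )


def congestion_score(histogram):
    """Sum of pixel counts, each multiplied by its congestion factor."""
    return sum(
        WEIGHTS[nearest_level(color)] * count
        for color, count in histogram.items()
    )


# Toy histogram for one tile: color -> pixel count.
tile_histogram = {
    (255, 255, 255): 900,  # background
    (0, 200, 80): 50,      # green
    (10, 190, 90): 10,     # slightly different green hue
    (255, 150, 0): 20,     # orange
    (230, 40, 40): 5,      # slightly different red hue
}

score = congestion_score(tile_histogram)
print(score)  # 60*1 + 20*2 + 5*3 = 115
```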

The results can be seen in the images below.

So there we have it. An interesting initial picture! Some very broad impressions based on these images:

Now, this is far from a solid analysis, but it does provide a first glance at some of the traffic patterns in the city, which might be linked to the city's air pollution. At the very least, it has given me better insight into the major roads and the region's urban form as shaped by roads. Let's see where we can go from here...
