methodology and data


Although data for a country in wartime will always be limited and not necessarily accurate, we were surprised to find a wealth of well-organized and relatively complete data. We were able to access maps from Open Street Map, an open source website, and Wikileaks released a large data set on which we were able to base our project. This section outlines our data sources and our processes.


data


        Given the timely nature of our project, gathering census and other data from Iraq has been problematic. In particular, finding data at the scale we needed, by tract areas within Baghdad, was not feasible. The Iraqi Central Organization for Statistics and Information Technology (COSIT) has not published data for catchment areas smaller than the governorate (of which Baghdad is one of eighteen). COSIT released a report in June 2009 establishing its intention to conduct a “national census to pave the way for a new era of planning and policy intervention and micromanagement at the government level in Iraq.” The proposed start date for the census was October 2009; however, the data has yet to be published. The previous national census, conducted in 1997, excluded the three Kurdistan provinces, significantly skewing the data. According to estimates, the Kurdish population is near 3 million people. The long delay between national censuses has been argued to be political in nature. For instance, Khalid al-Ansary and Jim Loney argue that “[e]thnic groups in contested areas like the northern city of Kirkuk, home to Arabs, Kurds, Turkmen and a valuable part of Iraq's oil fields, opposed [the census] because it might reveal demographics that would undermine political ambitions.” Security issues around conducting the census have also led to delays, owing to the hotly contested nature of the questions, namely those surrounding religion. Those collecting census data have to travel with security personnel in certain areas that strongly oppose the census. It soon became evident to us that a host of reasons connected to the political and historical situation in Iraq would prevent us from creating a highly quantitative project, but would allow us to explore qualitative methods for explaining and understanding the data that we were able to collect from open sources.

        We are extremely grateful for the open source data provided by Wikileaks, as this has been our primary data source. Without this data, it would be impossible for geographers and others like ourselves to produce maps autonomously and to find new, creative ways of visualizing the data. This comprehensive data set records the time, type, description, latitude and longitude, and means of each death. The data is openly circulated on the Guardian's “Data Blog,” whose tag line is indeed fitting for our project: “facts are sacred.” Our base map was provided by Open Street Map, another open source data set.


methodology


        The goal of our project was to creatively map data from "Operation Iraqi Freedom" from 2004 to 2009 in a way that is interactive and easily readable by a wide range of viewers. We found it surprising that there are currently few easily accessible historical accounts of the war, given the large number of civilian casualties for a 'modern war.' Thanks to Wikileaks, there is relatively well recorded open data on the total casualties; making sense of these numbers, however, is a challenging task. In order to make this data more 'open' and available, we chose three main methods to display and organize the data: projecting the data onto a residential road buffer of 10 meters, using a time-enabled animation, and using raster layers both to show cumulative deaths and to compare actual with predicted counts of wounded civilians. For all of our methods, we chose to focus on Baghdad, as this is where the highest concentration of death and violence is found, as well as some of the highest population densities.


        To begin our project, after downloading our data from Wikileaks, we searched for and removed or corrected any 'gaps' or inconsistencies. We then made the format accessible to ArcGIS and were able to export the total deaths over the total time period into a shapefile. Using a background map from CloudMade, we were able to project the data using the latitude and longitude coordinates associated with each incident. We also added data from the GeoCommunity, which had some shapefiles of water bodies. However, these were not projected and the river shapes were not accurate, so this data had to be edited by hand to be correctly referenced to our background map. From this, we created this basemap of Baghdad, to be used for the rest of our project:



Residential Road Buffer:

        We were curious to find out how many of the deaths were occurring in residential areas. Due to the limitations of procuring data from a war-torn country, however, little background information on Baghdad was accessible to us. The attribute table of our background map did include information on road types, so we assigned residential roads a 10 meter buffer, extending 10 meters from each side of the road, to get a closer look at how many deaths were occurring in explicitly residential areas. The 10 meter buffer was used to allow for the residential dwellings that are most likely located on either side of the road. Following the creation of our buffer, we selected only the deaths that had occurred in these areas. The results are significant: 7,841 civilian deaths, 1,476 Iraqi forces deaths, and 132 coalition deaths (see table below).
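
The buffer selection itself was done with ArcGIS tools. Purely as an illustration of the idea, the following Python sketch tests whether an incident lies within 10 meters of a road centerline. It assumes coordinates that are already projected into meters, and the road and incident coordinates are made up for the example rather than taken from our data.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (all in projected meters)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)

def within_buffer(point, road, radius_m=10.0):
    """True if the point is within radius_m of any segment of the road polyline."""
    return any(
        distance_to_segment(point, road[i], road[i + 1]) <= radius_m
        for i in range(len(road) - 1)
    )

# Hypothetical residential road (polyline) and incident locations, in meters.
residential_road = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
incidents = [(50.0, 8.0), (50.0, 12.0), (105.0, 25.0), (120.0, 60.0)]

# Keep only the incidents that fall inside the 10 meter buffer.
selected = [p for p in incidents if within_buffer(p, residential_road)]
print(selected)
```

With geographic (latitude and longitude) coordinates, the points would first need to be projected, since a distance in degrees does not correspond to a fixed distance in meters.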


        In order to better display our data, we created graduated symbol maps. These symbols were sized manually to accommodate the varying numbers of deaths. A single set of symbols would not have worked for all four sets of data, as their ranges run from 1-6 deaths to 1-250 deaths. We made an effort to be consistent with visual sizing, so all symbols are visually compatible. The variance in groupings is due to the differences created by using a ‘natural breaks’ (Jenks Method) classification. While it is arguable that an equal distribution classification would have been more accurate, such a large number of incidents were clustered around the same values (1 death, for example) that natural breaks made our maps more readable, giving us a better overall representation. The results are staggering, showing about sixty percent more civilian deaths than other deaths ('enemy', US coalition, and Iraqi Forces combined) in residential buffer zones.
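
To show why natural breaks suit clustered counts, here is a small Python sketch of a natural breaks classification: it picks the class boundaries that minimize the squared deviation within each class. It is an illustrative implementation, not the ArcGIS routine we used, and the death counts are invented for the example.

```python
def natural_breaks(values, n_classes):
    """Return the upper bound of each class, chosen to minimize the total
    within-class sum of squared deviations (dynamic programming)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums let us compute the squared deviation of any slice quickly.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j], with j > i."""
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return sq - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the recorded split points to recover the class bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]

# Hypothetical deaths per incident: many small values and a few large ones.
deaths_per_incident = [1, 1, 1, 1, 2, 2, 3, 6, 40, 45, 250]
breaks = natural_breaks(deaths_per_incident, 3)
print(breaks)
```

For this sample the many small incidents fall into one class and the rare large ones get classes of their own, which is the behavior that made the maps easier to read.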


Animations:

        Using our already organized and prepared data, we added a 'time and date' column to our attribute table in order to time enable our layers. This required us to go back into our original data set from Wikileaks and edit the dates and times, due to some major inconsistencies in the reporting. While the data was recorded down to the hour, we found this level of detail to be, first, likely unreliable given the nature of the subject and, second, incomprehensible to view on a map over such a large time frame. After reformatting and organizing this data, we were able to 'time enable' it, differentiating the data on each layer by one-year increments. Using the 'time slider' tool, we were able to record and export animations.
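
As a rough illustration of this clean-up step, the Python sketch below parses timestamps recorded in inconsistent formats and totals deaths by year, which matches the one-year increments used in the animations. The date formats and records are assumptions made for the example; the actual inconsistencies in the data set were corrected by hand.

```python
from collections import Counter
from datetime import datetime

# Hypothetical formats; the real data set may use different ones.
DATE_FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M", "%Y-%m-%d")

def parse_timestamp(text):
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def deaths_per_year(records):
    """Sum deaths by calendar year, skipping records with unreadable dates."""
    totals = Counter()
    for text, deaths in records:
        timestamp = parse_timestamp(text)
        if timestamp is not None:
            totals[timestamp.year] += deaths
    return dict(sorted(totals.items()))

# Invented (timestamp, deaths) pairs with mixed formats and one bad entry.
records = [
    ("2004-05-01 13:00", 2),
    ("17/09/2004 08:30", 1),
    ("2007-01-22", 5),
    ("unknown", 3),
]
yearly = deaths_per_year(records)
print(yearly)
```

Records that cannot be parsed are skipped here; in practice they would be set aside for manual correction rather than silently dropped.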


        For our animations, we chose a number of ways to display the data in order to make our series as comprehensive as possible. We created one series using points from the total deaths, and individual animations for each type of death. We then repeated the process using proportional symbols: while points are effective in conveying a large volume of deaths, proportional symbols give the viewer a better sense of the concentrations of death and are more legible overall. Given that points and proportional symbols each have their own advantages, we considered it important to use both. We also found it important to display all the deaths on one map, to give the viewer a general historical portrayal of the deaths occurring. Using individual layouts for each death type allows viewers to make better comparisons between the numbers of each type of death. The individual layouts also portray more information specific to each type of death, depending on what the viewer feels is most important.


Rasters:

        Finally, we thought it was important to create a surface layer that would better demonstrate where concentrations of civilian casualties and injuries were occurring in Baghdad as a general trend from 2004 to 2009. We converted our vector data to raster using a sum calculation, which allowed our data to be displayed cumulatively over 2004 to 2009. The raster maps therefore differ from our animated maps, which show snapshots from each year. The raster surfaces are also an important visualization tool for understanding the outcomes of the war as a sustained effort over many years. Using attribute data from GeoCommunity, we were able to display some of the schools in the Administrative District of Baghdad. This was important for us to display in order to better convey the lived realities of the citizens of Baghdad during this time. We chose to display civilian deaths as well as civilians wounded. Information on civilians wounded was not included in our other map series, yet it is an important component of a comprehensive understanding of the conditions that civilians faced in Baghdad during 2004-2009.
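
The vector-to-raster conversion with a sum calculation can be pictured as dropping each incident into a grid cell and adding up the values that land in the same cell. The Python sketch below is a simplified stand-in for the ArcGIS conversion, with an invented grid and invented incident values.

```python
def rasterize_sum(points, origin, cell_size, n_rows, n_cols):
    """Sum point values into a grid. origin is the lower-left corner, so
    row 0 is the bottom row. Points outside the grid are ignored."""
    x0, y0 = origin
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y, value in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] += value
    return grid

# Hypothetical incidents as (x, y, deaths) in projected meters.
incidents = [
    (10.0, 10.0, 1),
    (50.0, 90.0, 3),    # same cell as the first incident
    (150.0, 20.0, 2),
    (250.0, 20.0, 4),   # outside the 2 x 2 grid, so not counted
    (120.0, 180.0, 1),
]
cumulative = rasterize_sum(incidents, origin=(0.0, 0.0), cell_size=100.0,
                           n_rows=2, n_cols=2)
print(cumulative)
```

Because every year's incidents are added into the same grid, each cell holds the cumulative total for the whole period rather than a yearly snapshot.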


        After comparing civilians wounded to civilians killed, we noticed an unusually low reporting of civilians wounded. Using the raster calculator, we divided civilians wounded by civilians killed to generate a map showing the ratio between the two. Comparing this with Iraq Body Count's prediction of a mean ratio of 5 injuries to 1 death, we noticed a large margin of potential under-reporting of civilians wounded. From this ratio, we made a predictive model of how many injuries should have been reported, based on the death counts. We were able to use the following formula to demonstrate potential under- and over-reporting of civilians wounded in our data set.

(Reported civilians wounded - Predicted civilians wounded) / Reported civilians wounded
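
As a worked illustration of this formula, the Python sketch below predicts the number of wounded from the death count using the mean ratio of 5 injuries to 1 death, then computes the reporting measure for a single raster cell. The function name and the sample counts are hypothetical, and returning no value for cells with zero reported wounded is our assumption for the sketch, since the formula is undefined there.

```python
INJURIES_PER_DEATH = 5.0  # mean ratio attributed to Iraq Body Count in the text

def reporting_index(reported_wounded, killed, ratio=INJURIES_PER_DEATH):
    """(reported - predicted) / reported for one cell.

    Negative values suggest under-reporting of wounded civilians and
    positive values suggest over-reporting. Returns None when nothing was
    reported, because the formula would divide by zero.
    """
    if reported_wounded == 0:
        return None
    predicted_wounded = ratio * killed
    return (reported_wounded - predicted_wounded) / reported_wounded

# Invented cell values: (reported wounded, killed).
print(reporting_index(10, 4))  # predicted 20 wounded, only 10 reported
print(reporting_index(30, 4))  # predicted 20 wounded, 30 reported
print(reporting_index(0, 4))   # no wounded reported, index undefined
```

In the first case the measure is -1.0, meaning the predicted count exceeds the reported count by the full size of the reported count.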