Instructor: Brian Klinkenberg
Office: Room 209
Office Hours: Tues 12:30-1:30
Wed 12:00-1:00
Lab Help: Jose Aparicio
Office: Room 240D

In many law enforcement agencies in the United States, Canada, and much of the rest of the world (e.g., this site in the UK), GIS now plays an important role. In today's lab you'll be working with some crime data related to Ottawa (the Ottawa Police Department) and using the program CrimeStat to analyze that data. CrimeStat is a very comprehensive program, and in this lab you will be exploring only a very small part of its analytical capabilities. Since the program is free to download, and all of the manuals are available on the CrimeStat web site (along with some sample data), it is a program that you could continue to use in the years to come. Although it was developed for the analysis of crime data, even health and environmental data can be analysed using a program such as CrimeStat--as you should know by now.
In this lab you will consider the spatial distribution of various crimes that occurred between January 2005 and March 2006, and examine how adjusting statistics for baseline population differences can make a significant difference in the interpretation of the results. The data for this lab consists of:
Data sources: the crime data was extracted from files originally posted by the Ottawa Police Department. The DA data was derived from Statistics Canada (I took the DA polygons, found their centroids, and then assigned the attributes to the centroids). The other layers (roads, land uses, etc.) come from data provided by DMTI.
The crime data represents crimes that were committed (and obviously reported) in Ottawa between Jan 2005--Mar 2006. You will use ArcGIS to display the data, and to display the results of the CrimeStat analyses. You should have ArcGIS and CrimeStat up and running, and the shapefiles loaded. The data should be obtained using GetData (G479, Lab 4). Display the data in ArcGIS in order to get a sense of the spatial distribution of the criminal activities. However, since Windows 'locks' files and prevents a program from opening them if they have already been 'opened' by another program, you should remove the two B&E, the Stolen_vehicles, and the OttDA06_pts shapefiles from the list of layers prior to starting your CrimeStat analyses.
To start CrimeStat, open Windows Explorer, go to C:\CrimeStat, and double-click on Crimestat.exe.
The first component of your exploration of the data will be to perform some simple distance analyses--using nearest neighbour statistics to determine if the pattern of activities (e.g. car thefts) shows evidence of clustering (i.e., possibly a repeat offender is working in the area), or if they appear to be dispersed. Chapter 5 of the CrimeStat manual explains nearest neighbour statistics (among other distance analyses measures). You will then determine if the criminal activities display 'hot spots' using several of the routines implemented in CrimeStat--Chapters 6 and 7 provide an overview of these methods. Since the time at which each crime was reported was also recorded, you can consider whether the crimes show evidence of being clustered in both space and time. Finally, you will use what CrimeStat refers to as 'spatial modeling' to interpolate surfaces using kernel density estimation methods--Chapter 8 of the CrimeStat manual provides a very comprehensive discussion of density kernels and other spatial modeling techniques (such as journey-to-crime analyses). I will now lead you through the analytical procedures.
The first step in working with CrimeStat is to complete the Data Setup. For the first set of analyses, select BE_Commercial.shp as the Primary File (set the Type of file to Shapefiles), and set the X and Y variables appropriately. The Type of coordinate system should be Projected (Euclidean--the coordinate system associated with the data is the UTM PCS), and the Data Units should be Meters. Unfortunately, I have found that if you Save the Parameters (under Options) the program subsequently bombs--so do not save or load the parameters.
You will need to set the Measurement Parameters in order to ensure that various statistics are calculated correctly. The Area of the Coverage is 3,339,405,576 Square meters, and the Direct Length of the street network is 8,055,537 m. (I obtained these values by summarizing the appropriate values from the shapefiles you are using in the lab.) NOTE: When entering these numbers into CrimeStat do not include the commas! If you do, the numbers will not be read properly.
Having established the input parameters for the program, you can now begin to calculate the crime statistics. In most instances you should Save the results to a file (in C:\Data\Lab4); CrimeStat normally puts a meaningful prefix onto the name you provide, so when using the BE_Commercial data I suggest you simply provide BeC as the filename (and when working with the residential data set use BeR as the filename; use Car for the Stolen vehicles data).
You will now conduct an analysis of the nearest neighbour statistics. Click on Distance Analysis I (under Spatial Description), and select Nearest Neighbor Analysis. Select a Rectangular border correction and set the number of nearest neighbors to be computed to 25. Save the Output to a DBF file (call it BeC)--the results can then be imported into Excel and plotted (plot the index, not the distances). Note that the file that CrimeStat creates will be called NnaBeC.dbf. You can make some interesting discoveries by examining the nearest neighbour index over several orders of neighbours (i.e., computing it for the 25 nearest neighbours and plotting the index values--the CrimeStat manual has some interesting examples of what this type of analysis can reveal). Print out the results. NOTE: You must have entered the Measurement Parameters (above) before calculating the nearest neighbour statistics--if you didn't enter the values, your results will be incorrect.
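To see what the index CrimeStat reports actually measures, here is a minimal sketch (not CrimeStat's own code) of a k-order nearest neighbour index: the mean observed distance to each point's k-th nearest neighbour, divided by the distance expected under complete spatial randomness. The function name and the toy coordinates are mine, purely for illustration.

```python
import math

def knn_index(points, k, area):
    """K-order nearest neighbour index: mean observed distance to the
    k-th nearest neighbour divided by the mean expected under complete
    spatial randomness (CSR). Values < 1 suggest clustering; > 1 dispersion."""
    n = len(points)
    mean_obs = 0.0
    for (x1, y1) in points:
        # distances from this point to every other point, sorted ascending
        d = sorted(math.hypot(x1 - x2, y1 - y2)
                   for (x2, y2) in points if (x2, y2) != (x1, y1))
        mean_obs += d[k - 1]
    mean_obs /= n

    lam = n / area  # point density (points per unit area)
    # Expected k-th NN distance under CSR: Gamma(k + 1/2) / (Gamma(k) * sqrt(pi * lambda))
    # (for k = 1 this reduces to the familiar 1 / (2 * sqrt(lambda)))
    expected = math.gamma(k + 0.5) / (math.gamma(k) * math.sqrt(math.pi * lam))
    return mean_obs / expected
```

Computing this for k = 1 through 25 and plotting the indices against k is exactly the order-by-order plot you are asked to produce in Excel. This is also why the Measurement Parameters matter: the study area enters the density term directly.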
Change the Primary file to Stolen_Vehicles (setting the File source for the appropriate X and Y values), and rerun the nearest neighbor analysis, setting the output filename to Car.
Change the Primary file to BE_Residential (making sure to change the File sources for the X and Y values). In addition to setting the parameters as described above for BE_Commercial, set the Time to Time (check to ensure that the File source is BE_Residential) and the Time Unit to hours. Re-run the Nearest Neighbor Analysis (don't forget to rename the output file to BeR). Print out the results. In Excel, combine the three sets of output so that you can produce a single histogram showing the three sets of indices (one showing the NNA results for the residential break-ins, one showing the commercial break-ins, and one showing the car thefts).
Are the 1st order B & E's and car thefts more or less spatially aggregated than expected? Does the index change as a function of the type of crime (residential vs commercial vs car theft) and the nearest neighbour order, and what does that tell you about the spatial distribution of the crimes?
Once you have finished these analyses uncheck Nearest Neighbor Analysis (if you fail to do this CrimeStat will bomb when you attempt the next set of analyses).
You should now set the Secondary File to OttDA06_pts; in this window, in addition to setting the X and Y variables, set pop15 as the Z (Intensity) variable. Pop15 represents the total population (15 years old and above) in a dissemination area, and will be used to adjust the criminal activities to reflect the underlying population density (assuming that most crimes are committed by and affect people 15 years or older). For the remaining analyses, use only BE_Residential as the primary file.
Click on Spatial description / 'Hot Spot' Analysis I, and Fuzzy Mode. Set the radius to 500 meters. Save the results--you'll notice, however, that the only choice of file output type is 'dbf' (regardless, do save the results). You can display the results in ArcGIS, but you will need to first add the dbf file as a table, and then use the Add XY Data Tool to create a shapefile. Create a quantitative map showing the Frequency of criminal activities (derived using a 'fuzzy' distance of 500 meters). Examine the map to determine where the 'hot spots' are. Given that most of the crimes occur in the central part of the city, I suggest you zoom into that area when viewing the results. When producing your final maps (to be handed in), also zoom into the central area so that the results are clearly displayed (that is, your maps need not / should not show the entire Ottawa-Nepean area).
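As a rough sketch of what the fuzzy mode statistic does (my reading of the routine, not CrimeStat's code): each incident is assigned the number of incidents falling within the chosen radius of it, and the resulting frequency is what you map. Whether the point itself is counted is an assumption here.

```python
import math

def fuzzy_mode(points, radius):
    """For each incident, count the incidents (including itself, by
    assumption) that fall within `radius` of it -- a simple reading of
    CrimeStat's fuzzy mode frequency."""
    freq = []
    for (x1, y1) in points:
        count = sum(1 for (x2, y2) in points
                    if math.hypot(x1 - x2, y1 - y2) <= radius)
        freq.append(count)
    return freq
```

With a 500 m radius, high frequencies mark locations with many nearby break-ins--the 'hot spots' you are asked to pick out visually on the map.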
A related approach is the nearest neighbor hierarchical spatial clustering technique--this technique determines if there are any clusters (i.e., hot spots) in the data, but does so using statistical criteria. You should compare the clusters identified using this technique to those you visually identified using the fuzzy mode results (the maps saved will have a name such as Nnh1BeR.shp, Nnh2BeR.shp, etc.). The '1' refers to first-order clusters (i.e., the method identified individual criminal activities that clustered together), the '2' refers to second-order clusters (i.e., the method identified clusters of groups of criminal activities identified in the Nnh1 results--it clusters the clusters!), etc. Ensure you save the results and also save the ellipses (to a shapefile). Use a fixed distance of 500 m (so that the results are comparable to the fuzzy mode results), leave the minimum number of points at 10, set the output units to kilometers, and accept the defaults for the other parameters. You don't need to save the Convex Hull. Don't forget to 'uncheck' the previous analysis technique before moving onto the next step. (Note that in some cases only a first-order cluster is identified, while in other cases second-order, etc. clusters are identified--the number of clusters is dependent on the underlying statistical distribution of the data points.)
The identification of clusters using absolute data (i.e., the locations of the criminal activities in and of themselves) can produce misleading impressions since the underlying conditions aren't known (e.g., there are far more people living in some neighbourhoods [e.g., the West End of Vancouver] than in others [e.g., Shaughnessy], so the relative risk to a person in one area may, in fact, be much less than for a person living in another). In order to account for such differences (i.e., to produce a relative-risk map), we can use a secondary surface to adjust the probabilities accordingly. The number of people 15 years and older per dissemination area is a reasonable secondary surface to use. Recompute the nearest neighbour hierarchical spatial clustering results, but this time make the analyses Risk-adjusted. Set the unit to kilometers, and set the Risk Parameters to: use intensity variable, normal, adaptive, 100, and Output unit to square kilometers. Compute the risk-adjusted clusters, and compare these results to those of the standard nearest neighbour hierarchical spatial cluster analysis.
As noted above, the time that the crime was reported was recorded, and we can use that information to determine if there is both spatial and temporal clustering in the residential B & E's. (Ensure that you have selected Time as the Time variable and set the Time Unit to hours for the Primary File BE_Residential.) Making sure that all of the previously selected techniques have been unselected, go to Spatial Modeling / Space-time Analysis. Select the Knox Index, set the Closeness method to custom ("Close" time: 8 hours; "Close" distance: 2500 m) and the number of Simulation runs to 19. Click on Compute. You should Print out the results once the analysis is completed, since that is the only record produced. (If you Save the results to a text file you can include them as an Appendix when you hand in your lab.) Using the distribution of the simulated index you can determine if the observed Chi-square value [the one associated with the actual crimes] is significant or not (i.e., where does it lie within the range of values produced during the simulations?). Read through the help file and Chapter 9 of the CrimeStat manual in order to understand what the Knox index is about. Don't forget to 'uncheck' the analysis technique before moving onto the next step.
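In outline, the Knox test cross-classifies every pair of incidents as close or far in space and in time, and then asks whether the count of pairs close in both exceeds what chance would produce. A minimal sketch of that logic, with the significance side handled (as in CrimeStat) by Monte Carlo permutation of the times over the fixed locations--function names and the tiny event list are mine:

```python
import itertools
import math
import random

def knox_observed(events, d_close, t_close):
    """Count incident pairs that are close in BOTH space and time.
    `events` is a list of (x, y, t) tuples."""
    st = 0
    for (x1, y1, t1), (x2, y2, t2) in itertools.combinations(events, 2):
        if math.hypot(x1 - x2, y1 - y2) <= d_close and abs(t1 - t2) <= t_close:
            st += 1
    return st

def knox_simulate(events, d_close, t_close, runs=19, seed=1):
    """Reference distribution: repeatedly shuffle the event times over the
    fixed locations and recount, mimicking the simulation runs."""
    rng = random.Random(seed)
    times = [t for (_, _, t) in events]
    sims = []
    for _ in range(runs):
        rng.shuffle(times)
        shuffled = [(x, y, t) for (x, y, _), t in zip(events, times)]
        sims.append(knox_observed(shuffled, d_close, t_close))
    return sims
```

If the observed count sits at or beyond the top of the 19 simulated counts, the space-time interaction is significant at roughly the 0.05 level--which is exactly the comparison you are asked to make with the printed CrimeStat output (CrimeStat reports it as a Chi-square value rather than a raw count).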
You will now need to establish the parameters for a Reference File that will be used in the kernel density estimate component of this lab. You will Create a (vector) Grid with a Cell Spacing of 250 (m); the (X,Y) coordinates for the lower left / upper right corners are:

            |   X    |    Y
Lower Left  | 394000 | 4979000
Upper Right | 494000 | 5049000
For the final component of this lab, you will perform kernel density estimation interpolation (under Spatial Modeling / Interpolation I), producing both a single and a dual surface estimation. For the single kernel density estimate, accept most of the defaults (which should set the B & E residential file as the Primary File) but change the Method of Interpolation to Triangular (Minimum sample size:100), the area units to points per square kilometers, and select Probabilities rather than Absolute Densities as the output units. (Read over Chapter 8 of the CrimeStat manual in order to understand what the output units are--briefly, the map will show the likelihood that an incident [Residential B & E] would occur at any one location.) Save your results to a shapefile.
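To make the triangular interpolation concrete, here is a sketch of a single-surface estimate over reference-grid cells (my simplification of the method, not CrimeStat's implementation--in particular I use a fixed bandwidth h and a 2D triangular kernel normalized to integrate to 1; CrimeStat's adaptive bandwidth and probability rescaling are omitted):

```python
import math

def triangular_kde(points, grid_cells, h):
    """Single-surface kernel density estimate with a triangular kernel:
    each incident contributes (3 / (pi * h^2)) * (1 - d / h) to every
    grid cell within bandwidth h of it; cells get the sum over incidents."""
    densities = []
    for (gx, gy) in grid_cells:
        dens = 0.0
        for (px, py) in points:
            d = math.hypot(gx - px, gy - py)
            if d < h:
                dens += (3.0 / (math.pi * h * h)) * (1.0 - d / h)
        densities.append(dens)
    return densities
```

The dual (relative-risk) surface in the next step is, in essence, the ratio of two such surfaces: one built from the break-in incidents and one from the pop15 baseline.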
You will also be computing a dual surface [a relative-risk surface]--the Secondary Surface will be the OttDA06_pts file using pop15 as the Z (Intensity) variable. (This should be the default since you previously specified OttDA06_pts as the secondary surface--it is the Second file: listed.) Change the defaults for the Dual kernel density estimate to match those of the single kernel density estimate. Set the Output units to Ratio of densities. Make sure that you save the results (as always, CrimeStat will supply a meaningful prefix to your filename, so just use something like BeR as the filename). You should compare the two surfaces, and report on the different perspectives they provide on crime incidents in Ottawa.
Note that the results of the kernel density estimates are not a raster grid file (as you might expect), but a vector file of grid cells! In order to display that 'surface' properly you need to remove the outline from the quantitative legend symbols [set it to 0]. A reminder: the 'grid cell' sizes and their coordinates were specified earlier, when you entered the parameters for the Reference File. Important! You will also have to set the sample size (in ArcMap) to a larger number in order to accommodate the size of the vector grid (Classify - Sampling...., add two 0's to the default number and click on Apply). I suggest you use a Geometric Interval as the classification type, and somewhere between 5 - 10 classes.
Note: since the kernel density surfaces are vector files, you can and should clip them to the study area boundaries (Ottawa-Nepean).
To hand in: 1) Answers to the questions posed above with respect to the nearest neighbour index, along with a plot showing the indices (not the distances). 2) A two paragraph discussion on the comparison between the fuzzy mode (visual) clusters and the nearest neighbor hierarchical spatial clustering results, followed by 3) a two paragraph discussion on the differences between the standard nearest neighbor hierarchical spatial clustering results and the risk-adjusted results. 4) A two paragraph discussion of the Knox index. Finally, 5) a discussion on the results of the kernel density estimation (and how they relate to the fuzzy mode and clustering results). You should hand in several maps showing the results of the fuzzy / spatial clustering analyses (overlay the nearest neighbour ellipses onto the fuzzy mode results), and the kernel density analyses. You should also include, as an appendix, the results of the Knox space-time analysis. (All text should be typed, double-spaced.)
The lab results are due on Wednesday, March 6th.
Below is an example of a clipped dual kernel map that I produced. I used the Ottawa-Nepean layer to clip the dual kernel map. I then added the waterbody class from the land use layer (OttLU), along with the highways (Hwy) and the major roads (Main_rds) and the Ottawa_Nepean layer as an outline (with a line weight of 2). I also added the residential break and enter crimes (using a black symbol of size 3).

Adding a background map service
In order to provide some additional context for your map you can add an ESRI map service (e.g., adding a satellite image as a basemap so that you can get a better sense of the Ottawa-Nepean area and where the crimes occur). To add the map service:
Note that you should have at least one projected map layer open before adding a GIS service; otherwise the service doesn't know what projection you are using, and the map service will not align with your map layers.