EDI and High School Comparison Methods
The data for this part of the analysis were aggregated at the high school catchment area (HSCA) scale (which we refer to as neighbourhoods). The high school scores were contained in a point shapefile. We spatially joined these points to the school catchment polygons to create a choropleth of the results. EDI scores were stored as points at the enumeration area (EA) scale. HSCA polygons and EDI point data were spatially joined, and the EDI scores were averaged for each ‘neighbourhood’ polygon.
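The point-to-polygon join and averaging step can be sketched in plain Python as follows. This is a minimal illustration, not the GIS workflow we actually used. It assumes each HSCA is a single polygon ring of (x, y) vertices, that catchments do not overlap, and that EDI points are (x, y, score) tuples. The function names are hypothetical.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    ring is a list of (x, y) vertices; the closing vertex may be omitted.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mean_edi_by_neighbourhood(ea_points, hsca_polygons):
    """Average the EDI score of every EA point falling inside each HSCA polygon."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, y, edi in ea_points:
        for hsca_id, ring in hsca_polygons.items():
            if point_in_polygon(x, y, ring):
                totals[hsca_id] += edi
                counts[hsca_id] += 1
                break  # catchments are assumed not to overlap
    # Catchments that received no EDI points are simply absent from the result
    return {h: totals[h] / counts[h] for h in counts}
```

Every point is tested against every polygon here, which is fine for a few dozen catchments. A GIS spatial join would normally use a spatial index instead.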
To explore the data, we also needed to create a point shapefile containing both EDI and high school scores. This allowed the data to be compared visually, with proportional symbols (scaled to the score or ranking shown) drawn over a choropleth base. To create these points, we took the EDI scores averaged at the ‘neighbourhood’ level and joined them to the original high school points.
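The join of neighbourhood averages onto the high school points can be sketched as an attribute join. We assume here that each school record carries the id of the catchment it serves (a hypothetical `hsca_id` field). We also assume the averaged EDI scores are held in a dictionary keyed by that id. The other field names are illustrative only.

```python
def attach_edi_to_schools(schools, edi_by_hsca):
    """Return one point record per school carrying both its own score and the
    neighbourhood-average EDI of its catchment (hypothetical field names)."""
    combined = []
    for school in schools:
        record = dict(school)  # copy so the input records are not modified
        # None flags a catchment for which no averaged EDI score exists
        record["mean_edi"] = edi_by_hsca.get(school["hsca_id"])
        combined.append(record)
    return combined
```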
Mixed Income Neighbourhoods Methods
To investigate which neighbourhoods in Vancouver were more economically mixed, enumeration areas were clipped to high school catchment areas, and summary statistics (average, minimum and maximum income) were calculated for each catchment. The difference between the highest and lowest EA average income in each catchment was then calculated, and each high school catchment area was ranked by this difference. The catchment area with the largest difference between its maximum and minimum average income EAs was designated as the one with the highest degree of mixing.
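The ranking rule can be illustrated with a short sketch. We assume the clipped EAs have already been grouped into a dictionary mapping each HSCA id to the list of EA average incomes. The catchment mean shown here is an unweighted mean of the EA averages, which is an assumption. A population-weighted mean may be more appropriate.

```python
def rank_income_mixing(ea_incomes):
    """Rank catchments by the spread between their richest and poorest EA.

    ea_incomes maps a (hypothetical) HSCA id to the average incomes of the
    EAs clipped to that catchment. Rank 1 marks the most mixed catchment.
    """
    rows = []
    for hsca_id, incomes in ea_incomes.items():
        lo, hi = min(incomes), max(incomes)
        rows.append({
            "hsca_id": hsca_id,
            "mean": sum(incomes) / len(incomes),  # unweighted mean of EA averages
            "min": lo,
            "max": hi,
            "range": hi - lo,
        })
    # Largest max-min difference first
    rows.sort(key=lambda row: row["range"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```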
In order to incorporate EDI and GVRD census data into points for each EA the following steps were undertaken:
EDI scores for EAs were joined to the GVRD census data. Next, the centroid was calculated for each EA using a Python script. Finally, the EDI scores and census data were spatially joined with the appropriate centroid. The end result was point data for each EA with the EDI scores and census data as attributes.
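The text does not reproduce the centroid script. The sketch below is a hypothetical stand-in that computes the area-weighted centroid of a simple polygon with the shoelace formula. It assumes each EA is a single ring of (x, y) vertices in a projected coordinate system. For strongly concave EAs this centroid can fall outside the polygon itself.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    ring is a list of (x, y) vertices in either winding order; the closing
    vertex may be omitted.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Cx = sum / (6A) and A = area2 / 2, so the divisor is 3 * area2
    return cx / (3 * area2), cy / (3 * area2)
```

For example, `polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)])` returns `(2.0, 1.0)`.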
In order to incorporate Fraser Institute Ranking scores and GVRD census data into points for each HSCA, the following steps were undertaken:
Fraser Institute Ranking data were joined to the point data for each high school. Next, the centroid of each EA was calculated using a Python script, and the EA census data were attached to that centroid. These EA points were then spatially joined to the HSCA in which they fell. Finally, the HSCA polygons were joined to the high school points, giving one data point per high school that carries both its Fraser Institute Ranking and the GVRD census data for its HSCA.
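The aggregation from EA centroids to catchments can be sketched as below. We assume each EA point has already been tagged with the catchment it fell in (a hypothetical `hsca_id` field) and carries a total population (`pop`) and one census count (`count`). Both field names are hypothetical placeholders for the real census variables. Counts are summed per catchment and converted to a percentage of population before being attached to the school point that holds the Fraser Institute Ranking.

```python
from collections import defaultdict


def hsca_census_summary(ea_points):
    """Sum EA census counts (hypothetical fields 'pop' and 'count') per HSCA."""
    sums = defaultdict(lambda: {"pop": 0, "count": 0})
    for ea in ea_points:
        s = sums[ea["hsca_id"]]
        s["pop"] += ea["pop"]
        s["count"] += ea["count"]
    return dict(sums)


def school_points(schools, summary):
    """Attach the catchment's census variable, as a percentage, to each school."""
    out = []
    for school in schools:
        record = dict(school)  # keeps e.g. a hypothetical 'fraser_rank' field
        s = summary.get(school["hsca_id"])
        if s and s["pop"] > 0:
            record["pct_count"] = 100.0 * s["count"] / s["pop"]
        else:
            record["pct_count"] = None  # no EA centroids fell in this catchment
        out.append(record)
    return out
```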
GWR3 is a software program that performs geographically weighted regression (GWR), allowing analysis of how the relationships between a dependent variable and its independent variables change across space. GWRs were run on the EDI and high school outcome data using GWR3. Six independent socio-economic variables were selected because previous research suggested they have the greatest influence on EDI and high school outcomes. These included:
Three additional variables were initially examined for both EDI and high school scores, but they were used in the final analysis only for high schools. They were retained for the GWR analysis of the high school outcome data because significant results were found only when these variables were included (see Limitations for further information). These included:
All values were expressed either as a percentage of the total population or, in the case of average income, divided by 1000, in an effort to standardize the regression values. As a result, all variables have a similar range, so the relative contribution of each variable to the predicted value can be compared.
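To show what GWR computes, the sketch below fits a separate weighted least-squares regression at each observation location. Observations closer to that location receive more weight. It is a simplified illustration, not GWR3's implementation. It assumes a single independent variable, a fixed Gaussian kernel, and a bandwidth chosen by hand. GWR software typically supports several predictors and can select the bandwidth automatically, for example by minimizing AICc or a cross-validation score.

```python
import math


def gaussian_weights(xi, yi, coords, bandwidth):
    """Fixed Gaussian kernel: w_j = exp(-0.5 * (d_ij / bandwidth) ** 2)."""
    return [math.exp(-0.5 * (math.hypot(x - xi, y - yi) / bandwidth) ** 2)
            for x, y in coords]


def local_fit(weights, xs, ys):
    """Weighted least squares for y = b0 + b1 * x."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def gwr(coords, xs, ys, bandwidth):
    """Return one local (intercept, slope) pair per observation location."""
    return [local_fit(gaussian_weights(xi, yi, coords, bandwidth), xs, ys)
            for xi, yi in coords]
```

With standardized inputs (for example, average income passed in as `income / 1000`), the local slopes for different variables fall on comparable scales. Mapping the slope at each location shows where a relationship is stronger or weaker.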
The following variables are part of the GWR3 output:
Akaike Information Criterion
The Akaike Information Criterion (AIC) is a statistic used to select the optimal model, balancing goodness of fit against model complexity. The GWR3 software outputs results from both a global and a local model, allowing the two to be compared. The model with the smaller AIC value is considered the better fit: it will better predict the value of the dependent variable from the independent variables provided. Typically, a decrease of 3 in the AIC is taken to be significant (http://www.uwm.edu/People/danlinyu/GISDay_GWR.ppt#289,31,Tests).
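The comparison rule above can be written as a tiny helper. It takes the two AIC values reported for the global and GWR models and applies the threshold of 3. The function name and return labels are hypothetical, and the AIC values in the example are made up for illustration.

```python
def compare_aic(aic_global, aic_gwr, threshold=3.0):
    """Prefer the model with the lower AIC, but only when the difference
    reaches the threshold; otherwise report no meaningful difference."""
    diff = aic_global - aic_gwr
    if abs(diff) < threshold:
        return "no meaningful difference"
    return "GWR" if diff > 0 else "global"
```

For instance, `compare_aic(1050.0, 1040.0)` returns `"GWR"`, while `compare_aic(1000.0, 999.0)` returns `"no meaningful difference"`.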
Coefficient of Determination
The coefficient of determination (R²) is calculated by comparing the values predicted by each model with the observed values at each data point. The R² of the GWR model is expected to be somewhat higher than that of the global model simply because the GWR model uses more degrees of freedom (it effectively fits more parameters).
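The calculation can be sketched as follows. An adjusted R² penalizes the extra parameters of the local model. Passing the GWR model's effective number of parameters as `n_params` (which may be fractional) is our assumption about how the two models would be put on an equal footing. It is not necessarily how GWR3 reports its value.

```python
def r_squared(observed, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, n_params):
    """Adjust R^2 for n observations and n_params fitted parameters
    (including the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
```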
Monte Carlo Test
The Monte Carlo test allows us to assess how "real" the observed effects of geography are. The scores are randomly reassigned to the data point locations 99 times for each variable, and a p-value is calculated. This value is the proportion of trials in which the randomized data showed at least as much spatial variation as the data at their true locations. A small p-value therefore indicates that geography "mattered". A p-value ≤ 0.05 is considered significant. The test on the intercept indicates whether the dependent variable varies geographically independently of the independent variables.
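The test can be sketched as a generic permutation test. It assumes, as described above, that observations are shuffled among locations and that a spatial-variation statistic is recomputed after each shuffle. The `statistic` callback is hypothetical. In GWR it would be the variance of a local parameter estimate across locations, but any function of (locations, data) works. Counting the observed arrangement in both numerator and denominator, as in (count + 1) / (trials + 1), is a common convention and may differ in detail from GWR3.

```python
import random


def monte_carlo_p_value(coords, data, statistic, n_trials=99, seed=None):
    """Permutation test for spatial variation.

    statistic(coords, data) returns a measure of spatial variation when
    observation i is placed at coords[i].
    """
    rng = random.Random(seed)
    observed = statistic(coords, data)
    at_least_as_large = 0
    shuffled = list(coords)
    for _ in range(n_trials):
        rng.shuffle(shuffled)  # reassign observations to random locations
        if statistic(shuffled, data) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_trials + 1)
```

With 99 trials the smallest attainable p-value is 0.01, so the 0.05 threshold can be reached when at most four randomized trials match or exceed the observed variation.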