
The purpose of this project is to use Exploratory Data Analysis (EDA) together with the choropleth technique to map the distribution of individuals without post-secondary education (NoEdr) in Northern Nova Scotia. The goal is to observe how EDA can help determine the most appropriate choropleth classification method for a particular dataset. The best classification method is chosen based on the frequency and area distributions produced by the various schemes.

Project Overview

// Choropleth Mapping

 

Exploratory Data Analysis

A Kolmogorov-Smirnov test for normality was run on an array of variables from the 2013 Canadian census dataset. As can be seen in Figure 1, only one of the variables, NoEdr, had a significance value greater than 0.05, so the null hypothesis of normality could not be rejected for it. The distribution of individuals without post-secondary education in Northern Nova Scotia is therefore consistent with a normal distribution, and this variable was selected for mapping with various choropleth methods.
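
The normality check can be sketched in plain Python. This is a minimal illustration, not the SPSS procedure used in the project: it compares the sorted values with a normal distribution whose mean and standard deviation are estimated from the sample, then reports the statistic D, the Kolmogorov-Smirnov Z (sqrt(n) * D) and the asymptotic two-tailed significance. Because the parameters come from the same data, this uncorrected significance is lenient; a Lilliefors correction would be stricter. The function names and the sample values are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(values, cdf):
    """One-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDF and cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_asymptotic_p(z, terms=100):
    """Asymptotic two-tailed significance for Kolmogorov-Smirnov Z = sqrt(n) * D."""
    if z < 0.2:
        return 1.0  # the series converges slowly here and the true value is essentially 1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z) for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


def ks_normality(values):
    """Return (D, Z, p) for a test against a normal distribution fitted to the sample."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    d = ks_statistic(values, lambda x: normal_cdf(x, mu, sigma))
    z = math.sqrt(n) * d
    return d, z, ks_asymptotic_p(z)


# Hypothetical usage: a variable is kept for mapping only if p > 0.05 (normality not rejected).
d, z, p = ks_normality([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
print(f"D = {d:.3f}, Z = {z:.3f}, p = {p:.3f}")
```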

A second Kolmogorov-Smirnov test was conducted, this time testing the same variables for uniformity. None of the variables fit a uniform distribution (every result was significant), so the variable with the lowest Kolmogorov-Smirnov Z value was treated as the ‘most uniform’. Interestingly, NoEdr was the most uniform of the tested variables. This may be because the other variables had linear or exponential distributions, which depart from uniformity more than a normally distributed dataset does, making NoEdr the ‘most uniform’ by comparison.
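
The uniformity check and the "lowest Z is most uniform" ranking can be sketched the same way. This assumes the uniform reference distribution spans each variable's observed minimum and maximum; the values below are made up, and the helper names are hypothetical.

```python
import math


def ks_statistic(values, cdf):
    """One-sample Kolmogorov-Smirnov D against a reference CDF."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def ks_uniform_z(values):
    """Kolmogorov-Smirnov Z = sqrt(n) * D against a uniform distribution on [min, max]."""
    lo, hi = min(values), max(values)
    d = ks_statistic(values, lambda x: (x - lo) / (hi - lo))
    return math.sqrt(len(values)) * d


def rank_by_uniformity(variables):
    """Order variable names from lowest Z ('most uniform') to highest."""
    return sorted(variables, key=lambda name: ks_uniform_z(variables[name]))


# Hypothetical values, not the census data.
variables = {
    "skewed": [1, 1, 1, 1, 2, 2, 3, 20],
    "evenly_spread": [1, 2, 3, 4, 5, 6, 7, 8],
}
print(rank_by_uniformity(variables))  # ['evenly_spread', 'skewed']
```

Even when every variable fails the test, Z still orders them by how far they stray from uniformity, which is the comparison the paragraph above relies on.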

Mapped: Four Classification Schemes

Equal Intervals

Standard Deviation

Natural Breaks

Quantiles

A bar chart, table, and maps showing the distribution of the chosen variable under each classification scheme were created (right). The Standard Deviation method indicated that six classes would be optimal, so six classes were used for every method. The total area occupied by the polygons in each class was then calculated and graphed.
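
The four schemes can be sketched as functions that each return the five interior breaks of a six-class map, plus a helper that sums polygon area per class. This is a minimal sketch, not the GIS software's implementation. Several pieces are assumptions: the standard-deviation layout (1 SD steps centered on the mean, with open-ended classes beyond 2 SD either side), quantile breaks from Python's `statistics.quantiles`, an exact Fisher-Jenks optimization standing in for Natural Breaks, and all function names.

```python
import bisect
import statistics

N_CLASSES = 6  # six classes were used for every scheme in this project


def equal_interval_breaks(values, k=N_CLASSES):
    """Split the data range into k classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k=N_CLASSES):
    """Put (roughly) the same number of observations in each of k classes."""
    return statistics.quantiles(values, n=k)


def std_dev_breaks(values):
    """Six classes from 1 SD steps: below -2 SD, -2 to -1, ..., above +2 SD."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [mean + sd * i for i in (-2, -1, 0, 1, 2)]


def natural_breaks(values, k=N_CLASSES):
    """Jenks natural breaks via the exact Fisher dynamic program.

    Minimizes the total within-class sum of squared deviations and returns
    the k - 1 interior breaks (each the lowest value of the next class).
    """
    xs = sorted(values)
    n = len(xs)
    s1, s2 = [0.0], [0.0]  # prefix sums make each class's squared deviation O(1)
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):  # sum of squared deviations of xs[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting xs[:j] into c classes; start[c][j]: where the last class begins
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], start[c][j] = candidate, i
    starts, j = [], n
    for c in range(k, 0, -1):  # walk back to recover where each class starts
        j = start[c][j]
        starts.append(j)
    return [xs[i] for i in sorted(starts)[1:]]


def area_by_class(polygons, breaks):
    """Total polygon area per class; polygons is a list of (value, area) pairs."""
    totals = [0.0] * (len(breaks) + 1)
    for value, area in polygons:
        # bisect_right places a value equal to a break in the upper class
        totals[bisect.bisect_right(breaks, value)] += area
    return totals
```

Passing any of the break lists to `area_by_class` yields six class totals of the kind graphed for each scheme.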

 

After the area distribution of the six classes across the study site was reviewed for all four choropleth methods, the Equal Interval method was judged to give the best representation of the data. Equal Interval classification produced the most nearly normal distribution of polygon area across the six classes: the extreme classes at both ends cover very little area compared with the classes closer to the center.
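
The judgment above was made by eye from the bar chart. If numbers are wanted, two hypothetical measures could be computed from each scheme's six class areas: the share of area in the two extreme classes, and whether the profile rises to a single peak and then falls. Neither measure comes from the original analysis, and the area profiles below are invented for illustration.

```python
def end_class_share(class_areas):
    """Fraction of the total mapped area that falls in the two extreme classes."""
    return (class_areas[0] + class_areas[-1]) / sum(class_areas)


def is_unimodal(class_areas):
    """True if class areas rise (weakly) to a single peak and then fall (weakly)."""
    peak = class_areas.index(max(class_areas))
    rising = all(a <= b for a, b in zip(class_areas[:peak], class_areas[1:peak + 1]))
    falling = all(a >= b for a, b in zip(class_areas[peak:], class_areas[peak + 1:]))
    return rising and falling


# Made-up six-class area profiles, not the project's results.
bell_shaped = [5.0, 20.0, 40.0, 35.0, 15.0, 5.0]
linear = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
for name, areas in (("bell-shaped", bell_shaped), ("linear", linear)):
    print(name, round(end_class_share(areas), 3), is_unimodal(areas))
```

A steadily rising (linear) profile also passes the single-peak check, because its peak is simply the last class, so the end-class share is what separates it from a bell shape.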

 

Although the other classification methods produced somewhat normal distributions, none came close enough to be as effective as Equal Interval. The Quantile method produced a linear distribution, while Standard Deviation and Natural Breaks each had a small extreme class at only one end of the spectrum. Their accompanying maps show that too much area has been classified into the extreme classes (red and green), with too little area in the average classes (yellow). Only the Equal Interval method assigned relatively few values to the extreme classes at both ends, and it therefore reflects the nature of the normally distributed data much more aptly.

A correlation matrix of the aggregated areas and frequencies from each class was created in SPSS after classification. The table to the left displays the results of comparing the aggregated areas from each classification method. Because SPSS did not flag any of the correlations as significant, the strongest relationship was used to determine the second-best classification method. Although Natural Breaks was not quite normally distributed, it came the closest, with a Pearson correlation of 0.767 against Equal Interval. This indicates that the Natural Breaks method was the most similar to the best method (Equal Interval).
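
The lack of significant correlations is what one would expect from so few data points. Assuming each correlation was computed over the six class totals (n = 6, so 4 degrees of freedom), the standard t test for a Pearson coefficient puts both reported coefficients below the two-tailed 0.05 critical value of about 2.776. The sketch shows the Pearson calculation and that test; the helper names are hypothetical, and SPSS's own output remains the authority.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences (e.g. class areas)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


T_CRIT_DF4 = 2.776  # two-tailed 0.05 critical value of Student's t with 4 degrees of freedom

for label, r in (("Natural Breaks vs Equal Interval", 0.767), ("Quantile vs Equal Interval", -0.298)):
    t = t_for_r(r, n=6)  # n = 6 assumes one value per class
    print(f"{label}: r = {r:+.3f}, t = {t:+.2f}, significant = {abs(t) > T_CRIT_DF4}")
```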

 

In comparison, the method least similar to Equal Interval was Quantile classification. As mentioned earlier, the Quantile method produced a linear distribution, as the bar chart and table above show. The Pearson correlation between Equal Interval and Quantile was -0.298; although this is a weak relationship, it was the lowest correlation with Equal Interval, marking the greatest dissimilarity between the best method and the worst method.
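
Squaring the two coefficients reported above gives the share of class-area variance each pair of schemes has in common (EI = Equal Interval, NB = Natural Breaks, Q = Quantile), which shows how modest the Equal Interval and Quantile relationship is:

```latex
r_{\mathrm{EI,NB}}^{2} = 0.767^{2} \approx 0.588,
\qquad
r_{\mathrm{EI,Q}}^{2} = (-0.298)^{2} \approx 0.089
```

Natural Breaks therefore shares about 59% of its class-area variance with Equal Interval, while Quantile shares only about 9%.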

Sean Thibert, 2015

Map Datum: WGS 1984     Data Source: 2013 Canadian Census (Esri Canada Business Analyst)

Disclaimer: All maps were created for a student project and are for educational purposes only.
