
 

Matching Census Data to Postal Codes using SPSS

The Canadian Census does not use postal codes as a geographic unit. Researchers who want to match census data to postal codes need to use the Postal Code Conversion File (PCCF). The PCCF links postal codes to census geography, which makes it possible to combine census data and postal codes in one file (see figure 1).

 

The most appropriate levels of census geographies to link to postal codes are dissemination areas [1] (DAs), which cover all of Canada, and census tracts (CTs), which occur in urban areas only.

Figure 1: An open SPSS table showing merged data files. It contains a postal code variable from the Postal Code Conversion File and two income variables from the census.

This guide shows you how to match census data to postal codes, and how to merge them in an SPSS file. We will select income variables from the 2006 Census and PCCF records for Toronto.

Summary of Steps:

  1. Obtain the census data from CHASS Census Analyser and save it as a .sav file
  2. Download postal code data from the Map and Data Library website
  3. Process and subset the raw postal code data file
  4. Merge the two files in SPSS

 

Step 1) Obtain the census data from CHASS Census Analyser and save it as a .sav file

Go to the Data Library homepage: http://data.library.utoronto.ca/ and click the link CHASS Census Analyzer.

An open browser window containing Map and Data Library, data homepage. To access the census data click CHASS Census Analyzer in the Statistics box in the center of the page.

 

To access census data at the DA (dissemination area) level, click on the link Enumeration area/ Dissemination area.

An open browser window containing the CHASS Census Analyzer. Click the "Enumeration area/Dissemination area" link located under "Census Profiles Tables".

 

CHASS has grouped the census variables under subheadings to make them easier to select. Since we will be selecting income variables from the 2006 census, we will click on Income and earnings and housing and shelter costs.

A browser window with the Profiles of Dissemination Area page. Click "Income and earnings and housing and shelter costs" located under 2006 cumulative.

 

You can now select the census data and location you are interested in.

In the Census Division window, select Toronto. If you want to add other census divisions, hold down the Ctrl key and make further selections.

 

A browser window with the subsetting selections for Income and earnings and housing and shelter costs and the 2006 census. For census divisions we selected Toronto.

 

More Subsetting categories. Under "Census Category" we selected: 1. Total income in 2005 of pop. 15 yrs and over. 2. Median Income $. 3. Average Income $. Under "Include with Result" make sure you select: "DAuid". This variable is needed to merge the files.

 

 

 

A browser window showing more subsetting options. Under "Select the output format" choose "SPSS". To create the file layout, click the "Submit Query" button located at the bottom of the page.

 

Once you have submitted your query, the data will be displayed in your browser:

 

A browser window with the results of our query. If you scroll to the bottom, you will find the variable labels you will need when formatting your SPSS file.

It is now time to download the data. Please note the instructions below work in the Firefox and Chrome browsers.

The same output as above. To save it as a file right click the page and click "Save As...".

 

The "Save As" dialog box. Ensure that the file name ends with the extension ".sav". This is the extension used for SPSS data. To save the file, click the "Save" button.

Now that you have downloaded your census data and saved it as a .sav file, you need to open it in SPSS and make minor adjustments to it. Keep the browser window open so you can check which data is represented in which column.

Go to the location where you saved your file and double-click it. Be patient, as SPSS can take a few minutes to load. Once it is ready, you will see the following Text Import Wizard window:

 

Step 1 of the "Text Import Wizard" that opens when you double-click the SPSS file. Click "Next" for the next step.

Step 2 of the Text Import Wizard. Click the "Next" button to move to the next step.

Step 3 of the Text Import Wizard. This step is important: you need to tell SPSS which row the data starts on. You can get this information from the "Data Preview" pane at the bottom of the window. Enter the number in the box following the question "The first case of data begins on which line number?". Click the "Next" button to move to the next step.

Step 4 of the Text Import Wizard. Click the "Next" button to move to the next step.

Step 5 of the Text Import Wizard. In this step you will enter the variable names provided with the output in the CHASS browser window. Click the variable you are changing in the "Data preview" pane and change the name in the box labelled "Variable name:". Click the "Next" button to move to the next step.

Step 6 of the Text Import Wizard. Click the "Finish" button and your data will be displayed.
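
If you prefer syntax to the wizard, the same import can be sketched with the GET DATA command. This is only an illustration under stated assumptions, not the syntax the wizard or CHASS generates: the file path, the comma delimiter, the first data line (2), and the variable names totinc, medinc and avginc are all hypothetical. Check each of them against your own download and the variable labels shown in the CHASS browser window before running it.

```spss
* Sketch only: path, delimiter, first data line and variable names are assumptions.
GET DATA
  /TYPE=TXT
  /FILE="C:\census\toronto_income.sav"
  /ARRANGEMENT=DELIMITED
  /DELIMITERS=","
  /FIRSTCASE=2
  /VARIABLES=
    DAuid F8.0
    totinc F12.0
    medinc F12.0
    avginc F12.0.
EXECUTE.
```

The /FIRSTCASE value plays the same role as the line number entered in step 3 of the wizard, and the /VARIABLES list replaces the renaming done in step 5.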

                                                                                             

An SPSS data table showing the Census data we downloaded from CHASS. There are empty variable columns. These can be cleaned up in variable view. Click the "Variable View" tab located at the bottom of the screen.

 

The Variable View. Each variable is represented by a row. Click the row number of each variable you want to remove, using Ctrl + click to select more than one variable. Remove the selected variables by right-clicking and then selecting "Clear".
 

Click "File" in the top menu and then "Save As..." to save your file.
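
The same clean-up can be done in a syntax window. The lines below are a minimal sketch: V5 and V6 are placeholder names standing in for whatever empty columns appear in your own file, and the output path is hypothetical.

```spss
* Sketch only: V5 and V6 stand in for the empty columns in your own file.
DELETE VARIABLES V5 V6.
* Save the cleaned census file. The path is a placeholder.
SAVE OUTFILE="C:\census\toronto_income_clean.sav".
```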

 

 

Step 2) Download postal code data from the Map and Data Library website

It is now time to get the Postal Code Conversion Files (PCCF).  Go back to the Data Library Homepage: http://data.library.utoronto.ca/

A browser window containing the Map and Data Library, Data Home Page. Find the PCCF file by typing "pccf" in the search box on the right side of the screen and clicking the "Search" button.

 

Click the "Postal code conversion file (PCCF)" link provided on the following search results page.

 

On the "Postal code conversion file (PCCF)" page, scroll down to find the PCCF edition that goes with your data. You may review the reference guides to determine the ideal combination. In this example we are using data from the 2006 census, so we will select the most recent PCCF edition that uses 2006 census geography.

Scroll down on the Postal Code Conversion File (PCCF) page. The links to the data will be provided in a table.

 

You will be asked to log on with your UTORid.

To access this data you will need to provide your UTORid. To do this click the link "log in with your UTORid via the University of Toronto Web Login".

Save the file onto your computer. The file will be zipped and you will need to extract it before use.

Once you have saved the file onto your computer, you will need to extract it before use. Find the file and right-click it. Then click "Extract All..." in the resulting menu.

A window called Extract Compressed (Zipped) Folders will open. Click the "Extract" button on the bottom of the window and a folder containing the PCCF will be created.

The extracted folder should contain a raw data file with a .txt extension and an SPSS syntax file with a .sps extension. There may also be other files, which could include a formatted SPSS data file with a .sav extension.

A window containing the extracted syntax and data files.

These files are ready to be processed and used.

 

Step 3) Process and subset the raw postal code data file

The purpose of this step is to get rid of variables and observations that will not be useful for your analysis. For example, if you have data for Toronto you may not want geographic records for the rest of Canada. The postal code data also contains postal codes that have been retired. You can create a best match version with only active postal codes. If you want the complete data file, including all geographic variables and retired postal codes, you may use the already formatted SPSS data file with the .sav extension and skip this step.

Double click the SPSS syntax file to open it.

A window with the SPSS syntax file displayed.

 

Once the file is open in SPSS, you will need to insert the path to the location where you saved the raw data file. This tells SPSS where to find the data that you are trying to format.

The SPSS syntax file will open. Find a line near the top beginning with "FILE HANDLE DATA /NAME=", which will be followed by a file path in quotation marks. Replace that path with the path to the text data file. Make sure to include the file name and extension, and keep everything within the quotation marks.

You will also need to tell SPSS where to save the formatted data file. Scroll to the bottom of the screen.

Scroll to the bottom of the syntax file. Find a line beginning with "SAVE Outfile=" followed by a file path within quotation marks. Replace the existing path with the location where you want to save your formatted SPSS file. Remember to include the file name you want to give your file, with a .sav extension, inside the quotation marks.
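
As a rough illustration, the two edited lines might look like the following. The folder and file names are placeholders chosen for this example, not the actual names in the PCCF download, and everything between the two commands in the supplied syntax file is left untouched.

```spss
* Near the top of the syntax file: point the DATA handle at the extracted raw text file.
* The folder and file name are placeholders, not the real PCCF file name.
FILE HANDLE DATA /NAME="C:\pccf\pccf_raw.txt".

* The data definition commands supplied with the PCCF stay unchanged between these two lines.

* At the bottom of the syntax file: where the formatted file will be written.
SAVE OUTFILE="C:\pccf\pccf_formatted.sav".
```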

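Now we will subset the data file so that it contains only the observations that match the geography you need, and create a 'Best Match' file with only active postal codes. For this process we will use the 'Select if' command. In the example below we inserted the line: Select if (cduid=3520 and sli=1). Here cduid=3520 selects the Toronto census division, and sli=1 keeps the single record flagged as the best link for each postal code.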

The syntax file with the "Select if" subsetting argument inputted before the line beginning with "SAVE OUTFILE".
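
Written out as it would appear in the syntax file, the subsetting line is shown below. The commented variant for more than one census division is a hypothetical extension: the second code (3519) is included only for illustration, so substitute the divisions you actually need.

```spss
* Keep only records in census division 3520 that are flagged as the single best link.
* Place this after the data definition commands and before SAVE OUTFILE.
SELECT IF (cduid = 3520 AND sli = 1).

* Hypothetical variant for several census divisions. Use it instead of the line above, not in addition.
* SELECT IF (ANY(cduid, 3520, 3519) AND sli = 1).
```

Because SELECT IF permanently drops the cases that fail the condition, only one of the two versions should be active when the file is run.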

 

Now we will tell SPSS to keep only the variables we plan to use. For this we are using the 'KEEP' command. For a description of variable names used in the PCCF please see the Reference Guide. In the example below we inserted the line: /KEEP postcode cduid csdname dauid sli.

The syntax file with the "/KEEP" argument inputted directly under the line beginning with SAVE OUTFILE. Make sure to remove the period from the SAVE OUTFILE line and put it at the end of the /KEEP line.
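
Put together, the end of the syntax file would look something like this sketch. The output path is a placeholder; the variable list is the one used in this example.

```spss
* The period moves from the SAVE OUTFILE line to the end of the /KEEP line.
* The output path is a placeholder.
SAVE OUTFILE="C:\pccf\pccf_toronto.sav"
  /KEEP postcode cduid csdname dauid sli.
```

SPSS treats the period as the end of a command, so leaving it on the SAVE OUTFILE line would cut the command short before /KEEP is read.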

We will now run the syntax file.

To run the syntax file click Run in the top menu and then click "All" from the drop down menu.

 

A formatted .sav file will be created at the location and with the name you specified. Open the file to check that your subsetting worked.

An image of the formatted data file open in SPSS. Check your variables and observations to make sure that the file contains what you need.
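
One way to check the result without scrolling through the data is a short set of syntax commands. This is a sketch that assumes the subset was saved to the placeholder path shown and that the variables kept are the ones used in this example.

```spss
* Open the formatted file. The path is a placeholder.
GET FILE="C:\pccf\pccf_toronto.sav".
* Every record should show cduid 3520 and sli 1.
FREQUENCIES VARIABLES=cduid sli.
* List the variables that were kept.
DISPLAY DICTIONARY.
```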

 

Step 4) Merge the two files in SPSS

With the PCCF file open in SPSS, click Data -> Merge Files -> Add Variables.

The PCCF file opened in SPSS with the cursor selecting "data" then "Merge Files" then "Add Variables".

 

Use the browse feature to locate the census data file you are merging with the PCCF and click "Continue".

The add variables window.

The next step of the "Add Variables" window. In this window you will assign the DAuid variable as the key variable used to match the two datasets. To do this, follow instructions A, B and C:

  A) Click DAuid, located under "Excluded Variables".
  B) Click the check box "Match cases on key variables". Make sure the button for "Non-active dataset is keyed table" is selected.
  C) Click the arrow to transfer DAuid to the box "Key Variables:".

Click OK and your merged file will be displayed: 

Click the "OK" button and the merged files will be displayed.
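
The dialog choices above correspond roughly to a MATCH FILES command in which the census file is the keyed table. The sketch below is not the syntax SPSS pastes from the dialog: all file paths are placeholders, and it assumes the key variable has the same name and type (for example, both numeric) in the two files.

```spss
* Sort the census file by the key and save it. All paths are placeholders.
GET FILE="C:\census\toronto_income_clean.sav".
SORT CASES BY DAuid (A).
SAVE OUTFILE="C:\census\toronto_income_sorted.sav".

* Open the PCCF subset, sort it by the same key, then look up census values for each postal code.
GET FILE="C:\pccf\pccf_toronto.sav".
SORT CASES BY dauid (A).
MATCH FILES
  /FILE=*
  /TABLE="C:\census\toronto_income_sorted.sav"
  /BY dauid.
EXECUTE.
SAVE OUTFILE="C:\merged\postcode_income.sav".
```

Using /TABLE for the census file is what allows one DA row to be matched to many postal code rows. SPSS variable names are not case sensitive, so DAuid and dauid refer to the same key.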

 

 

The merged datasets. Check the file to confirm that the variables from each dataset are represented. The original census dataset contained one row for each Dissemination Area (DA). As a DA usually contains more than one postal code, these rows are duplicated when the files are merged.
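
To see how much duplication the merge introduced, you can count the postal codes attached to each DA. The following is a small sketch run on the merged file; n_postcodes is a new variable name made up for this example.

```spss
* Add a count of postal codes per DA to every row of the merged file.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=dauid
  /n_postcodes=N.
* DAs with n_postcodes above 1 appear on several rows, once per postal code.
FREQUENCIES VARIABLES=n_postcodes.
```

Because the census values repeat on every postal code row, summing an income column across rows would count the same DA more than once.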

 

 

--------------------------------------------------------------------------------

[1] Prior to 2001, DAs were called enumeration areas.