Proc Import [3-7]

Importing a CSV File

Proc Import can also be used to import a CSV file.

Let's look at the CSV_Practice_01.csv file.

This is a very simple CSV file with just three columns:

  • Date
  • Alert Count
  • Comments

It is not difficult to import this file. The code is below:

Importing an Excel spreadsheet is similar to importing a CSV file; there are only two things you need to change:

  1. The file name and the extension (.csv)
  2. The DBMS option is specified as CSV

Run the code in SAS Studio, and you will have imported the file:

GUESSINGROWS Statement

When importing a CSV file, SAS uses the first 20 observations to determine the attributes of the variables.

In our previous example, the first 20 observations of the alert count column were numeric values. 

Therefore, SAS assumes the rest of the columns are also numeric. 

This could cause issues when later on there are values that are not numeric.

Let's look at the CSV_Practice_02.csv file:

On row 22, the alert count has a character value 'unknown' in the field.

We also have a longer comment in the Comments column (i.e. 'Further investigation required' vs 'No comments').

Let's import the file using the code we have learned so far:

You will see the following note on the SAS log along with the error message below:

NOTE: Invalid data for ALERT_COUNT in line 23 12-18.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
23 2018-05-27,Unknown,Further investigation required 49
DATE=2018-05-27 ALERT_COUNT=. Comments=Further inv _ERROR_=1 _N_=22


ERROR: Import unsuccessful. See SAS Log for details.


This indicates a data issue.

The file was imported, however, the alert count for that particular row is now missing and the comment gets truncated:

As mentioned, SAS uses the first 20 rows to determine the attributes of the variables.

In the first 20 rows, all of the values for alert count are numeric.

Also, the comment in the first 20 rows (i.e. 'No comments') has just 11 characters.

It assumes these are true for the rest of the observations.

As a result, one particular row cannot be captured properly because the alert count is a character value (i.e. 'unknown').

The row where the comment is longer than 11 characters is also truncated.


Fortunately, the issue can be resolved by adding the GUESSINGROWS statement.

Let's look at the example below:

The GUESSINGROWS statement specifies the number of rows to use to "guess" the attributes of the variables.

In this example, we tell SAS to use the first 50 observations to determine the attributes of the variables.

The file is imported properly now:


Exercise

Use Proc Import to import the CSV_Practice_03.csv.

Get Hint

Get Solution