Importing a CSV File
Proc Import can also be used to import a CSV file.
Let's look at the csv_practice_01.csv file.
This is a very simple CSV file with just three columns:
It is not difficult to import this file. The code is shown below:
Filename csv_fl1 '<your file location>/csv_practice_01.csv'; proc import datafile=csv_fl1 out = csv_fl1 dbms = csv replace; run;
Note: you must change the location path in the code before you run it.
Importing a CSV file is similar to importing an Excel spreadsheet. There are only two things you need to change:
Run the code in SAS Studio, and you will have imported the file:
When importing a CSV file, SAS uses the first 20 observations to determine the attributes (data type and the length) of the variables.
In our previous example, the first 20 observations of the Alert Count column were numeric values.
As a result, SAS assumes that all other values in this column are also numeric.
This could cause issues later if there are values in this column that are not numeric.
Let's look at the csv_practice_02.csv file:
On row 22, the Alert Count field has an 'unknown' character value.
We also have a longer comment in the Comments column (i.e., 'Further investigation required' vs 'No comments').
Let's import the file using the code we have learned so far.
Filename csv_fl2 '/home/kisumsam/proc_import/csv_practice_02.csv'; proc import datafile=csv_fl2 out = csv_fl2 dbms = csv replace; run;
You will see the following note on the SAS log along with the error message:
NOTE: Invalid data for ALERT_COUNT in line 23 12-18.
23 2018-05-27,Unknown,Further investigation required 49
DATE=2018-05-27 ALERT_COUNT=. Comments=Further inv _ERROR_=1 _N_=22
ERROR: Import unsuccessful. See SAS Log for details.
This indicates a data issue.
The file was imported, however, the Alert Count for that particular row is now missing and the comment will be truncated:
As mentioned, SAS uses the first 20 rows to determine the data type and the length of the column variable.
In the first 20 rows, all of the values for Alert Count are numeric.
Also, the comment in the first 20 rows (i.e., 'No comments') has just 11 characters.
It assumes these are true for the remaining observations.
As a result, one particular row cannot be captured properly because the Alert Count is a character value (i.e., 'unknown').
The row that contains the comment that is longer than 11 characters is also truncated.
Fortunately, the issue can be resolved by adding the GUESSINGROWS statement.
Let's look at the example below.
Filename csv_fl2 '/home/kisumsam/proc_import/csv_practice_02.csv'; proc import datafile=csv_fl2 out = csv_fl2 dbms = csv replace; guessingrows = 50; run;
The GUESSINGROWS statement specifies the number of rows to use to "guess" the data type and length of the variables.
In this example, we tell SAS to use the first 50 observations to determine the data type and length of the variables.
The file is imported properly now:
Programming Tip 2
You can ensure all of the values are imported properly by specifying the option GUESSINGROWS=MAX.
This will tell SAS to use all of the observations to determine the data type and length of the variables.
However, this could potentially affect the speed of the data import.
Use Proc Import to import the csv_practice_03.csv.
The name of the variable with the longest length appears in row 29 in the CSV file (including the variable name in the first row).
** Remember to change the path in the FILENAME statement **;
Filename csv_fl3 '/home/your_user_name/proc_import/csv_practice_03.csv';
proc import datafile=csv_fl3
out = csv_fl3
dbms = csv
guessingrows = max;