Proc Import Options and Statements
Proc Import is a powerful tool to import external data files.
It has a number of options and statements that allow flexibility when reading files.
In this section, we will look at:
The REPLACE option tells SAS to overwrite the existing SAS data set if it already exists.
Let's look at an example.
In the previous section, you ran the code below and created the IMPORT data set.
PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.IMPORT; GETNAMES=YES; RUN;
Now, run exactly the same code to import the file one more time.
You will see the following message in the SAS log:
NOTE: Import cancelled. Output dataset WORK.IMPORT already exists. Specify REPLACE option to overwrite it. NOTE: The SAS System stopped processing this step because of errors.
SAS, by default, does not allow you to overwrite an existing SAS data set when using Proc Import.
The IMPORT data set was created the first time you ran the code.
SAS does not allow you to overwrite this data set, unless the REPLACE option is specified.
PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.IMPORT REPLACE; GETNAMES=YES; RUN;
The code above will allow you to overwrite the IMPORT data set even if it already exists.
By default, SAS reads the first worksheet in the Excel file.
The SHEET statement allows you to select another worksheet to import.
Let's look at the excel_practice_01.xlsx spreadsheet again.
The Excel spreadsheet has two worksheets:
Segment 2 is another segment. It contains records from ID 41 to 80 (i.e., ID00041 to ID00080).
We will simply add the SHEET statement to import the "Segment 2" worksheet.
PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.IMPORT REPLACE; GETNAMES=YES; SHEET = 'Segment 2'; RUN;
The Proc Import procedure above imports the "Segment 2" worksheet, instead of "Segment 1".
The SHEET statement is a separate statement from the Proc Import statement.
It is a common mistake to treat the SHEET statement as one of the Proc Import options (such as OUT, DBMS or REPLACE).
Doing so will generate an error message in the SAS log.
By default, Proc Import treats the first row of data as the variable names.
For example, in the excel_practice_01.xlsx file, the variable names are captured in the first row of data:
When importing the file, the first rows of data become the variable names:
Sometimes, you will encounter files that have no headers in the first row.
Instead, the first row contains the actual data.
Let's look at the excel_practice_03.xlsx file:
In Excel_Practice_03.xlsx, there are no variable names in the first row.
We will read the data starting from the first row, instead of the second.
This can be done by adding the option GETNAMES=No.
Let's look at an example:
FILENAME ExPrac03 '<your file location>/excel_practice_03.xlsx'; PROC IMPORT DATAFILE=ExPrac03 DBMS=XLSX OUT=Practice_03 REPLACE; GETNAMES=NO; RUN;
Note: you must change the file location in the code before you run it.
The "GETNAMES=NO" statement tells SAS to NOT use the first row of data as the variable names.
The data set created uses generic columns A, B and C instead, which is correct:
Use Proc Import to import the second worksheet in excel_practice_02.xlsx.
You can use the SHEET statement to import the second worksheet in the Excel spreadsheet.
** Remember to change the path in the FILENAME statement **;
filename PracFle2 '/home/your_user_name/proc_import/excel_practice_02.xlsx';
proc import datafile=PracFle2
sheet = 'Crime 2';