Proc Import [2-7]

h

Proc Import Options and Statements

Proc Import is a powerful tool to import external data files.

It has a number of options and statements that allow flexibility when reading files.

In this section, we will look at:

  • The REPLACE option
  • The SHEET statement
  • The GETNAMES statement.

REPLACE option

The REPLACE option tells SAS to overwrite the existing SAS data set if it already exists.

Let's look at an example.

In the previous section, you ran the code below and created the IMPORT data set.

PROC IMPORT DATAFILE=REFFILE
	DBMS=XLSX
	OUT=WORK.IMPORT;
	GETNAMES=YES;
RUN;

Now, run exactly the same code to import the file one more time.

You will see the following message in the SAS log:

NOTE: Import cancelled. Output dataset WORK.IMPORT already exists. Specify REPLACE option to overwrite it. NOTE: The SAS System stopped processing this step because of errors.

SAS, by default, does not allow you to overwrite an existing SAS data set when using Proc Import.

The IMPORT data set was created the first time you ran the code.

SAS does not allow you to overwrite this data set, unless the REPLACE option is specified.

Example

PROC IMPORT DATAFILE=REFFILE
	DBMS=XLSX
	OUT=WORK.IMPORT
	REPLACE;
	GETNAMES=YES;
RUN;

The code above will allow you to overwrite the IMPORT data set even if it already exists.


SHEET Statement

By default, SAS reads the first worksheet in the Excel file.

The SHEET statement allows you to select another worksheet to import.

Let's look at the excel_practice_01.xlsx spreadsheet again.

The Excel spreadsheet has two worksheets:

  • Segment 1 and
  • Segment 2

Segment 2 is another segment. It contains records from ID 41 to 80 (i.e., ID00041 to ID00080).

We will simply add the SHEET statement to import the "Segment 2" worksheet.

PROC IMPORT DATAFILE=REFFILE
	DBMS=XLSX
	OUT=WORK.IMPORT
	REPLACE;
	GETNAMES=YES;
	SHEET = 'Segment 2';
RUN;

The Proc Import procedure above imports the "Segment 2" worksheet, instead of "Segment 1".

💡 Important

The SHEET statement is a separate statement from the Proc Import statement.​

It is a common mistake to treat the SHEET statement as one of the Proc Import options (such as OUT, DBMS or REPLACE).

Doing so will generate an error message in the SAS log.


GETNAMES Statement

By default, Proc Import treats the first row of data as the variable names.

For example, in the excel_practice_01.xlsx file, the variable names are captured in the first row of data:

When importing the file, the first rows of data become the variable names:

Sometimes, you will encounter files that have no headers in the first row.

Instead, the first row contains the actual data.

Let's look at the excel_practice_03.xlsx file:

In Excel_Practice_03.xlsx, there are no variable names in the first row.

We will read the data starting from the first row, instead of the second.

This can be done by adding the option GETNAMES=No.

​Let's look at an example:

FILENAME ExPrac03 '<your file location>/excel_practice_03.xlsx';

PROC IMPORT DATAFILE=ExPrac03
	DBMS=XLSX
	OUT=Practice_03
	REPLACE;
	GETNAMES=NO;
RUN;

Note: you must change the file location in the code before you run it.

The "GETNAMES=NO" statement tells SAS to NOT use the first row of data as the variable names.

The data set created uses generic columns A, B and C instead, which is correct:


Exercise

Use Proc Import to import the second worksheet in excel_practice_02.xlsx.

Get Hint

Get Solution