Proc Import [2-7]

Proc Import Options and Statements

Proc Import is a powerful tool to import external data files.

It has a number of options and statements that allow flexibility when reading files.

In this section, we will look at:

  • The REPLACE option
  • The SHEET statement
  • The GETNAMES statement.

REPLACE Option

In the previous section, we have the REPLACE option in the Proc Import procedure.

The REPLACE option tells SAS to replace the existing output data set if it already exists.

The EX1 data set was created in the previous section (please create it now if you haven't already done so).

We will run the code again without the REPLACE option:

You will see the following error on the SAS log:

NOTE: Import cancelled. Output dataset WORK.EX1 already exists. Specify REPLACE option to overwrite it.
NOTE: The SAS System stopped processing this step because of errors.

The import was cancelled due to the fact that the EX1 data set already exists.

It is recommended to use the REPLACE option unless the file that already exists contains data that is still required.


SHEET Statement

By default, SAS reads the first worksheet in the Excel file.

The SHEET statement allows you to select another worksheet to import.

Let's look at the Excel_Practice_01.xlsx again.

The Excel spreadsheet has two worksheets:

  • Segment 1 and
  • Segment 2

The segment 2 is another segment. It contains records from ID 41 to 80 (i.e. ID00041 to ID00080).

We will simply add the SHEET statement to import the "Segment 2" worksheet.

The Proc Import procedure above imports the "Segment 2" worksheet instead of "Segment 1".

GETNAMES Statement

By default, Proc Import treats the first row of data as the variable names.

For example, in the Excel_Practice_01.xlsx file, the variable names are captured in the first row of data:

When importing the file, the first rows of data become the variable names:

Sometimes, you will encounter files that have no headers in the first row.

Instead, the first row contains the actual data.

Let's look at the Excel_Practice_03.xlsx file:

In the Excel_Practice_03.xlsx, there are no variable names in the first row.

We will read the data starting from the first row, instead of the second.

This can be done by adding an option called GETNAMES.

​Let's look at an example:

The "GETNAMES=NO" statement tells SAS to NOT use the first row of data as the variable names.

The data set created uses generic columns A, B and C instead, which is correct:


Exercise

Use Proc Import to import the second worksheet in Excel_Practice_02.xlsx.

Get Hint

Get Solution