Master SAS in 30 days!

A jargon-free, easy-to-learn SAS base course that is tailor-made for students with no prior knowledge of SAS.

How to Import Text Files into SAS

Text files are a common file format to use when importing or exporting data from one data source for another. When importing text files from other data sources or databases, there are many variations in the data structure and delimiters that one can come across.
 
This article aims to address some of the more common challenges that arise when attempting to import different variations of text files into SAS. A few different tips and methods are also provided along the way.
 
Topics covered include:

  1. Importing tab-delimited text files with PROC IMPORT
  2. Importing special character delimited text files with PROC IMPORT
  3. Importing space-delimited text files with PROC IMPORT
  4. Using PROC IMPORT to Generating Data Step code for importing text files

Software

Before we continue, make sure you have access to SAS Studio. It's free!

Data Sets

The examples used in this article are based on the text files listed below. The text files are derived from the SASHELP datasets including CARS and ORSALES datasets:

  1. Cars_tab.txt - download
  2. Cars_pipe.txt - download
  3. Orsales_space.txt - download

Before running any of the examples below, you will need to replace the path ‘/home/your_username/SASCrunch’ with a directory that you have read/write access to in your environment.

This can vary depending on the machine you are running SAS on, the version of SAS you are running and the Operating System (OS) you are using. 

If you are using SAS OnDemand for Academics, you must first upload the files to the SAS server.

Please visit here for instruction if you are not sure how to do this.

1. Importing a Tab-delimited Text File with PROC IMPORT

With a tab-delimited text file, the variables (columns) are separated by a tab and the files typically end with a “.txt” extension.

In this example, the input file is the cars_tab.txt file. This is a text file based on the SASHELP.CARS dataset.

The first part you need following the PROC IMPORT statement is the datafile argument. The datafile argument is required so that SAS knows where the file you would like to import is stored and what the name of that file is. Inside the quotation marks following the datafile argument, you need to add the complete path, including the filename and file extension. Be sure to replace ‘/home/your_username/SASCrunch’ with the correct directory on your machine or environment where cars_tab.txt is saved. In this example, “/home/your_username/SASCrunch” is the path, “cars_tab” is the filename, and “.txt” is the file extension.

To import tab-delimited text files, both the DBMS and DELIMITER options will need to be used. The DBMS value used for this case is DLM. The DLM value tells SAS that you would like to specify a custom delimiter for the dataset.

After closing off the PROC IMPORT statement with a semi-colon, a second option, DELIMITER is added. The value of DELIMITER for a tab-delimited file is ‘09’x, which is the hexadecimal representation of a TAB on an ASCII platform.

Finally, the replace option is included to allow for multiple re-runs and overwrites of the CARS_TAB dataset in WORK. If you prefer not to overwrite the newly imported SAS dataset, you can simply remove the replace option.

Using these parameters, the following code will import the tab-delimited cars_tab.txt file and output a SAS dataset in WORK called CARS_TAB:

proc import datafile = '/home/your_username/SASCrunch/cars_tab.txt'
 out = cars_tab
 dbms = dlm
 replace;
 delimiter = '09'x;
run;

After running the above code, you will notice something is a bit off with the output dataset:

If you were to open up the cars_tab.txt file directly using Notepad, Wordpad, TextEdit or similar on your computer, you would notice that this file has a extra row of invalid data in it. This type of situation often occurs when the text file is created from another data source.
 
Fortunately, SAS provides an option that you can add to your PROC IMPORT statement to skip this extra line of data that you don’t need. By adding the datarow option, you can let SAS know at which row the data (observations) start. In this case, we know that the first row has the headings, the second row has no data, and the observations start on the third row, so we set datarow = 3:

proc import datafile = '/home/your_username/SASCrunch/cars_tab.txt'
 out = cars_tab
 dbms = dlm
 replace
 ;
 delimiter = '09'x;
 datarow = 3;
run;

In the output data shown partially below, you will see that extra row has now been removed:

Do you have a hard time learning SAS?

Take our Practical SAS Training Course for Absolute Beginners and learn how to write your first SAS program!

2. Importing Text Files Delimited with Special Characters

Since text files can contain any number of special characters as delimiters, the DELIMITER statement be used with just about any keyboard character.
 
For example, if the values of a text file are delimited with the pipe bar “|”, you can simply specify the pipe bar symbol in the DELIMITER statement, similar to how we used ‘09’x for tab-delimited files. In this example, the cars_pipe.txt file is read in to create the CARS_PIPE SAS dataset in the WORK library:

proc import datafile = '/home/your_username/SASCrunch/cars_pipe.txt'
 out = cars_pipe
 dbms = dlm
 replace
 ;
 delimiter = '|';
run;

After updating the path in the datafile statement and running the above code, you will notice that while the columns have been read in correctly, the variable names are not correct and actual values are being used as the variable names:
If you were to open up the cars_pipe.txt file directly using Notepad, Wordpad, TextEdit or similar text editors on your computer, you would notice that this text file has no column headings and the data starts directly in the first row. 
 
To get around this, you need to let SAS know that there are no column headings provided in the input text file. By default, there is a GETNAMES option in PROC IMPORT which is set to YES. With this setting equal to YES,  SAS assumes that the first row of data contains the column headings, which ultimately end up as the SAS variable names. When this is not the case, simply set GETNAMES = NO to let SAS know there are no column headings provided in the input file:
 

proc import datafile = '/home/your_username/SASCrunch/cars_pipe.txt'
 out = cars_pipe
 dbms = dlm
 replace
 ;
 getnames = no;
 delimiter = '|';
run;

Now in the output data, all the records will be found in the dataset itself, but the heading names will have generic names from VAR1 up to VAR15 in this case, since there are 15 columns:

To fix the variable names, you could for example use the SAS Data Step with the RENAME statement to create a new dataset. As an example, the following dataset code would create a dataset called CARS_PIPE_CLEAN, which uses the RANEM statement to set the appropriate variable names as shown here:

data cars_pipe_clean;
 set cars_pipe; 
 rename           var1 = make
                          var2 = model
                          var3 = type
                          var4 = origin
                        /*var5 = ...
                          var15 = ...*/
                             ;
run;

3. Importing Space-delimited Text Files with PROC IMPORT

Space-delimited text files are yet another common file type you may encounter that you would like to import into SAS. By default, setting DBMS = DLM with your PROC IMPORT statement will use space as the delimiter, so you don’t need to explicitly use the delimiter option in this case.
 
For example, the orsales_space.txt text file contains space-delimited columns, and can be imported into SAS with DBMS = DLM:

proc import datafile = '/home/your_username/SASCrunch/orsales_space.txt'
 out = orsales
 dbms = dlm
 replace
 ;
run;

At first glance, it appears that the import was successful and the ORSALES dataset was successfully created in WORK as shown partially here:

However, if you run a PROC FREQ (code provide below) on the Product_Line variable, you will discover that one of the values for Product_Category is truncated:

proc freq data = orsales;
 tables product_category;
run;

As shown in the Results, “Assorted Sports Articles” is now only “Assorted Sports A” in this newly imported dataset:

This type of situation can often occur when importing datasets into SAS because PROC IMPORT will only check a portion of the records before determining what the appropriate variable type and lengths should be on the output SAS dataset.
 
The solution to this problem is to include the GUESSINGROWS option with your PROC IMPORT call. By specifying a number for GUESSINGROWS, you can tell SAS how many rows it should scan in your incoming dataset before determining what the appropriate length and variable types should be.
 
In this example import, there are 912 rows of data. Here, by setting GUESSINGROWS = 912 we can be certain that SAS will pick the largest width necessary to avoid truncation of any data when it completes the import. A new dataset, ORSALES_GUESSINGROWS, is then created so you can see the difference in results:

proc import datafile = '/home/your_username/SASCrunch/orsales_space.txt'
 out = orsales_guessingrows
 dbms = dlm
 replace
 ;
 guessingrows = 912;
run;

By running a PROC FREQ to generate a frequency table on the newly created dataset, we can test whether or not the GUESSINGROWS option was effective:

proc freq data = orsales_guessingrows;
 tables product_category;
run;

As you can see from the output, the Product_Category value “Assorted Sports Articles” now shows up correctly and is no longer truncated:

It’s important to note that GUESSINGROWS can be extremely computationally intensive and may significantly slow down the time it takes to import your dataset to SAS. The larger the value you set for GUESSINGROWS, the longer the processing will take, but more reliable the results will be. The run time will of course depend on your environment, the number of records and the number of variables found in your data.

4. Importing a Tab-delimited File using Data Step

Although the amount of SAS code required to import a Text file using Data Step is longer than the code required for PROC IMPORT, using Data Step code allows for greater flexibility.

By using Data Step code, the variable names, lengths and types can be manually specified at the time of import. The advantage is that this allows you to format the dataset exactly the way you desire as soon as it is created in SAS, rather than having to make additional modifications later on.

First, as with any SAS Data Step code, you need to specify the name and location for the dataset you are going to create. Here, a dataset named CARS_DATASTEP will be created in the WORK directory.

The next step is to use the INFILE statement. The INFILE statement in this case is made up of 6 components:

  1. The location of the Text file –  /home/your_username/SASCrunch in this example
  2. Delimiter option – the delimiter found on the input file enclosed in quotation marks (delimiter is ‘09’x in this example since it is a tab-delimited file)
  3. MISSOVER option – Tells SAS to keep reading the same record even if a missing value is found for one of the variables
  4. FIRSTOBS – The first row that contains the observations in the input file (Set to 3 in this example since the observations start on the third row in the cars_tab.txt file)
  5. DSD – Tells SAS that when a delimiter is found within a quotation mark in the dataset, it should be treated as a value and not a delimiter
  6. LRECL – Maximum length for an entire record (32767 is the default maximum to use which will ensure no truncation within 32767 characters)

After the INFILE statement, the simplest way to ensure that your variable names, lengths, types and formats are specified correctly is to use a format statement for each variable. After an appropriate format has been assigned to each variable, the variables that you would like to import should be listed in order after an INPUT statement. Note that character variables should have a dollar sign ($) after each variable name.

Note that you can also specify INFORMATs and LENGTHs optionally here, but in most cases the FORMAT and INPUT statements should be all you need for a successful import. 

Below is the Data Step code that would successfully import the cars_tab.txt file into a SAS dataset. As mentioned, be sure to update the path to the correct location of the cars_tab.txt file in your environment before running the following code:

data work.cars_datastep_tab;
infile '/home/your_username/SASCrunch/cars_tab.txt'
                 delimiter='09'x
                 missover
                 firstobs=2
                 DSD
                 lrecl = 32767;

        format Make $5. ;
        format Model $30. ;
        format Type $6. ;
        format Origin $6. ;
        format DriveTrain $5. ;
        format MSRP $9. ;
        format Invoice $9. ;
        format EngineSize best12. ;
        format Cylinders best12. ;
        format Horsepower best12. ;
        format MPG_City best12. ;
        format MPG_Highway best12. ;
        format Weight best12. ;
        format Wheelbase best12. ;
        format Length best12. ;
     input
                 Make $
                 Model $
                 Type $
                 Origin $
                 DriveTrain $
                 MSRP $
                 Invoice $
                 EngineSize
                 Cylinders
                 Horsepower
                 MPG_City
                 MPG_Highway
                 Weight
                 Wheelbase
                 Length
     ;
 run;

After running the above code, you should see the CARS_DATASTEP_TAB data set, shown partially here:

Become a Certified SAS Specialist

Get access to two SAS base certification prep courses and 150+ practice exercises

5. Generating Data Step Code with PROC IMPORT

When the variable names, types, lengths or formats that SAS is automatically generating with PROC IMPORT are not what you are looking for, and you don’t want to type out 40+ lines of code as in the previous example, PROC IMPORT can still be a time-saving tool.
 
Going back to the cars_pipe.txt text file, recall that this text file did not contain column headings.
 
Re-run the following code to import cars_pipe.txt into SAS and create a temporary dataset, CARS_PIPE to be stored in WORK: 

proc import datafile = '/home/your_username/SASCrunch/cars_pipe.txt'
 out = cars_pipe
 dbms = dlm
 replace
 ;
 getnames = no;
 delimiter = '|';
run;

After running the above code, go to the Log that is created and notice that SAS Data Step code is actually being generated as a result of the PROC IMPORT:

By simply copying and pasting this code from your log into your SAS program, you can now use this code as a template to start your Data Step code, modifying it as needed to adjust variable names, types and lengths.
 
For example, you can replace the variable names VAR1-VAR15 with the original variable names from CARS, as shown here:

data WORK.CARS_PIPE_CUSTOM    ;
infile '/home/your_username/SASCrunch/cars_pipe.txt' delimiter  =  '|' MISSOVER DSD lrecl = 32767; 
     informat make $5. ;
     informat model $30. ;
     informat type $6. ;
     informat origin $6. ;
     informat drivetrain $5. ;
     informat msrp nlnum32. ;
     informat invoice nlnum32. ;
     informat enginesize best32. ;
     informat cylinders best32. ;
     informat horsepower best32. ;
     informat mpg_city best32. ;
     informat mpg_highway best32. ;
     informat weight best32. ;
     informat wheelbase best32. ;
     informat length best32. ;
     format make $5. ;
     format model $30. ;
     format type $6. ;
     format origin $6. ;
     format drivetrain $5. ;
     format msrp nlnum12. ;
     format invoice nlnum12. ;
     format enginesize best12. ;
     format cylinders best12. ;
     format horsepower best12. ;
     format mpg_city best12. ;
     format mpg_highway best12. ;
     format weight best12. ;
     format wheelbase best12. ;
     format length best12. ;
  input
              make $
              model $
              type $
              origin $
              drivetrain $
              msrp
              invoice
              enginesize
              cylinders
              horsepower
              mpg_city
              mpg_highway
              weight
              wheelbase
              length
  ;
run;

After running the above code, a new dataset WORK.CARS_PIPE_CUSTOM is created by importing the cars_pipe.txt text file using the SAS Data Step code we generated using PROC IMPORT.
 

Master SAS in 30 Days

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
iconmail

Get latest articles from SASCrunch

SAS Base Certification Exam Prep Course

Two Certificate Prep Courses and 300+ Practice Exercises