Master SAS in 30 days!

A jargon-free, easy-to-learn SAS base course that is tailor-made for students with no prior knowledge of SAS.

The SCAN function in SAS

The SCAN function in SAS provides a simple and convenient way to parse out words from character strings. The SCAN function can be used to select individual words from text or variables which contain text and then store those words in new variables. This article provides a number of different examples and uses for the SCAN function, including some of the most commonly used options to help you get the most from this function.
 
In particular, this article will cover:
  1. Selecting the nth word in a character string.
  2. Selecting the last word in a character string.
  3. Handling different word delimiters.
  4. Using SCAN with DO LOOPs to parse long character strings.
 
 

Software

Before we continue, make sure you have access to SAS Studio. It's free!

Data Sets

In this article, the CARS and BASEBALL datasets from the SASHELP library will be used to illustrate a number of different uses for the SCAN function.

Selecting the Nth Word in a Character String

 

One of the simplest operations you can perform with the SCAN function is to find the nth word in a character string.
 
Let’s start with an example that finds the first word in a character string and stores the result in a separate variable. The most basic use of the SCAN function requires only two arguments. The first argument is the character string that you want to select words from. This can be either a variable or an explicit character string. In this first example we are using the explicit character string “I am a SAS Programming Expert”.
 
The second argument is the count, which is the numeric position of the word that you want to select from the character string. So, to return the first word, we can explicitly specify the number 1. This could also be replaced with a variable containing the desired count value.
 
The SAS syntax is as follows:

data example;
 first_word = scan("I am a SAS Programming Expert",1);
run;

The new variable FIRST_WORD has been created, and its value is the first word, “I”, from the character string “I am a SAS Programming Expert”.

In most cases, the character string that you would like to select words from is contained in a variable itself. In this next example, the variable TEXT contains the character string “I am a SAS Programming Expert” and again we would like to extract the first word from the string. To do this, we simply replace the explicit character string in quotes with the variable name TEXT. The count, 1, stays the same as the previous example since we are still interested in the first word:

data example; 
 text = "I am a SAS Programming Expert";
 first_word = scan(text,1); 
run;

The output dataset now contains both the original TEXT variable and the newly created FIRST_WORD variable, which contains the first word from the TEXT variable, “I”.
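As mentioned earlier, the count can also come from a variable. The short sketch below assumes a hypothetical numeric variable N that holds the word position and a hypothetical result variable NTH_WORD. It also declares a length for the result, because a variable that is first created by SCAN in a DATA step is given a length of 200 characters unless you say otherwise.

```sas
data example;
 length nth_word $ 20;   /* without this, NTH_WORD would default to a length of 200 */
 text = "I am a SAS Programming Expert";
 n = 4;                  /* hypothetical variable holding the word position */
 nth_word = scan(text, n);
 put nth_word=;          /* writes nth_word=SAS to the log */
run;
```

Declaring the length up front is optional, but it keeps the output dataset smaller when you extract many short words.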
 
To select additional words, such as the second, third and fourth word, we can modify the count argument of the SCAN function. To select the second word from a string, simply set the count argument to 2. For the third word, set the count equal to 3, and so on.
 
In the following example, we create three additional variables, SECOND_WORD, THIRD_WORD and FOURTH_WORD, which select the second, third and fourth word respectively from the TEXT variable:
 

data example;
 text = "I am a SAS Programming Expert";
 first_word = scan(text,1);
 second_word = scan(text,2);
 third_word = scan(text,3);
 fourth_word = scan(text,4);
run;

The output data now has 5 variables: the original TEXT variable as well as FIRST_WORD (“I”), SECOND_WORD (“am”), THIRD_WORD (“a”) and FOURTH_WORD (“SAS”), each in a separate variable.
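It can also help to know what happens when the count is larger than the number of words in the string. The sketch below is an illustration only (the variable names N_WORDS, SEVENTH_WORD and IS_BLANK are made up for this example). It uses the COUNTW function to count the words and shows that SCAN returns a blank value when the requested word does not exist.

```sas
data example;
 text = "I am a SAS Programming Expert";
 n_words = countw(text);            /* counts words using the default delimiters: 6 */
 seventh_word = scan(text, 7);      /* there is no seventh word, so the result is blank */
 is_blank = missing(seventh_word);  /* 1 when the value is blank */
 put n_words= is_blank=;
run;
```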


Selecting the Last Word in a Character String


Using the SCAN function, you also have the ability to read from right to left, which lets you capture the last word in a character string.
 
To tell SAS to read from right to left, we simply change the count argument to be a negative number to indicate the word number that we would like to read, starting from the right and moving left. So, to select the word “Expert” in our TEXT variable, we can use a count of -1, as shown here:

data example; 
 text = "I am a SAS Programming Expert"; 
 last_word = scan(text,-1);
run;

The output data now has a new variable, LAST_WORD, which contains the last word of the text string, “Expert”.
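The same idea extends to other negative counts. As a small illustration (the variable name SECOND_LAST_WORD is made up for this example), a count of -2 selects the second word counting from the right:

```sas
data example;
 text = "I am a SAS Programming Expert";
 second_last_word = scan(text, -2);  /* second word counting from the right */
 put second_last_word=;              /* writes second_last_word=Programming to the log */
run;
```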
 

Alternatively, instead of using a negative count you can use the “b” modifier of the SCAN function, which tells SAS to read from right to left instead of the default left to right. Note that a modifier must be passed as the fourth argument, so you must always state the third argument (the delimiter) explicitly whenever you use a modifier. Otherwise, SAS will treat your modifier as the delimiter!

Here is the syntax with the “b” modifier included:

data example; 
 text = "I am a SAS Programming Expert"; 
 last_word = scan(text,1," ","b");
run;

The resulting output dataset shows the same result as before: the last word, “Expert”, has been captured in the LAST_WORD variable.

Note there are many other modifiers for the SCAN function to help with special cases. These modifiers can be found in the SAS documentation.
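As one illustration of another modifier, the sketch below uses “m”, which makes SCAN treat consecutive delimiters as surrounding an empty word instead of collapsing them into one. The text value and variable names are made up for this example.

```sas
data example;
 text = "red,,blue";
 /* Default behaviour: the two commas act as one delimiter, so word 2 is "blue" */
 default_second = scan(text, 2, ",");
 /* With "m", the empty field between the two commas counts as word 2 */
 m_second = scan(text, 2, ",", "m");
 m_third  = scan(text, 3, ",", "m");
 put default_second= m_second= m_third=;
run;
```

This can be useful for delimited records where an empty field still needs to hold its position.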


Handling Different Word Delimiters


So far, the examples we have looked at have only had blanks or spaces as the delimiter between words. What happens when there is a different delimiter, such as a comma?
 
In the example below, the code has been modified so that the words in the character string of the text variable are delimited with a comma instead of spaces. Here, we are trying to select the fourth word:
 

data example;
 text = "I,am,a,SAS,Programming,Expert";
 fourth_word = scan(text,4);
run;

The SCAN function still works with commas as the delimiter: FOURTH_WORD contains “SAS”.

This works because, by default, on any computer using ASCII characters, the SCAN function treats any of the following characters as delimiters:
 
blank ! $ % & ( ) * + , - . / ; < ^ |
 
When your data uses a delimiter that is not in the default list, or when you want SCAN to recognise only one particular delimiter, you can use the charlist argument (the third argument) to specify your own delimiter.
 
For example, if the words in your character string are delimited with a plus sign (+), enclose the plus sign in quotation marks as the third argument to the SCAN function. The plus sign is already in the default list, but naming it explicitly means it is the only character treated as a delimiter.
 
The syntax below demonstrates how to select the fifth word from a plus sign delimited character string:

data example;
 text = "I+am+a+SAS+Programming+Expert";
 fifth_word = scan(text,5,"+");
run;

In the output data, FIFTH_WORD contains “Programming”, the fifth word in the string.
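The charlist argument can also hold more than one character, including characters that are not default delimiters. The sketch below is a made-up example that mixes an underscore and a hash sign. Each character in the charlist is treated as a separate delimiter, not as a two-character string.

```sas
data example;
 text = "I_am#a_SAS#Programming_Expert";
 /* "_#" is a list of two single-character delimiters */
 fifth_word = scan(text, 5, "_#");
 put fifth_word=;   /* writes fifth_word=Programming to the log */
run;
```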

In some cases, you may also want to force SAS to use only one of the default delimiters. By default, SAS will use not just one but all of the delimiters in the default list. This can become problematic in certain cases when your data contains multiple delimiters.
 
In the SASHELP.BASEBALL dataset, the NAME variable contains a list of first, last and middle names. The structure is as follows: <last name>,<firstname><blank><middlename>. You would like to create two new variables: LASTNAME and GIVEN_NAMES.
 
Since commas and spaces are default delimiters, we start without specifying our own delimiter:

data baseball;
 set sashelp.baseball;
 
 lastname = scan(name,1);
 given_names = scan(name,2);
 
 keep name given_names lastname;
run;

At first glance it may appear as though the results are correct, but on closer inspection you will notice that some names were not parsed properly. For example, for Andy Van Slyke the blank inside the last name is treated as a delimiter, so the given name comes out as “Slyke” when it should have been “Andy”.

To correct this, we can tell SAS to only use the comma as a delimiter so that “Van Slyke” will become the last name and Andy will be the given name:

data baseball;
 set sashelp.baseball;
 
 lastname = scan(name,1,",");
 given_names = scan(name,2,",");
 
 keep name given_names lastname;
run;

Now that the blanks are no longer considered delimiters and only the commas are, we get the desired result in our output data, with “Andy” now in the GIVEN_NAMES variable and “Van Slyke” in the LASTNAME variable.
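One thing to watch for when only the comma is a delimiter: if a value in NAME has a blank after the comma, that blank becomes the first character of the extracted given name. Whether your copy of the data looks like this is an assumption worth checking. As a hedged sketch, wrapping the call in the STRIP function removes any leading or trailing blanks:

```sas
data baseball;
 set sashelp.baseball;

 lastname = scan(name, 1, ",");
 /* STRIP removes leading and trailing blanks from the extracted piece */
 given_names = strip(scan(name, 2, ","));

 keep name given_names lastname;
run;
```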

Using SCAN with DO LOOPS to Parse Long Character Strings

When combined with a simple DO LOOP and a SAS array, the SCAN function makes it easy to parse out each word from a character string into separate variables.

For example, in the SASHELP.CARS dataset, you would like to parse out each word from the MODEL variable into 5 separate variables. The words of the full model name are delimited by spaces, so the code passes a blank (together with a comma) as the delimiter list. This keeps other default delimiters, such as a period or hyphen inside a word, from splitting that word.

The code below uses a DO LOOP to scan the MODEL variable and then create the variables MODEL1 to MODEL5:

data cars_parse;
 set sashelp.cars;
 
 array modelname[5] $15 model1-model5;
 do i = 1 to 5;
  modelname[i] = scan(model,i,", ");
 end;
 
 keep model model1-model5;
run;

In the output data, we now have 5 new variables, MODEL1 to MODEL5, with one word per variable. When a model name has fewer than 5 words, the remaining variables are left blank.
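When the number of words is not known in advance, one alternative is to let the COUNTW function set the upper bound of the loop and write one row per word instead of one variable per word. The following is a sketch under that assumption. The dataset name CARS_WORDS and the variables WORD and WORD_NUM are made up for this example.

```sas
data cars_words;
 set sashelp.cars;
 length word $ 40;

 /* COUNTW is given the same delimiter list, so the loop stops at the last word */
 do word_num = 1 to countw(model, ", ");
  word = scan(model, word_num, ", ");
  output;   /* one output row per word */
 end;

 keep model word_num word;
run;
```

The long layout avoids guessing how many variables are needed, at the cost of repeating MODEL on every row.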
