Master SAS in 30 days!

A jargon-free, easy-to-learn SAS base course that is tailor-made for students with no prior knowledge of SAS.

The INDEX function in SAS

By definition, the INDEX function will search a character string for a specified string of characters. If a match is found, the INDEX function returns the position of the first occurrence of the string’s first character, when searched from left to right.
 
The basic INDEX function only has 2 arguments, source and excerpt. The source is the character string variable or expression that you would like to search and the excerpt is the character string, variable or expression that would like to search for within the source.  
 
Two variations of the INDEX function, the INDEXC and INDEXW functions, are also available in SAS for different use cases and will also be covered in this article. In particular, this article will cover the following topics:

  1. INDEX Function
  2. INDEX vs. INDEXC vs. INDEXW Functions
  3. Using the Delimiter Argument with INDEXW
 
 

Software

Before we continue, make sure you have access to SAS Studio. It's free!

Data Sets

The following datasets found in the SASHELP library are used in this article for various examples:

  1. CARS – Car Data
  2. ORSALES - Fictitious  Sports and Outdoors Store Sales data

Index Function

Let’s start with a simple example to illustrate exactly what the INDEX function does and how it is used. In the following syntax, the INDEX function is used to generate a value for the variable SAS_POSITION. In this example, the string “I am a expert SAS programmer” is the source that will be searched and “SAS” is the character string that SAS will be searching for.
 
Since the INDEX function returns the position of the excerpt’s first character the first time it is found, we expect it to return an 8 based on the diagram below:

Using the syntax below and examining the output with a PROC PRINT, the result is confirmed:

data index_function;
 sas_position = index('I am a SAS expert','SAS');
run;

proc print data = index_function;
run;

Of course, the INDEX function can also be used to search variables within a dataset, and not just explicit character strings. In the SASHELP.CARS dataset, the MODEL variable contains car model names. Let’s say for example you would like to know if the word “convertible” occurs within the values of MODEL and if so, what is the starting position of the string “convertible”.
 
In this case, we now use the variable MODEL as the source argument to the INDEX function and the string we want to search for is “convertible”, which will be the excerpt argument. Using the syntax below, a new dataset, CONVERTIBLE, is created which creates a new variable CONVERTIBLE_POSITION to store the start position of the word “convertible” within the variable MODEL. To examine the results, PROC PRINT is used to display only the MODEL and newly created CONVERTIBLE_POSITION variable values:

data convertible;
 set sashelp.cars;
 
 convertible_position = index(model,'convertible');
run;
 
proc print data=convertible;
 vars model convertible_position;
run;

By looking at the Results from the PROC PRINT, the CONVERTIBLE_POSITION variable appears to have been created correctly as those Models without convertible have a “0” for CONVERTIBLE_POSITION and those Models with the word convertible within them have the position of the “c” (counting from left to right) when the word “convertible” is found:

These results can be further analyzed using the PROC FREQ syntax below to determine the distribution of values for CONVERTIBLE_POSITION:

proc freq data = convertible;
 tables convertible_position;
run;

By examining the results shown below, you can see for example that 387 cars do not have convertible in the Model name (frequency=387 when CONVERTIBLE_POSITION=0) while 3 cars have convertible starting at the 6th position in the model name (frequency=3 when CONVERTIBLE_POSITION=6):
 
A third way to use the INDEX function is to use variables both as the source (the character string to search) and the excerpt (the character string to search for within source).
 
In the SASHELP.ORSALES dataset, products are grouped my Product Line, Product Category and Product Group. If for example you are interested in how often the Product Line (PRODUCT_LINE) values occur within the Product Categories (PRODUCT_CATEGORY), you could use the INDEX function with these two variables as the source and except arguments.
 
In the syntax below, a new variable PRODUCT_MATCHES is created which will contain the the starting positions for the values of PRODUCT_LINE if they are found within the PRODUCT_CATEGORY. For the purposes of this example, we are only interested in the unique PRODUCT_CATEGORY and PRODUCT_LINE groups so the duplicate records are removed using a PROC SORT to make the results easier to examine visually. Finally, A PROC PRINT is then used to display the results:
 

data orsales;
 set sashelp.orsales;
 
 product_matches = index(product_category,strip(product_line));
 
run;
 
proc print data = orsales;
 var product_category product_line product_matches;
run;

In the results shown below, you will notice that they are not quite what we were expecting. Sometimes the PRODUCT_MATCHES variable correctly shows the starting position of the PRODUCT_LINE values with PRODUCT_CATEGORY (e.g. the “S” from PRODUCT_LINE value “Sports” starts at the 6th position within the PRODUCT_CATEGORY value “Swim Sports”). However a value of 0 is shown for PRODUCT_MATCHES with the PRODUCT_LINE value “Sports” and PRODUCT_CATEGORY value “Assorted Sports Articles”, implying that “Sports” was not found within “Assorted Sports Articles”:

This situation can often occur when searching character strings as SAS is also considering any blanks that may be present in the source or the excerpt variables. To ignore the blanks and search for only the characters, the STRIP function can be applied to both the PRODUCT_CATEGORY and PRODUCT_LINE variables to remove any leading and trailing blanks before searching for matches. Note the STRIP function should be used with caution on a case by case basis as there could also be scenarios where the leading and trailing blanks are important to keep.
 
Here is how the STRIP function can be applied to the INDEX argument:

data orsales;
 set sashelp.orsales;
 
product_matches= index(strip(product_category),strip(product_line)) ;
 
run;
 
proc sort data = orsales nodupkey;
 by product_category product_line product_matches;
run;
 
proc print data = orsales;
 var product_category product_line product_matches;
run;

As shown below, we now have the desired results for the PRODUCT_MATCHES variable:

Do you have a hard time learning SAS?

Take our Practical SAS Training Course for Absolute Beginners and learn how to write your first SAS program!

INDEX vs. INDEXC vs. INDEXW Functions

In addition to the INDEX function, SAS also has both an INDEXC and INDEXW function for handling different scenarios. Although the syntax for the INDEX, INDEXC and INDEXW functions is nearly identical, they each produce slightly different results.
 
While the INDEX function searches a character string for a string of characters and then returns the position of the first occurrence of that string’s first character, the INDEXC function will search for any of the characters in the specified string and then return the position of the first occurrence of any of those character.
 
The INDEXW function on the other hand will search for a string that is specified as a word and then returns of the position of the first character in that word. In INDEXW, a third argument, delimiter, is available to specify which delimiter is used to separate words. By default the delimiter is a space, which is of course the most common word delimiter.
 
Let’s look at an example that uses all three functions to compare and contrast the results. In the example below, the variable STRING contains a string of text which is used as the source argument to search through. The same excerpt, “cat” will be used as the except argument with the INDEX, INDEXC and INDEXW functions to compare the results. Finally, in the syntax below a PROC PRINT is used to examine the results:

 

data index_comparison;
 
 string = 'I have a dog, a catfish and a cat named bob';
 
 index_result = index(string,'cat');
 indexc_result= indexc(string,'cat');
 indexw_result = indexw(string,'cat');

 
run;
 
proc print data = index_comparison;
 var string index_result indexc_result indexw_result;
run;

Now, let’s walk through each of the results shown below:

First, the INDEX_RESULT variable is showing a value of 17. This is because the first time INDEX identifies the string “cat”, it returns the start position of that string. In this case, the “c” on “catfish” is at the 17th position when counting from left to right.
 
INDEXC_RESULT however is showing a value of 4. This is because INDEXC searches for any of the characters within “cat” (i.e. a “c”, “a” or “t”) and then returns the position of the first one it finds. In this case, the “a” on “have” is at the 4th position when counting from left to right, and so a 4 is returned.
 
Finally, INDEXW is showing a 31. Although both INDEX and INDEXW are looking for the full string “cat”, the result for INDEXW is different because it is searching for space delimited words. Since “catfish” does not have a space before and after the string “cat”, INDEXW doesn’t find a match until “ cat “ is found in the STRING variable at the 31st position when reading from left to right.
 

Using the Delimiter Argument with INDEXW

By default, INDEXW will assume that words in the character string it is searching are space delimited. If the words in the character string you are searching are delimited by something other than spaces, the delimiter argument can be used.
 
For example, if the the pipebar “|” is used instead of spaces for delimiters in the source character string, the pipebar “|” can be used as the delimiter argument, as shown in the syntax below:

data index_comparison;
 
 string = 'I|have|a|dog,|a|catfish|and|a|cat|named|bob';
 
 index_result = index(string,'cat');
 indexc_result= indexc(string,'cat');
 indexw_result = indexw(string,'cat',"|");

 
run;
 
proc print data = index_comparison;
 var string index_result indexc_result indexw_result;
run;

As you can see in the output shown below, the results are the same as the previous example when the words in the STRING variable were delimited with spaces:
 

Become a Certified SAS Specialist

Get access to two SAS base certification prep courses and 150+ practice exercises

Master SAS in 30 Days

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
iconmail

Get latest articles from SASCrunch

SAS Base Certification Exam Prep Course

Two Certificate Prep Courses and 300+ Practice Exercises