Master SAS in 30 days!

A jargon-free, easy-to-learn SAS base course that is tailor-made for students with no prior knowledge of SAS.

The Complete Guide to SAS Arrays

Are you looking to become a more efficient Data Step programmer? Do you often need to perform the same manipulation on multiple variables? If so, arrays are a great tool to simplify your SAS code and improve your programming efficiency. By using arrays, you can execute complex data manipulation tasks, allowing you to manipulate multiple variables with DO LOOPs and carry out a variety of data transformations with limited lines of code.
 
This article covers the different types of arrays and walks through a variety of examples showing how arrays can be used for different data manipulation tasks.
 
In particular this article will cover:
  1. Overview of Arrays
  2. One-Dimensional Arrays
    - Using arrays to perform a repetitive calculation
    - Creating new variables with arrays
    - Manipulating character variables with arrays
    - Defining array bounds
  3. Implicit Arrays and DO OVER
  4. Multi-dimensional arrays

Software

Before we continue, make sure you have access to SAS Studio. It's free!

Data Sets

A variety of data sets from the SASHELP library are used throughout this article. The datasets used include:

  1. SASHELP.APPLIANC – Sales Time Series for 24 Appliances by Cycle
  2. SASHELP.PRICEDATA – Simulated monthly sales data
  3. SASHELP.CARS – Data about 2004 cars

Array Overview

 

In order to take advantage of SAS arrays, you first need a basic understanding of DO LOOPs. For a complete guide on SAS DO LOOPs, see The Complete Guide to Do-loop, Do-while and Do-Until.
 
First, let’s walk through the different components of a SAS array. The most commonly used array type is the explicit SAS array, which can be broken down into 6 main components:
 
array array-name {X} $ length array-elements (initial-values);
 
A typical array statement contains these 3 elements:
  1. Array-name: The name of the array
  2. X: the number of elements in the array
  3. Array-elements: the list of variables to be grouped within the array (if this list is omitted, SAS creates new variables named after the array, i.e. array-name1 to array-nameX)
 
Optionally, the array statement can also include:
  1. $: A dollar sign ($) to denote character variables in the array
  2. length: A length value to declare a common length for elements in the array
  3. initial-value(s): One or more initial values, enclosed in parentheses, to assign to the element(s) in the array
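As a rough sketch of how the optional components fit together, the step below defines one numeric and one character array with initial values and writes the resulting variables to the log. The array and variable names (scores, grades, score1-score3, grade1-grade3) are made up for illustration and do not come from any SASHELP data set.

```sas
data _null_;
 /* numeric array: creates score1-score3 and assigns initial values */
 array scores{3} score1-score3 (90 85 70);

 /* character array: "$ 8" gives every element a length of 8 */
 array grades{3} $ 8 grade1-grade3 ('high' 'mid' 'low');

 /* write each variable to the log in name=value form */
 put (score1-score3) (=);
 put (grade1-grade3) (=);
run;
```

Since DATA _NULL_ is used, no output data set is created; the values only appear in the SAS log.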
 
In the next section, we will walk through a simple array example to help you better understand the structure of SAS arrays.

One-Dimensional Arrays

  

The simplest form of SAS array is the one-dimensional array. In a one-dimensional array, a set of SAS variables is grouped under a single array name. Once the variables are grouped, you can easily perform the same calculation on all of them with just a few lines of code.

Let’s look at an example where we perform the same task both with and without SAS arrays to compare and contrast the two methods.

Example 1A – Performing a Repetitive Calculation on Multiple Variables, Without an Array

 

You will often need to perform the same calculation on multiple similar variables. This type of task is well suited for arrays because it can greatly reduce the amount of code you need to write.

In the SASHELP dataset APPLIANC, the number of units sold for 24 appliances by cycle is stored in 24 variables, UNITS_1 to UNITS_24. Let’s say, for example, that due to a computer glitch you need to add 3 units sold to the first 10 appliances (i.e. the UNITS_1 to UNITS_10 variables).

To demonstrate how arrays can simplify your code, let’s first look at how this calculation can be done without using arrays. Adding 3 units to each UNITS_# variable is a simple arithmetic operation, but it quickly becomes long and repetitive because each of the 10 variables, UNITS_1 to UNITS_10, needs its own assignment statement.

Below is the basic Data Step code to complete this task. Two PROC PRINT steps are also added to allow for an easy comparison of the first 10 observations of the original and modified datasets:

data applianc;
 set sashelp.applianc;
 
 units_1 = units_1 + 3;
 units_2 = units_2 + 3;
 units_3 = units_3 + 3;
 units_4 = units_4 + 3; 
 units_5 = units_5 + 3;
 units_6 = units_6 + 3;
 units_7 = units_7 + 3;
 units_8 = units_8 + 3;
 units_9 = units_9 + 3;
 units_10 = units_10 + 3;
run;
 
proc print data = sashelp.applianc (obs=10);
 var units_1-units_10;
 title "First 10 records of unmodified SASHELP.APPLIANC dataset";
run;
 
proc print data = applianc (obs=10);
 var units_1-units_10;
 title "First 10 records of modified APPLIANC dataset";
run;
As the resulting PROC PRINT output shows, we have successfully added 3 units to each of the UNITS_1 to UNITS_10 variables.

Example 1B – Performing a Repetitive Calculation on Multiple Variables, With an Array

 

To simplify this task with SAS array programming, we need to define a single array which will group all the UNITS_# variables together that we wish to modify. This array will be defined as follows:

  1. Array name: units_sold
  2. Number of elements: (*) – an asterisk can be used in place of an explicit number, which tells SAS to count the number of array elements for you
  3. Array elements: units_1-units_10

After defining the array, a DO LOOP needs to be set up to loop through each of the 10 elements and then increase the number of units sold by 3 for each appliance.

The complete syntax is as follows:

data applianc_array;
 set sashelp.applianc;
 
 array units_sold{*} units_1-units_10;
 
 do i = 1 to 10;
  units_sold{i} = units_sold{i}+3;
 end;
run;
 
proc print data = sashelp.applianc (obs=10);
 var units_1-units_10;
 title "First 10 records of unmodified SASHELP.APPLIANC dataset";
run;
 
proc print data = applianc_array (obs=10);
 var units_1-units_10;
 title "First 10 records of modified APPLIANC_ARRAY dataset";
run;
When compared to the original SASHELP.APPLIANC dataset, you can now see that each of the values for the UNITS_1 to UNITS_10 variables has been incremented by 3. The output of the PROC PRINT steps comparing the first 10 records of the original and modified datasets is shown below:
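The loop in Example 1B hard-codes the upper bound of 10. A possible variation, sketched below, lets the DIM function supply the number of elements so the loop stays in step with the array definition, and drops the loop counter from the output. The output data set name applianc_dim is chosen here purely for illustration.

```sas
data applianc_dim;
 set sashelp.applianc;

 array units_sold{*} units_1-units_10;

 /* dim() returns the number of elements in the array (10 here) */
 do i = 1 to dim(units_sold);
  units_sold{i} = units_sold{i} + 3;
 end;

 drop i; /* keep the loop counter out of the output data set */
run;
```

With this version, extending the array to, say, units_1-units_24 would only require changing the array statement; the DO LOOP itself would not need to be edited.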

Example 2 – Creating New Variables with an Array

 

The PRICEDATA dataset in the SASHELP library contains simulated data of the prices of 17 different products. The prices of the 17 different products are stored in the variables PRICE1-PRICE17 in USD. To convert these prices to Canadian Dollars, we would need to multiply by approximately 1.26 (based on the exchange rate at the time of writing).

Without using arrays, let’s first look at how we would do this with traditional Data Step code. In the following code, a new PRICE_CAD# variable is created by multiplying the original PRICE# variable by 1.26 to convert the values from USD (United States Dollars) to Canadian Dollars (CAD). Finally, a PROC PRINT is used to print out the first 3 PRICE variables and their newly created Canadian Dollar equivalents so we can see the differences in the calculated values:

data pricedata_cad;
 set sashelp.pricedata;
 
 price_cad1 = price1*1.26;
 price_cad2 = price2*1.26;
 price_cad3 = price3*1.26;
 price_cad4 = price4*1.26;
 price_cad5 = price5*1.26;
 price_cad6 = price6*1.26;
 price_cad7 = price7*1.26;
 price_cad8 = price8*1.26;
 price_cad9 = price9*1.26;
 price_cad10 = price10*1.26;
 price_cad11 = price11*1.26;
 price_cad12 = price12*1.26;
 price_cad13 = price13*1.26;
 price_cad14 = price14*1.26;
 price_cad15 = price15*1.26;
 price_cad16 = price16*1.26;
 price_cad17 = price17*1.26;
 
run;
 
proc print data=pricedata_cad;
 var price1-price3 price_cad1-price_cad3;
run;
As you can see in the partial output shown below, the creation of the new variables in CAD was successful:
Since we are working with 17 variables, this task requires a lot of repetitive and tedious code, as we are essentially repeating the exact same calculation 17 times on similar variables. This type of programming is also error prone, since every additional line is another chance for a typo. A task such as this is a great candidate for a SAS array, as it will greatly reduce the lines of code required.
 
In the array version of this data step we need to define two arrays since we would like to have a variable for both the Canadian dollar price and the US dollar price.
 
In this example we are creating 2 basic arrays to group both the original PRICE# variables and the newly created PRICE_CAD# variables. The first array is defined as follows:
  1. Name: PRICE_CAD
  2. Number of elements: 17
  3. List of variables: PRICE_CAD1-PRICE_CAD17
 
Similarly, the second array is defined as follows:
  1. Name: PRICE_USD
  2. Number of elements: 17
  3. List of variables: PRICE1-PRICE17
 
After using the above information to construct our arrays, we define a simple DO LOOP to iterate through each of the 17 variables and perform the USD to CAD conversion. The converted values are stored in the PRICE_CAD# variables, as defined by our price_cad{} array and the original USD values are retrieved from the PRICE# variables as defined by our price_usd{} array.
 
As before, a PROC PRINT is also used to display the first 3 PRICE variables:
data pricedata_cad_array;
 set sashelp.pricedata;
 
 array price_cad{17} price_cad1-price_cad17;
 array price_usd{17} price1-price17;
 
 do i = 1 to 17;
  price_cad{i} = price_usd{i}*1.26;
 end;
run;
 
proc print data=pricedata_cad_array;
 var price1-price3 price_cad1-price_cad3;
run;
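Before examining the output, it helps to break down the first two iterations of the loop. In the first iteration, i = 1, so price_cad{1} refers to PRICE_CAD1 and price_usd{1} refers to PRICE1; the statement is therefore equivalent to price_cad1 = price1*1.26. In the second iteration, i = 2, and the statement is equivalent to price_cad2 = price2*1.26. The same pattern continues until i = 17.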
As you can see in the partial output shown below, the result is the same as the previous example without using arrays, but now with less SAS code.
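When an array is used only to create new variables, the element list can also be left out, in which case SAS names the new variables after the array (here PRICE_CAD1 to PRICE_CAD17). The following is a minimal sketch of that variation; the output data set name pricedata_cad_auto is an illustrative choice, and the 1.26 rate is simply the value used above.

```sas
data pricedata_cad_auto;
 set sashelp.pricedata;

 array price_usd{17} price1-price17;

 /* no element list: SAS creates price_cad1-price_cad17 from the array name */
 array price_cad{17};

 do i = 1 to dim(price_usd);
  price_cad{i} = price_usd{i} * 1.26; /* same USD to CAD rate as above */
 end;

 drop i; /* keep the loop counter out of the output data set */
run;
```

Listing the variables explicitly, as in the original example, is arguably easier to read; omitting the list mainly saves typing when the default names are acceptable.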


Example 3 – Manipulating Character Variables

 
In addition to performing calculations on numeric variables, you can also use one-dimensional arrays to manipulate character variables. In the SASHELP.CARS dataset, say for example that you would like to convert the values from the MAKE, MODEL, TYPE, ORIGIN and DRIVETRAIN variables to all uppercase.
 
Using an array and a simple DO LOOP, this can easily be accomplished with just a few lines of code. In our array statement we will define the following elements:
 
  1. Array name: character_vars
  2. Number of elements: (*) – Recall that the asterisk tells SAS to count the number of elements in the array for you
  3. Array elements: MAKE MODEL TYPE ORIGIN DRIVETRAIN
 
After defining our array using the above parameters, we can simply add a DO LOOP to loop through the 5 elements and apply the UPCASE function to each variable as shown in the following syntax:
data cars_char_array;
 set sashelp.cars;
 
 array character_vars{*} make model type origin drivetrain;
 
 do i = 1 to 5;
  character_vars{i} = upcase(character_vars{i});
 end;
run;
As you can see in the Output Data shown partially below, the MAKE, MODEL, TYPE, ORIGIN and DRIVETRAIN variables have been successfully converted to upper case values:
In this example, since we are interested in converting all the character variables in the SASHELP.CARS dataset to uppercase, we can further simplify the array by using the special variable list name _character_, which tells SAS to include all character variables defined in the data step so far. By using _character_ in place of the variables listed for the array elements, we can shorten our array statement as shown below:
data cars_char_array;
 set sashelp.cars;
 
 array character_vars{*} _character_;
 
 do i = 1 to dim(character_vars); /* dim() counts the elements picked up by _character_ */
  character_vars{i} = upcase(character_vars{i});
 end;
run;
As you can see in the Output Data shown partially below, the result is the same as when the array elements were listed individually:

Defining Array Bounds

 

By default, SAS automatically starts the bounds of an ARRAY at 1. So, if you define an array as follows:

array prices{*} price3 price4 price5 price6;

then prices{1} = price3, prices{2} = price4 and so on. This may make it confusing to read and understand your DO LOOP, since element number 1 does not match the 3 suffix on the variable PRICE3. To make your array easier to work with, you can define custom bounds so that your array starts at a number other than 1.

Example 1 – Using Customized Array Bounds

For example, suppose that in the SASHELP.PRICEDATA data set you would like to discount the price of products 3 through 7 by 25%. Since you are only working with the variables PRICE3-PRICE7, you’d also like to index these values from 3 through 7. You would also like to create a new set of variables, SALE_PRICE#, which contain the discounted prices.

Similar to previous examples, we will define two arrays. The first array will contain the group of new variables and the second array will contain the original price variables to be used in the calculation. To define the custom bounds for the array, the start and end points of the bounds are placed inside the curly brackets ({}) and separated by a colon, as shown below. Finally, a PROC PRINT is used to compare the original and newly created price variables:

data pricedata_sale;
 set sashelp.pricedata;
 
 array sale_price{3:7} sale_price3-sale_price7;
 array prices{3:7} price3-price7;
 
 do i = 3 to 7;
  sale_price{i} = prices{i}*0.75;
 end;
run;
 
proc print data = pricedata_sale(obs = 10);
 var price3-price7 sale_price3-sale_price7;
run;
As you can see in the PROC PRINT output below, we now have a series of both PRICE and SALE_PRICE variables created by the array:
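With custom bounds, the loop limits (3 and 7) are repeated in both the array statements and the DO LOOP. One way to avoid that repetition, sketched below, is to read the bounds from the array with the LBOUND and HBOUND functions. The output data set name pricedata_sale_bounds is used here only for illustration.

```sas
data pricedata_sale_bounds;
 set sashelp.pricedata;

 array sale_price{3:7} sale_price3-sale_price7;
 array prices{3:7} price3-price7;

 /* lbound() and hbound() return the lower and upper bounds (3 and 7) */
 do i = lbound(prices) to hbound(prices);
  sale_price{i} = prices{i} * 0.75;
 end;

 drop i; /* keep the loop counter out of the output data set */
run;
```

Note that DIM would not be a drop-in replacement here: it returns the number of elements (5), not the upper bound (7).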


DO OVER and Implicit Arrays

There are two main types of arrays, implicit and explicit. Explicit arrays are the most common type of arrays and are the only type of arrays that have been discussed thus far in the article.
 
However, implicit arrays are an alternative type of array available in SAS and can also be useful. With explicit arrays, the index specification (either an asterisk or the number of elements in the array) must be explicitly defined in the array statement, in curly braces ({}) after the array name. In an implicit array, however, an index specification indicating the number of array elements is not required.
 
The advantage of not having to count the number of elements in the array is that you can use the DO OVER statement in place of a traditional DO LOOP. The DO OVER statement will automatically iterate across all elements in the array, without you having to specify the number of elements. Let’s look at an example on how to utilize DO OVER with an implicit array.

Example 1 – Using DO OVER to Perform a Calculation on all Character Variables

In an earlier example, we demonstrated how to use the special variable list name _character_ to easily group all character variables in a single array without having to type them individually. With an implicit array, we can simplify the loop itself even further. To define an implicit array, we simply omit the array dimension (number of elements) after stating the array name. After defining the array name and array elements, we can use a DO OVER loop to iterate through all elements of the array without referencing an index.
 
In the syntax below, we are building upon the previous example whereby all character variables in the SASHELP.CARS dataset were grouped into a single array and then their values were converted to upper case. The two differences here are that (1) we are now using an implicit array (i.e., the number of elements is not defined in the array statement) and (2) a DO OVER loop is used in place of a traditional DO LOOP:
data cars_implicit_array;
 set sashelp.cars;
 
 array impl_character_vars make model type origin drivetrain;
 
 do over impl_character_vars;
  impl_character_vars = upcase(impl_character_vars);
 end;
run;
As you can see in the Output Data set shown partially below, the result is the same as the earlier example and all character variables have had their values successfully converted to uppercase:

Multi-Dimensional Arrays

A multi-dimensional array is a more complex version of an array that can be used to combine multiple groups of elements into a single array. A multi-dimensional array allows you to use nested DO LOOPs (a DO LOOP within a DO LOOP) for more complex data manipulation tasks.
 
While the utility of multi-dimensional arrays becomes more apparent with more complex data manipulation tasks, here we will use a simplified example to demonstrate how to iterate over a two-dimensional array with a nested DO LOOP.
 
In the dataset ANNUAL_SALES generated below, quarterly cost and revenue amounts for a business are each stored in 4 separate variables. That is, costs are in the variables COSTS_Q1 to COSTS_Q4 and Revenue is found in the REVENUE_Q1 to REVENUE_Q4 variables. Each row represents 1 year of sales data.
 
Using the following syntax, generate the ANNUAL_SALES data, which will be stored in the WORK library:
 
data annual_sales;
 input year costs_Q1 costs_Q2 costs_Q3 costs_Q4 revenue_Q1 revenue_Q2 revenue_Q3 revenue_Q4;
 datalines;
 2000 10131 11234 15153 11344 12316 11564 16591 11564
 2001 10999 11335 12546 16001 10654 15861 13461 15618
 2002 15611 15650 17051 18796 16846 15616 18764 19055
 ;
run;
After running the code above, you should see the WORK.ANNUAL_SALES data, shown partially below:
Next, if for example you would like to perform the same calculation on both the 4 quarterly Cost and Revenue variables, you can take advantage of a multi-dimensional array. The multi-dimensional array allows you to group both the cost and revenue variables in a single array, and then iterate through them separately with a nested DO LOOP.
 
In this example, the goal is to apply a 3% increase to both the cost and revenue values, which will be done by multiplying each value by 1.03. However before we can perform the calculation, we must define our 2 dimensional array.
 
To define a 2-dimensional array, we start with the array name (sales) followed by the number of variable groups in the array (2), a comma, and then the number of elements in each group (4). Here, we have 2 groups of variables (COSTS_Q1-Q4 and REVENUE_Q1-Q4) and 4 elements (Q1, Q2, Q3, and Q4) in each. So we define our array statement as:
 
array sales{2,4} costs_q1-costs_q4 revenue_q1-revenue_q4;
 
Now that the multi-dimensional array is established, we iterate through both the number of groups (using “i” from 1 to 2) and the number of elements in each group (using “j” from 1 to 4).
 
The complete syntax is as follows:
 
data annual_sales_array;
 set annual_sales;
 
 array sales{2,4} costs_q1-costs_q4 revenue_q1-revenue_q4;
 
 do i = 1 to 2;
 
  do j = 1 to 4;
   sales{i,j} = sales{i,j}*1.03;
  end;
 
 end;
 
 drop i j; /* drop temporary indexing variables used for do loop */
 
run;
After running the code above and creating the ANNUAL_SALES_ARRAY data set in the WORK library, you can see in the output data set (shown partially below) that each cost and revenue value has been increased by 3%:
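To see how the listed variables are laid out in the two-dimensional array, a small check like the one below can help. It is only a sketch: it assumes the WORK.ANNUAL_SALES data set created earlier already exists, uses the VNAME function to look up the variable behind each element, and writes the mapping to the log. The variable name varname is introduced here just for this check.

```sas
data _null_;
 set annual_sales (obs=1); /* one row is enough to inspect the layout */

 array sales{2,4} costs_q1-costs_q4 revenue_q1-revenue_q4;
 length varname $ 32;

 /* dim(sales, 1) is the size of the first dimension, dim(sales, 2) the second */
 do i = 1 to dim(sales, 1);
  do j = 1 to dim(sales, 2);
   varname = vname(sales{i,j}); /* name of the variable stored at position {i,j} */
   put i= j= varname=;
  end;
 end;
run;
```

SAS fills the array row by row, so the first four variables in the list (the cost variables) occupy row 1 and the next four (the revenue variables) occupy row 2. This is why the order of the variable list in the array statement matters.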
