This article covers the different types of SAS arrays and walks through a variety of examples where arrays can be used for data manipulation tasks.
In particular, this article will cover:
- Overview of Arrays
- One-Dimensional Arrays
- Using arrays to perform a repetitive calculation
- Creating new variables with arrays
- Manipulating character variables with arrays
- Defining array bounds
- Implicit Arrays and DO OVER
- Multi-dimensional arrays
A variety of data sets from the SASHELP library are used throughout this article. The datasets used include:
- SASHELP.APPLIANC – Sales Time Series for 24 Appliances by Cycle
- SASHELP.PRICEDATA – Simulated monthly sales data
- SASHELP.CARS – Data about 2004 cars
Array Overview
First, let’s walk through the different components of a SAS array. The most commonly used array type is the explicit SAS array, which can be broken down into 6 main components:
array array-name {X} $ length array-elements (initial-values);
Each array statement must at minimum contain these 3 elements:
- Array-name: The name of the array
- X: the number of elements in the array
- Array-elements: the list of variables to be grouped within the array
Optionally, the array statement can also include:
- $: A dollar sign ($) to denote character variables in the array
- length: A length value to declare a common length for elements in the array
- initial-value(s): One or more initial values, listed in parentheses, to assign to the element(s) in the array
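As a rough sketch of how these components fit together, the Data Step below declares one numeric array and one character array and writes their elements to the log. The array names, variable names and values (scores, grades and so on) are made up for illustration and do not come from any SASHELP dataset.

```sas
data _null_;
    /* Minimum form: array name, element count and element list.
       The initial values in parentheses are optional. */
    array scores{3} score1-score3 (90 85 70);

    /* Character form: $ marks character elements, 8 is their common length */
    array grades{3} $ 8 grade1-grade3 ('low' 'medium' 'high');

    /* Write each pair of elements to the log */
    do i = 1 to dim(scores);
        put scores{i}= grades{i}=;
    end;
run;
```

Because DATA _NULL_ is used, no output dataset is created; the step only illustrates the syntax of the ARRAY statement.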
In the next section, we will walk through a simple array example to help you better understand the structure of SAS arrays.
One-Dimensional Arrays
The simplest form of SAS array is the one-dimensional array. In a one-dimensional array, a set of SAS variables is grouped under a single array name. Once variables are grouped under a single array, you can easily perform the same calculation on all the variables with just a few lines of code.
Let’s look at an example where we perform the same task both with and without SAS arrays to compare and contrast the two methods.
Example 1A – Performing a Repetitive Calculation on Multiple Variables, Without an Array
You often need to perform the same calculation on multiple similar variables. This type of task is well suited for arrays because they can greatly reduce the amount of code you need to write.
In the SASHELP dataset APPLIANC, the number of units sold for 24 appliances by cycle is stored in 24 variables, UNITS_1 to UNITS_24. Suppose that, due to a computer glitch, you need to add 3 units sold to the first 10 appliances (i.e. the UNITS_1 to UNITS_10 variables).
To demonstrate how arrays can simplify your code, let’s first look at how this calculation can be done without using arrays. Adding 3 units to each UNITS_# variable is a simple arithmetic operation, but the code quickly becomes long and repetitive because each of the 10 variables needs its own assignment statement.
Below is the basic Data Step code to complete this task. Two PROC PRINT steps are also added to allow for an easy comparison of the first 10 observations of the original and modified datasets:
data applianc;
    set sashelp.applianc;
    /* add 3 units to each of the first 10 appliances */
    units_1 = units_1 + 3;
    units_2 = units_2 + 3;
    units_3 = units_3 + 3;
    units_4 = units_4 + 3;
    units_5 = units_5 + 3;
    units_6 = units_6 + 3;
    units_7 = units_7 + 3;
    units_8 = units_8 + 3;
    units_9 = units_9 + 3;
    units_10 = units_10 + 3;
run;
proc print data = sashelp.applianc (obs=10);
    var units_1-units_10;
    title "First 10 records of unmodified SASHELP.APPLIANC dataset";
run;
proc print data = applianc (obs=10);
    var units_1-units_10;
    title "First 10 records of modified APPLIANC dataset";
run;

Example 1B – Performing a Repetitive Calculation on Multiple Variables, With an Array
To simplify this task with SAS array programming, we need to define a single array which will group all the UNITS_# variables together that we wish to modify. This array will be defined as follows:
- Array name: units_sold
- Number of elements: (*) – an asterisk can be used in place of an explicit number, which tells SAS to count the number of array elements for you
- Array elements: units_1-units_10
After defining the array, a DO LOOP needs to be set up to loop through each of the 10 elements and then increase the number of units sold by 3 for each appliance.
The complete syntax is as follows:
data applianc_array;
    set sashelp.applianc;
    /* group the 10 variables to be modified under one array */
    array units_sold{*} units_1-units_10;
    do i = 1 to 10;
        units_sold{i} = units_sold{i}+3;
    end;
run;
proc print data = sashelp.applianc (obs=10);
    var units_1-units_10;
    title "First 10 records of unmodified SASHELP.APPLIANC dataset";
run;
proc print data = applianc_array (obs=10);
    var units_1-units_10;
    title "First 10 records of modified APPLIANC_ARRAY dataset";
run;

Example 2 – Creating New Variables with an Array
The PRICEDATA dataset in the SASHELP library contains simulated data on the prices of 17 different products. The prices are stored in USD in the variables PRICE1-PRICE17. To convert these prices to Canadian Dollars, we would need to multiply by approximately 1.26 (based on the exchange rate at the time of writing).
Without using arrays, let’s first look at how we would do this with traditional Data Step code. In the following code, a new PRICE_CAD# variable is created by multiplying the original PRICE# variable by 1.26 to convert the values from USD (United States Dollars) to Canadian Dollars (CAD). Finally, a PROC PRINT is used to print out the first 3 PRICE variables and their newly created Canadian Dollar equivalents so we can see the differences in the calculated values:
data pricedata_cad;
    set sashelp.pricedata;
    /* convert each USD price to CAD */
    price_cad1 = price1*1.26;
    price_cad2 = price2*1.26;
    price_cad3 = price3*1.26;
    price_cad4 = price4*1.26;
    price_cad5 = price5*1.26;
    price_cad6 = price6*1.26;
    price_cad7 = price7*1.26;
    price_cad8 = price8*1.26;
    price_cad9 = price9*1.26;
    price_cad10 = price10*1.26;
    price_cad11 = price11*1.26;
    price_cad12 = price12*1.26;
    price_cad13 = price13*1.26;
    price_cad14 = price14*1.26;
    price_cad15 = price15*1.26;
    price_cad16 = price16*1.26;
    price_cad17 = price17*1.26;
run;
proc print data=pricedata_cad;
    var price1-price3 price_cad1-price_cad3;
run;

In the array version of this Data Step, we need to define two arrays, since we would like to have variables for both the Canadian dollar price and the US dollar price.
One array groups the original PRICE# variables and the other groups the newly created PRICE_CAD# variables. The first array is defined as follows:
- Name: PRICE_CAD
- Number of elements: 17
- List of variables: PRICE_CAD1-PRICE_CAD17
Similarly, the second array is defined as follows:
- Name: PRICE_USD
- Number of elements: 17
- List of variables: PRICE1-PRICE17
After using the above information to construct our arrays, we define a simple DO LOOP to iterate through each of the 17 variables and perform the USD to CAD conversion. The converted values are stored in the PRICE_CAD# variables, as defined by our price_cad{} array and the original USD values are retrieved from the PRICE# variables as defined by our price_usd{} array.
As before, a PROC PRINT is also used to display the first 3 PRICE variables:
data pricedata_cad_array;
    set sashelp.pricedata;
    array price_cad{17} price_cad1-price_cad17; /* new CAD variables */
    array price_usd{17} price1-price17;         /* original USD variables */
    do i = 1 to 17;
        price_cad{i} = price_usd{i}*1.26;
    end;
run;
proc print data=pricedata_cad_array;
    var price1-price3 price_cad1-price_cad3;
run;



Example 3 – Manipulating Character Variables
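In this example, the goal is to convert the values of the character variables MAKE, MODEL, TYPE, ORIGIN and DRIVETRAIN in the SASHELP.CARS dataset to upper case. Using an array and a simple DO LOOP, this can easily be accomplished with just a few lines of code. In our array statement we will define the following elements:
- Array name: character_vars
- Number of elements: (*) – Recall that the asterisk tells SAS to count the number of elements in the array for you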
- Array elements: MAKE MODEL TYPE ORIGIN DRIVETRAIN
After defining our array using the above parameters, we can simply add a DO LOOP to loop through the 5 elements and apply the UPCASE function to each variable as shown in the following syntax:
data cars_upcase; /* output dataset name chosen for illustration */
    set sashelp.cars;
    array character_vars{*} make model type origin drivetrain;
    do i = 1 to 5;
        character_vars{i} = upcase(character_vars{i});
    end;
run;

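Since these 5 variables are all of the character variables in SASHELP.CARS, the special variable list _CHARACTER_ can be used in place of the individual variable names to group every character variable into the array:
data cars_upcase_all; /* output dataset name chosen for illustration */
    set sashelp.cars;
    array character_vars{*} _character_;
    do i = 1 to 5;
        character_vars{i} = upcase(character_vars{i});
    end;
run;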
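The loops above hard-code the upper bound of 5. When the number of elements is left to SAS with an asterisk, one option is to let the DIM function supply the upper bound, so the loop still covers every element if the input dataset gains or loses character variables. The following is a minimal sketch of that idea; the output dataset name cars_upper is hypothetical.

```sas
data cars_upper;   /* hypothetical output dataset name */
    set sashelp.cars;
    /* _character_ groups every character variable read by the SET statement */
    array character_vars{*} _character_;
    /* dim() returns the number of elements, so no count is hard-coded */
    do i = 1 to dim(character_vars);
        character_vars{i} = upcase(character_vars{i});
    end;
    drop i;        /* drop the temporary index variable */
run;
```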

Defining Array Bounds
By default, SAS automatically starts the bounds of an array at 1. So, if you define an array as follows:
array prices{*} price3 price4 price5 price6;
then prices{1} = price3, prices{2} = price4 and so on. This can make your DO LOOP confusing to read and understand, since element number 1 does not match the 3 suffix on the variable PRICE3. To make your array easier to work with, you can define custom bounds so that your array starts at a number other than 1.
Example 1 – Using Customized Array Bounds
In this example, a 25% discount is applied to the PRICE3-PRICE7 variables in SASHELP.PRICEDATA by multiplying each by 0.75, and the results are stored in the new SALE_PRICE3-SALE_PRICE7 variables. Similar to previous examples, we will define two arrays. The first array will contain the group of new variables and the second array will contain the original price variables to be used in the calculation. To define the custom bounds for an array, the start and end points of the bounds are placed inside the curly brackets ({}) and separated by a colon, as shown below. Finally, a PROC PRINT is used to compare the original and newly created price variables:
data pricedata_sale;
    set sashelp.pricedata;
    /* both arrays are indexed from 3 to 7 to match the variable suffixes */
    array sale_price{3:7} sale_price3-sale_price7;
    array prices{3:7} price3-price7;
    do i = 3 to 7;
        sale_price{i} = prices{i}*0.75;
    end;
run;
proc print data = pricedata_sale(obs = 10);
    var price3-price7 sale_price3-sale_price7;
run;
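As a possible variation on the example above, the LBOUND and HBOUND functions can return the lower and upper bounds of an array, so the loop limits do not have to be repeated by hand when the bounds change. This is only a sketch of that approach; the output dataset name pricedata_sale_bounds is hypothetical.

```sas
data pricedata_sale_bounds;   /* hypothetical output dataset name */
    set sashelp.pricedata;
    array sale_price{3:7} sale_price3-sale_price7;
    array prices{3:7} price3-price7;
    /* lbound() returns 3 and hbound() returns 7 for these arrays */
    do i = lbound(prices) to hbound(prices);
        sale_price{i} = prices{i} * 0.75;
    end;
    drop i;                   /* drop the temporary index variable */
run;
```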

DO OVER and Implicit Arrays
All of the examples so far have used explicit arrays. Implicit arrays are an alternative type of array available in SAS and can also be useful. With explicit arrays, the index specification (either an asterisk or the number of elements in the array) must be explicitly defined in the array statement after the array name in curly brackets ({}). In an implicit array, however, an index specification indicating the number of array elements is not required.
The advantage of not having to count the number of elements in the array is that you can use the DO OVER statement in place of a traditional DO LOOP. The DO OVER statement will automatically iterate across all elements in the array, without you having to specify the number of elements. Let’s look at an example of how to use DO OVER with an implicit array.
Example 1 – Using DO OVER to Perform a Calculation on all Character Variables
In the syntax below, we are building upon the previous example, in which all character variables in the SASHELP.CARS dataset were grouped into a single array and their values were converted to upper case. The two differences here are that (1) we are now using an implicit array (i.e. the number of elements is not defined in the array statement) and (2) a DO OVER loop is used in place of a traditional DO LOOP:
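data cars_upcase_implicit; /* output dataset name chosen for illustration */
    set sashelp.cars;
    /* implicit array: no index specification after the array name */
    array impl_character_vars make model type origin drivetrain;
    do over impl_character_vars;
        impl_character_vars = upcase(impl_character_vars);
    end;
run;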

Multi-Dimensional Arrays
While the utility of multi-dimensional arrays becomes more apparent with more complex data manipulation tasks, here we will simply demonstrate, with a simplified example, how to use a nested DO LOOP to iterate over a two-dimensional array.
In the dataset ANNUAL_SALES generated below, quarterly cost and revenue amounts for a business are each stored in 4 separate variables. That is, costs are in the variables COSTS_Q1 to COSTS_Q4 and revenue is found in the REVENUE_Q1 to REVENUE_Q4 variables. Each row represents 1 year of sales data.
Using the following syntax, generate the ANNUAL_SALES dataset, which will be stored in the WORK library:
data annual_sales;
    input year costs_Q1 costs_Q2 costs_Q3 costs_Q4 revenue_Q1 revenue_Q2 revenue_Q3 revenue_Q4;
    datalines;
2000 10131 11234 15153 11344 12316 11564 16591 11564
2001 10999 11335 12546 16001 10654 15861 13461 15618
2002 15611 15650 17051 18796 16846 15616 18764 19055
;
run;

In this example, the goal is to apply a 3% increase to both the cost and revenue values, which will be done by multiplying each value by 1.03. However, before we can perform the calculation, we must define our two-dimensional array.
To define a two-dimensional array, we start with the array name (sales), followed by the number of variable groups in the array (2), a comma, and then the number of elements in each group (4). Here, we have 2 groups of variables (COSTS_Q1-Q4 and REVENUE_Q1-Q4) and 4 elements (Q1, Q2, Q3, and Q4) in each. So we define our array statement as:
array sales{2,4} costs_q1-costs_q4 revenue_q1-revenue_q4;
Now that the multi-dimensional array is established, we iterate through both the number of groups (using “i” from 1 to 2) and the number of elements in each group (using “j” from 1 to 4).
The complete syntax is as follows:
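data annual_sales_increase; /* output dataset name chosen for illustration */
    set annual_sales;
    /* row 1 of the array holds the costs, row 2 holds the revenue */
    array sales{2,4} costs_q1-costs_q4 revenue_q1-revenue_q4;
    do i = 1 to 2;
        do j = 1 to 4;
            sales{i,j} = sales{i,j}*1.03;
        end;
    end;
    drop i j; /* drop temporary indexing variables used for do loop */
run;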
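The loop limits 2 and 4 above mirror the array dimensions. As a hedged alternative, the DIM function accepts a second argument naming the dimension, so the limits can be derived from the array itself. The sketch below is self-contained: it reads the first row of the ANNUAL_SALES data directly from DATALINES, and the output dataset name annual_sales_check is hypothetical.

```sas
data annual_sales_check;   /* hypothetical output dataset name */
    input year costs_q1-costs_q4 revenue_q1-revenue_q4;
    array sales{2,4} costs_q1-costs_q4 revenue_q1-revenue_q4;
    /* dim(sales, 1) is the number of groups, dim(sales, 2) the elements per group */
    do i = 1 to dim(sales, 1);
        do j = 1 to dim(sales, 2);
            sales{i,j} = sales{i,j} * 1.03;
        end;
    end;
    drop i j;              /* drop the temporary index variables */
    datalines;
2000 10131 11234 15153 11344 12316 11564 16591 11564
;
run;
```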
