In this article, we will show you 7 different ways to analyze your data using the FREQ procedure.
You will learn how to view the frequencies of different variables, find the most or least commonly occurring values in your data, check for missing values, and more.
Let’s get started!



1. Basic Usage
The most basic usage of Proc Freq is to determine the frequency (number of occurrences) for all values found within each variable of your dataset.
Using the CARS dataset as an example, you can determine the frequencies of all variables within your dataset with the following code:
Proc freq data=sashelp.cars;
Run;
For example, below is a frequency table for the variable MAKE.


By default, the TABLES statement used with Proc Freq will output a table that lists the values found within the variable(s) specified, the frequency of each value, and the percentage of that value relative to all other values, as well as the cumulative frequencies and cumulative percentages.
The cumulative frequencies and percentages are rolling totals, calculated by adding each row's frequency or percentage to the running total from the rows above it.
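For example (using approximate, illustrative figures rather than values taken from the tables above): if the first row of an Origin table showed 158 cars from Asia (36.9%) and the second row 123 cars from Europe (28.7%), the cumulative frequency on the Europe row would be 158 + 123 = 281 and the cumulative percentage would be 36.9 + 28.7 = 65.6%.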

However, using Proc Freq in this manner without any options is usually not recommended, particularly if you have a large dataset which contains variables that have many unique values (levels).
A variable such as Model, which has a large number of unique (distinct) values, will produce a very long output that is difficult to read and not very useful.
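As an illustrative sketch (using the same TABLES syntax shown above), a request like the following would list every distinct Model value and produce exactly that kind of lengthy table:
Proc freq data=sashelp.cars;
Tables model;
Run;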

Instead, you can use the TABLES statement to output only the frequencies and percentages of the Origin variable, to determine how many cars originate from each continent:
Proc freq data=sashelp.cars;
Tables Origin;
Run;
The resulting table from this code is shown here:

2. Sort output to determine the most/least commonly occurring values
You can use Proc Freq to determine the most or least commonly occurring values within one or more variables.
Using the ORDER=FREQ option on the Proc Freq statement, you can easily see the most and least commonly occurring values of the Type and Origin variables; the levels are listed from most to least frequent, so the rarest values appear at the bottom of each table:
Proc freq data=sashelp.cars order=freq;
Tables type origin;
Run;

3. Check for Missing Values
Proc freq is an excellent tool to check for missing values in your dataset.
For this example, the SASHELP.HEART dataset is used; it can be accessed in the same way as the CARS dataset described above.
To check for the frequency of missing values in the DeathCause variable from the HEART dataset, you would use the following code:
Proc freq data=sashelp.heart;
Tables deathcause;
Run;

Notice that, by default, the missing values are not listed as a row in the frequency table. To treat missing values as a valid level and include them in both the frequency counts and the percentage calculations, add the MISSING option to the TABLES statement:
Proc freq data=sashelp.heart;
Tables deathcause /missing;
Run;

If you would rather display the frequency of missing values in the table while still excluding them from the percentage calculations, you can use the MISSPRINT option instead:
Proc freq data=sashelp.heart;
Tables deathcause /missprint;
Run;
Using the Unknown value as an example, the percentage of records with an Unknown cause of death is 5.63% with MISSPRINT, compared to only 2.15% in the previous table with the MISSING option. The difference arises because MISSPRINT excludes the missing records from the percentage denominator, whereas MISSING includes them.
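As a rough back-of-the-envelope illustration (the counts here are approximate figures inferred for illustration, not quoted from the output): if roughly 112 records have an Unknown cause of death out of about 5,209 total records, of which about 1,991 have any recorded cause, then 112 / 5209 ≈ 2.15% under MISSING, while 112 / 1991 ≈ 5.63% under MISSPRINT.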

4. Create an Output Data Set
Frequencies and percentages calculated using Proc Freq can also be saved to an output dataset using the OUT option combined with the TABLES statement.
The OUTCUM option can also be added to include the cumulative frequencies in the output dataset if desired:
Proc freq data=sashelp.cars;
Tables type /out=cars_freq outcum;
Run;
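The output dataset can then be used like any other SAS dataset. As a minimal sketch (relying on the default variable names that Proc Freq assigns with OUT= and OUTCUM, namely COUNT, PERCENT, CUM_FREQ and CUM_PCT), you could print it like this:
Proc print data=cars_freq;
Var type count percent cum_freq cum_pct;
Run;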

5. Use the FORMAT statement to categorize and analyze data
When combined with Proc Format and a FORMAT statement, Proc Freq also becomes a powerful tool to categorize and subsequently analyze continuous variables (or variables with a large number of unique values).
Using the MSRP (Manufacturer’s Suggested Retail Price) variable in the Cars dataset as an example, you can see that the standard Proc Freq output shown below does not produce very useful information for a variable such as MSRP:
Proc freq data=sashelp.cars;
Tables msrp;
Run;

To produce a more useful summary, you can first define a format with Proc Format that groups the MSRP values into ranges, and then apply that format within Proc Freq using a FORMAT statement:
Proc format;
Value msrp_groups
10000-19999 = '10,000-19,999'
20000-29999 = '20,000-29,999'
30000-39999 = '30,000-39,999'
40000-high = '40,000+'
;
Run;

Proc freq data=sashelp.cars;
Tables msrp;
Format msrp msrp_groups.;
Run;
Because Proc Freq groups observations by their formatted values, the output now shows one row per price range rather than one row per distinct MSRP:

6. Cross-tabulation – Create 2×2 or nxn multi-way tables
Proc freq can also be used to produce 2×2 or higher nxn multi-way tables to determine the distribution (or frequency) of records that fall into 2 or more combinations of categories.
For example, if you would like to compare the different car DriveTrain types by the continent of Origin from the Cars dataset, you could use the following code:
Proc freq data=sashelp.cars;
Tables origin*drivetrain;
Run;

While this table may seem overwhelming at first, let’s walk through it step-by-step to understand what each component refers to.
As shown in the legend, the first row of each cell contains the frequency. For example, the 34 in the top left box indicates that there are 34 cars from Asia with an “All” DriveTrain.
Moving from left to right, the 99 in the top middle box indicates that there are 99 cars from Asia that have a “Front” drivetrain, and so on.

The second row contains the overall percentages: each cell’s frequency as a percentage of all records in the table. Using the top left box again as an example, the 7.94% indicates that 7.94% of all records have Origin=Asia and DriveTrain=All.

The third row contains what are known as the row percentages. Starting with the top left box as an example, the 21.52 indicates that of the records with Origin=Asia, 21.52% have DriveTrain=All. Moving across the row from left to right, you can see that for Origin=Asia, 62.66% of cars have DriveTrain=Front and 15.82% have DriveTrain=Rear. Notice that these 3 percentages total 100% when summed across the row.
The fourth row contains the column percentages, which work the same way but down each column: of all cars with DriveTrain=All, it shows the percentage that come from each Origin.
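As a quick worked check (assuming the standard SASHELP.CARS totals of 428 cars overall and 158 cars from Asia, which are not shown in the excerpt above): the overall percentage in the top left cell is 34 / 428 ≈ 7.94%, the row percentage is 34 / 158 ≈ 21.52%, and the three row percentages sum to 21.52 + 62.66 + 15.82 = 100%.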


Depending on the desired results, you can choose to suppress some of these numbers from the output. The NOCOL, NOROW, NOFREQ and NOPERCENT options can be used to suppress the column percentages, row percentages, frequencies and overall percentages from your output. These options can be used independently or in different combinations together.
For example, if you wanted to suppress the row and column percentages, but keep the frequencies and overall percentages, you would use the following code:
Proc freq data=sashelp.cars;
Tables origin*drivetrain /nocol norow;
Run;
This produces the following table, which contains only the frequencies and overall percentages:

Two-way or multi-way tables can also be displayed in more of a list format for improved readability. This is especially useful when there are many possible combinations between the two variables. To display a cross tabulation in the long form “list” format, you can simply use the LIST option:
Proc freq data=sashelp.cars;
Tables origin*drivetrain /list;
Run;
The frequencies and percentages are the same as before; the only change is how the information is displayed:

7. Produce dot and bar plots
Another useful feature of Proc Freq is the ability to create graphical representations of the frequencies and percentages.
Within Proc Freq, you have the ability to create either dot or bar plots, which can be created based on either the frequencies or the overall percentages.
In the following example, the TABLES statement is used to create both a 1-way frequency table for the Origin variable, and a 3×3 frequency table for the DriveTrain variable crossed with Origin.
To produce a dot plot for these variables, the plots=freqplot (type=dot) option is added. In order to produce these graphs, ODS graphics must also be turned ON (and subsequently turned OFF) as shown below:
Ods graphics on;
Proc freq data=sashelp.cars order=freq;
Tables origin drivetrain*origin / plots=freqplot(type=dot);
Run;
Ods graphics off;


Reusing the Proc Format code from earlier on this page to group the MSRP values, the type=bar and scale=percent options are added to produce a bar plot of the corresponding percentages:
Proc format;
Value msrp_groups
10000-19999 = '10,000-19,999'
20000-29999 = '20,000-29,999'
30000-39999 = '30,000-39,999'
40000-high = '40,000+'
;
Run;
Ods graphics on;
Proc freq data=sashelp.cars order=freq;
Tables msrp / plots=freqplot (type=bar scale=percent);
Format msrp msrp_groups.;
Run;
Ods graphics off;
