In this article, we will show you 7 different ways to analyze your data using the FREQ procedure.
You will learn how to see frequencies of different variables, find the most/least commonly occurring values in your data, check for missing values,…
Let's get started!



1. Basic Usage
The most basic usage of Proc Freq is to determine the frequency (number of occurrences) for all values found within each variable of your dataset.
Proc freq data=sashelp.cars;
Run;
For example, below is a frequency table for the variable MAKE.


By default, the TABLES statement used with Proc Freq will output a table which lists the values found within the variable(s) specified, the frequency of each value, the percentage of that value relative to all other values, as well as the cumulative frequencies and cumulative percentages.

However, using Proc Freq in this manner without any options is usually not recommended, particularly if you have a large dataset which contains variables that have many unique values (levels).

Here, the TABLES statement is used to only output the frequencies and percentages of the Origin variable to determine how many cars originate from which continent:
Proc freq data=sashelp.cars;
Tables Origin;
Run;
The resulting table from this code is shown here:

2. Sort output to determine the most/least commonly occurring values
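You can use Proc Freq to determine the most or least commonly occurring values within one or more variables. Adding the ORDER=FREQ option to the PROC FREQ statement sorts each table from the most frequent value to the least frequent:
Proc freq data=sashelp.cars order=freq;
Tables type origin;
Run;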
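
Proc Freq sorts by descending frequency only, so if you want the least common values listed first, one possible approach is to save the counts to a dataset and sort that dataset yourself. The sketch below is an illustration rather than the only way to do it; the dataset name type_counts is hypothetical, and it relies on the OUT= option storing each frequency in a variable named COUNT.

```sas
/* Save the frequencies of Type without printing the usual table */
proc freq data=sashelp.cars noprint;
  tables type / out=type_counts;   /* type_counts is a hypothetical name */
run;

/* Sort ascending by COUNT so the rarest values come first */
proc sort data=type_counts;
  by count;
run;

proc print data=type_counts noobs;
run;
```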

3. Check for Missing Values
Proc freq is an excellent tool to check for missing values in your dataset.
To check for the frequency of missing values in the DeathCause variable from the HEART dataset, you would use the following code:
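Proc freq data=sashelp.heart;
Tables deathcause;
Run;

By default, the number of missing values is reported on a separate line below the table and is left out of the percentages. To treat missing values as a regular category that is included in the percentages, add the MISSING option:
Proc freq data=sashelp.heart;
Tables deathcause /missing;
Run;

To list missing values in the table while still excluding them from the percentages, use the MISSPRINT option instead:
Proc freq data=sashelp.heart;
Tables deathcause /missprint;
Run;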
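
If you want a quick missing-value check across every variable at once, one common pattern is to collapse all values into two groups with a format and then tabulate all variables. This is a minimal sketch under a few assumptions: the format names missfmt and $missfmt are made up for this example, and numeric variables are assumed to use only the standard missing value (.).

```sas
/* Hypothetical formats: label every value as 'Missing' or 'Not missing'.
   Assumes numeric variables use only the standard missing value (.),
   not special missing values such as .A */
proc format;
  value missfmt   .   = 'Missing' other = 'Not missing';
  value $missfmt  ' ' = 'Missing' other = 'Not missing';
run;

proc freq data=sashelp.heart;
  /* Apply the formats to all numeric and all character variables */
  format _numeric_ missfmt. _character_ $missfmt.;
  /* MISSING keeps the 'Missing' group as a row in each table */
  tables _all_ / missing nocum nopercent;
run;
```

Each resulting table then has at most two rows, which makes it easy to scan for variables with many missing values.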

4. Create an Output Data Set
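Frequencies and percentages calculated using Proc Freq can also be saved to an output dataset using the OUT= option in the TABLES statement. Adding the OUTCUM option also writes the cumulative frequencies and cumulative percentages to that dataset:
Proc freq data=sashelp.cars;
Tables type /out=cars_freq outcum;
Run;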
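
To inspect what was saved, you could print the output dataset. The sketch below assumes the cars_freq dataset name used above and adds the NOPRINT option, which suppresses the regular Proc Freq output when only the dataset is needed.

```sas
/* Create the output dataset without displaying the frequency table */
proc freq data=sashelp.cars noprint;
  tables type / out=cars_freq outcum;
run;

/* COUNT and PERCENT are always written by OUT=;
   CUM_FREQ and CUM_PCT are added because of OUTCUM */
proc print data=cars_freq noobs;
  var type count percent cum_freq cum_pct;
run;
```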

5. Group continuous variables using Proc Format
When combined with Proc Format and a FORMAT statement, Proc Freq also becomes a powerful tool to categorize and subsequently analyze continuous variables (or variables with a large number of unique values).
Using the MSRP (Manufacturer’s Suggested Retail Price) variable in the Cars dataset as an example, you can see that the standard Proc Freq output shown below does not produce very useful information for a variable such as MSRP:
Proc freq data=sashelp.cars;
Tables msrp;
Run;

Proc format;
Value msrp_groups
10000-19999 = '10,000-19,999'
20000-29999 = '20,000-29,999'
30000-39999 = '30,000-39,999'
40000-high = '40,000+'
;
Run;
Proc freq data=sashelp.cars;
Tables msrp;
Format msrp msrp_groups.;
Run;

6. Cross-tabulation – Create 2x2 or nxn multi-way tables
Proc freq can also be used to produce 2x2 or higher nxn multi-way tables to determine the distribution (or frequency) of records that fall into 2 or more combinations of categories.
Proc freq data=sashelp.cars;
Tables origin*drivetrain;
Run;

While this table may seem overwhelming at first, let’s walk through it step-by-step to understand what each component refers to.
As shown in the legend, the first row corresponds to the frequencies. For example, the 34 in the top left box indicates that there are 34 cars from Asia that have an “All” for DriveTrain.

The second row contains the overall percentages, which are calculated relative to all records in the table. Using the top left box again as an example, the 7.94% indicates that, across all 9 possible combinations of Origin and DriveTrain, 7.94% of records have Origin=Asia and DriveTrain=All.

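The third row contains what is known as the row percentages. Starting with the top left box as an example, the 21.52 indicates that of those records with Origin=Asia, 21.52% have DriveTrain=All. Moving across the row from left to right, you can see that for Origin=Asia cars, 62.66% have DriveTrain=Front and 15.82% have DriveTrain=Rear. Notice that these 3 percentages total 100% when summed across the row.

The fourth row contains the column percentages, which work the same way in the other direction: each value is the percentage of records within that DriveTrain column that have the given Origin, and the percentages total 100% when summed down the column.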
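
One way to double-check how each of these numbers is derived is to write the cell statistics to a dataset, where every percentage gets its own named column. The following is a sketch only; the dataset name origin_drive is hypothetical, and it uses the OUTPCT option, which adds the row and column percentages (PCT_ROW and PCT_COL) to the OUT= dataset.

```sas
/* Write one row per Origin/DriveTrain combination to a dataset */
proc freq data=sashelp.cars noprint;
  tables origin*drivetrain / out=origin_drive outpct;  /* hypothetical name */
run;

/* COUNT   = frequency
   PERCENT = overall percentage
   PCT_ROW = row percentage (within Origin)
   PCT_COL = column percentage (within DriveTrain) */
proc print data=origin_drive noobs;
  var origin drivetrain count percent pct_row pct_col;
run;
```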


Depending on the desired results, you can choose to suppress some of these numbers from the output. The NOCOL, NOROW, NOFREQ and NOPERCENT options suppress the column percentages, row percentages, frequencies and overall percentages, respectively. These options can be used on their own or in any combination.
For example, if you wanted to suppress the row and column percentages, but keep the frequencies and overall percentages, you would use the following code:
Proc freq data=sashelp.cars;
Tables origin*drivetrain /nocol norow;
Run;
This produces the following table, which contains only the frequencies and overall percentages:

Two-way or multi-way tables can also be displayed in more of a list format for improved readability. This is especially useful when there are many possible combinations between the two variables. To display a cross tabulation in the long form “list” format, you can simply use the LIST option:
Proc freq data=sashelp.cars;
Tables origin*drivetrain /list;
Run;
The results are identical to those produced without the LIST option; the only change is in how the information is displayed:

7. Produce dot and bar plots
Another useful feature of Proc Freq is the ability to create graphical representations of the frequencies and percentages.
Within Proc Freq, you can create either dot plots or bar plots, based on either the frequencies or the overall percentages.
In the following example, the TABLES statement is used to create both a 1-way frequency table for the Origin variable, and a 3x3 frequency table for the DriveTrain variable crossed with Origin.
Ods graphics on;
Proc freq data=sashelp.cars order=freq;
Tables origin drivetrain*origin / plots=freqplot(type=dot);
Run;
Ods graphics off;


Using some of the code discussed earlier on this page to group and report on the MSRPs, the type=bar and scale=percent options are added to produce a bar plot that graphically represents the corresponding percentages with bars.
Proc format;
Value msrp_groups
10000-19999 = '10,000-19,999'
20000-29999 = '20,000-29,999'
30000-39999 = '30,000-39,999'
40000-high = '40,000+'
;
Run;
Ods graphics on;
Proc freq data=sashelp.cars order=freq;
Tables msrp / plots=freqplot(type=bar scale=percent);
Format msrp msrp_groups.;
Run;
Ods graphics off;
