This article will address all of the above and provide an extensive end-to-end guide on creating, using and managing SAS formats.
In particular, this article will cover:
- Using Built-in SAS Formats
(a) Character Formats
(b) Numeric Formats
(c) Date Formats
- PROC FORMAT
(a) Creating a Simple Numeric Format
(b) Creating a Character Format
(c) Creating a Numeric Format with Ranges
(d) Saving and Retrieving a Permanent Format Catalog
(e) Viewing Formats in a Catalog
(f) Importing and Exporting Format Catalogs
- CARS – Data about 2004 cars
- ORSALES – Fictitious Sports and Outdoors Store Sales data
- PRICEDATA - Simulated monthly sales data
Using Built-in SAS Formats
Both built-in formats and custom formats follow a specific naming convention. For both built-in and custom formats, character formats always start with a dollar sign ($) while numeric formats do not. With both character and numeric built-in formats, the format name ends in either a “w” (width) or a “w.d” for the width and number of digits that will be shown to the right of a decimal point.
Here are a few sample built-in SAS formats and their naming conventions:
- $UPCASEw. – Example: $UPCASE9. is a character format called “UPCASE” with width 9
- DATEw. – Example: DATE9. is a numeric format called DATE with width 9
- DOLLARw.d – Example: DOLLAR10.1 is a numeric format called DOLLAR with width 10 and 1 decimal point to the right of the decimal place
Next, let’s walk through a few examples of using these formats to understand how SAS formats work.
A character format is a format that can be used with a character variable in a SAS data set. As mentioned, one example of a built-in character format is the $UPCASEw. format. The $UPCASE can format can be used to convert all the letters in a variable to upper case.
For example, to convert the names of the car makes in the SASHELP.CARS dataset to upper case, we can use the FORMAT statement with the $UPCASE format as follows:
format make $upcase.;
As you might expect, numeric formats are formats which can be used with numeric variables. As mentioned earlier, a common numeric format is the DOLLARw.d format. The DOLLAR format can be used with numeric variables which contain dollar amounts to apply the dollar sign and adjust the number of decimal places shown.
The SASHELP.ORSALES dataset contains the numeric variable PROFIT as shown partially below:
format profit dollar8.1;
The SASHELP.PRICEDATA contains the variable DATE, which is formatted as MONYY5 by default:
format date mmddyy10.;
Do you have a hard time learning SAS?
Take our Practical SAS Training Course for Absolute Beginners and learn how to write your first SAS program!
While much of what you can accomplish with PROC FORMAT could ultimately be handled with DATA Step programming, PROC FORMAT is a much more efficient solution, particularly with larger datasets as it requires far fewer computational resources.
Creating a Simple Numeric Format
Before we walk through how to create a numeric format with PROC FORMAT, let’s start by illustrating how this can be achieved with traditional SAS Data Step programming.
In the syntax below, we define a series of IF statements to create a new variable, cylinders_text, which contains the desired description for the number of cylinders in words. We can then verify the results by running PROC FREQ on both the original CYLINDERS variable as well as the newly created CYLINDERS_TEXT variable:
length cylinders_txt $6;
if cylinders = 3 then cylinders_text = 'three';
if cylinders = 4 then cylinders_text = 'four';
if cylinders = 5 then cylinders_text = 'five';
if cylinders = 6 then cylinders_text = 'six';
if cylinders = 8 then cylinders_text = 'eight';
if cylinders = 10 then cylinders_text = 'ten';
if cylinders = 12 then cylinders_text = 'twelve';
proc freq data = cars_coded order = freq;
tables cylinders cylinders_text;
The PROC FORMAT call starts with a PROC FORMAT statement. By default, PROC FORMAT will store the custom formats in the WORK library and they will only be available during this SAS session. By using the LIBRARY option, you can specify the desired location for the PROC FORMAT catalog, however in this case we will save the catalog to WORK for simplicity.
Next, the VALUE statement is used to name the format and also define the characteristics of the format. In this example, our format is named CYLINDER_FMT and text values for 3,4,5,6,8,10 and 12 are defined by placing the desired words in quotations after the equal sign as shown in the syntax below:
3 = 'three'
4 = 'four'
5 = 'five'
6 = 'six'
8 = 'eight'
10 = 'ten'
12 = 'twelve'
format cylinders cylinder_fmt.;
- The DATA Step method requires the use of IF statements to iterate through every observation in the data set whereas PROC FORMAT simply alters the metadata of the dataset.
- The DATA Step method also requires you to read in and out a dataset to make the modification, whereas PROC FORMAT does not require writing out any new datasets.
- Both using IF statements to iterate through all observations and reading/writing new datasets are potentially time consuming and resource intensive tasks, especially with large datasets.
- The DATA Step methods requires you to create a new variable, whereas PROC FORMAT does not require you to create any new variables.
- Creating additional variables could significantly increase the size of your data set, particularly if the formatted values are wide or if you have a large number of observations
Creating a Character Format
The PROC FORMAT syntax to create a custom format for a character variable is very similar to the syntax used for creating a custom numeric variable.
As before, we first start with a PROC FORMAT statement and specify that we would like to save the format in WORK with the library option (recall this is actually the PROC FORMAT default).
Next, we begin defining the format with a VALUE statement followed by the desired format name. There are 2 differences here when compared with creating a numeric format. First, the format name must start with a dollar sign ($) for a character format and second the values to be formatted must also be in quotation marks since they are character values.
In the following syntax, a character format $CAR_TYPE is created which can be applied to the SASHELP.CARS dataset variable TYPE. The $CAR_TYPE format expands the values of TYPE so that they are easier to understand:
"Hybrid" = "Hybrid Drivetrain"
"SUV" = "Sports Utility Vehicle"
"Sedan" = "4-door Sedan"
"Truck" = "Pickup Truck"
"Wagon" = "Station Wagon"
format type $car_type.;
Become a Certified SAS Specialist
Get access to two SAS base certification prep courses and 150+ practice exercises
Creating a Numeric format with Ranges
PROC FORMAT is also a useful tool for grouping your data to help with certain analyses, categorization, and data interpretation.
In this example, you would like to better understand the distribution of invoice prices for all the vehicle models in the SASHELP.CARS dataset. For this analysis, you’d like to know how many vehicles fall into the falling price categories:
- $20,000 or less
- $20,001 to $30,000
- $30,001 to $50,000
- $50,001 or more
Using PROC FORMAT, we can create a custom format called INVOICE_GROUPS to apply to the INVOICE variable in the SASHELP.CARS dataset. As before, we will use the default options of PROC FORMAT and create our new format in the WORK library.
Using the VALUE statement, we will define our new format as INVOICE_GROUPS. When defining the ranges, there are a few important points to consider:
Minimum and maximum values for each range are separated by a dash (-)
“low” and “high” can be used as minimum and maximum values when defining ranges to capture the true minimum and maximum values found within a dataset
Ranges within a format cannot overlap
Keeping the above under consideration, here is the PROC FORMAT syntax to create the INVOICE_GROUPS format:
low-20000 = '$20,000 or less'
20001-30000 = '$20,001-$30,001'
30001-50000 = '$30,001-$50,000'
50001-high = '$50,000 or more'
invoice_format = put(invoice,invoice_groups.);
keep make model invoice invoice_format;
Using PROC FREQ with a TABLES statement and a FORMAT statement as shown below, we can easily generate this report:
format invoice invoice_groups.;
Saving and Retrieving a Permanent Format Catalog
So far, all the formats we have created were saved to the WORK directory, which is the default location that PROC FORMAT saves user created formats. If you would like to store your formats for future use without having to re-run the PROC FORMAT code each time, PROC FORMAT also has the ability to save format catalogs into a permanent SAS library.
To demonstrate how to store a permanent format catalog, let’s create a new simple format for classifying Miles Per Gallon in the City (MPG_CITY) in the SASHELP.CARS dataset.
Before you can save to permanent SAS library, you must first use the LIBNAME statement to define a new permanent library on your system. Note that if you are using SAS Studio you may be able to use the exact same LIBNAME statement shown below, but depending on the SAS version you are using and your system configuration, the path “/folders/myfolders” may need to be replaced with a different path that is available to you on your system.
The syntax for creating the actual format is the same as before, but this time we will use the library option to point to another location on your system:
proc format library = mylib;
0-15 = "Poor MPG"
16-30 = "Average MPG"
31-high = "Good MPG"
options fmtsearch = (mylib);
proc freq data = sashelp.cars;
format mpg_city mpg_groups.;
Viewing Formats in a Catalog
Once you have saved formats in a permanent catalog, you can review them and see the values you have defined. Using PROC FORMAT with the FMTLIB option, you can easily print out a list of all the formats found in that library:
|After running the code above, you can see the details of the MPG_GROUPS format in the Results shown below:|
Importing and Exporting Format Catalogs
As you may have realized, manually typing out many custom format values can be quite time consuming. Fortunately PROC FORMAT also has a utility to create formats based on an existing dataset. This allows you to import tables which may already contain your format definitions and use them to create new custom formats.
Before you can import a SAS data set into PROC FORMAT, it must contain at least the following 4 variables:
- FMTNAME – the name of the format you’d like to create
- START – the minimum value of the number/character you’d like to format (if you have a character format or if your format will not include a range then this is simply the value you’d like to format)
- LABEL – The formatted value you’d like to apply to your data points
- TYPE – The type of format you’d like to create (C=Character or N=Numeric are the most common values used here)
- (Optionally) END – the maximum value of the number range you’d like to apply a format to
There are other variables that can be found in this dataset as well for more advanced custom formats, but these are the mandatory variables you must have.
Using the following Data Step code, we will create a data set which will be compatible with PROC FORMAT. The format we are creating will be called ENGINE_GROUPS and can be used to group the ENGINESIZE variable from SASHELP.CARS into 3 categories: Small (1-2), Medium (2.1-3.5) and Large (3.6-8.3).
length fmtname $15;
input fmtname $ start end label $ type $;
engine_groups 1 2 Small N
engine_groups 2.1 3.5 Medium N
engine_groups 3.6 8.3 Large N
format enginesize engine_groups.;
If you went through this entire article you should now have 3 formats in your WORK library: ENGINE_GROUPS, INVOICE_GROUPS and $CAR_TYPE. To save all these formats into a single dataset, ALL_WORK_FORMATS, we can add the CNTRLOUT option to the PROC FORMAT statement as shown below: