Debugging Errors [5-5]


Practice Exercise

In this section, we will go through a sample exam question.

Let's look at the SASHELP.HEART data set:

proc contents data=sashelp.heart; 
run;

The SASHELP.HEART data set contains 5,209 observations and 17 variables.

Below is the question.

This project will work with the following program: 

data work.lowchol work.highchol;
set sashelp.heart;
if cholesterol lt 200 output work.lowchol;
if cholesterol ge 200 output work.highchol;
if cholesterol is missing output work.misschol;
run;

This program is intended to:

  • Divide the observations of sashelp.heart into three data sets, work.highchol, work.lowchol, and work.misschol
  • Only observations with cholesterol below 200 should be in the work.lowchol data set.
  • Only observations with cholesterol that is 200 and above should be in the work.highchol data set.
  • Observations with missing cholesterol values should only be in the work.misschol data set.

Fix the errors in the program above. There may be multiple errors in the program. Errors may be syntax errors, program structure errors, or logic errors. In the case of logic errors, the program may not produce an error in the log.

After fixing all of the errors in the program, answer the following questions:

Question 1:
How many observations are in the work.highchol data set?

Question 2:
How many observations are in the work.lowchol data set? 


We are going to follow the steps below to answer these questions:

  • Step 1: identify and fix all syntax errors
  • Step 2: identify and fix all logic errors
  • Step 3: answer Q1 and Q2 above.

Step 1: Identify and fix all syntax errors

As we have seen in the first two sections of this lesson, syntax errors can be found by checking the SAS log.

Let's run the program and check the SAS log:

The SAS log indicates there is an issue with the IF statement.

When we carefully review the code, we see that the IF statement is missing the THEN keyword.

E.g.
Incorrect
if cholesterol lt 200 output work.lowchol;

Correct
if cholesterol lt 200 then output work.lowchol;

The THEN keyword is required for all three IF-THEN statements.


In addition, we see the following error message:

ERROR 388-185: Expecting an arithmetic operator.

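The IS MISSING operator is valid only in WHERE expressions (and in PROC SQL); it cannot be used in a DATA step IF statement.

We have to replace it with a comparison against the numeric missing value, which is written as a period (.).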

E.g.
Incorrect
if cholesterol is missing output work.misschol;

Correct
if cholesterol = . then output work.misschol;
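
For reference, here is a minimal sketch of the two places where a missing-value test is valid. The MISSING function works in a DATA step IF statement, while IS MISSING is accepted in a WHERE statement. The data set name work.misscheck is a hypothetical name chosen for illustration and is not part of the exam program.

```sas
/* DATA step: the MISSING function returns 1 when its argument is missing */
data work.misscheck;   /* hypothetical output data set */
  set sashelp.heart;
  if missing(cholesterol) then output;
run;

/* WHERE statement: IS MISSING is valid here, unlike in an IF statement */
proc means data=sashelp.heart n nmiss;
  where cholesterol is missing;
  var cholesterol;
run;
```

Either form selects the same observations; the walkthrough continues with the comparison against a period.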

These are the two syntax errors we have found so far.

Let's correct them and run the code:

data work.lowchol work.highchol;
set sashelp.heart;
if cholesterol lt 200 then output work.lowchol;
if cholesterol ge 200 then output work.highchol;
if cholesterol =. then output work.misschol;
run;

Unfortunately, we run into another issue:

ERROR 455-185: Data set was not specified on the DATA statement.

The error message is self-explanatory.

It indicates that WORK.MISSCHOL is named in an OUTPUT statement but is not listed in the DATA statement at the top of the step.

E.g.
Incorrect
data work.lowchol work.highchol;

Correct
data work.lowchol work.highchol work.misschol;

We fix this error and run the program again:

data work.lowchol work.highchol work.misschol;
set sashelp.heart;
if cholesterol lt 200 then output work.lowchol;
if cholesterol ge 200 then output work.highchol;
if cholesterol =. then output work.misschol;
run;

The program runs fine now without any error messages!

The syntax errors are fixed.


Step 2: Identify and fix all logic errors

Now, we need to check whether the program contains any logic errors.

We will check whether the code achieves each of the original objectives.

Let's look at each objective individually:

Objective 1:

  • Divide the observations of sashelp.heart into three data sets, work.highchol, work.lowchol, and work.misschol.

Based on the SAS log, the three data sets were created.

This objective is achieved.

Objective 2:

  • Only observations with cholesterol below 200 should be in the work.lowchol data set.

This is a little tricky.

The WORK.LOWCHOL data set is created based on the following IF statement:

if cholesterol lt 200 then output work.lowchol;

The LT symbol stands for Less-Than.

It tells SAS to write the observations to WORK.LOWCHOL when the cholesterol is less than 200.

This clearly makes sense, except that a missing value is also considered to be less than 200.

In SAS, a numeric missing value compares as smaller than any number.

If an observation has a missing cholesterol value, it is written to the WORK.LOWCHOL data set as well as to WORK.MISSCHOL.
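
The comparison behavior can be checked directly. The short sketch below assigns a missing value to a temporary variable and tests it with LT; it writes only to the log and creates no data set.

```sas
data _null_;
  cholesterol = .;   /* numeric missing value */
  if cholesterol lt 200 then put 'Missing is treated as less than 200';
  else put 'Missing is not treated as less than 200';
run;
```

The log shows the first message, confirming that an observation with a missing cholesterol value satisfies the LT 200 condition and is written to WORK.LOWCHOL.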

This violates the fourth objective:

  • Observations with missing cholesterol values should only be in the work.misschol data set.

This is a logic error that needs to be fixed.



Before we fix the program, let's look at objective #3:

  • Only observations with cholesterol that is 200 and above should be in the work.highchol data set. 

The WORK.HIGHCHOL data set is created based on the following IF statement:

if cholesterol ge 200 then output work.highchol;

The symbol GE stands for Greater-than-or-Equal-to.

This does exactly what we want to do, so we’ve achieved this objective.


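Now, we are going to wrap the two existing IF statements in a DO block that runs only for non-missing cholesterol values, and add an ELSE statement so that missing cholesterol values are written only to the WORK.MISSCHOL data set.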

data work.lowchol work.highchol work.misschol;
  set sashelp.heart;
  if cholesterol ^= . then do;
    if cholesterol lt 200 then output work.lowchol;
    if cholesterol ge 200 then output work.highchol;
  end;
  else output work.misschol;
run;

The code above writes observations with a missing cholesterol value only to the MISSCHOL data set.

The logic error is fixed.
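
An equivalent structure, shown here only as an alternative sketch, tests for the missing value first and chains the remaining conditions with ELSE so that each observation is written to exactly one data set. It assumes the MISSING function is acceptable in place of the comparison with a period.

```sas
data work.lowchol work.highchol work.misschol;
  set sashelp.heart;
  if missing(cholesterol) then output work.misschol;    /* missing values first */
  else if cholesterol lt 200 then output work.lowchol;  /* non-missing and below 200 */
  else output work.highchol;                            /* non-missing and 200 or above */
run;
```

Because the conditions are mutually exclusive, this version produces the same three data sets as the DO block version.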

Now, we can answer the two questions:

Question 1: 
How many observations are in the work.highchol data set?

Question 2: 
How many observations are in the work.lowchol data set?

To answer these questions, we will run the program and look at the SAS log:

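There are 3,652 and 1,405 observations in the HIGHCHOL and LOWCHOL data sets, respectively. The remaining 152 observations (5,209 - 3,652 - 1,405) have a missing cholesterol value and are written to MISSCHOL.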
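
As a cross-check that does not depend on reading the log, the same counts can be computed directly from SASHELP.HEART. This is a sketch; the column aliases n_total, n_high, n_low, and n_miss are names chosen here for illustration.

```sas
proc sql;
  /* a true comparison evaluates to 1 and a false one to 0, so SUM counts matches */
  select count(*)                                             as n_total,
         sum(cholesterol ge 200)                              as n_high,
         sum(cholesterol lt 200 and not missing(cholesterol)) as n_low,
         nmiss(cholesterol)                                   as n_miss
  from sashelp.heart;
quit;
```

The n_high and n_low values should agree with the observation counts reported in the log for HIGHCHOL and LOWCHOL, and the three group counts should add up to n_total.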


Exercise

Download and run the code in the text file below:

Download File


The INPUT44 data set contains seven columns. 

Review and run the following program in SAS:

data out;
set input44;
drop=bp_status weight_status smoking_status;
if cholesterol is not missing then do;
if cholesterol < 200 then chol_status='Safe';
else if cholesterol <= 239 then chol_status='High-Borderline';
else if cholesterol >= 240 then chol_status='High';
run;

proc contents data=out;
run;

proc freq out; 
table chol_status;
run;

This program is intended to:

  • Drop variables: bp_status, weight_status, and smoking_status
  • Create a new column, chol_status, based on the following values of cholesterol:
    - less than 200: "Safe"
    - 200-239: "High - Borderline"
    - 240 and higher: "High"
  • Should not calculate chol_status for missing cholesterol values

There are multiple errors in the program. These may be syntax errors, logic errors, or problems with program structure. Logic errors might not produce messages in the log, but will cause the program to produce unintended results.

Correct the errors, run the program, and then use the results to answer the following two questions:

Question 1
How many observations are in the 'High' group?

Question 2
How many observations are in the 'Safe' group?
