Reading data using formatted inputs

Reading data using formatted inputs

Till now we have been reading data from external files; we used DATALINES to place data directly into the SAS program and also understood how to read data from fixed columns.

In today's article, I discuss the most important and powerful feature of SAS Programming ie INFORMATS.  Any time you have nonstandard data like numerical values, such as numbers with dollar signs and commas, and dates in a variety of formats; you should consider using formatted input to read the file. 

Let's read the bank.txt using informats.

data financial;
	infile '/home/u41108928/Ron Cody Data/bank.txt';
	input @1 Subj $3. 
          @4 DOB mmddyy10. 
          @14 Gender $1. 
          @15 Balance 7.;
run;


The @ (at) signs in the INPUT statement are called column pointers. For example, @4 says to SAS, go to column 4. Following the variable names are SAS informats.

Informats are built-in instructions that tell SAS how to read a data value. The choice of which informat to use is dictated by the data. Two of the most basic informats are w.d and $w.

The w.d format: reads standard numeric values. The w tells SAS how many columns to read. The optional d tells SAS that there is a decimal point in the value, the decimal point counts as one of the columns to be read. For example, if the variable Balance containing 3 digits values (such as 145 ) has an informat of 3. ; Another example, value 2.1 takes up 3 column spaces, 2 digits + 1 decimal point, so it has an informat of 3.1

The $w. informat: tells SAS to read w columns of character data. In the example, the variable Subj is read as character data and takes up three columns, and has informat of $3.; values of Gender take up a single column, and has informat of $1.

Now it’s time to read the date. The MMDDYY10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date.

SAS stores dates as numeric values equal to the number of days from January 1, 1960.

So, if you read the value 01/01/1960 with the MMDDYY10. informat, SAS stores a value of 0.

The date 01/02/1960 read with the same informat would result in a value of 1, and so forth. SAS knows all about leap years and correctly converts any date from 1582 to way into the future (1582 is the year Pope Gregory started the Gregorian calendar—dates before this are not defined in SAS). Similarly, the date 12/31/1959 read with the same informat would result in the value of -1.

So, getting back to our example, since date values are in the mm/dd/yyyy form and start in column 4, you use @4 to move the column pointer to column 4 and the MMDDYY10. informat to tell SAS to read the next 10 columns as a date in this form. SAS then computes the number of days from January 1, 1960, corresponding to each of the date values. Let’s see what happens when we use PROC PRINT to see the contents of this data set. 

title "Reading data using informats";

proc print data=financial;
run; 
No alt text provided for this image

Well, the dates (variable DOB) look rather strange. What you are seeing are the actual values SAS is storing for each DOB. You need a way to display these dates in a more traditional form, such as the way the dates were displayed in the raw data file (10/21/1955, in the first observation) or in some other form (such as 10Oct1955). While you are at it, why not add dollar signs and commas to the Balance figures? 

You can accomplish both of these tasks by associating a FORMAT with each of these two variables. There are many built-in formats in SAS that allow you to display dates and financial values in easily readable ways. You associate these formats with the appropriate variables in a FORMAT statement.

title "Reading data using informats and displaying using formats";


proc print data=financial;
	format DOB mmddyy10. 
           Balance dollar11.2;
run;

Here we are using the MMDDYY10. format to print the DOB values and the dollar11.2 format to print the Balance values. Notice the period in each of the formats. All SAS formats need to end either in a period or in a period followed by a number. The 11.2 following the dollar format says to allow up to 11 columns to print the Balance values (including the dollar sign, the decimal point, and possibly a comma or a minus sign). The 2 following the period says to include two decimal places after the decimal point. Here is the updated output: 

No alt text provided for this image

The most common format used with date-based data is DATE9.; the DATE9. format, prints dates as a two-digit day of the month, a three-character month abbreviation, and a four-digit year. This format helps avoid confusion between the month-day-year and day-month-year formats used in the United States and Europe, respectively. 

NOTE: The formats only affect the way these values appear in printed output—the internal values are not changed. 

 Placing a FORMAT statement in a procedure associates the formats and variables only for that procedure. It is usually more useful to place your FORMAT statement in the DATA step. When you do this, there is a permanent association of the formats and variables in the data set. You can override any permanent format by placing a FORMAT statement in a particular procedure where you would like a different format.

Let's put this knowledge into practice by solving this challenging problem.

Challenge Problem 1:

 You are given a text file called stockprices.txt containing information on the purchase and sale of stocks. The data layout is as follows: 

No alt text provided for this image

Create a SAS data set (call it Stocks) by reading the data from this file. Use formatted input. Compute several new variables as follows: 

No alt text provided for this image

Print out the contents of this data set using PROC PRINT. 

Do try this out. I will be uploading the solution soon.

To view or add a comment, sign in

More articles by Sunayana Pati

  • Working with Dates: Part 2

    In continuation from the previous article on working with dates, today's article will cover various date functions in…

  • Working with Dates: Part 1

    Most data sets contain date information, such as date of birth or transaction date. SAS can read dates in almost any…

  • Performing Iterative Processing: Part 2

    In continuation from the previous article on iterative processing, today's article will cover DO UNTIL statements and…

  • Performing Iterative Processing

    Many programming tasks require that blocks of code be run more than once. SAS provides several ways to accomplish this.

    2 Comments
  • Performing Conditional Processing

    This article describes the tools that allow programs to “make decisions” based on data values. For example, you may…

  • Reding data from Excel files

    It is quite common to be given a Microsoft Office Excel spreadsheet as your data source. Luckily, SAS has several…

  • Storing user-defined formats permanently

    If you have user-defined formats with permanent SAS data sets, it is important to make your formats permanent also…

  • Formats and Labels in SAS

    LABELS If you are using SAS to produce listings and reports for others, you will want to make the output more readable…

  • SAS Libraries

    When you write a DATA statement such as data test; SAS creates a temporary SAS data set called test. When you close…

  • Reading data from Fixed-columns

    You know, how to read data from external files. Today I discuss how to read data from fixed columns.

Others also viewed

Explore content categories