
Generalizing numbers in PARAT

Author: Khaled El Emam Views: 231 Created: 18-01-2010 19:00 Last Updated: 04-02-2010 22:11

The PARAT tool allows the user to define a generalization hierarchy for numbers. A number can be an integer or a real (floating point) number. The way PARAT deals with numbers is the same irrespective of type.

The generalization of a number means putting it in an interval. For this article we will use maternal age as a running example.

A maternal age generalization can be:
20-, 21-25, 26-30, 31-35, 36-39, 40+

This example illustrates bottom-coding ("20-"), top-coding ("40+"), and equal-sized intervals of 5 years in between (the last interval, 36-39, is cut short by the top code). Such a generalization can be specified at each level of a generalization hierarchy.

Each level of the generalization hierarchy can be specified using three elements:

  • The starting point of the interval
  • The length of the interval
  • Top code limit (optional)
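
As an illustration only, the three elements could be held in a small record like the one below. This is a sketch, not PARAT's internal representation: the class name `LevelSpec`, its field names, and the idea of storing a hierarchy as a Python list are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LevelSpec:
    """One level of a numeric generalization hierarchy (hypothetical structure)."""
    start: str                 # starting point, e.g. "20", or "20-" for a bottom code
    length: int                # length shared by every interval, e.g. 5
    top: Optional[str] = None  # optional top code limit, e.g. "40" or "40+"


# Two illustrative levels for maternal age: 5-year, then 10-year intervals.
hierarchy = [
    LevelSpec(start="20-", length=5, top="40+"),
    LevelSpec(start="20-", length=10, top="40+"),
]
```

Leaving out `top`, as in `LevelSpec(start="20", length=5)`, corresponds to a level with no top code limit.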


Just as a side note, we tried different ways to express generalizations of numbers and we found this approach to be the least complex, most usable and quite expressive.

The starting point specifies where the generalization will start. If we enter the value "20" then all ages lower than 20 will be left as is, for example:
13, 14, 15, 16, 17, 18, 19, 20-24, ....

If the starting point is followed by a minus sign, the system defines a bottom code. For example, if the starting point is "20-" then all ages of 20 and below will be treated as a single category:
20-, 21-25, ....

A real number is not allowed as a starting value: the system will give you an error message if you enter, say, "19.5-". If the ages in the data set are real numbers, then all values are rounded up to the next integer. For example, if we specify "20-" as the starting value, then an age of 19.5 will be included in the bottom code, and an age of 20.5 will be in the 21-25 age range.
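
The starting-point rules can be sketched in a few lines of Python. This is not PARAT's code: the function names `parse_start`, `round_up` and `in_bottom_code` are hypothetical, and the sketch assumes that starting points are non-negative integers and that "rounded up" means the mathematical ceiling.

```python
import math
import re


def parse_start(text):
    """Return (value, is_bottom_code) for a starting point such as "20" or "20-"."""
    # Assumption: only non-negative integers are accepted, optionally followed by "-".
    match = re.fullmatch(r"(\d+)(-?)", text.strip())
    if match is None:
        raise ValueError(f"starting point must be an integer, got {text!r}")
    return int(match.group(1)), match.group(2) == "-"


def round_up(value):
    """Round a real value up to the next integer; integers are unchanged."""
    return math.ceil(value)


def in_bottom_code(value, start_text):
    """True if `value` falls in the bottom code defined by `start_text`."""
    start, is_bottom_code = parse_start(start_text)
    return is_bottom_code and round_up(value) <= start
```

With a starting point of "20-", `in_bottom_code(19.5, "20-")` is true and `in_bottom_code(20.5, "20-")` is false, while `parse_start("19.5-")` raises an error.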

The next part of the specification is the range, that is, the length of each interval. All intervals have the same range; at this point it is not possible to specify intervals of varying sizes. For example, specifying a range of 5 means that all ages will be placed in 5-year categories. The round-up rule is used here as well for real numbers.
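
Because every interval has the same range, the interval for a value can be found with integer arithmetic. The function below is a minimal sketch under assumptions: `interval_label` and its parameter `first` are hypothetical names, where `first` is the lower bound of the first interval (the starting point itself, or the starting point plus one when a bottom code is used).

```python
import math


def interval_label(value, first, length):
    """Label the equal-length interval that contains `value`."""
    v = math.ceil(value)  # round-up rule for real numbers
    if v < first:
        raise ValueError("value lies below the first interval")
    index = (v - first) // length  # how many whole intervals lie below v
    low = first + index * length
    return f"{low}-{low + length - 1}"
```

For a range of 5, `interval_label(22, 20, 5)` gives "20-24" and `interval_label(20.5, 21, 5)` gives "21-25".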

The last part of the specification is the top-code. While the first two parts of the specification are mandatory, the last part is optional. If a user does not provide the top-code then the system will create 5-year age intervals up to the highest value in the data.
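
When no top code is given, the list of intervals depends on the data. A rough sketch follows; the function name `interval_categories` is hypothetical, `first` is the lower bound of the first interval, and the sketch assumes the last interval keeps its full length even if it extends past the highest value, which the article does not specify.

```python
import math


def interval_categories(values, first, length):
    """List equal-length intervals from `first` up to the highest value in the data."""
    highest = max(math.ceil(v) for v in values)  # round-up rule for real numbers
    categories = []
    low = first
    while low <= highest:
        # Assumption: the last interval is not shortened to the highest value.
        categories.append(f"{low}-{low + length - 1}")
        low += length
    return categories
```

For example, `interval_categories([22, 31.5, 47], 20, 5)` produces six intervals, from "20-24" to "45-49".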

If a user types "40" as the top code, then all ages above 40 will be kept as is. For example, we would have:
..., 36-40, 41, 42, 43, 44, 45, ....

If a user types "40+" as the top code then all ages of 40 and above will be placed in the same category. For example, we would have:
..., 30-34, 35-39, 40+

The round-up rule for real numbers is used here again to decide whether a particular value falls in the top code or in the interval before it. For example, an age of 39.5 would go in the 40+ category.
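
Putting the three parts together, the behaviour described in this article can be approximated as follows. This is a sketch under stated assumptions, not PARAT's implementation: the function name `generalize` and its parameters are hypothetical, values that are "kept as is" are returned after rounding up, and an interval that would run past the top code limit is assumed to be cut short at that limit.

```python
import math


def generalize(value, start, length, top=None):
    """Generalize one number given a starting point, a range and an optional top code."""
    v = math.ceil(value)  # round-up rule for real numbers

    # Starting point: "20" keeps lower values as is, "20-" defines a bottom code.
    start_value = int(start.rstrip("-"))  # raises ValueError for a real such as "19.5-"
    if start.endswith("-"):
        if v <= start_value:
            return f"{start_value}-"
        first = start_value + 1
    else:
        if v < start_value:
            return str(v)
        first = start_value

    # Top code: "40" keeps higher values as is, "40+" defines a top code.
    last = None
    if top is not None:
        top_value = int(top.rstrip("+"))
        if top.endswith("+"):
            if v >= top_value:
                return f"{top_value}+"
            last = top_value - 1
        else:
            if v > top_value:
                return str(v)
            last = top_value

    # Equal-length intervals between the bottom and the top.
    low = first + ((v - first) // length) * length
    high = low + length - 1
    if last is not None:
        high = min(high, last)  # assumption: cut the interval at the top limit
    return f"{low}-{high}" if low != high else str(low)
```

With a starting point of "20-", a range of 5 and a top code of "40+", ages of 19.5, 20.5 and 39.5 map to "20-", "21-25" and "40+" respectively, matching the examples in the text.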


The author(s) retain all copyright to this knowledgebase article. Please include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.

 
