The PARAT tool allows
the user to define a generalization hierarchy for numbers. A number can be an
integer or a real (floating point) number. The way PARAT deals with numbers is
the same irrespective of type.
Generalizing a number means placing it in an interval. Throughout this
article we use maternal age as a running example.
A maternal age generalization can be:
20-, 21-25, 26-30, 31-35, 36-39, 40+
This example illustrates bottom-coding ("20-"), top-coding
("40+"), and equal-sized intervals of 5 years (the last interval, 36-39, is
shorter because all ages of 40 and above fall in the top code). Such a
generalization can be specified at each level of a generalization hierarchy.
Each level of the generalization hierarchy is specified using three elements:
- The starting point of the interval
- The length (range) of the interval
- The top code limit (optional)
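To make the three elements concrete, here is a minimal Python sketch of how one level could be represented. The class `LevelSpec`, its fields, and the two-level hierarchy are hypothetical illustrations, not PARAT's internal format; the starting point and top code are kept as the text a user would type (e.g. "20-" or "40+").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LevelSpec:
    """One level of a numeric generalization hierarchy (hypothetical representation)."""
    start: str                      # starting point as typed, e.g. "20" or "20-" (bottom code)
    length: int                     # length (range) shared by every interval
    top_code: Optional[str] = None  # optional, e.g. "40" (keep as is) or "40+" (one category)

    @property
    def has_bottom_code(self) -> bool:
        # A trailing minus sign on the starting point requests a bottom code.
        return self.start.endswith("-")

    @property
    def top_code_is_category(self) -> bool:
        # A trailing plus sign on the top code collapses the upper values into one category.
        return self.top_code is not None and self.top_code.endswith("+")


# A hypothetical two-level hierarchy for maternal age: 5-year, then 10-year intervals.
hierarchy = [
    LevelSpec(start="20-", length=5, top_code="40+"),
    LevelSpec(start="20-", length=10, top_code="40+"),
]

for level, spec in enumerate(hierarchy, start=1):
    print(level, spec)
```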
As a side note, we tried several ways of expressing generalizations of numbers and found this approach to be the least complex and most usable, while still being quite expressive.
The starting point specifies where the generalization starts. If we enter
the value "20", then all ages lower than 20 are left as is, for
example (with a range of 5):
13, 14, 15, 16, 17, 18, 19, 20-24, ...
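The rule for a starting point without a bottom code can be sketched in a few lines of Python. The function `generalize` and its parameters are hypothetical names used only for illustration; integer ages are assumed.

```python
def generalize(age: int, start: int, length: int) -> str:
    """Sketch: starting point without a bottom code.

    Ages below `start` are left as is; ages at or above it are placed in
    consecutive intervals of `length` years beginning at `start`.
    """
    if age < start:
        return str(age)
    lower = start + ((age - start) // length) * length
    return f"{lower}-{lower + length - 1}"


print([generalize(a, 20, 5) for a in range(13, 26)])
```

Floor division (`//`) picks the interval index, so every interval has exactly `length` values.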
If the starting point is followed by a minus sign, the system defines a
bottom code. For example, if the starting point is "20-", then all ages of
20 and below are treated as a single category:
20-, 21-25, ...
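A bottom-coded starting point changes two things: values at or below the starting point collapse into one category, and the regular intervals begin one unit later. A minimal sketch, with the hypothetical function name `generalize_bottom_coded` and integer ages assumed:

```python
def generalize_bottom_coded(age: int, start: int, length: int) -> str:
    """Sketch: starting point with a bottom code, e.g. "20-".

    Ages at or below `start` collapse into one category; the regular
    intervals begin at start + 1.
    """
    if age <= start:
        return f"{start}-"
    first = start + 1
    lower = first + ((age - first) // length) * length
    return f"{lower}-{lower + length - 1}"


print([generalize_bottom_coded(a, 20, 5) for a in (15, 20, 21, 25, 26)])
# ['20-', '20-', '21-25', '21-25', '26-30']
```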
Real numbers are not allowed as starting values; the system gives an error
message if you enter, say, "19.5-" as the starting value. If the ages are
real numbers, then all values in the data set are rounded up to the next
integer. For example, if we specify "20-" as the starting value, then an age
of 19.5 is included in the bottom code, and an age of 20.5 falls in the
21-25 age range.
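Both rules in this paragraph (integer-only starting values and rounding up real ages) can be sketched as follows. The names `parse_start` and `generalize` are hypothetical, and the sketch assumes "rounded up" means the ceiling function (`math.ceil`), which matches the 19.5 and 20.5 examples.

```python
import math


def parse_start(text: str) -> tuple[int, bool]:
    """Parse a starting value such as "20" or "20-" into (start, bottom_code).

    Real numbers such as "19.5-" are rejected with a ValueError, in the spirit
    of the error message the article describes.
    """
    text = text.strip()
    bottom_code = text.endswith("-")
    number = text[:-1] if bottom_code else text
    try:
        start = int(number)
    except ValueError:
        raise ValueError(f"starting value must be an integer, got {text!r}") from None
    return start, bottom_code


def generalize(age: float, start: int, bottom_code: bool, length: int) -> str:
    """Generalize a possibly real-valued age after applying the round-up rule."""
    age = math.ceil(age)  # assumed round-up rule: 19.5 -> 20, 20.5 -> 21
    if bottom_code and age <= start:
        return f"{start}-"
    if not bottom_code and age < start:
        return str(age)
    first = start + 1 if bottom_code else start
    lower = first + ((age - first) // length) * length
    return f"{lower}-{lower + length - 1}"


start, bottom_code = parse_start("20-")
print(generalize(19.5, start, bottom_code, 5), generalize(20.5, start, bottom_code, 5))
# 20- 21-25
```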
The next part of the specification is the range, that is, the length of each
interval. All intervals have the same range; it is currently not possible to
specify varying range sizes. For example, a range of 5 means that all ages
are placed in 5-year categories. The round-up rule applies here as well for
real numbers.
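The interval a value falls in can be written as a short worked formula. This is a sketch assuming the round-up rule is the ceiling function: let $s$ be the first value covered by a regular interval (the starting value itself, or one more than it when a bottom code is used) and $r$ the range. For $\lceil v \rceil \ge s$:

```latex
\ell(v) = s + r \left\lfloor \frac{\lceil v \rceil - s}{r} \right\rfloor,
\qquad v \mapsto [\,\ell(v),\ \ell(v) + r - 1\,]

% Example: starting value "20-" (so s = 21), r = 5, v = 33.2
\lceil 33.2 \rceil = 34, \quad
\ell(33.2) = 21 + 5 \left\lfloor \tfrac{34 - 21}{5} \right\rfloor = 21 + 5 \cdot 2 = 31
\;\Rightarrow\; 31\text{--}35
```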
The last part of the specification is the top code. While the first two parts
of the specification are mandatory, the top code is optional. If the user does not provide a top code, then the system creates intervals of the specified range (5-year intervals in our example) up to the highest value in the data.
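Without a top code, the intervals simply continue until the highest value in the data is covered. The sketch below lists the interval labels this would produce; the function `interval_labels` is hypothetical, it assumes the round-up rule is the ceiling function, and values below a non-bottom-coded starting point (which are left as is) are not listed.

```python
import math


def interval_labels(values: list[float], start: int, bottom_code: bool, length: int) -> list[str]:
    """Sketch: labels needed to cover `values` when no top code is given."""
    highest = math.ceil(max(values))          # round-up rule applied to the maximum
    labels = [f"{start}-"] if bottom_code else []
    lower = start + 1 if bottom_code else start
    while lower <= highest:                   # keep adding intervals until the maximum is covered
        labels.append(f"{lower}-{lower + length - 1}")
        lower += length
    return labels


ages = [18, 23, 29, 36, 44.5]
print(interval_labels(ages, 20, True, 5))
# ['20-', '21-25', '26-30', '31-35', '36-40', '41-45']
```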
If a user types "40" as the top code, then all ages above 40 are kept as is. For example, we would have:
..., 36-40, 41, 42, 43, 44, 45, 46, ...
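A top code without a plus sign can be sketched by checking the top code first and leaving larger values untouched. The function name and parameters are hypothetical; the sketch handles integer ages only, and it assumes that an interval reaching past the top code is cut off at it, a case the example above does not show.

```python
def generalize(age: int, start: int, bottom_code: bool, length: int, top: int) -> str:
    """Sketch: top code *without* "+"; ages above `top` are left as is."""
    if age > top:
        return str(age)                      # above the top code: kept as is
    if bottom_code and age <= start:
        return f"{start}-"
    if not bottom_code and age < start:
        return str(age)
    first = start + 1 if bottom_code else start
    lower = first + ((age - first) // length) * length
    upper = min(lower + length - 1, top)     # assumption: an interval crossing `top` stops there
    return f"{lower}-{upper}"


print([generalize(a, 20, True, 5, 40) for a in (36, 40, 41, 42, 46)])
# ['36-40', '36-40', '41', '42', '46']
```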
If a user types "40+" as the top code, then all ages of 40 and above are
placed in the same category. For example, we would have:
..., 30-34, 35-39, 40+
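With a plus sign, the top code becomes a single category for everything at or above it. The sketch below reproduces the example with a starting point of "20" (no bottom code) and a range of 5; the function name is hypothetical, integer ages are assumed, and cutting an interval off just below the top code is an assumption for cases where the intervals do not line up with it.

```python
def generalize(age: int, start: int, length: int, top: int) -> str:
    """Sketch: starting point without a bottom code, top code with "+"."""
    if age >= top:
        return f"{top}+"                      # top code category, e.g. "40+"
    if age < start:
        return str(age)                       # below the starting point: left as is
    lower = start + ((age - start) // length) * length
    upper = min(lower + length - 1, top - 1)  # assumption: interval cut off below the top code
    return f"{lower}-{upper}"


print([generalize(a, 20, 5, 40) for a in (31, 34, 35, 39, 40, 47)])
# ['30-34', '30-34', '35-39', '35-39', '40+', '40+']
```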
The round-up rule for real numbers is used here again to decide whether a
particular value falls in the top code or in the interval before it. For
example, an age of 39.5 would go in the 40+ category.
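Putting the three parts of a level specification together, the following sketch builds a generalization function from the strings a user would type. `make_generalizer` is a hypothetical name, not a PARAT function; the sketch assumes the round-up rule is the ceiling function and that an interval reaching past the top code is cut off at it.

```python
import math
from typing import Callable, Optional


def make_generalizer(start_text: str, length: int,
                     top_text: Optional[str] = None) -> Callable[[float], str]:
    """Sketch: combine starting point, range, and optional top code."""
    bottom_code = start_text.endswith("-")
    start = int(start_text[:-1] if bottom_code else start_text)  # ValueError for e.g. "19.5-"
    top_plus = top_text is not None and top_text.endswith("+")
    top = None
    if top_text is not None:
        top = int(top_text[:-1] if top_plus else top_text)
    first = start + 1 if bottom_code else start

    def generalize(value: float) -> str:
        v = math.ceil(value)                  # assumed round-up rule for real numbers
        if top is not None:
            if top_plus and v >= top:
                return f"{top}+"              # e.g. 39.5 -> 40 -> "40+"
            if not top_plus and v > top:
                return str(v)                 # above a plain top code: left as is
        if v < first:
            return f"{start}-" if bottom_code else str(v)
        lower = first + ((v - first) // length) * length
        upper = lower + length - 1
        if top is not None:
            upper = min(upper, top - 1 if top_plus else top)
        return f"{lower}-{upper}"

    return generalize


g = make_generalizer("20", 5, "40+")
print([g(x) for x in (34.1, 39.0, 39.5, 40)])
# ['35-39', '35-39', '40+', '40+']
```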
The author(s) retain all copyright to this knowledgebase article. Please
include a citation to the web page if you reuse this material. More information is available at our lab web site: http://www.ehealthinformation.ca/.