![]() |
Ordering of Characters and Strings |
![]() |
|||||||||
Ordering of Characters and StringsStrohmeier Alfred, Genillard Christian, Weber Mats Published in Ada Letters (1990), Vol. 10, No. 7, pp. 70-84 Swiss Federal Institute of Technology Keywords: Ordering, filing, character set, string, Ada. Summary: Arranging characters and strings in order of representation codes is not suitable for application programming. Even when character ordering is logical, the usual filing of words in dictionaries does not correspond with the mathematical lexicographic ordering of strings, but some equivalence relation on characters is involved. As a solution to this problem, we propose a generic package. The paper also explains the theory behind the solution. Some definitions of classical mathematics are recalled in the appendix. 1. Statement of the ProblemArranging characters, and by extension strings, in order of representation codes may be convenient and efficient when the effect is hidden in a program, but it is clearly not suitable for application programming. For instance the ordering: (1.1) A < a < B < b < ... < Z < z may be more appropriate than the well-known ASCII collating sequence: (1.2) A < B < ... < Z < ... < a < b < ... < z Lexicographic ordering of strings based on ASCII codes (1.2) yields: (1.3) A < Ada < Adam < ... < B < BA < Ba < ... < a < ad < adage and when based on logical order (1.1), the result would be: (1.4) A < Ada < Adam < ... < a < ad < adage < ... < B < BA < Ba But a dictionary would file these words as: (1.5) A < a < ad < Ada < adage < Adam < ... < B < BA < Ba < ... This sequence can be produced by a two step sorting: first, the words are sorted in lexicographic order based on (1.1) but ignoring case differences, and, secondly, equivalent strings, i.e. words differing by their case only, are sorted among each other. When diacritical marks are added to letters, or when special characters are part of strings, the situation is similar, although more complicated. For instance in Webster's dictionary [WEBSTER], one may observe the following sequence:
Clearly, the filing is case insensitive, diacritical marks are ignored and the same holds for spaces, hyphens, apostrophes and periods, except when "equivalent" strings are ordered. We noticed the same ordering rules in French and German dictionaries. Some characters, which are ligatures, diphthongs, digraphs or special letters, pose a particular problem. For filing purposes, they must be translated into two characters, e.g. the small diphthong a with e, i.e. "¾", translates into "ae", and the small German sharp s, i.e. "§", translates into "ss". In a German dictionary, you may therefore notice the sequence:
A character such as "1/2" poses a similar problem and may be translated to the string "1 2" for filing. If some characters have to be ignored or others expanded, then a look-ahead algorithm must be used for string ordering, or alternately the strings must be transformed before ordering. We will not take into account this kind of need, since there are too many specifics to be accommodated to, as shown by the following examples. In bibliographic filing, ISO states that a letter with a diaeresis mark is filed as if no graphic addition existed [ISO/TR 8393, 5.7.3.2.4.4], whereas a German standard in the field treats them as diphthongs, i.e. "" is filed as "ae" [RAK, 9.2.2]. Sometimes, depending on the application, compounds are grouped with their first word, instead of being spread over other words. Having ruled out this kind of problem, string ordering may be achieved by character-to-character comparison alone. This comparison will be based on an equivalence relation and a linear order on characters. The equivalence relation provides a means for grouping together all variants of a letter, i.e. upper and lower case, with or without a diacritical mark. 2. Overview of our SolutionOur solution applies to character sets defined as enumeration types. A first candidate is the well-known ASCII character set [ANSI X3.4], implemented in Ada by the predefined type CHARACTER together with the predefined package ASCII [LRM, C(13) and C(15)]. Other candidates are Latin alphabets [ISO 8859, parts 1, 2, 3, 4 and 9], such as the one devised in a companion paper [STROH]. Specifics to mixed alphabets such as Latin/Cyrillic, Latin/Arabic, Latin/Greek or Latin/Hebrew are not dealt with [ISO 8859, parts 5, 6, 7 and 8]. An overview of the architecture of our solution is given by figure 1:
Package TEXT_UTILITIES_G has the generic formal parameters CHARACTER_TYPE and STRING_TYPE. It must be instantiated once for each character set. For example, using the predefined types CHARACTER and STRING as generic actual parameters, the result is package TEXT_UTILITIES. This instance, like its template, contains a local generic package TEXT_ROUTINES_G. The generic formal parameters of this package are tables which model character ordering and typical transformations on character sets. The actual tables are easily expressed using aggregates and enumeration literals. The result of instantiation, e.g. TEXT_ROUTINES in figure 1, is a package providing functions for transforming and ordering characters and strings. The complete source of the specification of the generic package TEXT_UTILITIES_G is given in section 9. In section 10, this package is instantiated with a Latin alphabet. In section 11, we then outline the instantiation of the TEXT_ROUTINES_G package with sensible tables for a Latin alphabet. The following sections provide the mathematical background to our design. They may be skipped by readers interested only in the use of the package. 3. Transformations of Character Sets and StringsThe elements of a character set may be classified as control characters or graphic characters, which are in turn subdivided into letters, digits and special (non-alphanumeric) characters (figure 2). Letters belonging to a Latin alphabet may further be characterized as lower case, upper case or as having a diacritical mark. These classes are not disjoint, e.g. an upper case letter may have a diacritical mark, neither are they always well defined, e.g. superscript one "1" may be seen as a digit or a special character.
There is clearly a need to transform lower case letters to upper case, and vice versa, and to suppress diacritical marks, thereby yielding the base letters. Such a transformation may be modeled by a function g from character set A to itself, i.e. g: A®A. Notice that the function g is defined for all characters, and not only letters. Whenever x is not a letter, or whenever for a letter x a suitable transform does not exist, a sensible choice is to set g(x) = x. Furthermore, typical transformations give rise to functions having the idempotent property, that is the restriction of the function to its image is the identity function: g(g(x)) = g(x) for every x in A. As is always the case for a function whose domain is a finite set, function g may be given by a table. The generic package TEXT_ROUTINES_G therefore has three generic formal object parameters BASE_CONVERSION_TABLE, UC_CONVERSION_TABLE and LC_CONVERSION_TABLE, which are the tables used for implementing functions BASE, UPPER_CASE and LOWER_CASE both for characters and strings. The idempotent property translates to:
where x is any character. One may also express the fact that the upper case of a lower case letter is the same letter in upper case, and vice versa by exchanging their roles:
One may ask if there are similar relationships between function BASE on the one hand and functions UPPER_CASE and LOWER_CASE on the other hand, e.g.:
Unfortunately, these properties do not hold for all character sets and sensible transformations. As a consequence, we will not enforce them. For instance in Latin alphabet No. 1 [ISO 8859, part 1], there is a small letter y with diaeresis mark, i.e. "Ø", but the corresponding capital letter is missing. For sensible transformations, we therefore have BASE ('Ø') = 'y' and UPPER_CASE ('Ø') = 'Ø', but then:
Function NORMALIZE is defined as the composition of functions BASE and UPPER_CASE:
As the other transformations, NORMALIZE should be idempotent:
Moreover, if this transformation is really a sort of normalization abstracting from case and diacritical marks, then it should not be affected when these transformations are applied first:
Notice that 3.8 follows from 3.1, and that 3.7 is implied by 3.1 and 3.9. 4. EQUIVALENCE RELATIONS ON CHARACTER SETS AND STRINGSLet g be a function from a set A to itself, i.e. g: A®A, and let the (binary) relation ~ on A be defined as follows:
Then ~ is an equivalence relation on A (appendix A.4). The other way round, given an equivalence relation ~, a function g : A®A can be built such that (4.1) holds. The idea is to choose in every equivalence class o(x,~) (with respect to ~) some element o(x,~)0, called a representative, and then to map all the elements belonging to a same equivalence class to the representative of the class: if x À o(x,~), then g(x) = o(x,~)0. The previous result could be used to deduce equivalence relations from functions BASE, UPPER_CASE and LOWER_CASE, but their usefulness seemed rather limited and, if needed, they are easily programmed. What we will provide is the equivalence relation based on function NORMALIZE:
This equivalence relation will be heavily used, especially for ordering strings. Let S* be the set of strings over A. If s is an element of S*, i.e. a finite sequence of elements of A, then si will denote the element of s at position i. Let ~ be an equivalence relation on A, and let the relation on S* be defined as follows:
Then is an equivalence relation on S*. This result is used to extend the boolean function EQUIVALENT to string parameters. 5. Linear Order on Character SetsWe will first recall a theorem which justifies the current practice where a shift from a partial order (appendix A.7) to a strict order (appendix A.9), or vice versa, is quite natural. Let A be a set. Let be a partial order on A, and let the relation < on A be defined as:
Then < is a strict order on A. The other way round, let < be a strict order on A, and let the relation on A be defined as:
Then is a partial order on A. Therefore, each of the two relations and < uniquely determines the other, and we will call them associated partial and strict orders. Moreover, if and < are associated partial and strict orders, then is a linear order (appendix A.8) if and only if < is a trichotomic strict order (appendix A.3.7). The following result shows how a linear order may be expressed on behalf of a function. Let A be a finite nonempty set of cardinality n, let I denote the interval [0, n-1] of natural numbers, and let f be a bijective function from A to I (appendix A.1.3). Then the relation on A defined as:
is a linear order on A. The value of function f may be interpreted as the rank of an element of A for the linear order . The other way round, if is a linear order on A, then a bijective function f: A®I may be built such that (5.3) holds. The idea is simply to rank the elements of A, first its least element, then the least element of the remaining ones, etc., and then to associate numbers to them starting with zero. As f is a bijective function, it has an inverse, f-1: I®A (appendix A.1.4). It is easy to define this function by a table, simply by enumerating the elements of A in the increasing order of intent. This table is modeled by the generic formal object parameter CHARACTER_ORDER of the generic package TEXT_ROUTINES_G and its purpose is to define a linear order on the character set. CHARACTER_ORDER is therefore an array of characters whose index ranges from zero to the number of characters within the character set minus one. Moreover, as CHARACTER_ORDER is a function between two finite sets of same cardinality, it is bijective from the moment that it is injective (appendix A.1.2):
i.e. no character is listed more than once. In conformance with attribute names in Ada, function f-1 identical to CHARACTER_ORDER is named VAL and function f is named POS. Package TEXT_ROUTINES_G provides also functions FIRST, LAST, SUCC and PRED related to the linear order on the character set defined by CHARACTER_ORDER. Finally, the boolean function LESS provides comparison of two characters as defined by the associated strict order. 6. LINEAR order and equivalence relationLet A be a set and let be a linear order on A. We will say that the equivalence relation ~ on A is order-preserving (with respect to ), if:
If the equivalence relation ~ on A is defined by a function g: A®A (see 4.1), then it is order-preserving if and only if:
In other words, g is a monotonic increasing function with respect to (appendix A.10). Let us now go back to package TEXT_ROUTINES_G. That the equivalence relation EQUIVALENT defined by function NORMALIZE is order-preserving with respect to the strict order LESS may be verified by a simple loop using condition 6.2: (6.3) declare X : CHARACTER_TYPE := FIRST; begin while X /= LAST loop if LESS (NORMALIZE (SUCC (X)), NORMALIZE (X)) then -- order-preserving error end if; X := SUCC (X); end loop; end; In section 7, we will see that this property is mandatory for defining a "logical" linear order on strings. Sometimes, it may be desirable that transformations UPPER_CASE and LOWER_CASE are order-preserving. 7. Linear Order on StringsLet A be a nonempty set, let be a linear order on A and let ~ be an order-preserving equivalence relation on A. We denote by S* the set of strings over A. In section 4., we have already shown that the equivalence relation ~ on A may be extended to an equivalence relation on S*. Now we will also extend the linear order. Let u and v be two elements of S*. The relation u << v is true, if and only if one of the following conditions is satisfied:
It may be proven that relation << on S* is a trichotomic strict order and therefore defines a linear order on S*. In package TEXT_ROUTINES_G, the strict order on strings is given by the boolean function LESS having string parameters. 8. VerificationsAn instantiation of package TEXT_ROUTINES_G is only valid if conditions 3.1, 3.2, 3.3, 3.4, 3.5, 3.9, 3.10, 5.4 and 6.3 are satisfied. All these verifications could be done in the sequence of statements of its body, but then they would be executed at each elaboration of an instance. This seemed too costly and we therefore located the verifications in a test program which is executed once for each instantiation of TEXT_ROUTINES_G. We made an exception for condition 5.4. Function CHARACTER_ORDER must be inverted in the sequence of statements of the body of TEXT_ROUTINES_G and the verification that it is injective can thus be made on the fly. If this verification fails, exception CHARACTER_ORDER_ERROR is raised on elaboration of the instance.
9. Specification of the Generic Package Text_Utilities_G--+ TITLE: Utility routines for characters and strings.
--+ AUTHORS: Christian Genillard and Alfred Strohmeier.
Swiss Federal Institute of Technology (EPFL),
1015 Lausanne, Switzerland.
--+ DATE: February 1990.
generic
type CHARACTER_TYPE is (<>);
-- The characters CH such that CHARACTER_TYPE'POS(CH) <= 127 are named ASCII
-- characters.
type STRING_TYPE is array (POSITIVE range <>) of CHARACTER_TYPE;
package TEXT_UTILITIES_G is
------------------------
subtype RANK_TYPE is INTEGER range 0..CHARACTER_TYPE'POS(CHARACTER_TYPE'LAST);
type ORDER_TYPE is array (RANK_TYPE) of CHARACTER_TYPE;
type CONVERSION_TABLE_TYPE is array (CHARACTER_TYPE) of CHARACTER_TYPE;
CHARACTER_ORDER_ERROR : exception;
generic
CHARACTER_ORDER : in ORDER_TYPE;
-- Defines the logical order on the characters.
-- If a character is listed twice, CHARACTER_ORDER_ERROR is raised on
-- elaboration.
BASE_CONVERSION_TABLE : in CONVERSION_TABLE_TYPE;
-- Defines the conversion to base characters (for example to convert
-- diacritical characters to the corresponding non-diacritical ones).
UC_CONVERSION_TABLE : in CONVERSION_TABLE_TYPE;
-- Defines the conversion to upper-case characters.
LC_CONVERSION_TABLE : in CONVERSION_TABLE_TYPE;
-- Defines the conversion to lower-case characters.
package TEXT_ROUTINES_G is
-----------------------
function UPPER_CASE(ITEM: CHARACTER_TYPE) return CHARACTER_TYPE;
function LOWER_CASE(ITEM: CHARACTER_TYPE) return CHARACTER_TYPE;
function BASE(ITEM: CHARACTER_TYPE) return CHARACTER_TYPE;
function NORMALIZE(ITEM: CHARACTER_TYPE) return CHARACTER_TYPE;
--+ OVERVIEW:
--+ Converts the character to the base character and then to upper-case.
function EQUIVALENT(LEFT, RIGHT: CHARACTER_TYPE) return BOOLEAN;
--+ OVERVIEW:
--+ Returns TRUE if NORMALIZE(LEFT) = NORMALIZE(RIGHT).
function LESS(LEFT, RIGHT: CHARACTER_TYPE) return BOOLEAN;
--+ OVERVIEW:
--+ Returns TRUE if LEFT is listed before RIGHT in CHARACTER_ORDER.
function FIRST return CHARACTER_TYPE;
function LAST return CHARACTER_TYPE;
function SUCC(ITEM: CHARACTER_TYPE) return CHARACTER_TYPE;
function PRED(ITEM: CHARACTER_TYPE) return CHARACTER_TYPE;
function POS(ITEM: CHARACTER_TYPE) return NATURAL;
function VAL(X: NATURAL) return CHARACTER_TYPE;
--+ OVERVIEW:
--+ These functions return results based on CHARACTER_ORDER.
--+ ERRORS:
--+ SUCC, PRED and VAL may raise CONSTRAINT_ERROR.
pragma INLINE(UPPER_CASE, LOWER_CASE, BASE, NORMALIZE, EQUIVALENT,
LESS, FIRST, LAST, SUCC, PRED, POS, VAL);
function UPPER_CASE(ITEM: STRING_TYPE) return STRING_TYPE;
function LOWER_CASE(ITEM: STRING_TYPE) return STRING_TYPE;
function BASE(ITEM: STRING_TYPE) return STRING_TYPE;
function NORMALIZE(ITEM: STRING_TYPE) return STRING_TYPE;
--+ OVERVIEW:
--+ Converts the characters of ITEM to base characters and then to
--+ upper-case.
function EQUIVALENT(LEFT, RIGHT: STRING_TYPE) return BOOLEAN;
--+ OVERVIEW:
--+ Returns TRUE if LEFT and RIGHT have the same length and hold
--+ EQUIVALENT characters at any position.
function LESS(LEFT, RIGHT: STRING_TYPE) return BOOLEAN;
--+ OVERVIEW:
--+ Strict ordering of strings based on the order defined by
--+ CHARACTER_ORDER and the equivalence relation defined by NORMALIZE.
--+ First the two strings are compared after having converted their base
--+ characters to upper-case (NORMALIZE). If this comparison yields
--+ TRUE, then the strings are compared using the order defined by
--+ CHARACTER_ORDER.
--+ For a sensible CHARACTER_ORDER and sensible conversion tables, the
--+ following calls will return TRUE:
--+ LESS("aB", "AC") , LESS("AB", "aB").
end TEXT_ROUTINES_G;
end TEXT_UTILITIES_G;
10. Instantiation of Text_Utilities_G with a Latin Alphabet--+ TITLE: Text utilities package for Latin alphabet No. 1.
--+ AUTHORS: Christian Genillard and Alfred Strohmeier.
Swiss Federal Institute of Technology (EPFL),
1015 Lausanne, Switzerland.
--+ DATE: February 1990.
with LATIN1_CHARACTER_SET;
with TEXT_UTILITIES_G;
package LATIN1_TEXT_UTILITIES is new TEXT_UTILITIES_G
(LATIN1_CHARACTER_SET.LATIN1_CHARACTER,
LATIN1_CHARACTER_SET.LATIN1_STRING);
11. Summary of the Instantiation of Text_Routines_G for a Latin Alphabet--+ TITLE: Text routines package for Latin alphabet No.1.
--+ AUTHORS: Christian Genillard and Alfred Strohmeier.
Swiss Federal Institute of Technology (EPFL),
1015 Lausanne, Switzerland.
--+ DATE: February 1990.
with LATIN1_CHARACTER_SET; use LATIN1_CHARACTER_SET;
with LATIN1_TEXT_UTILITIES; use LATIN1_TEXT_UTILITIES;
package LATIN1_TEXT_ROUTINES is new LATIN1_TEXT_UTILITIES.TEXT_ROUTINES_G
(CHARACTER_ORDER =>
-- First: control characters, ordered in conformance with the enumeration
-- type definition:
(NUL,..., US, DEL, RESERVED_128,..., APC,
-- Second: special characters, ordered in conformance with the enumeration
-- type definition:
' ',..., '/', ':',..., '@',..., '[',..., '`', '{',..., '~',
NO_BREAK_SPACE, ..., INVERTED_QUESTION, MULTIPLICATION_SIGN, DIVISION_SIGN,
-- Third: digits, ordered in conformance with the enumeration type
-- definition:
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
-- Last: letters, in a "logical" order:
'A', 'a', UC_A_GRAVE, LC_A_GRAVE, UC_A_ACUTE, LC_A_ACUTE,
UC_A_CIRCUMFLEX, LC_A_CIRCUMFLEX, UC_A_TILDE, LC_A_TILDE,
UC_A_DIAERESIS, LC_A_DIAERESIS, UC_A_RING, LC_A_RING,
UC_AE_DIPHTHONG, LC_AE_DIPHTHONG,
'B', 'b',
'C', 'c', UC_C_CEDILLA, LC_C_CEDILLA,
'D', 'd', UC_ICELANDIC_ETH, LC_ICELANDIC_ETH,
'E', 'e', UC_E_GRAVE, LC_E_GRAVE, UC_E_ACUTE, LC_E_ACUTE,
UC_E_CIRCUMFLEX, LC_E_CIRCUMFLEX, UC_E_DIAERESIS, LC_E_DIAERESIS,
'F', 'f',
'G', 'g',
'H', 'h',
'I', 'i', UC_I_GRAVE, LC_I_GRAVE, UC_I_ACUTE, LC_I_ACUTE,
UC_I_CIRCUMFLEX, LC_I_CIRCUMFLEX, UC_I_DIAERESIS, LC_I_DIAERESIS,
'J', 'j',
'K', 'k',
'L', 'l',
'M', 'm',
'N', 'n', UC_N_TILDE, LC_N_TILDE,
'O', 'o', UC_O_GRAVE, LC_O_GRAVE, UC_O_ACUTE, LC_O_ACUTE,
UC_O_CIRCUMFLEX, LC_O_CIRCUMFLEX, UC_O_TILDE, LC_O_TILDE,
UC_O_DIAERESIS, LC_O_DIAERESIS,
UC_O_OBLIQUE_STROKE, LC_O_OBLIQUE_STROKE,
'P', 'p',
'Q', 'q',
'R', 'r',
'S', 's', LC_GERMAN_SHARP_S,
'T', 't', UC_ICELANDIC_THORN, LC_ICELANDIC_THORN,
'U', 'u', UC_U_GRAVE, LC_U_GRAVE, UC_U_ACUTE, LC_U_ACUTE,
UC_U_CIRCUMFLEX, LC_U_CIRCUMFLEX, UC_U_DIAERESIS, LC_U_DIAERESIS,
'V', 'v',
'W', 'w',
'X', 'x',
'Y', 'y', UC_Y_ACUTE, LC_Y_ACUTE, LC_Y_DIAERESIS,
'Z', 'z'),
BASE_CONVERSION_TABLE =>
(NUL => NUL, ...
-- Identity for control characters, digits, special characters and letters
-- without diacritical mark and which are not special letters.
-- Associations for letters with diacritical mark and special letters may be
-- as follows:
UC_A_GRAVE | UC_A_ACUTE | UC_A_CIRCUMFLEX | UC_A_TILDE | UC_A_DIAERESIS |
UC_A_RING | UC_AE_DIPHTHONG => 'A',
UC_C_CEDILLA => 'C',
UC_E_GRAVE | UC_E_ACUTE | UC_E_CIRCUMFLEX | UC_E_DIAERESIS => 'E',
UC_I_GRAVE | UC_I_ACUTE | UC_I_CIRCUMFLEX | UC_I_DIAERESIS => 'I',
UC_ICELANDIC_ETH => 'D',
UC_N_TILDE => 'N',
UC_O_GRAVE | UC_O_ACUTE | UC_O_CIRCUMFLEX | UC_O_TILDE | UC_O_DIAERESIS => 'O',
MULTIPLICATION_SIGN => MULTIPLICATION_SIGN,
UC_O_OBLIQUE_STROKE => 'O',
UC_U_GRAVE | UC_U_ACUTE | UC_U_CIRCUMFLEX | UC_U_DIAERESIS => 'U',
UC_Y_ACUTE => 'Y',
UC_ICELANDIC_THORN => 'T',
LC_GERMAN_SHARP_S => LC_GERMAN_SHARP_S,
LC_A_GRAVE | LC_A_ACUTE | LC_A_CIRCUMFLEX | LC_A_TILDE | LC_A_DIAERESIS |
LC_A_RING | LC_AE_DIPHTHONG => 'a',
LC_C_CEDILLA => 'c',
LC_E_GRAVE | LC_E_ACUTE | LC_E_CIRCUMFLEX | LC_E_DIAERESIS => 'e',
LC_I_GRAVE | LC_I_ACUTE | LC_I_CIRCUMFLEX | LC_I_DIAERESIS => 'i',
LC_ICELANDIC_ETH => 'd',
LC_N_TILDE => 'n',
LC_O_GRAVE | LC_O_ACUTE | LC_O_CIRCUMFLEX | LC_O_TILDE | LC_O_DIAERESIS => 'o',
DIVISION_SIGN => DIVISION_SIGN,
LC_O_OBLIQUE_STROKE => 'o',
LC_U_GRAVE | LC_U_ACUTE | LC_U_CIRCUMFLEX | LC_U_DIAERESIS => 'u',
LC_Y_ACUTE => 'y',
LC_ICELANDIC_THORN => 't',
LC_Y_DIAERESIS => 'y'),
UC_CONVERSION_TABLE =>
(NUL => NUL, ...
-- Identity for control characters, digits, special characters, letters which
-- are already in upper-case and letters without upper-case,
-- i.e. LC_GERMAN_SHARP_S and LC_Y_DIAERESIS.
-- Two examples of non-identity transforms follow:
'a' => 'A',
LC_A_GRAVE => UC_A_GRAVE),
LC_CONVERSION_TABLE =>
(NUL => NUL, ...
-- Identity for control characters, digits, special characters and letters
-- which are already in lower-case.
-- Two examples of non-identity transforms follow:
'A' => 'a',
UC_A_GRAVE => LC_A_GRAVE));
References [ANSI X3.4] American National Standard Code for Information Interchange (ASCII); American National Standards Institute, 1977. [BENE] Benedicty M., Sledge F.R.; Discrete Mathematical Structures; Harcourt Brace Jovanovich, 1987. [ISO/TR 8393] Documentation - ISO bibliographic filing rules - Exemplification of bibliographic filing principles in a model set of rules (Technical Report); International Organization for Standardization, 1985. [ISO 8859] 8-bit single-byte coded graphic character sets; International Organization for Standardization, 1987. [LRM] Reference Manual for the Ada Programming Language, ANSI/MIL-STD 1815A, 1983. [RAK] Regeln für die alphabetische Katalogisierung; Verein Deutscher Bibliothekare; L. Reichert, Wiesbaden; 1977. [STAN] Stanat D.F., McAllister D.F.; Discrete Mathematics in Computer Science; Prentice Hall, 1977. [STROH] Strohmeier A., Genillard C., Weber M.; Implementation of 8-bit coded character sets in Ada; submitted to Ada Letters. [WEBSTER] Webster's New World Dictionary, Third College Edition; 1988. Appendix: Some Definitions of Mathematics In this appendix we recall some definitions of mathematics, drawn mostly from [BENE] and [STAN], two excellent books. A and B will denote arbitrary sets. (A.1) Let f be a function from A to B, f: A®B. A is called the domain of f, and B its codomain. Set f(A) is called the image of f. (A.1.1) f is surjective (onto) if f(A) = B. (A.1.2) f is injective (one-to-one) if a a' implies f(a) f(a') whenever a, a' in A. (A.1.3) f is bijective (one-to-one and onto) if f is both injective and surjective. (A.1.4) If f is bijective, then there is a (unique) inverse function of f, denoted f-1, f-1: B®A, having the properties:
(A.2) A (binary) relation on a set A is a subset R of the Cartesian product AxA. As usual, (x,\x11y) ÀR is denoted by xRy and (x, y) ÀR by xO(R,/)y. (A.3) Let R be a relation on A. (A.3.1) R is reflexive if xRx for every x in A. (A.3.2) R is irreflexive if xO(R,/)x for every x in A. (A.3.3) R is symmetric if whenever xRy then yRx. (A.3.4) R is asymmetric if whenever xRy then yO(R,/)x. (A.3.5) R is antisymmetric if whenever xRy and yRx then x = y. (A.3.6) R is transitive if whenever xRy and yRz then xRz. (A.3.7) R is trichotomic if for all x and y in A, precisely one of xRy, or x = y, or yRx is true. (A.4) A relation ~ on A is an equivalence relation if ~ is reflexive, symmetric and transitive. (A.5) Let ~ be an equivalence relation on A. For every x in A, the equivalence class of x with respect to ~ , denoted o(x,~), is the set {y | y ~ x}. (A.6) Let f be a function f: A®B and let the (binary) relation ~ on A be defined as follows:
Then ~ is an equivalence relation on A. (A.7) A relation on A is a partial order if it is reflexive, antisymmetric and transitive. The pair (A , ) is then called a partially ordered set or poset. (A.8) A partial order on A is called a linear order or simple order or total order if for every x and y in A, either x y or y x, that is to say, elements of A are always comparable. The pair (A , ) is then called a linearly ordered set or loset. (A.9) A relation < on A is a strict order or quasi order if it is irreflexive and transitive. (A.10) Let (A , S(,A)) and (B , S(,B)) be posets and let f: A®B be a function. Function f is called monotone increasing or order-preserving if:
|
| EPFL | IC | LGL | Teaching | Ada | LGL Components | |||
![]() |
|||
| Last modified |
![]() |
||