Pattern Matching

Pattern matching is a way to test whether data has a particular structure. It can be used for data validation and as a means to extract the part of a data item that matches against a specified element of the pattern. The pattern matching operations are the query processor LIKE and UNLIKE keywords, the QMBasic MATCHES operator, MATCHFIELD() and MATCHESS() functions, and the M search of ED. All of these compare a character string with a pattern template.
Pattern matching breaks the character set into three classes of character, each represented by a character type code:
    A    Alphabetic characters
    N    Numeric characters
    X    Any character
On ECS mode systems, determination of character class is based on the character map in use.
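The character classes behind the A, N and X type codes can be sketched as below. This is a minimal illustration that assumes a plain ASCII character set; it does not model the ECS character map, and the helper names are hypothetical.

```python
# Minimal sketch of the A / N / X character classes.
# Assumption: plain ASCII letters and digits (no ECS character map).

def char_class(ch: str) -> str:
    """Return 'A' for a letter, 'N' for a digit, or '' for anything else."""
    if ("A" <= ch <= "Z") or ("a" <= ch <= "z"):
        return "A"
    if "0" <= ch <= "9":
        return "N"
    return ""


def matches_type(ch: str, code: str) -> bool:
    """True if a single character belongs to the class named by code."""
    if code == "X":
        return True  # X accepts any character at all
    return char_class(ch) == code


print([c for c in "A1-b2" if matches_type(c, "A")])  # ['A', 'b']
```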
There are also three ways to specify how many characters are present:
    0      Any number of characters, including none
    n      Exactly n characters
    n-m    Between n and m characters
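The template consists of at most 30 concatenated elements. Each element is either a length paired with a character type or a quoted literal string:
    0X, nX, n-mX    Characters of any type
    0A, nA, n-mA    Alphabetic characters
    0N, nN, n-mN    Numeric characters
    "string"        A literal string that must be matched exactly (single quotes may also be used, as in 'A')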
The values n and m are integers with any number of digits. m must be greater than or equal to n.
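The length forms above can be decoded mechanically. The sketch below splits a single length/type element into its parts and enforces the rule that m must not be less than n. It is a hypothetical helper written for illustration, assumes uppercase type codes, and handles one element at a time rather than a whole template.

```python
import re

# Hypothetical helper for one element such as "4N", "0X", "2-3A" or "~4N".
ELEMENT = re.compile(r"(~?)([0-9]+)(?:-([0-9]+))?([ANX])")


def parse_element(text: str):
    """Return (inverted, minimum, maximum, type); maximum is None for the 0 form."""
    found = ELEMENT.fullmatch(text)
    if found is None:
        raise ValueError(f"not a length/type element: {text!r}")
    tilde, n, m, code = found.groups()
    inverted = tilde == "~"
    n = int(n)
    if m is not None:                  # n-m: between n and m characters
        if int(m) < n:
            raise ValueError("m must be greater than or equal to n")
        return inverted, n, int(m), code
    if n == 0:                         # 0: any number, including none
        return inverted, 0, None, code
    return inverted, n, n, code        # n: exactly n characters


print(parse_element("2-3N"))  # (False, 2, 3, 'N')
print(parse_element("0X"))    # (False, 0, None, 'X')
print(parse_element("~4N"))   # (True, 4, 4, 'N')
```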
The 0X code is a wildcard that matches anything. It has a commonly used synonym:
    ...    Equivalent to 0X
The 0A, nA, 0N, nN and "string" patterns may be preceded by a tilde (~) to invert the match condition. For example, ~4N matches exactly four non-numeric characters, such as ABCD. It does not match every string that fails to match 4N: 12C4 is four characters that are not all digits, but it does not match ~4N because it contains numeric characters.
A null string matches the patterns ..., 0A, 0X and 0N, their inverses (~0A, etc.) and the empty literal "".
The 0X and n-mX patterns match against as few characters as necessary before control passes to the next pattern. For example, the string ABC123DEF matched against the pattern 0X2N0X matches the pattern components as ABC, 12 and 3DEF.
The 0N, n-mN, 0A, and n-mA patterns match against as many characters as possible. For example, the string ABC123DEF matched against the pattern 0X2-3N0X matches the pattern components as ABC, 123 and DEF.
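Readers familiar with regular expressions may find an analogy helpful: Python's lazy `.*?` and greedy `{n,m}` quantifiers reproduce the two splits described above. This is only an illustration under the assumption that 0X behaves like `.*?` and nN or n-mN like a greedy digit run; it is not how QM implements pattern matching.

```python
import re

# Analogy only: 0X -> (.*?)   2N -> ([0-9]{2})   2-3N -> ([0-9]{2,3})
lazy_x = re.fullmatch(r"(.*?)([0-9]{2})(.*?)", "ABC123DEF")
greedy_n = re.fullmatch(r"(.*?)([0-9]{2,3})(.*?)", "ABC123DEF")

print(lazy_x.groups())    # ('ABC', '12', '3DEF')
print(greedy_n.groups())  # ('ABC', '123', 'DEF')
```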
A pattern may contain unquoted literal elements as long as they do not cause ambiguity. Each unquoted character is treated as a separate literal element, so the pattern 3AXYZ3A has five elements and matches a string formed from three letters, the literal characters X, Y and Z, and a further three letters. The fact that each literal character is a separate element is significant when using MATCHFIELD() and PARSE().
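The rules so far can be pulled together into a small translator that turns a single pattern (without value-mark alternatives) into a Python regular expression with one capture group per element, loosely mirroring how MATCHFIELD() identifies components. This is a sketch under stated assumptions, not QM's implementation: ASCII character classes, X elements lazy and A/N elements greedy, full backtracking, at most 30 elements, and ~ supported on A and N elements only (inverted literals are not implemented). The names `to_regex` and `match_fields` are hypothetical.

```python
import re

# Assumption: ASCII classes only; ECS character maps are not modelled.
CLASSES = {"A": "[A-Za-z]", "N": "[0-9]", "X": "."}
INVERTED = {"A": "[^A-Za-z]", "N": "[^0-9]"}
ELEMENT = re.compile(r"(~?)([0-9]+)(?:-([0-9]+))?([ANX])")
MAX_ELEMENTS = 30


def to_regex(template: str) -> str:
    """Translate one QM-style pattern into a regex, one group per element."""
    parts = []
    i = 0
    while i < len(template):
        if template.startswith("...", i):            # synonym for 0X
            parts.append("(.*?)")
            i += 3
            continue
        quote = template[i]
        if quote in "\"'":                            # quoted literal element
            end = template.index(quote, i + 1)        # ValueError if unclosed
            parts.append("(" + re.escape(template[i + 1:end]) + ")")
            i = end + 1
            continue
        el = ELEMENT.match(template, i)
        if el:
            tilde, lo, hi, code = el.groups()
            if tilde and code == "X":
                raise ValueError("~ is not supported on X in this sketch")
            cls = INVERTED[code] if tilde else CLASSES[code]
            if hi is not None:                        # n-m form
                if int(hi) < int(lo):
                    raise ValueError("m must be greater than or equal to n")
                quant = "{%d,%d}" % (int(lo), int(hi))
            elif int(lo) == 0:                        # 0 form: any number
                quant = "*"
            else:                                     # n form: exactly n
                quant = "{%d}" % int(lo)
            # X takes as few characters as possible; A and N as many as possible.
            lazy = "?" if code == "X" else ""
            parts.append("(" + cls + quant + lazy + ")")
            i = el.end()
            continue
        # Anything else is an unquoted literal: one character per element.
        parts.append("(" + re.escape(template[i]) + ")")
        i += 1
    if len(parts) > MAX_ELEMENTS:
        raise ValueError("a template may contain at most 30 elements")
    return "".join(parts)


def match_fields(template: str, data: str):
    """Return the text matched by each element, or None if data does not match."""
    found = re.fullmatch(to_regex(template), data, re.DOTALL)
    return list(found.groups()) if found else None


print(match_fields("3AXYZ3A", "ABCXYZDEF"))  # ['ABC', 'X', 'Y', 'Z', 'DEF']
print(match_fields("0X2N0X", "ABC123DEF"))   # ['ABC', '12', '3DEF']
```

Translating to a regular expression keeps the sketch short because Python's backtracking engine already provides the "as few as necessary" and "as many as possible" behaviour; a native matcher would implement the same search explicitly.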
The template string may contain alternative patterns separated by value marks. The source data will match the overall pattern if any of the pattern values match. If a match is found, the INMAT() function can be used to retrieve the value position within the pattern that matched.
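Trying alternatives separated by value marks can be sketched as a loop over the values of the template, reporting the 1-based position of the alternative that matched in the spirit of INMAT(). In this sketch the first matching alternative wins and 0 is returned when none match; both are assumptions rather than documented QM behaviour. `match_one` is a hypothetical single-pattern matcher passed in by the caller, and `toy_match` is a deliberately tiny stand-in that understands only a single nA or nN element.

```python
import string

VM = "\xfd"  # value mark, CHAR(253)


def match_alternatives(template: str, data: str, match_one) -> int:
    """Return the 1-based position of the first alternative that matches, or 0."""
    for position, pattern in enumerate(template.split(VM), start=1):
        if match_one(pattern, data):
            return position
    return 0


def toy_match(pattern: str, data: str) -> bool:
    """Hypothetical stand-in: understands only a single nA or nN element."""
    n, code = int(pattern[:-1]), pattern[-1]
    wanted = string.digits if code == "N" else string.ascii_letters
    return len(data) == n and all(ch in wanted for ch in data)


template = "3N" + VM + "2A"
print(match_alternatives(template, "123", toy_match))  # 1
print(match_alternatives(template, "AB", toy_match))   # 2
print(match_alternatives(template, "A1", toy_match))   # 0
```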
The MATCHESS() function can be used to compare each element of a dynamic array with a pattern, returning an equivalently structured dynamic array of True/False values. Note the spelling of this function with the trailing S to "pluralise" the name in the same way as other multivalue function names.
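The structure-preserving behaviour of MATCHESS() can be sketched by splitting the dynamic array on its marks, replacing each element with 1 or 0, and putting the marks back exactly where they were. This is a hypothetical illustration: it handles field, value and subvalue marks only, and `element_matches` is a caller-supplied predicate standing in for the actual pattern test.

```python
import re

# Field, value and subvalue marks: CHAR(254), CHAR(253), CHAR(252).
FM, VM, SM = "\xfe", "\xfd", "\xfc"
MARKS = re.compile(r"([\xfc\xfd\xfe])")


def matchess(dyn_array: str, element_matches) -> str:
    """Replace each element with 1 or 0, keeping every mark in place."""
    pieces = MARKS.split(dyn_array)
    # The capturing group makes re.split keep the marks at odd positions.
    return "".join(
        piece if i % 2 else ("1" if element_matches(piece) else "0")
        for i, piece in enumerate(pieces)
    )


data = "12" + VM + "AB" + FM + "7" + SM
result = matchess(data, lambda s: s.isdigit())  # stand-in for a "1N0N" test
print(result == "1" + VM + "0" + FM + "1" + SM + "0")  # True
```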
Examples
The string "A123BCD" would match successfully against the patterns 1A3N3A, 1A1-3N3A, 'A'1-3N3A, 0A0N0A, 1A...3A and 1A~3A3A, among many others.
It is often acceptable to omit the quotes around literal components. The above example would also match A1-3N3A. There is no confusion between the leading A as a literal or as a character type because it is not preceded by a length value. It is, however, recommended that the quotes be included. Omitting the quotes in a pattern used with the MATCHFIELD() function may affect the function's behaviour because each character of the literal is counted as a separate component of the pattern.
A program might need to test whether data entered by a user is a non-negative integer (whole number) value. The QMBasic NUM() function can be used to test for numeric data but this would allow fractional or negative values. Testing against a pattern of "1-4N" would allow only integer values in the range 0 to 9999. To remove the upper limit, a pattern of 1N0N tests for one digit followed by any number of further digits, including none.
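The validation described above can be sketched with regular expressions standing in for the two patterns. The equivalences `[0-9]{1,4}` for 1-4N and `[0-9][0-9]*` for 1N0N are assumptions that hold for ASCII digits; `is_whole_number` is a hypothetical helper, not a QMBasic function.

```python
import re

# Assumed regex equivalents of the two QM patterns (ASCII digits only).
UP_TO_9999 = re.compile(r"[0-9]{1,4}")   # 1-4N
ANY_LENGTH = re.compile(r"[0-9][0-9]*")  # 1N0N


def is_whole_number(text: str, limit_to_four_digits: bool = True) -> bool:
    """True if text is a non-negative integer written only with digits."""
    pattern = UP_TO_9999 if limit_to_four_digits else ANY_LENGTH
    return pattern.fullmatch(text) is not None


for value in ["42", "-5", "2.5", "12345", ""]:
    print(repr(value), is_whole_number(value), is_whole_number(value, False))
# '42' True True
# '-5' False False
# '2.5' False False
# '12345' False True
# '' False False
```

Signs and decimal points are rejected simply because they are not digits, which is exactly the gap left by a general numeric test.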