9.19. Using Patterns

This page is part of the Mobile Dashboard online help. For an introduction to the online help manual, see Section 1.1, “Mobile Dashboard Online Help Manual Overview”.

This section explains how to use patterns. A pattern is a piece of text, which may contain special characters, that describes a specific format that some other piece of text is matched against. For example, the text being matched against the pattern might be input from a user; if the input does not match the required pattern, the user is told to correct it. (To use a pattern to restrict the values a user can enter for an input, define a custom data format for that input; see Section 7.31, “Restrict Data Values”.) Another example is a pattern used to filter the content extracted from a Web page: only content items on the page that match the specified pattern are extracted.

Advanced users who wish to refer to the specification for patterns, which are technically called "regular expressions", should see Appendix F of XML Schema Part 2: Datatypes. The specification contains advanced features that are not described here, but the entire regular expression language specified there is supported. You may also search for "regular expressions xml schema" on the Internet to find other tutorials and examples.

Unfortunately, there are some commonly-needed tasks that are not easy to do with patterns, including matching everything except text containing some word or arbitrary pattern, or matching a word regardless of whether it is uppercase or lowercase.

The following subsections give several examples of patterns to use for different purposes, and explain most of the special characters and features that are commonly used in patterns. Patterns and symbols that are included in a sentence will be enclosed in double quotes ("), but these quotes are not part of the pattern itself.

In the examples below, the text that the pattern is being matched against is sometimes called the "target" text.

Special Characters

There are eleven symbols that have special meanings in patterns, listed here (in double quotes): " . \ * + ? [ ] ( ) { } ". To match one of these symbols in a pattern, instead of having it treated as a special character, you must precede it with the "\" symbol. So, to match a "\" in a pattern, you would use "\\". The "|" symbol, which separates alternatives, must likewise be preceded by "\" when you want to match it as itself.
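
The escaping rule can be tried out with a short script. The following is a minimal sketch in Python, which is an assumption of this example and not part of Mobile Dashboard. Python's re module is a different regular expression engine from the one specified by XML Schema, although the two agree on these simple patterns. The matches helper is a hypothetical name; it uses re.fullmatch because the examples in this section match the whole target text.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

target = "Version 3.2(a)"

# Special characters preceded by "\" are matched as ordinary symbols.
escaped = r"Version 3\.2\(a\)"

# Without "\", the "." matches any character and "(a)" is a group matching "a".
unescaped = "Version 3.2(a)"

print(matches(escaped, target))            # True
print(matches(unescaped, target))          # False: the literal "(" is unexpected
print(matches(unescaped, "Version 3x2a"))  # True: "." matched "x"
```

The last line shows why the escapes matter: the unescaped pattern accepts text that was never intended to match.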

Basic Examples

This subsection gives some simple examples of using patterns.

Each example below gives a description of a pattern, a pattern like the one described, and sample texts that the pattern matches.
To limit a user to entering one of three words in an account status field, for example, you could use this pattern. The expression in parentheses means to match exactly one of the items separated by "|" symbols.

Pattern: (Open|Closed|Pending)
Matches:
Open
Closed
Pending
BUT NOT:
closed (needs uppercase "C")
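
As a rough illustration of validating an input against this pattern, here is a minimal sketch in Python (an assumption of this example; Python's re module is not the XML Schema engine, but it treats this pattern the same way). The name is_valid_status is hypothetical.

```python
import re

STATUS_PATTERN = "(Open|Closed|Pending)"

def is_valid_status(value: str) -> bool:
    """Accept the value only if all of it matches one of the three words."""
    return re.fullmatch(STATUS_PATTERN, value) is not None

for entry in ["Open", "Pending", "closed", "Open "]:
    verdict = "ok" if is_valid_status(entry) else "please correct this input"
    print(repr(entry), "->", verdict)
```

Note that "Open " with a trailing space is rejected as well, because every character of the target text has to be accounted for by the pattern.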

If a table on a Web page contained rows describing the status of customer accounts, and you wanted to see only those that were either in collections or over 60 days past due, you could use a pattern like this. The "." matches any single character, the "*" means zero or more instances of whatever is just before it in the pattern, and the expression in parentheses means match any one of the items separated by "|": either the text "ollections" or the text "60 days". The ".*" matches anything that might be before or after the "ollections" or "60 days". We use a common technique here, and omit the initial "c" from "collections" so that the pattern doesn't depend on the first letter being uppercase or lowercase. See the next example for an important limitation, however.

Pattern: .*(ollections|60 days).*
Matches:
Joe Smith -- (In Collections)
Jane Jones -- 60 days past due
BUT NOT:
Harry Doe -- 30 days past due
Richard Richards -- collected $375.45
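
The filtering described in this example can be sketched in a few lines of Python. This is only an illustration under the assumption that Python's re module behaves like the XML Schema engine for this pattern; the rows list is sample data taken from the texts above, and the variable names are hypothetical.

```python
import re

ROW_PATTERN = ".*(ollections|60 days).*"

rows = [
    "Joe Smith -- (In Collections)",
    "Jane Jones -- 60 days past due",
    "Harry Doe -- 30 days past due",
    "Richard Richards -- collected $375.45",
]

# Keep only the rows whose entire text matches the pattern.
selected = [row for row in rows if re.fullmatch(ROW_PATTERN, row)]

for row in selected:
    print(row)
```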

Note that if there were not exactly one space between the "60" and "days", the previous example pattern wouldn't match. To allow one or more whitespace characters, use the pattern "60\s+days", which uses the "\s" shorthand for any whitespace character, and the "+" symbol, meaning one or more instances of whatever is before it.

Pattern: .*(ollections|60\s+days).*
Matches:
Jane Doe -- 60    days past due
BUT NOT:
Jane Doe -- 60 -days past due ("-" is not whitespace)
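
To see the difference between a literal space and "\s+", the two patterns can be compared side by side. The sketch below uses Python as an assumed stand-in for the XML Schema engine; the sample texts and the matches helper are hypothetical.

```python
import re

SINGLE_SPACE = ".*(ollections|60 days).*"
ANY_WHITESPACE = r".*(ollections|60\s+days).*"

samples = [
    "Jane Doe -- 60 days past due",     # exactly one space
    "Jane Doe -- 60    days past due",  # several spaces
    "Jane Doe -- 60\tdays past due",    # a tab character
    "Jane Doe -- 60 -days past due",    # "-" is not whitespace
]

def matches(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

# One (single-space result, \s+ result) pair per sample.
results = [(matches(SINGLE_SPACE, s), matches(ANY_WHITESPACE, s)) for s in samples]
for sample, pair in zip(samples, results):
    print(repr(sample), pair)
```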

To match just a few copies of one character, you can often just repeat that character in the pattern. To match a line containing, for example, at least two uses of the word "customer", you could use a pattern with {}'s, the notation for counted repetitions. To indicate exactly three copies, or at least three but not more than five copies, you would use "{3}" and "{3,5}", respectively. Notice, however, that a ".*" inside ()'s matches anything, including further copies of the word, so a pattern such as "(.*cat.*){2,4}" also matches text that contains 5 or more uses of the word "cat".

Pattern: (.*customer.*){2,}
Matches:
Sales per-customer; top customers only
BUT NOT:
Sales per-customer (only one copy)
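
The caveat about ".*" inside ()'s is easy to overlook, so here is a small sketch that demonstrates it. It is written in Python on the assumption that Python's re module agrees with the XML Schema engine for these patterns; the matches helper and the five_cats text are hypothetical.

```python
import re

AT_LEAST_TWO = "(.*customer.*){2,}"
TWO_TO_FOUR_CATS = "(.*cat.*){2,4}"

def matches(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

print(matches(AT_LEAST_TWO, "Sales per-customer; top customers only"))  # True
print(matches(AT_LEAST_TWO, "Sales per-customer"))                      # False

# The upper limit of 4 does not stop a text with five copies from matching,
# because one ".*" can swallow the extra copies.
five_cats = "cat cat cat cat cat"
print(matches(TWO_TO_FOUR_CATS, five_cats))  # True
print(len(re.findall("cat", five_cats)))     # 5
```

If an upper limit on the number of occurrences really matters, counting the occurrences separately, as the last line does, is one possible workaround outside of the pattern itself.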

Pattern Features and Details

This subsection gives detailed descriptions of the commonly-used features available in patterns.

Each entry below gives a description of a feature, a pattern that uses it, and sample texts that the pattern matches.
To match specific text, use the text itself, adding the "\" special character in front of any special characters you want to match as symbols. A lowercase letter is not identical to its uppercase version, and vice versa. Every space and character in the pattern must match exactly the same character in the target text.

Pattern: Introduction -- Version 3\.2\(a\)
Matches:
Introduction -- Version 3.2(a)
The "." symbol stands for any single character, including space and tab, but not the newline character. Thus, you can match almost anything using ".*", or anything containing a dollar sign with ".*$.*". To match some text with a period in it, use "\.".

Pattern: ..\...
Matches:
25.32
ab.cd
$_.&!
BUT NOT:
2.435 ('\.' won't match '4')

Certain kinds of characters have shorthand notation to refer to them. "\d" will match any digit from 0-9, and "\D" will match anything except a digit. "\s" will match any whitespace character, including space, tab, newline and return; "\S" will match any non-whitespace character. "\t" matches the tab character, "\n" matches newline, and "\r" matches the return character. "\w" will match any character normally used in a word, including letters, digits and symbols, but not punctuation or separator characters. To match just a letter, use "\p{L}"; for an uppercase or lowercase letter, use "\p{Lu}" or "\p{Ll}", respectively.

Pattern: \d\s\w\D\p{Lu}
Matches:
3 a.Q
3 abZ
BUT NOT:
3 a!z ('z' is lowercase)
3 .bA ('.' is punctuation)

To match zero or more copies of a character, use the "*" symbol. To match one or more copies, use the "+" symbol. To match zero or one copy, use the "?" symbol. In the example pattern, each space is also followed by "?", which makes that space optional.

Pattern: a* ?b+ ?c?
Matches:
b
a b
aaa b
aaa bbb
b c
BUT NOT:
aaa (at least one 'b' needed)
a b cc (too many 'c's)
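
The three repetition symbols can be compared directly by applying each one to the same character. The following is a minimal sketch in Python, assumed here to behave like the XML Schema engine for these patterns; the sample texts and variable names are hypothetical.

```python
import re

samples = ["ac", "abc", "abbc"]

# For each pattern, collect the samples that it matches completely.
results = {
    pattern: [s for s in samples if re.fullmatch(pattern, s)]
    for pattern in ("ab*c", "ab+c", "ab?c")
}

for pattern, matched in results.items():
    print(pattern, "matches", matched)
```

Only the number of "b" characters allowed between "a" and "c" changes: any number for "*", at least one for "+", and at most one for "?".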

Any pattern can be enclosed in ()'s and be treated as a unit.

Pattern: (abc d)+
Matches:
abc dabc d
BUT NOT:
abcd abcd (the space must come between 'c' and 'd')

To match a specific number of copies of something, you can use {}'s. To match exactly 3 a's, use "a{3}". To match at least 3 and at most 5 copies, use "a{3,5}". To match 3 or more copies, use "a{3,}". However, a ".*" inside ()'s matches anything, so a pattern such as "(.*cat.*){1,2}" also matches text that contains 3 or more uses of the word "cat".

Pattern: x{3,5}
Matches:
xxx
xxxx
xxxxx
BUT NOT:
xx (too few 'x's)
xxxxxx (too many 'x's)
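
The three forms of counted repetition can be checked by trying texts of increasing length. This is a sketch in Python, used here on the assumption that it handles {}'s the same way as the XML Schema engine; matching_lengths is a hypothetical helper.

```python
import re

def matching_lengths(pattern: str, max_len: int = 7) -> list[int]:
    """Return the lengths n (1..max_len) for which n copies of "x" match."""
    return [n for n in range(1, max_len + 1) if re.fullmatch(pattern, "x" * n)]

print(matching_lengths("x{3}"))    # [3]
print(matching_lengths("x{3,5}"))  # [3, 4, 5]
print(matching_lengths("x{3,}"))   # [3, 4, 5, 6, 7]
```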

To match sets of characters, list them inside []'s, e.g., "[aeiou]" will match any lowercase vowel. For letter or number ranges, you may also use only the first and last character of a range, with a dash in the middle, e.g., "[A-Ma-m]" will match any lowercase or uppercase letter in the first half of the alphabet. "[0-9]" would match any digit; you can also use the shorthand notation "\d" for a digit. Inside []'s, most special characters are treated like any others, and only "[", "]", "-", "^" and "\" need a "\" in front of them.

Pattern: [a-z0-9]+
Matches:
ab293
1234567890zwx
a1b2c3
BUT NOT:
A29 ('A' is uppercase)
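
A quick way to get a feel for character sets is to run a few texts through them. The sketch below is in Python, an assumption of this example that matches the XML Schema behavior for these simple sets; the variable names are hypothetical.

```python
import re

LOWER_ALNUM = "[a-z0-9]+"   # one or more lowercase letters or digits
FIRST_HALF = "[A-Ma-m]"     # a single letter from the first half of the alphabet

texts = ["ab293", "1234567890zwx", "a1b2c3", "A29"]
accepted = [t for t in texts if re.fullmatch(LOWER_ALNUM, t)]
print(accepted)

letters = ["g", "M", "n", "Z"]
first_half = [c for c in letters if re.fullmatch(FIRST_HALF, c)]
print(first_half)
```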

To match anything except certain characters from a set, start the set with the "^" symbol. When "^" is not the first character in a set, it is treated as itself; for a set that is just "^", use "[\^]".

Pattern: [^0-9]+
Matches:
abc,.;:" Hi there!
BUT NOT:
I'm 5 years old ('5' not allowed)

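You can also subtract one set of characters from another to exclude certain characters. The set to subtract is written inside the outer []'s, after a "-": "[a-z0-9-[c3]]" matches any lowercase letter or digit except "c" or "3".

Pattern: ([a-z0-9-[c3]]){4}
Matches:
abd5
BUT NOT:
abc9 ('c' is excluded)
23xy ('3' is excluded)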
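
Set subtraction is one of the features that does not carry over to every regular expression engine. Python's re module, used for this sketch only as an assumption, has no syntax for subtracting sets, so the example swaps in a different technique, a negative lookahead, to get the same effect of "any lowercase letter or digit except c or 3". It illustrates the intended behavior, not the pattern syntax used by Mobile Dashboard; the names are hypothetical.

```python
import re

# "(?![c3])" rejects the next character if it is "c" or "3";
# "[a-z0-9]" then matches one lowercase letter or digit.
EXCEPT_C_AND_3 = "((?![c3])[a-z0-9]){4}"

def matches(text: str) -> bool:
    return re.fullmatch(EXCEPT_C_AND_3, text) is not None

for text in ["abd5", "27xy", "abc9", "a3b9", "ABD5"]:
    print(text, matches(text))
```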

You can match one of several alternatives by separating them with "|": "Jan|Feb|Mar" would match any of those abbreviated months.

Pattern: (North|South)(east|west)
Matches:
Northeast
Southwest
BUT NOT:
NorthNorth
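
Because each pair of ()'s contributes exactly one alternative, this pattern accepts four texts in total. A short sketch in Python (assumed to treat this pattern the same way as the XML Schema engine; the candidates list is hypothetical) makes that visible.

```python
import re

DIRECTION = "(North|South)(east|west)"

candidates = [
    "Northeast", "Northwest", "Southeast", "Southwest",
    "NorthNorth", "Northeastwest", "north east",
]

# Keep only the candidates that the pattern matches completely.
accepted = [c for c in candidates if re.fullmatch(DIRECTION, c)]
print(accepted)
```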

More Examples

This subsection gives some examples of more complicated patterns.

Each example below gives a description, a pattern, and sample texts that the pattern matches.
To match a formatted date, e.g., 2010-02-25, you could use a pattern like this: "\d{4}-\d\d-\d\d". Note that this pattern does not ensure that the digits actually form a valid date; it would also match "1234-99-77". It thus would not ensure that a user's entry was accurate, but could be useful to detect dates on a Web page which are known to be correct. The pattern shown here handles a different date format, a day number followed by an abbreviated month name and a four-digit year beginning with "20", and has the same limitation.

Pattern: \d*\d \p{Lu}(\p{Ll}){2} 20\d{2}
Matches:
3 Mar 2010
13 Apr 2047
BUT NOT:
13 apr 2047 (month needs uppercase 'A')
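
The gap between "has the shape of a date" and "is a valid date" can be shown with the "\d{4}-\d\d-\d\d" pattern mentioned in the description. The sketch below is in Python, which is an assumption of this example; the second check relies on Python's datetime module and is a separate step outside of the pattern. The function names are hypothetical.

```python
import re
from datetime import datetime

DATE_SHAPE = r"\d{4}-\d\d-\d\d"

def looks_like_date(text: str) -> bool:
    """Check only the format: four digits, dash, two digits, dash, two digits."""
    return re.fullmatch(DATE_SHAPE, text) is not None

def is_real_date(text: str) -> bool:
    """Check the format first, then ask datetime whether the date exists."""
    if not looks_like_date(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

for text in ["2010-02-25", "1234-99-77", "2010-02-30"]:
    print(text, looks_like_date(text), is_real_date(text))
```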

To match a currency amount with optional commas and optional cents, you could use this pattern. It would ensure that users enter only validly formatted currency values.

Pattern: $(\d{1,3}(,\d{3})*|(\d+))(\.\d{2})?
Matches:
$0.95
$2,000
$2,500,387.94
BUT NOT:
$.95 (at least one digit is needed before the '.')
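
The currency pattern can be exercised with a handful of amounts. This sketch uses Python as an assumed stand-in for the XML Schema engine, with one adjustment: in Python's re module "$" is a special character, so it is written as "\$" here, while the rest of the pattern is unchanged. The name is_currency is hypothetical.

```python
import re

# Same pattern as above, except that "$" is escaped for Python's engine.
CURRENCY = r"\$(\d{1,3}(,\d{3})*|(\d+))(\.\d{2})?"

def is_currency(text: str) -> bool:
    return re.fullmatch(CURRENCY, text) is not None

for amount in ["$0.95", "$2,000", "$2,500,387.94", "$1234", "$.95", "$2,00", "$1234.5"]:
    print(amount, is_currency(amount))
```

The rejected amounts show what the pattern enforces: at least one digit before the decimal point, groups of exactly three digits after each comma, and exactly two digits of cents when cents are given.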