UNIX Regular Expressions | ||
UNIX regular expressions are defined as follows: | ||
^ | Matches beginning of line. | |
$ | Matches end of line. | |
. | Matches any character except newline. | |
X+ | Maximal match of one or more occurrences of X. See "Minimal Versus Maximal Matching" for more information on minimal and maximal matching. | |
X* | Maximal match of zero or more occurrences of X. | |
X? | Maximal match of zero or one occurrences of X. | |
X{n1} | Match exactly n1 occurrences of X. | |
X{n1,} | Maximal match of at least n1 occurrences of X. | |
X{,n2} | Maximal match of at least 0 occurrences but not more than n2 occurrences of X. | |
X{n1,n2} | Maximal match of at least n1 occurrences but not more than n2 occurrences of X. | |
X+? | Minimal match of one or more occurrences of X. | |
X*? | Minimal match of zero or more occurrences of X. | |
X?? | Minimal match of zero or one occurrences of X. | |
X{n1}? | Matches exactly n1 occurrences of X. | |
X{n1,}? | Minimal match of at least n1 occurrences of X. | |
X{,n2}? | Minimal match of at least 0 occurrences but not more than n2 occurrences of X. | |
X{n1,n2}? | Minimal match of at least n1 occurrences but not more than n2 occurrences of X. | |
(?!X) | Search fails if expression X is matched. The expression ^(?!if) matches the beginning of all lines that do not start with "if". | |
(X) | Matches sub-expression X and specifies a new tagged expression. See "Tagged Search Expressions" for more information. No more tagged expressions are defined once an explicit tagged expression number is specified as shown below. | |
(?dX) | Matches sub-expression X and specifies to use tagged expression number d where 0<=d<=9. No more tagged expressions are defined by the sub-expression syntax "(X)" once this sub-expression syntax is used. This is the best way to make sure you have enough tagged expressions. | |
(?:X) | Matches sub-expression X but does not define a tagged expression. | |
X|Y | Matches X or Y. | |
[char-set] | Matches any one of the characters specified by char-set. A '-' character may be used to specify ranges. The expression [A-Z] matches any uppercase letter. '\' may be used inside the square brackets to define literal characters or to specify characters by ASCII code. For example, "\-" specifies a literal dash character. The expression [\d0-\d27] matches ASCII character codes 0..27. The expression []] matches a right bracket. In MicroEdge regular expressions, [] matches no characters. In both syntaxes, the expression [\]] matches a right bracket. The expression [^] matches a '^' character, but this does not work for MicroEdge regular expressions. In both syntaxes, [\^] matches a '^' character. | |
[^char-set] | Matches any character not specified by char-set. A '-' character may be used to specify ranges. The expression [^A-Z] matches all characters except uppercase letters. | |
\d | Defines a back reference to tagged expression number d where 0<=d<=9. For example, "(abc)def\1" matches the string "abcdefabc". If the tagged expression has not been set, the search path fails. | |
\c | Specifies cursor position if match is found. If the expression xyz\c is found the cursor is placed after the z. | |
\n | Matches the newline character sequence. Useful for matching multi-line search strings. What this matches depends on whether the buffer is a DOS (ASCII 13,10 or just ASCII 10), UNIX (ASCII 10), Macintosh (ASCII 13), or user-defined ASCII file. Use "\d10" if you want to match only the ASCII 10 character. | |
\r | Matches carriage return (ASCII 13). | |
\t | Matches tab character. | |
\f | Matches form feed character. | |
\om | Turns on multi-line matching. This extends the match-character-set and match-any-character primitives so that they also match end-of-line characters. For example, "\om.+" matches the rest of the buffer. WARNING: Test your regular expression on a very small file before using it on a large file. This option may cause the editor to use A LOT OF MEMORY. | |
\ol | Turns off multi-line matching (default). You can still use "\n" to create regular expressions which match one or more lines. However, expressions like ".+" will not match multiple lines. This is much safer and usually faster than using the "\om" option. | |
\xhh | Matches hexadecimal character hh. | |
\dddd | Matches decimal character ddd. | |
\char | Declares character after slash to be literal. For example, '\*' represents the star character. | |
\:char | Matches predefined expression corresponding to char. | |
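The behavior of several operators above can be sketched in Python. This is an illustration under an assumption: Python's re module is used only because it happens to share these particular operators (X+?, X{n1,n2}?, (?!X), (?:X), and numbered back references). It is not the editor's search engine and does not support \c, \om, \ol, (?dX), or \:char.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Maximal match: X+ consumes as many characters as it can.
maximal = re.search(r"<.+>", text).group(0)

# Minimal match: X+? stops at the first point where the rest can match.
minimal = re.search(r"<.+?>", text).group(0)

# Bounded repetition, maximal versus minimal.
at_most_three = re.search(r"a{1,3}", "aaaa").group(0)
as_few_as_possible = re.search(r"a{1,3}?", "aaaa").group(0)

# (?!X): the beginning of every line that does not start with "if".
lines = ["if x:", "else:", "ifdef", "while y:"]
not_if = [ln for ln in lines if re.search(r"^(?!if)", ln)]

# Back reference: \1 must repeat what the first tagged expression matched.
backref = re.search(r"(abc)def\1", "xxabcdefabcxx").group(0)

# (?:X) groups without defining a tagged expression, so only (c) is tagged.
tagged = re.search(r"(?:ab)+(c)", "ababc").groups()

print(maximal)             # <b>one</b> and <b>two</b>
print(minimal)             # <b>
print(at_most_three)       # aaa
print(as_few_as_possible)  # a
print(not_if)              # ['else:', 'while y:']
print(backref)             # abcdefabc
print(tagged)              # ('c',)
```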
The predefined expressions are: | ||
\:a [A-Za-z0-9] | Matches an alphanumeric character | |
\:b ([ \t]+) | Matches blanks | |
\:c [A-Za-z] | Matches an alphabetic character | |
\:d [0-9] | Matches a digit | |
\:f ([^\[\]\:\\/<>|=+;, \t"']+) | non-UNIX platforms: Matches a filename part | |
\:f ([^/ \t"']+) | UNIX: Matches a filename part | |
\:h ([0-9A-Fa-f]+) | Matches a hex number | |
\:i ([0-9]+) | Matches an integer | |
\:n (([0-9]+(\.[0-9]+|)|\.[0-9]+)([Ee](\+|-|)[0-9]+|)) | Matches a floating-point number | |
\:p (([A-Za-z]:|)(\\|/|)(\:f(\\|/))*\:f) | non-UNIX platforms: Matches a path | |
\:p ((/|)(\:f(/))*\:f) | UNIX: Matches a path | |
\:q (\"[^\"]*\"|'[^']*') | Matches a quoted string | |
\:v ([A-Za-z_$][A-Za-z0-9_$]*) | Matches a C variable | |
\:w ([A-Za-z]+) | Matches a word | |
NOTE: The \:f and \:p predefined expressions are not intended to support all operating system file names. Instead, they are intended to be useful in practical cases. For non-UNIX platforms, \:f is designed for FAT (DOS) file systems. The space character is not in \:f because it is more typically a file name separator. | ||
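To see what the predefined expressions expand to in practice, the sketch below copies their definitions from the table into a Python dictionary and substitutes them by hand. The names PREDEFINED and expand are hypothetical helpers introduced for this example; Python's re module has no \:char syntax of its own. The platform-dependent \:f and \:p entries are left out, so a pattern that uses them raises KeyError here.

```python
import re

# Hypothetical lookup table: each entry is the expansion listed in the
# table above, copied by hand (\:f and \:p omitted as platform-dependent).
PREDEFINED = {
    "a": r"[A-Za-z0-9]",
    "b": r"([ \t]+)",
    "c": r"[A-Za-z]",
    "d": r"[0-9]",
    "h": r"([0-9A-Fa-f]+)",
    "i": r"([0-9]+)",
    "n": r"(([0-9]+(\.[0-9]+|)|\.[0-9]+)([Ee](\+|-|)[0-9]+|))",
    "q": r"(\"[^\"]*\"|'[^']*')",
    "v": r"([A-Za-z_$][A-Za-z0-9_$]*)",
    "w": r"([A-Za-z]+)",
}


def expand(pattern: str) -> str:
    # Replace every backslash-colon-letter sequence with its expansion.
    return re.sub(r"\\:([a-z])", lambda m: PREDEFINED[m.group(1)], pattern)


# A whole-string floating-point check built from \:n.
float_re = re.compile(expand(r"^\:n$"))
is_float = [bool(float_re.search(s)) for s in ("6.02e23", ".5", "e10")]

# variable, blanks, "=", blanks, integer
assign = re.search(expand(r"\:v\:b=\:b\:i"), "count  = 42")

# A quoted string.
quoted = re.search(expand(r"\:q"), "say 'hello' now").group(0)

print(is_float)         # [True, True, False]
print(assign.group(0))  # count  = 42
print(quoted)           # 'hello'
```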
Precedence of operators, from highest to lowest: | ||
+,*,?, {},+?,*?,??, {}? | These operators have the same precedence | |
concatenation | ||
| | ||
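The precedence order above determines how an unparenthesized expression is grouped. The following Python sketch shows the two consequences that most often surprise people. It assumes Python's re module, which follows the same ordering for these operators, and is meant as an illustration only.

```python
import re

# Repetition binds tighter than concatenation:
# in "ab+" the + applies to b alone, not to "ab".
plus_on_b = re.fullmatch(r"ab+", "abbb") is not None
plus_on_ab = re.fullmatch(r"ab+", "abab") is not None
grouped_plus = re.fullmatch(r"(?:ab)+", "abab") is not None

# Concatenation binds tighter than |:
# "^for|while$" is read as (^for)|(while$), not ^(for|while)$.
loose_start = re.search(r"^for|while$", "for x in y") is not None
loose_end = re.search(r"^for|while$", "do while") is not None
strict = re.search(r"^(?:for|while)$", "for x in y") is not None

print(plus_on_b, plus_on_ab, grouped_plus)  # True False True
print(loose_start, loose_end, strict)       # True True False
```

Adding a (?:X) group is the usual way to override the default grouping without defining an extra tagged expression.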
Sample Regular Expressions: | ||
^defproc | Matches lines that begin with the word defproc. | |
^definit$ | Matches lines that only contain the word definit. | |
^\*name | Matches lines that begin with the string "*name". Notice that the backslash must prefix the special character '*'. | |
[\t ] | Matches a tab or space character. | |
[\d9\d32] | Matches a tab or space character. | |
[\x9\x20] | Matches a tab or space character. | |
p.t | Matches any three-character string starting with the letter 'p' and ending with the letter 't'. Two possible matches are "pot" and "pat". | |
s.*?t | Matches the letter 's' followed by any number of characters followed by the nearest letter 't'. Two possible matches are "seat" and "st". | |
for|while | Matches the strings "for" or "while". | |
^\:p | Matches lines beginning with a file name. | |
xy+z | Matches x followed by one or more occurrences of y followed by z. |
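Most of the sample expressions above can be tried outside the editor. The sketch below runs them with Python's re module, which is an assumption made for convenience: these particular samples use only operators that behave the same way there. The ^\:p sample is skipped because \:p is specific to the syntax described here.

```python
import re

lines = ["defproc main", "  defproc indented", "definit", "definit now", "*name = 1"]

starts_with_defproc = [ln for ln in lines if re.search(r"^defproc", ln)]
only_definit = [ln for ln in lines if re.search(r"^definit$", ln)]
star_name = [ln for ln in lines if re.search(r"^\*name", ln)]

# '.' matches any character, not just a letter, so "p-t" also matches.
p_t = re.findall(r"p.t", "pot pat p-t pt")

# Minimal match: each match ends at the nearest 't'.
s_t = re.findall(r"s.*?t", "seat st start")

either = re.findall(r"for|while", "for a while")
blanks = re.findall(r"[\t ]", "a\tb c")
xyz = re.findall(r"xy+z", "xz xyz xyyyz")

print(starts_with_defproc)  # ['defproc main']
print(only_definit)         # ['definit']
print(star_name)            # ['*name = 1']
print(p_t)                  # ['pot', 'pat', 'p-t']
print(s_t)                  # ['seat', 'st', 'st']
print(either)               # ['for', 'while']
print(blanks)               # ['\t', ' ']
print(xyz)                  # ['xyz', 'xyyyz']
```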