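Grouping

These tutorials are about Regular Expressions. Unless otherwise noted, examples assume JavaScript.

Grouping Expressions

Another thing you can do with regular expressions is group the elements that make up the expression. Terms are grouped together with parentheses. There are three benefits to grouping:

1. You can apply a quantifier to a whole group instead of a single character.
2. You can use the OR conditional to choose between alternatives inside a group.
3. You can reference a group later in the same regular expression.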
The first benefit is straightforward. If you want to look for a string that may begin with one or more repetitions of the substring "abc" in series, you could code:

/^(abc)+/
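To see why the parentheses matter here, the following is a minimal sketch rather than the original page's code. The helper name startsWithAbcRun and the test strings are assumptions made up for this illustration.

```javascript
// Hypothetical helper: true when the string starts with one or more "abc" in series.
// The parentheses make + apply to the whole group, not just to the final "c".
function startsWithAbcRun(str) {
  return /^(abc)+/.test(str);
}

// For comparison: without the group, + repeats only the "c".
const ungroupedRun = /^abc+$/;
const groupedRun = /^(abc)+$/;

console.log(startsWithAbcRun("abcabcxyz")); // true
console.log(startsWithAbcRun("xyzabc"));    // false
console.log(groupedRun.test("abcabc"));     // true
console.log(ungroupedRun.test("abcabc"));   // false
console.log(ungroupedRun.test("abccc"));    // true
```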
The OR Conditional

You can also use groupings to establish the OR conditional. The OR conditional is represented by a single vertical bar in regular expressions ( | ). This is not to be confused with the double bar that many scripting languages normally use ( || ).
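The grouping is what limits how far the vertical bar reaches. As a small sketch (the "gray"/"grey" patterns and variable names are assumptions chosen for illustration), compare a grouped alternation with an ungrouped one:

```javascript
// Grouped: the | chooses between "a" and "e" only.
const groupedOr = /^gr(a|e)y$/;

// Ungrouped: the | splits the whole pattern into "^gra" OR "ey$".
const ungroupedOr = /^gra|ey$/;

console.log(groupedOr.test("gray"));    // true
console.log(groupedOr.test("grey"));    // true
console.log(groupedOr.test("money"));   // false
console.log(ungroupedOr.test("money")); // true, because "money" ends in "ey"
console.log(ungroupedOr.test("graey")); // true, because "graey" starts with "gra"
```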
If you wanted to test for a string that began with either "abc" or "def", you could code:

/^(abc|def)/

Let's try a more complex example: a phone number. A phone number may or may not include an area code and may be written a few different ways. Here are our options, which may not account for all conditions, but we will assume for now that any other format should be considered a data entry error:

(505) 222-1234
(505)222-1234
505-222-1234
505 222-1234
505 222 1234
5052221234
222-1234
2221234

How are we going to test for all those different formats? Let's use grouping to help us. For starters, the area code may or may not exist and may or may not be between parentheses. So we can group it together as entirely optional, as well as making the parentheses optional. Note that since parentheses are special characters in a regular expression, we need to escape them if we want them taken as literals.
/(\(?\d{3}\)?)?/
Then there may be a space, a dash, or nothing. We don't really need to group this bit, since characters that are optional count for zero positions if they are not there, but we will group them anyway. It makes the code clearer. Don't forget that spaces count in regular expressions.
/(\(?\d{3}\)?)?(-| )?/
The last seven digits are easy. They are three digits followed by an optional space or dash and then four more digits. This would make our final grouping look like this:
/(\(?\d{3}\)?)?(-| )?\d{3}(-| )?\d{4}/
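As a quick check, the sketch below runs the expression against every format listed earlier. The variable names and the anchored variant (with ^ and $ added) are assumptions for this example. As written, the expression has no anchors, so it also succeeds when a phone number appears inside a longer string.

```javascript
// The expression from the text, unchanged.
const phonePattern = /(\(?\d{3}\)?)?(-| )?\d{3}(-| )?\d{4}/;

// Assumption for this sketch: ^ and $ require the whole string to be a phone number.
const anchoredPhonePattern = /^(\(?\d{3}\)?)?(-| )?\d{3}(-| )?\d{4}$/;

// The eight accepted formats from the list above.
const phoneSamples = [
  "(505) 222-1234",
  "(505)222-1234",
  "505-222-1234",
  "505 222-1234",
  "505 222 1234",
  "5052221234",
  "222-1234",
  "2221234",
];

const allMatch = phoneSamples.every((s) => phonePattern.test(s));
const allMatchAnchored = phoneSamples.every((s) => anchoredPhonePattern.test(s));
console.log(allMatch, allMatchAnchored); // true true

// One digit too many: the unanchored pattern still finds "555-1234" inside it.
console.log(phonePattern.test("555-12345"));         // true
console.log(anchoredPhonePattern.test("555-12345")); // false
```

Note that both versions still accept unbalanced parentheses such as "(505 222-1234", since each parenthesis is optional on its own.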
It may look like a monster, but it is certainly easier than writing a conditional to test for all those possible combinations, for any possible phone number. If you are concerned about the dashes being special characters, you can also escape them. This might be a good idea if the string is, for instance, being passed back to a server where you don't know the language used to code the processing application. That would give us:
/(\(?\d{3}\)?)?(\-| )?\d{3}(\-| )?\d{4}/
Referencing Groups

Once you have grouped something, you can also reference it within the regular expression. Each group is assigned a number, in the order in which it is declared in the regular expression. If groups are nested, then the count is based on the position of the left-hand parenthesis.
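To make the numbering rule concrete, here is a small sketch with nested groups. The date-like pattern and the names are assumptions invented for this illustration. The groups captured by exec() appear in the order of their left-hand parentheses.

```javascript
// Group 1 opens first and contains groups 2 and 3; group 4 opens last.
//                   1 2        3         4
const nestedPattern = /((\d{4})-(\d{2}))-(\d{2})/;

const nestedMatch = nestedPattern.exec("2024-03-15");

// Index 0 holds the whole match; indexes 1 and up hold the numbered groups.
const nestedGroups = nestedMatch.slice(1);
console.log(JSON.stringify(nestedGroups)); // ["2024-03","2024","03","15"]
```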
You reference a group by listing its number preceded by a backslash. Thus, \1 refers to the first group:

// regular expression for double file suffixes
matchStr = /\S+\.(html)\.\1/;

Note that since the periods are special characters, we have to escape them so that they will be taken literally. You could also generalize the expression by coding it as follows:

// regular expression for double file suffixes
matchStr = /\S+\.(\S+)\.\1/;

Both of these assume that there are no spaces in your file names. Now, let us assume that you want to write a program to fix the file names. First let's fix up the expression to make it cover a few more possibilities. We will include word boundaries to make sure we only change the file name and not surrounding string elements. We will also set it to global so that it checks all occurrences in the string. We will also group the file name, which means that the suffix is now group 2.

// regular expression for double file suffixes
matchStr = /(\b\S+)\.(\S+)\.\2\b/g;
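The following sketch tries these back-reference expressions on a few sample file names. The sample names and variable names are assumptions for this example, and the word-boundary version is used here without the global flag so that repeated test() calls do not carry state between them.

```javascript
const doubledHtml = /\S+\.(html)\.\1/;               // only a doubled "html" suffix
const doubledAny = /\S+\.(\S+)\.\1/;                 // any doubled suffix
const doubledAnyBounded = /(\b\S+)\.(\S+)\.\2\b/;    // with word boundaries, no g flag

console.log(doubledHtml.test("index.html.html")); // true
console.log(doubledHtml.test("index.php.php"));   // false
console.log(doubledAny.test("index.php.php"));    // true
console.log(doubledAny.test("archive.tar.gz"));   // false

// Without the trailing \b, "a" followed by "ab" counts as a doubled suffix.
console.log(doubledAny.test("notes.a.ab"));        // true
console.log(doubledAnyBounded.test("notes.a.ab")); // false
```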
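Now we can use the replace method. In the replacement string, $1 and $2 stand for the text matched by groups 1 and 2:

// regular expression for double file suffixes
matchStr = /(\b\S+)\.(\S+)\.\2\b/g;
// replace() returns a new string, so assign the result back
badFilesString = badFilesString.replace(matchStr, '$1.$2');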
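Here is a runnable sketch of that replacement. The function name fixDoubleSuffixes and the sample sentence are assumptions for this example. It also shows the equivalent callback form, in which the groups arrive as function arguments instead of as $1 and $2.

```javascript
// regular expression for double file suffixes
const suffixPattern = /(\b\S+)\.(\S+)\.\2\b/g;

// Hypothetical helper: rebuilds each match as "name.suffix", dropping the repeat.
function fixDoubleSuffixes(text) {
  return text.replace(suffixPattern, "$1.$2");
}

const sampleText = "See index.html.html and about.php.php today";
const fixedText = fixDoubleSuffixes(sampleText);
console.log(fixedText); // See index.html and about.php today

// Same result with a replacement function: (whole match, group 1, group 2).
const fixedViaCallback = sampleText.replace(
  suffixPattern,
  (whole, name, suffix) => name + "." + suffix
);
console.log(fixedViaCallback === fixedText); // true
```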
These pages can be found at:
[http://academ.hvcc.edu/~kantopet/]