Grouping

These tutorials are about Regular Expressions. Unless otherwise noted, examples assume JavaScript.

Grouping Expressions

Another thing you can do with regular expressions is group the elements that make up the expression. Grouping together terms is done with parentheses. There are three benefits to grouping:

  • you can apply repetition operators to a group of characters,
  • you can use the OR conditional, and
  • you can reference the groupings within the regular expression and within functions using that regular expression.

The first benefit is straightforward. If you want to look for a string that begins with one or more repetitions of the substring "abc" in series, you could code: /^(abc)+/i. The trailing i makes the match case-insensitive.
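Here is a minimal sketch of what the grouping changes. The variable names and sample strings are made up for illustration. Without the parentheses, the + applies only to the last character rather than to the whole sub-string.

```javascript
// Grouped: the + repeats the whole sub-string "abc".
const groupedRepeat = /^(abc)+/i;
// Ungrouped: the + repeats only the letter "c".
const ungroupedRepeat = /^abc+/i;

const startsWithAbc = groupedRepeat.test("abcabcxyz");   // true
const ignoresCase = groupedRepeat.test("ABCdef");        // true, because of the i flag
const notAtStart = groupedRepeat.test("xabc");           // false, "abc" is not at the start

// The two patterns consume different amounts of the same string.
const groupedMatch = "abcabc".match(groupedRepeat)[0];     // "abcabc"
const ungroupedMatch = "abcabc".match(ungroupedRepeat)[0]; // "abc"

console.log(groupedMatch, ungroupedMatch);
```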

The OR Conditional

You can also use groupings to establish the OR conditional. The OR conditional is represented by a single vertical bar in regular expressions ( | ). This is not to be confused with the double bar that many scripting languages normally use ( || ).

If you wanted to test for a string that began with either "abc" or "def", you could code: /^(abc|def)/i.
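The short sketch below (sample strings are invented) shows why the parentheses matter here. The vertical bar splits everything around it, so without the group the ^ anchor belongs only to the first alternative.

```javascript
// Grouped: the string must START with "abc" or with "def".
const groupedOr = /^(abc|def)/i;
// Ungrouped: "abc" at the start, OR "def" anywhere in the string.
const ungroupedOr = /^abc|def/i;

const abcAtStart = groupedOr.test("abcxyz");      // true
const defAtStart = groupedOr.test("DEFxyz");      // true, because of the i flag
const defAtEnd = groupedOr.test("xyzdef");        // false
const defAtEndLoose = ungroupedOr.test("xyzdef"); // true, probably not what was intended

console.log(abcAtStart, defAtStart, defAtEnd, defAtEndLoose);
```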

Let's try a more complex example: a phone number. A phone number may or may not include an area code and may be written a few different ways. Here are our options, which may not account for all conditions, but we will assume for now that any other format should be considered a data entry error:

(505) 222-1234
(505)222-1234
505-222-1234
505 222-1234
505 222 1234
5052221234
222-1234
2221234

How are we going to test for all those different formats? Let's use grouping to help us. For starters, the area code may or may not exist and may or may not be between parentheses. So we can group it together as entirely optional, as well as making the parentheses optional. Note that since parentheses are special characters in a regular expression, we need to escape them if we want them taken as literals.

/(\(?\d{3}\)?)?/

Then there may be a space, a dash, or nothing. We group this bit so that the OR conditional applies only to the dash and the space rather than to the whole expression, and we make the group optional. An optional group that is not there simply counts for zero positions. Don't forget that spaces count in regular expressions.

/(\(?\d{3}\)?)?(-| )?/

The last seven digits are easy: three digits, followed by an optional space or dash, and then four more. This would make our final expression look like this:

/(\(?\d{3}\)?)?(-| )?\d{3}(-| )?\d{4}/

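It may look like a monster, but it is certainly easier than writing a conditional to test for all those possible combinations. Note that, as written, the expression is not anchored, so it will also find a phone number inside a longer string. To require that the whole string be a phone number, begin the expression with ^ and end it with $.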
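To check the expression against the formats listed above, we can run each one through test( ). This is only a sketch: the ^ and $ anchors are an addition that is not in the expression above, and the rejected samples are made up.

```javascript
// The expression from above, with ^ and $ added (an assumption, so that
// the whole string must be a phone number).
const phonePattern = /^(\(?\d{3}\)?)?(-| )?\d{3}(-| )?\d{4}$/;

// The eight formats listed in this tutorial.
const accepted = [
  "(505) 222-1234", "(505)222-1234", "505-222-1234", "505 222-1234",
  "505 222 1234", "5052221234", "222-1234", "2221234"
];
// Hypothetical data entry errors.
const rejected = ["22-1234", "505.222.1234", "phone: 222-1234"];

const allAccepted = accepted.every(function (s) { return phonePattern.test(s); });
const noneRejected = rejected.every(function (s) { return !phonePattern.test(s); });

// A limitation: each parenthesis is optional on its own, so an
// unbalanced pair still passes.
const unbalanced = phonePattern.test("(505 222-1234"); // true

console.log(allAccepted, noneRejected, unbalanced);
```

If balanced parentheses matter, one option is to use the OR conditional on the area code itself, along the lines of (\(\d{3}\)|\d{3}).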

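In JavaScript a dash is only special inside a character class (square brackets), so it does not need to be escaped here. If you are still concerned about the dashes being treated as special characters, you can escape them anyway. This might be a good idea if the string is, for instance, being passed back to a server where you don't know the language used to code the processing application. That would give us: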

/(\(?\d{3}\)?)?(\-| )?\d{3}(\-| )?\d{4}/
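As a quick sanity check, the sketch below compares the escaped and unescaped versions on a few made-up sample strings. In JavaScript without the u flag they should behave identically.

```javascript
const plainDash = /(\(?\d{3}\)?)?(-| )?\d{3}(-| )?\d{4}/;
const escapedDash = /(\(?\d{3}\)?)?(\-| )?\d{3}(\-| )?\d{4}/;

// Hypothetical sample inputs, including one that should not match.
const samples = ["505-222-1234", "222-1234", "505 222 1234", "no number here"];

// true when both expressions agree on every sample
const sameResults = samples.every(function (s) {
  return plainDash.test(s) === escapedDash.test(s);
});

console.log(sameResults);
```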

Referencing Groups

Once you have grouped something, you can also reference it within the regular expression. Each group is assigned a number, in the order in which it is declared in the regular expression. If groups are nested, then the count is based on the position of the left-hand parenthesis.
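The numbering of nested groups is easiest to see by looking at what exec( ) returns. In this sketch (the pattern is invented for illustration), element 0 is the whole match and elements 1 and up are the groups, counted by their left-hand parentheses.

```javascript
// Group 1 is ((a)(b)), group 2 is (a), group 3 is (b), group 4 is (c).
const nested = /((a)(b))(c)/;
const result = nested.exec("abc");

const whole = result[0];   // "abc"
const group1 = result[1];  // "ab", the outer group opens first
const group2 = result[2];  // "a"
const group3 = result[3];  // "b"
const group4 = result[4];  // "c"

console.log(whole, group1, group2, group3, group4);
```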

You reference a group by writing its number preceded by a backslash. Thus, /(abc){2}/ could also be written /(abc)\1/. The first says match where this grouping repeats twice, the second says match this grouping and then match it again. This is especially useful if the identical strings are non-contiguous. For instance, assuming you have a list of file names, and in it you accidentally named a bunch of files somename.html.html, you could write the following regular expression to find them:

// regular expression for double file suffixes
matchStr = /\S+\.(html)\.\1/;

Note that since the periods are special characters, we have to escape them so that they will be taken literally. You could also generalize the expression to catch any doubled suffix by coding it as follows:

// regular expression for double file suffixes
matchStr = /\S+\.(\S+)\.\1/;

Both of these assume that there are no spaces in your file names.
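The sketch below tries the back reference on a few made-up file names. It first confirms that /(abc){2}/ and /(abc)\1/ agree, then filters a hypothetical list with the two file suffix expressions.

```javascript
// Two ways of saying "abc, twice in a row".
const twice = /(abc){2}/;
const backRef = /(abc)\1/;
const bothMatch = twice.test("abcabc") && backRef.test("abcabc"); // true
const nearMiss = backRef.test("abcabd");                          // false

// The two file suffix expressions from above.
const htmlOnly = /\S+\.(html)\.\1/;
const anySuffix = /\S+\.(\S+)\.\1/;

// Hypothetical file names.
const names = ["index.html.html", "about.html", "notes.txt.txt"];

const doubledHtml = names.filter(function (n) { return htmlOnly.test(n); });
// ["index.html.html"]
const doubledAny = names.filter(function (n) { return anySuffix.test(n); });
// ["index.html.html", "notes.txt.txt"]

console.log(doubledHtml, doubledAny);
```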

Now, let us assume that you want to write a program to fix the file names. First let's fix up the expression to make it cover a few more possibilities. We will include word boundaries to make sure we only change the file name and not surrounding string elements. We will also set it to global so that it checks all occurrences in the string. Finally, we will group the file name, which means that the suffix is now group 2.

// regular expression for double file suffixes
matchStr = /(\b\S+)\.(\S+)\.\2\b/g;
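To see what the global flag buys us, we can ask match( ) for every occurrence in a string. This is a sketch with a made-up list of file names separated by spaces.

```javascript
const matchStr = /(\b\S+)\.(\S+)\.\2\b/g;

// Hypothetical string holding several file names.
const listing = "index.html.html about.html notes.txt.txt";

// With the g flag, match() returns every matching substring.
const found = listing.match(matchStr);
// ["index.html.html", "notes.txt.txt"]

console.log(found);
```

Without the g flag, match( ) would stop at the first occurrence and return that match together with its groups instead.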

Now we can use the string.replace( ) method to write out the file names without duplicate suffixes. The replace method takes two arguments: the pattern to search for and the text to replace each match with. We can address the groupings from within the replacement text. In JavaScript, we do this by preceding the group number with a dollar sign ( $ ) inside the replacement string. Note that replace( ) does not change the original string; it returns a new one, so we need to store the result.

// regular expression for double file suffixes
matchStr = /(\b\S+)\.(\S+)\.\2\b/g;
// $1 is the file name and $2 is the suffix; both go inside the quoted replacement string
fixedFilesString = badFilesString.replace(matchStr, '$1.$2');
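Here is a complete, runnable sketch of the fix. The sample string is invented, and the function form of the second argument is shown only as an alternative way to reach the same groups.

```javascript
const matchStr = /(\b\S+)\.(\S+)\.\2\b/g;

// Hypothetical string holding several file names.
const badFilesString = "index.html.html about.html notes.txt.txt";

// In the replacement string, $1 is the file name and $2 is the suffix.
const fixedFilesString = badFilesString.replace(matchStr, "$1.$2");
// "index.html about.html notes.txt"

// Alternative: a function receives the whole match, then each group.
const fixedViaFunction = badFilesString.replace(
  matchStr,
  function (whole, name, suffix) { return name + "." + suffix; }
);

console.log(fixedFilesString);
console.log(fixedViaFunction === fixedFilesString); // true
```

The original badFilesString is left untouched, which is why the result is stored in a new variable.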
