REGEXP string Manipulation



String manipulation and searching contribute to a large percentage of the logic within a Webbased
application. Regular expressions are a method of describing both simple and complex
patterns for searching and manipulating. Usage ranges from the simple (“Find the word SAN in
the text”) to the more complex (“Extract all URLs from the text”) to the archaic (“Find all words
in which every second character is a vowel”).
From  Oracle Database 10g, you can use both SQL and PL/SQL to implement regular expression
support. The National Language Support Run Time Library (NLSRTL) provides all of the
linguistic elements used to implement regular expression support. The following functions have
been introduced:
• REGEXP_LIKE: Similar to the LIKE operator, but performs regular expression matching
instead of simple pattern matching
• REGEXP_REPLACE: Searches for a regular expression pattern and replaces it with a
replacement string
• REGEXP_INSTR: Searches for a given string for a regular expression pattern and returns
the position were the match is found
• REGEXP_SUBSTR: Searches for a regular expression pattern within a given string and
returns the matched sub-string

 

SELECT first_name, last_name, salary,
TO_CHAR(hire_date, ‘yyyy’)
FROM emp
WHERE REGEXP_LIKE(
TO_CHAR(hire_date, ‘yyyy’), ‘^201[1-4]$’);

This example illustrates the use of the REGEXP_LIKE functions to display employee names and
salaries for employees hired between 2011 and 2014. The example uses the ‘^’ to indicate that
the beginning of the line has to be 201, and [ – ] with ‘$’ to specify a range of valid characters as
the very last character.

Integrity Constraints using Regular expression

Regular expressions are a useful way to enforce integrity constraints. For example, suppose that you want to ensure that phone numbers are entered into the database in a standard format. Lets creates a emp_contact table and adds a check constraint to the phone_number column to enforce the following format mask:

(AA) AAA-AAA-AAA

CREATE TABLE emp_contact
(
Emp_name    VARCHAR2(30),
phone_number  VARCHAR2(30)
CONSTRAINT phone_number_format
CHECK ( REGEXP_LIKE ( p_number, ‘^\(\d{2}\) \d{3}-\d{3}-\d{3}$’ ) )
);

 

Regular Expression Element Matches . . .
^ The beginning of the string.
\( A left parenthesis. The backward slash (\) is an escape character that indicates that the left parenthesis following it is a literal rather than a grouping expression.
\d{2} Exactly three digits.
\) A right parenthesis. The backward slash (\) is an escape character that indicates that the right parenthesis following it is a literal rather than a grouping expression.
(space character) A space character.
\d{3} Exactly three digits.
A hyphen.
\d{3} Exactly four digits.
A hyphen.
\d{3} Exactly four digits.
$ The end of the string.

Regular expression metacharacter in Oracle Database

Syntax Operator Name Description Example
. Any Character — Dot Matches any character in the database character set. If the n flag is set, it matches the newline character. The expression a.b matches the strings abb, acb, and adb, but does not match acc.
+ One or More — Plus Quantifier Matches one or more occurrences of the preceding subexpression. The expression a+ matches the strings a, aa, and aaa, but does not match bbb.
? Zero or One — Question Mark Quantifier Matches zero or one occurrence of the preceding subexpression. The expression ab?c matches the strings abcand ac, but does not match abbc.
* Zero or More — Star Quantifier Matches zero or more occurrences of the preceding subexpression. By default, a quantifier match is greedy because it matches as many times as possible while still allowing the rest of the match to succeed. The expression ab*c matches the strings ac, abc, and abbc, but does not match abb.
{m} Interval—Exact Count Matches exactly m occurrences of the preceding subexpression. The expression a{3} matches the strings aaa, but does not match aa.
{m,} Interval—At Least Count Matches at least m occurrences of the preceding subexpression. The expression a{3,} matches the strings aaaand aaaa, but does not match aa.
{m,n} Interval—Between Count Matches at least m, but not more than noccurrences of the preceding subexpression. The expression a{3,5}matches the strings aaa, aaaa, and aaaaa, but does not match aa.
[ … ] Matching Character List Matches any single character in the list within the brackets. The expression [abc] matches the first character in the strings all, bill, and cold, but does not match any characters in doll.
[^ … ] Non-Matching Character List Matches any single character not in the list within the brackets. Characters not in the non-matching character list are returned as a match. Refer to the description of the Matching Character List operator for an account of metacharacters allowed in the character list. The expression [^abc] matches the character d in the string abcdef, but not the character a, b, or c. The expression [^abc]+matches the sequence defin the string abcdef, but not a, b, or c.

The expression [^a-i] excludes any character between a and i from the search result. This expression matches the character j in the string hij, but does not match any characters in the string abcdefghi.

| Or Matches one of the alternatives. The expression a|b matches character a or character b.
( … ) Subexpression or Grouping Treats the expression within parentheses as a unit. The subexpression can be a string of literals or a complex expression containing operators. The expression (abc)?def matches the optional string abc, followed by def. Thus, the expression matches abcdefghi and def, but does not match ghi.
\n Backreference Matches the nth preceding subexpression, that is, whatever is grouped within parentheses, where n is an integer from 1 to 9. The parentheses cause an expression to be remembered; a backreference refers to it. A backreference counts subexpressions from left to right, starting with the opening parenthesis of each preceding subexpression. The expression is invalid if the source string contains fewer than n subexpressions preceding the \n.

Oracle supports the backreference expression in the regular expression pattern and the replacement string of the REGEXP_REPLACEfunction.

The expression (abc|def)xy\1 matches the strings abcxyabc and defxydef, but does not match abcxydef or abcxy.

A backreference enables you to search for a repeated string without knowing the actual string ahead of time. For example, the expression ^(.*)\1$matches a line consisting of two adjacent instances of the same string.

\ Escape Character Treats the subsequent metacharacter in the expression as a literal. Use a backslash (\) to search for a character that is normally treated as a metacharacter. Use consecutive backslashes (\\) to match the backslash literal itself. The expression \+searches for the plus character (+). It matches the plus character in the string abc+def, but does not match abcdef.
^ Beginning of Line Anchor Matches the beginning of a string (default). In multiline mode, it matches the beginning of any line within the source string. The expression ^def matches def in the string defghibut does not match def in abcdef.
$ End of Line Anchor Matches the end of a string (default). In multiline mode, it matches the beginning of any line within the source string. The expression def$ matches def in the string abcdefbut does not match def in the string defghi.
[:class:] POSIX Character Class Matches any character belonging to the specified POSIX character class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported. The expression [[:upper:]]+searches for one or more consecutive uppercase characters. This expression matches DEF in the string abcDEFghi but does not match the string abcdefghi.
[.element.] POSIX Collating Element Operator Specifies a collating element to use in the regular expression. The element must be a defined collating element in the current locale. Use any collating element defined in the locale, including single-character and multicharacter elements. The NLS_SORT initialization parameter determines supported collation elements.This operator lets you use a multicharacter collating element in cases where only one character would otherwise be allowed. For example, you can ensure that the collating element ch, when defined in a locale such as Traditional Spanish, is treated as one character in operations that depend on the ordering of characters. The expression [[.ch.]] searches for the collating element chand matches ch in string chabc, but does not match cdefg. The expression [a-[.ch.]]specifies the range a to ch.
[=character=] POSIX Character Equivalence Class Matches all characters that are members of the same character equivalence class in the current locale as the specified character.

The character equivalence class must occur within a character list, so the character equivalence class is always nested within the brackets for the character list in the regular expression.

Usage of character equivalents depends on how canonical rules are defined for your database locale. Refer to the Oracle Database Globalization Support Guide for more information on linguistic sorting and string searching.

The expression [[=n=]] searches for characters equivalent to nin a Spanish locale. It matches both N and ñ in the string El Niño.

Related Articles

How to use google translate URL in Oracle plsql

How to work with date in Oracle sql

Oracle PLSQL Tables

approx_count_distinct : Oracle Database 12c

Oracle sql decode processing

Oracle Case Statement Explained with Tips and Examples


Leave a Reply