
lib_mysqludf_preg


lib_mysqludf_preg is a library of MySQL UDFs (user-defined functions) that provide access to the PCRE (Perl Compatible Regular Expressions) library for pattern matching. The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. This syntax can often handle more complex expressions and capturing than standard regular expression implementations. For more information about PCRE, please see: http://www.pcre.org/.

lib_mysqludf_preg is a useful performance optimization for applications that already perform these regular expression matches in a high-level language (e.g. PHP) on the client side. It is also helpful when there is a need to capture a parenthesized subexpression from a regular expression, or simply as a slight performance boost over the built-in RLIKE/REGEXP functions. In most cases, however, these functions should not be used as a replacement for something that can be indexed, such as LIKE 'foo%' or MATCH (...) AGAINST ('+foo' IN BOOLEAN MODE).

 

Installation

lib_mysqludf_preg is currently only distributed as a source package. The instructions below provide some information about how to configure and compile the library. Please consult the INSTALL file included with the source package for more details.

Prerequisites

These UDFs require that the libpcre headers and library are installed. On Debian/Ubuntu systems, installing libpcre3-dev should be sufficient (apt-get install libpcre3-dev).

Compilation

Most users should be able to simply type: ./configure ; make install

If MySQL is installed in an unusual place, you might need to add --with-mysql=<mysql directory>/bin/mysql_config.

Similarly, if libpcre is in an unusual place, --with-pcre can be added.

Example (for Mac OS X using Fink): ./configure --with-pcre=/sw --with-mysql=/sw/bin/mysql_config

Installing the functions

Provided the library has been installed into a directory that the MySQL server already has in its LD_LIBRARY_PATH, installing the functions should be as easy as typing: make installdb. Any problems encountered are likely related to the server's environment and the installation directory. If no problems are encountered, type make test to perform some basic tests.
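
As a quick check after make installdb, the registered UDFs can be listed from the server's mysql.func table. This is a minimal sketch, assuming the shared object is named lib_mysqludf_preg.so (as in the CREATE FUNCTION statements in this document) and that the connected account is allowed to read the mysql schema:

```sql
-- List the UDFs that the server has registered from this library.
-- Assumption: the library file name matches the SONAME used in this document.
SELECT name, ret, dl, type
  FROM mysql.func
 WHERE dl = 'lib_mysqludf_preg.so';
```

Each function that was created successfully should appear as one row. A missing row suggests that the corresponding CREATE FUNCTION statement did not succeed.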

 

Functions

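lib_mysqludf_preg currently offers three functions that interface with the PCRE library (preg_rlike, preg_capture and preg_replace), plus one function that reports version information (lib_mysqludf_preg_info).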

preg_rlike

Test if a string matches a perl-compatible regular expression

Function Installation

CREATE FUNCTION preg_rlike RETURNS INTEGER SONAME 'lib_mysqludf_preg.so';
Synopsis
PREG_RLIKE( pattern , subject )
Parameters:
pattern  - is a string that is a Perl-compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php. The expression passed to this function should have delimiters and can contain the standard Perl modifiers after the ending delimiter.
subject  - is the data to perform the test on.
Returns:
1 - a match was found

0 - no match

Examples:

SELECT PREG_RLIKE('/The quick brown fox/i' , 'the quick brown fox' );

Yields:

   +---------------------------------------------------------------+
   | PREG_RLIKE('/The quick brown fox/i' , 'the quick brown fox' ) |
   +---------------------------------------------------------------+
   |                                                             1 |
   +---------------------------------------------------------------+

SELECT * FROM products WHERE PREG_RLIKE( '/organic/i' , products.title );

Yields: all of the products with 'organic' in their titles
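
Because PREG_RLIKE cannot use an index, it can help to pair it with an indexable predicate, in line with the advice in the introduction. The following is only a sketch: it reuses the products table from the example above and assumes, hypothetically, that products.title is indexed. Whether the index is actually used depends on the optimizer and the column's collation.

```sql
-- Assumption: products.title has an index, so the LIKE prefix test can
-- narrow the candidate rows before the regular expression is evaluated.
SELECT *
  FROM products
 WHERE products.title LIKE 'organic%'
   AND PREG_RLIKE( '/^organic\\s+\\w+/i' , products.title );
```

The LIKE test selects titles that start with 'organic', and PREG_RLIKE then keeps only those where whitespace and at least one word character follow.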


preg_capture

Capture a parenthesized subexpression from a PCRE pattern

Function Installation
CREATE FUNCTION preg_capture RETURNS STRING SONAME 'lib_mysqludf_preg.so';
Synopsis
PREG_CAPTURE( pattern , subject, group )
Parameters:
pattern  - is a string that is a Perl-compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php. The expression passed to this function should have delimiters and can contain the standard Perl modifiers after the ending delimiter.
subject  - is the data to perform the match & capture on.
group  - is the capture group that should be returned. This can be a numeric capture group or a named capture group. Numeric groups should be passed in as INT while named groups should be strings.
Returns:
- string that was captured - if there was a match and the desired capture group is valid

- string that is the entire portion of subject which matches the pattern - if 0 is passed in as the group and pattern matches subject

- NULL - if pattern does not match the subject or group is not a valid capture group for the given pattern and subject.

Examples:

SELECT PREG_CAPTURE('/(.*?)(fox)/' , 'the quick brown fox' ,2 );

Yields:

+----------------------------------------------------------+
| PREG_CAPTURE('/(.*?)(fox)/' , 'the quick brown fox' ,2 ) |
+----------------------------------------------------------+
| fox                                                      | 
+----------------------------------------------------------+

SELECT PREG_CAPTURE('/(?:^|\\s)(organic)(.*?[^\\s]+)\\s/i' ,products.title, 2 ) AS w FROM products HAVING w IS NOT NULL;

Yields: the word following organic in all of the product names

SELECT PREG_CAPTURE('/(?:^|\\s)(organic)(?P<follow>.*?[^\\s]+)\\s/i' ,products.title, 'follow' ) AS w FROM products HAVING w IS NOT NULL;

Yields: the same as above, but using a named capture group

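Note:
Remember to double each backslash in patterns that use \ notation (for example, write \\s to pass \s to PCRE), because MySQL string literals treat the backslash as an escape character.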
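
The three cases in the Returns list can be tried side by side. This is a small sketch using a made-up pattern and subject; the column aliases are illustrative only, and the comments restate what the Returns list describes rather than captured output:

```sql
-- Group 0: the entire portion of subject that matches the pattern.
SELECT PREG_CAPTURE('/quick (\\w+)/' , 'the quick brown fox' , 0 ) AS whole_match;

-- Group 1: only the parenthesized word that follows "quick".
SELECT PREG_CAPTURE('/quick (\\w+)/' , 'the quick brown fox' , 1 ) AS first_group;

-- Group 5 does not exist in this pattern, so NULL is expected.
SELECT PREG_CAPTURE('/quick (\\w+)/' , 'the quick brown fox' , 5 ) AS missing_group;
```

Note that \w is written as \\w inside the SQL string literal, following the note above.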

preg_replace

Perform regular expression search & replace using PCRE.

Function Installation
CREATE FUNCTION preg_replace RETURNS STRING SONAME 'lib_mysqludf_preg.so';
Synopsis
PREG_REPLACE( pattern , replacement , subject [ , limit ] )
Parameters:
pattern  - is a string that is a Perl-compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php. The expression passed to this function should have delimiters and can contain the standard Perl modifiers after the ending delimiter.
replacement  - is the string to use as the replacement. This string may contain capture group references such as \1. You can also use $1 for these, in a similar fashion to PHP.
subject  - is the data to perform the match & replace on.
limit  - optional number that is the maximum replacements to perform. Use -1 (or leave empty) for no limit.
Returns:
- string - 'subject' with the instances of pattern replaced

- string - the same as passed in if there were no matches

preg_replace is a UDF that performs a regular expression search and replace on a given piece of data, using a PCRE as the search pattern. If limit is not specified or is -1, preg_replace works on all of the occurrences of the pattern in the subject data. Otherwise, preg_replace will only replace the first <limit> occurrences.

 

Examples:
SELECT PREG_REPLACE('/(.*?)(fox)/' , '$1dog' , 'the quick brown fox' );

Yields:

+-----------------------------------------------------------------+
| PREG_REPLACE('/(.*?)(fox)/' , '$1dog' , 'the quick brown fox' ) |
+-----------------------------------------------------------------+
| the quick brown dog                                             | 
+-----------------------------------------------------------------+

SELECT PREG_REPLACE('/\\s\\s+/', ' ' , products.title ) FROM products;

Yields: the product names with each run of extra whitespace collapsed to a single space

 

Note:
Remember to double each backslash in patterns that use \ notation. Also, using $ notation makes things a little clearer when using backreferences in the replacement.
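
The optional limit parameter can be illustrated with a short sketch. The pattern, replacement and subject here are made up for illustration, and the comments describe the behaviour documented above rather than captured output:

```sql
-- No limit given: every occurrence of "o" in the subject is replaced.
SELECT PREG_REPLACE('/o/' , '0' , 'foo boo' ) AS replace_all;

-- limit = 1: only the first occurrence of "o" is replaced.
SELECT PREG_REPLACE('/o/' , '0' , 'foo boo' , 1 ) AS replace_first;

-- limit = -1 is documented as equivalent to giving no limit.
SELECT PREG_REPLACE('/o/' , '0' , 'foo boo' , -1 ) AS replace_all_explicit;
```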

 

lib_mysqludf_preg_info

Obtain version information for lib_mysqludf_preg package

Function Installation
CREATE FUNCTION lib_mysqludf_preg_info RETURNS STRING SONAME 'lib_mysqludf_preg.so' ;
Synopsis
LIB_MYSQLUDF_PREG_INFO()
Returns:
string - version information for the lib_mysqludf_preg package
Examples:
SELECT LIB_MYSQLUDF_PREG_INFO();
Yields:
+--------------------------+
| LIB_MYSQLUDF_PREG_INFO() |
+--------------------------+
| lib_mysqludf_preg 0.6.2  | 
+--------------------------+