PCRE_SPLIT table function¶
Searches for all occurrences of regular expression PATTERN in TEXT, returning a table of all matches and the text between each match.
Prototypes¶
PCRE_SPLIT(PATTERN VARCHAR(1000), TEXT VARCHAR(4000))
RETURNS TABLE(
ELEMENT INTEGER,
SEPARATOR INTEGER,
POSITION INTEGER,
CONTENT VARCHAR(4000)
)
Description¶
PCRE string splitting function. Given a regular expression in PATTERN, and some text in TEXT, the function searches for every occurence of PATTERN in TEXT and breaks TEXT into chunks based on those matches. Each chunk is returned as a row in the result table which details whether or not the chunk was a result of a match, or text between the match.
Parameters¶
- PATTERN
- The Perl-compatible Regular Expression (PCRE) to search for.
- TEXT
- The text to search within.
Returns¶
- ELEMENT
- The 1-based index of the chunk. Note that there are usually two rows for each index, one where SEPARATOR is zero and another where SEPARATOR is one. Therefore, one could consider the key of the result table to be (ELEMENT, SEPARATOR)
- SEPARATOR
- Contains 1 if the row represents a match for PATTERN, and 0 if the row represents text between matches.
- POSITION
- The 1-based position of CONTENT within the original TEXT parameter.
- CONTENT
- The extract from TEXT.
Examples¶
An example demonstrating a simple split. Note that a row is still returned for the “missing” value, albeit with an empty CONTENT value:
SELECT
T.ELEMENT,
T.SEPARATOR,
T.POSITION,
T.CONTENT
FROM
TABLE(
PCRE_SPLIT(':', 'A:B:C::E')
) AS T
ELEMENT SEPARATOR POSITION CONTENT
------- --------- -------- -------------------
1 0 1 A
1 1 2 :
2 0 3 B
2 1 4 :
3 0 5 C
3 1 6 :
4 0 7
4 1 7 :
5 0 8 E
An example demonstrating a very rudimentary CSV parser. Note that to keep things simple, we actually treat the separator pattern as the data here, filter out the interleaved commas and remove the quotes surrounding delimited values:
SELECT
T.ELEMENT,
CASE WHEN LEFT(T.CONTENT, 1) = '"'
THEN SUBSTR(T.CONTENT, 2, LENGTH(T.CONTENT) - 2)
ELSE T.CONTENT
END AS CONTENT
FROM
TABLE(
PCRE_SPLIT('([^",][^,]*|"[^"]*")', '"Some",CSV,",data"')
) AS T
WHERE
T.SEPARATOR = 1
ELEMENT CONTENT
------- -------------------
1 Some
2 CSV
3 ,data