[Solved-2 Solutions] Regex matching in Pig?
REGEX_EXTRACT
- It performs regular expression matching and extracts the matched group specified by an index parameter.
Syntax
REGEX_EXTRACT (string, regex, index)
Terms
string - The string in which to perform the match.
regex - The regular expression.
index - The index of the matched group to return.
Usage
- To extract every matched group at once rather than a single group, use the REGEX_EXTRACT_ALL function. The function uses Java regular expression form.
- REGEX_EXTRACT_ALL returns a tuple where each field represents a matched expression. If there is no match, an empty tuple is returned.
Example
This example will return the tuple (192.168.1.5,8020).
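The grouping behaviour described above can be approximated in plain Java, since Pig relies on Java regular expressions. This is only a sketch: the input string '192.168.1.5:8020' and the pattern (.*):(.*) are assumptions inferred from the returned tuple, the class and method names are hypothetical, and the index is assumed to be 1-based, as Java group numbers are.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; it imitates the described behaviour, it is not Pig's implementation.
class RegexExtractSketch {
    // Like REGEX_EXTRACT: return one capture group by (assumed 1-based) index, or null if absent.
    public static String extract(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || index < 1 || index > m.groupCount()) {
            return null;
        }
        return m.group(index);
    }

    // Like REGEX_EXTRACT_ALL: return every capture group, or an empty list when there is no match.
    public static List<String> extractAll(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "192.168.1.5:8020";   // assumed input
        String regex = "(.*):(.*)";          // assumed pattern
        System.out.println(extract(input, regex, 1));  // prints 192.168.1.5
        System.out.println(extractAll(input, regex));  // prints [192.168.1.5, 8020]
    }
}
```

The single-group call corresponds to REGEX_EXTRACT, and the list of groups corresponds to the tuple returned by REGEX_EXTRACT_ALL.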
Problem:
- Using Apache Pig, we want to extract one sentence from a longer line of text.
- For example, the expression should match "my brother just didnt do anything wrong."
- More generally, it should match anything beginning with "my brother just" and ending with either punctuation (end of sentence) or EOL.
- The Pig docs link to java.util.regex.Pattern, so Java regular expression syntax applies.
Solution 1:
- In this case we want to match only up to the first punctuation mark.
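- By default, quantifiers such as * are greedy and match as much as possible, so .* runs on to the last punctuation mark in the line. To stop at the first one, we can use the lazy (reluctant) form of the quantifier, written by appending ? to it, as in .*?
- Note that the use of ? here is different from its use as a quantifier, where it means 'match zero or one'.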
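The difference between greedy and lazy matching can be checked directly in Java. This is a minimal sketch: the sample line, both patterns, and the class and method names are illustrative assumptions, not the expressions from the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing a greedy and a lazy quantifier.
class LazyQuantifierSketch {
    // Return the first capture group of the first match, or null when nothing matches.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample text with the target sentence in the middle of the line.
        String line = "ha. my brother just didnt do anything wrong. He left. ok";

        // Greedy: .* extends to the LAST period in the line.
        String greedy = "(my brother just .*\\.)";
        // Lazy: .*? stops at the FIRST sentence punctuation mark, or at end of line.
        String lazy = "(my brother just .*?(?:[.!?]|$))";

        System.out.println(firstGroup(greedy, line)); // my brother just didnt do anything wrong. He left.
        System.out.println(firstGroup(lazy, line));   // my brother just didnt do anything wrong.
        System.out.println(firstGroup(lazy, "my brother just left")); // my brother just left
    }
}
```

The alternation (?:[.!?]|$) covers the "punctuation or EOL" requirement: when no punctuation follows, the lazy quantifier keeps expanding until $ matches at the end of the input.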
Solution 2:
Try an expression that also allows for the text before the target phrase.
- The original expression apparently required the "my brother" part to be at the beginning of the string, but in the example it is in the middle of the string, so we have to account for everything before "my brother".
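The effect of leading text can be reproduced in Java under the assumption that the whole string must match the expression, which is how Matcher.matches() behaves. The sample line, both patterns, and the class and method names below are hypothetical and are not the expression from the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: whole-string matching with and without a leading .*
class LeadingTextSketch {
    // Return group 1 only if the ENTIRE input matches the regex, else null.
    public static String wholeMatchGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed sample text with the target sentence in the middle of the line.
        String line = "ha. my brother just didnt do anything wrong. He left. ok";

        // Anchored at the start of the string: fails because the line begins with "ha.".
        String anchored = "(my brother just [^.!?]*[.!?]?).*";
        // Leading .* absorbs everything before "my brother"; trailing .* absorbs the rest.
        String padded = ".*(my brother just [^.!?]*[.!?]?).*";

        System.out.println(wholeMatchGroup(anchored, line)); // null
        System.out.println(wholeMatchGroup(padded, line));   // my brother just didnt do anything wrong.
    }
}
```

Here a negated character class [^.!?]* is used in place of a lazy quantifier to stop at the first punctuation mark, and the optional [.!?]? lets the match end at EOL when no punctuation is present.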