otp.Operation.str.extract
- extract(pat, rewrite='\\0', caseless=False)
 Match the string against a regular expression specified by pat and return the first match. The rewrite parameter can optionally be used to arrange the matched substrings and embed them within the string specified in rewrite.
- Parameters
 pat (str or Column or Operation) – Pattern to search for, specified via the POSIX extended regular expression syntax.
 rewrite (str or Column or Operation) – A string that specifies how to arrange the matched text. \\0 refers to the entire matched text. \\1 to \\9 refer to the text matched by the corresponding parenthesized group in pat. The \\u and \\l modifiers within the rewrite string convert the case of the text that matches the corresponding parenthesized group (e.g., \\u1 converts \\1 to uppercase).
 caseless (bool) – If the caseless flag is set to True, matching is case-insensitive.
- Returns
 String matched by pat with format specified in rewrite.
- Return type
 
Examples
>>> data = otp.Ticks(X=["Mr. Smith: +1348 +4781", "Ms. Smith: +8971"])
>>> data["TEL"] = data["X"].str.extract(r"\+\d{4}")
>>> otp.run(data)["TEL"]
0    +1348
1    +8971
Name: TEL, dtype: object
You can specify the group to extract in the rewrite parameter:
>>> data = otp.Ticks(X=["Mr. Smith: 1992/12/22", "Ms. Smith: 1989/10/15"])
>>> data["BIRTH_YEAR"] = data["X"].str.extract(r"(\d{4})/(\d{2})/(\d{2})", rewrite="birth year: \\1")
>>> otp.run(data)["BIRTH_YEAR"]
0    birth year: 1992
1    birth year: 1989
Name: BIRTH_YEAR, dtype: object
You can use a column as the rewrite format, the pattern, or both:
>>> data = otp.Ticks(X=["Kelly, Mr. James", "Wilkes, Mrs. James", "Connolly, Miss. Kate"],
...                  PAT=[r"(Mrs?)\.", r"(Mrs?)\.", r"(Miss)\."],
...                  REWRITE=["Title 1: \\1", "Title 2: \\1", "Title 3: \\1"])
>>> data["TITLE"] = data["X"].str.extract(data["PAT"], rewrite=data["REWRITE"])
>>> otp.run(data)["TITLE"]
0      Title 1: Mr
1     Title 2: Mrs
2    Title 3: Miss
Name: TITLE, dtype: object
The case of the extracted text can be changed by adding the l (lowercase) or u (uppercase) modifier to a group reference in rewrite:
>>> data = otp.Ticks(NAME=["mr. BroWn", "Ms. smITh"])
>>> data["NAME"] = data["NAME"].str.extract(r"(m)([rs]\. )([a-z])([a-z]*)", r"\u1\l2\u3\l4", caseless=True)
>>> otp.run(data)["NAME"]
0    Mr. Brown
1    Ms. Smith
Name: NAME, dtype: object
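To make the rewrite rules described above easier to reason about outside a OneTick session, here is a minimal pure-Python sketch that approximates them on a single string. It is not the library's implementation. The function name extract_sketch is hypothetical, and it uses Python's re module rather than a POSIX extended regular expression engine, so results may differ for patterns where the two syntaxes disagree. Returning an empty string when nothing matches is also an assumption, since the text above does not say what happens in that case.

```python
import re

# A rewrite token: backslash, optional case modifier (u or l), one digit.
_TOKEN = re.compile(r"\\([ul]?)(\d)")


def extract_sketch(text, pat, rewrite=r"\0", caseless=False):
    """Approximate the documented extract() semantics on a plain string.

    Hypothetical helper for illustration only; it relies on Python's `re`
    engine, not on POSIX extended regular expressions.
    """
    flags = re.IGNORECASE if caseless else 0
    match = re.search(pat, text, flags)  # only the first match is used
    if match is None:
        return ""  # assumption: the no-match result is not documented above

    def substitute(token):
        modifier, index = token.group(1), int(token.group(2))
        # Group 0 is the whole match; missing groups become empty text.
        if index > match.re.groups:
            return ""
        piece = match.group(index) or ""
        if modifier == "u":
            return piece.upper()
        if modifier == "l":
            return piece.lower()
        return piece

    return _TOKEN.sub(substitute, rewrite)


if __name__ == "__main__":
    print(extract_sketch("Mr. Smith: +1348 +4781", r"\+\d{4}"))
    print(extract_sketch("Mr. Smith: 1992/12/22",
                         r"(\d{4})/(\d{2})/(\d{2})", "birth year: \\1"))
    print(extract_sketch("mr. BroWn", r"(m)([rs]\. )([a-z])([a-z]*)",
                         r"\u1\l2\u3\l4", caseless=True))
```

Run on the inputs from the examples above, the sketch prints +1348, birth year: 1992, and Mr. Brown, matching the first row of each documented output.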
See also