# otp.Operation.str.extract

### extract(pat, rewrite='\\\\0', caseless=False)

Match the string against the regular expression `pat` and return the first match.
The optional `rewrite` parameter is a template string that controls how the matched substrings
are arranged and embedded in the result.

* **Parameters:**
  * **pat** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*Column*](../root.md#onetick.py.Column) *or* [*Operation*](../root.md#onetick.py.Operation)) -- Pattern to search for specified via the POSIX extended regular expression syntax.
  * **rewrite** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*Column*](../root.md#onetick.py.Column) *or* [*Operation*](../root.md#onetick.py.Operation)) -- A string that specifies how to arrange the matched text. `\0` refers to the entire matched text.
    `\1` to `\9` refer to the text matched by the corresponding parenthesized group in `pat`.
    `\u` and `\l` modifiers within the `rewrite` string convert the case of the text that
    matches the corresponding parenthesized group (e.g., `\u1` converts `\1` to uppercase).
  * **caseless** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) -- If the `caseless` flag is set to `True`, matching is case-insensitive.
* **Returns:**
  The first substring matched by `pat`, formatted as specified in `rewrite`.
* **Return type:**
  [Operation](../root.md#onetick.py.Operation)

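The parameter semantics described above can be approximated in plain Python. The sketch below is only an illustration, not how onetick.py implements `extract`: the name `extract_sketch` is hypothetical, it uses Python's `re` module rather than a POSIX extended regular expression engine, and returning an empty string when nothing matches is an assumption of the sketch.

```python
import re


def extract_sketch(text, pat, rewrite=r'\0', caseless=False):
    """Rough plain-Python approximation of the documented semantics."""
    flags = re.IGNORECASE if caseless else 0
    match = re.search(pat, text, flags)  # only the first match is used
    if match is None:
        return ''  # assumption of this sketch, not documented behavior

    def expand(token):
        # token is one of: \0..\9, \u0..\u9, \l0..\l9
        modifier, index = token.group(1), int(token.group(2))
        # Groups that do not exist or did not participate expand to ''.
        piece = match.group(index) if index <= match.re.groups else None
        piece = piece or ''
        if modifier == 'u':
            return piece.upper()
        if modifier == 'l':
            return piece.lower()
        return piece

    # Replace every group reference in the rewrite template.
    return re.sub(r'\\([ul]?)(\d)', expand, rewrite)


whole = extract_sketch('Mr. Smith: +1348 +4781', r'\+\d{4}')
year = extract_sketch('Mr. Smith: 1992/12/22',
                      r'(\d{4})/(\d{2})/(\d{2})', rewrite=r'birth year: \1')
name = extract_sketch('mr. BroWn', r'(m)([rs]\. )([a-z])([a-z]*)',
                      r'\u1\l2\u3\l4', caseless=True)
```

Here `whole` holds the entire first match, `year` embeds group 1 in the template, and `name` applies the `\u` and `\l` case modifiers per group.
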
### Examples

```pycon
>>> data = otp.Ticks(X=['Mr. Smith: +1348 +4781', 'Ms. Smith: +8971'])
>>> data['TEL'] = data['X'].str.extract(r'\+\d{4}')
>>> otp.run(data)
                     Time                       X    TEL
0 2003-12-01 00:00:00.000  Mr. Smith: +1348 +4781  +1348
1 2003-12-01 00:00:00.001        Ms. Smith: +8971  +8971
```

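Only the first match is returned, which is why `+4781` does not appear in the first row. As a rough analogy in plain Python (standard `re` module, not onetick.py), this is the difference between `re.search` and `re.findall`:

```python
import re

text = 'Mr. Smith: +1348 +4781'
pattern = r'\+\d{4}'

# First match only, analogous to the TEL value in the example above.
first = re.search(pattern, text).group(0)

# All matches, shown only for contrast.
every = re.findall(pattern, text)
```
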
You can specify the group to extract in the `rewrite` parameter:

```pycon
>>> data = otp.Ticks(X=['Mr. Smith: 1992/12/22', 'Ms. Smith: 1989/10/15'])
>>> data['BIRTH_YEAR'] = data['X'].str.extract(r'(\d{4})/(\d{2})/(\d{2})', rewrite=r'birth year: \1')
>>> otp.run(data)
                     Time                      X        BIRTH_YEAR
0 2003-12-01 00:00:00.000  Mr. Smith: 1992/12/22  birth year: 1992
1 2003-12-01 00:00:00.001  Ms. Smith: 1989/10/15  birth year: 1989
```

You can pass a column as the `pat` or `rewrite` parameter:

```pycon
>>> data = otp.Ticks(X=['Kelly, Mr. James', 'Wilkes, Mrs. James', 'Connolly, Miss. Kate'],
...                  PAT=['(Mrs?)\\.', '(Mrs?)\\.', '(Miss)\\.'],
...                  REWRITE=['Title 1: \\1', 'Title 2: \\1', 'Title 3: \\1'])
>>> data['TITLE'] = data['X'].str.extract(data['PAT'], rewrite=data['REWRITE'])
>>> otp.run(data)
                     Time                     X       PAT      REWRITE          TITLE
0 2003-12-01 00:00:00.000      Kelly, Mr. James  (Mrs?)\.  Title 1: \1  Title 1:   Mr
1 2003-12-01 00:00:00.001    Wilkes, Mrs. James  (Mrs?)\.  Title 2: \1  Title 2:  Mrs
2 2003-12-01 00:00:00.002  Connolly, Miss. Kate  (Miss)\.  Title 3: \1  Title 3: Miss
```

The case of the extracted text can be changed by adding the `\l` and `\u` modifiers to a group reference in `rewrite`:

```pycon
>>> data = otp.Ticks(NAME=['mr. BroWn', 'Ms. smITh'])
>>> data['RESULT'] = data['NAME'].str.extract(r'(m)([rs]\. )([a-z])([a-z]*)', r'\u1\l2\u3\l4', caseless=True)
>>> otp.run(data)
                     Time       NAME     RESULT
0 2003-12-01 00:00:00.000  mr. BroWn  Mr. Brown
1 2003-12-01 00:00:00.001  Ms. smITh  Ms. Smith
```

#### SEE ALSO
[`regex_replace`](regex_replace.md#onetick.py.core.column_operations.accessors.str_accessor.regex_replace)
