# Filtering

Let's start with an unfiltered time series:

```ipython3
import onetick.py as otp

s = otp.dt(2024, 2, 1, 9, 30)
e = otp.dt(2024, 2, 1, 9, 30, 1)

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time    PRICE  SIZE  COND EXCHANGE
0   2024-02-01 09:30:00.000961260  184.010   302  @FT         P
1   2024-02-01 09:30:00.000961491  184.000   100  @FT         P
2   2024-02-01 09:30:00.000961701  184.000     1  @FTI        P
3   2024-02-01 09:30:00.000973163  184.000     1  @FTI        P
4   2024-02-01 09:30:00.000973355  184.000     5  @FTI        P
..                            ...      ...   ...   ...      ...
574 2024-02-01 09:30:00.987184691  183.900     9  @F I        K
575 2024-02-01 09:30:00.990378350  183.920     1  @  I        D
576 2024-02-01 09:30:00.991941892  183.935     1  @  I        D
577 2024-02-01 09:30:00.993785116  183.905   300  @           D
578 2024-02-01 09:30:00.996512511  183.934     5  @  I        D

[579 rows x 5 columns]
```

We can filter by the value of a field using method [`where`](../../api/source/where.md#onetick.py.Source.where):

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where(q['EXCHANGE'] == 'K')
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                            Time   PRICE  SIZE  COND EXCHANGE
0  2024-02-01 09:30:00.080154806  183.95     1  @  I        K
1  2024-02-01 09:30:00.080155543  183.95     5  @  I        K
2  2024-02-01 09:30:00.080534432  183.90     2  @  I        K
3  2024-02-01 09:30:00.080893236  183.97    15  @F I        K
4  2024-02-01 09:30:00.080984540  183.90    86  @  I        K
..                           ...     ...   ...   ...      ...
66 2024-02-01 09:30:00.954126905  183.92     2  @F I        K
67 2024-02-01 09:30:00.954404272  183.92     5  @F I        K
68 2024-02-01 09:30:00.954697550  183.92     2  @F I        K
69 2024-02-01 09:30:00.985733751  183.90    19  @F I        K
70 2024-02-01 09:30:00.987184691  183.90     9  @F I        K

[71 rows x 5 columns]
```

Note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.

Filtering for a specific trade condition can be done with the string matching methods:

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where(q['COND'].str.contains('I'))
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time    PRICE  SIZE  COND EXCHANGE
0   2024-02-01 09:30:00.000961701  184.000     1  @FTI        P
1   2024-02-01 09:30:00.000973163  184.000     1  @FTI        P
2   2024-02-01 09:30:00.000973355  184.000     5  @FTI        P
3   2024-02-01 09:30:00.000973517  184.000     5  @FTI        P
4   2024-02-01 09:30:00.000973674  184.000     5  @FTI        P
..                            ...      ...   ...   ...      ...
464 2024-02-01 09:30:00.985733751  183.900    19  @F I        K
465 2024-02-01 09:30:00.987184691  183.900     9  @F I        K
466 2024-02-01 09:30:00.990378350  183.920     1  @  I        D
467 2024-02-01 09:30:00.991941892  183.935     1  @  I        D
468 2024-02-01 09:30:00.996512511  183.934     5  @  I        D

[469 rows x 5 columns]
```

Trade filter that limits attention to on-exchange continuous trading trades looks like this (it's used when creating bars):

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.character_present(q['COND'], 'O6TUHILNRWZ47QMBCGPV', discard_on_match=True)
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                            Time    PRICE  SIZE  COND EXCHANGE
0  2024-02-01 09:30:00.080096628  183.985   154  @F          P
1  2024-02-01 09:30:00.080109023  183.900   150  @F          P
2  2024-02-01 09:30:00.080109175  183.900   477  @F          P
3  2024-02-01 09:30:00.080904251  183.900   523  @F          P
4  2024-02-01 09:30:00.081016071  183.900   100  @           P
..                           ...      ...   ...   ...      ...
80 2024-02-01 09:30:00.795482321  183.945   100  @           D
81 2024-02-01 09:30:00.796118938  183.920   100  @F          U
82 2024-02-01 09:30:00.833291631  183.910   100  @F          Q
83 2024-02-01 09:30:00.945031779  183.920   100  @           K
84 2024-02-01 09:30:00.993785116  183.905   300  @           D

[85 rows x 5 columns]
```

Filters can include 'and' and/or 'or' clauses:

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where((q['EXCHANGE'] == 'K') & (q['PRICE'] > 183.950))
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                            Time   PRICE  SIZE  COND EXCHANGE
0  2024-02-01 09:30:00.080893236  183.97    15  @F I        K
1  2024-02-01 09:30:00.153161386  183.98    50  @F I        K
2  2024-02-01 09:30:00.153953889  183.98    29  @F I        K
3  2024-02-01 09:30:00.154413668  183.98    21  @F I        K
4  2024-02-01 09:30:00.155845868  183.98    50  @F I        K
5  2024-02-01 09:30:00.161347824  183.98    29  @F I        K
6  2024-02-01 09:30:00.161368459  183.98    21  @F I        K
7  2024-02-01 09:30:00.260713608  183.98    50  @F I        K
8  2024-02-01 09:30:00.260714836  183.99    25  @F I        K
9  2024-02-01 09:30:00.270803260  184.01    16  @F I        K
10 2024-02-01 09:30:00.277375099  184.00     2  @  I        K
11 2024-02-01 09:30:00.297914416  183.98    48  @F I        K
12 2024-02-01 09:30:00.362168043  184.01   100  @           K
13 2024-02-01 09:30:00.614422140  183.96    25  @F I        K
14 2024-02-01 09:30:00.621498637  183.96    52  @  I        K
15 2024-02-01 09:30:00.621587688  183.96   382  @           K
16 2024-02-01 09:30:00.623837515  183.96    25  @  I        K
17 2024-02-01 09:30:00.623851984  183.96     1  @  I        K
```

We can also return two branches after filtering with method [`__getitem__`](../../api/source/__getitem__.md#onetick.py.Source.__getitem__) (or [`where_clause`](../../api/source/where_clause.md#onetick.py.Source.where_clause)).
The first branch contains all ticks that satisfy the condition and the other branch contains all ticks that do not satisfy the condition:

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
exchange_k, other = q[q['EXCHANGE'] == 'K']
otp.run(exchange_k, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                            Time   PRICE  SIZE  COND EXCHANGE
0  2024-02-01 09:30:00.080154806  183.95     1  @  I        K
1  2024-02-01 09:30:00.080155543  183.95     5  @  I        K
2  2024-02-01 09:30:00.080534432  183.90     2  @  I        K
3  2024-02-01 09:30:00.080893236  183.97    15  @F I        K
4  2024-02-01 09:30:00.080984540  183.90    86  @  I        K
..                           ...     ...   ...   ...      ...
66 2024-02-01 09:30:00.954126905  183.92     2  @F I        K
67 2024-02-01 09:30:00.954404272  183.92     5  @F I        K
68 2024-02-01 09:30:00.954697550  183.92     2  @F I        K
69 2024-02-01 09:30:00.985733751  183.90    19  @F I        K
70 2024-02-01 09:30:00.987184691  183.90     9  @F I        K

[71 rows x 5 columns]
```

```ipython3
otp.run(other, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time    PRICE  SIZE  COND EXCHANGE
0   2024-02-01 09:30:00.000961260  184.010   302  @FT         P
1   2024-02-01 09:30:00.000961491  184.000   100  @FT         P
2   2024-02-01 09:30:00.000961701  184.000     1  @FTI        P
3   2024-02-01 09:30:00.000973163  184.000     1  @FTI        P
4   2024-02-01 09:30:00.000973355  184.000     5  @FTI        P
..                            ...      ...   ...   ...      ...
503 2024-02-01 09:30:00.972086329  183.935     1  @  I        D
504 2024-02-01 09:30:00.990378350  183.920     1  @  I        D
505 2024-02-01 09:30:00.991941892  183.935     1  @  I        D
506 2024-02-01 09:30:00.993785116  183.905   300  @           D
507 2024-02-01 09:30:00.996512511  183.934     5  @  I        D

[508 rows x 5 columns]
```

We can retrieve ticks by index with the help of
[`first`](../../api/source/first.md#onetick.py.Source.first) and [`last`](../../api/source/last.md#onetick.py.Source.last).

The example below returns the second tick:

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get the last of the first two ticks, thus getting the second tick
q = q.first(2)
q = q.last()
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                           Time  PRICE  SIZE  COND EXCHANGE
0 2024-02-01 09:30:00.000961491  184.0   100  @FT         P
```

Python-like slice syntax is also supported:

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get last three ticks
q = q[-3:]
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                           Time    PRICE  SIZE  COND EXCHANGE
0 2024-02-01 09:30:00.991941892  183.935     1  @  I        D
1 2024-02-01 09:30:00.993785116  183.905   300  @           D
2 2024-02-01 09:30:00.996512511  183.934     5  @  I        D
```
