Filtering#

Let’s start with an unfiltered time series.

import onetick.py as otp

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=otp.dt(2023, 3, 29, 9, 30), end=otp.dt(2023, 3, 29, 16), symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-03-29 09:30:00.000877568 399.920 400 T P
1 2023-03-29 09:30:00.001151232 399.920 1000 T T
2 2023-03-29 09:30:00.001154304 399.920 1000 T T
3 2023-03-29 09:30:00.001921280 399.930 657 T T
4 2023-03-29 09:30:00.010831360 399.925 100 F Z
... ... ... ... ... ...
537609 2023-03-29 15:59:59.994555136 401.350 643 F P
537610 2023-03-29 15:59:59.995045376 401.350 900 F P
537611 2023-03-29 15:59:59.997313024 401.340 100 Z
537612 2023-03-29 15:59:59.997354752 401.340 498 N
537613 2023-03-29 15:59:59.997406208 401.340 200 T

537614 rows × 5 columns

We can filter by the value of a field.

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['EXCHANGE'] == 'T']
otp.run(q, start=otp.dt(2023, 3, 29, 9, 30), end=otp.dt(2023, 3, 29, 16), symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-03-29 09:30:00.001151232 399.92 1000 T T
1 2023-03-29 09:30:00.001154304 399.92 1000 T T
2 2023-03-29 09:30:00.001921280 399.93 657 T T
3 2023-03-29 09:30:00.034302720 399.92 100 T T
4 2023-03-29 09:30:00.050233856 399.93 10 TI T
... ... ... ... ... ...
85509 2023-03-29 15:59:59.990925568 401.35 297 F T
85510 2023-03-29 15:59:59.992829184 401.35 500 T
85511 2023-03-29 15:59:59.993334272 401.35 257 F T
85512 2023-03-29 15:59:59.993776384 401.35 200 F T
85513 2023-03-29 15:59:59.997406208 401.34 200 T

85514 rows × 5 columns

Note that the result of a filter expression returns 2 values. The first value is the query returning ticks that satisfy the conditions. The second value is the query returning ticks that do not satisfy the conditions. Most of the time we’d be interested just in the ticks that pass the filter and can use a placeholder _ for the second value.

Also, note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.

Filtering for a specific trade condition is done with the string matching methods.

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.contains('I')]
otp.run(q, start=otp.dt(2023, 3, 29, 9, 30), end=otp.dt(2023, 3, 29, 16), symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-03-29 09:30:00.012671744 399.920 15 F I K
1 2023-03-29 09:30:00.020451328 399.925 1 I Z
2 2023-03-29 09:30:00.024571136 399.925 1 I Z
3 2023-03-29 09:30:00.029166080 399.925 10 I Z
4 2023-03-29 09:30:00.050233856 399.930 10 TI T
... ... ... ... ... ...
231518 2023-03-29 15:59:59.948900352 401.340 1 I D
231519 2023-03-29 15:59:59.966633728 401.345 3 I P
231520 2023-03-29 15:59:59.990119680 401.350 3 I P
231521 2023-03-29 15:59:59.990203648 401.360 3 F I P
231522 2023-03-29 15:59:59.990712064 401.350 3 I P

231523 rows × 5 columns

Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars).

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.match('^[^O6TUHILNRWZ47QMBCGPV]*$')]
otp.run(q, start=otp.dt(2023, 3, 29, 9, 30), end=otp.dt(2023, 3, 29, 16), symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-03-29 09:30:00.010831360 399.925 100 F Z
1 2023-03-29 09:30:00.011159296 399.930 100 F K
2 2023-03-29 09:30:00.030200576 399.920 100 K
3 2023-03-29 09:30:00.035513856 399.925 100 Z
4 2023-03-29 09:30:00.060566528 399.920 200 K
... ... ... ... ... ...
305070 2023-03-29 15:59:59.994555136 401.350 643 F P
305071 2023-03-29 15:59:59.995045376 401.350 900 F P
305072 2023-03-29 15:59:59.997313024 401.340 100 Z
305073 2023-03-29 15:59:59.997354752 401.340 498 N
305074 2023-03-29 15:59:59.997406208 401.340 200 T

305075 rows × 5 columns

Filters can include ‘and’ and/or ‘or’ clauses.

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[(q['EXCHANGE'] == 'T') & (q['PRICE'] > 400)]
otp.run(q, start=otp.dt(2023, 3, 29, 9, 30), end=otp.dt(2023, 3, 29, 16), symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-03-29 09:48:06.039307008 400.03 5 F I T
1 2023-03-29 09:48:06.039391232 400.03 95 F I T
2 2023-03-29 09:48:06.039394304 400.03 200 F T
3 2023-03-29 09:48:06.039400192 400.03 100 T
4 2023-03-29 09:48:06.039481344 400.03 130 T
... ... ... ... ... ...
38832 2023-03-29 15:59:59.990925568 401.35 297 F T
38833 2023-03-29 15:59:59.992829184 401.35 500 T
38834 2023-03-29 15:59:59.993334272 401.35 257 F T
38835 2023-03-29 15:59:59.993776384 401.35 200 F T
38836 2023-03-29 15:59:59.997406208 401.34 200 T

38837 rows × 5 columns