Filtering#
Let’s start with an unfiltered time series.
import onetick.py as otp
otp.config['tz'] = 'EST5EDT'
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE','SIZE','COND','EXCHANGE']]
otp.run(q, start=otp.dt(2023,3,29,9,30), end=otp.dt(2023,3,29,16), symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-03-29 09:30:00.000877568 | 399.920 | 400 | T | P |
1 | 2023-03-29 09:30:00.001151232 | 399.920 | 1000 | T | T |
2 | 2023-03-29 09:30:00.001154304 | 399.920 | 1000 | T | T |
3 | 2023-03-29 09:30:00.001921280 | 399.930 | 657 | T | T |
4 | 2023-03-29 09:30:00.010831360 | 399.925 | 100 | F | Z |
... | ... | ... | ... | ... | ... |
537609 | 2023-03-29 15:59:59.994555136 | 401.350 | 643 | F | P |
537610 | 2023-03-29 15:59:59.995045376 | 401.350 | 900 | F | P |
537611 | 2023-03-29 15:59:59.997313024 | 401.340 | 100 | Z | |
537612 | 2023-03-29 15:59:59.997354752 | 401.340 | 498 | N | |
537613 | 2023-03-29 15:59:59.997406208 | 401.340 | 200 | T |
537614 rows × 5 columns
We can filter by the value of a field.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE','SIZE','COND','EXCHANGE']]
q, _ = q[q['EXCHANGE'] == 'T']
otp.run(q, start=otp.dt(2023,3,29,9,30), end=otp.dt(2023,3,29,16), symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-03-29 09:30:00.001151232 | 399.92 | 1000 | T | T |
1 | 2023-03-29 09:30:00.001154304 | 399.92 | 1000 | T | T |
2 | 2023-03-29 09:30:00.001921280 | 399.93 | 657 | T | T |
3 | 2023-03-29 09:30:00.034302720 | 399.92 | 100 | T | T |
4 | 2023-03-29 09:30:00.050233856 | 399.93 | 10 | TI | T |
... | ... | ... | ... | ... | ... |
85509 | 2023-03-29 15:59:59.990925568 | 401.35 | 297 | F | T |
85510 | 2023-03-29 15:59:59.992829184 | 401.35 | 500 | T | |
85511 | 2023-03-29 15:59:59.993334272 | 401.35 | 257 | F | T |
85512 | 2023-03-29 15:59:59.993776384 | 401.35 | 200 | F | T |
85513 | 2023-03-29 15:59:59.997406208 | 401.34 | 200 | T |
85514 rows × 5 columns
Note that the result of a filter expression returns 2 values. The first value is the query returning ticks that satisfy the conditions. The second value is the query returning ticks that do not satisfy the conditions. Most of the time we’d be interested just in the ticks that pass the filter and can use a placeholder _
for the second value.
Also, note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.
Filtering for a specific trade condition is done with the string matching methods.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE','SIZE','COND','EXCHANGE']]
q, _ = q[q['COND'].str.contains('I')]
otp.run(q, start=otp.dt(2023,3,29,9,30), end=otp.dt(2023,3,29,16), symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-03-29 09:30:00.012671744 | 399.920 | 15 | F I | K |
1 | 2023-03-29 09:30:00.020451328 | 399.925 | 1 | I | Z |
2 | 2023-03-29 09:30:00.024571136 | 399.925 | 1 | I | Z |
3 | 2023-03-29 09:30:00.029166080 | 399.925 | 10 | I | Z |
4 | 2023-03-29 09:30:00.050233856 | 399.930 | 10 | TI | T |
... | ... | ... | ... | ... | ... |
231518 | 2023-03-29 15:59:59.948900352 | 401.340 | 1 | I | D |
231519 | 2023-03-29 15:59:59.966633728 | 401.345 | 3 | I | P |
231520 | 2023-03-29 15:59:59.990119680 | 401.350 | 3 | I | P |
231521 | 2023-03-29 15:59:59.990203648 | 401.360 | 3 | F I | P |
231522 | 2023-03-29 15:59:59.990712064 | 401.350 | 3 | I | P |
231523 rows × 5 columns
Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars).
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE','SIZE','COND','EXCHANGE']]
q, _ = q[q['COND'].str.match('^[^O6TUHILNRWZ47QMBCGPV]*$')]
otp.run(q, start=otp.dt(2023,3,29,9,30), end=otp.dt(2023,3,29,16), symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-03-29 09:30:00.010831360 | 399.925 | 100 | F | Z |
1 | 2023-03-29 09:30:00.011159296 | 399.930 | 100 | F | K |
2 | 2023-03-29 09:30:00.030200576 | 399.920 | 100 | K | |
3 | 2023-03-29 09:30:00.035513856 | 399.925 | 100 | Z | |
4 | 2023-03-29 09:30:00.060566528 | 399.920 | 200 | K | |
... | ... | ... | ... | ... | ... |
305070 | 2023-03-29 15:59:59.994555136 | 401.350 | 643 | F | P |
305071 | 2023-03-29 15:59:59.995045376 | 401.350 | 900 | F | P |
305072 | 2023-03-29 15:59:59.997313024 | 401.340 | 100 | Z | |
305073 | 2023-03-29 15:59:59.997354752 | 401.340 | 498 | N | |
305074 | 2023-03-29 15:59:59.997406208 | 401.340 | 200 | T |
305075 rows × 5 columns
Filters can include ‘and’ and/or ‘or’ clauses.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE','SIZE','COND','EXCHANGE']]
q, _ = q[(q['EXCHANGE'] == 'T') & (q['PRICE'] > 400)]
otp.run(q, start=otp.dt(2023,3,29,9,30), end=otp.dt(2023,3,29,16), symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-03-29 09:48:06.039307008 | 400.03 | 5 | F I | T |
1 | 2023-03-29 09:48:06.039391232 | 400.03 | 95 | F I | T |
2 | 2023-03-29 09:48:06.039394304 | 400.03 | 200 | F | T |
3 | 2023-03-29 09:48:06.039400192 | 400.03 | 100 | T | |
4 | 2023-03-29 09:48:06.039481344 | 400.03 | 130 | T | |
... | ... | ... | ... | ... | ... |
38832 | 2023-03-29 15:59:59.990925568 | 401.35 | 297 | F | T |
38833 | 2023-03-29 15:59:59.992829184 | 401.35 | 500 | T | |
38834 | 2023-03-29 15:59:59.993334272 | 401.35 | 257 | F | T |
38835 | 2023-03-29 15:59:59.993776384 | 401.35 | 200 | F | T |
38836 | 2023-03-29 15:59:59.997406208 | 401.34 | 200 | T |
38837 rows × 5 columns