Filtering#
Let’s start with an unfiltered time series.
import onetick.py as otp
s = otp.dt(2023, 5, 15, 9, 30)
e = otp.dt(2023, 5, 15, 9, 30, 1)
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000178688 | 412.22 | 100 | T | P |
1 | 2023-05-15 09:30:00.000776704 | 412.22 | 247 | Z | |
2 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 | T | T |
3 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 | I | K |
4 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 | I | K |
... | ... | ... | ... | ... | ... |
310 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | T | T |
311 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 | I | D |
312 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | I | D |
313 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
314 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |
315 rows × 5 columns
We can filter by the value of a field.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['EXCHANGE'] == 'T']
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 | T | T |
1 | 2023-05-15 09:30:00.033275392 | 412.24 | 58 | FTI | T |
2 | 2023-05-15 09:30:00.033278464 | 412.25 | 100 | FT | T |
3 | 2023-05-15 09:30:00.054481920 | 412.25 | 100 | T | T |
4 | 2023-05-15 09:30:00.124218112 | 412.24 | 200 | T | T |
5 | 2023-05-15 09:30:00.124500992 | 412.24 | 147 | FT | T |
6 | 2023-05-15 09:30:00.124581888 | 412.24 | 574 | T | T |
7 | 2023-05-15 09:30:00.124927744 | 412.24 | 147 | FT | T |
8 | 2023-05-15 09:30:00.125032960 | 412.24 | 147 | FT | T |
9 | 2023-05-15 09:30:00.125264640 | 412.24 | 147 | FT | T |
10 | 2023-05-15 09:30:00.125298176 | 412.24 | 500 | T | T |
11 | 2023-05-15 09:30:00.125324800 | 412.24 | 790 | T | T |
12 | 2023-05-15 09:30:00.125357824 | 412.24 | 233 | T | T |
13 | 2023-05-15 09:30:00.125920256 | 412.24 | 147 | FT | T |
14 | 2023-05-15 09:30:00.126912512 | 412.24 | 100 | FT | T |
15 | 2023-05-15 09:30:00.126939648 | 412.23 | 100 | T | T |
16 | 2023-05-15 09:30:00.135816960 | 412.24 | 400 | FT | T |
17 | 2023-05-15 09:30:00.150320384 | 412.24 | 500 | T | T |
18 | 2023-05-15 09:30:00.156800768 | 412.24 | 100 | T | T |
19 | 2023-05-15 09:30:00.168296960 | 412.24 | 240 | FT | T |
20 | 2023-05-15 09:30:00.169088000 | 412.22 | 15 | TI | T |
21 | 2023-05-15 09:30:00.176428544 | 412.22 | 85 | TI | T |
22 | 2023-05-15 09:30:00.176437504 | 412.22 | 15 | TI | T |
23 | 2023-05-15 09:30:00.204837376 | 412.24 | 100 | T | T |
24 | 2023-05-15 09:30:00.260433664 | 412.25 | 100 | FT | T |
25 | 2023-05-15 09:30:00.260436736 | 412.25 | 100 | FT | T |
26 | 2023-05-15 09:30:00.277258752 | 412.25 | 5 | TI | T |
27 | 2023-05-15 09:30:00.277670656 | 412.25 | 100 | T | T |
28 | 2023-05-15 09:30:00.307667200 | 412.25 | 100 | FT | T |
29 | 2023-05-15 09:30:00.315053056 | 412.25 | 100 | T | T |
30 | 2023-05-15 09:30:00.315116032 | 412.25 | 100 | T | T |
31 | 2023-05-15 09:30:00.315614976 | 412.25 | 100 | T | T |
32 | 2023-05-15 09:30:00.315909888 | 412.25 | 95 | TI | T |
33 | 2023-05-15 09:30:00.320802304 | 412.25 | 11 | TI | T |
34 | 2023-05-15 09:30:00.321833984 | 412.25 | 89 | FTI | T |
35 | 2023-05-15 09:30:00.342056960 | 412.25 | 100 | T | T |
36 | 2023-05-15 09:30:00.342105088 | 412.25 | 100 | T | T |
37 | 2023-05-15 09:30:00.342505216 | 412.25 | 100 | T | T |
38 | 2023-05-15 09:30:00.352696320 | 412.25 | 100 | T | T |
39 | 2023-05-15 09:30:00.353129472 | 412.25 | 100 | T | T |
40 | 2023-05-15 09:30:00.406870528 | 412.25 | 27 | TI | T |
41 | 2023-05-15 09:30:00.421151744 | 412.25 | 100 | T | T |
42 | 2023-05-15 09:30:00.526528768 | 412.25 | 100 | T | T |
43 | 2023-05-15 09:30:00.650535168 | 412.24 | 1 | TI | T |
44 | 2023-05-15 09:30:00.787749120 | 412.24 | 1000 | T | T |
45 | 2023-05-15 09:30:00.931313920 | 412.27 | 100 | FT | T |
46 | 2023-05-15 09:30:00.934032640 | 412.27 | 8 | TI | T |
47 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | T | T |
48 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
49 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |
Note that the result of a filter expression returns 2 values. The first value is the query returning ticks that satisfy the conditions. The second value is the query returning ticks that do not satisfy the conditions. Most of the time we’d be interested just in the ticks that pass the filter and can use a placeholder _
for the second value.
Also, note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.
Filtering for a specific trade condition is done with the string matching methods.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.contains('I')]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 | I | K |
1 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 | I | K |
2 | 2023-05-15 09:30:00.007870976 | 412.24 | 1 | I | K |
3 | 2023-05-15 09:30:00.008099328 | 412.24 | 1 | I | K |
4 | 2023-05-15 09:30:00.009025536 | 412.24 | 1 | I | K |
... | ... | ... | ... | ... | ... |
196 | 2023-05-15 09:30:00.933644288 | 412.26 | 7 | F I | K |
197 | 2023-05-15 09:30:00.933821952 | 412.26 | 8 | F I | H |
198 | 2023-05-15 09:30:00.934032640 | 412.27 | 8 | TI | T |
199 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 | I | D |
200 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | I | D |
201 rows × 5 columns
Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars).
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.match('^[^O6TUHILNRWZ47QMBCGPV]*$')]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000776704 | 412.220 | 247 | Z | |
1 | 2023-05-15 09:30:00.019069440 | 412.230 | 100 | F | K |
2 | 2023-05-15 09:30:00.033083136 | 412.250 | 100 | F | K |
3 | 2023-05-15 09:30:00.070456064 | 412.240 | 203 | Z | |
4 | 2023-05-15 09:30:00.119806720 | 412.230 | 130 | Z | |
... | ... | ... | ... | ... | ... |
66 | 2023-05-15 09:30:00.874213888 | 412.240 | 100 | D | |
67 | 2023-05-15 09:30:00.875233280 | 412.240 | 149 | D | |
68 | 2023-05-15 09:30:00.931127552 | 412.260 | 100 | F | P |
69 | 2023-05-15 09:30:00.932179968 | 412.255 | 160 | D | |
70 | 2023-05-15 09:30:00.985391616 | 412.280 | 100 | T |
71 rows × 5 columns
Filters can include ‘and’ and/or ‘or’ clauses.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[(q['EXCHANGE'] == 'T') & (q['PRICE'] > 412.25)]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.931313920 | 412.27 | 100 | FT | T |
1 | 2023-05-15 09:30:00.934032640 | 412.27 | 8 | TI | T |
2 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | T | T |
3 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
4 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |