Filtering#
Let’s start with an unfiltered time series.
import onetick.py as otp
s = otp.dt(2023, 5, 15, 9, 30)
e = otp.dt(2023, 5, 15, 9, 30, 1)
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['SPY'])
/home/aserechenko/onetick-py/onetick/__init__.py:43: UserWarning: MAIN_ONE_TICK_DIR environment variable is not set. It is a recommended way to let onetick-py know where OneTick python libraries are located. We will try to use default value for your system: /opt/one_market_data/one_tick.
warnings.warn(message)
/home/aserechenko/onetick-py/onetick/__init__.py:43: UserWarning: MAIN_ONE_TICK_DIR environment variable is not set. It is a recommended way to let onetick-py know where OneTick python libraries are located. We will try to use default value for your system: /opt/one_market_data/one_tick.
warnings.warn(message)
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000178688 | 412.22 | 100 | T | P |
1 | 2023-05-15 09:30:00.000776704 | 412.22 | 247 | Z | |
2 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 | T | T |
3 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 | I | K |
4 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 | I | K |
... | ... | ... | ... | ... | ... |
310 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | T | T |
311 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 | I | D |
312 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | I | D |
313 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
314 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |
315 rows × 5 columns
We can filter by the value of a field.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['EXCHANGE'] == 'T']
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 | T | T |
1 | 2023-05-15 09:30:00.033275392 | 412.24 | 58 | FTI | T |
2 | 2023-05-15 09:30:00.033278464 | 412.25 | 100 | FT | T |
3 | 2023-05-15 09:30:00.054481920 | 412.25 | 100 | T | T |
4 | 2023-05-15 09:30:00.124218112 | 412.24 | 200 | T | T |
5 | 2023-05-15 09:30:00.124500992 | 412.24 | 147 | FT | T |
6 | 2023-05-15 09:30:00.124581888 | 412.24 | 574 | T | T |
7 | 2023-05-15 09:30:00.124927744 | 412.24 | 147 | FT | T |
8 | 2023-05-15 09:30:00.125032960 | 412.24 | 147 | FT | T |
9 | 2023-05-15 09:30:00.125264640 | 412.24 | 147 | FT | T |
10 | 2023-05-15 09:30:00.125298176 | 412.24 | 500 | T | T |
11 | 2023-05-15 09:30:00.125324800 | 412.24 | 790 | T | T |
12 | 2023-05-15 09:30:00.125357824 | 412.24 | 233 | T | T |
13 | 2023-05-15 09:30:00.125920256 | 412.24 | 147 | FT | T |
14 | 2023-05-15 09:30:00.126912512 | 412.24 | 100 | FT | T |
15 | 2023-05-15 09:30:00.126939648 | 412.23 | 100 | T | T |
16 | 2023-05-15 09:30:00.135816960 | 412.24 | 400 | FT | T |
17 | 2023-05-15 09:30:00.150320384 | 412.24 | 500 | T | T |
18 | 2023-05-15 09:30:00.156800768 | 412.24 | 100 | T | T |
19 | 2023-05-15 09:30:00.168296960 | 412.24 | 240 | FT | T |
20 | 2023-05-15 09:30:00.169088000 | 412.22 | 15 | TI | T |
21 | 2023-05-15 09:30:00.176428544 | 412.22 | 85 | TI | T |
22 | 2023-05-15 09:30:00.176437504 | 412.22 | 15 | TI | T |
23 | 2023-05-15 09:30:00.204837376 | 412.24 | 100 | T | T |
24 | 2023-05-15 09:30:00.260433664 | 412.25 | 100 | FT | T |
25 | 2023-05-15 09:30:00.260436736 | 412.25 | 100 | FT | T |
26 | 2023-05-15 09:30:00.277258752 | 412.25 | 5 | TI | T |
27 | 2023-05-15 09:30:00.277670656 | 412.25 | 100 | T | T |
28 | 2023-05-15 09:30:00.307667200 | 412.25 | 100 | FT | T |
29 | 2023-05-15 09:30:00.315053056 | 412.25 | 100 | T | T |
30 | 2023-05-15 09:30:00.315116032 | 412.25 | 100 | T | T |
31 | 2023-05-15 09:30:00.315614976 | 412.25 | 100 | T | T |
32 | 2023-05-15 09:30:00.315909888 | 412.25 | 95 | TI | T |
33 | 2023-05-15 09:30:00.320802304 | 412.25 | 11 | TI | T |
34 | 2023-05-15 09:30:00.321833984 | 412.25 | 89 | FTI | T |
35 | 2023-05-15 09:30:00.342056960 | 412.25 | 100 | T | T |
36 | 2023-05-15 09:30:00.342105088 | 412.25 | 100 | T | T |
37 | 2023-05-15 09:30:00.342505216 | 412.25 | 100 | T | T |
38 | 2023-05-15 09:30:00.352696320 | 412.25 | 100 | T | T |
39 | 2023-05-15 09:30:00.353129472 | 412.25 | 100 | T | T |
40 | 2023-05-15 09:30:00.406870528 | 412.25 | 27 | TI | T |
41 | 2023-05-15 09:30:00.421151744 | 412.25 | 100 | T | T |
42 | 2023-05-15 09:30:00.526528768 | 412.25 | 100 | T | T |
43 | 2023-05-15 09:30:00.650535168 | 412.24 | 1 | TI | T |
44 | 2023-05-15 09:30:00.787749120 | 412.24 | 1000 | T | T |
45 | 2023-05-15 09:30:00.931313920 | 412.27 | 100 | FT | T |
46 | 2023-05-15 09:30:00.934032640 | 412.27 | 8 | TI | T |
47 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | T | T |
48 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
49 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |
Note that the result of a filter expression returns 2 values. The first value is the query returning ticks that satisfy the conditions. The second value is the query returning ticks that do not satisfy the conditions. Most of the time we’d be interested just in the ticks that pass the filter and can use a placeholder _
for the second value.
Also, note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.
Filtering for a specific trade condition is done with the string matching methods.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.contains('I')]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 | I | K |
1 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 | I | K |
2 | 2023-05-15 09:30:00.007870976 | 412.24 | 1 | I | K |
3 | 2023-05-15 09:30:00.008099328 | 412.24 | 1 | I | K |
4 | 2023-05-15 09:30:00.009025536 | 412.24 | 1 | I | K |
... | ... | ... | ... | ... | ... |
196 | 2023-05-15 09:30:00.933644288 | 412.26 | 7 | F I | K |
197 | 2023-05-15 09:30:00.933821952 | 412.26 | 8 | F I | H |
198 | 2023-05-15 09:30:00.934032640 | 412.27 | 8 | TI | T |
199 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 | I | D |
200 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | I | D |
201 rows × 5 columns
Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars).
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.match('^[^O6TUHILNRWZ47QMBCGPV]*$')]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000776704 | 412.220 | 247 | Z | |
1 | 2023-05-15 09:30:00.019069440 | 412.230 | 100 | F | K |
2 | 2023-05-15 09:30:00.033083136 | 412.250 | 100 | F | K |
3 | 2023-05-15 09:30:00.070456064 | 412.240 | 203 | Z | |
4 | 2023-05-15 09:30:00.119806720 | 412.230 | 130 | Z | |
... | ... | ... | ... | ... | ... |
66 | 2023-05-15 09:30:00.874213888 | 412.240 | 100 | D | |
67 | 2023-05-15 09:30:00.875233280 | 412.240 | 149 | D | |
68 | 2023-05-15 09:30:00.931127552 | 412.260 | 100 | F | P |
69 | 2023-05-15 09:30:00.932179968 | 412.255 | 160 | D | |
70 | 2023-05-15 09:30:00.985391616 | 412.280 | 100 | T |
71 rows × 5 columns
Filters can include ‘and’ and/or ‘or’ clauses.
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[(q['EXCHANGE'] == 'T') & (q['PRICE'] > 412.25)]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.931313920 | 412.27 | 100 | FT | T |
1 | 2023-05-15 09:30:00.934032640 | 412.27 | 8 | TI | T |
2 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | T | T |
3 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
4 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |
We can retrieve ticks by index with the help of
first
and last
.
The example below returns the second tick:
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get the last of the first two ticks, thus getting the second tick
q = q.first(2)
q = q.last()
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000776704 | 412.22 | 247 | Z |
Python-like slice syntax is also supported:
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get last three ticks
q = q[-3:]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | I | D |
1 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | T | |
2 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | Q | T |