Filtering#

Let’s start with an unfiltered time series.

import onetick.py as otp

s = otp.dt(2023, 5, 15, 9, 30)
e = otp.dt(2023, 5, 15, 9, 30, 1)

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.000178688 412.22 100 T P
1 2023-05-15 09:30:00.000776704 412.22 247 Z
2 2023-05-15 09:30:00.003603456 412.22 100 T T
3 2023-05-15 09:30:00.006352128 412.24 1 I K
4 2023-05-15 09:30:00.007128064 412.24 3 I K
... ... ... ... ... ...
310 2023-05-15 09:30:00.934032640 412.27 160 T T
311 2023-05-15 09:30:00.975609344 412.24 2 I D
312 2023-05-15 09:30:00.980264448 412.27 1 I D
313 2023-05-15 09:30:00.985391616 412.28 100 T
314 2023-05-15 09:30:00.985394944 412.28 100 Q T

315 rows × 5 columns

We can filter by the value of a field.

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['EXCHANGE'] == 'T']
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.003603456 412.22 100 T T
1 2023-05-15 09:30:00.033275392 412.24 58 FTI T
2 2023-05-15 09:30:00.033278464 412.25 100 FT T
3 2023-05-15 09:30:00.054481920 412.25 100 T T
4 2023-05-15 09:30:00.124218112 412.24 200 T T
5 2023-05-15 09:30:00.124500992 412.24 147 FT T
6 2023-05-15 09:30:00.124581888 412.24 574 T T
7 2023-05-15 09:30:00.124927744 412.24 147 FT T
8 2023-05-15 09:30:00.125032960 412.24 147 FT T
9 2023-05-15 09:30:00.125264640 412.24 147 FT T
10 2023-05-15 09:30:00.125298176 412.24 500 T T
11 2023-05-15 09:30:00.125324800 412.24 790 T T
12 2023-05-15 09:30:00.125357824 412.24 233 T T
13 2023-05-15 09:30:00.125920256 412.24 147 FT T
14 2023-05-15 09:30:00.126912512 412.24 100 FT T
15 2023-05-15 09:30:00.126939648 412.23 100 T T
16 2023-05-15 09:30:00.135816960 412.24 400 FT T
17 2023-05-15 09:30:00.150320384 412.24 500 T T
18 2023-05-15 09:30:00.156800768 412.24 100 T T
19 2023-05-15 09:30:00.168296960 412.24 240 FT T
20 2023-05-15 09:30:00.169088000 412.22 15 TI T
21 2023-05-15 09:30:00.176428544 412.22 85 TI T
22 2023-05-15 09:30:00.176437504 412.22 15 TI T
23 2023-05-15 09:30:00.204837376 412.24 100 T T
24 2023-05-15 09:30:00.260433664 412.25 100 FT T
25 2023-05-15 09:30:00.260436736 412.25 100 FT T
26 2023-05-15 09:30:00.277258752 412.25 5 TI T
27 2023-05-15 09:30:00.277670656 412.25 100 T T
28 2023-05-15 09:30:00.307667200 412.25 100 FT T
29 2023-05-15 09:30:00.315053056 412.25 100 T T
30 2023-05-15 09:30:00.315116032 412.25 100 T T
31 2023-05-15 09:30:00.315614976 412.25 100 T T
32 2023-05-15 09:30:00.315909888 412.25 95 TI T
33 2023-05-15 09:30:00.320802304 412.25 11 TI T
34 2023-05-15 09:30:00.321833984 412.25 89 FTI T
35 2023-05-15 09:30:00.342056960 412.25 100 T T
36 2023-05-15 09:30:00.342105088 412.25 100 T T
37 2023-05-15 09:30:00.342505216 412.25 100 T T
38 2023-05-15 09:30:00.352696320 412.25 100 T T
39 2023-05-15 09:30:00.353129472 412.25 100 T T
40 2023-05-15 09:30:00.406870528 412.25 27 TI T
41 2023-05-15 09:30:00.421151744 412.25 100 T T
42 2023-05-15 09:30:00.526528768 412.25 100 T T
43 2023-05-15 09:30:00.650535168 412.24 1 TI T
44 2023-05-15 09:30:00.787749120 412.24 1000 T T
45 2023-05-15 09:30:00.931313920 412.27 100 FT T
46 2023-05-15 09:30:00.934032640 412.27 8 TI T
47 2023-05-15 09:30:00.934032640 412.27 160 T T
48 2023-05-15 09:30:00.985391616 412.28 100 T
49 2023-05-15 09:30:00.985394944 412.28 100 Q T

Note that the result of a filter expression returns 2 values. The first value is the query returning ticks that satisfy the conditions. The second value is the query returning ticks that do not satisfy the conditions. Most of the time we’d be interested just in the ticks that pass the filter and can use a placeholder _ for the second value.

Also, note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.

Filtering for a specific trade condition is done with the string matching methods.

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.contains('I')]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.006352128 412.24 1 I K
1 2023-05-15 09:30:00.007128064 412.24 3 I K
2 2023-05-15 09:30:00.007870976 412.24 1 I K
3 2023-05-15 09:30:00.008099328 412.24 1 I K
4 2023-05-15 09:30:00.009025536 412.24 1 I K
... ... ... ... ... ...
196 2023-05-15 09:30:00.933644288 412.26 7 F I K
197 2023-05-15 09:30:00.933821952 412.26 8 F I H
198 2023-05-15 09:30:00.934032640 412.27 8 TI T
199 2023-05-15 09:30:00.975609344 412.24 2 I D
200 2023-05-15 09:30:00.980264448 412.27 1 I D

201 rows × 5 columns

Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars).

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.match('^[^O6TUHILNRWZ47QMBCGPV]*$')]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.000776704 412.220 247 Z
1 2023-05-15 09:30:00.019069440 412.230 100 F K
2 2023-05-15 09:30:00.033083136 412.250 100 F K
3 2023-05-15 09:30:00.070456064 412.240 203 Z
4 2023-05-15 09:30:00.119806720 412.230 130 Z
... ... ... ... ... ...
66 2023-05-15 09:30:00.874213888 412.240 100 D
67 2023-05-15 09:30:00.875233280 412.240 149 D
68 2023-05-15 09:30:00.931127552 412.260 100 F P
69 2023-05-15 09:30:00.932179968 412.255 160 D
70 2023-05-15 09:30:00.985391616 412.280 100 T

71 rows × 5 columns

Filters can include ‘and’ and/or ‘or’ clauses.

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[(q['EXCHANGE'] == 'T') & (q['PRICE'] > 412.25)]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.931313920 412.27 100 FT T
1 2023-05-15 09:30:00.934032640 412.27 8 TI T
2 2023-05-15 09:30:00.934032640 412.27 160 T T
3 2023-05-15 09:30:00.985391616 412.28 100 T
4 2023-05-15 09:30:00.985394944 412.28 100 Q T

We can retrieve ticks by index with the help of first and last.

The example below returns the second tick:

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get the last of the first two ticks, thus getting the second tick
q = q.first(2)
q = q.last()
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.000776704 412.22 247 Z

Python-like slice syntax is also supported:

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get last three ticks
q = q[-3:]
otp.run(q, start=s, end=e, symbols=['SPY'])
Time PRICE SIZE COND EXCHANGE
0 2023-05-15 09:30:00.980264448 412.27 1 I D
1 2023-05-15 09:30:00.985391616 412.28 100 T
2 2023-05-15 09:30:00.985394944 412.28 100 Q T