Filtering#

Let’s start with an unfiltered time series:

import onetick.py as otp

s = otp.dt(2024, 2, 1, 9, 30)
e = otp.dt(2024, 2, 1, 9, 30, 1)

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961260 184.010 302 @FT P
1 2024-02-01 09:30:00.000961491 184.000 100 @FT P
2 2024-02-01 09:30:00.000961701 184.000 1 @FTI P
3 2024-02-01 09:30:00.000973163 184.000 1 @FTI P
4 2024-02-01 09:30:00.000973355 184.000 5 @FTI P
... ... ... ... ... ...
574 2024-02-01 09:30:00.987184691 183.900 9 @F I K
575 2024-02-01 09:30:00.990378350 183.920 1 @ I D
576 2024-02-01 09:30:00.991941892 183.935 1 @ I D
577 2024-02-01 09:30:00.993785116 183.905 300 @ D
578 2024-02-01 09:30:00.996512511 183.934 5 @ I D

579 rows × 5 columns

We can filter by the value of a field using method where:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where(q['EXCHANGE'] == 'K')
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080154806 183.95 1 @ I K
1 2024-02-01 09:30:00.080155543 183.95 5 @ I K
2 2024-02-01 09:30:00.080534432 183.90 2 @ I K
3 2024-02-01 09:30:00.080893236 183.97 15 @F I K
4 2024-02-01 09:30:00.080984540 183.90 86 @ I K
... ... ... ... ... ...
66 2024-02-01 09:30:00.954126905 183.92 2 @F I K
67 2024-02-01 09:30:00.954404272 183.92 5 @F I K
68 2024-02-01 09:30:00.954697550 183.92 2 @F I K
69 2024-02-01 09:30:00.985733751 183.90 19 @F I K
70 2024-02-01 09:30:00.987184691 183.90 9 @F I K

71 rows × 5 columns

Note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.

Filtering for a specific trade condition can be done with the string matching methods:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where(q['COND'].str.contains('I'))
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961701 184.000 1 @FTI P
1 2024-02-01 09:30:00.000973163 184.000 1 @FTI P
2 2024-02-01 09:30:00.000973355 184.000 5 @FTI P
3 2024-02-01 09:30:00.000973517 184.000 5 @FTI P
4 2024-02-01 09:30:00.000973674 184.000 5 @FTI P
... ... ... ... ... ...
464 2024-02-01 09:30:00.985733751 183.900 19 @F I K
465 2024-02-01 09:30:00.987184691 183.900 9 @F I K
466 2024-02-01 09:30:00.990378350 183.920 1 @ I D
467 2024-02-01 09:30:00.991941892 183.935 1 @ I D
468 2024-02-01 09:30:00.996512511 183.934 5 @ I D

469 rows × 5 columns

Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars):

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.character_present(q['COND'], 'O6TUHILNRWZ47QMBCGPV', discard_on_match=True)
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080096628 183.985 154 @F P
1 2024-02-01 09:30:00.080109023 183.900 150 @F P
2 2024-02-01 09:30:00.080109175 183.900 477 @F P
3 2024-02-01 09:30:00.080904251 183.900 523 @F P
4 2024-02-01 09:30:00.081016071 183.900 100 @ P
... ... ... ... ... ...
80 2024-02-01 09:30:00.795482321 183.945 100 @ D
81 2024-02-01 09:30:00.796118938 183.920 100 @F U
82 2024-02-01 09:30:00.833291631 183.910 100 @F Q
83 2024-02-01 09:30:00.945031779 183.920 100 @ K
84 2024-02-01 09:30:00.993785116 183.905 300 @ D

85 rows × 5 columns

Filters can include ‘and’ and/or ‘or’ clauses:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where((q['EXCHANGE'] == 'K') & (q['PRICE'] > 183.950))
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080893236 183.97 15 @F I K
1 2024-02-01 09:30:00.153161386 183.98 50 @F I K
2 2024-02-01 09:30:00.153953889 183.98 29 @F I K
3 2024-02-01 09:30:00.154413668 183.98 21 @F I K
4 2024-02-01 09:30:00.155845868 183.98 50 @F I K
5 2024-02-01 09:30:00.161347824 183.98 29 @F I K
6 2024-02-01 09:30:00.161368459 183.98 21 @F I K
7 2024-02-01 09:30:00.260713608 183.98 50 @F I K
8 2024-02-01 09:30:00.260714836 183.99 25 @F I K
9 2024-02-01 09:30:00.270803260 184.01 16 @F I K
10 2024-02-01 09:30:00.277375099 184.00 2 @ I K
11 2024-02-01 09:30:00.297914416 183.98 48 @F I K
12 2024-02-01 09:30:00.362168043 184.01 100 @ K
13 2024-02-01 09:30:00.614422140 183.96 25 @F I K
14 2024-02-01 09:30:00.621498637 183.96 52 @ I K
15 2024-02-01 09:30:00.621587688 183.96 382 @ K
16 2024-02-01 09:30:00.623837515 183.96 25 @ I K
17 2024-02-01 09:30:00.623851984 183.96 1 @ I K

We can also return two branches after filtering with method __getitem__ (or where_clause). The first branch contains all ticks that satisfy the condition and the other branch contains all ticks that do not satisfy the condition:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
exchange_k, other = q[q['EXCHANGE'] == 'K']
otp.run(exchange_k, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080154806 183.95 1 @ I K
1 2024-02-01 09:30:00.080155543 183.95 5 @ I K
2 2024-02-01 09:30:00.080534432 183.90 2 @ I K
3 2024-02-01 09:30:00.080893236 183.97 15 @F I K
4 2024-02-01 09:30:00.080984540 183.90 86 @ I K
... ... ... ... ... ...
66 2024-02-01 09:30:00.954126905 183.92 2 @F I K
67 2024-02-01 09:30:00.954404272 183.92 5 @F I K
68 2024-02-01 09:30:00.954697550 183.92 2 @F I K
69 2024-02-01 09:30:00.985733751 183.90 19 @F I K
70 2024-02-01 09:30:00.987184691 183.90 9 @F I K

71 rows × 5 columns

otp.run(other, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961260 184.010 302 @FT P
1 2024-02-01 09:30:00.000961491 184.000 100 @FT P
2 2024-02-01 09:30:00.000961701 184.000 1 @FTI P
3 2024-02-01 09:30:00.000973163 184.000 1 @FTI P
4 2024-02-01 09:30:00.000973355 184.000 5 @FTI P
... ... ... ... ... ...
503 2024-02-01 09:30:00.972086329 183.935 1 @ I D
504 2024-02-01 09:30:00.990378350 183.920 1 @ I D
505 2024-02-01 09:30:00.991941892 183.935 1 @ I D
506 2024-02-01 09:30:00.993785116 183.905 300 @ D
507 2024-02-01 09:30:00.996512511 183.934 5 @ I D

508 rows × 5 columns

We can retrieve ticks by index with the help of first and last.

The example below returns the second tick:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get the last of the first two ticks, thus getting the second tick
q = q.first(2)
q = q.last()
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961491 184.0 100 @FT P

Python-like slice syntax is also supported:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get last three ticks
q = q[-3:]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.991941892 183.935 1 @ I D
1 2024-02-01 09:30:00.993785116 183.905 300 @ D
2 2024-02-01 09:30:00.996512511 183.934 5 @ I D