Filtering#

Let’s start with an unfiltered time series.

import onetick.py as otp

s = otp.dt(2024, 2, 1, 9, 30)
e = otp.dt(2024, 2, 1, 9, 30, 1)

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961260 184.010 302 @FT P
1 2024-02-01 09:30:00.000961491 184.000 100 @FT P
2 2024-02-01 09:30:00.000961701 184.000 1 @FTI P
3 2024-02-01 09:30:00.000973163 184.000 1 @FTI P
4 2024-02-01 09:30:00.000973355 184.000 5 @FTI P
... ... ... ... ... ...
574 2024-02-01 09:30:00.987184691 183.900 9 @F I K
575 2024-02-01 09:30:00.990378350 183.920 1 @ I D
576 2024-02-01 09:30:00.991941892 183.935 1 @ I D
577 2024-02-01 09:30:00.993785116 183.905 300 @ D
578 2024-02-01 09:30:00.996512511 183.934 5 @ I D

579 rows × 5 columns

We can filter by the value of a field.

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['EXCHANGE'] == 'K']
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080154806 183.95 1 @ I K
1 2024-02-01 09:30:00.080155543 183.95 5 @ I K
2 2024-02-01 09:30:00.080534432 183.90 2 @ I K
3 2024-02-01 09:30:00.080893236 183.97 15 @F I K
4 2024-02-01 09:30:00.080984540 183.90 86 @ I K
... ... ... ... ... ...
66 2024-02-01 09:30:00.954126905 183.92 2 @F I K
67 2024-02-01 09:30:00.954404272 183.92 5 @F I K
68 2024-02-01 09:30:00.954697550 183.92 2 @F I K
69 2024-02-01 09:30:00.985733751 183.90 19 @F I K
70 2024-02-01 09:30:00.987184691 183.90 9 @F I K

71 rows × 5 columns

Note that the result of a filter expression returns 2 values. The first value is the query returning ticks that satisfy the conditions. The second value is the query returning ticks that do not satisfy the conditions. Most of the time we’d be interested just in the ticks that pass the filter and can use a placeholder _ for the second value.

Also, note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.

Filtering for a specific trade condition is done with the string matching methods.

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[q['COND'].str.contains('I')]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961701 184.000 1 @FTI P
1 2024-02-01 09:30:00.000973163 184.000 1 @FTI P
2 2024-02-01 09:30:00.000973355 184.000 5 @FTI P
3 2024-02-01 09:30:00.000973517 184.000 5 @FTI P
4 2024-02-01 09:30:00.000973674 184.000 5 @FTI P
... ... ... ... ... ...
464 2024-02-01 09:30:00.985733751 183.900 19 @F I K
465 2024-02-01 09:30:00.987184691 183.900 9 @F I K
466 2024-02-01 09:30:00.990378350 183.920 1 @ I D
467 2024-02-01 09:30:00.991941892 183.935 1 @ I D
468 2024-02-01 09:30:00.996512511 183.934 5 @ I D

469 rows × 5 columns

Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars).

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.character_present(q['COND'], 'O6TUHILNRWZ47QMBCGPV', discard_on_match=True)
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080096628 183.985 154 @F P
1 2024-02-01 09:30:00.080109023 183.900 150 @F P
2 2024-02-01 09:30:00.080109175 183.900 477 @F P
3 2024-02-01 09:30:00.080904251 183.900 523 @F P
4 2024-02-01 09:30:00.081016071 183.900 100 @ P
... ... ... ... ... ...
80 2024-02-01 09:30:00.795482321 183.945 100 @ D
81 2024-02-01 09:30:00.796118938 183.920 100 @F U
82 2024-02-01 09:30:00.833291631 183.910 100 @F Q
83 2024-02-01 09:30:00.945031779 183.920 100 @ K
84 2024-02-01 09:30:00.993785116 183.905 300 @ D

85 rows × 5 columns

Filters can include ‘and’ and/or ‘or’ clauses.

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q, _ = q[(q['EXCHANGE'] == 'K') & (q['PRICE'] > 183.950)]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.080893236 183.97 15 @F I K
1 2024-02-01 09:30:00.153161386 183.98 50 @F I K
2 2024-02-01 09:30:00.153953889 183.98 29 @F I K
3 2024-02-01 09:30:00.154413668 183.98 21 @F I K
4 2024-02-01 09:30:00.155845868 183.98 50 @F I K
5 2024-02-01 09:30:00.161347824 183.98 29 @F I K
6 2024-02-01 09:30:00.161368459 183.98 21 @F I K
7 2024-02-01 09:30:00.260713608 183.98 50 @F I K
8 2024-02-01 09:30:00.260714836 183.99 25 @F I K
9 2024-02-01 09:30:00.270803260 184.01 16 @F I K
10 2024-02-01 09:30:00.277375099 184.00 2 @ I K
11 2024-02-01 09:30:00.297914416 183.98 48 @F I K
12 2024-02-01 09:30:00.362168043 184.01 100 @ K
13 2024-02-01 09:30:00.614422140 183.96 25 @F I K
14 2024-02-01 09:30:00.621498637 183.96 52 @ I K
15 2024-02-01 09:30:00.621587688 183.96 382 @ K
16 2024-02-01 09:30:00.623837515 183.96 25 @ I K
17 2024-02-01 09:30:00.623851984 183.96 1 @ I K

We can retrieve ticks by index with the help of first and last.

The example below returns the second tick:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get the last of the first two ticks, thus getting the second tick
q = q.first(2)
q = q.last()
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.000961491 184.0 100 @FT P

Python-like slice syntax is also supported:

q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get last three ticks
q = q[-3:]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time PRICE SIZE COND EXCHANGE
0 2024-02-01 09:30:00.991941892 183.935 1 @ I D
1 2024-02-01 09:30:00.993785116 183.905 300 @ D
2 2024-02-01 09:30:00.996512511 183.934 5 @ I D