Filtering#
Let’s start with an unfiltered time series:
import onetick.py as otp
s = otp.dt(2024, 2, 1, 9, 30)
e = otp.dt(2024, 2, 1, 9, 30, 1)
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.000961260 | 184.010 | 302 | @FT | P |
1 | 2024-02-01 09:30:00.000961491 | 184.000 | 100 | @FT | P |
2 | 2024-02-01 09:30:00.000961701 | 184.000 | 1 | @FTI | P |
3 | 2024-02-01 09:30:00.000973163 | 184.000 | 1 | @FTI | P |
4 | 2024-02-01 09:30:00.000973355 | 184.000 | 5 | @FTI | P |
... | ... | ... | ... | ... | ... |
574 | 2024-02-01 09:30:00.987184691 | 183.900 | 9 | @F I | K |
575 | 2024-02-01 09:30:00.990378350 | 183.920 | 1 | @ I | D |
576 | 2024-02-01 09:30:00.991941892 | 183.935 | 1 | @ I | D |
577 | 2024-02-01 09:30:00.993785116 | 183.905 | 300 | @ | D |
578 | 2024-02-01 09:30:00.996512511 | 183.934 | 5 | @ I | D |
579 rows × 5 columns
We can filter by the value of a field using method where
:
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where(q['EXCHANGE'] == 'K')
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.080154806 | 183.95 | 1 | @ I | K |
1 | 2024-02-01 09:30:00.080155543 | 183.95 | 5 | @ I | K |
2 | 2024-02-01 09:30:00.080534432 | 183.90 | 2 | @ I | K |
3 | 2024-02-01 09:30:00.080893236 | 183.97 | 15 | @F I | K |
4 | 2024-02-01 09:30:00.080984540 | 183.90 | 86 | @ I | K |
... | ... | ... | ... | ... | ... |
66 | 2024-02-01 09:30:00.954126905 | 183.92 | 2 | @F I | K |
67 | 2024-02-01 09:30:00.954404272 | 183.92 | 5 | @F I | K |
68 | 2024-02-01 09:30:00.954697550 | 183.92 | 2 | @F I | K |
69 | 2024-02-01 09:30:00.985733751 | 183.90 | 19 | @F I | K |
70 | 2024-02-01 09:30:00.987184691 | 183.90 | 9 | @F I | K |
71 rows × 5 columns
Note that all of the filtering is done in OneTick not in Python, which is much more efficient and lets us work with much bigger data sets.
Filtering for a specific trade condition can be done with the string matching methods:
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where(q['COND'].str.contains('I'))
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.000961701 | 184.000 | 1 | @FTI | P |
1 | 2024-02-01 09:30:00.000973163 | 184.000 | 1 | @FTI | P |
2 | 2024-02-01 09:30:00.000973355 | 184.000 | 5 | @FTI | P |
3 | 2024-02-01 09:30:00.000973517 | 184.000 | 5 | @FTI | P |
4 | 2024-02-01 09:30:00.000973674 | 184.000 | 5 | @FTI | P |
... | ... | ... | ... | ... | ... |
464 | 2024-02-01 09:30:00.985733751 | 183.900 | 19 | @F I | K |
465 | 2024-02-01 09:30:00.987184691 | 183.900 | 9 | @F I | K |
466 | 2024-02-01 09:30:00.990378350 | 183.920 | 1 | @ I | D |
467 | 2024-02-01 09:30:00.991941892 | 183.935 | 1 | @ I | D |
468 | 2024-02-01 09:30:00.996512511 | 183.934 | 5 | @ I | D |
469 rows × 5 columns
Trade filter that limits attention to on-exchange continuous trading trades looks like this (it’s used when creating bars):
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.character_present(q['COND'], 'O6TUHILNRWZ47QMBCGPV', discard_on_match=True)
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.080096628 | 183.985 | 154 | @F | P |
1 | 2024-02-01 09:30:00.080109023 | 183.900 | 150 | @F | P |
2 | 2024-02-01 09:30:00.080109175 | 183.900 | 477 | @F | P |
3 | 2024-02-01 09:30:00.080904251 | 183.900 | 523 | @F | P |
4 | 2024-02-01 09:30:00.081016071 | 183.900 | 100 | @ | P |
... | ... | ... | ... | ... | ... |
80 | 2024-02-01 09:30:00.795482321 | 183.945 | 100 | @ | D |
81 | 2024-02-01 09:30:00.796118938 | 183.920 | 100 | @F | U |
82 | 2024-02-01 09:30:00.833291631 | 183.910 | 100 | @F | Q |
83 | 2024-02-01 09:30:00.945031779 | 183.920 | 100 | @ | K |
84 | 2024-02-01 09:30:00.993785116 | 183.905 | 300 | @ | D |
85 rows × 5 columns
Filters can include ‘and’ and/or ‘or’ clauses:
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
q = q.where((q['EXCHANGE'] == 'K') & (q['PRICE'] > 183.950))
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.080893236 | 183.97 | 15 | @F I | K |
1 | 2024-02-01 09:30:00.153161386 | 183.98 | 50 | @F I | K |
2 | 2024-02-01 09:30:00.153953889 | 183.98 | 29 | @F I | K |
3 | 2024-02-01 09:30:00.154413668 | 183.98 | 21 | @F I | K |
4 | 2024-02-01 09:30:00.155845868 | 183.98 | 50 | @F I | K |
5 | 2024-02-01 09:30:00.161347824 | 183.98 | 29 | @F I | K |
6 | 2024-02-01 09:30:00.161368459 | 183.98 | 21 | @F I | K |
7 | 2024-02-01 09:30:00.260713608 | 183.98 | 50 | @F I | K |
8 | 2024-02-01 09:30:00.260714836 | 183.99 | 25 | @F I | K |
9 | 2024-02-01 09:30:00.270803260 | 184.01 | 16 | @F I | K |
10 | 2024-02-01 09:30:00.277375099 | 184.00 | 2 | @ I | K |
11 | 2024-02-01 09:30:00.297914416 | 183.98 | 48 | @F I | K |
12 | 2024-02-01 09:30:00.362168043 | 184.01 | 100 | @ | K |
13 | 2024-02-01 09:30:00.614422140 | 183.96 | 25 | @F I | K |
14 | 2024-02-01 09:30:00.621498637 | 183.96 | 52 | @ I | K |
15 | 2024-02-01 09:30:00.621587688 | 183.96 | 382 | @ | K |
16 | 2024-02-01 09:30:00.623837515 | 183.96 | 25 | @ I | K |
17 | 2024-02-01 09:30:00.623851984 | 183.96 | 1 | @ I | K |
We can also return two branches after filtering with method __getitem__
(or where_clause
).
The first branch contains all ticks that satisfy the condition and the other branch contains all ticks that do not satisfy the condition:
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
exchange_k, other = q[q['EXCHANGE'] == 'K']
otp.run(exchange_k, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.080154806 | 183.95 | 1 | @ I | K |
1 | 2024-02-01 09:30:00.080155543 | 183.95 | 5 | @ I | K |
2 | 2024-02-01 09:30:00.080534432 | 183.90 | 2 | @ I | K |
3 | 2024-02-01 09:30:00.080893236 | 183.97 | 15 | @F I | K |
4 | 2024-02-01 09:30:00.080984540 | 183.90 | 86 | @ I | K |
... | ... | ... | ... | ... | ... |
66 | 2024-02-01 09:30:00.954126905 | 183.92 | 2 | @F I | K |
67 | 2024-02-01 09:30:00.954404272 | 183.92 | 5 | @F I | K |
68 | 2024-02-01 09:30:00.954697550 | 183.92 | 2 | @F I | K |
69 | 2024-02-01 09:30:00.985733751 | 183.90 | 19 | @F I | K |
70 | 2024-02-01 09:30:00.987184691 | 183.90 | 9 | @F I | K |
71 rows × 5 columns
otp.run(other, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.000961260 | 184.010 | 302 | @FT | P |
1 | 2024-02-01 09:30:00.000961491 | 184.000 | 100 | @FT | P |
2 | 2024-02-01 09:30:00.000961701 | 184.000 | 1 | @FTI | P |
3 | 2024-02-01 09:30:00.000973163 | 184.000 | 1 | @FTI | P |
4 | 2024-02-01 09:30:00.000973355 | 184.000 | 5 | @FTI | P |
... | ... | ... | ... | ... | ... |
503 | 2024-02-01 09:30:00.972086329 | 183.935 | 1 | @ I | D |
504 | 2024-02-01 09:30:00.990378350 | 183.920 | 1 | @ I | D |
505 | 2024-02-01 09:30:00.991941892 | 183.935 | 1 | @ I | D |
506 | 2024-02-01 09:30:00.993785116 | 183.905 | 300 | @ | D |
507 | 2024-02-01 09:30:00.996512511 | 183.934 | 5 | @ I | D |
508 rows × 5 columns
We can retrieve ticks by index with the help of
first
and last
.
The example below returns the second tick:
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get the last of the first two ticks, thus getting the second tick
q = q.first(2)
q = q.last()
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.000961491 | 184.0 | 100 | @FT | P |
Python-like slice syntax is also supported:
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]
# get last three ticks
q = q[-3:]
otp.run(q, start=s, end=e, symbols=['AAPL'])
Time | PRICE | SIZE | COND | EXCHANGE | |
---|---|---|---|---|---|
0 | 2024-02-01 09:30:00.991941892 | 183.935 | 1 | @ I | D |
1 | 2024-02-01 09:30:00.993785116 | 183.905 | 300 | @ | D |
2 | 2024-02-01 09:30:00.996512511 | 183.934 | 5 | @ I | D |