Quick Start#
onetick-py is a pandas-like API for querying data (in particular, tick-by-tick market data) in OneTick. This notebooks follows common business use cases to illustrate how they can be implemented with onetick-py.
# %%capture
# !pip install -U onetick.py
import pandas as pd
import random
import itertools
import onetick.py as otp
# set default onetick.py configuration
from datetime import datetime
otp.config['default_start_time'] = datetime(2003, 12, 1, 0, 0, 0)
otp.config['default_end_time'] = datetime(2003, 12, 4, 0, 0, 0)
otp.config['tz'] = 'EST5EDT' # set the timezone
otp.config['default_db'] = 'DEMO_L1'
otp.config['default_symbol'] = 'AAPL'
# define some parameters to re-use
start = otp.dt(2022, 3, 2, 10)
end = otp.dt(2022, 3, 2, 10, 5)
emini = 'NQ\H22' #symbol for Nasdaq-100 E-mini future
print(otp.__version__)
1.58.0
from datetime import datetime
otp.config['tz'] = 'EST5EDT'
otp.config['default_db'] = 'DEMO_L1'
otp.config['default_symbol'] = 'AAPL'
otp.config['default_start_time'] = datetime(2003, 12, 1, 0, 0, 0)
otp.config['default_end_time'] = datetime(2003, 12, 4, 0, 0, 0)
Retrieving data#
Market data is time series data: i.e., each record or “tick” has a timestamp and ticks are ordered by timestamp. Our focus will be on trades, quotes, and order book data. We choose which type of data we look at by specifing the tick type: TRD for trades, QTE for quotes, and PRL for order books. The data is organized by the database, tick type, and security.
# create a query to retrieve data from the TRD tick type of the CME OneTick database
trd = otp.DataSource('CME', tick_type='TRD')
print(trd)
<onetick.py.sources.DataSource object at 0x7f7900545c40>
Observe that the output is a onetick-py DataSource object rather than the data. To retrieve the data into a pandas DataFrame, we invoke the otp.run
method while providing it the time range and the symbols (instead otp.run()
, we can use the shorthand trd.to_df()
or just trd()
).
trd_df = otp.run(trd, start=otp.dt(2022, 3, 2, 10), end=otp.dt(2022, 3, 2, 10, 1), symbols='NQ\H22')
print(trd_df)
Time EXCH_TIME PRICE SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE TRADE_ID BUY_ORDER_ID SELL_ORDER_ID DELETED_TIME TICK_STATUS OMDSEQ
0 2022-03-02 10:00:00.000789627 1969-12-31 19:00:00 14074.75 1 S - - 0 236995996 6830860875515 6830860875655 1969-12-31 19:00:00 0 7
1 2022-03-02 10:00:00.000929635 1969-12-31 19:00:00 14075.75 1 B - - 0 236996008 6830860875705 6830860875691 1969-12-31 19:00:00 0 25
2 2022-03-02 10:00:00.000929635 1969-12-31 19:00:00 14076.00 1 B - - 0 236996009 6830860875705 6830860875569 1969-12-31 19:00:00 0 26
3 2022-03-02 10:00:00.000929635 1969-12-31 19:00:00 14076.00 1 B - - 0 236996009 6830860875705 6830860875694 1969-12-31 19:00:00 0 27
4 2022-03-02 10:00:00.002409827 1969-12-31 19:00:00 14075.75 1 B - - 0 236996017 6830860875820 6830860875752 1969-12-31 19:00:00 0 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1295 2022-03-02 10:00:59.721897255 1969-12-31 19:00:00 14078.50 1 B - - 0 237002551 6830861044714 6830861044694 1969-12-31 19:00:00 0 26
1296 2022-03-02 10:00:59.765659765 1969-12-31 19:00:00 14078.75 1 B - - 0 237002557 6830861044831 6830861044659 1969-12-31 19:00:00 0 0
1297 2022-03-02 10:00:59.765659765 1969-12-31 19:00:00 14078.75 1 B - - 0 237002557 6830861044831 6830861044742 1969-12-31 19:00:00 0 1
1298 2022-03-02 10:00:59.797208925 1969-12-31 19:00:00 14077.75 1 S - - 0 237002575 6830861044437 6830861045124 1969-12-31 19:00:00 0 0
1299 2022-03-02 10:00:59.797208925 1969-12-31 19:00:00 14077.75 1 S - - 0 237002575 6830861044482 6830861045124 1969-12-31 19:00:00 0 1
[1300 rows x 14 columns]
We would rarely want to just retrieve market data (except for plotting it). Typical use cases involve computing analytics based on the tick-by-tick data. This is what onetick-py makes easy to do while keeping all of the processing in the highly optimized, parallelizable, and scalable tick management platform OneTick. The results of the analytics are returned into Python when the query is executed. We cover a few standard use cases next.
Create minute bars from tick data#
The code below prepares a query (a onetick-py DataSource object) to compute minute bars.
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
bars = trd.agg({'VOLUME': otp.agg.sum('SIZE'),
'HIGH': otp.agg.max('PRICE'),
'LOW': otp.agg.min('PRICE'),
'OPEN': otp.agg.first('PRICE'),
'COUNT': otp.agg.count(),
'CLOSE': otp.agg.last('PRICE')},
bucket_interval=60)
print(bars)
<onetick.py.sources.DataSource object at 0x7f7885eade20>
Only the bars, not the tick-by-tick trades used to compute them, are passed to Python when the query is executed.
bars_df = otp.run(bars, start=otp.dt(2022,11,28,9,30), end=otp.dt(2022,11,30,16), symbols=['AA'], apply_times_daily=True)
bars_df
Time | VOLUME | HIGH | LOW | OPEN | COUNT | CLOSE | |
---|---|---|---|---|---|---|---|
0 | 2022-11-28 09:31:00 | 91637 | 47.1100 | 46.700 | 46.9200 | 209 | 46.7160 |
1 | 2022-11-28 09:32:00 | 29902 | 46.7899 | 46.411 | 46.7899 | 381 | 46.4900 |
2 | 2022-11-28 09:33:00 | 26588 | 46.6600 | 46.460 | 46.4900 | 227 | 46.6400 |
3 | 2022-11-28 09:34:00 | 14388 | 46.8600 | 46.560 | 46.5900 | 136 | 46.6850 |
4 | 2022-11-28 09:35:00 | 14871 | 46.8300 | 46.570 | 46.7100 | 153 | 46.7943 |
... | ... | ... | ... | ... | ... | ... | ... |
1165 | 2022-11-30 15:56:00 | 54586 | 50.1700 | 50.000 | 50.1300 | 730 | 50.0500 |
1166 | 2022-11-30 15:57:00 | 72734 | 50.0679 | 50.010 | 50.0550 | 792 | 50.0200 |
1167 | 2022-11-30 15:58:00 | 54803 | 50.0800 | 50.020 | 50.0200 | 706 | 50.0700 |
1168 | 2022-11-30 15:59:00 | 88486 | 50.0900 | 50.040 | 50.0800 | 920 | 50.0600 |
1169 | 2022-11-30 16:00:00 | 134670 | 50.1800 | 50.060 | 50.0600 | 929 | 50.1400 |
1170 rows × 7 columns
Quick tip: you can display documentation for any object by typing “?” after it.
otp.math.exp?
Filters#
Filters are specified the same way as in pandas DataFrames with one caveat. Two objects are returned instead of one: one for to the ticks that pass the filter and the other for the ticks that don’t (which you can store in _
if you don’t need it). We show an example of how to use both objects when we cover advanced concepts below.
trd = otp.DataSource('CME', tick_type='TRD')
trd, _ = trd[(trd['TRADE_PERIOD']!='A') & (trd['TRADE_TYPE'] != 'LEG')] # keep only non-leg and non-auction trades
# trd = trd.first(5) # retrieve the first 5 records only
print(trd) # no data retrieval has been done yet
print(otp.run(trd, start=start, end=end, symbols=emini))
<onetick.py.sources.DataSource object at 0x7f7885ead370>
Time EXCH_TIME PRICE SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE TRADE_ID BUY_ORDER_ID SELL_ORDER_ID DELETED_TIME TICK_STATUS OMDSEQ
0 2022-03-02 10:00:00.000789627 1969-12-31 19:00:00 14074.75 1 S - - 0 236995996 6830860875515 6830860875655 1969-12-31 19:00:00 0 7
1 2022-03-02 10:00:00.000929635 1969-12-31 19:00:00 14075.75 1 B - - 0 236996008 6830860875705 6830860875691 1969-12-31 19:00:00 0 25
2 2022-03-02 10:00:00.000929635 1969-12-31 19:00:00 14076.00 1 B - - 0 236996009 6830860875705 6830860875569 1969-12-31 19:00:00 0 26
3 2022-03-02 10:00:00.000929635 1969-12-31 19:00:00 14076.00 1 B - - 0 236996009 6830860875705 6830860875694 1969-12-31 19:00:00 0 27
4 2022-03-02 10:00:00.002409827 1969-12-31 19:00:00 14075.75 1 B - - 0 236996017 6830860875820 6830860875752 1969-12-31 19:00:00 0 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7061 2022-03-02 10:04:59.555729645 1969-12-31 19:00:00 14097.50 1 B - - 0 237027634 6830861638832 6830861637468 1969-12-31 19:00:00 0 1
7062 2022-03-02 10:04:59.659116873 1969-12-31 19:00:00 14097.50 1 B - - 0 237027642 6830861638919 6830861638796 1969-12-31 19:00:00 0 2
7063 2022-03-02 10:04:59.819164663 1969-12-31 19:00:00 14097.50 1 B - - 0 237027658 6830861621964 6830861639003 1969-12-31 19:00:00 0 0
7064 2022-03-02 10:04:59.839107289 1969-12-31 19:00:00 14097.00 1 S - - 0 237027661 6830861639033 6830861639088 1969-12-31 19:00:00 0 0
7065 2022-03-02 10:04:59.885827259 1969-12-31 19:00:00 14096.75 1 S - - 0 237027668 6830861639191 6830861639214 1969-12-31 19:00:00 0 0
[7066 rows x 14 columns]
Prevailing quote at the time of a trade#
onetick-py provides some functions that are specific to time series analysis and do not exist in pandas. Joining two time series by time is one of such functions that makes finding the prevailing quote at the time of each trade a one-liner.
trd = otp.DataSource('CME', tick_type='TRD')
trd = trd[['PRICE', 'SIZE', 'TRADE_TYPE']] # just need these fields
qte = otp.DataSource('CME', tick_type='QTE')
qte['quote_time'] = qte['Time']
qte = qte[['ASK_PRICE', 'BID_PRICE', 'quote_time']] # just need these fields
# the first argument in the list (`trd` in this example) is the "leading" source
# all of the ticks from the leading source are enriched with the closest preceding
# record from the non-leading source(s) (`qte` in this example).
enriched_trades = otp.join_by_time([trd, qte])
enriched_trades = enriched_trades.first(5)
otp.run(enriched_trades, start=start, end=end, symbols=emini)
Time | PRICE | SIZE | TRADE_TYPE | ASK_PRICE | BID_PRICE | quote_time | |
---|---|---|---|---|---|---|---|
0 | 2022-03-02 10:00:00.000789627 | 14074.75 | 1 | - | 14076.00 | 14074.75 | 2022-03-02 10:00:00.000788627 |
1 | 2022-03-02 10:00:00.000929635 | 14075.75 | 1 | - | 14075.75 | 14074.25 | 2022-03-02 10:00:00.000860873 |
2 | 2022-03-02 10:00:00.000929635 | 14076.00 | 1 | - | 14075.75 | 14074.25 | 2022-03-02 10:00:00.000860873 |
3 | 2022-03-02 10:00:00.000929635 | 14076.00 | 1 | - | 14075.75 | 14074.25 | 2022-03-02 10:00:00.000860873 |
4 | 2022-03-02 10:00:00.002409827 | 14075.75 | 1 | - | 14075.75 | 14075.00 | 2022-03-02 10:00:00.001767875 |
Interval VWAP between order arrival and exit#
Common TCA/BestEx use cases involve getting interval benchmarks such as market VWAP and market volume for each order. First we show how to compute the market volume and the VWAP for a single interval.
q = otp.DataSource('CME', tick_type='TRD')
q = q.agg({
'VOLUME': otp.agg.sum('SIZE'),
'VWAP': otp.agg.vwap('PRICE', 'SIZE')
})
otp.run(q, start=start, end=end, symbols=emini)
Time | VOLUME | VWAP | |
---|---|---|---|
0 | 2022-03-02 10:05:00 | 7520 | 14080.28765 |
Next we compute interval metrics for each of the orders arrival/exit intervals.
First we generate a simulated set of orders. Note that the orders/executions can come from any source (e.g., CSV file, FIX log file, database, Python object, OneTick database, etc).
num_orders = 20
arrival_time = [otp.dt(2022, 3, 2, 10) + otp.Milli(x*500) for x in range(0, num_orders)]
exit_time = [x + otp.Milli(random.randint(100,10000)) for x in arrival_time]
symbols = ['NQ\H22', 'ES\H22']
order_df = pd.DataFrame({
'arrival_time': arrival_time,
'exit_time': exit_time,
'ID': [x for x in range(0, num_orders)],
'SYMBOL_NAME': [random.choice(symbols) for x in range(0, num_orders)],
})
order_df
arrival_time | exit_time | ID | SYMBOL_NAME | |
---|---|---|---|---|
0 | 2022-03-02 10:00:00 | 2022-03-02 10:00:04.704000 | 0 | NQ\H22 |
1 | 2022-03-02 10:00:00.500000 | 2022-03-02 10:00:01.364000 | 1 | NQ\H22 |
2 | 2022-03-02 10:00:01 | 2022-03-02 10:00:10.525000 | 2 | ES\H22 |
3 | 2022-03-02 10:00:01.500000 | 2022-03-02 10:00:01.823000 | 3 | ES\H22 |
4 | 2022-03-02 10:00:02 | 2022-03-02 10:00:04.982000 | 4 | ES\H22 |
5 | 2022-03-02 10:00:02.500000 | 2022-03-02 10:00:08.949000 | 5 | NQ\H22 |
6 | 2022-03-02 10:00:03 | 2022-03-02 10:00:05.622000 | 6 | ES\H22 |
7 | 2022-03-02 10:00:03.500000 | 2022-03-02 10:00:05.009000 | 7 | NQ\H22 |
8 | 2022-03-02 10:00:04 | 2022-03-02 10:00:09.621000 | 8 | NQ\H22 |
9 | 2022-03-02 10:00:04.500000 | 2022-03-02 10:00:13.177000 | 9 | NQ\H22 |
10 | 2022-03-02 10:00:05 | 2022-03-02 10:00:11.956000 | 10 | ES\H22 |
11 | 2022-03-02 10:00:05.500000 | 2022-03-02 10:00:06.392000 | 11 | NQ\H22 |
12 | 2022-03-02 10:00:06 | 2022-03-02 10:00:15.282000 | 12 | NQ\H22 |
13 | 2022-03-02 10:00:06.500000 | 2022-03-02 10:00:07.192000 | 13 | NQ\H22 |
14 | 2022-03-02 10:00:07 | 2022-03-02 10:00:13.364000 | 14 | ES\H22 |
15 | 2022-03-02 10:00:07.500000 | 2022-03-02 10:00:14.606000 | 15 | ES\H22 |
16 | 2022-03-02 10:00:08 | 2022-03-02 10:00:10.651000 | 16 | NQ\H22 |
17 | 2022-03-02 10:00:08.500000 | 2022-03-02 10:00:13.265000 | 17 | NQ\H22 |
18 | 2022-03-02 10:00:09 | 2022-03-02 10:00:09.107000 | 18 | ES\H22 |
19 | 2022-03-02 10:00:09.500000 | 2022-03-02 10:00:09.635000 | 19 | NQ\H22 |
Set up the fields that will enable us to query market trades for symbol and the arrival/exit interval of each order in parallel.
# convert the DataFrame with orders into a onetick-py object to enable calls to onetick-py functions
order = otp.Ticks(order_df.to_dict(orient='list'))
order['_PARAM_START_TIME_NANOS'] = order['arrival_time'] # this will be the start time of the query
order['_PARAM_END_TIME_NANOS'] = order['exit_time'] # this will be the end time of the query
# another relevant field is SYMBOL_NAME (already specified) which will be used as the symbol for the query
# You can see the output by running the line below
# otp.run(order, symbols='any', start=otp.dt(2022,12,1), end=otp.dt(2022,12,2))
We can now define a function to compute VWAP and volume for each order. The function takes an argument (symbol
) that is passed automatically when and the order specific info we defined above.
def market_volume_and_vwap(symbol):
trd = otp.DataSource('CME', tick_type='TRD')
vol_vwap = trd.agg({
'VOLUME': otp.agg.sum('SIZE'),
'VWAP': otp.agg.vwap('PRICE', 'SIZE')
}, bucket_time='start')
vol_vwap['ID'] = symbol['ID']
vol_vwap['exit_time'] = symbol['exit_time']
vol_vwap['sym'] = symbol.name # SYMBOL_NAME is stored in a special attribute '.name'
return vol_vwap
market_volume_and_vwap
can now be executed for each order. The computation is parallelized by symbol. (We’ll explain otp.merge
below).
merged = otp.merge([market_volume_and_vwap], symbols=order, identify_input_ts=True)
# merged.to_otq('missing_symbol.otq')
otp.run(merged)
# merged.render()
Time | VOLUME | VWAP | ID | exit_time | sym | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|---|---|---|
0 | 2022-03-02 10:00:00.000 | 221 | 14079.695701 | 0 | 2022-03-02 10:00:04.704 | NQ\H22 | NQ\H22 | TRD |
1 | 2022-03-02 10:00:00.500 | 55 | 14082.763636 | 1 | 2022-03-02 10:00:01.364 | NQ\H22 | NQ\H22 | TRD |
2 | 2022-03-02 10:00:01.000 | 594 | 4341.052698 | 2 | 2022-03-02 10:00:10.525 | ES\H22 | ES\H22 | TRD |
3 | 2022-03-02 10:00:01.500 | 72 | 4340.829861 | 3 | 2022-03-02 10:00:01.823 | ES\H22 | ES\H22 | TRD |
4 | 2022-03-02 10:00:02.000 | 187 | 4340.709677 | 4 | 2022-03-02 10:00:04.982 | ES\H22 | ES\H22 | TRD |
5 | 2022-03-02 10:00:02.500 | 132 | 14079.817829 | 5 | 2022-03-02 10:00:08.949 | NQ\H22 | NQ\H22 | TRD |
6 | 2022-03-02 10:00:03.000 | 132 | 4340.843511 | 6 | 2022-03-02 10:00:05.622 | ES\H22 | ES\H22 | TRD |
7 | 2022-03-02 10:00:03.500 | 34 | 14078.786765 | 7 | 2022-03-02 10:00:05.009 | NQ\H22 | NQ\H22 | TRD |
8 | 2022-03-02 10:00:04.000 | 108 | 14079.523810 | 8 | 2022-03-02 10:00:09.621 | NQ\H22 | NQ\H22 | TRD |
9 | 2022-03-02 10:00:04.500 | 185 | 14078.752747 | 9 | 2022-03-02 10:00:13.177 | NQ\H22 | NQ\H22 | TRD |
10 | 2022-03-02 10:00:05.000 | 350 | 4341.183571 | 10 | 2022-03-02 10:00:11.956 | ES\H22 | ES\H22 | TRD |
11 | 2022-03-02 10:00:05.500 | 22 | 14081.284091 | 11 | 2022-03-02 10:00:06.392 | NQ\H22 | NQ\H22 | TRD |
12 | 2022-03-02 10:00:06.000 | 223 | 14076.611364 | 12 | 2022-03-02 10:00:15.282 | NQ\H22 | NQ\H22 | TRD |
13 | 2022-03-02 10:00:06.500 | 8 | 14079.468750 | 13 | 2022-03-02 10:00:07.192 | NQ\H22 | NQ\H22 | TRD |
14 | 2022-03-02 10:00:07.000 | 267 | 4341.014981 | 14 | 2022-03-02 10:00:13.364 | ES\H22 | ES\H22 | TRD |
15 | 2022-03-02 10:00:07.500 | 331 | 4340.860272 | 15 | 2022-03-02 10:00:14.606 | ES\H22 | ES\H22 | TRD |
16 | 2022-03-02 10:00:08.000 | 70 | 14078.871429 | 16 | 2022-03-02 10:00:10.651 | NQ\H22 | NQ\H22 | TRD |
17 | 2022-03-02 10:00:08.500 | 96 | 14077.903646 | 17 | 2022-03-02 10:00:13.265 | NQ\H22 | NQ\H22 | TRD |
18 | 2022-03-02 10:00:09.000 | 11 | 4340.886364 | 18 | 2022-03-02 10:00:09.107 | ES\H22 | ES\H22 | TRD |
19 | 2022-03-02 10:00:09.500 | 2 | 14079.375000 | 19 | 2022-03-02 10:00:09.635 | NQ\H22 | NQ\H22 | TRD |
Point-in-time benchmarks: BBO at a given time#
Another common use case is find the prevailing quote (best bid/ask) at the time of an execution or at different time intervals (markouts) before/after the execution.
The following snippet find the prevailing quote at a give time.
q = otp.DataSource('CME', tick_type='QTE', back_to_first_tick=600, keep_first_tick_timestamp='BBO_TIME')
q = q[['ASK_PRICE','BID_PRICE', 'BBO_TIME']]
time = otp.dt(2022,3,1,19,25,48,123456)
otp.run(q, start=time, end=time, symbols=emini)
Time | ASK_PRICE | BID_PRICE | BBO_TIME | |
---|---|---|---|---|
0 | 2022-03-01 19:25:48.123456 | 14058.0 | 14056.25 | 2022-03-01 19:25:47.266282997 |
Point-in-time benchmarks: BBO at different offsets (markouts)#
We’ll find the BBO at different offets in seconds around the time of each execution.
We create a simulated set of executions as we did above for orders.
num_execs = 5
exec_time = [otp.dt(2022, 3, 2, 10) + otp.Milli(x*500) for x in range(0, num_execs)]
symbols = ['NQ\H22', 'ES\H22']
exec_df = pd.DataFrame({
'exex_time': exec_time,
'ID': [x for x in range(0, num_execs)],
'SYMBOL_NAME': [random.choice(symbols) for x in range(0, num_execs)],
})
exec_df
exex_time | ID | SYMBOL_NAME | |
---|---|---|---|
0 | 2022-03-02 10:00:00 | 0 | NQ\H22 |
1 | 2022-03-02 10:00:00.500000 | 1 | NQ\H22 |
2 | 2022-03-02 10:00:01 | 2 | NQ\H22 |
3 | 2022-03-02 10:00:01.500000 | 3 | NQ\H22 |
4 | 2022-03-02 10:00:02 | 4 | NQ\H22 |
Create a separate record for each markout with the corresponding timestamp.
markouts = [-.5, 0, .5, 1, 5, 10] # in seconds: e.g., .5 means half a second after the execution
executions_with_markouts_df = pd.DataFrame({
'markout_time': [x + otp.Milli(int(m*1000)) for x in exec_time for m in markouts],
'ID': [x for x in range(0, num_execs) for m in markouts],
'SYMBOL_NAME': list(itertools.chain(*[[random.choice(symbols)] * len(markouts) for x in range(0, num_execs)])),
'markout': [m for x in range(0, num_execs) for m in markouts],
})
executions_with_markouts_df
markout_time | ID | SYMBOL_NAME | markout | |
---|---|---|---|---|
0 | 2022-03-02 09:59:59.500000 | 0 | NQ\H22 | -0.5 |
1 | 2022-03-02 10:00:00 | 0 | NQ\H22 | 0.0 |
2 | 2022-03-02 10:00:00.500000 | 0 | NQ\H22 | 0.5 |
3 | 2022-03-02 10:00:01 | 0 | NQ\H22 | 1.0 |
4 | 2022-03-02 10:00:05 | 0 | NQ\H22 | 5.0 |
5 | 2022-03-02 10:00:10 | 0 | NQ\H22 | 10.0 |
6 | 2022-03-02 10:00:00 | 1 | ES\H22 | -0.5 |
7 | 2022-03-02 10:00:00.500000 | 1 | ES\H22 | 0.0 |
8 | 2022-03-02 10:00:01 | 1 | ES\H22 | 0.5 |
9 | 2022-03-02 10:00:01.500000 | 1 | ES\H22 | 1.0 |
10 | 2022-03-02 10:00:05.500000 | 1 | ES\H22 | 5.0 |
11 | 2022-03-02 10:00:10.500000 | 1 | ES\H22 | 10.0 |
12 | 2022-03-02 10:00:00.500000 | 2 | NQ\H22 | -0.5 |
13 | 2022-03-02 10:00:01 | 2 | NQ\H22 | 0.0 |
14 | 2022-03-02 10:00:01.500000 | 2 | NQ\H22 | 0.5 |
15 | 2022-03-02 10:00:02 | 2 | NQ\H22 | 1.0 |
16 | 2022-03-02 10:00:06 | 2 | NQ\H22 | 5.0 |
17 | 2022-03-02 10:00:11 | 2 | NQ\H22 | 10.0 |
18 | 2022-03-02 10:00:01 | 3 | NQ\H22 | -0.5 |
19 | 2022-03-02 10:00:01.500000 | 3 | NQ\H22 | 0.0 |
20 | 2022-03-02 10:00:02 | 3 | NQ\H22 | 0.5 |
21 | 2022-03-02 10:00:02.500000 | 3 | NQ\H22 | 1.0 |
22 | 2022-03-02 10:00:06.500000 | 3 | NQ\H22 | 5.0 |
23 | 2022-03-02 10:00:11.500000 | 3 | NQ\H22 | 10.0 |
24 | 2022-03-02 10:00:01.500000 | 4 | NQ\H22 | -0.5 |
25 | 2022-03-02 10:00:02 | 4 | NQ\H22 | 0.0 |
26 | 2022-03-02 10:00:02.500000 | 4 | NQ\H22 | 0.5 |
27 | 2022-03-02 10:00:03 | 4 | NQ\H22 | 1.0 |
28 | 2022-03-02 10:00:07 | 4 | NQ\H22 | 5.0 |
29 | 2022-03-02 10:00:12 | 4 | NQ\H22 | 10.0 |
Convert DataFrame to a onetick-py
object and specify start/end times. Note that for the markouts query, the start and end times are the same as we’ll look backwards from the markout timestamp.
executions_with_markouts = otp.Ticks(executions_with_markouts_df.to_dict(orient='list'))
executions_with_markouts['_PARAM_START_TIME_NANOS'] = executions_with_markouts['markout_time']
executions_with_markouts['_PARAM_END_TIME_NANOS'] = executions_with_markouts['markout_time']
Define a function that looks back from the start of the query until it finds the first tick. Execute the function for each execution/markout.
def bbo_at_time(symbol):
q = otp.DataSource('CME', tick_type='QTE', back_to_first_tick=600, keep_first_tick_timestamp='BBO_TIME')
q = q[['ASK_PRICE','BID_PRICE', 'BBO_TIME']]
q['ID'] = symbol['ID']
q['markout'] = symbol['markout']
q['sym'] = symbol.name
return q
merged = otp.merge([bbo_at_time], symbols=executions_with_markouts, identify_input_ts=True)
otp.run(merged)
Time | ASK_PRICE | BID_PRICE | BBO_TIME | ID | markout | sym | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|---|---|---|---|
0 | 2022-03-02 09:59:59.500 | 14074.00 | 14073.25 | 2022-03-02 09:59:59.493561793 | 0 | -0.5 | NQ\H22 | NQ\H22 | QTE |
1 | 2022-03-02 10:00:00.000 | 14076.00 | 14075.25 | 2022-03-02 09:59:59.992733435 | 0 | 0.0 | NQ\H22 | NQ\H22 | QTE |
2 | 2022-03-02 10:00:00.000 | 4339.50 | 4339.25 | 2022-03-02 09:59:59.953397647 | 1 | -0.5 | ES\H22 | ES\H22 | QTE |
3 | 2022-03-02 10:00:00.500 | 14077.75 | 14076.75 | 2022-03-02 10:00:00.485313263 | 0 | 0.5 | NQ\H22 | NQ\H22 | QTE |
4 | 2022-03-02 10:00:00.500 | 4340.25 | 4339.75 | 2022-03-02 10:00:00.483290781 | 1 | 0.0 | ES\H22 | ES\H22 | QTE |
5 | 2022-03-02 10:00:00.500 | 14077.75 | 14076.75 | 2022-03-02 10:00:00.485313263 | 2 | -0.5 | NQ\H22 | NQ\H22 | QTE |
6 | 2022-03-02 10:00:01.000 | 14084.75 | 14084.00 | 2022-03-02 10:00:00.991406067 | 0 | 1.0 | NQ\H22 | NQ\H22 | QTE |
7 | 2022-03-02 10:00:01.000 | 4342.00 | 4341.75 | 2022-03-02 10:00:00.996314127 | 1 | 0.5 | ES\H22 | ES\H22 | QTE |
8 | 2022-03-02 10:00:01.000 | 14084.75 | 14084.00 | 2022-03-02 10:00:00.991406067 | 2 | 0.0 | NQ\H22 | NQ\H22 | QTE |
9 | 2022-03-02 10:00:01.000 | 14084.75 | 14084.00 | 2022-03-02 10:00:00.991406067 | 3 | -0.5 | NQ\H22 | NQ\H22 | QTE |
10 | 2022-03-02 10:00:01.500 | 4342.00 | 4341.50 | 2022-03-02 10:00:01.466008427 | 1 | 1.0 | ES\H22 | ES\H22 | QTE |
11 | 2022-03-02 10:00:01.500 | 14084.75 | 14083.75 | 2022-03-02 10:00:01.494346505 | 2 | 0.5 | NQ\H22 | NQ\H22 | QTE |
12 | 2022-03-02 10:00:01.500 | 14084.75 | 14083.75 | 2022-03-02 10:00:01.494346505 | 3 | 0.0 | NQ\H22 | NQ\H22 | QTE |
13 | 2022-03-02 10:00:01.500 | 14084.75 | 14083.75 | 2022-03-02 10:00:01.494346505 | 4 | -0.5 | NQ\H22 | NQ\H22 | QTE |
14 | 2022-03-02 10:00:02.000 | 14081.25 | 14080.25 | 2022-03-02 10:00:01.972081949 | 2 | 1.0 | NQ\H22 | NQ\H22 | QTE |
15 | 2022-03-02 10:00:02.000 | 14081.25 | 14080.25 | 2022-03-02 10:00:01.972081949 | 3 | 0.5 | NQ\H22 | NQ\H22 | QTE |
16 | 2022-03-02 10:00:02.000 | 14081.25 | 14080.25 | 2022-03-02 10:00:01.972081949 | 4 | 0.0 | NQ\H22 | NQ\H22 | QTE |
17 | 2022-03-02 10:00:02.500 | 14079.00 | 14078.25 | 2022-03-02 10:00:02.499183309 | 3 | 1.0 | NQ\H22 | NQ\H22 | QTE |
18 | 2022-03-02 10:00:02.500 | 14079.00 | 14078.25 | 2022-03-02 10:00:02.499183309 | 4 | 0.5 | NQ\H22 | NQ\H22 | QTE |
19 | 2022-03-02 10:00:03.000 | 14082.00 | 14081.25 | 2022-03-02 10:00:02.989870913 | 4 | 1.0 | NQ\H22 | NQ\H22 | QTE |
20 | 2022-03-02 10:00:05.000 | 14079.75 | 14079.00 | 2022-03-02 10:00:04.972474913 | 0 | 5.0 | NQ\H22 | NQ\H22 | QTE |
21 | 2022-03-02 10:00:05.500 | 4341.50 | 4341.25 | 2022-03-02 10:00:05.492164737 | 1 | 5.0 | ES\H22 | ES\H22 | QTE |
22 | 2022-03-02 10:00:06.000 | 14081.50 | 14081.00 | 2022-03-02 10:00:05.999545921 | 2 | 5.0 | NQ\H22 | NQ\H22 | QTE |
23 | 2022-03-02 10:00:06.500 | 14080.00 | 14079.25 | 2022-03-02 10:00:06.499182037 | 3 | 5.0 | NQ\H22 | NQ\H22 | QTE |
24 | 2022-03-02 10:00:07.000 | 14079.25 | 14078.75 | 2022-03-02 10:00:06.893076527 | 4 | 5.0 | NQ\H22 | NQ\H22 | QTE |
25 | 2022-03-02 10:00:10.000 | 14080.75 | 14079.75 | 2022-03-02 10:00:09.998393413 | 0 | 10.0 | NQ\H22 | NQ\H22 | QTE |
26 | 2022-03-02 10:00:10.500 | 4341.00 | 4340.50 | 2022-03-02 10:00:10.487766465 | 1 | 10.0 | ES\H22 | ES\H22 | QTE |
27 | 2022-03-02 10:00:11.000 | 14076.25 | 14075.50 | 2022-03-02 10:00:10.991862443 | 2 | 10.0 | NQ\H22 | NQ\H22 | QTE |
28 | 2022-03-02 10:00:11.500 | 14077.50 | 14076.75 | 2022-03-02 10:00:11.477362175 | 3 | 10.0 | NQ\H22 | NQ\H22 | QTE |
29 | 2022-03-02 10:00:12.000 | 14077.25 | 14076.50 | 2022-03-02 10:00:11.971269289 | 4 | 10.0 | NQ\H22 | NQ\H22 | QTE |
Upticks/Downticks#
You can mark each trade as an uptick or downtick.
trd = otp.DataSource('CME', tick_type='TRD')
def uptick(row):
if row['PRICE'] == otp.nan or row['PRICE'][-1] == otp.nan:
return otp.nan
if row['PRICE'] > row['PRICE'][-1]:
return 1
elif row['PRICE'] < row['PRICE'][-1]:
return -1
else:
return 0
trd['UPTICK'] = trd.apply(uptick)
trd = trd[['PRICE', 'UPTICK']]
trd = trd.first(5)
otp.run(trd, start=start, end=end, symbols=emini)
Time | PRICE | UPTICK | |
---|---|---|---|
0 | 2022-03-02 10:00:00.000789627 | 14074.75 | NaN |
1 | 2022-03-02 10:00:00.000929635 | 14075.75 | 1.0 |
2 | 2022-03-02 10:00:00.000929635 | 14076.00 | 1.0 |
3 | 2022-03-02 10:00:00.000929635 | 14076.00 | 0.0 |
4 | 2022-03-02 10:00:00.002409827 | 14075.75 | -1.0 |
Symbol Operations#
Quering multiple symbols#
A separate data series for each symbol#
symbols=['AAPL','MSFT'] # sybmols can be specified as a list
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE']]
trd = trd.first(5)
trd_df = otp.run(trd, symbols=symbols, concurrency=8, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
The result is a dict
keyed by symbol.
trd_df
{'AAPL': Time PRICE SIZE EXCHANGE
0 2022-03-01 10:00:00.003529175 165.75 100 Q
1 2022-03-01 10:00:00.003530928 165.75 100 Q
2 2022-03-01 10:00:00.003532720 165.75 100 Q
3 2022-03-01 10:00:00.003676379 165.75 100 Z
4 2022-03-01 10:00:00.004032611 165.75 104 H,
'MSFT': Time PRICE SIZE EXCHANGE
0 2022-03-01 10:00:00.003577531 298.53 50 Q
1 2022-03-01 10:00:00.012147126 298.53 60 K
2 2022-03-01 10:00:00.012251683 298.53 40 K
3 2022-03-01 10:00:00.012524905 298.54 27 P
4 2022-03-01 10:00:00.012602204 298.54 2 P}
trd_df['MSFT']
Time | PRICE | SIZE | EXCHANGE | |
---|---|---|---|---|
0 | 2022-03-01 10:00:00.003577531 | 298.53 | 50 | Q |
1 | 2022-03-01 10:00:00.012147126 | 298.53 | 60 | K |
2 | 2022-03-01 10:00:00.012251683 | 298.53 | 40 | K |
3 | 2022-03-01 10:00:00.012524905 | 298.54 | 27 | P |
4 | 2022-03-01 10:00:00.012602204 | 298.54 | 2 | P |
Ticks from multiple symbols merged into a single data series#
otp.merge
creates a single series ordered by time by merging ticks from per-symbol series.
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
trd = trd.first(5)
merged = otp.merge([trd], symbols=['AAPL','MSFT'], identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time | PRICE | SIZE | EXCHANGE | COND | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|---|---|
0 | 2022-03-01 10:00:00.003529175 | 165.75 | 100 | Q | @ | AAPL | TRD |
1 | 2022-03-01 10:00:00.003530928 | 165.75 | 100 | Q | @ | AAPL | TRD |
2 | 2022-03-01 10:00:00.003532720 | 165.75 | 100 | Q | @ | AAPL | TRD |
3 | 2022-03-01 10:00:00.003577531 | 298.53 | 50 | Q | @ I | MSFT | TRD |
4 | 2022-03-01 10:00:00.003676379 | 165.75 | 100 | Z | @ | AAPL | TRD |
5 | 2022-03-01 10:00:00.004032611 | 165.75 | 104 | H | @F | AAPL | TRD |
6 | 2022-03-01 10:00:00.012147126 | 298.53 | 60 | K | @F I | MSFT | TRD |
7 | 2022-03-01 10:00:00.012251683 | 298.53 | 40 | K | @ I | MSFT | TRD |
8 | 2022-03-01 10:00:00.012524905 | 298.54 | 27 | P | @F I | MSFT | TRD |
9 | 2022-03-01 10:00:00.012602204 | 298.54 | 2 | P | @F I | MSFT | TRD |
Symbols can come from different databases, in which case, the database name and ‘::’ would appear before the symbol name. The schema must be speficied explicitly in this case.
trd = otp.DataSource(tick_type='TRD')
trd = trd.table(PRICE=float, SIZE=int, EXCH_TIME=otp.nsectime, EXCHANGE=otp.string[8])
trd = trd.first(5)
merged = otp.merge([trd], symbols=['IDC1075::FL','MS113::FL.113'], identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time | PRICE | SIZE | EXCH_TIME | EXCHANGE | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|---|---|
0 | 2022-03-01 10:01:27.612 | 2.83 | 2000 | 2022-03-01 10:01:27.582372000 | ALP | IDC1075::FL | TRD |
1 | 2022-03-01 10:02:00.803 | 2.83 | 100 | 2022-03-01 10:02:00.803000000 | CHI | IDC1075::FL | TRD |
2 | 2022-03-01 10:02:00.803 | 2.83 | 600 | 2022-03-01 10:02:00.803162000 | CDX | IDC1075::FL | TRD |
3 | 2022-03-01 10:02:00.914 | 2.83 | 600 | 2022-03-01 10:02:00.803162203 | MS113::FL.113 | TRD | |
4 | 2022-03-01 10:07:05.436 | 2.84 | 3 | 2022-03-01 10:07:05.436496000 | TCM | IDC1075::FL | TRD |
5 | 2022-03-01 10:08:49.335 | 2.84 | 100 | 2022-03-01 10:08:49.422834453 | MS113::FL.113 | TRD | |
6 | 2022-03-01 10:08:49.410 | 2.84 | 55 | 2022-03-01 10:08:49.408000000 | PUR | IDC1075::FL | TRD |
7 | 2022-03-01 10:09:31.254 | 2.83 | 300 | 2022-03-01 10:09:31.435787439 | MS113::FL.113 | TRD | |
8 | 2022-03-01 10:09:31.254 | 2.83 | 100 | 2022-03-01 10:09:31.435920844 | MS113::FL.113 | TRD | |
9 | 2022-03-01 10:09:31.254 | 2.83 | 100 | 2022-03-01 10:09:31.435920850 | MS113::FL.113 | TRD |
Get all symbols from a database#
symbols = otp.Symbols(db='NYSE_TAQ', date=otp.dt(2022, 3, 1))
otp.run(symbols)
Time | SYMBOL_NAME | |
---|---|---|
0 | 2022-03-01 | A |
1 | 2022-03-01 | AA |
2 | 2022-03-01 | AAA |
3 | 2022-03-01 | AAAU |
4 | 2022-03-01 | AAC |
... | ... | ... |
12041 | 2022-03-01 | ZXZZT |
12042 | 2022-03-01 | ZY |
12043 | 2022-03-01 | ZYME |
12044 | 2022-03-01 | ZYNE |
12045 | 2022-03-01 | ZYXI |
12046 rows × 2 columns
Get all symbols that match a pattern#
symbols = otp.Symbols(db=['NYSE_TAQ'], date=otp.dt(2022, 3, 1), pattern='BBA%')
otp.run(symbols)
Time | SYMBOL_NAME | |
---|---|---|
0 | 2022-03-01 | BBAI |
1 | 2022-03-01 | BBAIWS |
2 | 2022-03-01 | BBAR |
3 | 2022-03-01 | BBAX |
Get trades for all symbols (or the ones matching a pattern) from a DB#
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
merged = otp.merge([trd], symbols=otp.Symbols('NYSE_TAQ', pattern='BBA%'), identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time | PRICE | SIZE | EXCHANGE | COND | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|---|---|
0 | 2022-03-01 10:00:01.586188800 | 3.1669 | 5 | D | I | BBAR | TRD |
1 | 2022-03-01 10:00:06.961051648 | 5.8367 | 25 | D | I | BBAI | TRD |
2 | 2022-03-01 10:00:13.122142720 | 3.1700 | 200 | N | BBAR | TRD | |
3 | 2022-03-01 10:00:13.122561280 | 3.1700 | 2136 | K | BBAR | TRD | |
4 | 2022-03-01 10:00:13.122561280 | 3.1700 | 218 | K | BBAR | TRD | |
... | ... | ... | ... | ... | ... | ... | ... |
275 | 2022-03-01 10:14:11.723998464 | 52.4100 | 1 | Z | I | BBAX | TRD |
276 | 2022-03-01 10:14:11.723998464 | 52.4100 | 38 | B | F I | BBAX | TRD |
277 | 2022-03-01 10:14:11.724048640 | 52.4100 | 38 | B | F I | BBAX | TRD |
278 | 2022-03-01 10:14:44.295828480 | 5.7900 | 12 | J | F I | BBAI | TRD |
279 | 2022-03-01 10:14:44.295892992 | 5.7900 | 88 | J | F I | BBAI | TRD |
280 rows × 7 columns
Count the number of trades per symbol#
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
otp.run(count, concurrency=8, start=otp.dt(2022,3,1), end=otp.dt(2022,3,2), symbols=otp.Symbols(db='NYSE_TAQ', pattern='BBA%'))
{'BBAI': Time COUNT
0 2022-03-02 1063,
'BBAIWS': Time COUNT
0 2022-03-02 137,
'BBAR': Time COUNT
0 2022-03-02 3281,
'BBAX': Time COUNT
0 2022-03-02 1115}
Count the number of trades across symbols#
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
count = otp.merge([count], symbols=otp.Symbols(db='NYSE_TAQ', pattern='AA%'), identify_input_ts=True)
total_count = count.agg({'COUNT': otp.agg.sum('COUNT')})
otp.run(total_count, start=otp.dt(2022,3,1), end=otp.dt(2022,3,2))
Time | COUNT | |
---|---|---|
0 | 2022-03-02 | 1160353 |
Find the most traded symbols and query them (Multi-stage queries
)#
%%time
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
count = otp.merge([count], symbols=otp.Symbols(db='NYSE_TAQ', pattern='AA%'), identify_input_ts=True)
most_traded = count.high('COUNT', n=5)
otp.run(most_traded, start=otp.dt(2022,3,1), end=otp.dt(2022,3,2))
CPU times: user 554 ms, sys: 18.1 ms, total: 572 ms
Wall time: 1.09 s
Time | COUNT | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|
0 | 2022-03-02 | 749660 | AAPL | TRD |
1 | 2022-03-02 | 199756 | AAL | TRD |
2 | 2022-03-02 | 131644 | AA | TRD |
3 | 2022-03-02 | 23995 | AAP | TRD |
4 | 2022-03-02 | 15136 | AAWW | TRD |
Now that we have a query that finds the most traded symbols, we can use it to provide these symbols for querying
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
count = otp.merge([count], symbols=otp.Symbols(db='NYSE_TAQ', pattern='AA%'), identify_input_ts=True)
most_traded = count.high('COUNT', n=5)
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE']]
trd = trd.first(5)
merged = otp.merge([trd], symbols=most_traded, identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time | PRICE | SIZE | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|
0 | 2022-03-01 10:00:00.002859008 | 84.030 | 30 | AA | TRD |
1 | 2022-03-01 10:00:00.002949376 | 84.050 | 100 | AA | TRD |
2 | 2022-03-01 10:00:00.003290112 | 84.050 | 118 | AA | TRD |
3 | 2022-03-01 10:00:00.003295744 | 84.060 | 5 | AA | TRD |
4 | 2022-03-01 10:00:00.003515648 | 84.060 | 70 | AA | TRD |
5 | 2022-03-01 10:00:00.003529175 | 165.750 | 100 | AAPL | TRD |
6 | 2022-03-01 10:00:00.003530928 | 165.750 | 100 | AAPL | TRD |
7 | 2022-03-01 10:00:00.003532720 | 165.750 | 100 | AAPL | TRD |
8 | 2022-03-01 10:00:00.003676379 | 165.750 | 100 | AAPL | TRD |
9 | 2022-03-01 10:00:00.004032611 | 165.750 | 104 | AAPL | TRD |
10 | 2022-03-01 10:00:00.007610769 | 78.740 | 50 | AAXJ | TRD |
11 | 2022-03-01 10:00:00.007661983 | 78.750 | 50 | AAXJ | TRD |
12 | 2022-03-01 10:00:00.007663857 | 78.750 | 8 | AAXJ | TRD |
13 | 2022-03-01 10:00:00.017791644 | 16.575 | 100 | AAL | TRD |
14 | 2022-03-01 10:00:00.017915297 | 16.580 | 100 | AAL | TRD |
15 | 2022-03-01 10:00:00.017926970 | 16.580 | 300 | AAL | TRD |
16 | 2022-03-01 10:00:00.018955115 | 16.580 | 68 | AAL | TRD |
17 | 2022-03-01 10:00:00.018963402 | 16.580 | 232 | AAL | TRD |
18 | 2022-03-01 10:00:00.059167744 | 206.190 | 15 | AAP | TRD |
19 | 2022-03-01 10:00:00.059167744 | 206.210 | 3 | AAP | TRD |
20 | 2022-03-01 10:00:00.059202816 | 206.210 | 15 | AAP | TRD |
21 | 2022-03-01 10:00:00.059348992 | 206.080 | 17 | AAP | TRD |
22 | 2022-03-01 10:00:00.059398656 | 206.210 | 15 | AAP | TRD |
23 | 2022-03-01 10:00:00.175845646 | 78.790 | 3 | AAXJ | TRD |
24 | 2022-03-01 10:00:00.190534276 | 78.800 | 2 | AAXJ | TRD |
Symbologies#
onetick-py
can query data using different symbologies.
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
trd = trd.first(1)
merged = otp.merge([trd], symbols=['BTKR::::AAPL US','FGV::::BBG000BPH459'], identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15), symbol_date=otp.dt(2022, 10, 1))
Time | PRICE | SIZE | EXCHANGE | COND | SYMBOL_NAME | TICK_TYPE | |
---|---|---|---|---|---|---|---|
0 | 2022-03-01 10:00:00.003529175 | 165.75 | 100 | Q | @ | BTKR::::AAPL US | TRD |
1 | 2022-03-01 10:00:00.003577531 | 298.53 | 50 | Q | @ I | FGV::::BBG000BPH459 | TRD |
You can view and select by pattern symbols in a non-default symbology.
figi = otp.Symbols('NYSE_TAQ', symbology='FGV', show_original_symbols=True, for_tick_type='TRD', date=otp.dt(2022,12,1))
figi, _ = figi[figi['SYMBOL_NAME']!='']
otp.run(figi)
Time | SYMBOL_NAME | ORIGINAL_SYMBOL_NAME | |
---|---|---|---|
0 | 2022-12-01 | BBG000C2V3D6 | NYSE_TAQ::A |
1 | 2022-12-01 | BBG00B3T3HD3 | NYSE_TAQ::AA |
2 | 2022-12-01 | BBG01B0JRCS6 | NYSE_TAQ::AAA |
3 | 2022-12-01 | BBG00LPXX872 | NYSE_TAQ::AAAU |
4 | 2022-12-01 | BBG00YZC2Z91 | NYSE_TAQ::AAC |
... | ... | ... | ... |
10652 | 2022-12-01 | BBG00YZ2VTP7 | NYSE_TAQ::ZWRK |
10653 | 2022-12-01 | BBG000H8R0N8 | NYSE_TAQ::ZWS |
10654 | 2022-12-01 | BBG019XSYC89 | NYSE_TAQ::ZYME |
10655 | 2022-12-01 | BBG007BBS8B7 | NYSE_TAQ::ZYNE |
10656 | 2022-12-01 | BBG000BJBXZ2 | NYSE_TAQ::ZYXI |
10657 rows × 3 columns
Order book analytics#
onetick-py
offers functions for analyzing tick-by-tick order book. There are three representations of an order book. We’ll show top 3 levels only for ease of exposition.
A book can be displayed with a tick per level per side. We refer to a level in the book as a ‘price level’ or ‘prl’.
snapshot_time=start
prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL', max_levels=3)
otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time)
Time | PRICE | UPDATE_TIME | SIZE | LEVEL | BUY_SELL_FLAG | |
---|---|---|---|---|---|---|
0 | 2022-03-02 10:00:00 | 14076.00 | 2022-03-02 09:59:59.992707035 | 1 | 1 | 1 |
1 | 2022-03-02 10:00:00 | 14076.25 | 2022-03-02 09:59:59.953516579 | 2 | 2 | 1 |
2 | 2022-03-02 10:00:00 | 14076.50 | 2022-03-02 09:59:59.893691735 | 4 | 3 | 1 |
3 | 2022-03-02 10:00:00 | 14075.25 | 2022-03-02 09:59:59.992733435 | 1 | 1 | 0 |
4 | 2022-03-02 10:00:00 | 14075.00 | 2022-03-02 09:59:59.954720897 | 2 | 2 | 0 |
5 | 2022-03-02 10:00:00 | 14074.75 | 2022-03-02 09:59:59.948987231 | 2 | 3 | 0 |
Alternatively, a book can show a tick per level with both ask and bid price/size info.
snapshot_time=start
prl = otp.ObSnapshotWide(db='CME', tick_type='PRL_FULL', max_levels=3)
otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time)
Time | BID_PRICE | BID_UPDATE_TIME | BID_SIZE | ASK_PRICE | ASK_UPDATE_TIME | ASK_SIZE | LEVEL | |
---|---|---|---|---|---|---|---|---|
0 | 2022-03-02 10:00:00 | 14075.25 | 2022-03-02 09:59:59.992733435 | 1 | 14076.00 | 2022-03-02 09:59:59.992707035 | 1 | 1 |
1 | 2022-03-02 10:00:00 | 14075.00 | 2022-03-02 09:59:59.954720897 | 2 | 14076.25 | 2022-03-02 09:59:59.953516579 | 2 | 2 |
2 | 2022-03-02 10:00:00 | 14074.75 | 2022-03-02 09:59:59.948987231 | 2 | 14076.50 | 2022-03-02 09:59:59.893691735 | 4 | 3 |
Finally, all levels can be displayed in one tick.
snapshot_time=start
prl = otp.ObSnapshotFlat(db='CME', tick_type='PRL_FULL', max_levels=3)
print(otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time))
Time BID_PRICE1 BID_UPDATE_TIME1 BID_SIZE1 ASK_PRICE1 ASK_UPDATE_TIME1 ASK_SIZE1 BID_PRICE2 BID_UPDATE_TIME2 BID_SIZE2 ASK_PRICE2 ASK_UPDATE_TIME2 ASK_SIZE2 BID_PRICE3 BID_UPDATE_TIME3 BID_SIZE3 ASK_PRICE3 ASK_UPDATE_TIME3 ASK_SIZE3
0 2022-03-02 10:00:00 14075.25 2022-03-02 09:59:59.992733435 1 14076.0 2022-03-02 09:59:59.992707035 1 14075.0 2022-03-02 09:59:59.954720897 2 14076.25 2022-03-02 09:59:59.953516579 2 14074.75 2022-03-02 09:59:59.948987231 2 14076.5 2022-03-02 09:59:59.893691735 4
We can output the book (in any of the three representation) on every change to price/size at any of the levels.
prl = otp.ObSnapshotFlat(db='CME', tick_type='PRL_FULL', max_levels=3, running=True)
prl = prl.drop(r".+TIME\d")
print(otp.run(prl, symbols=emini, start=start, end=start+otp.Milli(100)))
Time BID_PRICE1 BID_SIZE1 ASK_PRICE1 ASK_SIZE1 BID_PRICE2 BID_SIZE2 ASK_PRICE2 ASK_SIZE2 BID_PRICE3 BID_SIZE3 ASK_PRICE3 ASK_SIZE3
0 2022-03-02 10:00:00.000000000 14075.25 1 14076.00 1 14075.00 2 14076.25 2 14074.75 2 14076.50 4
1 2022-03-02 10:00:00.000788455 14075.00 2 14076.00 1 14074.75 2 14076.25 2 14074.50 2 14076.50 4
2 2022-03-02 10:00:00.000788627 14074.75 2 14076.00 1 14074.50 2 14076.25 2 14074.25 2 14076.50 4
3 2022-03-02 10:00:00.000789627 14074.50 2 14076.00 1 14074.25 2 14076.25 2 14074.00 2 14076.50 4
4 2022-03-02 10:00:00.000790993 14074.50 1 14076.00 1 14074.25 2 14076.25 2 14074.00 2 14076.50 4
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
341 2022-03-02 10:00:00.091403659 14074.50 2 14075.25 1 14074.25 3 14075.50 1 14073.75 2 14075.75 1
342 2022-03-02 10:00:00.091700529 14074.50 2 14075.50 1 14074.25 3 14075.75 1 14073.75 2 14076.00 1
343 2022-03-02 10:00:00.091924533 14074.50 2 14075.50 2 14074.25 3 14075.75 1 14073.75 2 14076.00 1
344 2022-03-02 10:00:00.094879495 14074.50 2 14075.25 1 14074.25 3 14075.50 2 14073.75 2 14075.75 1
345 2022-03-02 10:00:00.097022501 14074.50 2 14075.50 2 14074.25 3 14075.75 1 14073.75 2 14076.00 1
[346 rows x 13 columns]
The ObSnapshot
method doesn’t require specifying max_levels
. The entire book is returned when the parameter is not specified.
snapshot_time=start
prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL')
otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time)
Time | PRICE | UPDATE_TIME | SIZE | LEVEL | BUY_SELL_FLAG | |
---|---|---|---|---|---|---|
0 | 2022-03-02 10:00:00 | 14076.00 | 2022-03-02 09:59:59.992707035 | 1 | 1 | 1 |
1 | 2022-03-02 10:00:00 | 14076.25 | 2022-03-02 09:59:59.953516579 | 2 | 2 | 1 |
2 | 2022-03-02 10:00:00 | 14076.50 | 2022-03-02 09:59:59.893691735 | 4 | 3 | 1 |
3 | 2022-03-02 10:00:00 | 14076.75 | 2022-03-02 09:59:59.893054207 | 2 | 4 | 1 |
4 | 2022-03-02 10:00:00 | 14077.00 | 2022-03-02 09:59:59.334717127 | 2 | 5 | 1 |
... | ... | ... | ... | ... | ... | ... |
1628 | 2022-03-02 10:00:00 | 6490.00 | 2022-03-01 17:59:59.999000000 | 3 | 809 | 0 |
1629 | 2022-03-02 10:00:00 | 1586.00 | 2022-03-01 17:59:59.999000000 | 1 | 810 | 0 |
1630 | 2022-03-02 10:00:00 | 786.50 | 2022-03-01 17:59:59.999000000 | 1 | 811 | 0 |
1631 | 2022-03-02 10:00:00 | 200.00 | 2022-03-01 17:59:59.999000000 | 1 | 812 | 0 |
1632 | 2022-03-02 10:00:00 | 1.00 | 2022-03-01 17:59:59.999000000 | 1 | 813 | 0 |
1633 rows × 6 columns
Book Imbalance#
Let’s find the time weighted book imbalance. The imbalance at a given time is defined as the sum of the bid sizes at the top x levels minus the sum of the ask sizes at the top x levels divided by the sum of these two terms: the values close to 1 mean the book is much heavier on the bid side, close to -1 – on the ask side, equal to zero means the sizes are the same.
x = 3
prl = otp.ObSnapshotWide(db='CME', tick_type='PRL_FULL', max_levels=x, running=True)
prls_df = otp.run(prl, symbols=emini, start=start, end=start+otp.Milli(100))
print(prls_df.head(7))
prl = prl.agg({'ask_vol': otp.agg.sum('ASK_SIZE'), 'bid_vol': otp.agg.sum('BID_SIZE')}, bucket_units='ticks', bucket_interval=x)
prl['imb'] = (prl['bid_vol'] - prl['ask_vol']) / (prl['bid_vol'] + prl['ask_vol'])
prls_df = otp.run(prl, symbols=emini, start=start, end=start+otp.Milli(100))
print(prls_df.head())
imb_stats = prl.agg({
'tw_imb': otp.agg.tw_average('imb'),
'mean': otp.agg.average('imb'),
'stdev': otp.agg.stddev('imb'),
})
print(otp.run(imb_stats, symbols=emini, start=start, end=start+otp.Milli(100)))
Time BID_PRICE BID_UPDATE_TIME BID_SIZE ASK_PRICE ASK_UPDATE_TIME ASK_SIZE LEVEL
0 2022-03-02 10:00:00.000000000 14075.25 2022-03-02 09:59:59.992733435 1 14076.00 2022-03-02 09:59:59.992707035 1 1
1 2022-03-02 10:00:00.000000000 14075.00 2022-03-02 09:59:59.954720897 2 14076.25 2022-03-02 09:59:59.953516579 2 2
2 2022-03-02 10:00:00.000000000 14074.75 2022-03-02 09:59:59.948987231 2 14076.50 2022-03-02 09:59:59.893691735 4 3
3 2022-03-02 10:00:00.000788455 14075.00 2022-03-02 09:59:59.954720897 2 14076.00 2022-03-02 09:59:59.992707035 1 1
4 2022-03-02 10:00:00.000788455 14074.75 2022-03-02 09:59:59.948987231 2 14076.25 2022-03-02 09:59:59.953516579 2 2
5 2022-03-02 10:00:00.000788455 14074.50 2022-03-02 09:59:59.893306297 2 14076.50 2022-03-02 09:59:59.893691735 4 3
6 2022-03-02 10:00:00.000788627 14074.75 2022-03-02 09:59:59.948987231 2 14076.00 2022-03-02 09:59:59.992707035 1 1
Time ask_vol bid_vol imb
0 2022-03-02 10:00:00.000000000 7 5 -0.166667
1 2022-03-02 10:00:00.000788455 7 6 -0.076923
2 2022-03-02 10:00:00.000788627 7 6 -0.076923
3 2022-03-02 10:00:00.000789627 7 6 -0.076923
4 2022-03-02 10:00:00.000790993 7 5 -0.166667
Time tw_imb mean stdev
0 2022-03-02 10:00:00.100 0.018318 0.040778 0.204065
Book sweep#
There are two version of book sweep: by price and by quantity. Book sweep by price, take a price as an input and returns the total quatity available at that price or better. Book sweep by quantity, takes a quantity as an input and returns the VWAP if the quantity were executed immediately.
def side_to_direction(side):
return 1 if side == 'ASK' else -1
def sweep_by_price(side, price):
prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL', side=side)
direction = side_to_direction(side)
prl, _ = prl[direction * prl['PRICE'] <= direction * price]
prl = prl.agg({'total_qty': otp.agg.sum('SIZE')})
return otp.run(prl, symbols=emini, start=start, end=start)
print(sweep_by_price('BID', 14075))
print(sweep_by_price('ASK', 14077))
Time total_qty
0 2022-03-02 10:00:00 3
Time total_qty
0 2022-03-02 10:00:00 11
def sweep_by_qty(side, qty):
prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL', side=side)
prl = prl.agg({'total_qty': otp.agg.sum('SIZE')}, running=True, all_fields=True)
direction = side_to_direction(side)
prl, _ = prl[prl['total_qty'] - prl['SIZE'] < qty]
# update the SIZE in the last tick only so that total_qty is exactly qty
prl['SIZE'] = prl.apply(lambda tick: prl['SIZE'] - (prl['total_qty'] - qty) if prl['total_qty'] > qty else prl['SIZE'])
prl = prl.agg({'VWAP': otp.agg.vwap('PRICE', 'SIZE')})
return otp.run(prl, symbols=emini, start=start, end=start)
print(sweep_by_qty('BID', 10))
print(sweep_by_qty('ASK', 10))
Time VWAP
0 2022-03-02 10:00:00 14074.625
Time VWAP
0 2022-03-02 10:00:00 14076.5
Market By Order#
Order Book data may be annotated with ‘key’ fields lets you break down the book by each value of the ‘key’ fields. For example, a book could by keyed by market participant ID, allowing you to see the book with the orders of a given market participant only. Some exchanges provide ‘market-by-order’ data where the book is keyed by order id. Set show_full_detail
to True
to see the book broken down to the most granular level. The example below is a market-by-order book.
prl = otp.ObSnapshot('CME', tick_type='PRL_FULL', side='BID', show_full_detail=True)
prl = prl.first(5)
print(otp.run(prl, symbols=emini, start=start, end=start))
Time ORDER_ID BUY_SELL_FLAG ORDER_TYPE PRICE SIZE TIME_PRIORITY RECORD_TYPE DELETED_TIME TICK_STATUS OMDSEQ LEVEL UPDATE_TIME
0 2022-03-02 10:00:00 6830860875633 0 L 14075.25 1 43132163778 R 1969-12-31 19:00:00 0 3 1 2022-03-02 09:59:59.992733435
1 2022-03-02 10:00:00 6830860875525 0 L 14075.00 1 43132163642 R 1969-12-31 19:00:00 0 11 2 2022-03-02 09:59:59.949569201
2 2022-03-02 10:00:00 6830860875517 0 L 14075.00 1 43132163634 R 1969-12-31 19:00:00 0 3 2 2022-03-02 09:59:59.949264581
3 2022-03-02 10:00:00 6830860875515 0 L 14074.75 1 43132163632 R 1969-12-31 19:00:00 0 0 3 2022-03-02 09:59:59.948987231
4 2022-03-02 10:00:00 6830860875480 0 L 14074.75 1 43132163583 R 1969-12-31 19:00:00 0 3 3 2022-03-02 09:59:59.910612819
Market-by-order data can be used to analyze/validate the priority mechanism used by the exchange.``
prl = otp.ObSnapshot('CME', tick_type='PRL_FULL', side='BID', show_full_detail=True)
"""
ORDER_TYPE:
L = Limit order
I = Implied order
Implied liquidity doesn’t have priority as it's always last to execute at any price level.
It also doesn’t have an order ID, so the IDs that you see in the db are synthetic
(consisting of 1 or 2 for the 1st/2nd implied level, and E/F for the buy/sell side respectively).
In order to rank the orders within a given price point by priority, you need to sort first by ORDER_TYPE (“L” comes before “I”),
then by TIME_PRIORITY (lowest value comes first).
"""
prl = prl.sort(['LEVEL','ORDER_TYPE', 'TIME_PRIORITY'],ascending=[True,False, True])
orders = otp.run(prl, symbols=emini, start=start, end=start);
orders = orders[['ORDER_ID', 'PRICE', 'LEVEL', 'TIME_PRIORITY','SIZE', 'BUY_SELL_FLAG', 'ORDER_TYPE', 'RECORD_TYPE', 'UPDATE_TIME', 'TICK_STATUS']]
orders.head()
ORDER_ID | PRICE | LEVEL | TIME_PRIORITY | SIZE | BUY_SELL_FLAG | ORDER_TYPE | RECORD_TYPE | UPDATE_TIME | TICK_STATUS | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 6830860875633 | 14075.25 | 1 | 43132163778 | 1 | 0 | L | R | 2022-03-02 09:59:59.992733435 | 0 |
1 | 6830860875517 | 14075.00 | 2 | 43132163634 | 1 | 0 | L | R | 2022-03-02 09:59:59.949264581 | 0 |
2 | 6830860875525 | 14075.00 | 2 | 43132163642 | 1 | 0 | L | R | 2022-03-02 09:59:59.949569201 | 0 |
3 | 6830860875480 | 14074.75 | 3 | 43132163583 | 1 | 0 | L | R | 2022-03-02 09:59:59.910612819 | 0 |
4 | 6830860875515 | 14074.75 | 3 | 43132163632 | 1 | 0 | L | R | 2022-03-02 09:59:59.948987231 | 0 |
Best practices#
The golden rule of otp: call otp.run
as late as possible#
Processing before the otp.run
call is good because
it takes place in OneTick which is optimized for time series processing
it is parallelized by symbol and query (e.g., when “join with query” is called)
it takes place in a scalable environment
Processing after the otp.run
call is bad because
it takes place in python which is slow
it takes place in the notebook which is limited by a fixed and normally small instance size where the notebook runs
Bad:
trd = otp.DataSource('CME', tick_type='TRD')
trd_df = otp.run(trd, start=start, end=end, symbols=emini)
trd_df['SIZE'].agg('sum')
7520
Good:
trd = otp.DataSource('CME', tick_type='TRD')
trd = trd.agg({'VOLUME': otp.agg.sum(trd['SIZE'])})
otp.run(trd, start=start, end=end, symbols=emini)
Time | VOLUME | |
---|---|---|
0 | 2022-03-02 10:05:00 | 7520 |
Bad:
trd = otp.DataSource('CME', tick_type='TRD')
trd_df = otp.run(trd, start=start, end=end, symbols=emini)
print(trd_df[trd_df['TRADE_TYPE'] == 'LEG'].head())
Time EXCH_TIME PRICE SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE TRADE_ID BUY_ORDER_ID SELL_ORDER_ID DELETED_TIME TICK_STATUS OMDSEQ
271 2022-03-02 10:00:07.592961899 1969-12-31 19:00:00 NaN 1 LEG - 1 1969-12-31 19:00:00 0 9
272 2022-03-02 10:00:07.593666573 1969-12-31 19:00:00 NaN 2 LEG - 1 1969-12-31 19:00:00 0 0
3114 2022-03-02 10:02:20.046438331 1969-12-31 19:00:00 NaN 3 LEG - 1 1969-12-31 19:00:00 0 0
4753 2022-03-02 10:03:15.584272405 1969-12-31 19:00:00 NaN 1 LEG - 1 1969-12-31 19:00:00 0 3
5328 2022-03-02 10:03:35.612187685 1969-12-31 19:00:00 NaN 1 LEG - 1 1969-12-31 19:00:00 0 0
Good:
trd = otp.DataSource('CME', tick_type='TRD')
leg, notleg = trd[trd['TRADE_TYPE'] == 'LEG']
leg = leg.first(5)
print(otp.run(leg, start=start, end=end, symbols=emini))
Time EXCH_TIME PRICE SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE TRADE_ID BUY_ORDER_ID SELL_ORDER_ID DELETED_TIME TICK_STATUS OMDSEQ
0 2022-03-02 10:00:07.592961899 1969-12-31 19:00:00 NaN 1 LEG - 1 1969-12-31 19:00:00 0 9
1 2022-03-02 10:00:07.593666573 1969-12-31 19:00:00 NaN 2 LEG - 1 1969-12-31 19:00:00 0 0
2 2022-03-02 10:02:20.046438331 1969-12-31 19:00:00 NaN 3 LEG - 1 1969-12-31 19:00:00 0 0
3 2022-03-02 10:03:15.584272405 1969-12-31 19:00:00 NaN 1 LEG - 1 1969-12-31 19:00:00 0 3
4 2022-03-02 10:03:35.612187685 1969-12-31 19:00:00 NaN 1 LEG - 1 1969-12-31 19:00:00 0 0
Bad:
trd = otp.DataSource('CME', tick_type='TRD')
trd_df = otp.run(trd, start=start, end=end, symbols=emini)
trd_df['TRADE_TYPE'].unique()
array(['-', 'LEG'], dtype=object)
Good:
trd = otp.DataSource('CME', tick_type='TRD')
trd = trd.distinct(keys=['TRADE_TYPE'])
otp.run(trd, start=start, end=end, symbols=emini)
Time | TRADE_TYPE | |
---|---|---|
0 | 2022-03-02 10:05:00 | - |
1 | 2022-03-02 10:05:00 | LEG |
More advanced concepts#
Timezones#
# this sets the timezone for all of the otp calls until its redefined
otp.config['tz'] = 'GMT'
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
print(otp.run(trd.first()[['PRICE']],start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))
otp.config['tz'] = 'America/Chicago'
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
print(otp.run(trd.first()[['PRICE']],start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))
otp.config['tz'] = 'EST5EDT'
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
print(otp.run(trd.first()[['PRICE']],start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))
Time PRICE
0 2022-03-01 10:00:04.133843068 164.01
Time PRICE
0 2022-03-01 10:00:00.010168563 165.2822
Time PRICE
0 2022-03-01 10:00:00.003529175 165.75
# you can also pass a timezone to otp.run()
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
print(otp.run(trd,timezone='GMT').head())
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
print(otp.run(trd,timezone='America/Chicago').head())
Time EXCHANGE COND STOP_STOCK SOURCE TRF TTE TICKER PRICE DELETED_TIME TICK_STATUS SIZE CORR SEQ_NUM TRADE_ID PARTICIPANT_TIME TRF_TIME OMDSEQ
0 2022-03-01 10:00:04.133843068 P @ TI N 0 AAPL 164.01 1970-01-01 0 1 0 18621 452 2022-03-01 10:00:04.133499136 1970-01-01 0
1 2022-03-01 10:00:08.519225755 P @ TI N 0 AAPL 164.00 1970-01-01 0 1 0 18635 453 2022-03-01 10:00:08.518881024 1970-01-01 0
2 2022-03-01 10:00:15.746801681 Q @FTI N 1 AAPL 164.01 1970-01-01 0 1 0 18649 405 2022-03-01 10:00:15.746784019 1970-01-01 0
3 2022-03-01 10:00:30.490236820 P @ TI N 0 AAPL 164.00 1970-01-01 0 1 0 18686 454 2022-03-01 10:00:30.489892608 1970-01-01 0
4 2022-03-01 10:00:30.497604203 K @FTI N 1 AAPL 164.01 1970-01-01 0 1 0 18687 131 2022-03-01 10:00:30.497364000 1970-01-01 0
Time EXCHANGE COND STOP_STOCK SOURCE TRF TTE TICKER PRICE DELETED_TIME TICK_STATUS SIZE CORR SEQ_NUM TRADE_ID PARTICIPANT_TIME TRF_TIME OMDSEQ
0 2022-03-01 10:00:00.010168563 D @4 W N Q 1 AAPL 165.2822 1969-12-31 18:00:00 0 109 0 2283964 53706 2022-03-01 10:00:00.000085000 2022-03-01 10:00:00.010144942 0
1 2022-03-01 10:00:00.012746348 Z @ N 0 AAPL 165.1700 1969-12-31 18:00:00 0 100 0 2283965 31202 2022-03-01 10:00:00.012545000 1969-12-31 18:00:00.000000000 0
2 2022-03-01 10:00:00.012754964 Z @ N 0 AAPL 165.1700 1969-12-31 18:00:00 0 100 0 2283966 31203 2022-03-01 10:00:00.012545000 1969-12-31 18:00:00.000000000 1
3 2022-03-01 10:00:00.012757825 Z @ N 0 AAPL 165.1700 1969-12-31 18:00:00 0 100 0 2283967 31204 2022-03-01 10:00:00.012545000 1969-12-31 18:00:00.000000000 2
4 2022-03-01 10:00:00.012902668 U @ N 0 AAPL 165.1700 1969-12-31 18:00:00 0 100 0 2283968 12450 2022-03-01 10:00:00.012689732 1969-12-31 18:00:00.000000000 3
Reverting to OneTick when otp is not enough: the sink() method#
We are working to extend otp to cover most of the cases but OneTick is big and powerful and oftentimes it offers functionality that is not yet available in otp. The ‘sink()’ method lets you (pretty seamlessly) transition to a lower level API called onetick.query that supports all of OneTick features. See docs/onetick.query.api.python.html in OneTick documentation.
import onetick.query as otq
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd.sink(otq.Variance(input_field_name='PRICE', output_field_name='VAR_PRICE', bucket_interval=60))
trd = trd.table(VAR_PRICE=float, strict=False) # need to explicitly add the fields added by the `sink` methods to the schema
trd['std'] = otp.math.sqrt(trd['VAR_PRICE'])
otp.run(trd, start=otp.dt(2022,9,1,10), end=otp.dt(2022,9,1,16), symbols='AA')
Time | VAR_PRICE | std | |
---|---|---|---|
0 | 2022-09-01 10:01:00 | 0.016861 | 0.129851 |
1 | 2022-09-01 10:02:00 | 0.001083 | 0.032903 |
2 | 2022-09-01 10:03:00 | 0.004457 | 0.066762 |
3 | 2022-09-01 10:04:00 | 0.001522 | 0.039015 |
4 | 2022-09-01 10:05:00 | 0.001840 | 0.042895 |
... | ... | ... | ... |
355 | 2022-09-01 15:56:00 | 0.000230 | 0.015159 |
356 | 2022-09-01 15:57:00 | 0.000491 | 0.022159 |
357 | 2022-09-01 15:58:00 | 0.000508 | 0.022529 |
358 | 2022-09-01 15:59:00 | 0.000034 | 0.005794 |
359 | 2022-09-01 16:00:00 | 0.000625 | 0.024996 |
360 rows × 3 columns
Filters: using both if
and else
branches#
Filters return two objects: one for the ticks that satisfy the conditions and the other one for the ones that don’t. Having both objects allows combining the two streams without duplication anything that happens before the filter. An example is below.
# this processing happens before a filter
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
trd = trd.first(5)
print(otp.run(trd, symbols='AAPL', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))
# the filter is applied
q, non_q = trd[trd['EXCHANGE']=='Q']
# different processing can be applied to the different braches of the filter
q = q.first()
# the two branches can them be merged
merged = otp.merge([q, non_q])
otp.run(merged, symbols='AAPL', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time PRICE SIZE EXCHANGE COND
0 2022-03-01 10:00:00.003529175 165.75 100 Q @
1 2022-03-01 10:00:00.003530928 165.75 100 Q @
2 2022-03-01 10:00:00.003532720 165.75 100 Q @
3 2022-03-01 10:00:00.003676379 165.75 100 Z @
4 2022-03-01 10:00:00.004032611 165.75 104 H @F
Time | PRICE | SIZE | EXCHANGE | COND | |
---|---|---|---|---|---|
0 | 2022-03-01 10:00:00.003529175 | 165.75 | 100 | Q | @ |
1 | 2022-03-01 10:00:00.003676379 | 165.75 | 100 | Z | @ |
2 | 2022-03-01 10:00:00.004032611 | 165.75 | 104 | H | @F |