Quick Start#

onetick-py is a pandas-like API for querying data (in particular, tick-by-tick market data) in OneTick. This notebooks follows common business use cases to illustrate how they can be implemented with onetick-py.

# %%capture
# !pip install -U onetick.py
import pandas as pd
import random
import itertools

import onetick.py as otp
# set default onetick.py configuration
from datetime import datetime
otp.config['default_start_time'] = datetime(2003, 12, 1, 0, 0, 0)
otp.config['default_end_time'] = datetime(2003, 12, 4, 0, 0, 0)
otp.config['tz'] = 'EST5EDT' # set the timezone
otp.config['default_db'] = 'DEMO_L1'
otp.config['default_symbol'] = 'AAPL'

# define some parameters to re-use
start = otp.dt(2022, 3, 2, 10)
end = otp.dt(2022, 3, 2, 10, 5)
emini = 'NQ\H22' #symbol for Nasdaq-100 E-mini future

print(otp.__version__)
1.58.0
from datetime import datetime
otp.config['tz'] = 'EST5EDT'
otp.config['default_db'] = 'DEMO_L1'
otp.config['default_symbol'] = 'AAPL'
otp.config['default_start_time'] = datetime(2003, 12, 1, 0, 0, 0)
otp.config['default_end_time'] = datetime(2003, 12, 4, 0, 0, 0)

Retrieving data#

Market data is time series data: i.e., each record or “tick” has a timestamp and ticks are ordered by timestamp. Our focus will be on trades, quotes, and order book data. We choose which type of data we look at by specifing the tick type: TRD for trades, QTE for quotes, and PRL for order books. The data is organized by the database, tick type, and security.

# create a query to retrieve data from the TRD tick type of the CME OneTick database
trd = otp.DataSource('CME', tick_type='TRD')
print(trd)
<onetick.py.sources.DataSource object at 0x7f7900545c40>

Observe that the output is a onetick-py DataSource object rather than the data. To retrieve the data into a pandas DataFrame, we invoke the otp.run method while providing it the time range and the symbols (instead otp.run(), we can use the shorthand trd.to_df() or just trd()).

trd_df = otp.run(trd, start=otp.dt(2022, 3, 2, 10), end=otp.dt(2022, 3, 2, 10, 1), symbols='NQ\H22')
print(trd_df)
                              Time           EXCH_TIME     PRICE  SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE   TRADE_ID   BUY_ORDER_ID  SELL_ORDER_ID        DELETED_TIME  TICK_STATUS  OMDSEQ
0    2022-03-02 10:00:00.000789627 1969-12-31 19:00:00  14074.75     1              S          -            -         0  236995996  6830860875515  6830860875655 1969-12-31 19:00:00            0       7
1    2022-03-02 10:00:00.000929635 1969-12-31 19:00:00  14075.75     1              B          -            -         0  236996008  6830860875705  6830860875691 1969-12-31 19:00:00            0      25
2    2022-03-02 10:00:00.000929635 1969-12-31 19:00:00  14076.00     1              B          -            -         0  236996009  6830860875705  6830860875569 1969-12-31 19:00:00            0      26
3    2022-03-02 10:00:00.000929635 1969-12-31 19:00:00  14076.00     1              B          -            -         0  236996009  6830860875705  6830860875694 1969-12-31 19:00:00            0      27
4    2022-03-02 10:00:00.002409827 1969-12-31 19:00:00  14075.75     1              B          -            -         0  236996017  6830860875820  6830860875752 1969-12-31 19:00:00            0       6
...                            ...                 ...       ...   ...            ...        ...          ...       ...        ...            ...            ...                 ...          ...     ...
1295 2022-03-02 10:00:59.721897255 1969-12-31 19:00:00  14078.50     1              B          -            -         0  237002551  6830861044714  6830861044694 1969-12-31 19:00:00            0      26
1296 2022-03-02 10:00:59.765659765 1969-12-31 19:00:00  14078.75     1              B          -            -         0  237002557  6830861044831  6830861044659 1969-12-31 19:00:00            0       0
1297 2022-03-02 10:00:59.765659765 1969-12-31 19:00:00  14078.75     1              B          -            -         0  237002557  6830861044831  6830861044742 1969-12-31 19:00:00            0       1
1298 2022-03-02 10:00:59.797208925 1969-12-31 19:00:00  14077.75     1              S          -            -         0  237002575  6830861044437  6830861045124 1969-12-31 19:00:00            0       0
1299 2022-03-02 10:00:59.797208925 1969-12-31 19:00:00  14077.75     1              S          -            -         0  237002575  6830861044482  6830861045124 1969-12-31 19:00:00            0       1

[1300 rows x 14 columns]

We would rarely want to just retrieve market data (except for plotting it). Typical use cases involve computing analytics based on the tick-by-tick data. This is what onetick-py makes easy to do while keeping all of the processing in the highly optimized, parallelizable, and scalable tick management platform OneTick. The results of the analytics are returned into Python when the query is executed. We cover a few standard use cases next.

Create minute bars from tick data#

The code below prepares a query (a onetick-py DataSource object) to compute minute bars.

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
bars = trd.agg({'VOLUME': otp.agg.sum('SIZE'),
               'HIGH': otp.agg.max('PRICE'),
               'LOW': otp.agg.min('PRICE'),
               'OPEN': otp.agg.first('PRICE'),
                'COUNT': otp.agg.count(),
               'CLOSE': otp.agg.last('PRICE')},
               bucket_interval=60)
print(bars)
<onetick.py.sources.DataSource object at 0x7f7885eade20>

Only the bars, not the tick-by-tick trades used to compute them, are passed to Python when the query is executed.

bars_df = otp.run(bars, start=otp.dt(2022,11,28,9,30), end=otp.dt(2022,11,30,16), symbols=['AA'], apply_times_daily=True)
bars_df
Time VOLUME HIGH LOW OPEN COUNT CLOSE
0 2022-11-28 09:31:00 91637 47.1100 46.700 46.9200 209 46.7160
1 2022-11-28 09:32:00 29902 46.7899 46.411 46.7899 381 46.4900
2 2022-11-28 09:33:00 26588 46.6600 46.460 46.4900 227 46.6400
3 2022-11-28 09:34:00 14388 46.8600 46.560 46.5900 136 46.6850
4 2022-11-28 09:35:00 14871 46.8300 46.570 46.7100 153 46.7943
... ... ... ... ... ... ... ...
1165 2022-11-30 15:56:00 54586 50.1700 50.000 50.1300 730 50.0500
1166 2022-11-30 15:57:00 72734 50.0679 50.010 50.0550 792 50.0200
1167 2022-11-30 15:58:00 54803 50.0800 50.020 50.0200 706 50.0700
1168 2022-11-30 15:59:00 88486 50.0900 50.040 50.0800 920 50.0600
1169 2022-11-30 16:00:00 134670 50.1800 50.060 50.0600 929 50.1400

1170 rows × 7 columns

Note: We have minute bars precomputed and available in, e.g., NYSE_TAQ_BARS under tick type TRD_1M. Daily OHLCV data is also available.

Quick tip: you can display documentation for any object by typing “?” after it.

otp.math.exp?

Filters#

Filters are specified the same way as in pandas DataFrames with one caveat. Two objects are returned instead of one: one for to the ticks that pass the filter and the other for the ticks that don’t (which you can store in _ if you don’t need it). We show an example of how to use both objects when we cover advanced concepts below.

trd = otp.DataSource('CME', tick_type='TRD')
trd, _ = trd[(trd['TRADE_PERIOD']!='A') & (trd['TRADE_TYPE'] != 'LEG')] # keep only non-leg and non-auction trades
# trd = trd.first(5) # retrieve the first 5 records only
print(trd) # no data retrieval has been done yet
print(otp.run(trd, start=start, end=end, symbols=emini))
<onetick.py.sources.DataSource object at 0x7f7885ead370>
                              Time           EXCH_TIME     PRICE  SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE   TRADE_ID   BUY_ORDER_ID  SELL_ORDER_ID        DELETED_TIME  TICK_STATUS  OMDSEQ
0    2022-03-02 10:00:00.000789627 1969-12-31 19:00:00  14074.75     1              S          -            -         0  236995996  6830860875515  6830860875655 1969-12-31 19:00:00            0       7
1    2022-03-02 10:00:00.000929635 1969-12-31 19:00:00  14075.75     1              B          -            -         0  236996008  6830860875705  6830860875691 1969-12-31 19:00:00            0      25
2    2022-03-02 10:00:00.000929635 1969-12-31 19:00:00  14076.00     1              B          -            -         0  236996009  6830860875705  6830860875569 1969-12-31 19:00:00            0      26
3    2022-03-02 10:00:00.000929635 1969-12-31 19:00:00  14076.00     1              B          -            -         0  236996009  6830860875705  6830860875694 1969-12-31 19:00:00            0      27
4    2022-03-02 10:00:00.002409827 1969-12-31 19:00:00  14075.75     1              B          -            -         0  236996017  6830860875820  6830860875752 1969-12-31 19:00:00            0       6
...                            ...                 ...       ...   ...            ...        ...          ...       ...        ...            ...            ...                 ...          ...     ...
7061 2022-03-02 10:04:59.555729645 1969-12-31 19:00:00  14097.50     1              B          -            -         0  237027634  6830861638832  6830861637468 1969-12-31 19:00:00            0       1
7062 2022-03-02 10:04:59.659116873 1969-12-31 19:00:00  14097.50     1              B          -            -         0  237027642  6830861638919  6830861638796 1969-12-31 19:00:00            0       2
7063 2022-03-02 10:04:59.819164663 1969-12-31 19:00:00  14097.50     1              B          -            -         0  237027658  6830861621964  6830861639003 1969-12-31 19:00:00            0       0
7064 2022-03-02 10:04:59.839107289 1969-12-31 19:00:00  14097.00     1              S          -            -         0  237027661  6830861639033  6830861639088 1969-12-31 19:00:00            0       0
7065 2022-03-02 10:04:59.885827259 1969-12-31 19:00:00  14096.75     1              S          -            -         0  237027668  6830861639191  6830861639214 1969-12-31 19:00:00            0       0

[7066 rows x 14 columns]
Note: All code before `otp.run()` is executed in OneTick rather than in pandas/Python.

Prevailing quote at the time of a trade#

onetick-py provides some functions that are specific to time series analysis and do not exist in pandas. Joining two time series by time is one of such functions that makes finding the prevailing quote at the time of each trade a one-liner.

trd = otp.DataSource('CME', tick_type='TRD')
trd = trd[['PRICE', 'SIZE', 'TRADE_TYPE']] # just need these fields

qte = otp.DataSource('CME', tick_type='QTE')
qte['quote_time'] = qte['Time']
qte = qte[['ASK_PRICE', 'BID_PRICE', 'quote_time']] # just need these fields

# the first argument in the list (`trd` in this example) is the "leading" source
# all of the ticks from the leading source are enriched with the closest preceding 
# record from the non-leading source(s) (`qte` in this example).
enriched_trades = otp.join_by_time([trd, qte])

enriched_trades = enriched_trades.first(5)

otp.run(enriched_trades, start=start, end=end, symbols=emini)
Time PRICE SIZE TRADE_TYPE ASK_PRICE BID_PRICE quote_time
0 2022-03-02 10:00:00.000789627 14074.75 1 - 14076.00 14074.75 2022-03-02 10:00:00.000788627
1 2022-03-02 10:00:00.000929635 14075.75 1 - 14075.75 14074.25 2022-03-02 10:00:00.000860873
2 2022-03-02 10:00:00.000929635 14076.00 1 - 14075.75 14074.25 2022-03-02 10:00:00.000860873
3 2022-03-02 10:00:00.000929635 14076.00 1 - 14075.75 14074.25 2022-03-02 10:00:00.000860873
4 2022-03-02 10:00:00.002409827 14075.75 1 - 14075.75 14075.00 2022-03-02 10:00:00.001767875

Interval VWAP between order arrival and exit#

Common TCA/BestEx use cases involve getting interval benchmarks such as market VWAP and market volume for each order. First we show how to compute the market volume and the VWAP for a single interval.

q = otp.DataSource('CME', tick_type='TRD')
q = q.agg({
'VOLUME': otp.agg.sum('SIZE'),
'VWAP': otp.agg.vwap('PRICE', 'SIZE')
})
otp.run(q, start=start, end=end, symbols=emini)
Time VOLUME VWAP
0 2022-03-02 10:05:00 7520 14080.28765

Next we compute interval metrics for each of the orders arrival/exit intervals.

First we generate a simulated set of orders. Note that the orders/executions can come from any source (e.g., CSV file, FIX log file, database, Python object, OneTick database, etc).

num_orders = 20
arrival_time = [otp.dt(2022, 3, 2, 10) + otp.Milli(x*500) for x in range(0, num_orders)]
exit_time = [x + otp.Milli(random.randint(100,10000)) for x in arrival_time]
symbols = ['NQ\H22', 'ES\H22']

order_df = pd.DataFrame({
    'arrival_time': arrival_time, 
    'exit_time': exit_time, 
    'ID': [x for x in range(0, num_orders)],
    'SYMBOL_NAME': [random.choice(symbols) for x in range(0, num_orders)],
})
order_df
arrival_time exit_time ID SYMBOL_NAME
0 2022-03-02 10:00:00 2022-03-02 10:00:04.704000 0 NQ\H22
1 2022-03-02 10:00:00.500000 2022-03-02 10:00:01.364000 1 NQ\H22
2 2022-03-02 10:00:01 2022-03-02 10:00:10.525000 2 ES\H22
3 2022-03-02 10:00:01.500000 2022-03-02 10:00:01.823000 3 ES\H22
4 2022-03-02 10:00:02 2022-03-02 10:00:04.982000 4 ES\H22
5 2022-03-02 10:00:02.500000 2022-03-02 10:00:08.949000 5 NQ\H22
6 2022-03-02 10:00:03 2022-03-02 10:00:05.622000 6 ES\H22
7 2022-03-02 10:00:03.500000 2022-03-02 10:00:05.009000 7 NQ\H22
8 2022-03-02 10:00:04 2022-03-02 10:00:09.621000 8 NQ\H22
9 2022-03-02 10:00:04.500000 2022-03-02 10:00:13.177000 9 NQ\H22
10 2022-03-02 10:00:05 2022-03-02 10:00:11.956000 10 ES\H22
11 2022-03-02 10:00:05.500000 2022-03-02 10:00:06.392000 11 NQ\H22
12 2022-03-02 10:00:06 2022-03-02 10:00:15.282000 12 NQ\H22
13 2022-03-02 10:00:06.500000 2022-03-02 10:00:07.192000 13 NQ\H22
14 2022-03-02 10:00:07 2022-03-02 10:00:13.364000 14 ES\H22
15 2022-03-02 10:00:07.500000 2022-03-02 10:00:14.606000 15 ES\H22
16 2022-03-02 10:00:08 2022-03-02 10:00:10.651000 16 NQ\H22
17 2022-03-02 10:00:08.500000 2022-03-02 10:00:13.265000 17 NQ\H22
18 2022-03-02 10:00:09 2022-03-02 10:00:09.107000 18 ES\H22
19 2022-03-02 10:00:09.500000 2022-03-02 10:00:09.635000 19 NQ\H22

Set up the fields that will enable us to query market trades for symbol and the arrival/exit interval of each order in parallel.

# convert the DataFrame with orders into a onetick-py object to enable calls to onetick-py functions
order = otp.Ticks(order_df.to_dict(orient='list'))

order['_PARAM_START_TIME_NANOS'] = order['arrival_time'] # this will be the start time of the query
order['_PARAM_END_TIME_NANOS'] = order['exit_time'] # this will be the end time of the query
# another relevant field is SYMBOL_NAME (already specified) which will be used as the symbol for the query

# You can see the output by running the line below
# otp.run(order, symbols='any', start=otp.dt(2022,12,1), end=otp.dt(2022,12,2))

We can now define a function to compute VWAP and volume for each order. The function takes an argument (symbol) that is passed automatically when and the order specific info we defined above.

def market_volume_and_vwap(symbol):
    trd = otp.DataSource('CME', tick_type='TRD')
    vol_vwap = trd.agg({
    'VOLUME': otp.agg.sum('SIZE'),
    'VWAP': otp.agg.vwap('PRICE', 'SIZE')
    }, bucket_time='start')
    vol_vwap['ID'] = symbol['ID']
    vol_vwap['exit_time'] = symbol['exit_time']
    vol_vwap['sym'] = symbol.name # SYMBOL_NAME is stored in a special attribute '.name'
    
    return vol_vwap

market_volume_and_vwap can now be executed for each order. The computation is parallelized by symbol. (We’ll explain otp.merge below).

merged = otp.merge([market_volume_and_vwap], symbols=order, identify_input_ts=True)
# merged.to_otq('missing_symbol.otq')
otp.run(merged)
# merged.render()
Time VOLUME VWAP ID exit_time sym SYMBOL_NAME TICK_TYPE
0 2022-03-02 10:00:00.000 221 14079.695701 0 2022-03-02 10:00:04.704 NQ\H22 NQ\H22 TRD
1 2022-03-02 10:00:00.500 55 14082.763636 1 2022-03-02 10:00:01.364 NQ\H22 NQ\H22 TRD
2 2022-03-02 10:00:01.000 594 4341.052698 2 2022-03-02 10:00:10.525 ES\H22 ES\H22 TRD
3 2022-03-02 10:00:01.500 72 4340.829861 3 2022-03-02 10:00:01.823 ES\H22 ES\H22 TRD
4 2022-03-02 10:00:02.000 187 4340.709677 4 2022-03-02 10:00:04.982 ES\H22 ES\H22 TRD
5 2022-03-02 10:00:02.500 132 14079.817829 5 2022-03-02 10:00:08.949 NQ\H22 NQ\H22 TRD
6 2022-03-02 10:00:03.000 132 4340.843511 6 2022-03-02 10:00:05.622 ES\H22 ES\H22 TRD
7 2022-03-02 10:00:03.500 34 14078.786765 7 2022-03-02 10:00:05.009 NQ\H22 NQ\H22 TRD
8 2022-03-02 10:00:04.000 108 14079.523810 8 2022-03-02 10:00:09.621 NQ\H22 NQ\H22 TRD
9 2022-03-02 10:00:04.500 185 14078.752747 9 2022-03-02 10:00:13.177 NQ\H22 NQ\H22 TRD
10 2022-03-02 10:00:05.000 350 4341.183571 10 2022-03-02 10:00:11.956 ES\H22 ES\H22 TRD
11 2022-03-02 10:00:05.500 22 14081.284091 11 2022-03-02 10:00:06.392 NQ\H22 NQ\H22 TRD
12 2022-03-02 10:00:06.000 223 14076.611364 12 2022-03-02 10:00:15.282 NQ\H22 NQ\H22 TRD
13 2022-03-02 10:00:06.500 8 14079.468750 13 2022-03-02 10:00:07.192 NQ\H22 NQ\H22 TRD
14 2022-03-02 10:00:07.000 267 4341.014981 14 2022-03-02 10:00:13.364 ES\H22 ES\H22 TRD
15 2022-03-02 10:00:07.500 331 4340.860272 15 2022-03-02 10:00:14.606 ES\H22 ES\H22 TRD
16 2022-03-02 10:00:08.000 70 14078.871429 16 2022-03-02 10:00:10.651 NQ\H22 NQ\H22 TRD
17 2022-03-02 10:00:08.500 96 14077.903646 17 2022-03-02 10:00:13.265 NQ\H22 NQ\H22 TRD
18 2022-03-02 10:00:09.000 11 4340.886364 18 2022-03-02 10:00:09.107 ES\H22 ES\H22 TRD
19 2022-03-02 10:00:09.500 2 14079.375000 19 2022-03-02 10:00:09.635 NQ\H22 NQ\H22 TRD

Point-in-time benchmarks: BBO at a given time#

Another common use case is find the prevailing quote (best bid/ask) at the time of an execution or at different time intervals (markouts) before/after the execution.

The following snippet find the prevailing quote at a give time.

q = otp.DataSource('CME', tick_type='QTE', back_to_first_tick=600, keep_first_tick_timestamp='BBO_TIME')
q = q[['ASK_PRICE','BID_PRICE', 'BBO_TIME']]
time = otp.dt(2022,3,1,19,25,48,123456)
otp.run(q, start=time, end=time, symbols=emini)
Time ASK_PRICE BID_PRICE BBO_TIME
0 2022-03-01 19:25:48.123456 14058.0 14056.25 2022-03-01 19:25:47.266282997

Point-in-time benchmarks: BBO at different offsets (markouts)#

We’ll find the BBO at different offets in seconds around the time of each execution.

We create a simulated set of executions as we did above for orders.

num_execs = 5
exec_time = [otp.dt(2022, 3, 2, 10) + otp.Milli(x*500) for x in range(0, num_execs)]
symbols = ['NQ\H22', 'ES\H22']

exec_df = pd.DataFrame({
    'exex_time': exec_time, 
    'ID': [x for x in range(0, num_execs)],
    'SYMBOL_NAME': [random.choice(symbols) for x in range(0, num_execs)],
})
exec_df
exex_time ID SYMBOL_NAME
0 2022-03-02 10:00:00 0 NQ\H22
1 2022-03-02 10:00:00.500000 1 NQ\H22
2 2022-03-02 10:00:01 2 NQ\H22
3 2022-03-02 10:00:01.500000 3 NQ\H22
4 2022-03-02 10:00:02 4 NQ\H22

Create a separate record for each markout with the corresponding timestamp.

markouts = [-.5, 0, .5, 1, 5, 10] # in seconds: e.g., .5 means half a second after the execution

executions_with_markouts_df = pd.DataFrame({
 'markout_time': [x + otp.Milli(int(m*1000)) for x in exec_time for m in markouts], 
 'ID': [x for x in range(0, num_execs) for m in markouts],
 'SYMBOL_NAME': list(itertools.chain(*[[random.choice(symbols)] * len(markouts) for x in range(0, num_execs)])),
 'markout': [m for x in range(0, num_execs) for m in markouts],
})

executions_with_markouts_df
markout_time ID SYMBOL_NAME markout
0 2022-03-02 09:59:59.500000 0 NQ\H22 -0.5
1 2022-03-02 10:00:00 0 NQ\H22 0.0
2 2022-03-02 10:00:00.500000 0 NQ\H22 0.5
3 2022-03-02 10:00:01 0 NQ\H22 1.0
4 2022-03-02 10:00:05 0 NQ\H22 5.0
5 2022-03-02 10:00:10 0 NQ\H22 10.0
6 2022-03-02 10:00:00 1 ES\H22 -0.5
7 2022-03-02 10:00:00.500000 1 ES\H22 0.0
8 2022-03-02 10:00:01 1 ES\H22 0.5
9 2022-03-02 10:00:01.500000 1 ES\H22 1.0
10 2022-03-02 10:00:05.500000 1 ES\H22 5.0
11 2022-03-02 10:00:10.500000 1 ES\H22 10.0
12 2022-03-02 10:00:00.500000 2 NQ\H22 -0.5
13 2022-03-02 10:00:01 2 NQ\H22 0.0
14 2022-03-02 10:00:01.500000 2 NQ\H22 0.5
15 2022-03-02 10:00:02 2 NQ\H22 1.0
16 2022-03-02 10:00:06 2 NQ\H22 5.0
17 2022-03-02 10:00:11 2 NQ\H22 10.0
18 2022-03-02 10:00:01 3 NQ\H22 -0.5
19 2022-03-02 10:00:01.500000 3 NQ\H22 0.0
20 2022-03-02 10:00:02 3 NQ\H22 0.5
21 2022-03-02 10:00:02.500000 3 NQ\H22 1.0
22 2022-03-02 10:00:06.500000 3 NQ\H22 5.0
23 2022-03-02 10:00:11.500000 3 NQ\H22 10.0
24 2022-03-02 10:00:01.500000 4 NQ\H22 -0.5
25 2022-03-02 10:00:02 4 NQ\H22 0.0
26 2022-03-02 10:00:02.500000 4 NQ\H22 0.5
27 2022-03-02 10:00:03 4 NQ\H22 1.0
28 2022-03-02 10:00:07 4 NQ\H22 5.0
29 2022-03-02 10:00:12 4 NQ\H22 10.0

Convert DataFrame to a onetick-py object and specify start/end times. Note that for the markouts query, the start and end times are the same as we’ll look backwards from the markout timestamp.

executions_with_markouts = otp.Ticks(executions_with_markouts_df.to_dict(orient='list'))
executions_with_markouts['_PARAM_START_TIME_NANOS'] = executions_with_markouts['markout_time']
executions_with_markouts['_PARAM_END_TIME_NANOS'] = executions_with_markouts['markout_time']

Define a function that looks back from the start of the query until it finds the first tick. Execute the function for each execution/markout.

def bbo_at_time(symbol):
    q = otp.DataSource('CME', tick_type='QTE', back_to_first_tick=600, keep_first_tick_timestamp='BBO_TIME')
    q = q[['ASK_PRICE','BID_PRICE', 'BBO_TIME']]
    q['ID'] = symbol['ID']
    q['markout'] = symbol['markout']
    q['sym'] = symbol.name
    return q

merged = otp.merge([bbo_at_time], symbols=executions_with_markouts, identify_input_ts=True)
otp.run(merged)
Time ASK_PRICE BID_PRICE BBO_TIME ID markout sym SYMBOL_NAME TICK_TYPE
0 2022-03-02 09:59:59.500 14074.00 14073.25 2022-03-02 09:59:59.493561793 0 -0.5 NQ\H22 NQ\H22 QTE
1 2022-03-02 10:00:00.000 14076.00 14075.25 2022-03-02 09:59:59.992733435 0 0.0 NQ\H22 NQ\H22 QTE
2 2022-03-02 10:00:00.000 4339.50 4339.25 2022-03-02 09:59:59.953397647 1 -0.5 ES\H22 ES\H22 QTE
3 2022-03-02 10:00:00.500 14077.75 14076.75 2022-03-02 10:00:00.485313263 0 0.5 NQ\H22 NQ\H22 QTE
4 2022-03-02 10:00:00.500 4340.25 4339.75 2022-03-02 10:00:00.483290781 1 0.0 ES\H22 ES\H22 QTE
5 2022-03-02 10:00:00.500 14077.75 14076.75 2022-03-02 10:00:00.485313263 2 -0.5 NQ\H22 NQ\H22 QTE
6 2022-03-02 10:00:01.000 14084.75 14084.00 2022-03-02 10:00:00.991406067 0 1.0 NQ\H22 NQ\H22 QTE
7 2022-03-02 10:00:01.000 4342.00 4341.75 2022-03-02 10:00:00.996314127 1 0.5 ES\H22 ES\H22 QTE
8 2022-03-02 10:00:01.000 14084.75 14084.00 2022-03-02 10:00:00.991406067 2 0.0 NQ\H22 NQ\H22 QTE
9 2022-03-02 10:00:01.000 14084.75 14084.00 2022-03-02 10:00:00.991406067 3 -0.5 NQ\H22 NQ\H22 QTE
10 2022-03-02 10:00:01.500 4342.00 4341.50 2022-03-02 10:00:01.466008427 1 1.0 ES\H22 ES\H22 QTE
11 2022-03-02 10:00:01.500 14084.75 14083.75 2022-03-02 10:00:01.494346505 2 0.5 NQ\H22 NQ\H22 QTE
12 2022-03-02 10:00:01.500 14084.75 14083.75 2022-03-02 10:00:01.494346505 3 0.0 NQ\H22 NQ\H22 QTE
13 2022-03-02 10:00:01.500 14084.75 14083.75 2022-03-02 10:00:01.494346505 4 -0.5 NQ\H22 NQ\H22 QTE
14 2022-03-02 10:00:02.000 14081.25 14080.25 2022-03-02 10:00:01.972081949 2 1.0 NQ\H22 NQ\H22 QTE
15 2022-03-02 10:00:02.000 14081.25 14080.25 2022-03-02 10:00:01.972081949 3 0.5 NQ\H22 NQ\H22 QTE
16 2022-03-02 10:00:02.000 14081.25 14080.25 2022-03-02 10:00:01.972081949 4 0.0 NQ\H22 NQ\H22 QTE
17 2022-03-02 10:00:02.500 14079.00 14078.25 2022-03-02 10:00:02.499183309 3 1.0 NQ\H22 NQ\H22 QTE
18 2022-03-02 10:00:02.500 14079.00 14078.25 2022-03-02 10:00:02.499183309 4 0.5 NQ\H22 NQ\H22 QTE
19 2022-03-02 10:00:03.000 14082.00 14081.25 2022-03-02 10:00:02.989870913 4 1.0 NQ\H22 NQ\H22 QTE
20 2022-03-02 10:00:05.000 14079.75 14079.00 2022-03-02 10:00:04.972474913 0 5.0 NQ\H22 NQ\H22 QTE
21 2022-03-02 10:00:05.500 4341.50 4341.25 2022-03-02 10:00:05.492164737 1 5.0 ES\H22 ES\H22 QTE
22 2022-03-02 10:00:06.000 14081.50 14081.00 2022-03-02 10:00:05.999545921 2 5.0 NQ\H22 NQ\H22 QTE
23 2022-03-02 10:00:06.500 14080.00 14079.25 2022-03-02 10:00:06.499182037 3 5.0 NQ\H22 NQ\H22 QTE
24 2022-03-02 10:00:07.000 14079.25 14078.75 2022-03-02 10:00:06.893076527 4 5.0 NQ\H22 NQ\H22 QTE
25 2022-03-02 10:00:10.000 14080.75 14079.75 2022-03-02 10:00:09.998393413 0 10.0 NQ\H22 NQ\H22 QTE
26 2022-03-02 10:00:10.500 4341.00 4340.50 2022-03-02 10:00:10.487766465 1 10.0 ES\H22 ES\H22 QTE
27 2022-03-02 10:00:11.000 14076.25 14075.50 2022-03-02 10:00:10.991862443 2 10.0 NQ\H22 NQ\H22 QTE
28 2022-03-02 10:00:11.500 14077.50 14076.75 2022-03-02 10:00:11.477362175 3 10.0 NQ\H22 NQ\H22 QTE
29 2022-03-02 10:00:12.000 14077.25 14076.50 2022-03-02 10:00:11.971269289 4 10.0 NQ\H22 NQ\H22 QTE

Upticks/Downticks#

You can mark each trade as an uptick or downtick.

trd = otp.DataSource('CME', tick_type='TRD')

def uptick(row):
    if row['PRICE'] == otp.nan or row['PRICE'][-1] == otp.nan:
        return otp.nan
    if row['PRICE'] > row['PRICE'][-1]:
        return 1
    elif row['PRICE'] < row['PRICE'][-1]:
        return -1
    else:
        return 0

trd['UPTICK'] = trd.apply(uptick)
trd = trd[['PRICE', 'UPTICK']]
trd = trd.first(5)
otp.run(trd, start=start, end=end, symbols=emini)
Time PRICE UPTICK
0 2022-03-02 10:00:00.000789627 14074.75 NaN
1 2022-03-02 10:00:00.000929635 14075.75 1.0
2 2022-03-02 10:00:00.000929635 14076.00 1.0
3 2022-03-02 10:00:00.000929635 14076.00 0.0
4 2022-03-02 10:00:00.002409827 14075.75 -1.0

Symbol Operations#

Quering multiple symbols#

A separate data series for each symbol#

symbols=['AAPL','MSFT'] # sybmols can be specified as a list
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE']]
trd = trd.first(5)
trd_df = otp.run(trd, symbols=symbols, concurrency=8, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))

The result is a dict keyed by symbol.

trd_df
{'AAPL':                            Time   PRICE  SIZE EXCHANGE
 0 2022-03-01 10:00:00.003529175  165.75   100        Q
 1 2022-03-01 10:00:00.003530928  165.75   100        Q
 2 2022-03-01 10:00:00.003532720  165.75   100        Q
 3 2022-03-01 10:00:00.003676379  165.75   100        Z
 4 2022-03-01 10:00:00.004032611  165.75   104        H,
 'MSFT':                            Time   PRICE  SIZE EXCHANGE
 0 2022-03-01 10:00:00.003577531  298.53    50        Q
 1 2022-03-01 10:00:00.012147126  298.53    60        K
 2 2022-03-01 10:00:00.012251683  298.53    40        K
 3 2022-03-01 10:00:00.012524905  298.54    27        P
 4 2022-03-01 10:00:00.012602204  298.54     2        P}
trd_df['MSFT']
Time PRICE SIZE EXCHANGE
0 2022-03-01 10:00:00.003577531 298.53 50 Q
1 2022-03-01 10:00:00.012147126 298.53 60 K
2 2022-03-01 10:00:00.012251683 298.53 40 K
3 2022-03-01 10:00:00.012524905 298.54 27 P
4 2022-03-01 10:00:00.012602204 298.54 2 P

Ticks from multiple symbols merged into a single data series#

otp.merge creates a single series ordered by time by merging ticks from per-symbol series.

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
trd = trd.first(5)
merged = otp.merge([trd], symbols=['AAPL','MSFT'], identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time PRICE SIZE EXCHANGE COND SYMBOL_NAME TICK_TYPE
0 2022-03-01 10:00:00.003529175 165.75 100 Q @ AAPL TRD
1 2022-03-01 10:00:00.003530928 165.75 100 Q @ AAPL TRD
2 2022-03-01 10:00:00.003532720 165.75 100 Q @ AAPL TRD
3 2022-03-01 10:00:00.003577531 298.53 50 Q @ I MSFT TRD
4 2022-03-01 10:00:00.003676379 165.75 100 Z @ AAPL TRD
5 2022-03-01 10:00:00.004032611 165.75 104 H @F AAPL TRD
6 2022-03-01 10:00:00.012147126 298.53 60 K @F I MSFT TRD
7 2022-03-01 10:00:00.012251683 298.53 40 K @ I MSFT TRD
8 2022-03-01 10:00:00.012524905 298.54 27 P @F I MSFT TRD
9 2022-03-01 10:00:00.012602204 298.54 2 P @F I MSFT TRD

Symbols can come from different databases, in which case, the database name and ‘::’ would appear before the symbol name. The schema must be speficied explicitly in this case.

trd = otp.DataSource(tick_type='TRD')
trd = trd.table(PRICE=float, SIZE=int, EXCH_TIME=otp.nsectime, EXCHANGE=otp.string[8])
trd = trd.first(5)
merged = otp.merge([trd], symbols=['IDC1075::FL','MS113::FL.113'], identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time PRICE SIZE EXCH_TIME EXCHANGE SYMBOL_NAME TICK_TYPE
0 2022-03-01 10:01:27.612 2.83 2000 2022-03-01 10:01:27.582372000 ALP IDC1075::FL TRD
1 2022-03-01 10:02:00.803 2.83 100 2022-03-01 10:02:00.803000000 CHI IDC1075::FL TRD
2 2022-03-01 10:02:00.803 2.83 600 2022-03-01 10:02:00.803162000 CDX IDC1075::FL TRD
3 2022-03-01 10:02:00.914 2.83 600 2022-03-01 10:02:00.803162203 MS113::FL.113 TRD
4 2022-03-01 10:07:05.436 2.84 3 2022-03-01 10:07:05.436496000 TCM IDC1075::FL TRD
5 2022-03-01 10:08:49.335 2.84 100 2022-03-01 10:08:49.422834453 MS113::FL.113 TRD
6 2022-03-01 10:08:49.410 2.84 55 2022-03-01 10:08:49.408000000 PUR IDC1075::FL TRD
7 2022-03-01 10:09:31.254 2.83 300 2022-03-01 10:09:31.435787439 MS113::FL.113 TRD
8 2022-03-01 10:09:31.254 2.83 100 2022-03-01 10:09:31.435920844 MS113::FL.113 TRD
9 2022-03-01 10:09:31.254 2.83 100 2022-03-01 10:09:31.435920850 MS113::FL.113 TRD

Get all symbols from a database#

symbols = otp.Symbols(db='NYSE_TAQ', date=otp.dt(2022, 3, 1))
otp.run(symbols)
Time SYMBOL_NAME
0 2022-03-01 A
1 2022-03-01 AA
2 2022-03-01 AAA
3 2022-03-01 AAAU
4 2022-03-01 AAC
... ... ...
12041 2022-03-01 ZXZZT
12042 2022-03-01 ZY
12043 2022-03-01 ZYME
12044 2022-03-01 ZYNE
12045 2022-03-01 ZYXI

12046 rows × 2 columns

Get all symbols that match a pattern#

symbols = otp.Symbols(db=['NYSE_TAQ'], date=otp.dt(2022, 3, 1), pattern='BBA%')
otp.run(symbols)
Time SYMBOL_NAME
0 2022-03-01 BBAI
1 2022-03-01 BBAIWS
2 2022-03-01 BBAR
3 2022-03-01 BBAX

Get trades for all symbols (or the ones matching a pattern) from a DB#

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
merged = otp.merge([trd], symbols=otp.Symbols('NYSE_TAQ', pattern='BBA%'), identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time PRICE SIZE EXCHANGE COND SYMBOL_NAME TICK_TYPE
0 2022-03-01 10:00:01.586188800 3.1669 5 D I BBAR TRD
1 2022-03-01 10:00:06.961051648 5.8367 25 D I BBAI TRD
2 2022-03-01 10:00:13.122142720 3.1700 200 N BBAR TRD
3 2022-03-01 10:00:13.122561280 3.1700 2136 K BBAR TRD
4 2022-03-01 10:00:13.122561280 3.1700 218 K BBAR TRD
... ... ... ... ... ... ... ...
275 2022-03-01 10:14:11.723998464 52.4100 1 Z I BBAX TRD
276 2022-03-01 10:14:11.723998464 52.4100 38 B F I BBAX TRD
277 2022-03-01 10:14:11.724048640 52.4100 38 B F I BBAX TRD
278 2022-03-01 10:14:44.295828480 5.7900 12 J F I BBAI TRD
279 2022-03-01 10:14:44.295892992 5.7900 88 J F I BBAI TRD

280 rows × 7 columns

Count the number of trades per symbol#

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
otp.run(count, concurrency=8, start=otp.dt(2022,3,1), end=otp.dt(2022,3,2), symbols=otp.Symbols(db='NYSE_TAQ', pattern='BBA%'))
{'BBAI':         Time  COUNT
 0 2022-03-02   1063,
 'BBAIWS':         Time  COUNT
 0 2022-03-02    137,
 'BBAR':         Time  COUNT
 0 2022-03-02   3281,
 'BBAX':         Time  COUNT
 0 2022-03-02   1115}

Count the number of trades across symbols#

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
count = otp.merge([count], symbols=otp.Symbols(db='NYSE_TAQ', pattern='AA%'), identify_input_ts=True)
total_count = count.agg({'COUNT': otp.agg.sum('COUNT')})
otp.run(total_count, start=otp.dt(2022,3,1), end=otp.dt(2022,3,2))
Time COUNT
0 2022-03-02 1160353

Find the most traded symbols and query them (Multi-stage queries)#

%%time
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
count = otp.merge([count], symbols=otp.Symbols(db='NYSE_TAQ', pattern='AA%'), identify_input_ts=True)
most_traded = count.high('COUNT', n=5)
otp.run(most_traded, start=otp.dt(2022,3,1), end=otp.dt(2022,3,2))
CPU times: user 554 ms, sys: 18.1 ms, total: 572 ms
Wall time: 1.09 s
Time COUNT SYMBOL_NAME TICK_TYPE
0 2022-03-02 749660 AAPL TRD
1 2022-03-02 199756 AAL TRD
2 2022-03-02 131644 AA TRD
3 2022-03-02 23995 AAP TRD
4 2022-03-02 15136 AAWW TRD

Now that we have a query that finds the most traded symbols, we can use it to provide these symbols for querying

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
count = trd.agg({'COUNT': otp.agg.count()})
count = otp.merge([count], symbols=otp.Symbols(db='NYSE_TAQ', pattern='AA%'), identify_input_ts=True)
most_traded = count.high('COUNT', n=5)

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE']]
trd = trd.first(5)
merged = otp.merge([trd], symbols=most_traded, identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
Time PRICE SIZE SYMBOL_NAME TICK_TYPE
0 2022-03-01 10:00:00.002859008 84.030 30 AA TRD
1 2022-03-01 10:00:00.002949376 84.050 100 AA TRD
2 2022-03-01 10:00:00.003290112 84.050 118 AA TRD
3 2022-03-01 10:00:00.003295744 84.060 5 AA TRD
4 2022-03-01 10:00:00.003515648 84.060 70 AA TRD
5 2022-03-01 10:00:00.003529175 165.750 100 AAPL TRD
6 2022-03-01 10:00:00.003530928 165.750 100 AAPL TRD
7 2022-03-01 10:00:00.003532720 165.750 100 AAPL TRD
8 2022-03-01 10:00:00.003676379 165.750 100 AAPL TRD
9 2022-03-01 10:00:00.004032611 165.750 104 AAPL TRD
10 2022-03-01 10:00:00.007610769 78.740 50 AAXJ TRD
11 2022-03-01 10:00:00.007661983 78.750 50 AAXJ TRD
12 2022-03-01 10:00:00.007663857 78.750 8 AAXJ TRD
13 2022-03-01 10:00:00.017791644 16.575 100 AAL TRD
14 2022-03-01 10:00:00.017915297 16.580 100 AAL TRD
15 2022-03-01 10:00:00.017926970 16.580 300 AAL TRD
16 2022-03-01 10:00:00.018955115 16.580 68 AAL TRD
17 2022-03-01 10:00:00.018963402 16.580 232 AAL TRD
18 2022-03-01 10:00:00.059167744 206.190 15 AAP TRD
19 2022-03-01 10:00:00.059167744 206.210 3 AAP TRD
20 2022-03-01 10:00:00.059202816 206.210 15 AAP TRD
21 2022-03-01 10:00:00.059348992 206.080 17 AAP TRD
22 2022-03-01 10:00:00.059398656 206.210 15 AAP TRD
23 2022-03-01 10:00:00.175845646 78.790 3 AAXJ TRD
24 2022-03-01 10:00:00.190534276 78.800 2 AAXJ TRD

Symbologies#

onetick-py can query data using different symbologies.

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
trd = trd.first(1)
merged = otp.merge([trd], symbols=['BTKR::::AAPL US','FGV::::BBG000BPH459'], identify_input_ts=True)
otp.run(merged, start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15), symbol_date=otp.dt(2022, 10, 1))
Time PRICE SIZE EXCHANGE COND SYMBOL_NAME TICK_TYPE
0 2022-03-01 10:00:00.003529175 165.75 100 Q @ BTKR::::AAPL US TRD
1 2022-03-01 10:00:00.003577531 298.53 50 Q @ I FGV::::BBG000BPH459 TRD

You can view and select by pattern symbols in a non-default symbology.

figi = otp.Symbols('NYSE_TAQ', symbology='FGV', show_original_symbols=True, for_tick_type='TRD', date=otp.dt(2022,12,1))
figi, _ = figi[figi['SYMBOL_NAME']!='']
otp.run(figi)
Time SYMBOL_NAME ORIGINAL_SYMBOL_NAME
0 2022-12-01 BBG000C2V3D6 NYSE_TAQ::A
1 2022-12-01 BBG00B3T3HD3 NYSE_TAQ::AA
2 2022-12-01 BBG01B0JRCS6 NYSE_TAQ::AAA
3 2022-12-01 BBG00LPXX872 NYSE_TAQ::AAAU
4 2022-12-01 BBG00YZC2Z91 NYSE_TAQ::AAC
... ... ... ...
10652 2022-12-01 BBG00YZ2VTP7 NYSE_TAQ::ZWRK
10653 2022-12-01 BBG000H8R0N8 NYSE_TAQ::ZWS
10654 2022-12-01 BBG019XSYC89 NYSE_TAQ::ZYME
10655 2022-12-01 BBG007BBS8B7 NYSE_TAQ::ZYNE
10656 2022-12-01 BBG000BJBXZ2 NYSE_TAQ::ZYXI

10657 rows × 3 columns

Order book analytics#

onetick-py offers functions for analyzing tick-by-tick order book. There are three representations of an order book. We’ll show top 3 levels only for ease of exposition.

A book can be displayed with a tick per level per side. We refer to a level in the book as a ‘price level’ or ‘prl’.

snapshot_time=start
prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL', max_levels=3) 
otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time)
Time PRICE UPDATE_TIME SIZE LEVEL BUY_SELL_FLAG
0 2022-03-02 10:00:00 14076.00 2022-03-02 09:59:59.992707035 1 1 1
1 2022-03-02 10:00:00 14076.25 2022-03-02 09:59:59.953516579 2 2 1
2 2022-03-02 10:00:00 14076.50 2022-03-02 09:59:59.893691735 4 3 1
3 2022-03-02 10:00:00 14075.25 2022-03-02 09:59:59.992733435 1 1 0
4 2022-03-02 10:00:00 14075.00 2022-03-02 09:59:59.954720897 2 2 0
5 2022-03-02 10:00:00 14074.75 2022-03-02 09:59:59.948987231 2 3 0

Alternatively, a book can show a tick per level with both ask and bid price/size info.

snapshot_time=start
prl = otp.ObSnapshotWide(db='CME', tick_type='PRL_FULL', max_levels=3)   
otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time)
Time BID_PRICE BID_UPDATE_TIME BID_SIZE ASK_PRICE ASK_UPDATE_TIME ASK_SIZE LEVEL
0 2022-03-02 10:00:00 14075.25 2022-03-02 09:59:59.992733435 1 14076.00 2022-03-02 09:59:59.992707035 1 1
1 2022-03-02 10:00:00 14075.00 2022-03-02 09:59:59.954720897 2 14076.25 2022-03-02 09:59:59.953516579 2 2
2 2022-03-02 10:00:00 14074.75 2022-03-02 09:59:59.948987231 2 14076.50 2022-03-02 09:59:59.893691735 4 3

Finally, all levels can be displayed in one tick.

snapshot_time=start
prl = otp.ObSnapshotFlat(db='CME', tick_type='PRL_FULL', max_levels=3) 
print(otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time))
                 Time  BID_PRICE1              BID_UPDATE_TIME1  BID_SIZE1  ASK_PRICE1              ASK_UPDATE_TIME1  ASK_SIZE1  BID_PRICE2              BID_UPDATE_TIME2  BID_SIZE2  ASK_PRICE2              ASK_UPDATE_TIME2  ASK_SIZE2  BID_PRICE3              BID_UPDATE_TIME3  BID_SIZE3  ASK_PRICE3              ASK_UPDATE_TIME3  ASK_SIZE3
0 2022-03-02 10:00:00    14075.25 2022-03-02 09:59:59.992733435          1     14076.0 2022-03-02 09:59:59.992707035          1     14075.0 2022-03-02 09:59:59.954720897          2    14076.25 2022-03-02 09:59:59.953516579          2    14074.75 2022-03-02 09:59:59.948987231          2     14076.5 2022-03-02 09:59:59.893691735          4

We can output the book (in any of the three representation) on every change to price/size at any of the levels.

prl = otp.ObSnapshotFlat(db='CME', tick_type='PRL_FULL', max_levels=3, running=True)
prl = prl.drop(r".+TIME\d")
print(otp.run(prl, symbols=emini, start=start,  end=start+otp.Milli(100)))
                             Time  BID_PRICE1  BID_SIZE1  ASK_PRICE1  ASK_SIZE1  BID_PRICE2  BID_SIZE2  ASK_PRICE2  ASK_SIZE2  BID_PRICE3  BID_SIZE3  ASK_PRICE3  ASK_SIZE3
0   2022-03-02 10:00:00.000000000    14075.25          1    14076.00          1    14075.00          2    14076.25          2    14074.75          2    14076.50          4
1   2022-03-02 10:00:00.000788455    14075.00          2    14076.00          1    14074.75          2    14076.25          2    14074.50          2    14076.50          4
2   2022-03-02 10:00:00.000788627    14074.75          2    14076.00          1    14074.50          2    14076.25          2    14074.25          2    14076.50          4
3   2022-03-02 10:00:00.000789627    14074.50          2    14076.00          1    14074.25          2    14076.25          2    14074.00          2    14076.50          4
4   2022-03-02 10:00:00.000790993    14074.50          1    14076.00          1    14074.25          2    14076.25          2    14074.00          2    14076.50          4
..                            ...         ...        ...         ...        ...         ...        ...         ...        ...         ...        ...         ...        ...
341 2022-03-02 10:00:00.091403659    14074.50          2    14075.25          1    14074.25          3    14075.50          1    14073.75          2    14075.75          1
342 2022-03-02 10:00:00.091700529    14074.50          2    14075.50          1    14074.25          3    14075.75          1    14073.75          2    14076.00          1
343 2022-03-02 10:00:00.091924533    14074.50          2    14075.50          2    14074.25          3    14075.75          1    14073.75          2    14076.00          1
344 2022-03-02 10:00:00.094879495    14074.50          2    14075.25          1    14074.25          3    14075.50          2    14073.75          2    14075.75          1
345 2022-03-02 10:00:00.097022501    14074.50          2    14075.50          2    14074.25          3    14075.75          1    14073.75          2    14076.00          1

[346 rows x 13 columns]

The ObSnapshot method doesn’t require specifying max_levels. The entire book is returned when the parameter is not specified.

snapshot_time=start
prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL') 
otp.run(prl, symbols=emini, start=snapshot_time, end=snapshot_time)
Time PRICE UPDATE_TIME SIZE LEVEL BUY_SELL_FLAG
0 2022-03-02 10:00:00 14076.00 2022-03-02 09:59:59.992707035 1 1 1
1 2022-03-02 10:00:00 14076.25 2022-03-02 09:59:59.953516579 2 2 1
2 2022-03-02 10:00:00 14076.50 2022-03-02 09:59:59.893691735 4 3 1
3 2022-03-02 10:00:00 14076.75 2022-03-02 09:59:59.893054207 2 4 1
4 2022-03-02 10:00:00 14077.00 2022-03-02 09:59:59.334717127 2 5 1
... ... ... ... ... ... ...
1628 2022-03-02 10:00:00 6490.00 2022-03-01 17:59:59.999000000 3 809 0
1629 2022-03-02 10:00:00 1586.00 2022-03-01 17:59:59.999000000 1 810 0
1630 2022-03-02 10:00:00 786.50 2022-03-01 17:59:59.999000000 1 811 0
1631 2022-03-02 10:00:00 200.00 2022-03-01 17:59:59.999000000 1 812 0
1632 2022-03-02 10:00:00 1.00 2022-03-01 17:59:59.999000000 1 813 0

1633 rows × 6 columns

Book Imbalance#

Let’s find the time weighted book imbalance. The imbalance at a given time is defined as the sum of the bid sizes at the top x levels minus the sum of the ask sizes at the top x levels divided by the sum of these two terms: the values close to 1 mean the book is much heavier on the bid side, close to -1 – on the ask side, equal to zero means the sizes are the same.

x = 3

prl = otp.ObSnapshotWide(db='CME', tick_type='PRL_FULL', max_levels=x, running=True)
prls_df = otp.run(prl, symbols=emini, start=start, end=start+otp.Milli(100))
print(prls_df.head(7))

prl = prl.agg({'ask_vol': otp.agg.sum('ASK_SIZE'), 'bid_vol': otp.agg.sum('BID_SIZE')}, bucket_units='ticks', bucket_interval=x)
prl['imb'] = (prl['bid_vol'] - prl['ask_vol']) / (prl['bid_vol'] + prl['ask_vol'])
prls_df = otp.run(prl, symbols=emini, start=start, end=start+otp.Milli(100))
print(prls_df.head())

imb_stats = prl.agg({
    'tw_imb': otp.agg.tw_average('imb'),
    'mean':   otp.agg.average('imb'),
    'stdev':  otp.agg.stddev('imb'),
})
print(otp.run(imb_stats, symbols=emini, start=start, end=start+otp.Milli(100)))
                           Time  BID_PRICE               BID_UPDATE_TIME  BID_SIZE  ASK_PRICE               ASK_UPDATE_TIME  ASK_SIZE  LEVEL
0 2022-03-02 10:00:00.000000000   14075.25 2022-03-02 09:59:59.992733435         1   14076.00 2022-03-02 09:59:59.992707035         1      1
1 2022-03-02 10:00:00.000000000   14075.00 2022-03-02 09:59:59.954720897         2   14076.25 2022-03-02 09:59:59.953516579         2      2
2 2022-03-02 10:00:00.000000000   14074.75 2022-03-02 09:59:59.948987231         2   14076.50 2022-03-02 09:59:59.893691735         4      3
3 2022-03-02 10:00:00.000788455   14075.00 2022-03-02 09:59:59.954720897         2   14076.00 2022-03-02 09:59:59.992707035         1      1
4 2022-03-02 10:00:00.000788455   14074.75 2022-03-02 09:59:59.948987231         2   14076.25 2022-03-02 09:59:59.953516579         2      2
5 2022-03-02 10:00:00.000788455   14074.50 2022-03-02 09:59:59.893306297         2   14076.50 2022-03-02 09:59:59.893691735         4      3
6 2022-03-02 10:00:00.000788627   14074.75 2022-03-02 09:59:59.948987231         2   14076.00 2022-03-02 09:59:59.992707035         1      1
                           Time  ask_vol  bid_vol       imb
0 2022-03-02 10:00:00.000000000        7        5 -0.166667
1 2022-03-02 10:00:00.000788455        7        6 -0.076923
2 2022-03-02 10:00:00.000788627        7        6 -0.076923
3 2022-03-02 10:00:00.000789627        7        6 -0.076923
4 2022-03-02 10:00:00.000790993        7        5 -0.166667
                     Time    tw_imb      mean     stdev
0 2022-03-02 10:00:00.100  0.018318  0.040778  0.204065

Book sweep#

There are two version of book sweep: by price and by quantity. Book sweep by price, take a price as an input and returns the total quatity available at that price or better. Book sweep by quantity, takes a quantity as an input and returns the VWAP if the quantity were executed immediately.

def side_to_direction(side):
    return 1 if side == 'ASK' else -1

def sweep_by_price(side, price):
    prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL', side=side)
    direction = side_to_direction(side)
    prl, _ = prl[direction * prl['PRICE'] <= direction * price]
    prl = prl.agg({'total_qty': otp.agg.sum('SIZE')})
    return otp.run(prl, symbols=emini, start=start, end=start)

print(sweep_by_price('BID', 14075))
print(sweep_by_price('ASK', 14077))
                 Time  total_qty
0 2022-03-02 10:00:00          3
                 Time  total_qty
0 2022-03-02 10:00:00         11
def sweep_by_qty(side, qty):
    prl = otp.ObSnapshot(db='CME', tick_type='PRL_FULL', side=side)
    prl = prl.agg({'total_qty': otp.agg.sum('SIZE')}, running=True, all_fields=True)
    direction = side_to_direction(side)
    prl, _ = prl[prl['total_qty'] - prl['SIZE'] < qty]
    # update the SIZE in the last tick only so that total_qty is exactly qty
    prl['SIZE'] = prl.apply(lambda tick: prl['SIZE'] - (prl['total_qty'] - qty) if prl['total_qty'] > qty else prl['SIZE'])
    prl = prl.agg({'VWAP': otp.agg.vwap('PRICE', 'SIZE')})
    return otp.run(prl, symbols=emini, start=start, end=start)
print(sweep_by_qty('BID', 10))
print(sweep_by_qty('ASK', 10))
                 Time       VWAP
0 2022-03-02 10:00:00  14074.625
                 Time     VWAP
0 2022-03-02 10:00:00  14076.5

Market By Order#

Order Book data may be annotated with ‘key’ fields lets you break down the book by each value of the ‘key’ fields. For example, a book could by keyed by market participant ID, allowing you to see the book with the orders of a given market participant only. Some exchanges provide ‘market-by-order’ data where the book is keyed by order id. Set show_full_detail to True to see the book broken down to the most granular level. The example below is a market-by-order book.

prl = otp.ObSnapshot('CME', tick_type='PRL_FULL', side='BID', show_full_detail=True)
prl = prl.first(5)
print(otp.run(prl, symbols=emini, start=start, end=start))
                 Time       ORDER_ID  BUY_SELL_FLAG ORDER_TYPE     PRICE  SIZE  TIME_PRIORITY RECORD_TYPE        DELETED_TIME  TICK_STATUS  OMDSEQ  LEVEL                   UPDATE_TIME
0 2022-03-02 10:00:00  6830860875633              0          L  14075.25     1    43132163778           R 1969-12-31 19:00:00            0       3      1 2022-03-02 09:59:59.992733435
1 2022-03-02 10:00:00  6830860875525              0          L  14075.00     1    43132163642           R 1969-12-31 19:00:00            0      11      2 2022-03-02 09:59:59.949569201
2 2022-03-02 10:00:00  6830860875517              0          L  14075.00     1    43132163634           R 1969-12-31 19:00:00            0       3      2 2022-03-02 09:59:59.949264581
3 2022-03-02 10:00:00  6830860875515              0          L  14074.75     1    43132163632           R 1969-12-31 19:00:00            0       0      3 2022-03-02 09:59:59.948987231
4 2022-03-02 10:00:00  6830860875480              0          L  14074.75     1    43132163583           R 1969-12-31 19:00:00            0       3      3 2022-03-02 09:59:59.910612819

Market-by-order data can be used to analyze/validate the priority mechanism used by the exchange.``

prl = otp.ObSnapshot('CME', tick_type='PRL_FULL', side='BID', show_full_detail=True)

"""
ORDER_TYPE:
L = Limit order
I = Implied order

Implied liquidity doesn’t have priority as it's always last to execute at any price level. 
It also doesn’t have an order ID, so the IDs that you see in the db are synthetic 
(consisting of 1 or 2 for the 1st/2nd implied level, and E/F for the buy/sell side respectively).

In order to rank the orders within a given price point by priority, you need to sort first by ORDER_TYPE (“L” comes before “I”),
then by TIME_PRIORITY (lowest value comes first).
"""
prl = prl.sort(['LEVEL','ORDER_TYPE', 'TIME_PRIORITY'],ascending=[True,False, True])
orders = otp.run(prl, symbols=emini, start=start, end=start);
orders = orders[['ORDER_ID', 'PRICE', 'LEVEL', 'TIME_PRIORITY','SIZE', 'BUY_SELL_FLAG', 'ORDER_TYPE', 'RECORD_TYPE', 'UPDATE_TIME', 'TICK_STATUS']]
orders.head()
ORDER_ID PRICE LEVEL TIME_PRIORITY SIZE BUY_SELL_FLAG ORDER_TYPE RECORD_TYPE UPDATE_TIME TICK_STATUS
0 6830860875633 14075.25 1 43132163778 1 0 L R 2022-03-02 09:59:59.992733435 0
1 6830860875517 14075.00 2 43132163634 1 0 L R 2022-03-02 09:59:59.949264581 0
2 6830860875525 14075.00 2 43132163642 1 0 L R 2022-03-02 09:59:59.949569201 0
3 6830860875480 14074.75 3 43132163583 1 0 L R 2022-03-02 09:59:59.910612819 0
4 6830860875515 14074.75 3 43132163632 1 0 L R 2022-03-02 09:59:59.948987231 0

Best practices#

The golden rule of otp: call otp.run as late as possible#

Processing before the otp.run call is good because

  • it takes place in OneTick which is optimized for time series processing

  • it is parallelized by symbol and query (e.g., when “join with query” is called)

  • it takes place in a scalable environment

Processing after the otp.run call is bad because

  • it takes place in python which is slow

  • it takes place in the notebook which is limited by a fixed and normally small instance size where the notebook runs

Bad:

trd = otp.DataSource('CME', tick_type='TRD')
trd_df = otp.run(trd, start=start, end=end, symbols=emini)
trd_df['SIZE'].agg('sum')
7520

Good:

trd = otp.DataSource('CME', tick_type='TRD')

trd = trd.agg({'VOLUME': otp.agg.sum(trd['SIZE'])})

otp.run(trd, start=start, end=end, symbols=emini)
Time VOLUME
0 2022-03-02 10:05:00 7520

Bad:

trd = otp.DataSource('CME', tick_type='TRD')
trd_df = otp.run(trd, start=start, end=end, symbols=emini)
print(trd_df[trd_df['TRADE_TYPE'] == 'LEG'].head())
                              Time           EXCH_TIME  PRICE  SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE TRADE_ID BUY_ORDER_ID SELL_ORDER_ID        DELETED_TIME  TICK_STATUS  OMDSEQ
271  2022-03-02 10:00:07.592961899 1969-12-31 19:00:00    NaN     1                       LEG            -         1                                     1969-12-31 19:00:00            0       9
272  2022-03-02 10:00:07.593666573 1969-12-31 19:00:00    NaN     2                       LEG            -         1                                     1969-12-31 19:00:00            0       0
3114 2022-03-02 10:02:20.046438331 1969-12-31 19:00:00    NaN     3                       LEG            -         1                                     1969-12-31 19:00:00            0       0
4753 2022-03-02 10:03:15.584272405 1969-12-31 19:00:00    NaN     1                       LEG            -         1                                     1969-12-31 19:00:00            0       3
5328 2022-03-02 10:03:35.612187685 1969-12-31 19:00:00    NaN     1                       LEG            -         1                                     1969-12-31 19:00:00            0       0

Good:

trd = otp.DataSource('CME', tick_type='TRD')
leg, notleg = trd[trd['TRADE_TYPE'] == 'LEG']
leg = leg.first(5)
print(otp.run(leg, start=start, end=end, symbols=emini))
                           Time           EXCH_TIME  PRICE  SIZE AGGRESSOR_SIDE TRADE_TYPE TRADE_PERIOD BOOK_TYPE TRADE_ID BUY_ORDER_ID SELL_ORDER_ID        DELETED_TIME  TICK_STATUS  OMDSEQ
0 2022-03-02 10:00:07.592961899 1969-12-31 19:00:00    NaN     1                       LEG            -         1                                     1969-12-31 19:00:00            0       9
1 2022-03-02 10:00:07.593666573 1969-12-31 19:00:00    NaN     2                       LEG            -         1                                     1969-12-31 19:00:00            0       0
2 2022-03-02 10:02:20.046438331 1969-12-31 19:00:00    NaN     3                       LEG            -         1                                     1969-12-31 19:00:00            0       0
3 2022-03-02 10:03:15.584272405 1969-12-31 19:00:00    NaN     1                       LEG            -         1                                     1969-12-31 19:00:00            0       3
4 2022-03-02 10:03:35.612187685 1969-12-31 19:00:00    NaN     1                       LEG            -         1                                     1969-12-31 19:00:00            0       0

Bad:

trd = otp.DataSource('CME', tick_type='TRD')
trd_df = otp.run(trd, start=start, end=end, symbols=emini)
trd_df['TRADE_TYPE'].unique()
array(['-', 'LEG'], dtype=object)

Good:

trd = otp.DataSource('CME', tick_type='TRD')
trd = trd.distinct(keys=['TRADE_TYPE'])
otp.run(trd, start=start, end=end, symbols=emini)
Time TRADE_TYPE
0 2022-03-02 10:05:00 -
1 2022-03-02 10:05:00 LEG

More advanced concepts#

Timezones#

# this sets the timezone for all of the otp calls until its redefined
otp.config['tz'] = 'GMT'
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
print(otp.run(trd.first()[['PRICE']],start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))

otp.config['tz'] = 'America/Chicago'
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
print(otp.run(trd.first()[['PRICE']],start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))

otp.config['tz'] = 'EST5EDT'
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
print(otp.run(trd.first()[['PRICE']],start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))
                           Time   PRICE
0 2022-03-01 10:00:04.133843068  164.01
                           Time     PRICE
0 2022-03-01 10:00:00.010168563  165.2822
                           Time   PRICE
0 2022-03-01 10:00:00.003529175  165.75
# you can also pass a timezone to otp.run()
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
print(otp.run(trd,timezone='GMT').head())

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
print(otp.run(trd,timezone='America/Chicago').head())
                           Time EXCHANGE  COND STOP_STOCK SOURCE TRF TTE TICKER   PRICE DELETED_TIME  TICK_STATUS  SIZE  CORR  SEQ_NUM TRADE_ID              PARTICIPANT_TIME   TRF_TIME  OMDSEQ
0 2022-03-01 10:00:04.133843068        P  @ TI                 N       0   AAPL  164.01   1970-01-01            0     1     0    18621      452 2022-03-01 10:00:04.133499136 1970-01-01       0
1 2022-03-01 10:00:08.519225755        P  @ TI                 N       0   AAPL  164.00   1970-01-01            0     1     0    18635      453 2022-03-01 10:00:08.518881024 1970-01-01       0
2 2022-03-01 10:00:15.746801681        Q  @FTI                 N       1   AAPL  164.01   1970-01-01            0     1     0    18649      405 2022-03-01 10:00:15.746784019 1970-01-01       0
3 2022-03-01 10:00:30.490236820        P  @ TI                 N       0   AAPL  164.00   1970-01-01            0     1     0    18686      454 2022-03-01 10:00:30.489892608 1970-01-01       0
4 2022-03-01 10:00:30.497604203        K  @FTI                 N       1   AAPL  164.01   1970-01-01            0     1     0    18687      131 2022-03-01 10:00:30.497364000 1970-01-01       0
                           Time EXCHANGE  COND STOP_STOCK SOURCE TRF TTE TICKER     PRICE        DELETED_TIME  TICK_STATUS  SIZE  CORR  SEQ_NUM TRADE_ID              PARTICIPANT_TIME                      TRF_TIME  OMDSEQ
0 2022-03-01 10:00:00.010168563        D  @4 W                 N   Q   1   AAPL  165.2822 1969-12-31 18:00:00            0   109     0  2283964    53706 2022-03-01 10:00:00.000085000 2022-03-01 10:00:00.010144942       0
1 2022-03-01 10:00:00.012746348        Z  @                    N       0   AAPL  165.1700 1969-12-31 18:00:00            0   100     0  2283965    31202 2022-03-01 10:00:00.012545000 1969-12-31 18:00:00.000000000       0
2 2022-03-01 10:00:00.012754964        Z  @                    N       0   AAPL  165.1700 1969-12-31 18:00:00            0   100     0  2283966    31203 2022-03-01 10:00:00.012545000 1969-12-31 18:00:00.000000000       1
3 2022-03-01 10:00:00.012757825        Z  @                    N       0   AAPL  165.1700 1969-12-31 18:00:00            0   100     0  2283967    31204 2022-03-01 10:00:00.012545000 1969-12-31 18:00:00.000000000       2
4 2022-03-01 10:00:00.012902668        U  @                    N       0   AAPL  165.1700 1969-12-31 18:00:00            0   100     0  2283968    12450 2022-03-01 10:00:00.012689732 1969-12-31 18:00:00.000000000       3

Reverting to OneTick when otp is not enough: the sink() method#

We are working to extend otp to cover most of the cases but OneTick is big and powerful and oftentimes it offers functionality that is not yet available in otp. The ‘sink()’ method lets you (pretty seamlessly) transition to a lower level API called onetick.query that supports all of OneTick features. See docs/onetick.query.api.python.html in OneTick documentation.

import onetick.query as otq
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')

trd.sink(otq.Variance(input_field_name='PRICE', output_field_name='VAR_PRICE', bucket_interval=60))
trd = trd.table(VAR_PRICE=float, strict=False) # need to explicitly add the fields added by the `sink` methods to the schema

trd['std'] = otp.math.sqrt(trd['VAR_PRICE'])

otp.run(trd,  start=otp.dt(2022,9,1,10), end=otp.dt(2022,9,1,16), symbols='AA')
Time VAR_PRICE std
0 2022-09-01 10:01:00 0.016861 0.129851
1 2022-09-01 10:02:00 0.001083 0.032903
2 2022-09-01 10:03:00 0.004457 0.066762
3 2022-09-01 10:04:00 0.001522 0.039015
4 2022-09-01 10:05:00 0.001840 0.042895
... ... ... ...
355 2022-09-01 15:56:00 0.000230 0.015159
356 2022-09-01 15:57:00 0.000491 0.022159
357 2022-09-01 15:58:00 0.000508 0.022529
358 2022-09-01 15:59:00 0.000034 0.005794
359 2022-09-01 16:00:00 0.000625 0.024996

360 rows × 3 columns

Filters: using both if and else branches#

Filters return two objects: one for the ticks that satisfy the conditions and the other one for the ones that don’t. Having both objects allows combining the two streams without duplication anything that happens before the filter. An example is below.

# this processing happens before a filter
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE','SIZE','EXCHANGE','COND']]
trd = trd.first(5)
print(otp.run(trd, symbols='AAPL', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15)))

# the filter is applied
q, non_q = trd[trd['EXCHANGE']=='Q']

# different processing can be applied to the different braches of the filter
q = q.first()

# the two branches can them be merged
merged = otp.merge([q, non_q])

otp.run(merged, symbols='AAPL', start=otp.dt(2022,3,1,10), end=otp.dt(2022,3,1,10,15))
                           Time   PRICE  SIZE EXCHANGE  COND
0 2022-03-01 10:00:00.003529175  165.75   100        Q  @   
1 2022-03-01 10:00:00.003530928  165.75   100        Q  @   
2 2022-03-01 10:00:00.003532720  165.75   100        Q  @   
3 2022-03-01 10:00:00.003676379  165.75   100        Z  @   
4 2022-03-01 10:00:00.004032611  165.75   104        H  @F  
Time PRICE SIZE EXCHANGE COND
0 2022-03-01 10:00:00.003529175 165.75 100 Q @
1 2022-03-01 10:00:00.003676379 165.75 100 Z @
2 2022-03-01 10:00:00.004032611 165.75 104 H @F