Time-Based Joins#

Time series analysis often involves looking up information from other time series for relevant time ranges. OneTick has built-in functions for this.

join_by_time: Enhancing a time series with information from other time series at the time of each tick#

Below we’ll enhance the trades with the prevailing quote (i.e., best bid and ask) at the time of each trade. We’ll first look just at trades, then just at quotes, and finally we’ll join the two.

Let’s examine the trades first.

import onetick.py as otp

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]
otp.run(trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time PRICE SIZE
0 2023-03-29 10:00:00.005852928 399.8500 100
1 2023-03-29 10:00:00.005852928 399.8500 100
2 2023-03-29 10:00:00.005856000 399.8500 100
3 2023-03-29 10:00:00.006580736 399.8500 1
4 2023-03-29 10:00:00.012014592 399.8500 1
... ... ... ...
2136 2023-03-29 10:00:59.285156096 399.8800 100
2137 2023-03-29 10:00:59.687216896 399.8900 1
2138 2023-03-29 10:00:59.911493888 399.8899 100
2139 2023-03-29 10:00:59.974551040 399.8701 100
2140 2023-03-29 10:00:59.998262528 399.8900 1

2141 rows × 3 columns

Now let’s take a look at the quotes (or rather the ‘national best bid/offer’).

qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO')
qte = qte[['BID_PRICE', 'ASK_PRICE']]
otp.run(qte, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time BID_PRICE ASK_PRICE
0 2023-03-29 10:00:00.005847040 399.85 399.86
1 2023-03-29 10:00:00.005907200 399.85 399.86
2 2023-03-29 10:00:00.005957376 399.85 399.86
3 2023-03-29 10:00:00.005992448 399.85 399.86
4 2023-03-29 10:00:00.006007296 399.85 399.86
... ... ... ...
26409 2023-03-29 10:00:59.998479872 399.88 399.89
26410 2023-03-29 10:00:59.998615296 399.88 399.89
26411 2023-03-29 10:00:59.998635264 399.88 399.89
26412 2023-03-29 10:00:59.999427072 399.88 399.89
26413 2023-03-29 10:00:59.999838208 399.88 399.89

26414 rows × 3 columns

We “enhance” the trades with the information from the quotes.

enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time PRICE SIZE BID_PRICE ASK_PRICE
0 2023-03-29 10:00:00.005852928 399.8500 100 399.85 399.86
1 2023-03-29 10:00:00.005852928 399.8500 100 399.85 399.86
2 2023-03-29 10:00:00.005856000 399.8500 100 399.85 399.86
3 2023-03-29 10:00:00.006580736 399.8500 1 399.84 399.85
4 2023-03-29 10:00:00.012014592 399.8500 1 399.85 399.87
... ... ... ... ... ...
2136 2023-03-29 10:00:59.285156096 399.8800 100 399.88 399.89
2137 2023-03-29 10:00:59.687216896 399.8900 1 399.87 399.89
2138 2023-03-29 10:00:59.911493888 399.8899 100 399.87 399.89
2139 2023-03-29 10:00:59.974551040 399.8701 100 399.87 399.89
2140 2023-03-29 10:00:59.998262528 399.8900 1 399.88 399.89

2141 rows × 5 columns

In other words, each trade is joined with the quote that was active at the time the trade took place. We can examine the quote time to make sure it’s before the trade time.

qte['quote_time'] = qte['Time']
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time PRICE SIZE BID_PRICE ASK_PRICE quote_time
0 2023-03-29 10:00:00.005852928 399.8500 100 399.85 399.86 2023-03-29 10:00:00.005847040
1 2023-03-29 10:00:00.005852928 399.8500 100 399.85 399.86 2023-03-29 10:00:00.005847040
2 2023-03-29 10:00:00.005856000 399.8500 100 399.85 399.86 2023-03-29 10:00:00.005847040
3 2023-03-29 10:00:00.006580736 399.8500 1 399.84 399.85 2023-03-29 10:00:00.006548736
4 2023-03-29 10:00:00.012014592 399.8500 1 399.85 399.87 2023-03-29 10:00:00.011275008
... ... ... ... ... ... ...
2136 2023-03-29 10:00:59.285156096 399.8800 100 399.88 399.89 2023-03-29 10:00:59.285148672
2137 2023-03-29 10:00:59.687216896 399.8900 1 399.87 399.89 2023-03-29 10:00:59.686418176
2138 2023-03-29 10:00:59.911493888 399.8899 100 399.87 399.89 2023-03-29 10:00:59.879587072
2139 2023-03-29 10:00:59.974551040 399.8701 100 399.87 399.89 2023-03-29 10:00:59.933540352
2140 2023-03-29 10:00:59.998262528 399.8900 1 399.88 399.89 2023-03-29 10:00:59.998239232

2141 rows × 6 columns

Let’s try the same query for a less liquid symbol.

otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['AA'])
Time PRICE SIZE BID_PRICE ASK_PRICE quote_time
0 2023-03-29 10:00:00.734953984 40.9500 7 NaN NaN 1969-12-31 19:00:00.000000000
1 2023-03-29 10:00:01.492248576 40.9500 100 40.92 40.95 2023-03-29 10:00:01.158007296
2 2023-03-29 10:00:03.348103936 40.9500 8 40.92 40.95 2023-03-29 10:00:03.347166976
3 2023-03-29 10:00:03.348121856 40.9500 391 40.92 40.95 2023-03-29 10:00:03.348119296
4 2023-03-29 10:00:03.348169984 40.9500 100 40.92 40.95 2023-03-29 10:00:03.348169472
... ... ... ... ... ... ...
177 2023-03-29 10:00:53.957339392 41.0000 61 40.98 41.00 2023-03-29 10:00:53.957334528
178 2023-03-29 10:00:55.377122560 41.0100 5 40.98 41.01 2023-03-29 10:00:54.791796736
179 2023-03-29 10:00:55.612973568 40.9916 1 40.98 41.01 2023-03-29 10:00:54.791796736
180 2023-03-29 10:00:57.381350656 41.0000 10 40.99 41.03 2023-03-29 10:00:57.381330688
181 2023-03-29 10:00:57.382493184 41.0100 100 41.00 41.03 2023-03-29 10:00:57.381917184

182 rows × 6 columns

We didn’t find a matching quote for the first trade. This happens when the first event following the query start time is a trade rather than a quote. We can fix this by instructing OneTick to look for a quote back in time from the start time. A complete code snipped is below.

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]

qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=60)
qte = qte[['BID_PRICE', 'ASK_PRICE']]
qte['quote_time'] = qte['Time']

enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['AA'])
Time PRICE SIZE BID_PRICE ASK_PRICE quote_time
0 2023-03-29 10:00:00.734953984 40.9500 7 40.92 40.95 2023-03-29 10:00:00.000000000
1 2023-03-29 10:00:01.492248576 40.9500 100 40.92 40.95 2023-03-29 10:00:01.158007296
2 2023-03-29 10:00:03.348103936 40.9500 8 40.92 40.95 2023-03-29 10:00:03.347166976
3 2023-03-29 10:00:03.348121856 40.9500 391 40.92 40.95 2023-03-29 10:00:03.348119296
4 2023-03-29 10:00:03.348169984 40.9500 100 40.92 40.95 2023-03-29 10:00:03.348169472
... ... ... ... ... ... ...
177 2023-03-29 10:00:53.957339392 41.0000 61 40.98 41.00 2023-03-29 10:00:53.957334528
178 2023-03-29 10:00:55.377122560 41.0100 5 40.98 41.01 2023-03-29 10:00:54.791796736
179 2023-03-29 10:00:55.612973568 40.9916 1 40.98 41.01 2023-03-29 10:00:54.791796736
180 2023-03-29 10:00:57.381350656 41.0000 10 40.99 41.03 2023-03-29 10:00:57.381330688
181 2023-03-29 10:00:57.382493184 41.0100 100 41.00 41.03 2023-03-29 10:00:57.381917184

182 rows × 6 columns

Use case: Prevaling quote at the time of a trade#

The code snippet above implements this use case.

Use case: Computing Markouts#

A common TCA and quant research use case is to see what happens to the quote at certain intervals (aka markouts) before/after the trade. We can compute markouts efficiently by shifting the time series for each markout and then doing join_by_time.

s = otp.dt(2023, 3, 8, 10)
e = otp.dt(2023, 3, 9, 10, 0, 1)
markouts = [-1, 1, 5] 

trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]

qte_by_markout = []
for m in markouts:
    qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=86400)
    qte = qte[['ASK_PRICE', 'BID_PRICE']]
    qte = qte.rename({'ASK_PRICE': f'ASK_PRICE_{m}', 
                      'BID_PRICE': f'BID_PRICE_{m}'})
    qte[f'quote_time_{m}'] = qte['Time']
    
    # shift the data by m seconds
    qte = qte.time_interval_shift(m * 1000)
    qte_by_markout.append(qte)

trd = otp.join_by_time([trd] + qte_by_markout)
otp.run(trd, start=s, end=e, symbols='TSLA', apply_times_daily=True)
Time PRICE SIZE ASK_PRICE_-1 BID_PRICE_-1 quote_time_-1 ASK_PRICE_1 BID_PRICE_1 quote_time_1 ASK_PRICE_5 BID_PRICE_5 quote_time_5
0 2023-03-08 10:00:00.014762069 181.4250 300 181.47 181.41 2023-03-08 09:59:59.002564435 181.49 181.48 2023-03-08 10:00:01.000000000 181.15 181.04 2023-03-08 10:00:05.012557243
1 2023-03-08 10:00:00.014764168 181.4400 12 181.47 181.41 2023-03-08 09:59:59.002564435 181.49 181.48 2023-03-08 10:00:01.000000000 181.15 181.04 2023-03-08 10:00:05.012557243
2 2023-03-08 10:00:00.014767570 181.4400 200 181.47 181.41 2023-03-08 09:59:59.002564435 181.49 181.48 2023-03-08 10:00:01.000000000 181.15 181.04 2023-03-08 10:00:05.012557243
3 2023-03-08 10:00:00.014799259 181.4500 2 181.47 181.41 2023-03-08 09:59:59.002564435 181.49 181.48 2023-03-08 10:00:01.000000000 181.15 181.04 2023-03-08 10:00:05.012557243
4 2023-03-08 10:00:00.015374686 181.4400 300 181.47 181.41 2023-03-08 09:59:59.002564435 181.49 181.48 2023-03-08 10:00:01.000000000 181.15 181.04 2023-03-08 10:00:05.012557243
... ... ... ... ... ... ... ... ... ... ... ... ...
705 2023-03-09 10:00:00.956041264 183.5899 500 183.46 183.43 2023-03-09 09:59:59.911620614 183.59 183.54 2023-03-09 10:00:01.945769144 183.65 183.57 2023-03-09 10:00:05.954673686
706 2023-03-09 10:00:00.967881686 183.5800 25 183.46 183.43 2023-03-09 09:59:59.963372673 183.59 183.54 2023-03-09 10:00:01.963256371 183.66 183.57 2023-03-09 10:00:05.967652794
707 2023-03-09 10:00:00.978963708 183.4718 136 183.46 183.42 2023-03-09 09:59:59.971349853 183.59 183.54 2023-03-09 10:00:01.963256371 183.66 183.57 2023-03-09 10:00:05.967652794
708 2023-03-09 10:00:00.983358396 183.5500 20 183.46 183.41 2023-03-09 09:59:59.983288366 183.59 183.54 2023-03-09 10:00:01.963256371 183.66 183.57 2023-03-09 10:00:05.967652794
709 2023-03-09 10:00:00.990945922 183.5650 200 183.44 183.41 2023-03-09 09:59:59.988781014 183.59 183.54 2023-03-09 10:00:01.963256371 183.66 183.57 2023-03-09 10:00:05.967652794

710 rows × 12 columns

join_with_query: Executing a query on each tick#

Sometimes we want to do a look up based on the information provided in a tick. For example, we may have a series of order ticks each containing an order arrival and exit times and we may want to find the market vwap during the interval. Let’s take this one step at a time.

Let’s find the market vwap for a given time range (i.e., for a given start and end).

q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
otp.run(q, start=otp.dt(2023, 4, 10, 10, 25, 59), end=otp.dt(2023,4,10,10,27), symbols='AAPL')
Time market_vwap
0 2023-04-10 10:27:00 160.477922

Next let’s create a couple of orders each with its own values for start/end specified in the arrival/exit columns.

orders = otp.Ticks(arrival=[otp.dt(2023, 4, 10, 10, 25, 59), otp.dt(2023, 4, 10, 10, 26, 9)], 
                   exit=[otp.dt(2023, 4, 10, 10, 26, 7), otp.dt(2023, 4, 10, 10, 27, 48)],
                   sym=['AAPL', 'MSFT'])
otp.run(orders)
Time arrival exit sym
0 2003-12-01 00:00:00.000 2023-04-10 10:25:59 2023-04-10 10:26:07 AAPL
1 2003-12-01 00:00:00.001 2023-04-10 10:26:09 2023-04-10 10:27:48 MSFT

We can wrap the code that finds vwap into a function and call it for each order while passing the relevant parameters for start, end, and symbol.

def vwap(symbol):
    q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
    q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
    return q

orders = otp.Ticks(arrival=[otp.dt(2023, 4, 10, 10, 25, 59), otp.dt(2023, 4, 10, 10, 26, 9)], 
                   exit=[otp.dt(2023, 4, 10, 10, 26, 7), otp.dt(2023, 4, 10, 10, 27, 48)],
                   sym=['AAPL', 'MSFT'])
orders = orders.join_with_query(vwap, start=orders['arrival'], end=orders['exit'], symbol=orders['sym'])
otp.run(orders)
Time market_vwap arrival exit sym
0 2003-12-01 00:00:00.000 160.548890 2023-04-10 10:25:59 2023-04-10 10:26:07 AAPL
1 2003-12-01 00:00:00.001 286.136779 2023-04-10 10:26:09 2023-04-10 10:27:48 MSFT

Use case: Interval VWAP#

The code above provides an implementation for this use case. However, a more efficient implementation may be useful when the number of orders is large. We provide a more efficient implementation below.

orders['_PARAM_START_TIME_NANOS'] = orders['arrival'] 
orders['_PARAM_END_TIME_NANOS'] = orders['exit'] 
orders['SYMBOL_NAME'] = orders['sym'] 
otp.run(vwap, symbols=orders, start=otp.dt(2022,12,1), end=otp.dt(2022,12,2))
{'AAPL':                  Time  market_vwap
 0 2023-04-10 10:26:07    160.54889,
 'MSFT':                  Time  market_vwap
 0 2023-04-10 10:27:48   286.136779}