Time-Based Joins#
Time series analysis often involves looking up information from other time series for relevant time ranges. OneTick has built-in functions for this.
join_by_time
: Enhancing a time series with information from other time series at the time of each tick#
Below we’ll enhance the trades with the prevailing quote (i.e., best bid and ask) at the time of each trade. We’ll first look just at trades, then just at quotes, and finally we’ll join the two.
Let’s examine the trades first.
import onetick.py as otp
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]
otp.run(trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time | PRICE | SIZE | |
---|---|---|---|
0 | 2023-03-29 10:00:00.005852928 | 399.8500 | 100 |
1 | 2023-03-29 10:00:00.005852928 | 399.8500 | 100 |
2 | 2023-03-29 10:00:00.005856000 | 399.8500 | 100 |
3 | 2023-03-29 10:00:00.006580736 | 399.8500 | 1 |
4 | 2023-03-29 10:00:00.012014592 | 399.8500 | 1 |
... | ... | ... | ... |
2136 | 2023-03-29 10:00:59.285156096 | 399.8800 | 100 |
2137 | 2023-03-29 10:00:59.687216896 | 399.8900 | 1 |
2138 | 2023-03-29 10:00:59.911493888 | 399.8899 | 100 |
2139 | 2023-03-29 10:00:59.974551040 | 399.8701 | 100 |
2140 | 2023-03-29 10:00:59.998262528 | 399.8900 | 1 |
2141 rows × 3 columns
Now let’s take a look at the quotes (or rather the ‘national best bid/offer’).
qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO')
qte = qte[['BID_PRICE', 'ASK_PRICE']]
otp.run(qte, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time | BID_PRICE | ASK_PRICE | |
---|---|---|---|
0 | 2023-03-29 10:00:00.005847040 | 399.85 | 399.86 |
1 | 2023-03-29 10:00:00.005907200 | 399.85 | 399.86 |
2 | 2023-03-29 10:00:00.005957376 | 399.85 | 399.86 |
3 | 2023-03-29 10:00:00.005992448 | 399.85 | 399.86 |
4 | 2023-03-29 10:00:00.006007296 | 399.85 | 399.86 |
... | ... | ... | ... |
26409 | 2023-03-29 10:00:59.998479872 | 399.88 | 399.89 |
26410 | 2023-03-29 10:00:59.998615296 | 399.88 | 399.89 |
26411 | 2023-03-29 10:00:59.998635264 | 399.88 | 399.89 |
26412 | 2023-03-29 10:00:59.999427072 | 399.88 | 399.89 |
26413 | 2023-03-29 10:00:59.999838208 | 399.88 | 399.89 |
26414 rows × 3 columns
We “enhance” the trades with the information from the quotes.
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time | PRICE | SIZE | BID_PRICE | ASK_PRICE | |
---|---|---|---|---|---|
0 | 2023-03-29 10:00:00.005852928 | 399.8500 | 100 | 399.85 | 399.86 |
1 | 2023-03-29 10:00:00.005852928 | 399.8500 | 100 | 399.85 | 399.86 |
2 | 2023-03-29 10:00:00.005856000 | 399.8500 | 100 | 399.85 | 399.86 |
3 | 2023-03-29 10:00:00.006580736 | 399.8500 | 1 | 399.84 | 399.85 |
4 | 2023-03-29 10:00:00.012014592 | 399.8500 | 1 | 399.85 | 399.87 |
... | ... | ... | ... | ... | ... |
2136 | 2023-03-29 10:00:59.285156096 | 399.8800 | 100 | 399.88 | 399.89 |
2137 | 2023-03-29 10:00:59.687216896 | 399.8900 | 1 | 399.87 | 399.89 |
2138 | 2023-03-29 10:00:59.911493888 | 399.8899 | 100 | 399.87 | 399.89 |
2139 | 2023-03-29 10:00:59.974551040 | 399.8701 | 100 | 399.87 | 399.89 |
2140 | 2023-03-29 10:00:59.998262528 | 399.8900 | 1 | 399.88 | 399.89 |
2141 rows × 5 columns
In other words, each trade is joined with the quote that was active at the time the trade took place. We can examine the quote time to make sure it’s before the trade time.
qte['quote_time'] = qte['Time']
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['SPY'])
Time | PRICE | SIZE | BID_PRICE | ASK_PRICE | quote_time | |
---|---|---|---|---|---|---|
0 | 2023-03-29 10:00:00.005852928 | 399.8500 | 100 | 399.85 | 399.86 | 2023-03-29 10:00:00.005847040 |
1 | 2023-03-29 10:00:00.005852928 | 399.8500 | 100 | 399.85 | 399.86 | 2023-03-29 10:00:00.005847040 |
2 | 2023-03-29 10:00:00.005856000 | 399.8500 | 100 | 399.85 | 399.86 | 2023-03-29 10:00:00.005847040 |
3 | 2023-03-29 10:00:00.006580736 | 399.8500 | 1 | 399.84 | 399.85 | 2023-03-29 10:00:00.006548736 |
4 | 2023-03-29 10:00:00.012014592 | 399.8500 | 1 | 399.85 | 399.87 | 2023-03-29 10:00:00.011275008 |
... | ... | ... | ... | ... | ... | ... |
2136 | 2023-03-29 10:00:59.285156096 | 399.8800 | 100 | 399.88 | 399.89 | 2023-03-29 10:00:59.285148672 |
2137 | 2023-03-29 10:00:59.687216896 | 399.8900 | 1 | 399.87 | 399.89 | 2023-03-29 10:00:59.686418176 |
2138 | 2023-03-29 10:00:59.911493888 | 399.8899 | 100 | 399.87 | 399.89 | 2023-03-29 10:00:59.879587072 |
2139 | 2023-03-29 10:00:59.974551040 | 399.8701 | 100 | 399.87 | 399.89 | 2023-03-29 10:00:59.933540352 |
2140 | 2023-03-29 10:00:59.998262528 | 399.8900 | 1 | 399.88 | 399.89 | 2023-03-29 10:00:59.998239232 |
2141 rows × 6 columns
Let’s try the same query for a less liquid symbol.
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['AA'])
Time | PRICE | SIZE | BID_PRICE | ASK_PRICE | quote_time | |
---|---|---|---|---|---|---|
0 | 2023-03-29 10:00:00.734953984 | 40.9500 | 7 | NaN | NaN | 1969-12-31 19:00:00.000000000 |
1 | 2023-03-29 10:00:01.492248576 | 40.9500 | 100 | 40.92 | 40.95 | 2023-03-29 10:00:01.158007296 |
2 | 2023-03-29 10:00:03.348103936 | 40.9500 | 8 | 40.92 | 40.95 | 2023-03-29 10:00:03.347166976 |
3 | 2023-03-29 10:00:03.348121856 | 40.9500 | 391 | 40.92 | 40.95 | 2023-03-29 10:00:03.348119296 |
4 | 2023-03-29 10:00:03.348169984 | 40.9500 | 100 | 40.92 | 40.95 | 2023-03-29 10:00:03.348169472 |
... | ... | ... | ... | ... | ... | ... |
177 | 2023-03-29 10:00:53.957339392 | 41.0000 | 61 | 40.98 | 41.00 | 2023-03-29 10:00:53.957334528 |
178 | 2023-03-29 10:00:55.377122560 | 41.0100 | 5 | 40.98 | 41.01 | 2023-03-29 10:00:54.791796736 |
179 | 2023-03-29 10:00:55.612973568 | 40.9916 | 1 | 40.98 | 41.01 | 2023-03-29 10:00:54.791796736 |
180 | 2023-03-29 10:00:57.381350656 | 41.0000 | 10 | 40.99 | 41.03 | 2023-03-29 10:00:57.381330688 |
181 | 2023-03-29 10:00:57.382493184 | 41.0100 | 100 | 41.00 | 41.03 | 2023-03-29 10:00:57.381917184 |
182 rows × 6 columns
We didn’t find a matching quote for the first trade. This happens when the first event following the query start time is a trade rather than a quote. We can fix this by instructing OneTick to look for a quote back in time from the start time. A complete code snipped is below.
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]
qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=60)
qte = qte[['BID_PRICE', 'ASK_PRICE']]
qte['quote_time'] = qte['Time']
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=otp.dt(2023, 3, 29, 10), end=otp.dt(2023, 3, 29, 10, 1), symbols=['AA'])
Time | PRICE | SIZE | BID_PRICE | ASK_PRICE | quote_time | |
---|---|---|---|---|---|---|
0 | 2023-03-29 10:00:00.734953984 | 40.9500 | 7 | 40.92 | 40.95 | 2023-03-29 10:00:00.000000000 |
1 | 2023-03-29 10:00:01.492248576 | 40.9500 | 100 | 40.92 | 40.95 | 2023-03-29 10:00:01.158007296 |
2 | 2023-03-29 10:00:03.348103936 | 40.9500 | 8 | 40.92 | 40.95 | 2023-03-29 10:00:03.347166976 |
3 | 2023-03-29 10:00:03.348121856 | 40.9500 | 391 | 40.92 | 40.95 | 2023-03-29 10:00:03.348119296 |
4 | 2023-03-29 10:00:03.348169984 | 40.9500 | 100 | 40.92 | 40.95 | 2023-03-29 10:00:03.348169472 |
... | ... | ... | ... | ... | ... | ... |
177 | 2023-03-29 10:00:53.957339392 | 41.0000 | 61 | 40.98 | 41.00 | 2023-03-29 10:00:53.957334528 |
178 | 2023-03-29 10:00:55.377122560 | 41.0100 | 5 | 40.98 | 41.01 | 2023-03-29 10:00:54.791796736 |
179 | 2023-03-29 10:00:55.612973568 | 40.9916 | 1 | 40.98 | 41.01 | 2023-03-29 10:00:54.791796736 |
180 | 2023-03-29 10:00:57.381350656 | 41.0000 | 10 | 40.99 | 41.03 | 2023-03-29 10:00:57.381330688 |
181 | 2023-03-29 10:00:57.382493184 | 41.0100 | 100 | 41.00 | 41.03 | 2023-03-29 10:00:57.381917184 |
182 rows × 6 columns
Use case: Prevaling quote at the time of a trade#
The code snippet above implements this use case.
Use case: Computing Markouts#
A common TCA and quant research use case is to see what happens to the quote at certain intervals (aka markouts) before/after the trade. We can compute markouts efficiently by shifting the time series for each markout and then doing join_by_time
.
s = otp.dt(2023, 3, 8, 10)
e = otp.dt(2023, 3, 9, 10, 0, 1)
markouts = [-1, 1, 5]
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]
qte_by_markout = []
for m in markouts:
qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=86400)
qte = qte[['ASK_PRICE', 'BID_PRICE']]
qte = qte.rename({'ASK_PRICE': f'ASK_PRICE_{m}',
'BID_PRICE': f'BID_PRICE_{m}'})
qte[f'quote_time_{m}'] = qte['Time']
# shift the data by m seconds
qte = qte.time_interval_shift(m * 1000)
qte_by_markout.append(qte)
trd = otp.join_by_time([trd] + qte_by_markout)
otp.run(trd, start=s, end=e, symbols='TSLA', apply_times_daily=True)
Time | PRICE | SIZE | ASK_PRICE_-1 | BID_PRICE_-1 | quote_time_-1 | ASK_PRICE_1 | BID_PRICE_1 | quote_time_1 | ASK_PRICE_5 | BID_PRICE_5 | quote_time_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-03-08 10:00:00.014762069 | 181.4250 | 300 | 181.47 | 181.41 | 2023-03-08 09:59:59.002564435 | 181.49 | 181.48 | 2023-03-08 10:00:01.000000000 | 181.15 | 181.04 | 2023-03-08 10:00:05.012557243 |
1 | 2023-03-08 10:00:00.014764168 | 181.4400 | 12 | 181.47 | 181.41 | 2023-03-08 09:59:59.002564435 | 181.49 | 181.48 | 2023-03-08 10:00:01.000000000 | 181.15 | 181.04 | 2023-03-08 10:00:05.012557243 |
2 | 2023-03-08 10:00:00.014767570 | 181.4400 | 200 | 181.47 | 181.41 | 2023-03-08 09:59:59.002564435 | 181.49 | 181.48 | 2023-03-08 10:00:01.000000000 | 181.15 | 181.04 | 2023-03-08 10:00:05.012557243 |
3 | 2023-03-08 10:00:00.014799259 | 181.4500 | 2 | 181.47 | 181.41 | 2023-03-08 09:59:59.002564435 | 181.49 | 181.48 | 2023-03-08 10:00:01.000000000 | 181.15 | 181.04 | 2023-03-08 10:00:05.012557243 |
4 | 2023-03-08 10:00:00.015374686 | 181.4400 | 300 | 181.47 | 181.41 | 2023-03-08 09:59:59.002564435 | 181.49 | 181.48 | 2023-03-08 10:00:01.000000000 | 181.15 | 181.04 | 2023-03-08 10:00:05.012557243 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
705 | 2023-03-09 10:00:00.956041264 | 183.5899 | 500 | 183.46 | 183.43 | 2023-03-09 09:59:59.911620614 | 183.59 | 183.54 | 2023-03-09 10:00:01.945769144 | 183.65 | 183.57 | 2023-03-09 10:00:05.954673686 |
706 | 2023-03-09 10:00:00.967881686 | 183.5800 | 25 | 183.46 | 183.43 | 2023-03-09 09:59:59.963372673 | 183.59 | 183.54 | 2023-03-09 10:00:01.963256371 | 183.66 | 183.57 | 2023-03-09 10:00:05.967652794 |
707 | 2023-03-09 10:00:00.978963708 | 183.4718 | 136 | 183.46 | 183.42 | 2023-03-09 09:59:59.971349853 | 183.59 | 183.54 | 2023-03-09 10:00:01.963256371 | 183.66 | 183.57 | 2023-03-09 10:00:05.967652794 |
708 | 2023-03-09 10:00:00.983358396 | 183.5500 | 20 | 183.46 | 183.41 | 2023-03-09 09:59:59.983288366 | 183.59 | 183.54 | 2023-03-09 10:00:01.963256371 | 183.66 | 183.57 | 2023-03-09 10:00:05.967652794 |
709 | 2023-03-09 10:00:00.990945922 | 183.5650 | 200 | 183.44 | 183.41 | 2023-03-09 09:59:59.988781014 | 183.59 | 183.54 | 2023-03-09 10:00:01.963256371 | 183.66 | 183.57 | 2023-03-09 10:00:05.967652794 |
710 rows × 12 columns
join_with_query
: Executing a query on each tick#
Sometimes we want to do a look up based on the information provided in a tick. For example, we may have a series of order ticks each containing an order arrival and exit times and we may want to find the market vwap during the interval. Let’s take this one step at a time.
Let’s find the market vwap for a given time range (i.e., for a given start
and end
).
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
otp.run(q, start=otp.dt(2023, 4, 10, 10, 25, 59), end=otp.dt(2023,4,10,10,27), symbols='AAPL')
Time | market_vwap | |
---|---|---|
0 | 2023-04-10 10:27:00 | 160.477922 |
Next let’s create a couple of orders each with its own values for start
/end
specified in the arrival
/exit
columns.
orders = otp.Ticks(arrival=[otp.dt(2023, 4, 10, 10, 25, 59), otp.dt(2023, 4, 10, 10, 26, 9)],
exit=[otp.dt(2023, 4, 10, 10, 26, 7), otp.dt(2023, 4, 10, 10, 27, 48)],
sym=['AAPL', 'MSFT'])
otp.run(orders)
Time | arrival | exit | sym | |
---|---|---|---|---|
0 | 2003-12-01 00:00:00.000 | 2023-04-10 10:25:59 | 2023-04-10 10:26:07 | AAPL |
1 | 2003-12-01 00:00:00.001 | 2023-04-10 10:26:09 | 2023-04-10 10:27:48 | MSFT |
We can wrap the code that finds vwap into a function and call it for each order while passing the relevant parameters for start
, end
, and symbol
.
def vwap(symbol):
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
return q
orders = otp.Ticks(arrival=[otp.dt(2023, 4, 10, 10, 25, 59), otp.dt(2023, 4, 10, 10, 26, 9)],
exit=[otp.dt(2023, 4, 10, 10, 26, 7), otp.dt(2023, 4, 10, 10, 27, 48)],
sym=['AAPL', 'MSFT'])
orders = orders.join_with_query(vwap, start=orders['arrival'], end=orders['exit'], symbol=orders['sym'])
otp.run(orders)
Time | market_vwap | arrival | exit | sym | |
---|---|---|---|---|---|
0 | 2003-12-01 00:00:00.000 | 160.548890 | 2023-04-10 10:25:59 | 2023-04-10 10:26:07 | AAPL |
1 | 2003-12-01 00:00:00.001 | 286.136779 | 2023-04-10 10:26:09 | 2023-04-10 10:27:48 | MSFT |
Use case: Interval VWAP#
The code above provides an implementation for this use case. However, a more efficient implementation may be useful when the number of orders is large. We provide a more efficient implementation below.
orders['_PARAM_START_TIME_NANOS'] = orders['arrival']
orders['_PARAM_END_TIME_NANOS'] = orders['exit']
orders['SYMBOL_NAME'] = orders['sym']
otp.run(vwap, symbols=orders, start=otp.dt(2022,12,1), end=otp.dt(2022,12,2))
{'AAPL': Time market_vwap
0 2023-04-10 10:26:07 160.54889,
'MSFT': Time market_vwap
0 2023-04-10 10:27:48 286.136779}