Time-Based Joins#
Time series analysis often involves looking up information from other time series for relevant time ranges. OneTick has built-in functions for this.
join_by_time
: Enhancing a time series with information from other time series at the time of each tick#
Below we’ll enhance the trades with the prevailing quote (i.e., best bid and ask) at the time of each trade. We’ll first look just at trades, then just at quotes, and finally we’ll join the two.
Let’s examine the trades first.
import onetick.py as otp
s = otp.dt(2023, 5, 15, 9, 30)
e = otp.dt(2023, 5, 15, 9, 30, 1)
trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]
otp.run(trd, start=s, end=e, symbols=['SPY'])
/home/aserechenko/onetick-py/onetick/__init__.py:43: UserWarning: MAIN_ONE_TICK_DIR environment variable is not set. It is a recommended way to let onetick-py know where OneTick python libraries are located. We will try to use default value for your system: /opt/one_market_data/one_tick.
warnings.warn(message)
/home/aserechenko/onetick-py/onetick/__init__.py:43: UserWarning: MAIN_ONE_TICK_DIR environment variable is not set. It is a recommended way to let onetick-py know where OneTick python libraries are located. We will try to use default value for your system: /opt/one_market_data/one_tick.
warnings.warn(message)
Time | PRICE | SIZE | |
---|---|---|---|
0 | 2023-05-15 09:30:00.000178688 | 412.22 | 100 |
1 | 2023-05-15 09:30:00.000776704 | 412.22 | 247 |
2 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 |
3 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 |
4 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 |
... | ... | ... | ... |
310 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 |
311 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 |
312 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 |
313 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 |
314 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 |
315 rows × 3 columns
Now let’s take a look at the quotes (or rather the ‘national best bid/offer’).
Note the back_to_first_tick
parameter
which seeks a tick before query start time
and ensures that there is at least one quote at the start of the query
(its actual timestamp is before the start).
It’s commonly used when we want to find the tick (e.g., last quote or trade) at a given time:
qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=86400)
qte = qte[['BID_PRICE', 'ASK_PRICE']]
given_time = otp.dt(2023, 5, 15, 9, 30)
otp.run(qte, start=given_time, end=given_time, symbols=['SPY'])
Time | BID_PRICE | ASK_PRICE | |
---|---|---|---|
0 | 2023-05-15 09:30:00 | 412.22 | 412.25 |
Or when we want to make sure there is tick at the start of the query interval (e.g., to make sure the trades have a prevailing quote even if it’s from before the query start time):
qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=60)
qte = qte[['BID_PRICE', 'ASK_PRICE']]
otp.run(qte, start=s, end=e, symbols=['SPY'])
Time | BID_PRICE | ASK_PRICE | |
---|---|---|---|
0 | 2023-05-15 09:30:00.000000000 | 412.22 | 412.25 |
1 | 2023-05-15 09:30:00.000174080 | 412.22 | 412.25 |
2 | 2023-05-15 09:30:00.000595200 | 412.22 | 412.24 |
3 | 2023-05-15 09:30:00.000715520 | 412.21 | 412.24 |
4 | 2023-05-15 09:30:00.000790784 | 412.22 | 412.24 |
... | ... | ... | ... |
1150 | 2023-05-15 09:30:00.979763456 | 412.27 | 412.28 |
1151 | 2023-05-15 09:30:00.981818880 | 412.27 | 412.28 |
1152 | 2023-05-15 09:30:00.983727872 | 412.27 | 412.28 |
1153 | 2023-05-15 09:30:00.983737856 | 412.27 | 412.28 |
1154 | 2023-05-15 09:30:00.985296640 | 412.27 | 412.28 |
1155 rows × 3 columns
We “enhance” the trades with the information from the quotes.
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | BID_PRICE | ASK_PRICE | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000178688 | 412.22 | 100 | 412.22 | 412.25 |
1 | 2023-05-15 09:30:00.000776704 | 412.22 | 247 | 412.21 | 412.24 |
2 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 | 412.22 | 412.24 |
3 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 | 412.22 | 412.25 |
4 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 | 412.22 | 412.25 |
... | ... | ... | ... | ... | ... |
310 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | 412.26 | 412.28 |
311 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 | 412.27 | 412.28 |
312 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | 412.27 | 412.28 |
313 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | 412.27 | 412.28 |
314 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | 412.27 | 412.28 |
315 rows × 5 columns
In other words, each trade is joined with the quote that was active at the time the trade took place. We can examine the quote time to make sure it’s before the trade time.
qte['quote_time'] = qte['Time']
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=s, end=e, symbols=['SPY'])
Time | PRICE | SIZE | BID_PRICE | ASK_PRICE | quote_time | |
---|---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000178688 | 412.22 | 100 | 412.22 | 412.25 | 2023-05-15 09:30:00.000174080 |
1 | 2023-05-15 09:30:00.000776704 | 412.22 | 247 | 412.21 | 412.24 | 2023-05-15 09:30:00.000715520 |
2 | 2023-05-15 09:30:00.003603456 | 412.22 | 100 | 412.22 | 412.24 | 2023-05-15 09:30:00.003562496 |
3 | 2023-05-15 09:30:00.006352128 | 412.24 | 1 | 412.22 | 412.25 | 2023-05-15 09:30:00.006343936 |
4 | 2023-05-15 09:30:00.007128064 | 412.24 | 3 | 412.22 | 412.25 | 2023-05-15 09:30:00.007110656 |
... | ... | ... | ... | ... | ... | ... |
310 | 2023-05-15 09:30:00.934032640 | 412.27 | 160 | 412.26 | 412.28 | 2023-05-15 09:30:00.934030080 |
311 | 2023-05-15 09:30:00.975609344 | 412.24 | 2 | 412.27 | 412.28 | 2023-05-15 09:30:00.970691840 |
312 | 2023-05-15 09:30:00.980264448 | 412.27 | 1 | 412.27 | 412.28 | 2023-05-15 09:30:00.979763456 |
313 | 2023-05-15 09:30:00.985391616 | 412.28 | 100 | 412.27 | 412.28 | 2023-05-15 09:30:00.985296640 |
314 | 2023-05-15 09:30:00.985394944 | 412.28 | 100 | 412.27 | 412.28 | 2023-05-15 09:30:00.985296640 |
315 rows × 6 columns
Join-by-time Use Cases#
join_with_query
: Executing a query on each tick#
Sometimes we want to do a look up based on the information provided in a tick. For example, we may have a series of order ticks each containing an order arrival and exit times and we may want to find the market vwap during the interval. Let’s take this one step at a time.
Let’s find the market vwap for a given time range (i.e., for a given start
and end
).
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
otp.run(q, start=s, end=e, symbols=['SPY'])
Time | market_vwap | |
---|---|---|
0 | 2023-05-15 09:30:01 | 412.212012 |
Next let’s create a couple of orders each with its own values for start
/end
specified in the arrival
/exit
columns.
orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],
exit=[e, e + otp.Milli(9556)],
sym=['SPY', 'QQQ'])
otp.run(orders)
Time | arrival | exit | sym | |
---|---|---|---|---|
0 | 2003-12-01 00:00:00.000 | 2023-05-15 09:30:00.000 | 2023-05-15 09:30:01.000 | SPY |
1 | 2003-12-01 00:00:00.001 | 2023-05-15 09:30:07.934 | 2023-05-15 09:30:10.556 | QQQ |
We can wrap the code that finds vwap into a function and call it for each order while passing the relevant parameters for start
, end
, and symbol
.
def vwap(symbol):
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
return q
orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],
exit=[e, e + otp.Milli(9556)],
sym=['SPY', 'QQQ'])
orders = orders.join_with_query(vwap, start=orders['arrival'], end=orders['exit'], symbol=orders['sym'])
otp.run(orders, start=s, end=s + otp.Day(1))
Time | market_vwap | arrival | exit | sym | |
---|---|---|---|---|---|
0 | 2023-05-15 09:30:00.000 | 412.212012 | 2023-05-15 09:30:00.000 | 2023-05-15 09:30:01.000 | SPY |
1 | 2023-05-15 09:30:00.001 | 325.318988 | 2023-05-15 09:30:07.934 | 2023-05-15 09:30:10.556 | QQQ |
Interval VWAP: the efficient way#
The code above provides an implementation for this use case. However, a more efficient implementation may be useful when the number of orders is large. It appears below.
def vwap(symbol):
q = otp.DataSource('NYSE_TAQ', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE','SIZE')})
return q
orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],
exit=[e, e + otp.Milli(9556)],
sym=['SPY', 'QQQ'])
orders['_PARAM_START_TIME_NANOS'] = orders['arrival']
orders['_PARAM_END_TIME_NANOS'] = orders['exit']
orders['SYMBOL_NAME'] = orders['sym']
otp.run(vwap, symbols=orders)
{'SPY': Time market_vwap
0 2023-05-15 09:30:01 412.212012,
'QQQ': Time market_vwap
0 2023-05-15 09:30:10.556 325.318988}
A separate query is executed for each order in parallel. Each order becomes a symbol
that specifies the security and the start/end time. The logic in vwap()
is executed for every (“unbound”) symbol. This is more efficient than calling join_with_query
as it can be parallelized better. See “Databases, symbols, and tick types” under Concepts for more info.
Note that the start and end parameters are not important for the run method as each of the symbols specifies its own start/end time in _PARAM_START_TIME_NANOS
and _PARAM_END_TIME_NANOS
.