# Time-Based Joins

Time series analysis often involves looking up information from other time series for relevant time ranges. OneTick has built-in functions for this.

# `join_by_time`: Enhancing a time series with information from other time series at the time of each tick

Below we'll enhance the trades with the prevailing quote (i.e., best bid and ask) at the time of each trade. We'll first look just at trades, then just at quotes, and finally we'll join the two.

Let's examine the trades first.

```ipython3
import onetick.py as otp

s = otp.dt(2024, 2, 1, 9, 30)
e = otp.dt(2024, 2, 1, 9, 30, 1)

trd = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
trd = trd[['PRICE', 'SIZE']]
otp.run(trd, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time    PRICE  SIZE
0   2024-02-01 09:30:00.000961260  184.010   302
1   2024-02-01 09:30:00.000961491  184.000   100
2   2024-02-01 09:30:00.000961701  184.000     1
3   2024-02-01 09:30:00.000973163  184.000     1
4   2024-02-01 09:30:00.000973355  184.000     5
..                            ...      ...   ...
574 2024-02-01 09:30:00.987184691  183.900     9
575 2024-02-01 09:30:00.990378350  183.920     1
576 2024-02-01 09:30:00.991941892  183.935     1
577 2024-02-01 09:30:00.993785116  183.905   300
578 2024-02-01 09:30:00.996512511  183.934     5

[579 rows x 3 columns]
```

Now let's take a look at the quotes (or rather the 'national best bid/offer').

Note the [`back_to_first_tick`](../../api/sources/data_source.md#onetick.py.DataSource) parameter
which seeks a tick before query start time
and ensures that there is at least one quote at the start of the query
(its actual timestamp is before the start).

It's commonly used when we want to find the tick (e.g., last quote or trade) at a given time:

```ipython3
qte = otp.DataSource('US_COMP_SAMPLE', tick_type='NBBO', back_to_first_tick=86400)
qte = qte[['BID_PRICE', 'ASK_PRICE']]

given_time = otp.dt(2024, 2, 15, 9, 30)
otp.run(qte, start=given_time, end=given_time, symbols=['AAPL'])
```

```myst-ansi
                 Time  BID_PRICE  ASK_PRICE
0 2024-02-15 09:30:00     183.41     183.53
```

Or when we want to make sure there is tick at the start of the query interval
(e.g., to make sure the trades have a prevailing quote even if it's from before the query start time):

```ipython3
qte = otp.DataSource('US_COMP_SAMPLE', tick_type='NBBO', back_to_first_tick=60)
qte = qte[['BID_PRICE', 'ASK_PRICE']]
otp.run(qte, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time  BID_PRICE  ASK_PRICE
0   2024-02-01 09:30:00.000000000     184.00     184.14
1   2024-02-01 09:30:00.000860953     184.00     184.14
2   2024-02-01 09:30:00.000969529     183.90     184.14
3   2024-02-01 09:30:00.005907962     183.90     184.07
4   2024-02-01 09:30:00.006859453     183.90     184.07
..                            ...        ...        ...
497 2024-02-01 09:30:00.955093516     183.88     183.93
498 2024-02-01 09:30:00.959097743     183.88     183.93
499 2024-02-01 09:30:00.960533572     183.88     183.93
500 2024-02-01 09:30:00.973387417     183.89     183.93
501 2024-02-01 09:30:00.987461418     183.89     183.93

[502 rows x 3 columns]
```

We "enhance" the trades with the information from the quotes.

```ipython3
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time    PRICE  SIZE  BID_PRICE  ASK_PRICE
0   2024-02-01 09:30:00.000961260  184.010   302     184.00     184.14
1   2024-02-01 09:30:00.000961491  184.000   100     184.00     184.14
2   2024-02-01 09:30:00.000961701  184.000     1     184.00     184.14
3   2024-02-01 09:30:00.000973163  184.000     1     183.90     184.14
4   2024-02-01 09:30:00.000973355  184.000     5     183.90     184.14
..                            ...      ...   ...        ...        ...
574 2024-02-01 09:30:00.987184691  183.900     9     183.89     183.93
575 2024-02-01 09:30:00.990378350  183.920     1     183.89     183.93
576 2024-02-01 09:30:00.991941892  183.935     1     183.89     183.93
577 2024-02-01 09:30:00.993785116  183.905   300     183.89     183.93
578 2024-02-01 09:30:00.996512511  183.934     5     183.89     183.93

[579 rows x 5 columns]
```

In other words, each trade is joined with the quote that was active at the time the trade took place. We can examine the quote time to make sure it's before the trade time.

```ipython3
qte['quote_time'] = qte['Time']
enh_trd = otp.join_by_time([trd, qte])
otp.run(enh_trd, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                             Time    PRICE  SIZE  BID_PRICE  ASK_PRICE                    quote_time
0   2024-02-01 09:30:00.000961260  184.010   302     184.00     184.14 2024-02-01 09:30:00.000860953
1   2024-02-01 09:30:00.000961491  184.000   100     184.00     184.14 2024-02-01 09:30:00.000860953
2   2024-02-01 09:30:00.000961701  184.000     1     184.00     184.14 2024-02-01 09:30:00.000860953
3   2024-02-01 09:30:00.000973163  184.000     1     183.90     184.14 2024-02-01 09:30:00.000969529
4   2024-02-01 09:30:00.000973355  184.000     5     183.90     184.14 2024-02-01 09:30:00.000969529
..                            ...      ...   ...        ...        ...                           ...
574 2024-02-01 09:30:00.987184691  183.900     9     183.89     183.93 2024-02-01 09:30:00.973387417
575 2024-02-01 09:30:00.990378350  183.920     1     183.89     183.93 2024-02-01 09:30:00.987461418
576 2024-02-01 09:30:00.991941892  183.935     1     183.89     183.93 2024-02-01 09:30:00.987461418
577 2024-02-01 09:30:00.993785116  183.905   300     183.89     183.93 2024-02-01 09:30:00.987461418
578 2024-02-01 09:30:00.996512511  183.934     5     183.89     183.93 2024-02-01 09:30:00.987461418

[579 rows x 6 columns]
```

## Join-by-time Use Cases

[Prevailing quote at the time of a trade](use_cases/prevailing_quote.md)

[Computing Markouts](use_cases/markouts.md)

# `join_with_query`: Executing a query on each tick

Sometimes we want to do a look up based on the information provided in a tick. For example, we may have a series of order ticks each containing an order arrival and exit times and we may want to find the market vwap during the interval. Let's take this one step at a time.

Let's find the market vwap for a given time range (i.e., for a given `start` and `end`).

```ipython3
q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
otp.run(q, start=s, end=e, symbols=['AAPL'])
```

```myst-ansi
                 Time  market_vwap
0 2024-02-01 09:30:01   183.901435
```

Next let's create a couple of orders each with its own values for `start`/`end` specified in the `arrival`/`exit` columns.

```ipython3
orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],
                   exit=[e, e + otp.Milli(9556)],
                   sym=['AAPL', 'MSFT'])
otp.run(orders)
```

```myst-ansi
                     Time                 arrival                    exit   sym
0 2003-12-01 00:00:00.000 2024-02-01 09:30:00.000 2024-02-01 09:30:01.000  AAPL
1 2003-12-01 00:00:00.001 2024-02-01 09:30:07.934 2024-02-01 09:30:10.556  MSFT
```

We can wrap the code that finds vwap into a function and call it for each order while passing the relevant parameters for `start`, `end`, and `symbol`.

```ipython3
def vwap(symbol):
    q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
    q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})
    return q

orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],
                   exit=[e, e + otp.Milli(9556)],
                   sym=['AAPL', 'MSFT'])
orders = orders.join_with_query(vwap, start=orders['arrival'], end=orders['exit'], symbol=orders['sym'])
otp.run(orders, start=s, end=s + otp.Day(1))
```

```myst-ansi
                     Time  market_vwap                 arrival                    exit   sym
0 2024-02-01 09:30:00.000   183.901435 2024-02-01 09:30:00.000 2024-02-01 09:30:01.000  AAPL
1 2024-02-01 09:30:00.001   402.840416 2024-02-01 09:30:07.934 2024-02-01 09:30:10.556  MSFT
```

<a id="vwap-symbol-params"></a>

# Interval VWAP: the efficient way

The code above provides an implementation for this use case. However, a more efficient implementation may be useful when the number of orders is large. It appears below.

```ipython3
def vwap(symbol):
    q = otp.DataSource('US_COMP_SAMPLE', tick_type='TRD')
    q = q.agg({'market_vwap': otp.agg.vwap('PRICE','SIZE')})
    return q

orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],
                   exit=[e, e + otp.Milli(9556)],
                   sym=['AAPL', 'MSFT'])

orders['_PARAM_START_TIME_NANOS'] = orders['arrival']
orders['_PARAM_END_TIME_NANOS'] = orders['exit']
orders['SYMBOL_NAME'] = orders['sym']
```

```ipython3
otp.run(vwap, symbols=orders, date=otp.dt(2024, 2, 1))
```

```myst-ansi
{'AAPL':                  Time  market_vwap
 0 2024-02-01 09:30:01   183.901435,
 'MSFT':                      Time  market_vwap
 0 2024-02-01 09:30:10.556   402.840416}
```

A separate query is executed for each order in parallel. Each order becomes a `symbol` that specifies the security and the start/end time. The logic in `vwap()` is executed for every ("unbound") symbol. This is more efficient than calling `join_with_query` as it can be parallelized better. See "Databases, symbols, and tick types" under Concepts for more info.

Note that the start and end parameters are not important for the run method as each of the symbols specifies its own start/end time in `_PARAM_START_TIME_NANOS` and `_PARAM_END_TIME_NANOS`.
