{ "cells": [ { "cell_type": "markdown", "id": "4a33fb14", "metadata": {}, "source": [ "# Time-Based Joins" ] }, { "cell_type": "markdown", "id": "5eaac7ad-b507-48bf-aff2-7ff5d812e147", "metadata": {}, "source": [ "Time series analysis often involves looking up information from other time series for relevant time ranges. OneTick has built-in functions for this." ] }, { "cell_type": "markdown", "id": "6f00e839-f381-49b2-9249-c1f36b36c12c", "metadata": {}, "source": [ "# `join_by_time`: Enhancing a time series with information from other time series at the time of each tick" ] }, { "cell_type": "markdown", "id": "810b313d-8033-46ee-aac3-15c61faf6635", "metadata": {}, "source": [ "Below we'll enhance the trades with the prevailing quote (i.e., best bid and ask) at the time of each trade. We'll first look just at trades, then just at quotes, and finally we'll join the two." ] }, { "cell_type": "markdown", "id": "9bed10d4-dcd8-414d-b618-d600807f39af", "metadata": {}, "source": [ "Let's examine the trades first." ] }, { "cell_type": "code", "execution_count": 2, "id": "7872865c-768c-4a7d-b30e-6887d0e7b1cf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimePRICESIZE
02023-05-15 09:30:00.000178688412.22100
12023-05-15 09:30:00.000776704412.22247
22023-05-15 09:30:00.003603456412.22100
32023-05-15 09:30:00.006352128412.241
42023-05-15 09:30:00.007128064412.243
............
3102023-05-15 09:30:00.934032640412.27160
3112023-05-15 09:30:00.975609344412.242
3122023-05-15 09:30:00.980264448412.271
3132023-05-15 09:30:00.985391616412.28100
3142023-05-15 09:30:00.985394944412.28100
\n", "

315 rows × 3 columns

\n", "
" ], "text/plain": [ " Time PRICE SIZE\n", "0 2023-05-15 09:30:00.000178688 412.22 100\n", "1 2023-05-15 09:30:00.000776704 412.22 247\n", "2 2023-05-15 09:30:00.003603456 412.22 100\n", "3 2023-05-15 09:30:00.006352128 412.24 1\n", "4 2023-05-15 09:30:00.007128064 412.24 3\n", ".. ... ... ...\n", "310 2023-05-15 09:30:00.934032640 412.27 160\n", "311 2023-05-15 09:30:00.975609344 412.24 2\n", "312 2023-05-15 09:30:00.980264448 412.27 1\n", "313 2023-05-15 09:30:00.985391616 412.28 100\n", "314 2023-05-15 09:30:00.985394944 412.28 100\n", "\n", "[315 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import onetick.py as otp\n", "\n", "s = otp.dt(2023, 5, 15, 9, 30)\n", "e = otp.dt(2023, 5, 15, 9, 30, 1)\n", "\n", "trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "trd = trd[['PRICE', 'SIZE']]\n", "otp.run(trd, start=s, end=e, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "fd6139cf-af7b-4bc9-bcfe-08ef9271edd0", "metadata": {}, "source": [ "Now let's take a look at the quotes (or rather the 'national best bid/offer').\n", "\n", "Note the {class}`back_to_first_tick ` parameter\n", "which seeks a tick before query start time\n", "and ensures that there is at least one quote at the start of the query\n", "(its actual timestamp is before the start).\n", "\n", "It's commonly used when we want to find the tick (e.g., last quote or trade) at a given time:" ] }, { "cell_type": "code", "execution_count": null, "id": "63840e93", "metadata": {}, "outputs": [], "source": [ "qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=86400)\n", "qte = qte[['BID_PRICE', 'ASK_PRICE']]\n", "\n", "given_time = otp.dt(2023, 5, 15, 9, 30)\n", "otp.run(qte, start=given_time, end=given_time, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "86ea8be0", "metadata": {}, "source": [ "Or when we want to make sure there is tick at the start of the query interval\n", "(e.g., to make sure the trades have a prevailing quote even if it's from before the query start time):" ] }, { "cell_type": "code", "execution_count": 10, "id": "ce9c9f4e-419b-4ebb-aba7-a40f8639ac77", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeBID_PRICEASK_PRICE
02023-05-15 09:30:00.000000000412.22412.25
12023-05-15 09:30:00.000174080412.22412.25
22023-05-15 09:30:00.000595200412.22412.24
32023-05-15 09:30:00.000715520412.21412.24
42023-05-15 09:30:00.000790784412.22412.24
............
11502023-05-15 09:30:00.979763456412.27412.28
11512023-05-15 09:30:00.981818880412.27412.28
11522023-05-15 09:30:00.983727872412.27412.28
11532023-05-15 09:30:00.983737856412.27412.28
11542023-05-15 09:30:00.985296640412.27412.28
\n", "

1155 rows × 3 columns

\n", "
" ], "text/plain": [ " Time BID_PRICE ASK_PRICE\n", "0 2023-05-15 09:30:00.000000000 412.22 412.25\n", "1 2023-05-15 09:30:00.000174080 412.22 412.25\n", "2 2023-05-15 09:30:00.000595200 412.22 412.24\n", "3 2023-05-15 09:30:00.000715520 412.21 412.24\n", "4 2023-05-15 09:30:00.000790784 412.22 412.24\n", "... ... ... ...\n", "1150 2023-05-15 09:30:00.979763456 412.27 412.28\n", "1151 2023-05-15 09:30:00.981818880 412.27 412.28\n", "1152 2023-05-15 09:30:00.983727872 412.27 412.28\n", "1153 2023-05-15 09:30:00.983737856 412.27 412.28\n", "1154 2023-05-15 09:30:00.985296640 412.27 412.28\n", "\n", "[1155 rows x 3 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qte = otp.DataSource('TAQ_NBBO', tick_type='NBBO', back_to_first_tick=60)\n", "qte = qte[['BID_PRICE', 'ASK_PRICE']]\n", "otp.run(qte, start=s, end=e, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "bb18486d-ccb2-4add-b235-6803077465a0", "metadata": {}, "source": [ "We \"enhance\" the trades with the information from the quotes." ] }, { "cell_type": "code", "execution_count": 11, "id": "5ca34e70-019c-45e0-923a-61765ed71e7a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimePRICESIZEBID_PRICEASK_PRICE
02023-05-15 09:30:00.000178688412.22100412.22412.25
12023-05-15 09:30:00.000776704412.22247412.21412.24
22023-05-15 09:30:00.003603456412.22100412.22412.24
32023-05-15 09:30:00.006352128412.241412.22412.25
42023-05-15 09:30:00.007128064412.243412.22412.25
..................
3102023-05-15 09:30:00.934032640412.27160412.26412.28
3112023-05-15 09:30:00.975609344412.242412.27412.28
3122023-05-15 09:30:00.980264448412.271412.27412.28
3132023-05-15 09:30:00.985391616412.28100412.27412.28
3142023-05-15 09:30:00.985394944412.28100412.27412.28
\n", "

315 rows × 5 columns

\n", "
" ], "text/plain": [ " Time PRICE SIZE BID_PRICE ASK_PRICE\n", "0 2023-05-15 09:30:00.000178688 412.22 100 412.22 412.25\n", "1 2023-05-15 09:30:00.000776704 412.22 247 412.21 412.24\n", "2 2023-05-15 09:30:00.003603456 412.22 100 412.22 412.24\n", "3 2023-05-15 09:30:00.006352128 412.24 1 412.22 412.25\n", "4 2023-05-15 09:30:00.007128064 412.24 3 412.22 412.25\n", ".. ... ... ... ... ...\n", "310 2023-05-15 09:30:00.934032640 412.27 160 412.26 412.28\n", "311 2023-05-15 09:30:00.975609344 412.24 2 412.27 412.28\n", "312 2023-05-15 09:30:00.980264448 412.27 1 412.27 412.28\n", "313 2023-05-15 09:30:00.985391616 412.28 100 412.27 412.28\n", "314 2023-05-15 09:30:00.985394944 412.28 100 412.27 412.28\n", "\n", "[315 rows x 5 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "enh_trd = otp.join_by_time([trd, qte])\n", "otp.run(enh_trd, start=s, end=e, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "6478876f-e63d-4d04-81e5-5c314b9eaf07", "metadata": {}, "source": [ "In other words, each trade is joined with the quote that was active at the time the trade took place. We can examine the quote time to make sure it's before the trade time." ] }, { "cell_type": "code", "execution_count": 5, "id": "3fbcedd7-9dba-4fa5-8dd8-f8f01c0226c0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimePRICESIZEBID_PRICEASK_PRICEquote_time
02023-05-15 09:30:00.000178688412.22100412.22412.252023-05-15 09:30:00.000174080
12023-05-15 09:30:00.000776704412.22247412.21412.242023-05-15 09:30:00.000715520
22023-05-15 09:30:00.003603456412.22100412.22412.242023-05-15 09:30:00.003562496
32023-05-15 09:30:00.006352128412.241412.22412.252023-05-15 09:30:00.006343936
42023-05-15 09:30:00.007128064412.243412.22412.252023-05-15 09:30:00.007110656
.....................
3102023-05-15 09:30:00.934032640412.27160412.26412.282023-05-15 09:30:00.934030080
3112023-05-15 09:30:00.975609344412.242412.27412.282023-05-15 09:30:00.970691840
3122023-05-15 09:30:00.980264448412.271412.27412.282023-05-15 09:30:00.979763456
3132023-05-15 09:30:00.985391616412.28100412.27412.282023-05-15 09:30:00.985296640
3142023-05-15 09:30:00.985394944412.28100412.27412.282023-05-15 09:30:00.985296640
\n", "

315 rows × 6 columns

\n", "
" ], "text/plain": [ " Time PRICE SIZE BID_PRICE ASK_PRICE quote_time\n", "0 2023-05-15 09:30:00.000178688 412.22 100 412.22 412.25 2023-05-15 09:30:00.000174080\n", "1 2023-05-15 09:30:00.000776704 412.22 247 412.21 412.24 2023-05-15 09:30:00.000715520\n", "2 2023-05-15 09:30:00.003603456 412.22 100 412.22 412.24 2023-05-15 09:30:00.003562496\n", "3 2023-05-15 09:30:00.006352128 412.24 1 412.22 412.25 2023-05-15 09:30:00.006343936\n", "4 2023-05-15 09:30:00.007128064 412.24 3 412.22 412.25 2023-05-15 09:30:00.007110656\n", ".. ... ... ... ... ... ...\n", "310 2023-05-15 09:30:00.934032640 412.27 160 412.26 412.28 2023-05-15 09:30:00.934030080\n", "311 2023-05-15 09:30:00.975609344 412.24 2 412.27 412.28 2023-05-15 09:30:00.970691840\n", "312 2023-05-15 09:30:00.980264448 412.27 1 412.27 412.28 2023-05-15 09:30:00.979763456\n", "313 2023-05-15 09:30:00.985391616 412.28 100 412.27 412.28 2023-05-15 09:30:00.985296640\n", "314 2023-05-15 09:30:00.985394944 412.28 100 412.27 412.28 2023-05-15 09:30:00.985296640\n", "\n", "[315 rows x 6 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qte['quote_time'] = qte['Time']\n", "enh_trd = otp.join_by_time([trd, qte])\n", "otp.run(enh_trd, start=s, end=e, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "66f31c5b-05b3-4d7a-a208-33453fd99039", "metadata": {}, "source": [ "## Join-by-time Use Cases\n", "[Prevailing quote at the time of a trade](prevailing-quote)\n", "\n", "[Computing Markouts](markouts)" ] }, { "cell_type": "markdown", "id": "c7f14ed4-8248-430b-b13a-74a4fe694223", "metadata": {}, "source": [ "# `join_with_query`: Executing a query on each tick" ] }, { "cell_type": "markdown", "id": "d12786f9-dbcd-4726-b667-6b47d71aff64", "metadata": {}, "source": [ "Sometimes we want to do a look up based on the information provided in a tick. For example, we may have a series of order ticks each containing an order arrival and exit times and we may want to find the market vwap during the interval. Let's take this one step at a time." ] }, { "cell_type": "markdown", "id": "7db241f8-69e6-48dc-91a9-ebe496fb2a55", "metadata": {}, "source": [ "Let's find the market vwap for a given time range (i.e., for a given `start` and `end`)." ] }, { "cell_type": "code", "execution_count": 12, "id": "d3994b99-f9f7-4925-8352-ab3094c3d5ba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Timemarket_vwap
02023-05-15 09:30:01412.212012
\n", "
" ], "text/plain": [ " Time market_vwap\n", "0 2023-05-15 09:30:01 412.212012" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "279ab9ef-844c-4a30-a526-52e09c1b65e2", "metadata": {}, "source": [ "Next let's create a couple of orders each with its own values for `start`/`end` specified in the `arrival`/`exit` columns." ] }, { "cell_type": "code", "execution_count": 17, "id": "d38d98d1-e151-4f11-bc39-bf477aa692df", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Timearrivalexitsym
02003-12-01 00:00:00.0002023-05-15 09:30:00.0002023-05-15 09:30:01.000SPY
12003-12-01 00:00:00.0012023-05-15 09:30:07.9342023-05-15 09:30:10.556QQQ
\n", "
" ], "text/plain": [ " Time arrival exit sym\n", "0 2003-12-01 00:00:00.000 2023-05-15 09:30:00.000 2023-05-15 09:30:01.000 SPY\n", "1 2003-12-01 00:00:00.001 2023-05-15 09:30:07.934 2023-05-15 09:30:10.556 QQQ" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)], \n", " exit=[e, e + otp.Milli(9556)],\n", " sym=['SPY', 'QQQ'])\n", "otp.run(orders)" ] }, { "cell_type": "markdown", "id": "844a4826-7771-41c7-8f5d-2a6ed65a83c2", "metadata": {}, "source": [ "We can wrap the code that finds vwap into a function and call it for each order while passing the relevant parameters for `start`, `end`, and `symbol`." ] }, { "cell_type": "code", "execution_count": 18, "id": "fb8b8e54-ac60-49d8-9927-9bddd873055b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Timemarket_vwaparrivalexitsym
02023-05-15 09:30:00.000412.2120122023-05-15 09:30:00.0002023-05-15 09:30:01.000SPY
12023-05-15 09:30:00.001325.3189882023-05-15 09:30:07.9342023-05-15 09:30:10.556QQQ
\n", "
" ], "text/plain": [ " Time market_vwap arrival exit sym\n", "0 2023-05-15 09:30:00.000 412.212012 2023-05-15 09:30:00.000 2023-05-15 09:30:01.000 SPY\n", "1 2023-05-15 09:30:00.001 325.318988 2023-05-15 09:30:07.934 2023-05-15 09:30:10.556 QQQ" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def vwap(symbol):\n", " q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", " q = q.agg({'market_vwap': otp.agg.vwap('PRICE', 'SIZE')})\n", " return q\n", "\n", "orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)], \n", " exit=[e, e + otp.Milli(9556)],\n", " sym=['SPY', 'QQQ'])\n", "orders = orders.join_with_query(vwap, start=orders['arrival'], end=orders['exit'], symbol=orders['sym'])\n", "otp.run(orders, start=s, end=s + otp.Day(1))" ] }, { "cell_type": "markdown", "id": "7624b26c-4e9d-4b74-8fc4-80bfa769162f", "metadata": {}, "source": [ "(vwap-symbol-params)=\n", "# Interval VWAP: the efficient way\n", "The code above provides an implementation for this use case. However, a more efficient implementation may be useful when the number of orders is large. It appears below." ] }, { "cell_type": "code", "execution_count": 3, "id": "5e5d69e8-571d-47be-8ce9-f07824c92d2b", "metadata": {}, "outputs": [], "source": [ "def vwap(symbol):\n", " q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", " q = q.agg({'market_vwap': otp.agg.vwap('PRICE','SIZE')})\n", " return q\n", "\n", "orders = otp.Ticks(arrival=[s, s + otp.Milli(7934)],\n", " exit=[e, e + otp.Milli(9556)],\n", " sym=['SPY', 'QQQ'])\n", "\n", "orders['_PARAM_START_TIME_NANOS'] = orders['arrival'] \n", "orders['_PARAM_END_TIME_NANOS'] = orders['exit'] \n", "orders['SYMBOL_NAME'] = orders['sym'] " ] }, { "cell_type": "code", "execution_count": 5, "id": "618781ac-c3ff-4530-9a2b-1e5ad7ec5e97", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'SPY': Time market_vwap\n", " 0 2023-05-15 09:30:01 412.212012,\n", " 'QQQ': Time market_vwap\n", " 0 2023-05-15 09:30:10.556 325.318988}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "otp.run(vwap, symbols=orders)" ] }, { "cell_type": "markdown", "id": "2f2ccbbb", "metadata": {}, "source": [ "A separate query is executed for each order in parallel. Each order becomes a `symbol` that specifies the security and the start/end time. The logic in `vwap()` is executed for every (\"unbound\") symbol. This is more efficient than calling `join_with_query` as it can be parallelized better. See \"Databases, symbols, and tick types\" under Concepts for more info.\n", "\n", "Note that the start and end parameters are not important for the run method as each of the symbols specifies its own start/end time in `_PARAM_START_TIME_NANOS` and `_PARAM_END_TIME_NANOS`." ] }, { "cell_type": "code", "execution_count": null, "id": "2f68daae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }