{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "aef8b826", "metadata": {}, "source": [ "# Aggregating" ] }, { "attachments": {}, "cell_type": "markdown", "id": "157bf535-0415-431d-a9d7-0375319e6bf1", "metadata": {}, "source": [ "Let's start with an unaggregated time series." ] }, { "cell_type": "code", "execution_count": 3, "id": "91d63514-8e66-4680-be77-252937bdd85d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimePRICESIZECONDEXCHANGE
02023-05-15 09:30:00.000178688412.22100TP
12023-05-15 09:30:00.000776704412.22247Z
22023-05-15 09:30:00.003603456412.22100TT
32023-05-15 09:30:00.006352128412.241IK
42023-05-15 09:30:00.007128064412.243IK
..................
3102023-05-15 09:30:00.934032640412.27160TT
3112023-05-15 09:30:00.975609344412.242ID
3122023-05-15 09:30:00.980264448412.271ID
3132023-05-15 09:30:00.985391616412.28100T
3142023-05-15 09:30:00.985394944412.28100QT
\n", "

315 rows × 5 columns

\n", "
" ], "text/plain": [ " Time PRICE SIZE COND EXCHANGE\n", "0 2023-05-15 09:30:00.000178688 412.22 100 T P\n", "1 2023-05-15 09:30:00.000776704 412.22 247 Z\n", "2 2023-05-15 09:30:00.003603456 412.22 100 T T\n", "3 2023-05-15 09:30:00.006352128 412.24 1 I K\n", "4 2023-05-15 09:30:00.007128064 412.24 3 I K\n", ".. ... ... ... ... ...\n", "310 2023-05-15 09:30:00.934032640 412.27 160 T T\n", "311 2023-05-15 09:30:00.975609344 412.24 2 I D\n", "312 2023-05-15 09:30:00.980264448 412.27 1 I D\n", "313 2023-05-15 09:30:00.985391616 412.28 100 T\n", "314 2023-05-15 09:30:00.985394944 412.28 100 Q T\n", "\n", "[315 rows x 5 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import onetick.py as otp\n", "\n", "s = otp.dt(2023, 5, 15, 9, 30)\n", "e = otp.dt(2023, 5, 15, 9, 30, 1)\n", "\n", "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "56dbfa00-45b9-408c-bfc0-3ea78f601498", "metadata": {}, "source": [ "Let's make a note of the total number of trades.\n", "\n", "Method {meth}`agg ` can be used to aggregate data.\n", "\n", "We can aggregate over the entire queried interval by default:" ] }, { "cell_type": "code", "execution_count": 4, "id": "34a94316-be74-4548-9f2f-0c7ea6781289", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Timevolumevwap
02023-05-15 09:30:01324352412.212012
\n", "
" ], "text/plain": [ " Time volume vwap\n", "0 2023-05-15 09:30:01 324352 412.212012" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]\n", "q = q.agg({\n", " 'volume': otp.agg.sum('SIZE'),\n", " 'vwap': otp.agg.vwap('PRICE', 'SIZE'),\n", " 'count': otp.agg.count(),\n", "})\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "53f66ca3-1029-49d3-97c0-c1f4dd3ad842", "metadata": {}, "source": [ "Or over fixed buckets (aka bars or windows), for example 100 milliseconds buckets:" ] }, { "cell_type": "code", "execution_count": 5, "id": "2d408201-90be-44e1-b8c5-0f4dc371e799", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Timevolumevwap
02023-05-15 09:30:00.1001498412.235948
12023-05-15 09:30:00.2007743412.237833
22023-05-15 09:30:00.3001815412.243515
32023-05-15 09:30:00.400307298412.210324
42023-05-15 09:30:00.5002205412.250503
52023-05-15 09:30:00.600416412.258750
62023-05-15 09:30:00.700276412.244354
72023-05-15 09:30:00.8001755412.240875
82023-05-15 09:30:00.900595412.245782
92023-05-15 09:30:01.000751412.267643
\n", "
" ], "text/plain": [ " Time volume vwap\n", "0 2023-05-15 09:30:00.100 1498 412.235948\n", "1 2023-05-15 09:30:00.200 7743 412.237833\n", "2 2023-05-15 09:30:00.300 1815 412.243515\n", "3 2023-05-15 09:30:00.400 307298 412.210324\n", "4 2023-05-15 09:30:00.500 2205 412.250503\n", "5 2023-05-15 09:30:00.600 416 412.258750\n", "6 2023-05-15 09:30:00.700 276 412.244354\n", "7 2023-05-15 09:30:00.800 1755 412.240875\n", "8 2023-05-15 09:30:00.900 595 412.245782\n", "9 2023-05-15 09:30:01.000 751 412.267643" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]\n", "q = q.agg({\n", " 'volume': otp.agg.sum('SIZE'),\n", " 'vwap': otp.agg.vwap('PRICE', 'SIZE')\n", "}, bucket_interval=.1)\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "038be2be-08c2-432f-88c0-a021d3e6a737", "metadata": {}, "source": [ "Or over a sliding window:" ] }, { "cell_type": "code", "execution_count": 7, "id": "60d494a6-58ba-4b0a-9f5e-f6f01ecb9987", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Timevolumevwap
02023-05-15 09:30:00.000178688100412.220000
12023-05-15 09:30:00.000776704347412.220000
22023-05-15 09:30:00.003603456447412.220000
32023-05-15 09:30:00.006352128448412.220045
42023-05-15 09:30:00.007128064451412.220177
............
6042023-05-15 09:30:00.980264448553412.263074
6052023-05-15 09:30:00.982660864552412.263062
6062023-05-15 09:30:00.985391616652412.265660
6072023-05-15 09:30:00.985394944752412.267566
6082023-05-15 09:30:00.993288704751412.267643
\n", "

609 rows × 3 columns

\n", "
" ], "text/plain": [ " Time volume vwap\n", "0 2023-05-15 09:30:00.000178688 100 412.220000\n", "1 2023-05-15 09:30:00.000776704 347 412.220000\n", "2 2023-05-15 09:30:00.003603456 447 412.220000\n", "3 2023-05-15 09:30:00.006352128 448 412.220045\n", "4 2023-05-15 09:30:00.007128064 451 412.220177\n", ".. ... ... ...\n", "604 2023-05-15 09:30:00.980264448 553 412.263074\n", "605 2023-05-15 09:30:00.982660864 552 412.263062\n", "606 2023-05-15 09:30:00.985391616 652 412.265660\n", "607 2023-05-15 09:30:00.985394944 752 412.267566\n", "608 2023-05-15 09:30:00.993288704 751 412.267643\n", "\n", "[609 rows x 3 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]\n", "q = q.agg({\n", " 'volume': otp.agg.sum('SIZE'),\n", " 'vwap': otp.agg.vwap('PRICE', 'SIZE')\n", "}, bucket_interval=.1, running=True)\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "66ad92c4-2944-43c0-b950-7f6888f74867", "metadata": {}, "source": [ "Note that the number of output ticks is more than the number of trades. This is due to the output tick being created not only when each input tick enters the window but also when it drops out." ] }, { "attachments": {}, "cell_type": "markdown", "id": "8d28f734-8ffe-4525-9261-6f000944d27f", "metadata": {}, "source": [ "We can display all fields of the incoming tick along with the current values of the sliding window metrics." ] }, { "cell_type": "code", "execution_count": 8, "id": "fad649b3-41c1-4dbb-b130-12f47eac85e1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimePRICESIZECONDEXCHANGEvolumevwap
02023-05-15 09:30:00.000178688412.22100TP100412.220000
12023-05-15 09:30:00.000776704412.22247Z347412.220000
22023-05-15 09:30:00.003603456412.22100TT447412.220000
32023-05-15 09:30:00.006352128412.241IK448412.220045
42023-05-15 09:30:00.007128064412.243IK451412.220177
........................
3102023-05-15 09:30:00.934032640412.27160TT801412.255893
3112023-05-15 09:30:00.975609344412.242ID552412.263062
3122023-05-15 09:30:00.980264448412.271ID553412.263074
3132023-05-15 09:30:00.985391616412.28100T652412.265660
3142023-05-15 09:30:00.985394944412.28100QT752412.267566
\n", "

315 rows × 7 columns

\n", "
" ], "text/plain": [ " Time PRICE SIZE COND EXCHANGE volume vwap\n", "0 2023-05-15 09:30:00.000178688 412.22 100 T P 100 412.220000\n", "1 2023-05-15 09:30:00.000776704 412.22 247 Z 347 412.220000\n", "2 2023-05-15 09:30:00.003603456 412.22 100 T T 447 412.220000\n", "3 2023-05-15 09:30:00.006352128 412.24 1 I K 448 412.220045\n", "4 2023-05-15 09:30:00.007128064 412.24 3 I K 451 412.220177\n", ".. ... ... ... ... ... ... ...\n", "310 2023-05-15 09:30:00.934032640 412.27 160 T T 801 412.255893\n", "311 2023-05-15 09:30:00.975609344 412.24 2 I D 552 412.263062\n", "312 2023-05-15 09:30:00.980264448 412.27 1 I D 553 412.263074\n", "313 2023-05-15 09:30:00.985391616 412.28 100 T 652 412.265660\n", "314 2023-05-15 09:30:00.985394944 412.28 100 Q T 752 412.267566\n", "\n", "[315 rows x 7 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]\n", "q = q.agg({\n", " 'volume': otp.agg.sum('SIZE'),\n", " 'vwap': otp.agg.vwap('PRICE', 'SIZE')\n", "}, bucket_interval=.1, running=True, all_fields=True)\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "97d03ed4-cbf3-49e6-9989-8fee3b49b172", "metadata": {}, "source": [ "In this case, we are back to the same number of ticks as the number trades as an output tick is only created on arrival of an input tick." ] }, { "attachments": {}, "cell_type": "markdown", "id": "9e7454b1-deda-48dc-8a01-da63d48b4a5e", "metadata": {}, "source": [ "All of the aggregation operations support grouping." ] }, { "cell_type": "code", "execution_count": 9, "id": "b37e62f1-ff2c-4913-b9d7-bde925bfface", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeEXCHANGEvolumevwap
02023-05-15 09:30:01A100412.240000
12023-05-15 09:30:01B100412.220000
22023-05-15 09:30:01C102412.250000
32023-05-15 09:30:01D3269412.231798
42023-05-15 09:30:01H8412.260000
52023-05-15 09:30:01K2559412.240684
62023-05-15 09:30:01N422412.249976
72023-05-15 09:30:01P304988412.210209
82023-05-15 09:30:01T8141412.243563
92023-05-15 09:30:01U400412.250000
102023-05-15 09:30:01X200412.250000
112023-05-15 09:30:01Z4063412.238673
\n", "
" ], "text/plain": [ " Time EXCHANGE volume vwap\n", "0 2023-05-15 09:30:01 A 100 412.240000\n", "1 2023-05-15 09:30:01 B 100 412.220000\n", "2 2023-05-15 09:30:01 C 102 412.250000\n", "3 2023-05-15 09:30:01 D 3269 412.231798\n", "4 2023-05-15 09:30:01 H 8 412.260000\n", "5 2023-05-15 09:30:01 K 2559 412.240684\n", "6 2023-05-15 09:30:01 N 422 412.249976\n", "7 2023-05-15 09:30:01 P 304988 412.210209\n", "8 2023-05-15 09:30:01 T 8141 412.243563\n", "9 2023-05-15 09:30:01 U 400 412.250000\n", "10 2023-05-15 09:30:01 X 200 412.250000\n", "11 2023-05-15 09:30:01 Z 4063 412.238673" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = otp.DataSource('NYSE_TAQ', tick_type='TRD')\n", "q = q[['PRICE', 'SIZE', 'COND', 'EXCHANGE']]\n", "q = q.agg({\n", " 'volume': otp.agg.sum('SIZE'),\n", " 'vwap': otp.agg.vwap('PRICE', 'SIZE')\n", "}, group_by=['EXCHANGE'])\n", "otp.run(q, start=s, end=e, symbols=['SPY'])" ] }, { "cell_type": "markdown", "id": "0c45c074", "metadata": {}, "source": [ "Note that in non-running mode OneTick unconditionally divides the whole time interval\n", "into specified number of buckets.\n", "It means that you will always get this specified number of ticks in the result,\n", "even if you have less ticks in the input data.\n", "For example, aggregating this empty data will result in 10 ticks nonetheless:" ] }, { "cell_type": "code", "execution_count": null, "id": "1397049e", "metadata": {}, "outputs": [], "source": [ "t = otp.Empty()\n", "t = t.agg({'COUNT': otp.agg.count()}, bucket_interval=0.1)\n", "otp.run(t, start=s, end=e)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4fefcfd9-3ed8-4341-968e-a9ea9fe81bf7", "metadata": {}, "source": [ "A list of all aggregations appears [here](../../api/aggregations/root.rst). It can also be retrieved with `dir(otp.agg)`." ] }, { "attachments": {}, "cell_type": "markdown", "id": "ccafbd04-38c1-4b7a-b4cf-5cda4bbc521f", "metadata": {}, "source": [ "## Aggregation Use Cases\n", "\n", "[Creating Bars](bars)\n", "\n", "[Golden Cross strategy](goldencross)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }