Aggregations#
This section contains reference for the available aggregation functions.
However here are listed some basic examples of using parameters common to different aggregations.
How aggregation buckets are created#
Each aggregation is split into “buckets” - time intervals in which the data is aggregated.
The creation of buckets is determined by the query time interval
and by several aggregation parameters: bucket_interval
, running
and all_fields
.
Let’s see all different combinations of them and how they produce different results.
First, the input data, 10 ticks with different offsets, query time interval is 1 minute:
>>> t = otp.Ticks(A=range(10),
... start=otp.dt(2003, 12, 1),
... end=otp.dt(2003, 12, 1) + otp.Minute(1),
... offset=[otp.Second(i) for i in [0, 5, 13, 15, 20, 27, 30, 35, 42, 45]])
>>> otp.run(t)
Time A
0 2003-12-01 00:00:00 0
1 2003-12-01 00:00:05 1
2 2003-12-01 00:00:13 2
3 2003-12-01 00:00:15 3
4 2003-12-01 00:00:20 4
5 2003-12-01 00:00:27 5
6 2003-12-01 00:00:30 6
7 2003-12-01 00:00:35 7
8 2003-12-01 00:00:42 8
9 2003-12-01 00:00:45 9
Fixed-size buckets#
By default, bucket_interval
is set to 0 and running
is set to False.
It aggregates the input data in a single bucket from the query start time to the query end time,
producing a single output tick:
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')})
... )
Time S
0 2003-12-01 00:01:00 45
Setting bucket_interval
to some value will split the query time interval into buckets
of the specified fixed size
(except the last bucket that may be smaller if the time interval can’t be evenly divided).
Note that bucket intervals are non-overlapping and bucket end time is exclusive, so the tick
with the timestamp equal to the bucket end time will be included in the next bucket.
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')}, bucket_interval=21)
... )
Time S
0 2003-12-01 00:00:21 10
1 2003-12-01 00:00:42 18
2 2003-12-01 00:01:00 17
Sliding window (parameter running
)#
Setting running
to True will aggregate the input data in a “sliding window”.
Setting running
to True without setting bucket_interval
will basically create a “cumulative” aggregation.
It will create a bucket for each input tick from the query start time to the input tick’s timestamp (inclusive):
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')}, running=True)
... )
Time S
0 2003-12-01 00:00:00 0
1 2003-12-01 00:00:05 1
2 2003-12-01 00:00:13 3
3 2003-12-01 00:00:15 6
4 2003-12-01 00:00:20 10
5 2003-12-01 00:00:27 15
6 2003-12-01 00:00:30 21
7 2003-12-01 00:00:35 28
8 2003-12-01 00:00:42 36
9 2003-12-01 00:00:45 45
Setting running
to True together with setting bucket_interval
will create a “sliding windows” of the specified fixed size.
Each input tick may produce up to two buckets:
one looks “backward” from input tick’s timestamp (inclusive)
other looks “forward” from input tick’s timestamp (exclusive)
“forward” bucket is not created if the other tick has an equal “backward” bucket or when “forward” bucket exceeds query end time.
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')}, running=True, bucket_interval=20)
... )
Time S
0 2003-12-01 00:00:00 0
1 2003-12-01 00:00:05 1
2 2003-12-01 00:00:13 3
3 2003-12-01 00:00:15 6
4 2003-12-01 00:00:20 10
5 2003-12-01 00:00:25 9
6 2003-12-01 00:00:27 14
7 2003-12-01 00:00:30 20
8 2003-12-01 00:00:33 18
9 2003-12-01 00:00:35 22
10 2003-12-01 00:00:40 18
11 2003-12-01 00:00:42 26
12 2003-12-01 00:00:45 35
13 2003-12-01 00:00:47 30
14 2003-12-01 00:00:50 24
15 2003-12-01 00:00:55 17
Parameter all_fields
#
Parameter all_fields
allows to include the other fields from the input tick in the result,
and also allows to choose which sliding windows to include in the output.
Setting running
to True together with setting all_fields
to True
will produce output only for “backward” sliding windows and will copy all the fields from the input tick:
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')}, running=True, bucket_interval=20, all_fields=True)
... )
Time A S
0 2003-12-01 00:00:00 0 0
1 2003-12-01 00:00:05 1 1
2 2003-12-01 00:00:13 2 3
3 2003-12-01 00:00:15 3 6
4 2003-12-01 00:00:20 4 10
5 2003-12-01 00:00:27 5 14
6 2003-12-01 00:00:30 6 20
7 2003-12-01 00:00:35 7 22
8 2003-12-01 00:00:42 8 26
9 2003-12-01 00:00:45 9 35
Setting running
to True together with setting all_fields
to when_ticks_exit_window
will produce output only for “forward” sliding windows and will copy all the fields from the input tick:
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')}, running=True, bucket_interval=20, all_fields='when_ticks_exit_window')
... )
Time A S
0 2003-12-01 00:00:20 0 6
1 2003-12-01 00:00:25 1 10
2 2003-12-01 00:00:33 2 20
3 2003-12-01 00:00:35 3 18
4 2003-12-01 00:00:40 4 22
5 2003-12-01 00:00:47 5 35
6 2003-12-01 00:00:50 6 30
7 2003-12-01 00:00:55 7 24
Setting running
to False together with setting all_fields
will copy the values of the other input fields from the first tick of each bucket:
>>> otp.run(
... t.agg({'S': otp.agg.sum('A')}, bucket_interval=21, all_fields=True)
... )
Time A S
0 2003-12-01 00:00:21 0 10
1 2003-12-01 00:00:42 5 18
2 2003-12-01 00:01:00 8 17
Examples with different buckets#
Find average for selected ticks for the 5 second interval with sliding window:
>>> data = otp.Ticks(
... X=[10, 9, 14, 14, 8, 11],
... offset=[0, 1000, 2000, 3000, 4000, 5000],
... )
>>> data = data.agg({'RESULT': otp.agg.average('X')}, running=True, bucket_interval=otp.Second(5))
>>> otp.run(data)
Time RESULT
0 2003-12-01 00:00:00 10.00
1 2003-12-01 00:00:01 9.50
2 2003-12-01 00:00:02 11.00
3 2003-12-01 00:00:03 11.75
4 2003-12-01 00:00:04 11.00
5 2003-12-01 00:00:05 11.20
6 2003-12-01 00:00:06 11.75
7 2003-12-01 00:00:07 11.00
8 2003-12-01 00:00:08 9.50
9 2003-12-01 00:00:09 11.00
10 2003-12-01 00:00:10 NaN
Find total volume of trades, minimal and maximum price for the first day for a symbol AAA
:
>>> data = otp.DataSource(db='DEMO_L1', tick_type='TRD', symbol='AAA')
>>> data = data.agg({
... 'SUM': otp.agg.sum('SIZE'),
... 'MIN': otp.agg.min('PRICE'),
... 'MAX': otp.agg.max('PRICE'),
... }, bucket_interval=otp.Day(1))
>>> data = data.first()
>>> otp.run(data)
Time SUM MIN MAX
0 2003-12-02 1600 59.72 60.24
Find an average in buckets of 5 ticks:
>>> data = otp.Ticks(X=[21, 20, 22, 25, 18, 17, 19, 23, 21, 21, 16, 20, 15])
>>> data = data.agg({'AVG': otp.agg.average('X')}, bucket_interval=5, bucket_units='ticks')
>>> otp.run(data)
Time AVG
0 2003-12-01 00:00:00.004 21.2
1 2003-12-01 00:00:00.009 20.2
2 2003-12-04 00:00:00.000 17.0
Bucket interval can be set as a float if bucket_units
is set to seconds:
>>> data = otp.Ticks(X=[1, 2, 3, 4, 5, 6, 7, 8])
>>> data = data.agg({'SUM': otp.agg.sum('X')}, bucket_interval=0.002)
>>> otp.run(data, start=otp.config.default_start_time, end=otp.config.default_start_time + otp.Milli(20))
Time SUM
0 2003-12-01 00:00:00.002 3
1 2003-12-01 00:00:00.004 7
2 2003-12-01 00:00:00.006 11
3 2003-12-01 00:00:00.008 15
4 2003-12-01 00:00:00.010 0
5 2003-12-01 00:00:00.012 0
6 2003-12-01 00:00:00.014 0
7 2003-12-01 00:00:00.016 0
8 2003-12-01 00:00:00.018 0
9 2003-12-01 00:00:00.020 0
Other#
Aggregate over Operation
instead of Column
:
>>> data = otp.Ticks(X=[1, 2, 3], Y=[4, 5, 6])
>>> data = data.agg({
... 'SUM': otp.agg.sum(data['X'] * data['Y'])
... })
>>> otp.run(data)
Time SUM
0 2003-12-04 32