Databases, symbols, and tick types#
The data is organized by database (e.g., NYSE_TAQ
or CME
), symbol (e.g., AAPL
), and tick type (e.g., TRD
or QTE
). A data source must specify these (possibly providing multiple values for each) to retrieve data.
A tick type can be set explicitly via the tick_type
parameter of a data source.
The database name can be specified via the db
parameter or as part of a symbol using the ::
separator.
For example
>>> trades = otp.DataSource(db='NYSE_TAQ', symbol='AAPL', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> volume()
Time VOLUME
0 2023-03-02 62351689
is equivalent to
>>> trades = otp.DataSource(symbol='NYSE_TAQ::AAPL', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> trades.count()
461617
The same technique applies to the database and tick type:
>>> trades = otp.DataSource(symbol='AAPL', tick_type='NYSE_TAQ::TRD', date=otp.dt(2023, 3, 1))
>>> trades.count()
461617
Note that symbol name and tick type can be accessed as pseudo-fields in OneTick. Those pseudo-fields are always present and you can access them like that:
>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> trades['SYMBOL_NAME'] = trades['_SYMBOL_NAME']
>>> trades['TICK_TYPE'] = trades['_TICK_TYPE']
>>> trades = trades.first()
>>> trades = trades[['PRICE', 'SIZE', 'SYMBOL_NAME', 'TICK_TYPE']]
>>> otp.run(trades, symbols='AAPL')
Time PRICE SIZE SYMBOL_NAME TICK_TYPE
0 2023-03-01 04:00:00.005165425 147.66 4 AAPL TRD
Parameter identify_input_ts
in otp.DataSource
and otp.merge
can also be used to add them as new columns.
Symbols: bound and unbound#
There are two ways of setting symbols: bound and unbound.
Bound symbols are specified when defining a source or with otp.merge
.
Unbound symbols are set when the query is executed and apply to all sources that do not specify bound symbols.
Here is an simple example of an unbound symbol:
>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> volume(symbols=['AAPL']) # unbound symbol is set here
Time VOLUME
0 2023-03-02 62351689
There is no difference between bound and unbound when we talk about a single symbol. To appreciate the difference, let’s look at two symbols.
>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> dict(sorted(volume(symbols=['AAPL', 'MSFT']).items()))
{'AAPL': Time VOLUME
0 2023-03-02 62351689,
'MSFT': Time VOLUME
0 2023-03-02 31619403}
The results for each unbound symbol are processed separately and returned in a separate pandas DataFrame. The result of the run method above is a dict with symbols as keys and pandas DataFrames as values.
Contrast this with bound symbols where the ticks are merged into a single flow:
>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1), symbols=['AAPL', 'MSFT'])
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> volume()
Time VOLUME
0 2023-03-02 93971092
Specifying bound symbols on a source is just a shorthand for the otp.merge
method added right
after the source definition.
>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> cross_symbol_trades = otp.merge([trades], symbols=['AAPL', 'MSFT'])
>>> volume = cross_symbol_trades.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> volume()
Time VOLUME
0 2023-03-02 93971092
Concurrency#
Unbound symbols can be processed in parallel by specifying the concurrency
parameter of otp.run
.
You have to explicitly use otp.run
to execute the query if you want to specify concurrency
.
otp.run(volume, symbols=['AAPL', 'MSFT', 'AMZN', 'META'], concurrency=8)
For bound symbols, all calculations passed into otp.merge
(trades
in example above) run in parallel.
Calculations can be made faster by computing as much as possible per symbol before merging them.
The following example has the same result as the previous one but it finds the total volume faster as it calculates
the volume for every symbol independently which can be done in parallel and then adds up the total.
>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> cross_symbol_volumes = otp.merge([volume], symbols=['AAPL', 'MSFT'], presort=True, identify_input_ts=True)
>>> otp.run(cross_symbol_volumes, concurrency=8)
Time VOLUME SYMBOL_NAME TICK_TYPE
0 2023-03-02 62351689 AAPL TRD
1 2023-03-02 31619403 MSFT TRD
>>> total_volume = cross_symbol_volumes.agg({'TOTAL_VOLUME': otp.agg.sum('VOLUME')})
>>> otp.run(total_volume, concurrency=8)
Time TOTAL_VOLUME
0 2023-03-02 93971092
Note
All sources passed into merge
are considered bound symbol sources.
Symbols#
Specifying symbols dynamically#
In many cases it is necessary to select symbols from a databases according to some logic.
The otp.Symbols
source makes this easy.
>>> symbols = otp.Symbols(db='NYSE_TAQ', date=otp.dt(2023, 3, 1), pattern='AAA%')
>>> symbols()
Time SYMBOL_NAME
0 2023-03-01 AAA
1 2023-03-01 AAAU
The result of otp.Symbols
can be used as bound/unbound symbols.
>>> data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2))
>>> data = data.first(3)
>>> data = data[['PRICE','SIZE']]
>>> dict(sorted(data(symbols=otp.Symbols(db='NYSE_TAQ', pattern='AAA%')).items()))
{'AAA': Time PRICE SIZE
0 2023-03-01 09:30:00.441452800 24.56 10
1 2023-03-01 09:30:00.441455872 24.56 10
2 2023-03-01 11:41:22.735185152 24.53 1,
'AAAU': Time PRICE SIZE
0 2023-03-01 04:25:29.017414400 18.17 40
1 2023-03-01 07:00:00.071737088 18.17 100
2 2023-03-01 07:00:28.802050304 18.22 549}
Note that the interval of the main query is implicitly used in otp.Symbols(db='NYSE_TAQ', pattern='AAA%')
.
The symbols
parameter can be any type of calculation that contains the SYMBOL_NAME
field in the resulting ticks.
Every tick provides a separate symbol name in the SYMBOL_NAME
field while the other fields are passed as
‘symbol parameters’.
Symbol parameters#
Symbol parameters are any fields other than SYMBOL_NAME
that are constant for an instrument. We illustrate this below.
First, consider a query that finds 2 most traded symbols among symbols starting with the letter ‘A’. We’ll later use the
output of this query to specify symbols and symbol params.
>>> trd = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2))
>>> trd = trd.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> count = otp.merge([trd], symbols=otp.Symbols(db='NYSE_TAQ', pattern='A%'), presort=True, identify_input_ts=True)
>>> most_traded = count.high('VOLUME', n=2)
>>> most_traded()
Time VOLUME SYMBOL_NAME TICK_TYPE
0 2023-03-02 62351689 AAPL TRD
1 2023-03-02 56099688 AMZN TRD
The output of most_traded
provides symbols in the SYMBOL_NAME
column while the other columns (VOLUME
and TICK_TYPE
)
provide symbol parameters. The following code retrieves the value of VOLUME
for use in tick-by-tick analytics.
>>> trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
>>> trd = trd.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> count = otp.merge([trd], symbols=otp.Symbols('NYSE_TAQ', pattern='A%'), identify_input_ts=True)
>>> most_traded = count.high('VOLUME', n=2)
>>>
>>> data = otp.DataSource('NYSE_TAQ', tick_type='TRD')
>>> data = data.first(3)
>>> data = data[['PRICE','SIZE']]
>>> data['VOLUME'] = data.Symbol.VOLUME
>>> dict(sorted(otp.run(data, symbols=most_traded, start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2)).items()))
{'AAPL': Time PRICE SIZE VOLUME
0 2023-03-01 04:00:00.005165425 147.66 4 62351689
1 2023-03-01 04:00:00.005173198 147.66 1 62351689
2 2023-03-01 04:00:00.014521032 147.66 4 62351689,
'AMZN': Time PRICE SIZE VOLUME
0 2023-03-01 04:00:00.062175376 94.31 10 56099688
1 2023-03-01 04:00:00.063719476 94.31 3 56099688
2 2023-03-01 04:00:00.064597172 94.31 4 56099688}
Equivalently, symbol parameters can be accessed by wrapping a function around the query as illustrated below.
>>> trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
>>> trd = trd.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> count = otp.merge([trd], symbols=otp.Symbols('NYSE_TAQ', pattern='A%'), identify_input_ts=True)
>>> most_traded = count.high('VOLUME', n=2)
>>>
>>> def query(sym):
... data = otp.DataSource('NYSE_TAQ', tick_type='TRD')
... data = data.first(3)
... data = data[['PRICE','SIZE']]
... data['VOLUME'] = sym['VOLUME']
... return data
>>>
>>> dict(sorted(otp.run(query, symbols=most_traded, start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2)).items()))
{'AAPL': Time PRICE SIZE VOLUME
0 2023-03-01 04:00:00.005165425 147.66 4 62351689
1 2023-03-01 04:00:00.005173198 147.66 1 62351689
2 2023-03-01 04:00:00.014521032 147.66 4 62351689,
'AMZN': Time PRICE SIZE VOLUME
0 2023-03-01 04:00:00.062175376 94.31 10 56099688
1 2023-03-01 04:00:00.063719476 94.31 3 56099688
2 2023-03-01 04:00:00.064597172 94.31 4 56099688}
Time interval per symbol#
It is allowed to specify query interval per symbol using special fields _PARAM_START_TIME_NANOS
and _PARAM_START_TIME_NANOS
>>> custom_symbols = otp.Ticks(SYMBOL_NAME=['AAPL', 'MSFT'],
... _PARAM_START_TIME_NANOS=[otp.dt(2023, 3, 1, 10, 30), otp.dt(2023, 3, 1, 11)],
... _PARAM_END_TIME_NANOS=[otp.dt(2023, 3, 1, 11), otp.dt(2023, 3, 1, 11, 30)])
>>> data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD')
>>> data = data.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> data['SYMBOL'] = data.Symbol.name
>>> dict(sorted(otp.run(data, start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2), symbols=custom_symbols).items()))
{'AAPL': Time VOLUME SYMBOL
0 2023-03-01 11:00:00 3539370 AAPL,
'MSFT': Time VOLUME SYMBOL
0 2023-03-01 11:30:00 1679716 MSFT}
Note that per symbol intervals should be inside the query interval.
Associated symbols#
Associated symbols is a technique when unbound symbols are used to define bound symbols or symbols in related queries.
It is expressed using the otp.eval
and a symbol parameter. More details can be
found in the API doc for the otp.eval