Databases, symbols, and tick types#

The data is organized by database (e.g., NYSE_TAQ or CME), symbol (e.g., AAPL), and tick type (e.g., TRD or QTE). A data source must specify these (possibly providing multiple values for each) to retrieve data.

A tick type can be set explicitly via the tick_type parameter of a data source.

The database name can be specified via the db parameter or as part of a symbol using the :: separator. For example

>>> trades = otp.DataSource(db='NYSE_TAQ', symbol='AAPL', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> volume()
        Time    VOLUME
0 2023-03-02  62351689

is equivalent to

>>> trades = otp.DataSource(symbol='NYSE_TAQ::AAPL', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> trades.count()
461617

The same technique applies to the database and tick type:

>>> trades = otp.DataSource(symbol='AAPL', tick_type='NYSE_TAQ::TRD', date=otp.dt(2023, 3, 1))
>>> trades.count()
461617

Note that symbol name and tick type can be accessed as pseudo-fields in OneTick. Those pseudo-fields are always present and you can access them like that:

>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> trades['SYMBOL_NAME'] = trades['_SYMBOL_NAME']
>>> trades['TICK_TYPE'] = trades['_TICK_TYPE']
>>> trades = trades.first()
>>> trades = trades[['PRICE', 'SIZE', 'SYMBOL_NAME', 'TICK_TYPE']]
>>> otp.run(trades, symbols='AAPL')
                           Time   PRICE  SIZE SYMBOL_NAME TICK_TYPE
0 2023-03-01 04:00:00.005165425  147.66     4        AAPL       TRD

Parameter identify_input_ts in otp.DataSource and otp.merge can also be used to add them as new columns.

Symbols: bound and unbound#

There are two ways of setting symbols: bound and unbound.

Bound symbols are specified when defining a source or with otp.merge. Unbound symbols are set when the query is executed and apply to all sources that do not specify bound symbols. Here is an simple example of an unbound symbol:

>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> volume(symbols=['AAPL'])  # unbound symbol is set here
        Time    VOLUME
0 2023-03-02  62351689

There is no difference between bound and unbound when we talk about a single symbol. To appreciate the difference, let’s look at two symbols.

>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> dict(sorted(volume(symbols=['AAPL', 'MSFT']).items()))
 {'AAPL':         Time    VOLUME
  0 2023-03-02  62351689,
  'MSFT':         Time    VOLUME
  0 2023-03-02  31619403}

The results for each unbound symbol are processed separately and returned in a separate pandas DataFrame. The result of the run method above is a dict with symbols as keys and pandas DataFrames as values.

Contrast this with bound symbols where the ticks are merged into a single flow:

>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1), symbols=['AAPL', 'MSFT'])
>>> volume = trades.agg({'VOLUME': otp.agg.sum(trades['SIZE'])})
>>> volume()
        Time    VOLUME
0 2023-03-02  93971092

Specifying bound symbols on a source is just a shorthand for the otp.merge method added right after the source definition.

>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> cross_symbol_trades = otp.merge([trades], symbols=['AAPL', 'MSFT'])
>>> volume = cross_symbol_trades.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> volume()
        Time    VOLUME
0 2023-03-02  93971092

Concurrency#

Unbound symbols can be processed in parallel by specifying the concurrency parameter of otp.run. You have to explicitly use otp.run to execute the query if you want to specify concurrency.

otp.run(volume, symbols=['AAPL', 'MSFT', 'AMZN', 'META'], concurrency=8)

For bound symbols, all calculations passed into otp.merge (trades in example above) run in parallel. Calculations can be made faster by computing as much as possible per symbol before merging them. The following example has the same result as the previous one but it finds the total volume faster as it calculates the volume for every symbol independently which can be done in parallel and then adds up the total.

>>> trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', date=otp.dt(2023, 3, 1))
>>> volume = trades.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> cross_symbol_volumes = otp.merge([volume], symbols=['AAPL', 'MSFT'], presort=True, identify_input_ts=True)
>>> otp.run(cross_symbol_volumes, concurrency=8)
        Time    VOLUME SYMBOL_NAME TICK_TYPE
0 2023-03-02  62351689        AAPL       TRD
1 2023-03-02  31619403        MSFT       TRD

>>> total_volume = cross_symbol_volumes.agg({'TOTAL_VOLUME': otp.agg.sum('VOLUME')})
>>> otp.run(total_volume, concurrency=8)
        Time  TOTAL_VOLUME
0 2023-03-02      93971092

Note

All sources passed into merge are considered bound symbol sources.

Symbols#

Specifying symbols dynamically#

In many cases it is necessary to select symbols from a databases according to some logic. The otp.Symbols source makes this easy.

>>> symbols = otp.Symbols(db='NYSE_TAQ', date=otp.dt(2023, 3, 1), pattern='AAA%')
>>> symbols()
        Time SYMBOL_NAME
0 2023-03-01         AAA
1 2023-03-01        AAAU

The result of otp.Symbols can be used as bound/unbound symbols.

>>> data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2))
>>> data = data.first(3)
>>> data = data[['PRICE','SIZE']]
>>> dict(sorted(data(symbols=otp.Symbols(db='NYSE_TAQ', pattern='AAA%')).items()))
{'AAA':                            Time    PRICE  SIZE
 0 2023-03-01 09:30:00.441452800  24.56    10
 1 2023-03-01 09:30:00.441455872  24.56    10
 2 2023-03-01 11:41:22.735185152  24.53     1,
 'AAAU':                            Time  PRICE  SIZE
 0 2023-03-01 04:25:29.017414400  18.17    40
 1 2023-03-01 07:00:00.071737088  18.17   100
 2 2023-03-01 07:00:28.802050304  18.22   549}

Note that the interval of the main query is implicitly used in otp.Symbols(db='NYSE_TAQ', pattern='AAA%').

The symbols parameter can be any type of calculation that contains the SYMBOL_NAME field in the resulting ticks. Every tick provides a separate symbol name in the SYMBOL_NAME field while the other fields are passed as ‘symbol parameters’.

Symbol parameters#

Symbol parameters are any fields other than SYMBOL_NAME that are constant for an instrument. We illustrate this below. First, consider a query that finds 2 most traded symbols among symbols starting with the letter ‘A’. We’ll later use the output of this query to specify symbols and symbol params.

>>> trd = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2))
>>> trd = trd.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> count = otp.merge([trd], symbols=otp.Symbols(db='NYSE_TAQ', pattern='A%'), presort=True, identify_input_ts=True)
>>> most_traded = count.high('VOLUME', n=2)
>>> most_traded()
         Time    VOLUME  SYMBOL_NAME  TICK_TYPE
 0 2023-03-02  62351689         AAPL        TRD
 1 2023-03-02  56099688         AMZN        TRD

The output of most_traded provides symbols in the SYMBOL_NAME column while the other columns (VOLUME and TICK_TYPE) provide symbol parameters. The following code retrieves the value of VOLUME for use in tick-by-tick analytics.

>>> trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
>>> trd = trd.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> count = otp.merge([trd], symbols=otp.Symbols('NYSE_TAQ', pattern='A%'), identify_input_ts=True)
>>> most_traded = count.high('VOLUME', n=2)
>>>
>>> data = otp.DataSource('NYSE_TAQ', tick_type='TRD')
>>> data = data.first(3)
>>> data = data[['PRICE','SIZE']]
>>> data['VOLUME'] = data.Symbol.VOLUME
>>> dict(sorted(otp.run(data, symbols=most_traded, start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2)).items()))
{'AAPL':                            Time   PRICE  SIZE    VOLUME
 0 2023-03-01 04:00:00.005165425  147.66     4  62351689
 1 2023-03-01 04:00:00.005173198  147.66     1  62351689
 2 2023-03-01 04:00:00.014521032  147.66     4  62351689,
 'AMZN':                            Time  PRICE  SIZE    VOLUME
 0 2023-03-01 04:00:00.062175376  94.31    10  56099688
 1 2023-03-01 04:00:00.063719476  94.31     3  56099688
 2 2023-03-01 04:00:00.064597172  94.31     4  56099688}

Equivalently, symbol parameters can be accessed by wrapping a function around the query as illustrated below.

>>> trd = otp.DataSource('NYSE_TAQ', tick_type='TRD')
>>> trd = trd.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> count = otp.merge([trd], symbols=otp.Symbols('NYSE_TAQ', pattern='A%'), identify_input_ts=True)
>>> most_traded = count.high('VOLUME', n=2)
>>>
>>> def query(sym):
...     data = otp.DataSource('NYSE_TAQ', tick_type='TRD')
...     data = data.first(3)
...     data = data[['PRICE','SIZE']]
...     data['VOLUME'] = sym['VOLUME']
...     return data
>>>
>>> dict(sorted(otp.run(query, symbols=most_traded, start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2)).items()))
{'AAPL':                            Time   PRICE  SIZE    VOLUME
 0 2023-03-01 04:00:00.005165425  147.66     4  62351689
 1 2023-03-01 04:00:00.005173198  147.66     1  62351689
 2 2023-03-01 04:00:00.014521032  147.66     4  62351689,
 'AMZN':                            Time  PRICE  SIZE    VOLUME
 0 2023-03-01 04:00:00.062175376  94.31    10  56099688
 1 2023-03-01 04:00:00.063719476  94.31     3  56099688
 2 2023-03-01 04:00:00.064597172  94.31     4  56099688}

Time interval per symbol#

It is allowed to specify query interval per symbol using special fields _PARAM_START_TIME_NANOS and _PARAM_START_TIME_NANOS

>>> custom_symbols = otp.Ticks(SYMBOL_NAME=['AAPL', 'MSFT'],
...                            _PARAM_START_TIME_NANOS=[otp.dt(2023, 3, 1, 10, 30), otp.dt(2023, 3, 1, 11)],
...                            _PARAM_END_TIME_NANOS=[otp.dt(2023, 3, 1, 11), otp.dt(2023, 3, 1, 11, 30)])
>>> data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD')
>>> data = data.agg({'VOLUME': otp.agg.sum('SIZE')})
>>> data['SYMBOL'] = data.Symbol.name
>>> dict(sorted(otp.run(data, start=otp.dt(2023, 3, 1), end=otp.dt(2023, 3, 2), symbols=custom_symbols).items()))
 {'AAPL':                  Time   VOLUME SYMBOL
 0 2023-03-01 11:00:00  3539370   AAPL,
 'MSFT':                  Time   VOLUME SYMBOL
 0 2023-03-01 11:30:00  1679716   MSFT}

Note that per symbol intervals should be inside the query interval.

Associated symbols#

Associated symbols is a technique when unbound symbols are used to define bound symbols or symbols in related queries. It is expressed using the otp.eval and a symbol parameter. More details can be found in the API doc for the otp.eval