Data structures and functions#

Tick sources#

Every piece of code starts with specifying a data source:

data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', symbol='AAPL', start=otp.dt(2003, 12, 1), end=otp.dt(2003, 12, 2))

otp.Source is the base abstract class. We provide several predefined inherited classes to cover various use cases: e.g., otp.DataSource for retrieving data from OneTick databases, otp.Ticks for creating ticks on the fly, and otp.Empty for creating a data source with no ticks.

Every source should be associated with a data schema (it can be deduced or specified manually; see the schema concept). The schema is available through the schema property and behaves like a Python dict.

>>> data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', symbols='AAPL', start=otp.dt(2022, 3, 1), end=otp.dt(2022, 3, 2))
>>> data.schema
 {'COND': string[4], 'CORR': <class 'int'>, 'DELETED_TIME': <class ''>, 'EXCHANGE': string[1], 'OMDSEQ': <class 'int'>, 'PARTICIPANT_TIME': <class ''>, 'PRICE': <class 'float'>, 'SEQ_NUM': <class 'int'>, 'SIZE': <class 'int'>, 'SOURCE': string[1], 'STOP_STOCK': string[1], 'TICKER': string[16], 'TICK_STATUS': <class 'int'>, 'TRADE_ID': string[20], 'TRF': string[1], 'TRF_TIME': <class ''>, 'TTE': string[1]}
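Because the schema behaves like a dict, the usual dict idioms should apply to it. The sketch below uses a plain Python dict as a stand-in for data.schema, since the real object needs a OneTick installation; the field names come from the output above, and using str for TRADE_ID is a simplification of string[20].

```python
# Plain dict standing in for data.schema (assumption: the real schema
# object supports the same dict-style operations).
schema = {'PRICE': float, 'SIZE': int, 'TRADE_ID': str}

# Membership test: is the field part of the schema?
has_price = 'PRICE' in schema

# Look up the type of a single field.
price_type = schema['PRICE']

# Iterate over (name, type) pairs, keeping only the numeric fields.
numeric_fields = sorted(
    name for name, dtype in schema.items() if dtype in (int, float)
)
```

Checking membership before accessing a field is a cheap way to avoid referring to a field that the schema does not contain.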

Next we discuss the Column and Operation classes that make it easy to work with the fields in data sources and to create new ones. We then talk about the methods and functions that operate on individual fields and on entire data sources.

Column and Operation#

A column (otp.Column) represents a data series for a single field of the data source (the relationship between a column and a data source is similar to the one between a pandas Series and a DataFrame). Columns are accessed via the __getitem__ method: i.e., you can refer to a field using brackets, so the PRICE field is accessed as data['PRICE']. Note that only the fields that are specified in the schema can be accessed.

An operation (otp.Operation) is a generalization of a column: it represents columns as well as the results of operations involving one or more columns. Formally, Column is a subclass of Operation. Any operation between instances of Column returns an instance of Operation, e.g.:

<an Operation instance> = <a Column instance> * <a Column instance>

Similarly, any operation between instances of Operation or Column returns an instance of Operation:

<an Operation instance> =
        <an Operation or Column instance> / <an Operation or Column instance>
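The relationship between the two classes can be illustrated with a toy model. This is only a sketch of the idea (a subclass for named fields, operators that always yield the base class); the classes below are hypothetical stand-ins written for this example and are not the onetick-py implementation.

```python
class Operation:
    """Toy stand-in: an expression over one or more fields."""

    def __init__(self, expr):
        self.expr = expr

    def _binary(self, op, other):
        # Accept plain Python constants as well as other operations.
        rhs = other.expr if isinstance(other, Operation) else repr(other)
        # The result is always a plain Operation, never a Column.
        return Operation(f"({self.expr} {op} {rhs})")

    def __mul__(self, other):
        return self._binary("*", other)

    def __truediv__(self, other):
        return self._binary("/", other)


class Column(Operation):
    """Toy stand-in: a named field, i.e. the simplest kind of operation."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


price, size = Column("PRICE"), Column("SIZE")
volume = price * size   # Column * Column -> Operation
ratio = volume / size   # Operation / Column -> Operation
```

Since every Column is also an Operation, code that accepts an Operation works with both, which is why the distinction rarely matters in practice.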

In most cases, a user does not need to make a distinction between a column and an operation. A new column can be created based on an existing column or an operation using the assignment operator:

<a Source instance>[<column name>] = <an Operation instance>

For example:

data['VOLUME'] = data['PRICE'] * data['SIZE']
data['FLAG'] = (data['PRICE'] > 3.5) & (data['SIZE'] == 100)

Some functions operate on columns only, but in those cases it is clear from the context that operations are not applicable (e.g., the apply method, which casts a column to a different type):

>>> data = otp.Ticks({'A': ['1', '2', '3']})
>>> data['B'] = data['A'].apply(int) + 10
>>> otp.run(data)
                     Time  A   B
0 2003-12-01 00:00:00.000  1  11
1 2003-12-01 00:00:00.001  2  12
2 2003-12-01 00:00:00.002  3  13

Functions and methods#

There are various functions that can be applied to operations and sources.

Methods/functions on Operations#

Functions and methods that work on columns and operations return an instance of Operation, e.g.:

otp.math.min(data['BID_SIZE'], data['ASK_SIZE'])

that can then be used for further operations (no pun intended):

    data['QTY_FILLED'] >= otp.math.min(data['BID_SIZE'], data['ASK_SIZE'])

The Operation class also has methods, some of which are collected into accessors. An accessor is a special property that collects methods for a certain data type. For example, the str accessor collects the methods for working with strings:

data['IS_ORDER_EXECUTED'] = data['STATE'].str.find('F')
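The accessor pattern itself is ordinary Python: a property returns a helper object that carries the type-specific methods. The sketch below shows the pattern with hypothetical classes (ToyOperation and StrAccessor are invented for this example) that hold concrete values; the find helper follows Python's own str.find convention, which is an assumption and not a statement about onetick-py.

```python
class StrAccessor:
    """Hypothetical accessor: groups string-only helpers in one namespace."""

    def __init__(self, values):
        self._values = values

    def find(self, substring):
        # Position of the substring in each value, -1 when it is absent
        # (this mirrors Python's str.find).
        return [value.find(substring) for value in self._values]


class ToyOperation:
    """Hypothetical stand-in holding concrete values instead of an expression."""

    def __init__(self, values):
        self.values = values

    @property
    def str(self):
        # The accessor is a property that returns the helper object.
        return StrAccessor(self.values)


state = ToyOperation(["F", "PF", "N"])
positions = state.str.find("F")
```

Grouping the helpers behind one property keeps string methods out of the main class namespace while leaving them one attribute away.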

Methods/functions on Sources#

Source has methods that operate on entire ticks (rather than on particular columns), such as aggregations. Usually the result of such a method is one or more new Source instances, but some methods return an object of a different type.

Retrieving ticks that satisfy a given condition (aka filtering) is done as follows:

passed, not_passed = data[(data['FLAG'] > 0) & (data['STATE'] == 'F')]

Note that two new sources are returned: the first contains the ticks that satisfy the condition and the second contains the ticks that do not.

In the typical case only the ticks that satisfy the condition are needed, so the second source is discarded:

data, _ = data[(data['FLAG'] > 0) & (data['STATE'] == 'F')]
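The two-way split can be pictured with plain Python lists. The sketch below is an illustration of the semantics only: split_by is a hypothetical helper, ticks are modeled as dicts, and nothing here reflects how onetick-py evaluates the condition.

```python
def split_by(ticks, condition):
    """Return (passed, not_passed), preserving tick order in both."""
    passed, not_passed = [], []
    for tick in ticks:
        (passed if condition(tick) else not_passed).append(tick)
    return passed, not_passed


ticks = [
    {"FLAG": 1, "STATE": "F"},
    {"FLAG": 0, "STATE": "F"},
    {"FLAG": 2, "STATE": "N"},
]
passed, not_passed = split_by(
    ticks, lambda t: t["FLAG"] > 0 and t["STATE"] == "F"
)
```

Every input tick lands in exactly one of the two outputs, so the two results together always account for the whole input.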

There are also functions that combine multiple sources, such as otp.join_by_time:

trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', symbol='AAPL')
quotes = otp.DataSource(db='NYSE_TAQ', tick_type='QTE', symbol='AAPL')

data = otp.join_by_time([trades, quotes])
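For intuition, a time-based join can be sketched in plain Python. The example assumes as-of semantics: the first source leads, and each of its ticks is paired with the most recent tick of the other source whose timestamp is not later. Both the choice of leading source and the handling of equal timestamps are assumptions made for this sketch, and asof_join is a hypothetical helper, not the library function.

```python
from bisect import bisect_right


def asof_join(leading, other):
    """Pair each leading tick with the latest `other` tick at or before it."""
    other_times = [tick["Time"] for tick in other]  # must be sorted ascending
    joined = []
    for tick in leading:
        # Index of the last `other` tick with Time <= the leading tick's Time.
        idx = bisect_right(other_times, tick["Time"]) - 1
        match = other[idx] if idx >= 0 else {}
        # Fields of the leading tick win on name clashes (e.g. Time).
        joined.append({**match, **tick})
    return joined


trades = [{"Time": 3, "PRICE": 10.0}, {"Time": 7, "PRICE": 10.5}]
quotes = [
    {"Time": 1, "BID": 9.9},
    {"Time": 5, "BID": 10.4},
    {"Time": 9, "BID": 10.6},
]
joined = asof_join(trades, quotes)
```

Here each trade picks up the quote that was in effect when it occurred, so the quote at time 9 is never used.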