Data structures and functions#
Tick sources#
Every piece of onetick.py
code starts with specifying a data source.
data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', symbol='AAPL', start=otp.dt(2003, 12, 1), end=otp.dt(2003, 12, 2))
onetick.py.Source
is the base abstract class. We provide several predefined inherited
classes to cover various use cases: e.g., onetick.py.DataSource
for retrieving data from OneTick databases,
onetick.py.Ticks
for creating ticks on the fly, and onetick.py.Empty
for creating a data source
with no ticks.
Every source should be associated with a data schema (it can be deduced or
specified manually; see the schema concept).
The schema is available through the onetick.py.Source.schema
property and behaves
like a Python dict.
>>> data = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', symbols='AAPL', start=otp.dt(2022, 3, 1), end=otp.dt(2022, 3, 2))
>>> data.schema
{'COND': string[4], 'CORR': <class 'onetick.py.types._int'>, 'DELETED_TIME': <class 'onetick.py.types.msectime'>, 'EXCHANGE': string[1], 'OMDSEQ': <class 'onetick.py.types.uint'>, 'PARTICIPANT_TIME': <class 'onetick.py.types.nsectime'>, 'PRICE': <class 'float'>, 'SEQ_NUM': <class 'int'>, 'SIZE': <class 'int'>, 'SOURCE': string[1], 'STOP_STOCK': string[1], 'TICKER': string[16], 'TICK_STATUS': <class 'onetick.py.types._int'>, 'TRADE_ID': string[20], 'TRF': string[1], 'TRF_TIME': <class 'onetick.py.types.nsectime'>, 'TTE': string[1]}
Next we discuss the Column
and Operation
classes that make it easy to work with the fields in data sources
and to create new ones. We then talk about the methods and functions that operate on
individual fields and on entire data sources.
Column and Operation#
A column (onetick.py.Column
) represents a data series for a single field of the data source
(The relationship between a column and a data source is similar to the one between pandas’s Series and
DataFrame). Columns are accessed via the onetick.py.Source.__getitem__()
method:
i.e., you can refer to a field using brackets so the PRICE
field is accessed as data['PRICE']
.
Note that only the fields that are specified in the schema can be accessed.
An operation is a generalization of a column that represents columns as
well as results of operations involving one or more columns.
Formally, onetick.py.Column
is a subclass of onetick.py.Operation
. Any operation between instances of
onetick.py.Column
return an instance of onetick.py.Operation
, e.g.:
<an Operation instance> = <a Column instance> * <a Column instance>
Similarly, any operation between the instances of onetick.py.Operation
or onetick.py.Column
return
an instance of onetick.py.Operation
:
<an Operation instance> =
<an Operation or Column instance> / <an Operation or Column instance>
In most cases, a user does not need to make a distinction between a column and an operation. A new column can be created based on an existing column or an operation using the assignment operator:
<a Source instance>[<column name>] = <an Operation instance>
for example
data['VOLUME'] = data['PRICE'] * data['SIZE']
data['FLAG'] = (data['PRICE'] > 3.5) & (data['SIZE'] == 100)
Some functions operate on columns only but it’s clear from the context that the use of operations is not applicable
there (e.g., the apply
method that casts a column to a different type):
data = otp.Ticks({'A': ['1', '2', '3']})
data['B'] = data['A'].apply(int) + 10
print(otp.run(data))
Time A B
0 2003-12-01 00:00:00.000 1 11
1 2003-12-01 00:00:00.001 2 12
2 2003-12-01 00:00:00.002 3 13
Field names#
Onetick allows using field names that:
have length between 1 and 127 characters
contain upper- and lowercase Latin characters
contain symbols “_” and “.”
Any other character is not allowed in a field name.
In addition to that, Onetick does not allow lowercase Latin characters in field names stored in a database. However, it allows using lowercase characters in field names in analytics:
data = otp.Ticks({'LowercaseField': [1, 2, 3]})
print(otp.run(data))
Time LowercaseField
0 2003-12-01 00:00:00.000 1
1 2003-12-01 00:00:00.001 2
2 2003-12-01 00:00:00.002 3
If you try to save lowercase field name to a database, Onetick will silently convert it to upper case:
test_db = otp.db.DB('TEST_DB')
test_db.add(otp.Tick(FieldName=1), symbol='TEST', tick_type='TEST')
session.use(test_db)
otp.run(otp.DataSource(db='TEST_DB', tick_type='TEST'), symbols='TEST')
Time FIELDNAME
0 2003-12-01 1
Functions and methods#
There are various functions that can be applied to operations and sources.
Methods/functions on Operations#
The column / operation based functions and methods return an instance of onetick.py.Operation
otp.math.min(data['BID_SIZE'], data['ASK_SIZE'])
that can then be used for further operations (no pun intended):
data['TAKEOUT_SUCCESS'] = \
data['QTY_FILLED'] >= otp.math.min(data['BID_SIZE'], data['ASK_SIZE'])
The onetick.py.Operation
class also has methods, some of which are collected into accessors.
An accessor is a special property that collects methods for a certain data type. For example, the
onetick.py.Operation.str
accessor collects the methods for working with strings:
data['IS_ORDER_EXECUTED'] = data['STATE'].str.find('F')
Methods/functions on Sources#
onetick.py.Source
has methods that operate on entire ticks (rather than on particular columns) like
aggregations onetick.py.Source.agg
or onetick.py.Source.sort()
. Usually
the result of such methods is a new instance(s) of onetick.py.Source
but for some methods it is an
instance of onetick.py.Operation
(e.g., onetick.py.Source.apply()
).
Retrieving ticks that satisfy a given condition (aka filtering) is done as follows:
passed, not_passed = data[(data['FLAG'] > 0) & (data['STATE'] == 'F')]
Note that two new sources are returned: first is for the ticks that satisfy the condition and the second for the ones that do not.
A typical filtering case looks like this:
data, _ = data[(data['FLAG'] > 0) & (data['STATE'] == 'F')]
There are also functions that combine multiple sources such
as onetick.py.merge()
or onetick.py.join_by_time()
trades = otp.DataSource(db='NYSE_TAQ', tick_type='TRD', symbol='APPL')
quotes = otp.DataSource(db='NYSE_TAQ', tick_type='QTE', symbol='AAPL')
data = otp.join_by_time([trades, quotes])