otp.run
- run(query, *, symbols=None, start=utils.adaptive, end=utils.adaptive, date=None, start_time_expression=None, end_time_expression=None, timezone=utils.default, context=utils.default, username=None, alternative_username=None, password=None, batch_size=utils.default, running=False, query_properties=None, concurrency=utils.default, apply_times_daily=None, symbol_date=None, query_params=None, time_as_nsec=True, treat_byte_arrays_as_strings=True, output_matrix_per_field=False, output_structure=None, return_utc_times=None, connection=None, callback=None, svg_path=None, use_connection_pool=False, node_name=None, require_dict=False, max_expected_ticks_per_symbol=None, log_symbol=utils.default, encoding=None, manual_dataframe_callback=False)
Executes a query and returns its result.
- Parameters
query (onetick.py.Source, otq.Ep, otq.GraphQuery, otq.ChainQuery, str, otq.Chainlet, Callable, otq.SqlQuery, onetick.py.SqlQuery) – Query to execute. It can be a source, the path to a query on disk, or a onetick.query graph or event processor. For OTQ files, it is the path (including the filename) to the OTQ file, which runs the single query within the file. If the file contains more than one query, the query to run must be specified (that is, 'path_to_file/otq_file.otq::query_to_run'). query can also be a function that takes a symbol object as its first parameter. This object can be used to get the symbol name and symbol parameters. The function must return a Source.
symbols (str, list of str, list of otq.Symbol, onetick.py.Source, pd.DataFrame, optional) – Symbol(s) to run the query for, passed as a string, a list of strings, a pd.DataFrame with the SYMBOL_NAME column, or a “symbols” query whose results include the SYMBOL_NAME column. The start/end times for the symbols query will be taken from the parameters below. See symbols for more details.
start (datetime.datetime, onetick.py.datetime, pyomd.timeval_t, optional) – The start time of the query. Can be timezone-naive or timezone-aware. See also the timezone argument. onetick.py uses default_start_time as the default value. If you don’t want to specify a start time, e.g. to use the saved time of the query, specify None.
end (datetime.datetime, onetick.py.datetime, pyomd.timeval_t, optional) – The end time of the query. Can be timezone-naive or timezone-aware. See also the timezone argument. onetick.py uses default_end_time as the default value. If you don’t want to specify an end time, e.g. to use the saved time of the query, specify None.
date (datetime.date, onetick.py.date, optional) – The date to run the query for. Can be set instead of the start and end parameters. If set, the query interval runs from 0:00 to 24:00 of the specified date.
start_time_expression (str, optional) – Start time of the query as a OneTick expression. If specified, it takes precedence over start. Supported only if query is a Source, Graph or Event Processor. Not supported for WebAPI mode.
end_time_expression (str, optional) – End time of the query as a OneTick expression. If specified, it takes precedence over end. Supported only if query is a Source, Graph or Event Processor. Not supported for WebAPI mode.
timezone (str, optional) – The timezone of output timestamps. When the start and/or end arguments are timezone-naive, it also defines their timezone. If the parameter is omitted, tick timestamps are formatted with the default tz.
context (str, optional) – Specifies which instance of OneTick tick_servers to connect to. If not set, the default context is used.
username (Optional[str]) – The username to make the connection. By default, the user who executed the process is used.
alternative_username (str) – The username used for authentication. Needs to be set only when the tick server is configured to use password-based authentication. By default, default_auth_username is used. Not supported for WebAPI mode.
password (str, optional) – The password used for authentication. Needs to be set only when the tick server is configured to use password-based authentication. Note: not supported and ignored on older OneTick versions. By default, default_password is used.
batch_size (int) – Number of symbols to run in one batch. By default, the value from default_batch_size is used. Not supported for WebAPI mode.
running (bool, optional) – Indicates whether the query is a CEP query. Default is False.
query_properties (pyomd.QueryProperties or dict, optional) – Query properties, such as ONE_TO_MANY_POLICY, ALLOW_GRAPH_REUSE, etc.
concurrency (int, optional) – The maximum number of CPU cores to use to process the query. By default, the value from default_concurrency is used.
apply_times_daily (bool) – Runs the query for every day in the start-end time range, using the time components of the start and end datetimes. Note that those daily intervals are executed separately, so you don’t have access to the data from previous or next days (see the apply_times_daily example in the Examples section).
symbol_date (Optional[Union[datetime.datetime, int, str]]) – The symbol date used to look up symbology mapping information in the reference database, expressed as a datetime object or an integer in YYYYMMDD format.
query_params (dict) – Parameters of the query.
time_as_nsec (bool) – Outputs timestamps with up to nanosecond granularity. The signature above sets the default to True; when False, timestamps are output with microsecond granularity.
treat_byte_arrays_as_strings (bool) – Outputs byte arrays as strings (defaults to True). Not supported for WebAPI mode.
output_matrix_per_field (bool) – Changes output format to list of matrices per field. Not supported for WebAPI mode.
output_structure (otp.Source.OutputStructure, optional) – Structure (type) of the result. Supported values are:
- df (default): the result is returned as a pandas.DataFrame object, or as a dictionary of symbol names and pandas.DataFrame objects when multiple symbols or a first stage query are used.
- map: the result is returned as SymbolNumpyResultMap.
- list: the result is returned as a list.
- polars: the result is returned as a polars.DataFrame object or a dictionary of symbol names and dataframe objects (only supported in WebAPI mode).
return_utc_times (bool) – If True, times are returned in the UTC timezone; otherwise they are returned in the local timezone. Not supported for WebAPI mode.
connection (pyomd.Connection) – The connection to be used for discovering nested .otq files. Not supported for WebAPI mode.
callback (onetick.py.CallbackBase) – Class with callback methods. If set, the output of the query should be controlled with callbacks and this function returns nothing.
svg_path (str, optional) – Not supported for WebAPI mode.
use_connection_pool (bool) – Default is False. If set to True, the connection pool is used. Not supported for WebAPI mode.
node_name (str, List[str], optional) – Name of the output node to select the result from. If the query graph has several output nodes, you can specify the name of the node to take the result from. If node_name is specified, query must be given as a path on disk and output_structure must be df.
require_dict (bool) – If set to True, the result is forced to be a dictionary even if it is returned for a single symbol.
max_expected_ticks_per_symbol (int) – Expected maximum number of ticks per symbol (used for performance optimizations). By default, max_expected_ticks_per_symbol is used. Not supported for WebAPI mode.
log_symbol (bool) – Log the currently executed symbol. Note that this only works with unbound symbols. In this case otp.run is also executed in callback mode and no value is returned from the function, so it should be used only for debugging purposes. This logging will not work if some other value is specified in the callback parameter. By default, otp.config.log_symbol is used.
encoding (str, optional) – The encoding of string fields.
manual_dataframe_callback (bool) – Create the dataframe manually in callback mode. Only works if output_structure='df' is specified and the callback parameter is not. May improve performance in some cases.
- Returns
result of the query
- Return type
result, list, dict, pandas.DataFrame, None
Examples
Running onetick.py.Source and setting start and end times:
>>> data = otp.Tick(A=1)
>>> otp.run(data, start=otp.dt(2003, 12, 2), end=otp.dt(2003, 12, 4))
        Time  A
0 2003-12-02  1
Setting the query interval with the date parameter:
>>> data = otp.Tick(A=1)
>>> data['START'] = data['_START_TIME']
>>> data['END'] = data['_END_TIME']
>>> otp.run(data, date=otp.dt(2003, 12, 1))
        Time  A      START        END
0 2003-12-01  1 2003-12-01 2003-12-02
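The interval implied by date can be reproduced with the standard library alone. The following is a minimal sketch, assuming the documented behaviour that the interval runs from 0:00 to 24:00 of the given date; day_interval is a hypothetical helper for illustration and is not part of onetick.py.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def day_interval(day: date) -> Tuple[datetime, datetime]:
    """Hypothetical helper: return (start, end) covering 0:00 to 24:00 of `day`.

    The end is expressed as midnight of the following day.
    """
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)


start, end = day_interval(date(2003, 12, 1))
print(start)  # 2003-12-01 00:00:00
print(end)    # 2003-12-02 00:00:00
```

These values agree with the START and END columns in the example above.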
Running otq.Ep and passing query parameters:
>>> ep = otq.TickGenerator(bucket_interval=0, fields='long A = $X').tick_type('TT')
>>> otp.run(ep, symbols='LOCAL::', query_params={'X': 1})
        Time  A
0 2003-12-04  1
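When the query lives in an .otq file, the query parameter description gives the form 'path_to_file/otq_file.otq::query_to_run' for selecting one query out of several. A small sketch of building that string follows; otq_query_path is a hypothetical helper, not an onetick.py function, and the file and query names are the placeholders used in the parameter description.

```python
def otq_query_path(otq_file: str, query_name: str = "") -> str:
    """Hypothetical helper: build the string passed as `query` for an .otq file.

    With a query name, the documented 'file.otq::query' form is returned;
    without one, the bare file path is returned (single-query files).
    """
    return f"{otq_file}::{query_name}" if query_name else otq_file


path = otq_query_path("path_to_file/otq_file.otq", "query_to_run")
print(path)  # path_to_file/otq_file.otq::query_to_run
# The resulting string would then be passed as the first argument to otp.run.
```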
Running in callback mode:
>>> class Callback(otp.CallbackBase):
...     def __init__(self):
...         self.result = None
...     def process_tick(self, tick, time):
...         self.result = tick
>>> data = otp.Tick(A=1)
>>> callback = Callback()
>>> otp.run(data, callback=callback)
>>> callback.result
{'A': 1}
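The callback above keeps only the last tick. A variant that accumulates every tick might look like the sketch below. It is written as a plain class so that it runs without OneTick; to pass it to otp.run it would presumably need to subclass otp.CallbackBase as in the example above. The class name TickCollector is hypothetical, and the process_tick(tick, time) signature is taken from that example.

```python
class TickCollector:
    """Hypothetical collector that stores every tick it receives.

    To use it with otp.run, it would subclass otp.CallbackBase, as the
    Callback class in the example above does.
    """

    def __init__(self):
        self.ticks = []

    def process_tick(self, tick, time):
        # Copy the tick so later mutation by the caller cannot alter stored data.
        self.ticks.append((time, dict(tick)))


collector = TickCollector()
collector.process_tick({'A': 1}, None)
collector.process_tick({'A': 2}, None)
print(len(collector.ticks))  # 2
```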
Running with apply_times_daily. Note that daily intervals are processed separately so, for example, we can’t access column COUNT from the previous day.
>>> trd = otp.DataSource('NYSE_TAQ', symbols='AAPL', tick_type='TRD')
>>> trd = trd.agg({'COUNT': otp.agg.count()},
...               bucket_interval=12 * 3600, bucket_time='start')
>>> trd['PREV_COUNT'] = trd['COUNT'][-1]
>>> otp.run(trd, apply_times_daily=True,
...         start=otp.dt(2023, 4, 3), end=otp.dt(2023, 4, 5), timezone='EST5EDT')
                 Time   COUNT  PREV_COUNT
0 2023-04-03 00:00:00  328447           0
1 2023-04-03 12:00:00  240244      328447
2 2023-04-04 00:00:00  263293           0
3 2023-04-04 12:00:00  193018      263293
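To make the "every day in the range, using the time components of start and end" rule concrete, here is a sketch of one plausible reading of it in plain Python. It is an assumption, not OneTick's implementation: daily_intervals is a hypothetical helper, the end date is treated as inclusive, and boundary cases such as an end time of 00:00 (as in the example above) are not modelled.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def daily_intervals(start: datetime, end: datetime) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: one (start, end) pair per calendar day.

    Each pair reuses the time-of-day of `start` and `end`; the days run from
    start.date() to end.date() inclusive (an assumption of this sketch).
    """
    intervals = []
    day = start.date()
    while day <= end.date():
        intervals.append((datetime.combine(day, start.time()),
                          datetime.combine(day, end.time())))
        day += timedelta(days=1)
    return intervals


for begin, finish in daily_intervals(datetime(2023, 4, 3, 9, 30),
                                     datetime(2023, 4, 5, 16, 0)):
    print(begin, finish)
```

Because each pair would be run as a separate query, state such as the previous tick does not carry over from one day to the next, which is why PREV_COUNT resets to 0 at the start of each day in the example above.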
Using a function as a query, accessing the symbol name and parameters:
>>> def query(symbol):
...     t = otp.Tick(X='x')
...     t['SYMBOL_NAME'] = symbol.name
...     t['SYMBOL_PARAM'] = symbol.PARAM
...     return t
>>> symbols = otp.Ticks({'SYMBOL_NAME': ['A', 'B'], 'PARAM': [1, 2]})
>>> result = otp.run(query, symbols=symbols)
>>> result['A']
        Time  X SYMBOL_NAME  SYMBOL_PARAM
0 2003-12-01  x           A             1
>>> result['B']
        Time  X SYMBOL_NAME  SYMBOL_PARAM
0 2003-12-01  x           B             2
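Since the result is a dictionary keyed by symbol name for multiple symbols but a single object for one symbol (unless require_dict=True is passed), calling code sometimes normalizes the two shapes. A minimal sketch of such a normalizer follows; as_symbol_dict is a hypothetical helper, and plain Python values stand in for dataframes so the snippet runs without OneTick or pandas.

```python
from typing import Any, Dict


def as_symbol_dict(result: Any, symbol: str) -> Dict[str, Any]:
    """Hypothetical helper: always return a {symbol_name: result} mapping.

    A dict is assumed to be already keyed by symbol name; anything else is
    wrapped under the single `symbol` that was queried.
    """
    if isinstance(result, dict):
        return result
    return {symbol: result}


multi = as_symbol_dict({'A': 'frame_a', 'B': 'frame_b'}, 'A')
single = as_symbol_dict('frame_a', 'A')
print(sorted(multi))   # ['A', 'B']
print(sorted(single))  # ['A']
```

Passing require_dict=True to otp.run achieves the same shape at the source, so a helper like this is only useful for code that receives results it did not request itself.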
Debugging unbound symbols with the log_symbol parameter:
>>> data = otp.Tick(X=1)
>>> symbols = otp.Ticks({'SYMBOL_NAME': ['A', 'B'], 'PARAM': [1, 2]})
>>> otp.run(data, symbols=symbols, log_symbol=True)
Running query <onetick.py.sources.ticks.Tick object at ...>
Processing symbol A
Processing symbol B
By default, some non-standard characters in data strings could be processed incorrectly:
>>> data = ['AA測試AA']
>>> source = otp.Ticks({'A': data})
>>> otp.run(source)
        Time         A
0 2003-12-01  AA測試AA
To fix this you can pass encoding parameter to otp.run:
>>> data = ['AA測試AA']
>>> source = otp.Ticks({'A': data})
>>> df = otp.run(source, encoding="utf-8")
>>> print(df)
        Time         A
0 2003-12-01  AA測試AA
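The general failure that an explicit encoding guards against can be shown with the standard library alone. The sketch below does not reproduce how onetick.py decodes string fields internally (that is not described here); it only illustrates that the same bytes read with the wrong codec give different text.

```python
text = 'AA測試AA'

# The string as UTF-8 bytes, as it might be stored or transmitted.
raw = text.encode('utf-8')

# Decoding with the matching codec round-trips the original text.
decoded_ok = raw.decode('utf-8')

# Decoding the same bytes as Latin-1 never fails, but each byte becomes its
# own character, so the two CJK characters turn into six unrelated ones.
decoded_wrong = raw.decode('latin-1')

print(decoded_ok == text)                   # True
print(len(decoded_ok), len(decoded_wrong))  # 6 10
```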