otp.ReadParquet
- class ReadParquet(parquet_file_path=None, where=None, time_assignment='end', discard_fields=None, fields=None, symbol_name_field=None, symbol=utils.adaptive, db=utils.adaptive_to_default, tick_type=utils.adaptive, start=utils.adaptive, end=utils.adaptive, **kwargs)
Bases: onetick.py.core.source.Source
Read ticks from a Parquet file.
- Parameters
parquet_file_path (str) – Specifies the path or URL of the Parquet file to read.
where (str, None) – Specifies a criterion for selecting the ticks to propagate.
time_assignment (str) – Timestamps of the ticks created by ReadParquet are set to the start or end of the query, or to the value of a given tick field, depending on this parameter. Possible values are start and end (for _START_TIME and _END_TIME) or a field name. Default: end.
discard_fields (list, str, None) – A list of fields (list or comma-separated string) to be discarded from the output ticks.
fields (list, str, None) – A list of fields (list or comma-separated string) to keep in the output ticks. The opposite of discard_fields.
symbol_name_field (str, None) – Field that is expected to contain the symbol name. When this parameter is set and one or more symbols containing time series (i.e. [dbname]::[time series name] and not just [dbname]::) are bound to the query or to this EP, only rows belonging to those symbols will be propagated.
symbol (str, list of str, Source, query, eval query) – Symbol(s) from which data should be taken.
tick_type (str) – Tick type. Default: ANY.
start (otp.datetime) – Start time for tick generation. By default the start time of the query will be used.
end (otp.datetime) – End time for tick generation. By default the end time of the query will be used.
kwargs – Dictionary of column names with their types. Set the schema manually if you want to use fields in the onetick-py query description before the query is executed.
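The effect of time_assignment, fields and discard_fields can be pictured with a small plain-Python model. This is only an illustrative sketch, not how OneTick implements the event processor: the helper names assign_timestamp and select_fields, the dict-based tick, and the field names PRICE, SIZE and TRADE_TIME are all assumptions made for the example. How fields and discard_fields behave when both are given is not stated on this page, so the sketch rejects that combination.

```python
from datetime import datetime


def _as_list(value):
    """Accept a list or a comma-separated string, as the parameters do."""
    if value is None:
        return None
    if isinstance(value, str):
        return [name.strip() for name in value.split(",") if name.strip()]
    return list(value)


def assign_timestamp(tick, time_assignment, query_start, query_end):
    """Model of time_assignment: 'start', 'end', or the name of a tick field."""
    if time_assignment == "start":
        return query_start
    if time_assignment == "end":
        return query_end
    # Any other value is treated as a field name holding the timestamp.
    return tick[time_assignment]


def select_fields(tick, fields=None, discard_fields=None):
    """Model of fields (keep only these) and discard_fields (drop these)."""
    keep = _as_list(fields)
    drop = _as_list(discard_fields)
    if keep is not None and drop is not None:
        # Assumption of this sketch: the combined behavior is not documented here.
        raise ValueError("use either fields or discard_fields in this sketch")
    if keep is not None:
        return {name: tick[name] for name in keep}
    if drop is not None:
        return {name: value for name, value in tick.items() if name not in drop}
    return dict(tick)


# Hypothetical tick and query interval used only for illustration.
tick = {"PRICE": 21.5, "SIZE": 100, "TRADE_TIME": datetime(2024, 1, 2, 10, 30)}
query_start = datetime(2024, 1, 2)
query_end = datetime(2024, 1, 3)

stamped_at_end = assign_timestamp(tick, "end", query_start, query_end)
stamped_by_field = assign_timestamp(tick, "TRADE_TIME", query_start, query_end)
kept = select_fields(tick, fields="PRICE, SIZE")
dropped = select_fields(tick, discard_fields=["TRADE_TIME"])
```

In this model, fields="PRICE, SIZE" and discard_fields=["TRADE_TIME"] produce the same output tick, which is the sense in which the two parameters are opposites.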
Examples
Simple Parquet file read:
>>> data = otp.ReadParquet("/path/to/parquet/file")
>>> otp.run(data)
Read Parquet file and filter fields:
>>> data = otp.ReadParquet("/path/to/parquet/file", fields=["some_field", "another_field"])
>>> otp.run(data)
Read Parquet file and filter rows:
>>> data = otp.ReadParquet("/path/to/parquet/file", where="PRICE > 20")
>>> otp.run(data)
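Read Parquet file with a manually declared schema. The following is a hedged sketch rather than an official example: the column names PRICE and SIZE, their types, and the helper names parquet_source_args and build_source are assumptions. It passes the schema through kwargs so that the fields can be referenced in the query description before execution, as the kwargs parameter describes.

```python
def parquet_source_args(path):
    """Keyword arguments for otp.ReadParquet; PRICE and SIZE are hypothetical columns."""
    return {
        "parquet_file_path": path,
        "where": "PRICE > 20",
        "time_assignment": "start",
        # Schema passed through **kwargs: column name -> Python type.
        "PRICE": float,
        "SIZE": int,
    }


def build_source(path):
    """Build the source; calling this requires onetick-py and a OneTick installation."""
    # Imported lazily so the module itself loads without onetick-py installed.
    import onetick.py as otp

    data = otp.ReadParquet(**parquet_source_args(path))
    # With the schema declared, fields can be used before the query is run.
    data["NOTIONAL"] = data["PRICE"] * data["SIZE"]
    return data
```

The resulting source can then be executed with otp.run(build_source("/path/to/parquet/file")), as in the examples above.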
See also
READ_FROM_PARQUET OneTick event processor