otp.join
- join(left, right, on, how='outer', rprefix='RIGHT', keep_fields_not_in_schema=False, output_type_index=None)
Joins two sources, left and right, based on the on condition.
To add a prefix or suffix to all columns of one of the sources, use Source.add_prefix() or Source.add_suffix().
- Parameters
left (Source) – left source to join
right (Source) – right source to join
on (Operation or 'all' or 'same_size') – Condition to join on. If 'all', joins every tick from left with every tick from right. If 'same_size' and both sources have the same number of ticks, joins ticks from the two sources directly, one to one; otherwise raises an exception.
how ('inner' or 'outer') – Join type. An inner join produces only the ticks that matched the on condition. An outer join also produces the ticks from the left source that did not match the condition (so it is effectively a left outer join). Ignored for on='all' and on='same_size'.
rprefix (str) – Name of the right data source. It is added as a prefix to overlapping columns that come from right into the result.
keep_fields_not_in_schema (bool) – If True, the join function tries to preserve any fields of the original sources that are not in the source schema, propagating them to the output. This can cause a runtime error if field names are duplicated. If False, all fields that are not in the schema are removed.
output_type_index (int) – Index of the source in [left, right] from which the type and properties of the output are taken. Useful when joining sources that inherit from Source. By default, the output object type is Source.
- Returns
joined data
- Return type
Source or same class as [left, right][output_type_index]
Examples
>>> d1 = otp.Ticks({'ID': [1, 2, 3], 'A': ['a', 'b', 'c']})
>>> d2 = otp.Ticks({'ID': [2, 3, 4], 'B': ['q', 'w', 'e']})
Outer join:
>>> otp.join(d1, d2, on=d1['ID'] == d2['ID'], how='outer')()
                     Time  ID  A  RIGHT_ID  B
0 2003-12-01 00:00:00.000   1  a         0
1 2003-12-01 00:00:00.001   2  b         2  q
2 2003-12-01 00:00:00.002   3  c         3  w
Inner join:
>>> otp.join(d1, d2, on=d1['ID'] == d2['ID'], how='inner')()
                     Time  ID  A  RIGHT_ID  B
0 2003-12-01 00:00:00.001   2  b         2  q
1 2003-12-01 00:00:00.002   3  c         3  w
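To make the difference between the two join types concrete, here is a minimal plain-Python sketch of the semantics; it is not onetick-py code and says nothing about how the library implements the join. It assumes ticks can be modelled as dicts and leaves the Time field out. The names join_on_condition, default_for and same_id are hypothetical helpers introduced only for illustration. The fill values for unmatched right-side fields (0 and an empty string) and the RIGHT_ prefix mirror the outputs shown above.

```python
# Plain-Python model of the inner / outer join semantics; NOT onetick-py code.
# Ticks are modelled as dicts and the Time field is left out (assumption).


def default_for(value):
    """Fill value for a missing field: 0 for ints, '' for strings."""
    return 0 if isinstance(value, int) else ''


def join_on_condition(left, right, condition, how='outer', rprefix='RIGHT'):
    """Join two lists of dict ticks; `condition` stands in for the `on` argument."""
    if how not in ('inner', 'outer'):
        raise ValueError("how must be 'inner' or 'outer'")
    left_fields = set(left[0]) if left else set()

    def rename(field):
        # Only right-side fields that clash with a left-side field get the prefix.
        return f'{rprefix}_{field}' if field in left_fields else field

    # Right-side part of the output for a left tick that has no match.
    empty_right = {rename(k): default_for(v) for k, v in right[0].items()} if right else {}

    result = []
    for left_tick in left:
        matched = False
        for right_tick in right:
            if condition(left_tick, right_tick):
                matched = True
                renamed = {rename(k): v for k, v in right_tick.items()}
                result.append({**left_tick, **renamed})
        if not matched and how == 'outer':
            # Left-outer behaviour: keep the unmatched left tick.
            result.append({**left_tick, **empty_right})
    return result


def same_id(left_tick, right_tick):
    return left_tick['ID'] == right_tick['ID']


d1 = [{'ID': 1, 'A': 'a'}, {'ID': 2, 'A': 'b'}, {'ID': 3, 'A': 'c'}]
d2 = [{'ID': 2, 'B': 'q'}, {'ID': 3, 'B': 'w'}, {'ID': 4, 'B': 'e'}]

outer = join_on_condition(d1, d2, same_id, how='outer')
inner = join_on_condition(d1, d2, same_id, how='inner')
print(outer)
print(inner)
```

In this model the outer result is the inner result plus one extra tick for ID 1, which has no partner in the right source. The right tick with ID 4 appears in neither result, which is why the outer join is described as a left outer join.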
Join all ticks:
>>> otp.join(d1, d2, on='all')()
                     Time  ID  A  RIGHT_ID  B
0 2003-12-01 00:00:00.000   1  a         2  q
1 2003-12-01 00:00:00.000   1  a         3  w
2 2003-12-01 00:00:00.000   1  a         4  e
3 2003-12-01 00:00:00.001   2  b         2  q
4 2003-12-01 00:00:00.001   2  b         3  w
5 2003-12-01 00:00:00.001   2  b         4  e
6 2003-12-01 00:00:00.002   3  c         2  q
7 2003-12-01 00:00:00.002   3  c         3  w
8 2003-12-01 00:00:00.002   3  c         4  e
Join same size sources:
>>> otp.join(d1, d2, on='same_size')()
                     Time  ID  A  RIGHT_ID  B
0 2003-12-01 00:00:00.000   1  a         2  q
1 2003-12-01 00:00:00.001   2  b         3  w
2 2003-12-01 00:00:00.002   3  c         4  e
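The two special values of on can be pictured as a Cartesian product and a positional zip. The following is a plain-Python sketch under that assumption, not onetick-py code: ticks are modelled as dicts, Time is not modelled, and merge, join_all and join_same_size are hypothetical names used only here. The exception type raised by the library for mismatched sizes is not stated in this page, so ValueError below is an arbitrary stand-in.

```python
from itertools import product

# Plain-Python model of on='all' and on='same_size'; NOT onetick-py code.


def merge(left_tick, right_tick, rprefix='RIGHT'):
    """Combine two dict ticks, prefixing right-side fields that clash with the left."""
    renamed = {
        (f'{rprefix}_{k}' if k in left_tick else k): v
        for k, v in right_tick.items()
    }
    return {**left_tick, **renamed}


def join_all(left, right):
    """on='all': every left tick is paired with every right tick."""
    return [merge(l, r) for l, r in product(left, right)]


def join_same_size(left, right):
    """on='same_size': ticks are paired by position; sizes must be equal."""
    if len(left) != len(right):
        # ValueError is a stand-in; the library's exception type is not specified here.
        raise ValueError('sources have different sizes')
    return [merge(l, r) for l, r in zip(left, right)]


d1 = [{'ID': 1, 'A': 'a'}, {'ID': 2, 'A': 'b'}, {'ID': 3, 'A': 'c'}]
d2 = [{'ID': 2, 'B': 'q'}, {'ID': 3, 'B': 'w'}, {'ID': 4, 'B': 'e'}]

all_pairs = join_all(d1, d2)
pairwise = join_same_size(d1, d2)
print(len(all_pairs))
print(pairwise)
```

With three ticks on each side, the product yields nine ticks and the positional join yields three, matching the row counts in the two examples above. Neither variant looks at a condition, which is consistent with how being irrelevant for these two values of on.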
Adding prefix to the right source for all columns:
>>> d2 = d2.add_prefix('right_')
>>> otp.join(d1, d2, on=d1['ID'] == d2['right_ID'])()
                     Time  ID  A  right_ID  right_B
0 2003-12-01 00:00:00.000   1  a         0
1 2003-12-01 00:00:00.001   2  b         2        q
2 2003-12-01 00:00:00.002   3  c         3        w
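The reason RIGHT_ does not appear in this last output can be shown with a small plain-Python sketch, again modelling ticks as dicts rather than using onetick-py. The helper add_prefix_to_all is a hypothetical stand-in for what Source.add_prefix() does to field names; how the library treats Time and other special fields is not modelled.

```python
# Plain-Python illustration of prefixing all fields up front; NOT onetick-py code.


def add_prefix_to_all(ticks, prefix):
    """Rename every field of every dict tick by prepending `prefix`."""
    return [{f'{prefix}{name}': value for name, value in tick.items()} for tick in ticks]


d1 = [{'ID': 1, 'A': 'a'}, {'ID': 2, 'A': 'b'}, {'ID': 3, 'A': 'c'}]
d2 = [{'ID': 2, 'B': 'q'}, {'ID': 3, 'B': 'w'}, {'ID': 4, 'B': 'e'}]

# Before prefixing, 'ID' exists on both sides, so the join has to rename it.
overlap_before = set(d1[0]) & set(d2[0])

prefixed = add_prefix_to_all(d2, 'right_')

# After prefixing, no field name is shared, so there is nothing for rprefix to rename.
overlap_after = set(d1[0]) & set(prefixed[0])

print(overlap_before, overlap_after)
```

In this model rprefix only matters while field names collide; once every right-side field has been renamed, the join result carries the right_ names unchanged.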