otp.join

join(left, right, on, how='outer', rprefix='RIGHT', keep_fields_not_in_schema=False, output_type_index=None)

Joins two sources, left and right, based on the on condition.

To add a prefix or suffix to all columns of one of the sources, use Source.add_prefix() or Source.add_suffix().

Parameters
  • left (Source) – left source to join

  • right (Source) – right source to join

  • on (Operation or ‘all’ or ‘same_size’) –

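    If an Operation, it is used as the join condition, for example d1['ID'] == d2['ID'].

    If ‘all’, joins every tick from left with every tick from right.

    If ‘same_size’ and the sources have the same number of ticks, joins ticks from the two sources directly; otherwise raises an exception.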

  • how ('inner' or 'outer') –

    Type of join.

    Ignored when on is ‘all’ or ‘same_size’.

  • rprefix (str) – Name of the right data source. It is added as a prefix to columns from right whose names overlap with columns from left (for example, ID becomes RIGHT_ID).

  • keep_fields_not_in_schema (bool) –

    If True, join tries to preserve fields of the original sources that are not in the source schema and propagates them to the output. This can cause a runtime error if field names are duplicated.

    If False, removes all fields that are not in the schema.

  • output_type_index (int) – Index of the source in [left, right] from which the type and properties of the output are taken. Useful when joining sources that inherit from otp.Source. By default the output object type is otp.Source.

Returns

joined data

Return type

Source or same class as [left, right][output_type_index]

Examples

>>> d1 = otp.Ticks({'ID': [1, 2, 3], 'A': ['a', 'b', 'c']})
>>> d2 = otp.Ticks({'ID': [2, 3, 4], 'B': ['q', 'w', 'e']})
>>> otp.join(d1, d2, on=d1['ID'] == d2['ID'])()
                     Time  ID  A  RIGHT_ID  B
0 2003-12-01 00:00:00.000   1  a         0
1 2003-12-01 00:00:00.001   2  b         2  q
2 2003-12-01 00:00:00.002   3  c         3  w
>>> otp.join(d1, d2, on=d1['ID'] == d2['ID'], how='inner')()
                     Time  ID  A  RIGHT_ID  B
0 2003-12-01 00:00:00.001   2  b         2  q
1 2003-12-01 00:00:00.002   3  c         3  w
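
The two results above can be modeled in plain Python to make the semantics explicit. This is only an illustrative sketch, not onetick-py code: model_join is a hypothetical helper, ticks are represented as dicts, and the Time column is not modeled. It assumes, based on the output above, that 'outer' keeps left ticks without a match and fills the right-hand fields with type defaults (0 for int, an empty string for str).

```python
# Plain-Python model of the join results shown above; NOT onetick-py code.
# `model_join` and its arguments are hypothetical names used for illustration.

def model_join(left, right, on, how="outer", rprefix="RIGHT"):
    """Join two lists of dicts; `on` is a predicate taking (left_row, right_row)."""
    if how not in ("inner", "outer"):
        raise ValueError("how must be 'inner' or 'outer'")
    left_cols = set(left[0]) if left else set()
    right_cols = list(right[0]) if right else []

    def out_name(col):
        # Overlapping right-hand columns get the prefix, as in RIGHT_ID above.
        return f"{rprefix}_{col}" if col in left_cols else col

    # Assumption: unmatched left rows get type defaults (0 for int, '' for str).
    defaults = {out_name(c): type(right[0][c])() for c in right_cols}

    result = []
    for l_row in left:
        matches = [r for r in right if on(l_row, r)]
        if matches:
            for r_row in matches:
                renamed = {out_name(c): v for c, v in r_row.items()}
                result.append({**l_row, **renamed})
        elif how == "outer":
            # Keep the left row even though nothing on the right matched.
            result.append({**l_row, **defaults})
    return result


d1 = [{"ID": 1, "A": "a"}, {"ID": 2, "A": "b"}, {"ID": 3, "A": "c"}]
d2 = [{"ID": 2, "B": "q"}, {"ID": 3, "B": "w"}, {"ID": 4, "B": "e"}]


def same_id(l_row, r_row):
    return l_row["ID"] == r_row["ID"]


outer = model_join(d1, d2, on=same_id)
inner = model_join(d1, d2, on=same_id, how="inner")
```

In this model, outer contains three rows (the row with ID 1 has RIGHT_ID 0 and an empty B), while inner contains only the rows with ID 2 and 3. The right-hand row with ID 4 appears in neither result.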

Adding a prefix to all columns of the right source:

>>> d2 = d2.add_prefix('right_')
>>> otp.join(d1, d2, on=d1['ID'] == d2['right_ID'])()
                     Time  ID  A  right_ID  right_B
0 2003-12-01 00:00:00.000   1  a         0
1 2003-12-01 00:00:00.001   2  b         2        q
2 2003-12-01 00:00:00.002   3  c         3        w
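
The examples above all use an Operation as the on condition. The ‘all’ and ‘same_size’ modes described under Parameters can be pictured with a similar plain-Python sketch. It is not onetick-py code: model_join_special is a hypothetical helper, reading "joins ticks directly" as pairing by position is an assumption, and so is the use of ValueError as the exception type. Prefixing of overlapping columns is left out, so the inputs use distinct column names.

```python
# Plain-Python model of on='all' and on='same_size'; NOT onetick-py code.
# `model_join_special` is a hypothetical helper used for illustration only.
from itertools import product


def model_join_special(left, right, on):
    """Combine two lists of dicts using the 'all' or 'same_size' mode."""
    if on == "all":
        # Cartesian product: every left row is paired with every right row.
        return [{**l_row, **r_row} for l_row, r_row in product(left, right)]
    if on == "same_size":
        if len(left) != len(right):
            # Assumption: the real exception type may differ.
            raise ValueError("sources have different sizes")
        # Assumption: "directly" means row i of left goes with row i of right.
        return [{**l_row, **r_row} for l_row, r_row in zip(left, right)]
    raise ValueError("on must be 'all' or 'same_size'")


left = [{"A": "a"}, {"A": "b"}]
right = [{"B": "q"}, {"B": "w"}]

crossed = model_join_special(left, right, "all")
paired = model_join_special(left, right, "same_size")
```

With two rows on each side, crossed has four rows and paired has two. Neither branch takes a how argument, which mirrors the note that how does not matter for these two modes.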