otp.agg.num_distinct

num_distinct(keys, running=False, all_fields=False, bucket_interval=0, bucket_units='seconds', bucket_time='end', bucket_end_condition=None, boundary_tick_bucket='new', group_by=None)

Outputs the number of distinct values for a specified set of key fields.

Parameters
  • keys (str or list of str or list of Column) – Specifies a list of tick attributes for which unique values are found. The ticks in the input time series must contain those attributes.

  • running (bool) –

    Whether the aggregation is calculated over a sliding window. The running and bucket_interval parameters together determine when new buckets are created.

    • running = True

      aggregation will be calculated in a sliding window.

      • bucket_interval = N (N > 0)

        The window size is N. An output tick is generated when a tick enters the window (arrival event) and when it exits the window (exit event).

      • bucket_interval = 0

        The left boundary of the window is fixed at the start time. For each tick, the aggregation is calculated over [start_time; tick_t].

    • running = False

      Buckets partition the [query start time, query end time) interval into non-overlapping intervals of size bucket_interval (with the last interval possibly of a smaller size). If bucket_interval is set to 0, a single bucket for the entire interval is created.

    Default: False (create totally independent buckets). Number of buckets = (end - start) / bucket_interval, rounded up if the last bucket is shorter.

  • all_fields (bool) –

    • all_fields = True

      output ticks include all fields from the input ticks

      • running = True

        an output tick is created only when a tick enters the sliding window

      • running = False

        the fields of the first tick in the bucket are used

    • all_fields = False and running = True

      output ticks are created when a tick enters or leaves the sliding window.

  • bucket_interval (int) – Determines the length of each bucket (units depend on bucket_units).

  • bucket_units (Literal['seconds', 'ticks', 'days', 'months', 'flexible']) –

    Set bucket interval units.

    If set to flexible, bucket_end_condition must be set.

  • bucket_time (Literal['start', 'end']) –

    Control output timestamp.

    • start

      the timestamp assigned to the bucket is the start time of the bucket.

    • end

      the timestamp assigned to the bucket is the end time of the bucket.

  • bucket_end_condition (condition) – An expression that is evaluated on every tick. If it evaluates to “True”, a new bucket is created. This parameter is only used if bucket_units is set to “flexible”.

  • boundary_tick_bucket (Literal['new', 'previous']) –

    Controls boundary tick ownership.

    • previous

      A tick on which bucket_end_condition evaluates to “true” belongs to the bucket being closed.

    • new

      The boundary tick belongs to the new bucket.

    This parameter is only used if bucket_units is set to “flexible”

  • group_by (list, str or expression) – When specified, each bucket is further broken into sub-buckets based on the specified field values. If an Operation is used, a GROUP_{i} column is added, where i is the index in the group_by list. For example, if an Operation is the only element in the group_by list, a GROUP_0 field is added.
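
The bucketing rules described for running, bucket_interval and the flexible-bucket parameters can be modelled in plain Python. The sketch below is not onetick-py code and makes no claim about how the library implements the aggregation. The helper names (fixed_buckets, running_from_start, flexible_buckets) and the representation of ticks as (time, value) pairs are assumptions made purely for illustration, and edge cases such as empty buckets may behave differently in the real aggregation.

```python
# Illustrative model only: plain Python, not the onetick-py implementation.
# Ticks are (time_in_seconds, value) pairs sorted by time.
# All helper names below are hypothetical.


def fixed_buckets(ticks, start, end, bucket_interval):
    """running=False: partition [start, end) into non-overlapping buckets."""
    if bucket_interval == 0:
        # A single bucket covers the whole query interval.
        return [len({v for t, v in ticks if start <= t < end})]
    counts = []
    left = start
    while left < end:
        right = min(left + bucket_interval, end)  # last bucket may be shorter
        counts.append(len({v for t, v in ticks if left <= t < right}))
        left = right
    return counts


def running_from_start(ticks):
    """running=True, bucket_interval=0: one result per tick over [start_time; tick_t]."""
    seen = set()
    counts = []
    for _, value in ticks:
        seen.add(value)
        counts.append(len(seen))
    return counts


def flexible_buckets(values, end_condition, boundary_tick_bucket="new"):
    """bucket_units='flexible': a tick satisfying end_condition ends the current bucket."""
    buckets, current = [], []
    for value in values:
        if end_condition(value):
            if boundary_tick_bucket == "previous":
                current.append(value)   # boundary tick stays in the bucket being closed
                buckets.append(current)
                current = []
            else:
                buckets.append(current)
                current = [value]       # boundary tick opens the new bucket
        else:
            current.append(value)
    buckets.append(current)             # remaining ticks form the final bucket
    return [len(set(bucket)) for bucket in buckets]


ticks = [(0, 1), (1, 3), (2, 2), (3, 1), (4, 3)]
print(fixed_buckets(ticks, start=0, end=5, bucket_interval=2))  # [2, 2, 1]
print(fixed_buckets(ticks, start=0, end=5, bucket_interval=0))  # [3]
print(running_from_start(ticks))                                # [1, 2, 3, 3, 3]
```

With a 2-second interval over a 5-second query, the last bucket is shorter than the others, which is why three buckets come out rather than two.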

Examples

>>> data = otp.Ticks(dict(X=[1, 3, 2, 1, 3]))
>>> data = data.agg({'X': otp.agg.num_distinct('X')})
>>> otp.run(data)
        Time  X
0 2003-12-04  3
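
As a cross-check of the result above, the same count can be reproduced without OneTick. The following is a plain-Python sketch, not onetick-py code: the helper count_distinct and the row layout are hypothetical, and it assumes that with several keys distinctness is judged on the combination of key values and that group_by splits the count per group value.

```python
# Plain-Python cross-check; not onetick-py code.
# The helper name and the row layout are hypothetical.
from collections import defaultdict


def count_distinct(rows, keys, group_by=None):
    """Count distinct key tuples, optionally separately per group_by value."""
    if isinstance(keys, str):
        keys = [keys]
    if group_by is None:
        return len({tuple(row[k] for k in keys) for row in rows})
    groups = defaultdict(set)
    for row in rows:
        groups[row[group_by]].add(tuple(row[k] for k in keys))
    return {group: len(values) for group, values in groups.items()}


rows = [
    {"X": 1, "Y": "a"},
    {"X": 3, "Y": "a"},
    {"X": 2, "Y": "b"},
    {"X": 1, "Y": "b"},
    {"X": 3, "Y": "a"},
]
print(count_distinct(rows, "X"))                # 3, matching the example above
print(count_distinct(rows, ["X", "Y"]))         # 4
print(count_distinct(rows, "X", group_by="Y"))  # {'a': 2, 'b': 2}
```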