pod5_types

Container class for a pod5 Read object

class BaseRead(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0)[source]

Bases: object

Base class for POD5 Read Data

Parameters:
  • read_id (UUID) – The read_id of this read as UUID.

  • pore (Pore) – Pore data.

  • calibration (Calibration) – Calibration data.

  • read_number (int) – The read number on the channel. This increases over time but is not necessarily consecutive.

  • start_sample (int) – The number of samples recorded on this channel before the read started.

  • median_before (float) – The level of current in the well before this read.

  • end_reason (EndReason) – EndReason data.

  • run_info (RunInfo) – RunInfo data.

  • num_minknow_events (int) – Number of MinKNOW events that the read contains.

  • tracked_scaling (ShiftScalePair) – Shift and Scale for tracked read scaling values (based on previous reads’ shift)

  • predicted_scaling (ShiftScalePair) – Shift and Scale for predicted read scaling values (based on this read’s raw signal)

  • num_reads_since_mux_change (int) – Number of selected reads since the last mux change on this read’s channel

  • time_since_mux_change (float) – Time in seconds since the last mux change on this read’s channel

__init__(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0) → None
calibration: Calibration

Calibration metadata

end_reason: EndReason

EndReason data.

median_before: float

The level of current in the well before this read.

num_minknow_events: int = 0

Number of MinKNOW events that the read contains.

num_reads_since_mux_change: int = 0

Number of selected reads since the last mux change on this read’s channel

pore: Pore

Pore metadata

predicted_scaling: ShiftScalePair

Shift and Scale for predicted read scaling values (based on this read’s raw signal)

read_id: UUID

The read_id of this read as UUID

read_number: int

The read number on the channel. This increases over time but is not necessarily consecutive.

run_info: RunInfo

RunInfo data.

start_sample: int

The number of samples recorded on this channel before the read started.

time_since_mux_change: float = 0.0

Time in seconds since the last mux change on this read’s channel

tracked_scaling: ShiftScalePair

Shift and Scale for tracked read scaling values (based on previous reads’ shift)

class Calibration(offset: float, scale: float)[source]

Bases: object

Parameters to convert the signal data to picoamps.

Parameters:
  • offset (float) – Calibration offset used to convert raw ADC data into pA readings.

  • scale (float) – Calibration scale factor used to convert raw ADC data into pA readings.

__init__(offset: float, scale: float) → None
classmethod from_range(offset: float, adc_range: float, digitisation: float) → Calibration[source]

Create a Calibration instance from offset, adc_range and digitisation

offset: float

Calibration offset used to convert raw ADC data into pA readings.

scale: float

Calibration scale factor used to convert raw ADC data into pA readings.
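As a rough illustration of how offset and scale might be applied, the sketch below uses a stand-in dataclass rather than the real pod5 class. It assumes the conventional relation pA = (adc + offset) * scale, and that from_range derives the scale as adc_range / digitisation. Neither formula is stated in this reference, so both should be checked against the library source. The names CalibrationSketch and to_picoamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CalibrationSketch:
    """Stand-in for pod5.pod5_types.Calibration (illustration only)."""

    offset: float
    scale: float

    @classmethod
    def from_range(
        cls, offset: float, adc_range: float, digitisation: float
    ) -> "CalibrationSketch":
        # Assumption: scale is the pA range divided by the number of ADC levels.
        return cls(offset, adc_range / digitisation)

    def to_picoamps(self, adc_values: Sequence[int]) -> List[float]:
        # Assumption: pA = (adc + offset) * scale.
        return [(value + self.offset) * self.scale for value in adc_values]


calibration = CalibrationSketch.from_range(
    offset=10.0, adc_range=1024.0, digitisation=2048.0
)
picoamps = calibration.to_picoamps([0, 2, -10])
```

With these made-up inputs the derived scale is 0.5, so each raw value is shifted by the offset and then halved.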

class CompressedRead(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal_chunks: ~typing.List[~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.uint8]]] = <factory>, signal_chunk_lengths: ~typing.List[int] = <factory>)[source]

Bases: BaseRead

POD5 Read Data with a compressed signal.

Parameters:
  • read_id (UUID) – The read_id of this read as UUID.

  • pore (Pore) – Pore data.

  • calibration (Calibration) – Calibration data.

  • read_number (int) – The read number on the channel. This increases over time but is not necessarily consecutive.

  • start_sample (int) – The number of samples recorded on this channel before the read started.

  • median_before (float) – The level of current in the well before this read.

  • end_reason (EndReason) – EndReason data.

  • run_info (RunInfo) – RunInfo data.

  • signal_chunks (List[numpy.array[uint8]]) – Compressed signal data in chunks.

  • signal_chunk_lengths (List[int]) – Chunk lengths (number of samples) of signal data before compression.

__init__(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal_chunks: ~typing.List[~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.uint8]]] = <factory>, signal_chunk_lengths: ~typing.List[int] = <factory>) → None
property decompressed_signal: ndarray[Any, dtype[int16]]

Decompress and return the chunked signal data as a contiguous numpy array.

Returns:

decompressed_signal – Decompressed signal data

Return type:

numpy.array[int16]

property sample_count: int

Return the total number of samples in the uncompressed signal.

signal_chunk_lengths: List[int]

Chunk lengths (number of samples) of signal data before compression.

signal_chunks: List[ndarray[Any, dtype[uint8]]]

Compressed signal data in chunks.
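Because signal_chunk_lengths records the pre-compression sample count of each chunk, the total number of samples can be obtained without decompressing anything. The following is a minimal sketch, assuming sample_count is simply the sum of the chunk lengths. ChunkedSignalSketch is a hypothetical stand-in that uses bytes in place of uint8 arrays, and it does not attempt decompression.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChunkedSignalSketch:
    """Stand-in for the signal fields of CompressedRead (illustration only)."""

    signal_chunks: List[bytes] = field(default_factory=list)
    signal_chunk_lengths: List[int] = field(default_factory=list)

    @property
    def sample_count(self) -> int:
        # Assumption: the total is the sum of the per-chunk sample counts.
        return sum(self.signal_chunk_lengths)


chunked = ChunkedSignalSketch(
    signal_chunks=[b"\x01\x02", b"\x03"],  # placeholder compressed payloads
    signal_chunk_lengths=[4096, 1500],     # samples per chunk before compression
)
total_samples = chunked.sample_count
```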

class EndReason(reason: EndReasonEnum, forced: bool)[source]

Bases: object

Data on why the Read ended.

Parameters:
  • reason (EndReasonEnum) – The end reason enumeration.

  • forced (bool) – True if it is a ‘forced’ read break.

__init__(reason: EndReasonEnum, forced: bool) → None
forced: bool

True if it is a ‘forced’ read break (e.g. mux_change, unblock), False otherwise.

classmethod from_reason_with_default_forced(reason: EndReasonEnum) → EndReason[source]

Return a new EndReason instance with the ‘forced’ flag set to the expected default for the given reason

property name: str

Return the reason name as a lowercase string

reason: EndReasonEnum

The end reason enumeration

class EndReasonEnum(value)[source]

Bases: Enum

EndReason Enumeration

DATA_SERVICE_UNBLOCK_MUX_CHANGE = 3
MUX_CHANGE = 1
SIGNAL_NEGATIVE = 5
SIGNAL_POSITIVE = 4
UNBLOCK_MUX_CHANGE = 2
UNKNOWN = 0
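The relationship between EndReason, its forced flag and the enumeration can be sketched with stand-in classes. The enumeration values are copied from the listing above. The set of reasons treated as forced by default is an assumption based on the "mux_change, unblock" hint in the forced description, not something this reference states. All names ending in Sketch, and ASSUMED_FORCED, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EndReasonEnumSketch(Enum):
    """Mirror of the enumeration values listed above."""

    UNKNOWN = 0
    MUX_CHANGE = 1
    UNBLOCK_MUX_CHANGE = 2
    DATA_SERVICE_UNBLOCK_MUX_CHANGE = 3
    SIGNAL_POSITIVE = 4
    SIGNAL_NEGATIVE = 5


# Assumption: mux-change and unblock reasons are the ones forced by default.
ASSUMED_FORCED = {
    EndReasonEnumSketch.MUX_CHANGE,
    EndReasonEnumSketch.UNBLOCK_MUX_CHANGE,
    EndReasonEnumSketch.DATA_SERVICE_UNBLOCK_MUX_CHANGE,
}


@dataclass
class EndReasonSketch:
    """Stand-in for pod5.pod5_types.EndReason (illustration only)."""

    reason: EndReasonEnumSketch
    forced: bool

    @classmethod
    def from_reason_with_default_forced(
        cls, reason: EndReasonEnumSketch
    ) -> "EndReasonSketch":
        # Look up the assumed default for the 'forced' flag.
        return cls(reason, reason in ASSUMED_FORCED)

    @property
    def name(self) -> str:
        # The reason name as a lowercase string.
        return self.reason.name.lower()


mux = EndReasonSketch.from_reason_with_default_forced(EndReasonEnumSketch.MUX_CHANGE)
positive = EndReasonSketch.from_reason_with_default_forced(
    EndReasonEnumSketch.SIGNAL_POSITIVE
)
```

Under these assumptions mux is forced and named "mux_change", while positive is not forced and is named "signal_positive".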
class Pore(channel: int, well: int, pore_type: str)[source]

Bases: object

Data for the pore that the Read was acquired on

Parameters:
  • channel (int) – 1-indexed channel.

  • well (int) – 1-indexed well.

  • pore_type (str) – Name of the pore type present in the well.

__init__(channel: int, well: int, pore_type: str) → None
channel: int

1-indexed channel.

pore_type: str

Name of the pore type present in the well.

well: int

1-indexed well.

class Read(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal: ~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.int16]] = <factory>)[source]

Bases: BaseRead

POD5 Read Data with an uncompressed signal

Parameters:
  • read_id (UUID) – The read_id of this read as UUID.

  • pore (Pore) – Pore data.

  • calibration (Calibration) – Calibration data.

  • read_number (int) – The read number on the channel. This increases over time but is not necessarily consecutive.

  • start_sample (int) – The number of samples recorded on this channel before the read started.

  • median_before (float) – The level of current in the well before this read.

  • end_reason (EndReason) – EndReason data.

  • run_info (RunInfo) – RunInfo data.

  • signal (numpy.array[int16]) – Uncompressed signal data.

__init__(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal: ~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.int16]] = <factory>) → None
property sample_count: int

Return the total number of samples in the uncompressed signal.

signal: ndarray[Any, dtype[int16]]

Uncompressed signal data.

class RunInfo(acquisition_id: str, acquisition_start_time: datetime, adc_max: int, adc_min: int, context_tags: Dict[str, str], experiment_name: str, flow_cell_id: str, flow_cell_product_code: str, protocol_name: str, protocol_run_id: str, protocol_start_time: datetime, sample_id: str, sample_rate: int, sequencing_kit: str, sequencer_position: str, sequencer_position_type: str, software: str, system_name: str, system_type: str, tracking_id: Dict[str, str])[source]

Bases: object

Higher-level information about the Reads that correspond to a part of an experiment, protocol or acquisition

Parameters:
  • acquisition_id (str) – A unique identifier for the acquisition.

  • acquisition_start_time (datetime.datetime) – This is the clock time for sample 0

  • adc_max (int) – The maximum ADC value that might be encountered.

  • adc_min (int) – The minimum ADC value that might be encountered.

  • context_tags (Dict[str, str]) – The context tags for the run. (For compatibility with fast5).

  • experiment_name (str) – The user-supplied name for the experiment being run.

  • flow_cell_id (str) – Uniquely identifies the flow cell the data was captured on.

  • flow_cell_product_code (str) – Identifies the type of flow cell the data was captured on.

  • protocol_name (str) – The name of the protocol that was run.

  • protocol_run_id (str) – The unique identifier for the protocol run that produced this data.

  • protocol_start_time (datetime.datetime) – When the protocol that the acquisition was part of started.

  • sample_id (str) – A user-supplied name for the sample being analysed.

  • sample_rate (int) – The number of samples acquired each second on each channel.

  • sequencing_kit (str) – The type of sequencing kit used to prepare the sample.

  • sequencer_position (str) – The sequencer position the data was collected on.

  • sequencer_position_type (str) – The type of sequencing hardware the data was collected on.

  • software (str) – A description of the software that acquired the data.

  • system_name (str) – The name of the system the data was collected on.

  • system_type (str) – The type of system the data was collected on.

  • tracking_id (Dict[str, str]) – The tracking id for the run. (For compatibility with fast5).

__init__(acquisition_id: str, acquisition_start_time: datetime, adc_max: int, adc_min: int, context_tags: Dict[str, str], experiment_name: str, flow_cell_id: str, flow_cell_product_code: str, protocol_name: str, protocol_run_id: str, protocol_start_time: datetime, sample_id: str, sample_rate: int, sequencing_kit: str, sequencer_position: str, sequencer_position_type: str, software: str, system_name: str, system_type: str, tracking_id: Dict[str, str]) → None
acquisition_id: str

A unique identifier for the acquisition. Readers should not depend on this uniquely determining the other fields in the run_info, or on it being unique among the dictionary keys.

acquisition_start_time: datetime

This is the clock time for sample 0

adc_max: int

The maximum ADC value that might be encountered. This is a hardware constraint.

adc_min: int

The minimum ADC value that might be encountered. This is a hardware constraint.

context_tags: Dict[str, str]

The context tags for the run. (For compatibility with fast5).

experiment_name: str

The user-supplied name for the experiment being run.

flow_cell_id: str

Uniquely identifies the flow cell the data was captured on. This is written on the flow cell case.

flow_cell_product_code: str

Identifies the type of flow cell the data was captured on.

protocol_name: str

The name of the protocol that was run.

protocol_run_id: str

The unique identifier for the protocol run that produced this data.

protocol_start_time: datetime

When the protocol that the acquisition was part of started.

sample_id: str

A user-supplied name for the sample being analysed.

sample_rate: int

The number of samples acquired each second on each channel.

sequencer_position: str

The sequencer position the data was collected on. For removable positions, like MinION Mk1Bs, this is unique (e.g. ‘MN12345’), while for integrated positions it is not (e.g. ‘X1’ on a GridION).

sequencer_position_type: str

The type of sequencing hardware the data was collected on. For example: ‘MinION Mk1B’ or ‘GridION’ or ‘PromethION’.

sequencing_kit: str

The type of sequencing kit used to prepare the sample.

software: str

A description of the software that acquired the data. For example: ‘MinKNOW 21.05.12 (Bream 5.1.6, Configurations 16.2.1, Core 5.1.9, Guppy 4.2.3)’.

system_name: str

The name of the system the data was collected on. This might be a sequencer serial (e.g. ‘GXB1234’) or a host name (e.g. ‘Lab PC’).

system_type: str

The type of system the data was collected on. For example, ‘GridION Mk1’ or ‘PromethION P48’. If the system is not a Nanopore sequencer with built-in compute, this will be a description of the operating system (e.g. ‘Ubuntu 20.04’).

tracking_id: Dict[str, str]

The tracking id for the run. (For compatibility with fast5).
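Taken together, acquisition_start_time (the clock time for sample 0), sample_rate (samples per second) and a read's start_sample suggest a way to estimate when a read began. The helper below is a sketch of that arithmetic only. read_start_time is a hypothetical function, not part of pod5, and the input values are made up.

```python
from datetime import datetime, timedelta, timezone


def read_start_time(
    acquisition_start_time: datetime, start_sample: int, sample_rate: int
) -> datetime:
    """Estimate the clock time at which a read started."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Samples elapsed since sample 0, converted to seconds.
    elapsed = timedelta(seconds=start_sample / sample_rate)
    return acquisition_start_time + elapsed


acquisition_start = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
started_at = read_start_time(acquisition_start, start_sample=8000, sample_rate=4000)
```

Here 8000 samples at 4000 samples per second correspond to two seconds after the acquisition started.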

class ShiftScalePair(shift: float = nan, scale: float = nan)[source]

Bases: object

A pair of floating point shift and scale values.

__init__(shift: float = nan, scale: float = nan) → None
scale: float = nan
shift: float = nan
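Both fields default to nan, which presumably marks a pair for which no scaling values were recorded (an interpretation, not something this reference states). Since nan never compares equal to itself, an equality check cannot detect that case and math.isnan is needed instead. The sketch below uses a hypothetical stand-in class and helper, with made-up values.

```python
import math
from dataclasses import dataclass


@dataclass
class ShiftScalePairSketch:
    """Stand-in for pod5.pod5_types.ShiftScalePair (illustration only)."""

    shift: float = math.nan
    scale: float = math.nan


def has_scaling(pair: ShiftScalePairSketch) -> bool:
    # True only when neither value is still nan.
    return not (math.isnan(pair.shift) or math.isnan(pair.scale))


unset = ShiftScalePairSketch()
populated = ShiftScalePairSketch(shift=85.2, scale=14.7)
```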