pod5_types

Container class for a pod5 Read object

class BaseRead(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0)[source]

Bases: object

Base class for POD5 Read Data

Parameters:

read_id (UUID) – The read_id of this read as UUID.
pore (Pore) – Pore data.
calibration (Calibration) – Calibration data.
read_number (int) – The read number on channel. Read numbers increase over time but are not necessarily consecutive.
start_sample (int) – The number of samples recorded on this channel before the read started.
median_before (float) – The level of current in the well before this read.
end_reason (EndReason) – EndReason data.
run_info (RunInfo) – RunInfo data.
num_minknow_events (int) – Number of MinKNOW events that the read contains.
tracked_scaling (ShiftScalePair) – Shift and Scale for tracked read scaling values (based on the shift of previous reads).
predicted_scaling (ShiftScalePair) – Shift and Scale for predicted read scaling values (based on this read’s raw signal).
num_reads_since_mux_change (int) – Number of selected reads since the last mux change on this read’s channel.
time_since_mux_change (float) – Time in seconds since the last mux change on this read’s channel.

__init__(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0) → None

calibration: Calibration: Calibration metadata

end_reason: EndReason: EndReason data.

median_before: float: The level of current in the well before this read.

num_minknow_events: int = 0: Number of minknow events that the read contains

num_reads_since_mux_change: int = 0: Number of selected reads since the last mux change on this reads channel

pore: Pore: Pore metadata

predicted_scaling: ShiftScalePair: Shift and Scale for predicted read scaling values (based on this read’s raw signal)

read_id: UUID: The read_id of this read as UUID

read_number: int: The read number on channel. This is increasing but typically not necessarily consecutive.

run_info: RunInfo: RunInfo data.

start_sample: int: The number samples recorded on this channel before the read started.

time_since_mux_change: float = 0.0: Time in seconds since the last mux change on this reads channel

tracked_scaling: ShiftScalePair: Shift and Scale for tracked read scaling values (based on previous reads shift)

class Calibration(offset: float, scale: float)[source]

Bases: object

Parameters to convert the signal data to picoamps.

Parameters:

offset (float) – Calibration offset used to convert raw ADC data into pA readings.
scale (float) – Calibration scale factor used to convert raw ADC data into pA readings.

__init__(offset: float, scale: float) → None

classmethod from_range(offset: float, adc_range: float, digitisation: float) → Calibration[source]: Create a Calibration instance from offset, adc_range and digitisation

offset: float: Calibration offset used to convert raw ADC data into pA readings.

scale: float: Calibration scale factor used to convert raw ADC data into pA readings.

class CompressedRead(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal_chunks: ~typing.List[~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.uint8]]] = <factory>, signal_chunk_lengths: ~typing.List[int] = <factory>)[source]

Bases: BaseRead

POD5 Read Data with a compressed signal.

Parameters:

read_id (UUID) – The read_id of this read as UUID.
pore (Pore) – Pore data.
calibration (Calibration) – Calibration data.
read_number (int) – The read number on channel. Read numbers increase over time but are not necessarily consecutive.
start_sample (int) – The number of samples recorded on this channel before the read started.
median_before (float) – The level of current in the well before this read.
end_reason (EndReason) – EndReason data.
run_info (RunInfo) – RunInfo data.
signal_chunks (List[numpy.array[uint8]]) – Compressed signal data in chunks.
signal_chunk_lengths (List[int]) – Chunk lengths (number of samples) of signal data before compression.

__init__(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal_chunks: ~typing.List[~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.uint8]]] = <factory>, signal_chunk_lengths: ~typing.List[int] = <factory>) → None

property decompressed_signal: ndarray[Any, dtype[int16]]

Decompress and return the chunked signal data as a contiguous numpy array.

Returns: decompressed_signal – Decompressed signal data.
Return type: numpy.array[int16]

property sample_count: int: Return the total number of samples in the uncompressed signal.

signal_chunk_lengths: List[int]: Chunk lengths (number of samples) of signal data before compression.

signal_chunks: List[ndarray[Any, dtype[uint8]]]: Compressed signal data in chunks.
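
The relationship between signal_chunk_lengths and the uncompressed signal can be sketched without pod5 at all. The example below assumes that the total sample count is the sum of the per-chunk lengths and that chunks are laid out back to back in the decompressed signal. The helper names total_sample_count and chunk_boundaries are hypothetical and not part of the pod5 API.

```python
from typing import List


def total_sample_count(signal_chunk_lengths: List[int]) -> int:
    """Hypothetical helper: number of samples in the uncompressed signal."""
    # Each entry is the number of samples in one chunk before compression,
    # so the uncompressed length is assumed to be their sum.
    return sum(signal_chunk_lengths)


def chunk_boundaries(signal_chunk_lengths: List[int]) -> List[range]:
    """Hypothetical helper: sample index range covered by each chunk."""
    boundaries = []
    start = 0
    for length in signal_chunk_lengths:
        boundaries.append(range(start, start + length))
        start += length
    return boundaries


# Made-up chunk lengths for illustration.
lengths = [4, 4, 2]
count = total_sample_count(lengths)
bounds = chunk_boundaries(lengths)
```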

class EndReason(reason: EndReasonEnum, forced: bool)[source]

Bases: object

Data on why the Read ended.

Parameters:

reason (EndReasonEnum) – The end reason enumeration.
forced (bool) – True if it is a ‘forced’ read break.

__init__(reason: EndReasonEnum, forced: bool) → None

forced: bool: True if it is a ‘forced’ read break (e.g. mux_change, unblock), False otherwise.

classmethod from_reason_with_default_forced(reason: EndReasonEnum) → EndReason[source]: Return a new EndReason instance with the ‘forced’ flag set to the expected default for the given reason

property name: str: Return the reason name as a lower string

reason: EndReasonEnum: The end reason enumeration

class EndReasonEnum(value)[source]

Bases: Enum

EndReason Enumeration

DATA_SERVICE_UNBLOCK_MUX_CHANGE = 3

MUX_CHANGE = 1

SIGNAL_NEGATIVE = 5

SIGNAL_POSITIVE = 4

UNBLOCK_MUX_CHANGE = 2

UNKNOWN = 0
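
The sketch below mirrors the enumeration values listed above in a local Enum, so it runs without pod5 installed. It shows a lookup by integer value and the lowercase name described for EndReason.name. The set of reasons treated as 'forced' is an assumption based on the examples given for the forced flag (mux_change, unblock), and the names EndReasonEnumSketch, lower_name and assumed_default_forced are hypothetical.

```python
from enum import Enum


class EndReasonEnumSketch(Enum):
    """Local copy of the values listed above (illustrative only)."""

    UNKNOWN = 0
    MUX_CHANGE = 1
    UNBLOCK_MUX_CHANGE = 2
    DATA_SERVICE_UNBLOCK_MUX_CHANGE = 3
    SIGNAL_POSITIVE = 4
    SIGNAL_NEGATIVE = 5


# Assumption: reasons caused by a mux change or an unblock are 'forced'.
ASSUMED_FORCED = {
    EndReasonEnumSketch.MUX_CHANGE,
    EndReasonEnumSketch.UNBLOCK_MUX_CHANGE,
    EndReasonEnumSketch.DATA_SERVICE_UNBLOCK_MUX_CHANGE,
}


def lower_name(reason: EndReasonEnumSketch) -> str:
    """Hypothetical helper: the reason name as a lowercase string."""
    return reason.name.lower()


def assumed_default_forced(reason: EndReasonEnumSketch) -> bool:
    """Hypothetical helper: the assumed default for the 'forced' flag."""
    return reason in ASSUMED_FORCED


# Look up a member by its integer value, as stored alongside a read.
reason = EndReasonEnumSketch(4)
name = lower_name(reason)
```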

class Pore(channel: int, well: int, pore_type: str)[source]

Bases: object

Data for the pore that the Read was acquired on

Parameters:

channel (int) – 1-indexed channel.
well (int) – 1-indexed well.
pore_type (str) – Name of the pore type present in the well.

__init__(channel: int, well: int, pore_type: str) → None

channel: int: 1-indexed channel.

pore_type: str: Name of the pore type present in the well.

well: int: 1-indexed well.

class Read(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal: ~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.int16]] = <factory>)[source]

Bases: BaseRead

POD5 Read Data with an uncompressed signal

Parameters:

read_id (UUID) – The read_id of this read as UUID.
pore (Pore) – Pore data.
calibration (Calibration) – Calibration data.
read_number (int) – The read number on channel. Read numbers increase over time but are not necessarily consecutive.
start_sample (int) – The number of samples recorded on this channel before the read started.
median_before (float) – The level of current in the well before this read.
end_reason (EndReason) – EndReason data.
run_info (RunInfo) – RunInfo data.
signal (numpy.array[int16]) – Uncompressed signal data.

__init__(read_id: ~uuid.UUID, pore: ~pod5.pod5_types.Pore, calibration: ~pod5.pod5_types.Calibration, read_number: int, start_sample: int, median_before: float, end_reason: ~pod5.pod5_types.EndReason, run_info: ~pod5.pod5_types.RunInfo, num_minknow_events: int = 0, tracked_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, predicted_scaling: ~pod5.pod5_types.ShiftScalePair = <factory>, num_reads_since_mux_change: int = 0, time_since_mux_change: float = 0.0, signal: ~numpy.ndarray[~typing.Any, ~numpy.dtype[~numpy.int16]] = <factory>) → None

property sample_count: int: Return the total number of samples in the uncompressed signal.

signal: ndarray[Any, dtype[int16]]: Uncompressed signal data.

class RunInfo(acquisition_id: str, acquisition_start_time: datetime, adc_max: int, adc_min: int, context_tags: Dict[str, str], experiment_name: str, flow_cell_id: str, flow_cell_product_code: str, protocol_name: str, protocol_run_id: str, protocol_start_time: datetime, sample_id: str, sample_rate: int, sequencing_kit: str, sequencer_position: str, sequencer_position_type: str, software: str, system_name: str, system_type: str, tracking_id: Dict[str, str])[source]

Bases: object

Higher-level information about the Reads that correspond to a part of an experiment, protocol or acquisition

Parameters:

acquisition_id (str) – A unique identifier for the acquisition.
acquisition_start_time (datetime.datetime) – The clock time for sample 0.
adc_max (int) – The maximum ADC value that might be encountered.
adc_min (int) – The minimum ADC value that might be encountered.
context_tags (Dict[str, str]) – The context tags for the run. (For compatibility with fast5).
experiment_name (str) – The user-supplied name for the experiment being run.
flow_cell_id (str) – Uniquely identifies the flow cell the data was captured on.
flow_cell_product_code (str) – Identifies the type of flow cell the data was captured on.
protocol_name (str) – The name of the protocol that was run.
protocol_run_id (str) – The unique identifier for the protocol run that produced this data.
protocol_start_time (datetime.datetime) – When the protocol that the acquisition was part of started.
sample_id (str) – A user-supplied name for the sample being analysed.
sample_rate (int) – The number of samples acquired each second on each channel.
sequencing_kit (str) – The type of sequencing kit used to prepare the sample.
sequencer_position (str) – The sequencer position the data was collected on.
sequencer_position_type (str) – The type of sequencing hardware the data was collected on.
software (str) – A description of the software that acquired the data.
system_name (str) – The name of the system the data was collected on.
system_type (str) – The type of system the data was collected on.
tracking_id (Dict[str, str]) – The tracking id for the run. (For compatibility with fast5).

__init__(acquisition_id: str, acquisition_start_time: datetime, adc_max: int, adc_min: int, context_tags: Dict[str, str], experiment_name: str, flow_cell_id: str, flow_cell_product_code: str, protocol_name: str, protocol_run_id: str, protocol_start_time: datetime, sample_id: str, sample_rate: int, sequencing_kit: str, sequencer_position: str, sequencer_position_type: str, software: str, system_name: str, system_type: str, tracking_id: Dict[str, str]) → None

acquisition_id: str: A unique identifier for the acquisition - note that readers should not depend on this uniquely determining the other fields in the run_info, or being unique among the dictionary keys.

acquisition_start_time: datetime: This is the clock time for sample 0

adc_max: int: The maximum ADC value that might be encountered. This is a hardware constraint.

adc_min: int: The minimum ADC value that might be encountered. This is a hardware constraint.

context_tags: Dict[str, str]: The context tags for the run. (For compatibility with fast5).

experiment_name: str: The user-supplied name for the experiment being run.

flow_cell_id: str: Uniquely identifies the flow cell the data was captured on. This is written on the flow cell case.

flow_cell_product_code: str: Identifies the type of flow cell the data was captured on.

protocol_name: str: The name of the protocol that was run.

protocol_run_id: str: The unique identifier for the protocol run that produced this data.

protocol_start_time: datetime: When the protocol that the acquisition was part of started.

sample_id: str: A user-supplied name for the sample being analysed.

sample_rate: int: The number of samples acquired each second on each channel.

sequencer_position: str: The sequencer position the data was collected on. For removable positions, like MinION Mk1Bs, this is unique (e.g. ‘MN12345’), while for integrated positions it is not (e.g. ‘X1’ on a GridION).

sequencer_position_type: str: The type of sequencing hardware the data was collected on. For example: ‘MinION Mk1B’ or ‘GridION’ or ‘PromethION’.

sequencing_kit: str: The type of sequencing kit used to prepare the sample.

software: str: A description of the software that acquired the data. For example: ‘MinKNOW 21.05.12 (Bream 5.1.6, Configurations 16.2.1, Core 5.1.9, Guppy 4.2.3)’.

system_name: str: The name of the system the data was collected on. This might be a sequencer serial (eg: ‘GXB1234’) or a host name (e.g. ‘Lab PC’).

system_type: str: The type of system the data was collected on. For example, ‘GridION Mk1’ or ‘PromethION P48’. If the system is not a Nanopore sequencer with built-in compute, this will be a description of the operating system (e.g. ‘Ubuntu 20.04’).

tracking_id: Dict[str, str]: The tracking id for the run. (For compatibility with fast5).

class ShiftScalePair(shift: float = nan, scale: float = nan)[source]

Bases: object

A pair of floating point shift and scale values.

__init__(shift: float = nan, scale: float = nan) → None

scale: float = nan

shift: float = nan
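
Both fields default to nan, so code that consumes a ShiftScalePair may want to check whether values were actually provided. The sketch below uses a stand-in dataclass with the same defaults. It assumes that nan means "not set", which the documentation above does not state explicitly, and the names ShiftScalePairSketch and has_scaling are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ShiftScalePairSketch:
    """Stand-in for pod5.pod5_types.ShiftScalePair (illustrative only)."""

    shift: float = math.nan
    scale: float = math.nan


def has_scaling(pair: ShiftScalePairSketch) -> bool:
    """Hypothetical helper: True if both shift and scale hold real values."""
    # NaN never compares equal to itself, so use math.isnan rather than ==.
    return not (math.isnan(pair.shift) or math.isnan(pair.scale))


unset_pair = ShiftScalePairSketch()
set_pair = ShiftScalePairSketch(shift=1.0, scale=2.0)
```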