reader

Tools for accessing POD5 data from PyArrow files

class ArrowTableHandle(location: EmbeddedFileData)[source]

Bases: object

Class for managing arrow file handles and memory view mapping of tables

__init__(location: EmbeddedFileData) None[source]

Open a pod5 file at the given path and use the location data to load an arrow table (e.g. signal table)

Parameters:

location (lib_pod5.pod5_format_pybind.EmbeddedFileData) – Location data describing how a pod5 file should be split in memory to read a table. This is returned by the p5b.Pod5FileReader.get_file_X_location methods.

Raises:

Pod5ApiException – If handle could not be opened

close() None[source]

Cleanly close the open file handles and memory views.

property reader: RecordBatchFileReader

Return the pyarrow file reader object

class ReadRecord(reader: Reader, batch: ReadRecordBatch, row: int, batch_signal_cache: Optional[List[ndarray[Any, dtype[int16]]]] = None, selected_batch_index: Optional[int] = None)[source]

Bases: object

Represents the data for a single read from a pod5 record.

__init__(reader: Reader, batch: ReadRecordBatch, row: int, batch_signal_cache: Optional[List[ndarray[Any, dtype[int16]]]] = None, selected_batch_index: Optional[int] = None)[source]

property byte_count: int

Get the number of bytes used to store the read's data.

calibrate_signal_array(signal_array_adc: ndarray[Any, dtype[int16]]) ndarray[Any, dtype[float32]][source]

Transform an array of int16 signal data from ADC space to pA.

Return type:

A numpy array of signal data with float32 type.
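As a rough illustration of the ADC-to-pA transform, the sketch below assumes the conventional nanopore calibration pA = (adc + offset) * scale, with offset and scale taken from the read's Calibration. The helper name adc_to_pa and the numbers are hypothetical, and plain Python lists stand in for numpy arrays so the snippet runs without extra dependencies.

```python
from typing import Iterable, List


def adc_to_pa(signal_adc: Iterable[int], offset: float, scale: float) -> List[float]:
    """Convert raw ADC samples to picoamps.

    Assumption: pA = (adc + offset) * scale. This follows the usual
    nanopore calibration convention and is not copied from pod5 itself.
    """
    return [(sample + offset) * scale for sample in signal_adc]


# Hypothetical calibration values, chosen only to keep the arithmetic simple.
calibrated = adc_to_pa([0, 10, -4], offset=4.0, scale=0.5)
print(calibrated)
```

With a real ReadRecord, signal_pa is the documented way to obtain the calibrated signal; the sketch only makes the arithmetic explicit.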

property calibration: Calibration

Get the calibration data associated with the read.

property calibration_digitisation: int

Get the digitisation value used by the sequencer.

Intended to assist workflows ported from legacy file formats.

property calibration_range: float

Get the calibration range value.

Intended to assist workflows ported from legacy file formats.
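For workflows ported from FAST5-style files, which store a range and a digitisation rather than a single scale, the sketch below assumes the legacy relation scale = range / digitisation. The function names and the numbers are hypothetical.

```python
def scale_from_legacy(calibration_range: float, digitisation: int) -> float:
    """Assumed legacy relation: scale = range / digitisation."""
    if digitisation <= 0:
        raise ValueError("digitisation must be positive")
    return calibration_range / digitisation


def range_from_scale(scale: float, digitisation: int) -> float:
    """Inverse of the assumed relation: range = scale * digitisation."""
    return scale * digitisation


# Illustrative values only; real values come from the read's calibration data.
scale = scale_from_legacy(calibration_range=1024.0, digitisation=8192)
recovered_range = range_from_scale(scale, digitisation=8192)
print(scale, recovered_range)
```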

property end_reason: EndReason

Get the end reason data associated with the read.

property end_reason_index: int

Get the dictionary index of the end reason data associated with the read. This property is the same as the EndReason enumeration value.

property has_cached_signal: bool

Return whether cached signal is available for this read.

property median_before: float

Get the median before level (in picoamps) for the read.

property num_minknow_events: float

Get the number of MinKNOW events in the read.

property num_reads_since_mux_change: int

Number of selected reads since the last mux change on this read's channel.

property num_samples: int

Get the number of samples in the read's signal data.

property pore: Pore

Get the pore data associated with the read.

property predicted_scaling: ShiftScalePair

Find the predicted scaling value in the read.

property read_id: UUID

Get the unique read identifier for the read as a UUID.

property read_number: int

Get the integer read number of the read.

property run_info: RunInfo

Get the run info data associated with the read.

property run_info_index: int

Get the dictionary index of the run info data associated with the read.

property sample_count: int

Get the number of samples in the read's signal data.

property signal: ndarray[Any, dtype[int16]]

Get the full signal for the read.

Returns:

A numpy array of signal data with int16 type.

Return type:

numpy.ndarray[int16]

signal_for_chunk(index: int) ndarray[Any, dtype[int16]][source]

Get the signal for a given chunk of the read.

Returns:

A numpy array of signal data with int16 type for the specified chunk.

Return type:

numpy.ndarray[int16]

property signal_pa: ndarray[Any, dtype[float32]]

Get the full signal for the read, calibrated in picoamps.

Returns:

A numpy array of signal data in picoamps with float32 type.

Return type:

numpy.ndarray[float32]

property signal_rows: List[SignalRowInfo]

Get all signal rows for the read

Returns:

A list of signal row data (as SignalRowInfo) in the read.

Return type:

list[SignalRowInfo]
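A read's signal can be stored across several signal rows. The sketch below assumes that signal_for_chunk(index) takes a zero-based index into signal_rows and that the chunks, taken in order, make up the full signal; both points are assumptions about the API. A SimpleNamespace stand-in (fake_record, hypothetical) replaces a real ReadRecord so the snippet runs without a POD5 file.

```python
from types import SimpleNamespace


def signal_in_chunks(record):
    """Yield the read's signal one stored chunk at a time."""
    for index in range(len(record.signal_rows)):
        yield record.signal_for_chunk(index)


# Hypothetical stand-in: two chunks of raw samples and one row entry per chunk.
chunks = [[1, 2, 3], [4, 5]]
fake_record = SimpleNamespace(
    signal_rows=[object() for _ in chunks],
    signal_for_chunk=lambda index: chunks[index],
)

flattened = [sample for chunk in signal_in_chunks(fake_record) for sample in chunk]
print(flattened)
```

Processing chunk by chunk keeps only one chunk in memory at a time, which may matter for very long reads.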

property start_sample: int

Get the absolute sample number at which the read started.

property time_since_mux_change: int

Time in seconds since the last mux change on this read's channel.

to_read() Read[source]

Create a mutable pod5.pod5_types.Read from this ReadRecord instance.

Return type:

pod5.pod5_types.Read

property tracked_scaling: ShiftScalePair

Find the tracked scaling value in the read.

class ReadRecordBatch(reader: Reader, batch: RecordBatch)[source]

Bases: object

Read data for a batch of reads.

__init__(reader: Reader, batch: RecordBatch)[source]

property cached_sample_count_column: ndarray[Any, dtype[uint64]]

Get the sample counts from the cached signal data

property cached_samples_column: List[ndarray[Any, dtype[int16]]]

Get the samples column from the cached signal data

property columns: ReadRecordV3Columns

Return the data from this batch as a ReadRecordV3Columns instance

get_read(row: int) ReadRecord[source]

Get the ReadRecord at the given row index

property num_reads: int

Return the number of rows in this RecordBatch

property read_id_column

Get the column of read ids for this batch

property read_number_column

Get the column of read numbers for this batch

reads() Generator[ReadRecord, None, None][source]

Iterate all reads in this batch.

Yields:

ReadRecord – ReadRecord instances in this batch.

set_cached_signal(signal_cache: Pod5SignalCacheBatch) None[source]

Set the signal cache

set_selected_batch_rows(selected_batch_rows: Iterable[int]) None[source]

Set the selected batch rows

class ReadRecordV3Columns(read_id, read_number, start, channel, well, median_before, pore_type, calibration_offset, calibration_scale, end_reason, end_reason_forced, run_info, signal, num_minknow_events, tracked_scaling_scale, tracked_scaling_shift, predicted_scaling_scale, predicted_scaling_shift, num_reads_since_mux_change, time_since_mux_change, num_samples)

Bases: tuple

property calibration_offset

Alias for field number 7

property calibration_scale

Alias for field number 8

property channel

Alias for field number 3

property end_reason

Alias for field number 9

property end_reason_forced

Alias for field number 10

property median_before

Alias for field number 5

property num_minknow_events

Alias for field number 13

property num_reads_since_mux_change

Alias for field number 18

property num_samples

Alias for field number 20

property pore_type

Alias for field number 6

property predicted_scaling_scale

Alias for field number 16

property predicted_scaling_shift

Alias for field number 17

property read_id

Alias for field number 0

property read_number

Alias for field number 1

property run_info

Alias for field number 11

property signal

Alias for field number 12

property start

Alias for field number 2

property time_since_mux_change

Alias for field number 19

property tracked_scaling_scale

Alias for field number 14

property tracked_scaling_shift

Alias for field number 15

property well

Alias for field number 4
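Because ReadRecordV3Columns derives from tuple, each field can be read either by name or by position, which is what the field-number aliases above refer to. The sketch below rebuilds a reduced named tuple with only the first four documented fields to show the equivalence; the name ColumnsSketch and the values are hypothetical.

```python
from collections import namedtuple

# Reduced stand-in: only the first four documented fields, in documented order.
ColumnsSketch = namedtuple(
    "ColumnsSketch", ["read_id", "read_number", "start", "channel"]
)

columns = ColumnsSketch(read_id="id-0", read_number=7, start=1200, channel=42)

# Access by name and access by position return the same value.
by_name = columns.channel
by_position = columns[3]
print(by_name, by_position, ColumnsSketch._fields.index("channel"))
```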

class ReadTableVersion(value)[source]

Bases: Enum

Version of read table

V3: int = 3

class Reader(path: Union[PathLike, str])[source]

Bases: object

The base reader for POD5 data

__init__(path: Union[PathLike, str])[source]

Open a pod5 filepath for reading

property batch_count: int

Find the number of read batches available in the file.

close() None[source]

Close file handles

property file_identifier: UUID

property file_version: Version

property file_version_pre_migration: Version

get_batch(index: int) ReadRecordBatch[source]

Get a read batch in the file.

Returns:

The requested batch as a ReadRecordBatch.

Return type:

ReadRecordBatch
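batch_count and get_batch can be combined to walk a file one batch at a time. The sketch below accepts any object exposing those two members and assumes zero-based batch indices; fake_reader is a hypothetical stand-in so the snippet runs without a POD5 file.

```python
from types import SimpleNamespace


def reads_per_batch(reader):
    """Return the number of reads in each batch, in batch order."""
    return [reader.get_batch(index).num_reads for index in range(reader.batch_count)]


# Hypothetical stand-in for a Reader holding three batches.
_sizes = [1000, 1000, 250]
fake_reader = SimpleNamespace(
    batch_count=len(_sizes),
    get_batch=lambda index: SimpleNamespace(num_reads=_sizes[index]),
)

counts = reads_per_batch(fake_reader)
print(counts, sum(counts))
```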

property inner_file_reader: Pod5FileReader

Access the inner c_api Pod5FileReader - use with caution

property is_vbz_compressed: bool

Return whether this file’s signal is VBZ compressed

property num_reads: int

Find the number of reads in the file.

property path: Path

Return the path to this pod5 file

read_batches(selection: Optional[List[str]] = None, batch_selection: Optional[Iterable[int]] = None, missing_ok: bool = False, preload: Optional[Set[str]] = None) Generator[ReadRecordBatch, None, None][source]

Iterate batches in the file, optionally selecting certain rows.

Parameters:
  • selection (iterable[str]) – The read ids to walk in the file.

  • batch_selection (iterable[int]) – The read batches to walk in the file.

  • missing_ok (bool) – If False (the default), an error is raised when selection contains entries not found in the file; if True, missing entries are ignored.

  • preload (set[str]) – Columns to preload - “samples” and “sample_count” are valid values

Return type:

An iterable of ReadRecordBatch in the file.
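The sketch below totals sample counts over batches using read_batches with the sample_count preload and the cached_sample_count_column property documented on ReadRecordBatch. That the preload is what populates the cached column is an assumption, and fake_reader is a hypothetical stand-in so the snippet runs without a POD5 file.

```python
from types import SimpleNamespace


def total_samples(reader, batch_selection=None):
    """Sum cached sample counts over the selected batches."""
    total = 0
    batches = reader.read_batches(
        batch_selection=batch_selection, preload={"sample_count"}
    )
    for batch in batches:
        total += sum(int(count) for count in batch.cached_sample_count_column)
    return total


# Hypothetical stand-in: three batches of per-read sample counts.
_fake_batches = [[100, 200], [300], [400, 500]]


def _fake_read_batches(selection=None, batch_selection=None, missing_ok=False, preload=None):
    indices = range(len(_fake_batches)) if batch_selection is None else batch_selection
    for index in indices:
        yield SimpleNamespace(cached_sample_count_column=_fake_batches[index])


fake_reader = SimpleNamespace(read_batches=_fake_read_batches)
all_batches = total_samples(fake_reader)
two_batches = total_samples(fake_reader, batch_selection=[0, 2])
print(all_batches, two_batches)
```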

property read_ids: List[str]

Return all read_ids as a list of strings.

For the most performant implementation consider Reader.read_ids_raw

property read_ids_raw: ChunkedArray

Return chunked arrow array of read ids.

To get read ids as strings use Reader.read_ids
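As a sketch of what converting a raw read id to its string form could look like, the snippet below assumes each raw entry can be obtained as a 16-byte value holding the UUID in standard byte order. The helper name raw_read_id_to_str is hypothetical, and how the bytes are extracted from the arrow array is left out.

```python
import uuid


def raw_read_id_to_str(raw: bytes) -> str:
    """Format a 16-byte raw read id as a canonical UUID string.

    Assumption: the raw value is the UUID's 16 bytes in standard order.
    """
    if len(raw) != 16:
        raise ValueError("a raw read id is expected to be exactly 16 bytes")
    return str(uuid.UUID(bytes=raw))


# Illustrative bytes only, not taken from a real file.
raw = bytes(range(16))
text = raw_read_id_to_str(raw)
print(text)
```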

property read_table: RecordBatchFileReader

Access the pod5 read table

reads(selection: Optional[Iterable[str]] = None, missing_ok: bool = False, preload: Optional[Set[str]] = None) Generator[ReadRecord, None, None][source]

Iterate reads in the file, optionally filtering for certain read ids.

Parameters:
  • selection (iterable[str]) – The read ids to walk in the file.

  • missing_ok (bool) – If False (the default), an error is raised when selection contains entries not found in the file; if True, missing entries are ignored.

  • preload (set[str]) – Columns to preload - “samples” and “sample_count” are valid values

Return type:

An iterable of ReadRecord in the file.
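A minimal sketch of iterating reads with a selection follows. sample_counts_by_read works on any object exposing reads(); sample_counts_for_file wraps it around pod5.Reader and needs the pod5 package, so it is defined but not called here. Both function names are hypothetical, and fake_reader is a stand-in so the snippet runs without a POD5 file.

```python
import uuid
from types import SimpleNamespace


def sample_counts_by_read(reader, selection=None):
    """Map read id (as a string) to num_samples for the selected reads."""
    counts = {}
    for record in reader.reads(selection=selection, missing_ok=True):
        counts[str(record.read_id)] = record.num_samples
    return counts


def sample_counts_for_file(path, selection=None):
    """Open a file, collect the counts, and always close the reader."""
    import pod5  # imported lazily; requires the pod5 package

    reader = pod5.Reader(path)
    try:
        return sample_counts_by_read(reader, selection)
    finally:
        reader.close()


# Hypothetical stand-in for a Reader with two reads.
_records = [
    SimpleNamespace(read_id=uuid.UUID(int=1), num_samples=4000),
    SimpleNamespace(read_id=uuid.UUID(int=2), num_samples=2500),
]


def _fake_reads(selection=None, missing_ok=False, preload=None):
    wanted = None if selection is None else set(selection)
    for record in _records:
        if wanted is None or str(record.read_id) in wanted:
            yield record


fake_reader = SimpleNamespace(reads=_fake_reads)
first_id = str(uuid.UUID(int=1))
selected = sample_counts_by_read(fake_reader, selection=[first_id])
print(selected)
```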

property reads_table_version: ReadTableVersion

property run_info_table: RecordBatchFileReader

Access the pod5 run_info table

property signal_batch_row_count: int

Return signal batch row count

property signal_table: RecordBatchFileReader

Access the pod5 signal table - use with caution

property writing_software: str

class Signal(signal, samples)

Bases: tuple

property samples

Alias for field number 1

property signal

Alias for field number 0

class SignalRowInfo(batch_index, batch_row_index, sample_count, byte_count)

Bases: tuple

property batch_index

Alias for field number 0

property batch_row_index

Alias for field number 1

property byte_count

Alias for field number 3

property sample_count

Alias for field number 2
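The fields of SignalRowInfo can be combined to estimate how compactly a read's signal is stored. The sketch below rebuilds the tuple locally with the documented field order and assumes byte_count is the stored (possibly compressed) size of each row; the helper name and the numbers are hypothetical.

```python
from collections import namedtuple

# Local stand-in with the documented field order.
SignalRowInfo = namedtuple(
    "SignalRowInfo", ["batch_index", "batch_row_index", "sample_count", "byte_count"]
)


def stored_bytes_per_sample(rows):
    """Average stored bytes per sample across all signal rows of a read."""
    samples = sum(row.sample_count for row in rows)
    if samples == 0:
        return 0.0
    return sum(row.byte_count for row in rows) / samples


# Illustrative rows only; uncompressed int16 signal would be 2 bytes per sample.
rows = [SignalRowInfo(0, 0, 1000, 900), SignalRowInfo(0, 1, 500, 600)]
ratio = stored_bytes_per_sample(rows)
print(ratio)
```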