reader
Tools for accessing POD5 data from PyArrow files
- class ArrowTableHandle(location: EmbeddedFileData)
Bases: object
Class for managing arrow file handles and memory view mapping of tables
- __init__(location: EmbeddedFileData) → None
Open a pod5 file at the given path and use the location data to load an arrow table (e.g. signal table)
- Parameters:
location (lib_pod5.pod5_format_pybind.EmbeddedFileData) – Location data describing how a pod5 file should be split in memory to read a table. This is returned from the p5b.Pod5FileReader.get_file_X_location methods.
- Raises:
Pod5ApiException – If handle could not be opened
- property reader: RecordBatchFileReader
Return the pyarrow file reader object
- class ReadRecord(reader: Reader, batch: ReadRecordBatch, row: int, batch_signal_cache: Optional[List[ndarray[Any, dtype[int16]]]] = None, selected_batch_index: Optional[int] = None)
Bases: object
Represents the data for a single read from a pod5 record.
- __init__(reader: Reader, batch: ReadRecordBatch, row: int, batch_signal_cache: Optional[List[ndarray[Any, dtype[int16]]]] = None, selected_batch_index: Optional[int] = None)
- property byte_count: int
Get the number of bytes used to store the read's data.
- calibrate_signal_array(signal_array_adc: ndarray[Any, dtype[int16]]) → ndarray[Any, dtype[float32]]
Transform an array of int16 signal data from ADC space to pA.
- Return type:
A numpy array of signal data with float32 type.
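As a rough illustration of the ADC-to-pA transform, the sketch below assumes the conventional relationship pA = (adc + offset) * scale, with the offset and scale taken from the read's calibration data. The helper name `adc_to_pa` is hypothetical, and plain Python lists stand in for numpy arrays; this is not the library's implementation.

```python
from typing import Iterable, List


def adc_to_pa(adc_values: Iterable[int], offset: float, scale: float) -> List[float]:
    """Convert raw ADC samples to picoamps.

    Assumes pA = (adc + offset) * scale. The offset and scale arguments are
    hypothetical stand-ins for the values carried by a read's calibration.
    """
    return [(value + offset) * scale for value in adc_values]
```

With a real read, the same arithmetic would be applied to the int16 numpy array rather than to a list.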
- property calibration: Calibration
Get the calibration data associated with the read.
- property calibration_digitisation: int
Get the digitisation value used by the sequencer.
Intended to assist workflows ported from legacy file formats.
- property calibration_range: float
Get the calibration range value.
Intended to assist workflows ported from legacy file formats.
- property end_reason_index: int
Get the dictionary index of the end reason data associated with the read. This property is the same as the EndReason enumeration value.
- property has_cached_signal: bool
Return whether cached signal is available for this read.
- property median_before: float
Get the median before level (in pico amps) for the read.
- property num_minknow_events: float
Find the number of minknow events in the read.
- property num_reads_since_mux_change: int
Number of selected reads since the last mux change on this read's channel.
- property num_samples: int
Get the number of samples in the read's signal data.
- property predicted_scaling: ShiftScalePair
Find the predicted scaling value in the read.
- property read_id: UUID
Get the unique read identifier for the read as a UUID.
- property read_number: int
Get the integer read number of the read.
- property run_info_index: int
Get the dictionary index of the run info data associated with the read.
- property sample_count: int
Get the number of samples in the read's signal data.
- property signal: ndarray[Any, dtype[int16]]
Get the full signal for the read.
- Returns:
A numpy array of signal data with int16 type.
- Return type:
numpy.ndarray[int16]
- signal_for_chunk(index: int) → ndarray[Any, dtype[int16]]
Get the signal for a given chunk of the read.
- Returns:
A numpy array of signal data with int16 type for the specified chunk.
- Return type:
numpy.ndarray[int16]
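A minimal sketch of walking a read's signal chunk by chunk is shown below. It assumes that each entry in `signal_rows` corresponds to one chunk, so valid chunk indices run from 0 to `len(signal_rows) - 1`. The helper name `iter_signal_chunks` is hypothetical, and the function accepts any object exposing `signal_rows` and `signal_for_chunk`.

```python
from typing import Any, Iterator


def iter_signal_chunks(record: Any) -> Iterator[Any]:
    """Yield each signal chunk of a ReadRecord-like object, in order.

    Assumes one chunk per entry in record.signal_rows (an assumption of this
    sketch, not something the reference text states explicitly).
    """
    for index in range(len(record.signal_rows)):
        yield record.signal_for_chunk(index)
```

Processing chunks one at a time in this way may help when a read's full signal is too large to hold comfortably in memory.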
- property signal_pa: ndarray[Any, dtype[float32]]
Get the full signal for the read, calibrated in pico amps.
- Returns:
A numpy array of signal data in pico amps with float32 type.
- Return type:
numpy.ndarray[float32]
- property signal_rows: List[SignalRowInfo]
Get all signal rows for the read
- Returns:
A list of signal row data (as SignalRowInfo) in the read.
- Return type:
list[SignalRowInfo]
- property start_sample: int
Get the absolute sample at which the read started.
- property time_since_mux_change: int
Time in seconds since the last mux change on this read's channel.
- to_read() → Read
Create a mutable pod5.pod5_types.Read from this ReadRecord instance.
- Return type:
pod5.pod5_types.Read
- property tracked_scaling: ShiftScalePair
Find the tracked scaling value in the read.
- class ReadRecordBatch(reader: Reader, batch: RecordBatch)
Bases: object
Read data for a batch of reads.
- property cached_sample_count_column: ndarray[Any, dtype[uint64]]
Get the sample counts from the cached signal data
- property cached_samples_column: List[ndarray[Any, dtype[int16]]]
Get the samples column from the cached signal data
- property columns: ReadRecordV3Columns
Return the data from this batch as a ReadRecordV3Columns instance
- get_read(row: int) → ReadRecord
Get the ReadRecord at row index
- property num_reads: int
Return the number of rows in this RecordBatch
- property read_id_column
Get the column of read ids for this batch
- property read_number_column
Get the column of read numbers for this batch
- reads() → Generator[ReadRecord, None, None]
Iterate all reads in this batch.
- Yields:
ReadRecord – ReadRecord instances in this batch.
- class ReadRecordV3Columns(read_id, read_number, start, channel, well, median_before, pore_type, calibration_offset, calibration_scale, end_reason, end_reason_forced, run_info, signal, num_minknow_events, tracked_scaling_scale, tracked_scaling_shift, predicted_scaling_scale, predicted_scaling_shift, num_reads_since_mux_change, time_since_mux_change, num_samples)
Bases: tuple
- property calibration_offset
Alias for field number 7
- property calibration_scale
Alias for field number 8
- property channel
Alias for field number 3
- property end_reason
Alias for field number 9
- property end_reason_forced
Alias for field number 10
- property median_before
Alias for field number 5
- property num_minknow_events
Alias for field number 13
- property num_reads_since_mux_change
Alias for field number 18
- property num_samples
Alias for field number 20
- property pore_type
Alias for field number 6
- property predicted_scaling_scale
Alias for field number 16
- property predicted_scaling_shift
Alias for field number 17
- property read_id
Alias for field number 0
- property read_number
Alias for field number 1
- property run_info
Alias for field number 11
- property signal
Alias for field number 12
- property start
Alias for field number 2
- property time_since_mux_change
Alias for field number 19
- property tracked_scaling_scale
Alias for field number 14
- property tracked_scaling_shift
Alias for field number 15
- property well
Alias for field number 4
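The "Alias for field number N" entries above are the standard descriptions generated for a named tuple: each property is a named view onto one positional field. The sketch below rebuilds a stand-in with the same field order using `collections.namedtuple`. The name `ColumnsSketch` and the placeholder values are hypothetical; real instances hold column data from a read batch.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field order as ReadRecordV3Columns.
ColumnsSketch = namedtuple(
    "ColumnsSketch",
    [
        "read_id", "read_number", "start", "channel", "well",
        "median_before", "pore_type", "calibration_offset",
        "calibration_scale", "end_reason", "end_reason_forced",
        "run_info", "signal", "num_minknow_events",
        "tracked_scaling_scale", "tracked_scaling_shift",
        "predicted_scaling_scale", "predicted_scaling_shift",
        "num_reads_since_mux_change", "time_since_mux_change",
        "num_samples",
    ],
)

# Fill every field with a placeholder string so the aliases can be inspected.
columns = ColumnsSketch(*[f"<{name} column>" for name in ColumnsSketch._fields])

# Attribute access and positional access refer to the same field.
signal_index = ColumnsSketch._fields.index("signal")
same_object = columns.signal is columns[signal_index]
```

Accessing fields by name is generally the safer choice, since positions may differ between table versions.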
- class Reader(path: Union[PathLike, str])
Bases: object
The base reader for POD5 data
- property batch_count: int
Find the number of read batches available in the file.
- property file_identifier: UUID
- property file_version: Version
- property file_version_pre_migration: Version
- get_batch(index: int) → ReadRecordBatch
Get a read batch in the file.
- Returns:
The requested batch as a ReadRecordBatch.
- Return type:
ReadRecordBatch
- property inner_file_reader: Pod5FileReader
Access the inner c_api Pod5FileReader - use with caution
- property is_vbz_compressed: bool
Return whether this file’s signal is VBZ compressed
- property num_reads: int
Find the number of reads in the file.
- property path: Path
Return the path to this pod5 file
- read_batches(selection: Optional[List[str]] = None, batch_selection: Optional[Iterable[int]] = None, missing_ok: bool = False, preload: Optional[Set[str]] = None) → Generator[ReadRecordBatch, None, None]
Iterate batches in the file, optionally selecting certain rows.
- Parameters:
selection (iterable[str]) – The read ids to walk in the file.
batch_selection (iterable[int]) – The read batches to walk in the file.
missing_ok (bool) – If False (the default), an error is raised when selection contains entries not found in the file; if True, missing entries are tolerated.
preload (set[str]) – Columns to preload - “samples” and “sample_count” are valid values
- Return type:
An iterable of ReadRecordBatch in the file.
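Batch iteration might be used as in the sketch below, which collects the number of reads in each batch. The helper name `reads_per_batch` is hypothetical; it accepts any object exposing a `read_batches` method with the signature documented above, such as a Reader opened on a pod5 file.

```python
from typing import Any, Iterable, List, Optional


def reads_per_batch(
    reader: Any, batch_selection: Optional[Iterable[int]] = None
) -> List[int]:
    """Return the number of reads in each visited batch.

    When batch_selection is None, every batch in the file is visited.
    """
    counts = []
    for batch in reader.read_batches(batch_selection=batch_selection):
        # num_reads is the row count of the underlying record batch.
        counts.append(batch.num_reads)
    return counts
```

Passing `batch_selection` restricts the walk to the listed batch indices, which can be useful for splitting work across processes.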
- property read_ids: List[str]
Return all read_ids as a list of strings.
For the most performant implementation consider Reader.read_ids_raw
- property read_ids_raw: ChunkedArray
Return chunked arrow array of read ids.
To get read ids as string use Reader.read_ids
- property read_table: RecordBatchFileReader
Access the pod5 read table
- reads(selection: Optional[Iterable[str]] = None, missing_ok: bool = False, preload: Optional[Set[str]] = None) → Generator[ReadRecord, None, None]
Iterate reads in the file, optionally filtering for certain read ids.
- Parameters:
selection (iterable[str]) – The read ids to walk in the file.
missing_ok (bool) – If False (the default), an error is raised when selection contains entries not found in the file; if True, missing entries are tolerated.
preload (set[str]) – Columns to preload - “samples” and “sample_count” are valid values
- Return type:
An iterable of ReadRecord in the file.
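Per-read iteration with an optional selection might look like the sketch below, which maps each read id to its sample count. The helper name `sample_counts_by_read_id` is hypothetical; it works on any object exposing the `reads` method documented above, and passes `missing_ok=True` on the assumption that ids absent from the file should be tolerated rather than raise.

```python
from typing import Any, Dict, Iterable, Optional


def sample_counts_by_read_id(
    reader: Any, selection: Optional[Iterable[str]] = None
) -> Dict[str, int]:
    """Map read id (as a string) to the number of signal samples.

    When selection is None, every read in the file is visited.
    """
    counts: Dict[str, int] = {}
    for record in reader.reads(selection=selection, missing_ok=True):
        # read_id is a UUID on ReadRecord; convert it for use as a dict key.
        counts[str(record.read_id)] = record.num_samples
    return counts
```

With the real library, `reader` would be a Reader constructed from a path to a pod5 file.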
- property reads_table_version: ReadTableVersion
- property run_info_table: RecordBatchFileReader
Access the pod5 run_info table
- property signal_batch_row_count: int
Return signal batch row count
- property signal_table: RecordBatchFileReader
Access the pod5 signal table - use with caution
- property writing_software: str