repack

Tools to assist repacking pod5 data into other pod5 files

class Repacker

Bases: object

Wrapper class around native pod5 tools to repack data

__init__()

add_all_reads_to_output(output_ref: Pod5RepackerOutput, reader: Reader) → None

Copy every read from the given Reader into the repacker output identified by the reference returned from add_output()

Parameters:
  • output_ref (lib_pod5.pod5_format_pybind.Pod5RepackerOutput) – The repacker handle reference returned from add_output()

  • reader (Reader) – The Pod5 file reader to copy reads from
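The call order implied above (register an output, request the reads, then wait) can be sketched as follows. This is a minimal sketch, not the library's own code: the helper name copy_all_reads is hypothetical, and its arguments are only assumed to behave like the Repacker, Reader and Writer objects documented on this page. It reads reads_completed before calling finish(), on the assumption that the counters may not be available once resources are freed.

```python
def copy_all_reads(repacker, reader, writer):
    """Queue every read from `reader` for repacking into `writer`.

    Hypothetical helper (not part of pod5). The arguments are assumed to
    behave like the Repacker, Reader and Writer described on this page.
    Returns the number of reads the repacker reports as written.
    """
    # Register the writer and keep the handle that identifies this output.
    output_ref = repacker.add_output(writer)

    # Request a copy of every read in the source file.
    repacker.add_all_reads_to_output(output_ref, reader)

    # Block until the requested work is done, but do not free resources yet
    # so that the counter below can still be read.
    repacker.wait(finish=False, show_pbar=False)
    copied = repacker.reads_completed

    # Release the underlying repacker explicitly.
    repacker.finish()
    return copied
```

With the real library this would typically be called with an open Reader and Writer, keeping both open until the helper returns.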

add_output(output_file: Writer) → Pod5RepackerOutput

Add an output file writer to the repacker, so it can have read data repacked into it.

Once an output has been added, the returned reference can be passed as output_ref to add_selected_reads_to_output() or add_all_reads_to_output()

Parameters:

output_file (writer.Writer) – The output file writer to use

Returns:

repacker_object – Use this as output_ref in calls to add_selected_reads_to_output() or add_all_reads_to_output()

Return type:

lib_pod5.pod5_format_pybind.Pod5RepackerOutput

add_selected_reads_to_output(output_ref: Pod5RepackerOutput, reader: Reader, selected_read_ids: Collection[str])

Copy the reads with the selected read_ids from the given Reader into the repacker output identified by the reference returned from add_output()

Parameters:
  • output_ref (lib_pod5.pod5_format_pybind.Pod5RepackerOutput) – The repacker handle reference returned from add_output()

  • reader (Reader) – The Pod5 file reader to copy reads from

  • selected_read_ids (Collection[str]) – A Collection of read_ids as strings

Raises:

RuntimeError – If any of the selected_read_ids were not found in the source file
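Because a missing read_id raises RuntimeError, a caller may want to separate known and unknown ids before making the request. The sketch below shows one way to do that. The helper name copy_known_reads is hypothetical, and it assumes the reader exposes its read ids as strings through a read_ids attribute, which this page does not document.

```python
def copy_known_reads(repacker, output_ref, reader, wanted_read_ids):
    """Request only the read ids that exist in `reader`; return the rest.

    Hypothetical helper (not part of pod5).
    Assumption: `reader.read_ids` yields the source file's read ids as str.
    """
    available = set(reader.read_ids)

    # Split the request so the repacker never sees an id it cannot find.
    present = [rid for rid in wanted_read_ids if rid in available]
    missing = [rid for rid in wanted_read_ids if rid not in available]

    if present:
        repacker.add_selected_reads_to_output(output_ref, reader, present)

    # The caller decides whether missing ids are an error or just a warning.
    return missing
```

Returning the missing ids rather than raising leaves the policy to the caller, at the cost of one pass over the reader's id list.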

property batches_completed: int

Find the number of batches that have finished writing to destination files

property batches_requested: int

Find the number of batches requested to be read from source files

finish() → None

Call finish on the underlying c_api repacker instance to free resources

property is_complete: bool

Find if the requested repack operations are complete

property pending_batch_writes: int

Find the number of batches in flight, awaiting writing

property reads_completed: int

Find the number of reads written to files

property reads_requested: int

Find the number of requested reads to be written

property reads_sample_bytes_completed: int

Find the number of bytes of sample data repacked so far
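The counter properties can be combined into a single status line, for example when logging progress without a progress bar. This is a sketch under the assumption that the object passed in exposes the properties listed on this page. The helper name describe_progress and the output format are illustrative only.

```python
def describe_progress(repacker) -> str:
    """Summarise the repacker's counters as one human-readable line.

    Hypothetical helper (not part of pod5); it only reads the properties
    documented on this page.
    """
    requested = repacker.reads_requested
    completed = repacker.reads_completed

    # Guard against division by zero before any reads have been requested.
    percent = 100.0 * completed / requested if requested else 0.0

    return (
        f"{completed}/{requested} reads ({percent:.1f}%), "
        f"{repacker.batches_completed}/{repacker.batches_requested} batches, "
        f"{repacker.pending_batch_writes} pending writes, "
        f"{repacker.reads_sample_bytes_completed} sample bytes"
    )
```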

wait(finish: bool = True, interval: float = 0.5, show_pbar: bool = True, leave_pbar: bool = False) → None

Block until the repacker is done, checking is_complete every interval seconds. Optionally show a progress bar while waiting.

Parameters:
  • finish (bool) – Flag to toggle an optional final call to finish() to close the repacker and free resources

  • interval (float) – The interval (in seconds) between checks of is_complete

  • show_pbar (bool) – Flag to toggle showing the progress bar; its effect is combined with the POD5_PBAR environment variable

  • leave_pbar (bool) – Flag to toggle if the progress bar should not be cleared after use
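The polling behaviour described for wait() can be approximated by hand when a caller wants its own progress reporting instead of the built-in progress bar. The loop below is a sketch of that idea, not the library's implementation of wait(). The helper name wait_with_callback and the on_tick callback are hypothetical.

```python
import time


def wait_with_callback(repacker, on_tick, interval=0.5, finish=True):
    """Poll `repacker.is_complete`, reporting progress between checks.

    Hypothetical helper (not part of pod5). `on_tick` is called with
    (reads_completed, reads_requested) once per polling interval.
    """
    while not repacker.is_complete:
        # Report the current counters, then sleep until the next check.
        on_tick(repacker.reads_completed, repacker.reads_requested)
        time.sleep(interval)

    # Mirror the documented `finish` flag: optionally free resources.
    if finish:
        repacker.finish()
```

A caller could pass a logging function as on_tick, or pass finish=False to keep the repacker alive and read its final counters before calling finish() itself.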