pod5_view
- class Field(expr: Expr, docs: str)
Bases: tuple
Container class for storing the polars expression for a named field
- property docs
Alias for field number 1
- property expr
Alias for field number 0
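The "Alias for field number N" descriptions are the ones Python generates for named tuples, so a Field can be indexed or unpacked positionally as well as accessed by name. A minimal sketch of that behavior, assuming a plain string as a stand-in for the polars expression (the example values are illustrative and not taken from pod5):

```python
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Stand-in for the documented container; expr would be a polars Expr."""

    expr: Any  # assumption: a string placeholder instead of a real polars Expr
    docs: str


field = Field(expr="col('read_id')", docs="Read identifier")

# Named access and positional access refer to the same values.
assert field.expr == field[0]
assert field.docs == field[1]

# A named tuple also unpacks like an ordinary tuple.
expr, docs = field
```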
- assert_unique_acquisition_id(run_info: LazyFrame, path: Path) -> None
Check that the acquisition ids are unique, raising AssertionError otherwise
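The uniqueness check can be illustrated without polars. The sketch below works on a plain list of ids rather than a LazyFrame; the helper name `check_unique_ids` and the error message are hypothetical, not the pod5 implementation:

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def check_unique_ids(acquisition_ids: Iterable[str], path: Path) -> None:
    """Raise AssertionError if any acquisition id appears more than once."""
    counts = Counter(acquisition_ids)
    duplicates = sorted(aid for aid, n in counts.items() if n > 1)
    if duplicates:
        # path is only used to make the error message traceable to a file
        raise AssertionError(f"Duplicate acquisition ids in {path}: {duplicates}")


# Passes silently when every id is distinct.
check_unique_ids(["run-a", "run-b"], Path("example.pod5"))
```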
- format_view_table(lazyframe: LazyFrame, path: Path, selected_fields: Set[str]) -> LazyFrame
Format the view table based on the selected fields
- get_reads_tables(path: Path, selected_fields: Set[str], threshold: int = 100000) -> Generator[LazyFrame, None, None]
Generate lazy dataframes from pod5 records. If the number of records is greater than threshold then yield chunks to limit memory consumption and improve overall performance
- join_reads_to_run_info(reads: LazyFrame, run_info: LazyFrame) -> LazyFrame
Join the reads and run_info tables
- join_workers(processes: List[SpawnProcess], exceptions: JoinableQueue) -> None
Poll workers checking for exceptions which will likely cause
- launch_view_workers(paths: Set[Path], output: Path, selection: Set[str], separator: str, num_workers: int)
- parse_read_table_chunks(reader: Reader, approx_size: int = 99999) -> Generator[LazyFrame, None, None]
Read record batches and yield polars lazyframes of approx_size records. Records are yielded in units of whole batches of the underlying table
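Yielding "in units of whole batches" means a chunk boundary never falls inside a record batch, so chunk sizes only approximate approx_size. A minimal sketch of that idea on plain lists, assuming a chunk is emitted once it reaches at least approx_size records (the exact cut-off rule in pod5 is not stated here, and `chunk_whole_batches` is a hypothetical name):

```python
from typing import Iterable, Iterator, List


def chunk_whole_batches(
    batches: Iterable[List[int]], approx_size: int
) -> Iterator[List[int]]:
    """Yield merged chunks of at least approx_size records, never splitting a batch."""
    chunk: List[int] = []
    for batch in batches:
        chunk.extend(batch)
        if len(chunk) >= approx_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # trailing records that did not reach approx_size


batches = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
chunks = list(chunk_whole_batches(batches, approx_size=4))
# chunks == [[1, 2, 3, 4, 5], [6, 7, 8, 9], [10]]
```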
- parse_reads_table_all(reader: Reader) -> LazyFrame
Parse all records in the reads table returning a polars LazyFrame
- parse_reads_table_batch(reader: Reader, batch_index: int) -> Tuple[LazyFrame, int]
Parse the reads table record batch at batch_index from a pod5 file returning a polars LazyFrame and the number of records in it
- parse_run_info_table(reader: Reader) -> LazyFrame
Parse the run info table from a pod5 file, returning a polars LazyFrame
- resolve_output(output: Optional[Path], force_overwrite: bool) -> Optional[Path]
Resolve the output path if necessary checking for no accidental overwrite and resolving to default output if given a path
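The overwrite guard described above might look like the following sketch. It covers only the accidental-overwrite check, not the default-output resolution; the helper name `check_overwrite` and the choice of FileExistsError are assumptions, not the documented pod5 behavior:

```python
from pathlib import Path


def check_overwrite(output: Path, force_overwrite: bool) -> Path:
    """Return output unchanged unless it exists and overwriting was not requested."""
    if output.exists() and not force_overwrite:
        # assumption: the exception type is illustrative
        raise FileExistsError(f"{output} already exists; pass force_overwrite=True")
    return output
```

With this shape, a caller opts in to replacing an existing file explicitly, and a path that does not exist yet passes through untouched.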
- select_fields(*, group_read_id: bool = False, include: Optional[str] = None, exclude: Optional[str] = None) -> Set[str]
Select fields to write
- view_pod5(inputs: List[Path], output: Path, separator: str = '\t', recursive: bool = False, force_overwrite: bool = False, list_fields: bool = False, no_header: bool = False, threads: int = 2, **kwargs) -> None
Given a list of POD5 files write a table to view their contents
- worker_process(paths: JoinableQueue, exceptions: JoinableQueue, lock: Lock, output: Path, separator: bool, selection: Set[str]) -> None
Consume pod5 paths from the paths queue, parse the records, and write to output after acquiring the lock. Returns None once the finish sentinel (None) is received from the paths queue.
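The queue-plus-sentinel pattern in this description can be sketched with the standard library. To stay small and deterministic, the example uses threads and queue.Queue instead of spawned processes and JoinableQueue, and appends to an in-memory list instead of writing a file. The `worker` function and the "parsed:" placeholder are hypothetical:

```python
import queue
import threading
from typing import List, Optional


def worker(
    paths: "queue.Queue[Optional[str]]", lock: threading.Lock, output: List[str]
) -> None:
    """Consume paths until the None sentinel arrives, writing results under the lock."""
    while True:
        path = paths.get()
        try:
            if path is None:  # finish sentinel: stop this worker
                return
            record = f"parsed:{path}"  # placeholder for parsing a pod5 file
            with lock:  # serialize writes to the shared output
                output.append(record)
        finally:
            paths.task_done()  # runs for sentinels too, so the queue can be joined


paths: "queue.Queue[Optional[str]]" = queue.Queue()
lock = threading.Lock()
output: List[str] = []
num_workers = 2

for name in ["a.pod5", "b.pod5", "c.pod5"]:
    paths.put(name)
for _ in range(num_workers):
    paths.put(None)  # one sentinel per worker

threads = [
    threading.Thread(target=worker, args=(paths, lock, output))
    for _ in range(num_workers)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Queuing one sentinel per worker lets every worker exit on its own, and the lock keeps concurrent writes to the shared output from interleaving.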