pod5_subset
Tool for subsetting pod5 files into one or more outputs
- assert_filename_template(template: str, subset_columns: List[str], ignore_incomplete_template: bool) -> None
Extract the keys named in the template and assert that each one exists in subset_columns.
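A minimal sketch of this kind of check follows. It assumes the template uses Python `str.format`-style `{column}` placeholders. `template_keys` and `check_template` are hypothetical names, not the pod5 implementation. How `ignore_incomplete_template` is treated here is also an assumption: it only relaxes the requirement that every subset column appears in the template.

```python
import string
from typing import List, Set


def template_keys(template: str) -> Set[str]:
    """Return the placeholder names used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so filter those out.
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def check_template(
    template: str, subset_columns: List[str], ignore_incomplete_template: bool
) -> None:
    """Hypothetical stand-in for assert_filename_template."""
    keys = template_keys(template)
    # Every placeholder must name a column we are subsetting on.
    unknown = keys - set(subset_columns)
    if unknown:
        raise KeyError(f"template keys not in subset_columns: {sorted(unknown)}")
    # Assumption: an "incomplete" template omits some subset columns, which
    # could map different groups of reads onto the same output filename.
    unused = set(subset_columns) - keys
    if unused and not ignore_incomplete_template:
        raise KeyError(f"subset_columns missing from template: {sorted(unused)}")
```

For example, `check_template("{barcode}.pod5", ["barcode"], False)` passes silently, while a template naming a column that is not in `subset_columns` raises `KeyError`.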
- assert_overwrite_ok(output: Path, names: Iterable[str], force_overwrite: bool) -> None
Given the output directory path and target filenames, assert that no existing file would be overwritten unless force_overwrite is set, raising a FileExistsError otherwise.
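The check can be sketched as below. `check_overwrite` is a hypothetical name. This sketch simply skips the check when `force_overwrite` is set and makes no claim about whether pod5 removes existing files in that case.

```python
from pathlib import Path
from typing import Iterable


def check_overwrite(output: Path, names: Iterable[str], force_overwrite: bool) -> None:
    """Hypothetical stand-in for assert_overwrite_ok."""
    if force_overwrite:
        return  # overwriting was explicitly requested; nothing to check
    # Collect every target that already exists in the output directory.
    existing = [output / name for name in names if (output / name).exists()]
    if existing:
        raise FileExistsError(
            f"refusing to overwrite {len(existing)} file(s), e.g. {existing[0]}"
        )
```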
- calculate_transfers(inputs: List[Path], read_targets: Dict[str, Set[Path]], missing_ok: bool, duplicate_ok: bool) -> Dict[Path, Dict[Path, Set[str]]]
Calculate the transfers, which record each collection of read_ids together with its source file and destination file.
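The sketch below shows one way to build such a structure. It assumes the nesting is destination, then source, then read_ids, which is inferred from `launch_subsetting` iterating one target at a time and opening its sources. The real function reads read_ids from the input files. Here a hypothetical `input_read_ids` dict stands in for that. `plan_transfers` is also a hypothetical name. The meanings given to `missing_ok` (a requested read_id found in no input) and `duplicate_ok` (a read_id found in more than one input) are assumptions.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Set


def plan_transfers(
    input_read_ids: Dict[Path, Set[str]],
    read_targets: Dict[str, Set[Path]],
    missing_ok: bool = False,
    duplicate_ok: bool = False,
) -> Dict[Path, Dict[Path, Set[str]]]:
    """Hypothetical stand-in for calculate_transfers: destination -> source -> read_ids."""
    transfers: Dict[Path, Dict[Path, Set[str]]] = defaultdict(lambda: defaultdict(set))
    found: Set[str] = set()
    for source, read_ids in input_read_ids.items():
        # Only read_ids that some target actually asked for are transferred.
        for read_id in read_ids.intersection(read_targets):
            if read_id in found and not duplicate_ok:
                raise ValueError(f"read_id {read_id} found in more than one input")
            found.add(read_id)
            for destination in read_targets[read_id]:
                transfers[destination][source].add(read_id)
    missing = set(read_targets) - found
    if missing and not missing_ok:
        raise ValueError(f"{len(missing)} requested read_id(s) not found in inputs")
    # Convert the nested defaultdicts back to plain dicts for the caller.
    return {dest: dict(sources) for dest, sources in transfers.items()}
```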
- create_default_filename_template(subset_columns: List[str]) -> str
Create the default filename template from the selected subset_columns.
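A sketch of building a template from column names is shown below. The naming scheme here is an assumption and may not match the exact default pod5 produces: one `<column>-{<column>}` field per column, joined with underscores, with a `.pod5` extension. `default_template` is a hypothetical name.

```python
from typing import List


def default_template(subset_columns: List[str]) -> str:
    """Hypothetical stand-in for create_default_filename_template."""
    # Assumed scheme: "barcode-{barcode}" per column, underscore-joined.
    # The tripled braces emit a literal "{column}" placeholder in the f-string.
    fields = "_".join(f"{column}-{{{column}}}" for column in subset_columns)
    return f"{fields}.pod5"
```

With this scheme, `default_template(["barcode"])` gives `"barcode-{barcode}.pod5"`, which `str.format` then fills per group of reads.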
- launch_subsetting(transfers: Dict[Path, Dict[Path, Set[str]]], show_pbar: bool = False) -> None
Iterate over the transfers one target at a time, opening the sources and copying the required read_ids. Wait for the repacker to finish each target before moving on, to avoid holding too many open file handles.
- parse_csv_mapping(csv_path: Path) -> Dict[str, Set[str]]
Parse the csv file that directly maps output targets to read_ids.
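The sketch below shows one way to parse such a file with the standard `csv` module. The row layout is an assumption: `target,read_id[,read_id...]` with no header row. `read_csv_mapping` is a hypothetical name.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Set


def read_csv_mapping(csv_path: Path) -> Dict[str, Set[str]]:
    """Hypothetical stand-in for parse_csv_mapping.

    Assumes each row is `target,read_id[,read_id...]` with no header row.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    with csv_path.open(newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            target, *read_ids = (field.strip() for field in row)
            # Ignore empty fields, e.g. from a trailing comma.
            mapping[target].update(read_id for read_id in read_ids if read_id)
    return dict(mapping)
```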
- parse_direct_mapping_targets(csv_path: Optional[Path] = None, json_path: Optional[Path] = None) -> Dict[str, Set[str]]
Parse either the csv or json direct mapping of output target to read_ids
- Returns:
a dictionary mapping each output target to its set of read_ids
- parse_json_mapping(json_path: Path) -> Dict[str, Set[str]]
Parse the json file that directly maps output targets to read_ids.
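A sketch using the standard `json` module follows. It assumes the file holds a single JSON object whose keys are output targets and whose values are lists of read_id strings. `read_json_mapping` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Dict, Set


def read_json_mapping(json_path: Path) -> Dict[str, Set[str]]:
    """Hypothetical stand-in for parse_json_mapping.

    Assumes a JSON object of the form {"target": ["read_id", ...], ...}.
    """
    with json_path.open() as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object mapping targets to read_ids")
    # Convert each list to a set so duplicate read_ids collapse.
    return {target: set(read_ids) for target, read_ids in data.items()}
```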
- parse_table_mapping(summary_path: Path, filename_template: Optional[str], subset_columns: List[str], read_id_column: str = 'read_id', ignore_incomplete_template: bool = False) -> Dict[str, Set[str]]
Parse a table using pandas to create a mapping of output targets to read_ids.
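The grouping idea can be sketched without pandas. This version swaps pandas for the standard-library `csv` module so it has no dependencies. It assumes a tab-separated table with a header row. For each row, the filename template is filled from that row's subset columns, and the row's read_id is added to the resulting target. `table_mapping` is a hypothetical name. The fallback template, used when none is given, is an assumed `<column>-{<column>}` scheme.

```python
import csv
from collections import defaultdict
from typing import Dict, List, Optional, Set, TextIO


def table_mapping(
    table: TextIO,
    filename_template: Optional[str],
    subset_columns: List[str],
    read_id_column: str = "read_id",
) -> Dict[str, Set[str]]:
    """Hypothetical stand-in for parse_table_mapping, using csv instead of pandas."""
    if filename_template is None:
        # Assumed default scheme, e.g. "barcode-{barcode}.pod5".
        fields = "_".join(f"{column}-{{{column}}}" for column in subset_columns)
        filename_template = f"{fields}.pod5"
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(table, delimiter="\t"):
        # Fill the template using only this row's subset-column values.
        values = {column: row[column] for column in subset_columns}
        mapping[filename_template.format(**values)].add(row[read_id_column])
    return dict(mapping)
```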
- resolve_targets(output: Path, mapping) -> Dict[str, Set[Path]]
Resolve the output targets named in the mapping to paths under the output directory.
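A sketch of this resolution step follows. The assumption that the result is keyed by read_id is inferred from the `read_targets: Dict[str, Set[Path]]` parameter of `calculate_transfers`. Under that reading, the target-to-read_ids mapping is inverted so each read_id points to every output path it must be written to. `invert_mapping` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Mapping, Set


def invert_mapping(output: Path, mapping: Mapping[str, Set[str]]) -> Dict[str, Set[Path]]:
    """Hypothetical stand-in for resolve_targets: read_id -> output paths."""
    read_targets: Dict[str, Set[Path]] = defaultdict(set)
    for target, read_ids in mapping.items():
        destination = output / target  # place each target under the output directory
        for read_id in read_ids:
            read_targets[read_id].add(destination)
    return dict(read_targets)
```

A read_id listed under two targets resolves to two paths, so it would be copied into both outputs.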
- subset_pod5(inputs: List[Path], output: Path, csv: Optional[Path], json: Optional[Path], table: Optional[Path], columns: List[str], threads: int, template: str, read_id_column: str, missing_ok: bool, duplicate_ok: bool, ignore_incomplete_template: bool, force_overwrite: bool) -> Any
Prepare the subsetting mapping and run the repacker.
- subset_pod5s_with_mapping(inputs: Iterable[Path], output: Path, mapping: Mapping[str, Set[str]], threads: int = 1, missing_ok: bool = False, duplicate_ok: bool = False, force_overwrite: bool = False) -> List[Path]
Given an iterable of input pod5 paths and an output directory, create output pod5 files containing the read_ids specified in the given mapping of output filename to set of read_ids.