Getting Started

The pod5 Python module can be used to read and write nanopore reads stored in POD5 files.

This page provides a quick introduction to the pod5-format API with introductory examples.

Please refer to the installation documentation for details on how to install the pod5-format packages.

Reading POD5 Files

To open a POD5 file with the module, create a Reader. Using Python’s with statement is strongly recommended, as it ensures that any opened resources (e.g. file handles) are safely closed when they are no longer needed.

import pod5 as p5

with p5.Reader("example.pod5") as reader:
    # Use reader within this context manager
    ...
# Resources are safely closed
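
The cleanup guarantee described above comes from Python's context manager protocol rather than from anything specific to POD5. The following minimal sketch uses a hypothetical stand-in class, DemoResource (not part of pod5), to show that the cleanup step runs even when the body of the with block raises:

```python
class DemoResource:
    """Hypothetical stand-in for a file-backed reader; not part of pod5."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        # The object bound by "with ... as name" is whatever this returns
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and also when the block raises
        self.closed = True
        return False  # do not suppress the exception


resource = DemoResource()
try:
    with resource:
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

# The resource was released even though the block raised
assert resource.closed
```

Without the with statement, the same guarantee would need an explicit try/finally around every use of the resource.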

Iterate Over Reads

With an open Reader, call reads() to generate a ReadRecord instance for each read in the file:

import pod5 as p5

with p5.Reader("example.pod5") as reader:
    for read_record in reader.reads():
        print(read_record.read_id)

To iterate over a selection of reads, provide reads() with a collection of read_ids, which must be UUID strings:

import pod5 as p5

# Create a collection of read_id UUIDs as strings
read_ids = {
    "00445e58-3c58-4050-bacf-3411bb716cc3",
    "00520473-4d3d-486b-86b5-f031c59f6591",
}

with p5.Reader("example.pod5") as reader:
    for read_record in reader.reads(read_ids):
        assert str(read_record.read_id) in read_ids
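
Because the selection must be made of UUID strings, it can help to validate and normalise the identifiers before passing them to reads(). The sketch below uses only the standard library uuid module; the helper name normalise_read_ids is hypothetical and not part of pod5. Note that str() of a UUID is always lower case, so an upper-case identifier would not match a membership check like the one in the example above unless it is normalised first.

```python
from uuid import UUID


def normalise_read_ids(candidates):
    """Return a set of canonical UUID strings, raising ValueError on bad input."""
    normalised = set()
    for candidate in candidates:
        # UUID() raises ValueError for text that is not a valid UUID
        normalised.add(str(UUID(str(candidate))))
    return normalised


read_ids = normalise_read_ids(
    [
        "00445E58-3C58-4050-BACF-3411BB716CC3",  # upper case is accepted
        UUID("00520473-4d3d-486b-86b5-f031c59f6591"),  # UUID objects too
    ]
)
```

The resulting set contains lower-case, hyphenated strings, which is the same form that str(read_record.read_id) produces.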

Reads and ReadRecords

Nanopore sequencing data comprises Reads, which are formed from signal data and metadata describing how and when the sample was sequenced. This data is accessible via the Read or ReadRecord classes.

Although these two classes have very similar interfaces, note that a ReadRecord is a Read formed from a POD5 file record, and it uses caching to improve read performance.

Note

There will likely be revisions to this beta implementation to unify these similar classes into a common interface.

Here are some of the most important members of a ReadRecord. Please read the ReadRecord API reference for the complete set.

ReadRecord(reader, batch, row[, ...])

Represents the data for a single read from a pod5 record.

ReadRecord.read_id

Get the unique read identifier for the read as a UUID.

ReadRecord.calibration

Get the calibration data associated with the read.

ReadRecord.end_reason

Get the end reason data associated with the read.

ReadRecord.pore

Get the pore data associated with the read.

ReadRecord.read_number

Get the integer read number of the read.

ReadRecord.run_info

Get the run info data associated with the read.

ReadRecord.signal

Get the full signal for the read.
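
The calibration and signal members listed above are typically used together to convert raw ADC samples into picoamps. The sketch below assumes the convention pA = (ADC + offset) * scale; check the ReadRecord API reference before relying on it. DemoCalibration and to_picoamps are hypothetical names used only for illustration and are not part of pod5.

```python
from dataclasses import dataclass


@dataclass
class DemoCalibration:
    """Hypothetical stand-in holding the offset and scale calibration fields."""

    offset: float
    scale: float


def to_picoamps(adc_samples, calibration):
    """Convert raw ADC samples to picoamps under the assumed convention."""
    # Assumption: pA = (ADC + offset) * scale
    return [
        (sample + calibration.offset) * calibration.scale
        for sample in adc_samples
    ]


calibration = DemoCalibration(offset=10.0, scale=0.5)
signal_pa = to_picoamps([0, 2, -4], calibration)
```

With a real ReadRecord, the same arithmetic would be applied to the values of read.signal using read.calibration.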

Plotting Example

Here is an example of how a user may plot a read’s signal data against time.

"""
Example use of pod5 to plot the signal data from a selected read.
"""

import matplotlib.pyplot as plt
import numpy as np

import pod5 as p5

# Using the example pod5 file provided
example_pod5 = "test_data/multi_fast5_zip.pod5"
selected_read_id = '0000173c-bf67-44e7-9a9c-1ad0bc728e74'

with p5.Reader(example_pod5) as reader:

    # Read the selected read from the pod5 file
    # next() is required here as Reader.reads() returns a Generator
    read = next(reader.reads([selected_read_id]))

    # Get the signal data and sample rate
    sample_rate = read.run_info.sample_rate
    signal = read.signal

    # Compute the time steps over the sampling period
    time = np.arange(len(signal)) / sample_rate

    # Plot using matplotlib and display the figure
    plt.plot(time, signal)
    plt.show()
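
The time axis in the example above follows from dividing each sample index by the sample rate, so sample i is plotted at i / sample_rate seconds. A minimal pure-Python sketch of the same calculation is shown below; the helper name time_axis is hypothetical, and the sample rate of 4000 Hz is only an illustrative value.

```python
def time_axis(sample_count, sample_rate):
    """Return the time in seconds at which each sample index was taken."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Sample i occurs i / sample_rate seconds after the first sample
    return [index / sample_rate for index in range(sample_count)]


# Five samples at an illustrative rate of 4000 samples per second
times = time_axis(sample_count=5, sample_rate=4000)
```

The first sample is always at time zero, and the last sample falls at (sample_count - 1) / sample_rate seconds.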

Writing POD5 Files

The pod5-format package provides the functionality to write POD5 files. Although most users will only need to read files produced by Oxford Nanopore sequencers, there are use cases where writing one’s own POD5 files is desirable.

Note

It is strongly recommended that users first look at the tools package for tools to manipulate existing datasets.

New tools may be added to support our users. If you have a suggestion for a new tool, please submit a request on the pod5-file-format GitHub issues page.

Adding Reads Example

Below is an example of how one may add reads to a new POD5 file using the Writer and its add_read() method.

from uuid import UUID

import pod5 as p5

# Example container classes for read information
pore = p5.Pore(channel=123, well=3, pore_type="pore_type")
calibration = p5.Calibration(offset=0.1, scale=1.1)
end_reason = p5.EndReason(name=p5.EndReasonEnum.SIGNAL_POSITIVE, forced=False)
run_info = p5.RunInfo(
    acquisition_id=...,
    acquisition_start_time=...,
    adc_max=...,
    # ... remaining RunInfo fields
)
signal = ...  # some signal data

read = p5.Read(
    read_id=UUID("0000173c-bf67-44e7-9a9c-1ad0bc728e74"),
    end_reason=end_reason,
    calibration=calibration,
    pore=pore,
    run_info=run_info,
    # ... remaining Read fields
    signal=signal,
    sample_count=len(signal),
    pre_compressed_signal=False,
)

with p5.Writer("example.pod5") as writer:
    # Write the read and all of its metadata
    writer.add_read(read)