SURPI (“Sequence-Based Ultra-Rapid Pathogen Identification”) is a computational pipeline for pathogen identification from complex metagenomic next-generation sequencing (NGS) data. Metagenomic sequencing of clinical samples for pathogen detection has numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology has been hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. SURPI was developed to address these challenges.

Our manuscript describing SURPI has been published in Genome Research.


SURPI is available on GitHub under the 2-clause BSD license. SURPI can be installed on a local Ubuntu server or on an Amazon EC2 cloud-computing instance.

The SURPI™ installer script installs SURPI and all of its software dependencies onto a machine running Ubuntu 12.04. The installer can be downloaded here.


For questions or to report bugs, see our mailing list.

A clinical version of the software, SURPI+, is currently the core analysis pipeline for clinical metagenomic next-generation sequencing assays for infectious diseases that are being developed at UCSF. Please contact us if you are interested in collaborating on or licensing SURPI+ for clinical use.


SURPI and SURPI+ are trademarked by the Regents of the University of California.