Scientists from The Genome Analysis Centre (TGAC) in Norwich, UK, have developed a new bioinformatics tool to boost complex genome analysis. The software, NextClip, generates a comprehensive quality report and extracts high-quality, trimmed and de-duplicated data.
The tool supports Illumina’s recently released Nextera LMP kit, which enables the production of jumping libraries with inserts of up to 12 kb. These Long Mate Pair (LMP) libraries are an invaluable resource for analysing large regions of the genome, carrying out complex assemblies and performing other downstream bioinformatics analyses.
However, LMP libraries are intrinsically noisy, and post-sequencing data analysis is required to maximise their value.
Richard Leggett, at TGAC, said: “Regulating laboratory protocols and selection of sequenced data for downstream analysis are vital in making effective use of mate pair libraries.
“However, quality control of the libraries can require significant bioinformatics analysis. Further processing is also required to extract true mate pair reads, remove fragment junction adaptors and clip reads.
“For this reason we developed NextClip, a tool for comprehensive quality analysis of Nextera LMP libraries and preparation of reads for scaffolding.”
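The processing steps Leggett describes (finding the junction adaptor and clipping reads at it) can be illustrated with a minimal Python sketch. This is not NextClip's implementation: it only finds an exact adaptor match, whereas a real tool would tolerate sequencing errors. The adaptor string is an assumption (the commonly used Nextera trimming sequence) and should be checked against Illumina's documentation; `clip_read`, `clip_pair` and the `min_length` threshold are hypothetical names and values chosen for illustration.

```python
from typing import Optional, Tuple

# ASSUMPTION: junction adaptor taken to be the commonly used Nextera trimming
# sequence; verify against Illumina's documentation before real use.
JUNCTION_ADAPTOR = "CTGTCTCTTATACACATCT"


def clip_read(read: str, adaptor: str = JUNCTION_ADAPTOR,
              min_length: int = 25) -> Tuple[Optional[str], bool]:
    """Clip a read at the first exact match of the adaptor (hypothetical helper).

    Returns (clipped_read, adaptor_found). clipped_read is None when the
    remaining sequence is shorter than min_length and should be discarded.
    """
    pos = read.find(adaptor)
    found = pos != -1
    # Keep only the sequence before the adaptor; unclipped reads pass through.
    clipped = read[:pos] if found else read
    if len(clipped) < min_length:
        return None, found
    return clipped, found


def clip_pair(read1: str, read2: str, **kwargs):
    """Clip both mates; keep the pair only if both survive (hypothetical policy)."""
    r1, found1 = clip_read(read1, **kwargs)
    r2, found2 = clip_read(read2, **kwargs)
    if r1 is None or r2 is None:
        return None
    return r1, r2, found1, found2


if __name__ == "__main__":
    r1 = "ACGT" * 10 + JUNCTION_ADAPTOR + "GGGG"
    print(clip_pair(r1, "TTTT" * 8))
```

Recording whether the adaptor was found in each mate matters because, in a mate pair library, the junction adaptor marks where the two ends of the original long fragment were joined; pairs in which it is found are more likely to be true mate pairs.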
Mate pair libraries are formed from large fragments of DNA (5–12 kb in length for Nextera), which are sequenced from each end to produce two reads separated by a known distance.
Sequence reads from Long Mate Pair libraries are an important tool in the construction of complex genome assemblies because they can span large repeat regions.
Combining the data generated from mate pair library sequencing with shorter-insert paired-end reads provides a powerful combination, allowing longer DNA sequences to be joined together with higher certainty.
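To illustrate why the "known distance" between mates is useful for joining sequences, here is a minimal Python sketch of how a scaffolder might estimate the gap between two contigs linked by mate pairs. It is a simplified model under stated assumptions, not NextClip's method: every fragment is assumed to have the nominal insert size, contig A is assumed to lie to the left of contig B, and read orientation is ignored. The function name and coordinate convention are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def estimate_gap(insert_size: int, contig_a_len: int,
                 links: List[Tuple[int, int]]) -> float:
    """Estimate the gap between contig A (left) and contig B (right).

    Each link is (a_start, b_end): a_start is the 0-based position where a
    fragment begins on contig A; b_end is the 0-based exclusive position where
    it ends on contig B. ASSUMPTION: all fragments have length insert_size.
    """
    # insert_size = (part of fragment on A) + gap + (part of fragment on B)
    estimates = [insert_size - (contig_a_len - a_start) - b_end
                 for a_start, b_end in links]
    # The median is robust to a few chimeric or mis-mapped pairs.
    return median(estimates)


if __name__ == "__main__":
    links = [(7000, 1500), (7200, 1300), (6900, 1700)]
    print(estimate_gap(5000, 10000, links))
```

A negative estimate would suggest that the contigs overlap rather than being separated by a gap. Because real inserts vary in size, several independent links give a more reliable estimate than a single pair.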
The Genome Analysis Centre is a research institute focused on the development of genomics and computational biology. Based at Norwich Research Park, it receives strategic funding from the Biotechnology and Biological Sciences Research Council (£9.2 million in 2012–2013), as well as support from other research funders.
TGAC offers a state-of-the-art DNA sequencing facility, unique in its operation of multiple complementary technologies for data generation. The institute is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple complex data sets.
It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It is also actively involved in developing novel platforms that provide multiple academic and industrial users with access to computational tools and processing capacity, and in promoting applications of computational bioscience.
• PHOTOGRAPH SHOWS: Richard Leggett, TGAC