MANAGING ADDITIONAL REFERENCES IN SHORT-READ ALIGNMENT

DOWNLOAD v. 0.7 SOURCE CODE
About: MARSHAL allows short-read nucleotide aligners to utilize multiple references with negligible overhead of time and space. The use of multiple references in short-read assembly is important for detecting structural variations (SVs), such as long insertions and deletions (indels) that may only appear in some of the references. MARSHAL is designed for analyzing structural variations among closely related individuals or species. It is most effective when run with a set of references that are relatively similar to one another and to the reads, but are polymorphic with respect to specific long indels.

Users may choose which external program performs the short-read alignment, since MARSHAL works by preprocessing the alignment input and postprocessing the alignment output. Input is provided as one or more short read files, a FASTA primary reference, a tab-delimited file with a formatted list of long indels, and FASTA sequence files containing the insertions. An optional preprocessing step identifies the indels from pairwise alignment between the primary reference and other reference(s), outputting them in the list format handled by MARSHAL.

Prior to short-read alignment, the indel information is used to create a FASTA-formatted initial chimeric reference. This chimera is inclusive of the primary reference and all the insertions, and is designed to enable the short-read aligner to adequately map the reads to the indels. After the aligner is run, indel coverage is analyzed from a SAM-formatted alignment map and each indel's presence vs. absence evaluated. An additional script, which may be called before or after short-read alignment, allows the user to create a continuous chimeric reference by specifying which segments to add and remove.

The software and algorithm were developed in the Department of Computer Science at Columbia University.
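The postprocessing step described above, in which indel coverage is read from a SAM alignment map and each indel is called present or absent, can be illustrated with a minimal Python sketch. This is not MARSHAL's code (MARSHAL itself is written in Java) and it does not reproduce MARSHAL's decision rule, which is not stated here. The function names `aligned_blocks`, `mean_coverage`, and `is_present`, the idea of scoring an insertion by its mean read depth, and the `min_depth` threshold are all assumptions made for illustration. Only the standard SAM columns (FLAG, RNAME, POS, CIGAR) are relied on.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Yield 0-based half-open reference intervals that the read's bases cover.

    pos is the 1-based SAM POS field. M, = and X consume both read and
    reference; D and N advance along the reference without covering it.
    """
    ref = pos - 1
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            yield ref, ref + length
            ref += length
        elif op in "DN":
            ref += length
        # I, S, H and P do not advance along the reference.


def mean_coverage(sam_lines, ref_name, region_start, region_end):
    """Mean per-base depth over [region_start, region_end) on ref_name."""
    covered_bases = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != ref_name or cigar == "*":
            continue  # unmapped, or mapped to a different sequence
        for start, end in aligned_blocks(pos, cigar):
            overlap = min(end, region_end) - max(start, region_start)
            if overlap > 0:
                covered_bases += overlap
    return covered_bases / (region_end - region_start)


def is_present(sam_lines, ref_name, region_start, region_end, min_depth=1.0):
    """Hypothetical rule: call the segment present if mean depth >= min_depth."""
    return mean_coverage(sam_lines, ref_name, region_start, region_end) >= min_depth
```

Here `region_start` and `region_end` would be the coordinates of an inserted segment within the chimeric reference, taken from whatever bookkeeping records where each insertion was placed. A real analysis would likely also weigh mapping quality and the coverage of flanking sequence; the manual describes what MARSHAL actually does.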
MARSHAL is built with Java and tested in the GNU/Linux environment; the source code is distributed here in a .tar.gz package under the GPL license.

Usage: Uncompress the downloaded .tar.gz package, then compile the program from the main download directory. A tutorial and complete instructions are included in the manual. The data files required to perform the analysis in the tutorial can be downloaded here (these files are over 300 MB uncompressed).

Citation: Sealfon,R.A. and Song,R. (2011) Managing multiple references in short-read alignment. Submitted. Please cite this manuscript if you are using or discussing MARSHAL in a published analysis.

Contact: If you have any questions or comments, please send them to Rebecca Sealfon at ras2198@columbia.edu.