TagCleaner. For cleaner sequences.

FAQ

If you can't find the answer to your question, take a look at the manual or the Q&A site.

What is TagCleaner?

TagCleaner is a publicly available application that automatically detects and efficiently removes tag sequences from genomic and metagenomic datasets. It is easily configurable and is available both as a standalone version and through a user-friendly web interface. The interactive web interface supports exporting results for subsequent data processing and is available at http://edwards.sdsu.edu/tagcleaner or by clicking on "Use TagCleaner" in the menu above.


How can I cite TagCleaner?

If you use TagCleaner, please cite:
Schmieder R, Lim YW, Rohwer F, Edwards R: TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets. BMC Bioinformatics 2010, 11:341. [PMID: 20573248]

@article{schmieder_tagcleaner,
	title = {{TagCleaner:} Identification and removal of tag sequences from genomic and metagenomic datasets},
	volume = {11},
	issn = {1471-2105},
	shorttitle = {{TagCleaner}},
	url = {http://www.ncbi.nlm.nih.gov/pubmed/20573248},
	doi = {10.1186/1471-2105-11-341},
	number = {1},
	journal = {{BMC} Bioinformatics},
	author = {Robert Schmieder and Yan Wei Lim and Forest Rohwer and Robert Edwards},
	month = jun,
	year = {2010},
	note = {{PMID:} 20573248},
	pages = {341}
}



Why should I use TagCleaner?

Metagenomes that were pre-amplified with primer-based methods contain additional tag sequences that must be removed after sequencing. These tags can contain deletions or insertions due to sequencing limitations, and the tag (e.g. WTA primer) sequence may be unavailable or incorrectly reported in public databases. Because unwanted sequence contamination can introduce inaccuracies downstream, it is important to use reliable tools for pre-processing sequence data.

There are several advantages in using TagCleaner to pre-process sequence data:
   - Tag sequence trimming and data filtering improve the reliability of downstream data analysis
   - TagCleaner is a web application that allows users to pre-process their datasets without installing any software
   - TagCleaner is independent of third-party software and thus compatible with any computer supporting web services


How is it different from other programs?

TagCleaner offers features that are unique to the program such as automatic prediction of tag sequences (including the quasi-random parts of WTA tags), continuous trimming, and detection and splitting of fragment-to-fragment concatenations.


Is there a stand-alone version of the program?

Since the release of version 0.9, TagCleaner has been available in both web and standalone versions. Both versions can be downloaded under "Downloads".


What file formats does TagCleaner support?

You can submit files in FASTA or FASTQ format. The files can also be compressed using various algorithms, including ZIP, GZIP, BZIP2 and LZOP.
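
To give a rough idea of how a script might tell these inputs apart before submitting them, here is a minimal Python sketch. It is not TagCleaner's own code: the function names open_maybe_compressed and guess_format are made up for this example, and only plain, GZIP and BZIP2 files are handled (ZIP and LZOP are left out to keep it short).

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip file
BZIP2_MAGIC = b"BZh"       # first three bytes of every bzip2 file


def open_maybe_compressed(path):
    """Open a plain, gzip or bzip2 file as text, based on its first bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt")
    if magic.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt")
    return open(path, "r")


def guess_format(path):
    """Return 'fasta' or 'fastq' from the first non-empty line, else None."""
    with open_maybe_compressed(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):   # FASTA headers start with '>'
                return "fasta"
            if line.startswith("@"):   # FASTQ headers start with '@'
                return "fastq"
            return None
    return None
```

Calling guess_format("reads.fq.gz") (a hypothetical file name) would return "fastq" for a gzip-compressed FASTQ file.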


What is the maximum number of sequences that I can submit through the web version?

There is no limit on the number of sequences that you can submit. However, there is a limit for the file size that you can upload. The current web-service allows files up to 600 MB. If you compress your data, you can submit around 2 GB of sequence data.


Do I need to know my primer/tag sequence?

No. If you do not know your primer or tag sequence, the program will estimate it for you. You can use the graphical user interface to change the tag sequence, if necessary.


Why is the maximum length for tag sequences 64 bp?

TagCleaner is based on the bit-parallel algorithm developed by Myers (1999) and is therefore bounded by the word size of the system. Systems with a 32-bit or 64-bit architecture allow tag sequences of at most 32 or 64 nucleotides, respectively. The current web server for TagCleaner runs on a 64-bit system and therefore allows a maximum tag sequence length of 64 bp. We chose the bit-parallel algorithm because of its superior performance and because tag/primer sequences longer than 64 bp are rarely used for high-throughput sequencing.
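
The link between word size and tag length is easier to see in code. The following Python sketch shows the general bit-parallel technique for approximate matching (in the formulation commonly used for searching a pattern inside a longer text). It is an illustration only, not TagCleaner's implementation: the name find_tag, its parameters and the WORD_SIZE constant are assumptions for this example. Each position of the tag occupies one bit of a machine word, so a tag longer than the word cannot be represented. Python integers are unbounded, so the sketch enforces the limit explicitly.

```python
WORD_SIZE = 64  # assumed machine word size (one bit per tag position)


def find_tag(tag, read, max_edits, word_size=WORD_SIZE):
    """Return (end_index, edit_distance) for each position in `read` where
    `tag` ends with at most `max_edits` mismatches, insertions or deletions."""
    m = len(tag)
    if m == 0 or m > word_size:
        raise ValueError(f"tag length must be between 1 and {word_size}")
    mask = (1 << m) - 1      # keeps all values within m bits
    high = 1 << (m - 1)      # bit for the last tag position

    # One bit vector per character: bit i is set if tag[i] == character.
    peq = {}
    for i, ch in enumerate(tag):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical +1/-1 deltas and current distance
    hits = []
    for j, ch in enumerate(read):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= max_edits:
            hits.append((j, score))
    return hits
```

For example, find_tag("ACGT", "TTACGTTT", 0) reports a single exact hit ending at index 5, while a 65-nucleotide tag raises a ValueError under the assumed 64-bit word.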


Where can I set the filter parameters in the web version?

TagCleaner does not require filter parameters (such as the maximum number of mismatches) to be set before the data is processed. Instead, the filter parameters are set after the data is processed. This allows users to choose parameters appropriate for their dataset without having to submit and process the same data several times with modified parameters.
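
The idea of filtering after processing can be sketched in a few lines of Python. This is only an illustration under assumptions: the TagHit record, its fields and the apply_filter function are hypothetical and do not reflect how the web version stores its results. The point is that the expensive matching step runs once, and different thresholds are then applied to the stored results.

```python
from dataclasses import dataclass


@dataclass
class TagHit:
    read_id: str
    mismatches: int  # edits between the tag and its best match in the read


def apply_filter(hits, max_mismatches):
    """Keep the hits that satisfy the chosen threshold; no re-matching needed."""
    return [hit for hit in hits if hit.mismatches <= max_mismatches]


# Results of a single (hypothetical) processing run.
hits = [TagHit("read1", 0), TagHit("read2", 2), TagHit("read3", 5)]

strict = apply_filter(hits, 0)   # exact tag matches only
relaxed = apply_filter(hits, 2)  # allow up to two mismatches
```

Trying another threshold only means calling apply_filter again on the same stored hits.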


How long do you keep the data submitted to the web version?

You can choose whether we keep the data accessible for one day (24 hours) or one week (168 hours). You can also ask us to delete the data once you are done, or to keep it for a longer period.


Why does TagCleaner split fragment-to-fragment concatenations?

Fragment-to-fragment concatenations are artificially concatenated fragments generated by blunt-end ligation before sequencing. Splitting fragment-to-fragment concatenations is an important pre-processing step to remove tag contamination inside the sequences. The concatenated fragments may additionally be a source of error for assembly, annotation, and taxonomic assignments (since fragments from different organisms may not be assigned correctly when concatenated).
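
As a simplified illustration of the splitting step, the Python sketch below cuts a read at every internal occurrence of a tag and returns the remaining fragments. It makes strong assumptions that TagCleaner itself does not: the tag is matched exactly (no mismatches, insertions or deletions), and the function name split_concatenation and the min_length parameter are invented for this example.

```python
def split_concatenation(read, tag, min_length=1):
    """Split `read` at every exact occurrence of `tag`, dropping the tag
    itself and any fragment shorter than `min_length`."""
    if not tag:
        raise ValueError("tag must not be empty")
    return [frag for frag in read.split(tag) if len(frag) >= min_length]
```

For instance, split_concatenation("AAAAGTTCCCCC", "GTTC") returns the two fragments "AAAA" and "CCCC", which can then be analyzed as separate sequences.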


I have a 16S metagenome with sequence tags. Can I use TagCleaner to remove the tags?

TagCleaner should be able to remove tag sequences from 16S metagenomes without any problem. If the tag sequence is not known, however, there might be difficulties with the tag sequence detection step. The algorithm implemented in TagCleaner for the automatic detection of tag sequences assumes the randomness of a typical metagenome. Datasets that are not random samples of sequences from the organisms in an environment, such as 16S datasets, may cause incorrect detection of the tag sequences. In that case, the tag sequences will most likely be over-predicted and can be redefined by the user prior to data processing.


I am getting this error: "Can't use an undefined value as an ARRAY reference at tagcleaner.pl line 435". What should I do?

The error handling in the Perl module File::Path might be different on your system, depending on the version you are using. A simple fix might be to remove the @ sign on line 435 (without guarantee that the error checking will then work correctly).
This modification is not part of the current version of TagCleaner, because the latest version (version 2.08) of the module states that if no errors are encountered, $err will reference an empty array. This means that $err will always end up TRUE; so the program needs to test @$err to determine if errors occurred. (See http://perldoc.perl.org/File/Path.html under "Error Handling" for details.)