diff cwpair2.xml @ 11:6383cae47688 draft
Uploaded
author | greg |
---|---|
date | Wed, 16 Dec 2015 14:18:28 -0500 |
parents | b52d6705aed0 |
children | 79a10fe09b66 |
--- a/cwpair2.xml Wed Dec 02 16:14:07 2015 -0500
+++ b/cwpair2.xml Wed Dec 16 14:18:28 2015 -0500
@@ -27,8 +27,8 @@
 </command>
 <inputs>
 <param name="input" type="data" format="gff" multiple="True" label="Find matched pairs on" />
- <param name="up_distance" type="integer" value="50" min="0" label="Distance upstream from a peak to allow a pair" />
- <param name="down_distance" type="integer" value="100" min="0" label="Distance downstream from a peak to allow a pair" />
+ <param name="up_distance" type="integer" value="50" min="0" label="Distance upstream from a peak to allow a pair" help="The maximum distance upstream or 5’ to the primary peak"/>
+ <param name="down_distance" type="integer" value="100" min="0" label="Distance downstream from a peak to allow a pair" help="The maximum distance downstream or 3’ to the primary peak"/>
 <param name="method" type="select" label="Method of finding a match">
 <option value="mode" selected="True">Mode</option>
 <option value="closest">Closest</option>
@@ -127,33 +127,58 @@
 <help>
 **What it does**
-Takes a list of called peaks on both strands and produces lists of matched pairs and unmatched peaks using a
-specified method for finding matched pairs. Methods for finding matched pairs are mode, closest, largest or
-all (where the analysis is run for each method). A statistics dataset is generated and a collection of datasets
-is produced for each method as follows.
+CWPair accepts one or more gff files as input and takes the peak location to be the midpoint between the
+exclusion zone start and end coordinate (columns D and E). CWPair starts with the highest peak (primary peak)
+in the dataset, and then looks on the opposite strand for another peak located within the distance defined by
+a combination of the tool's **Distance upstream from a peak to allow a pair** (the distance upstream or 5’ to
+the primary peak) and **Distance downstream from a peak to allow a pair** (the distance downstream or 3’ to the
+primary peak) parameters. So "upstream" value 30 "downstream" value 20 makes the tool look 30 bp upstream and
+20 bp downstream (inclusive). Consequently, the search space would be 51 bp, since it includes the primary peak
+coordinate. The use of a negative number changes the direction of the search limits. So, "upstream" -30 and
+"downstream" 20 produces an 11 bp downstream search window (20-30 bp downstream, inclusive).
-**Data Files**
+.. image:: $PATH_TO_IMAGES/cwpair2.png
+
+When encountering multiple candidate peaks within the search window, CWPair uses the resolution method defined by
+the tool's **Method of finding a match** parameter as follows:
+
+
+ * **mode** - This is an iterative process in which all peak-pair distances within the search window are determined, and the mode calculated. The pair whose distance apart is closest to the mode is then selected.
+ * **closest** - Pairs the peak that has the closest absolute distance from the primary peak.
+ * **largest** - Pairs the peak that has the highest tag count.
+ * **all** - Runs all three methods, producing separate outputs for each.
-* **closest/largest/mode MP** - the Matched Pairs in gff format
-* **closest/largest/mode O** - the Orphans in tabular format
-* **closest/largest/mode D** - the Details in tabular format
+When considering the candidate peaks for pairing to a primary peak, a tag-count threshold may also be set using
+the tool's **Filter using relative/absolute threshold** parameter. A relative threshold determines the tag counts
+at the 95th percentile of peak occupancy (i.e. top 5% in terms of tag counts), then uses a tag count threshold at
+the specified percentage of this 95th percentile. So if the peak at the 95th percentile has 200 tags, and "relative
+threshold" 50 is used, then it will not consider any peak having less than 100 tags.
+
+-----
+
+**Output Data Files**
-**Statistics Files**
+ * **closest/largest/mode MP** - gff file containing the Matched Pairs and includes the peak-pair midpoint coordinate (column D) and the coordinate +1 (column E). The tag count sum is reported in column F, along with the C-W distance in bp in column I.
+ * **closest/largest/mode O** - tabular file containing the Orphans (all peaks that are not in pairs).
+ * **closest/largest/mode D** - tabular file containing the Details, which lists + and – strand information separately. The start and end represent the lower and higher coordinates of the exclusion zone from GeneTrack, and “Value” is the tag count sum within the exclusion zone. The peak pair midpoint is calculated along with the distance between the two paired peaks (midpoint-to-midpoint or C-W distance).
-* **closest/largest/mode C** - the stastics graph in pdf format
-* **closest/largest/mode P** - the preview plots graph in pdf format
-* **closest/largest/mode F** - the final plots graph in pdf format
+**Output Statistics Files**
+
+ * **closest/largest/mode C** - pdf file that provides the frequency distribution of peak pair distances.
+ * **closest/largest/mode P** - pdf file that provides the preview plots graph (the initial iteration of the process for finding the mode).
+ * **closest/largest/mode F** - pdf file that provides the final plots graph.
+ * **Statistics Table** - provides the number of peaks in pairs (dividing this by 2 provides the number of peak-pairs).
 -----
 **Options**
-* **Method of finding match** - Method of finding matched pair, mode, closest, largest, or all (run with each method).
-* **Distance upstream from a peak to allow a pair** - Distance upstream from a Watson peak to allow a Crick pair.
-* **Distance downstream from a peak to allow a pair** - Distance downstream from a Watson peak to allow a Crick pair.
-* **Percentage of the 95 percentile value to filter below** - Percentage of the 95 percentile value below which to filter when using a relative threshold.
-* **Absolute value to filter below** - Absolute value below which to filter when using an absolute threshold.
-* **Output files** - Restrict output dataset collections to matched pairs only or one of several combinations of collection types.
+ * **Method of finding match** - Method of finding matched pair, mode, closest, largest, or all (run with each method).
+ * **Distance upstream from a peak to allow a pair** - The maximum distance (inclusive) upstream on the opposite strand from the primary peak to locate another peak, resulting in a pair.
+ * **Distance downstream from a peak to allow a pair** - The maximum distance (inclusive) downstream on the opposite strand from the primary peak to locate another peak, resulting in a pair.
+ * **Percentage of the 95 percentile value to filter below** - Percentage of the 95 percentile value below which to filter when using a relative threshold.
+ * **Absolute value to filter below** - Absolute value below which to filter when using an absolute threshold.
+ * **Output files** - Restrict output dataset collections to matched pairs only or one of several combinations of collection types.
 </help>
 <expand macro="citations" />
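The window and threshold arithmetic described in the new help text can be checked with a short Python sketch. This is not code from CWPair2: the function names (`search_window`, `window_size`, `relative_threshold`, `pick_candidate`) are hypothetical, and the handling of a negative upstream value is an assumption inferred from the 51 bp and 11 bp examples in the help text. The iterative **mode** method is not sketched, since the help text does not describe it in enough detail.

```python
def search_window(up_distance, down_distance):
    """Return (lowest, highest) offsets in bp relative to the primary peak.

    Negative offsets are upstream (5'), positive offsets are downstream (3').
    Assumption: a negative up_distance moves that limit downstream, as the
    help text's "-30 / 20" example suggests.
    """
    low, high = sorted((-up_distance, down_distance))
    return low, high


def window_size(up_distance, down_distance):
    """Number of positions searched, counting both limits (inclusive)."""
    low, high = search_window(up_distance, down_distance)
    return high - low + 1


def relative_threshold(tags_at_95th_percentile, percent):
    """Tag-count cutoff as a percentage of the 95th-percentile tag count."""
    return tags_at_95th_percentile * percent / 100.0


def pick_candidate(candidates, method):
    """Choose one candidate from a list of (offset_bp, tag_count) tuples.

    Only the two non-iterative methods from the help text are sketched.
    """
    if method == "closest":
        # Smallest absolute distance from the primary peak.
        return min(candidates, key=lambda c: abs(c[0]))
    if method == "largest":
        # Highest tag count.
        return max(candidates, key=lambda c: c[1])
    raise ValueError("unsupported method in this sketch: %s" % method)


print(window_size(30, 20))          # 51
print(window_size(-30, 20))         # 11
print(relative_threshold(200, 50))  # 100.0
```

With "upstream" -30 and "downstream" 20, `search_window` returns `(20, 30)`, which matches the 20-30 bp downstream window given in the help text.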