hrf_seq: estimate_unique_counts.sh annotate

annotate estimate_unique_counts.sh @ 2:14d6929f8aa1 draft default tip

Tool tests or/and test-data are missing.

author	nikos
date	Thu, 11 Sep 2014 08:30:05 -0400
parents
children

rev	line source
2 14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	1 #!/bin/bash
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	2
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	3 ############################################################
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	4 #Create table of fragments containing coordinates[RNA molecule, start,end], number of mapped reads, number of unique barcodes and 3'most nucleotide of cDNA (on that was ligated to)
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	5 ############################################################
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	6 #Remove untemplated nucleotides and create positions_temp file
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	7
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	8 #defaults
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	9 output_dir="output_dir"
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	10 priming_pos=-1
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	11
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	12 #parse input
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	13 while getopts hf:b:p:o: myarg
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	14 do case "$myarg" in
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	15 h) echo "Usage: estimate_unique_counts.sh -f <bam_file> -b <barcodes_file> -o <output_dir>"
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	16 exit ;;
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	17 f) bamfile="$OPTARG" ;; #required
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	18 b) barcodes="$OPTARG" ;; #required
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	19 p) priming_pos="$OPTARG" ;; #optional
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	20 o) output_dir="$OPTARG" ;; #optional
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	21 [?]) echo "Usage: estimate_unique_counts.sh -f <myfile.bam> -b <barcodes.txt>"
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	22 exit 1 ;;
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	23 esac
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	24 done
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	25
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	26 mkdir $output_dir
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	27
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	28 samtools view $bamfile \| awk 'BEGIN{OFS="\t"}{if(substr($0,1,1)!="@"){print}}' - \| awk -v out="${output_dir}/trimming_stats.txt" 'BEGIN{OFS="\t";counter[0]=0;counter[1]=0;counter[2]=0;counter[3]=0}
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	29 function abs(value){return(value<0?-value:value)}
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	30 ($2==99 && /[\s\t]MD:Z:/ && !/MD:Z:([012][ACGT])/) {print($1, $3, $4+0, $4+abs($9)-1);counter[0]++;next};
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	31 ($2==99 && /[\s\t]MD:Z:0[ACGT]/ && !/MD:Z:0[ACGT][01][ACGT]/) {print($1, $3, $4+1, $4+abs($9)-1);counter[1]++;next};
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	32 ($2==99 && (/[\s\t]MD:Z:1[ACGT]/ && !/MD:Z:1[ACGT]0[ACGT]/ \|\| (/MD:Z:0[ACGT]0[ACGT]/ && !/MD:Z:0[ACGT]0[ACGT]0[ACGT]/))) {print($1, $3, $4+2, $4+abs($9)-1);counter[2]++;next};
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	33 ($2==99 && ((/[\s\t]MD:Z:1[ACGT]0[ACGT]/)\|\|(/MD:Z:0[ACGT]1[ACGT]/)\|\|(/MD:Z:0[ACGT]0[ACGT]0[ACGT]/)\|\|(/MD:Z:2[ACGT]/))) {print($1, $3, $4+3, $4+abs($9)-1);counter[3]++;next}
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	34 END{print("No trimming:",counter[0],"1 nt trimmed:", counter[1],"2 nt trimmed:", counter[2],"3 nt trimmed:",counter[3]) > out}' \| sort -k1,1 \| gzip > positions_temp_sorted.gz &
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	35
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	36 # Computing barcode length (Use the first line and compute the string length of the second column
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	37
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	38 TMP=`head -1 $barcodes \| awk '{print $2}'`
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	39 BAR_LEN=`echo ${#TMP}`
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	40
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	41 # Remove "@" from barcodes and sort them
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	42 sed 's/^.//' $barcodes \| sort -k1,1 \| gzip > barcodes_temp_sorted.gz &
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	43
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	44 wait
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	45
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	46 # Merge poistions and barcodes
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	47 join -1 1 <(zcat positions_temp_sorted.gz) <(zcat barcodes_temp_sorted.gz) \| cut -f 2,3,4,5 -d " " \| awk '{if($4 !~ /N/){print}}' \| awk -v bar_len="${BAR_LEN}" '{if(length($4)==bar_len){print}}' \| gzip > merged_temp.gz
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	48
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	49 #### If priming flag is set....
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	50 if [ $priming_pos != -1 ]; then
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	51 zcat merged_temp.gz \| awk -v pos="${priming_pos}" '{print $1, $2, pos, $4}' - > merged_temp2
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	52 cat merged_temp2 \| gzip > merged_temp.gz
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	53 rm merged_temp2
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	54 fi
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	55
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	56 #File summary.gz columns: RNA_ID, Start, End, barcode sequence, sequenced_count[=number of sequenced fragments fulfilling previous requiremnts]
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	57
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	58 zcat merged_temp.gz \| awk '{barcode[$1][$2][$3][$4]++}END{
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	59 for(RNA in barcode){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	60 for(start_position in barcode[RNA]){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	61 for(end_position in barcode[RNA][start_position]){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	62 for(barseq in barcode[RNA][start_position][end_position]){print RNA,start_position,end_position,barseq,barcode[RNA][start_position][end_position][barseq]}}}}}' > $output_dir/summary.txt
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	63
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	64 #File unique_barcodes columns: RNA_ID, Start, End, number of unique barcodes observed for this fragment [PROBLEM: How to treat the different 3cdns for the same fragment? if the template was homogenous then it should be always the same]
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	65
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	66 awk '{barcode[$1][$2][$3]++}END{
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	67 for(RNA in barcode){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	68 for(start_position in barcode[RNA]){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	69 for(end_position in barcode[RNA][start_position]){print RNA,start_position,end_position,barcode[RNA][start_position][end_position]}}}}' $output_dir/summary.txt > $output_dir/unique_barcodes.txt &
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	70
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	71 #read_counts.gz colums: RNA_ID, Start, End, sequenced_count
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	72
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	73 zcat merged_temp.gz \| awk '{barcode[$1][$2][$3]++}END{
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	74 for(RNA in barcode){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	75 for(start_position in barcode[RNA]){
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	76 for(end_position in barcode[RNA][start_position]){print RNA,start_position,end_position,barcode[RNA][start_position][end_position]}}}}' > $output_dir/read_counts.txt &
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	77
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	78 wait
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	79
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	80 ##Remove temporary files - didn't do it, in case debugging needed.
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	81
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	82 rm positions_temp_sorted.gz
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	83 rm barcodes_temp_sorted.gz
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	84 #rm merged_temp.gz
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	85
14d6929f8aa1 Tool tests or/and test-data are missing. nikos parents: diff changeset	86 ##End of remove temp files

Mercurial > repos > nikos > hrf_seq

annotate estimate_unique_counts.sh @ 2:14d6929f8aa1 draft default tip