How to add an integer to a difference calculation and print the result at the end of a line?
Goal: To print the absolute difference between two colon-separated fields ($3 and $2), plus an integer (+1), at the end of each line beginning with ">".
Representative sample of my file:
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product
MFLLHYYLIIQVI
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product
MQWIKDKVLIK
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product
MFYPLYLDYLYY
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP
Desired Output:
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product:111
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product:42
MFLLHYYLIIQVI
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product:36
MQWIKDKVLIK
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product:39
MFYPLYLDYLYY
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial:90
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP
My current awk script gets me very close by printing the difference between $3 and $2 at the end of each line, but it does not include the +1 addition step (required) and is not specific to lines beginning with ">", despite my attempt with /^ *>/ (not required, but nice):
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {$4=$3-$2} $4<0 {$4=-$4} 1' file
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product:110
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR:::0
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product:41
MFLLHYYLIIQVI:::0
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product:35
MQWIKDKVLIK:::0
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product:38
MFYPLYLDYLYY:::0
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial:89
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP:::0
Attempts to add the integer (+1) to the difference calculation:
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {$4+1=$3-$2} $4<0 {$4=-$4} 1' file
awk: line 1: syntax error at or near =
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {$4+=1=$3-$2} $4<0 {$4=-$4} 1' file
awk: line 1: syntax error at or near =
$ awk -F":" -v n=1 'BEGIN {OFS=FS} /^ *>/ {$4+n=$3-$2} $4<0 {$4=-$4} 1' file
awk: line 1: syntax error at or near =
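A note on the syntax errors above: in awk, the left-hand side of = has to be something assignable (a variable or a field such as $4), so an expression like $4+1 cannot be an assignment target. One possible fix, offered as a sketch rather than a definitive answer, is to move the +1 to the right-hand side and take the absolute value before adding it. Keeping the whole calculation inside the /^>/ action also means no rule assigns $4 on the sequence lines, so they print unchanged. The shell function name add_span and the temporary variable d are hypothetical names introduced for illustration, and /^>/ assumes header lines have no leading spaces (use /^ *>/ otherwise).

```shell
# add_span is a hypothetical wrapper name; it reads the named files or stdin.
add_span() {
    awk -F':' 'BEGIN { OFS = FS }
    /^>/ {
        d = $3 - $2            # "131 unnamed ..." converts to the number 131
        if (d < 0) d = -d      # absolute value, taken before the +1
        $4 = d + 1             # the +1 belongs on the right-hand side
    }
    1                          # print every line, modified or not
    ' "$@"
}

# Two records from the sample above, fed on stdin.
printf '%s\n' \
    '>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product' \
    'MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR' \
    '>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product' \
    'MFYPLYLDYLYY' | add_span
```

On these two records the header lines should end in :111 and :39, matching the desired output, and the sequence lines should come through without any trailing colons. With a file on disk, the call would be add_span file.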
And although I'm not sure how to implement functions using awk, I think there could be some utility in using something similar to this:
$ function add_one (number) {
return number + 1
}
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {add_one($4)=$3-$2} $4<0 {$4=-$4} 1' file
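For reference on the function idea: awk functions are defined inside the awk program itself with the function keyword, not at the shell prompt, and a call returns a value, so it goes on the right-hand side of the assignment rather than the left. A minimal sketch under those assumptions follows; add_one is taken from the attempt above, while abs and the shell wrapper append_length are hypothetical helper names added for illustration.

```shell
# append_length is a hypothetical wrapper name; it reads the named files or stdin.
append_length() {
    awk -F':' '
    # User-defined functions live inside the awk program text.
    function add_one(number) { return number + 1 }
    function abs(x)          { return x < 0 ? -x : x }

    BEGIN { OFS = FS }
    /^>/  { $4 = add_one(abs($3 - $2)) }   # function calls sit on the right-hand side
    1                                      # print every line
    ' "$@"
}

printf '%s\n' \
    '>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product' \
    'MFLLHYYLIIQVI' | append_length
```

For this record the header line should gain :42 and the sequence line should be printed as-is. Note that awk requires the opening parenthesis to follow a user-defined function name directly, with no space, when the function is called.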
While I have been attempting to use awk to solve this problem, I am interested in any solution (e.g., since I am attempting to perform this calculation line-by-line, perhaps there is a more efficient solution with sed?).