How to print row(s) if they meet a certain range

- April 30, 2021

I have two mega files that look like below:

f1:

chr1,3073253,3074322,gene_id,"ENSMUSG00000102693.1",gene_type,"TEC"
chr1,3074253,3075322,gene_id,"ENSMUSG00000102693.1",transcript_id,"ENSMUST00000193812.1"
chr1,3077253,3078322,gene_id,"ENSMUSG00000102693.1",transcript_id,"ENSMUST00000193812.1"
chr1,3102916,3103025,gene_id,"ENSMUSG00000064842.1",gene_type,"snRNA"
chr1,3105016,3106025,gene_id,"ENSMUSG00000064842.1",transcript_id,"ENSMUST00000082908.1"

f2:

chr,name,start,end
chr1,linc1320,3073300,3074300
chr3,linc2245,3077270,3078250
chr1,linc8956,4410501,4406025

What I want to do is to print the rows of file 2 in a separate column in file 1 IF the range of start and end column of file2 is inside the ranges in file1 (columns 2 and 3) and chr is the same. So based on the dummy example files I provided - the desired output should be (only the range of linc1320 is in the first row of the file1):

chr1,3073253,3074322,gene_id,"ENSMUSG00000102693.1",gene_type,"TEC",linc1320,3073300,3074300
chr1,3074253,3075322,gene_id,"ENSMUSG00000102693.1",transcript_id,"ENSMUST00000193812.1"
chr1,3077253,3078322,gene_id,"ENSMUSG00000102693.1",transcript_id,"ENSMUST00000193812.1"
chr1,3102916,3103025,gene_id,"ENSMUSG00000064842.1",gene_type,"snRNA"
chr1,3105016,3106025,gene_id,"ENSMUSG00000064842.1",transcript_id,"ENSMUST00000082908.1"

I am not a professional coder but I have been using this code to manually change the ranges based on the file2:

awk -F ',' '$2<=3073300,$3>=3074300, {print $1,$2,$3,$4,$5,$6,$7}' f1.csv

I do not have a particular preference for using a specific programming language - both Python and awk would be very helpful. Any help is appreciated thank you.

edited yesterday

asked 2 days ago

Apex

7591 silver badge14 bronze badges

2

Are you okay with using the pandas library in python or you want the script in core python itself? – Shubham Sharma 2 days ago
1

I would like to also have the solution with pandas as well - thank you. – Apex 2 days ago

Add a comment |

Search This Blog

unname coder's blog

How to print row(s) if they meet a certain range

Comments

Post a Comment

Popular posts from this blog

Meaning of `{}` for return expression

Get current scroll position of ScrollView in React Native

Chart JS +ng2-charts not working on Angular+2