Showing posts with the label pandas

How to add 91 to all the values in a column of a pandas data frame?

Consider a data frame like this:

S.no  Phone Number
1     9955290232
2     8752837492
3     9342832245
4     919485928837
5     917482482938
6     98273642733

I want the values in the "Phone Number" column to be prefixed with 91. If a value already has the 91 prefix, leave it and proceed to the next value.

My output:

S.no  Phone Number
1     919955290232
2     918752837492
3     919342832245
4     919485928837
5     917482482938
6     919827364273

How could this be done?

Tags: python, pandas, dataframe, numpy
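One possible approach, sketched under an assumption the question does not state outright: a number "already has 91" when it is 12 digits long and starts with 91. The sketch uses only the first five rows from the question, casts the column to strings, and prepends 91 to the rest.

```python
import pandas as pd

# First five rows from the question.
df = pd.DataFrame({
    "S.no": [1, 2, 3, 4, 5],
    "Phone Number": [9955290232, 8752837492, 9342832245,
                     919485928837, 917482482938],
})

# Work on strings so that prefixing is concatenation, not arithmetic.
phone = df["Phone Number"].astype(str)

# Assumption: a 12-digit value starting with "91" is already prefixed.
has_prefix = phone.str.startswith("91") & (phone.str.len() == 12)

# Keep prefixed values as they are, prepend "91" to everything else.
df["Phone Number"] = phone.where(has_prefix, "91" + phone)

print(df)
```

Checking the length as well as the leading digits avoids skipping a 10-digit number that merely happens to begin with 91. If that case cannot occur in the data, `startswith("91")` alone would be enough.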

How to get the start and finish indexes of a subgroup in a dataframe

df = pd.DataFrame({"C1": ['USA','USA','USA','USA','USA','JAPAN','JAPAN','JAPAN','USA','USA'], 'C2': ['A','B','A','A','A','A','A','A','B','A']})

   C1     C2
0  USA    A
1  USA    B
2  USA    A
3  USA    A
4  USA    A
5  JAPAN  A
6  JAPAN  A
7  JAPAN  A
8  USA    B
9  USA    A

This is a watered-down version of my problem, so to keep it simple: my objective is to iterate over each subgroup of the dataframe where C2 has a B in it. If a B is in C2, I look at C1 and need the entire group. So in this example, I see USA, and the group starts at index 0 and finishes at 4. Another one runs from 8 to 9. ...
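A minimal sketch of one way to find those spans, assuming a "group" means a run of consecutive rows with the same C1 value. The names `run_id`, `has_b` and `spans` are illustrative and not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "C1": ["USA", "USA", "USA", "USA", "USA",
           "JAPAN", "JAPAN", "JAPAN", "USA", "USA"],
    "C2": ["A", "B", "A", "A", "A", "A", "A", "A", "B", "A"],
})

# Give every run of consecutive equal C1 values its own id:
# a new run starts wherever C1 differs from the previous row.
run_id = df["C1"].ne(df["C1"].shift()).cumsum()

# Flag every row that belongs to a run containing at least one "B" in C2.
has_b = df["C2"].eq("B").groupby(run_id).transform("any")

# Collect (first index, last index) for each flagged run.
spans = [
    (int(block.index[0]), int(block.index[-1]))
    for _, block in df[has_b].groupby(run_id[has_b])
]

print(spans)  # [(0, 4), (8, 9)]
```

Each `block` in the loop is the full sub-frame for one run, so the same loop can be used to iterate over the groups themselves rather than only their endpoints.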

Create n rows per id | Pandas

I have a Dataframe df as follows:

id  lob  addr  addr2
a1  001  1234  0
a1  001  1233  0
a3  003  1221  0
a4  009  1234  0

I want to generate n (let's take 4) rows per id, with the other columns being null/na/nan values. So, the above table is to be transformed to:

id  lob  addr  addr2
a1  001  1234  0
a1  001  1233  0
a1  001  na    na
a1  na   na    na
a3  003  1221  0
a3  na   na    na
a3  na   na    na
a3  na   na    na
a4  009  1234  0
a4  na   na    na
a4  na   na    na
a4  na   na    na

How can I achieve this? I will have anywhere from 500-700 ids at the time of execution and the n will always be 70 (so each id should have 70 rows). I wanted to create a loop that would create a row, do a group by id, se...
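A loop-free sketch of one possible approach: number the existing rows within each id, then reindex against every (id, slot) combination so the missing slots appear as NaN rows. The `slot` name is hypothetical, and the sketch assumes no id already has more than n rows (extra rows would be dropped by the reindex). All padded columns come out as NaN here, including lob.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a1", "a1", "a3", "a4"],
    "lob": ["001", "001", "003", "009"],
    "addr": [1234, 1233, 1221, 1234],
    "addr2": [0, 0, 0, 0],
})
n = 4  # rows wanted per id (70 in the real use case)

# Position of each row within its id: 0, 1, 2, ...
slot = df.groupby("id").cumcount().rename("slot")

# Every (id, slot) pair that should exist in the result.
full = pd.MultiIndex.from_product(
    [df["id"].unique(), range(n)], names=["id", "slot"]
)

# Reindexing onto the full grid creates the missing rows filled with NaN.
padded = (
    df.set_index(["id", slot])
      .reindex(full)
      .reset_index()
      .drop(columns="slot")
)

print(padded)
```

Because the padding is a single reindex, the cost does not depend on looping over the 500 to 700 ids.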

Why does pandas “None | True” return False when Python “None or True” returns True?

In pure Python, None or True returns True. However, with pandas, when I'm doing a | between two Series containing None values, the results are not as I expected:

>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
  buybox  buybox_y
0   None      True
>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
  buybox  buybox_y
0  False      True

Expected result:

>>> df
  buybox  buybox_y
0   True      True

I get the result I want by applying the OR operation twice, but I don't get why I should do this. I'm not looking for a workaround (I have it by applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but a...
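For comparison, a small sketch using pandas' nullable "boolean" dtype, which follows three-valued (Kleene) logic, so a missing value OR True evaluates to True. This is offered as an illustration of the expected result, not as the explanation the question asks for, and it assumes both columns can be cast to that dtype.

```python
import pandas as pd

# Same frame as in the question: buybox holds None, buybox_y holds True.
df = pd.DataFrame({"buybox": [None], "buybox_y": [True]})

# Cast both operands to the nullable boolean dtype; None becomes <NA>.
left = df["buybox"].astype("boolean")
right = df["buybox_y"].astype("boolean")

# Under Kleene logic, <NA> | True is True.
df["buybox"] = left | right

print(df)
```

With this dtype, `<NA> | False` stays `<NA>` rather than becoming False, so a follow-up `fillna(False)` may still be wanted when a plain bool column is required.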

Applying regex to pandas column based on different pos of same character

I have a dataframe as shown below:

tdf = pd.DataFrame({'text_1':['value: 1.25MG - OM - PO/TUBE - ashaf', 'value:2.5 MG - OM - PO/TUBE -test','value: 18 UNITS(S)','value: 850 MG - TDS AFTER FOOD - SC (SUBCUTANEOUS) -had', 'value: 75 MG - OM - PO/TUBE']})

I would like to apply a regex and create two columns based on the rules given below:

col val should store all text after value: and before the first hyphen
col Adm should store all text after the third hyphen

I tried the below, but it doesn't work accurately:

tdf['text_1'].str.findall('[.0-9]+\s*[mgMG/lLcCUNIT]+')

Tags: python, regex, pandas, string, dataframe
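A sketch of the two rules with `Series.str.extract`, assuming hyphens only ever appear as field separators (a hyphen inside a value would shift the count). The two patterns are illustrative, not taken from the question.

```python
import pandas as pd

tdf = pd.DataFrame({"text_1": [
    "value: 1.25MG - OM - PO/TUBE - ashaf",
    "value:2.5 MG - OM - PO/TUBE -test",
    "value: 18 UNITS(S)",
    "value: 850 MG - TDS AFTER FOOD - SC (SUBCUTANEOUS) -had",
    "value: 75 MG - OM - PO/TUBE",
]})

# val: everything after "value:" up to the first hyphen (or end of string),
# with surrounding whitespace trimmed by the \s* on either side.
val_pattern = r"^value:\s*([^-]*?)\s*(?:-|$)"

# Adm: skip three "text followed by hyphen" chunks, then capture the rest.
# Rows with fewer than three hyphens do not match and become NaN.
adm_pattern = r"^(?:[^-]*-){3}\s*(.*)$"

tdf["val"] = tdf["text_1"].str.extract(val_pattern, expand=False)
tdf["Adm"] = tdf["text_1"].str.extract(adm_pattern, expand=False)

print(tdf[["val", "Adm"]])
```

Counting hyphens positionally, rather than matching units such as MG or UNITS, keeps the pattern independent of which dose units appear in the text.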

How to print row(s) if they meet a certain range

I have two very large files that look like this:

f1:
chr1,3073253,3074322,gene_id,"ENSMUSG00000102693.1",gene_type,"TEC"
chr1,3074253,3075322,gene_id,"ENSMUSG00000102693.1",transcript_id,"ENSMUST00000193812.1"
chr1,3077253,3078322,gene_id,"ENSMUSG00000102693.1",transcript_id,"ENSMUST00000193812.1"
chr1,3102916,3103025,gene_id,"ENSMUSG00000064842.1",gene_type,"snRNA"
chr1,3105016,3106025,gene_id,"ENSMUSG00000064842.1",transcript_id,"ENSMUST00000082908.1"

f2:
chr,name,start,end
chr1,linc1320,3073300,3074300
chr3,linc2245,3077270,3078250
chr1,linc8956,4410501,4406025

What I want to do is to print the rows of file 2 in a separate column in file 1 IF the range of start and ...