unname coder's blog

Posts

Showing posts with the label r

How to count the number of matched strings in R, when the string pattern to match is a column from another dataframe?

- May 18, 2021

I have two extremely large data frames. The first has a column body, which holds comments, and the second has a column names. I want to count how many elements of body contain each element of names. Here's a small reproducible dataset (the original has about 2000 names, each the name of a car): df1 <- tibble(body = c("The Tesla Roadster has a range of 620 miles", "ferrari needs to make an electric car", "How much does a tesla cost?", "When is the new Mercedes releasing?", "Can't wait to get my hands on the new Tesla")) df2 <- tibble(names = c("FORD...
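
One way to approach the counting described above is a case-insensitive substring match in base R. This is only a sketch: the names vector is cut off in the excerpt, so the four values in car_names are hypothetical stand-ins, and plain vectors are used in place of the tibbles.

```r
# Comments copied from the excerpt above.
body <- c("The Tesla Roadster has a range of 620 miles",
          "ferrari needs to make an electric car",
          "How much does a tesla cost?",
          "When is the new Mercedes releasing?",
          "Can't wait to get my hands on the new Tesla")

# ASSUMPTION: the excerpt truncates the names after "FORD";
# these values are hypothetical stand-ins.
car_names <- c("FORD", "TESLA", "FERRARI", "MERCEDES")

# Lower-case both sides once, then count the comments that contain each
# name as a literal substring (fixed = TRUE skips regex interpretation).
body_lc <- tolower(body)
counts <- vapply(tolower(car_names),
                 function(nm) sum(grepl(nm, body_lc, fixed = TRUE)),
                 integer(1))

result <- data.frame(names = car_names, count = unname(counts),
                     stringsAsFactors = FALSE)
print(result)
```

With about 2000 names and a very large body column this loops once per name, so it may be slow; it is meant to show the matching logic rather than a tuned solution.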

Accumulate values for every possible combination in R

- May 18, 2021

Let's say I have a data frame test (dput given) with a list column, items: test <- structure(list(items = list('a', c('b', 'c'), c('d', 'e'), 'f', c('g', 'h')), ID = c(1,1,1,2,2)), row.names = c(NA, 5L), class = "data.frame") library(tidyverse) test %>% group_by(ID) %>% mutate(dummy = accumulate(items, ~paste(.x, .y))) This gives an output with a list column like this: items ID dummy 1 a 1 a 2 b, c 1 a b, a c 3 d, e 1 a b d, a c e 4 f 2 f 5 g, h 2 f g, f h I would like row 3 to hold four items, one for each possible combination, i.e. c("a b d", "a b e", "a c d", "a c e")...
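
The difference between pairwise pasting and "every possible combination" can be sketched in base R with outer() and Reduce(). This swaps in base functions for the purrr accumulate() call in the excerpt; the helper name paste_all is hypothetical.

```r
test <- structure(list(items = list('a', c('b', 'c'), c('d', 'e'), 'f', c('g', 'h')),
                       ID = c(1, 1, 1, 2, 2)),
                  row.names = c(NA, 5L), class = "data.frame")

# Pair every accumulated string with every new item. outer() builds the
# full grid; t() makes the accumulated strings vary slowest, which gives
# the order "a b d", "a b e", "a c d", "a c e".
paste_all <- function(acc, new) as.vector(t(outer(acc, new, paste)))

# Accumulate within each ID. ASSUMPTION: rows with the same ID are
# contiguous and IDs appear in increasing order, as in the sample data.
dummy <- unlist(
  lapply(split(test$items, test$ID),
         function(x) Reduce(paste_all, x, accumulate = TRUE)),
  recursive = FALSE, use.names = FALSE
)

print(dummy[[3]])
```

The same paste_all function could presumably be passed to accumulate() inside the grouped mutate() from the excerpt, but that variant is not checked here.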

Using the dplyr library in R to “print” the name of the non-NA columns

- May 11, 2021

Here is my data frame: a <- data.frame(id=c(rep("A",2),rep("B",2)), x=c(rep(2,2),rep(3,2)), p.ABC= c(1,NA,1,1), p.DEF= c(NA,1,NA,NA), p.TAR= c(1,NA,1,1), p.REP= c(NA,1,1,NA), p.FAR= c(NA,NA,1,1)) I want to create a new character column (using mutate() from the dplyr library in R) that gives, for each row, the names of the columns that have a non-NA value (here the non-NA value is always 1). However, it should only search among the columns whose names start with "p.", and it should sort the names alphabetically and then concatenate them using "_" as a separator. You can find below the desired result, under the column cal...
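
The row-wise logic described above can be sketched in base R rather than with mutate(). Two assumptions apply: the desired output is cut off in the excerpt, so the full column names (including the "p." prefix) are kept, and the new column name non_na is hypothetical.

```r
a <- data.frame(id = c(rep("A", 2), rep("B", 2)),
                x = c(rep(2, 2), rep(3, 2)),
                p.ABC = c(1, NA, 1, 1),
                p.DEF = c(NA, 1, NA, NA),
                p.TAR = c(1, NA, 1, 1),
                p.REP = c(NA, 1, 1, NA),
                p.FAR = c(NA, NA, 1, 1))

# Columns whose names start with "p.", sorted alphabetically up front so
# every row's result is already in order.
p_cols <- sort(grep("^p\\.", names(a), value = TRUE))

# Logical matrix: TRUE where a value is present.
present <- !is.na(a[p_cols])

# For each row, keep the names of the present columns and join with "_".
# ASSUMPTION: `non_na` is a made-up column name.
a$non_na <- apply(present, 1, function(row) paste(p_cols[row], collapse = "_"))

print(a[c("id", "non_na")])
```

A row with no non-NA value ends up as an empty string under this approach.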

Is it possible to draw the axis line first, before the data?

- May 11, 2021

This is a follow-up to my previous question, where I was looking for a solution to get the axis drawn first, then the data. The answer works for that specific question and example, but it opened a more general question of how to change the plotting order of the underlying grobs: first the axis, then the data, very much in the way that the panel grid grob can be drawn on top or not. Panel grid and axis grobs are apparently generated differently, with axes treated more as guide objects than as "simple" grobs. (Axes are drawn with ggplot2:::draw_axis(), whereas the panel grid is built as part of the ggplot2:::Layout object.) I guess this is why axes are drawn on top, and I wondered if the drawing order can be changed. # An example to play with library(ggplot2) df...

How to sum one column values and group them by intervals from another column

- May 06, 2021

I'm a newbie to R and have a data frame with 25k rows. I would like to sum the "Freq" values within ranges of "Var1" (let's say in intervals of 5). The idea is to end up with fewer rows and create a histogram. Here are 20 rows for simplicity: Var1 <- c(0:19) Freq <- c(289, 370, 2295, 2691, 2206, 1624, 1267, 1076, 971, 889, 891, 834, 866, 780, 794, 809, 772, 740, 742, 734) df <- data.frame(Var1, Freq) Here is what I would expect: Var1_intervals <- c("0 - 4", "5 - 9", "10 - 14", "15 - 19") Freq_sum <- c(7851, 5827, 4165, 3797) df_2 <- data.frame(Var1_intervals, Freq_sum)
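
A minimal base R sketch of the grouping described above, assuming fixed-width intervals that start at 0. The variable names width and start are introduced here for illustration.

```r
Var1 <- 0:19
Freq <- c(289, 370, 2295, 2691, 2206, 1624, 1267, 1076, 971, 889,
          891, 834, 866, 780, 794, 809, 772, 740, 742, 734)
df <- data.frame(Var1, Freq)

# ASSUMPTION: intervals are `width` wide and aligned to 0.
width <- 5

# Integer division maps each Var1 to the lower bound of its interval.
start <- (df$Var1 %/% width) * width

# Sum Freq within each interval; the names of `sums` are the lower bounds.
sums <- tapply(df$Freq, start, sum)

df_2 <- data.frame(
  Var1_intervals = paste(names(sums), "-", as.integer(names(sums)) + width - 1),
  Freq_sum = as.vector(sums),
  stringsAsFactors = FALSE
)
print(df_2)
# For the 20 sample rows the sums are 7851, 5827, 4165 and 3797.
```

For a plot, barplot(df_2$Freq_sum, names.arg = df_2$Var1_intervals) would draw the pre-summed bars; hist() expects raw observations rather than frequencies.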

Is there a convenient way to replicate R's concept of 'named vectors' in Raku, possibly using Mixins?

- April 30, 2021

Recent questions on StackOverflow pertaining to Mixins in Raku have piqued my interest as to whether Mixins can be applied to replicate features present in other programming languages. For example, in the R programming language, elements of a vector can be given a name (i.e. an attribute), which is very convenient for data analysis. For an excellent example see: "How to Name the Values in Your Vectors in R" by Andrie de Vries and Joris Meys, who illustrate this feature using R's built-in islands dataset. Below is a more prosaic example (code run in the R-REPL): > #R-code > x <- 1:4 > names(x) <- LETTERS[1:4] > str(x) Named int [1:4] 1 2 3 4 - attr(*, "names")= chr [1:4] "A" "B" "C" "D...

remove matrix from a list of matrices

- April 30, 2021

I have a list of 12 matrices called M and I'm trying to remove every matrix from the list that has 0 rows. I know that I can remove those matrices manually with (for example, to remove the second matrix) M[2] <- NULL. I would like to use logic to remove them with something like: M <- M[nrow(M)>0,] (but that obviously didn't work).
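
The attempt above fails because nrow(M) is applied to the list itself, not to each matrix in it. A minimal sketch of the per-element version follows; the four-matrix list is a hypothetical stand-in for the 12 matrices, which the excerpt does not show.

```r
# ASSUMPTION: a small made-up list standing in for the real M.
M <- list(matrix(1:4, nrow = 2),
          matrix(numeric(0), nrow = 0, ncol = 3),
          matrix(1:6, nrow = 3),
          matrix(character(0), nrow = 0, ncol = 1))

# Ask each matrix for its row count, then keep those with at least one row.
keep <- vapply(M, nrow, integer(1)) > 0
M <- M[keep]

print(length(M))
```

Filter(function(m) nrow(m) > 0, M) expresses the same idea in one call.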

Extract rows where value appears in any of multiple columns

- April 30, 2021

Let's say I have two data.frames: name_df = read.table(text = "player_name a b c d e f g", header = T) game_df = read.table(text = "game_id winner_name loser_name 1 a b 2 b a 3 a c 4 a d 5 b c 6 c d 7 d e 8 e f 9 f a 10 g f 11 g a 12 f e 13 a d", header = T) name_df contains a unique list of all the winner_name or loser_name values in game_df. I want to create a new data.frame that has, for each person in name_df, a row for every game in which that name (e.g. a) appears in either the winner_name or loser_name column. So I essentially want to merge game_df with name_df, but the key column (name) can appear in either winner_name or loser_name. So, for just a and b the final output would look something like: final_df = read.table(text = ...
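
A base R sketch of the "match on either column" idea follows. The desired final_df is cut off in the excerpt, so the output layout (a player_name column followed by the game columns) is an assumption, and rows_for is a hypothetical helper. The data frames are rebuilt with data.frame() because the line breaks in the read.table() text are lost in the excerpt.

```r
name_df <- data.frame(player_name = c("a", "b", "c", "d", "e", "f", "g"),
                      stringsAsFactors = FALSE)
game_df <- data.frame(
  game_id     = 1:13,
  winner_name = c("a", "b", "a", "a", "b", "c", "d", "e", "f", "g", "g", "f", "a"),
  loser_name  = c("b", "a", "c", "d", "c", "d", "e", "f", "a", "f", "a", "e", "d"),
  stringsAsFactors = FALSE
)

# For one player, keep every game where the name is in either column and
# tag those rows with the player's name.
rows_for <- function(p) {
  hits <- game_df[game_df$winner_name == p | game_df$loser_name == p, ]
  data.frame(player_name = rep(p, nrow(hits)), hits, stringsAsFactors = FALSE)
}

# Stack the per-player results; each game appears once per participant.
final_df <- do.call(rbind, lapply(name_df$player_name, rows_for))
rownames(final_df) <- NULL

print(final_df[final_df$player_name %in% c("a", "b"), ])
```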

Problem using rowwise() to count the number of NA's in each row of a dataframe

- April 30, 2021

I'm having trouble using rowwise() to count the number of NAs in each row. My minimal example: df <- data.frame(Q1 = c(rep(1, 1), rep(NA, 9)), Q2 = c(rep(2, 2), rep(NA, 8)), Q3 = c(rep(3, 3), rep(NA, 7)) ) df Q1 Q2 Q3 1 1 2 3 2 NA 2 3 3 NA NA 3 4 NA NA NA 5 NA NA NA 6 NA NA NA 7 NA NA NA 8 NA NA NA 9 NA NA NA 10 NA NA NA I would like to create a new column that counts the number of NAs in each row. I can do this very simply by writing df$Count_NA <- rowSums(is.na(df)) df Q1 Q2 Q3 Count_NA 1 1 2 3 0 2 NA 2 3 1 3 NA NA 3 2 4 NA NA NA 3 5 NA NA NA 3 6 NA NA NA 3 7 NA NA NA 3 8 NA NA NA 3 9 NA NA NA 3 10 NA NA NA 3 B...

How can I split rows up by the number of times located in a column in R?

- April 30, 2021

For example, suppose you have the following dataframe: ID<-c("11", "12", "13", "14", "14") Date<-c("2020-01-01", "2020-02-01", "2020-03-15", "2020-04-10", "2020-06-01") Item<-c("Item1", "Item1", "Item2", "Item2", "Item2") ItemPrice<-c(5, 5, 7, 7, 7) Quantity<-c(1, 2, -2, 2, 3) Cost<-c(5, 10, -14, 14, 21) df<-data.frame(ID, Date, Item, ItemPrice, Quantity, Cost) df ID Date Item ItemPrice Quantity Cost 1 11 2020-01-01 Item1 5 1 5 2 12 2020-02-01 Item1 5 2 10 3 13 2020-03-15 Item2 7 -2 -14 4 14 2020-04-10 Item2 7 2 14 5 14 202...

Canonical tidyverse method to update some values of a vector from a look-up table

- April 30, 2021

I frequently need to recode some (not all!) values in a data frame column based on a look-up table. I'm not satisfied with the ways I know of to solve the problem. I'd like to be able to do it in a clear, stable, and efficient way. Before I write my own function, I want to make sure I'm not duplicating something standard that's already out there. ## Toy example data = data.frame( id = 1:7, x = c("A", "A", "B", "C", "D", "AA", ".") ) lookup = data.frame( old = c("A", "D", "."), new = c("a", "d", "!") ) ## desired result # id x # 1 1 a # 2 2 a # 3 3 B # 4 4 C # 5 5 d # 6 6 AA # 7 7 ! I c...
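
For reference, the partial recode described above can be sketched with match() in base R. This is not the tidyverse method the title asks for, only a baseline that reproduces the desired result on the toy data; stringsAsFactors = FALSE is added so the sketch behaves the same on R versions before 4.0.

```r
data <- data.frame(id = 1:7,
                   x = c("A", "A", "B", "C", "D", "AA", "."),
                   stringsAsFactors = FALSE)
lookup <- data.frame(old = c("A", "D", "."),
                     new = c("a", "d", "!"),
                     stringsAsFactors = FALSE)

# Position of each x in lookup$old, or NA when there is no exact match
# (so "AA" is not treated as "A", and "." is matched literally).
idx <- match(data$x, lookup$old)
hit <- !is.na(idx)

# Overwrite only the matched values; everything else is left untouched.
data$x[hit] <- lookup$new[idx[hit]]

print(data)
```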