How to count the number of matched strings in R, when the string pattern to match is a column from another dataframe?
I have got two extremely large dataframes, the first data frame consists of a column body
, which is a list of comments and the second one consists of names
. I want to count how many elements in body
contain each element of names
. Here's a small reproducible dataset (the original dataset has about 2000 names, where each name is a name of the car):
df1 <- tibble(body = c("The Tesla Roadster has a range of 620 miles",
"ferrari needs to make an electric car",
"How much does a tesla cost?",
"When is the new Mercedes releasing?",
"Can't wait to get my hands on the new Tesla"))
df2 <- tibble(names = c("FORD", "TESLA", "MERCEDES", "FERRARI", "JAGUAR", "HYUNDAI"))
As mentioned above, I am trying to count the number of times each of the values in names occur in body and then preferably, I want to add it as a column in df2
. I have tried it in the following way:
counter = c()
for (i in df2$names) {
counter[i] = sum(ifelse(str_detect(df1$body, i),1, 0))
}
While this method works, it takes extremely long amounts of time and returns a vector where the names are attributes of the counter
values, I then unstack it and join the dataframe to df2
using names
as keys.
This is the only method that works, apart from that I have tried using str_count
but with my current proficiency in R, the code was absolutely horrible and got me nowhere.
Is there a more efficient way of finding the matching strings? I have tried to find similar questions on stack but to no avail!
Many Thanks in advance :)