R: Lookup value for calculation (similar to vlookup in Excel)?

I've got the following dataframes in R.
Name <- c("Tom", "Bill", "Jeffrey", "George", "David")
Value <- c("5.24", "5.48", "6.32", "6.07", "5.1")
df <- data.frame(Name, Value)
And:
Name1 <- c("Tom", "George", "David")
Name2 <- c("Jeffrey", "Bill", "Tom")
df2 <- data.frame(Name1, Name2)
I want to create another column in df2 which will produce the following:
Name1 * Name2 (based on value from df)
What is the best way to achieve this in R?
I know you can use vlookup function in Excel.

We could use match to get the row index for subsetting the 'Value' column, then use Reduce to do the elementwise multiplication:
# as Value is created with quotes, it is of class character
# use type.convert to change the column types
df <- type.convert(df, as.is = TRUE)
df2$Value <- Reduce(`*`, lapply(df2, function(x) df$Value[match(x, df$Name)]))
-output
> df2
Name1 Name2 Value
1 Tom Jeffrey 33.1168
2 George Bill 33.2636
3 David Tom 26.7240
> with(df, Value[Name == 'Tom'] * Value[Name == "Jeffrey"])
[1] 33.1168
Or without Reduce
with(df2, df$Value[match(Name1, df$Name)] * df$Value[match(Name2, df$Name)])
[1] 33.1168 33.2636 26.7240
Or using tidyverse
library(dplyr)
library(purrr)
df2 %>%
mutate(Value = across(everything(), ~ df$Value[match(.x, df$Name)]) %>%
reduce(`*`))
-output
Name1 Name2 Value
1 Tom Jeffrey 33.1168
2 George Bill 33.2636
3 David Tom 26.7240
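When the names in the lookup table are unique, a named vector gives a compact vlookup-style idiom in base R. The sketch below is an alternative to the match-based approaches above, not a replacement for them. The name `lookup` is introduced here for illustration, and the data frames are rebuilt with stringsAsFactors = FALSE so the name columns are character vectors on any R version.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps names as character
df <- data.frame(
  Name  = c("Tom", "Bill", "Jeffrey", "George", "David"),
  Value = c("5.24", "5.48", "6.32", "6.07", "5.1"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name1 = c("Tom", "George", "David"),
  Name2 = c("Jeffrey", "Bill", "Tom"),
  stringsAsFactors = FALSE
)

# `lookup` (illustrative name): numeric values indexed by name
lookup <- setNames(as.numeric(df$Value), df$Name)

# Indexing by a character vector looks each name up; unname() drops the labels
df2$Value <- unname(lookup[df2$Name1] * lookup[df2$Name2])
df2
```

A name that is absent from `lookup` yields NA, so unmatched rows stay visible in the result instead of raising an error.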

Using tapply, first convert Value to numeric:
df$Value <- as.numeric(df$Value)
Then
df2$val <- tapply(df$Value[match(unlist(df2), df$Name)],row(df2), prod)
Name1 Name2 val
1 Tom Jeffrey 33.1168
2 George Bill 33.2636
3 David Tom 26.7240

library(dplyr)
df2 %>%
left_join(df, join_by(Name1 == Name)) %>%
left_join(df, join_by(Name2 == Name)) %>%
mutate(mult = as.numeric(Value.x) * as.numeric(Value.y))
Result
Name1 Name2 Value.x Value.y mult
1 Tom Jeffrey 5.24 6.32 33.1168
2 George Bill 6.07 5.48 33.2636
3 David Tom 5.1 5.24 26.7240

Related

Find gender based on a specific dataframe

I have a dataframe that holds the gender for specific names:
dfgender <- data.frame(name = c("Helen","Erik"), gender = c("F","M"))
How can I use this data frame to check the names in a column of another dataframe and insert "Neutral" if a name is not in the gender dataframe?
Example of the dataframe with the names:
dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))
Example of expected output
dfnames <- data.frame(name = c("Helen", "Von", "Erik", "Brook"), gender = c("F", "Neutral", "M", "Neutral"))
left_join + replace_na should do:
library(dplyr)
library(tidyr)
dfnames %>% left_join(dfgender, by = c('names' = 'name')) %>%
mutate(gender = gender %>% as.character %>% replace_na('Neutral'))
# names gender
# 1 Helen F
# 2 Von Neutral
# 3 Erik M
# 4 Brook Neutral
The (experimental) rows_update could be an intuitive complement to @Juan C's answer:
library(dplyr)
dfnames |>
mutate(gender = "Neutral") |>
rows_update(rename(dfgender, names = name), "names")
Output:
names gender
1 Helen F
2 Von Neutral
3 Erik M
4 Brook Neutral
Here's a solution similar to Juan C's but with, I think, a simpler replacement of NAs:
library(dplyr)
library(tidyr)
dfgender <- data.frame(name = c("Helen","Erik"), gender = c("F","M"))
dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))
dfnames %>%
left_join(dfgender, by = c("names" = "name")) %>%
replace_na(list(gender = "Neutral"))
# names gender
# 1 Helen F
# 2 Von Neutral
# 3 Erik M
# 4 Brook Neutral
And here's another solution with no tidyr dependency:
library(dplyr)
dfgender <- data.frame(name = c("Helen","Erik"), gender = c("F","M"))
dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"))
dfnames %>%
left_join(dfgender, by = c("names" = "name")) %>%
mutate(gender = coalesce(gender, "Neutral"))
# names gender
# 1 Helen F
# 2 Von Neutral
# 3 Erik M
# 4 Brook Neutral
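For comparison, the same lookup with a default can be sketched in base R with match, without any package. This is only an illustration under the assumption that the names in dfgender are unique; `idx` is a name introduced here.

```r
dfgender <- data.frame(name = c("Helen", "Erik"), gender = c("F", "M"),
                       stringsAsFactors = FALSE)
dfnames <- data.frame(names = c("Helen", "Von", "Erik", "Brook"),
                      stringsAsFactors = FALSE)

# Position of each name in the lookup table; NA when the name is absent
idx <- match(dfnames$names, dfgender$name)

# Take the matched gender, falling back to "Neutral" for unmatched names
dfnames$gender <- ifelse(is.na(idx), "Neutral", dfgender$gender[idx])
dfnames
```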

Combine similar columns with different character names data?

ID <- c("IDa", "IDb","IDc","IDe","IDd","IDe")
names1 <- c("robin", "bob", "eric", "charlie", "robin", "gabby")
matrix1 <- matrix(names1, 1, 6)
colnames(matrix1) <- c("IDa", "IDb", "IDc","IDe", "IDd", "IDe")
This is the output:
IDa   IDb IDc  IDe     IDd   IDe
robin bob eric charlie robin gabby
But I want it to look like this:
IDa   IDb IDc  IDe     IDd
robin bob eric charlie robin
               gabby
We may split and then cbind after padding with NA
lst1 <- split(names1, ID)
do.call(cbind, lapply(lst1, `length<-`, max(lengths(lst1))))
-output
IDa IDb IDc IDd IDe
[1,] "robin" "bob" "eric" "robin" "charlie"
[2,] NA NA NA NA "gabby"
Another option:
library(reshape2)
library(tidyverse)
melt(matrix1) %>%
select(-Var1) %>%
group_by(Var2) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = Var2,
values_from = value
) %>%
select(-id)
IDa IDb IDc IDe IDd
<chr> <chr> <chr> <chr> <chr>
1 robin bob eric charlie robin
2 NA NA NA gabby NA
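Note that split orders the groups alphabetically, so IDd ends up before IDe in the first answer. If the original column order matters, a base R variant along the following lines may help. It is a sketch only; `ids`, `groups`, `n` and `out` are names introduced here.

```r
names1 <- c("robin", "bob", "eric", "charlie", "robin", "gabby")
matrix1 <- matrix(names1, 1, 6)
colnames(matrix1) <- c("IDa", "IDb", "IDc", "IDe", "IDd", "IDe")

# Unique IDs in order of first appearance (split() would sort them instead)
ids <- unique(colnames(matrix1))

# Collect the values stored under each ID
groups <- lapply(ids, function(id) unname(matrix1[1, colnames(matrix1) == id]))

# Pad every group with NA to the longest length, then bind as columns
n <- max(lengths(groups))
out <- do.call(cbind, lapply(groups, function(g) c(g, rep(NA, n - length(g)))))
colnames(out) <- ids
out
```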

How to find intersect elements of concatenated string?

# create sample df
basket_customer <- c("apple,orange,banana","apple,banana,orange","strawberry,blueberry")
basket_ideal<- c("orange,banana","orange,apple,banana","strawberry,watermelon")
customer_name <- c("john","adam","john")
visit_id <- c("1001","1001","1003")
df2 <- cbind.data.frame(basket_customer,basket_ideal,customer_name,visit_id)
df2$basket_ideal <- as.character(basket_ideal)
df2$basket_customer <- as.character(basket_customer)
The goal is to compare the basket elements (fruits) of each customer to the ideal basket and return the missing fruit.
Note that the same visit_id can exist for one or more users, so the unique ID is (visit_id + customer_name), and the elements are not alphabetically sorted.
expected output:
visit_id customer_name NOT_in_basket_ideal NOT_in_basket_customer
1001     john          apple               NA
1001     adam          NA                  NA
1003     john          blueberry           watermelon
I tried using row_wise(), intersect(), except(), and unnesting, but did not succeed. Thank you.
We could use Map to loop over the corresponding elements of the list columns, and use setdiff to get the elements of the first vector not in the second
cst_list <- strsplit(df2$basket_customer, ",\\s*")
idl_list <- strsplit(df2$basket_ideal, ",\\s*")
lst1 <- Map(function(x, y) if(identical(x, y)) 'equal'
else setdiff(x, y), cst_list, idl_list)
lst1[lengths(lst1) == 0] <- NA_character_
v1 <- sapply(lst1, toString)
For the second case, just reverse the order of the arguments to setdiff:
lst2 <- Map(function(x, y) if(identical(x, y)) 'equal'
else setdiff(y, x), cst_list, idl_list)
lst2[lengths(lst2) == 0] <- NA_character_
v2 <- sapply(lst2, toString)
Combining the output from both to 'df2'
df2[c("NOT_in_basket_ideal", "NOT_in_basket_customer")] <- list(v1, v2)
-output
df2[-(1:2)]
# customer_name visit_id NOT_in_basket_ideal NOT_in_basket_customer
#1 john 1001 apple NA
#2 adam 1001 NA NA
#3 john 1003 blueberry watermelon
Or in tidyverse
library(dplyr)
library(purrr)
library(stringr)
df2 %>%
mutate(across(starts_with('basket'), ~ str_extract_all(., "\\w+"))) %>%
transmute(customer_name, visit_id,
NOT_in_basket_ideal = map2_chr(basket_customer,
basket_ideal, ~ toString(setdiff(.x, .y))),
NOT_in_basket_customer = map2_chr(basket_ideal, basket_customer,
~ toString(setdiff(.x, .y))))
# customer_name visit_id NOT_in_basket_ideal NOT_in_basket_customer
#1 john 1001 apple
#2 adam 1001
#3 john 1003 blueberry watermelon
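In the first answer, toString turns a missing value into the string "NA", and the tidyverse version returns empty strings. If a real NA is wanted for "nothing missing", the two directions can be wrapped in one small helper. This is a sketch, with `missing_items` as a hypothetical helper name introduced here.

```r
basket_customer <- c("apple,orange,banana", "apple,banana,orange", "strawberry,blueberry")
basket_ideal    <- c("orange,banana", "orange,apple,banana", "strawberry,watermelon")

# missing_items() (illustrative helper): items of `from` absent from `against`,
# returned as one comma-separated string per row, or NA when nothing is missing
missing_items <- function(from, against) {
  from_list    <- strsplit(from, ",\\s*")
  against_list <- strsplit(against, ",\\s*")
  unname(mapply(function(x, y) {
    d <- setdiff(x, y)
    if (length(d) == 0) NA_character_ else toString(d)
  }, from_list, against_list))
}

not_in_ideal    <- missing_items(basket_customer, basket_ideal)
not_in_customer <- missing_items(basket_ideal, basket_customer)
```

Because setdiff ignores element order, row 2 (adam) yields NA in both directions even though the two baskets list the fruits differently.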

Remove rows found in more than 3 groups

I have a dataframe and I am trying to remove the rows whose value is present in 3 or more groups. In the example below, bike is common to 3 groups, so I need to remove it. Please help me achieve this.
df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"), b=c("car","bike","bus","train","bike","tour","bike"))
df
a b
name1 car
name1 bike
name1 bus
name2 train
name2 bike
name2 tour
name3 bike
Expected Output:
a b
name1 car
name1 bus
name2 train
name2 tour
You can use dplyr::n_distinct:
library(dplyr)
n_gr <- 3
cn <- df %>% group_by(b) %>% summarise(na = n_distinct(a)) %>%
filter(na >= n_gr) %>% pull(b)
df <- df %>% filter(!(b %in% cn))
Output
a b
1 name1 car
2 name1 bus
3 name2 train
4 name2 tour
In base R you could do this...
df[ave(as.numeric(as.factor(df$a)), #convert a to numbers (factor levels) (required by ave)
df$b, #group by b
FUN=length) < 3, ] #return whether no of a's per b is less than 3
a b
1 name1 car
3 name1 bus
4 name2 train
6 name2 tour
Using data.table:
library(data.table)
setDT(df)[, count := .N, by = b] ## convert df to data.table & create a column to count groups
df <- df[!(count >= 3), ] ## delete rows that have count equal to 3 or more than 3
df[, count := NULL] ## delete the column created
df
a b
1: name1 car
2: name1 bus
3: name2 train
4: name2 tour
Using Base R:
df <- data.frame(a = c("name1","name1","name1","name2","name2","name2","name3"), b=c("car","bike","bus","train","bike","tour","bike"))
df
lst <- table(df$b)
# %in% also works when more than one value reaches the threshold
df[!(df$b %in% names(lst)[lst >= 3]), ]
# a b
# 1 name1 car
# 3 name1 bus
# 4 name2 train
# 6 name2 tour
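The table and data.table answers count rows per value of b, which equals the number of groups only when a value appears at most once per group. A base R sketch that counts distinct groups instead, in the spirit of the n_distinct answer, could look like this (`n_groups`, `keep` and `result` are names introduced here):

```r
df <- data.frame(
  a = c("name1", "name1", "name1", "name2", "name2", "name2", "name3"),
  b = c("car", "bike", "bus", "train", "bike", "tour", "bike"),
  stringsAsFactors = FALSE
)

# Number of distinct groups (a) in which each value of b occurs
n_groups <- tapply(df$a, df$b, function(x) length(unique(x)))

# Look up the count for every row by name; as.vector() drops array attributes
keep <- as.vector(n_groups[df$b] < 3)

# Keep rows whose value occurs in fewer than 3 distinct groups
result <- df[keep, ]
result
```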

Multiple Values in One Cell using R

Suppose, there are 2 data.frames, for instance:
dat1 <- read.table("[path_dat1]", header=TRUE, sep=",")
id name age
1 Jack 21
2 James 40
dat2 <- read.table("[path_dat2]", header=TRUE, sep=",")
id interests
1 football
1 basketball
1 soccer
2 pingpang ball
How do I join table 1 and table 2 into a data.frame like the one below?
id name age interests
1 1 Jack 21 (football, basketball, soccer)
2 2 James 40 (pingpang ball)
How can I join these using plyr in the simplest way?
I can't tell you how to solve this in plyr but can in base:
dat3 <- aggregate(interests~id, dat2, paste, collapse=",")
merge(dat1, dat3, "id")
EDIT: If you really want the parentheses you could use:
ppaste <- function(x) paste0("(", gsub("^\\s+|\\s+$", "", paste(x, collapse = ",")), ")")
dat3 <- aggregate(interests~id, dat2, ppaste)
merge(dat1, dat3, "id")
Using Tyler's example:
dat1$interests <- ave(dat1$id, dat1$id,
FUN=function(x) paste(dat2[ dat2$id %in% x, "interests"], collapse=",") )
> dat1
id name age interests
1 1 Jack 21 football, basketball, soccer
2 2 James 40 pingpang ball
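Since the question reads its data from placeholder paths, here is a self-contained sketch of the aggregate + merge approach with the data typed inline. The inline data frames are an assumption reconstructed from the tables in the question; `result` is a name introduced here.

```r
# Inline stand-ins for the two files read in the question
dat1 <- data.frame(id = 1:2, name = c("Jack", "James"), age = c(21, 40),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(id = c(1, 1, 1, 2),
                   interests = c("football", "basketball", "soccer", "pingpang ball"),
                   stringsAsFactors = FALSE)

# Collapse all interests per id into one parenthesised string
dat3 <- aggregate(interests ~ id, data = dat2,
                  FUN = function(x) paste0("(", paste(x, collapse = ", "), ")"))

# Attach the collapsed interests to the person table
result <- merge(dat1, dat3, by = "id")
result
```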
