How to delete pairs of duplicates across multiple columns in R

I have a data frame as below:
dat <- data.frame(
V1=c("A","B","C","A"),
V2=c("B","C","D","B"),
V3=c("C","D","","C"),
V4=c("D","","","E")
)
V1 V2 V3 V4
1 A B C D
2 B C D
3 C D
4 A B C E
The values of rows 2 and 3 already appear in row 1, just shifted to different columns. How can I filter out rows 2 and 3 so that I am left with rows 1 and 4 only?

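Paste each row together into a single string. Then go through the strings and check whether each one (partially) matches any of the others using grepl. fixed = TRUE makes grepl treat the strings literally rather than as regular expressions.
S = trimws(Reduce(paste, dat), "right")
dat[sapply(seq_along(S), function(i) !any(grepl(S[i], S[-i], fixed = TRUE))), ]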
# V1 V2 V3 V4
#1 A B C D
#4 A B C E
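The same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the name drop_contained_rows is hypothetical, empty strings are taken to mean "no value", and each pasted row is padded with spaces so that only whole values are compared (so "B" would not match inside "AB").

```r
dat <- data.frame(
  V1 = c("A", "B", "C", "A"),
  V2 = c("B", "C", "D", "B"),
  V3 = c("C", "D", "",  "C"),
  V4 = c("D", "",  "",  "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows whose non-empty values appear, in the same
# order and contiguously, inside another row.
drop_contained_rows <- function(df) {
  # Collapse each row to one string, skipping empty cells
  keys <- apply(df, 1, function(r) paste(r[r != ""], collapse = " "))
  # Pad with spaces so that only whole values can match
  padded <- paste0(" ", keys, " ")
  keep <- vapply(seq_along(padded), function(i) {
    !any(grepl(padded[i], padded[-i], fixed = TRUE))
  }, logical(1))
  df[keep, , drop = FALSE]
}

res <- drop_contained_rows(dat)
res
```

Note that two exactly identical rows would each be found inside the other, so both would be dropped; if that matters, deduplicate first with unique().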

Related

R: How to replace a set of values in a dataframe [duplicate]

Complicated title but here is a simple example of what I am trying to achieve:
d <- data.frame(v1 = c(1,2,3,4,5,6,7,8),
v2 = c("A","E","C","B","B","C","A","E"))
m <- data.frame(v3 = c("D","E","A","C","D","B"),
v4 = c("d","e","a","c","d","b"))
Values in d$v2 should be replaced by the values in m$v4, found by looking up each value of d$v2 in m$v3.
The resulting data frame d should look like:
v1 v4
1 a
2 e
3 c
4 b
5 b
6 c
7 a
8 e
I tried several things, and the closest I came was: d$v2 <- m$v4[which(m$v3 %in% d$v2)]
I would like to avoid for-loops. It must be possible somehow.
You could try:
merge(d,m, by.x="v2", by.y="v3")
v2 v1 v4
1 A 1 a
2 A 7 a
3 B 4 b
4 B 5 b
5 C 3 c
6 C 6 c
7 E 2 e
8 E 8 e
Edit
Here is another approach, to preserve the order:
data.frame(v1=d$v1, v4=m[match(d$v2, m$v3), 2])
v1 v4
1 1 a
2 2 e
3 3 c
4 4 b
5 5 b
6 6 c
7 7 a
8 8 e
You could use a standard left join.
Loading the data:
d <- data.frame(v1 = c(1,2,3,4,5,6,7,8), v2 = c("A","E","C","B","B","C","A","E"), stringsAsFactors=FALSE)
m <- data.frame(v3 = c("D","E","A","C","D","B"), v4 = c("d","e","a","c","d","b"), stringsAsFactors=FALSE)
Rename the columns of m so that both data frames share the key column "v2":
colnames(m) <- c("v2", "v4")
Left join, which keeps the row order of data frame d:
library(dplyr)
left_join(d, m, by = "v2")
Output:
v1 v2 v4
1 1 A a
2 2 E e
3 3 C c
4 4 B b
5 5 B b
6 6 C c
7 7 A a
8 8 E e
This will give you the desired output:
d$v2 <- m$v4[match(d$v2, m$v3)]
match() returns, for each value in d$v2, the position of its first match in m$v3 (or NA if there is none). Once you have these indices, use them to pick the corresponding elements of m$v4 and assign them to column v2 of data frame d.
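To make the lookup concrete, the sketch below prints the intermediate index vector. The second half is an addition that is not part of the original answers: it assumes a hypothetical key "Z" that is missing from m$v3 and keeps the original value instead of NA in that case.

```r
d <- data.frame(v1 = 1:8,
                v2 = c("A", "E", "C", "B", "B", "C", "A", "E"),
                stringsAsFactors = FALSE)
m <- data.frame(v3 = c("D", "E", "A", "C", "D", "B"),
                v4 = c("d", "e", "a", "c", "d", "b"),
                stringsAsFactors = FALSE)

# Position of each d$v2 value in m$v3 (first match only)
idx <- match(d$v2, m$v3)
idx
m$v4[idx]

# Assumption for illustration: a key that does not occur in m$v3
d2 <- d
d2$v2[3] <- "Z"
idx2 <- match(d2$v2, m$v3)            # NA where there is no match
replaced <- ifelse(is.na(idx2), d2$v2, m$v4[idx2])
replaced
```

Because match() only reports the first hit, repeated keys in m$v3 (such as "D" here) silently use the first row.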

R Ifelse: Find if any column meet the condition

I'm trying to apply the same condition to multiple columns of a data frame and then create a new column that flags whether any of those columns meets the condition.
I can do it manually with an OR statement, but I was wondering if there is an easy way to apply it for more columns.
An example:
data <- data.frame(V1=c("A","B"),V2=c("A","A","A","B","B","B"),V3=c("A","A","B","B","A","A"))
data[4] <- ifelse((data[1]=="A"|data[2]=="A"|data[3]=="A"),1,0)
So the 4th row is the only one in which none of the columns meets the condition:
V1 V2 V3 V1
1 A A A 1
2 B A A 1
3 A A B 1
4 B B B 0
5 A B A 1
6 B B A 1
Do you know a way to apply the condition with shorter code?
I tried something like
data[4] <- ifelse(any(data[,c(1:3)]=="A"),1,0)
but any() evaluates the condition over the whole data set instead of row by row, so every row is given 1.
We can use Reduce with lapply
data$NewCol <- +( Reduce(`|`, lapply(data, `==`, 'A')))
We can use apply row-wise :
data$ans <- +(apply(data[1:3] == "A", 1, any))
data
# V1 V2 V3 ans
#1 A A A 1
#2 B A A 1
#3 A A B 1
#4 B B B 0
#5 A B A 1
#6 B B A 1
Try:
data$V4 <- +(rowSums(data == 'A') > 0)
Output:
V1 V2 V3 V4
1 A A A 1
2 B A A 1
3 A A B 1
4 B B B 0
5 A B A 1
6 B B A 1
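The three answers above compute the same row-wise "any" flag. The sketch below runs them side by side on an explicitly chosen set of columns, so that a flag column added earlier is not included in the test by accident. The names cols, hits and any_A are illustrative only.

```r
data <- data.frame(V1 = rep(c("A", "B"), 3),
                   V2 = c("A", "A", "A", "B", "B", "B"),
                   V3 = c("A", "A", "B", "B", "A", "A"),
                   stringsAsFactors = FALSE)

cols <- c("V1", "V2", "V3")        # columns to test, chosen explicitly
hits <- data[cols] == "A"          # logical matrix, one column per variable

by_reduce  <- Reduce(`|`, lapply(data[cols], `==`, "A"))  # column-wise OR
by_apply   <- apply(hits, 1, any)                         # row-wise any()
by_rowsums <- rowSums(hits) > 0                           # count TRUEs per row

data$any_A <- as.integer(by_rowsums)
data
```

rowSums() is usually the fastest of the three on large data because it is vectorised over the whole matrix, while apply() calls any() once per row.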

R, 2 dataframe, check if column exist and create new variable if not

I have 2 data frames that need to have exactly the same columns so that I can stack them vertically later. What I currently do is manually check whether df1 has each column of df2 and, if not, create the column and fill it with NA, like this: df1$v3 <- NA.
However, df1 has far fewer columns than df2, so doing this by hand makes the code long and clumsy. I wonder if there is an efficient way to do it.
Here is an example:
v1<-c(1:5)
v2<-c("a", "b", "c", "d", "e")
df1<-data.frame(v1,v2)
v3=c("de890","gyu","71g", "178sg", "ss10")
df2<-data.frame(v1,v2,v3)
df1
v1 v2
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
df2
v1 v2 v3
1 1 a de890
2 2 b gyu
3 3 c 71g
4 4 d 178sg
5 5 e ss10
Since df1 doesn't have a v3 column, I want to create one named v3 and fill it with NA, so that df1 finally looks like this:
df1
v1 v2 v3
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA
Could you please shed some light on how to do this efficiently?
Thank you for all your help!
I'm not sure why you need to create extra columns and fill them with NA. However, you could do this:
v1<-c(1:5)
v2<-c("a", "b", "c", "d", "e")
df1<-data.frame(v1,v2)
v3<-c("de890","gyu","71g", "178sg", "ss10")
v4<-c(1:5)
df2<-data.frame(v1,v2,v3,v4)
# Finding the columns not found in df1, but df2
cols<-setdiff(names(df2),names(df1))
# Looping to create them in df1
for(i in cols){df1[[i]]<-NA}
As @LAP has mentioned, merge automatically adds columns with NA. If the OP still wants to add the columns beforehand, this can be achieved as follows:
df1[,setdiff(names(df2),names(df1))] <- NA
df1
# v1 v2 v3
# 1 1 a NA
# 2 2 b NA
# 3 3 c NA
# 4 4 d NA
# 5 5 e NA
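Putting the setdiff() step together with the vertical stacking the question is ultimately after, a minimal sketch could look like this. The helper name add_missing_cols is hypothetical, and the sketch assumes that df1 has no columns that df2 lacks (any such columns would be dropped by the final reordering).

```r
df1 <- data.frame(v1 = 1:5, v2 = c("a", "b", "c", "d", "e"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = 1:5, v2 = c("a", "b", "c", "d", "e"),
                  v3 = c("de890", "gyu", "71g", "178sg", "ss10"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: give `x` every column that `template` has and `x` lacks
add_missing_cols <- function(x, template) {
  to_add <- setdiff(names(template), names(x))
  if (length(to_add) > 0) {
    x[to_add] <- NA               # new columns, filled with NA
  }
  x[names(template)]              # same column order as the template
}

df1 <- add_missing_cols(df1, df2)
stacked <- rbind(df1, df2)        # columns now line up, so rbind works
stacked
```

rbind() matches data frame columns by name, and the all-NA logical column is coerced to the type of the column it is stacked with.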

melt data frame and split values

I have the following data frame with measurements concatenated into a single column, separated by some delimiter:
df <- data.frame(v1=c(1,2), v2=c("a;b;c", "d;e;f"))
df
v1 v2
1 1 a;b;c
2 2 d;e;f;g
I would like to melt/transform it into the following format:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
7 2 g
Is there an elegant solution?
Thx!
You can split the strings with strsplit.
Split the strings in the second column:
splitted <- strsplit(as.character(df$v2), ";")
Create a new data frame:
data.frame(v1 = rep.int(df$v1, sapply(splitted, length)), v2 = unlist(splitted))
The result (for df exactly as constructed by the code in the question, where the second string is "d;e;f"):
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
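For completeness, here is the same approach as one runnable sketch. It assumes the second string is "d;e;f;g", as in the printed data frame and the desired output of the question, and it uses lengths() (available since R 3.2.0) instead of sapply(splitted, length).

```r
df <- data.frame(v1 = c(1, 2), v2 = c("a;b;c", "d;e;f;g"),
                 stringsAsFactors = FALSE)

# One character vector per row; fixed = TRUE treats ";" literally
parts <- strsplit(df$v2, ";", fixed = TRUE)

long <- data.frame(
  v1 = rep(df$v1, lengths(parts)),   # repeat each v1 once per piece
  v2 = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

If df$v2 is a factor (the default for strings before R 4.0.0), wrap it in as.character() before splitting, as the answer above does.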
