R: How to replace a set of values in a dataframe [duplicate]

Complicated title but here is a simple example of what I am trying to achieve:
d <- data.frame(v1 = c(1,2,3,4,5,6,7,8),
v2 = c("A","E","C","B","B","C","A","E"))
m <- data.frame(v3 = c("D","E","A","C","D","B"),
v4 = c("d","e","a","c","d","b"))
Values in d$v2 should be replaced by the corresponding values in m$v4, found by matching the values of d$v2 against m$v3.
The resulting data frame d should look like:
v1 v4
1 a
2 e
3 c
4 b
5 b
6 c
7 a
8 e
I tried different stuff and the closest I came was: d$v2 <- m$v4[which(m$v3 %in% d$v2)]
I'm trying to avoid for-loops this time. It must be possible somehow! :-)
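To see why the attempt above does not line up with d, it helps to compare what which(%in%) and match() return on the example data. This is a small sketch using the question's data, with stringsAsFactors = FALSE added (an assumption, so the columns are plain character vectors).

```r
# Example data from the question, kept as character vectors
d <- data.frame(v1 = 1:8,
                v2 = c("A","E","C","B","B","C","A","E"),
                stringsAsFactors = FALSE)
m <- data.frame(v3 = c("D","E","A","C","D","B"),
                v4 = c("d","e","a","c","d","b"),
                stringsAsFactors = FALSE)

# %in% asks "is each m$v3 present somewhere in d$v2?", so which() returns
# positions in m (one per matching lookup row), not one position per row of d
idx_in <- which(m$v3 %in% d$v2)
idx_in            # 2 3 4 6: only four indices for eight rows of d

# match() asks "where in m$v3 is each d$v2?", giving one index per row of d
idx_match <- match(d$v2, m$v3)
idx_match         # 3 2 4 6 6 4 3 2
m$v4[idx_match]   # "a" "e" "c" "b" "b" "c" "a" "e"
```

With which(%in%), the four values are simply recycled over the eight rows, which is why the result looks almost right but is in the wrong order.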

You could try:
merge(d,m, by.x="v2", by.y="v3")
v2 v1 v4
1 A 1 a
2 A 7 a
3 B 4 b
4 B 5 b
5 C 3 c
6 C 6 c
7 E 2 e
8 E 8 e
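Note that merge() sorts the result by the key column, so the rows no longer follow d. A minimal sketch of restoring the original order, assuming v1 is a unique row id (as it is in the example) and adding all.x = TRUE so that rows of d without a match are kept as NA:

```r
d <- data.frame(v1 = 1:8,
                v2 = c("A","E","C","B","B","C","A","E"),
                stringsAsFactors = FALSE)
m <- data.frame(v3 = c("D","E","A","C","D","B"),
                v4 = c("d","e","a","c","d","b"),
                stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of d, like a left join; the result is sorted by v2
res <- merge(d, m, by.x = "v2", by.y = "v3", all.x = TRUE)

# Put the rows back in d's original order (v1 is the row id here) and drop the key
res <- res[order(res$v1), c("v1", "v4")]
rownames(res) <- NULL
res
```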
Edit
Here is another approach, to preserve the order:
data.frame(v1=d$v1, v4=m[match(d$v2, m$v3), 2])
v1 v4
1 1 a
2 2 e
3 3 c
4 4 b
5 5 b
6 6 c
7 7 a
8 8 e

You could use a standard left join.
Loading the data:
d <- data.frame(v1 = c(1,2,3,4,5,6,7,8), v2 = c("A","E","C","B","B","C","A","E"), stringsAsFactors=FALSE)
m <- data.frame(v3 = c("D","E","A","C","D","B"), v4 = c("d","e","a","c","d","b"), stringsAsFactors=FALSE)
Changing column name, such that I can join by column "v2"
colnames(m) <- c("v2", "v4")
Left joining and maintaining the order of data.frame d
library(dplyr)
left_join(d, m, by = "v2")
Output:
v1 v2 v4
1 1 A a
2 2 E e
3 3 C c
4 4 B b
5 5 B b
6 6 C c
7 7 A a
8 8 E e
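One thing to watch with any join: m$v3 contains "D" twice. It does no harm here because d has no "D", but a join returns one row per matching pair, so such a value would duplicate rows of d, whereas match() always takes the first hit. The sketch below uses base merge(all.x = TRUE) as a stand-in for left_join() so it runs without dplyr; d2 is a hypothetical variant of d that contains "D".

```r
# Hypothetical variant of d that also contains "D", which appears twice in m$v3
d2 <- data.frame(v1 = 1:3, v2 = c("A", "D", "E"), stringsAsFactors = FALSE)
m  <- data.frame(v3 = c("D","E","A","C","D","B"),
                 v4 = c("d","e","a","c","d","b"),
                 stringsAsFactors = FALSE)

# A join returns one row per matching (d2, m) pair, so the "D" row is duplicated
joined <- merge(d2, m, by.x = "v2", by.y = "v3", all.x = TRUE)
nrow(joined)          # 4, not 3

# Deduplicating the lookup table first gives one row per row of d2 again
m_unique <- m[!duplicated(m$v3), ]
joined_unique <- merge(d2, m_unique, by.x = "v2", by.y = "v3", all.x = TRUE)
nrow(joined_unique)   # 3

# match() never duplicates rows: it returns the position of the first match
m$v4[match(d2$v2, m$v3)]   # "a" "d" "e"
```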

This will give you the desired output:
d$v2 <- m$v4[match(d$v2, m$v3)]
The match function returns, for each value in d$v2, the position of its first match in the v3 column of m (m is a data frame, not a matrix). Once you have these indices, use them to pick the corresponding elements of m$v4 and assign the result to d$v2, replacing its original values.
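If d$v2 can contain values that do not appear in m$v3, match() returns NA for them and the assignment above overwrites those cells with NA. A small sketch that keeps the original value instead; the extra value "Z" is hypothetical, and the named-vector lookup at the end is an equivalent alternative to match(), not something from the answer above.

```r
d <- data.frame(v1 = 1:8,
                v2 = c("A","E","C","B","B","C","A","E"),
                stringsAsFactors = FALSE)
m <- data.frame(v3 = c("D","E","A","C","D","B"),
                v4 = c("d","e","a","c","d","b"),
                stringsAsFactors = FALSE)

# Hypothetical extra value "Z" that has no entry in m$v3
d$v2[8] <- "Z"

idx <- match(d$v2, m$v3)   # NA wherever there is no match
new_v2 <- m$v4[idx]        # indexing with NA yields NA
# Keep the original value where the lookup failed instead of writing NA
d$v2 <- ifelse(is.na(idx), d$v2, new_v2)
d$v2   # "a" "e" "c" "b" "b" "c" "a" "Z"

# Equivalent lookup with a named vector: names are keys, values are replacements
lookup <- setNames(m$v4, m$v3)
unname(lookup[c("A", "E", "C")])   # "a" "e" "c"
```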

Related

Simplest way to replace a list of values in a data frame with a list of new values

Say we have a data frame with a factor (Group) that is a grouping variable for a list of IDs:
set.seed(123)
data <- data.frame(Group = factor(sample(5,10, replace = TRUE)),
ID = c(1:10))
In this example, the IDs belong to one of 5 groups, labeled 1:5. We simply want to replace 1:5 with A:E. In other words, if Group == 1, we want to change it to A; if Group == 2, we want to change it to B; and so on. What is the simplest way to achieve this?
You can call factor once again and supply the new labels through its labels= argument, here written as a named list.
data$Group1 <- factor(data$Group, labels=list("1"="A", "2"="B", "3"="C", "4"="D", "5"="E"))
## more succinct:
data$Group2 <- factor(data$Group, labels=setNames(list("A", "B", "C", "D", "E"), 1:5))
data
# Group ID Group1 Group2
# 1 3 1 C C
# 2 3 2 C C
# 3 2 3 B B
# 4 2 4 B B
# 5 3 5 C C
# 6 5 6 E E
# 7 4 7 D D
# 8 1 8 A A
# 9 2 9 B B
# 10 3 10 C C
This works in general; if capital letters are indeed what you want, see @RonakShah's solution.
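One detail worth knowing: factor() assigns labels to the existing levels by position (levels sorted as "1", "2", ...), and the names in the list above only document the intended mapping. If you want the mapping to be applied by name, one option is to pass the names as levels=. A small sketch; map and grp are made-up example objects, not the sampled data from the question.

```r
# Hypothetical mapping from old level to new label, written as a named vector
map <- c("1" = "A", "2" = "B", "3" = "C", "4" = "D", "5" = "E")

# Small example grouping factor (level "4" does not occur)
grp <- factor(c(3, 3, 2, 5, 1))

# levels = names(map) fixes which old level each label belongs to, so the
# mapping is by name rather than by the position of the sorted levels
new_grp <- factor(grp, levels = names(map), labels = map)
as.character(new_grp)   # "C" "C" "B" "E" "A"
levels(new_grp)         # "A" "B" "C" "D" "E"
```

Without levels=, factor(grp, labels = ...) on this grp would fail, because grp has only four levels but five labels are supplied.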
You can use R's built-in constant LETTERS:
data$new_group <- LETTERS[data$Group]
data
# Group ID new_group
#1 3 1 C
#2 3 2 C
#3 2 3 B
#4 2 4 B
#5 3 5 C
#6 5 6 E
#7 4 7 D
#8 1 8 A
#9 2 9 B
#10 3 10 C
I created a new column (new_group) here for comparison purposes; you can overwrite the Group column instead if you prefer.
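This works because indexing a vector with a factor uses the factor's internal integer codes, and here the levels happen to be exactly 1 to 5. If some group numbers were missing, the codes and the labels would no longer agree. A quick sketch with a made-up factor g:

```r
# A grouping factor whose levels are not 1..n: only "2" and "5" occur
g <- factor(c(2, 5, 5, 2))

# Indexing with a factor uses its internal codes (1 and 2 here), not the labels
LETTERS[g]                             # "A" "B" "B" "A"

# Converting the labels to numbers first indexes by the actual group value
LETTERS[as.integer(as.character(g))]   # "B" "E" "E" "B"
```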

How to delete pairs of duplicates across multiple columns in R

I have a data frame as below:
dat <- data.frame(
V1=c("A","B","C","A"),
V2=c("B","C","D","B"),
V3=c("C","D","","C"),
V4=c("D","","","E")
)
V1 V2 V3 V4
1 A B C D
2 B C D
3 C D
4 A B C E
Rows 2 and 3 are contained in row 1, just shifted to different columns. How can I filter out rows 2 and 3 so that only rows 1 and 4 remain?
Paste each row together into a single string. Then go through these strings and use grepl to check whether each one (partially) matches any of the other rows' strings; keep only the rows for which it does not.
S = trimws(Reduce(paste, dat), "right")
dat[sapply(seq_along(S), function(i) !any(grepl(S[i], S[-i], fixed = TRUE))),]
# V1 V2 V3 V4
#1 A B C D
#4 A B C E
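The same idea wrapped in a small reusable function, as a sketch. The name drop_contained_rows is hypothetical, and it assumes, as in the example, that empty cells only occur at the end of a row (an empty cell in the middle would leave a double space in the pasted string).

```r
dat <- data.frame(
  V1 = c("A","B","C","A"),
  V2 = c("B","C","D","B"),
  V3 = c("C","D","","C"),
  V4 = c("D","","","E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows whose pasted contents occur inside another row
drop_contained_rows <- function(df, sep = " ") {
  # One string per row, with trailing separators from empty cells removed
  s <- trimws(do.call(paste, c(df, sep = sep)), which = "right")
  keep <- vapply(seq_along(s), function(i) {
    # fixed = TRUE treats the row string literally, not as a regular expression
    !any(grepl(s[i], s[-i], fixed = TRUE))
  }, logical(1))
  df[keep, , drop = FALSE]
}

drop_contained_rows(dat)   # keeps rows 1 and 4
```

Be aware that two identical rows are each contained in the other, so both would be dropped; remove exact duplicates with unique() first if that matters.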

Assigning rows of data.frame to another data.frame in R based on frequency of element's occurrence

I have a data.frame df
> df
V1 V2
1 a b
2 a e
3 a f
4 b c
5 b e
6 b f
7 c d
8 c g
9 c h
10 d g
11 d h
12 e f
13 f g
14 g h
I found the frequency of each element's occurrence considering column V1 and sorted the Freq column in ascending order
>dfFreq <- as.data.frame(table(df$V1))
Var1 Freq
1 a 3
2 b 3
3 c 3
4 d 2
5 e 1
6 f 1
7 g 1
>dfFreqSorted <- dfFreq[order(dfFreq$Freq),]
Var1 Freq
5 e 1
6 f 1
7 g 1
4 d 2
1 a 3
2 b 3
3 c 3
Now what I want to do is create a new data.frame from the original one in which each Var1 item of dfFreqSorted is used as many times as its Freq, but only once per pass, going from the top of dfFreqSorted to the bottom. That would give the result below.
Consider the first Var1 item, "e": the first row of df with "e" in column V1 is (e, f), so that would be the first row of the new data.frame.
I figured that this can be done using:
> subset(df, V1==dfFreqSorted$Var1[1])[1,]
V1 V2
12 e f
So if I looped over all the elements in the Var1 column of dfFreqSorted with a for loop, applied the subset command above to each of them, and rbind-ed the results into another data.frame, I would get something like this:
V1 V2
12 e f
13 f g
14 g h
10 d g
1 a b
4 b c
7 c d
This result uses each Var1 item once. I also need the remaining rows: after the first pass over all Var1 items, the loop should start again from the top, consider only the Var1 items whose frequency is greater than 1, and fetch the next unused row from df for each of them. The remaining rows should be appended to the same data.frame, as shown below:
11 d h
2 a e
5 b e
8 c g
3 a f
6 b f
9 c h
As you can see, in the first pass every Var1 item is used once, starting with those of frequency 1; in the second pass only the items with frequency greater than 1 (here 2 and 3) are used; in the third pass only those with frequency greater than 2 (here 3). Each time, the next unused row for that item is fetched from df.
In short, all rows of df are rearranged into a new data.frame so that items are taken in ascending order of frequency, one row per item per pass, until each item has been used as many times as its frequency.
I am not asking for the whole code, just a few guidelines on how I can achieve this. Thanks in advance.
Hello @akrun, I am a beginner, so this is probably a beginner-level approach, but it solved my problem perfectly.
> a<-read.table("isnodes.txt")
> a
V1 V2
1 a b
2 a e
3 a f
4 b c
5 b e
6 b f
7 c d
8 c g
9 c h
10 d g
11 d h
12 e f
13 f g
14 g h
> aF<-as.data.frame(table(a$V1))
> aF
Var1 Freq
1 a 3
2 b 3
3 c 3
4 d 2
5 e 1
6 f 1
7 g 1
> aFsorted <- aF[order(aF$Freq),]
> aFsorted
Var1 Freq
5 e 1
6 f 1
7 g 1
4 d 2
1 a 3
2 b 3
3 c 3
> sortedEdgeList <- a[-c(1:nrow(a)),]
> sortedEdgeList
[1] V1 V2
<0 rows> (or 0-length row.names)
> aFsorted <- cbind(aFsorted, Used=0)
> aFsorted
Var1 Freq Used
5 e 1 0
6 f 1 0
7 g 1 0
4 d 2 0
1 a 3 0
2 b 3 0
3 c 3 0
> maxFreq <- max(aFsorted$Freq)
> maxFreq
[1] 3
> for(i in 1:maxFreq){
+ rows<-nrow(aFsorted)
+ for(j in 1:rows){
+ Var1Value<-aFsorted$Var1[j]
+ Var1Edge<-a[match(aFsorted$Var1[j],a$V1),]
+ sortedEdgeList<-rbind(sortedEdgeList,Var1Edge)
+ a<-a[!(a$V1==Var1Edge$V1 & a$V2==Var1Edge$V2),]
+ aFsorted$Used[j]=aFsorted$Used[j]+1
+ }
+ # keep only the values that still have unused rows (no if() needed)
+ aFsorted<-aFsorted[aFsorted$Used<aFsorted$Freq,]
+ }
> sortedEdgeList
V1 V2
12 e f
13 f g
14 g h
10 d g
5 a b
4 b c
7 c d
11 d h
2 a e
51 b e
8 c g
3 a f
6 b f
9 c h
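For reference, the same ordering can also be produced without a loop: number each row within its V1 group (1st, 2nd, 3rd occurrence), then sort by that pass number, then by frequency, then by V1 to mirror the alphabetical order of table(). This is a sketch of a swapped-in technique using base ave() and order(); the column names freq and pass are made up for illustration.

```r
df <- data.frame(
  V1 = c("a","a","a","b","b","b","c","c","c","d","d","e","f","g"),
  V2 = c("b","e","f","c","e","f","d","g","h","g","h","f","g","h"),
  stringsAsFactors = FALSE
)

# How often each V1 value occurs (same information as table(df$V1))
df$freq <- ave(seq_along(df$V1), df$V1, FUN = length)
# Which occurrence of its V1 value each row is: 1st, 2nd, 3rd, ...
df$pass <- ave(seq_along(df$V1), df$V1, FUN = seq_along)

# Pass first (every value once, then the second rows, ...), then frequency,
# then V1 alphabetically to break ties like the sorted table() output
out <- df[order(df$pass, df$freq, df$V1), c("V1", "V2")]
out
```

On this data the rows come out as e f, f g, g h, d g, a b, b c, c d, d h, a e, b e, c g, a f, b f, c h, the same sequence as the loop above.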
I'm not sure this is what you want, but it might be close. It helps conceptually to keep the frequencies in the original data frame.
library("plyr")
set.seed(3)
df <- data.frame(V1 = sample(letters[1:10], 20, replace = TRUE),
V2 = sample(letters[1:10], 20, replace = TRUE),
stringsAsFactors = FALSE)
df$freqV1 <- NA_integer_
for (i in 1:nrow(df)) {
df$freqV1[i] <- length(grep(pattern = df$V1[i], x = df$V1))
}
df2 <- arrange(df, freqV1, V2) # you may want just arrange(df, freqV1)
which gives:
V1 V2 freqV1
1 h c 1
2 d a 2
3 d b 2
4 c c 2
5 c j 2
6 b c 3
7 g c 3
8 b f 3
9 g h 3
10 g h 3
11 b i 3
12 i a 4
13 i c 4
14 i d 4
15 i f 4
16 f b 5
17 f d 5
18 f d 5
19 f e 5
20 f f 5
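The frequency column can also be computed in one vectorised step instead of the grep() loop. This also avoids a subtle issue: grep() counts substring matches, so a pattern "a" would also match a value such as "ab", while table() counts exact values. A sketch on a small made-up data frame, using base order() as a rough counterpart of plyr::arrange(df, freqV1, V2) (row names may differ):

```r
# Small made-up example; any data frame with a V1 column works the same way
df <- data.frame(V1 = c("b", "a", "b", "c", "b", "a"),
                 V2 = c("x", "y", "z", "x", "y", "z"),
                 stringsAsFactors = FALSE)

# Count of each row's V1 value: look each value up by name in the table
counts <- table(df$V1)
df$freqV1 <- as.integer(counts[df$V1])

# Sort by frequency, then by V2
df2 <- df[order(df$freqV1, df$V2), ]
df2
```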

melt data frame and split values

I have the following data frame with measurements concatenated into a single column, separated by some delimiter:
df <- data.frame(v1=c(1,2), v2=c("a;b;c", "d;e;f;g"))
df
v1 v2
1 1 a;b;c
2 2 d;e;f;g
I would like to melt/transform it into the following format:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
7 2 g
Is there an elegant solution?
Thx!
You can split the strings with strsplit.
Split the strings in the second column:
splitted <- strsplit(as.character(df$v2), ";")
Create a new data frame:
data.frame(v1 = rep.int(df$v1, sapply(splitted, length)), v2 = unlist(splitted))
The result:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
7 2 g
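A slightly more compact variant of the same approach, as a sketch: lengths() (available since R 3.2.0) replaces sapply(splitted, length), and fixed = TRUE tells strsplit that ";" is a literal delimiter rather than a regular expression.

```r
df <- data.frame(v1 = c(1, 2), v2 = c("a;b;c", "d;e;f;g"),
                 stringsAsFactors = FALSE)

# One character vector of pieces per row
parts <- strsplit(df$v2, ";", fixed = TRUE)
# Repeat each v1 once per piece, then flatten the pieces in the same order
long <- data.frame(v1 = rep(df$v1, lengths(parts)),
                   v2 = unlist(parts),
                   stringsAsFactors = FALSE)
long
```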
