Use conditions from multiple variables to replace a variable in R - r

I did some searching but could not find the right keywords to phrase my question, so I will try asking it here.
I am dealing with a data frame in R that has two variables representing the identity of the data points. In the following example, A and 1 represent the same individual, B and 2 are the same, and so are C and 3, but the two kinds of identifier are mixed together in the original data.
ID1 ID2 Value
A 1 0.5
B 2 0.8
C C 0.7
A A 0.6
B 2 0.3
3 C 0.4
2 2 0.3
1 A 0.4
3 3 0.6
What I want to achieve is to unify the identity by using only one of the identifiers so it can be either:
ID1 ID2 Value ID
A 1 0.5 A
B 2 0.8 B
C C 0.7 C
A A 0.6 A
B 2 0.3 B
3 C 0.4 C
2 2 0.3 B
1 A 0.4 A
3 3 0.6 C
or:
ID1 ID2 Value ID
A 1 0.5 1
B 2 0.8 2
C C 0.7 3
A A 0.6 1
B 2 0.3 2
3 C 0.4 3
2 2 0.3 2
1 A 0.4 1
3 3 0.6 3
I can probably achieve this with the ifelse function, but that means writing two ifelse statements for each condition, which does not seem efficient, so I was wondering if there is a better way to do it. Here is the example data set.
df=data.frame(ID1=c("A","B","C","A","B","3","2","1","3"),
ID2=c("1","2","C","A","2","C","2","A","3"),
Value=c(0.5,0.8,0.7,0.6,0.3,0.4,0.3,0.4,0.6))
Thank you so much for the help!
Edit:
To clarify, the two identifiers in my real data are longer text strings instead of just ABC and 123. Sorry I did not make that clear.

One option is to detect the elements that consist only of digits, convert them to integer, and then look up the corresponding LETTERS in case_when:
library(dplyr)
library(stringr)
df %>%
  # '^\\d+$' matches values made up only of digits
  mutate(ID = case_when(str_detect(ID1, '^\\d+$') ~
    LETTERS[suppressWarnings(as.integer(ID1))], TRUE ~ ID1))
# ID1 ID2 Value ID
#1 A 1 0.5 A
#2 B 2 0.8 B
#3 C C 0.7 C
#4 A A 0.6 A
#5 B 2 0.3 B
#6 3 C 0.4 C
#7 2 2 0.3 B
#8 1 A 0.4 A
#9 3 3 0.6 C
Or more compactly
df %>%
mutate(ID = coalesce(LETTERS[as.integer(ID1)], ID1))
If we have different sets of values, then create a key/value dataset and do a join
keyval <- data.frame(ID1 = c('1', '2', '3'), ID = c('A', 'B', 'C'))
left_join(df, keyval, by = 'ID1') %>% mutate(ID = coalesce(ID, ID1))
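Since the real identifiers are longer strings, the same key/value idea can also be sketched in base R with a named lookup vector. This is only a minimal sketch: `lookup` and `mapped` are names made up for illustration, and it assumes every alias that should be translated appears as a name in `lookup`.

```r
# Example data from the question
df <- data.frame(
  ID1 = c("A", "B", "C", "A", "B", "3", "2", "1", "3"),
  ID2 = c("1", "2", "C", "A", "2", "C", "2", "A", "3"),
  Value = c(0.5, 0.8, 0.7, 0.6, 0.3, 0.4, 0.3, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the aliases, values the unified IDs
lookup <- c("1" = "A", "2" = "B", "3" = "C")

# Indexing by name gives NA for anything that is not an alias
mapped <- unname(lookup[df$ID1])

# Keep the original value wherever no translation was found
df$ID <- ifelse(is.na(mapped), df$ID1, mapped)
df
```

Anything not found in `lookup` is kept as it is, so only the aliases need to be listed.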

A base R option using replace
within(
df,
ID <- replace(
ID1,
!ID1 %in% LETTERS,
LETTERS[as.numeric(ID1[!ID1 %in% LETTERS])]
)
)
or ifelse
within(
df,
ID <- suppressWarnings(ifelse(ID1 %in% LETTERS,
ID1,
LETTERS[as.integer(ID1)]
))
)
which gives
ID1 ID2 Value ID
1 A 1 0.5 A
2 B 2 0.8 B
3 C C 0.7 C
4 A A 0.6 A
5 B 2 0.3 B
6 3 C 0.4 C
7 2 2 0.3 B
8 1 A 0.4 A
9 3 3 0.6 C
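If the second form of the desired output (numeric identifiers) is wanted instead, a rough base R sketch is to look up each letter's position with match. It assumes, as in the example, that the letter identifiers are single capital letters and that the number belonging to each letter is its position in LETTERS; `idx` is just an illustrative name.

```r
# Example data from the question
df <- data.frame(
  ID1 = c("A", "B", "C", "A", "B", "3", "2", "1", "3"),
  ID2 = c("1", "2", "C", "A", "2", "C", "2", "A", "3"),
  Value = c(0.5, 0.8, 0.7, 0.6, 0.3, 0.4, 0.3, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# Position of each ID1 value in LETTERS; NA for values that are not letters
idx <- match(df$ID1, LETTERS)

# Letters become their position, everything else is kept unchanged
df$ID <- ifelse(is.na(idx), df$ID1, as.character(idx))
df
```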

Related

subsetting large data frames with condition

I have got the following dataset:
ID s1 s2 s3
A 0.6 1 0.3
B 3 0.4 0.4
C 3 2 1
D 0 0.3 0.2
E 3 2 0.1
I would like to retain the rows that have a value >= 0.5 in at least two of the three samples.
So, the new data frame would be:
ID s1 s2 s3
A 0.6 1 0.3
C 3 2 1
E 3 2 0.1
Thanks in advance
You can do
df[rowSums(df[-1] >= 0.5) >= 2, ]
# ID s1 s2 s3
#1 A 0.6 1 0.3
#3 C 3.0 2 1.0
#5 E 3.0 2 0.1
We create a logical matrix df[-1] >= 0.5 (>= so that a value of exactly 0.5 counts, as asked) and use rowSums to check that at least two values per row are TRUE.
data
df <- read.table(text="ID s1 s2 s3
A 0.6 1 0.3
B 3 0.4 0.4
C 3 2 1
D 0 0.3 0.2
E 3 2 0.1", header = TRUE, stringsAsFactors = FALSE)
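To make the intermediate steps visible, here is a small sketch that splits the one-liner into pieces, using the question's >= 0.5 threshold. The names `hits`, `counts` and `kept` are only illustrative.

```r
df <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  s1 = c(0.6, 3, 3, 0, 3),
  s2 = c(1, 0.4, 2, 0.3, 2),
  s3 = c(0.3, 0.4, 1, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a sample value reaches the threshold
hits <- df[-1] >= 0.5

# Number of samples per row that reach the threshold
counts <- rowSums(hits)

# Keep rows where at least two samples qualify
kept <- df[counts >= 2, ]
kept
```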

R: merge two lists of matched dataframes

I have two lists that contain the same number of dataframes, and the order of the dataframes in both lists indicates which dataframes belong together. In other words, the first dataframe in the first list goes together with the first dataframe in the second list, the second one with the second, and so on. I want to merge the dataframes in both lists with each other, but only the dataframes that belong together. Let's say the first list has these three dataframes:
df1:
id var1
1 0.2
2 0.1
3 0.4
4 0.3
df2:
id var1
1 0.2
6 0.5
df3:
id var1
1 0.2
3 0.1
6 0.4
And the second list has the following dataframes:
df1:
id var2
1 A
2 B
3 C
4 C
df2:
id var2
1 B
6 B
df3:
id var2
1 A
3 D
6 D
I would like to merge them based on the variable "id", and the end result then to be the following:
df1:
id var1 var2
1 0.2 A
2 0.1 B
3 0.4 C
4 0.3 C
df2:
id var1 var2
1 0.2 B
6 0.5 B
df3:
id var1 var2
1 0.2 A
3 0.1 D
6 0.4 D
How do I do this?
First list of data frames (the ones with var1):
list1 <- list(df1, df2, df3)
Second list of data frames (the ones with var2, which need to be stored under their own names):
list2 <- list(df1, df2, df3)
Result:
lapply(seq_along(list1), function(x) merge(list1[[x]], list2[[x]], by = 'id'))
Using either base R or the tidyverse:
Map(merge,l1,l2)
library(tidyverse)
map2(l1,l2,inner_join)
# [[1]]
# id a b
# 1 1 0.1 A
# 2 2 0.2 B
#
# [[2]]
# id a b
# 1 1 0.1 A
# 2 2 0.2 B
#
# [[3]]
# id a b
# 1 1 0.1 A
# 2 2 0.2 B
#
data
l1 <- replicate(3,data.frame(id= 1:2,a=c(0.1,0.2)),F)
l1
# [[1]]
# id a
# 1 1 0.1
# 2 2 0.2
#
# [[2]]
# id a
# 1 1 0.1
# 2 2 0.2
#
# [[3]]
# id a
# 1 1 0.1
# 2 2 0.2
l2 <- replicate(3,data.frame(id= 1:2,b=c("A","B")),F)
l2
# [[1]]
# id b
# 1 1 A
# 2 2 B
#
# [[2]]
# id b
# 1 1 A
# 2 2 B
#
# [[3]]
# id b
# 1 1 A
# 2 2 B
#
Use Map like this:
Map(merge, L1, L2)
giving:
$df1
id var1 var2
1 1 0.2 A
2 2 0.1 B
3 3 0.4 C
4 4 0.3 C
$df2
id var1 var2
1 1 0.2 B
2 6 0.5 B
$df3
id var1 var2
1 1 0.2 A
2 3 0.1 D
3 6 0.4 D
Note
The input lists in reproducible form are:
Lines1 <- "df1:
id var1
1 0.2
2 0.1
3 0.4
4 0.3
df2:
id var1
1 0.2
6 0.5
df3:
id var1
1 0.2
3 0.1
6 0.4"
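Read <- function(Lines) {
  L <- readLines(textConnection(Lines))
  ix <- grep(":", L)                    # positions of the "dfN:" header lines
  nms <- sub(":", "", L[ix])            # data frame names without the colon
  g <- nms[cumsum(grepl(":", L))[-ix]]  # block name for every data line
  lapply(split(L[-ix], g), function(x) read.table(text = x, header = TRUE))
}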
L1 <- Read(Lines1)
and
Lines2 <- "df1:
id var2
1 A
2 B
3 C
4 C
df2:
id var2
1 B
6 B
df3:
id var2
1 A
3 D
6 D"
L2 <- Read(Lines2)
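For completeness, here is a sketch that builds the question's two lists directly and merges them pairwise with Map, spelling out the join column. The list names `list1` and `list2` and the result name `merged` are only illustrative, and the sketch assumes both lists are in the same order.

```r
list1 <- list(
  df1 = data.frame(id = c(1, 2, 3, 4), var1 = c(0.2, 0.1, 0.4, 0.3)),
  df2 = data.frame(id = c(1, 6), var1 = c(0.2, 0.5)),
  df3 = data.frame(id = c(1, 3, 6), var1 = c(0.2, 0.1, 0.4))
)
list2 <- list(
  df1 = data.frame(id = c(1, 2, 3, 4), var2 = c("A", "B", "C", "C"),
                   stringsAsFactors = FALSE),
  df2 = data.frame(id = c(1, 6), var2 = c("B", "B"),
                   stringsAsFactors = FALSE),
  df3 = data.frame(id = c(1, 3, 6), var2 = c("A", "D", "D"),
                   stringsAsFactors = FALSE)
)

# Map walks both lists in parallel; the i-th elements are merged on "id"
merged <- Map(function(x, y) merge(x, y, by = "id"), list1, list2)
merged
```

Map pairs the elements by position, not by name, so the result is only meaningful if the two lists really are ordered the same way.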

Binding dataframes by multiple conditions in R

I have a data frame which looks like this:
> data
Class Number
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 B 3
7 C 1
8 C 2
9 C 3
I have a reference data frame which is:
> reference
Class Number Value
1 A 1 0.5
2 B 3 0.3
I want to join these data frames to create a single data frame:
> resultdata
Class Number Value
1 A 1 0.5
2 A 2 0.0
3 A 3 0.0
4 B 1 0.0
5 B 2 0.0
6 B 3 0.3
7 C 1 0.0
8 C 2 0.0
9 C 3 0.0
How can I achieve this? Any help will be greatly appreciated
You can do
library(data.table)
setkey(setDT(reference), Class, Number)[data]
Or
setkey(setDT(data), Class, Number)[reference,
Value:= i.Value][is.na(Value), Value:=0]
# Class Number Value
#1: A 1 0.5
#2: A 2 0.0
#3: A 3 0.0
#4: B 1 0.0
#5: B 2 0.0
#6: B 3 0.3
#7: C 1 0.0
#8: C 2 0.0
#9: C 3 0.0
The basic starting point for this would be merge.
merge(data, reference, all = TRUE)
# Class Number Value
# 1 A 1 0.5
# 2 A 2 NA
# 3 A 3 NA
# 4 B 1 NA
# 5 B 2 NA
# 6 B 3 0.3
# 7 C 1 NA
# 8 C 2 NA
# 9 C 3 NA
There are many questions which show how to replace NA with 0.
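As a minimal base R sketch of both steps together (merge, then replace NA with 0), assuming the two data frames look like the ones in the question:

```r
data <- data.frame(
  Class = rep(c("A", "B", "C"), each = 3),
  Number = rep(1:3, times = 3),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  Class = c("A", "B"),
  Number = c(1L, 3L),
  Value = c(0.5, 0.3),
  stringsAsFactors = FALSE
)

# Keep every row of data; rows without a match in reference get NA
resultdata <- merge(data, reference, by = c("Class", "Number"), all.x = TRUE)

# Replace the missing values with 0
resultdata$Value[is.na(resultdata$Value)] <- 0
resultdata
```

Note that merge sorts the result by the key columns, which happens to match the original row order here.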
You can do:
library(dplyr)
left_join(data, reference) %>% (function(x) { x[is.na(x)] <- 0; x })
Or (as per @akrun's suggestion):
left_join(data, reference) %>% mutate(Value = replace(Value, is.na(Value), 0))
Which gives:
# Class Number Value
#1 A 1 0.5
#2 A 2 0.0
#3 A 3 0.0
#4 B 1 0.0
#5 B 2 0.0
#6 B 3 0.3
#7 C 1 0.0
#8 C 2 0.0
#9 C 3 0.0

How to reset row names?

Here is a sample data set:
sample1 <- data.frame(Names=letters[1:10], Values=sample(seq(0.1,1,0.1)))
When I reorder the data set, the row names are no longer in sequence:
sample1[order(sample1$Values), ]
Names Values
7 g 0.1
4 d 0.2
3 c 0.3
9 i 0.4
10 j 0.5
5 e 0.6
8 h 0.7
6 f 0.8
1 a 0.9
2 b 1.0
Desired output:
Names Values
1 g 0.1
2 d 0.2
3 c 0.3
4 i 0.4
5 j 0.5
6 e 0.6
7 h 0.7
8 f 0.8
9 a 0.9
10 b 1.0
Try the following, assuming the reordered data frame is stored in Ordersample2:
rownames(Ordersample2) <- 1:10
or more generally
rownames(Ordersample2) <- NULL
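Put together with the question's data, the NULL approach might look like the sketch below. The name `ordered` and the seed are assumptions added only to make the example reproducible.

```r
set.seed(1)  # only so the example is reproducible
sample1 <- data.frame(Names = letters[1:10], Values = sample(seq(0.1, 1, 0.1)))

# Reordering keeps the old row names attached to their rows
ordered <- sample1[order(sample1$Values), ]

# Assigning NULL resets the row names to 1, 2, ..., nrow
rownames(ordered) <- NULL
ordered
```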
In a dplyr pipeline, this worked for my use case:
df %>% as.data.frame(row.names = 1:nrow(.))

randomize or permuting values in a data.frame

I have a data.frame that looks like this (my real data frame is bigger):
df <- data.frame(A=c("a","b","c","d","e","f","g","h","i"),
B=c("1","1","1","2","2","2","3","3","3"),
C=c(0.1,0.2,0.4,0.1,0.5,0.7,0.1,0.2,0.5))
> df
A B C
1 a 1 0.1
2 b 1 0.2
3 c 1 0.4
4 d 2 0.1
5 e 2 0.5
6 f 2 0.7
7 g 3 0.1
8 h 3 0.2
9 i 3 0.5
I want to add several columns (something similar to permutations), where column D would hold a random value from df$C, but this value should only be picked from the rows with the same value of df$B. An example of the desired output would be:
df <- data.frame(A=c("a","b","c","d","e","f","g","h","i"),
B=c("1","1","1","2","2","2","3","3","3"),
C=c(0.1,0.2,0.4,0.1,0.5,0.7,0.1,0.2,0.5),
D=c(0.2,0.2,0.1,0.5,0.7,0.1,0.5,0.5,0.2))
> df
A B C D
1 a 1 0.1 0.2
2 b 1 0.2 0.2
3 c 1 0.4 0.1
4 d 2 0.1 0.5
5 e 2 0.5 0.7
6 f 2 0.7 0.1
7 g 3 0.1 0.5
8 h 3 0.2 0.5
9 i 3 0.5 0.2
I've tried with plyr package but my approach does not work properly:
ddply(df, levels(.(B)), transform, D=sample(C))
I have also thought about splitting the data frame based on df$B and then using lapply to add the column to each piece, but I have no clue how to select by the levels of df$B.
Many thanks
No need for plyr, ave will do the trick.
transform(df, D=ave(C, B, FUN=function(b) sample(b, replace=TRUE)))
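Since the question asks for several such columns, the same ave call can be repeated in a loop. This is only a sketch: the column names D1, D2, ... and the variable `n_cols` are made up for illustration, and the seed is there just to make the run reproducible.

```r
set.seed(42)  # only so the example is reproducible
df <- data.frame(
  A = letters[1:9],
  B = rep(c("1", "2", "3"), each = 3),
  C = c(0.1, 0.2, 0.4, 0.1, 0.5, 0.7, 0.1, 0.2, 0.5),
  stringsAsFactors = FALSE
)

n_cols <- 3  # hypothetical number of resampled columns to add

for (i in seq_len(n_cols)) {
  # Within each level of B, draw values of C with replacement
  df[[paste0("D", i)]] <- ave(df$C, df$B,
                              FUN = function(x) sample(x, replace = TRUE))
}
df
```

One caveat: sample() treats a single number as a range to sample from, so a group of B with only one row would need special handling.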
