R - Add a column from values of other data frames with conditions

My dataframe is :
dataMDS <- data.frame(FID=c(1,1), IID=c("CD03577","50016"), SOL=c(0,0), C1=c(0.00332472,-0.00154285))
> dataMDS
FID IID SOL C1
1 1 CD03577 0 0.00332472
2 1 50016 0 -0.00154285
I would like to add a new column plates with values from two other data frames:
platesRAC <- data.frame(V1=c(1,1), V2=c("CD03577","CD0371"), V3=c("2011-01-12_RAC1","2011-01-27_RAC5"))
> platesRAC
V1 V2 V3
1 1 CD03577 2011-01-12_RAC1
2 1 CD0371 2011-01-27_RAC5
platesDESIR <- data.frame(V1=c(1,1,1), V2=c("50015","50016","50017"), V3=c("2011-11-23_DESIR9","2011-11-23_DESIR9","2011-11-23_DESIR8"))
> platesDESIR
V1 V2 V3
1 1 50015 2011-11-23_DESIR9
2 1 50016 2011-11-23_DESIR9
3 1 50017 2011-11-23_DESIR8
I would like to take the value in V3 from platesRAC or platesDESIR where V2 == IID and add it to dataMDS as a new column, plates.
I tried with merge:
new <- merge(x = dataMDS, y = platesRAC, by.x = "IID", by.y = 'V2', all = TRUE)
FID IID SOL C1 V1 V3
1 1 CD03577 0 0.00332472 1 2011-01-12_RAC1
2 1 50016 0 -0.00154285 NA <NA>
As expected, I get NA values because IID 50016 is in platesDESIR and not in platesRAC. I don't know how to express an OR (|) across the two data frames to avoid the NA values.
Also, I don't want the V1 column after merging, just the V3 column renamed to plates.
The result I would like to have:
FID IID SOL C1 plates
1 1 CD03577 0 0.00332472 2011-01-12_RAC1
2 1 50016 0 -0.00154285 2011-11-23_DESIR9
Thanks for any help

This is not a merge but a match, applied after binding platesRAC and platesDESIR together:
bindRACDESIR = rbind(platesRAC, platesDESIR)
dataMDS$plates <- bindRACDESIR$V3[match(dataMDS$IID,bindRACDESIR$V2)]
And the result is :
FID IID SOL C1 plates
1 1 CD03577 0 0.00332472 2011-01-12_RAC1
2 1 50016 0 -0.00154285 2011-11-23_DESIR9
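If you would rather stay with merge, here is a minimal sketch, assuming the same three data frames as in the question (the names lookup and merged are introduced here only for illustration). The idea is to stack the two plate tables first, keep only the key and the plate column, and then do a left join. Note that merge sorts the result by the key and moves the key column to the front, so the row and column order can differ from dataMDS.

```r
# Data from the question (stringsAsFactors = FALSE keeps the IDs as plain strings)
dataMDS <- data.frame(FID = c(1, 1), IID = c("CD03577", "50016"),
                      SOL = c(0, 0), C1 = c(0.00332472, -0.00154285),
                      stringsAsFactors = FALSE)
platesRAC <- data.frame(V1 = c(1, 1), V2 = c("CD03577", "CD0371"),
                        V3 = c("2011-01-12_RAC1", "2011-01-27_RAC5"),
                        stringsAsFactors = FALSE)
platesDESIR <- data.frame(V1 = c(1, 1, 1), V2 = c("50015", "50016", "50017"),
                          V3 = c("2011-11-23_DESIR9", "2011-11-23_DESIR9",
                                 "2011-11-23_DESIR8"),
                          stringsAsFactors = FALSE)

# Stack both plate tables and keep only the key (V2) and the plate (V3)
lookup <- rbind(platesRAC, platesDESIR)[, c("V2", "V3")]
names(lookup) <- c("IID", "plates")

# Left join: keep every row of dataMDS, add plates where an IID matches
merged <- merge(dataMDS, lookup, by = "IID", all.x = TRUE)
merged
```

Unlike match, which takes the first hit, merge would duplicate a dataMDS row if the same IID appeared more than once in the stacked lookup table, so the match approach is the safer choice when the IDs may repeat.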

Related

R: Applying a function to every entry in dataframe

I want to make every element in the dataframe (except for the ID column) become a 0 if it is any number other than 1.
I have:
ID A B C D E
abc 5 3 1 4 1
def 4 1 3 2 5
I want:
ID A B C D E
abc 0 0 1 0 1
def 0 1 0 0 0
I am having trouble figuring out how to apply this to every entry in every column and row.
Here is my code:
apply(dat.lec, 2 , function(y)
if(!is.na(y)){
if(y==1){y <- 1}
else{y <-0}
}
else {y<- NA}
)
Thank you for your help!
No need for implicit or explicit looping.
# Sample data
set.seed(2016);
df <- as.data.frame(matrix(sample(10, replace = TRUE), nrow = 2));
df <- cbind.data.frame(id = sample(letters, 2), df);
df;
# id V1 V2 V3 V4 V5
#1 k 2 9 5 7 1
#2 g 2 2 2 9 1
# Replace all entries != 1 with 0's
df[, -1][df[, -1] != 1] <- 0;
df;
# id V1 V2 V3 V4 V5
#1 k 0 0 0 0 1
#2 g 0 0 0 0 1
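As a side note on why the code in the question fails: apply(dat.lec, 2, ...) passes a whole column to the function, while if() expects a single TRUE or FALSE. A minimal sketch of a vectorised alternative follows, assuming the question's data is rebuilt as dat.lec (the helper name num_cols is introduced here for illustration). ifelse works element by element, and because NA == 1 evaluates to NA, missing values stay NA.

```r
# Data from the question
dat.lec <- data.frame(ID = c("abc", "def"),
                      A = c(5, 4), B = c(3, 1), C = c(1, 3),
                      D = c(4, 2), E = c(1, 5),
                      stringsAsFactors = FALSE)

# Every column except the ID column
num_cols <- setdiff(names(dat.lec), "ID")

# ifelse() is vectorised: 1 stays 1, any other number becomes 0, NA stays NA
dat.lec[num_cols] <- lapply(dat.lec[num_cols], function(y) ifelse(y == 1, 1, 0))
dat.lec
```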

Dummy variable where two continuous variables are equal in R?

Data set cwm looks like this
V1 V2 V3
1 2 ?
3 5 ?
4 4 ?
#NA 9 ?
#NA #NA ?
I want to create a dummy variable V3 that is 1 if V1 = V2 and 0 otherwise, and #NA in any case where an #NA is involved.
After doing the same for the equivalent columns V3 and V4 to produce dummy variable V5, I need to create a continuous variable, V6, where 1 means neither V3 nor V5 equals 1, 2 means exactly one of V3 or V5 equals 1, and 3 means both V3 and V5 equal 1.
V3 V5 V6
1 0 ?
1 0 ?
0 0 ?
1 1 ?
If done correctly, V3 = {0,0,1,#NA,#NA} and V6 = {2,2,1,3}
Best approach?
df = read.table(text="V1 V2
1 2
3 5
4 4
NA 9
NA NA",
header = TRUE, na.strings="NA")
V3 = as.numeric(df$V1 == df$V2)
V3
[1] 0 0 1 NA NA
df2 = read.table(text="V3 V5
1 0
1 0
0 0
1 1",
header = TRUE)
V6 = df2$V3 + df2$V5 + 1
V6
[1] 2 2 1 3
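The two steps can also be done in one data frame. The sketch below is only an illustration: the columns X1 and X2 and their values are made up to stand in for the second pair of columns from the question. Because comparisons and sums involving NA return NA, V6 is NA wherever either dummy is NA.

```r
# V1 and V2 are from the question; X1 and X2 are hypothetical stand-ins
# for the second pair of columns
cwm <- data.frame(V1 = c(1, 3, 4, NA, NA),
                  V2 = c(2, 5, 4, 9, NA),
                  X1 = c(7, 7, 1, 2, 3),
                  X2 = c(7, 0, 1, 2, NA))

# Dummies: 1 if equal, 0 if not, NA if either side is NA
cwm$V3 <- as.integer(cwm$V1 == cwm$V2)
cwm$V5 <- as.integer(cwm$X1 == cwm$X2)

# 1 = neither dummy is 1, 2 = exactly one is 1, 3 = both are 1
cwm$V6 <- cwm$V3 + cwm$V5 + 1
cwm
```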

count the number of distinct variables in a group

I have a data frame such as this:
df <- data.frame(
ID = c('123','124','125','126'),
Group = c('A', 'A', 'B', 'B'),
V1 = c(1,2,1,0),
V2 = c(0,0,1,0),
V3 = c(1,1,0,3))
which returns:
ID Group V1 V2 V3
1 123 A 1 0 1
2 124 A 2 0 1
3 125 B 1 1 0
4 126 B 0 0 3
and I would like to return a table that indicates if a variable is represented in the group or not:
Group V1 V2 V3
A 1 0 1
B 1 1 1
In order to count the number of distinct variables in each group.
We can do this with base R
aggregate(.~Group, df[-1], function(x) as.integer(sum(x)>0))
# Group V1 V2 V3
#1 A 1 0 1
#2 B 1 1 1
Or using rowsum from base R
+(rowsum(df[-(1:2)], df$Group)>0)
# V1 V2 V3
#A 1 0 1
#B 1 1 1
Or with by from base R
+(do.call(rbind, by(df[3:5], df['Group'], FUN = colSums))>0)
# V1 V2 V3
#A 1 0 1
#B 1 1 1
Have you tried the following (group_by comes from the dplyr package)?
unique(group_by(mtcars,cyl)$cyl).
Output:[1] 6 4 8
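To get from the 0/1 presence table of the base R answers to the actual count per group, one more step is needed. A minimal sketch, reusing the rowsum approach on the question's data (the names present and n_present are introduced here for illustration):

```r
# Data from the question
df <- data.frame(
  ID = c('123', '124', '125', '126'),
  Group = c('A', 'A', 'B', 'B'),
  V1 = c(1, 2, 1, 0),
  V2 = c(0, 0, 1, 0),
  V3 = c(1, 1, 0, 3),
  stringsAsFactors = FALSE)

# 0/1 matrix: is each variable present (sum > 0) in each group?
present <- +(rowsum(df[-(1:2)], df$Group) > 0)

# Number of variables represented in each group
n_present <- rowSums(present)
n_present
# A B
# 2 3
```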

Recode and condense variables

I'm working on the output of an online questionnaire and am having some trouble handling the data. This is the setup: 200 images have been rated on two 9-point scales, for a total of 400 combinations. Unfortunately, the data has not been encoded as 400 variables with values ranging from 1 to 9; instead, for each scale-image combination, 9 binary variables have been encoded. For two image-scale combinations it looks like this:
Part. V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
1     0  0  1  0  0  0  0  0  0  0   1   0   0   0   0   0   0   0
2     0  0  0  0  0  0  1  0  0  NA  NA  NA  NA  NA  NA  NA  NA  NA
3     NA NA NA NA NA NA NA NA NA 0   0   1   0   0   0   0   0   0
As you can see, there are also some N/A values in the data set. That's because, of all 400 combinations, each participant rated only a randomised 50. Given the 400 combinations, we have a total of 3600 variables in the data set. I would now like to condense and recode those values so that R takes the variables in blocks of 9, recodes the binary 1 as a value from 1 to 9 depending on its position on the scale, and then condenses everything into 400 combination variables. In the end, it should look something like this:
Part. C1 C2
1     3  2
2     7  NA
3     NA 3
I've looked into the reshape package, but couldn't exactly figure out the way to do this.
Any suggestions?
Using apply family functions:
#dummy data
df <- read.table(text = "
Part.,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18
1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,,,,,,,,,
3,,,,,,,,,,0,0,1,0,0,0,0,0,0
", header = TRUE, sep = ",")
# result
# cbind - column bind, put columns side by side
cbind(
  # First column is the "Part." column
  df[, "Part.", drop = FALSE],
  # other columns are coming from below code
  # sapply returns matrix, converting it to data.frame so we can use cbind.
  as.data.frame(
    # get data column index 9 columns each: columns 2 to 10, then 11 to 19, etc.
    sapply(seq(2, ncol(df), 9), function(i)
      # for each 9 columns check at which position it is equal to 1,
      # using which() function
      apply(df[, i:(i + 8)], 1, function(j) which(j == 1)))
  )
)
#output
# Part. V1 V2
# 1 1 3 2
# 2 2 7
# 3 3 3
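For blocks a participant did not rate, which() finds no position at all, so it can be helpful to return an explicit NA in that case. The following is a minimal sketch of that variation on the same dummy data; the names starts, recode_block and out, and the C1, C2 column labels, are choices made here for illustration.

```r
# Same dummy data as above
df <- read.table(text = "
Part.,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18
1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,,,,,,,,,
3,,,,,,,,,,0,0,1,0,0,0,0,0,0
", header = TRUE, sep = ",")

# First column of each block of 9 binary variables
starts <- seq(2, ncol(df), by = 9)

# For one block, return the position of the 1 in each row,
# or NA when the block was not rated (no single 1 found)
recode_block <- function(i) {
  block <- as.matrix(df[, i:(i + 8)])
  apply(block, 1, function(r) {
    pos <- unname(which(r == 1))
    if (length(pos) == 1) pos else NA_integer_
  })
}

out <- data.frame(df["Part."], sapply(starts, recode_block))
names(out)[-1] <- paste0("C", seq_along(starts))
out
```

With the full data set, the same code should produce one column per block of 9, provided the binary variables are stored in consecutive columns after Part.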
Here is a solution for a small example with only 2 possible outcomes per picture: v1 and v2 are outcomes 1 and 2 for picture 1, v3 and v4 are outcomes 1 and 2 for picture 2, and so on. If you have 9 possible outcomes, change id <- rep(1:2, each = 2) to id <- rep(1:n, each = 9), where n is the total number of pictures. Also change the 2 in final <- matrix(nrow = nrow(dat), ncol = ncol(dat)/2) to 9.
I hope that helps.
dat <- data.frame(v1 = c(NA,0,1,0), v2 = c(NA,1,0,1), v3 = c(0,1,NA,0), v4 = c(1,0,NA,1))
id <- rep(1:2, each = 2)
final <- matrix(nrow = nrow(dat), ncol = ncol(dat)/2)
for (i in unique(id)){
  wdat <- dat[ ,which(id == i)]
  for (j in 1:nrow(wdat)){
    if(is.na(wdat[j,1] )) {
      final[j,i] <- NA
    } else {
      final[j,i] <- which(wdat[j, ] == 1)
    }
  }
}
The input and output for my example:
> dat
v1 v2 v3 v4
1 NA NA 0 1
2 0 1 1 0
3 1 0 NA NA
4 0 1 0 1
> final
[,1] [,2]
[1,] NA 2
[2,] 2 1
[3,] 1 NA
[4,] 2 2

Data Manipulation in R Project: compare rows

I'm looking to compare values within a dataset
Every row starts with a unique ID followed by a couple binary variables
The data looks like this:
row.name v1 v2 v3 ...
1 0 0 0
2 1 1 1
3 1 0 1
I want to know which values are the same (if equal assign value of 1) and which are different (if not equal assign value of 0) for all unique pairings.
For example in column v1: row1 == 0 and row2 == 1, which should result in an assignment of 0.
So, the output should look like this
id1 id2 v1 v2 v3 ...
1 2 0 0 0 ...
1 3 0 1 0 ...
2 3 1 0 1 ...
I'm looking for an efficient way of doing this for more than 1000 rows...
There's no way to do this without expanding each combination of rows, so with 1000 rows, it is going to take a bit of time. But here is a solution:
dat <- read.table(header=T, text="row.name v1 v2 v3
1 0 0 0
2 1 1 1
3 1 0 1")
Create the index rows:
indices <- t(combn(dat$row.name, 2))
colnames(indices) <- c('id1', 'id2')
Loop through the index rows, and collect the comparisons:
res1 <- t(apply(indices, 1, function(x) as.numeric(dat[x[1],-1] == dat[x[2],-1])))
colnames(res1) <- names(dat[-1])
Put them together:
result <- cbind(indices, res1)
result
## id1 id2 v1 v2 v3
## [1,] 1 2 0 0 0
## [2,] 1 3 0 1 0
## [3,] 2 3 1 0 1
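With more than 1000 rows, the row-by-row apply over data frame rows can become slow. A possible speed-up, sketched here under the assumption that all value columns are numeric, is to convert them to a matrix once and compare all pairs in a single vectorised step (the names m, idx and same are introduced here for illustration):

```r
dat <- read.table(header = TRUE, text = "row.name v1 v2 v3
1 0 0 0
2 1 1 1
3 1 0 1")

# Numeric matrix of the value columns (everything except the ID column)
m <- as.matrix(dat[-1])

# All unique pairs of row positions: one pair per column of idx
idx <- combn(nrow(dat), 2)

# Compare the first member of every pair with the second, all at once;
# unary + turns TRUE/FALSE into 1/0
same <- +(m[idx[1, ], , drop = FALSE] == m[idx[2, ], , drop = FALSE])

result <- cbind(id1 = dat$row.name[idx[1, ]],
                id2 = dat$row.name[idx[2, ]],
                same)
result
```

This still builds one output row per pair (about 500,000 rows for 1000 inputs), but it pairs by row position rather than by the value of row.name, so it does not rely on the IDs being 1, 2, 3, and so on.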
