I need to translate some Python code to R. I need to sample random rows from a larger table multiple times so that I can use the samples later. Here is an illustration:
library(data.table)
library(dplyr)
test_table <- data.table(replicate(10, sample(0:1, 10, rep=TRUE)))
test_table
Gives a 10 x 10 table populated with (on some particular run):
So for instance one can get a sample:
sample <- sample_n(test_table, 2)
sample
Which might look like:
However, I don't understand the result when taking multiple samples:
kSampleSize <- 2
kNumSamples <- 3
samples <- replicate(kNumSamples, sample_n(test_table, kSampleSize))
samples
may give:
But it doesn't really look like a "list of samples". I expected samples[1] to give a result similar to sample, but instead I get an odd result (it varies per run):
1. 1 0
Am I doing something wrong? Am I misinterpreting the output? Is a "list of samples" something one can expect in Python but not in R?
replicate has a simplify argument that determines whether R attempts to simplify the returned object to a less complicated data structure.
By default replicate simplifies its result (simplify = "array"). In this case it collapses the returned list of data frames into a single matrix of mode list, with one row per column of the table and one column per sample, which is why samples[1] holds only the first column of the first sample. Specifying simplify = FALSE turns off this behavior.
kSampleSize <- 2
kNumSamples <- 3
replicate(kNumSamples, sample_n(test_table, kSampleSize), simplify = FALSE)
Returns a list of three data frames, preserving the original data structure:
[[1]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1: 1 0 0 0 1 0 0 1 0 1
2: 1 1 1 0 0 1 0 0 1 1
[[2]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1: 1 1 0 1 0 1 0 1 0 0
2: 1 1 1 1 1 0 0 1 0 1
[[3]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1: 0 0 1 0 1 1 0 0 1 1
2: 1 1 1 1 0 0 1 0 0 0
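To see the difference between the two forms, here is a minimal sketch that inspects the shape of each result. It uses a plain data frame and a small hypothetical helper, sample_rows, as a base R stand-in for sample_n, so it does not depend on data.table or dplyr; the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# 10 x 10 table of random 0/1 values, as in the question
test_df <- as.data.frame(replicate(10, sample(0:1, 10, replace = TRUE)))

# Hypothetical helper: base R stand-in for dplyr::sample_n()
sample_rows <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]

# Default: the three samples are collapsed into one list-matrix
simplified <- replicate(3, sample_rows(test_df, 2))
dim(simplified)      # 10 3 (one row per column of the table, one column per sample)
is.list(simplified)  # TRUE

# simplify = FALSE: a plain list of three 2-row data frames
as_list <- replicate(3, sample_rows(test_df, 2), simplify = FALSE)
length(as_list)      # 3
dim(as_list[[1]])    # 2 10
```

With the list form, as_list[[1]] is a whole sample, which is what the question expected from samples[1].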
I have a data frame such as this:
df <- data.frame(
ID = c('123','124','125','126'),
Group = c('A', 'A', 'B', 'B'),
V1 = c(1,2,1,0),
V2 = c(0,0,1,0),
V3 = c(1,1,0,3))
which returns:
ID Group V1 V2 V3
1 123 A 1 0 1
2 124 A 2 0 1
3 125 B 1 1 0
4 126 B 0 0 3
and I would like to return a table that indicates if a variable is represented in the group or not:
Group V1 V2 V3
A 1 0 1
B 1 1 1
The goal is to count the number of distinct variables present in each group.
We can do this with base R
aggregate(.~Group, df[-1], function(x) as.integer(sum(x)>0))
# Group V1 V2 V3
#1 A 1 0 1
#2 B 1 1 1
Or using rowsum from base R
+(rowsum(df[-(1:2)], df$Group)>0)
# V1 V2 V3
#A 1 0 1
#B 1 1 1
Or with by from base R
+(do.call(rbind, by(df[3:5], df['Group'], FUN = colSums))>0)
# V1 V2 V3
#A 1 0 1
#B 1 1 1
Have you tried the following?
unique(group_by(mtcars, cyl)$cyl)
Output:
[1] 6 4 8
My data frame is:
dataMDS <- data.frame(FID=c(1,1), IID=c("CD03577","50016"), SOL=c(0,0), C1=c(0.00332472,-0.00154285))
> dataMDS
FID IID SOL C1
1 1 CD03577 0 0.00332472
2 1 50016 0 -0.00154285
I would like to add a new column, plates, with values taken from two other data frames:
platesRAC <- data.frame(V1=c(1,1), V2=c("CD03577","CD0371"), V3=c("2011-01-12_RAC1","2011-01-27_RAC5"))
> platesRAC
V1 V2 V3
1 1 CD03577 2011-01-12_RAC1
2 1 CD0371 2011-01-27_RAC5
platesDESIR <- data.frame(V1=c(1,1,1), V2=c("50015","50016","50017"), V3=c("2011-11-23_DESIR9","2011-11-23_DESIR9","2011-11-23_DESIR8"))
> platesDESIR
V1 V2 V3
1 1 50015 2011-11-23_DESIR9
2 1 50016 2011-11-23_DESIR9
3 1 50017 2011-11-23_DESIR8
I would like to get the value of V3 from platesRAC or platesDESIR where V2 == IID, and add this value to a new column plates in dataMDS.
I tried with merge:
new <- merge(x = dataMDS, y = platesRAC, by.x = "IID", by.y = 'V2', all = TRUE)
FID IID SOL C1 V1 V3
1 1 CD03577 0 0.00332472 1 2011-01-12_RAC1
2 1 50016 0 -0.00154285 NA <NA>
And of course I get NA values, because IID 50016 is in platesDESIR and not in platesRAC. I don't know how to express an OR (|) so that I don't get NA values.
Also, I don't want the V1 column after merging, just the V3 column renamed to plates.
The result I would like to have:
FID IID SOL C1 plates
1 1 CD03577 0 0.00332472 2011-01-12_RAC1
2 1 50016 0 -0.00154285 2011-11-23_DESIR9
Thanks for any help
This is not a merge but a match, done after binding platesRAC and platesDESIR together:
bindRACDESIR = rbind(platesRAC, platesDESIR)
dataMDS$plates <- bindRACDESIR$V3[match(dataMDS$IID,bindRACDESIR$V2)]
And the result is:
FID IID SOL C1 plates
1 1 CD03577 0 0.00332472 2011-01-12_RAC1
2 1 50016 0 -0.00154285 2011-11-23_DESIR9
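As a small check of how the lookup behaves, the sketch below runs the same rbind plus match idea on a vector of IDs that includes one with no plate. The ID "99999" and the names lookup, ids, and found are made up for illustration; match returns NA for an ID that appears in neither table, and the position of the first hit when an ID appears more than once.

```r
platesRAC <- data.frame(V1 = c(1, 1),
                        V2 = c("CD03577", "CD0371"),
                        V3 = c("2011-01-12_RAC1", "2011-01-27_RAC5"),
                        stringsAsFactors = FALSE)
platesDESIR <- data.frame(V1 = c(1, 1, 1),
                          V2 = c("50015", "50016", "50017"),
                          V3 = c("2011-11-23_DESIR9", "2011-11-23_DESIR9",
                                 "2011-11-23_DESIR8"),
                          stringsAsFactors = FALSE)

# Stack both plate tables into a single lookup table
lookup <- rbind(platesRAC, platesDESIR)

# "99999" is a made-up ID that is in neither table
ids <- c("CD03577", "50016", "99999")

# match() gives the row of each ID in the lookup (NA when absent)
found <- lookup$V3[match(ids, lookup$V2)]
found
# "2011-01-12_RAC1" "2011-11-23_DESIR9" NA
```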
This is what my data frame looks like. V3 is the column I want to compute; it is not available to me.
library(data.table)
dt <- fread('
Level V1 V2
0 10 2
1 0 3
1 0 2
1 0 2 ')
I am trying to calculate V3 based on prior values of V3. The V3 formula is:
New value of V3 = ((Prior value of V3 + Prior value of V3 * V2) * Level) + V1
1st Row V3 = (NA + NA*2)*0 + 10 = 10 (there is no prior value, so only V1 counts)
2nd Row V3 = (10 + 10*3)*1 + 0 = 40
3rd Row V3 = (40 + 40*2)*1 + 0 = 120
4th Row V3 = (120 + 120*2)*1 + 0 = 360
The output should look like this.
Level V1 V2 V3
0 10 2 10
1 0 3 40
1 0 2 120
1 0 2 360
I was trying:
dt[,V3:= (cumsum(V3+V3*V2)*Level)+V1]
I reworked your efforts in the comments to get the desired result:
dt[,V3:=cumprod( c(V1[1] ,(Level*(1 + V2))[-1]) ) ]
dt
Level V1 V2 V3
1: 0 10 2 10
2: 1 0 3 40
3: 1 0 2 120
4: 1 0 2 360
I didn't actually get an error (only a warning) with dt[,V3:= V1[1] * cumprod((Level*(1 + V2))[-1])]. Dropping the first element with [-1] made the cumprod result one element shorter than the table, with nothing to pad it, so the values were recycled.
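The cumprod approach works here because V1 is zero after the first row. For the general recurrence stated in the question, a plain loop is one option. The following is a minimal sketch, assuming the missing prior value of the first row is treated as 0 and using a base R data frame in place of the data.table; the names v3 and prev are chosen for illustration.

```r
dt <- data.frame(Level = c(0, 1, 1, 1),
                 V1    = c(10, 0, 0, 0),
                 V2    = c(2, 3, 2, 2))

v3   <- numeric(nrow(dt))
prev <- 0  # assumption: no prior V3 before the first row, so start from 0

for (i in seq_len(nrow(dt))) {
  # New V3 = ((prior V3 + prior V3 * V2) * Level) + V1
  v3[i] <- (prev + prev * dt$V2[i]) * dt$Level[i] + dt$V1[i]
  prev  <- v3[i]
}

dt$V3 <- v3
dt$V3
# 10 40 120 360
```

Unlike the cumprod version, the loop also handles rows where V1 is non-zero after the first row, at the cost of not being vectorised.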
Within data.table (this relies on the V3 column already being present in dt):
dt[,{ lag.V3=c(0, V3[-.N]) ; V3 = (lag.V3 + lag.V3 * V2 )* Level + V1 }]
Output
[1] 10 40 120 360
Here is one way to do it in dplyr (this assumes the V3 column already exists in dt):
dt %>%
mutate(V4=lag(V3) + lag(V3)*V2 + V1,
V4=ifelse(is.na(V4), 0, V4))
Level V1 V2 V3 V4
1 0 10 2 10 0
2 1 0 3 40 40
3 1 0 2 120 120
4 1 0 2 360 360
I'm working on the output of an online questionnaire and have some trouble handling the data. This is the setup: 200 images have been rated on two 9-point scales, for a total of 400 image-scale combinations. Unfortunately, the data has not been encoded as 400 variables with values ranging from 1 to 9; instead, each image-scale combination has been encoded as 9 binary variables. For two image-scale combinations it looks like this:
Part. V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 0 0
3 0 0 1 0 0 0 0 0 0
As you can see, there are also some N/A values in the data set. That's because, of all 400 combinations, each participant only rated a randomised 50. Given the 400 combinations, we have a total of 3600 variables in the data set. I would now like to condense and recode those values so that R goes through the variables in blocks of 9, recodes the binary 1 as a value from 1 to 9 depending on its position on the scale, and then condenses everything into 400 combination variables. In the end, it should look something like this:
Part. C1 C2
1 3 2
2 7
3 3
I've looked into the reshape package, but couldn't exactly figure out the way to do this.
Any suggestions?
Using apply family functions:
#dummy data
df <- read.table(text = "
Part.,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18
1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,,,,,,,,,
3,,,,,,,,,,0,0,1,0,0,0,0,0,0
", header = TRUE, sep = ",")
# result
# cbind - column bind, put columns side by side
cbind(
  # First column is the "Part." column
  df[, "Part.", drop = FALSE],
  # other columns are coming from below code
  # sapply returns matrix, converting it to data.frame so we can use cbind.
  as.data.frame(
    # get data column index 9 columns each, first 2 to 10, then 11 to 19, etc.
    sapply(seq(2, ncol(df), 9), function(i)
      # for each 9 columns check at which position it is equal to 1,
      # using which() function
      apply(df[, i:(i + 8)], 1, function(j) which(j == 1)))
  )
)
#output
# Part. V1 V2
# 1 1 3 2
# 2 2 7
# 3 3 3
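When a participant did not rate a combination, which() finds no 1 in that block and returns an empty vector. The variant below is a minimal sketch, under the assumption that an explicit NA is wanted in that case and that the result columns should be named C1, C2, and so on as in the question; the names starts, codes, and result are chosen for illustration.

```r
df <- read.table(text = "Part.,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18
1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,,,,,,,,,
3,,,,,,,,,,0,0,1,0,0,0,0,0,0", header = TRUE, sep = ",")

# First column of each 9-column block: 2, 11, ...
starts <- seq(2, ncol(df), by = 9)

codes <- sapply(starts, function(i) {
  apply(df[, i:(i + 8)], 1, function(row) {
    pos <- which(row == 1)  # position of the 1 within the block
    # exactly one 1 gives the scale value; otherwise (not rated) NA
    if (length(pos) == 1) pos else NA_integer_
  })
})
colnames(codes) <- paste0("C", seq_along(starts))

result <- cbind(df["Part."], codes)
result
#   Part. C1 C2
# 1     1  3  2
# 2     2  7 NA
# 3     3 NA  3
```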
Here is a solution for a small example. I did it for only 2 possible outcomes, so v1 is outcome 1 for picture 1, v2 is outcome 2 for picture 1, v3 is outcome 1 for picture 2, and so on. If you have 9 possible outcomes, you have to change id <- rep(1:2, each = 2) to id <- rep(1:n, each = 9), where n is the total number of pictures. Also change the 2 in final <- matrix(nrow = nrow(dat), ncol = ncol(dat)/2) to 9.
I hope that helps.
dat <- data.frame(v1 = c(NA,0,1,0), v2 = c(NA,1,0,1), v3 = c(0,1,NA,0), v4 = c(1,0,NA,1))
id <- rep(1:2, each = 2)
final <- matrix(nrow = nrow(dat), ncol = ncol(dat)/2)
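for (i in unique(id)) {
  # columns belonging to picture i
  wdat <- dat[, which(id == i)]
  for (j in seq_len(nrow(wdat))) {
    if (is.na(wdat[j, 1])) {
      # picture not rated by this participant
      final[j, i] <- NA
    } else {
      # position of the 1 is the chosen outcome
      final[j, i] <- which(wdat[j, ] == 1)
    }
  }
}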
The input and output for my example:
> dat
v1 v2 v3 v4
1 NA NA 0 1
2 0 1 1 0
3 1 0 NA NA
4 0 1 0 1
> final
[,1] [,2]
[1,] NA 2
[2,] 2 1
[3,] 1 NA
[4,] 2 2