I'm trying to run a repeated measures correlation in R using rmcorr, but I received the above error, even though I have more than 3 subjects.
> scores$SUBJECT
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[71] 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5
[106] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[141] 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8
[176] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[211] 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
[246] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[281] 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14
[316] 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15
[351] 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 17
[386] 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18
[421] 18 18 18 18 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
[456] 19 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21
[491] 21 21 21 21 21 21 21 21 21 21 21 21 21 21
# Convert data types
scores$SUBJECT<-factor(scores$SUBJECT)
scores$FACTOR1<-factor(scores$FACTOR1)
scores$FACTOR2<-factor(scores$FACTOR2)
Interestingly, I was able to perform the correlation on some subsets of the data but not others.
# SUBSETS
subset1 <- subset(scores, FACTOR1 == "m1")
subset1a <- subset(subset1, FACTOR2 == "a")
subset1b <- subset(subset1, FACTOR2 == "b")
subset1c <- subset(subset1, FACTOR2 == "c")
subset2 <- subset(scores, FACTOR1 == "mp")
subset2a <- subset(subset2, FACTOR2 == "a")
subset2b <- subset(subset2, FACTOR2 == "b")
subset2c <- subset(subset2, FACTOR2 == "c")
rmcorr(participant = subset1$SUBJECT, measure1 = subset1$SCORE, measure2 = subset2$SCORE, dataset = scores)
rmcorr(participant = subset1a$SUBJECT, measure1 = subset1a$SCORE, measure2 = subset2a$SCORE, dataset = scores)
rmcorr(participant = subset1b$SUBJECT, measure1 = subset1b$SCORE, measure2 = subset2b$SCORE, dataset = scores)
rmcorr(participant = subset1c$SUBJECT, measure1 = subset1c$SCORE, measure2 = subset2c$SCORE, dataset = scores)
Specifically
rmcorr(participant = subset1$SUBJECT, measure1 = subset1$SCORE, measure2 = subset2$SCORE, dataset = scores)
worked, but all of the other calls to rmcorr generated the error. Does anyone know where I went wrong?
If I have a vector from 1 to 200, how would I create a variable that assigns these numbers to ordered groups? For example, the first 10 numbers would be assigned a 1, the next 10 would be assigned a 2, etc.
You can use rep with the each argument. Substitute the length of your vector for 200 and the number wanted in each group for 10 respectively, and truncate if you aren't dividing into even groups.
rep(1:(200/10), each = 10)
#> [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
#> [24] 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
#> [47] 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7
#> [70] 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 10 10
#> [93] 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12
#> [116] 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14
#> [139] 14 14 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 17
#> [162] 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 19 19 19 19
#> [185] 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20
Created on 2019-04-26 by the reprex package (v0.2.1)
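For the uneven case mentioned above, here is a minimal sketch. The helper name block_id is made up for illustration and is not part of the original answer. It computes the group label directly from each position, so the last group is simply shorter and no truncation step is needed.

```r
# Hypothetical helper (not from the original answer): label consecutive
# blocks of `size` elements; the final block may be shorter.
block_id <- function(n, size) {
  as.integer(ceiling(seq_len(n) / size))
}

# 25 elements in groups of 10: two full groups and one group of 5
block_id(25, 10)

# Same labels as the rep()-then-truncate approach
identical(block_id(25, 10),
          rep(seq_len(ceiling(25 / 10)), each = 10)[seq_len(25)])
```

For an existing vector v, block_id(length(v), 10) should give one label per element.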
How do I generate a vector in the form
1 2 ... 19 20 19 ... 2 1
Is it possible using the c() function?
You can use the seq function, or the rev function, for this purpose.
> c(1:20, seq(19,1,-1))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
As suggested by @jimbou,
> c(1:20, 19:1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
> c(1:20, rev(1:19))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
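If the peak value varies, the same idea can be wrapped in a small function. This is only a sketch, and the name up_down is an assumption for illustration. It uses seq_len rather than the colon operator because (n - 1):1 counts down to 0 when n is 1, whereas seq_len(0) is empty.

```r
# Hypothetical helper: 1, 2, ..., n, ..., 2, 1
up_down <- function(n) {
  c(seq_len(n), rev(seq_len(n - 1)))
}

up_down(5)
#> [1] 1 2 3 4 5 4 3 2 1

# Edge case: a peak of 1 gives just 1, not c(1, 0, 1)
up_down(1)
#> [1] 1
```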
I have a very big data set and I'm computing thousands of models for it. For every model I need to randomize my data 100 times. This randomization step makes my script very slow.
Would someone help me to make this step faster?
Here is my code:
for (l in seq(repeat.times)) {
y <- as.matrix(dfr[1])
x <- as.matrix(dfr[2:ncol(dfr)])
# Random Generation
x.random.name = sample(colnames(x),1,replace=FALSE)
x.random.1 <- sample(x[,x.random.name],nrow(y),replace=FALSE)
x <- cbind(x,x.random.1)
.
.
.
For example:
> x
A B C D E
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> y
[,1]
[1,] 10
[2,] 20
[3,] 30
[4,] 40
After randomization:
> x
A B C D E x.random.1
[1,] 1 5 9 13 17 10
[2,] 2 6 10 14 18 12
[3,] 3 7 11 15 19 9
[4,] 4 8 12 16 20 11
>
This is much faster, if I understand the OP's requirement correctly:
x
## A B C D E
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
y
## [,1]
## [1,] 10
## [2,] 20
## [3,] 30
## [4,] 40
xncol <- ncol(x)
ynrow <- nrow(y)
require(microbenchmark)
microbenchmark(xrand <- sapply(1:100, FUN = function(iter) {
sample(x[, sample(1:xncol, 1)], ynrow)
}), times = 1L)
## Unit: milliseconds
## expr min
## xrand <- sapply(1:100, FUN = function(iter) { sample(x[, sample(1:xncol, 1)], ynrow) }) 1.083906
## lq median uq max neval
## 1.083906 1.083906 1.083906 1.083906 1
x <- cbind(x, xrand)
x
## A B C D E
## [1,] 1 5 9 13 17 8 16 2 18 5 3 10 10 14 9 19 6 6 15 18 2 13 13 15 18 7 20 17 11 13 1 16 1 20 1 9 19 14 20
## [2,] 2 6 10 14 18 7 14 3 20 8 4 12 9 13 10 20 8 8 13 20 1 14 15 16 20 6 19 19 10 16 2 15 4 17 4 12 20 15 19
## [3,] 3 7 11 15 19 5 15 1 19 7 2 11 12 15 11 18 7 7 14 17 4 15 16 14 19 8 17 18 9 14 4 14 2 18 3 11 18 16 17
## [4,] 4 8 12 16 20 6 13 4 17 6 1 9 11 16 12 17 5 5 16 19 3 16 14 13 17 5 18 20 12 15 3 13 3 19 2 10 17 13 18
##
## [1,] 5 13 2 3 5 2 5 8 4 6 19 3 7 19 4 7 6 4 17 9 18 9 5 3 1 15 8 19 19 3 19 15 15 1 1 10 15 19 11 6 5 17 7
## [2,] 7 15 1 1 7 1 6 6 3 8 18 2 6 17 2 6 5 3 18 10 17 11 8 1 3 13 6 17 18 4 17 16 13 4 3 11 16 18 9 8 8 18 6
## [3,] 8 14 3 2 8 3 8 7 2 7 20 1 8 18 3 8 8 1 20 12 19 10 6 2 2 16 5 20 17 2 18 13 16 3 4 12 13 20 12 7 7 20 8
## [4,] 6 16 4 4 6 4 7 5 1 5 17 4 5 20 1 5 7 2 19 11 20 12 7 4 4 14 7 18 20 1 20 14 14 2 2 9 14 17 10 5 6 19 5
##
## [1,] 3 3 15 19 2 12 16 11 18 7 10 11 5 12 12 10 1 2 19 2 16 17 11
## [2,] 4 2 13 20 1 11 15 12 17 5 11 12 6 10 9 11 4 3 18 3 14 19 9
## [3,] 1 4 16 18 4 10 14 9 19 8 12 9 8 11 11 9 3 4 20 4 13 20 12
## [4,] 2 1 14 17 3 9 13 10 20 6 9 10 7 9 10 12 2 1 17 1 15 18 10
The key step is, of course, the following call, which I have wrapped in microbenchmark purely for benchmarking purposes:
xrand <- sapply(1:100, FUN = function(iter) { sample(x[, sample(1:xncol, 1)], ynrow) })
Here is a one-liner:
# Data
x<-matrix(1:10^4,nrow=10)
# Generate 2000 replicates.
replicate(2000,x[order(runif(nrow(x))),sample(ncol(x),1)])
Or even just:
replicate(2000,sample(x[,sample(ncol(x),1)]))
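As a quick sanity check on the replicate approach, here is a small sketch using the 4 x 5 example matrix from the question. The seed value and the names xrand, sorted_cols and is_perm are arbitrary choices for illustration. It confirms that the result has one column per replicate and that each generated column is a shuffled copy of one of the original columns.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
x <- matrix(1:20, nrow = 4, dimnames = list(NULL, LETTERS[1:5]))

# 100 shuffled copies of randomly chosen columns of x
xrand <- replicate(100, sample(x[, sample(ncol(x), 1)]))
dim(xrand)  # nrow(x) rows, one column per replicate

# Every generated column should be a permutation of one original column
sorted_cols <- apply(x, 2, sort)
is_perm <- apply(xrand, 2, function(v) {
  any(apply(sorted_cols, 2, function(s) all(sort(v) == s)))
})
all(is_perm)
```

One caveat, assuming base R's documented behaviour: when sample() is given a single number n, it samples from 1:n instead of returning that number, so this pattern needs x to have at least two rows.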
I found that you can dramatically reduce the runtime by moving the creation of x and y outside the loop. Then you just build a new augmented matrix inside the loop:
y <- as.matrix(dfr[1])
XX <- as.matrix(dfr[2:ncol(dfr)])
for (l in seq(repeat.times)) {
# Random Generation
x.random.name = sample(colnames(XX),1,replace=FALSE)
x.random.1 <- sample(XX[,x.random.name],nrow(y),replace=FALSE)
x <- cbind(XX,x.random.1)
}
So I've moved x out of the loop and renamed it XX. When you do your analysis, you continue to use the newly made x. In my benchmark this sped things up by nearly two orders of magnitude.
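For anyone who wants to run this end to end, here is a self-contained sketch of the hoisted version. The toy dfr, the seed and the value of repeat.times are stand-ins I made up from the example in the question, not the asker's real data.

```r
set.seed(1)  # arbitrary, for repeatability
# Toy stand-in for the asker's data: first column is y, the rest are predictors
dfr <- data.frame(y = c(10, 20, 30, 40),
                  A = 1:4, B = 5:8, C = 9:12, D = 13:16, E = 17:20)
repeat.times <- 100

# Convert once, outside the loop
y  <- as.matrix(dfr[1])
XX <- as.matrix(dfr[2:ncol(dfr)])

for (l in seq_len(repeat.times)) {
  # Pick one predictor at random and shuffle its values
  x.random.name <- sample(colnames(XX), 1)
  x.random.1 <- sample(XX[, x.random.name])
  # Rebuild x from the untouched XX so extra columns do not accumulate
  x <- cbind(XX, x.random.1)
  # ... model fitting with y and x would go here ...
}

dim(x)  # original predictors plus one randomized column
```

Because x is rebuilt from XX on every pass, it always has exactly one extra column, whatever the value of repeat.times.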
I have a dataframe that looks like this:
df <- data.frame(a = 1:20, b = 2:21, c = 3:22)
> df
a b c
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
11 11 12 13
12 12 13 14
13 13 14 15
14 14 15 16
15 15 16 17
16 16 17 18
17 17 18 19
18 18 19 20
19 19 20 21
20 20 21 22
I would like to add another column to the data frame (df$d) so that every 5 rows (df$d[seq(1, nrow(df), 4)]) take the value of df$a from the first row of that group.
I have tried the manual way, but was wondering if there is a for loop or shorter way that can do this easily. I'm new to R, so I apologize if this seems trivial to some people.
"Manual" way:
df$d[1:5] <- df$a[1]
df$d[6:10] <- df$a[6]
df$d[11:15] <- df$a[11]
df$d[16:20] <- df$a[16]
> df
a b c d
1 1 2 3 1
2 2 3 4 1
3 3 4 5 1
4 4 5 6 1
5 5 6 7 1
6 6 7 8 6
7 7 8 9 6
8 8 9 10 6
9 9 10 11 6
10 10 11 12 6
11 11 12 13 11
12 12 13 14 11
13 13 14 15 11
14 14 15 16 11
15 15 16 17 11
16 16 17 18 16
17 17 18 19 16
18 18 19 20 16
19 19 20 21 16
20 20 21 22 16
I have tried
for (i in 1:nrow(df))
{df$d[i:(i+4)] <- df$a[seq(1, nrow(df), 4)]}
But this is not going the way I want it to. What am I doing wrong?
This should work:
df$d <- rep(df$a[seq(1,nrow(df),5)],each=5)
And here's a data.table solution:
library(data.table)
dt = data.table(df)
dt[, d := a[1], by = (seq_len(nrow(dt))-1) %/% 5]
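The same block index also works in base R. The sketch below is my own variation rather than anything from the answers above, and the names block and d2 are made up. Unlike rep(..., each = 5), which produces too many values when the number of rows is not a multiple of 5, indexing by block gives exactly one value per row.

```r
df <- data.frame(a = 1:20, b = 2:21, c = 3:22)

# Block index for each row: 0 for rows 1-5, 1 for rows 6-10, and so on
block <- (seq_len(nrow(df)) - 1) %/% 5

# Row number of the first row in each block, then look up `a` there
df$d <- df$a[block * 5 + 1]

# Same result with ave(), taking the first value of `a` within each block
df$d2 <- ave(df$a, block, FUN = function(v) v[1])

all(df$d == df$d2)
```

With 22 rows instead of 20, the last two rows would form a short fifth block and both take the value of a in row 21.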
I'd use logical indexing after initializing to NA
df$d <- NA
df$d <- rep(df$a[ c(TRUE, rep(FALSE,4)) ], each=5)
df
#--------
a b c d
1 1 2 3 1
2 2 3 4 1
3 3 4 5 1
4 4 5 6 1
5 5 6 7 1
6 6 7 8 6
7 7 8 9 6
8 8 9 10 6
9 9 10 11 6
10 10 11 12 6
11 11 12 13 11
12 12 13 14 11
13 13 14 15 11
14 14 15 16 11
15 15 16 17 11
16 16 17 18 16
17 17 18 19 16
18 18 19 20 16
19 19 20 21 16
20 20 21 22 16