Creating a set of sequences with decreasing length in R

I want to make a set of sequences from x to 20, with x = c(2:19). I want this, essentially, but without having to do it this way:
a = seq(2, 20)
b = seq(3, 20)
...
q = seq(18, 20)
r = seq(19, 20)
> a
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> b
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
...
> q
[1] 18 19 20
> r
[1] 19 20
I've attempted it using a for loop, but I can't get the replacement to work out:
a = c(2:20)
b = numeric()
for (i in 1:19){
b = seq(a[i]:20)
}
Any help?

sapply(2:19, seq, to = 20)
[[1]]
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[2]]
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[3]]
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[4]]
[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[5]]
[1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[6]]
[1] 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[[7]]
[1] 8 9 10 11 12 13 14 15 16 17 18 19 20
[[8]]
[1] 9 10 11 12 13 14 15 16 17 18 19 20
[[9]]
[1] 10 11 12 13 14 15 16 17 18 19 20
[[10]]
[1] 11 12 13 14 15 16 17 18 19 20
[[11]]
[1] 12 13 14 15 16 17 18 19 20
[[12]]
[1] 13 14 15 16 17 18 19 20
[[13]]
[1] 14 15 16 17 18 19 20
[[14]]
[1] 15 16 17 18 19 20
[[15]]
[1] 16 17 18 19 20
[[16]]
[1] 17 18 19 20
[[17]]
[1] 18 19 20
[[18]]
[1] 19 20
If you want to save the object and give a name to each element:
res <- sapply(2:19, seq, to = 20)
names(res) <- letters[1:length(res)]
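For comparison, the loop from the question can also be made to work. The sketch below assumes the two problems were that `seq(a[i]:20)` wraps an already-built sequence in `seq` (which returns its indices), and that `b` was overwritten on every pass. The names `starts` and `seqs` are illustrative, not from the original post.

```r
starts <- 2:19

# Preallocate a list: the sequences have different lengths,
# so they cannot share one atomic vector or matrix.
seqs <- vector("list", length(starts))

for (i in seq_along(starts)) {
  # seq(from, to), not seq(from:to)
  seqs[[i]] <- seq(starts[i], 20)
}

names(seqs) <- letters[seq_along(seqs)]
seqs$q
```

The `sapply` call above does the same thing in one line, so the loop is mainly useful for seeing where the original attempt went wrong.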

Extending dickoa's answer to assign global variables a to r (although I do not see why that would ever be preferable to storing them in a list):
mapply(FUN=assign,x=letters[1:18],value=sapply(2:19, seq, to = 20),MoreArgs=list(envir=.GlobalEnv))
Gives:
> a
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> b
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> q
[1] 18 19 20
> r
[1] 19 20
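A possible alternative to the `mapply`/`assign` combination is base R's `list2env`, which copies the elements of a named list into an environment. This is only a sketch: it writes into a fresh environment called `env` (a name chosen here for illustration) so that it does not touch the workspace.

```r
# Named list of the sequences 2:20, 3:20, ..., 19:20
res <- setNames(lapply(2:19, seq, to = 20), letters[1:18])

# Copy each list element into an environment as a variable.
# Passing envir = globalenv() instead would create a, b, ..., r globally.
env <- new.env()
list2env(res, envir = env)

get("q", envir = env)
```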

Related

Error in R function rmcorr: Error in psych::r.con(rmcorrvalue, errordf, p = CI.level) : number of subjects must be greater than 3

I'm trying to do a repeated measures correlation in R using rmcorr, but received the above error, even though I have more than 3 subjects.
> scores$SUBJECT
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[71] 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5
[106] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[141] 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8
[176] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[211] 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
[246] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
[281] 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14
[316] 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15
[351] 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 17
[386] 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18
[421] 18 18 18 18 18 18 18 18 18 18 18 18 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
[456] 19 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 21 21 21 21 21 21 21 21 21 21
[491] 21 21 21 21 21 21 21 21 21 21 21 21 21 21
# Convert data types
scores$SUBJECT<-factor(scores$SUBJECT)
scores$FACTOR1<-factor(scores$FACTOR1)
scores$FACTOR2<-factor(scores$FACTOR2)
Interestingly, I was able to perform the correlation on some subsets of the data but not others.
# SUBSETS
subset1 <- subset(scores, FACTOR1 == "m1")
subset1a <- subset(subset1, FACTOR2 == "a")
subset1b <- subset(subset1, FACTOR2 == "b")
subset1c <- subset(subset1, FACTOR2 == "c")
subset2 <- subset(scores, FACTOR1 == "mp")
subset2a <- subset(subset2, FACTOR2 == "a")
subset2b <- subset(subset2, FACTOR2 == "b")
subset2c <- subset(subset2, FACTOR2 == "c")
rmcorr(participant = subset1$SUBJECT, measure1 = subset1$SCORE, measure2 = subset2$SCORE, dataset = scores)
rmcorr(participant = subset1a$SUBJECT, measure1 = subset1a$SCORE, measure2 = subset2a$SCORE, dataset = scores)
rmcorr(participant = subset1b$SUBJECT, measure1 = subset1b$SCORE, measure2 = subset2b$SCORE, dataset = scores)
rmcorr(participant = subset1c$SUBJECT, measure1 = subset1c$SCORE, measure2 = subset2c$SCORE, dataset = scores)
Specifically
rmcorr(participant = subset1$SUBJECT, measure1 = subset1$SCORE, measure2 = subset2$SCORE, dataset = scores)
worked, but all of the other calls to rmcorr generated the error. Does anyone know where I went wrong?

How to create variable that clusters by order in R?

If I have a vector from 1 to 200, how would I create a variable that groups these numbers into ordered clusters? For example, the first 10 numbers would be assigned a 1, the next 10 would be assigned a 2, etc.
You can use rep with the each argument. Substitute the length of your vector for 200 and the number wanted in each group for 10 respectively, and truncate if you aren't dividing into even groups.
rep(1:(200/10), each = 10)
#> [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
#> [24] 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
#> [47] 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7
#> [70] 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 10 10
#> [93] 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12
#> [116] 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14
#> [139] 14 14 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16 17
#> [162] 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 19 19 19 19
#> [185] 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 20
Created on 2019-04-26 by the reprex package (v0.2.1)
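When the vector length is not a multiple of the group size, one alternative that avoids truncation is to derive the group number from each element's position. The sketch below assumes a vector of length 205 and a group size of 10; `x`, `group_size` and `grp` are illustrative names.

```r
x <- 1:205
group_size <- 10

# Positions 1..10 map to 1, 11..20 map to 2, and so on;
# the last group simply ends up shorter.
grp <- ceiling(seq_along(x) / group_size)

table(grp)
```

Here the final group (21) contains only the five leftover elements.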

R Generate a vector with increasing and then decreasing elements

How do I generate a vector in the form
1 2 ... 19 20 19 ... 2 1
Is it possible using the c() function?
You can use either the seq or the rev function for this.
Using seq:
> c(1:20, seq(19,1,-1))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
As suggested by @jimbou,
> c(1:20, 19:1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
> c(1:20, rev(1:19))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
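The same pattern can be wrapped in a small helper for an arbitrary peak value. This is a sketch, and `up_down` is a hypothetical name rather than a base R function; `seq_len` is used so that a peak of 1 does not produce a stray `0:1`-style sequence.

```r
up_down <- function(n) {
  stopifnot(n >= 1)
  # Ascend 1..n, then descend n-1..1
  c(seq_len(n), rev(seq_len(n - 1)))
}

up_down(4)
#> [1] 1 2 3 4 3 2 1
```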

How to make data randomization faster in R?

I have a very big data set and I'm computing thousands of models for it. For every model I need to randomize my data 100 times. This randomization step makes my script very slow.
Would someone help me to make this step faster?
Here is my code:
for (l in seq(repeat.times)) {
y <- as.matrix(dfr[1])
x <- as.matrix(dfr[2:ncol(dfr)])
# Random Generation
x.random.name = sample(colnames(x),1,replace=FALSE)
x.random.1 <- sample(x[,x.random.name],nrow(y),replace=FALSE)
x <- cbind(x,x.random.1)
.
.
.
For example:
> x
A B C D E
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> y
[,1]
[1,] 10
[2,] 20
[3,] 30
[4,] 40
After randomization:
> x
A B C D E x.random.1
[1,] 1 5 9 13 17 10
[2,] 2 6 10 14 18 12
[3,] 3 7 11 15 19 9
[4,] 4 8 12 16 20 11
>
This is much faster, if I understand the OP's requirement correctly:
x
## A B C D E
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
y
## [,1]
## [1,] 10
## [2,] 20
## [3,] 30
## [4,] 40
xncol <- ncol(x)
ynrow <- nrow(y)
require(microbenchmark)
microbenchmark(xrand <- sapply(1:100, FUN = function(iter) {
sample(x[, sample(1:xncol, 1)], ynrow)
}), times = 1L)
## Unit: milliseconds
## expr min
## xrand <- sapply(1:100, FUN = function(iter) { sample(x[, sample(1:xncol, 1)], ynrow) }) 1.083906
## lq median uq max neval
## 1.083906 1.083906 1.083906 1.083906 1
x <- cbind(x, xrand)
x
## A B C D E
## [1,] 1 5 9 13 17 8 16 2 18 5 3 10 10 14 9 19 6 6 15 18 2 13 13 15 18 7 20 17 11 13 1 16 1 20 1 9 19 14 20
## [2,] 2 6 10 14 18 7 14 3 20 8 4 12 9 13 10 20 8 8 13 20 1 14 15 16 20 6 19 19 10 16 2 15 4 17 4 12 20 15 19
## [3,] 3 7 11 15 19 5 15 1 19 7 2 11 12 15 11 18 7 7 14 17 4 15 16 14 19 8 17 18 9 14 4 14 2 18 3 11 18 16 17
## [4,] 4 8 12 16 20 6 13 4 17 6 1 9 11 16 12 17 5 5 16 19 3 16 14 13 17 5 18 20 12 15 3 13 3 19 2 10 17 13 18
##
## [1,] 5 13 2 3 5 2 5 8 4 6 19 3 7 19 4 7 6 4 17 9 18 9 5 3 1 15 8 19 19 3 19 15 15 1 1 10 15 19 11 6 5 17 7
## [2,] 7 15 1 1 7 1 6 6 3 8 18 2 6 17 2 6 5 3 18 10 17 11 8 1 3 13 6 17 18 4 17 16 13 4 3 11 16 18 9 8 8 18 6
## [3,] 8 14 3 2 8 3 8 7 2 7 20 1 8 18 3 8 8 1 20 12 19 10 6 2 2 16 5 20 17 2 18 13 16 3 4 12 13 20 12 7 7 20 8
## [4,] 6 16 4 4 6 4 7 5 1 5 17 4 5 20 1 5 7 2 19 11 20 12 7 4 4 14 7 18 20 1 20 14 14 2 2 9 14 17 10 5 6 19 5
##
## [1,] 3 3 15 19 2 12 16 11 18 7 10 11 5 12 12 10 1 2 19 2 16 17 11
## [2,] 4 2 13 20 1 11 15 12 17 5 11 12 6 10 9 11 4 3 18 3 14 19 9
## [3,] 1 4 16 18 4 10 14 9 19 8 12 9 8 11 11 9 3 4 20 4 13 20 12
## [4,] 2 1 14 17 3 9 13 10 20 6 9 10 7 9 10 12 2 1 17 1 15 18 10
The key step is, of course, the following line, which I have wrapped in microbenchmark purely for benchmarking purposes:
xrand <- sapply(1:100, FUN = function(iter) { sample(x[, sample(1:xncol, 1)], ynrow) })
Here is a one-liner:
# Data
x<-matrix(1:10^4,nrow=10)
# Generate 2000 replicates.
replicate(2000,x[order(runif(nrow(x))),sample(ncol(x),1)])
Or even just:
replicate(2000,sample(x[,sample(ncol(x),1)]))
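Because `replicate` returns a matrix with one column per replicate, all the random columns can be attached with a single `cbind` instead of one at a time. The following is a sketch on the small 4 x 5 example matrix from the question; `n_rep` and `x_all` are illustrative names, and `set.seed` is there only to make the run reproducible.

```r
set.seed(42)

# Toy matrix matching the question's example (columns A..E)
x <- matrix(1:20, nrow = 4, dimnames = list(NULL, LETTERS[1:5]))
n_rep <- 100

# Each replicate: pick one column at random, then shuffle its values
xrand <- replicate(n_rep, sample(x[, sample(ncol(x), 1)]))

# Bind all random columns at once
x_all <- cbind(x, xrand)
dim(x_all)
#> [1]   4 105
```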
I found that you can dramatically reduce the runtime by moving the creation of x and y outside the loop. Then you only build the new transformed matrix inside the loop:
y <- as.matrix(dfr[1])
XX <- as.matrix(dfr[2:ncol(dfr)])
for (l in seq(repeat.times)) {
  # Random Generation
  # Sample from XX, not x: x gains the extra column on every iteration
  x.random.name = sample(colnames(XX),1,replace=FALSE)
  x.random.1 <- sample(XX[,x.random.name],nrow(y),replace=FALSE)
  x <- cbind(XX,x.random.1)
}
So I've moved the original x out of the loop and renamed it XX. When you do your analysis, you continue to use the newly made x. In my benchmark this sped things up by nearly two orders of magnitude.

Changing every set of 5 rows in R

I have a dataframe that looks like this:
df <- data.frame(a = 1:20, b = 2:21, c = 3:22)
> df
a b c
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
11 11 12 13
12 12 13 14
13 13 14 15
14 14 15 16
15 15 16 17
16 16 17 18
17 17 18 19
18 18 19 20
19 19 20 21
20 20 21 22
I would like to add another column to the data frame (df$d) so that every 5 rows (df$d[seq(1, nrow(df), 4)]) would take the value of the start of the respective row in the first column: df$a.
I have tried the manual way, but was wondering if there is a for loop or shorter way that can do this easily. I'm new to R, so I apologize if this seems trivial to some people.
"Manual" way:
df$d[1:5] <- df$a[1]
df$d[6:10] <- df$a[6]
df$d[11:15] <- df$a[11]
df$d[16:20] <- df$a[16]
> df
a b c d
1 1 2 3 1
2 2 3 4 1
3 3 4 5 1
4 4 5 6 1
5 5 6 7 1
6 6 7 8 6
7 7 8 9 6
8 8 9 10 6
9 9 10 11 6
10 10 11 12 6
11 11 12 13 11
12 12 13 14 11
13 13 14 15 11
14 14 15 16 11
15 15 16 17 11
16 16 17 18 16
17 17 18 19 16
18 18 19 20 16
19 19 20 21 16
20 20 21 22 16
I have tried
for (i in 1:nrow(df))
{df$d[i:(i+4)] <- df$a[seq(1, nrow(df), 4)]}
But this is not going the way I want it to. What am I doing wrong?
This should work:
df$d <- rep(df$a[seq(1,nrow(df),5)],each=5)
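The `rep` approach relies on the number of rows being a multiple of 5. If that is not guaranteed, one option is to compute each row's block and index `df$a` at the block's first row. This is a sketch on a 22-row data frame invented for illustration; `block` is a hypothetical helper variable.

```r
# 22 rows, so the last block has only 2 rows
df <- data.frame(a = 1:22, b = 2:23, c = 3:24)

# Zero-based block number of each row: 0,0,0,0,0,1,1,...
block <- (seq_len(nrow(df)) - 1) %/% 5

# First row of block k is row k * 5 + 1
df$d <- df$a[block * 5 + 1]

tail(df, 3)
```

Rows 21 and 22 both receive the value of `df$a[21]`, so the short final block is handled without any length mismatch.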
And here's a data.table solution:
library(data.table)
dt = data.table(df)
dt[, d := a[1], by = (seq_len(nrow(dt))-1) %/% 5]
I'd use logical indexing after initializing to NA
df$d <- NA
df$d <- rep(df$a[ c(TRUE, rep(FALSE,4)) ], each=5)
df
#--------
a b c d
1 1 2 3 1
2 2 3 4 1
3 3 4 5 1
4 4 5 6 1
5 5 6 7 1
6 6 7 8 6
7 7 8 9 6
8 8 9 10 6
9 9 10 11 6
10 10 11 12 6
11 11 12 13 11
12 12 13 14 11
13 13 14 15 11
14 14 15 16 11
15 15 16 17 11
16 16 17 18 16
17 17 18 19 16
18 18 19 20 16
19 19 20 21 16
20 20 21 22 16
