Sequence of two numbers with decreasing occurrence of one of them - r

I would like to create a sequence from two numbers, such that the occurrence of one of the numbers decreases (from n_1 to 1) while for the other number the occurrences are fixed at n_2.
I've been looking around and have tried using seq and rep, but I can't seem to figure it out.
Here is an example for c(0,1) and n_1=5, n_2=3:
0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1
And here for c(0,1) and n_1=2, n_2=1:
0,0,1,0,1

Maybe something like this?
rep(rep(c(0, 1), n_1), times = rbind(n_1:1, n_2))
## [1] 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 1 1 1 0 1 1 1
Here it is as a function (without any sanity checks):
myfun <- function(vec, n1, n2) rep(rep(vec, n1), times = rbind(n1:1, n2))
myfun(c(0, 1), 2, 1)
## [1] 0 0 1 0 1
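To see why this works, it may help to look at the intermediate pieces. The sketch below is the same idea with a few basic input checks added; the name myfun_checked and the particular checks are illustrative assumptions, not part of the original answer.

```r
# Hypothetical variant of myfun with basic input checks (illustrative only).
myfun_checked <- function(vec, n1, n2) {
  stopifnot(length(vec) == 2, n1 >= 1, n2 >= 1)
  # rbind() recycles n2 across the columns; flattening column by column
  # gives the run lengths n1, n2, n1 - 1, n2, ..., 1, n2.
  counts <- as.vector(rbind(n1:1, n2))
  # The values alternate vec[1], vec[2], one pair per run of vec[1].
  values <- rep(vec, n1)
  rep(values, times = counts)
}

myfun_checked(c(0, 1), 2, 1)
## [1] 0 0 1 0 1
```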
inverse.rle
Another alternative is to use inverse.rle:
y <- list(lengths = rbind(n_1:1, n_2),
values = rep(c(0, 1), n_1))
inverse.rle(y)
## [1] 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 1 1 1 0 1 1 1
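As a quick sanity check on the run-length view, one can feed the result back through rle() and confirm that the same lengths and values come out. This is only a sketch; the variable names s and r are assumptions made for illustration.

```r
n_1 <- 5
n_2 <- 3

# Run lengths and values, as in the answer (lengths flattened to a vector).
y <- list(lengths = as.vector(rbind(n_1:1, n_2)),
          values  = rep(c(0, 1), n_1))

s <- inverse.rle(y)  # expand the runs into the full sequence
r <- rle(s)          # compress the sequence back into runs

# Because adjacent runs always differ (0, 1, 0, 1, ...), the round trip
# recovers the original description.
all(r$lengths == y$lengths) && all(r$values == y$values)
```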

An alternative (albeit slower) method using a similar concept:
unlist(mapply(rep,c(0,1),times=rbind(n_1:1,n_2)))
###[1] 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 1 1 1 0 1 1 1

Here is another approach using upper-triangle of a matrix:
f_rep <- function(num1, n_1, num2, n_2){
  m <- matrix(rep(c(num1, num2), times=c(n_1+1, n_2)), n_1+n_2+1, n_1+n_2+1, byrow = TRUE)
  t(m)[lower.tri(m,diag=FALSE)][1:sum((n_1:1)+n_2)]
}
f_rep(0, 5, 1, 3)
#[1] 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 1 1 1 0 1 1 1
f_rep(2, 4, 3, 3)
#[1] 2 2 2 2 3 3 3 2 2 2 3 3 3 2 2 3 3 3 2 3 3 3

myf = function(x, n){
rep(rep(x,n[1]), unlist(lapply(0:(n[1]-1), function(i) n - c(i,0))))
}
myf(c(0,1), c(5,3))
#[1] 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 1 1 1 0 1 1 1

Related

Can the acf function properly compute autocorrelation with a binary vector?

Suppose I have a vector consisting of 0's and 1's; will the default acf function be able to calculate the autocorrelation correctly?
set.seed(12)
bin = sample(c(0,1), replace=TRUE, size=5000)
acf(bin)
Yes. Your example shows no autocorrelation because the sample is completely random, so there is nothing for acf to detect. But we can create a binomial sample with an oscillating probability of 1s and 0s like this:
times <- seq(0, 20 * pi, pi / 6)
probs <- sin(times) * 0.5 + 0.5
Our probability of getting a 1 at each time step looks like this:
plot(times, probs, type = "l")
And we can generate a sample like this:
set.seed(1)
samp <- rbinom(length(times), 1, probs)
samp
#> [1] 0 1 1 1 1 1 0 1 0 0 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0
#> [38] 1 1 1 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1
#> [75] 1 1 1 1 1 0 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 1
#> [112] 1 1 0 0 0 0 0 0 0 1
And we can demonstrate that acf correctly identifies the autocorrelation:
acf(samp)
Created on 2022-02-12 by the reprex package (v2.0.1)
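If you would rather inspect numbers than a plot, acf can return its estimates with plot = FALSE. The sketch below reuses the simulated series from the answer; the choice of lag.max = 24 and the names r and the lag labels are assumptions made for illustration.

```r
# Same simulated series as in the answer.
times <- seq(0, 20 * pi, pi / 6)
probs <- sin(times) * 0.5 + 0.5
set.seed(1)
samp <- rbinom(length(times), 1, probs)

# plot = FALSE returns the estimates instead of drawing them.
r <- drop(acf(samp, lag.max = 24, plot = FALSE)$acf)
names(r) <- 0:24

# The probability has a period of 12 steps (2 * pi divided by pi / 6),
# so lag 12 should be positively and lag 6 negatively correlated.
r[c("6", "12")]
```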

Count occurrences of teams in matrix in R

I have a 1000 x 16 matrix from a simulation with team names as characters. I want to count the number of occurrences per team in all 16 columns.
I know I could do apply(test, 2, table), but that makes the data hard to work with afterward since not every team appears in every column.
If you have a vector of all the unique team names, you could do something like this. Converting each slice to a factor with the full set of levels makes table report a zero for any team (in this case, letter) that does not appear. Note that the example below counts over rows (MARGIN = 1); use 2 instead to count per column.
set.seed(15)
letter_mat <- matrix(
sample(
LETTERS,
size = 1000*16,
replace = TRUE
),
ncol = 16,
nrow = 1000
)
output <- t(
apply(
letter_mat,
1,
function(x) table(factor(x, levels = LETTERS))
)
)
head(output)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
[1,] 1 2 0 1 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1
[2,] 0 1 0 2 2 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 2 2 1
[3,] 1 1 0 0 1 0 1 2 1 0 0 0 0 0 1 0 1 0 1 1 0 0 3 0 1 1
[4,] 0 1 0 0 0 1 0 0 0 2 0 1 0 0 1 1 1 1 2 0 2 3 0 0 0 0
[5,] 2 1 0 0 0 0 0 2 0 2 1 1 1 0 0 2 0 2 1 0 0 1 0 0 0 0
[6,] 0 0 0 0 0 1 3 1 0 0 0 0 1 1 3 0 1 0 0 1 0 0 0 1 0 3
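For per-column counts, which is what the question seems to ask for, the same factor trick can be applied column by column. This is a minimal sketch on made-up data; the objects teams, m and counts are hypothetical names, not from the original post.

```r
# Hypothetical toy data: 3 rows, 2 columns, three possible teams.
teams <- c("A", "B", "C")
m <- matrix(c("A", "A", "B",    # column 1
              "B", "C", "C"),   # column 2
            ncol = 2)

# One column of counts per matrix column; teams that are absent get a 0
# because the factor carries the full set of levels.
counts <- vapply(
  seq_len(ncol(m)),
  function(j) as.vector(table(factor(m[, j], levels = teams))),
  integer(length(teams))
)
rownames(counts) <- teams
counts
```

The result has one row per team and one column per matrix column, so every column has the same shape regardless of which teams appear in it.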

How to reset cumsum at end of consecutive string

If I have the following vector:
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
how can I calculate the cumulative sum for all of the consecutive 1's, resetting each time I hit a 0?
So, the desired output would look like this:
> y
[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
This works:
unlist(lapply(rle(x)$lengths, FUN = function(z) 1:z)) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
It relies pretty heavily on your special case of only having 1s and 0s, but for that case it works great! Even better, with @nicola's suggested improvement:
sequence(rle(x)$lengths) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
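If the vector can hold values other than 0 and 1, one possible generalisation is to start a new group at every zero and take the cumulative sum within each group. This is a sketch under that assumption; reset_cumsum is a hypothetical helper name, not something from the answers here.

```r
# Hypothetical helper: cumulative sum that restarts at every 0.
reset_cumsum <- function(x) {
  # cumsum(x == 0) increases by one at each zero, so every zero opens a
  # new group; the zero itself contributes nothing to the running sum.
  ave(x, cumsum(x == 0), FUN = cumsum)
}

reset_cumsum(c(2, 3, 0, 1, 4))
# [1] 2 5 0 1 5
```

For a 0/1 vector this gives the same result as sequence(rle(x)$lengths) * x.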
I read a post about how to split a vector and used splitAt2 by @Calimo.
It goes like this:
splitAt2 <- function(x, pos) {
out <- list()
pos2 <- c(1, pos, length(x)+1)
for (i in seq_along(pos2[-1])) {
out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
}
return(out)
}
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
where_split = which(x == 0)
x_split = splitAt2(x, where_split)
unlist(sapply(x_split, cumsum))
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Here is another option
library(data.table)
ave(x, rleid(x), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Or without any packages
ave(x, cumsum(c(TRUE, x[-1]!= x[-length(x)])), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3

Create block diagonal data frame in R

I have a data set that looks like this:
Person Team
114 1
115 1
116 1
117 1
121 1
122 1
123 1
214 2
215 2
216 2
217 2
221 2
222 2
223 2
"Team" ranges from 1 to 33, and teams vary in size (i.e., there can be 5, 6, or 7 members, depending on the team). I need to transform this data set into something that looks like this:
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1
The sizes of the individual blocks are given by the number of people in a team. How can I do this in R?
You could use bdiag from the package Matrix. For example:
library(Matrix)
bdiag(matrix(1,ncol=7,nrow=7),matrix(1,ncol=7,nrow=7))
Another idea, although, I guess this is less efficient/elegant than RStudent's:
DF = data.frame(Person = sample(100, 21), Team = rep(1:5, c(3,6,4,5,3)))
DF
lengths = tapply(DF$Person, DF$Team, length)
mat = matrix(0, sum(lengths), sum(lengths))
mat[do.call(rbind,
mapply(function(a, b) arrayInd(seq_len(a ^ 2), c(a, a)) + b,
lengths, cumsum(c(0, lengths[-length(lengths)])),
SIMPLIFY = FALSE))] = 1
mat
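Another way to look at the problem: entry (i, j) of the target matrix is 1 exactly when persons i and j are on the same team. Under that reading, a base R sketch with outer is possible. The small DF below is made-up example data, and the result is block diagonal only when the rows are sorted by Team.

```r
# Hypothetical example data, already sorted by Team.
DF <- data.frame(Person = c(114, 115, 116, 214, 215),
                 Team   = c(1, 1, 1, 2, 2))

# outer() compares every pair of rows; + 0 turns TRUE/FALSE into 1/0.
mat <- outer(DF$Team, DF$Team, "==") + 0
mat
```

This builds the full dense matrix in one step, which is fine for a few hundred people but may be wasteful for much larger data, where a sparse result from bdiag could be preferable.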

merge one data frame by row with another data frame as a template

I want to merge each row of the data.frame my.samples to another data.frame my.template to obtain the desired.result.
The template my.template could be created with expand.grid. So, even though this is a minimal example the output data set desired.result is still large.
I have posted below several attempts that did not work and one attempt that does work. However, the code that works seems overly complex.
Thank you for any advice. I prefer base R. There are numerous other posts about merging data frames. I looked at quite a few, but did not see this scenario addressed. Sorry if I overlooked it.
my.samples <- read.table(text = '
obs X1 X2 X3 z
1 2 1 0 1
2 0 0 0 1
3 0 1 2 1
', header = TRUE)
my.template <- read.table(text = '
X1 X2 X3
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
', header = TRUE)
desired.result <- read.table(text = '
obs X1 X2 X3 z
1 0 0 0 0
1 0 0 1 0
1 0 0 2 0
1 0 1 0 0
1 0 1 1 0
1 0 1 2 0
1 0 2 0 0
1 0 2 1 0
1 0 2 2 0
1 1 0 0 0
1 1 0 1 0
1 1 0 2 0
1 1 1 0 0
1 1 1 1 0
1 1 1 2 0
1 1 2 0 0
1 1 2 1 0
1 1 2 2 0
1 2 0 0 0
1 2 0 1 0
1 2 0 2 0
1 2 1 0 1
1 2 1 1 0
1 2 1 2 0
1 2 2 0 0
1 2 2 1 0
1 2 2 2 0
2 0 0 0 1
2 0 0 1 0
2 0 0 2 0
2 0 1 0 0
2 0 1 1 0
2 0 1 2 0
2 0 2 0 0
2 0 2 1 0
2 0 2 2 0
2 1 0 0 0
2 1 0 1 0
2 1 0 2 0
2 1 1 0 0
2 1 1 1 0
2 1 1 2 0
2 1 2 0 0
2 1 2 1 0
2 1 2 2 0
2 2 0 0 0
2 2 0 1 0
2 2 0 2 0
2 2 1 0 0
2 2 1 1 0
2 2 1 2 0
2 2 2 0 0
2 2 2 1 0
2 2 2 2 0
3 0 0 0 0
3 0 0 1 0
3 0 0 2 0
3 0 1 0 0
3 0 1 1 0
3 0 1 2 1
3 0 2 0 0
3 0 2 1 0
3 0 2 2 0
3 1 0 0 0
3 1 0 1 0
3 1 0 2 0
3 1 1 0 0
3 1 1 1 0
3 1 1 2 0
3 1 2 0 0
3 1 2 1 0
3 1 2 2 0
3 2 0 0 0
3 2 0 1 0
3 2 0 2 0
3 2 1 0 0
3 2 1 1 0
3 2 1 2 0
3 2 2 0 0
3 2 2 1 0
3 2 2 2 0
', header = TRUE)
# this works for one obs at a time
merge(my.samples[1,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
# this does not work
apply(my.samples, 1, function(x) merge(x, my.template, by=c('X1', 'X2', 'X3'), all=TRUE))
# this does not work
my.output <- matrix(0, nrow=(3^3 * max(my.samples$obs)), ncol=5)
for(i in 1:max(desired.result$obs)) {
x <- merge(my.samples[i,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
my.output[((i-1) * 3^3 +1) : ((i-1) * 3^3 + 3^3), 1:5] <- x
}
# this works
for(i in 1:max(desired.result$obs)) {
x <- merge(my.samples[i,], my.template, by=c('X1', 'X2', 'X3'), all=TRUE)
x$obs <- i
x$z[is.na(x$z)] <- 0
if(i == 1) {my.output = x}
if(i > 1) {my.output = rbind(my.output, x)}
}
my.output
all.equal(my.output[1:3], desired.result[,2:4])
I believe this should work
#expand template
full<-do.call(rbind, lapply(unique(my.samples$obs),
function(x) cbind(obs=x, my.template)))
#merge
result<-merge(full, my.samples, all.x=TRUE)
#change NA's to 0
result$z[is.na(result$z)]<-0
#> all(result==desired.result)
#[1] TRUE
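The question mentions that my.template could be created with expand.grid. A sketch of one way to do that is below; it assumes each of X1, X2 and X3 takes the values 0, 1 and 2, as in the example. Because expand.grid varies its first argument fastest, the columns are listed in reverse and then reordered to reproduce the row order of the typed-out template.

```r
# X3 varies fastest, then X2, then X1, matching the template's row order.
my.template <- expand.grid(X3 = 0:2, X2 = 0:2, X1 = 0:2)[, c("X1", "X2", "X3")]

nrow(my.template)
# [1] 27
```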
I like the answer posted by @MrFlick, but when I added another column to my.samples I discovered that I had to modify the code. Below is what I came up with.
my.samples <- read.table(text = '
obs X1 X2 X3 z aa
1 2 1 0 1 20
2 0 0 0 1 -10
3 0 1 2 1 10
', header = TRUE)
my.template <- read.table(text = '
X1 X2 X3
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 1 2
0 2 0
0 2 1
0 2 2
1 0 0
1 0 1
1 0 2
1 1 0
1 1 1
1 1 2
1 2 0
1 2 1
1 2 2
2 0 0
2 0 1
2 0 2
2 1 0
2 1 1
2 1 2
2 2 0
2 2 1
2 2 2
', header = TRUE)
obs.aa <- my.samples[, c(1, ncol(my.samples))]
my.template2 <- merge(my.template, obs.aa)
my.template3 <- merge(my.template2, my.samples, by=c('obs', 'aa', paste0('X', 1:(ncol(my.samples)-3))), all = TRUE)
my.template3$z[is.na(my.template3$z)] <- 0
my.template3
