cbind recycled rows to add to last row in R

I have a question about cbinding recycled items. I simplified my problem into the following code.
I have two vectors, "a" and "b". "a" has 5 elements and "b" has 10.
When I cbind them, the result has 10 rows, and column "a" is recycled until it reaches 10 rows. My problem is: how do I recycle the values so that the last value, a[5], is added to each recycled value? Thanks!
a <- c(4, 3, 5, 2, 8)
b <- c(1:10)
cbind(a,b)
a b
1 4 1
2 3 2
3 5 3
4 2 4
5 8 5
6 4 6
7 3 7
8 5 8
9 2 9
10 8 10
What I want: a[6] = a[5] + a[1], a[7] = a[5] + a[2], ..., a[10] = a[5] + a[5], which gives:
a b
1 4 1
2 3 2
3 5 3
4 2 4
5 8 5
6 12 6
7 11 7
8 13 8
9 10 9
10 16 10

Do you mean this? There are 5 items; I add a[5] to the next 5 items, 2*a[5] to the 5 items after that, and so on.
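a <- c(4, 3, 5, 2, 8)
b <- 1:11
# one multiplier per (possibly partial) pass through a: 0, 1, 2, ...
counter <- 0:floor((length(b) - 1) / length(a))
# add 0, a[5], 2*a[5], ... to successive recycled copies of a
new.col <- rep(a[length(a)] * counter, each = length(a)) + a
length(new.col) <- length(b)
new.col
[1] 4 3 5 2 8 12 11 13 10 16 20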
The first length(a) items stay intact, we add a[5] to the next length(a) items, 2*a[5] to the next length(a) items and so on...
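An alternative that skips the padding and truncation step is to compute every position directly with integer division: with k = length(a) and a 0-based position i, the value is a[i %% k + 1] plus (i %/% k) copies of a[k]. A minimal sketch of that idea; `recycle_add_last` is a hypothetical helper name, not part of the answer above:

```r
# Hypothetical helper: extend `a` to length `len` by recycling it,
# adding a[length(a)] once more on every completed pass through `a`.
recycle_add_last <- function(a, len) {
  k <- length(a)
  i <- seq_len(len) - 1L            # 0-based output positions
  a[i %% k + 1L] + a[k] * (i %/% k) # recycled value + pass number * last value
}

a <- c(4, 3, 5, 2, 8)
b <- 1:10
cbind(a = recycle_add_last(a, length(b)), b)
```

Because each element is computed independently, there is no need for a counter vector or for resetting the length afterwards.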


Extract cumulative unique values in a rolling basis (reset and resume) using data.table R

Given a data.table, I would like to extract cumulative unique elements until there are three unique values, then reset and resume:
library(data.table)
y <- data.table(a=c(1, 2, 2, 3, 3, 4, 3, 2, 2, 5, 6, 7, 9, 8))
The desired output unique_acc_roll_3 is:
a unique_acc_roll_3
1 1
2 1 2
2 1 2
3 1 2 3
3 1 2 3
4 4 #4 is the fourth unique element, so it resets and starts again
3 3 4
2 2 3 4
2 2 3 4
5 5 #5 is the fourth unique element, so it resets and starts again
6 5 6
7 5 6 7
9 9 #9 is the fourth unique element, so it resets and starts again
8 8 9
Because each row depends on the result for the previous row, I got stuck... The real data is large, so data.table solutions would be great.
I can't think of any way to avoid a for loop essentially, except to hide it behind a Reduce call. My logic is to keep union-ing each new value at each row, until the set grows to length == n, at which point the new value is used as the starting point to the next iteration of the loop.
# Add y to the running set x; if that makes n unique values, start over from y
unionlim <- function(x, y, n=4) {
  u <- union(x,y)
  if(length(u) == n) y else u
}
y[, out := sapply(Reduce(unionlim, a, accumulate=TRUE), paste, collapse=" ")]
# a out
# 1: 1 1
# 2: 2 1 2
# 3: 2 1 2
# 4: 3 1 2 3
# 5: 3 1 2 3
# 6: 4 4
# 7: 3 4 3
# 8: 2 4 3 2
# 9: 2 4 3 2
#10: 5 5
#11: 6 5 6
#12: 7 5 6 7
#13: 9 9
#14: 8 9 8
This is far from the fastest code on the planet, but a quick test suggests it will chew through about 1M rows in ~15 seconds on my decent machine.
bigy <- y[rep(1:nrow(y), 75e3)]
system.time({
bigy[, out := sapply(Reduce(unionlim, a, accumulate=TRUE), paste, collapse=" ")]
})
# user system elapsed
# 14.27 0.09 15.06
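The Reduce call hides a loop; for comparison, here is the same union-until-n logic written as an explicit for loop that stores each running set in a list. This is a readability sketch only (no speed claim is made), and `cum_unique_reset` is a hypothetical name:

```r
# Hypothetical helper: cumulative unique values that reset whenever the
# next value would bring the set to n elements (n = 4 means "at most 3").
cum_unique_reset <- function(a, n = 4) {
  out <- vector("list", length(a))
  current <- a[0]                  # empty vector of the same type as a
  for (i in seq_along(a)) {
    u <- union(current, a[i])
    # Reaching n unique values means a[i] starts a fresh set
    current <- if (length(u) == n) a[i] else u
    out[[i]] <- current
  }
  out
}

a <- c(1, 2, 2, 3, 3, 4, 3, 2, 2, 5, 6, 7, 9, 8)
res <- cum_unique_reset(a)
sapply(res, paste, collapse = " ")
```

With data.table loaded, it can be used the same way as the Reduce version, e.g. `y[, out := sapply(cum_unique_reset(a), paste, collapse = " ")]`.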
purrr::accumulate also does the work here:
library(purrr)
y$b <- accumulate(y$a, ~if(length(union(.x, .y)) == 4) .y else union(.x, .y))
y
a b
1 1 1
2 2 1, 2
3 2 1, 2
4 3 1, 2, 3
5 3 1, 2, 3
6 4 4
7 3 4, 3
8 2 4, 3, 2
9 2 4, 3, 2
10 5 5
11 6 5, 6
12 7 5, 6, 7
13 9 9
14 8 9, 8

Generate a vector of sequence greater than 1 but less than n in r

How do I generate a vector containing the sequence 1 < i < n, i.e. every positive integer greater than 1 but less than n?
Here is what I tried:
n <- 10
my_seq <- seq(from => 1, to =< n)
It gave me this error:
Error: unexpected '>' in "my_seq <- seq(from =>"
My expected output is:
[1] 2 3 4 5 6 7 8 9
It depends on which type of vector you need. Below are some examples.
If you want an ascending sequence (without duplicates):
seq(n-2)+1
# [1] 2 3 4 5 6 7 8 9
If you want to shuffle the values 2 to n-1:
sample(n-2)+1
# e.g. [1] 6 7 9 5 8 4 2 3 (random order)
If you need random integers that allow duplicates:
sample(n-2,replace = TRUE)+1
# e.g. [1] 5 2 8 9 4 3 6 9 (random draws)
You could generate the sequence using
n <- 10
2:(n-1)
#[1] 2 3 4 5 6 7 8 9
OR
seq(2, n - 1)
You can also do:
tail(head(1:n, -1), -1)
[1] 2 3 4 5 6 7 8 9
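One caveat for all of these: when n is 2 or less there are no integers strictly between 1 and n, yet 2:(n-1) and seq(n-2)+1 both return c(2, 1) for n = 2, because the colon operator counts downwards. A small sketch that guards against this with seq_len(), which returns an empty vector for 0; `between_1_and_n` is a hypothetical name:

```r
# Integers strictly between 1 and n, i.e. 2, 3, ..., n - 1.
# seq_len(0) is empty, so small n gives an empty result instead of
# a descending sequence.
between_1_and_n <- function(n) {
  seq_len(max(n - 2, 0)) + 1L
}

between_1_and_n(10)
between_1_and_n(2)
```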

R: how to change values in a data.frame

> dummy <- data.frame(X = c(1, 2, 3, 4, 5, 5, 2, 6, 7, 2), Y = c(3, 2, 1, 4, 5, 6, 7, 3, 4, 2))
> dummy
X Y
1 1 3
2 2 2
3 3 1
4 4 4
5 5 5
6 5 6
7 2 7
8 6 3
9 7 4
10 2 2
I have a data.frame with values from 1 to 7. I want to change the 1's to 7's (and vice versa), the 2's to 6's (and vice versa) and the 3's to 5's (and vice versa), while the 4's stay as 4's; essentially, I want to 'reverse' the numbers. I thought about writing a for loop that iterates over each value in each column and uses ifelse statements, but how can I change, say, the 7's to 1's and the 1's to 7's simultaneously?
Since every pair of numbers you want to swap sums to 8, you can subtract your data frame from 8 and every value is reversed as you want, so you can just do 8 - dummy:
dummy = 8 - dummy
dummy
# X Y
#1 7 5
#2 6 6
#3 5 7
#4 4 4
#5 3 3
#6 3 2
#7 6 1
#8 2 5
#9 1 4
#10 6 6
match is the right generic way to do this - it will work even when you can't find a nice simple mathematical operation:
First set up key and value vectors, where the ith entry of key you want to replace with the corresponding entry of value:
key = 1:7 # key to look up (current value)
value = 7:1 # desired value corresponding to key
dummy$newX = value[match(dummy$X, key)]
dummy$newY = value[match(dummy$Y, key)]
# X Y newX newY
# 1 1 3 7 5
# 2 2 2 6 6
# 3 3 1 5 7
# 4 4 4 4 4
# 5 5 5 3 3
# 6 5 6 3 2
# 7 2 7 6 1
# 8 6 3 2 5
# 9 7 4 1 4
# 10 2 2 6 6
You could, of course, directly overwrite X and Y - I keep them both here to demonstrate that it worked.
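To overwrite every column in one step, the same key/value lookup can be applied with lapply. A minimal sketch; `recode_all` is a hypothetical helper, and it makes one assumption beyond the answer above: values that do not appear in `key` are left unchanged rather than turned into NA:

```r
# Recode every column of a data frame through a key -> value lookup.
# Values not found in `key` (match() returns NA for them) are kept as-is.
recode_all <- function(df, key, value) {
  df[] <- lapply(df, function(col) {
    idx <- match(col, key)        # position of each value in key, NA if absent
    hit <- !is.na(idx)
    col[hit] <- value[idx[hit]]   # replace only the values found in key
    col
  })
  df
}

dummy <- data.frame(X = c(1, 2, 3, 4, 5, 5, 2, 6, 7, 2),
                    Y = c(3, 2, 1, 4, 5, 6, 7, 3, 4, 2))
recode_all(dummy, key = 1:7, value = 7:1)
```

Using `df[] <-` keeps the result a data.frame with the original column names.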
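A slightly more generic version (this relies on the smallest and largest possible values, here 1 and 7, both actually occurring in the data):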
max(dummy) + min(dummy) - dummy
X Y
1 7 5
2 6 6
3 5 7
4 4 4
5 3 3
6 3 2
7 6 1
8 2 5
9 1 4
10 6 6

Compare 2 values of the same row of a matrix with the row and column index of another matrix in R

I have a matrix m1 with 11217 rows and 2 columns, and a second matrix m2 with 10 rows and 10 columns. For each row of m1, I want to treat its two values as a row and column index of m2, and increase the value at that position of m2 (initially 0) by 1.
c1 <- x[2:11218] #these values go from 1 to 10
#second column from index 3 to N
c2 <- x[3:11219] #these values also go from 1 to 10
#matrix with column c1 and c2
m1 <- as.matrix(cbind(c1 = c1, c2 = c2))
#empty matrix which will count the frequencies
m2 <- matrix(0, nrow = 10, ncol = 10)
#change row and column names of m2 to the numbers of 1 to 10
dimnames(m2) <-list(c(1:10), c(1:10))
#go through every row of the matrix m1 and look which rotation appears, add 1 to m2 if the rotation
#equals the corresponding index
r <- c(1:10)
c <- c(1:10)
for (i in 1:nrow(m1)) {
if(m1[i,1] == r & m1[i,2] == c)
m2[r,c]+1
}
No frequencies were calculated, and I don't understand why.
It appears that you are trying to replicate the behavior of table. I'd recommend just using it instead.
Simpler data (the question does not include the variable x):
m1 <-
  matrix(round(runif(1000, 1, 10))
         , ncol = 2)
Then, use table. Here, I convert each column to a factor with levels 1 to 10, so that every row and column appears in the result even if some value never occurs (the data is random, so your counts will differ):
table(factor(m1[,1], 1:10)
, factor(m1[,2], 1:10))
gives:
1 2 3 4 5 6 7 8 9 10
1 3 4 0 4 2 0 5 3 2 0
2 3 7 9 7 4 5 3 4 5 2
3 4 6 3 10 8 9 4 2 7 3
4 5 2 14 3 7 13 8 11 3 3
5 2 13 2 5 8 5 7 7 8 6
6 1 10 7 4 5 6 8 5 8 5
7 3 3 6 5 4 5 4 8 7 7
8 5 5 8 7 6 10 5 4 3 4
9 2 5 8 4 7 4 4 6 4 2
10 3 1 2 3 3 5 3 5 1 0
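For completeness, the loop in the question can also be fixed. It had two problems: the if condition compared single values against the whole vectors r and c, and m2[r,c]+1 computed a value without ever assigning it back to m2. A minimal sketch of a corrected loop, assuming the values in m1 are whole numbers from 1 to 10 (the example data below is made up), checked against table():

```r
# Example data (assumption): 100 rows of whole numbers from 1 to 10
set.seed(1)
m1 <- matrix(sample(1:10, 200, replace = TRUE), ncol = 2)

# Count matrix, initially all zero
m2 <- matrix(0, nrow = 10, ncol = 10, dimnames = list(1:10, 1:10))
for (i in seq_len(nrow(m1))) {
  row_i <- m1[i, 1]
  col_i <- m1[i, 2]
  # Use the two values directly as indices and store the incremented count
  m2[row_i, col_i] <- m2[row_i, col_i] + 1
}

# The loop gives the same counts as table()
tab <- table(factor(m1[, 1], 1:10), factor(m1[, 2], 1:10))
all(as.vector(m2) == as.vector(tab))
```

The loop works, but table() does the same counting in a single call.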

repeat sequences from vector

Say I have a vector like so:
vector <- 1:9
#$ [1] 1 2 3 4 5 6 7 8 9
I now want to repeat each consecutive block of x elements n times, like so for x = 3 and n = 2:
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
I'm accomplishing this like so:
index <- NULL
x <- 3
n <- 2
for (i in 1:(length(vector)/x)) {
index <- c(index, rep(c(1:x + (i-1)*x), n))
}
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
This works just fine, but I have a hunch there's got to be a better way (especially since usually, a for loop is not the answer).
P.S.: the use case for this is actually repeating rows in a data frame, but just getting the index vector would be fine.
You can try to first split the vector, then use rep and unlist:
x <- 3 # this is the length of each subset sequence from i to i+x (see above)
n <- 2 # this is how many times you want to repeat each subset sequence
unlist(lapply(split(vector, rep(1:(length(vector)/x), each = x)), rep, n), use.names = FALSE)
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
Or, you can create a matrix whose columns are the blocks (nrow = x) and convert it to a vector:
c(do.call(rbind, replicate(n, matrix(vector, nrow = x), FALSE)))
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
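Since the actual goal is an index vector for repeating data frame rows, it can also be built arithmetically from two rep() calls, without splitting or building a matrix. A minimal sketch; `repeat_blocks` is a hypothetical helper and assumes the number of rows is a multiple of x:

```r
# Hypothetical helper: index vector that repeats each consecutive block of
# x positions n times. Assumes len is a multiple of x.
repeat_blocks <- function(len, x, n) {
  stopifnot(len %% x == 0)
  # start offset of each block (0, x, 2x, ...), one per output element
  starts <- rep(seq(0, len - x, by = x), each = x * n)
  # positions 1..x within a block, cycled once per output copy of a block
  starts + rep(seq_len(x), times = n * len / x)
}

idx <- repeat_blocks(9, 3, 2)
idx

# The P.S. use case: repeat rows of a data frame blockwise
df <- data.frame(id = 1:9, letter = letters[1:9])
df[idx, ]
```

For len = 9, x = 3 and n = 2 this produces the same 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9 index as the loop in the question.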
