Adding the subsequent numbers of list containing random numbers, to the subsequent indices - r

I have a list with some random numbers. For each number, I want to insert the next two consecutive integers right after it in the list, without using a for loop.
So, let's say I have this list: v <- c(238,1002,569,432,6,1284)
Then the output I want is:
v <- c(238,239,240,1002,1003,1004,569,570,571,432,433,434,6,7,8,1284,1285,1286)
I am still pretty new to R, so I don't really know what I'm doing, and I've tried for hours now with no results. I have, though, made it work using a for loop, but I know R isn't too happy with loops, so I really need to vectorize it somehow.
Does anybody know how I can implement this in my R code in an efficient manner?

You can just use outer to calculate the outer sum:
res <- outer(0:2, v, "+")
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]  238 1002  569  432    6 1284
#[2,]  239 1003  570  433    7 1285
#[3,]  240 1004  571  434    8 1286
You can then turn the resulting matrix into a vector:
res <- as.vector(res)
#[1] 238 239 240 1002 1003 1004 569 570 571 432 433 434 6 7 8 1284 1285 1286
Note that matrices are "column-major" in R, so as.vector reads the matrix one column at a time, which is exactly the order you want here.
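As a quick sanity check of the column-major point, here is a minimal sketch that compares the outer approach with an alternative built on rep. The rep version is not part of the answer above; it is just another vectorized way to get the same ordering, and the names res and alt are only for illustration.

```r
v <- c(238, 1002, 569, 432, 6, 1284)

# outer() builds a 3 x 6 matrix; as.vector() reads it column by column
res <- as.vector(outer(0:2, v, "+"))

# Alternative sketch: repeat each value three times, then let 0:2 recycle
alt <- rep(v, each = 3) + 0:2

identical(res, alt)
res[1:6]
```

Both versions avoid an explicit loop; which one reads better is mostly a matter of taste.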

Related

Expanding a list of numbers into a matrix (list with n values to multiply to a n x n matrix)

I have a set of numbers, which I want to expand into a matrix.
There are 4 values in the list which I want to expand into a 4x4 matrix.
Here is some example data
freq <- c(627,449,813,111)
I want to expand this into a matrix so that it looks like this.
Apologies I have just copied and pasted data, thus it's not an R output, but hope it helps to get the idea across.
          1    2    3    4  Total
1       197  141  255   35    627
2       141  101  183   25    449
3       255  183  330   45    813
4        35   25   45    6    111
Total   627  449  813  111   2000
Each cell is (row total) x (column total) / (table total). The value in 1,1 = (627 x 627)/2000 = 197. The value in 2,1 = (627 x 449)/2000 = 141, and so on.
Is there a function that will create this matrix? I will try to do it via a loop but was hoping there is a function or matrix calculation trick that can do this more efficiently? Apologies if I didn't articulate the above too well, any help is greatly appreciated. Thanks
freq <- c(627,449,813,111)
round(outer(freq, freq)/sum(freq))
#>      [,1] [,2] [,3] [,4]
#> [1,]  197  141  255   35
#> [2,]  141  101  183   25
#> [3,]  255  183  330   45
#> [4,]   35   25   45    6
It doesn't really matter here, but it is good practice to avoid constructions like outer(x, x) / sum(x) in favour of ones like tcrossprod(x / sqrt(sum(x))):
round(tcrossprod(freq / sqrt(sum(freq))))
##      [,1] [,2] [,3] [,4]
## [1,]  197  141  255   35
## [2,]  141  101  183   25
## [3,]  255  183  330   45
## [4,]   35   25   45    6
There are a few issues with the outer approach:
outer(x, x) evaluates tcrossprod(as.vector(x), as.vector(x)) internally. The as.vector calls and everything else that happens inside of outer are completely redundant if x is already a vector. The as.vector calls are actually worse than redundant: if x has any attributes, then as.vector(x) requires a deep copy of x.
Naively doing A <- outer(x, x); A / sum(x) requires R to allocate memory for two n-by-n matrices. For large enough n, that can be quite wasteful, if not impossible. R is clever enough to avoid the second allocation if you compute outer(x, x) / sum(x) directly. However, such optimizations are low level, come with a number of gotchas, and are not even documented in ?Arithmetic, so it can be unsafe to rely on them.
outer(x, x) can result in underflow or overflow if the elements of x are very (very) small or large.
tcrossprod(x / sqrt(sum(x))) avoids all of these issues by scaling x before computing an outer product and cutting out all of the redundancies of outer.
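To make the comparison concrete, here is a small sketch checking that the two constructions agree on the example data, followed by a deliberately extreme vector that shows the overflow issue. The vector big and its values are made up purely for illustration.

```r
freq <- c(627, 449, 813, 111)

a <- outer(freq, freq) / sum(freq)           # outer product first, then scale
b <- tcrossprod(freq / sqrt(sum(freq)))      # scale first, then outer product

all.equal(a, b)                # same matrix up to floating-point tolerance
all.equal(rowSums(a), freq)    # the margins reproduce the original totals

# Hypothetical extreme input: the products overflow before the division
big <- c(1e200, 1e200)
outer(big, big) / sum(big)               # overflows to Inf
tcrossprod(big / sqrt(sum(big)))         # stays finite
```

On ordinary data the two give the same answer, so the difference only matters for very large vectors or extreme magnitudes.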

Why is outer recycling a vector that should go unused and not throwing a warning?

I recently used the following line of code, expecting to get an error. To my surprise, I was given an output:
> outer(1:5,5:10,c=1:3,function(a,b,c) 10*a + 100*b + 1000*c)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1510 3610 2710 1810 3910 3010
[2,] 2520 1620 3720 2820 1920 4020
[3,] 3530 2630 1730 3830 2930 2030
[4,] 1540 3640 2740 1840 3940 3040
[5,] 2550 1650 3750 2850 1950 4050
It appears that the code is being evaluated to outer(1:5,5:10,function(a,b) 10*a + 100*b)+1000*(1:3). Why is this? And as a follow-up, is there any clear reason why this doesn't give a warning? To my mind, a user who entered code like this was probably expecting an output covering all a, b, and c values.
This is expected behaviour based on R's recycling rules. It has nothing to do with outer as such, though it might be a surprise if you think outer is somehow applying a function across margins.
Instead, outer takes two vectors X and Y as its first two arguments. It takes X and replicates it as a whole length(Y) times. Similarly, it takes Y and repeats each of its elements length(X) times. Then it just runs your function FUN once on these two long vectors, passing the long X as the first argument and the long Y as the second argument. Any other arguments to FUN have to be passed directly as arguments to outer via ... (as you have done with c = 1:3).
The result is a single long vector, which is turned into a matrix by setting its dim attribute to length(X) by length(Y).
Now, in the specific example you gave, X has 5 elements (1:5) and Y has 6 (5:10). Therefore your anonymous function is called on two length-30 vectors and a single length-3 vector. R's recycling rules dictate that if the recycled vector fits neatly into the longer vector without partial recycling, no warning is emitted.
To see this, take your anonymous function and try it outside outer with two length-30 vectors and one length-3 vector:
f <- function(a, b, c) 10*a + 100*b + 1000*c
f(1:30, 1:30, 1:3)
#> [1] 1110 2220 3330 1440 2550 3660 1770 2880 3990 2100 3210 4320 2430
#> [14] 3540 4650 2760 3870 4980 3090 4200 5310 3420 4530 5640 3750 4860
#> [27] 5970 4080 5190 6300
3 recycles nicely into 30, so there is no warning.
Conversely, if the product of the length of the two vectors you pass to outer is not a multiple of 3, you will get a warning:
outer(1:5,6:10,c=1:3,function(a,b,c) 10*a + 100*b + 1000*c)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,] 1610 3710 2810 1910 4010
#> [2,] 2620 1720 3820 2920 2020
#> [3,] 3630 2730 1830 3930 3030
#> [4,] 1640 3740 2840 1940 4040
#> [5,] 2650 1750 3850 2950 2050
#> Warning message:
#> In 10 * a + 100 * b + 1000 * c :
#> longer object length is not a multiple of shorter object length
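The mechanism described above can be sketched by hand. This is only an approximation of what outer does for plain vectors without names or dims; long_x, long_y and manual are illustrative names, not anything from outer's source.

```r
X <- 1:5
Y <- 5:10
f <- function(a, b, c) 10*a + 100*b + 1000*c

long_x <- rep(X, times = length(Y))   # whole of X, repeated length(Y) times
long_y <- rep(Y, each = length(X))    # each element of Y, repeated length(X) times

# One single call on two length-30 vectors; c = 1:3 is recycled 10 times
manual <- f(long_x, long_y, c = 1:3)
dim(manual) <- c(length(X), length(Y))

all.equal(manual, outer(X, Y, f, c = 1:3))
```

Seen this way, c is just an ordinary argument in one vectorized call, which is why the usual recycling rules (and the usual warning) apply.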

How to automatically multiply and add some coefficient to a data frame in R?

I have this data set
obs <- data.frame(replicate(8,rnorm(10, 0, 1)))
and these coefficients
coeff <- data.frame(replicate(8,rnorm(2, 0, 1)))
For each column of obs, I need to multiply by the first element of the matching column of coeff, and then add the second element of that same column. I need to do the same for all 8 columns. I read somewhere that if you copy and paste code more than once you are doing something wrong... and that's exactly what I did.
obs.transformed.X1 <-(obs[1]*coeff[1,1])+coeff[2,1]
obs.transformed.X2 <-(obs[2]*coeff[1,2])+coeff[2,2]
# ...
obs.transformed.X8 <-(obs[8]*coeff[1,8])+coeff[2,8]
I know there is a smarter way to do this (loop?), but I just couldn't figure it out. Any help will be appreciated.
This is what I've tried but I am only getting the last column
for (i in 1:length(obs)) {
results=(obs[i]*coeff[1,i])+coeff[2,i]
}
If you coerce to matrix class, you can use the sweep function twice in sequence: first multiplying the columns by the first row of coeff, and then adding the second row, again column-wise:
obs <- data.frame(matrix(1:60, 10)) # I find checking with random numbers difficult
coeff <- data.frame(matrix(1:12,2))
sweep(
  sweep(as.matrix(obs), 2, as.matrix(coeff)[1,], "*"),  # first operation is "*"
  2, as.matrix(coeff)[2,], "+")                          # arguments for the addition
#--------------------------------
      X1 X2  X3  X4  X5  X6
 [1,]  3 37 111 225 379 573
 [2,]  4 40 116 232 388 584
 [3,]  5 43 121 239 397 595
 [4,]  6 46 126 246 406 606
 [5,]  7 49 131 253 415 617
 [6,]  8 52 136 260 424 628
 [7,]  9 55 141 267 433 639
 [8,] 10 58 146 274 442 650
 [9,] 11 61 151 281 451 661
[10,] 12 64 156 288 460 672
I decreased the number of columns because your original code was too wide for my RStudio console, but this should be very general. I suspect there's an equivalent matrix operator method, but it didn't come to me.
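One possible matrix-style equivalent, offered as a sketch rather than as what the answer had in mind: transpose so that the coefficient vectors recycle down the columns, do the arithmetic, and transpose back. The names m, slope, intercept, via_t and via_sweep are made up for illustration.

```r
obs   <- data.frame(matrix(1:60, 10))
coeff <- data.frame(matrix(1:12, 2))

m         <- as.matrix(obs)
slope     <- unlist(coeff[1, ])   # first row: multipliers, one per column
intercept <- unlist(coeff[2, ])   # second row: offsets, one per column

# After t(), each original column is a row, so the length-6 vectors
# line up with the rows and recycle correctly across the 10 columns
via_t <- t(t(m) * slope + intercept)

via_sweep <- sweep(sweep(m, 2, slope, "*"), 2, intercept, "+")
all.equal(via_t, via_sweep, check.attributes = FALSE)
```

The double transpose is compact but easy to misread, so the sweep version may still be the clearer choice.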
I came up with this solution:
results = list()
for (i in 1:length(obs)) {
results[[i]]=(obs[i]*coeff[1,i])+coeff[2,i]
}
results <- as.data.frame(results)
Is there a more efficient way to do this?
I used Map
results <- as.data.frame(Map(`+`, Map(`*`, obs, coeff[1,]), coeff[2,]))
This should also give what you are looking for.
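For what it's worth, here is a small sketch checking that the Map version and the list-filling loop agree on deterministic data. The data are the made-up integers from the sweep answer, and loop_res and map_res are illustrative names.

```r
obs   <- data.frame(matrix(1:60, 10))
coeff <- data.frame(matrix(1:12, 2))

# Loop version: collect one transformed column per iteration
loop_res <- list()
for (i in seq_along(obs)) {
  loop_res[[i]] <- (obs[i] * coeff[1, i]) + coeff[2, i]
}
loop_res <- as.data.frame(loop_res)

# Map version: walk over the columns of obs and coeff in parallel
map_res <- as.data.frame(Map(`+`, Map(`*`, obs, coeff[1, ]), coeff[2, ]))

all.equal(unname(as.matrix(loop_res)), unname(as.matrix(map_res)))
```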

What does sapply do for given function

I am still learning R. Kindly, I'd like to understand this function:
sapply(M[,-1], function(x) x^2)
Where M is a matrix. It looks like it is squaring every element in M. Can someone provide a brief example of how this line functions?
Thank you
The apply family of functions in R offers different variants depending on the use case.
1. When you want to apply a function to the rows or columns of a matrix, use apply().
2. When you want to apply a function to each element of a list in turn and get a list back, use lapply().
3. When you want to apply a function to each element of a list in turn, but want a vector back rather than a list, use sapply().
In your case, yes: sapply treats the matrix as a plain sequence of its elements, so it squares every value except those in the first column and returns a vector. See below:
M <- matrix(seq(10,25), 4, 4) # 4 by 4 matrix filled column-wise with 10 to 25
     [,1] [,2] [,3] [,4]
[1,]   10   14   18   22
[2,]   11   15   19   23
[3,]   12   16   20   24
[4,]   13   17   21   25
M[,-1]
     [,1] [,2] [,3]
[1,]   14   18   22
[2,]   15   19   23
[3,]   16   20   24
[4,]   17   21   25
sapply(M[,-1], function(x) x^2)
[1] 196 225 256 289 324 361 400 441 484 529 576 625
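Since ^ is already vectorized in R, the same squaring can be done without sapply, which also keeps the matrix shape. A minimal sketch (flat and kept are illustrative names):

```r
M <- matrix(seq(10, 25), 4, 4)

flat <- sapply(M[, -1], function(x) x^2)  # element by element, dims are dropped
kept <- M[, -1]^2                         # vectorized, still a 4 x 3 matrix

is.null(dim(flat))
dim(kept)
all.equal(flat, as.vector(kept))
```

So the sapply call gives the right numbers, but the plain M[, -1]^2 is simpler if you want the result to stay a matrix.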

shuffle elements of a matrix's column to correlate to another column of the matrix in R

I have a matrix of human height in R like:
#the first 4 rows of 400 total
     [,1] [,2]
[1,] 178 162
[2,] 186 157
[3,] 179 159
[4,] 180 157
I need to shuffle the elements of the second column, i.e. x[,2], so that cor(x[,1],x[,2])≈0.6 (or anything above 0.5), while keeping x[,1] untouched. Right now the correlation is very weak (<0.1).
Does anybody know how to do this? Thanks in advance.
