sweeping out colMeans and rowMeans - r

To sweep out the column means, row means and grand mean from the columns, rows and observations respectively, I use the following code:
a <- matrix(data=seq(from=2, to=60, by=2), nrow=6, ncol=5, byrow=FALSE)
b <- matrix(data=rep(colMeans(a), nrow(a)), nrow=nrow(a), ncol=ncol(a), byrow=TRUE)
c <- matrix(data=rep(rowMeans(a), ncol(a)), nrow=nrow(a), ncol=ncol(a), byrow=FALSE)
d <- matrix(data=rep(mean(a), nrow(a)*ncol(a)), nrow=nrow(a), ncol=ncol(a), byrow=FALSE)
e <- a-b-c-d
The column means can be swept out with this command:
a1 <- sweep(a, 2, colMeans(a), "-")
Is there any single command to sweep out colMeans, rowMeans and mean? Thanks in advance.

What do you think e should look like in this example? Perhaps your line should be e <- a-b-c+d so that e has zero mean.
The following code produces the same result as that calculation using b, c, and d (with your arithmetic progression example, a matrix of 0s). Change + to - if you insist.
e <- t(t(a) - colMeans(a)) - rowMeans(a) + mean(a)
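To see why the sign of the grand mean matters, here is a minimal sketch on a random matrix (the seed and the names x, centered and centered2 are my own, chosen only for illustration). It subtracts the column means and row means and adds the grand mean back, then checks that every row and column of the result averages to zero.

```r
# Double-centering sketch: remove column and row means, add the grand mean back.
set.seed(1)                                # arbitrary seed, for reproducibility only
x <- matrix(rnorm(30), nrow = 6, ncol = 5)

centered <- sweep(x, 2, colMeans(x), "-")         # subtract column means
centered <- sweep(centered, 1, rowMeans(x), "-")  # subtract row means of the original x
centered <- centered + mean(x)                    # add the grand mean back

# Same calculation with the transpose trick from the answer above
centered2 <- t(t(x) - colMeans(x)) - rowMeans(x) + mean(x)

max(abs(rowMeans(centered)))    # numerically zero
max(abs(colMeans(centered)))    # numerically zero
all.equal(centered, centered2)
```

With - mean(x) in the last step instead, every row and column mean would be -2 * mean(x) rather than zero.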

Not that I know of, but why not just write your own? It's only four lines:
meanSweep <- function(x){
  tmp <- sweep(x,2,colMeans(x),"-")
  tmp <- sweep(tmp,1,rowMeans(x),"-")
  tmp <- tmp - mean(x)
  tmp
}
all.equal(e,meanSweep(a))
[1] TRUE

Related

How to write this function to have as output a vector when the input is a vector?

Hello,
I want my function to return a vector when the input is a vector, without using a function from the apply family. How should I write my function?
Thanks.
I used this code where I was forced to use two functions.
f1=function(l){
y= B1 # vector of length N
li= position # a vector of length N
h=10
a=(li-l)/h
Knorm=dnorm(a)
b=Knorm*y
num=sum(b)
den=sum(Knorm)
num/den
}
########## Vectorized form
f2 = function(l){
sapply(l,f1)
}
L=seq(10000000,11000000,by=1)
f2(L)
If I compute f1(L) I will get one value. That's why I was forced to write a second function to apply my first function to each element of vector L.
The purpose is to write it in one function.
Use outer and colSums to allow the function to take l as a vector:
f <- function(l){
y <- B1 # vector of length N
li <- position # a vector of length N
h <- 10
a <- outer(li, l, "-")/h
Knorm <- dnorm(a)
b <- Knorm*y
num <- colSums(b)
den <- colSums(Knorm)
num/den
}
And here is a simpler equivalent function:
f <- function(l){
Knorm <- dnorm(outer(position, l, "-")/10)
colSums(Knorm*B1)/colSums(Knorm)
}
Compare to OP's function:
f1=function(l){
y= B1 # vector of length N
li= position # a vector of length N
h=10
a=(li-l)/h
Knorm=dnorm(a)
b=Knorm*y
num=sum(b)
den=sum(Knorm)
num/den
}
position <- 10:1
B1 <- 1:10
sapply(8:12, f1)
#> [1] 5.300480 5.220937 5.141656 5.062713 4.984177
f(8:12)
#> [1] 5.300480 5.220937 5.141656 5.062713 4.984177
UPDATE
Based on the comments, something like this may work best for the large vectors involved:
library(parallel)
f1 <- function(l) {
dkAll <- abs(outer(position, l, "-"))
Knorm <- dnorm(outer(position, l, "-")/pmax(dkAll[order(col(dkAll), dkAll)[seq(70, by = length(position), length.out = length(l))]], 1000))
colSums(Knorm*y)/colSums(Knorm)
}
y <- seq(1, 100, length.out = 23710)
position <- seq(10351673, 12422082, length.out=23710)
l <- seq(11190000, 11460000, by=10)
# ysmoothed <- f1(l) # memory allocation error
cl <- makeCluster(detectCores())
clusterExport(cl, list("y", "position", "l", 'f1'))
system.time(ysmoothed <- parLapply(cl, l, f1))
#> user system elapsed
#> 0.02 0.00 20.13
Created on 2022-02-02 by the reprex package (v2.0.1)
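The memory allocation error comes from outer() building a length(position) by length(l) matrix in one go. As a rough sketch of an alternative to parallelising, the evaluation points can be processed in blocks so that only a small slice of that matrix exists at any time. This assumes the fixed bandwidth h = 10 from the original question, not the adaptive bandwidth used in the update; smooth_chunked and chunk_size are hypothetical names.

```r
# Hypothetical helper: kernel-weighted mean of y, evaluated over l in chunks.
# Assumes a fixed bandwidth h and a non-empty l.
smooth_chunked <- function(l, position, y, h = 10, chunk_size = 1000) {
  out <- numeric(length(l))
  starts <- seq(1, length(l), by = chunk_size)
  for (s in starts) {
    idx <- s:min(s + chunk_size - 1, length(l))
    # length(position) x length(idx) matrix of kernel weights for this chunk only
    K <- dnorm(outer(position, l[idx], "-") / h)
    out[idx] <- colSums(K * y) / colSums(K)
  }
  out
}

# Small check against the element-by-element version from the question
position <- 10:1
B1 <- 1:10
ref <- sapply(8:12, function(l) {
  K <- dnorm((position - l) / 10)
  sum(K * B1) / sum(K)
})
res <- smooth_chunked(8:12, position, B1, h = 10, chunk_size = 2)
all.equal(res, ref)
```

A larger chunk_size means fewer loop iterations but a bigger temporary matrix, so it is a knob to tune against available memory.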

printing intermediate multiplication using loop

I have a data frame 'df' with 8 rows and 8 columns.
The code below gives me the result of the 5th multiplication directly, but I want the results of all the intermediate multiplications.
I also want the code to run in a loop 15 times, so that there are 15 intermediate multiplication outputs.
Code:
p <- eigen(df)$vectors
d <- eigen(df)$values
n <- 5
p %*% diag(d^n) %*% solve(p)
Expected output: if I multiply n = 15 times, there should be 15 matrices, one for each intermediate multiplication.
please help.
Assuming that you mean matrix powers (X^n), you can do the following:
mat <- matrix(1:9, nrow=3)
n <- 5
pows <- list()
pows[[1]] <- mat
for (i in 2:n) {
pows[[i]] <- pows[[i - 1]] %*% pows[[1]]
}
p <- eigen(mat)$vectors
d <- eigen(mat)$values
res <- p %*% diag(d^n) %*% solve(p)
# compare absolute differences, otherwise large negative errors would pass
all(abs(res - pows[[n]]) < 1e-6)
Can also use:
library(expm)
mat %^% n
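For the 15 intermediate results the question asks about, one possible sketch is to let Reduce() accumulate the running products, and to build the same list from the eigendecomposition for comparison. The small symmetric matrix below is an assumed stand-in for the asker's 8 x 8 data, and pows_eig is a name introduced here.

```r
mat <- matrix(c(2, 1, 1, 3), nrow = 2)   # assumed example matrix, not the asker's df
n <- 15

# pows[[k]] holds mat multiplied by itself k times
pows <- Reduce(`%*%`, rep(list(mat), n), accumulate = TRUE)

# The same powers from the eigendecomposition, one matrix per exponent
eig <- eigen(mat)
p <- eig$vectors
d <- eig$values
pows_eig <- lapply(seq_len(n), function(k) p %*% diag(d^k) %*% solve(p))

length(pows)
all.equal(pows[[n]], pows_eig[[n]])
```

Looping over the list, for example with for (k in seq_along(pows)) print(pows[[k]]), would print each intermediate matrix.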

Euclidean distance for each row in dataset

Suppose we have dataset G2:
data(iris)
G2 <- iris[1:5, -5]
We need to calculate Euclidean distance between x (row in G2) and G2 (excluding x) for all x's in G2, formally
I wonder what the best way to do this is. Here is my initial attempt:
D <- dist(G2)
m1 <- as.matrix(D)
(1 / (5 - 1)) * colSums(m1)
Your notation is a bit confusing because you use D differently in the code and formula. How about
m <- as.matrix(dist(G2, upper=TRUE))
D <- apply(m, 2, mean)
n <- length(D)
D <- n/(n-1)*D
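As a sanity check, here is a short sketch that assumes the intended quantity is the average Euclidean distance from each row to the other n - 1 rows, which is what the asker's code suggests. It computes that directly from the distance matrix and compares the first entry against a by-hand calculation. X and manual are names introduced here.

```r
G2 <- iris[1:5, -5]
X <- as.matrix(G2)
n <- nrow(X)

m <- as.matrix(dist(X))        # full symmetric distance matrix, zeros on the diagonal
D <- rowSums(m) / (n - 1)      # the zero self-distance drops out of the sum

# By-hand average distance from row 1 to rows 2..n
manual <- mean(sapply(2:n, function(j) sqrt(sum((X[1, ] - X[j, ])^2))))
all.equal(unname(D[1]), manual)
```

Dividing the row sums by n - 1 gives the same numbers as taking the column means and rescaling by n/(n-1).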

R applying a function to a sub-matrix [duplicate]

If I have an array A
A <- array(0, c(4, 3, 5))
for(i in 1:5) {
set.seed(i)
A[, , i] <- matrix(rnorm(12), 4, 3)
}
and if I have matrix B
set.seed(6)
B <- matrix(rnorm(12), 4, 3)
The code to subtract B from each matrix of the array A would be:
d<-array(0, c(4,3,5))
for(i in 1:5){
d[,,i]<-A[,,i]-B
}
However, what would be the code to perform the same calculation using a function from "apply" family?
This is what sweep is for.
sweep(A, 1:2, B)
Maybe not very intuitive:
A[] <- apply(A, 3, `-`, B)
Because you are looping on the last array dimension, you can simply do:
d <- A - as.vector(B)
and it will be much faster. It is the same idea as when you subtract a vector from a matrix: the vector is recycled so it is subtracted from each column.
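A quick way to convince yourself that these approaches agree is to run them side by side. The sketch below uses freshly generated data (the seed and the names d_loop, d_sweep, d_apply and d_recycle are mine) and writes the apply() result into a new array instead of overwriting A.

```r
set.seed(42)                               # arbitrary seed
A <- array(rnorm(60), c(4, 3, 5))
B <- matrix(rnorm(12), 4, 3)

# Reference: explicit loop over the third dimension
d_loop <- array(0, dim(A))
for (i in seq_len(dim(A)[3])) d_loop[, , i] <- A[, , i] - B

d_sweep   <- sweep(A, 1:2, B)                      # FUN defaults to "-"
d_apply   <- array(apply(A, 3, `-`, B), dim(A))    # each slice minus B, reshaped back
d_recycle <- A - as.vector(B)                      # B recycled across the slices

all.equal(d_loop, d_sweep)
all.equal(d_loop, d_apply)
all.equal(d_loop, d_recycle)
```

The recycling version only works because the slices being looped over are along the last dimension, so B's 12 values line up with each consecutive block of 12 elements in A.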

Generate all possible permutations from four integer lists in R

(Very) amateur coder and statistician working on a problem in R.
I have four integer lists: A, B, C, D.
A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)
I want R to generate all of the permutations from picking 1 item from each of these lists (I know this code will take forever to run), and then take the mean of each of those permutations. So, for instance, [1, 100, 200, 400] -> 175.25.
Ideally what I would have at the end is a list of all of these means then.
Any ideas?
Here's how I'd do this for a smaller but similar problem:
A <- 1:13
B <- 1:26
C <- 1:26
D <- c(1:13, 27:40)
mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4
You'll probably crash R if you just up all the indices, but you could probably set up a loop, something like this (not tested):
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)
for(A in 1:133) {
mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4
write.table(mymat, file = paste("matrix", A, "txt", sep = "."))
write.table(mymeans, file = paste("means", A, "txt", sep = "."))
rm(mymat, mymeans)
}
to get them all. That might still be too big, in which case you could use a nested loop, or loop over one of the longer vectors instead of A (B, C and D each have 266 values).
Alternatively,
n <- 1e7
A <- sample(133, size = n, replace= TRUE)
B <- sample(266, size = n, replace= TRUE)
C <- sample(266, size = n, replace= TRUE)
D <- sample(x = c(1:133, 267:400), size = n, replace= TRUE)
mymeans <- (A+B+C+D)/4
will give you a large sample of the means and take no time at all.
hist(mymeans)
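If only the distribution of the means is needed, the full enumeration can be avoided, because the mean depends only on the sum of the four picks. The following is a sketch of that idea, not something from the answers above: it folds the lists in one at a time and keeps a count for each distinct sum. add_values is a hypothetical helper name.

```r
# Hypothetical helper: fold one more list of values into a distribution of sums.
# `dist` is a list with the distinct sums so far and how many combinations give each.
add_values <- function(dist, x) {
  s <- as.vector(outer(dist$sums, x, "+"))   # every existing sum plus every new value
  w <- rep(dist$counts, times = length(x))   # each new combination inherits the old count
  agg <- tapply(w, s, sum)                   # merge combinations that share a sum
  list(sums = as.numeric(names(agg)), counts = as.vector(agg))
}

A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

start <- list(sums = 0, counts = 1)
dist <- Reduce(add_values, list(A, B, C, D), start)

means <- dist$sums / 4     # each distinct mean
counts <- dist$counts      # number of combinations producing it
sum(counts)                # total number of combinations, 133 * 266^3
```

Something like plot(means, counts, type = "h") would then show the exact counterpart of the sampled histogram.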
Even creating a vector of means as large as your number of permutations will use up all of your memory. You will have to split this into smaller problems; look up how to write objects to Excel and how to remove objects from memory (both are covered on SO).
As for the code to do this, I've tried to keep it as simple as possible so that it's easy to 'grow' your knowledge:
#this is how to create vectors of sequential integers in R
a <- c(1:33)
b <- c(1:33)
c <- c(1:33)
d <- c(1:33,267:300)
#this is how to create an empty vector
means <- rep(NA,length(a)*length(b)*length(c)*length(d))
#set up for a loop
i <- 1
#how you run a loop to perform this operation
for(j in 1:length(a)){
  for(k in 1:length(b)){
    for(l in 1:length(c)){
      for(m in 1:length(d)){
        y <- c(a[j],b[k],c[l],d[m])
        means[i] <- mean(y)
        i <- i+1
      }
    }
  }
}
#and to graph your output
hist(means, col='brown')
#lets put a mean line through the histogram
abline(v=mean(means), col='white', lwd=2)
