Subtracting from random values in a weighted matrix in R - r

Thanks in advance for your help!
This question is related to one I posted before, but I think it deserves its own post because it is a separate challenge.
Last time I asked about randomly selecting values from a matrix after adding a vector. In that example, the matrix and the vector were both binary. Now I would like to change the values in a weighted matrix after adding a weighted vector. Here is some example code to play with.
require(gamlss.dist)
mat1<-matrix(c(0,0,0,0,1,0, 0,10,0,0,0,5, 0,0,0,0,1,0, 0,0,3,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,1,1,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,0,0,1,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1)) #rZIP is a function from gamlss.dist that randomly selects values from a zero-inflated distribution
vec1[ones]<-temp
The values in the vector are sampled from a zero-inflated distribution (thanks to this question). When I bind the vector to the matrix, I want to randomly select a non zero value from the same column, and subtract the vector value from it. I can see a further complication arising if the vector value is greater than the randomly selected value in the same column. In such an instance, it would simply set that value to zero.
Here is some modified code from the earlier question that does not work for this problem but maybe will be helpful.
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec != 0) #select matrix columns where the vector is not zero
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] != 0)
out <- if(length(ones) != 0) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows #this line doesn't work b/c it is not binary
mat[ind] <- 0 #here is where I would like to subtract the vector value
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
Any ideas? Thanks again for all of the fantastic help!
EDIT:
Thanks to help from bnaul down below, I am a lot closer to the answer, but we have run into the same problem we hit last time. The sample function doesn't work properly on columns where there is only one nonzero value. I have fixed this using Gavin Simpson's if else statement (which was the solution in the previous case). I've adjusted the matrix to have columns with only one nonzero value.
mat1<-matrix(c(0,0,0,0,1,0, 0,0,0,0,0,5, 0,0,0,0,1,0, 0,0,0,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,0,0,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,1,0,0,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1))
vec1[ones]<-temp
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) { #operate on each column of the bound matrix
#indices of nonzero entries, excluding the last element (the appended vector value)
nonzero = which(head(col,-1) != 0)
#sample() on a single number n draws from 1:n, so return a lone index directly
sample_ind = if(length(nonzero) == 1){
nonzero
} else{
sample(nonzero, 1)
}
#subtract the vector value from the selected entry, flooring the result at 0
col[sample_ind] = max(0, col[sample_ind] - tail(col,1))
return(col)
}
)
Thanks again!

mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) {
nonzero = which(head(col,-1) != 0);
sample_ind = sample(nonzero, 1);
col[sample_ind] = max(0, col[sample_ind] - tail(col,1));
return(col)
}
)
I made a couple of simplifications; hopefully they don't conflict with what you had in mind. First, I ignore the requirement that you only operate on the nonzero elements of the vector, since subtracting 0 from anything will not change it. Second, I bind the matrix and vector and then perform the operation column-wise on the result, since this is a bit easier than tracking the indices in two separate data structures and then combining them afterward.
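For columns with zero or one nonzero entry, a more defensive variant can be sketched as follows. The helper name `subtract_random`, the small test matrix, and the fixed vector standing in for the rZIP() draws are all assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: subtract each vector value from one randomly chosen
# nonzero entry of the matching column, flooring the result at zero.
subtract_random <- function(mat, vec) {
  stopifnot(ncol(mat) == length(vec))
  for (j in which(vec != 0)) {
    nonzero <- which(mat[, j] != 0)
    if (length(nonzero) == 0) next  # nothing to subtract from in this column
    # Index into nonzero so that a single candidate is never expanded to 1:n
    pick <- nonzero[sample.int(length(nonzero), 1)]
    mat[pick, j] <- max(0, mat[pick, j] - vec[j])
  }
  out <- rbind(mat, vec)
  rownames(out) <- NULL
  out
}

set.seed(1)  # arbitrary seed, only for reproducibility
m <- matrix(c(0, 4, 0,
              5, 0, 0,
              0, 2, 0), byrow = TRUE, ncol = 3)
v <- c(7, 1, 3)  # fixed values standing in for the rZIP() draws
res <- subtract_random(m, v)
```

Picking with `sample.int(length(nonzero), 1)` avoids the special case entirely, because it always samples a position within `nonzero` rather than sampling from the values themselves.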

Related

R: Ignorance of logical condition

I am trying to find k maximum values from a global data.frame and update a logical variable outlier in each iteration. This is the code I am using
while(k > 0){
#find the max value index
index = which(data$value==max(data$value[which(data$brand==brand_rm &
data$outlier == FALSE)],na.rm = T))[1]
#update outlier column
data$outlier[index] <<- TRUE
k = k - 1
}
However, it only works in the first iteration and ignores the data$outlier == FALSE condition in subsequent iterations. Can you please explain why this happens and what I am doing wrong?
I found a better approach, which is:
while(k > 0){
#find the outlier index
row_to_remove = which(data[,value]==max(data[data$outlier ==FALSE &
data$brand==brand_rm ,value]))
data$outlier[row_to_remove] <<- TRUE
k = k - 1 #decrement the counter so the loop terminates
}
If I understand your question correctly, you don't even need a loop for that. See my reproducible example below.
data <- data.frame(value=11:20, outlier=FALSE)
k <- 2
kth_value = sort(data$value, decreasing = TRUE)[k] #Get the Kth highest value
data$outlier <- ifelse( data$value>=kth_value, TRUE, FALSE) #Compare to the kth
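The same loop-free idea can be restricted to a single brand, as in the question. This is only a sketch: the sample data and the `brand` values are made up, and ties in `value` are broken by row order.

```r
# Made-up example data with a brand column, as described in the question
data <- data.frame(
  brand   = c("a", "a", "b", "a", "b", "a"),
  value   = c(5, 20, 30, 12, 8, 17),
  outlier = FALSE
)
brand_rm <- "a"
k <- 2

# Row indices belonging to the chosen brand
rows <- which(data$brand == brand_rm)

# Order those rows by decreasing value and keep at most k of them
top_k <- rows[order(data$value[rows], decreasing = TRUE)][seq_len(min(k, length(rows)))]

# Flag the k largest values of that brand in one assignment
data$outlier[top_k] <- TRUE
```

Working with row indices rather than comparing values avoids accidentally matching a row from another brand that happens to hold the same value.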

How to locate specific elements in one matrix and compare those with a second matrix?

Let's have a binary Matrix/ Data Frame:
library("Matrix")
df_binary <- data.frame(as.matrix(rsparsematrix(1000, 20,nnz = 800, rand.x = runif)))
df_binary[df_binary > 0] = 1
Now, I would like to create an index object of all elements equal to 1. How can I do this in R?
I need something like an index of those entries to compare the entries of the binary matrix with entries of a second matrix. Both matrices are of the same size - if this information could be important.
If you want a list out you could do something along the lines of
list_ones <- function(df) {
out <- list()
for (col in names(df)) {
out[[col]] <- which(df[[col]] == 1)
}
return(out)
}
list_ones(df_binary)
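If a single index object is preferred over a list, base R's which() with arr.ind = TRUE returns the row and column of every match, and that matrix can index a second matrix of the same size directly. A minimal sketch, where the small `df_binary` and the second matrix `other` are made-up stand-ins:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df_binary <- as.data.frame(matrix(rbinom(40, 1, 0.3), nrow = 8, ncol = 5))
other <- as.data.frame(matrix(runif(40), nrow = 8, ncol = 5))  # hypothetical second matrix

# Two-column matrix (row, col) of every position where df_binary equals 1
idx <- which(df_binary == 1, arr.ind = TRUE)

# Entries of the second matrix at exactly those positions
vals <- as.matrix(other)[idx]
```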

How to substitute negative values with a calculated value in an entire dataframe

I've got a huge dataframe with many negative values in different columns that should be equal to their original value*0.5.
I've tried to apply many R functions but it seems I can't find a single function to work for the entire dataframe.
I would like something like the following (not working) piece of code:
mydf[] <- replace(mydf[], mydf[] < 0, mydf[]*0.5)
You can simply do,
mydf[mydf<0] <- mydf[mydf<0] * 0.5
If you have values that are non-numeric, then you may want to apply this to only the numeric ones,
ind <- sapply(mydf, is.numeric)
mydf1 <- mydf[ind]
mydf1[mydf1<0] <- mydf1[mydf1<0] * 0.5
mydf[ind] <- mydf1
You could try using lapply() on the entire data frame, making the replacements on each column in succession.
df[] <- lapply(df, function(x) { #df[] keeps the data frame structure
ifelse(x < 0, x*0.5, x)
})
The lapply(), or list apply, function is intended to be used on lists, but data frames are a special type of list so this works here.
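One detail worth checking: lapply() returns a plain list, so the result only stays a data frame if it is assigned back into the existing columns. A small sketch with made-up data:

```r
mydf <- data.frame(a = c(-2, 1, -4), b = c(3, -6, 0.5))

# Plain lapply() returns a list, not a data frame
as_list <- lapply(mydf, function(x) ifelse(x < 0, x * 0.5, x))
is.data.frame(as_list)

# Assigning into mydf[] replaces the columns but keeps the data frame
mydf[] <- lapply(mydf, function(x) ifelse(x < 0, x * 0.5, x))
is.data.frame(mydf)
```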
In replace(), the values argument should have the same length as the number of TRUE values in the list ('index') argument:
replace(mydf, mydf <0, mydf[mydf <0]*0.5)
Or another option is set from data.table, which would be very efficient
library(data.table)
for(j in seq_along(mydf)){
i1 <- mydf[[j]] < 0
set(mydf, i = which(i1), j= j, value = mydf[[j]][i1]*0.5)
}
data
set.seed(24)
mydf <- as.data.frame(matrix(rnorm(25), 5, 5))

Efficient algorithm to turn matrix subdiagonal to columns r

I have a non-square matrix and need to do some calculations on its subdiagonals. I figured that the best way is to turn the subdiagonals into columns/rows and use functions like cumprod. Right now I use a for loop and exdiag defined as below:
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
However, it does not seem to be very efficient. Do you know any other algorithm to achieve this kind of result?
A little example to show what I am doing:
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
mat <- matrix(1:72, nrow = 12, ncol = 6)
newmat <- matrix(nrow=11, ncol=6)
for (i in 1:11){
newmat[i,] <- c(cumprod(exdiag(mat,i)),rep(0,max(6-12+i,0)))
}
Best regards,
Artur
The fastest, but by far the most cryptic, solution to get all possible diagonals from a non-square matrix would be to treat your matrix as a vector and simply construct an id vector for selection. In the end you can transform it back to a matrix if you want.
The following function does that:
exdiag <- function(mat){
NR <- nrow(mat)
NC <- ncol(mat)
smalldim <- min(NC,NR)
if(NC > NR){
id <- seq_len(NR) +
seq.int(0,NR-1)*NR +
rep(seq.int(1,NC - 1), each = NR)*NR
} else if(NC < NR){
id <- seq_len(NC) +
seq.int(0,NC-1)*NR +
rep(seq.int(1,NR - 1), each = NC)
} else {
return(diag(mat))
}
out <- matrix(mat[id],nrow = smalldim)
id <- (ncol(out) + 1 - row(out)) - col(out) < 0
out[id] <- NA
return(out)
}
Keep in mind you have to take into account how your matrix is formed.
In both cases I follow the same logic:
First, construct a sequence indicating positions along the smallest dimension.
To this sequence, add 0, 1, 2, ... times the number of rows.
This creates the first diagonal. After doing this, you simply add a sequence that shifts the entire previous sequence by 1 (either down or to the right) until you reach the end of the matrix. To shift right, I need to multiply this sequence by the number of rows.
In the end you can use these indices to select the correct positions from mat, and return all that as a matrix. Due to the vectorized nature of this code, you have to check that the last subdiagonals are correct. These contain fewer elements than the first, so you have to replace the values that are not part of that subdiagonal with NA. Here, too, you can simply use an indexing trick.
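The index construction is easier to follow on a small case. The sketch below walks through the same steps for a 4 x 3 matrix (more rows than columns); the variable names `main_id` and `out` are chosen here for illustration only.

```r
mat <- matrix(1:12, nrow = 4, ncol = 3)  # small tall matrix (NR > NC)
NR <- nrow(mat)
NC <- ncol(mat)

# Linear indices of the main diagonal: element (i, i) sits at i + (i - 1) * NR
main_id <- seq_len(NC) + seq.int(0, NC - 1) * NR

# Shift the whole diagonal down by 1, 2, ..., NR - 1 rows (main_id is recycled)
id <- main_id + rep(seq_len(NR - 1), each = NC)

# One subdiagonal per column; indices past the end of mat yield NA
out <- matrix(mat[id], nrow = NC)

# Positions that ran past the last row wrapped into the next column: mask them
out[row(out) + col(out) > NR] <- NA
out
```

Each column of `out` then holds one subdiagonal, padded with NA where the subdiagonal is shorter than the main diagonal.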
You can use it as follows:
> diag1 <- exdiag(amatrix)
> diag2 <- exdiag(t(amatrix))
> identical(diag1, diag2)
[1] TRUE
To arrive at your result:
amatrix <- matrix(1:72, ncol = 6)
diag1 <- exdiag(amatrix)
res <- apply(diag1,2,cumprod)
res[is.na(res)] <- 0
t(res)
You can modify the diag() function.
exdiag <- function(mat, off=0) {mat[row(mat) == col(mat)+off]}
exdiag2 <- function(matrix, off){diag(matrix[-1:-off,])}
Speed Test:
mat = diag(10, 10000,10000)
off = 4
> system.time(exdiag(mat,4))
user system elapsed
7.083 2.973 10.054
> system.time(exdiag2(mat,4))
user system elapsed
5.370 0.155 5.524
> system.time(diag(mat))
user system elapsed
0.002 0.000 0.002
It looks like subsetting the matrix takes a lot of time, but it still performs better than your implementation. There may be other subsetting approaches that outperform my solution. :)

statistical moments in R

I've got a data set in R of a variable, repeated 10,000 times and sampled 200 times on each repeat so a 10,000 by 200 matrix, I would like to calculate statistical moments for the variable up to an arbitrary number. So in the end I would like a numeric vector holding the value of moments.
I can get the variance and the mean for the data set using colMean and colVar, but they only go so far.
I am also aware of the moments package in R, however using the all.moments command is returning me moments for each time course, or treating each column or row as an individual variable, not what I want.
Does anyone know an equivalent to colMean and colVar for higher order moments? And if possible also for cross moments?
Many thanks!
I stole this code from an obscure R package e1071:
theskew<- function (x) {
x<-as.vector(x)
sum((x-mean(x))^3)/(length(x)*sd(x)^3)
}
thekurt <- function (x) {
x<-as.vector(x)
sum((x-mean(x))^4)/(length(x)*var(x)^2) - 3
}
You can fold that into your code by feeding them one column at a time
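Feeding the columns one at a time can be done with apply(). A minimal sketch using simulated data (the matrix `X` is a made-up stand-in for the real data set); passing the whole matrix instead pools every entry into one sample, since the function calls as.vector() first:

```r
theskew <- function(x) {
  x <- as.vector(x)
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}

set.seed(1)  # arbitrary seed, only for reproducibility
X <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)  # simulated stand-in data

col_skew <- apply(X, 2, theskew)  # one skewness value per column
pooled_skew <- theskew(X)         # all entries treated as a single sample
```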
Okay, I did this yesterday. For posterity, here is a loop that will do what I asked.
Provided your data is a time course of a variable you are measuring, and you want the moments of that variable:
rm(list=ls())
yourdata<-read.table("whereveryourdatais/and/variableyouwant")
yourdata<-t(yourdata) #only do this at your own discretion
mu<-colMeans(yourdata) #column means; colMeans takes no column-index argument
NumMoments <- 5
rawmoments <- matrix(NA, nrow=NumMoments, ncol=ncol(yourdata))
for(i in 1:NumMoments) {
rawmoments[i, ] <- colMeans(yourdata^i)
}
plot(rawmoments[1,])
holder<-matrix(NA,nrow=nrow(yourdata),ncol=ncol(yourdata))
middles<-matrix(NA,nrow=1,ncol=ncol(yourdata))
for(j in 1:nrow(yourdata)){
for(o in 1:ncol(rawmoments)){
middles[o]<-yourdata[j,o]-rawmoments[1,o]
}
holder[j,] <- middles
}
centmoments<-matrix(NA,nrow=NumMoments,ncol=ncol(yourdata))
for(i in 1:NumMoments){
centmoments[i,]<-colMeans(holder^i)
}
Then centmoments holds the central moments and rawmoments holds the raw moments. You can specify how many moments to compute by changing the value of NumMoments.
Note that the first row in "centmoments" will be approximately 0.
Is this what you're looking for?
X <- matrix(1:12, 3, 4) # your data
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
for(i in 1:NumMoments) {
moments[i, ] <- colMeans(X^i)
}
EDIT:
okay, apparently you want "central moments"
X <- matrix(1:12, 3, 4)
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
moments[1, ] <- colMeans(X) # first raw moment (the column means), needed for centering
Y <- X
for(i in 1:ncol(X)) {
Y[, i] <- Y[, i] - moments[1, i]
}
for(i in 2:NumMoments) {
moments[i, ] <- colMeans(Y^i)
}
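The centering and the loop over powers can also be written without explicit loops. This is a sketch of an alternative using sweep() and sapply(), not the code from the answer; the names `centered` and `central` are chosen for illustration.

```r
X <- matrix(1:12, 3, 4)  # same toy data as above
NumMoments <- 5

# Subtract each column's mean from that column
centered <- sweep(X, 2, colMeans(X))

# Row p holds the p-th central moment of every column
central <- t(sapply(seq_len(NumMoments), function(p) colMeans(centered^p)))
central
```

Here the first row is the first central moment, which is zero up to rounding, rather than the column means.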
