I would like to clean up my code a bit and start using more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to turn into a function. The problem I am having is how to step through the constraint vectors without a loop. Here's what I mean:
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part where I iterate through the 'frm' and 'to' vector values is what I'm having trouble with. How does one go from frm[1] to frm[2], and so on, in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
You could write a function that returns the column name of the minimum value in each row for a given range of columns. I have used max.col instead of apply(df, 1, which.min) because max.col is vectorized and usually faster than a row-wise apply. Note that max.col breaks ties at random by default, so ties.method = "first" is set here to match which.min, which returns the first minimum.
apply_fun <- function(data, x, y) {
cols <- x:y
names(data[cols])[max.col(-data[cols], ties.method = "first")]
}
Apply this function using Map:
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
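Map returns an unnamed list, so one more step is needed to get the data frame that the original loop produced. Below is a minimal sketch on a small made-up data frame; toy, frm_toy, to_toy and nm_toy are hypothetical names used only for illustration.

```r
# Toy data standing in for the spectral data frame (values are made up).
toy <- data.frame(Sample = 1:3,
                  A = c(5, 1, 9), B = c(2, 8, 9), C = c(7, 3, 4),
                  D = c(6, 6, 1), E = c(0, 9, 2))

# The helper from the answer, with ties.method = "first" so that ties
# resolve the same way as which.min() (first minimum wins).
apply_fun <- function(data, x, y) {
  cols <- x:y
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}

frm_toy <- c(2, 4)        # hypothetical range starts (column numbers)
to_toy  <- c(3, 6)        # hypothetical range ends
nm_toy  <- c("R1", "R2")  # hypothetical output column names

# One list element per range, then bind into a named data frame.
res <- Map(apply_fun, frm_toy, to_toy, MoreArgs = list(data = toy))
new_toy <- setNames(data.frame(res, stringsAsFactors = FALSE), nm_toy)
new_toy
```

With the real data, the same pattern would presumably be setNames(data.frame(WL.ps), nm).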
I've got a huge dataframe with many negative values in different columns that should be equal to their original value*0.5.
I've tried to apply many R functions but it seems I can't find a single function to work for the entire dataframe.
I would like something like the following (not working) piece of code:
mydf[] <- replace(mydf[], mydf[] < 0, mydf[]*0.5)
You can simply do,
mydf[mydf<0] <- mydf[mydf<0] * 0.5
If you have values that are non-numeric, then you may want to apply this to only the numeric ones,
ind <- sapply(mydf, is.numeric)
mydf1 <- mydf[ind]
mydf1[mydf1<0] <- mydf1[mydf1<0] * 0.5
mydf[ind] <- mydf1
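As a quick check of the numeric-only approach, here is a small sketch on a made-up mixed-type data frame (mixed and num are hypothetical names, not from the question):

```r
# Made-up data: one character column and two numeric columns.
mixed <- data.frame(id = c("a", "b", "c"),
                    x = c(-2, 4, -6),
                    y = c(1, -3, 5),
                    stringsAsFactors = FALSE)

ind <- sapply(mixed, is.numeric)    # TRUE for x and y only
num <- mixed[ind]                   # numeric columns as their own data frame
num[num < 0] <- num[num < 0] * 0.5  # halve the negative entries
mixed[ind] <- num                   # write the numeric columns back
mixed
```

The character column is left untouched, and only the negative numeric entries are halved.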
You could try using lapply() on the entire data frame, making the replacements on each column in succession. Assigning into mydf[] keeps the result a data frame instead of a plain list.
mydf[] <- lapply(mydf, function(x) {
ifelse(x < 0, x * 0.5, x)
})
The lapply(), or list apply, function is intended to be used on lists, but data frames are a special type of list so this works here.
In replace(), the values argument must have the same length as the number of TRUE entries in the index (the 'list' argument):
replace(mydf, mydf <0, mydf[mydf <0]*0.5)
Another option is set() from data.table, which updates by reference and is therefore very efficient:
library(data.table)
for(j in seq_along(mydf)){
i1 <- mydf[[j]] < 0
set(mydf, i = which(i1), j= j, value = mydf[[j]][i1]*0.5)
}
data
set.seed(24)
mydf <- as.data.frame(matrix(rnorm(25), 5, 5))
I have the following function taken from R: iterative outliers detection (this is an updated version):
dropout<-function(x) {
outliers <- NULL
res <- NULL
if(length(x)<2) return (1)
vals <- rep.int(1, length(x))
r <- chisq.out.test(x)
while (r$p.value<.05 & sum(vals==1)>2) {
if (grepl("highest",r$alternative)) {
d <- which.max(ifelse(vals==1,x, NA))
res <- rbind(list(as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value)),fill=TRUE)
}
else {
d <- which.min(ifelse(vals==1, x, NA))
}
vals[d] <- r$p.value
r <- chisq.out.test(x[vals==1])
}
return(res)
}
The problem is that some rounds produce missing rows when I try to fill the data.frame.
I want to fill res, but in some iterations it contains missing values.
I have tried everything I could think of, e.g. rbindlist, rbind.fill, and rbind (with fill=TRUE), but nothing works.
When I do something like:
res <- c(res,as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value))
it works, but it creates 2 rows for each set of (V1, V2): one with the last column as r$alternative, and a second row with the same first 2 columns but with the p-value in the last column instead.
This is how I'm calling the function, on data similar to that in the mentioned question:
outliers <- d[, dropout(V3), list(V1, V2)]
and I always get this error: j doesn't evaluate to the same number of columns for each group
I am trying to create a "for loop" setup that is going to calculate different rolling means of a return series, with window lengths ranging from the last 2 observations to the last 16 observations, i.e. k ∈ [2, 16]. I've been trying to use a function like the one below, where rollmean is a function from zoo. It produces this warning:
Warning message:
In roll[i] <- rollmean(x, i) :
number of items to replace is not a multiple of replacement length
Can someone please help me?
rollk <- function(x, kfrom= 2, kto=16){
roll <- as.list(kto-kfrom+1)
for (i in kfrom:kto){
roll[i]<- rollmean(x, i)
return(roll)
}}
I suppose you want
# library(zoo)
rollk <- function(x, kfrom = 2, kto = 16){
roll <- list()
ft <- kfrom:kto
for (i in seq_along(ft)){
roll[[i]]<- rollmean(x, ft[i])
}
return(roll)
}
There were several problems in your function:
You need [[ to assign a single list element, not [.
You want a list of length length(kfrom:kto), so the index i now starts at 1, not at kfrom.
roll is now returned after the for loop, not inside it, so the function returns a single list containing all values.
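The first point is what triggers the warning in the question. A minimal base R sketch of the difference (v, roll_single, roll_double and msg are made-up names for illustration):

```r
v <- c(1.5, 2.5, 3.5)

# `[<-` replaces one slot per supplied item, so assigning a length-3
# vector to a single slot raises the warning seen in the question.
roll_single <- list()
msg <- tryCatch({
  roll_single[1] <- v
  NA_character_                       # reached only if no warning is raised
}, warning = function(w) conditionMessage(w))
msg

# `[[<-` stores the whole vector as one list element.
roll_double <- list()
roll_double[[1]] <- v
roll_double
```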
A shorter equivalent of the function above:
rollk2 <- function(x, kfrom = 2, kto = 16)
lapply(seq(kfrom, kto), function(i) na.omit(stats::filter(x, rep(1 / i, i))))
It does not require loading additional packages.
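To see why a filter with equal weights 1/k gives a rolling mean, here is a small sketch comparing it with window means computed by hand. The input vector and k = 3 are made up for illustration; stats::filter is written out in full in case another package masks filter.

```r
x <- c(2, 4, 6, 8, 10)  # made-up series
k <- 3                  # window length

# Moving average via a convolution filter with k equal weights of 1/k;
# positions without a full window are NA and get dropped.
ma_filter <- as.numeric(na.omit(stats::filter(x, rep(1 / k, k))))

# The same means computed directly: embed() puts each length-k window in a row.
ma_manual <- rowMeans(embed(x, k))

all.equal(ma_filter, ma_manual)
```

Both approaches return one value per complete window, i.e. length(x) - k + 1 values.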
Try this:
library(zoo)
lapply(2:16, rollmean, x = x)
Thanks in advance for your help!
This question is related to one I posted before, but I think it deserves its own post because it is a separate challenge.
Last time I asked about randomly selecting values from a matrix after adding a vector. In that example, the matrix and the vector were both binary. Now I would like to change the values in a weighted matrix after adding a weighted vector. Here is some example code to play with.
require(gamlss.dist)
mat1<-matrix(c(0,0,0,0,1,0, 0,10,0,0,0,5, 0,0,0,0,1,0, 0,0,3,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,1,1,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,0,0,1,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1)) #rZIP is a function from gamlss.dist that randomly selects values from a zero-inflated distribution
vec1[ones]<-temp
The values in the vector are sampled from a zero-inflated distribution (thanks to this question). When I bind the vector to the matrix, I want to randomly select a non zero value from the same column, and subtract the vector value from it. I can see a further complication arising if the vector value is greater than the randomly selected value in the same column. In such an instance, it would simply set that value to zero.
Here is some modified code from the earlier question that does not work for this problem but maybe will be helpful.
foo <- function(mat, vec) {
nr <- nrow(mat)
nc <- ncol(mat)
cols <- which(vec != 0) #select matrix columns where the vector is not zero
rows <- sapply(seq_along(cols),
function(x, mat, cols) {
ones <- which(mat[,cols[x]] != 0)
out <- if(length(ones) != 0) {
ones
} else {
sample(ones, 1)
}
out
}, mat = mat, cols = cols)
ind <- (nr*(cols-1)) + rows #this line doesn't work b/c it is not binary
mat[ind] <- 0 #here is where I would like to subtract the vector value
mat <- rbind(mat, vec)
rownames(mat) <- NULL
mat
}
Any ideas? Thanks again for all of the fantastic help!
EDIT:
Thanks to help from bnaul down below, I am a lot closer to the answer, but we have run into the same problem we hit last time. The sample function doesn't work properly on columns where there is only one nonzero value. I have fixed this using Gavin Simpson's if else statement (which was the solution in the previous case). I've adjusted the matrix to have columns with only one nonzero value.
mat1<-matrix(c(0,0,0,0,1,0, 0,0,0,0,0,5, 0,0,0,0,1,0, 0,0,0,0,0,0, 0,0,0,0,3,0,
0,0,2,0,0,0, 2,1,0,1,0,1, 0,0,0,0,37,0, 0,0,0,2,0,0, 0,0,0,0,0,1, 1,0,0,0,0,0,
0,0,0,0,0,0), byrow=T, ncol=6, nrow=12)
vec1<-c(0,1,0,0,1,1)
ones <- which(vec1 == 1L)
temp=rZIP(sum(vec1))
vec1[ones]<-temp
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) {
# indices of the nonzero entries in the matrix part (all but the last element, which is the vector value)
nonzero = which(head(col, -1) != 0)
# sample() on a single number n draws from 1:n, so handle the one-candidate case separately
sample_ind = if (length(nonzero) == 1) {
nonzero
} else {
sample(nonzero, 1)
}
# subtract the vector value from the selected entry, flooring the result at zero
col[sample_ind] = max(0, col[sample_ind] - tail(col, 1))
return(col)
}
)
Thanks again!
mat2 = rbind(mat1, vec1)
apply(mat2, 2, function(col) {
nonzero = which(head(col,-1) != 0);
sample_ind = sample(nonzero, 1);
col[sample_ind] = max(0, col[sample_ind] - tail(col,1));
return(col)
}
)
I made a couple of simplifications; hopefully they don't conflict with what you had in mind. First, I ignore the requirement that you only operate on the nonzero elements of the vector, since subtracting 0 from anything will not change it. Second, I bind the matrix and vector and then perform the operation column-wise on the result, since this is a bit easier than tracking the indices in two separate data structures and then combining them afterward.
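One caveat with sample(nonzero, 1): when nonzero has length one, sample() treats that single number n as the range 1:n, and when it is empty, sample() fails. A sketch of one possible guard follows; pick_one and adjust_col are hypothetical helper names, and the small matrix is made up so that the result is deterministic.

```r
# Pick one element of idx by position, so a length-1 idx is returned as is
# and an empty idx yields integer(0) instead of an error.
pick_one <- function(idx) {
  if (length(idx) == 0L) return(integer(0))
  idx[sample.int(length(idx), 1L)]
}

# Column-wise step: the last element is the vector value, the rest is the matrix.
adjust_col <- function(col) {
  n <- length(col)
  nonzero <- which(col[-n] != 0)
  i <- pick_one(nonzero)
  if (length(i) == 1L) col[i] <- max(0, col[i] - col[n])
  col
}

# Made-up example: each column has at most one nonzero matrix entry.
m <- rbind(matrix(c(0, 0, 7,
                    0, 0, 0,
                    4, 0, 0), nrow = 3, byrow = TRUE),
           c(5, 2, 3))
out <- apply(m, 2, adjust_col)
out
```

In this example the first column's 4 is floored to 0 (4 - 5 < 0), the all-zero second column is left alone, and the third column's 7 becomes 4.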