Is there a faster way to do this:
for (i in 1:nrow(dataframe))
{
  dataframe$results[i] <- with(dataframe, myownfunction(column1[i],
    column2[i], column3[i], column4[i], column5[i], column6[i]))
}
myownfunction finds implied volatility using uniroot(), but when uniroot does not find a solution it stops (usually because there are some faults in the data), and the loop stops with it. Is there a way to make the function just output NA when it gets an error from uniroot and continue with the next row?
Regards.
Part 1: It's very likely that you could succeed with:
ave(dataframe[, c("column1", "column2", "column3", "column4", "column5", "column6")], myownfunction)
Part 2: If you modify the function to test for failure with try() and return NA when it fails, you can fill in the missing values in the result properly.
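A minimal sketch of that idea (the wrapper name safe_myownfunction is just illustrative; it assumes myownfunction is the function that calls uniroot):
safe_myownfunction <- function(...) {
  # any error (e.g. uniroot failing to find a root) becomes NA instead of stopping the loop
  tryCatch(myownfunction(...), error = function(e) NA)
}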
You seem to have two questions:
1) Returning a value if a failure happens. As posted already, look at failwith in the plyr package. It does exactly this.
2) Speeding up the for loop. Have you tried mapply? It is a multivariate apply that applies a function to the corresponding elements of each of its arguments simultaneously. So
mapply(myfunc, column1, column2, column3, column4)
(along with modifying myfunc to use failwith) would do the sort of thing you are looking for.
The plyr version of mapply is mdply if you prefer.
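Putting the two together, a rough sketch (assuming the column names from your question):
safe_fun <- plyr::failwith(NA, myownfunction)  # returns NA instead of raising an error
dataframe$results <- with(dataframe,
  mapply(safe_fun, column1, column2, column3, column4, column5, column6))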
Got this from another forum; I like it more than the previous options:
R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1] 4 10 16
Related
I need to apply a function that takes two arguments to matrices.
mapply(function(x, y) x+y, rbind(1:3, 1:3), rbind(2:4, 2:4))
output is
[1] 3 3 5 5 7 7
which isn't the format I want: I need the result to retain its matrix form.
On the other hand, the apply function in R has a MARGIN argument which helps retain the matrix format, but it only applies to one argument.
apply(rbind(1:3,1:3), MARGIN = c(1,2), function(x) x+3)
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 4 5 6
The point is that there's a MARGIN argument for apply and nothing like it for mapply, or is there?
PLEASE: I don't require an answer that rearranges the result; I can do that myself. I am using this piece of code to write a function that takes a three-dimensional meshgrid, which would be a hassle to rearrange.
EDITED LATER:
I am really sorry, I didn't elaborate on this.
Of course, I am not stuck because I want to do
rbind(1:3, 1:3) + rbind(2:4, 2:4)
These rbind calls are just examples of the vectors I am using, and function(x, y) x+y likewise stands in for very long nested functions that would be confusing and inefficient to copy here. The relevant point is that it is a function of two variables.
I want to calculate the levenshteinDist distance between the rownames and colnames of a matrix using the mapply function, because my matrix is very big and a nested for loop takes a very long time to give me the result.
Here's the old code with the nested loop:
mymatrix <- matrix(NA, nrow=ncol(dataframe),ncol=ncol(dataframe),dimnames=list(colnames(dataframe),colnames(dataframe)))
distfunction = function (text1, text2) {return(1 - (levenshteinDist(text1, text2)/max(nchar(text1), nchar(text2))))}
for(i in 1:ncol(mymatrix))
{
  for(j in 1:nrow(mymatrix))
    mymatrix[i,j] = (distfunction(rownames(mymatrix)[i], colnames(mymatrix)[j]))*100
}
I tried to replace the nested loop with mapply:
mapply(distfunction,mymatrix)
It gave me this error:
Error in typeof(str2) : argument "text2" is missing, with no default
I planned to apply the levenshteinDist distance to my matrix first and then work out how to apply my own function.
Is it possible?
Thank you.
The function mapply cannot be used in this context. It requires two input vectors, and the function is applied to the first elements, then the second elements, and so on. But you want the function applied to all combinations.
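A tiny illustration of that pairing behaviour (made-up vectors):
mapply(function(x, y) paste0(x, y), c("a", "b"), c("x", "y"))
#    a    b
# "ax" "by"
Only the pairs (a, x) and (b, y) are evaluated, not all four combinations.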
You could try a stacked sapply
sapply(colnames(mymatrix), function(col)
  sapply(rownames(mymatrix), function(row)
    distfunction(row, col))) * 100
Simple usage example
sapply(1:3, function(x) sapply(1:4, function(y) x*y))
Output:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
[4,] 4 8 12
Update
Even better is to use outer, but I think your distfunction is not vectorized (because of the max). So use the wrapper function Vectorize:
distfunction_vec <- Vectorize(distfunction)
outer(rownames(mymatrix), rownames(mymatrix), distfunction_vec)
But I'm not sure about the performance penalty. It would be better to vectorize the function directly (probably with pmax).
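For instance, a sketch of a directly vectorized version (this assumes levenshteinDist itself accepts character vectors and returns a vector of distances; check your package's documentation):
distfunction_v <- function(text1, text2) {
  # pmax and nchar work elementwise, so the whole expression is vectorized
  1 - levenshteinDist(text1, text2) / pmax(nchar(text1), nchar(text2))
}
outer(rownames(mymatrix), colnames(mymatrix), distfunction_v) * 100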
All,
I have the following code and I'd like to generalize it to more clusters, i.e. C clusters. Is there a way to do this without a loop? Here, the rows of X correspond to variables x1, x2, and T is a linear transformation applied to X. Thanks.
X=matrix(c(2,3,4,5,6,7,8,9,2,3,4,5,6,7,8,9),2)
cluster=c(1,1,1,0,0,0,0,0)
T=matrix(c(1,2,2,1),2)
f<-function(x) max(eigen(t(x)%*%x)$values)
f(T%*%X[,cluster==0])+f(T%*%X[,cluster==1])
## [1] 5079.507
I was thinking of
sum(tapply(X,cluster,function(x) f(T%*%x)))
but I get this error, I think because tapply expects a vector rather than a matrix:
> sum(tapply(X,cluster,function(x) f(T%*%x)))
Error in tapply(X, cluster, function(x) f(T %*% x)) :
  arguments must have same length
Here is an answer with a for loop; if you can find something without a loop, please let me know.
#
c=length(levels(factor(cluster)))
cluster=factor(cluster,labels=1:c)
s=0
for (i in 1:c){
  s = s + f(T %*% X[, cluster == i])
}
s
## [1] 5079.507
You could try doing this via tapply:
tapply(seq_len(ncol(X)), cluster, function(x) f(T%*%X[, x]))
# 0 1
# 3840.681 1238.826
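If you want the single total from the question, you can just sum that result:
sum(tapply(seq_len(ncol(X)), cluster, function(x) f(T %*% X[, x])))
# [1] 5079.507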
Given the following matrix with weights in lbs in the first column and heights in inches in the second column:
> wgt.hgt.matrix
[,1] [,2]
[1,] 180 70
[2,] 156 67
[3,] 128 64
[4,] 118 66
[5,] 202 72
I am looking for a concise way to apply a binary function like
function(lb, inch) { (lb/inch**2)*703 } -> bmi
to each row of the matrix, resulting in an array, list, or vector with the 5 resulting BMI values. One way I found uses the apply function:
apply(wgt.hgt.matrix, 1, function(row) bmi(row[1], row[2]))
But a splat operator as in Ruby (*) would help make the call more concise and clear:
apply(wgt.hgt.matrix, 1, function(row) bmi(*row))
Does an equivalent to the splat operator exist, i.e. a syntax element telling R to split a vector-like object to populate an argument list? Are there other, simpler or more concise suggestions for the apply call?
Perhaps I'm missing something, but what's wrong with:
wgt.hgt.matrix <-
structure(c(180L,156L,128L,118L,202L,70L,67L,64L,66L,72L), .Dim=c(5L,2L))
bmi <- function(lb, inch) (lb/inch**2)*703
bmi(wgt.hgt.matrix[,1], wgt.hgt.matrix[,2])
Update:
Based on the OP's comment, it seems like do.call would work more generally:
# put each matrix column in a separate list element
lc <- lapply(1:ncol(wgt.hgt.matrix), function(i) wgt.hgt.matrix[,i])
# call 'bmi' with one argument for each column / list element
do.call(bmi, lc)
Using bmi() directly as a vectorized solution is preferable, since it consists entirely of vectorized operators, as was illustrated in Joshua's answer. You can also do this with:
colnames(wgt.hgt.matrix) <- c("lb", "inch")
with( as.data.frame(wgt.hgt.matrix), bmi(lb,inch) )
# [1] 25.82449 24.43039 21.96875 19.04362 27.39313
Unfortunately, matrices are not a good substrate for constructing environments with 'with', so coercing to a data frame was needed above. You could get an apply solution (which will be less time-efficient than a vectorized approach) to work with a version of bmi() rewritten to take a vector with named elements (as created above):
bmi <- function(vec) { (vec['lb']/vec['inch']**2)*703 }
apply(wgt.hgt.matrix, 1, function(row) bmi(row))
# [1] 25.82449 24.43039 21.96875 19.04362 27.39313
We can get pretty close to the syntax you're looking for with do.call:
## Setup
wgt.hgt.matrix=matrix(c(180,70,156,67,128,64,118,66,202,72),ncol=2,byrow=TRUE)
bmi = function(lb, inch) { (lb/inch**2)*703 }
## The action
apply(wgt.hgt.matrix, 1, function(row) do.call(bmi,as.list(row)))
do.call() is actually more flexible than just a splat operator, in that you can use the list names to give the argument names.
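For example (using the values from the first row of the matrix):
do.call(bmi, list(inch = 70, lb = 180))  # names match the formals, so order doesn't matter
# [1] 25.82449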
I'm analyzing large sets of data using the following script:
M <- c_alignment
c_check <- function(x){
  if (x == c_1) {
    1
  } else {
    0
  }
}
both_c_check <- function(x){
  if (x[res_1] == c_1 && x[res_2] == c_2) {
    1
  } else {
    0
  }
}
variance_function <- function(x,y){
  sqrt(x*(1-x))*sqrt(y*(1-y))
}
frames_total <- nrow(M)
cols <- ncol(M)
c_vector <- apply(M, 2, max)
freq_vector <- matrix(nrow = sum(c_vector))
co_freq_matrix <- matrix(nrow = sum(c_vector), ncol = sum(c_vector))
insertion <- 0
res_1_insertion <- 0
for (res_1 in 1:cols){
  for (c_1 in 1:c_vector[res_1]){
    res_1_insertion <- res_1_insertion + 1
    insertion <- insertion + 1
    res_1_subset <- sapply(M[,res_1], c_check)
    freq_vector[insertion] <- sum(res_1_subset)/frames_total
    res_2_insertion <- 0
    for (res_2 in 1:cols){
      if (is.na(co_freq_matrix[res_1_insertion, res_2_insertion + 1])){
        for (c_2 in 1:max(c_vector[res_2])){
          res_2_insertion <- res_2_insertion + 1
          both_res_subset <- apply(M, 1, both_c_check)
          co_freq_matrix[res_1_insertion, res_2_insertion] <- sum(both_res_subset)/frames_total
          co_freq_matrix[res_2_insertion, res_1_insertion] <- sum(both_res_subset)/frames_total
        }
      }
    }
  }
}
covariance_matrix <- (co_freq_matrix - crossprod(t(freq_vector)))
variance_matrix <- matrix(outer(freq_vector, freq_vector, variance_function), ncol = length(freq_vector))
correlation_coefficient_matrix <- covariance_matrix/variance_matrix
A model input would be something like this:
1 2 1 4 3
1 3 4 2 1
2 3 3 3 1
1 1 2 1 2
2 3 4 4 2
What I'm calculating is the binomial covariance of each state found in M[,i] with each state found in M[,j]. Each row is the state found for that trial, and I want to see how the states of the columns co-vary.
Clarification: I'm finding the covariance of two multinomial distributions, but I'm doing it through binomial comparisons.
The input is a 4200 x 510 matrix, and the c value for each column is about 15 on average. I know for loops are terribly slow in R, but I'm not sure how to use the apply functions here. If anyone has a suggestion on how to use apply properly here, I'd really appreciate it. Right now the script takes several hours. Thanks!
I thought of writing a comment, but I have too much to say.
First of all, if you think apply is faster, look at "Is R's apply family more than syntactic sugar?". It might be, but it's far from guaranteed.
Next, please don't grow matrices as you move through your code; that slows your code down incredibly. Preallocate the matrix and fill it up; that can speed up your code more than tenfold. You're growing different vectors and matrices throughout your code, which is insane (forgive me the strong language).
Then, look at the help page of ?subset and the warning given there:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.
Always. Use. Indices.
Further, you recalculate the same values over and over again. freq_res_2, for example, is recalculated for every res_2 and c_2 as many times as you have combinations of res_1 and c_1. That's just a waste of resources. Move out of your loops whatever you don't need to recalculate, and save it in matrices you can simply index into again.
Heck, now that I'm at it: please use vectorized functions. Think again about what you can drag out of the loops. This is what I see as the core of your calculation:
cov <- (freq_both - (freq_res_1)*(freq_res_2)) /
(sqrt(freq_res_1*(1-freq_res_1))*sqrt(freq_res_2*(1-freq_res_2)))
As I see it, you can construct matrices freq_both, freq_res_1 and freq_res_2 and use them as input for that one line, and that will give you the whole matrix at once (don't call it cov, by the way; cov is a function). Exit loops. Enter fast code.
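As a rough sketch of what I mean, for one pair of states c_1 and c_2 (this is just the idea, not a drop-in replacement for your indexing scheme):
# indicator matrices: 1 where a column is in the given state, 0 otherwise
ind_1 <- (M == c_1) * 1
ind_2 <- (M == c_2) * 1
# fraction of rows where column i is in state c_1 AND column j is in state c_2,
# for every pair of columns at once, with no loop over rows or columns
freq_both <- crossprod(ind_1, ind_2) / nrow(M)
freq_res_1 <- colMeans(ind_1)  # per-column frequency of state c_1
freq_res_2 <- colMeans(ind_2)  # per-column frequency of state c_2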
Given that I have no clue what's in c_alignment, I'm not going to rewrite your code for you, but you should definitely get rid of the C way of thinking and start thinking in R.
Let this be a start: The R Inferno
It's not really the 4-way nested loops but the way your code is growing memory on each iteration. That's happening 4 times, where I've placed # ** on the cbind and rbind lines. Standard advice in R (and in Matlab and Python) in situations like this is to allocate the result in advance and then fill it in. That's what the apply functions do: they allocate a list as long as the known number of results, assign each result to its slot, and then merge all the results together at the end. In your case you could just allocate a matrix of the correct size in advance and assign into it at those 4 points (roughly speaking). That should be about as fast as the apply family, and you might find it easier to code.
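A generic sketch of that preallocate-then-fill pattern (the names and the toy computation here are made up, not from the original code):
n_results <- 1000
results <- matrix(NA_real_, nrow = n_results, ncol = 3)  # allocate the full size once
for (i in seq_len(n_results)) {
  results[i, ] <- c(i, i^2, sqrt(i))  # fill in place; no rbind()/cbind() inside the loop
}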