I want to calculate the Levenshtein distance (levenshteinDist) between the row names and column names of a matrix using mapply, because my matrix is very large and a nested for loop takes a very long time to produce the result.
Here's the old code with the nested loop:
mymatrix <- matrix(NA, nrow=ncol(dataframe),ncol=ncol(dataframe),dimnames=list(colnames(dataframe),colnames(dataframe)))
distfunction = function (text1, text2) {return(1 - (levenshteinDist(text1, text2)/max(nchar(text1), nchar(text2))))}
for(i in 1:ncol(mymatrix))
{
for(j in 1:nrow(mymatrix))
mymatrix[i,j]=(distfunction(rownames(mymatrix)[i], colnames(mymatrix)[j]))*100
}
I tried to replace the nested loop with mapply:
mapply(distfunction,mymatrix)
It gave me this error:
Error in typeof(str2) : argument "text2" is missing, with no default
My plan was to apply levenshteinDist to the matrix first and then work out how to apply my own function in the same way.
Is it possible?
Thank you.
The function mapply cannot be used this way. It expects two input vectors and applies the function to the first elements, then the second elements, and so on. You, however, want the function applied to all combinations.
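To make the pairing behaviour concrete, here is a small sketch with a made-up function and made-up inputs (join_pair, left and right are not from the question). It also shows that passing a single object to mapply, as in the failing call, leaves the second argument unfilled:

```r
# Hypothetical two-argument function, for illustration only
join_pair <- function(a, b) paste(a, b, sep = "-")

left  <- c("x", "y")
right <- c("1", "2")

# mapply pairs elements by position: (x, 1) and (y, 2), never (x, 2) or (y, 1)
paired <- mapply(join_pair, left, right, USE.NAMES = FALSE)

# With only one vector supplied, nothing is passed for b,
# so the call fails with a "missing argument" error
msg <- tryCatch(mapply(join_pair, left),
                error = function(e) conditionMessage(e))
```

Two inputs of length 2 therefore give 2 results, not the 4 combinations a distance matrix needs.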
You could try a nested sapply:
sapply(colnames(mymatrix), function(col)
sapply(rownames(mymatrix), function(row)
distfunction(row, col)))*100
A simple usage example:
sapply(1:3, function(x) sapply(1:4, function(y) x*y))
Output:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9
[4,]    4    8   12
Update
Even better is to use outer, but I think your distfunction is not vectorized (because of max). So wrap it with Vectorize:
distfunction_vec <- Vectorize(distfunction)
outer(rownames(mymatrix), rownames(mymatrix), distfunction_vec)
I'm not sure about the performance penalty, though. It would be better to vectorize the function directly (probably with pmax).
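As a rough sketch of that direct vectorization, the following assumes that base R's utils::adist (a generalized Levenshtein distance) is an acceptable stand-in for levenshteinDist. The two may differ in details, so treat it as an illustration rather than a drop-in replacement. similarity_matrix and the sample strings are made up for the example.

```r
# Assumption: utils::adist is used in place of levenshteinDist
similarity_matrix <- function(names_vec) {
  dists   <- utils::adist(names_vec, names_vec)               # all pairwise edit distances
  max_len <- outer(nchar(names_vec), nchar(names_vec), pmax)  # pairwise maximum string length
  sim     <- (1 - dists / max_len) * 100
  dimnames(sim) <- list(names_vec, names_vec)
  sim
}

sim <- similarity_matrix(c("kitten", "sitting", "mitten"))
```

Everything is computed on whole matrices, so no explicit loop or Vectorize wrapper is involved. Empty strings would cause a division by zero and need separate handling.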
Related
I need to apply a function that takes two arguments to matrices.
mapply(function(x, y) x+y, rbind(1:3, 1:3), rbind(2:4, 2:4))
The output is
[1] 3 3 5 5 7 7
which is not in the format I want. I need the result to retain its matrix form.
On the other hand, the apply function in R has a MARGIN argument, which helps retain the matrix format, but it only works with one argument.
apply(rbind(1:3,1:3), MARGIN = c(1,2), function(x) x+3)
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    4    5    6
The point is that apply has a MARGIN argument and mapply has nothing like it. Or does it?
PLEASE: I don't need an answer that rearranges the result; I can do that myself. I am using this piece of code to write a function that takes a three-dimensional meshgrid, which would be a hassle to rearrange.
EDITED LATER:
I am really sorry, I didn't elaborate on this.
Of course, my actual problem is not simply computing
rbind(1:3, 1:3) + rbind(2:4, 2:4)
These rbind calls are just examples of the inputs I am using. Likewise, function(x, y) x+y is a stand-in for a very long nested function that would be confusing and impractical to paste here. What matters for now is that it is a function of two variables.
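mapply itself has no MARGIN-style argument, but since it visits the elements in storage order, one possible workaround is to copy the dim attribute back onto the result. This is a minimal sketch under the assumption that the function returns a single value per pair of elements. mapply_keep_dim is a hypothetical helper name, not an existing R function.

```r
# Hypothetical helper: run mapply, then restore the shape of the first argument
mapply_keep_dim <- function(FUN, a, b) {
  res <- mapply(FUN, a, b)
  dim(res) <- dim(a)            # works for matrices and for higher-dimensional arrays
  dimnames(res) <- dimnames(a)
  res
}

m <- mapply_keep_dim(function(x, y) x + y, rbind(1:3, 1:3), rbind(2:4, 2:4))
```

Because only the dim attribute is restored, the same helper applies unchanged to a three-dimensional array.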
I have the following R code. I wonder what difference it makes if I put
mat<-matrix(1:20, ncol=5) outside the function. I am getting the same results in both cases.
fun<-function(x,y){
mat<-matrix(1:20, ncol=5)
for (i in 1:x){
for (j in 1:y){
mat[i,j]=rnorm(1)
return(mat)
}
}
}
fun(4,5)
The following code works, as suggested in the comments and answer. Why does it not work when converted to a function as above?
mat<-matrix(1:20, ncol=5)
for(i in 1:4){
for (j in 1:5){
mat[i,j]=rnorm(1)
}
}
mat
fun1 <- function(x,y) {
mat <- matrix(1:20, ncol=5)
mat[1:x, 1:y] <- rnorm(x*y)
mat
}
This will achieve your goal of creating a function that accepts two indices as arguments and returns a matrix in which rows 1 to x and columns 1 to y are filled with normally distributed random numbers.
fun1(2,1)
#           [,1] [,2] [,3] [,4] [,5]
#[1,] -0.2883407    5    9   13   17
#[2,] -0.5290704    6   10   14   18
#[3,]  3.0000000    7   11   15   19
#[4,]  4.0000000    8   12   16   20
Note that the value of the last expression in the function body (here mat) is what the function returns when it is called.
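For comparison, the loop version from the question stops early because return(mat) sits inside the inner loop, so the function exits right after the first assignment. A minimal sketch of the corrected loop, with fun_loop as a name chosen here for illustration:

```r
fun_loop <- function(x, y) {
  mat <- matrix(1:20, ncol = 5)
  for (i in 1:x) {
    for (j in 1:y) {
      mat[i, j] <- rnorm(1)
    }
  }
  mat  # returned only after both loops have finished
}

set.seed(1)
res <- fun_loop(2, 1)
```

Only rows 1 to 2 of column 1 are replaced here. The remaining cells keep their original values from 1:20.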
The mat matrix created in the function is not available in the Global environment:
mat
#Error: object 'mat' not found
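A small sketch of the same scoping behaviour, using made-up names (make_matrix, inner_mat) so that it does not clash with anything already in your workspace:

```r
make_matrix <- function() {
  inner_mat <- matrix(1:4, ncol = 2)  # exists only while the function runs
  inner_mat
}

result <- make_matrix()               # keep the returned value under a new name
leaked <- exists("inner_mat")         # FALSE: the local name is not visible outside
```

To keep the matrix, assign the function's return value to a name in the calling environment, as done with result.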
Whenever you write a nested for loop as a new user of R, alarm bells should go off. There is usually a better way. The advantage of the nested loop is that it "makes sense." But the clear logical progression turns out to be very inefficient in execution. There are exceptions of course, but you will most likely not run into them any time soon. It is better to take the time to learn R's programming intuition.
There are many discussions of scoping for study:
R environments and function call stacks
Scoping and functions in R 2.11.1 : What's going wrong?
http://developer.r-project.org/nonstandard-eval.pdf
http://adv-r.had.co.nz/Functions.html#lexical-scoping
I have a function that returns a list of vectors and matrices. I then create a variable that is a list of several of the resulting lists from calls to the function. So I have a list of lists. My question is how do I apply a function over the elements of these lists (note this is not the same as applying a function over the lists themselves). Here is a simple example that retains all the essential features of what I am doing
numtrials = 5
x = rep(list(NULL),numtrials)
testfunction = function(){return( list( c(1,2,3,4,5), matrix(runif(10), 2,5),
matrix(0,2,2) ) )}
for(index in 1:numtrials){
x[[index]] = testfunction()
}
I want to now calculate the mean of say the (2,3) element of x[[index]][[2]] across all "index" lists. Or even better get a matrix of means, xbar, such that xbar[i,j] = mean(x[[]][[2]][i,j]). I tried to play around with (and of course read the help file for) lapply, and apply, but couldn't get it to work. One of the reasons is that x[[]][[2]][i,j] appears to be invalid notation
Error in x[[]] : invalid subscript type 'symbol'
I think R doesn't know what to make of the "[[]]". I know some people are going to suggest vectorizing but note that my function returns matrices and vectors of different, unrelated dimensions (although I am not opposed to vectorizing if you have a clever way of doing this).
Using abind you can create a list which contains arrays for the relevant components of the inner lists, e.g.:
library(abind)
xl <- do.call(mapply, c('abind', x, rev.along = 0))
# the second element from each inner list is now within a 3-d array
# which is the 2nd element of xl
# you can now construct your matrix of mean values by using `apply`
means <- apply(xl[[2]], 1:2, mean)
means
##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.4576039 0.5185270 0.7099742 0.3812656 0.4529965
## [2,] 0.6528345 0.2304651 0.5534443 0.4404609 0.7361132
If you know which elements you want to pull out, then it's pretty straightforward to grab them with sapply/lapply, and get the mean:
# Mean of [[2]][2, 3] elements
values = sapply(x, function(elem) {
return(elem[[2]][2, 3])
})
mean(values)
sapply applies a function to each element of the outer list, which is passed in as the elem argument to the little anonymous function I've written. Then you just get the 2nd element of each of those: elem[[2]], and index into it to get the [2, 3] value.
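The same idea can be extended to the full matrix of means without extra packages. The sketch below uses a deterministic stand-in for the question's list of lists (x_demo and its values are made up) so the result is easy to check by eye:

```r
# Made-up data with the same shape as the question's x: a list of 3-element lists
x_demo <- lapply(1:3, function(k) list(1:5, matrix(k, 2, 5), matrix(0, 2, 2)))

second <- lapply(x_demo, function(elem) elem[[2]])  # pull out each 2 x 5 matrix
xbar   <- Reduce(`+`, second) / length(second)      # element-wise mean across trials
```

Here the matrices are filled with 1, 2 and 3, so every entry of xbar is 2.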
Here is some code that may help: first extract the second element of each inner list, then apply a function to the result.
ulist=function(x)
{
l=length(x)
lst=vector("list", l)
for(i in seq_len(l)) lst[[i]]=x[[i]][[2]]
return(lst)
}
#apply mean of each column of the x[[i]][[2]] matrix
sapply(ulist(x),function(y) apply(y,2,mean))
#apply mean of each row of the x[[i]][[2]] matrix
sapply(ulist(x),function(y) apply(y,1,mean))
Given the following matrix with weights in lbs in the first column and heights in inches in the second column:
> wgt.hgt.matrix
     [,1] [,2]
[1,]  180   70
[2,]  156   67
[3,]  128   64
[4,]  118   66
[5,]  202   72
I am looking for a concise way to apply a binary function like
bmi <- function(lb, inch) { (lb/inch**2)*703 }
to each row of the matrix, resulting in an array, list or vector with the 5 resulting BMI values. One way I found uses the apply function:
apply(wgt.hgt.matrix, 1, function(row) bmi(row[1], row[2]))
But a splat operator as in Ruby (*) would help making the call more concise and clear:
apply(wgt.hgt.matrix, 1, function(row) bmi(*row))
Does an equivalent of the splat operator exist, i.e. a syntax element telling R to split vector-like objects to populate argument lists? Are there other, simpler or more concise suggestions for the apply call?
Perhaps I'm missing something, but what's wrong with:
wgt.hgt.matrix <-
structure(c(180L,156L,128L,118L,202L,70L,67L,64L,66L,72L), .Dim=c(5L,2L))
bmi <- function(lb, inch) (lb/inch**2)*703
bmi(wgt.hgt.matrix[,1], wgt.hgt.matrix[,2])
Update:
Based on the OP's comment, it seems like do.call would work more generally:
# put each matrix column in a separate list element
lc <- lapply(1:ncol(wgt.hgt.matrix), function(i) wgt.hgt.matrix[,i])
# call 'bmi' with one argument for each column / list element
do.call(bmi, lc)
Using the bmi() function as a vectorized solution is preferable since it uses only vectorized operators, as was illustrated in Joshua's answer. You can also do this with:
colnames(wgt.hgt.matrix) <- c("lb", "inch")
with( as.data.frame(wgt.hgt.matrix), bmi(lb,inch) )
# [1] 25.82449 24.43039 21.96875 19.04362 27.39313
Unfortunately, matrices are not a good substrate for constructing environments with 'with', so coercing to a data frame was needed above. You could get an apply solution (which will be less time-efficient than a vectorized approach) to work with a version of bmi() rewritten to take a vector with named elements (as created above):
bmi <- function(vec) { (vec['lb']/vec['inch']**2)*703 }
apply(wgt.hgt.matrix, 1, function(row) bmi(row ) )
# [1] 25.82449 24.43039 21.96875 19.04362 27.39313
We can get pretty close to the syntax you're looking for with do.call:
## Setup
wgt.hgt.matrix=matrix(c(180,70,156,67,128,64,118,66,202,72),ncol=2,byrow=TRUE)
bmi = function(lb, inch) { (lb/inch**2)*703 }
## The action
apply(wgt.hgt.matrix, 1, function(row) do.call(bmi,as.list(row)))
do.call() is actually more flexible than just a splat operator, in that you can use the list names to give the argument names.
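A short sketch of that name matching, using the bmi function from above with made-up values:

```r
bmi <- function(lb, inch) (lb / inch^2) * 703

# The list names decide which argument each value goes to, so order does not matter
args_in_any_order <- list(inch = 70, lb = 180)
by_name <- do.call(bmi, args_in_any_order)

# A named vector converted with as.list() behaves the same way
row <- c(lb = 180, inch = 70)
from_row <- do.call(bmi, as.list(row))
```

With an unnamed list, the values are matched by position instead.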
Is there a faster way to do this:
for (i in 1:nrow(dataframe))
{
dataframe$results[i] <- with(dataframe, myownfunction(column1[i],
column2[i], column3[i], column4[i], column5[i], column6[i]))
}
myownfunction finds implied volatility using uniroot(), but when uniroot does not find a solution it raises an error (usually because there are some faults in the data) and the loop stops. Is there a way to make the function just output NA if it gets an error from uniroot and continue with the next row?
regards.
Part1: It's very likely that you could succeed with:
ave( dataframe[, c("column1", "column2", "column3", "column4", "column5", "column6")], myownfunction)
Part2: If you modify the function to test for failure with try and return NA when it fails, you can fill in the missing data in the result properly.
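A minimal sketch of that idea, using tryCatch (a close relative of try) around a toy uniroot problem. The real myownfunction is not shown in the question, so safe_root and its equation are purely hypothetical:

```r
# Hypothetical stand-in: solve v^2 = target for v in [0, 10], or return NA on failure
safe_root <- function(target) {
  tryCatch(
    uniroot(function(v) v^2 - target, interval = c(0, 10))$root,
    error = function(e) NA_real_   # uniroot signalled an error: give NA instead of stopping
  )
}

# The middle value has no root in the interval, so uniroot fails for it
roots <- sapply(c(4, -1, 9), safe_root)
```

The loop over the inputs now runs to the end, and the failing case simply yields NA.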
You seem to have two questions:
1) Returning a value if failure happens. As posted already, look at failwith in the plyr package. It does exactly this.
2) Speed up the for loop. Have you tried using mapply? It is a multivariate apply that applies a function to the corresponding elements of each argument in turn. So
mapply(myfunc, column1, column2, column3, column4)
(along with modifying myfunc to use failwith) would do the sort of thing you are looking for.
The plyr version of mapply is mdply if you prefer.
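As an illustration of the mapply approach on data frame columns, here is a small sketch with made-up data. row_fun is only a stand-in for myownfunction, and just two columns are used to keep it short:

```r
# Made-up data; the real data frame has six input columns
dataframe <- data.frame(column1 = c(1, 2, 3), column2 = c(10, 20, 30))

# Stand-in for myownfunction: takes one value from each column
row_fun <- function(a, b) a + b

# One call fills the whole results column, row by row
dataframe$results <- mapply(row_fun, dataframe$column1, dataframe$column2)
```

This replaces the explicit for loop. Whether it is noticeably faster depends mostly on how expensive the function itself is.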
Got this from another forum, I like it more than the previous options:
R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1] 4 10 16
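When the function consists only of arithmetic, as in this example, the same result can also be obtained directly from the columns. A brief sketch for comparison, with variable names chosen here for illustration:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

by_row    <- apply(M, 1, function(x) 2 * x[1] + x[2])  # one function call per row
by_column <- 2 * M[, 1] + M[, 2]                       # whole columns at once
```

Both give the same three values. The column form avoids calling the function once per row.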