Using the Outer Function - r

I'm having difficulty using the outer function. I've looked at a few threads, but haven't been able to find a solution.
I have a matrix, prices, with the following information:
25 26
I use the outer function as follows to multiply these numbers together:
a = outer(prices[1,1:2],prices[1,1:2],FUN ="*")
This gives me the following error:
Error in as.vector(X) %*% t(as.vector(Y)) :
requires numeric/complex matrix/vector arguments
If, however, I do the exact same thing, but with the numbers directly, it works as I would like it to:
a = outer(c(25,26),c(25,26),FUN ="*")
and returns a 2x2 matrix with the products.
Any help would be greatly appreciated.

Your prices matrix is apparently a data.frame instead of a matrix. You can either change that:
prices <- as.matrix(prices)
a <- outer(prices[1,1:2],prices[1,1:2],FUN ="*")
or you can just convert to numeric when you use it:
a <- outer(as.numeric(prices[1,1:2]),as.numeric(prices[1,1:2]),FUN ="*")
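To see why the first call fails, it can help to check what prices[1, 1:2] actually is. The sketch below assumes prices is a data.frame with two numeric columns (the column names p1 and p2 are made up for illustration). A one-row slice of a data.frame is still a data.frame, so it has to be flattened to a numeric vector before outer can multiply it.

```r
# Hypothetical stand-in for the asker's data: a one-row data.frame
prices <- data.frame(p1 = 25, p2 = 26)

row_slice <- prices[1, 1:2]
is.data.frame(row_slice)   # the slice is still a data.frame, not a vector

# Flatten the slice to a plain numeric vector, dropping the column names
row_values <- unlist(row_slice, use.names = FALSE)

# outer() now receives two numeric vectors and returns a 2x2 matrix
a <- outer(row_values, row_values, FUN = "*")
print(a)
```

unlist is used here instead of as.numeric only as a matter of taste; for a single row of numeric columns both give the same vector.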

prices <- matrix(c(25,26), nrow=1)
a = outer(prices[1,1:2],prices[1,1:2],FUN ="*")
# [,1] [,2]
#[1,] 625 650
#[2,] 650 676

Related

R data.table: How to use apply function?

I need to briefly explain the context before letting you know my question.
I am trying to process a large graph, namely the Social circles: Google+ dataset. The file gplus_combined.txt from that dataset is read using the data.table package:
library(data.table)
data = fread('gplus_combined.txt',stringsAsFactors = TRUE)
Variable data is of dimensions dim(data) = c(30494865,2) and here is an example of a row of data:
>data[1,]
>1: 112188647432305746617 107727150903234299458
The two long integer strings are ids of nodes of the graph, and each row of data corresponds to an edge between the first and second node ids. Since working with node ids like those is not very convenient, I'd like to convert them to numbers using the R function strtoi. Here is what I have tried:
M = matrix(0,2,2)
for (i in 1:2) {
for (j in 1:2) {
M[i,j] = strtoi(data[i,j,with = FALSE])
}
}
print(M)
[,1] [,2]
[1,] 47826 45374
[2,] 65616 2462
This works well for just two rows of data, but it is too slow for processing about 30 million rows. So I want to use the R function apply to speed up the calculation. The problem is that if I just use
apply(data[1:2,], 1:2, strtoi)
[1,] NA NA
[2,] NA NA
then it returns a 2x2 matrix with NA entries. Note that to get the matrix M above, I need to include the parameter with = FALSE,
strtoi(data[i,j,with = FALSE])
otherwise M would also be a matrix of NA entries. Is there a way to pass the option with = FALSE to the apply function? Or is there any other faster way to get the same result as matrix M? Any suggestions/comments are greatly appreciated!
Thank you for spending your time reading this long post!
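One possible direction, offered only as a sketch: strtoi returns NA for values that overflow a 32-bit integer, which 21-digit ids do, so the numbers in M above are presumably factor level codes rather than parsed ids. If the goal is just a compact integer label per node, the ids can be relabelled in one vectorised step with match. The example uses a small base R data.frame with made-up column names (from, to) and a made-up third id instead of the real file.

```r
# Tiny made-up edge list; ids are kept as character strings
edges <- data.frame(
  from = c("112188647432305746617", "112188647432305746617", "107727150903234299458"),
  to   = c("107727150903234299458", "100000000000000000001", "100000000000000000001"),
  stringsAsFactors = FALSE
)

# Every distinct node id, in order of first appearance
node_ids <- unique(c(edges$from, edges$to))

# Replace each id by its position in node_ids (one vectorised call per column)
M <- cbind(match(edges$from, node_ids), match(edges$to, node_ids))
print(M)
```

The labels produced this way depend on the order of first appearance, so they will not equal the numbers printed in the question; they only give each node a consistent small integer.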

Populating a matrix using values from two other matrices of unpredictable size

I'd like to populate a matrix using information from two other matrices
I have managed to do this with a given dataset, but I need to integrate this within a larger script, and the size of the two matrices I'm using to populate the larger matrix may differ each time.
Example data:
days = 150
block <- matrix(c(50,120,150), nrow=3, ncol=1)
[,1]
[1,] 50
[2,] 120
[3,] 150
e1 <- matrix(c(0.1,0.5,0.7), nrow=3, ncol=1)
[,1]
[1,] 0.1
[2,] 0.5
[3,] 0.7
result <- matrix(0, nrow = 150, ncol=1)
I need to create a vector of numbers (taken from e1) that repeat themselves depending on each number in 'block'.
The code below demonstrates the desired outcome in this instance; however, I'm trying to write a more flexible script that can cope with fewer or more than 3 'blocks'.
I appreciate there is probably a much easier way of doing this, but my head is stuck in loop mode and I can't seem to get out of it!
for (v1 in 1:days){
if(v1 <= block[1,1]){
result[v1,1] <- e1[1,1]
}
else if (v1 > block[1,1] & v1 <= block[2,1]){
result[v1,1] <- e1[2,1]
}
else if (v1 > block[2,1] & v1 <= block[3,1]){
result[v1,1] <- e1[3,1]
}
}
Any help would be much appreciated!
You can get this by using a nice feature of rep:
result <- rep(e1, c(block[1], diff(block)))
# cast the vector as a column matrix
result <- matrix(result, length(result))
This works because rep will accept a vector in its second argument that tells it how many times to repeat each element of its first argument.
If you know the length ahead of time, you can combine the lines, like
result <- matrix(rep(e1, c(block[1], diff(block))), days)
for example.
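As a rough check that the rep approach copes with any number of blocks, here is a small sketch wrapped in a helper. The function name fill_blocks is made up for illustration, and it assumes block holds increasing cumulative end days with one value in e1 per block.

```r
# Hypothetical helper: expand per-block values into one value per day
fill_blocks <- function(block, e1) {
  # Turn cumulative end days (50, 120, 150) into run lengths (50, 70, 30)
  run_lengths <- diff(c(0, as.vector(block)))
  # Repeat each value by its run length and return a one-column matrix
  matrix(rep(as.vector(e1), times = run_lengths), ncol = 1)
}

# The three-block data from the question
block <- matrix(c(50, 120, 150), nrow = 3, ncol = 1)
e1 <- matrix(c(0.1, 0.5, 0.7), nrow = 3, ncol = 1)
result <- fill_blocks(block, e1)

# The same call with only two blocks
result2 <- fill_blocks(c(10, 25), c(0.2, 0.9))
```

Prepending 0 before diff is just another way of writing c(block[1], diff(block)).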

Mapply with margin in R

I need to apply a function that takes two arguments on matrices.
mapply(function(x, y) x+y, rbind(1:3, 1:3), rbind(2:4, 2:4))
output is
[1] 3 3 5 5 7 7
which doesn't give me the desired format I want. I need it to retain its matrix form.
On the other hand, apply function in R has an argument margin which helps retain the matrix format but only applies to one argument.
apply(rbind(1:3,1:3), MARGIN = c(1,2), function(x) x+3)
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 4 5 6
The point is that there's a MARGIN argument for apply but nothing like it for mapply. Or is there?
PLEASE: I don't require an answer to rearrange the result, I can do it. I am using this piece of code to write a function that takes a three dimensional meshgrid which will be hassle to rearrange.
EDITED LATER:
I am really sorry, I didn't elaborate this,
Of course, I am not stuck because I wanna do
rbind(1:3, 1:3) + rbind(2:4, 2:4)
These rbinds are just examples of the vectors I am using. And the function(x, y) x+y is also an example of very long nested functions that I can't just copy here which will be so confusing and inefficient. But it is a function of two variables which is relevant for now.
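One thing that may be worth trying, sketched here under the assumption that the function returns a single value per pair of elements: mapply walks its arguments in column-major order, which is the same order array uses to fill, so putting the dim of the input back on the result restores the shape without any rearranging. The function f below is only a placeholder for the real two-argument function.

```r
# Placeholder for the real, more complicated two-argument function
f <- function(x, y) x + y

# Two-dimensional case from the question
A <- rbind(1:3, 1:3)
B <- rbind(2:4, 2:4)
res <- array(mapply(f, A, B), dim = dim(A))
print(res)

# The same idea on three-dimensional arrays (made-up data)
A3 <- array(1:24, dim = c(2, 3, 4))
B3 <- array(24:1, dim = c(2, 3, 4))
res3 <- array(mapply(f, A3, B3), dim = dim(A3))
dim(res3)
```

If the function returns more than one value per pair, mapply returns a matrix or list instead and this shortcut no longer applies as written.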

Calculate levenshteinDist between rownames and colnames using mapply

I want to calculate the levenshteinDist distance between the rownames and colnames of a matrix using the mapply function, because my matrix is very large and a nested "for" loop takes a very long time to give me the result.
Here's the old code with nested loop:
mymatrix <- matrix(NA, nrow=ncol(dataframe),ncol=ncol(dataframe),dimnames=list(colnames(dataframe),colnames(dataframe)))
distfunction = function (text1, text2) {return(1 - (levenshteinDist(text1, text2)/max(nchar(text1), nchar(text2))))}
for(i in 1:ncol(mymatrix))
{
for(j in 1:nrow(mymatrix))
mymatrix[i,j]=(distfunction(rownames(mymatrix)[i], colnames(mymatrix)[j]))*100
}
I tried to switch nested loop by mapply:
mapply(distfunction,mymatrix)
It gave me this error:
Error in typeof(str2) : argument "text2" is missing, with no default
I planned to apply levenshteinDist to my matrix first and then work out how to apply myfunction.
Is it possible?
Thank you.
The function mapply cannot be used in this context. It requires two input vectors and the function is applied to the first elements, second elements, .. and so on. But you want all combinations applied.
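A small illustration of the difference, using plain multiplication as a stand-in for the distance function: mapply pairs the elements up position by position, while outer evaluates every combination.

```r
x <- 1:3
y <- 1:3

# mapply pairs elements up: 1*1, 2*2, 3*3
paired <- mapply(function(a, b) a * b, x, y)
print(paired)

# outer evaluates all 9 combinations and returns a 3x3 matrix
all_pairs <- outer(x, y, FUN = "*")
print(all_pairs)
```

The mapply result is only the diagonal of the outer result, which is why it cannot fill the whole matrix.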
You could try a stacked sapply
sapply(colnames(mymatrix), function(col)
sapply(rownames(mymatrix), function(row)
distfunction(row, col)))*100
Simple usage example
sapply(1:3, function(x) sapply(1:4, function(y) x*y))
Output:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
[4,] 4 8 12
Update
Even better is to use outer, but I think your distfunction is not vectorized (due to the max). So use the wrapper function Vectorize:
distfunction_vec <- Vectorize(distfunction)
outer(rownames(mymatrix), rownames(mymatrix), distfunction_vec)
But I'm not sure about the performance penalty. Better to directly vectorize the function (probably with pmax).
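As a sketch of what a directly vectorized version could look like: the example below swaps levenshteinDist for base R's adist (from the utils package), which returns the whole matrix of Levenshtein distances in one call, and uses outer with pmax for the pairwise maximum lengths. The labels are made up, and whether adist matches levenshteinDist for your data is an assumption you would need to check.

```r
# Made-up labels standing in for colnames(dataframe)
labels <- c("kitten", "sitting", "mitten")

# Full matrix of pairwise Levenshtein distances in a single call
d <- adist(labels)

# Pairwise maximum string length; pmax is the vectorized form of max
longest <- outer(nchar(labels), nchar(labels), pmax)

# Same similarity formula as distfunction, scaled to percent
mymatrix <- (1 - d / longest) * 100
dimnames(mymatrix) <- list(labels, labels)
print(round(mymatrix, 1))
```

No explicit loop or Vectorize wrapper is needed here because every step already works on whole vectors or matrices.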

R - Apply function on every row, with data from the row

Is there a faster way to do this:
dataframe$results <- NA_real_
for (i in seq_len(nrow(dataframe)))
{
dataframe$results[i] <- with(dataframe, myownfunction(column1[i],
column2[i], column3[i], column4[i], column5[i], column6[i]))
}
myownfunction finds implied volatility using uniroot(), but when uniroot does not find a solution it stops (usually because there are some faults in the data), and the loop stops. Is there a way to make the function just output NA if it gets an error from uniroot and continue with the next row?
regards.
Part 1: It's very likely that you could succeed with:
ave(dataframe[, c("column1", "column2", "column3", "column4", "column5", "column6")], FUN = myownfunction)
Part 2: If you modify the function to test for failure with try and return NA when it fails, you can fill in the missing data in the result properly.
You seem to have two questions:
1) Returning a value if failure happens. As posted already, look at failwith in the plyr package. It does exactly this.
2) Speed up the for loop. Have you tried using mapply? It is a multivariate apply that applies a function to each element of each argument simultaneously. So
mapply(myfunc, column1, column2, column3, column4)
(along with modifying myfunc to use failwith) would do the sort of thing you are looking for.
The plyr version of mapply is mdply if you prefer.
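For anyone who prefers to stay in base R, here is a minimal sketch of the same idea using tryCatch instead of failwith. The function safe_root and the toy equation x^2 = target are made up for illustration; they stand in for myownfunction and its uniroot call.

```r
# Hypothetical stand-in for myownfunction: solve x^2 = target on [lower, upper],
# returning NA instead of stopping when uniroot raises an error
safe_root <- function(target, lower, upper) {
  tryCatch(
    uniroot(function(x) x^2 - target, lower = lower, upper = upper)$root,
    error = function(e) NA_real_
  )
}

# Made-up data; the second row has no root in the interval
dataframe <- data.frame(target = c(4, -1, 9), lower = 0, upper = 10)

# mapply calls safe_root once per row, feeding it one element from each column
dataframe$results <- mapply(safe_root, dataframe$target,
                            dataframe$lower, dataframe$upper)
print(dataframe)
```

The row that fails gets NA and the remaining rows are still processed, which is the behaviour asked for in the question.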
Got this from another forum, I like it more than the previous options:
R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1] 4 10 16
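When the per-row calculation is plain arithmetic like this one, the same result can usually be had without apply by working on whole columns. This is only a sketch for that simple case; it does not carry over to a function like uniroot that handles one case at a time.

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Row-by-row version from the post above
by_row <- apply(M, 1, function(x) 2 * x[1] + x[2])

# Column arithmetic: the same formula applied to all rows at once
by_column <- 2 * M[, 1] + M[, 2]

print(by_row)
print(by_column)
```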
