R - Apply function with different argument value for each row/column of a matrix

I am trying to apply a function to each row or column of a matrix, but I need to pass a different argument value for each row.
I thought I was familiar with lapply, mapply, etc., but apparently not enough.
As a simple example :
> a<-matrix(1:100,ncol=10);
> a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 61 71 81 91
[2,] 2 12 22 32 42 52 62 72 82 92
[3,] 3 13 23 33 43 53 63 73 83 93
[4,] 4 14 24 34 44 54 64 74 84 94
[5,] 5 15 25 35 45 55 65 75 85 95
[6,] 6 16 26 36 46 56 66 76 86 96
[7,] 7 17 27 37 47 57 67 77 87 97
[8,] 8 18 28 38 48 58 68 78 88 98
[9,] 9 19 29 39 49 59 69 79 89 99
[10,] 10 20 30 40 50 60 70 80 90 100
Let's say I want to apply a function to each row, I would do :
apply(a, 1, myFunction);
However my function takes an argument, so :
apply(a, 1, myFunction, myArgument);
But if I want my argument to take a different value for each row, I cannot find the right way to do it.
If I define a 'myArgument' with multiple values, the whole vector will obviously be passed to each call of 'myFunction'.
I think I would need a kind of hybrid between apply and the multivariate mapply. Does that make sense?
One 'dirty' way to achieve my goal is to split the matrix by rows (or columns), use mapply on the resulting list and merge the result back to a matrix :
do.call(rbind, Map(myFunction, split(a,row(a)), as.list(myArgument)));
I had a look at sweep, aggregate, and all the *apply variations, but I couldn't find the perfect match for my need. Did I miss it?
Thank you for your help.

You can use sweep to do that.
a <- matrix(rnorm(100),10)
rmeans <- rowMeans(a)
a_new <- sweep(a,1,rmeans,`-`)
rowMeans(a_new)
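sweep pairs row i of the matrix with the i-th element of the statistics vector, so it covers the case where the function is a vectorised binary operation. A minimal sketch, assuming v is a hypothetical vector holding one value per row:

```r
# Small deterministic matrix: 5 rows, 4 columns
a <- matrix(1:20, ncol = 4)

# One (made-up) argument value per row
v <- c(1, 2, 4, 5, 10)

# Divide every row i by v[i]
swept <- sweep(a, 1, v, "/")

# For a plain arithmetic operator this matches R's column-wise recycling,
# because length(v) equals nrow(a)
recycled <- a / v
stopifnot(isTRUE(all.equal(swept, recycled)))

swept[2, ]  # row 2 of a is 2 7 12 17, divided by v[2] = 2
# [1] 1.0 3.5 6.0 8.5
```

This only helps when the function works element by element; a function that needs the whole row at once (such as one that uses mean(x)) still needs one of the row-wise approaches.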

I don't think there are any great answers, but you can somewhat simplify your solution by using mapply, which handles the "rbind" part for you, assuming your function always returns a vector of the same length (also, Map is really just mapply with SIMPLIFY = FALSE):
a <- matrix(1:80,ncol=8)
myFun <- function(x, y) (x - mean(x)) * y
myArg <- 1:nrow(a)
t(mapply(myFun, split(a, row(a)), myArg))
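Another option, sketched here under the same assumptions as the example above, is to loop over the row indices instead of splitting the matrix, so that each call can pick both its row and its argument by position:

```r
a <- matrix(1:80, ncol = 8)
myFun <- function(x, y) (x - mean(x)) * y
myArg <- seq_len(nrow(a))

# Iterate over row numbers; vapply checks that every call returns ncol(a) numbers
by_index <- t(vapply(
  seq_len(nrow(a)),
  function(i) myFun(a[i, ], myArg[i]),
  numeric(ncol(a))
))

# Same result as the split/mapply version (dimnames dropped for the comparison)
by_split <- unname(t(mapply(myFun, split(a, row(a)), myArg)))
stopifnot(isTRUE(all.equal(by_index, by_split)))
```

vapply returns one column per call, hence the t() to get back to one row per input row.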

I know the topic is quite old, but I had the same issue and solved it this way:
# Original matrix
a <- matrix(runif(n=100), ncol=5)
# Different value for each row
v <- runif(n=nrow(a))
# Result matrix -> Add a column with the row number
o <- cbind(1:nrow(a), a)
fun <- function(x, v) {
idx <- 2:length(x)
i <- x[1]
r <- x[idx] / v[i]
return(r)
}
o <- t(apply(o, 1, fun, v=v))
Because a column with the row number is added to the left of the original matrix, the function can read the index of the value it needs from the first element of each row and use it to pick the matching entry of the argument vector.

Related

Choose the highest row value of an array

I have the following matrix as the result of a neural network classification.
[,1] [,2] [,3] [,4]
78 6.679997e-04 4.650186e-05 9.820879e-01 4.037018e-02
85 6.721164e-05 4.037081e-03 3.442273e-04 9.993829e-01
97 5.889365e-06 8.632577e-03 7.168499e-04 9.992764e-01
52 2.118997e-01 5.272690e-04 9.340079e-01 2.318471e-05
63 1.630762e-05 2.278233e-04 9.999697e-01 1.327665e-05
11 9.999995e-01 8.570293e-04 1.033523e-05 1.954824e-03
4 9.999998e-01 4.675230e-03 4.100173e-06 1.386167e-04
67 8.230676e-08 3.901855e-05 9.999998e-01 2.482015e-05
82 3.113818e-05 4.045431e-03 4.980008e-04 9.994791e-01
59 2.199707e-02 8.932616e-05 9.996509e-01 3.201505e-06
68 6.396933e-05 3.507847e-05 9.999431e-01 2.231336e-04
50 3.644305e-03 9.955089e-01 6.152610e-07 2.438749e-03
65 2.985633e-01 3.111180e-04 7.284095e-04 9.567911e-01
84 8.953203e-08 2.043904e-03 2.796990e-02 9.997651e-01
33 5.182628e-03 9.959819e-01 1.582604e-07 9.150829e-03
29 4.094475e-03 9.936016e-01 2.439294e-07 1.378562e-02
21 9.999986e-01 2.920500e-03 2.343490e-04 8.551598e-06
79 2.356930e-01 1.064989e-04 9.998469e-01 8.037159e-08
54 9.760921e-07 1.125948e-04 9.999947e-01 4.913316e-05
71 7.575290e-05 1.901314e-03 9.998013e-01 1.212056e-06
73 3.069030e-02 1.351355e-04 9.961720e-01 2.970879e-05
98 1.852377e-05 1.071308e-02 1.508556e-03 9.923317e-01
8 9.999967e-01 1.091833e-03 8.615699e-05 3.788923e-04
55 7.353873e-05 1.572100e-04 9.999848e-01 2.654150e-06
87 6.485545e-05 1.801804e-03 2.487318e-03 9.978182e-01
66 1.075623e-04 9.965178e-05 9.999943e-01 1.090936e-06
6 9.999996e-01 2.057387e-03 5.199279e-06 8.711600e-04
46 1.675466e-03 9.923240e-01 5.403372e-07 1.406461e-02
48 2.897351e-03 9.948545e-01 2.023942e-07 1.650545e-02
28 4.179047e-03 9.950091e-01 1.261037e-07 2.139333e-02
99 6.191239e-08 2.242249e-02 7.910123e-04 9.999195e-01
47 1.265915e-03 9.928326e-01 1.905755e-07 6.175589e-02
41 2.460404e-02 9.910379e-01 2.134886e-07 6.080052e-03
45 1.416097e-03 9.904895e-01 4.379419e-07 3.060463e-02
18 9.999999e-01 2.119948e-03 4.377037e-06 2.702198e-04
What I want to do is to find the highest value in each row. More precisely, I want a vector giving, for each row, the column that holds the highest value.
The first case would be number 3:
78 6.679997e-04 4.650186e-05 **9.820879e-01** 4.037018e-02
The values represent the probability of correctly choosing the label of a case given the data.
If we want the column index of the max value per row, just use max.col
max.col(m1, "first")
Or with apply
apply(m1, 1, which.max)
In order to get the max values, we can use apply with MARGIN = 1 to loop over the rows and get the max
apply(m1, 1, max)
For columns, use MARGIN = 2
apply(m1, 2, max)
A vectorized option is to use max.col to index the max value per row
m1[cbind(seq_len(nrow(m1)), max.col(m1, "first"))]
Or with pmax if it is a data.frame
do.call(pmax, as.data.frame(m1))
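To see that these options agree, here is a small sketch on a made-up 3 x 3 matrix (the values and row labels are illustrative, not the asker's data):

```r
# Hypothetical class probabilities, one row per case
m1 <- matrix(c(0.1, 0.7,  0.2,
               0.9, 0.05, 0.05,
               0.3, 0.3,  0.4),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("78", "85", "97"), NULL))

# Column index of the row maximum, two ways
idx_maxcol <- max.col(m1, ties.method = "first")
idx_apply  <- apply(m1, 1, which.max)
stopifnot(all(idx_maxcol == idx_apply))

# Row maximum itself, three ways
val_apply <- apply(m1, 1, max)
val_index <- m1[cbind(seq_len(nrow(m1)), idx_maxcol)]
val_pmax  <- do.call(pmax, as.data.frame(m1))
stopifnot(all(val_apply == val_index), all(val_index == val_pmax))

idx_maxcol
# [1] 2 1 3
```

Passing "first" matters when a row has ties: the default ties.method of max.col is "random", so repeated calls could otherwise return different columns.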

The first row is missing from head() function in R

Something interesting (strange) occurred when I was trying to pull some data from the etf_env object from the rutils package.
First of all I created a variable called 'foo'.
foo <- as.list(rutils::etf_env)["VTI"]
Then I tried to call the head() function and here is the result.
> head(foo$VTI, n = 6)
VTI.Open VTI.High VTI.Low VTI.Close VTI.Volume VTI.Adjusted
2001-06-01 41.89521 42.18640 41.64041 42.0772 2542200 42.0772
2001-06-04 42.25920 42.29560 41.96801 42.2592 1018200 42.2592
2001-06-05 42.36841 42.95080 42.36841 42.8780 562400 42.8780
2001-06-06 42.76879 42.87799 42.47760 42.5140 278500 42.5140
2001-06-07 42.47761 42.73240 42.36841 42.7324 236700 42.7324
The first row is missing!
Then I created a random matrix called 'mat' and I tried to call the head() function again.
> mat <- matrix(1:100,ncol = 5)
> head(mat, n = 6)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 21 41 61 81
[2,] 2 22 42 62 82
[3,] 3 23 43 63 83
[4,] 4 24 44 64 84
[5,] 5 25 45 65 85
[6,] 6 26 46 66 86
Here the head() function seems to work just fine. How and why did this happen? I'm really scratching my head right now. Hope somebody knows the answer. Many thanks!

How to write a function that uses the single output from another function as the starting point for a new analysis?

I'm having trouble writing a function that calls another function and uses the output as the basis for running new analysis in a loop (or equivalent). For example, let's say function 1 creates this output: 10. The second function would take that as a starting point to run new analysis. The single data point from the second output would then be the basis for the next round of analysis, and so on.
Here's a simple example. The question is how to create a for loop for this. Or perhaps there's a more efficient way using lapply. In any case, the first function might be as follows:
f.1 <-function(x) {
x
a <-seq(x,by=1,length.out=5)
a.1 <-tail(a,1)
}
The second function, which calls the first function, could run as follows:
f.2 <-function(x) {
f.1 <-function(x) {
a <-seq(x,by=1,length.out=5)
a.1 <-tail(a,1)
}
z <-f.1(x)
y=z+1
seq(y,by=1,length.out=5)
}
How can I modify f.2() so that it re-runs that computation using the previous output as the basis for the next round of analysis? To be precise, f.1(10) outputs:
[1] 14
In turn, f.2(10) results in:
[1] 15 16 17 18 19
How can I rewrite f.2() so that it automatically computes f.2(19) on the next iteration, and keeps doing so for several loops? In the process, I'd like to collect the outputs in a separate file for review. Thanks much!
The magrittr library (which is used most notably by dplyr) makes this type of chaining somewhat simple. First, define the functions,
f.1 <-function(x) {
x
a <- seq(x, by=1, length.out=5)
a.1 <- tail(a,1)
}
f.2 <-function(x) {
y <- x+1
seq(y, by=1, length.out=5)
}
then
library(magrittr)
f.1(10) %>% f.2
# [1] 15 16 17 18 19
As @BondedDust mentioned, you could use Reduce, although it normally applies the same function over and over, so you just need to flip the most common use case and iterate over a list of functions instead
Reduce(function(x,f) f(x), list(f.1, f.2), init=10)
# [1] 15 16 17 18 19
You can try this with two arguments for f.2: x is the starting value and n is the number of iterations you want to run. The output of the function is a matrix with n rows and 5 columns, one row per iteration.
f.2 <- function(x, n) {
  # result matrix (named res so that it does not mask base::c)
  res <- matrix(nrow = n, ncol = 5)
  for (i in seq_len(n)) {
    # f.1 is already defined beforehand, so there is no need to define it again inside f.2
    z <- f.1(x)
    y <- z + 1
    res[i, ] <- seq(y, by = 1, length.out = 5)
    # the last value of this round is the starting point for the next one
    x <- res[i, 5]
  }
  return(res)
}
The output of
f <- f.2(10, 10) ##initialising x with 10 and running 10 loops
f
[,1] [,2] [,3] [,4] [,5]
[1,] 15 16 17 18 19
[2,] 24 25 26 27 28
[3,] 33 34 35 36 37
[4,] 42 43 44 45 46
[5,] 51 52 53 54 55
[6,] 60 61 62 63 64
[7,] 69 70 71 72 73
[8,] 78 79 80 81 82
[9,] 87 88 89 90 91
[10,] 96 97 98 99 100
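The question also asked for the outputs to be collected in a file. A sketch of one way to do that with Reduce(accumulate = TRUE), assuming a hypothetical helper step() that wraps one round of the analysis and a temporary CSV file as the destination:

```r
# f.1 as in the question: last element of a 5-long sequence starting at x
f.1 <- function(x) tail(seq(x, by = 1, length.out = 5), 1)

# step() is a made-up name for "one round": start one past f.1's output
step <- function(x) seq(f.1(x) + 1, by = 1, length.out = 5)

n <- 3        # number of rounds
start <- 10   # initial value

# Each round starts from the last value of the previous round;
# accumulate = TRUE keeps every intermediate result, [-1] drops the initial value
runs <- Reduce(function(prev, i) step(tail(prev, 1)),
               seq_len(n), accumulate = TRUE, init = start)[-1]

# One row per round
res <- do.call(rbind, runs)

# Write the collected rounds to a file for review (a temp file here)
out_file <- tempfile(fileext = ".csv")
write.csv(res, out_file, row.names = FALSE)
```

With start = 10 the first three rows are 15 to 19, 24 to 28 and 33 to 37, matching the loop version above.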

Data frame to 3D array and calculate mean in Z

I have a data frame read from CSV which contains 14 columns and 990 rows. Each set of 110 rows contains repeats of structured data (not the values) with the first 5 columns being labels.
I now want to create a new grid of 14x110, such that if columns are labelled with letters and rows are numbered numerically, then A1 to E110 of the new grid are the labels and F1 contains the mean average of F1 in the original frame, and so on through to N110.
I have never used R before, and have got as far as calculating the mean of one cell with
mean(data[seq.int(3, nrow(data), 110), 6])
but I need some help with repeating this for the rest of the cells and constructing a resulting data frame, please.
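To transform a matrix to a 3D array, keep in mind that R fills arrays column by column, so when the repeats are stacked by rows the repeat index has to be the second dimension (this assumes yourmatrix holds only the numeric columns)
yourarray=array(unlist(yourmatrix),dim = c(110,9,ncol(yourmatrix)))
Then to take an average over the repeats you can do something like
out=matrix(NA,110,ncol(yourmatrix))
for(n in 1:ncol(yourmatrix)){
for(i in 1:110){out[i,n]=mean(yourarray[i,,n])}}
Example
a=matrix(1:125,25,5)
b=array(unlist(a),dim = c(5,5,5))
out=matrix(NA,5,5)
for(n in 1:5){
for(i in 1:5){out[i,n]=mean(b[i,,n])}}
> out
[,1] [,2] [,3] [,4] [,5]
[1,] 11 36 61 86 111
[2,] 12 37 62 87 112
[3,] 13 38 63 88 113
[4,] 14 39 64 89 114
[5,] 15 40 65 90 115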
Hope this is what you're after.
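For the data-frame case in the question, the label columns have to be kept out of the reshape, and because R fills arrays column by column the block index ends up as the second dimension. A minimal sketch on made-up data (the column names and sizes are assumptions, not the asker's actual file):

```r
n_rows   <- 4   # rows per block (110 in the question)
n_blocks <- 3   # number of repeated blocks (9 in the question)

# Toy data: two label columns, two numeric columns, blocks stacked by rows
df <- data.frame(
  label1 = rep(letters[1:n_rows], times = n_blocks),
  label2 = rep(LETTERS[1:n_rows], times = n_blocks),
  x = as.numeric(1:12),
  y = as.numeric(101:112)
)

# Numeric part only; dimensions become row-in-block x block x column
num <- as.matrix(df[, 3:4])
arr <- array(num, dim = c(n_rows, n_blocks, ncol(num)))

# Average over the block dimension, keeping rows and columns
means <- apply(arr, c(1, 3), mean)
colnames(means) <- colnames(num)

# Labels from the first block plus the averaged values
result <- cbind(df[seq_len(n_rows), 1:2], means)

# Cross-check one cell against the "every n_rows-th row" approach
stopifnot(result$x[3] == mean(df$x[seq.int(3, nrow(df), n_rows)]))
```

The last line mirrors the asker's own single-cell calculation, which is a convenient way to confirm that the reshape lines up with the intended rows.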

Apply over all columns and rows of two different dataframes in R

I am trying to apply a function over all rows and columns of two dataframes, but I don't know how to do it with apply.
I think the following script explains what I intend to do and the way I tried to solve it. Any advice would be warmly appreciated! Please note that simplefunction is only an example, chosen to keep things simple.
# some data and a function
df1<-data.frame(name=c("aa","bb","cc","dd","ee"),a=sample(1:50,5),b=sample(1:50,5),c=sample(1:50,5))
df2<-data.frame(name=c("aa","bb","cc","dd","ee"),a=sample(1:50,5),b=sample(1:50,5),c=sample(1:50,5))
simplefunction<-function(a,b){a+b}
# apply on a single row
simplefunction(df1[1,2],df2[1,2])
# apply over all columns
apply(?)
## apply over all columns and rows
# create df to receive results
df3<-df2
# loop it
for (i in 2:5)df3[i]<-apply(?)
My first mapply answer!! For your simple example you have...
mapply( FUN = `+` , df1[,-1] , df2[,-1] )
# a b c
# [1,] 60 35 75
# [2,] 57 39 92
# [3,] 72 71 48
# [4,] 31 19 85
# [5,] 47 66 58
You can extend it like so...
mapply( FUN = function(x,y,z,etc){ simplefunctioncodehere} , df1[,-1] , df2[,-1] , ... other dataframes here )
The dataframes are passed to the function in order, so in this example df1 would be x, df2 would be y, and z and etc would be the other dataframes you specify, in that order. Hopefully that makes sense. mapply works column by column: it takes the first column of every dataframe and applies the function to those vectors, then the second column of every dataframe, and so on. For a vectorised function such as `+`, that amounts to combining the values cell by cell.
You can also use Reduce:
set.seed(45) # for reproducibility
Reduce(function(x,y) { x + y}, list(df1[, -1], df2[,-1]))
# a b c
# 1 53 22 23
# 2 64 28 91
# 3 19 56 51
# 4 38 41 53
# 5 28 42 30
You can just do :
df1[,-1] + df2[,-1]
Which gives :
a b c
1 52 24 37
2 65 63 62
3 31 90 89
4 90 35 33
5 51 33 45
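The three answers print different numbers only because each one drew its own random sample. A small seeded sketch (with a shortened, made-up version of the question's data) suggests they agree on the same input, and shows one way to fill the result dataframe df3 from the question without a loop:

```r
set.seed(1)  # seed chosen arbitrarily, only to make the sketch repeatable
df1 <- data.frame(name = c("aa", "bb", "cc"),
                  a = sample(1:50, 3), b = sample(1:50, 3))
df2 <- data.frame(name = c("aa", "bb", "cc"),
                  a = sample(1:50, 3), b = sample(1:50, 3))
simplefunction <- function(a, b) a + b

# Column-by-column with mapply: returns a matrix
by_column <- mapply(simplefunction, df1[, -1], df2[, -1])

# Direct arithmetic on the numeric parts: returns a dataframe
direct <- df1[, -1] + df2[, -1]
stopifnot(all(by_column == as.matrix(direct)))

# Keep the name column and replace the numeric columns in one assignment;
# Map returns a list with one element per column
df3 <- df1
df3[-1] <- Map(simplefunction, df1[-1], df2[-1])
```

Map is used for the assignment because it keeps one list element per column, which is the shape a dataframe expects on the right-hand side.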
