Integrate function in R

From the code:
integrand <- function(x) {1/((x+1)*sqrt(x))}
a <- integrate(integrand, lower = 0, upper = Inf)
Then, it provides the result: 3.141593 with absolute error < 2.7e-05
How can I keep only the value 3.141593? I need to calculate a+3 in R.
Thanks

Although it prints as a character, you can tell by
str(a)
that a is in fact a list with value as its first element. So you can get that value easily:
# by name
a$value
a[['value']]
# or index
a[[1]]
If you are interested in why it prints as a character: str(a) also tells you that a has a class attribute set to integrate. It turns out that print has a method for this class: you can find print.integrate in methods(print). This method determines the printing behavior. Normal printing can be forced by print.default(a).
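Putting the pieces together for the original goal of computing a+3, a minimal sketch (the name result is only illustrative):

```r
# Integrand from the question
integrand <- function(x) {
  1 / ((x + 1) * sqrt(x))
}

a <- integrate(integrand, lower = 0, upper = Inf)

# a is a list of class "integrate"; the numeric estimate is in a$value
result <- a$value + 3

class(a)
result
```

Using a + 3 directly fails because arithmetic is not defined for the list itself; a$value is a plain numeric.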

Related

Using mocking with apply R

I am currently mocking in some unit tests, using the packages testthat and mockery. I am trying to understand how the function expect_args from the mockery package works when the mocked function is actually called in a function using apply. Here is an example where the test is successful.
myMean <- function(A){
apply(A,1,mean)
}
myMat = matrix(rep(1,6), nrow = 2, ncol = 3)
test_that("myMean calls base::mean correctly",{
m <- mock(1, cycle = TRUE)
with_mock(
`base::mean` = m,
myMean(myMat),
expect_args(m, 1, as.double(myMat[1,])))
})
Let's now take a slightly more complicated example, where the argument of myMean is actually a data.frame and needs to be converted to a matrix within the function.
myMean <- function(A){
B = as.matrix(A)
apply(B,1,mean)
}
myMat = as.data.frame(myMat)
test_that("myMean calls base::mean correctly",{
m <- mock(1, cycle = TRUE)
with_mock(
`base::mean` = m,
myMean(myMat),
expect_args(m, 1, as.double(myMat[1,])))
})
I then get the following error message:
Error: Test failed: 'myMeanSimple calls base::mean correct number of times
* 1st actual argument not equal to 1st expected argument.
names for target but not for current
This error is explained in the vignette of the mockery package. Nevertheless, I cannot work out which argument name I should associate with as.double(myMat[1,]).
First of all, I'm happy this small utility became useful! Second of all, the error you see results from how your transformations are carried out and how expect_args compares results. Internally, we call expect_equal which requires all of the names of the matrix to be present there.
After calling your second example I run this:
> mock_args(m)
[[1]]
[[1]][[1]]
V1 V2 V3
1 1 1
[[2]]
[[2]][[1]]
V1 V2 V3
1 1 1
So you can see that in the first call a single named row was passed, and the same is true for the second call: there are names assigned to each column. This is because as.matrix preserves column names. So this is not about argument names; it is about names in the data being compared.
Now, when you run your final comparison using expect_args you actually use as.double which doesn't preserve names. Thus, the error you see. To fix it you can simply change your expectation to:
expect_args(m, 1, as.matrix(myMat)[1,])
I hope this solves your problem.
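The name mismatch can be reproduced without any mocking. The sketch below only uses base R; the variable names df, row_named and row_plain are made up for illustration:

```r
# Same data as in the question, as a data.frame with columns V1, V2, V3
df <- as.data.frame(matrix(rep(1, 6), nrow = 2, ncol = 3))

# What apply() hands to mean(): a row of as.matrix(df), which keeps names
row_named <- as.matrix(df)[1, ]

# What the failing expectation used: as.double() drops the names
row_plain <- as.double(df[1, ])

names(row_named)
names(row_plain)

# all.equal() reports the difference in names rather than returning TRUE
all.equal(row_named, row_plain)
```

The values are identical; only the names attribute differs, which is enough for an equality check that also compares attributes to fail.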

Function name in single quotation marks in R

It may be a silly question, but it has bothered me for quite a while. I've seen people put single quotation marks around the function name when defining a function, and I keep wondering what the benefit is. Below is a naive example:
'row.mean' <- function(mat){
return(apply(mat, 1, mean))
}
Thanks in advance!
Going off Richard's assumption, back ticks allow you to use symbols in names that are normally not allowed. See:
`add+5` <- function(x) {return(x+5)}
defines a function, but
add+5 <- function(x) {return(x+5)}
returns
Error in add + 5 <- function(x) { : object 'add' not found
To refer to the function, you need to explicitly use the back ticks as well.
> `add+5`(3)
[1] 8
To see the code for this function, simply call it without its arguments:
> `add+5`
function(x) {return(x+5)}
See also this comment which deals with the difference between the backtick and quotes in name assignment: https://stat.ethz.ch/pipermail/r-help/2006-December/121608.html
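For a syntactically valid name such as row.mean, quoting the name on the left of <- should make no difference, as far as I know. A small sketch (the input matrix is made up for illustration):

```r
# Quoted name on the left-hand side of the assignment
'row.mean' <- function(mat) {
  apply(mat, 1, mean)
}

# The function is bound to the ordinary name row.mean
exists("row.mean")

m <- matrix(1:6, nrow = 2)
row.mean(m)
```

So in the question's example the quotes appear to be purely stylistic; they only matter when the name would not otherwise parse.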
Note, the usage of back ticks is much more general. For example, in a data frame you can have columns named with integers (maybe from using reshape::cast on integer factors).
For example:
test = data.frame(a = "a", b = "b")
names(test) <- c(1,2)
and to retrieve these columns you can use the backtick in conjunction with the $ operator, e.g.:
> test$1
Error: unexpected numeric constant in "test$1"
but
> test$`1`
[1] a
Levels: a
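Funnily, you can't rely on back ticks when assigning the data frame column names in data.frame(); the following does not keep the names 1 and 2 (by default they are converted to syntactically valid names):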
test = data.frame(`1` = "a", `2` = "b")
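As far as I can tell, the call above does run, but data.frame() checks names by default (check.names = TRUE) and rewrites them. A short sketch of the difference, with illustrative variable names mangled and kept:

```r
# Default behaviour: names are made syntactically valid
mangled <- data.frame(`1` = "a", `2` = "b")
names(mangled)

# Turning the check off keeps the names as written
kept <- data.frame(`1` = "a", `2` = "b", check.names = FALSE)
names(kept)

# Back ticks are then needed to retrieve the column with $
kept$`1`
```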
And responding to statechular's comments, here are two more use cases.
Infix functions
Using the % symbol we can naively define the dot product between vectors x and y:
`%.%` <- function(x,y){
sum(x * y)
}
which gives
> c(1,2) %.% c(1,2)
[1] 5
for more, see: http://dennisphdblog.wordpress.com/2010/09/16/infix-functions-in-r/
Replacement functions
Here is a great answer demonstrating what these are: What are Replacement Functions in R?
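As a quick illustration of why back ticks matter here: a replacement function has a name ending in <-, which can only be written with quoting. The function second<- below is a made-up example, not something from base R:

```r
# A replacement function: the name ends in "<-" and the last argument
# must be called value
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- 1:5
second(v) <- 10L   # R rewrites this as v <- `second<-`(v, value = 10L)
v
```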

R: passing by parameter to function and using apply instead of nested loop and recursive indexing failed

I have two lists of lists, humanSplit and ratSplit. humanSplit has elements of the form:
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
humanGene humanReplicate alignment RNAtype
66 DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt 6 reg
68 ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt 5 reg
If you type humanSplit[[1]], it gives the data without name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt
ratSplit is essentially similar to humanSplit, differing only in column order. I want to apply Fisher's test to every possible pairing of replicates from humanSplit and ratSplit. I defined the following empty vectors, which I will use to store the information from my Fisher's tests:
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For Fisher's test between two replicates of humanSplit and ratSplit, I define the following function. In the function I use geneList, which is a data.frame made by reading a file and has the form:
> head(geneList)
human rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, where I use a function getGenetype which I already defined in other part of the code. Also x and y are integers :
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After doing all this I make the matrix eg to use in apply. Here I am basically trying to do something similar to a double for loop and then run the Fisher test:
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is, when I try to run, it gives the following error when it tries to use function fishertest in apply
Error in humanSplit[[y]] : recursive indexing failed at level 3
Rstudio points out problem in following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
In defining fishertest function, how should I pass ratSplit and humanSplit and already defined function getGenetype?
And how I should use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. The reason it got as far as it did is that the arguments you provided actually resolved. Try evaluating eg$i by itself, and you'll see that it returns a vector: the corresponding column of the eg data.frame. You are passing these whole vectors as the indices x and y. The primary reason your function errored out is that double-bracket indexing ([[) is meant for a single index; given a longer vector it attempts recursive indexing, which is what failed here. This is a great example of where production/deployed functions would need type-checking to ensure that each argument is a numeric of length 1; often not required for quick code, but it would have caught this mistake. Had it not been for the [[ limit, your function might have returned incorrect results. (I've been bitten by that many times!)
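The behaviour of [[ with a vector index can be seen on a toy list. This is only a sketch; lst and its contents are invented for illustration:

```r
lst <- list(a = 1:3, b = c("x", "y"))

# A single index works as expected
lst[[1]]

# A length-2 index is treated as recursive indexing: lst[[1]][[2]].
# No error, but probably not what was intended.
lst[[c(1, 2)]]

# A longer index cannot be resolved and raises an error
res <- tryCatch(lst[[c(1, 2, 3)]], error = function(e) e)
inherits(res, "error")
```

The silent length-2 case is the kind of wrong-but-quiet result that an explicit length check on the arguments would catch.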
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and then aggregate them outside of the function, your life will simplify. (pvalue <- c(pvalue, ...) will find the pvalue assigned outside the function but will not update it as you want.) You are defeating one purpose of writing this as a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply without having to overwrite variables in the parent environment should you try to answer the question "how do I apply this function to all pairs and aggregate it?" Try very hard to have your function not change anything outside of its own environment.
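A minimal sketch of that "return, then aggregate" pattern follows. compare_pair is a hypothetical stand-in for fishertest (it just computes a made-up score instead of running fisher.test), and Map plus do.call(rbind, ...) is one way to aggregate, not necessarily the only or best one:

```r
# Hypothetical stand-in for fishertest(): compares one pair and returns
# a one-row data.frame without touching anything outside the function
compare_pair <- function(x, y) {
  data.frame(i = x, j = y, score = x * 10 + y)
}

# All pairs of indices, as in the question
eg <- expand.grid(i = 1:2, j = 1:3)

# Call the function once per pair, then bind the rows together outside
rows <- Map(compare_pair, eg$i, eg$j)
result <- do.call(rbind, rows)
result
```

Because each call returns its own row, nothing like pvalue <- c(pvalue, ...) is needed inside the function.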
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply in this way, it parses the third argument and, in this example, evaluates it. Since it is simply a call to fishertest(eg$i, eg$j), which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now you can see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, your error above is because the evaluated call is not a function
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here, mean()(c(1, 11)), will fail for a couple of reasons: (1) mean requires an argument and is not getting one, and (2) regardless, it does not return a function itself (returning functions belongs to a "functional programming" paradigm, easy in R but uncommon for most programmers).
In the example here, mean will accept a single argument which is typically a vector of numerics. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work. You have two options here:
Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Both of the following versions do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in the first option above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, figure that) and pass that intermediary to apply, or just pass it directly to apply, as in:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
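The argument checking mentioned earlier can be combined with the missing(y) idiom. The following is only a sketch: pair_index is a hypothetical helper that validates and returns the two indices, standing in for the first lines of fishertest:

```r
# Hypothetical helper: accepts either (x, y) or a single length-2 vector,
# and insists that each index ends up as a single number
pair_index <- function(x, y) {
  if (missing(y)) {
    y <- x[2]
    x <- x[1]
  }
  stopifnot(is.numeric(x), length(x) == 1,
            is.numeric(y), length(y) == 1)
  c(x = unname(x), y = unname(y))
}

eg <- expand.grid(i = 1:2, j = 1:3)

pair_index(1, 57)            # manual form
out <- apply(eg, 1, pair_index)  # row-wise form, one column per pair

# Passing whole columns, as in the question, is now rejected early
bad <- tryCatch(pair_index(eg$i, eg$j), error = function(e) e)
inherits(bad, "error")
```

With a check like this, the original call fishertest(eg$i, eg$j) would stop with a clear message instead of failing deep inside [[.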
As a side note, there are some gotchas with using apply and family that will be confusing if you don't understand them. Not the least of these is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregate.
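The transposition gotcha is easy to see on the contrived data.frame used earlier. The function returning lo and hi is just an illustrative choice:

```r
eg <- data.frame(aa = 1:5, bb = 11:15)

# FUN returns a length-2 vector for each of the 5 rows ...
out <- apply(eg, 1, function(r) c(lo = min(r), hi = max(r)))

# ... and apply() stores each result as a COLUMN, giving a 2 x 5 matrix
dim(out)

# Transposing gives one row per input row
dim(t(out))
t(out)
```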
This is one area where using ddply may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in depth discussion on the bigger picture in which ddply plays a part, read Hadley's Split, Apply, Combine Strategy for Data Analysis paper from JSS.

unexpected "=" in par=c(...)

I'm trying to reproduce the following code (nls function does not perform well), but with an extra implementation, using for loops, sprintf and as.formula(), that adds variables depending on the number of peaks in the given spectrum. To be more consistent among peaks, I vectorized the variable names for each peak, so peak number 1 has 'alfa[1]', 'peak[1]' and 'height[1]' related to it.
So far, I got the expected formula:
height[1]/(pi*alfa[1]*(1+((x-peak[1])/alfa[1])^2))+height[2]/(pi*alfa[2]*(1+((x-peak[2])/alfa[2])^2))+drift.a+drift.b*x
Nevertheless, I have some problems when I try to replicate the same system for the par line. This should show:
par=c(alfa[1]=0.001,
peak[1]=2.156460,
height[1]=1,
alfa[2]=0.001,
peak[2]=2.170150,
height[2]=1,
drift.a=0,
drift.b=0)
But instead, when I collapse all strings and used the as.formula command afterwards, I got:
Error in parse(text = x) : <text>:1:15: unexpected '='
1: par=c( alfa[1]=
^
If I print the collapsed string, the character line is the one expected, so I think the problem is somehow linked to the as.formula command (i.e. it may not be the appropriate command).
When you create a named vector using c, the names must be valid variable names, or you have to wrap them in quotes.
This is OK:
c(alfa1 = 0.001)
## alfa1
## 0.001
alfa[1] is not a valid variable name – it's the first element of a variable – so you have to wrap it in quotes:
c(alfa1[1] = 0.001)
## Error: unexpected '=' in "c(alfa1[1] ="
c("alfa1[1]" = 0.001)
## alfa1[1]
## 0.001
Backquotes also work:
c(`alfa1[1]` = 0.001)
## alfa1[1]
## 0.001
See also is_valid_variable_name in the assertive package.
library(assertive)
is_valid_variable_name(c("alfa1", "alfa[1]"))
## alfa1 alfa[1]
## TRUE FALSE
You can turn your coefficient names into valid variable names using make.names:
make.names("alfa[1]")
## [1] "alfa.1."
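If the bracketed names really are wanted, one option is to skip building and parsing a string altogether and attach the names with setNames. This is a sketch under that assumption; n_peaks, idx and start are illustrative names, the starting values are taken from the question, and whether the fitting function accepts such names is a separate matter:

```r
n_peaks <- 2
idx <- seq_len(n_peaks)

# Names are plain strings here, so "alfa[1]" etc. need no parsing
start <- c(
  setNames(rep(0.001, n_peaks),        sprintf("alfa[%d]", idx)),
  setNames(c(2.156460, 2.170150),      sprintf("peak[%d]", idx)),
  setNames(rep(1, n_peaks),            sprintf("height[%d]", idx)),
  drift.a = 0,
  drift.b = 0
)
start

# Elements are retrieved by their string name
start[["peak[2]"]]
```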
If I understand you correctly, you want vectors like alfa and so on to store several values. Maybe you should try combining these vectors in a list, which makes them more accessible.
It would also be a good idea not to name this list par, since par is already the name of a base graphics function.
As far as I understand your question, you have more than one peak to process. So you have data like:
peak <- c(2.31, 3.17, ...)
alfa <- c(0.001, 0.002, ...)
In this case, you could use list(peak = peak, ...) to construct a list with all your parameters and then call your function and supply appropriate objects from the list.
Or did I miss the main point of your question?

How to pass vector to integrate function

I want to integrate a function fun_integrate that has a vector vec as an input parameter:
fun_integrate <- function(x, vec) {
y <- sum(x > vec)
dnorm(x) + y
}
#Works like a charm
fun_integrate(0, rnorm(100))
integrate(fun_integrate, upper = 3, lower = -3, vec = rnorm(100))
300.9973 with absolute error < 9.3e-07
Warning message:
In x > vec :
longer object length is not a multiple of shorter object length
As far as I can see, the problem is the following: integrate calls fun_integrate for a vector of x that it computes based on upper and lower. This vectorized call seems not to work with another vector being passed as an additional argument. What I want is that integrate calls fun_integrate for each x that it computes internally and compares that single x to the vector vec and I'm pretty sure my above code doesn't do that.
I know that I could implement an integration routine myself, i.e. compute nodes between lower and upper and evaluate the function on each node separately. But that wouldn't be my preferred solution.
Also note that I checked Vectorize, but this seems to apply to a different problem, namely that the function doesn't accept a vector for x. My problem is that I want an additional vector as an argument.
integrate(Vectorize(fun_integrate,vectorize.args='x'), upper = 3, lower = -3, vec = rnorm(100),subdivisions=10000)
304.2768 with absolute error < 0.013
#testing with an easier function
test<-function(x,y) {
sum(x-y)
}
test(1,c(0,0))
[1] 2
test(1:5,c(0,0))
[1] 15
Warning message:
In x - y :
longer object length is not a multiple of shorter object length
Vectorize(test,vectorize.args='x')(1:5,c(0,0))
[1] 2 4 6 8 10
#with y=c(0,0) this is f(x)=2x and the integral easy to solve
integrate(Vectorize(test,vectorize.args='x'),1,2,y=c(0,0))
3 with absolute error < 3.3e-14 #which is correct
Roland's answer looks good. Just wanted to point out that it's the comparison inside sum, not integrate, that is throwing the warning message.
Rgames> xf <- 1:10
Rgames> vf <- 4:20
Rgames> sum(xf>vf)
[1] 0
Warning message:
In xf > vf :
longer object length is not a multiple of shorter object length
The fact that the answer you got is not the correct value is what suggests that integrate is not sending the x-vector you expected to your function.
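An alternative to Vectorize is to make the function loop over x itself, so that each single x is compared with the whole of vec. The sketch below uses a small fixed vec for reproducibility (the question used rnorm(100)) and only checks the function's values; it does not claim anything about what integrate returns for it:

```r
# Each element of x is compared against ALL of vec, one x at a time
fun_integrate <- function(x, vec) {
  vapply(x, function(xi) dnorm(xi) + sum(xi > vec), numeric(1))
}

vec <- c(-1, 0, 1)          # illustrative stand-in for rnorm(100)
xs  <- c(-2, 0.5, 2)

# One value per element of xs, and no recycling warning
vals <- fun_integrate(xs, vec)
vals

# Same values as the Vectorize() approach applied to the original body
orig <- function(x, vec) dnorm(x) + sum(x > vec)
vals_vectorize <- Vectorize(orig, vectorize.args = "x")(xs, vec)
```

A function written this way returns one value per x, which is what integrate expects of its integrand, so it can be passed in the same way as in the calls above, with vec supplied as an extra argument.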
