R summation within integration - r

I'm trying to figure out how to integrate the following function in R:
item.fill.rate <- function(x, lt, ib, S){
1-((((1/(factorial(S)))*((x*lt*ib)^S)))/
(sum(((1/(factorial(0:S)))*((x*lt*ib)^(0:S))))))}
where x is the variable of integration and lt, ib and S are input parameters.
Based on a previous topic on here, I tried the following:
int.func <- function(lt, ib, S){
item.fill.rate <- function(x){
1-((((1/(factorial(S)))*((x*lt*ib)^S)))/(sum(((1/(factorial(0:S)))*((x*lt*ib)^(0:S))))))
}
return(item.fill.rate)
}
integrate(int.func(0.25, 1, 1), lower=0.25, upper=0.75)$value
When applying this, I get a result along with the following warnings:
> integrate(int.func(0.25, 1, 1), lower=0.25, upper=0.75)$value
[1] 0.4947184
Warning messages:
1: In (x * lt * ib)^(0:S) :
longer object length is not a multiple of shorter object length
2: In (1/(factorial(0:S))) * ((x * lt * ib)^(0:S)) :
longer object length is not a multiple of shorter object length
I evaluated the length of those objects, but that did not give me any indication where the error must be.
I tried to be as specific as possible, so hopefully someone is able to help me out with this!

The sum function collapses its input to a single value even when a vector of results is expected, so an integrand that calls sum generally needs to be "vectorized": it must return a vector of the same length as the "x" vector that integrate supplies. The Vectorize function is a wrapper for mapply and is quite handy for this. You can set the parameters in the call to integrate. (At the moment I think you may be integrating a constant over a domain of length 1/2.)
item.fill.rate <- function(x,lt, ib, S){
1-((((1/(factorial(S)))*((x*lt*ib)^S)))/(sum(((1/(factorial(0:S)))*((x*lt*ib)^(0:S))))))
}
vint <- Vectorize(item.fill.rate)
integrate(vint, S=1, lt=0.25, ib= 1, lower=0.25, upper=0.75)$value
#[1] 0.4449025
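As a sanity check on the vectorized result, here is a minimal sketch. The helper name vint2 and the closed-form comparison are my own additions, not part of the answer above. For S = 1 the integrand simplifies to 1/(1 + x*lt*ib), so with lt = 0.25 and ib = 1 the integral over [0.25, 0.75] should equal 4*log(1.1875/1.0625).

```r
# Same formula as in the question, written with intermediate names for readability
item.fill.rate <- function(x, lt, ib, S) {
  k   <- 0:S
  rho <- x * lt * ib
  1 - (rho^S / factorial(S)) / sum(rho^k / factorial(k))
}

# Hypothetical helper: evaluate one x at a time so that sum() only ever sees a scalar rho
vint2 <- function(x, lt, ib, S) sapply(x, item.fill.rate, lt = lt, ib = ib, S = S)

num   <- integrate(vint2, lower = 0.25, upper = 0.75, lt = 0.25, ib = 1, S = 1)$value
exact <- 4 * log(1.1875 / 1.0625)   # closed form, valid for S = 1 only

all.equal(num, exact)
```

If the two values disagree, the integrand is most likely still being evaluated on the whole x vector at once.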

Related

an error in integrating a function in R

The following code chunk is for defining and integrating a function f1 involving matrix exponentials.
library(expm)
Lambdahat=rbind(c(-0.57,0.21,0.36,0,0),
c(0,-7.02,7.02,0,0),
c(1,0,-37.02,29,7.02),
c(0.03,0,0,-0.25,0.22),
c(0,0,0,0,0));
B=rbind(c(-1,1,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0),c(0,0,0,0,0))
f1<-function(tau1)
{
A=(expm(Lambdahat*tau1)%*%B%*%expm(Lambdahat*(5-tau1)));
return(A[1,5]);
}
out=integrate(f1,lower=0,upper=5)#integration of f1
The integration in the above line gives the following error:
Error in integrate(f1, lower = 0, upper = 5) :
evaluation of function gave a result of wrong length
In addition: Warning messages:
1: In Lambdahat * tau1 :
longer object length is not a multiple of shorter object length
2: In Lambdahat * (t[i] - tau1) :
longer object length is not a multiple of shorter object length
To check whether the inputs and outputs of f1 differ in length, I evaluated f1 at 11 evenly spaced inputs; the corresponding outputs are reported below. In every test case, both the input and the output had length 1.
sapply(X=seq(from=0,to=5,by=0.5),FUN=f1)
[1] 2.107718e-01 1.441219e-01 0.000000e+00 2.023337e+06 1.709569e+14
[6] 1.452972e+22 1.243012e+30 1.071096e+38 9.302178e+45 8.146598e+53
[11] 7.197606e+61
If anyone could share any hint or directions where the code may be going erroneous, it would be very helpful. Thanks very much!
The problem is that the function passed to integrate needs to be vectorized, i.e. it should be able to receive a vector of input values and return a vector of output values of the same length. I think f1 <- Vectorize(f1) could solve your problem.
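To illustrate the idea without depending on the expm package, here is a sketch with a stand-in function. The name f_scalar and its 2x2 matrix are hypothetical and unrelated to Lambdahat. Like f1, it builds a matrix from a single number, so it only makes sense for scalar input; wrapping it in Vectorize lets integrate call it with a vector.

```r
# Stand-in for f1: builds a matrix from ONE value of tau and returns one entry.
# (A %*% A)[1, 1] equals tau^2, so the integral over [0, 5] should be 125/3.
f_scalar <- function(tau) {
  A <- matrix(c(tau, 1, 0, tau), nrow = 2)
  (A %*% A)[1, 1]
}

f_vec <- Vectorize(f_scalar)          # calls f_scalar once per element of tau

vals <- f_vec(c(0, 1, 2))             # one output per input
out  <- integrate(f_vec, lower = 0, upper = 5)$value

all.equal(out, 125 / 3)
```

The same wrapping, Vectorize(f1), should apply to the matrix-exponential function in the question, assuming f1 itself is correct for scalar tau1.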

simulation a while loop

There might be some threads on while loops already, but I am struggling with them. It would be great if someone could help an R beginner out.
So I am trying to run 10000 simulations from an out-of-sample regression forecast using the forecast parameters mean and sd. Thankfully, my data is normal.
This is what I have
N<-10000
i<-1:N
k<-vector(,N)
while(i<N+1){k(,i)=vector(,rnorm(N,mean=.004546,sd=.00464163))}
...and I get this error
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
In addition: Warning message:
In while (i < N + 1) { : the condition has length > 1 and only the first element will be used
I can't seem to get my head around it.
No reason to create a loop here. If you want to put 10000 samples, normal distributed around mean = 0.004546 and sd = 0.00464163 into vector k, just do:
k <- rnorm(10000,mean = 0.004546, sd = 0.00464163)
try this
N<-10
i<-1
k<-matrix(0,1,N)
while(i<N+1){k[i]=rnorm(1,mean=.004546,sd=.00464163)
i=i+1
}
print(k)
To solve your problem, use @Esben Friis' answer. You are taking a hard approach to an easy problem.
To address the questions you had about the error messages you got, however:
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
This is the wrong way to go, as vector() creates a vector of a given length rather than a vector holding a given set of values; its second argument must be a single length. You are thinking of the as.vector() function:
as.vector(rnorm(5000, mean = 0.004546, sd = 0.00464163))
This is however not needed as this will only create a new vector of your values, which are already in a vector structure of the type double. Using this function will therefore not change anything.
It is best to simply use:
rnorm(5000, mean=0.004546, sd=0.00464163)
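A small sketch of the difference, using a short sample instead of 5000 draws (the variable names are just for illustration):

```r
# vector(mode, length) allocates a vector of the requested length,
# filled with the default value for that mode (0 for "numeric")
empty <- vector(mode = "numeric", length = 3)

# rnorm() already returns a plain numeric vector ...
set.seed(42)
draws <- rnorm(5, mean = 0.004546, sd = 0.00464163)

# ... so as.vector() leaves it unchanged
same <- identical(as.vector(draws), draws)
same
```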
Further:
In addition: Warning message:
In while(i<N+1){: the condition has length>1 and only the first element will be used
This warning stems from i being the vector 1:N, which has length greater than 1. The warning states that only the first element of i will be used (in every iteration of the loop), which is the same as writing i[1]. (As of R 4.2.0, a condition of length greater than 1 is an error rather than a warning.)
while(i<N+1){ }
#is the same as
while(i[1]<N+1){ }
Instead, you want a scalar counter that is incremented up to N. Furthermore, you can use the <= (less than or equal to) operator instead of writing <N+1.
while(newVal<=N){ }
This method will bring up new problems which could be solved by using a for() loop instead, but that is however out of the scope of the question and really not the right approach to your problem, as stated in the beginning. Hope you learned something and good luck!
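For completeness, a sketch of the three approaches side by side, using a small N so it runs instantly. The names k_while, k_for and k_vec are mine. Resetting the seed before each version makes the draws comparable.

```r
N  <- 10
mu <- 0.004546
s  <- 0.00464163

# while loop with a scalar counter that is incremented by hand
set.seed(1)
k_while <- numeric(N)
i <- 1
while (i <= N) {
  k_while[i] <- rnorm(1, mean = mu, sd = s)
  i <- i + 1
}

# for loop: the counter is managed for you
set.seed(1)
k_for <- numeric(N)
for (j in seq_len(N)) k_for[j] <- rnorm(1, mean = mu, sd = s)

# vectorized call: no loop at all
set.seed(1)
k_vec <- rnorm(N, mean = mu, sd = s)

all.equal(k_while, k_vec)
all.equal(k_for, k_vec)
```

All three fill a vector of length N with draws from the same distribution; the vectorized call is the one to prefer.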

R: passing by parameter to function and using apply instead of nested loop and recursive indexing failed

I have two lists of lists, humanSplit and ratSplit. humanSplit has elements of the form:
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
humanGene humanReplicate alignment RNAtype
66 DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt 6 reg
68 ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt 5 reg
If you type humanSplit[[1]], it gives the data without name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt
ratSplit is essentially similar to humanSplit, differing only in column order. I want to apply Fisher's test to every possible pairing of replicates from humanSplit and ratSplit. I defined the following empty vectors, which I will use to store the results of my Fisher's tests:
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For Fisher's test between two replicates of humanSplit and ratSplit, I define the following function. In the function I use geneList, which is a data.frame made by reading a file and has the form:
> head(geneList)
human rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, where I use a function getGenetype which I already defined in other part of the code. Also x and y are integers :
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After all this, I build the matrix eg to use in apply. Here I am basically trying to do something similar to a double for loop, running the Fisher test on each pair:
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is, when I try to run, it gives the following error when it tries to use function fishertest in apply
Error in humanSplit[[y]] : recursive indexing failed at level 3
Rstudio points out problem in following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
In defining fishertest function, how should I pass ratSplit and humanSplit and already defined function getGenetype?
And how I should use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. It got as far as it did because the arguments you provided actually resolved. Evaluate eg$i by itself and you'll see that it returns a vector: the corresponding column of the eg data.frame. You are passing that whole vector as the index x (and eg$j as y). The primary reason your function failed is that double-bracket indexing ([[) selects a single element; given a vector of length greater than 1, it indexes recursively into nested lists, hence "recursive indexing failed". This is a great example of where production/deployed functions would need type-checking to ensure that each argument is a numeric of length 1; that is often skipped in quick code, but it would have caught this mistake. Had it not been for the [[ limit, your function might have returned incorrect results. (I've been bitten by that many times!)
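A minimal sketch of both points, using a made-up nested list and a hypothetical helper pick_one (neither comes from the question): [[ with a vector walks down the nesting, and a one-line check turns a vector index into an immediate, readable error.

```r
nested <- list(list(10, 20), list(30, 40))

# A vector inside [[ ]] is applied level by level:
# nested[[c(2, 1)]] is the same as nested[[2]][[1]]
deep <- nested[[c(2, 1)]]

# Hypothetical helper that insists on a single numeric index
pick_one <- function(lst, idx) {
  stopifnot(is.numeric(idx), length(idx) == 1L)
  lst[[idx]]
}

first <- pick_one(nested, 1)                     # fine: returns list(10, 20)
bad   <- try(pick_one(nested, 1:3), silent = TRUE)  # stopped by the check
inherits(bad, "try-error")
```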
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and then aggregate them outside of the function, your life will be simpler. (pvalue <- c(pvalue, ...) will find the pvalue assigned outside the function but will not update it as you want.) Modifying outside variables also defeats one purpose of writing this as a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply, without having to overwrite variables in the parent environment, should you try to answer the question "how do I apply this function to all pairs and aggregate the results?" Try very hard to have your function not change anything outside of its own environment.
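Here is a sketch of that pattern. Everything in it is made up for illustration: ratToy and humanToy are toy logical vectors standing in for ratSplit and humanSplit, and compare_pair is a hypothetical replacement for fishertest that skips the merge and getGenetype steps. The point is only the shape: the function returns one row, and the aggregation happens outside.

```r
# Toy stand-ins for the real replicate lists (hypothetical data)
ratToy   <- list(ratA = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
                 ratB = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
humanToy <- list(humA = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
                 humB = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE))

# Compare ONE rat replicate with ONE human replicate; return a one-row data.frame
compare_pair <- function(x, y) {
  stopifnot(length(x) == 1L, length(y) == 1L)
  tab <- table(ratToy[[x]], humanToy[[y]])   # 2x2 contingency table
  ft  <- fisher.test(tab)
  data.frame(ratReplicate   = names(ratToy)[x],
             humanReplicate = names(humanToy)[y],
             pvalue         = ft$p.value,
             stringsAsFactors = FALSE)
}

# All pairs, then aggregate outside the function
eg     <- expand.grid(i = seq_along(ratToy), j = seq_along(humanToy))
rows   <- Map(compare_pair, eg$i, eg$j)
result <- do.call(rbind, rows)
result
```

Nothing outside compare_pair is modified, so the function can be tested on a single pair, e.g. compare_pair(1, 2), before it is applied to all of them.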
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply in this way, it is parsing the third argument and, in this example, evaluating it. Since it is simply a call to fishertest(eg$i, eg$j), which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now you can see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, your error above is because
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here is mean()(c(1, 11)), which will fail for a couple of reasons: (1) mean requires an argument and is not getting one, and (2) regardless, it does not return a function itself (functions returning functions belong to a "functional programming" paradigm, easy in R but uncommon for most programmers).
In the example here, mean will accept a single argument which is typically a vector of numerics. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work. You have two options here:
1. Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Both of the following versions do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
2. Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in option 1 above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, figure that) and pass that intermediary to apply, or just pass it directly to apply, à la:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
As a side note, there are some gotchas with using apply and family that will be confusing if you don't understand them. Not the least of these is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregate.
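The transposition gotcha is easy to see with the contrived eg from above. This is only a sketch; the returned names total and spread are made up.

```r
eg <- data.frame(aa = 1:5, bb = 11:15)

# The function returns a vector of length 2 for every row of eg
res <- apply(eg, 1, function(v) c(total = sum(v), spread = v[["bb"]] - v[["aa"]]))

dim(res)        # one ROW per returned value, one COLUMN per row of eg

# Transpose to get back to one row per input row
tidy <- t(res)
dim(tidy)
colnames(tidy)
```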
This is one area where using ddply may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in depth discussion on the bigger picture in which ddply plays a part, read Hadley's Split, Apply, Combine Strategy for Data Analysis paper from JSS.

vectorize a bidimensional function in R

I have some true and predicted labels
truth <- factor(c("+","+","-","+","+","-","-","-","-","-"))
pred <- factor(c("+","+","-","-","+","+","-","-","+","-"))
and I would like to build the confusion matrix.
I have a function that works on unary elements
f <- function(x,y){ sum(y==pred[truth == x])}
however, when I apply it to the outer product, to build the matrix, R seems unhappy.
outer(levels(truth), levels(truth), f)
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
What is the recommended strategy for this in R ?
I can always go through higher order stuff, but that seems clumsy.
I sometimes fail to understand where outer goes wrong, too. For this task I would have used the table function:
> table(truth,pred) # arguably a lot less clumsy than your effort.
pred
truth - +
- 4 2
+ 1 3
In this case, you are testing whether a multivalued vector is "==" to a scalar.
outer assumes that the function passed as FUN can take vector arguments and work properly with them. If m and n are the lengths of the two vectors passed to outer, it first creates two vectors of length m*n such that every combination of inputs occurs, and passes these two new vectors to FUN. In return, outer expects FUN to return another vector of length m*n.
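To see the expanded vectors for yourself, here is a small sketch that records what FUN receives. The vectors X and Y and the recording variable seen are made up for the demonstration.

```r
X <- c("a", "b")
Y <- c("u", "v", "w")

seen <- list()
res <- outer(X, Y, function(x, y) {
  seen <<- list(x = x, y = y)   # remember the arguments of the single call to FUN
  paste0(x, y)                  # vectorized: returns one string per (x, y) pair
})

seen$x    # X repeated as a whole, length m*n
seen$y    # each element of Y repeated length(X) times, length m*n
dim(res)  # reshaped to m by n
```

FUN is called exactly once, with both arguments already expanded to length m*n.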
The function described in your example doesn't really do this. In fact, it doesn't handle vectors correctly at all.
One way is to define another function that handles vector inputs properly; alternatively, if your program really only requires simple matching, you could use table() as in @DWin's answer.
If you're redefining your function, outer is expecting a function that will be run for inputs:
f(c("+","+","-","-"), c("+","-","+","-"))
and per your example, ought to return,
c(3,1,2,4)
There is also the small matter of decoding the actual meaning of the error:
Again, if m and n are the lengths of the two vectors passed to outer, it first obtains a vector of length m*n from FUN and then reshapes it using (basically)
dim(output) = c(m,n)
This is the line that gives an error, because outer is trying to shape the output into a 2x2 matrix (total 2*2 = 4 items) while the function f, assuming no vectorization, has given only 1 output. Hence,
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
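If you would rather keep f as it is, one option (my suggestion, not something from the answers above) is to wrap it in Vectorize before handing it to outer, so that f is called once per pair of levels. A sketch, with the result checked against table():

```r
truth <- factor(c("+","+","-","+","+","-","-","-","-","-"))
pred  <- factor(c("+","+","-","-","+","+","-","-","+","-"))

# Scalar-only function from the question
f <- function(x, y) sum(y == pred[truth == x])

lv <- levels(truth)
cm <- outer(lv, lv, Vectorize(f))          # f is now applied pair by pair
dimnames(cm) <- list(truth = lv, pred = lv)
cm

# Same counts as the built-in cross-tabulation
all(cm == table(truth, pred))
```

For this particular task table() remains the simpler choice; the Vectorize wrapper is mainly useful when the scalar function does something table() cannot.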

How to pass vector to integrate function

I want to integrate a function fun_integrate that has a vector vec as an input parameter:
fun_integrate <- function(x, vec) {
y <- sum(x > vec)
dnorm(x) + y
}
#Works like a charm
fun_integrate(0, rnorm(100))
integrate(fun_integrate, upper = 3, lower = -3, vec = rnorm(100))
300.9973 with absolute error < 9.3e-07
Warning message:
In x > vec :
longer object length is not a multiple of shorter object length
As far as I can see, the problem is the following: integrate calls fun_integrate for a vector of x that it computes based on upper and lower. This vectorized call seems not to work with another vector being passed as an additional argument. What I want is that integrate calls fun_integrate for each x that it computes internally and compares that single x to the vector vec and I'm pretty sure my above code doesn't do that.
I know that I could implement an integration routine myself, i.e. compute nodes between lower and upper and evaluate the function on each node separately. But that wouldn't be my preferred solution.
Also note that I checked Vectorize, but this seems to apply to a different problem, namely that the function doesn't accept a vector for x. My problem is that I want an additional vector as an argument.
integrate(Vectorize(fun_integrate,vectorize.args='x'), upper = 3, lower = -3, vec = rnorm(100),subdivisions=10000)
304.2768 with absolute error < 0.013
#testing with an easier function
test<-function(x,y) {
sum(x-y)
}
test(1,c(0,0))
[1] 2
test(1:5,c(0,0))
[1] 15
Warning message:
In x - y :
longer object length is not a multiple of shorter object length
Vectorize(test,vectorize.args='x')(1:5,c(0,0))
[1] 2 4 6 8 10
#with y=c(0,0) this is f(x)=2x and the integral easy to solve
integrate(Vectorize(test,vectorize.args='x'),1,2,y=c(0,0))
3 with absolute error < 3.3e-14 #which is correct
Roland's answer looks good. Just wanted to point out that it's the comparison inside sum, not integrate, that is throwing the warning message.
Rgames> xf <- 1:10
Rgames> vf <- 4:20
Rgames> sum(xf>vf)
[1] 0
Warning message:
In xf > vf :
longer object length is not a multiple of shorter object length
The fact that the answer you got is not the correct value is what suggests that integrate is not sending the x-vector you expected to your function.
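An alternative to Vectorize, offered only as a sketch: write the integrand so that it handles a vector x natively, comparing every x with every element of vec via outer and counting per row. The name fun_vec is mine, and the short vec below is chosen so that the result can be checked by hand.

```r
# Original scalar-x version from the question
fun_integrate <- function(x, vec) {
  y <- sum(x > vec)
  dnorm(x) + y
}

# Natively vectorized version: outer() gives a length(x) by length(vec) logical
# matrix, and rowSums() counts, for each x, how many elements of vec it exceeds
fun_vec <- function(x, vec) dnorm(x) + rowSums(outer(x, vec, ">"))

vec <- c(-1.5, 0, 1.5)
xs  <- seq(-3, 3, by = 0.25)

# Pointwise agreement with the Vectorize() approach
ref <- Vectorize(fun_integrate, vectorize.args = "x")(xs, vec = vec)
all.equal(fun_vec(xs, vec), ref)

# Integral over [-3, 3]: the normal part plus (3 - v) for each v in vec
num   <- integrate(fun_vec, lower = -3, upper = 3, vec = vec)$value
exact <- diff(pnorm(c(-3, 3))) + sum(3 - vec)
all.equal(num, exact, tolerance = 1e-4)
```

With a long vec the integrand has many jumps, so a larger subdivisions value may still be needed, as in the call shown earlier.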
