I'm trying to compute a kind of Gini index from a generated dataset,
but the last call to integrate fails.
When I try to integrate the function named f1,
R says
Error in integrate(Q, 0, p) : length(upper) == 1 is not TRUE
My code is
# set up parameters b>a>1 and the number of observations n
n <- 1000
a <- 2
b <- 4
# generate x and y
# where x follows beta distribution
# y = 10x+3
x <- rbeta(n,a,b)
y <- 10*x+3
# the starting point of the integration having problem
Q <- function(q) {
quantile(y,q)
}
# integrate the function Q from 0 to p
G <- function(p) {
integrate(Q,0,p)
}
# compute a function
L <- function(p) {
numer <- G(p)$value
dino <- G(1)$value
numer/dino
}
# the part having problem
d <- 3
f1 <- function(p) {
((1-p)^(d-2))*L(p)
}
integrate(f1,0,1) # In this integration, the aforementioned error appears
I suspect the nested calls to integrate are causing the problem, but I can't work out exactly what goes wrong.
Please help me!
As mentioned by @John Coleman, integrate needs a vectorized function and a suitable subdivisions option to carry out the integration. Even when the integrand is already vectorized, it can be tricky to choose a good value for integrate(..., subdivisions = ).
To address your problem, I recommend integral from the package pracma. You still need a vectorized integrand (see what I have done to the functions G and L), but there is no need to set subdivisions manually, i.e.,
library(pracma)
# set up parameters b>a>1 and the number of observations n
n <- 1000
a <- 2
b <- 4
# generate x and y
# where x follows beta distribution
# y = 10x+3
x <- rbeta(n,a,b)
y <- 10*x+3
# the starting point of the integration having problem
Q <- function(q) {
quantile(y,q)
}
# integrate the function Q from 0 to p
G <- function(p) {
integral(Q,0,p)
}
# compute a function
L <- function(p) {
numer <- Vectorize(G)(p)
dino <- G(1)
numer/dino
}
# the part having problem
d <- 3
f1 <- function(p) {
((1-p)^(d-2))*L(p)
}
res <- integral(f1,0,1)
then you will get
> res
[1] 0.1283569
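The error that you reported arises because the function passed to integrate must be vectorized, while integrate itself isn't vectorized in its limits: the outer integrate calls f1 with a vector p, and that vector reaches integrate(Q, 0, p) as an upper limit of length greater than 1.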
From the help (?integrate):
f must accept a vector of inputs and produce a vector of function
evaluations at those points. The Vectorize function may be helpful to
convert f to this form.
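The failure can be reproduced in isolation. The sketch below uses a toy inner integrand (t^2, chosen only for illustration; it is not from the question) so that the exact answer is known: the inner integral is p^3/3, and integrating that over [0, 1] gives 1/12.

```r
# Toy stand-in for G(): integrate t^2 from 0 to p (hypothetical example function)
inner <- function(p) integrate(function(t) t^2, 0, p)$value

# integrate() evaluates its integrand at a whole vector of points at once,
# so a vector p reaches the inner call as 'upper' and triggers the error
err <- tryCatch(inner(c(0.2, 0.5)), error = function(e) conditionMessage(e))
print(err)

# Vectorize() loops over the elements of p, one inner integrate() call each
inner_vec <- Vectorize(inner)
outer_res <- integrate(inner_vec, 0, 1)
print(outer_res$value)  # close to 1/12
```

With a smooth inner function like this one the nested integration goes through once the wrapper is vectorized; the empirical quantile function in the question is less well behaved.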
Thus one "fix" is to replace your definition of f1 by:
f1 <- Vectorize(function(p) {
((1-p)^(d-2))*L(p)
})
But when I run the resulting code I always get:
Error in integrate(Q, 0, p) : maximum number of subdivisions reached
A solution might be to assemble a large number of quantiles, smooth them, and use the smoothed curve in place of your Q, although the error here strikes me as odd.
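One way to act on the idea of assembling many quantiles is sketched below. It swaps the nested integrate calls for the trapezoidal rule on a fixed grid of probabilities, which is a substitution of mine rather than something proposed in this thread; the grid size, the seed, and the helper name cum_trapz are arbitrary choices made for illustration.

```r
set.seed(1)                      # seed added here only for reproducibility
n <- 1000; a <- 2; b <- 4; d <- 3
y <- 10 * rbeta(n, a, b) + 3

# Quantiles on a fine grid of probabilities (grid size is an arbitrary choice)
ps <- seq(0, 1, length.out = 2001)
qs <- quantile(y, ps, names = FALSE)

# Cumulative trapezoidal rule: integral from the first grid point to each point
cum_trapz <- function(x, f) {
  c(0, cumsum(diff(x) * (head(f, -1) + tail(f, -1)) / 2))
}

G_grid  <- cum_trapz(ps, qs)              # G(p) on the grid
L_grid  <- G_grid / G_grid[length(ps)]    # L(p) = G(p) / G(1)
f1_grid <- (1 - ps)^(d - 2) * L_grid      # integrand f1 on the grid

res_grid <- cum_trapz(ps, f1_grid)[length(ps)]  # integral of f1 over [0, 1]
res_grid
```

Because everything is computed on one grid, no integrand has to be vectorized and no subdivisions limit applies; the price is a fixed discretization error instead of an adaptive one.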
I am trying to write a function in R that generates n random variables from x using the sample() function, where x~Ge(p) (that is, x has a geometric distribution). I would like to use a while loop in my function.
I think my function needs two inputs, size and p, and it also needs a for loop. I think something like the framework below should work:
rGE <- function(size,p){
for
i<-1
while()
...
return(i)
}
I would like to develop the function above so that it generates n random variables from x (when x~Ge(p)).
For a home-grown, inefficient (but comprehensible) version of rgeom, something like this should work:
my_rgeom <- function(n, p) {
  x <- numeric(n) ## allocate space for the results (all zeros)
  for (i in seq(n)) {
    done <- FALSE
    while (!done) {
      x[i] <- x[i] + 1
      done <- runif(1)<p
    }
  }
  return(x)
}
I'm sure you could use sample() instead of runif() for the innermost loop, but it's not obvious to me how. One piece of advice: if you're unfamiliar with programming, try writing your proposed algorithm out as pseudocode rather than jumping in to R-bashing right away. It can be easier if you deal with the logic and the coding nuts-and-bolts separately ...
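For what it's worth, here is a minimal sketch of how sample() could replace runif() in the innermost loop. The name my_rgeom_sample is made up for this example. Like my_rgeom above, it counts trials up to and including the first success, so its values start at 1, whereas the built-in rgeom counts the failures before the first success and starts at 0.

```r
# Hypothetical variant of my_rgeom that uses sample() for each Bernoulli trial
my_rgeom_sample <- function(n, p) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    done <- FALSE
    while (!done) {
      x[i] <- x[i] + 1
      # size = 1 matters: without it, sample() would return 2 values here
      done <- sample(c(TRUE, FALSE), size = 1, prob = c(p, 1 - p))
    }
  }
  x
}

set.seed(42)
draws <- my_rgeom_sample(5000, 0.2)
mean(draws)  # should be near 1/p = 5 when counting trials
```

Comparing the sample mean with 1/p is only a rough sanity check, not a proof that the distribution is right.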
You could use rgeom:
set.seed(1)
rgeom(n = 10, p = .1)
#> [1] 6 3 23 3 24 13 15 2 20 3
I have finally written the below function:
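rge <- function(n, p) {
  x <- numeric(n)
  for (i in seq(n)) {
    j <- 0
    while (j==0) {
      x[i] <- x[i] + 1
      ## size = 1 gives one Bernoulli(p) draw; without it, sample() draws
      ## length(0:1) = 2 values and the loop stops with the wrong probability
      j <- sample(0:1, size = 1, prob = c(1-p, p))
    }
  }
  return(x)
}
rge(10,.2)
I hope it really generates n random numbers from a geometric distribution.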
I am trying to estimate the parameters of the log-likelihood function below using the maximum likelihood method in R, but I get the following error:
Error in optim(start, f, method = method, hessian = TRUE, ...) : objective function in optim evaluates to length 10 not 1
My attempt was as follows:
Generating data
set.seed(101)
n <- 10
u <- runif(n)
theta1 <- 1
lamba1 <- 0.5
Generating PTIR data using quantile function
x <- function(u, theta1, lamba1) {
(-theta1/(log((1+lamba1)-sqrt((1+lamba1)^2-(4*lamba1*u)))/(2*lamba1)))^(1/(2))
}
x <- x(u = u, theta1 = theta1, lamba1 = lamba1)
Declaring the Log-Likelihood function
LL <- function(theta, lamba) {
R = suppressWarnings((n*log(2))+
(n*log(theta))-(((2)+1)*sum(log(x)))-
(sum(theta/(x^(2))))+
(log(1+lamba-(2*lamba*exp(-theta/(x^(2)))))))
return(-R)
}
library(stats4) # provides mle()
mle(LL, start = list(theta = 5, lamba=0.5))
Any advice would be greatly appreciated.
I don't know how to fix your problem, but hopefully I can help you diagnose it. As @KonradRudolph suggests in the comments, this may be a case where the usual advice "add more parentheses if you're not sure" does more harm than good ... I've rewritten your function so that it matches what you've got above but has fewer parentheses and more consistent line breaking/indentation. Every line below is a separate additive term. Your specific problem is that the last term involves x (which has length 10 in this case) but is not summed, so the return value ends up being a length-10 vector.
LL2 <- function(theta, lambda) {
  R <- n*log(2)+
    n*log(theta)-
    ((2)+1)*sum(log(x))-
    sum(theta/(x^2))+
    log(1+lambda-(2*lambda*exp(-theta/x^2)))
  return(-R)
}
all.equal(LL(1,1),LL2(1,1)) ## TRUE
length(LL2(1,1)) ## 10
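If the last term is meant to be summed over the observations like the others (an assumption on my part; I have not checked the likelihood against the distribution), the objective becomes a scalar. A minimal sketch follows, using optim directly instead of mle; the name negLL and the box constraints on theta and lamba are choices made for this example.

```r
set.seed(101)
n <- 10
u <- runif(n)
theta1 <- 1
lamba1 <- 0.5
# same quantile transform as in the question
x <- (-theta1 / (log((1 + lamba1) - sqrt((1 + lamba1)^2 - 4 * lamba1 * u)) /
                   (2 * lamba1)))^(1 / 2)

# Negative log-likelihood with the last term summed (assumed correction)
negLL <- function(theta, lamba) {
  R <- n * log(2) +
    n * log(theta) -
    3 * sum(log(x)) -
    sum(theta / x^2) +
    sum(log(1 + lamba - 2 * lamba * exp(-theta / x^2)))
  -R
}

length(negLL(1, 0.5))  # 1, so optim() no longer complains about the length

# Assumed bounds: theta > 0 and |lamba| < 1 keep every log() argument positive
fit <- optim(c(theta = 5, lamba = 0.5),
             function(par) negLL(par[1], par[2]),
             method = "L-BFGS-B",
             lower = c(1e-6, -0.99), upper = c(Inf, 0.99))
fit$par
```

With only 10 observations the estimates can sit far from theta1 and lamba1, so treat fit$par as a check that the code runs rather than as a validation of the model.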
I'm trying to solve an equation with 3 unknowns, and I decided to use nlm.
I defined a function F that takes 3 parameters packed into a vector, and I'm looking for the vector X that satisfies the following equation:
F(X)-F(X1)-F(X2)-F(X3)=0
So I applied nlm to the LHS, but I get weird results: instead of a solution that makes the LHS close to zero, it returns one that drives the LHS toward -infinity.
Can anyone point me in the right direction?
Thank you all in advance :)
rm(list=ls())
Ta <- 30 #common parameter
c <- 0.09 #common parameter
Delta_T <- c( 10, 20, 30 ) #vector containing X1(1), X2(1) and X3(1)
tetha <- c( 0.9, 1.1, 1.5 ) #vector containing X1(2), X2(2) and X3(2)
t <- c( 300, 400, 100 )
N <- t/tetha #vector containing X1(3), X2(3) and X3(3)
F <- function(X){ #definition of function F
x <- X[1]
y <- X[2]
N <- X[3]
N*(min(c(y,2))/2)^1/3*x^1.9*exp(-1414/(x+Ta+273))*(1+c*(x/20)^2.1*(2/min(y,2))^1/3)
}
S <- vector("numeric",length(t)) #creation of F(X1) F(X2) and F(X3)
for (i in 1:length(t)) {
S[i]=F(c(Delta_T[i],tetha[i],N[i]))
}
Eq <- function(X){ #creation of F(X)-F(X1)-F(X2)-F(X3)
F(X)-sum(S)}
p <- c(min(Delta_T),min(tetha),min(N))
Sol = nlm(Eq,p)
EDIT: I found a fix for this problem. Instead of writing
Eq <- function(X){ #creation of F(X)-F(X1)-F(X2)-F(X3)
F(X)-sum(S)}
I applied abs() inside Eq:
Eq <- function(X){ #creation of F(X)-F(X1)-F(X2)-F(X3)
abs(F(X)-sum(S)) }
The results are still not satisfying, though: the error is close to 0, but X[2] ends up much larger than 2, because the min(y,2) terms in F stop responding to X[2] once it exceeds 2.
I found the solution to this problem: instead of using nlm to solve my nonlinear equation, I used the function auglag from the package nloptr.
library(nloptr)
Eq <- function(X){ #creation of F(X)-F(X1)-F(X2)-F(X3)
F(X)-sum(S)}
p <- c(min(Delta_T),min(tetha),min(N))
Sol = auglag(p,Eq,hin = Eq)
auglag is a very strong nonlinear optimization algorithm, and the results are very satisfying: I get an error of 10e-7.
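Some background on the original symptom: nlm minimizes its objective rather than solving for a root, so when it is handed F(X)-sum(S) directly it simply keeps lowering that value if the function is unbounded below. A common alternative to abs() is to minimize the squared residual, which is smooth at the root. The sketch below shows this on a toy function g that stands in for F; g and target are hypothetical and unrelated to the model above.

```r
# Hypothetical smooth function standing in for F; not the model from the question
g <- function(X) X[1]^2 + X[2]^2
target <- 4

# The squared residual is never negative and is zero exactly at the roots,
# so a minimizer has something finite to converge to
sq_resid <- function(X) (g(X) - target)^2

sol <- nlm(sq_resid, p = c(1, 1))
sol$estimate
g(sol$estimate) - target  # residual, should be close to 0
```

This does not address the min(y,2) issue: where the objective is flat in one coordinate, an unconstrained minimizer has no reason to keep that coordinate in range, which is where bounds or constraints come in.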
I'm working on a problem where a parameter is estimated through minimizing the sum of squares. The equations needed are:
I used optim in the package stats:
# provide the values for a test dataset (the y estimated should be 1.41)
pvector <- c(0.0036,0.0156,0.0204,0.0325,0.1096,0.1446,0.1843,0.4518)
zobs <- c(0.0971,0.0914,0.1629,0.1623,0.3840,0.5155,0.3648,0.6639)
# make input of the C value
c <- function(y){
gamma(y)/((gamma(y*(1-pvector)))*(gamma(y*pvector)))
}
# make input of the gamma function
F1 <- function(y){
f1 <- function(x){
c*(1-x)^(y*(1-pvector)-1)*x^(y*pvector-1)
}
return (f1)
}
# integration over x
int <- function(y){
integrate (F1(y),lower =0.001, upper =1)
}
# write the function for minimization
f2 <- function(y) {
sum ((int-zobs)^2)
}
# minimization
optim(0.01,f2, method = "Brent", lower =0, upper = 1000, hessian=TRUE)
This didn't work; I received the following error message:
Error in int - zobs : non-numeric argument to binary operator
I think there must be something fundamentally wrong with the way the function was written.
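The immediate cause of the message is that int is a function that is never called, so int - zobs tries to subtract numbers from a function object; c has the same problem inside f1. Below is a hedged sketch of one way to restructure the code. The equations did not survive in this post, so the sketch follows the code only: it assumes the integrand is the Beta(y*p, y*(1-p)) density, in which case the integral from 0.001 to 1 is an upper-tail Beta probability and pbeta can stand in for integrate. The names zpred and ss and the search interval are assumptions, and I have not verified that this reproduces the 1.41 mentioned in the code comment.

```r
pvector <- c(0.0036, 0.0156, 0.0204, 0.0325, 0.1096, 0.1446, 0.1843, 0.4518)
zobs    <- c(0.0971, 0.0914, 0.1629, 0.1623, 0.3840, 0.5155, 0.3648, 0.6639)

# Predicted value for every element of pvector at a given y.
# gamma(y) / (gamma(y*(1-p)) * gamma(y*p)) * x^(y*p - 1) * (1-x)^(y*(1-p) - 1)
# is the Beta(y*p, y*(1-p)) density, so its integral from 0.001 to 1
# is an upper-tail Beta probability.
zpred <- function(y) {
  pbeta(0.001, shape1 = y * pvector, shape2 = y * (1 - pvector),
        lower.tail = FALSE)
}

# Sum of squares: zpred is *called* here, unlike int in the question
ss <- function(y) sum((zpred(y) - zobs)^2)

# The search interval is an assumption; a lower bound above 0 avoids gamma(0)
fit <- optimize(ss, interval = c(0.01, 100))
fit$minimum
```

optimize is the one-dimensional minimizer that optim(method = "Brent") relies on, so it is used directly here.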
I am trying to construct a new variable, z, using two pre-existing variables - x and y. Suppose for simplicity that there are only 5 observations (corresponding to 5 time periods) and that x=c(5,7,9,10,14) and y=c(0,2,1,2,3). I’m really only using the first observation in x as the initial value, and then constructing the new variable z using depreciated values of x[1] (depreciation rate of 0.05 per annum) and each of the observations over time in the vector, y. The variable I am constructing takes the form of a new 5 by 1 vector, z, and it can be obtained using the following simple commands in R:
z=NULL
for(i in 1:length(x)){n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))}
The problem I am having is that I need to define this operation as a function. That is, I need to create a function f that will spit out the vector z whenever any arbitrary vectors x and y are plugged into the function, f(x,y). I’ve been going around in circles for days now and I was wondering if someone would be kind enough to provide me with a suggestion about how to proceed. Thanks in advance.
I hope the following will work for you:
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getZ = function(x,y){
  z = NULL
  for(i in 1:length(x)){
    n=seq(1,i,by=1)
    z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))
  }
  return(z)   # return() is a function call; "return = z" only assigned to a variable
}
z = getZ(x,y)
z
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
This version lets you pass .05 (or any other depreciation rate) in as r.
ConstructZ <- function(x, y, r){
  d <- 1 - r                 # fraction retained each period
  Z <- numeric(length(x))    # numeric result, one entry per period
  for(i in seq_along(x)){
    k <- seq_len(i)          # periods 1..i
    Z[i] <- sum(c(d^(i-1)*x[1], d^(i-k)*y[k]))
  }
  return(Z)
}
Here is a cool (if I say so myself) way to implement this as an infix operator (since you called it an operation).
ff = function (x, y, i) {
n = seq.int(i)
sum(c(0.95 ^ (i - 1) * x[[1]],
0.95 ^ (i - n) * y[n]))
}
`%dep%` = function (x, y) sapply(seq_along(x), ff, x=x, y=y)
x %dep% y
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
Doing the loop multiple times and recalculating the exponents every time may be inefficient. Here's another way to implement your calculation
getval <- function(x,y,lambda=.95) {
  n <- length(y)
  pp <- lambda^(1:n-1)
  yy <- sapply(1:n, function(i) {
    sum(y * c(pp[i:1], rep.int(0, n-i)))
  })
  pp*x[1] + yy
}
Testing with @vrajs5's sample data
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getval(x,y)
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
This gives the same result, and it appears to be about 10x faster when testing on larger data such as
set.seed(15)
x <- rpois(200,20)
y <- rpois(200,20)
I'm not sure of how often you will run this or on what size of data so perhaps efficiency isn't a concern for you. I guess readability is often more important long-term for maintenance.
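If efficiency does turn out to matter, the same sum also satisfies a one-step recurrence, z[i] = lambda * z[i-1] + y[i] with z[1] = x[1] + y[1], so each element can be built from the previous one without recomputing any powers. A minimal sketch, with the hypothetical name getval_rec:

```r
# Hypothetical helper: same z as above via z[i] = lambda * z[i-1] + y[i]
getval_rec <- function(x, y, lambda = 0.95) {
  n <- length(y)
  z <- numeric(n)
  z[1] <- x[1] + y[1]
  for (i in seq_len(n)[-1]) {
    z[i] <- lambda * z[i - 1] + y[i]
  }
  z
}

x <- c(5, 7, 9, 10, 14)
y <- c(0, 2, 1, 2, 3)
getval_rec(x, y)
# [1]  5.000000  6.750000  7.412500  9.041875 11.589781
```

The loop does a constant amount of work per element, and it arguably stays close to how the depreciation would be described in words: last period's value shrinks by the rate, then this period's y is added.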