What are the correct Conditions to test for Weighted Die? - r

I was given the following question. Write a function, named pdice, to simulate a weighted die that has probability of landing on 1:6 of (p1,p2,p3,p4,p5,p6) respectively. You can simulate this by using the parameter prob = p in the sample function, where p is a vector of non-negative numbers of length 6 (at least one needs to be >0). It will use p/sum(p) as the probabilities. This function has arguments p, n. Check ?sample to find out what conditions you need to check for p. Your function should generate an error message if these conditions are not met. Below is my code so far. It runs, but gives a warning about coercing a double to logical.
pdice <- function(n, p){
weightofDies <- c(1/40, rep(4/40,4), 23/40)
roll <- sample(1:6, size = n, replace = TRUE, prob = weightofDies)
if(n>0 && all(p=1)) {
return(roll)
}
else {
print("Error, Conditions Not Met")
}
}
I'm confused when the question says to use the parameter prob = p and then defines p as a vector of non-negative numbers of length 6. How can a probability be defined as a vector? When it came to the conditions, my limited understanding was to check that the number of rolls (n) is greater than zero and that the probabilities in this "vector" add up to one. However, I'm not sure my approach so far is correct. I created my biased probabilities via weightofDies.
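A minimal sketch of one way to structure the checks, not an official solution. In sample, prob is a vector holding one weight per element being sampled (here, one per face of the die); per ?sample the weights need not sum to one, but they should be non-negative and not all zero. The exact checks and error messages below are assumptions made for illustration.

```r
# Sketch of pdice(): validate the arguments first, then sample with weights p.
# The specific checks and messages are illustrative assumptions.
pdice <- function(n, p) {
  if (!is.numeric(n) || length(n) != 1 || is.na(n) || n < 1 || n != round(n)) {
    stop("n must be a single positive whole number")
  }
  if (!is.numeric(p) || length(p) != 6) {
    stop("p must be a numeric vector of length 6")
  }
  if (any(!is.finite(p)) || any(p < 0)) {
    stop("p must contain only finite, non-negative values")
  }
  if (sum(p) == 0) {
    stop("at least one element of p must be greater than 0")
  }
  # sample() rescales the weights internally, so p need not sum to one
  sample(1:6, size = n, replace = TRUE, prob = p)
}

rolls <- pdice(1000, c(1, 4, 4, 4, 4, 23))
table(rolls)
```

Using stop() rather than print() means the caller gets a real error, and checking before sampling avoids calling sample with invalid weights.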

Related

add second independent variable to ode in R

I am using a pharmacokinetic model to fit concentrations to different exposure times.
Additionally to the independent time variable, I would like to add a vector with data for the variable cw, so at t=1 I have the value cw=41 and at t=2 the value cw=17 and so on.
How do I modify the ode, so this actually works? I can only get it to run with one value for cw.
library(deSolve)
fun<-function(time, y, parms, ...){
  # parms and y are combined so k1..k4, cp and cs can be used by name
  with(as.list(c(parms, y)), {
    dCp<-(k1*cw-k2*cp-k3*cp+k4*cs)
    dCs<-(k3*cp-k4*cs)
    list(c(dCp, dCs))
  })
}
cw<-100
y0<-c(cp=0,
      cs=0)
times<-seq(0:56)  # gives 1, 2, ..., 57
parms<-c(k1=200, k2=0.1, k3=1, k4=0.1)
ode(y0, times, fun, parms)
The code works the way it is written above, now I would like to exchange cw with a vector
set.seed(42)
cw<-.Random.seed[1:57]
At the moment I get the following error, if cw is a vector instead of a number:
The number of derivatives returned by func() (58) must equal the length of the initial conditions vector (2)
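One common way to handle a time-varying input like this is to turn the data into an interpolating function and look up cw inside the derivative function, so that only a single cw value is used at each time step. The sketch below assumes linear interpolation via approxfun is acceptable for the exposure data; the cw values after the first two are made up, and cw_at is a hypothetical helper name.

```r
# Hypothetical exposure data: one cw value per time point. The first two
# values come from the question text, the rest are made up.
times <- 1:5
cw_values <- c(41, 17, 30, 25, 10)

# approxfun() returns a function that interpolates cw at any time;
# rule = 2 holds the end values constant outside the observed range.
cw_at <- approxfun(x = times, y = cw_values, rule = 2)

fun <- function(time, y, parms, ...) {
  with(as.list(c(parms, y)), {
    cw <- cw_at(time)                       # scalar cw for this time
    dCp <- k1 * cw - k2 * cp - k3 * cp + k4 * cs
    dCs <- k3 * cp - k4 * cs
    list(c(dCp, dCs))                       # exactly two derivatives
  })
}

y0 <- c(cp = 0, cs = 0)
parms <- c(k1 = 200, k2 = 0.1, k3 = 1, k4 = 0.1)

# Only run the solver if deSolve is installed.
if (requireNamespace("deSolve", quietly = TRUE)) {
  out <- deSolve::ode(y = y0, times = times, func = fun, parms = parms)
  print(head(out))
}
```

Because cw_at(time) always returns one number, the derivative function returns two values regardless of how long the cw data vector is, which is what the error message is asking for.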

Optimize within for loop cannot find function

I've got a function, KozakTaper, that returns the diameter of a tree trunk at a given height (DHT). There's no algebraic way to rearrange the original taper equation to return DHT at a given diameter (4 inches, for my purposes)...enter R! (using 3.4.3 on Windows 10)
My approach was to use a for loop to iterate likely values of DHT (25-100% of total tree height, HT), and then use optimize to choose the one that returns a diameter closest to 4". Too bad I get the error message Error in f(arg, ...) : could not find function "f".
Here's a shortened definition of KozakTaper along with my best attempt so far.
KozakTaper=function(Bark,SPP,DHT,DBH,HT,Planted){
if(Bark=='ob' & SPP=='AB'){
a0_tap=1.0693567631
a1_tap=0.9975021951
a2_tap=-0.01282775
b1_tap=0.3921013594
b2_tap=-1.054622304
b3_tap=0.7758393514
b4_tap=4.1034897617
b5_tap=0.1185960455
b6_tap=-1.080697381
b7_tap=0}
else if(Bark=='ob' & SPP=='RS'){
a0_tap=0.8758
a1_tap=0.992
a2_tap=0.0633
b1_tap=0.4128
b2_tap=-0.6877
b3_tap=0.4413
b4_tap=1.1818
b5_tap=0.1131
b6_tap=-0.4356
b7_tap=0.1042}
else{
a0_tap=1.1263776728
a1_tap=0.9485083275
a2_tap=0.0371321602
b1_tap=0.7662525552
b2_tap=-0.028147685
b3_tap=0.2334044323
b4_tap=4.8569609081
b5_tap=0.0753180483
b6_tap=-0.205052535
b7_tap=0}
p = 1.3/HT
z = DHT/HT
Xi = (1 - z^(1/3))/(1 - p^(1/3))
Qi = 1 - z^(1/3)
y = (a0_tap * (DBH^a1_tap) * (HT^a2_tap)) * Xi^(b1_tap * z^4 + b2_tap * (exp(-DBH/HT)) +
b3_tap * Xi^0.1 + b4_tap * (1/DBH) + b5_tap * HT^Qi + b6_tap * Xi + b7_tap*Planted)
return(y=round(y,4))}
HT <- .3048*85 #converting from english to metric (sorry, it's forestry)
for (i in c((HT*.25):(HT+1))) {
d <- KozakTaper(Bark='ob',SPP='RS',DHT=i,DBH=2.54*19,HT=.3048*85,Planted=0)
frame <- na.omit(d)
optimize(f=abs(10.16-d), interval=frame, lower=1, upper=90,
maximum = FALSE,
tol = .Machine$double.eps^0.25)
}
Eventually I would like this code to iterate through a csv and return i for the best d, which will require some rearranging, but I figured I should make it work for one tree first.
When I print d I get multiple values, so it is iterating through i, but it gets held up at the optimize function.
Defining frame was my most recent tactic, because d returns one NaN at the end, but it may not be the best input for interval. I've tried interval=c((HT*.25):(HT+1)), defining KozakTaper within the for loop, and defining f prior to the optimize, but I get the same error. Suggestions for what part I should target (or other approaches) are appreciated!
-KB
Forestry Research Fellow, Appalachian Mountain Club.
MS, University of Maine
Edit with a follow-up question:
I'm now trying to run this script for each row of a csv, "Input." The row contains the values for KozakTaper, and I've called them with this:
Input=read.csv...
Input$Opt=0
o <- optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='Input$Species',
DHT=x,
DBH=(2.54*Input$DBH),
HT=(.3048*Input$Ht),
Planted=0)),
lower=Input$Ht*.25, upper=Input$Ht+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
Input$Opt <- o$minimum
Input$Mht <- Input$Opt/.3048 # converting back to English
Input$Ht and Input$DBH are numeric; Input$Species is factor.
However, I get the error invalid function value in 'optimize'. I get it whether I define "o" or just run optimize. Oddly, when I don't call values from the row but instead use the code from the answer, it tells me object 'HT' not found. I have the awful feeling this is due to some obvious/careless error on my part, but I'm not finding posts about this error with optimize. If you notice what I've done wrong, your explanation will be appreciated!
I'm not an expert on optimize, but I see three issues: 1) your call to KozakTaper does not iterate through the range you specify in the loop; 2) KozakTaper returns a single number, not a vector; 3) you haven't given optimize a function, only an expression.
So what is happening is that you are not giving optimize anything to iterate over.
All you should need is this:
optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='RS',
DHT=x,
DBH=2.54*19,
HT=.3048*85,
Planted=0)),
lower=HT*.25, upper=HT+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
$minimum
[1] 22.67713 ##Hopefully this is the right answer
$objective
[1] 0
optimize will now try values of x between lower and upper, searching for the one that minimizes the difference.
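For the follow-up about running this over every row of a data frame: optimize expects f to return a single number, so passing whole columns such as Input$DBH makes the objective return a vector. Note also that 'Input$Species' in quotes is a literal string, not the column. A rough sketch of calling optimize once per row is below. To stay self-contained it uses toy_taper, a made-up linear stand-in for KozakTaper, and a made-up two-row Input; find_height is a hypothetical helper name.

```r
# Hypothetical stand-in for KozakTaper(): diameter shrinks linearly with height.
toy_taper <- function(DHT, DBH, HT) DBH * (1 - DHT / HT)

# Hypothetical input, in English units as in the question.
Input <- data.frame(DBH = c(19, 12), Ht = c(85, 60))

# Find the height (m) at which the diameter is 10.16 cm, for ONE tree.
find_height <- function(dbh_in, ht_ft) {
  DBH <- 2.54 * dbh_in      # inches -> cm
  HT <- 0.3048 * ht_ft      # feet -> m
  o <- optimize(f = function(x) abs(10.16 - toy_taper(DHT = x, DBH = DBH, HT = HT)),
                lower = HT * 0.25, upper = HT)
  o$minimum
}

# mapply() calls find_height() once per row, with scalar arguments each time.
Input$Opt <- mapply(find_height, Input$DBH, Input$Ht)   # metres
Input$Mht <- Input$Opt / 0.3048                         # back to feet
Input
```

The bounds here are computed from the metric height, so lower and upper are in the same units as DHT.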

Inverse of matrix and numerical integration in R

in R I try to
1) get a general form of an inverse of a matrix (I mean a matrix with parameters instead of specific numbers),
2) then use this to compute an integral.
I mean, I've got a P matrix with a parameter theta, I need to add and subtract something, then take an inverse of this and multiply it by a vector so that I am given a vector pil. From the vector pil I take term by term and multiply it by a function with again the parameter theta and the result must be integrated from 0 to infinity.
I tried this, but it didn't work because I know the result should be pst=
(0.3021034 0.0645126 0.6333840)
c<-0.1
g<-0.15
integrand1 <- function(theta) {
pil1 <- function(theta) {
P<-matrix(c(
1-exp(-theta), 1-exp(-theta),1-exp(-theta),exp(-theta),0,0,0,exp(-theta),exp(-theta)
),3,3);
pil<-(rep(1,3))%*%solve(diag(1,3)-P+matrix(1,3,3));
return(pil[[1]])
}
q<-pil1(theta)*(c^g/gamma(g)*theta^(g-1)*exp(-c*theta))
return(q)}
(pst1<-integrate(integrand1, lower = 0, upper = Inf)$value)
#0.4144018
This was just for the first term of the vector pst, because I didn't know how to write a for loop for this.
Please, do you have any idea why it won't work and how to make it work?
Functions used in integrate should be vectorized as stated in the help.
At the end of your code add this
integrand2 <- Vectorize(integrand1)
integrate(integrand2, lower = 0, upper = Inf)$value
#[1] 0.3021034
The result is the first element of your expected result.
You will have to present more information about the input to get your expected vector.
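A possible way to extend this to the other components is to pass the component index as an extra argument and evaluate the integrand one theta at a time. This is only a sketch: pil_k and integrand_k are hypothetical names, c is renamed to c_rate to avoid masking base::c, and the third component is taken as one minus the other two, on the assumption that pil sums to one for every theta and that the gamma weight integrates to one (its integrand grows like theta^(g-1) near zero, which may be harder for integrate to handle directly).

```r
c_rate <- 0.1   # "c" in the question, renamed to avoid masking base::c
g <- 0.15

# k-th component of pil for a single theta.
pil_k <- function(theta, k) {
  e <- exp(-theta)
  P <- matrix(c(1 - e, 1 - e, 1 - e, e, 0, 0, 0, e, e), 3, 3)
  pil <- rep(1, 3) %*% solve(diag(1, 3) - P + matrix(1, 3, 3))
  pil[k]
}

# Integrand for component k; vapply() evaluates it one theta at a time,
# which is what integrate() needs when it passes a vector of thetas.
integrand_k <- function(theta, k) {
  vapply(theta, function(th) {
    pil_k(th, k) * c_rate^g / gamma(g) * th^(g - 1) * exp(-c_rate * th)
  }, numeric(1))
}

pst <- numeric(3)
for (k in 1:2) {
  pst[k] <- integrate(integrand_k, lower = 0, upper = Inf, k = k)$value
}
pst[3] <- 1 - sum(pst[1:2])   # assumption: the three components sum to one
pst
```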

Optimizing from a function and a matrix containing -infinities. Using R, optim()

(If anyone has a suggestion for a better title, please let me know.)
I am trying to write a backward induction optimization problem. (That might not be important, but if it helps, great.)
I have a function that is a function of two variables, x and y.
I have a matrix for which I know only the terminal column, and each column needs to be solved backwards using the last column and optimization over x and y.
For example
m.state=matrix(1:16,16,1)
m.valuemat=matrix(0,16,5)
# five is number of periods
#16 is num of states (rows)
##Suppose I want to make optim avoid choosing a configuration that lands us in states 1-5 at the end
m.valuemat[1:5,5]=-Inf
f.foo0=function(x,y){
util=2*x^2-y^1.5
return(util)
}
foo=function(x,y,a){
footomorrow=function(x,y,a){
at1=-x+2*y+a
atround=abs(m.state-at1)
round2=m.state[which(min(atround)==atround)]
at1=round2
Vtp1=m.valuemat[which(m.state==at1),(5+1)]
return(Vtp1)
}
valuetoday=f.foo0(x,y)+.9*footomorrow(x,y,a)
return(valuetoday)
}
# I know the final column should be all 0's
for(i in 1:4){
print(i)
i=5-i
for(j in 1:16){
tempfunction=function(x){
foo(x[1],x[2],m.state[j])
}
result=optim(c(.001,1), tempfunction, gr = NULL, method = "L-BFGS-B",
lower = c(0.001,0.001), upper = c(5,1),
control = list(fnscale=-1,
maxit=50000), hessian = FALSE)
m.valuemat[j,i]=result$value
print( m.valuemat)
}
}
The error you get is: Error in optim(c(0.001, 1), f.Vt.ext, gr = NULL, method = "L-BFGS-B", :
L-BFGS-B needs finite values of 'fn'.
Is there a way to make optim smarter about this? Or a condition I can put or something? This is obviously a simplified version of my real code.
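One workaround that is sometimes used is to wrap the objective so that non-finite values are replaced by a large finite penalty before optim sees them. The sketch below uses a made-up toy objective (raw_value) and a hypothetical wrapper (make_finite); the penalty size is an arbitrary assumption. A penalty like this makes the objective discontinuous, so a gradient-based method may still behave poorly near the forbidden region.

```r
# Toy objective to maximize; returns -Inf in a forbidden region (hypothetical).
raw_value <- function(x) {
  if (x[1] > 4.5) return(-Inf)
  -(x[1] - 2)^2 - (x[2] - 0.5)^2
}

# Wrapper that swaps non-finite values for a large finite penalty.
make_finite <- function(fn, penalty = -1e10) {
  function(x) {
    val <- fn(x)
    if (is.finite(val)) val else penalty
  }
}

# fnscale = -1 turns optim() into a maximizer, as in the question.
result <- optim(c(1, 0.5), make_finite(raw_value), method = "L-BFGS-B",
                lower = c(0.001, 0.001), upper = c(5, 1),
                control = list(fnscale = -1))
result$par
```

Where the forbidden states can be expressed as limits on x and y, tightening lower and upper is an alternative that avoids the penalty altogether.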

cost function in cv.glm of boot library in R

I am trying to use the crossvalidation cv.glm function from the boot library in R to determine the number of misclassifications when a glm logistic regression is applied.
The function has the following signature:
cv.glm(data, glmfit, cost, K)
with the first two denoting the data and model and K specifies the k-fold.
My problem is the cost parameter which is defined as:
cost: A function of two vector arguments specifying the cost function
for the crossvalidation. The first argument to cost should correspond
to the observed responses and the second argument should correspond to
the predicted or fitted responses from the generalized linear model.
cost must return a non-negative scalar value. The default is the
average squared error function.
I guess for classification it would make sense to have a function which returns the rate of misclassification something like:
nrow(subset(data, (predict >= 0.5 & data$response == "no") |
(predict < 0.5 & data$response == "yes")))
which is of course not even syntactically correct.
Unfortunately, my limited R knowledge has made me waste hours on this, and I was wondering if someone could point me in the right direction.
It sounds like you might do well to just use the cost function (i.e. the one named cost) defined further down in the "Examples" section of ?cv.glm. Quoting from that section:
# [...] Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
This does essentially what you were trying to do with your example. Replacing your "no" and "yes" with 0 and 1, let's say you have two vectors, predict and response. Then cost() is nicely designed to take them and return the mean misclassification rate:
## Simulate some reasonable data
set.seed(1)
predict <- seq(0.1, 0.9, by=0.1)
response <- rbinom(n=length(predict), prob=predict, size=1)
response
# [1] 0 0 0 1 0 0 0 1 1
## Demonstrate the function 'cost()' in action
cost(response, predict)
# [1] 0.3333333 ## Which is right, as 3/9 elements (4, 6, & 7) are misclassified
## (assuming you use 0.5 as the cutoff for your predictions).
I'm guessing the trickiest bit of this will be just getting your mind fully wrapped around the idea of passing a function in as an argument. (At least that was for me, for the longest time, the hardest part of using the boot package, which requires that move in a fair number of places.)
Added on 2016-03-22:
The function cost(), given above is in my opinion unnecessarily obfuscated; the following alternative does exactly the same thing but in a more expressive way:
cost <- function(r, pi = 0) {
mean((pi < 0.5) & r==1 | (pi > 0.5) & r==0)
}
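As a quick check that the two definitions agree, here is a small sketch that applies both to the response and prediction values shown above, typed in by hand rather than simulated. The names cost_abs and cost_explicit are just labels for the two versions.

```r
# The two versions of the cost function from this answer, under new names.
cost_abs <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
cost_explicit <- function(r, pi = 0) {
  mean((pi < 0.5) & r == 1 | (pi > 0.5) & r == 0)
}

# Values copied from the simulated example above.
predicted <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
observed  <- c(0, 0, 0, 1, 0, 0, 0, 1, 1)

cost_abs(observed, predicted)        # elements 4, 6 and 7 are misclassified
cost_explicit(observed, predicted)   # same proportion
```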
I will try to explain the cost function in simple words. Let's take
cv.glm(data, glmfit, cost, K) arguments step by step:
data
The data consist of many observations. Think of it as a series of numbers.
glmfit
It is a generalized linear model fitted to the data above. The catch is that cv.glm splits the data into K parts. Each part in turn is held out as the test set, the model is refitted on the remaining parts (the training set), and predictions are made for the held-out part. The predictions have the same number of elements as the held-out part.
cost
Cost function. It takes two arguments: first the observed responses for the test set, and second the model's predictions for that test set. The default is the mean squared error, i.e. mean((y - yhat)^2).
It takes the difference between each observed data point and its predicted data point, squares it, and averages the squares over the test set (observed and predicted must have the same number of elements).
K
The number of groups into which the data are split. The default (K equal to the number of observations) gives leave-one-out cross-validation.
Judging from your cost function description, your input (x) would be a set of numbers between 0 and 1 (0-0.5 = no and 0.5-1 = yes) and your output (y) is 'yes' or 'no'. So the error (e) between prediction (x) and observation (y) would be:
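# x: predicted probabilities, y: observed labels ('yes' or 'no').
# Note that cv.glm() calls cost(observed, predicted), in that order.
cost <- function(x, y) {
  total <- 0                          # running sum of squared errors
  for (i in seq_along(x)) {
    if (x[i] > 0.5) {
      e <- if (y[i] == 'yes') 0 else x[i] - 0.5
    } else {
      e <- if (y[i] == 'no') 0 else 0.5 - x[i]
    }
    total <- total + e * e            # accumulate the squared error
  }
  total / length(x)                   # mean squared error
}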
Sources : http://www.cs.cmu.edu/~schneide/tut5/node42.html
The cost function can optionally be defined if there is one you prefer over the default average squared error. If you want to do so, you would write a function that returns the cost you are interested in, using two inputs: (1) the vector of known labels that you are predicting, and (2) the vector of predicted probabilities from your model for those corresponding labels. So for the cost function that (I think) you described in your post, you are looking for a function that returns the proportion of accurate classifications, which would look something like this:
cost <- function(labels,pred){
mean(labels==ifelse(pred > 0.5, 1, 0))
}
With that function defined you can then pass it into your cv.glm() call, although I wouldn't recommend using your own cost function over the default one unless you have reason to. Your example isn't reproducible, so here is another example:
> library(boot)
>
> cost <- function(labels,pred){
+ mean(labels==ifelse(pred > 0.5, 1, 0))
+ }
>
> #make model
> nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
> #run cv with your cost function
> (nodal.glm.err <- cv.glm(nodal, nodal.glm, cost, nrow(nodal)))
$call
cv.glm(data = nodal, glmfit = nodal.glm, cost = cost, K = nrow(nodal))
$K
[1] 53
$delta
[1] 0.8113208 0.8113208
$seed
[1] 403 213 -2068233650 1849869992 -1836368725 -1035813431 1075589592 -782251898
...
The cost function defined in the example for cv.glm clearly assumes that the predictions are probabilities, which would require the type="response" argument in the predict function. The documentation from library(boot) should state this explicitly. I would otherwise be forced to assume that the default type="link" is used inside the cv.glm function, in which case the cost function would not work as intended.
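To see why the scale of the predictions matters for a 0.5 cutoff, here is a small sketch comparing the two prediction types on a model fitted to the built-in mtcars data (chosen only for illustration). It says nothing about what cv.glm does internally; it only shows that a cost function with a 0.5 threshold is meaningful on the response (probability) scale and not on the link scale.

```r
# Logistic regression on a built-in data set (illustrative choice).
fit <- glm(am ~ wt, family = binomial, data = mtcars)

link_scale <- predict(fit, type = "link")       # log-odds, unbounded
prob_scale <- predict(fit, type = "response")   # probabilities in [0, 1]

range(link_scale)
range(prob_scale)

# The two scales are related by the inverse logit.
all.equal(plogis(link_scale), prob_scale)

# The cost function from ?cv.glm, applied to probabilities.
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
cost(mtcars$am, prob_scale)
```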
