Random number generation in ggplot and Shiny

I am building a plot of Net Present Value (NPV), computed with the FinCal package, against its odds. For the NPV, the cash flows are simulated using a triangular distribution for sales, a normal distribution for costs, and so on. Here is a snippet of what I am doing:
npvCdf <- function(n) {
N <- sort(n)
P <- ecdf(N)
return(P)
}
makePlot <- function(C, m) {
N <- m$NPV / C$MILLION
P <- npvCdf(N)
#
# NPV distribution curve
n <- sort(N)
p <- P(n) * 100
df <- data.frame(npv = n, odds = p)
#
# Points of interest
o <- C$NPV_BREAK_EVEN_WORST_ODDS
q <- round((quantile(n, o)), C$DIGITS)
e <- C$NPV_BREAK_EVEN_VALUE
b <- P(e) * 100 # THIS IS THE ERROR I CANT FIGURE OUT
w <- o * 100
s <- getBreakEven(C, m)
#
# Labels
npvOdds <- paste("Odds of break-even : ", b, "%")
salesThresh <- paste("Sales threshold : ", s)
worstCase <-
paste("Worst case (@ 5% odds) : ", q, "million")
#
# Make plot
#
g <- ggplot(df, aes(x = npv, y = odds)) +
geom_line(colour = "blue") +
labs(title = "NPV and Odds") +
labs(x = "NPV (million)") +
labs(y = "Percent (%)") +
geom_vline(xintercept = e,
colour = "red",
linetype = "longdash") +
geom_hline(yintercept = b,
colour = "green",
linetype = "longdash") +
geom_vline(xintercept = q,
colour = "green",
linetype = "dotdash") +
geom_hline(yintercept = w,
colour = "red",
linetype = "dotdash")
C is a data frame of all the constants used in the cash-flow and NPV calculations. For example, C$MILLION = 1000000 is used to divide the NPV for simpler representation. m is a data frame of sales, cash flows and NPV per simulation. The simulations are used for cash flows (triangular distribution), variable cost (normal distribution) and so on.
And, here is the Shiny code that uses the above snippet.
library(shiny)
source("../npd-c.R")
# Define server logic
shinyServer(function(input, output) {
output$npdPlot <- renderPlot({
C <- data.frame(2017,5000,1000000,3,100,500000,0.0,0.05,0.1,
input$salesRange[1],
input$salesRange[2],
input$salesMode,
input$demDeclMean,
input$demDeclSd,
input$varCostMean,
input$varCostSd,
input$fixedCostRange[1],
input$fixedCostRange[2]
)
names(C) <-
c(
"SEED",
"ITERATIONS",
"MILLION",
"DIGITS",
"PRICE",
"OUTLAY",
"NPV_BREAK_EVEN_VALUE",
"NPV_BREAK_EVEN_WORST_ODDS",
"HURDLE_RATE",
"SALES_TRIANG_MIN",
"SALES_TRIANG_MAX",
"SALES_TRIANG_MODE",
"DEM_DECL_FACTOR_MEAN",
"DEM_DECL_FACTOR_SD",
"VAR_COST_RATE_MEAN",
"VAR_COST_RATE_SD",
"FIX_COST_RATE_MIN",
"FIX_COST_RATE_MAX"
)
n <- npd(C,-1)
g <- makePlot(C,n)
g
})
})
The problem is as follows.
When I run the code in plain R, the plot is right: the NPV curve and the horizontal and vertical lines are where they should be. When I run it as a Shiny application, the horizontal and vertical lines are hugely displaced. This happens even though I moved the NPV and cash-flow code into a separate .R file and set the same seed value for both the Shiny and non-Shiny versions. For example, P(0)=40.07 without Shiny and P(0)=4.7 with the Shiny application.
What am I missing?

First of all, let me say this is pretty useful code. It is a nice representation of a Monte Carlo simulation using NPV, and I like the plots. It is a post I am pretty sure I will refer back to.
I think I see where the problem is, though: it is basically a matter of misinterpretation plus one small programming error.
The stated problem is that these plots are not showing the same results although they should be. The blue ecdf-NPV curves do look at first glance to be the same:
Shiny version:
Stand alone version:
However, if you look carefully, you will see that they are in fact not the same: the expected NPV value (at 50 percent) is about 1.5 million in the first case, whereas it is only about 0.2 in the second case.
The curves look the same, but they are not. The other point is that an error in one of the calculations confuses things further. The "Odds of break-even" are calculated incorrectly: P(e) is the share of simulations with an NPV at or below e, so the value shown is actually the "Odds of losing money".
The correct calculation should be:
b <- (1-P(e)) * 100
And the correct odds of breaking even in the first case would be around 60%, and in the second case around 95%, which matches up with the expected NPV as well.
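To see why the complement is needed, here is a minimal sketch on made-up data. The rnorm() draws are a hypothetical stand-in for the simulated NPVs, not the output of npd(). ecdf() returns the share of values at or below its argument, so P(e) counts the simulations that fail to clear break-even:

```r
set.seed(2017)
# Hypothetical stand-in for the simulated NPVs, in millions
npv <- rnorm(5000, mean = 0.2, sd = 1)

P <- ecdf(npv)   # P(x) = share of simulations with NPV <= x
e <- 0           # break-even value

odds_loss       <- P(e) * 100        # what the original code labelled "break-even"
odds_break_even <- (1 - P(e)) * 100  # share of simulations with NPV > e

print(c(loss = odds_loss, break_even = odds_break_even))
```

The two percentages always add up to 100, which gives a quick sanity check for the plot labels.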

Related

Is there a way to test a range of exponents in a lm() model in the same way as the code below more efficiently?

The basic gist is that I have a set of housing data for which I need to create a model that minimizes the difference between the predicted and the actual house price. So I created this bit of code to test a range of different numerators and find the one that minimizes that difference. I'm using the median instead of the mean as the data isn't exactly normal.
Since I only have experience with lm(), I'm using that to create the coefficients and C values. But since the model likes exponents, I also have to test various exponents. The code does this for each of the variables and then goes back to the first and re-evaluates it based on the other exponents. The model starts out with all the exponents equal to 1, so it is the same as the basic linear model. I know that this is probably horribly inefficient and uses a lot of code in a somewhat wasteful way, but I'm in my first R class, so sorry about the mess and/or convoluted coding logic.
Is there any way to do the same thing more efficiently? Also, I can't really decrease the number of variables, as the model likes having more variables and produces a greater margin of error when they aren't present.
w <- seq(1,10000,1)
r <- seq(1,10000,1)
t <- seq(1,10000,1)
z <- seq(1,10000,1)
s <- seq(1,10000,1)
coef_1 <- c(6000,6000,6000,6000,6000,6000,6000,6000)
v <- rep(6000, each = 8)
for(l_1 in 1:10){
for(t_1 in 1:8){
for(i in 1:10000){
# Index with t_1 directly: assigning to t here would overwrite the quart_75 vector t
coef_1[t_1] = i
mod5 <- lm(log(SALE_PRC) ~ I(TOT_LVG_AREA^((coef_1[1]-5000)/1000)) + I(LND_SQFOOT^((coef_1[2]-5000)/1000)) + I(RAIL_DIST^((coef_1[3]-5000)/1000)) + I(OCEAN_DIST^((coef_1[4]-5000)/1000)) + I(CNTR_DIST^((coef_1[5]-5000)/1000)) + I(HWY_DIST^((coef_1[6]-5000)/1000)) + I(structure_quality^((coef_1[7]-5000)/1000)) + SUBCNTR_DI + SPEC_FEAT_VAL + (exp(((coef_1[8]-5000)/1000)*SPECIAL_RATIO)) + age, data = kaggle_transform_final)
kaggle_new <- kaggle_transform_final %>%
add_predictions(model = mod5, var = "prediction") %>%
mutate(new_predict = exp(prediction)) %>%
mutate(new_difference = abs((new_predict-SALE_PRC))/SALE_PRC) %>%
mutate(average_percent_difference = median(new_difference)) %>%
mutate(mean_percent_difference = mean(new_difference)) %>%
mutate(quart_75 = quantile(new_difference,.75))
w[i] = kaggle_new$average_percent_difference[1]
r[i] = kaggle_new$mean_percent_difference[1]
t[i] = kaggle_new$quart_75[1]
z[i] = i
s[i] = (i-5000)/1000
if(i%%100 ==0){show(i)}
}
u <- data.frame(median_diff = w, mean_diff = r, quart_75 = t, actual = s, number = z) %>%
arrange(median_diff)
coef_1[t_1] <- u$number[1]
v[t_1] <- u$actual[1]
show(coef_1)
}
coef_1 <- coef_1
}
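One possible way to tidy the search, sketched on synthetic data: wrap the fit-and-score step in a function and evaluate it over a grid with vapply(), rather than filling several result vectors by index. The data frame d, the columns x and y, the grid spacing and the helper median_rel_err() are all made up for illustration. Only a single exponent is searched here; the same helper could be reused inside a loop over variables.

```r
set.seed(1)
# Synthetic stand-in for the housing data (hypothetical columns x and y)
d <- data.frame(x = runif(200, 1, 10))
d$y <- exp(0.5 + 0.3 * d$x^1.5 + rnorm(200, sd = 0.05))

# Fit log(y) on x^p and return the median relative prediction error
median_rel_err <- function(p, data) {
  fit  <- lm(log(y) ~ I(x^p), data = data)
  pred <- exp(fitted(fit))
  median(abs(pred - data$y) / data$y)
}

# Evaluate every candidate exponent in one call
grid   <- seq(0.1, 3, by = 0.1)
errs   <- vapply(grid, median_rel_err, numeric(1), data = d)
best_p <- grid[which.min(errs)]

print(best_p)
```

A coarse grid like this, followed by a finer one around the best value, needs far fewer model fits than stepping through 10,000 candidates per variable.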

Heat diffusion on a ring with a temperature-activated flame using deSolve::ode in R

I'm trying to model a ring that is heated at one point if the temperature goes below a certain value. Here's my R code:
library(deSolve)
library(dplyr)
library(ggplot2)
library(tidyr)
local({
heatT <- 100
v <- c(rep(1, 49), heatT, rep(1, 50))
alpha <- .02
fun <- function(t, v, pars) {
L <- length(v)
d2T <- c(v[2:L], v[1]) + c(v[L], v[1:(L - 1)]) - 2 * v
dt <- pars * d2T
# Uncomment to trigger the problem
#if (v[50] < 25) dt[50] <- 100 - v[50]
return(list(dt - .005 * (v - 1)))
}
ode(v, 1:200, fun, parms = alpha)
}) %>% as.data.frame() %>%
pivot_longer(-time, values_to = "val", names_to = "x") %>%
filter(time %in% round(seq.int(1, 200, length.out = 40))) %>%
ggplot(aes(as.numeric(x), val)) +
geom_line(alpha = .5, show.legend = FALSE) +
geom_point(aes(color = val)) +
scale_color_gradient(low = "#56B1F7", high = "red") +
facet_wrap(~ time) +
theme_minimal() +
scale_y_continuous(limits = c(0, 100)) +
labs(x = 'x', y = 'T', color = 'T')
The line if (v[50] < 25) dt[50] <- 100 - v[50] tells the model to increase the temperature on segment 50 if it goes below 25°.
If this line is commented out, the model works fine. If the line is active, the model fails (asking to increase maxsteps) as soon as 25° is reached (it still outputs the results up to that point).
The model can run successfully if the solving method is switched to "ode45", but then it is very slow, or if it is switched to an explicit method like "euler", but then it works only as long as alpha is low enough.
Is there a correct way to implement this so that it runs fast with the default implicit methods, or is it simply something that ode cannot manage?
It seems that the if-line makes the model very stiff. This is not surprising, as ODEs are continuous and differentiable by definition. It is not uncommon for this to be violated in practical cases, but the solvers are, fortunately, quite robust. However, it is always possible to "drive the solvers against a wall", and that seems to be the case here. There are several possibilities in such cases: tune the tolerances, make the signal a little smoother by using a less rectangular signal with rounded edges, or change the grid. Sometimes a more robust solver will do. The default lsoda is fine for most applications, but in this case vode would be better. Replace the call to ode with the following line:
ode(v, 1:200, fun, parms = alpha, method = "vode")
and it should work without error. vode is another excellent solver of the Livermore ODEPACK family. Another approach is to use an external forcing or an event.
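The "rounded edges" suggestion could be sketched as follows. This is an illustration under assumptions, not a tested fix for the original model: smooth_switch() and its width parameter are hypothetical names, and whether a given width is smooth enough for lsoda would have to be checked by running it.

```r
# Smooth replacement for the hard switch `if (v[50] < 25)`.
# `width` (hypothetical tuning parameter) sets how gradual the transition is.
smooth_switch <- function(temp, threshold = 25, width = 0.5) {
  1 / (1 + exp((temp - threshold) / width))
}

fun_smooth <- function(t, v, pars) {
  L <- length(v)
  d2T <- c(v[2:L], v[1]) + c(v[L], v[1:(L - 1)]) - 2 * v
  dt <- pars * d2T
  # Blend between plain diffusion and the heating term instead of jumping
  s <- smooth_switch(v[50])
  dt[50] <- (1 - s) * dt[50] + s * (100 - v[50])
  list(dt - .005 * (v - 1))
}

# Only attempt the integration if deSolve is installed
if (requireNamespace("deSolve", quietly = TRUE)) {
  v0  <- c(rep(1, 49), 100, rep(1, 50))
  out <- deSolve::ode(v0, 1:200, fun_smooth, parms = 0.02)
}
```

Far above the threshold the switch is close to 0 and the derivative is the plain diffusion term; far below it the switch is close to 1 and the heating term takes over, as in the original if-line.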

Filter datapoints in a scatterplot using a linear equation

I am writing a function to filter datapoints out of my plot based on a linear equation.
I currently have the following function (with another function defined inside it):
MD_filter<- function(dataframe, mz_col){
#In-function MD calculation
MZ<- mz_col
MZR<- trunc(mz_col, digits = 0)#Either floor() or trunc() can be used for this part.
MD<- as.numeric(MZ-MZR)
dataframe<- dataframe%>%
dplyr::mutate(MD)%>%
dplyr::select(MD, everything())
#fit data to m/z defect maxima equation
f<- function(x){#This could be problem `1`, maybe resolved by leaving x....
y<-0.00112*x + 0.01953
return(y)}
fit<-as.data.frame(t(apply(dataframe,1,f)))# t() transforms df to matrix...?
filtered<-dataframe[which((dataframe$MD<= fit$MZ)),]
#keep rows in dataframe if MD is less than or equal to fitted value (mz after equation)
#As "fit" calculated the maximum MD value for each MZ value in the MZ column, we subset fit$MZ, as this contains the dataframe MZ values.
#The MD calculated at the very start, needs to be lower than the equivalent MZ value of the fitted dataframe.
filtered<-write.table(filtered,"feature_list_mz_defect_filtered.txt",sep="\t",col.names=NA)
#Now we have pre filter dataframe (dataframe) and post filter df (filtered)
#2 Different plots: (highlight to be removed as well, so we need a 3rd eventually)
MD_plot<- ggplot(dataframe, aes(x= MZ, y = MD)) +
geom_point() +#THE FOLLOWING PART DOES NOT WORK YET
ggtitle(paste("Unfiltered MD data - ", dataframe))
#stat_smooth(method="lm", se=FALSE)-> For linear line through the plot, but may not be necessary to show
return(MD_plot)#While I do get a plot, I have not yet gotten the equation. I could use my earlier approach maybe.
MD_plot_2<- ggplot(filtered, aes(x= MZ, y = MD)) +#Filtered is basically the second dataframe,
#which subsets datapoints with an Y value (which is the MD), below the linear equation MD...
geom_point() +#THE FOLLOWING PART DOES NOT WORK YET
ggtitle(paste("Filtered MD data - ", dataframe))
#stat_smooth(method="lm", se=FALSE) -> For linear line through the plot, but may not be necessary to show
return(MD_plot_2)
}
The function works as follows:
The argument inputs are a dataframe and a specific column inside that dataframe which I call the mz_col.
From this column a second column, the MD column, is generated.
From here on out I want to make two plots:
ggplot 1: A plot with the mz_col (MZ) values on the X axis and the MD values on the Y axis
ggplot 2: EXACTLY the same as ggplot 1, but I want to filter out the datapoints if MD exceeds the linear equation y<-0.00112*x + 0.01953 (as is visible in the code). This linear line is basically my maximum filter limit in the plot, everything above this I want gone.
I've tried many different solutions. I swapped the "x" argument with mz_col, among other things, and tried using plot() instead of ggplot. Currently I'm getting no plot, but I do get this:
Basically my question is: how do I fix my function so I can get my two plots? The first plot is no real issue, this already works, but the second plot just won't filter out datapoints based on my linear equation.
Thanks in advance! I'm quite new to SO and R, so I apologize if anything is unclear. Please let me know if any clarification is needed.
Reproducible sample data:
structure(list(mz = c(446.0394, 346.043, 199.0446, 199.0464, 97.057, 657.0595, 879.0606, 978.0631, 199.0684, 199.0707, 199.0724, 86.0748, 199.0761, 196.0789, 199.0819, 199.0852, 199.0878, 199.089, 908.0935, 147.0958, 199.0999,199.1299, 199.1322, 199.1384, 199.1398, 199.1434, 124.1475, 199.1513, 187.156, 199.1686, 199.1766, 199.1797, 199.1879, 199.1924, 187.1959, 479.1981, 169.1999, 109.2049, 399.2092, 299.2125, 159.2146, 199.2242, 356.2405, 69.2423, 956.4337, 978.5537, 199.5695, 676.5769, 199.5851, 500.6021, 260.6039, 270.6081, 200.6114, 200.6131, 200.6172, 200.6221,
200.6315, 200.6402, 200.6476, 200.766, 200.8591, 200.8732, 200.8768,
200.89, 200.8937, 200.8972, 200.9067, 200.9127, 200.9147, 200.9231,
200.9253, 200.9288, 200.9324, 200.935, 200.9468, 200.9515, 200.9536,
200.9557, 200.9568, 200.9594, 200.9661, 200.968, 200.9729, 200.9745,
200.9819, 200.9837, 200.9858, 200.9937)), row.names = c(NA, -88L), class = c("tbl_df", "tbl", "data.frame"))
I got a bit lost trying to follow your code, but based on your description, does the following work for you?
library(dplyr)
library(ggplot2)
MD_filter <- function(dataframe, mz_col, a = 0.01953, b = 0.00112){
# rename column so that rest of function doesn't depend on inputted column name
dataframe[["MZ"]] <- dataframe[[mz_col]]
# process dataframe
dataframe <- dataframe %>%
select(MZ) %>%
mutate(MD = MZ - trunc(MZ, digits = 0),
MD.limit = a + b*MZ)
p1 <- ggplot(dataframe,
aes(x = MZ, y = MD)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
ggtitle("Unfiltered MD data")
p2 <- p1 %+% filter(dataframe, MD <= MD.limit) +
expand_limits(y = range(dataframe[["MD"]])) + # optional (if you want same
# y-axis range for both plots)
ggtitle("Filtered MD data")
cowplot::plot_grid(p1, p2, nrow = 1)
}
Data & usage
dd <- structure(list(mz = c(
446.0394, 346.043, 199.0446, 199.0464, 97.057, 657.0595, 879.0606,
978.0631, 199.0684, 199.0707, 199.0724, 86.0748, 199.0761, 196.0789,
199.0819, 199.0852, 199.0878, 199.089, 908.0935, 147.0958, 199.0999,
199.1299, 199.1322, 199.1384, 199.1398, 199.1434, 124.1475, 199.1513,
187.156, 199.1686, 199.1766, 199.1797, 199.1879, 199.1924, 187.1959,
479.1981, 169.1999, 109.2049, 399.2092, 299.2125, 159.2146, 199.2242,
356.2405, 69.2423, 956.4337, 978.5537, 199.5695, 676.5769, 199.5851,
500.6021, 260.6039, 270.6081, 200.6114, 200.6131, 200.6172, 200.6221,
200.6315, 200.6402, 200.6476, 200.766, 200.8591, 200.8732, 200.8768,
200.89, 200.8937, 200.8972, 200.9067, 200.9127, 200.9147, 200.9231,
200.9253, 200.9288, 200.9324, 200.935, 200.9468, 200.9515, 200.9536,
200.9557, 200.9568, 200.9594, 200.9661, 200.968, 200.9729, 200.9745,
200.9819, 200.9837, 200.9858, 200.9937)),
row.names = c(NA, -88L),
class = c("tbl_df", "tbl", "data.frame"))
MD_filter(dd, "mz")
# MD_filter(dd, "mz", a = 0.02, b = 0.001) # if you want to change the limit
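The filtering rule itself can be checked in isolation with base R before any plotting. The sketch below uses five values picked from the sample data and the limit line from the question; the variable names md, limit and keep are chosen here for illustration.

```r
# Five m/z values taken from the sample data
mz <- c(446.0394, 199.0446, 97.057, 200.9937, 978.5537)

md    <- mz - trunc(mz)            # mass defect: fractional part of m/z
limit <- 0.00112 * mz + 0.01953    # maximum allowed MD at each m/z
keep  <- md <= limit               # TRUE for points on or below the line

print(data.frame(mz, md, limit, keep))
# keep is TRUE, TRUE, TRUE, FALSE, TRUE: only 200.9937 lies above the line
```

Because the limit is computed per row from the same mz values, no apply() call is needed; the comparison is fully vectorized.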

Find the y-coordinate at intersection of two curves when x is known

Background and Summary of Objective
I am trying to find the y-coordinate at the intersection of two plotted curves using R. I will provide complete details and sample data below, but in the hope that this is a simple problem, I'll be more concise up front.
The cumulative frequencies of two curves (c1 and c2 for simplicity) are defined by the following function, where a and b are known coefficients:
f(x)=1/(1+exp(-(a+bx)))
Using the uniroot() function, I found "x" at the intersection of c1 and c2.
I had assumed that if x is known then determining y should be simple substitution: for example, if x = 10, y=1/(1+exp(-(a+b*10))) (again, a and b are known values); however, as will be shown below, this is not the case.
The objective of this post is to determine how to find the y-coordinate.
Details
This data replicates respondents' stated price at which they find the product's price to be too.cheap (i.e., they question its quality) and the price at which they feel the product is a bargain.
The data will be cleaned before use to ensure that too.cheap is always less than the bargain price.
The cumulative frequency for the bargain price will be inverted to become not.bargain.
The intersection of bargain and too.cheap will represent the point at which an equal share of respondents feel the price is not a bargain and too.cheap --- the point of marginal cheapness ("pmc").
Getting to the point where I'm having a challenge will take a number of steps.
Step 1: Generate some data
# load libraries for all steps
library(car)
library(ggplot2)
# function that generates the data
so.create.test.dataset <- function(n, mean){
step.to.bargain <- round(rnorm(n = n, 3, sd = 0.75), 2)
price.too.cheap <- round(rnorm(n = n, mean = mean, sd = floor(mean * 100 / 4) / 100), 2)
price.bargain <- price.too.cheap + step.to.bargain
df.temp <- cbind(price.too.cheap,
price.bargain)
df.temp <- as.data.frame(df.temp)
return(df.temp)
}
# create 389 "observations" where the too.cheap has a mean value of 10.50
# the function will also create a "bargain" price by
#adding random values with a mean of 3.00 to the too.cheap price
so.test.df <- so.create.test.dataset(n = 389, mean = 10.50)
Step 2: Create a data frame of cumulative frequencies
so.get.count <- function(p.points, p.vector){
cc.temp <- as.data.frame(table(p.vector))
cc.merged <- merge(p.points, cc.temp, by.x = "price.point", by.y = "p.vector", all.x = T)
cc.extracted <- cc.merged[,"Freq"]
cc.extracted[is.na(cc.extracted)] <- 0
return(cc.extracted)
}
so.get.df.price<-function(df){
# creates cumulative frequencies for three variables
# using the price points provided by respondents
# extract and sort all unique price points
# Thanks to akrun for their help with this step
price.point <- sort(unique(unlist(round(df, 2))))
#create a new data frame to work with having a row for each price point
dfp <- as.data.frame(price.point)
# Create cumulative frequencies (as percentages) for each variable
dfp$too.cheap.share <- 1 - (cumsum(so.get.count(dfp, df$price.too.cheap)) / nrow(df))
dfp$bargain.share <- 1 - cumsum(so.get.count(dfp, df$price.bargain)) / nrow(df)
dfp$not.bargain.share <- 1 - dfp$bargain.share# bargain inverted so curves will intersect
return(dfp)
}
so.df.price <- so.get.df.price(so.test.df)
Step 3: Estimate the curves for the cumulative frequencies
# Too Cheap
so.l <- lm(logit(so.df.price$too.cheap.share, percents = TRUE)~so.df.price$price.point)
so.cof.TCh <- coef(so.l)
so.temp.nls <- nls(too.cheap.share ~ 1 / (1 + exp(-(a + b * price.point))), start = list(a = so.cof.TCh[1], b = so.cof.TCh[2]), data = so.df.price, trace = TRUE)
so.df.price$Pr.TCh <- predict(so.temp.nls, so.df.price$price.point, lwd=2)
#Not Bargain
so.l <- lm(logit(not.bargain.share, percents = TRUE) ~ price.point, so.df.price)
so.cof.NBr <- coef(so.l)
so.temp.nls <- nls(not.bargain.share ~ 1 / (1 + exp(-(a + b * price.point))), start = list(a = so.cof.NBr[1], b = so.cof.NBr[2]), data= so.df.price, trace=TRUE)
so.df.price$Pr.NBr <- predict(so.temp.nls, so.df.price$price.point, lwd=2)
# Thanks to John Fox & Sanford Weisberg - "An R Companion to Applied Regression, second edition"
At this point, we can plot and compare the "observed" cumulative frequencies against the estimated frequencies
ggplot(data = so.df.price, aes(x = price.point))+
geom_line(aes(y = so.df.price$Pr.TCh, colour = "Too Cheap"))+
geom_line(aes(y = so.df.price$Pr.NBr, colour = "Not Bargain"))+
geom_line(aes(y = so.df.price$too.cheap.share, colour = "too.cheap.share"))+
geom_line(aes(y = so.df.price$not.bargain.share, colour = "not.bargain.share"))+
scale_y_continuous(name = "Cumulative Frequency")
The estimate appears to fit the observations reasonably well.
Step 4: Find the intersection point for the two estimate functions
so.f <- function(x, a, b){
# model for the curves
1 / (1 + exp(-(a + b * x)))
}
# note, this function may also be used in step 3
#I was building as I went and I don't want to risk a transpositional error that breaks the example
so.pmc.x <- uniroot(function(x) so.f(x, so.cof.TCh[1], so.cof.TCh[2]) - so.f(x, so.cof.NBr[1], so.cof.NBr[2]), c(0, 50), tol = 0.01)$root
We may visually test the so.pmc.x by plotting it with the two estimates. If it is correct, a vertical line for so.pmc.x should pass through the intersection of too.cheap and not.bargain.
ggplot(data = so.df.price, aes(x = price.point)) +
geom_line(aes(y = so.df.price$Pr.TCh, colour = "Too Cheap")) +
geom_line(aes(y = so.df.price$Pr.NBr, colour = "Not Bargain")) +
scale_y_continuous(name = "Cumulative Frequency") +
geom_vline(aes(xintercept = so.pmc.x))
...which it does.
Step 5: Find y
Here is where I get stumped, and I'm sure I'm overlooking something very basic.
If a curve is defined by f(x) = 1/(1+exp(-(a+bx))), and a, b and x are all known, then shouldn't y be the result of 1/(1+exp(-(a+bx))) for either estimate?
In this instance, it is not.
# We attempt to use the too.cheap estimate to find y
so.pmc.y <- so.f(so.pmc.x, so.cof.TCh[1], so.cof.TCh[2])
# In theory, y for not.bargain at price.point so.pmc.x should be the same
so.pmc.y2 <- so.f(so.pmc.x, so.cof.NBr[1], so.cof.NBr[2])
EDIT: This is where the error occurs (see solution below).
a != so.cof.NBr[1] and b != so.cof.NBr[2]; instead, a and b should be defined as the coefficients from so.temp.nls (not so.l).
# Which they are
#> so.pmc.y
#(Intercept)
# 0.02830516
#> so.pmc.y2
#(Intercept)
# 0.0283046
If we calculate the correct value for y, a horizontal line at yintercept = so.pmc.y, should pass through the intersection of too.cheap and not.bargain.
...which it obviously does not.
So how does one estimate y?
I've solved this, and as I suspected, it was a simple error.
My assumption that y = 1/(1+exp(-(a+bx))) is correct.
The issue is that I was using the wrong a, b coefficients.
My curve was defined using the coefficients in so.cof.NBr as defined by so.l.
#Not Bargain
so.l <- lm(logit(not.bargain.share, percents = TRUE) ~ price.point, so.df.price)
so.cof.NBr <- coef(so.l)
so.temp.nls <- nls(not.bargain.share ~ 1 / (1 + exp(-(a + b * price.point))), start = list(a = so.cof.NBr[1], b = so.cof.NBr[2]), data= so.df.price, trace=TRUE)
so.df.price$Pr.NBr <- predict(so.temp.nls, so.df.price$price.point, lwd=2)
But the resulting curve is so.temp.nls, NOT so.l.
Therefore, once I find so.pmc.x I need to extract the correct coefficients from so.temp.nls and use those to find y.
# extract coefficients from so.temp.nls
so.co <- coef(so.temp.nls)
# find y
so.pmc.y <- 1 / (1 + exp(-(so.co[1] + so.co[2] * so.pmc.x)))
ggplot(data = so.df.price, aes(x = price.point))+
geom_line(aes(y = so.df.price$Pr.TCh, colour = "Too Cheap"))+
geom_line(aes(y = so.df.price$Pr.NBr, colour = "Not Bargain"))+
scale_y_continuous(name = "Cumulative Frequency")+
geom_hline(aes(yintercept = so.pmc.y))
Yielding the following...
which graphically depicts the correct answer.
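As a self-contained check of the same idea, the sketch below uses two made-up coefficient pairs (co1 and co2 are hypothetical, not the fitted values from the post). When the x found by uniroot() and the coefficients used to evaluate y come from the same pair of curves, both curves give the same y at the intersection:

```r
# Logistic curve used for both estimates
f <- function(x, a, b) 1 / (1 + exp(-(a + b * x)))

# Hypothetical coefficients: one falling curve, one rising curve
co1 <- c(a = 8,   b = -0.8)
co2 <- c(a = -10, b = 0.8)

# x at which the two curves cross
x_star <- uniroot(
  function(x) f(x, co1[["a"]], co1[["b"]]) - f(x, co2[["a"]], co2[["b"]]),
  interval = c(0, 50), tol = 1e-9
)$root

# y from either curve, using the SAME coefficients that defined the curves
y1 <- f(x_star, co1[["a"]], co1[["b"]])
y2 <- f(x_star, co2[["a"]], co2[["b"]])

print(c(x = x_star, y1 = y1, y2 = y2))
```

Here the curves cross where 8 - 0.8x = -10 + 0.8x, so x is 11.25 and y is 1 / (1 + exp(1)), about 0.269. Mixing coefficients from a different fit, as happened with so.l and so.temp.nls, breaks this agreement with the plotted curves.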

How to plot the probability density function of a function?

Assume A follows an exponential distribution and B follows a gamma distribution.
How can I plot the PDF of 0.5*(A+B)?
This is fairly straightforward using the "distr" package:
library(distr)
A <- Exp(rate=3)
B <- Gammad(shape=2, scale=3)
conv <- 0.5*(A+B)
plot(conv)
plot(conv, to.draw.arg=1)
Edit by JD Long
Resulting plot looks like this:
If you're just looking for a fast graph, I usually take the quick and dirty simulation approach: I do some draws, slam a Gaussian kernel density on the draws and plot that bad boy:
numDraws <- 1e6
gammaDraws <- rgamma(numDraws, 2)
expDraws <- rexp(numDraws)
combined <- .5 * (gammaDraws + expDraws)
plot(density(combined))
output should look a little like this:
Here is an attempt at doing the convolution (which @Jim Lewis refers to) in R. Note that there are probably much more efficient ways of doing this.
lower <- 0
upper <- 20
t <- seq(lower,upper,0.01)
fA <- dexp(t, rate = 0.4)
fB <- dgamma(t,shape = 8, rate = 2)
## C has the same distribution as (A + B)/2
dC <- function(x, lower, upper, exp.rate, gamma.rate, gamma.shape){
integrand <- function(Y, X, exp.rate, gamma.rate, gamma.shape){
dexp(Y, rate = exp.rate)*dgamma(2*X-Y, rate = gamma.rate, shape = gamma.shape)*2
}
out <- NULL
for(ix in seq_along(x)){
out[ix] <-
integrate(integrand, lower = lower, upper = upper,
X = x[ix], exp.rate = exp.rate,
gamma.rate = gamma.rate, gamma.shape = gamma.shape)$value
}
return(out)
}
fC <- dC(t, lower=lower, upper=upper, exp.rate=0.4, gamma.rate=2, gamma.shape=8)
## plot the resulting distribution
plot(t,fA,
ylim = range(fA,fB,na.rm=TRUE,finite = TRUE),
xlab = 'x',ylab = 'f(x)',type = 'l')
lines(t,fB,lty = 2)
lines(t,fC,lty = 3)
legend('topright', c('A ~ exp(0.4)','B ~ gamma(8,2)', 'C ~ (A+B)/2'),lty = 1:3)
I'm not an R programmer, but it might be helpful to know that for independent random variables with PDFs f1(x) and f2(x), the PDF of the sum of the two variables is given by the convolution (f1 * f2)(x) of the two input PDFs.
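The convolution rule can be checked numerically in a case where the answer is known in closed form. In the sketch below (an illustration, with parameters chosen only for the check), A and B are both Exp(1), so A + B is Gamma(shape = 2, rate = 1) and C = (A + B)/2 has density 2 * dgamma(2x, 2, 1). The helper names f_sum and f_half are made up here.

```r
# Density of A + B by numerical convolution, for A, B ~ Exp(1) independent
f_sum <- function(s) {
  vapply(s, function(si) {
    integrate(function(y) dexp(y, rate = 1) * dexp(si - y, rate = 1),
              lower = 0, upper = si)$value
  }, numeric(1))
}

# Density of C = (A + B) / 2 via change of variables: f_C(x) = 2 * f_sum(2x)
f_half <- function(x) 2 * f_sum(2 * x)

x <- c(0.5, 1, 2)
numeric_pdf <- f_half(x)
closed_form <- 2 * dgamma(2 * x, shape = 2, rate = 1)

print(cbind(x, numeric_pdf, closed_form))
```

The factor 2 in f_half plays the same role as the `*2` in the integrand of dC above: halving the sum compresses the x-axis, so the density has to be scaled up to keep its area at 1.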

Resources