Trouble with fitting a power curve in ggplot2 - r

I have a dataset in R with the following columns:
> names(dataset)
[1] "Corp.Acct.Name" "Product-name" "Package.Type" "Total.Quantity" "ASP.Ex.Works"
What I am trying to do is create a scatter plot with Total.Quantity on the x axis and ASP.Ex.Works on the y axis, and then fit a power curve to the scatterplot.
I have tried the following using stat_smooth:
p <- ggplot(data = dataset, # specify dataset
aes(x = Total.Quantity, y = ASP.Ex.Works)) + # Quantity on x, ASP on Y
geom_point(pch = 1) + # plot points (pch = 1: circles, type '?pch' for other options)
xlim(0, xlimmax) +
ylim(0, ylimmax) +
xlab("Quantity (lbs)") +
ylab("Average Sale Price Ex Freight ($)") +
# Add line using non-linear regression
stat_smooth(method="nls",formula = ASP.Ex.Works ~a*exp(-Total.Quantity*b),method.args=list(start=c(a=2,b=2)),se=F,color="red")
p
but I get the following warning:
Warning message: Computation failed in stat_smooth(): parameters
without starting value in 'data': ASP.Ex.Works, Total.Quantity
I have tried several different methods, including specifying the model outside of ggplot, but haven't had any luck. I am trying to recreate Excel's power curve option in R for a dynamic visual in Power BI.
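A minimal sketch of one way to get a power curve, assuming the goal is a trendline of the form y = a * x^b. Two things differ from the attempt above: inside stat_smooth() the formula has to be written in terms of x and y (the aesthetics), not the column names, and a power curve is a * x^b rather than a * exp(-b * x). The data below are made up for illustration, and the starting values come from a straight-line fit on log-log axes, which as far as I know is also how Excel computes its power trendline.

```r
library(ggplot2)

# Synthetic stand-in for the real data (hypothetical values)
set.seed(1)
dataset <- data.frame(Total.Quantity = runif(100, 1, 1000))
dataset$ASP.Ex.Works <- 50 * dataset$Total.Quantity^(-0.3) *
  exp(rnorm(100, sd = 0.1))

# Straight-line fit on log-log axes: log(y) = log(a) + b * log(x)
log_fit <- lm(log(ASP.Ex.Works) ~ log(Total.Quantity), data = dataset)
start <- list(a = exp(coef(log_fit)[[1]]), b = coef(log_fit)[[2]])

p <- ggplot(dataset, aes(x = Total.Quantity, y = ASP.Ex.Works)) +
  geom_point(pch = 1) +
  # The formula refers to the aesthetics x and y, not to the column names
  stat_smooth(method = "nls", formula = y ~ a * x^b,
              method.args = list(start = start),
              se = FALSE, color = "red")
p
```

se = FALSE is needed here because predict() for nls models does not return standard errors. If the data contain zero or negative quantities or prices, the log-log starting fit will not work and the starting values would have to be chosen by hand.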

Related

Plotting Linear Regression Line with Confidence Interval

I am trying to plot how 'Square feet' of a home affects 'Sales Price (in $1000)' of the same. Particularly, I want the coefficient line from Square ft vs. Sales price plotted with a hypothetical grey area around the line with the original datapoints superimposed.
I have tried to complete this a few different ways. One way I have tried is using the function effect_plot from library(jtools). I used the code from https://cran.r-project.org/web/packages/jtools/vignettes/effect_plot.html.
But when I run this function, I don't get a plot, I just get an error: Error in FUN(X[[i]], ...) : object 'Sales Price (in $1000)' not found.
The second way I have attempted is through manually creating a new vector and attempting to plot the confidence interval. My code inspiration is from Plotting a 95% confidence interval for a lm object.
But with this one, I get an error in the conf_interval line: Error in eval(predvars, data, env) : object 'Square feet' not found. I cannot figure out how to correct this error.
And finally, I have tried to use library(ggplot2) to complete the problem with inspiration from https://rpubs.com/aaronsc32/regression-confidence-prediction-intervals.
But each time I run the code, it creates a coordinate plane with a single point in the center of the plane; there is no line, no real points, no hypothetical confidence interval. There are no errors and I cannot figure out the issue with the code.
library("jtools")
LRA1 <- lm(`Sales Price (in $1000)` ~ `Square feet` + Rooms +
Bedrooms + Age,data=HomedataSRS) #LRA1 is the regression model
effect_plot(LRA1, pred = 'Square feet', inherit.aes = FALSE,
plot.points = TRUE) #function should create graph
newSF = seq(min(HomedataSRS$`Square feet`),
max(HomedataSRS$`Square feet`), by = 0.05)
conf_interval <- predict(LRA1, newdata=data.frame(x=newSF),
interval="confidence",level = 0.95)
plot(HomedataSRS$`Square feet`, HomedataSRS$`Sales Price (in $1000)`,
xlab="Square feet", ylab="Sales Price(in $1000)",
main="Regression")
abline(LRA1, col="lightblue")
matlines(newSF, conf_interval[,2:3], col = "blue", lty=2)
library(ggplot2)
SFHT <- HomedataSRS %>% select(1:2)
#This is to select the 2 variables I'm working with
ggplot(SFHT, aes(x='Square feet', inherit.aes = FALSE,
y='Sales Price (in $1000)')) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, color='#2C3E50')
Arguments to aes() should not be quoted. Try:
ggplot(SFHT, aes(x = `Square feet`, y = `Sales Price (in $1000)`)) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, color='#2C3E50')
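Alternatively, if the column names are stored as strings, you can use the .data pronoun inside aes(). (aes_string() also accepts strings, but it is deprecated in current ggplot2 and needs backticks inside the string for names that contain spaces.)
ggplot(SFHT, aes(x = .data[["Square feet"]], y = .data[["Sales Price (in $1000)"]])) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, color='#2C3E50')
More info on aes_string() is available on the package site: https://ggplot2.tidyverse.org/reference/aes_.html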
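The "object 'Square feet' not found" error from predict() in the question has a similar cause: newdata has to contain a column with the same name as the model's predictor, not a column called x. Here is a rough sketch with made-up data standing in for HomedataSRS, and with a one-predictor model for brevity (with Rooms, Bedrooms and Age in the model, newdata would also need values for those columns).

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for HomedataSRS; check.names = FALSE keeps the spaces
homes <- data.frame(`Square feet` = runif(60, 800, 3500), check.names = FALSE)
homes$`Sales Price (in $1000)` <- 40 + 0.09 * homes$`Square feet` +
  rnorm(60, sd = 25)

fit <- lm(`Sales Price (in $1000)` ~ `Square feet`, data = homes)

# newdata must use the predictor's own column name
new_homes <- data.frame(`Square feet` = seq(800, 3500, length.out = 100),
                        check.names = FALSE)
band <- cbind(new_homes,
              predict(fit, newdata = new_homes,
                      interval = "confidence", level = 0.95))

p <- ggplot(homes, aes(x = `Square feet`, y = `Sales Price (in $1000)`)) +
  # Grey 95% confidence band from the fit / lwr / upr columns of predict()
  geom_ribbon(data = band, aes(x = `Square feet`, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = "grey70", alpha = 0.5) +
  geom_line(data = band, aes(y = fit), color = "#2C3E50") +
  geom_point(color = "#2980B9")
p
```

For a single predictor this should give essentially the same picture as geom_smooth(method = lm), but the band comes from your own model object.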

r: Blank graph when plotting multiple lines on scatterplot

My goal is to produce a graph showing the differences between regression lines using continuous vs categorical variables. I'm using the "SleepStudy" dataset from Lock5Data, and I want to show the regression lines predicting GPA from ClassYear as either continuous or categorical. The code is below:
library(Lock5Data)
data("SleepStudy")
fit2 <- lm(GPA ~ factor(ClassYear), data = SleepStudy)
fit2_line <- aggregate(fit2$fitted.values ~ SleepStudy$ClassYear, FUN = mean)
colnames(fit2_line) <- c('ClassYear','GPA')
options(repr.plot.width=5, repr.plot.height=5)
library(ggplot2)
ggplot() +
geom_line(data=fit2_line, aes(x=ClassYear, y=GPA)) + # Fit line, ClassYear factor
geom_smooth(data=SleepStudy, method='lm', formula=GPA~ClassYear) + # Fit line, ClassYear continuous
geom_point(data=SleepStudy, aes(x=ClassYear, y=GPA)) # Data points as dots
What is producing the blank graph? What am I missing here?
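In your code geom_smooth() has no x and y aesthetics, so it has nothing to fit. Define the data and the aesthetics in ggplot(), and write the smoothing formula in terms of y and x rather than the column names. This code works: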
ggplot(data=SleepStudy, aes(y = GPA,x = ClassYear)) +
geom_smooth(data=SleepStudy, method='lm', formula=y~x)+
geom_line(data=fit2_line, aes(x=ClassYear, y=GPA)) +
geom_point(data=SleepStudy, aes(x=ClassYear, y=GPA))

Displaying smoothed (convolved) densities with ggplot2

I'm trying to display some frequencies convolved with a Gaussian kernel in ggplot2. I tried smoothing the lines with:
+ stat_smooth(se = F,method = "lm", formula = y ~ poly(x, 24))
Without success.
I read an article suggesting the frequencies should be convolved with a Gaussian kernel, which ggplot2's stat_density function (http://docs.ggplot2.org/current/stat_density.html) seems to be able to do.
However, I can't seem to replace my geometry with stat_density. Is there anything wrong with my code?
require(reshape2)
library(ggplot2)
library(RColorBrewer)
fileName = "/1.csv" # downloadable there: https://www.dropbox.com/s/l5j7ckmm5s9lo8j/1.csv?dl=0
mydata = read.csv(fileName,sep=",", header=TRUE)
dataM = melt(mydata,c("bins"))
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
ggplot(data=dataM,
aes(x=bins, y=value, colour=variable)) +
geom_line() + scale_x_continuous(limits = c(0, 2))
This code produces the following plot:
I'm looking at smoothing the lines a little bit, so they look more like this:
(from http://journal.frontiersin.org/Journal/10.3389/fncom.2013.00189/full)
Since my comments solved your problem, I'll convert them to an answer:
The density function takes individual measurements and calculates a kernel density distribution by convolution (gaussian is the default kernel). For example, plot(density(rnorm(1000))). You can control the smoothness with the bw (bandwidth) parameter. For example, plot(density(rnorm(1000), bw=0.01)).
But your data frame is already a density distribution (analogous to the output of the density function). To generate a smoother density estimate, you need to start with the underlying data and run density on it, adjusting bw to get the smoothness where you want it.
If you don't have access to the underlying data, you can smooth out your existing density distributions as follows:
ggplot(data=dataM, aes(x=bins, y=value, colour=variable)) +
geom_smooth(se=FALSE, span=0.3) +
scale_x_continuous(limits = c(0, 2))
Play around with the span parameter to get the smoothness you want.
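If you do have the underlying measurements, here is a small sketch of the density route in ggplot2. The data are simulated for illustration only; adjust is a multiplier on the default bandwidth, so values above 1 give a smoother curve and values below 1 a wigglier one.

```r
library(ggplot2)

set.seed(7)
# Hypothetical raw measurements (not the binned frequencies from the question)
raw <- data.frame(value = c(rnorm(500, mean = 0.5, sd = 0.1),
                            rnorm(500, mean = 1.2, sd = 0.2)))

# Base R: adjust scales the default bandwidth
d_default <- density(raw$value)
d_smooth  <- density(raw$value, adjust = 2)

# ggplot2: geom_density() runs the same Gaussian-kernel estimate
p <- ggplot(raw, aes(x = value)) +
  geom_density(adjust = 0.5, colour = "grey50") +  # less smoothing
  geom_density(adjust = 2, colour = "red")         # more smoothing
p
```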

ggplot trend line is flat

I'm trying to plot a trend line along with a 95% confidence interval for my data in this csv file. When I issue this command:
ggplot(trimmed_data, aes(x=week, y=V4)) +
geom_smooth(fill='blue', alpha=.2, color='blue')
I get this plot, which is great:
However, when I use the since_weeks column (which is the correct one I'd like to use), I get a flat line:
ggplot(trimmed_data, aes(x=since_weeks, y=V4)) +
geom_smooth(fill='blue', alpha=.2, color='blue')
The week column has a range of 0-51, while the since_weeks column has a range of 1-52. Essentially I'm just re-ordering the rows.
I get this warning with both plots:
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.

How can I overlay timeseries models for exponential decay into ggplot2 graphics?

I'm trying to plot an exponential decay line (with error bars) onto a scatterplot in ggplot of price information over time. I currently have this:
f2 <- ggplot(data, aes(x=date, y=cost) ) +
geom_point(aes(y = cost), colour="red", size=2) +
geom_smooth(se=T, method="lm", formula=y~x) +
# geom_smooth(se=T) +
theme_bw() +
xlab("Time") +
scale_y_log10("Price over time") +
ggtitle("The Falling Price over time")
print(f2)
The key line is the formula=y~x in the geom_smooth command. Although this looks like a linear model, ggplot seems to automatically detect my scale_y_log10 and log it.
Now, my issue here is that date is a date data type. I think I need to convert it to seconds since t=0 to be able to apply an exponential decay model of the form y = Ae^-(bx).
I believe this because when I tried things like y = exp(x), I get a message that I think(?) is telling me I can't take exponents of dates. It reads:
Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, :
NA/NaN/Inf in foreign function call (arg 1)
However, log(y) = x works correctly. (y is a numeric data type, x is a date.)
Is there a convenient way to fit exponential growth/decay time series models within ggplot plots in the geom_smooth(formula=formula) function call?
This appears to work, although I don't know how finicky it will be with real/messy data:
set.seed(101)
dat <- data.frame(d=seq.Date(as.Date("2010-01-01"),
as.Date("2010-12-31"),by="1 day"),
y=rnorm(365,mean=exp(5-(1:365)/100),sd=5))
library(ggplot2)
g1 <- ggplot(dat,aes(x=d,y=y))+geom_point()+expand_limits(y=0)
g1+geom_smooth(method="glm",
method.args=list(family=gaussian(link="log"),start=c(5,0)))
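Another option, sketched here under the assumption that you want the model y = A * exp(-b * t) with explicit A and b, is to convert the dates to days since the first observation, fit with nls(), and draw the fitted values as a line. The column names t and fitted and the starting values are my own choices for this example, the latter being rough guesses read off the scatterplot.

```r
library(ggplot2)

set.seed(101)
dat <- data.frame(d = seq.Date(as.Date("2010-01-01"),
                               as.Date("2010-12-31"), by = "1 day"))
dat$t <- as.numeric(dat$d - min(dat$d))           # days since first date
dat$y <- rnorm(365, mean = exp(5 - dat$t / 100), sd = 5)

# Exponential decay on the numeric time axis
fit <- nls(y ~ A * exp(-b * t), data = dat,
           start = list(A = max(dat$y), b = 0.01))
dat$fitted <- predict(fit)

p <- ggplot(dat, aes(x = d, y = y)) +
  geom_point() +
  geom_line(aes(y = fitted), colour = "blue") +   # fitted decay curve
  expand_limits(y = 0)
p
```

This gives A and b directly via coef(fit), at the cost of the automatic confidence band that geom_smooth() draws.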
