I am trying to plot how the 'Square feet' of a home affects its 'Sales Price (in $1000)'. Specifically, I want the fitted regression line for Square feet vs. Sales Price plotted with a grey confidence band around the line and the original data points superimposed.
I have tried to do this a few different ways. One way is the function effect_plot from library(jtools). I used the code from https://cran.r-project.org/web/packages/jtools/vignettes/effect_plot.html.
But when I run this function, I don't get a plot, I just get an error: Error in FUN(X[[i]], ...) : object 'Sales Price (in $1000)' not found.
The second way I have attempted is to manually create a new vector of predictor values and plot the confidence interval. My code is based on the question "Plotting a 95% confidence interval for a lm object".
But with this one, I get an error in the conf_interval line: Error in eval(predvars, data, env) : object 'Square feet' not found. I cannot figure out how to correct this error.
And finally, I have tried to use library(ggplot2) to complete the problem with inspiration from https://rpubs.com/aaronsc32/regression-confidence-prediction-intervals.
But each time I run the code, it produces a coordinate plane with a single point in the center; there is no line, none of the actual points, and no confidence band. There are no errors, and I cannot figure out what is wrong with the code.
library("jtools")
LRA1 <- lm(`Sales Price (in $1000)` ~ `Square feet` + Rooms +
Bedrooms + Age,data=HomedataSRS) #LRA1 is the regression model
effect_plot(LRA1, pred = 'Square feet', inherit.aes = FALSE,
plot.points = TRUE) #function should create graph
newSF = seq(min(HomedataSRS$`Square feet`),
max(HomedataSRS$`Square feet`), by = 0.05)
conf_interval <- predict(LRA1, newdata=data.frame(x=newSF),
interval="confidence",level = 0.95)
plot(HomedataSRS$`Square feet`, HomedataSRS$`Sales Price (in $1000)`,
xlab="Square feet", ylab="Sales Price(in $1000)",
main="Regression")
abline(LRA1, col="lightblue")
matlines(newSF, conf_interval[,2:3], col = "blue", lty=2)
library(ggplot2)
SFHT <- HomedataSRS %>% select(1:2)
#This is to select the 2 variables I'm working with
ggplot(SFHT, aes(x='Square feet', inherit.aes = FALSE,
y='Sales Price (in $1000)')) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, color='#2C3E50')
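Arguments to aes() should not be quoted: a quoted string is treated as a constant, so every row is mapped to the same value and you get a single point. Use backticks for column names that contain spaces, and drop inherit.aes, which is not an aesthetic. Try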
ggplot(SFHT, aes(x = `Square feet`, y = `Sales Price (in $1000)`)) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, color='#2C3E50')
Alternatively, you could use aes_string(), although it has been soft-deprecated since ggplot2 3.0.0. It parses each string as an R expression, so non-syntactic names still need backticks inside the quotes:
ggplot(SFHT, aes_string(x='`Square feet`',y='`Sales Price (in $1000)`')) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, color='#2C3E50')
More info is available on the package site: https://ggplot2.tidyverse.org/reference/aes_.html
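For the predict() attempt in the question, the "object 'Square feet' not found" error most likely arises because newdata contains a column called x rather than one named after the predictor. Below is a minimal sketch of that fix. It assumes a single-predictor model and uses a made-up data frame (homes) as a stand-in for HomedataSRS, which is not shown; for the four-predictor model LRA1, newdata would also need Rooms, Bedrooms and Age columns (for example held at their means).

```r
# Hypothetical stand-in for HomedataSRS (the real data is not available)
set.seed(1)
homes <- data.frame(sqft = runif(50, 800, 3500))
homes$price <- 50 + 0.1 * homes$sqft + rnorm(50, sd = 20)
names(homes) <- c("Square feet", "Sales Price (in $1000)")

# Single-predictor model, so the line and band live in one x-y plane
fit <- lm(`Sales Price (in $1000)` ~ `Square feet`, data = homes)

# newdata must use the SAME column name as the predictor;
# check.names = FALSE keeps the space in the name
newSF <- seq(min(homes$`Square feet`), max(homes$`Square feet`),
             length.out = 200)
new_data <- data.frame(`Square feet` = newSF, check.names = FALSE)
conf_interval <- predict(fit, newdata = new_data,
                         interval = "confidence", level = 0.95)

# Scatter plot, fitted line, and dashed confidence limits
plot(homes$`Square feet`, homes$`Sales Price (in $1000)`,
     xlab = "Square feet", ylab = "Sales Price (in $1000)",
     main = "Regression")
abline(fit, col = "lightblue")
matlines(newSF, conf_interval[, c("lwr", "upr")], col = "blue", lty = 2)
```

Note that abline() only makes sense for a model with one predictor; with several predictors the fitted line would have to be drawn from the predict() output instead.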
Related
I have a dataset in r with the following columns:
> names(dataset)
[1] "Corp.Acct.Name" "Product-name" "Package.Type" "Total.Quantity" "ASP.Ex.Works"
What I am trying to do is create a scatter plot with Total.Quantity on the x axis and ASP.Ex.Works on the y axis, and then fit a power curve to the scatterplot.
I have tried the following using stat_smooth:
p <- ggplot(data = dataset, # specify dataset
aes(x = Total.Quantity, y = ASP.Ex.Works)) + # Quantity on x, ASP on Y
geom_point(pch = 1) + # plot points (pch = 1: circles, type '?pch' for other options)
xlim(0, xlimmax) +
ylim(0, ylimmax) +
xlab("Quantity (lbs)") +
ylab("Average Sale Price Ex Freight ($)") +
#Add line using non-linear regression
stat_smooth(method="nls",formula = ASP.Ex.Works ~a*exp(-Total.Quantity*b),method.args=list(start=c(a=2,b=2)),se=F,color="red")
p
but am thrown the following error:
Warning message: Computation failed in stat_smooth(): parameters
without starting value in 'data': ASP.Ex.Works, Total.Quantity
I have tried several different methods, including specifying the model outside of ggplot, but haven't had any luck. I am trying to recreate excel's power curve option in r for a dynamic visual in Power BI.
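One likely cause of the warning: inside stat_smooth() the formula has to be written in terms of the aesthetics x and y, not the column names, otherwise nls() treats ASP.Ex.Works and Total.Quantity as parameters that need starting values. The sketch below fits a power curve y = a * x^b under that assumption. The data are simulated (the real dataset is not shown), and taking the starting values from a log-log linear fit is just one reasonable choice, not the only one.

```r
library(ggplot2)

# Simulated stand-in for the real dataset (hypothetical values)
set.seed(1)
dataset <- data.frame(Total.Quantity = seq(10, 1000, length.out = 60))
dataset$ASP.Ex.Works <- 50 * dataset$Total.Quantity^(-0.4) *
  exp(rnorm(60, sd = 0.05))

# Starting values from a linear fit on the log-log scale:
# log(y) = log(a) + b * log(x)
loglin <- lm(log(ASP.Ex.Works) ~ log(Total.Quantity), data = dataset)
start_vals <- list(a = exp(coef(loglin)[[1]]), b = coef(loglin)[[2]])

p <- ggplot(dataset, aes(x = Total.Quantity, y = ASP.Ex.Works)) +
  geom_point(pch = 1) +
  xlab("Quantity (lbs)") +
  ylab("Average Sale Price Ex Freight ($)") +
  # formula uses x and y (the aesthetics), not the column names;
  # se = FALSE because predict() for nls fits gives no standard errors
  stat_smooth(method = "nls", formula = y ~ a * x^b,
              method.args = list(start = start_vals),
              se = FALSE, color = "red")
p
```

If the exponential form from the question is what is wanted, the same pattern should apply with formula = y ~ a * exp(-x * b) and starting values on a sensible scale for the data.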
First of all, I apologize for my poor English. I want to reproduce the plot of the ridge regression's cross-validated MSE with ggplot2 instead of the built-in plot function.
The object cv.out is defined by the following expression:
cv.out <- cv.glmnet(x_var[train,], y_var[train], alpha = 0). When I print that object, this is the output:
    Lambda  Measure      SE Nonzero
min  439.8 32554969 1044541       5
1se 1343.1 33586547 1068662       5
This is the plot with plot(cv.out):
What I want is to make the same plot, but more polished, with ggplot2, and I don't know which aesthetics to pass to the function. These are the elements offered when I type cv.out$ : lambda, cvm, cvsd, cvup, cvlo, nzero, call, name, lambda.min, lambda.1se
Finally, thanks for your help. I really appreciate it. :)
Using an example dataset:
library(glmnet)   # provides cv.glmnet()
library(ggplot2)  # used for the plot further down
X = as.matrix(mtcars[,-1])
y = as.matrix(mtcars[,1])
cv.out = cv.glmnet(X,y,alpha=0)
plot(cv.out)
You just need to pull out the values and put into a data.frame, and plot using geom_point() and geom_errorbar() :
df = with(cv.out,
data.frame(lambda = lambda,MSE = cvm,MSEhi=cvup,MSElow=cvlo))
ggplot(df,aes(x=lambda,y=MSE)) +
geom_point(col="#f05454") +
scale_x_log10("log(lambda)") +
geom_errorbar(aes(ymin = MSElow,ymax=MSEhi),col="#30475e") +
geom_vline(xintercept=c(cv.out$lambda.1se,cv.out$lambda.min),
linetype="dashed")+
theme_bw()
I am relatively new to R and obviously not very experienced.
However, I used multilevel modeling to identify influences of voice on sleep parameters.
E.g. TST in this snippet is total sleep time, intensity is voice intensity (in this case as mean).
I manage to get a scatterplot, depending on participant number as I want to. However, I now want to include regression lines for my model, that show the intercept and slope for the model vs for the null-model (excluding my independent variable).
Yet, no matter what I try, I do not seem to be able to display the regression lines based on intercept and slope, even after entering their values manually!
Here's my code for the model and for the plot.
Model:
library(lme4)
TST_RE_Intensity.model = lmer(Intensity_mean ~ TST_re + Day + (1+ TST_re|Participant_ID) + ( 1|Filename), data=my.df, REML = FALSE)
TST_RE_Intensity.null = lmer(Intensity_mean ~ Day + (1+ TST_re|Participant_ID) + (1|Filename), data=my.df, REML = FALSE)
Plot:
library(ggplot2)
p <- ggplot(my.df, aes(x=my.df$TST_re, y=my.df$Intensity_mean, colour=my.df$Participant_ID))+
theme(legend.position = "none")+
geom_point(shape=20) +
geom_abline(aes(intercept=64, slope = - 0.0167, size= 1.5)+
geom_abline(aes(intercept=61, slope = - 0.0162, size=1.5)+
scale_size_manual(values = c(0.3, 0.3))+
scale_y_log10(name="Log10(TST)", limits=c(40,80)) +
scale_x_log10(name="Log10(Intensity)")
I do not get any error messages but also never get to see any lines. I tried to follow instructions of the ggplot2 manual like this one, but end up short
p + geom_abline(intercept = 37, slope = -5)
Is there a way to just plot a line "manually"?
Thanks in advance!!
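A minimal sketch of drawing lines "manually" with geom_abline(), under a few assumptions: the data frame below is simulated (my.df is not shown), the intercepts and slopes are the ones quoted in the question, and the axes are left on their original linear scale. Two things in the question's code are worth checking: each geom_abline(aes(...) call is missing a closing parenthesis, and fixed intercept/slope values are normally passed outside aes(). As far as I can tell, geom_abline() interprets intercept and slope in the transformed coordinates when a log10 scale is used, so coefficients estimated on the raw scale may fall outside the panel there.

```r
library(ggplot2)

# Simulated stand-in for my.df (hypothetical values)
set.seed(42)
my.df <- data.frame(
  TST_re = runif(100, 300, 500),
  Participant_ID = factor(rep(1:10, each = 10))
)
my.df$Intensity_mean <- 64 - 0.0167 * my.df$TST_re + rnorm(100, sd = 1)

# Refer to columns by bare name inside aes(), not via my.df$...
p <- ggplot(my.df, aes(x = TST_re, y = Intensity_mean,
                       colour = Participant_ID)) +
  geom_point(shape = 20) +
  # constants go OUTSIDE aes(); one line per model
  geom_abline(intercept = 64, slope = -0.0167, linetype = "solid") +
  geom_abline(intercept = 61, slope = -0.0162, linetype = "dashed") +
  theme(legend.position = "none") +
  labs(x = "TST", y = "Intensity (mean)")
p
```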
I'm reasonably familiar with the usual ways of modifying a plot by writing your own x axis labels or a main title, but I've been unable to customize the output when plotting the results of a time series decomposition.
For example,
library(TTR)
t <- ts(co2, frequency=12, start=1, deltat=1/12)
td <- decompose(t)
plot(td)
plot(td, main="Title Doesn't Work") # gets you an error message
gives you a nice, basic plot of the observed time series, trend, etc. With my own data (changes in depth below the water surface), however, I'd like to be able to switch the orientation of the y axes (eg ylim=c(40,0) for 'observed', or ylim=c(18,12) for 'trend'), change 'seasonal' to 'tidal', include the units for the x axis ('Time (days)'), and provide a more descriptive title for the figure.
My impression is that the kind of time series analyses I'm doing is pretty basic and, eventually, I may be better off using another package, perhaps with better graphical control, but I'd like to use ts() and decompose() if I can for now (yeah, cake and consumption). Assuming this doesn't get too horrendous.
Is there a way to do this?
Thanks! Pete
You can modify the plot.decomposed.ts function. That's the plot "method" that gets dispatched when you run plot on an object of class decomposed.ts, which is the class of td.
getAnywhere(plot.decomposed.ts)
function (x, ...)
{
xx <- x$x
if (is.null(xx))
xx <- with(x, if (type == "additive")
random + trend + seasonal
else random * trend * seasonal)
plot(cbind(observed = xx, trend = x$trend, seasonal = x$seasonal, random = x$random),
main = paste("Decomposition of", x$type, "time series"), ...)
}
Notice in the code above that the function hard-codes the title. So let's modify it so that we can choose our own title:
my_plot.decomposed.ts = function(x, title="", ...) {
xx <- x$x
if (is.null(xx))
xx <- with(x, if (type == "additive")
random + trend + seasonal
else random * trend * seasonal)
plot(cbind(observed = xx, trend = x$trend, seasonal = x$seasonal, random = x$random),
main=title, ...)
}
my_plot.decomposed.ts(td, "My Title")
Here's a ggplot version of the plot. ggplot requires a data frame, so the first step is to get the decomposed time series into data frame form and then plot it.
library(tidyverse) # Includes the packages ggplot2 and tidyr, which we use below
# Get the time values for the time series
Time = attributes(co2)[[1]]
Time = seq(Time[1],Time[2], length.out=(Time[2]-Time[1])*Time[3])
# Convert td to data frame
dat = cbind(Time, with(td, data.frame(Observed=x, Trend=trend, Seasonal=seasonal, Random=random)))
ggplot(gather(dat, component, value, -Time), aes(Time, value)) +
facet_grid(component ~ ., scales="free_y") +
geom_line() +
theme_bw() +
labs(y=expression(CO[2]~(ppm)), x="Year") +
ggtitle(expression(Decomposed~CO[2]~Time~Series)) +
theme(plot.title=element_text(hjust=0.5))
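If per-panel control in base graphics is what is needed (reversed y axes, a renamed "seasonal" panel, custom x label and title), another option is to skip plot.decomposed.ts altogether and draw each component yourself. This is only a sketch using the built-in co2 series; the panel name "Tidal", the axis label and the title are placeholders taken from the question, and reversing every panel's y axis is an assumption about what is wanted.

```r
# Decompose the built-in co2 series (stand-in for the depth data)
td <- decompose(co2)

# Components in display order; "Tidal" replaces the default "seasonal" label
parts <- list(Observed = td$x, Trend = td$trend,
              Tidal = td$seasonal, Random = td$random)

# Four stacked panels with outer margins for a shared x label and title
op <- par(mfrow = c(4, 1), mar = c(2, 4, 1, 1), oma = c(3, 0, 3, 0))
for (nm in names(parts)) {
  rng <- range(parts[[nm]], na.rm = TRUE)
  # rev(rng) flips the y axis so larger values are drawn lower
  plot(parts[[nm]], ylab = nm, xlab = "", ylim = rev(rng))
}
mtext("Time (days)", side = 1, outer = TRUE, line = 1)
mtext("A more descriptive title", side = 3, outer = TRUE, line = 1)
par(op)  # restore the previous graphics settings
```

Explicit limits such as ylim = c(40, 0) can be substituted for rev(rng) in any individual panel.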
I'm trying to plot an exponential decay line (with error bars) onto a scatterplot in ggplot of price information over time. I currently have this:
f2 <- ggplot(data, aes(x=date, y=cost) ) +
geom_point(aes(y = cost), colour="red", size=2) +
geom_smooth(se=T, method="lm", formula=y~x) +
# geom_smooth(se=T) +
theme_bw() +
xlab("Time") +
scale_y_log10("Price over time") +
ggtitle("The Falling Price over time")  # opts() has been removed from ggplot2
print(f2)
The key part is formula=y~x in the geom_smooth call. Although this looks like a linear model, ggplot seems to automatically detect my scale_y_log10 and fit on the log-transformed values.
Now, my issue here is that date is a date data type. I think I need to convert it to seconds since t=0 to be able to apply an exponential decay model of the form y = Ae^-(bx).
I believe this because when I tried things like y = exp(x), I get a message that I think(?) is telling me I can't take exponents of dates. It reads:
Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, :
NA/NaN/Inf in foreign function call (arg 1)
However, log(y) = x works correctly. (y is a numeric data type, x is a date.)
Is there a convenient way to fit exponential growth/decay time series models within ggplot plots in the geom_smooth(formula=formula) function call?
This appears to work, although I don't know how finicky it will be with real/messy data:
set.seed(101)
dat <- data.frame(d=seq.Date(as.Date("2010-01-01"),
as.Date("2010-12-31"),by="1 day"),
y=rnorm(365,mean=exp(5-(1:365)/100),sd=5))
library(ggplot2)
g1 <- ggplot(dat,aes(x=d,y=y))+geom_point()+expand_limits(y=0)
# In current ggplot2 versions, model arguments are passed via method.args
g1+geom_smooth(method="glm",
method.args=list(family=gaussian(link="log"),start=c(5,0)))
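If you would rather control the time variable yourself, as the question suggests, one option is to convert the dates to elapsed time and fit the model outside ggplot. The sketch below assumes days since the first observation as the time unit (the column names t and fitted are made up for illustration) and reuses the same kind of simulated data as above. With a log link, the fitted mean is exp(b0 + b1 * t), which corresponds to y = A * exp(-b * t) with A = exp(b0) and b = -b1.

```r
library(ggplot2)

# Simulated data, same idea as above
set.seed(101)
dat <- data.frame(d = seq.Date(as.Date("2010-01-01"),
                               as.Date("2010-12-31"), by = "1 day"))
dat$t <- as.numeric(dat$d - min(dat$d))   # days since the first observation
dat$y <- rnorm(nrow(dat), mean = exp(5 - dat$t / 100), sd = 5)

# Gaussian GLM with log link: E[y] = exp(b0 + b1 * t)
fit <- glm(y ~ t, family = gaussian(link = "log"), data = dat,
           start = c(5, 0))

# Decay parameters on the original scale
A <- exp(coef(fit)[["(Intercept)"]])
b <- -coef(fit)[["t"]]

# Overlay the fitted curve on the points, keeping dates on the x axis
dat$fitted <- predict(fit, type = "response")
g2 <- ggplot(dat, aes(x = d, y = y)) +
  geom_point() +
  geom_line(aes(y = fitted), colour = "blue") +
  expand_limits(y = 0)
g2
```

Seconds or any other unit would work the same way; only the interpretation of b changes.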