I have a dataset and want to plot a smoothed curve using stat_smooth.
I can show the avg_time vs Scored_Probabilities figure, which looks like this:
c <- ggplot(dataset1, aes(x=Avg.time, y=Scored.Probabilities))
c + stat_smooth()
But when changing Avg.time to time or Age, an error occurs:
c <- ggplot(dataset1, aes(x=Age, y=Scored.Probabilities))
c + stat_smooth()
error: geom_smooth: Only one unique x value each group. Maybe you want aes(group = 1)?
How could I fix it?
The error message suggests setting group=1, but doing that produces a different error:
ggplot(dataset1, aes(x=Age, y=Scored.Probabilities, group=1))+stat_smooth()
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in smooth.construct.cr.smooth.spec(object, data, knots) :
x has insufficient unique values to support 10 knots: reduce k.
Now the problem is that Age has too few unique values for the default smoother.
So there are two workarounds: (i) summarise with another function such as the mean, or (ii) jitter Age slightly so that the x values become distinct.
ggplot(dataset1, aes(x=Age, y=Scored.Probabilities, group=1))+
geom_point()+
stat_summary(fun.y=mean, colour="red", geom="line", size = 3) # line through the mean at each Age (fun.y is named fun in ggplot2 >= 3.3.0)
Or
ggplot(dataset1, aes(x=jitter(as.numeric(as.character(Age))), y=Scored.Probabilities, group=1))+
geom_point()+stat_smooth()
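Note the as.numeric(as.character(Age)) conversion: Age is a factor, and as.numeric alone would return the internal level codes rather than the ages themselves.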
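To make the factor conversion concrete, here is a minimal base R sketch. The vector age is made-up illustration data, not the asker's dataset1.

```r
# Hypothetical ages stored as a factor (illustration data only)
age <- factor(c("35", "21", "35", "60"))

# as.numeric() on a factor returns the internal level codes.
# The levels are sorted as strings: "21", "35", "60"
codes <- as.numeric(age)                  # 2 1 2 3

# Going through as.character() recovers the actual ages
values <- as.numeric(as.character(age))   # 35 21 35 60

print(codes)
print(values)
```

If Age in the real data contains non-numeric labels, the second conversion would yield NA for those entries, so it is worth checking the levels first.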
Related
I generated a boxplot with three variables ("Jahreszeit","Fracht","Bewirtschaftungsform") like this:
ggplot(daten,aes(x=Jahreszeit, y=Fracht))+ geom_boxplot() +
facet_wrap(~ Bewirtschaftungsform)+
geom_point(position = position_jitter(width = 0.1))+
stat_summary(fun.data=f, geom="text", vjust=+1.5, col="black")
My question is whether there is a way to extract the exact value of the mean for each category of the factor.
I would approach such a task using aggregate or plyr. With aggregate you get the group means (of Fracht I assume) with the following call:
groupMeans <- aggregate(Fracht ~ Bewirtschaftungsform, daten, mean)
Rounding is suggested for printing:
groupMeans$Fracht <- round(groupMeans$Fracht, 2)
Within the ggplot object you can then just add:
+ geom_text(data=groupMeans,aes(label=Fracht,y=0,x=0))
The last term may require some tweaking for the x and y values to optimize the position.
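As a rough sketch of the aggregate approach, the example below computes one mean per Jahreszeit within each Bewirtschaftungsform and places the labels at the means. The data frame and its level names ("Sommer", "konventionell", and so on) are invented for illustration, and grouping by both factors is an assumption about what the asker wants.

```r
# Hypothetical stand-in for the asker's data
daten <- data.frame(
  Jahreszeit = rep(c("Sommer", "Winter"), each = 4),
  Bewirtschaftungsform = rep(c("konventionell", "oekologisch"), times = 4),
  Fracht = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# One mean per combination of the two factors
groupMeans <- aggregate(Fracht ~ Jahreszeit + Bewirtschaftungsform,
                        data = daten, FUN = mean)
print(groupMeans)

# Optional plot; only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(daten, aes(x = Jahreszeit, y = Fracht)) +
    geom_boxplot() +
    facet_wrap(~ Bewirtschaftungsform) +
    # geom_text inherits x and y, so each label sits at its group mean
    geom_text(data = groupMeans, aes(label = round(Fracht, 2)), vjust = -0.5)
}
```

Because groupMeans carries the same column names as daten, the text layer can inherit the x, y and facet variables without any manual positioning.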
I'm trying to plot a trend line along with a 95% confidence interval for my data in this csv file. When I issue this command:
ggplot(trimmed_data, aes(x=week, y=V4)) +
geom_smooth(fill='blue', alpha=.2, color='blue')
I get this plot, which is great:
However, when I use the since_weeks column (which is the correct one I'd like to use), I get a flat line:
ggplot(trimmed_data, aes(x=since_weeks, y=V4)) +
geom_smooth(fill='blue', alpha=.2, color='blue')
The week column has a range of 0-51, while the since_weeks column has a range of 1-52. Essentially I'm just re-ordering the rows.
I get this warning with both plots:
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
I am trying to plot a graph of predicted values in ggplot. The script is shown below:
Program1
lumber.predict.plm1=lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
I(scale(woman.1980.2000)^2), data=lumber.unemployment.women)
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all=data.frame(woman.1980.2000=seq(xmin,xmax,length.out=100))
predicted.lumber.all$lumber=predict(lumber.predict.plm1,newdata=predicted.lumber.all)
lumber.predict.plot=ggplot(lumber.unemployment.women,mapping=aes(x=woman.1980.2000,
y=lumber.1980.2000)) +
geom_point(colour="red") +
geom_line(data=predicted.lumber.all,size=1)
lumber.predict.plot
Error: Aesthetics must either be length one, or the same length as the data
Problems:woman.1980.2000
I believe the number of observations in the base dataset does not need to match the number in the predicted-values dataset. The same logic works when I try it on the 'cars' dataset.
speed.lm = lm(speed ~ dist, data = cars)
xmin=10
xmax=120
new = data.frame(dist=seq(xmin,xmax,length.out=200))
new$speed=predict(speed.lm,newdata=new,interval='none')
sp <- ggplot(cars, aes(x=dist, y=speed)) +
geom_point(colour="grey40") + geom_line(data=new, colour="green", size=.8)
The above code works fine.
I am unable to figure out the problem with my first program.
You should use the same name for the y variable in the predicted data as in the plot's aes. Change this line
predicted.lumber.all$lumber=
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
to this one:
predicted.lumber.all$lumber.1980.2000= ## very bad variable name!
predict(lumber.predict.plm1,newdata=predicted.lumber.all)
Or call aes again inside geom_line:
geom_line(data=predicted.lumber.all,aes(y=lumber),
size=1)
The basic problem is that in your code,
...
geom_line(data=predicted.lumber.all,size=1)
...
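ggplot does not know which column of predicted.lumber.all to use for y: the layer inherits y=lumber.1980.2000 from the ggplot() call, and that data frame has no column of that name. As @agstudy says, you can specify this with aes(...) in geom_line: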
...
geom_line(data=predicted.lumber.all, aes(y=lumber), size=1)
...
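One quick way to spot this kind of mismatch is to compare the column names a layer inherits with the names in the data frame passed to that layer. The sketch below uses the built-in cars data instead of the lumber data. The column name pred and the helper vector inherited are made up for illustration.

```r
# Fit on the built-in cars data and predict over a grid
fit <- lm(speed ~ dist, data = cars)
grid <- data.frame(dist = seq(min(cars$dist), max(cars$dist), length.out = 50))
grid$pred <- predict(fit, newdata = grid)   # deliberately NOT named "speed"

# Columns that a layer would inherit from ggplot(cars, aes(x = dist, y = speed))
inherited <- c(x = "dist", y = "speed")

# Any inherited column missing from the layer's data needs an explicit aes()
missing_cols <- setdiff(inherited, names(grid))
print(missing_cols)   # "speed"

# Optional plot; only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cars, aes(x = dist, y = speed)) +
    geom_point(colour = "grey40") +
    geom_line(data = grid, aes(y = pred), colour = "green")  # y remapped for this layer
}
```

Renaming the prediction column to match the original y variable and remapping y inside the layer are equivalent fixes. The remap avoids having a column of predictions that shares a name with observed data.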
Since you're just plotting the regression curve, you could accomplish the same thing with less code using:
df <- lumber.unemployment.women
# stat_smooth fits the formula to the mapped aesthetics, so it is written in terms of x and y
model <- y ~ scale(x) + I(scale(x)^2)
ggplot(df, aes(x=woman.1980.2000, y=lumber.1980.2000)) +
geom_point(color="red") +
stat_smooth(formula=model, method="lm", se=TRUE, color="green", size=0.8)
Note that se=TRUE draws the confidence band around the regression curve.
Trying to understand how ggplot mapping works…
Consider data table dt with two columns:
group: data grouping variable [a, b, … e]
values: the data [here, N(x,1) where x depends on group]
The following generates a sample dataset.
library(data.table)
set.seed(333)
dt <- data.table(group=rep(letters[1:5],each=20))
dt[,values:=rnorm(100,mean=as.numeric(factor(group)))]
The following generates density plots for each group scaled to (0,1).
ggp <- ggplot(dt) # establish dt as the default dataset
ggp + stat_density(aes(x=values, color=group, y=..scaled..),
geom="line", position="identity")
The following generates density plots with scale changed from (0,1) to (-25,+25).
ggp + stat_density(aes(x=values, color=group, y=-25+50*..scaled..),
geom="line", position="identity")
But the following generates an error:
ggp + stat_density(aes(x=values, color=group, y=min(values)+diff(range(values))*..scaled..),
geom="line", position="identity")
Error in eval(expr, envir, enclos) : object 'values' not found
My question is: why does aes correctly map “values” to dt in x=values, but not in y=… ?
NB: The reason I am trying to do this is to put density plots in the diagonal facets in a scatterplot matrix. And yes, I know there are about 5 different ways to generate scatterplot matrices in ggplot.
Thanks in advance to anyone who can help.
It seems that stat_density() can only see x and y when it evaluates computed aesthetics such as ..scaled.., not the original column names. So if you need to scale by the range of the values variable, write x instead of values, because values is already mapped to x.
ggplot(dt)+stat_density(aes(x=values, color=group, y=min(x)+diff(range(x))*..scaled..),
geom="line", position="identity")
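Another option, sketched below, is to compute the offset and range once outside aes() and refer to those constants inside the mapping, in the same way the -25+50*..scaled.. version uses literal numbers. The names lo and rng are made up for this example, and a plain data.frame stands in for the data.table. This uses the global range of values rather than anything per group.

```r
# Recreate the sample data with a plain data.frame
set.seed(333)
dt <- data.frame(group = rep(letters[1:5], each = 20))
dt$values <- rnorm(100, mean = as.numeric(factor(dt$group)))

# Scaling constants computed up front from the raw column
lo  <- min(dt$values)
rng <- diff(range(dt$values))

# Optional plot; only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # lo and rng are ordinary variables, looked up in the environment where aes() was called
  p <- ggplot(dt) +
    stat_density(aes(x = values, color = group, y = lo + rng * ..scaled..),
                 geom = "line", position = "identity")
}
```

Recent ggplot2 releases prefer after_stat(scaled) over ..scaled.., so the older spelling may emit a deprecation warning when the plot is drawn.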
I'm trying to plot an exponential decay line (with error bars) onto a scatterplot in ggplot of price information over time. I currently have this:
f2 <- ggplot(data, aes(x=date, y=cost) ) +
geom_point(aes(y = cost), colour="red", size=2) +
geom_smooth(se=T, method="lm", formula=y~x) +
# geom_smooth(se=T) +
theme_bw() +
xlab("Time") +
scale_y_log10("Price over time") +
ggtitle("The Falling Price over time") # opts(title=...) in very old ggplot2 versions
print(f2)
The key part is formula=y~x in the geom_smooth call. Although this looks like a linear model, ggplot seems to automatically detect my scale_y_log10 and fit on the log scale.
Now, my issue here is that date is a date data type. I think I need to convert it to seconds since t=0 to be able to apply an exponential decay model of the form y = Ae^-(bx).
I believe this because when I tried things like y = exp(x), I get a message that I think(?) is telling me I can't take exponents of dates. It reads:
Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, :
NA/NaN/Inf in foreign function call (arg 1)
However, log(y) = x works correctly. (y is a numeric data type, x is a date.)
Is there a convenient way to fit exponential growth/decay time series models within ggplot plots in the geom_smooth(formula=formula) function call?
This appears to work, although I don't know how finicky it will be with real/messy data:
set.seed(101)
dat <- data.frame(d=seq.Date(as.Date("2010-01-01"),
as.Date("2010-12-31"),by="1 day"),
y=rnorm(365,mean=exp(5-(1:365)/100),sd=5))
library(ggplot2)
g1 <- ggplot(dat,aes(x=d,y=y))+geom_point()+expand_limits(y=0)
g1+geom_smooth(method="glm",
method.args=list(family=gaussian(link="log"), start=c(5,0))) # ggplot2 >= 2.0 passes fitting arguments via method.args
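If the fitted decay parameters themselves are needed, the same log-link Gaussian model can be fitted directly with glm. The sketch below reuses the simulated data from above. Converting the date to days since the first observation (the column t) is an assumption made here for readability of the coefficients, not something the answer above requires.

```r
# Same simulated series as above
set.seed(101)
dat <- data.frame(d = seq.Date(as.Date("2010-01-01"),
                               as.Date("2010-12-31"), by = "1 day"),
                  y = rnorm(365, mean = exp(5 - (1:365) / 100), sd = 5))

# Elapsed time in days since the first date (hypothetical helper column)
dat$t <- as.numeric(dat$d - min(dat$d))

# y = A * exp(-b * t)  <=>  log(E[y]) = log(A) - b * t
# Starting values are needed because some simulated y values are <= 0
fit <- glm(y ~ t, family = gaussian(link = "log"), data = dat, start = c(5, 0))

logA <- coef(fit)[["(Intercept)"]]   # estimate of log(A)
b    <- -coef(fit)[["t"]]            # estimated decay rate per day
print(c(A = exp(logA), b = b))
```

Unlike lm on log(y), this fits the mean on the original scale, so zero or negative observations do not have to be dropped.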