Barplot Error in R using ggplot - r

I am trying to make a bar plot of a time-series dataset with ggplot2, but I get the following error message (I have done the same with a similar dataset and it worked):
Error in if (!is.null(data$ymin) && !all(data$ymin == 0)) warning("Stacking not well defined when ymin != 0", : missing value where TRUE/FALSE needed
I used the following code:
p <- ggplot(dataset, aes(x=date, y=value)) + geom_bar(stat="identity")
If I use geom_point() instead of geom_bar() it works fine.

You haven't provided a reproducible example, so I'm just guessing, but your syntax doesn't look right to me. Check here: http://docs.ggplot2.org/current/geom_bar.html
Bar charts by default produce tabulations of counts:
p <- ggplot( dataset, aes( factor(date) ) ) + geom_bar()
If you want it to do something different, you'll need to tell it what statistic to use. See the link above (towards the bottom) for an example using the mean. Alternatively, see here for a hybrid point/scatterplot (very bottom of the page):
http://docs.ggplot2.org/current/position_jitter.html
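A minimal sketch of the "tell it what statistic to use" idea. It computes the statistic up front in base R instead of inside ggplot2, which is a swapped-in approach and not the one on the linked page. geom_bar() on its own counts rows per x value, while a bar showing the mean needs the mean first. The readings data frame is hypothetical.

```r
# Hypothetical data: two readings per date
readings <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  value = c(2, 4, 6, 10)
)

# What geom_bar() with no y would show: the number of rows per date
counts <- table(readings$date)

# A bar of the mean needs the statistic computed first, one row per date
means <- aggregate(value ~ date, data = readings, FUN = mean)

# Plot only if ggplot2 is installed; geom_col() draws y exactly as given
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(means, aes(x = date, y = value)) + geom_col()
}
```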
But fundamentally you have two continuous variables and it's not clear to me why you'd want anything but a scatterplot.
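The error message itself hints at one likely cause. This is a minimal base R sketch, assuming (the question has no data) that some entries of value are NA. For a bar drawn with stat = "identity", ymin is taken here as pmin(value, 0), which is an assumption about ggplot2's internals. A missing value then gives a missing ymin, the all() inside the condition returns NA, and if() stops on NA. The data frame is a hypothetical stand-in for dataset.

```r
# Hypothetical stand-in for `dataset` with one missing value
dataset <- data.frame(
  date  = as.Date("2020-01-01") + 0:3,
  value = c(5, NA, 3, 8)
)

# Assumed: bars drawn from 0 to `value` get ymin = pmin(value, 0),
# so the missing value becomes a missing ymin
ymin <- pmin(dataset$value, 0)

# The same test as in the error message: all() over a vector with NA is NA
cond <- !is.null(ymin) && !all(ymin == 0)
print(cond)  # NA

# if () cannot branch on NA, which reproduces the reported message
err <- tryCatch(if (cond) "stacking" else "fine", error = conditionMessage)
print(err)

# Dropping incomplete rows before plotting removes the NA
clean <- dataset[complete.cases(dataset), ]
```

With the NA rows gone, ggplot(clean, aes(x = date, y = value)) + geom_col() should draw the bars. geom_col() is the shorthand for geom_bar(stat = "identity") in recent ggplot2 versions.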

Related

ordering function seq() in R with the order of input value

My apologies for my bad English, I'm a student from France.
I have a little problem with a function in R. I have a data frame like this:
https://imgur.com/G5ToQrL
With this code:
testtransect2$TOTAL<-testtransect2$TOTAL*-1
plot(testtransect2$DECA,testtransect2$TOTAL,asp = 1)
xl <- seq(min(testtransect2$DECA),max(testtransect2$DECA), (max(testtransect2$DECA)-min(testtransect2$DECA))/1000)
lines(xl, predict(loess(testtransect2$TOTAL~testtransect2$DECA,span = 0.25), newdata=xl))
I want to create a plot with a smooth line that passes through all the points in the order of the data frame, but when I add my line using xl and predict, the plot is not what I want:
https://imgur.com/cSlhNtV
Here is a plot showing what I want:
https://imgur.com/mnVgvQ7
I think the problem is the order of my xl values, but I can't fix it. Thanks in advance if you have a solution!
You can use ggplot2.
Store your data frame in df:
df <- data.frame(DECA=c(0,10,15,-23,15,40,90,140,190,250,310,370,420),
TOTAL=c(0,-9,-15,-31.5,-48,-50,-44,-24,-17,-10,-6,-5,0))
You are interested in geom_point and geom_line. Because df is passed as the data, you can refer to the DECA and TOTAL columns by name inside aes, like this:
library(ggplot2)
ggplot(df, aes(x=DECA, y=TOTAL)) +
geom_line() + geom_point()
Yielding:
[image: resulting plot]
The part of your question about drawing the line with xl and predict is unfortunately unclear to me; please rephrase it if this solution does not work for you.
Update:
Other smoothed lines can be added, e.g. with geom_smooth. Is this what you are looking for?
ggplot(df, aes(x=DECA, y=TOTAL)) +
geom_line() + geom_point() +
geom_smooth(se = FALSE, method = lm, col = "red") + # linear model
geom_smooth(se = FALSE, col = "green") # loess (the default for fewer than 1,000 points)
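A note on order: geom_line() connects points in order of x, so it cannot follow this data in row order (DECA goes from 15 back to -23). geom_path() connects rows in the order they appear. For a smooth curve through every point in row order, one option (a swapped-in technique, not from either post) is parametric interpolation: interpolate DECA and TOTAL separately against the row position with base R's spline(), then draw the resulting (x, y) pairs. A minimal sketch using the df above:

```r
df <- data.frame(DECA  = c(0, 10, 15, -23, 15, 40, 90, 140, 190, 250, 310, 370, 420),
                 TOTAL = c(0, -9, -15, -31.5, -48, -50, -44, -24, -17, -10, -6, -5, 0))

# Parameter: the row position, so the curve visits the points in data-frame order
pos <- seq_len(nrow(df))

# Interpolating cubic splines of x(pos) and y(pos); n = 121 places a sample
# on every integer position (12 intervals x 10 steps + 1)
sx <- spline(pos, df$DECA,  n = 121)
sy <- spline(pos, df$TOTAL, n = 121)
curve_df <- data.frame(DECA = sx$y, TOTAL = sy$y)

# Base graphics: the points plus the smooth path, keeping the 1:1 aspect ratio
plot(df$DECA, df$TOTAL, asp = 1)
lines(curve_df$DECA, curve_df$TOTAL)
```

With ggplot2, the same curve can be overlaid as ggplot(df, aes(DECA, TOTAL)) + geom_point() + geom_path(data = curve_df) + coord_fixed(); geom_path() keeps curve_df's row order.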

geom_smooth does not plot line of best fit

I hope this question isn't a duplicate. I tried to find answers per the site's requirements before posting, but since I am so new, the help forums are too foreign to me.
Following the data visualization chapter of Wickham's R for Data Science, I easily used geom_smooth and geom_point with a built-in data set, mpg:
simple reference code:
ggplot(data = mpg)+
geom_smooth(mapping = aes(x=displ, y=hwy))+
geom_point(mapping = aes(x=displ, y=hwy))
Excited by this cool plot, I tried to do the same with some personal research data, which describe interferon-beta production over five time points (labelled A, b, c, d, e rather than numeric values).
I used the same code, essentially:
ggplot(data = ifnonly)+
geom_smooth(mapping = aes(x=HOURS, y=IFNB))+
geom_point(mapping = aes(x=HOURS, y=IFNB))
Unfortunately, the line does not display. In fact, nothing displays until I add the geom_point function. What am I missing here? Is there more complex code required or is there some subtlety that I can apply to future uses of this function and ggplot?
I think you should get your desired output with the following one-line code:
library(ggplot2)
# data is mtcars, with disp on the x axis and mpg on the y axis
ggplot(mtcars, aes(disp, mpg)) + geom_smooth()
[image: output without geom_point]
library(ggplot2)
ggplot(mtcars, aes(disp,mpg))+geom_smooth()+geom_point()
[image: output with geom_point]
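The example above uses a numeric x. In the question, HOURS holds letters (A, b, c, d, e), so ggplot2 treats x as discrete, and by default it groups by the interaction of all discrete variables. Each group then likely holds a single time point, leaving geom_smooth() nothing to fit. A minimal sketch, assuming the letters are ordered time points: convert them to numbers first. The IFNB values below are made up for illustration.

```r
# Hypothetical stand-in for `ifnonly`: two replicates per time point
ifnonly <- data.frame(
  HOURS = rep(c("A", "b", "c", "d", "e"), each = 2),
  IFNB  = c(1, 1.2, 2.1, 2.4, 3.8, 4.1, 3.5, 3.2, 2.0, 1.8)
)

# Fix the level order explicitly, then use the level positions as a numeric time axis
ifnonly$HOURS <- factor(ifnonly$HOURS, levels = c("A", "b", "c", "d", "e"))
ifnonly$TIME  <- as.integer(ifnonly$HOURS)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ifnonly, aes(x = TIME, y = IFNB)) +
    geom_point() +
    # a quadratic fit; with only five distinct times the default loess may struggle
    geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
    # keep the original labels on the now numeric axis
    scale_x_continuous(breaks = 1:5, labels = levels(ifnonly$HOURS))
}
```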

addressing `data` in `geom_line` of ggplot

I am building a bar plot with a line connecting two bars, to show that the asterisk refers to the difference between them:
Most of the plot is built correctly with the following code:
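library(ggplot2)
library(scales) # provides percent(), used in scale_y_continuous below
mytbl <- data.frame(
"var" =c("test", "control"),
"mean1" =c(0.019, 0.022),
"sderr"= c(0.001, 0.002)
)
# set the level order explicitly; otherwise it is alphabetical (i.e. 'control', then 'test')
mytbl$var <- factor(mytbl$var, levels = c("test", "control"))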
p <-
ggplot(mytbl, aes(x=var, y=mean1)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=mean1-sderr, ymax=mean1+sderr), width=.2)+
scale_y_continuous(labels=percent, expand=c(0,0), limits=c(NA, 1.3*max(mytbl$mean1+mytbl$sderr))) +
geom_text(mapping=aes(x=1.5, y= max(mean1+sderr)+0.005), label='*', size=10)
p
The only thing missing is the line itself. In my very old code, it was supposedly working with the following:
p +
geom_line(
mapping=aes(x=c(1,1,2,2),
y=c(mean1[1]+sderr[1]+0.001,
max(mean1+sderr) +0.004,
max(mean1+sderr) +0.004,
mean1[2]+sderr[2]+0.001)
)
)
But when I run this code now, I get an error: Error: Aesthetics must be either length 1 or the same as the data (2): x, y. By trying different things, I arrived at an awkward workaround: adding data=rbind(mytbl,mytbl), before mapping. But I don't understand what actually happens here.
P.S. An additional small question (I know I should ask it in a separate SO post, sorry): why can't I address columns directly in scale_y_continuous(..., limits=), so that I have to write mytbl$ explicitly?
Just put all that in a separate data frame:
line_data <- data.frame(x=c(1,1,2,2),
y=with(mytbl,c(mean1[1]+sderr[1]+0.001,
max(mean1+sderr) +0.004,
max(mean1+sderr) +0.004,
mean1[2]+sderr[2]+0.001)))
p + geom_line(data = line_data,aes(x = x,y = y))
In general, you should avoid using things like [ and $ when you map aesthetics inside of aes(). The intended way to use ggplot2 is usually to adjust your data into a format such that each column is exactly what you want plotted already.
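This also explains the error and the rbind() workaround. A layer evaluates its aesthetics against its own data, which by default is the two-row mytbl, so a length-4 x cannot be matched to it. Doubling the rows to four makes the lengths agree, and since mean1[1], mean1[2] and max(...) give the same results on the doubled data, the bracket comes out right. A small base R check of the lengths and of the bracket coordinates:

```r
mytbl <- data.frame(var = c("test", "control"),
                    mean1 = c(0.019, 0.022),
                    sderr = c(0.001, 0.002))

# Four bracket vertices: up from bar 1, across, down to bar 2
bracket_x <- c(1, 1, 2, 2)
top <- max(mytbl$mean1 + mytbl$sderr) + 0.004
bracket_y <- c(mytbl$mean1[1] + mytbl$sderr[1] + 0.001, top, top,
               mytbl$mean1[2] + mytbl$sderr[2] + 0.001)

# The layer data has 2 rows but the mapping has 4 values: that mismatch is
# what "Aesthetics must be either length 1 or the same as the data" reports
length(bracket_x) == nrow(mytbl)                # FALSE
length(bracket_x) == nrow(rbind(mytbl, mytbl))  # TRUE, why the workaround ran

# Precomputed columns, ready to be mapped directly
line_data <- data.frame(x = bracket_x, y = bracket_y)
```

With p from the question, p + geom_path(data = line_data, aes(x = x, y = y)) is another option: geom_path() joins the rows in the order given, whereas geom_line() sorts them by x first.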
You can't reference variables in mytbl in the scale_* functions because that data environment isn't passed along to them the way it is to layers. Scales are handled separately from the data layers, so the information they need is generally assumed to live somewhere separate from the data you are plotting.
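In practice this means computing the limit before building the plot. A minimal sketch, where upper_limit is a hypothetical name; with() still lets the expression refer to the columns by name:

```r
mytbl <- data.frame(var = c("test", "control"),
                    mean1 = c(0.019, 0.022),
                    sderr = c(0.001, 0.002))

# Scales never see the layer data, so evaluate the expression beforehand
upper_limit <- with(mytbl, 1.3 * max(mean1 + sderr))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mytbl, aes(x = var, y = mean1)) +
    geom_col() +
    scale_y_continuous(limits = c(NA, upper_limit), expand = c(0, 0))
}
```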

Smooth Error in qplot from ggplot2

I have some data that I am trying to plot faceted by its Type with a smooth (Loess, LM, whatever) superimposed. Generation code is below:
testFrame <- data.frame(Time=sample(20:60,50,replace=T),Dollars=round(runif(50,0,6)),Type=sample(c("First","Second","Third","Fourth"),50,replace=T,prob=c(.33,.01,.33,.33)))
I have no problem either making a faceted plot or plotting the smooth, but I cannot do both. The first three lines of code below work fine; the fourth line is where I have trouble:
qplot(Time,Dollars,data=testFrame,colour=Type)
qplot(Time,Dollars,data=testFrame,colour=Type) + geom_smooth()
qplot(Time,Dollars,data=testFrame) + facet_wrap(~Type)
qplot(Time,Dollars,data=testFrame) + facet_wrap(~Type) + geom_smooth()
It gives the following error:
Error in `[<-.data.frame`(`*tmp*`, var, value = list(`NA` = NULL)) :
missing values are not allowed in subscripted assignments of data frames
What am I missing to overlay a smooth in a faceted plot? I could have sworn I had done this before, possibly even with the same data.
It works for me. Are you sure you have the latest version of ggplot2?
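A hedged guess at the cause: Type is sampled with prob = .01 for "Second", so that facet usually gets zero or one row, and a smoother cannot be fitted to a single point; an older ggplot2 may have failed on that panel instead of skipping it. The sketch below checks the group sizes first and smooths only the well-populated types. It uses ggplot() because qplot() has since been deprecated. The set.seed() call and the cutoff of 5 rows are additions for illustration, not from the question.

```r
set.seed(1)  # added for reproducibility; not in the question
testFrame <- data.frame(
  Time    = sample(20:60, 50, replace = TRUE),
  Dollars = round(runif(50, 0, 6)),
  Type    = sample(c("First", "Second", "Third", "Fourth"), 50,
                   replace = TRUE, prob = c(.33, .01, .33, .33))
)

# Rows per facet; a smoother needs several points in each panel
sizes <- table(testFrame$Type)
print(sizes)

# Keep only the types with enough rows to smooth (the cutoff 5 is arbitrary)
keep <- names(sizes)[sizes >= 5]
smoothable <- testFrame[testFrame$Type %in% keep, ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(testFrame, aes(Time, Dollars)) +
    geom_point() +
    geom_smooth(data = smoothable) +  # smooth only where there is enough data
    facet_wrap(~Type)
}
```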

plotting two vectors of data on a GGPLOT2 scatter plot using R

I've been experimenting with both ggplot2 and lattice to graph panels of data. I'm having a little trouble wrapping my mind around the ggplot2 model. In particular, how do I plot a scatter plot with two sets of data on each panel:
In lattice I can do this:
xyplot(Predicted_value + Actual_value ~ x_value | State_CD, data=dd)
and that gives me a panel for each State_CD, with both columns plotted against x_value.
I can do one column with ggplot2:
pg <- ggplot(dd, aes(x_value, Predicted_value)) + geom_point(shape = 2) +
facet_wrap(~ State_CD) + theme(aspect.ratio = 1)
print(pg)
What I can't grok is how to add Actual_value to the ggplot above.
EDIT Hadley pointed out that this really would be easier with a reproducible example. Here's code that seems to work. Is there a better or more concise way to do this with ggplot? Why is the syntax for adding another set of points to ggplot so different from adding the first set of data?
library(lattice)
library(ggplot2)
#make some example data
dd<-data.frame(matrix(rnorm(108),36,3),c(rep("A",12),rep("B",12),rep("C",12))) # 12 rows per state, to match the 36 matrix rows
colnames(dd) <- c("Predicted_value", "Actual_value", "x_value", "State_CD")
#plot with lattice
xyplot(Predicted_value + Actual_value ~ x_value | State_CD, data=dd)
#plot with ggplot
pg <- ggplot(dd, aes(x_value, Predicted_value)) + geom_point(shape = 2) + facet_wrap(~ State_CD) + theme(aspect.ratio = 1)
print(pg)
pg + geom_point(data=dd,aes(x_value, Actual_value,group=State_CD), colour="green")
The lattice output looks like this:
[image: lattice plot (source: cerebralmastication.com)]
and ggplot looks like this:
[image: ggplot2 plot (source: cerebralmastication.com)]
Just following up on what Ian suggested: for ggplot2 you really want all the y-axis stuff in one column with another column as a factor indicating how you want to decorate it. It is easy to do this with melt. To wit:
library(reshape2) # provides melt()
qplot(x_value, value,
data = melt(dd, measure.vars=c("Predicted_value", "Actual_value")),
colour=variable) + facet_wrap(~State_CD)
Here's what it looks like for me:
[image: faceted ggplot2 plot coloured by variable (source: princeton.edu)]
To get an idea of what melt is actually doing, here's the head:
> head(melt(dd, measure.vars=c("Predicted_value", "Actual_value")))
x_value State_CD variable value
1 1.2898779 A Predicted_value 1.0913712
2 0.1077710 A Predicted_value -2.2337188
3 -0.9430190 A Predicted_value 1.1409515
4 0.3698614 A Predicted_value -1.8260033
5 -0.3949606 A Predicted_value -0.3102753
6 -0.1275037 A Predicted_value -1.2945864
You see, it "melts" Predicted_value and Actual_value into one column called value and adds another column called variable letting you know what column it originally came from.
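If reshape2 is not at hand, the same long format can be built in base R by stacking the two columns. This sketch assumes dd as in the question (regenerated here with 12 rows per state); the column names variable and value mirror melt()'s output.

```r
set.seed(42)
dd <- data.frame(matrix(rnorm(108), 36, 3),
                 rep(c("A", "B", "C"), each = 12))
colnames(dd) <- c("Predicted_value", "Actual_value", "x_value", "State_CD")

# One block of rows per measured column; the id columns are repeated in each block
measure <- c("Predicted_value", "Actual_value")
long <- do.call(rbind, lapply(measure, function(v) {
  data.frame(x_value  = dd$x_value,
             State_CD = dd$State_CD,
             variable = v,
             value    = dd[[v]])
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x_value, value, colour = variable)) +
    geom_point() +
    facet_wrap(~State_CD) +
    theme(aspect.ratio = 1)
}
```

With tidyr 1.0 or later, the reshaping step can instead be written as pivot_longer(dd, c(Predicted_value, Actual_value), names_to = "variable", values_to = "value").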
Update: several years on now, I almost always use Jonathan's method (via the tidyr package) with ggplot2. My answer below works in a pinch, but gets tedious fast when you have 3+ variables.
I'm sure Hadley will have a better answer, but the syntax is different because the ggplot(dd, aes()) form is (I think) primarily intended for plotting a single y variable. For two, I would use:
ggplot() +
geom_point(data=dd, aes(x_value, Actual_value, group=State_CD), colour="green") +
geom_point(data=dd, aes(x_value, Predicted_value, group=State_CD), shape = 2) +
facet_wrap(~ State_CD) +
theme(aspect.ratio = 1)
Pulling the first set of points out of the ggplot() gives it the same syntax as the second. I find this easier to deal with because the syntax is the same and it emphasizes the "Grammar of Graphics" that is at the core of ggplot2.
You might just want to change the shape of your data a little, so that you have one y-axis variable, with an additional factor variable indicating whether each value is predicted or actual.
Is this something like what you are trying to do?
dd<-data.frame(type=rep(c("Predicted_value","Actual_value"),20),y_value=rnorm(40),
x_value=rnorm(40),State_CD=rnorm(40)>0)
qplot(x_value,y_value,data=dd,colour=type,facets=.~State_CD)
Well, after posting the question I ran across this R-help thread, which may have helped me. It looks like I can do this:
pg + geom_line(data=dd,aes(x_value, Actual_value,group=State_CD), colour="green")
Is that a good way of doing things? It seems odd to me, because adding the second item uses a totally different syntax from the first.
