Long story short, I decided to try a simulation in order to provide some insight into the reproducibility of my data.
However the plot seems pretty awful and I would like to smooth the lines a bit.
The plot is as follows:
Scatter: actual data
Black Line: Simulated means
Red: +/- 2 Standard Deviations
You may try ggplot's geom_smooth().
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  geom_smooth(method = "loess")
However, this code won't give you the standard deviations, as I haven't got sample data resembling your data structure. Still, you could go from here: set geom_smooth(se=FALSE) to get rid of the confidence-interval area and plot a geom_ribbon() (or geom_area()) with your standard deviations instead.
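If your simulation output is a data frame with a simulated mean and standard deviation per x value (a hypothetical layout; substitute your own column names), a sketch of the ribbon-plus-line approach could look like this:

```r
library(ggplot2)

# Hypothetical data: observed points, plus simulated mean and sd per x.
set.seed(1)
sim <- data.frame(x = 1:50)
sim$mean <- sin(sim$x / 8)
sim$sd   <- 0.2
obs <- data.frame(x = rep(1:50, 3),
                  y = sin(rep(1:50, 3) / 8) + rnorm(150, sd = 0.2))

ggplot() +
  geom_point(data = obs, aes(x, y), alpha = 0.4) +          # scatter: actual data
  geom_ribbon(data = sim,
              aes(x, ymin = mean - 2 * sd, ymax = mean + 2 * sd),
              fill = "red", alpha = 0.2) +                  # red: +/- 2 SD band
  geom_line(data = sim, aes(x, mean), colour = "black")     # black: simulated means
```

The ribbon is drawn before the line so the mean stays visible on top of the band.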
Related
I have overlayed violin plots comparing group A and group B scores for a particular section of a survey, facet wrapped by section. The scores are discrete 1-7 values. In some of these violin plots, the smoothing works as expected. In others, one group or the other looks very "wavy" between discrete scores (shown below).
I thought the problem may be a difference in the group sizes, but then surely the "waviness" would appear in all the section plots.
Also, this doesn't explain to me why the plots "dip in" despite being discrete 1-7 values.
When I add the adjust parameter it over-smooths the already smooth sections, so it's not quite ideal.
I use this code to create the plots
create_violin_across_groups_by_section <- function(data, test_group = "first") {
  g <- ggplot(data) +
    aes(x = factor(nrow(data)), y = score, fill = group) +
    geom_violin(alpha = 0.5, position = "identity") +
    facet_wrap("section") +
    labs(
      title = paste("Comparison across groups for", test_group)
    )
  return(g)
}
which results in something like this
In this case, "openness" is oddly wavy while the others all appear smoothed as normal.
I've thought perhaps it has something to do with the x=factor(nrow(data)) but again, surely the waviness would appear in all the section plots.
I would expect either all of the plots to be wavy (though I still wouldn't understand why) or all of them to have the same smoothness.
How can I make all of the facet-wrapped plots have the same smoothness, and why are they different in the first place?
Thanks all
The shape of the violin plot is calculated with a kernel density estimation. Kernel density estimations are designed for continuous data and not for discrete data, like your scores. While you can feed discrete data to the kernel estimator, the result may not always be beautiful or even meaningful. You can try to use different kernel and bw argument values in the geom_violin or you might consider something designed for discrete data, such as geom_dotplot.
+ geom_dotplot(binaxis = "y", stackdir = "center", position = "dodge")
Check out the corresponding example of geom_dotplot https://ggplot2.tidyverse.org/reference/geom_dotplot.html for a preview of how it can look.
Check out the kernel and bw description of the violin plot https://ggplot2.tidyverse.org/reference/geom_violin.html that points to the density function https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/density for further information on how kernel density estimations are calculated.
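For what it's worth, a likely reason the facets differ is that the density bandwidth is estimated separately for each group, so groups with different score distributions get different amounts of smoothing. A minimal sketch with simulated discrete scores (an assumption about your data layout), fixing bw so every violin uses the same smoothing:

```r
library(ggplot2)

# Simulated discrete 1-7 survey scores for two groups (hypothetical layout).
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B"), each = 100),
  score = sample(1:7, 200, replace = TRUE)
)

# A fixed bw makes all violins use the same kernel bandwidth, instead of
# each group's density estimate picking its own and producing uneven waviness.
ggplot(df, aes(group, score, fill = group)) +
  geom_violin(alpha = 0.5, bw = 0.5)
```

Smaller bw values hug the discrete scores more tightly; larger values smooth the dips between them away.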
I created the forecasting plot with the point forecast and confidence interval. However, I only want the point forecast (blue line) without the confidence interval (the grey background). How do I do that? Below is my current code and a screenshot of my plot.
plot(snv.data$mean,main="Forecast for monthly Turnover in Food
Retailing",xlab="Years",ylab="$ million",+ geom_smooth(se=FALSE))
Currently it seems to me that you are trying to mix the base function plot() with the ggplot2 function geom_smooth(). I don't think that is a good idea in this case.
Since you want to use geom_smooth, why not do it all with ggplot2?
Here is how you would do it with ggplot2 (I used the built-in airmiles data as example data):
library(ggplot2)
data <- data.frame(Years = seq(1937, 1960, 1), Miles = airmiles)  # creating a sample dataset
ggplot(data, aes(x = Years, y = Miles)) +
  geom_point() +
  geom_smooth(se = FALSE)
With ggplot you can set options like your x and y variables once and for all in the aes() of your ggplot() call, which is why I didn't need an aes() call for geom_point().
Then I add the smoother function geom_smooth(), with the option se=FALSE to remove the confidence interval.
Code:
require(ggplot2)
set.seed(0)
xvar <- rnorm(100)
ggplot(data.frame(xvar), aes(xvar)) + geom_density(fill="lightblue") + scale_y_log10()
The graph is something like this:
How can I make the graph shade on the right side of (viz. below) the density estimate?
The problem is that stat_density by default fills between the density and the y=0 line of the transformed data. So transformations that alter the y=0 line will fall victim to problems of this sort. I personally think this is a bug in ggplot2, although since graphical grammar experts probably argue that y-transformed densities are meaningless, the bug may not get a lot of attention.
A very kludgy workaround is to manually add an offset to ..density.., which you will have to explicitly invoke, and then change the breaks to make it look like you didn't do anything weird.
require(ggplot2)
require(scales)
set.seed(0)
xvar <- rnorm(100000)
quartz(height = 4, width = 6)
ggplot(data.frame(xvar), aes(x = xvar, y = log10(..density..) + 4)) +
  geom_density(fill = 'lightblue') +
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4),
                     labels = c('0.0001', '0.001', '0.01', '0.1', '1'),
                     limits = c(0, 4), name = 'density')
quartz.save('StackOverflow_29111741_v2.png')
That code produces this graph:
This isn't a ggplot2 or even an R issue but is simply an issue with the tails of a probability distribution being undersampled for your sample sizes. The log axis can go down forever, taking infinitely long to "reach" zero, but no finite sample size can ever hope to cover the increasingly improbable regions of the distribution.
So, to make the plot pretty, you need to both (a) increase the number of points from 100 to 10,000 or higher, while (b) keeping the plot ylims the same. (Otherwise the extra data you draw in your rnorm call will sparsely populate the tails of the gaussian even farther away from the mean, convincing ggplot2 to make automatic y axis limits even lower, in the range of the poorly-sampled tails, and the noisiness that you don't like will return.)
require(ggplot2)
require(scales)
set.seed(0)
xvar <- rnorm(100000)
ggplot(data.frame(xvar), aes(xvar)) +
geom_density(fill="lightblue") +
scale_y_continuous(trans=log10_trans(), limits = c(0.01, 1))
This generates this plot, which I think is what you want.
I'd like to use ggplot2 density geometry using a log transformation for the x scale:
qplot(rating, data=movies, geom="density", log="x")
This, however, produces a chart with probabilities larger than 1. One solution that seems to work is to scale the dataset before calling qplot:
qplot(rating, data = transform(movies, rating = log(rating)), geom = "density")
But then the x axis doesn't look nice. What is the correct way to handle this?
It seems that my question does not, in fact, make sense: it is perfectly fine for probability densities to be larger than one. What matters is that the integral over the entire space equals one.
This gives the right answer.
qplot(rating, y = ..scaled.., data=movies, geom="density", log="x")
stat_density produces new values, one of them is ..scaled.. which is the density scaled from 0 to 1.
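As a side note, in current ggplot2 versions the ..scaled.. spelling is deprecated in favour of after_stat(scaled), and the movies data now lives in the ggplot2movies package. A self-contained sketch with simulated ratings (hypothetical data) using the modern spelling:

```r
library(ggplot2)

# Simulated positive "ratings" standing in for the movies data.
set.seed(0)
df <- data.frame(rating = rlnorm(1000, meanlog = 1.5))

# after_stat(scaled) rescales the density to a 0-1 range, so the log x scale
# no longer makes the peak look like a probability greater than one.
ggplot(df, aes(rating)) +
  geom_density(aes(y = after_stat(scaled))) +
  scale_x_log10()
```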
HTH
I've been trying to create a 3D bar plot based on categorical data, but have not found a way.
It is simple to explain. Consider the following example data (the real example is more complex, but it reduces to this), showing the relative risk of incurring something broken down by income and age, both categorical data.
I want to display this in a 3D bar plot (similar in idea to http://demos.devexpress.com/aspxperiencedemos/NavBar/Images/Charts/ManhattanBar.jpg). I looked at the scatterplot3d package, but it's only for scatter plots and doesn't handle categorical data well. I was able to make a 3d chart, but it shows dots instead of 3d bars. There is no chart type for what I need. I've also tried the rgl package, but no luck either. I've been googling for more than an hour now and haven't found a solution. I have a copy of the ggplot2 - Elegant Graphics for Data Analysis book as well, but ggplot2 doesn't have this kind of chart.
Is there another freeware app I could use? OpenOffice 3.2 doesn't have this chart either.
Thank you for any hints.
Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
I'm not sure how to make a 3d chart in R, but there are other, better ways to represent this data than a 3d bar chart. 3d charts make interpretation difficult, because the heights of the bars are skewed by the 3d perspective. In that example chart, it's hard to tell if Wisconsin in 2004 is really higher than Wisconsin 2001, or if that's an effect of the perspective. And if it is higher, how much so?
Since both Age and Income have meaningful orders, it wouldn't be awful to make a line graph. ggplot2 code:
ggplot(data, aes(Age, Risk, color = Income)) +
  geom_line(aes(group = Income))
Or, you could make a heatmap.
ggplot(data, aes(Age, Income, fill = Risk)) +
geom_tile()
As the others suggested, there are better ways to present this, but this should get you started if you want something similar to what you had.
df <- read.csv(textConnection("Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
"))
df$Age <- ordered(df$Age, levels=c('young', 'adult', 'old'))
df$Income <- ordered(df$Income, levels=c('low', 'medium', 'high'))
library(rgl)
plot3d(Risk ~ Age|Income, type='h', lwd=10, col=rainbow(3))
This will just produce flat rectangles. For an example to create nice looking bars, see demo(hist3d).
You can find a starting point here but you need to add in more lines and some rectangles to get a plot like you posted.
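As a rough sketch of that cube-based approach (the bar3d helper below is hypothetical, built from rgl's cube3d, scale3d, translate3d, and shade3d):

```r
library(rgl)

# Hypothetical helper: draw one solid 3D bar of height h at grid position (x, y).
# cube3d() is a unit cube centered at the origin, so we scale it to the bar's
# dimensions and translate it so its base sits on the z = 0 plane.
bar3d <- function(x, y, h, width = 0.8, col = "steelblue") {
  bar <- scale3d(cube3d(col = col), width / 2, width / 2, h / 2)
  shade3d(translate3d(bar, x, y, h / 2))
}

# The Risk values from the question, as a young/adult/old by high/medium/low grid.
risk <- matrix(c(1, 1.2,  1.36,
                 1, 1.12, 1.23,
                 1, 1.03, 1.11),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("young", "adult", "old"),
                               c("high", "medium", "low")))

open3d()
for (i in seq_len(nrow(risk)))
  for (j in seq_len(ncol(risk)))
    bar3d(i, j, risk[i, j], col = rainbow(3)[j])
axes3d()
```

You would still need to replace the numeric axis ticks with the category labels (e.g. via axis3d) to fully match the Manhattan-style chart in the link.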