Geom_area plot doesn't fill the area between the lines - r

I want to make an area plot with ggplot(mpg, aes(x=year,y=hwy, fill=manufacturer)) + geom_area(), but I get this:
I'm really new to the R world. Can anyone explain why it does not fill the area between the lines? Thanks!

First of all, there's nothing wrong with your code. It works as intended, and the syntax is correct for what you want to do.
Why doesn't the area geom plot correctly, then? The simple answer is that the data don't give each manufacturer a single, well-defined line across the x values: mpg contains only two distinct years (1999 and 2008), and most manufacturers have several hwy values in each of them. Try the geom_point plot and you'll see what I mean:
ggplot(mpg, aes(x=year,y=hwy)) + geom_point(aes(color=manufacturer))
You need a different dataset. Here's a dummy one that is simply two lines with different slopes. It works as expected because each group (z) has exactly one y value at every x:
# dummy dataset
df <- data.frame(
x=rep(1:10,2),
y=c(seq(1,10,length.out=10), seq(1,5,length.out=10)),
z=c(rep('A',10), rep('B', 10))
)
# plot
ggplot(df, aes(x,y)) + geom_area(aes(fill=z))
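If you would rather keep the mpg data, one option is to collapse it to a single hwy value per manufacturer and year before plotting. This is only a minimal sketch: taking the mean is an assumption about what you want to show, and the names mpg_mean and p_area are made up for the example.

```r
library(ggplot2)

# One row per (year, manufacturer): the mean highway mileage.
# Using the mean is an assumption; any single-value summary would do.
mpg_mean <- aggregate(hwy ~ year + manufacturer, data = mpg, FUN = mean)

# geom_area stacks the groups by default, so each band is one manufacturer.
p_area <- ggplot(mpg_mean, aes(x = year, y = hwy, fill = manufacturer)) +
  geom_area()
p_area
```

With only two years in mpg the bands are straight-edged, but the area between the lines is filled.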

Related

Plot with two different x axis for the same variable in R

I am trying to create a plot that displays a line with two x axes: one continuous numeric and the other discrete.
This is an example of the data:
df <-cbind.data.frame("Category"=c("A","A","A","A","A","B","B","B","B","B"),
"Y"=c(5,6,4,8,9,4,5,3,7,8),
"X1"=c(0,10,20,30,40,0,10,20,30,40),
"X2"=c(0,0,1,1,2,0,1,2,2,3))
I tried to add a secondary axis and re-scale it, but since my two variables are not proportional, I don't know how to re-scale so that the same Y point on the line fits both x axes.
ggplot(data=df) +
geom_path(aes(y=Y,x=X1),color="red")+
geom_path(aes(y=Y,x=X2*10),color="blue")+
facet_wrap(~Category)+
scale_y_continuous("Y")+
scale_x_continuous("X1",sec.axis = sec_axis(~ .*1/10, "X2"))
I read about different problems with two axes, but was not able to find a solution to mine.
I am looking for something like this:
I would greatly appreciate any help with this!
The plot you provide does not evidence a clear algebraic relationship, so I'm going to give you an example of a completely-arbitrary second x-axis.
library(ggplot2)
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
scale_x_continuous(sec.axis=sec_axis(~., breaks=c(15,20,30), labels=c('a','b','c')))
The first argument is the transformation "~." (essentially x2=x1) and is required, so in this case it's a 1-for-1 transformation. The other two are relatively clear, you place 'a' at x=15, 'b' at x=20, etc. I don't think there's a way to put both on the same axis (with ggplot2 alone).
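The same idea can be applied to the data from the question by placing a label at each X1 position that shows the matching X2 value. This is a sketch for category A only, since the X1-to-X2 mapping differs between the two categories and a single secondary axis cannot show both. The names df_axes, df_a and p_axes are invented for the example.

```r
library(ggplot2)

# Data copied from the question.
df_axes <- data.frame(
  Category = rep(c("A", "B"), each = 5),
  Y  = c(5, 6, 4, 8, 9, 4, 5, 3, 7, 8),
  X1 = rep(c(0, 10, 20, 30, 40), 2),
  X2 = c(0, 0, 1, 1, 2, 0, 1, 2, 2, 3)
)
df_a <- subset(df_axes, Category == "A")

# The secondary axis uses the identity transformation (~ .) and is only
# relabelled: each X1 break carries the X2 value from the same row.
p_axes <- ggplot(df_a, aes(x = X1, y = Y)) +
  geom_path(colour = "red") +
  scale_x_continuous(
    "X1",
    sec.axis = sec_axis(~ ., name = "X2",
                        breaks = df_a$X1,
                        labels = as.character(df_a$X2))
  )
p_axes
```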

Wrong density values in a histogram with `fill` option in `ggplot2`

I was creating histograms with ggplot2 in R whose bins are separated with colors and noticed one thing. When the bins of a histogram are separated by colors with fill option, the density value of the histogram turns funny.
Here is the data.
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
This is a histogram without fill.
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..))
This is a histogram with fill.
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(aes(y=..density..))
You can see the latter is pretty crazy. The left side of the bins is sticking out. The density values of the bins of each color are obviously wrong.
I thought this issue over for a while. The data can't be wrong, since the first histogram looked normal. It must be something in ggplot2 or the geom_histogram function. I googled "geom_histogram density fill" and couldn't find much help.
I want the end product to look like:
Separated by colors as you see in the second histogram
Size and shape identical to the first histogram
The vertical axis being density
How would you deal with this issue?
I think what you may want is this:
ggplot(df, aes(x = x, fill=b)) +
geom_histogram()
That is, plot the counts rather than the density. As mentioned above, the density requires extra calculations.
One thing that is important (in my opinion) is that histograms are graphs of one variable. As soon as you start adding data from other variables you start to change them more into bar charts or something else like that.
You will want to work on setting the axis manually if you want it to range from 0 to 0.4.
The solution is to hand-compute density like this (instead of using the built-in ggplot2 version):
library(ggplot2)
# Generate test data
set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)
ggplot(df, aes(x = x, fill=b)) +
geom_histogram(mapping = aes(y = ..count.. / (sum(..count..) * ..width..)))
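The `..count..` notation is deprecated in recent ggplot2 releases. Below is a sketch of the same computation written with after_stat(), assuming ggplot2 3.3.0 or later. Setting bins = 30 only makes the default explicit, and df_hist and p_density are names invented for the example.

```r
library(ggplot2)

set.seed(42)
df_hist <- data.frame(x = rnorm(10000, 0, 1))
df_hist$b <- df_hist$x > 1

# count / (total count * bin width) divides every bar by the same constant,
# so the stacked bars keep the shape of the unfilled density histogram.
p_density <- ggplot(df_hist, aes(x = x, fill = b)) +
  geom_histogram(
    aes(y = after_stat(count / (sum(count) * width))),
    bins = 30
  )
p_density
```

This relies on sum(count) being taken over the bins of both fill groups, which is what makes the total area 1 instead of 1 per group.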
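When you provide a column name for the fill aesthetic in ggplot, it groups the observations by that column and plots each group in its own color. The density is then computed separately for each group, which is why the filled histogram looks different.
If you want a single color for the plot, just specify the color you want: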
FIXED
ggplot(df, aes(x = x)) +
geom_histogram(aes(y=..density..),fill="Blue")

ggplot geom_histogram color by factor not working properly

I'm trying to color my stacked histogram according to a factor column, but all the bars have a "green" roof. I want the top of each bar to be the same color as the bar itself. The figure below shows clearly what is wrong: every bar has a green horizontal line at the top.
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c("80","10","5","5")
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x=data$BodyLength, color = factor(data$color), fill=I("transparent")))
plot <- plot + geom_histogram()
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, is there any way to avoid specifying them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now. Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
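Another option, when the column already holds valid R color names, is an identity scale, which uses the data values directly as the colors. This is a sketch assuming ggplot2's scale_colour_identity(). The names hist_data and p_identity and the seed are chosen only for illustration.

```r
library(ggplot2)

set.seed(1)
BodyLength <- rnorm(100, mean = 50, sd = 3)
color <- rep(c("black", "blue", "red", "green"), times = c(80, 10, 5, 5))
hist_data <- data.frame(BodyLength, color)

# scale_colour_identity() takes each value of `color` as the literal outline
# color, so no separate vector of values has to be written out.
p_identity <- ggplot(hist_data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = "transparent", position = "identity", bins = 30) +
  scale_colour_identity()
p_identity
```

Note that identity scales do not draw a legend by default.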
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
...or position="dodge". The identity option is closer to what you want, but since green is the last group drawn, it overwrites the previous colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)

ggplot: How to increase space between axis labels for categorical data?

I love ggplot, but find it hard to customize some elements such as X axis labels and grid lines. The title of the question says it all, but here's a reproducible example to go with it:
Reproducible example
library(ggplot2)
library(dplyr)
library(data.table)  # provides data.table() and melt(), used below
# Make a dataset
set.seed(123)
x1 <- c('2015_46','2015_47','2015_48','2015_49'
,'2015_50','2015_51','2015_52','2016_01',
'2016_02','2016_03')
y1 <- runif(10,0.0,1.0)
y2 <- runif(10,0.5,2.0)
# Make the dataset ggplot friendly
df_wide <- data.table(x1, y1, y2)
df_long <- melt(df_wide, id = 'x1')
# Plot it
p <- ggplot(df_long, aes(x=x1,
y=value,
group=variable,
colour=variable )) + geom_line(size=1)
plot(p)
# Now, plot the same thing with the same lines and numbers,
# but with increased space between x-axis labels
# and / or space between x-axis grid lines.
Plot1
The plot looks like this, and doesn't look too bad in its current form:
Plot2
The problem occurs when the dataset gets bigger, and the labels on the x-axis start overlapping each other like this:
What I've tried so far:
I've made several attempts using scale_x_discrete as suggested here, but I've had no luck so far. What really bugs me is that I saw some tutorial about these things a while back, but despite two days of intense googling I just can't find it. I'm going to update this section when I try new things.
I'm looking forward to your suggestions!
As mentioned above, assuming that x1 represents a year_day, ggplot provides sensible defaults for date scales.
First make x1 into a valid date format, then plot as you already did:
# as.Date() returns a Date column, which ggplot2's date scales understand
df_long$x1 <- as.Date(as.character(df_long$x1), format="%Y_%j")
ggplot(df_long, aes(x=x1, y=value, group=variable, colour=variable)) +
geom_line(size=1)
The plot looks a little odd because of the disconnected time series, but scale_x_date() provides an easy way to customize the axis:
http://docs.ggplot2.org/current/scale_date.html
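As a rough illustration of that customization, here is a self-contained sketch that parses the labels as year and day of year (the same assumption as above) and sets the break spacing and label format explicitly. The names labels_raw, df_dates and p_dates are invented here, and the "2 months" spacing is just one plausible choice.

```r
library(ggplot2)

set.seed(123)
labels_raw <- c("2015_46", "2015_47", "2015_48", "2015_49", "2015_50",
                "2015_51", "2015_52", "2016_01", "2016_02", "2016_03")
df_dates <- data.frame(
  # %j is the day of the year, following the assumption made above
  x1 = as.Date(labels_raw, format = "%Y_%j"),
  value = runif(10, 0, 1)
)

# date_breaks controls how far apart the labels are;
# date_labels controls how each label is printed.
p_dates <- ggplot(df_dates, aes(x = x1, y = value)) +
  geom_line() +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y")
p_dates
```

Widening date_breaks is what increases the space between the axis labels as the dataset grows.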

Problems making a graphic in ggplot

I am working with ggplot and want to design a plot of two continuous variables that looks like this:
Here x and y are the continuous variables. My problem is that I can't get circles to show on the line of the plot. I would like a circle for each pair of observations from the continuous variables. For example, the attached graphic has a circle for the pairs (1,1), (2,2) and (3,3). Is it possible to do this? (The colour of the line doesn't matter.)
# dummy data
dat <- data.frame(x = 1:5, y = 1:5)
ggplot(dat, aes(x,y,color=x)) +
geom_line(size=3) +
geom_point(size=10) +
scale_colour_continuous(low="blue",high="red")
Playing with low/high will change the colours.
In general, to remove the legend, use + theme(legend.position="none")
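Put together, a version of the plot without the legend could look like the sketch below. The name p_no_legend is made up, and the point size is arbitrary.

```r
library(ggplot2)

dat <- data.frame(x = 1:5, y = 1:5)

# Line plus one large point per observation; the colour legend is hidden.
p_no_legend <- ggplot(dat, aes(x, y, colour = x)) +
  geom_line() +
  geom_point(size = 5) +
  scale_colour_continuous(low = "blue", high = "red") +
  theme(legend.position = "none")
p_no_legend
```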
