I have a pretty straightforward dataset consisting of a week of two totals in groups, which I'm displaying in an identity bar plot using ggplot2 (version 3.3.0).
library(ggplot2)
library(lubridate)
weeksummary <- data.frame(
Date = rep(as.POSIXct("2020-01-01") + days(0:6), 2),
Total = rpois(14, 30),
Group = c(rep("group1", 7), rep("group2", 7))
)
ggplot(data = weeksummary, mapping = aes(x = Date, y = Total, fill = Group)) +
geom_col(position = "dodge") +
geom_text(aes(label = Total), position = position_dodge(width = 0.9), size = 3)
I cannot for the life of me get this to put the numbers at the top of their own bars. I've been hunting around for an answer and trying everything I found with no luck, until I randomly tried this:
weeksummary$Date <- as.factor(weeksummary$Date)
But this seems unnecessary manipulation, and I'd need to make sure the dates appear in the right format and order and rewrite the additional bits that currently rely on dates... I'd rather understand what I'm doing wrong.
What you're looking for is as.Date.POSIXct. as.factor() does fix the plot, but it converts your POSIXct values to character first (thus erasing "date"). You don't need a factor for dodging to work, though. The labels end up in the wrong place because a POSIXct axis is measured in seconds: position_dodge(width = 0.9) shifts the labels by a fraction of a second, while the bars are roughly a day wide. Once the column is a Date, the axis is measured in days and a width of 0.9 lines up with the bars.
You can either convert before (e.g. weeksummary$Date <- as.Date.POSIXct(weeksummary$Date)), or do it right in your plot call:
ggplot(weeksummary, aes(x = as.Date.POSIXct(Date), y = Total, fill = Group)) +
geom_col(position = 'dodge') +
geom_text(aes(label = Total, y = Total + 1),
position = position_dodge(width = 0.9), size = 3)
Giving you this:
Note: the values are different than your values, since our randomization seeds are likely not the same :)
You'll notice I nudged the labels up a bit. You can normally do this with nudge_y, but you cannot specify nudge_x or nudge_y at the same time as a position= argument. In this case, you can just nudge by overriding the y aesthetic.
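If you would rather keep the POSIXct column, one alternative (a sketch, not what the answer above does) is to give position_dodge() a width in seconds so that it matches the bars. The data below mirrors the question's but uses base R instead of lubridate, and the seed is arbitrary. vjust is used here for the vertical offset instead of overriding y.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the example is repeatable
weeksummary <- data.frame(
  Date  = rep(as.POSIXct("2020-01-01", tz = "UTC") + 86400 * (0:6), 2),
  Total = rpois(14, 30),
  Group = rep(c("group1", "group2"), each = 7)
)

day_secs <- 86400  # a POSIXct axis is measured in seconds

p <- ggplot(weeksummary, aes(x = Date, y = Total, fill = Group)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = Total),
            # 0.9 of a day, matching the default bar width on this axis
            position = position_dodge(width = 0.9 * day_secs),
            vjust = -0.5, size = 3)
p
```

layer_data(p, 1)$x and layer_data(p, 2)$x can be compared to confirm that the labels sit on the bar centres.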
This happens because geom_text inherits the x aesthetic, which is Date in this case; that part is entirely correct. You don't have to mutate your data frame: you can convert to a factor when plotting instead:
aes(x = factor(Date), y = ...),
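A fuller sketch of the factor approach, under the assumption that the dates are whole days: factor() sorts the POSIXct values before turning them into labels, so the levels stay in chronological order, and scale_x_discrete() can reformat the labels without touching the data frame. The "%d %b" format is just an illustrative choice.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the example is repeatable
weeksummary <- data.frame(
  Date  = rep(as.POSIXct("2020-01-01", tz = "UTC") + 86400 * (0:6), 2),
  Total = rpois(14, 30),
  Group = rep(c("group1", "group2"), each = 7)
)

p <- ggplot(weeksummary, aes(x = factor(Date), y = Total, fill = Group)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = Total),
            position = position_dodge(width = 0.9),
            vjust = -0.5, size = 3) +
  # The levels look like "2020-01-01"; reformat them for display only
  scale_x_discrete(name = "Date",
                   labels = function(x) format(as.Date(x), "%d %b"))
p
```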
Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) +
geom_line(data = Table_1, aes(YOI, ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making a mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this:
library(tidyverse)  # provides %>%, group_by(), summarise(), as_factor() and ggplot2
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
Your code is not working because you mix up the layer calls and never really specify which data and aesthetics belong to the scatter layer and which to the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The very first layer is ggplot(). Think of this as your drawing canvas.
You then talk about adding a scatter plot layer, but you never actually add one for the raw data.
For example:
# plotting the Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I'll skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before relying on inheritance in aes(), I recommend keeping the data = and aes() calls inside each geom_xxxx(). This avoids confusion.
You may want to try geom_jitter() instead of geom_point() to get a better presentation of your dataset. Only a "few" points appear because many data points share the same position and are overplotted.
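A minimal sketch of the geom_jitter() suggestion. The real Antisocial.csv is not reproduced here, so the data frame below is a simulated stand-in with invented values, and the jitter widths are illustrative rather than tuned.

```r
library(ggplot2)

# Simulated stand-in for Antisocial.csv (values invented for illustration)
set.seed(1)
Antisocial <- expand.grid(YOI = c(90, 92, 94), Race = 0:1, rep = 1:50)
Antisocial$ASB <- rpois(nrow(Antisocial), 1.6)

p <- ggplot() +
  geom_jitter(data = Antisocial,
              aes(x = YOI, y = ASB, colour = as.factor(Race)),
              # small random offsets so identical points no longer overlap
              width = 0.3, height = 0.15, alpha = 0.5) +
  labs(colour = "Race")
p
```

With integer-valued data like this, the amount of jitter is a trade-off: enough to reveal how many points share a position, but not so much that neighbouring YOI values blur together.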
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean by that.
I take it that the mean ASB you calculated over the whole population (your Table_1) is your reference, and you would like to see how the Race groups compare with this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and changing its linetype. You can play around with the colours, etc. to get the aesthetics you are aiming for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplots use fill for the body (colour sets only the outline). In this setup, you also need to supply a group value to geom_line() (I just set it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1),
size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!
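If "mean ASB vs YOI grouped by Race" is meant literally, i.e. one mean line per Race, a sketch could look like the following. This is an assumption about the intent, not something the question confirms. The data is again a simulated stand-in with invented values, and race_means is a name introduced here for illustration.

```r
library(ggplot2)

# Simulated stand-in for Antisocial.csv (values invented for illustration)
set.seed(1)
Antisocial <- expand.grid(YOI = c(90, 92, 94), Race = 0:1, rep = 1:50)
Antisocial$ASB <- rpois(nrow(Antisocial), 1.6)

# Mean ASB for every YOI x Race combination (base R, no plyr/dplyr needed)
race_means <- aggregate(ASB ~ YOI + Race, data = Antisocial, FUN = mean)

p <- ggplot(race_means,
            aes(x = YOI, y = ASB, colour = as.factor(Race), group = Race)) +
  geom_line() +
  geom_point(size = 2) +
  labs(y = "Mean ASB", colour = "Race")
p
```

Grouping happens in the summary step and in the group/colour aesthetics, not via group_by() inside aes() as in the original attempt.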
I have got a file like this one:
Month,Open,Closed
2017-08,53,38
2017-09,102,85
2017-10,58,38
2017-11,51,42
2017-12,32,24
2018-01,24,30
2018-02,56,46
2018-03,82,74
2018-04,95,89
2018-05,16,86
I want to plot both lines, and also shade the difference between them. So this works:
ggplot() +
geom_line(data = issues.m, aes(x = Month, y = Open, group = 1)) +
geom_line(data = issues.m, aes(x = Month, y = Closed, group = 1)) +
geom_ribbon(data = issues.m, aes(x = Month, ymin = Closed, ymax = Open, color = Open - Closed)) +
theme_tufte() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
producing this
First problem here is that I would like the whole area between the two lines shaded if possible, not a single line. How can I do that?
But I would also like to color the two lines. If I add a color to one of them:
ggplot() +
geom_line(data = issues.m, aes(x = Month, y = Open, group = 1, color = 'open')) +
geom_line(data = issues.m, aes(x = Month, y = Closed, group = 1)) +
geom_ribbon(data = issues.m, aes(x = Month, ymin = Closed, ymax = Open, color = Open - Closed)) +
theme_tufte() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I get the error:
Error: Continuous value supplied to discrete scale
So, can what I want to do be done at all? Would it be possible to change the colour palette of the ribbon too?
Your error was because you were mapping Open - Closed onto the color, which will be a continuous variable, i.e. the difference between those two values for each month. But you also assigned "open" to color inside the aes in one of your geom_lines. That means you're trying to assign both continuous values and discrete values to the same scale, and that's not going to work.
If all you need to do is get 2 colors, one for each line, you can do this one of two ways, the second of which fits more into the ggplot/tidyverse way of doing things.
First off I turned your dates into date objects to clean up the x-axis and avoid rotating the labels—feel free to experiment with the date breaks that work well in scale_x_date.
The less "tidy" way is to just make two geom_lines, one for Open and one for Closed, and assign a color to each.
library(tidyverse)
# df holds the question's data (called issues.m there)
df_dated <- df %>%
mutate(month2 = sprintf("%s-01", Month) %>% lubridate::ymd())
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = Open), color = "green3") +
geom_line(aes(y = Closed), color = "red") +
ggthemes::theme_tufte()
But the more idiomatically "tidy" way is to make a long-shaped version of the data so you can map a variable—in this case whether an observation is the opening or closing value—onto an aesthetic such as color. This also gives you a legend—if you don't want it, you can get rid of it in the theme. This lets you set a scale for the colors, instead of hard-coding into each geom_line.
df_date_long <- df_dated %>%
gather(key, value, -month2, -Month)
ggplot(df_dated, aes(x = month2)) +
geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
geom_line(aes(y = value, color = key), data = df_date_long) +
scale_color_manual(values = c(Open = "green3", Closed = "red")) +
ggthemes::theme_tufte()
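For reference, here is a self-contained sketch of the long-format version that builds the data frame from the question's file contents. It swaps gather() for tidyr's pivot_longer(), which tidyr now recommends for new code; that substitution is mine and not part of the answer above. The theme is left out so that only ggplot2 and tidyr are needed.

```r
library(ggplot2)
library(tidyr)

# Data copied from the question's file
df <- data.frame(
  Month  = c("2017-08", "2017-09", "2017-10", "2017-11", "2017-12",
             "2018-01", "2018-02", "2018-03", "2018-04", "2018-05"),
  Open   = c(53, 102, 58, 51, 32, 24, 56, 82, 95, 16),
  Closed = c(38, 85, 38, 42, 24, 30, 46, 74, 89, 86)
)

# Turn "YYYY-MM" into a real Date so the x axis is continuous
df$month2 <- as.Date(paste0(df$Month, "-01"))

# One row per month and series: key is "Open" or "Closed"
df_long <- pivot_longer(df, cols = c(Open, Closed),
                        names_to = "key", values_to = "value")

p <- ggplot(df, aes(x = month2)) +
  # fill (not colour) shades the area between the two series
  geom_ribbon(aes(ymin = Open, ymax = Closed), fill = "lightblue2") +
  geom_line(aes(y = value, colour = key), data = df_long) +
  scale_colour_manual(values = c(Open = "green3", Closed = "red"))
p
```

Changing the ribbon's appearance is then a matter of changing its fill value, independently of the line colours.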
I am trying to add corresponding labels to the colors of the bars in a histogram. Here is a reproducible example.
ggplot(aes(displ), data = mpg) + geom_histogram(aes(fill = class), binwidth = 1, col = "black")
This code produces a histogram and gives the bars a different color for each car "class". But is there any way I can add the "class" labels inside the corresponding colors in the graph?
The inbuilt functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, if you are looking to do more advanced styling it is often required to create the data before you build the plot. In your case you have overlapping labels which are visually messy.
The following code builds a binned frequency table for the data frame:
library(ggplot2)  # for the mpg data set
library(plyr)     # for ddply()
# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
# Bin data: bins of width 1, starting at 0.5
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)
# Count the rows in each class/bin combination
mpg_df <- ddply(mpg_df, .(class, bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")
You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")
I don't think it makes a lot of sense to duplicate the labels, so it may be more useful to show the frequency of each group:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")
Update
I realised you can actually filter the labels selectively using the computed variable ..count.. (written after_stat(count) in ggplot2 3.3.0 and later). No need to preformat the data!
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))
This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)
This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.
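The same count-label idea written with after_stat(), which newer ggplot2 releases prefer over the ..count.. notation. This is a sketch of the second approach only; the threshold of 4 is carried over from the code above.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, fill = class)) +
  geom_histogram(binwidth = 1, colour = "black") +
  stat_bin(
    # label a segment with its count only when the count exceeds 4
    aes(label = after_stat(ifelse(count > 4, count, ""))),
    binwidth = 1, geom = "text",
    position = position_stack(vjust = 0.5)
  )
p
```

Because the label is computed from the binned counts, this still cannot show the class name itself; for that, the pre-built table from the first method is needed.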
Looking at the examples from the other stackoverflow links you shared, all you need to do is change the vjust parameter.
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", vjust=1.5)
That said, it looks like you have other issues. Namely, the labels stack on top of each other because there aren't many observations at each point. Instead I'd just let people use the legend to read the graph.
When I combine geom_vline() with facet_grid() like so:
DATA <- data.frame(x = 1:6,y = 1:6, f = rep(letters[1:2],3))
ggplot(DATA,aes(x = x,y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(xintercept = 2:3,
colour =c("goldenrod3","dodgerblue3"))
I get an error message stating Error: Aesthetics must be either length 1 or the same as the data (4): colour because there are two lines in each facet and there are two facets. One way to get around this is to use rep(c("goldenrod3","dodgerblue3"),2), but this requires that every time I change the faceting variables, I also have to calculate the number of facets and replace the magic number (2) in the call to rep(), which makes re-using ggplot code so much less nimble.
Is there a way to get the number of facets directly from ggplot for use in this situation?
You could put the xintercept and colour info into a data.frame to pass to geom_vline and then use scale_color_identity.
ggplot(DATA, aes(x = x, y = y)) +
geom_point() +
facet_grid(f~.) +
geom_vline(data = data.frame(xintercept = 2:3,
colour = c("goldenrod3","dodgerblue3") ),
aes(xintercept = xintercept, color = colour) ) +
scale_color_identity()
This side-steps the issue of figuring out the number of facets, although that could be done by pulling out the number of unique values in the faceting variable with something like length(unique(DATA$f)).
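To check that the data-frame approach really draws both lines in every facet without any rep(), the built layer can be inspected with layer_data(). This is a small sketch; vlines is a name introduced here for illustration.

```r
library(ggplot2)

DATA <- data.frame(x = 1:6, y = 1:6, f = rep(letters[1:2], 3))

# One row per line; there is no faceting column, so each facet gets both rows
vlines <- data.frame(
  xintercept = 2:3,
  colour = c("goldenrod3", "dodgerblue3"),
  stringsAsFactors = FALSE
)

p <- ggplot(DATA, aes(x = x, y = y)) +
  geom_point() +
  facet_grid(f ~ .) +
  geom_vline(data = vlines, aes(xintercept = xintercept, colour = colour)) +
  scale_colour_identity()

# Rows that the vline layer will draw, by panel and colour
built <- layer_data(p, 2)
table(built$PANEL, built$colour)
```

Changing the faceting formula changes the number of panels, and the lines follow along without any edits to vlines.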
The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level.
Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled.
Currently I'm using geom_freqpoly like this:
ggplot(quake.data, aes(interval, fill = tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))
I would prefer to have a filled area, such as provided by geom_density, but without smoothing the line:
geom_area has been suggested. Is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or does the count aggregation need to occur prior to using ggplot2?
As stated in the answer, geom_area(..., stat = "bin") is the solution:
ggplot(quake.data, aes(interval)) + geom_area(aes(y = ..count.., fill = tweet.type, group = tweet.type), stat = "bin") + theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))
produces:
Perhaps you want:
geom_area(aes(y = ..count..), stat = "bin")
geom_ribbon can be used to produce a filled area between two lines without needing to explicitly construct a polygon. There is good documentation here.
ggplot(quake.data, aes(interval, fill=tweet.type, group = 1)) + geom_density()
But I don't think this is a meaningful graphic.
I'm not entirely sure what you're aiming for. Do you want a line or bars? You should check out geom_bar for filled bars. Something like:
p <- ggplot(data, aes(x = time, y = count))
p + geom_bar(stat = "identity")
If you want a line filled in underneath, then you should look at geom_area, which I haven't personally used, but it appears the construction will be almost the same.
p <- ggplot(data, aes(x = time, y = count))
p + geom_area()
Hope that helps. Give some more info and we can probably be more helpful.
Actually, I would add an index column (just the row number of the data), use that as x, and then use:
p <- ggplot(data, aes(x = index, y = count))
p + geom_bar(stat = "identity") + scale_x_continuous("Intervals",
breaks = index, labels = intervals)
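Putting the index idea together with geom_area, here is a sketch that aggregates the counts before plotting. One reason to do it this way: as far as I know, current ggplot2 releases refuse stat = "bin" on a discrete x, so counting up front is one workaround. quake.data below is a simulated stand-in (the interval and tweet.type column names come from the question, the values are invented), and counts and index are names introduced for illustration.

```r
library(ggplot2)

# Simulated stand-in for quake.data (values invented for illustration)
set.seed(1)
lv <- sprintf("t%02d", 1:8)
quake.data <- data.frame(
  interval   = factor(sample(lv, 200, replace = TRUE), levels = lv),
  tweet.type = factor(sample(c("original", "retweet"), 200, replace = TRUE),
                      levels = c("original", "retweet"))
)

# Count rows per interval and tweet type; table() keeps zero-count cells,
# so every tweet type has a value at every interval
counts <- as.data.frame(table(interval   = quake.data$interval,
                              tweet.type = quake.data$tweet.type))
counts$index <- as.integer(counts$interval)  # numeric x position per interval

p <- ggplot(counts, aes(x = index, y = Freq, fill = tweet.type)) +
  geom_area() +  # stacked by default, no smoothing
  scale_x_continuous("Intervals",
                     breaks = seq_along(levels(counts$interval)),
                     labels = levels(counts$interval)) +
  theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))
p
```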