I wanted to create a graph to show which continent has the highest co2 emissions total per capita over the years,
so I coded this:
barchart <- ggplot(data = worldContinents,
aes(x = Continent,
y = `Per.Capita`,
colour = Continent))
+ geom_bar(stat = "identity")
barchart
This is my dataframe:
Geometry is just for some geographical mapping later.
I checked which(is.na(worldContinents$Per.Capita)) to see whether there were NA values but it returned nothing
What's causing these gray lines?
How do I get rid of them?
These are the gray lines inside the bar graph
Thank you
You have a couple of issues here. First of all, I'm guessing you want the fill to be mapped to the continent, not color, which only controls the color of the bars' outlines.
Secondly, there are multiple values for each continent in your data, so they are simply stacking on top of each other. This is the reason for the lines in your bars, and is probably not what you want. If you want the average value per capita in each continent, you either need to summarise your data beforehand or use stat_summary like this:
barchart <- ggplot(data = worldContinents,
aes(x = Continent,
y = `Per.Capita`,
fill = Continent)) +
stat_summary(geom = "col", fun = mean, width = 0.7,
col = "gray50") +
theme_light(base_size = 20) +
scale_fill_brewer(palette = "Spectral")
barchart
Data used
Obviously, we don't have your data, so I used a modified version of the gapminder data set to match your own data structure
worldContinents <- gapminder::gapminder
worldContinents$Per.Capita <- worldContinents$gdpPercap
worldContinents$Continent <- worldContinents$continent
worldContinents <- worldContinents[worldContinents$year == 2007,]
Related
I am trying to make a bar graph showing ages of first alcohol use by county by percent. I am not quite sure where the mistake is and would appreciate another set of eyes.
Data is publicly available here: https://www.datafiles.samhsa.gov/dataset/national-survey-drug-use-and-health-2020-nsduh-2020-ds0001 although I have cleaned it on my computer.
The percentages are definitely not out of 100 and the numbers are not adjusting for population. They are the same as my chart showing raw numbers.
palc.age.ct<-data1.cleaned%>%
mutate(ALCTRY= na_if(x=ALCTRY, y="Never Used"))%>%
drop_na(ALCTRY)%>%
ggplot(aes(x=ALCTRY, fill=COUTYP4))+
geom_bar (position = "dodge") +
geom_bar(aes(y = (..count..)/sum(..count..)))+
scale_y_continuous(labels=scales::percent)+
theme_minimal()+
labs(title = "First Alcohol Use by Age and Locality",
x="Age Initiated", y="Number Initiated")+
scale_color_viridis(option = "D")
I'm not recreating everything you did like labelling the bins, but based on the data you can do something like below. Note that you need to include the position = "dodge" in the bar chart where you want to calculate the percentage. Otherwise the calculation is done in a different geom than the one that is creating the grouped bar geom. Which is the reason for your issue.
library(dplyr)
library(ggplot2)
NSDUH_2020 %>%
select(alctry, COUTYP4) %>%
mutate(altcry = if_else(alctry > 66, NA_integer_, alctry),
COUTYP4 = forcats::as_factor(COUTYP4)) %>%
filter(!is.na(altcry)) %>%
ggplot(aes(x = alctry, fill = COUTYP4)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = "dodge") +
scale_y_continuous(labels = scales::label_percent(accuracy = .1)) +
scale_x_binned()
Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!
I wish to add the number of observations to this boxplot, not by group but separated by factor. Also, I wish to display the number of observations in addition to the x-axis label that it looks something like this: ("PF (N=12)").
Furthermore, I would like to display the mean value of each box inside of the box, displayed in millions in order not to have a giant number for each box.
Here is what I have got:
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
}
mean.n <- function(x){x <- x/1000000
return(c(y = median(x)*0.97, label = round(mean(x),2)))
}
ggplot(Soils_noctrl) +
geom_boxplot(aes(x=Slope,y=Events.g_Bacteria, fill = Detergent),
varwidth = TRUE) +
stat_summary(aes(x = Slope, y = Events.g_Bacteria), fun.data = give.n, geom = "text",
fun = median,
position = position_dodge(width = 0.75))+
ggtitle("Cell Abundance")+
stat_summary(aes(x = Slope, y = Events.g_Bacteria),
fun.data = mean.n, geom = "text", fun = mean, colour = "red")+
facet_wrap(~ Location, scale = "free_x")+
scale_y_continuous(name = "Cell Counts per Gram (Millions)",
breaks = round (seq(min(0),
max(100000000), by = 5000000),1),
labels = function(y) y / 1000000)+
xlab("Sample")
And so far it looks like this:
As you can see, the mean value is at the bottom of the plot and the number of observations are in the boxes but not separated
Thank you for your help! Cheers
TL;DR - you need to supply a group= aesthetic, since ggplot2 does not know on which column data it is supposed to dodge the text geom.
Unfortunately, we don't have your data, but here's an example set that can showcase the rationale here and the function/need for group=.
set.seed(1234)
df1 <- data.frame(detergent=c(rep('EDTA',15),rep('Tween',15)), cells=c(rnorm(15,10,1),rnorm(15,10,3)))
df2 <- data.frame(detergent=c(rep('EDTA',20),rep('Tween',20)), cells=c(rnorm(20,1.3,1),rnorm(20,4,2)))
df3 <- data.frame(detergent=c(rep('EDTA',30),rep('Tween',30)), cells=c(rnorm(30,5,0.8),rnorm(30,3.3,1)))
df1$smp='Sample1'
df2$smp='Sample2'
df3$smp='Sample3'
df <- rbind(df1,df2,df3)
Instead of using stat_summary(), I'm just going to create a separate data frame to hold the mean values I want to include as text on my plot:
summary_df <- df %>% group_by(smp, detergent) %>% summarize(m=mean(cells))
Now, here's the plot and use of geom_text() with dodging:
p <- ggplot(df, aes(x=smp, y=cells)) +
geom_boxplot(aes(fill=detergent))
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2)),
color='blue', position=position_dodge(0.8)
)
You'll notice the numbers are all separated along y= just fine, but the "dodging" is not working. This is because we have not supplied any information on how to do the dodging. In this case, the group= aesthetic can be supplied to let ggplot2 know that this is the column by which to use for the dodging:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), group=detergent),
color='blue', position=position_dodge(0.8)
)
You don't have to supply the group= aesthetic if you supply another aesthetic such as color= or fill=. In cases where you give both a color= and group= aesthetic, the group= aesthetic will override any of the others for dodging purposes. Here's an example of the same, but where you don't need a group= aesthetic because I've moved color= up into the aes() (changing fill to greyscale so that you can see the text):
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), color=detergent),
position=position_dodge(0.8)
) + scale_fill_grey()
FUN FACT: Dodging still works even if you supply geom_text() with a nonsensical aesthetic that would normally work for dodging, such as fill=. You get a warning message Ignoring unknown aesthetics: fill, but the dodging still works:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), fill=detergent),
position=position_dodge(0.8)
)
# gives you the same plot as if you just supplied group=detergent, but with black text
In your case, changing your stat_summary() line to this should work:
stat_summary(aes(x = Slope, y = Events.g_Bacteria, group = Detergent),...
I'm currently working on a plot. It's all coming together, but i'm wondering about one thing... I tried googling this, but I couldn't found what I was looking for.
Right now, I've created a legend that I liked. With 'scale_color_manual' and 'scale_size_manual', I've created a combined legend that includes the thickness of the corresponding line and the color. I have put the code I used below.
scale_color_manual(name = "combined legend",
labels = c("Nederland totaal", "Noord-Holland", "Utrecht", "Noord-Brabant", "Zuid-Holland", "Gelderland", "Flevoland", "Overijssel", "Limburg", "Drenthe", "Zeeland", "Friesland", "Groningen"),
values=c("#000000", "#001EFF", "#2ECC71","#FF009D","#00FFCD",
"#FF8400", "#8514EB", "#EB1A14", "#FFE100",
"#FF00AA", "#00AAFF", "#16A085", "#B903FC")) +
scale_size_manual(name = "combined legend",
labels = c("Nederland totaal", "Noord-Holland", "Utrecht", "Noord-Brabant", "Zuid-Holland", "Gelderland", "Flevoland", "Overijssel", "Limburg", "Drenthe", "Zeeland", "Friesland", "Groningen"),
values = c(1.75, 0.8, 0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8))
This is what the legend looks like right now
My question is: Is it possible to create a little bit of space between the first 'legend' part (so "Nederland totaal" and the other lines?).
I want it to look more like this
(I made this for clarification via word).
Is there a function to add some space between certain legend items? I hope somebody could help me :))
More detailed:
The dataset I'm working with, are the average yearly housing prices per Dutch province for 2005 until 2019. I created a ggplot, the current plot looks like this now. It is basically a ggplot with the year on the horizontal axis and the average housing price on the vertical axis. I have sorted the color by province, and added the black, thicker line that corresponds the total average of the netherlands (which I also put in the province vector). I used geom_line all of the legend items are factors. I hope this was clear enough, if not, let me know
Could this be applied to your data?
library(tidyverse)
tib <-
tibble(x = 1:3,
a = 1:3,
b = 1.5:3.5,
c = 2:4) %>%
pivot_longer(cols = a:c, names_to = "id", values_to = "val")
ggplot()+
geom_line(data = filter(tib, id == "a"), aes(x, val, linetype = id))+
geom_line(data = filter(tib, id != "a"), aes(x, val, colour = id))+
labs(linetype = "legend", colour = NULL)
Gives you:
One option is to use stat_summary.
This will add another aesthetic to the graph, and thus another legend, far apart from the other one.
airquality %>%
mutate(Month=factor(Month)) %>%
ggplot(aes(x=Day, y=Temp, col=Month)) +
geom_line() +
stat_summary(aes(lwd="Nederland totaal"), fun=mean, geom="line", col="black") +
theme_classic() +
theme(legend.title = element_blank()) +
guides(lwd = guide_legend(order = 1))
But the benefit of this method is that you don't have to calculate the averages per year and manually add it to your Province variable.
I am getting this error while plotting a bar graph and I am not able to get rid of it, I have tried both qplot and ggplot but still the same error.
Following is my code:
library(dplyr)
library(ggplot2)
#Investigate data further to build a machine learning model
data_country = data %>%
group_by(country) %>%
summarise(conversion_rate = mean(converted))
#Ist method
qplot(country, conversion_rate, data = data_country,geom = "bar", stat ="identity", fill = country)
#2nd method
ggplot(data_country)+aes(x=country,y = conversion_rate)+geom_bar()
Error:
stat_count() must not be used with a y aesthetic
Data in data_country:
country conversion_rate
<fctr> <dbl>
1 China 0.001331558
2 Germany 0.062428188
3 UK 0.052612025
4 US 0.037800687
The error is coming in bar chart and not in the dotted chart.
First off, your code is a bit off. aes() is an argument in ggplot(), you don't use ggplot(...) + aes(...) + layers
Second, from the help file ?geom_bar:
By default, geom_bar uses stat="count" which makes the height of the
bar proportion to the number of cases in each group (or if the weight
aethetic is supplied, the sum of the weights). If you want the heights
of the bars to represent values in the data, use stat="identity" and
map a variable to the y aesthetic.
You want the second case, where the height of the bar is equal to the conversion_rate So what you want is...
data_country <- data.frame(country = c("China", "Germany", "UK", "US"),
conversion_rate = c(0.001331558,0.062428188, 0.052612025, 0.037800687))
ggplot(data_country, aes(x=country,y = conversion_rate)) +geom_bar(stat = "identity")
Result:
You can use geom_col() directly. See the differences between geom_bar() and geom_col() in this link https://ggplot2.tidyverse.org/reference/geom_bar.html
geom_bar() makes the height of the bar proportional to the number of cases in each group If you want the heights of the bars to represent values in the data, use geom_col() instead.
ggplot(data_country)+aes(x=country,y = conversion_rate)+geom_col()
when you want to use your data existing in your data frame as y value, you must add stat = "identity" in mapping parameter. Function geom_bar have default y value. For example,
ggplot(data_country)+
geom_bar(mapping = aes(x = country, y = conversion_rate), stat = "identity")
I was looking for the same and this may also work
p.Wages.all.A_MEAN <- Wages.all %>%
group_by(`Career Cluster`, Year)%>%
summarize(ANNUAL.MEAN.WAGE = mean(A_MEAN))
names(p.Wages.all.A_MEAN)
[1] "Career Cluster" "Year" "ANNUAL.MEAN.WAGE"
p.Wages.all.a.mean <- ggplot(p.Wages.all.A_MEAN, aes(Year, ANNUAL.MEAN.WAGE , color= `Career Cluster`))+
geom_point(aes(col=`Career Cluster` ), pch=15, size=2.75, alpha=1.5/4)+
theme(axis.text.x = element_text(color="#993333", size=10, angle=0)) #face="italic",
p.Wages.all.a.mean