My dataset has 3 columns: high school name, year, and percent enrolled in college, and it includes 104 high schools across 8 years.
school  chrt_grad  enrolled
Alba    2012       0.486
Alba    2013       0.593
Alba    2014       0.588
Alba    2015       0.588
Alba    2016       0.547
Alba    2017       0.613
Alba    2018       0.622
Alba    2019       0.599
Alba    2020       0.614
City    2012       0.588
City    2013       0.613
and so on...
I'm trying to produce 104 individual line plots--one for each school. I started by creating a single line plot showing every school:
ggplot(nsc_enroll,
mapping = aes(x = chrt_grad, y = enrolled, group = school)) +
geom_line() +
geom_point()
How can I create an individual plot for each of the 104 schools without having to filter for each school name over and over again?
You could use facet_wrap() with ggplot2, for example:
ggplot(mtcars, aes(x = hp, y = mpg))+
geom_point() +
facet_wrap(~cyl)
In your case you would use facet_wrap(~school), but with 104 schools that produces a very large grid of small panels.
First of all, I assume that chrt_grad is the same as year, or do you have another variable?
Anyway, that's not the point.
As you may know, facets can draw multiple plots, but as far as I know they are panels of one figure rather than individual plots.
I have a similar situation, and what works for me is to:
1. Spread the school variable into one column per school (if you know gather/spread).
2. Use a for loop to plot each column.
I am not at home now; if you need it, I can post the code.
Recently I saw some new dplyr code using nest_by. It looks very interesting, although I haven't tried it yet.
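The loop idea above can be sketched without reshaping at all. This is only a minimal sketch, assuming the data frame is called nsc_enroll as in the question; the rows below are a small stand-in copied from the post, and the output file names in the commented line are hypothetical.

```r
library(ggplot2)

# Stand-in data with the same columns as the question (values from the post).
nsc_enroll <- data.frame(
  school    = c("Alba", "Alba", "Alba", "City", "City"),
  chrt_grad = c(2012, 2013, 2014, 2012, 2013),
  enrolled  = c(0.486, 0.593, 0.588, 0.588, 0.613)
)

# split() returns one data frame per school, named by school.
by_school <- split(nsc_enroll, nsc_enroll$school)

# Build one ggplot object per school; group = 1 joins all points of that school.
plots <- lapply(names(by_school), function(s) {
  ggplot(by_school[[s]], aes(x = chrt_grad, y = enrolled, group = 1)) +
    geom_line() +
    geom_point() +
    ggtitle(s)
})
names(plots) <- names(by_school)

# Optionally write each plot to its own file (hypothetical file names):
# for (s in names(plots)) ggsave(paste0("enroll_", s, ".png"), plots[[s]], width = 6, height = 4)
```

Printing plots[["Alba"]] shows a single school; with the full data the list would hold 104 plots, one per school name.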
I am plotting some data on water metal levels, where I report the 90th percentile to the MI-DEQ. I have boxplots of each of the metals and I want to label the hinge and whisker values. I've done something similar in base R for discrete data sets. Here is my starting code:
ggplot(data = Agm, aes(x = Street, y = Level) , na.rm=TRUE) +
ggtitle("Lead Levels",subtitle=subtext )+
xlab("Streets") + ylab("ppb") +
geom_boxplot( fill="red",width = 0.8) + theme_bw()
Agm is a subsetted df, with Street being chr and Level being numeric. How would I label each group's quantiles? Also, how would I make the geom_boxplot whiskers extend to the max value, i.e. include the outliers? And if I create a df with the Street and quantile(0.9) for each street, how would I have geom_point plot and label those values over the above boxplot using the same grouping?
The data looks like this:
Agm<-read.table(header = TRUE, text = "
Street Year Month Person Level Metal
Crawford 2019 June RCR 0.13 Ag
Crawford 2019 June RCR 160 Cu
Crawford 2019 June RAR 0.92 Ag
Crawford 2019 June RAR 140 Cu
Gratiot 2019 June RL NA Ag
Gratiot 2019 June RL 24 Cu
Seneca 2019 June DS 0.33 Ag
Seneca 2019 June DS 75 Cu")
Sorry for the delay; I read this on my iPad, but my R work is on my Linux box, which was not readily available. The real data is more extensive and will keep growing.
This brings up another issue: the reporting metric is the 90th percentile. Is there a way to plot that point from the geom_boxplot call? This way each group would have the hinge and whisker values labeled as well as the 90th percentile.
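One possible way to get full-range whiskers plus a labeled 90th-percentile point is sketched below. It is not taken from the thread: it draws the box from a custom five-number summary through stat_summary() instead of changing geom_boxplot itself, and it assumes the plot covers one metal at a time (Cu here). The helper full_range_box and the data frame p90 are names made up for this illustration.

```r
library(ggplot2)

Agm <- read.table(header = TRUE, text = "
Street Year Month Person Level Metal
Crawford 2019 June RCR 0.13 Ag
Crawford 2019 June RCR 160 Cu
Crawford 2019 June RAR 0.92 Ag
Crawford 2019 June RAR 140 Cu
Gratiot 2019 June RL NA Ag
Gratiot 2019 June RL 24 Cu
Seneca 2019 June DS 0.33 Ag
Seneca 2019 June DS 75 Cu")

# Assumption: one metal per plot; drop missing levels.
cu <- subset(Agm, Metal == "Cu" & !is.na(Level))

# Hypothetical helper: box statistics with whiskers at the min and max,
# so no observation is drawn as a separate outlier.
full_range_box <- function(y) {
  q <- quantile(y, c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  data.frame(ymin = q[1], lower = q[2], middle = q[3], upper = q[4], ymax = q[5])
}

# 90th percentile per street, kept in the same Street/Level columns
# so the point and text layers reuse the plot's aesthetics.
p90 <- aggregate(Level ~ Street, data = cu,
                 FUN = function(x) quantile(x, 0.9, names = FALSE))

p <- ggplot(cu, aes(x = Street, y = Level)) +
  stat_summary(fun.data = full_range_box, geom = "boxplot",
               fill = "red", width = 0.8) +
  geom_point(data = p90, shape = 18, size = 3) +
  geom_text(data = p90, aes(label = round(Level, 1)), hjust = -0.4) +
  xlab("Streets") + ylab("ppb") + theme_bw()
```

The same pattern (a small summary data frame passed to geom_text) could be used to label the hinge and whisker values; only the 90th percentile is labeled here to keep the sketch short.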
I am trying to get my cumulative area plot to stack using the code below, which is based on http://dantalus.github.io/2015/08/16/step-plots/. I have added in position=stack, however the plot still overlaps.
The aim of what I am trying to achieve is to show the cumulative number of publications each year within a given period. So, as an example, in 1940 there may be one publication, the following year there may be 2 more, bringing the cumulative total to 3.
What would be the best way to get the areas to stack on top of each other?
How can the order be controlled? Would I need to use arrange() to order TERM2?
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(data = subset(working, TERM2=="A"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
stat_bin(data = subset(working, TERM2=="B"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack",alpha=0.1) +
stat_bin(data = subset(working, TERM2=="Both"),bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
What I am currently getting:
Example of what I am trying to achieve:
The following chart was created in Excel using the same data which is exactly what I am looking to achieve in R.
My Data:
Example of how my data is currently structured:
Year TERM2
1944 A
1959 B
1966 A
1968 B
1968 A
1970 A
1971 B
1971 B
1971 A
1971 A
1971 Both
1971 Both
1971 Both
1972 A
1972 Both
1972 Both
1973 B
1973 A
1974 A
1974 A
'data.frame': 803 obs. of 6 variables:
$ Year : int 1944 1959 1966 1968 1968 1970 1971 1971 1971 1971 ...
$ TERM2 : Factor w/ 3 levels "B","A","Both": 2 1 2 1 2 2 1 1 2 2 ...
Changes based on user127649's suggestions
This is the plot after user127649's suggestions, which is close to what I would expect except I am looking for it to start at 0 and end at 803 (total number of publications).
ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
stat_bin(bins=80, aes(y=cumsum(..count..)), geom="area", alpha=0.1) +
ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")
I think there were two issues.
When you use stat_bin() in three separate layers, each layer effectively has its own independent data set. This gives the correct counts, but (and this is a guess, really) I think being in three separate layers means they can't be stacked on each other.
If you use a single stat_bin() layer for all the groups, I think cumsum(..count..) is applied to the binned counts of the data as a whole rather than to each group separately.
I don't know whether this is the best approach or not, but I think it's what you're after.
Data
The data are grouped and cumsum() is used on each group separately.
library(tidyverse)
working <- working %>%
count(Year, TERM2) %>%
spread(TERM2, n, fill = 0) %>%
mutate_at(vars('A', 'B', 'Both'), cumsum) %>%
gather(TERM2, N, -Year, factor_key = T) #%>%
# mutate(TERM2 = ordered(TERM2, levels = rev(levels(TERM2))))
Plot
This code will produce the first plot below. If you prefer the look of the second plot, you can un-comment the last line of the data manipulation chunk.
ggplot(working, aes(Year, N, fill = TERM2)) +
geom_area(position = 'stack') +
ylab("Total Number")
Result (plots not shown)
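For readers on a current tidyverse, the same per-group cumulative count can be sketched without spread()/gather(), which tidyr now lists as superseded. This is an alternative sketch, not the answerer's code: the data frame raw is a small stand-in built from the first rows of the question, and cumulative is a name chosen for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in for the asker's data (first ten rows shown in the question).
raw <- data.frame(
  Year  = c(1944, 1959, 1966, 1968, 1968, 1970, 1971, 1971, 1971, 1971),
  TERM2 = c("A", "B", "A", "B", "A", "A", "B", "B", "A", "A")
)

cumulative <- raw %>%
  count(Year, TERM2) %>%                         # publications per year and group
  complete(Year, TERM2, fill = list(n = 0)) %>%  # add zero rows for missing combinations
  arrange(Year) %>%
  group_by(TERM2) %>%
  mutate(N = cumsum(n)) %>%                      # running total within each group
  ungroup()

p <- ggplot(cumulative, aes(Year, N, fill = TERM2)) +
  geom_area(position = "stack") +
  ylab("Total Number")
```

Because every group has a row for every year, the stacked areas line up, and the top of the stack in the last year equals the total number of rows in the input.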
I have the following data
Geography Population.Estimate Energy.Consump Employed Year
1 Alameda County, California 1513228 3038.53227 676598 2010
2 Alpine County, California 1163 17.14083 387 2010
3 Amador County, California 37862 140.65325 15103 2011
4 Butte County, California 219973 722.73871 90130 2011
5 Calaveras County, California 45457 198.95724 17085 2012
6 Colusa County, California 21483 63.77387 9489 2012
This is just part of the data from 58 counties.
I want to make a box plot with population on the x axis and energy consumption on the y axis for the years 2010, 2011, and 2012. I tried a lot of things, but it just doesn't work. Please help me with the plots. I used qplot as well as ggplot. Nothing seems to work on this data :(
I tried this
qplot(factor(Year),data=Population,geom="bar",fill=Population.Estimate,weight=Energy_Consump,position="dodge", main = "Effect of Energy", xlab="Population",ylab="Energy")
I tried this too
ggplot(Population)+ geom_bar(aes(x=Housing.Units,y=Energy.Consump, fill=factor(Year)),stat="identity")
I am struggling to get it right. I am fairly new to R and I tried the other examples on Stack Overflow, but nothing seems to work.
Is this what you want?
ggplot(data=Population, aes(x=Population.Estimate, y=Energy.Consump, fill=as.factor(Year))) +
geom_bar(colour="black", stat="identity",
position=position_dodge()) +
xlab("Population") + ylab("Energy Consumption")
Not this?
ggplot(data=Population, aes(x=Year, y=Population.Estimate, fill=Geography)) +
geom_bar(colour="black", stat="identity",
position=position_dodge()) +
xlab("Year") + ylab("Population Estimate")
Given the large gaps in scale between counties, if you want both the population and the energy consumption on the same graph, IMO energy consumption per capita is better suited.
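The per-capita suggestion could look roughly like the sketch below. It uses only the six rows shown in the question; the column Energy.Per.Capita is a name introduced here, and the units of Energy.Consump are not stated in the post, so the axis label stays generic.

```r
library(ggplot2)

# The six rows shown in the question (Employed omitted, it is not needed here).
Population <- data.frame(
  Geography = c("Alameda County, California", "Alpine County, California",
                "Amador County, California", "Butte County, California",
                "Calaveras County, California", "Colusa County, California"),
  Population.Estimate = c(1513228, 1163, 37862, 219973, 45457, 21483),
  Energy.Consump = c(3038.53227, 17.14083, 140.65325, 722.73871, 198.95724, 63.77387),
  Year = c(2010, 2010, 2011, 2011, 2012, 2012)
)

# Dividing by population puts large and small counties on a comparable scale.
Population$Energy.Per.Capita <-
  Population$Energy.Consump / Population$Population.Estimate

p <- ggplot(Population,
            aes(x = Geography, y = Energy.Per.Capita, fill = factor(Year))) +
  geom_col(colour = "black") +
  coord_flip() +   # horizontal bars keep the long county names readable
  labs(x = "County", y = "Energy consumption per capita", fill = "Year")
```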
I'm sure this has been done many times, but clearly I'm not searching using the correct terms.
I have some time series data in R with columns like this:
country year deaths region global.region
1 Afghanistan 2006 0.095830775 Asia & Pacific Global South
2 Afghanistan 1994 0.127597064 Asia & Pacific Global South
3 Algeria 2000 0.003278038 Arab States Global South
4 Algeria 2001 0.003230578 Arab States Global South
5 Algeria 1998 0.006746176 Arab States Global South
6 Algeria 1999 0.019952364 Arab States Global South
...
Basically, I want to plot all the lines by country, but I want them colored (and labeled in the legend) by region. I'm hoping to look at some regional trends in the data without trying to build an average model (partly because I want to see outliers, partly because a lot of the countries have missing data, and I think a good regional model would be difficult for me to make at this point and at best misleading).
So in the end I'll have, for example, separate lines for Burkina Faso, Algeria, and Cote d'Ivoire plotted, but they'll all be orange. And I'll have separate lines for Afghanistan, Pakistan, and Iran, but they'll all be blue.
It is preferable that it's done with ggplot2 since that's the plotting library I am learning at the moment. But maybe there's a standard way of doing this in R that works across all (most) plot libraries?
Edit: Final solution: Group aesthetic. (Thanks @baptiste)
qplot(data=df, x=year, y=deaths, color=region, group=country) +
geom_line() +
xlab('Year') + ylab('Deaths per 100,000') + ggtitle('Deaths per 100,000 by country (WHO)')
Which makes:
Slightly different than your desired result, but here it goes..
ggplot(df, aes(x = year, y = deaths)) +
geom_line(aes(color = country, linetype = region))
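Since qplot() is deprecated in recent ggplot2 releases, the group-aesthetic approach from the question's edit can also be written with ggplot() directly. The sketch below uses only the six rows shown in the question, so it has just two countries and two regions; with the full data, several countries would share each region colour.

```r
library(ggplot2)

# The six rows shown in the question.
df <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Algeria", "Algeria", "Algeria", "Algeria"),
  year    = c(2006, 1994, 2000, 2001, 1998, 1999),
  deaths  = c(0.095830775, 0.127597064, 0.003278038,
              0.003230578, 0.006746176, 0.019952364),
  region  = c("Asia & Pacific", "Asia & Pacific",
              "Arab States", "Arab States", "Arab States", "Arab States")
)

# group decides which points are joined into one line (one per country);
# colour decides the line colour and the legend entries (one per region).
p <- ggplot(df, aes(x = year, y = deaths, colour = region, group = country)) +
  geom_line() +
  labs(x = "Year", y = "Deaths per 100,000",
       title = "Deaths per 100,000 by country (WHO)")
```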
I am very new to R, and so this question is extremely elementary, but I can't solve it myself. I would very much appreciate your help.
This is a sort of dataframe I want to use:
Period Value Cut.off
1 January 1998 - August 2002 8.798129 1.64
2 September 2002 - Jun 2006 4.267268 1.64
3 Jul 2006 - Dec 2009 7.280275 1.64
This is the code I am using:
require(ggplot2)
bq <- ggplot(data=glomor, aes(x=as.character(Period),y=Value))+geom_point()+ylim(0,10)
bq <- bq + scale_x_discrete(limits=c("January 1998 - August 2002","September 2002 - Jun 2006","Jul 2006 - Dec 2009"))
bq + geom_line()
I receive the following error message:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
How do I need to change the code, so that the points will be connected by a line?
You should add group=1 in your aes() call to connect the points with a line. This tells geom_line() that all your points belong to one group and should be connected.
ggplot(data=glomor, aes(x=as.character(Period),y=Value,group=1))+
geom_point()+ylim(0,10) + geom_line()
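A self-contained version of this fix is sketched below, with the data frame rebuilt from the three rows in the question. One addition that is not in the answer: Period is turned into a factor whose levels follow the row order, so the x axis stays chronological instead of alphabetical without needing scale_x_discrete(limits = ...).

```r
library(ggplot2)

# The three rows shown in the question.
glomor <- data.frame(
  Period  = c("January 1998 - August 2002",
              "September 2002 - Jun 2006",
              "Jul 2006 - Dec 2009"),
  Value   = c(8.798129, 4.267268, 7.280275),
  Cut.off = 1.64
)

# Fix the axis order to the order of the rows (chronological here).
glomor$Period <- factor(glomor$Period, levels = glomor$Period)

# group = 1 puts every point in the same group so geom_line() can join them.
p <- ggplot(glomor, aes(x = Period, y = Value, group = 1)) +
  geom_point() +
  geom_line() +
  ylim(0, 10)
```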