Create line graph with ggplot2, using time periods as x-variable - r

I am very new to R, and so this question is extremely elementary, but I can't solve it myself. I would very much appreciate your help.
This is the sort of data frame I want to use:
                      Period    Value Cut.off
1 January 1998 - August 2002 8.798129    1.64
2  September 2002 - Jun 2006 4.267268    1.64
3        Jul 2006 - Dec 2009 7.280275    1.64
This is the code I am using:
require(ggplot2)
bq <- ggplot(data=glomor, aes(x=as.character(Period),y=Value))+geom_point()+ylim(0,10)
bq <- bq + scale_x_discrete(limits=c("January 1998 - August 2002","September 2002 - Jun 2006","Jul 2006 - Dec 2009"))
bq + geom_line()
I receive the following error message:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
How do I need to change the code so that the points are connected by a line?

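You should add group=1 to your aes() call to connect the points with a line. Because the x variable is discrete, ggplot2 by default treats each period as its own group, so every group contains only one point. group=1 tells geom_line() that all your points belong to a single group and should be connected.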
ggplot(data=glomor, aes(x=as.character(Period),y=Value,group=1))+
geom_point()+ylim(0,10) + geom_line()
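A self-contained version of this fix, as a minimal sketch. The data frame is rebuilt from the three rows shown in the question. Converting Period to a factor with explicit levels is an assumption added here; it stands in for the scale_x_discrete(limits = ...) call and keeps the periods in chronological rather than alphabetical order.

```r
library(ggplot2)

# Data rebuilt from the rows shown in the question
glomor <- data.frame(
  Period = c("January 1998 - August 2002",
             "September 2002 - Jun 2006",
             "Jul 2006 - Dec 2009"),
  Value = c(8.798129, 4.267268, 7.280275),
  Cut.off = 1.64
)

# Assumption: fix the order of the periods through the factor levels
glomor$Period <- factor(glomor$Period, levels = glomor$Period)

# group = 1 puts all rows into one group, so geom_line() can connect them
p <- ggplot(glomor, aes(x = Period, y = Value, group = 1)) +
  geom_point() +
  geom_line() +
  ylim(0, 10)
print(p)
```

With the factor levels set, the separate scale_x_discrete() call from the question should no longer be needed.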

Related

R ggplot geom_boxplot groups of data and LABEL quartiles (hinges,whiskers)

I am plotting water metal level data, for which I report the 90th percentile to the MI-DEQ. I have boxplots of each of the metals and I want to label the hinge and whisker values. I've done something similar in base R for discrete data sets. Here is my starting code:
ggplot(data = Agm, aes(x = Street, y = Level)) +
  ggtitle("Lead Levels", subtitle = subtext) +
  xlab("Streets") + ylab("ppb") +
  geom_boxplot(fill = "red", width = 0.8, na.rm = TRUE) + theme_bw()
Agm is a subsetted data frame in which Street is character and Level is numeric. How would I label each group's quantiles? Also, how would I make geom_boxplot extend the whiskers to the maximum value, i.e. include the outliers? And if I create a data frame with the Street and quantile(0.9) for each street, how would I have geom_point plot and label those values over the boxplot above, using the same grouping?
The data looks like this:
Agm<-read.table(header = TRUE, text = "
Street Year Month Person Level Metal
Crawford 2019 June RCR 0.13 Ag
Crawford 2019 June RCR 160 Cu
Crawford 2019 June RAR 0.92 Ag
Crawford 2019 June RAR 140 Cu
Gratiot 2019 June RL NA Ag
Gratiot 2019 June RL 24 Cu
Seneca 2019 June DS 0.33 Ag
Seneca 2019 June DS 75 Cu")
Sorry for the delay: I read on my iPad, but my R work is on my Linux box, which was not readily available. The data set is more extensive than this sample and will keep growing.
This brings up another issue: the reporting metric is the 90th percentile. Is there a way to plot that point from the geom_boxplot call? That way each group would have the hinge and whisker values labeled as well as the 90th percentile.
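One possible approach to the 90th-percentile part only, as a hedged sketch rather than a definitive answer: compute the percentile per group outside ggplot2 and overlay it as an extra layer. The name p90, the faceting by Metal, and the blue marker are assumptions made for illustration; the sample data is the one posted above. Labelling the hinges and whiskers is not covered here.

```r
library(ggplot2)

Agm <- read.table(header = TRUE, text = "
Street Year Month Person Level Metal
Crawford 2019 June RCR 0.13 Ag
Crawford 2019 June RCR 160 Cu
Crawford 2019 June RAR 0.92 Ag
Crawford 2019 June RAR 140 Cu
Gratiot 2019 June RL NA Ag
Gratiot 2019 June RL 24 Cu
Seneca 2019 June DS 0.33 Ag
Seneca 2019 June DS 75 Cu")

# 90th percentile of Level per street and metal; the formula interface
# of aggregate() drops rows with a missing Level by default
p90 <- aggregate(Level ~ Street + Metal, data = Agm,
                 FUN = function(x) unname(quantile(x, 0.9)))

# Boxplots per street, one panel per metal (assumption), with the
# 90th percentile drawn and labelled on top
p <- ggplot(Agm, aes(x = Street, y = Level)) +
  geom_boxplot(fill = "red", width = 0.8, na.rm = TRUE) +
  geom_point(data = p90, colour = "blue", size = 3) +
  geom_text(data = p90, aes(label = round(Level, 2)), vjust = -0.8) +
  facet_wrap(~ Metal, scales = "free_y") +
  theme_bw()
print(p)
```

Because p90 has the same Street and Level column names as Agm, the extra layers inherit the x and y mappings and line up with the boxes.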

How to plot a box plot in R for outlier detection for a huge number of rows?

I have a dataset which has a huge number of rows. I want to plot the boxplot of a single feature, but the simple boxplot() command in R gives me an error.
I am working on a dataset with more than 200,000 rows. The head looks like this:
year month day n_impacted
2013 Jan   Tue 4
2013 Jan   Mon 4
2013 Jan   Sat 5
2013 Jan   Wed 4
2013 Jan   Fri 4
2013 Jan   Sat 5
boxplot(na_omit_noguns$n_impacted)
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) : need finite 'ylim' values
I should be able to plot the box plot with the outliers showing up.
The error is caused by Inf or -Inf values in the data. It can be fixed by keeping only the finite values (using is.finite):
i1 <- is.finite(na_omit_noguns$n_impacted)
boxplot(na_omit_noguns$n_impacted[i1])
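To illustrate what is.finite() filters out, here is a small sketch on a hypothetical vector (the values below are made up and only stand in for na_omit_noguns$n_impacted):

```r
# Hypothetical stand-in for na_omit_noguns$n_impacted
n_impacted <- c(4, 4, 5, Inf, 4, -Inf, NA, 5)

# is.finite() is FALSE for Inf, -Inf, NA and NaN
keep <- is.finite(n_impacted)
n_dropped <- sum(!keep)

# Keep only the finite values and plot them
finite_values <- n_impacted[keep]
boxplot(finite_values)
```

Checking n_dropped before plotting is a cheap way to see how many rows were affected.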

Shifted bars in ggplot

I created a bar graph in ggplot to show how the counts in the column scheme changed over time (from 2001 to 2016).
The x-axis is the year and the y-axis shows the frequencies (I used fill= to get the counts per scheme).
The data set consists of two columns (year and scheme) filled with character values:
year scheme
2016 yes
2016 yes
2016 yes
2016 yes
2015 yes
2015 yes
2014 yes
2013 yes
....
2006 no
2006 no
2006 no
2006 no
2005 no
2005 no
2004 no
2003 no
2002 no
2002 no
2001 no
2001 no
My code:
a <- ggplot(s) +
stat_bin(aes(x=year, fill=scheme, group=scheme), geom="bar", position = "dodge",bins=30)
b <- a + scale_x_continuous(breaks = c(2001:2016), labels = factor(2001:2016))
c <- b + theme(axis.text.x=element_text(size = 10, colour = "black"))
The graph:
The problem is that the bars are shifted in the graph for no apparent reason. You can see it by comparing the bars with the year labels on the x-axis: some bars sit too far to the left (e.g. 2007) and others too far to the right (e.g. 2002).
I have no clue why this happens or how to fix it. Any suggestions are very welcome.
Use binwidth = 1 instead of bins = 30. When you specify 30 bins, you're asking for the range 2001 to 2016 to be broken into segments laid out along seq(2001, 2016, length.out = 30), so each bin is only about half a year wide.
All the weird gaps and shifts come from bins that don't line up with a whole-number year.
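A minimal sketch of the suggested change. The data frame below is hypothetical and only mimics the shape described in the question; converting year to numeric is an assumption, since stat_bin() needs a continuous x.

```r
library(ggplot2)

# Hypothetical data in the shape described in the question
s <- data.frame(
  year = c("2016", "2016", "2015", "2014", "2006", "2006",
           "2005", "2002", "2001"),
  scheme = c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no")
)
s$year <- as.numeric(s$year)  # assumption: year was stored as character

# Splitting a 15-year range into 30 bins gives roughly half-year bins
approx_width <- diff(range(s$year)) / 30

# binwidth = 1 gives one bin per year instead
p <- ggplot(s) +
  stat_bin(aes(x = year, fill = scheme, group = scheme),
           geom = "bar", position = "dodge", binwidth = 1) +
  scale_x_continuous(breaks = 2001:2016)
print(p)
```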

Create an Index Chart in R - relative starting point

I need to look at relative change in 2 groups of data which have very different scales.
I therefore think the way forward is to set the first value of each group to 100% and express every later value as a proportion of it. I can then create a line chart to show the relative movement.
I would call this an index chart, so I may have missed existing questions that use a different name.
However, I don't know how to set my data up in R to do this.
My aggregated data is below. I want the 1999 value of each group to be 100% and the subsequent years to be a percentage of that.
> Totals
year fips Emissions
1 1999 06037 6109.6900
2 2002 06037 7188.6802
3 2005 06037 7304.1149
4 2008 06037 6421.0170
5 1999 24510 403.7700
6 2002 24510 192.0078
7 2005 24510 185.4144
8 2008 24510 138.2402
I'm probably going to want to add a bar chart behind it to show weighting too as relative change is much more dramatic for smaller data. Tips on that are appreciated too but I've not searched for that yet as the above is the primary issue IMO.
Appreciate your help.
James
For example with dplyr:
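library(dplyr)
# Totals is the aggregated data frame from the question
dat <-
  Totals %>%
  group_by(fips) %>%
  # ratio to the first row of each group (1999); multiply by 100 for percent
  mutate(ind = Emissions / first(Emissions))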
And using ggplot2 to plot a line chart:
library(ggplot2)
ggplot(dat, aes(x = year, y = ind, color = as.factor(fips))) +
geom_line()
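The same index can be sketched in base R with ave(), which may help to see what the grouped calculation does. The data frame is rebuilt from the table in the question; storing fips as character (to keep the leading zero) and the column name index are assumptions made for this example.

```r
# Data rebuilt from the table in the question
Totals <- data.frame(
  year = rep(c(1999, 2002, 2005, 2008), times = 2),
  fips = rep(c("06037", "24510"), each = 4),
  Emissions = c(6109.6900, 7188.6802, 7304.1149, 6421.0170,
                403.7700, 192.0078, 185.4144, 138.2402)
)

# Sort so that the first row of each group is the base year (1999)
Totals <- Totals[order(Totals$fips, Totals$year), ]

# Within each fips group, express Emissions as a percentage of the first value
Totals$index <- ave(Totals$Emissions, Totals$fips,
                    FUN = function(x) 100 * (x / x[1]))
print(Totals)
```

Sorting first matters: both this sketch and first() in the dplyr version take whatever row comes first in each group as the base.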

Trying to plot temperature and count data on same plot using xyplot?

I am using xyplot in lattice to make a plot that shows temperature change over time alongside count data. I am not sure whether ggplot2 would be better. My data is arranged like this:
Year (1998 1998 1999 2000 2001 2001 2002)
Low (2.777778 8.333330 10.555556 4.444444 26.388889 15.555556 12.500000)
Geese (2 14 10 16 7 10 15)
State (Arkansas California California California California Florida California)
I am stuck at this part of the code:
xyplot(c(geese,low)~year,subset=state=="California", par.settings=bwtheme, auto.key=TRUE)
The plot shows geese and low (temperature) as the same type of point, and if I add a line there is no separation between the two. Any help with this would be awesome.
To plot multiple series on the same plot, use + rather than c() to specify multiple y values. For example:
xyplot(geese + low ~year, subset=state=="California", auto.key=TRUE, type="b")
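A self-contained sketch of that call, using the values listed in the question. The data frame name birds and the lowercase column names are assumptions chosen to match the variable names used in the code.

```r
library(lattice)

# Data rebuilt from the values listed in the question
birds <- data.frame(
  year  = c(1998, 1998, 1999, 2000, 2001, 2001, 2002),
  low   = c(2.777778, 8.333330, 10.555556, 4.444444,
            26.388889, 15.555556, 12.500000),
  geese = c(2, 14, 10, 16, 7, 10, 15),
  state = c("Arkansas", "California", "California", "California",
            "California", "Florida", "California")
)

# geese + low on the left-hand side draws two series with separate
# symbols and lines; auto.key adds a legend for them
p <- xyplot(geese + low ~ year, data = birds,
            subset = state == "California",
            auto.key = TRUE, type = "b")
print(p)
```

Inside a script or function, the print() call is what actually draws a lattice plot.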
