Too many dates on X axis- ggplot2 - r

I have two questions.
I am using a dataset with information on Chicago (temp, ozone, season, date).
I would like a plot showing temperature over time, along with a smooth line to
show the trend, restricted to dates from 1997 to 2000.
Thus:
library(ggplot2)
library(dplyr)
chicago <- read.csv(file = 'FilePath')
## Limit data to 1997-2000
chicago2 <- chicago %>%
  filter(date >= "1997-01-01" & date <= "2000-12-31")
ggplot(chicago2, aes(x=date, y=temp)) +
  geom_point() +
  geom_smooth() +
  labs(title="Temperature")
My issues are as follows:
1. There seems to be an issue with the x-axis: the dates are not cleanly represented. When I zoom into the picture, it looks like R is plotting every single date on the x-axis, but I am not sure if this is the actual issue.
2. While I am able to draw the scatterplot correctly, there is no overlaid smooth line, despite the use of the geom_smooth() function.
Looking forward to your responses,
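Both symptoms are consistent with `date` being stored as character (or factor) after read.csv(), although the question does not show the column's class. In that case ggplot2 treats the x-axis as discrete, labels every distinct date, and puts each date in its own group, so geom_smooth() has nothing to fit. A minimal sketch of the usual fix, using a made-up stand-in for the Chicago data and assuming the dates are ISO "YYYY-MM-DD" strings:

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the Chicago file; the real data comes from read.csv()
set.seed(1)
chicago <- data.frame(
  date = as.character(seq(as.Date("1996-01-01"), as.Date("2001-12-31"), by = "day")),
  stringsAsFactors = FALSE
)
chicago$temp <- 50 + 25 * sin(2 * pi * (seq_len(nrow(chicago)) - 110) / 365.25) +
  rnorm(nrow(chicago), sd = 5)

chicago2 <- chicago %>%
  mutate(date = as.Date(date)) %>%   # assumes "YYYY-MM-DD" strings
  filter(date >= as.Date("1997-01-01"), date <= as.Date("2000-12-31"))

p <- ggplot(chicago2, aes(x = date, y = temp)) +
  geom_point() +
  geom_smooth() +                    # now fits a single trend across all dates
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Temperature")
p
```

Running str(chicago) on the real data first would confirm whether the column is a Date or a character vector.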

Related

Cannot get ggplot2 to show a trendline by month

I am trying to plot trendlines for my data by month and visualize them on a faceted ggplot, but R is not showing the trendline. I want to use geom_line() or geom_smooth(). My hypothesis is that ggplot is treating the month names as categorical (factors), since my code otherwise seems to match online tutorials for showing points and lines on the same faceted ggplot.
To test this hypothesis, I tried using month.name to convert my month names to recognizable months, but R does this incorrectly (May, July, and September were converted to April, February, and June?).
The plots otherwise display mostly as I want. For example, this code:
Toledo_month_log_cyano %>% ggplot(aes(x=Month, y = `Log GC/L`)) +
  geom_point(aes(color = gene)) +
  geom_smooth() +
  theme_classic() +
  facet_wrap(~gene, ncol = 1)
It looks mostly how I want; the only thing missing is the trend line, which doesn't show. If someone could help me figure out what the problem is and solve it, that would be great!
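The hypothesis in the question matches how ggplot2 groups data: with a discrete x, each month becomes its own group, so the smoother only ever sees one x value at a time. A minimal sketch of one way around this, using a made-up data frame in place of Toledo_month_log_cyano (the column name log_gc and the gene labels are placeholders), is to order the months with factor(levels = month.name) and set the group aesthetic explicitly:

```r
library(ggplot2)

# Hypothetical stand-in for Toledo_month_log_cyano
set.seed(42)
toy <- data.frame(
  Month  = rep(c("May", "June", "July", "August", "September"), times = 2),
  gene   = rep(c("geneA", "geneB"), each = 5),
  log_gc = rnorm(10, mean = 5)
)

# Order the months by calendar position instead of alphabetically
toy$Month <- factor(toy$Month, levels = month.name)

p <- ggplot(toy, aes(x = Month, y = log_gc)) +
  geom_point(aes(color = gene)) +
  # group = gene overrides the default one-group-per-month behaviour
  geom_smooth(aes(group = gene), method = "lm", se = FALSE) +
  theme_classic() +
  facet_wrap(~gene, ncol = 1)
p
```

method = "lm" is used here only because the toy data has five points per gene; the default smoother may be more appropriate for the real data.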

Overlapping R ggplots with two different y axis (different time period) and x axis (different scale)

I currently have two different data sets, with two columns each (date and value). These two data sets differ in that they cover different time periods (y axis), roughly twenty years apart, and on the x axis (value) the ratio is about 1:10. Since they occur in different time periods, is there a way to overlay these two plots using ggplot and have the two different y axes (different time periods) placed one above and the other below? For example, one of the datasets is from 1994-2002 and the second is from 2017-2020. The reason is that they both exhibit the same pattern, and I would like them placed together to show the pattern clearly.
Example of such a chart is as attached.
It's very doable. For ggplot2, you can easily add a secondary axis, with a simple transform. I'll use the EuStockMarkets data as an example.
library(ggplot2)
library(dplyr)
library(tidyr)
data(EuStockMarkets)
StockMarkets <- EuStockMarkets[, c("DAX","FTSE")]
plot(StockMarkets, plot.type="single", col=c(2,3),
     main="European Stock Markets (1991 - 1998)",
     ylab="Closing price (value)")
legend("topleft", inset=0.02, legend=colnames(StockMarkets),
       lwd=2, lty=1, col=c(2,3))
You'll get a nice plot like this:
The ggplot2 version is also quite simple:
StockMarkets %>%
  as.data.frame() %>%
  mutate(sDate=as.Date(seq(1,1860,1), origin="1991-05-10")) %>%
  pivot_longer(-sDate) %>%
  ggplot(aes(x=sDate, y=value, color=name)) +
  geom_line()
Now suppose FTSE is 20 years ago. I will change the date manually, so you can see the results.
DAX <- tibble(Stock=as.vector(EuStockMarkets[,c("DAX")])) %>%
  mutate(stDate=as.Date(seq(1,1860,1), origin="1991-05-10"), name="DAX")
FTSE <- tibble(Stock=as.vector(EuStockMarkets[,c("FTSE")])) %>%
  mutate(stDate=as.Date(seq(1,1860,1), origin="1971-05-10"), name="FTSE")
Now combine them into one data frame. Imagine you start with this data. And imagine it has NASDAQ and Bitcoin instead of FTSE and DAX.
DAX_FTSE <- bind_rows(DAX, FTSE)
If you try to plot this data, you get the following, which is correct, but not what the OP wanted:
DAX_FTSE %>%
  ggplot(aes(x=stDate, y=Stock, color=name)) +
  geom_line()
The trick here is to add a secondary axis with a simple transform:
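DAX_FTSE %>%
  # shift FTSE forward by 20 years so both series share one date range
  mutate(st2Date=if_else(name=="FTSE", stDate+20*365.25, stDate)) %>%
  ggplot(aes(x=st2Date, y=Stock, color=name)) +
  geom_line() +
  # the secondary axis undoes the shift, so it shows the original FTSE dates
  scale_x_date("DAX", sec.axis=sec_axis(~ . -20*365.25, name="FTSE"))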

Stop graph touching zero in ggplot geom_freqpoly function

I am creating a frequency plot using the geom_freqpoly function in ggplot2. I have a large data set of social media comments across 14 months and am plotting the number of comments for each week of that data. I am using this code, first converting the UTC timestamps to POSIXct and then drawing the frequency plot:
ggplot(data = TRP) +
geom_freqpoly(mapping = aes(x = created_utc), binwidth = 604800)
This is creating a plot that looks like this:
However, I want to top and tail the plot, as it touches zero at both the start and the end, making it look like there was rapid growth and rapid decline. This is not the case: the data is simply a snapshot, and comments exist before and after my analysis window. The data begins at the 4,000 mark and ends at around 2,000, and I want it represented like that. I have checked the 'pad' argument and have ensured it is set to FALSE.
Any help as to why this may be occurring would be greatly appreciated.
Thanks!
Rather than adjusting the geom_freqpoly to work differently than intended, it might be simpler to calculate the weekly totals yourself and use geom_line:
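library(lubridate); library(dplyr); library(ggplot2)
set.seed(1)
# simulated timestamps spread over roughly 14 months
df <- data.frame(
  datetime = ymd_h(2018010101) + dhours(runif(1000, 0, 14*30*24))
)
df %>%
  # one row per week, with n = number of records in that week
  count(week_count = floor_date(datetime, "1 week")) %>%
  ggplot(aes(week_count, n)) +
  geom_line()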
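If the first and last weeks are only partly covered by the snapshot, their totals may still dip even with geom_line. One option, sketched here on the same kind of simulated data and not part of the answer above, is to drop the first and last weekly bins before plotting. The names all_weeks and weekly are illustrative.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(1)
# Simulated timestamps over roughly 14 months (stand-in for created_utc)
df <- data.frame(
  datetime = ymd_h("2018-01-01 01") + dhours(runif(1000, 0, 14 * 30 * 24))
)

# Weekly totals, one row per week that contains at least one record
all_weeks <- df %>%
  count(week_start = floor_date(datetime, "week")) %>%
  arrange(week_start)

# Drop the first and last bins, which may cover incomplete weeks
weekly <- all_weeks %>%
  filter(row_number() > 1, row_number() < n())

ggplot(weekly, aes(week_start, n)) +
  geom_line()
```

Whether trimming is appropriate depends on whether the edge weeks really are incomplete in the actual data.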

ggplot2: Time-series plot by continuous variable, color/fill by group

I have searched considerably for what I want to accomplish, but I haven't run across examples or plots that are specifically what I'm looking for, so I am reaching out to the community.
What I have (data downloadable here):
Time-series data (each record 2 hours apart and spanning nearly a year) with associated elevation and property ownership.
library(ggplot2)
data <- read.csv("dataex.csv")
data$timestamp <-as.POSIXct(as.character(data$timestamp),format="%m/%d/%Y %H:%M", tz="GMT")
What I want (via ggplot):
A line or bar plot showing elevation (y-axis) across time (x-axis) for each data record, colored by ownership (for a line plot, filling the area under the line; for a bar plot, filling the bar). I've tried iterations of geom_line, geom_bar, and geom_area (the geom_bar version below is the closest I have come). I'd like at least one of the following options to work!
Option A - The closest I have come to achieving this (plotting per data record) is with the following code:
ggplot(data, aes(x=timestamp, y=elev, fill=OWNER)) + geom_bar(stat="identity")
However, I'd like the bars to be touching each other, but if I adjust the width in geom_bar(), everything disappears. (Also, if I run the above code on other batches of similar data, it only shows a fraction of the bars, likely because they have more data records.) It seems like it's just too much data to plot. So I tried another route...
Option B - Plotting by day, which turns out to be more informative, showing each day the variability in ownership.
ggplot(data, aes(x=as.Date(Date, format='%Y-%m-%d'), y=elev, fill=OWNER)) + geom_bar(stat="identity", width=1)
However, this sums the y-axis, so the elevation is not interpretable. I could divide the y-axis by 12 (the typical number of records per day) but there are occasional days with fewer than 12 records, which causes the y-axis to be incorrect. Is there a function or a way to divide the y-axis by the respective number of records per day that is being represented in the plot? Or does someone have advice for a better solution?
Something like:
library(readr)
library(dplyr)
library(ggplot2)
library(ggalt)
readr::read_csv("~/Desktop/dataex.csv") %>%
  mutate(timestamp=lubridate::mdy_hm(timestamp)) %>%
  select(timestamp, elev, Owner=OWNER) -> df
# one thin vertical segment per record, from 0 up to its elevation
ggplot(df, aes(timestamp, elev, group=Owner, color=Owner)) +
  geom_segment(aes(xend=timestamp, yend=0), size=0.1) +
  scale_x_datetime(expand=c(0,0), date_breaks="2 months") +
  scale_y_continuous(expand=c(0,0), limits=c(0,2250), labels=scales::comma) +
  ggthemes::scale_color_tableau() +
  hrbrmisc::theme_hrbrmstr(grid="Y") +
  labs(x=NULL, y="Elevation") +
  theme(legend.position="bottom") +
  theme(axis.title.y=element_text(angle=0, margin=margin(r=-20)))
?
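For Option B in the question, one way to divide by the number of records per day is to compute that count with dplyr before plotting, so the stacked bar for each day adds up to that day's mean elevation. A minimal sketch on a made-up data frame follows; the column names Date, elev and OWNER come from the question, while the values and the names n_day and elev_share are illustrative.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dataex.csv
toy <- data.frame(
  Date  = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01",
                    "2016-01-02", "2016-01-02")),
  elev  = c(1000, 1200, 1100, 900, 1100),
  OWNER = c("Private", "Private", "State", "State", "State")
)

daily <- toy %>%
  group_by(Date) %>%
  mutate(n_day = n()) %>%              # number of records on that day
  group_by(Date, OWNER) %>%
  # each owner's contribution to the daily mean elevation
  summarise(elev_share = sum(elev / n_day), .groups = "drop")

ggplot(daily, aes(x = Date, y = elev_share, fill = OWNER)) +
  geom_col(width = 1)
```

Because the divisor is computed per day, days with fewer than 12 records are scaled correctly without any special handling.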

plotting multiple plot in R for different calendar date

I have about 20 years of daily data in a time series. It has columns Date, rainfall and other data.
I am trying to plot rainfall vs. time. I want to get 20 line plots in one graph, each in a different colour, with a legend showing the years. I tried the following code but it is not giving me the desired results. Any suggestion to fix my issue would be most welcome.
library(ggplot2)
library(seas)
data(mscdata)
p<-ggplot(data=mscdata,aes(x=date,y=precip,group=year,color=year))
p+geom_line()+scale_x_date(labels=date_format("%m"),breaks=date_breaks("1 months"))
It doesn't look great, but here's a method. We first coerce the data into dates in the same year:
mscdata$dayofyear <- as.Date(format(mscdata$date, "%j"), format = "%j")
Then we plot:
library(ggplot2)
library(scales)
p <- ggplot(data = mscdata, aes(x = dayofyear, y = precip, group = year, color = year))
p + geom_line() +
  scale_x_date(labels = date_format("%m"), breaks = date_breaks("1 months"))
While I agree with @Jaap that this may not be the best way to depict these data, try the following:
mscdata$doy <- as.numeric(strftime(mscdata$date, format="%j"))
ggplot(data=mscdata,aes(x=doy,y=precip,group=year)) +
geom_line(aes(color=year))
Although the given answers are good answers to your question as it stands, I don't think they will solve your problem. I think you should be looking at a different way to present the data. @Jaap already suggested using facets. Take for example this approach:
#first add a month column to your dataframe
mscdata$month <- format(mscdata$date, "%m")
#then plot it using boxplot with year on the X-axis and month as facet.
p1 <- ggplot(data = mscdata, aes(x = year, y = precip, group=year))
p1 + geom_boxplot(outlier.shape = 3) + facet_wrap(~month)
This will give you a graph per month, showing the rainfall for each year next to each other. Because I use a boxplot, the peaks in rainfall show up as individual points ('normal' rain events are inside the box).
Another possible approach would be to use stat_summary.
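A minimal sketch of the stat_summary idea, using simulated daily rainfall in place of mscdata (the data frame rain and its values are made up), might plot the mean precipitation per month with one line per year. The fun argument assumes ggplot2 3.3.0 or later; older versions use fun.y.

```r
library(ggplot2)

# Simulated daily rainfall for three years (stand-in for mscdata)
set.seed(1)
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rain <- data.frame(
  date   = dates,
  year   = format(dates, "%Y"),
  month  = as.integer(format(dates, "%m")),
  precip = rexp(length(dates), rate = 0.5)
)

p <- ggplot(rain, aes(x = month, y = precip, color = year, group = year)) +
  # collapse the daily values to one mean per month and year
  stat_summary(fun = mean, geom = "line") +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
p
```

Swapping mean for sum or median in fun would summarise the same groups differently.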
