I would like to produce a spaghetti plot with the day of the year on the x-axis and the data on the y-axis, one line per Year. I would then like a separate year that has data for only 3 months (PCPNewData) to be plotted on the same figure, in a different color and with a bold line. Here is my sample code, which produces a graph (attached) where the data for each Year on a particular Day are stacked. I don't want a bar graph; I would like a line graph. Thanks
library(tidyverse)
library(tidyr)
myDates=as.data.frame(seq(as.Date("2000-01-01"), to=as.Date("2010-12-31"),by="days"))
colnames(myDates) = "Date"
Dates = myDates %>% separate(Date, sep = "-", into = c("Year", "Month", "Day"))
LatestDate=as.data.frame(seq(as.Date("2011-01-01"), to=as.Date("2011-03-31"),by="days"))
colnames(LatestDate) = "Date"
NewDate = LatestDate %>% separate(Date, sep = "-", into = c("Year", "Month", "Day"))
PCPDataHis = data.frame(total_precip = runif(4018, 0,70), Dates)
PCPNewData = data.frame(total_precip = runif(90, 0,70), NewDate)
PCPDataHisPlot =PCPDataHis %>% group_by(Year) %>% gather(key = "Variable", value = "Value", -Year, -Day,-Month)
ggplot(PCPDataHisPlot, aes(Day, Value, colour = Year))+
geom_line()+
geom_line(data = PCPNewData, aes(Day, total_precip))
I would like to have a figure like the one below, where each line represents the data for a particular year.
UPDATE:
I drew my desired figure by hand (see attached). I would like to have all the days of the year on the x-axis, with the corresponding data on the y-axis.
You have a few errors in your code.
First, your days are in character format. You need to pass them as numbers for the line to be continuous.
Then, you have multiple values for each day (because you have 12 months per year), so you need to summarise the data first:
Pel2 <- Pelly2Data %>% group_by(year,day) %>% summarise(Value = mean(Value, na.rm = TRUE))
Pel3 <- Pelly2_2011_3months %>% group_by(year, day) %>% summarise(total_precip = mean(total_precip, na.rm = TRUE))
ggplot(Pel2, aes(as.numeric(day), Value, color = year))+
geom_line()+
geom_line(data = Pel3, aes(as.numeric(day), y = total_precip), size = 2)
It looks better, but it is hard to apply a specific color pattern.
In my opinion, it would be less confusing to compare the mean of each dataset, such as:
library(tidyverse)
Pel2 <- Pelly2Data %>% group_by(day) %>%
summarise(Mean = mean(Value, na.rm = TRUE),
SEM = sd(Value,na.rm = TRUE)/sqrt(n())) %>%
mutate(Name = "Pel_ALL")
Pel3 <- Pelly2_2011_3months %>% group_by(day) %>%
summarise(Mean = mean(total_precip, na.rm = TRUE),
SEM = sd(total_precip, na.rm = TRUE)/sqrt(n())) %>%
mutate(Name = "Pel3")
Pel <- bind_rows(Pel2,Pel3)
ggplot(Pel, aes(x = as.numeric(day), y = Mean, color = Name))+
geom_ribbon(aes(ymin = Mean-SEM, ymax = Mean+SEM), alpha = 0.2)+
geom_line(size = 2)
EDIT: New graph based on update
To get the graph you posted as a drawing, you need the day of the year and not the day of the month. We can get this by building a date sequence and extracting the day of the year with the yday function from the `lubridate` package.
library(tidyverse)
library(lubridate)
Pelly2$Date = seq(ymd("1990-01-01"),ymd("2010-12-31"), by = "day")
Pelly2$Year_day <- yday(Pelly2$Date)
Pelly2_2011_3months$Date <- seq(ymd("2011-01-01"), ymd("2011-03-31"), by = "day")
Pelly2_2011_3months$Year_day <- yday(Pelly2_2011_3months$Date)
Pelly2$Dataset = "ALL"
Pelly2_2011_3months$Dataset = "2011_Dataset"
Pel <- bind_rows(Pelly2, Pelly2_2011_3months)
Then you can combine both datasets and represent them with different colors, sizes and transparency (alpha), as shown here:
ggplot(Pel, aes(x = Year_day, y = total_precip, color = year, size = Dataset, alpha = Dataset))+
geom_line()+
scale_size_manual(values = c(2,0.5))+
scale_alpha_manual(values = c(1,0.5))
Does this answer your question?
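As a rough sketch, the same approach can be applied to simulated data shaped like the data in the question. The object names `hist_df` and `new_df` are made up for this illustration, and `linewidth` assumes ggplot2 3.4 or later (use `size` on older versions). The historical years are drawn as thin lines and the partial year as a thick one:

```r
library(dplyr)
library(ggplot2)
library(lubridate)

set.seed(1)

# Historical data: one value per day, 2000-2010 (as in the question)
hist_df <- tibble(Date = seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "day")) %>%
  mutate(total_precip = runif(n(), 0, 70),
         Year = factor(year(Date)),
         Year_day = yday(Date))   # day of the year, 1..366

# Partial year: January to March 2011
new_df <- tibble(Date = seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by = "day")) %>%
  mutate(total_precip = runif(n(), 0, 70),
         Year_day = yday(Date))

p <- ggplot(hist_df, aes(Year_day, total_precip)) +
  geom_line(aes(colour = Year), alpha = 0.5) +                    # one thin line per year
  geom_line(data = new_df, colour = "black", linewidth = 1.2) +   # bold line for 2011
  labs(x = "Day of year", y = "Total precipitation")
p
```

Keeping the 2011 data in its own layer makes it easy to give that line a fixed colour and width without touching the colour scale of the historical years.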
I want to plot a 3D temperature distribution to show the trend of temperature across years and months in the same graph. The x and y axes denote the month and year, and the z-axis shows the hourly temperature. How can I show multiple distributions to reveal the temperature trend?
This is the temperature data.
https://1drv.ms/x/s!AndXEcE6b4oxeaSk0sBJrcJuJ0c?e=6DvrFG
If you want a rotatable 3D surface plot, then plotly is your best bet. You will need to first get the monthly average of temperatures and create a matrix from them.
One option for doing this is to use the tidyverse to pivot to long format, summarise, then pivot back to wide format. Here's how to do that, assuming your data frame as loaded from the csv is called temp:
library(tidyverse)
library(plotly)
temp %>%
pivot_longer(starts_with('Hour'),
names_to = 'Hour',
values_to = 'Temperature') %>%
group_by(Year, Month) %>%
summarise(Temperature = mean(Temperature, na.rm = TRUE)) %>%
pivot_wider(names_from = Month, values_from = Temperature) %>%
ungroup() %>%
select(-1) %>%
as.matrix() %>%
plot_ly(x = month.name, y = unique(temp$Year), z = .) %>%
add_surface()
A nice alternative 2D way to show this kind of data would be with a heatmap:
temp %>%
pivot_longer(starts_with('Hour'),
names_to = 'Hour',
values_to = 'Temperature') %>%
group_by(Year, Month) %>%
summarise(Temperature = mean(Temperature, na.rm = TRUE)) %>%
ggplot(aes(Year, Month, fill = Temperature)) +
geom_tile() +
scale_fill_viridis_c(option = "magma") +
scale_y_continuous(breaks = 1:12,
labels = month.name) +
coord_equal()
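The pivot-and-summarise step above can be checked on a small synthetic data frame. This is only a sketch: the layout of `temp` (columns `Year`, `Month`, `Day` and `Hour1` to `Hour24`) is an assumption about the linked spreadsheet, and the values are random.

```r
library(dplyr)
library(tidyr)

set.seed(42)

# Hypothetical stand-in for `temp`: one row per day, hourly readings in
# columns Hour1..Hour24 (the real column names may differ)
temp <- expand.grid(Day = 1:28, Month = 1:12, Year = 2015:2017)
hours <- matrix(rnorm(nrow(temp) * 24, mean = 15, sd = 5), ncol = 24,
                dimnames = list(NULL, paste0("Hour", 1:24)))
temp <- cbind(temp, hours)

monthly <- temp %>%
  pivot_longer(starts_with("Hour"), names_to = "Hour", values_to = "Temperature") %>%
  group_by(Year, Month) %>%
  summarise(Temperature = mean(Temperature, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Month, values_from = Temperature)

# Rows are years, columns are months: the layout used for z in the surface plot
z <- as.matrix(select(monthly, -Year))
dim(z)
```

With three years of data the matrix has 3 rows and 12 columns, so the x vector needs 12 entries (months) and the y vector 3 (years).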
I am trying to compare different years' variables but I am having trouble plotting them together.
The time series is a temperature series, which can be found at https://github.com/gonzalodqa/timeseries as temp.csv
I would like to plot something like the image, but I find it difficult to subset the months between the years and then combine the lines in the same plot under the same months.
If someone can give some advice or point me in the right direction, I would really appreciate it.
You can try this way.
The first chart shows all the available temperatures, the second chart is aggregated by month.
In the first chart, we force the same year so that ggplot will plot them aligned, but we separate the lines by colour.
For the second one, we just use month as x variable and year as colour variable.
Note that:
with scale_x_datetime we can hide the year, so that no one can see that we forced the year 2020 onto every observation
with scale_x_continuous we can show the names of the months instead of the numbers
[just try running the charts with and without scale_x_... to see what I mean]
month.abb is a useful built-in constant for month names.
# read data
df <- readr::read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")
# libraries
library(ggplot2)
library(dplyr)
# line chart by datetime
df %>%
# make datetime: force unique year
mutate(datetime = lubridate::make_datetime(2020, month, day, hour, minute, second)) %>%
ggplot() +
geom_line(aes(x = datetime, y = T42, colour = factor(year))) +
scale_x_datetime(breaks = lubridate::make_datetime(2020,1:12), labels = month.abb) +
labs(title = "Temperature by Datetime", colour = "Year")
# line chart by month
df %>%
# average by year-month
group_by(year, month) %>%
summarise(T42 = mean(T42, na.rm = TRUE), .groups = "drop") %>%
ggplot() +
geom_line(aes(x = month, y = T42, colour = factor(year))) +
scale_x_continuous(breaks = 1:12, labels = month.abb, minor_breaks = NULL) +
labs(title = "Average Temperature by Month", colour = "Year")
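One detail of the "force the same year" trick is the choice of year. The following is a small sketch on made-up observations (the `obs` data frame is purely illustrative). To my understanding, `make_datetime()` returns NA for an impossible date, so forcing a leap year such as 2020 keeps any 29 February readings, while a non-leap year would drop them:

```r
library(lubridate)

# Toy observations from two different years, including a leap day
obs <- data.frame(year  = c(2019, 2019, 2020, 2020),
                  month = c(2, 3, 2, 3),
                  day   = c(28, 1, 29, 1))

# Forcing a leap year keeps 29 February valid
aligned <- make_datetime(2020, obs$month, obs$day)

# Forcing a non-leap year turns 29 February into NA
aligned_2019 <- make_datetime(2019, obs$month, obs$day)

aligned
sum(is.na(aligned_2019))
```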
In case you want your chart to start from July, you can use this code instead:
months_order <- c(7:12,1:6)
# line chart by month
df %>%
# average by year-month
group_by(year, month) %>%
summarise(T42 = mean(T42, na.rm = TRUE), .groups = "drop") %>%
# create new groups starting from each July
group_by(neworder = cumsum(month == 7)) %>%
# keep only complete years
filter(n() == 12) %>%
# give new names to groups
mutate(years = paste(unique(year), collapse = " / ")) %>%
ungroup() %>%
# reorder months
mutate(month = factor(month, levels = months_order, labels = month.abb[months_order], ordered = TRUE)) %>%
# plot
ggplot() +
geom_line(aes(x = month, y = T42, colour = years, group = years)) +
labs(title = "Average Temperature by Month", colour = "Year")
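The grouping step in the July-to-June version relies on `cumsum(month == 7)`. A minimal sketch on a made-up calendar (no temperatures, just year and month columns) shows what it does: the counter goes up by one at every July, so each July-to-June run gets its own id, and the `n() == 12` filter drops incomplete runs.

```r
library(dplyr)

# Two and a half years of monthly rows, January 2018 to June 2020
monthly <- tibble(year  = rep(2018:2020, each = 12),
                  month = rep(1:12, times = 3))[1:30, ]

seasons <- monthly %>%
  # the counter increases at every July, so each July-June run shares an id
  group_by(neworder = cumsum(month == 7)) %>%
  # drop the incomplete run at the start (January-June 2018)
  filter(n() == 12) %>%
  summarise(years = paste(unique(year), collapse = " / "),
            first_month = first(month),
            last_month = last(month))
seasons
```

Here the result has two rows, labelled "2018 / 2019" and "2019 / 2020", each running from month 7 to month 6.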
EDIT
To have something similar to the first plot but starting from July, you could use the following code:
# libraries
library(ggplot2)
library(dplyr)
library(lubridate)
# custom months order
months_order <- c(7:12,1:6)
# fake dates for plot
# note: year 4 is used for Jan-Jun so that 29 Feb, which exists only in leap years, is included
dates <- make_datetime(c(rep(3,6), rep(4,6)), months_order)
# line chart by datetime
df %>%
# create date time
mutate(datetime = make_datetime(year, month, day, hour, minute, second)) %>%
# filter years of interest
filter(datetime >= make_datetime(2018,7), datetime < make_datetime(2020,7)) %>%
# create increasing group after each july
group_by(year, month) %>%
mutate(dummy = month(datetime) == 7 & datetime == min(datetime)) %>%
ungroup() %>%
mutate(dummy = cumsum(dummy)) %>%
# force unique years and create custom name
group_by(dummy) %>%
mutate(datetime = datetime - years(year - 4) - years(month>=7),
years = paste(unique(year), collapse = " / ")) %>%
ungroup() %>%
# plot
ggplot() +
geom_line(aes(x = datetime, y = T42, colour = years)) +
scale_x_datetime(breaks = dates, labels = month.abb[months_order]) +
labs(title = "Temperature by Datetime", colour = "Year")
To order the months differently and aggregate the values over pairs of years, you have to work on your data a bit before plotting:
library(dplyr) # work data
library(ggplot2) # plots
library(lubridate) # date
library(readr) # fetch data
# your data
df <- read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")
df %>%
mutate(date = make_date(year, month,day)) %>%
# reorder month
group_by(month_2 = factor(as.character(month(date, label = T, locale = Sys.setlocale("LC_TIME", "English"))),
levels = c('Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb','Mar','Apr','May','Jun')),
# group years as you like
year_2 = ifelse( year(date) %in% (2018:2019), '2018/2019', '2020/2021')) %>%
# you can put whatever aggregation function you need
summarise(val = mean(T42, na.rm = T)) %>%
# plot it!
ggplot(aes(x = month_2, y = val, color = year_2, group = year_2)) +
geom_line() +
ylab('T42') +
xlab('month') +
theme_light()
A slightly different solution, without the trick of forcing all dates to 2020.
library(tidyverse)
library(lubridate)
df <- read_csv2("https://raw.githubusercontent.com/gonzalodqa/timeseries/main/temp.csv")
df <- df |>
filter(year %in% c(2018, 2019, 2020)) %>%
mutate(year = factor(year),
# zero-pad month and day so that the labels sort chronologically
month = sprintf("%02d", month),
day = sprintf("%02d", day),
month_day = paste0(month, "-", day))
df |> ggplot(aes(x=month_day, y=T42, group=year, col=year)) +
geom_line() +
scale_x_discrete(breaks = c("01-01", "02-01", "03-01", "04-01", "05-01", "06-01", "07-01", "08-01", "09-01", "10-01", "11-01", "12-01"))
I have the following data set:
set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
Data <- data.frame(
id = rep((1:1000),10),
group = rep(c("A","B"), 25),
x = sample(1:100),
y = sample(c("1", "0"), 10, replace = TRUE),
date = as.Date(
sample(as.numeric(start_date):
as.numeric(end_date), 1000,
replace = TRUE), origin = '1970-01-01'))
With that, I create the following plot:
Data %>% mutate(treated = factor(group)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
group_by(treated, date) %>% #group
summarise(prop = sum(y=="1")/n()) %>% #calculate proportion
ggplot()+ theme_classic() +
geom_line(aes(x = date, y = prop, color = treated)) +
geom_point(aes(x = date, y = prop, color = treated)) +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Unfortunately the plot is pretty 'jumpy' and I would like to smooth it. I tried geom_smooth() but can't get it to work. Other questions regarding smoothing didn't help me because they missed the grouping aspect and therefore had a different structure. However, the example data set is in reality part of a larger data set so I need to stick to that code.
[Edit: the geom_smooth() code I tried is geom_smooth(method = 'auto', formula = y ~ x)]
Can someone point me in the right direction?
Many thanks and all the best.
Is this what you mean by a smoothed line? You call geom_smooth with its own aesthetics instead of geom_line, not in combination with it. You can choose different smoothing methods, though the default, loess when there are few observations, is usually what people want. As an aside, I don't think this is necessarily nicer to look at than the geom_line version, and in fact it is slightly less readable. geom_smooth is best used when there are many y observations for every x, which makes patterns hard to see; geom_line is good when there is one y per x.
EDIT: After looking at what you're doing more closely, I added a second plot that doesn't directly calculate the treatment-date means and just uses geom_smooth directly. That lets you get a more reasonable confidence interval instead of having to remove it as before.
set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
Data <- data.frame(
id = rep((1:1000),10),
group = rep(c("A","B"), 25),
x = sample(1:100),
y = sample(c("1", "0"), 10, replace = TRUE),
date = as.Date(
sample(as.numeric(start_date):
as.numeric(end_date), 1000,
replace = TRUE), origin = '1970-01-01'))
library(tidyverse)
Data %>%
mutate(treated = factor(group)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
group_by(treated, date) %>% #group
summarise(prop = sum(y=="1")/n()) %>% #calculate proportion
ggplot() +
theme_classic() +
geom_smooth(aes(x = date, y = prop, color = treated), se = F) +
geom_point(aes(x = date, y = prop, color = treated)) +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Data %>%
mutate(treated = factor(group)) %>%
mutate(y = ifelse(y == "0", 0, 1)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
ggplot() +
theme_classic() +
geom_smooth(aes(x = date, y = y, color = treated), method = "loess") +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Created on 2018-03-27 by the reprex package (v0.2.0).
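Since the outcome here is 0/1, one of the "different smoothing methods" mentioned above could be a logistic fit. The following is only a sketch on made-up data (the `toy` data frame is an assumption, not the data from the question); it uses `method = "glm"` with a binomial family so that the fitted curve stays between 0 and 1:

```r
library(dplyr)
library(ggplot2)

set.seed(10)

# Toy data in the spirit of the question: a 0/1 outcome, two groups, ten days
toy <- tibble(
  treated = factor(sample(c("A", "B"), 2000, replace = TRUE)),
  date    = as.POSIXct("2000-01-01", tz = "GMT") +
              sample(0:9, 2000, replace = TRUE) * 86400,
  y       = rbinom(2000, 1, 0.4)
)

p <- ggplot(toy, aes(x = date, y = y, colour = treated)) +
  theme_classic() +
  # logistic fit per group; the ribbon is the confidence band of the fit
  geom_smooth(method = "glm", method.args = list(family = "binomial"),
              formula = y ~ x)
p
```

A logistic fit is monotone in time, so it suits a gradual trend better than a jump at the treatment date; loess remains the more flexible choice.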
I have a dataframe that I would like to plot, generated by the following code.
df_rn1 = as.data.frame(cbind(rnorm(40, 1, 1), rep("rn1", 40)))
df_rn2 = as.data.frame(cbind(rnorm(40, 10, 1), rep("rn2", 40)))
df_rn3 = as.data.frame(cbind(rnorm(40, 100, 1), rep("rn3", 40)))
df_test = rbind(df_rn1, df_rn2, df_rn3)
colnames(df_test) <- c("value", "type")
I would like to plot the dataframe normalized by the respective first observation s.t. they are scaled properly. However, I am not getting further than this:
ggplot(aes(x = rep(1:40, 3), y=as.numeric(as.character(value)), color = type), data = df_test) +
geom_line()
Is it possible to do the normalization by types directly in the ggplot code?
Thx
How about this?
library(tidyverse);
df_test %>%
group_by(type) %>%
mutate(
value = as.numeric(as.character(value)),
value.scaled = (value - mean(value)) / sd(value),
idx = 1:n()) %>%
ggplot(aes(idx, value.scaled, colour = type)) + geom_line()
Note that values are scaled within type; I'm not sure what you're after. For global scaling, see @ManishSaraswat's answer.
You can use scale function to normalize the values.
df_test %>%
mutate(value = scale(as.numeric(as.character(value)))[, 1]) %>%
ggplot(aes(x = rep(1:40, 3), y = value, color=type))+
geom_line()
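Both answers above standardise the values. The question literally asks for normalisation by the respective first observation, so here is a sketch of that reading: each series is divided by its own first value, so every line starts at 1. The data frame mirrors `df_test` but is rebuilt with a numeric `value` column, and the names `df_norm` and `value_rel` are introduced only for this example.

```r
library(dplyr)
library(ggplot2)

set.seed(1)

# Same shape as df_test in the question, but with a numeric value column
df_test <- data.frame(
  value = c(rnorm(40, 1, 1), rnorm(40, 10, 1), rnorm(40, 100, 1)),
  type  = rep(c("rn1", "rn2", "rn3"), each = 40)
)

df_norm <- df_test %>%
  group_by(type) %>%
  mutate(idx = row_number(),
         value_rel = value / first(value)) %>%   # every series starts at 1
  ungroup()

ggplot(df_norm, aes(idx, value_rel, colour = type)) +
  geom_line()
```

Dividing by the first value is unstable when that value is close to zero (as can happen for `rn1`), which is one reason the z-score scaling in the answers above may be the safer choice.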
I feel like this should be an easy task for ggplot, tidyverse, lubridate, but I cannot seem to find an elegant solution.
GOAL: Create a bar graph of my data aggregated/summarized/grouped_by year and month.
#Libraries
library(tidyverse)
library(lubridate)
# Data
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by="day"), 10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value)
# Summarise
df2 <- df %>%
mutate(year = year(date), month = month(date)) %>%
unite(year_month,year,month) %>%
group_by(year_month) %>%
summarise(avg = mean(value),
cnt = n())
# Plot
ggplot(df2) +
geom_bar(aes(x=year_month, y = avg), stat = 'identity')
When I create the year_month variable, it naturally becomes a character variable instead of a date variable. I have also tried grouping by year(date), month(date) but then I can't figure out how to use two variables as the x-axis in ggplot. Perhaps this could be solved by flooring the dates to the first day of the month...?
You were really close. The missing pieces are floor_date() and scale_x_date():
library(tidyverse)
library(lubridate)
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by = "day"),
10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value) %>%
group_by(month = floor_date(date, unit = "month")) %>%
summarize(avg = mean(value))
ggplot(df, aes(x = month, y = avg)) +
geom_bar(stat = "identity") +
scale_x_date(NULL, date_labels = "%b %y", breaks = "month")
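To see why floor_date() solves the original problem, here is a tiny sketch on a few hand-picked dates: every date is mapped to the first day of its month, and the result is still a Date, so scale_x_date() can format it.

```r
library(lubridate)

dates <- as_date(c("2013-06-01", "2013-06-17", "2013-06-30", "2014-01-15"))

# Each date is floored to the first day of its month
floored <- floor_date(dates, unit = "month")
floored
#> [1] "2013-06-01" "2013-06-01" "2013-06-01" "2014-01-01"

class(floored)
#> [1] "Date"
```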