Time series plots for different groups in R

I have a large data frame with many variables (around 50); the first column is the date and the second is an id.
My data look roughly like this:
df <- data.frame(date = c("01-04-2001 00:00","01-04-2001 00:00","01-04-2001 00:00",
"01-05-2001 00:00","01-05-2001 00:00","01-05-2001 00:00",
"01-06-2001 00:00","01-06-2001 00:00","01-06-2001 00:00",
"01-07-2001 00:00","01-07-2001 00:00","01-07-2001 00:00"),
id = c(1,2,3,1,2,3,1,2,3,1,2,3), a = c(1,2,3,4,5,6,7,8,9,10,11,12),
b = c(2,2.5,3,3.2,4,4.6,5,5.6,8,8.9,10,10.6))
I want time series plots of variables a and b in separate graphs, with all three ids shown as separate lines in each graph.
I tried ggplot but couldn't get it to work. Please help.

Do you mean something like this?
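library(reshape)
library(lattice)
# dates look like day-month-year; parse them so the x-axis is a real time axis
df$date <- as.POSIXct(df$date, format = "%d-%m-%Y %H:%M", tz = "UTC")
df2 <- melt(df, id.vars = c("date", "id"), measure.vars = c("a", "b"))
xyplot(value ~ date | variable, data = df2, groups = id, type = "l")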
Addendum
# The following is from a comment by jbaums.
# It will create a single plot/file for each variable of df2
png('plots%02d.png')
# print() is needed when this runs inside a script or function
print(xyplot(value ~ date | variable, data = df2, groups = id, type = "l",
             layout = c(1, 1), scales = list(alternating = FALSE, tck = 1:0)))
dev.off()
You can also add relation = 'free' to scales so that axis limits are calculated separately for each panel; use scales = list(y = list(relation = 'free')) to free only the y-axis.
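Since the question mentions ggplot, here is a rough ggplot2 counterpart of the lattice plot above, offered as a sketch rather than a drop-in answer. It assumes the dates are day-month-year (so "01-04-2001" is 1 April 2001) and uses tidyr's pivot_longer() in place of melt(); facet_wrap(scales = "free_y") plays roughly the role of relation = 'free' for the y-axis.

```r
library(ggplot2)
library(tidyr)

# Same values as the question's example data
df <- data.frame(date = rep(c("01-04-2001 00:00", "01-05-2001 00:00",
                              "01-06-2001 00:00", "01-07-2001 00:00"), each = 3),
                 id = rep(1:3, times = 4),
                 a = 1:12,
                 b = c(2, 2.5, 3, 3.2, 4, 4.6, 5, 5.6, 8, 8.9, 10, 10.6))

# Assumption: day-month-year dates
df$date <- as.POSIXct(df$date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Long format: one row per date, id and variable
long <- pivot_longer(df, c(a, b), names_to = "variable", values_to = "value")

# One panel per variable, one coloured line per id, independent y-axes
p <- ggplot(long, aes(x = date, y = value, colour = factor(id))) +
  geom_line() +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  labs(colour = "id")
p
```

With around 50 variables, ncol can be raised, or the plots can be written to separate files as in the loop-based answers.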

Edit: After reading the comments, maybe you should try something like this:
library(tidyr)
library(ggplot2)
df2 <- gather(df, variable, value, -date, -id)
vars <- unique(df2$variable)
for (v in vars) {
  # build the plot for one variable, with one coloured line per id
  p <- ggplot(subset(df2, variable == v),
              aes(date, value, group = id, color = factor(id))) +
    geom_line() +
    ylab(v)
  # ggsave() is a separate call, not a layer, so it can't be added with `+`
  ggsave(filename = paste0(v, ".png"), plot = p)
}
This should save one PNG per variable in your data frame and set each plot's y-axis label to the variable name, as you requested.

Here's how to do it in ggplot, using the tidyr package to reshape the data into long format:
library(ggplot2)
library(tidyr)
library(dplyr)
df <- data.frame(date = c("01-04-2001 00:00","01-04-2001 00:00","01-04-2001 00:00",
"01-05-2001 00:00","01-05-2001 00:00","01-05-2001 00:00",
"01-06-2001 00:00","01-06-2001 00:00","01-06-2001 00:00",
"01-07-2001 00:00","01-07-2001 00:00","01-07-2001 00:00"),
id = c(1,2,3,1,2,3,1,2,3,1,2,3), a = c(1,2,3,4,5,6,7,8,9,10,11,12),
b = c(2,2.5,3,3.2,4,4.6,5,5.6,8,8.9,10,10.6))
Then, using dplyr's group_by() and group_walk() functions, we can save one plot per variable.
df %>%
  gather(variable, value, -date, -id) %>%
  mutate(id = factor(id)) %>%
  group_by(variable) %>%
  group_walk(~ {
    # .x is the data for one variable, .y holds its grouping key
    p <- ggplot(.x, aes(x = date, y = value, group = id, color = id)) +
      geom_line() +
      ggtitle(paste("variable =", .y$variable))
    ggsave(filename = paste0(.y$variable, ".png"), plot = p)
  })

Related

r ggplot barplot with multiple date columns

I have a data frame with multiple date columns and I want to make a single plot with 3 bar charts (one for ID/dat1, ID/dat2 and ID/dat3). Does anyone know how to do this?
EDIT: I'm looking for a plot with the date on the x-axis and the count of IDs on the y-axis.
Example data frame:
dat <- data.frame(ID = c(1:80),
dat1 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80),
dat2 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80),
dat3 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80))
Are you after this?
library(data.table)
library(ggplot2)
library(magrittr)  # for %>%
melt(setDT(dat), id.vars = "ID") %>%
  ggplot(aes(x = value, fill = variable)) +
  geom_bar()
If you want a line plot, you can try
melt(setDT(dat), id.vars = "ID") %>%
  ggplot(aes(x = value, y = ID, group = variable, color = variable)) +
  geom_line()
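Since the EDIT asks for the count of IDs per date, a line version of that count might look like the sketch below. It assumes stat = "count" is what you want: it tallies rows per date separately for each column, the same way geom_bar() does. With the sample data each date occurs at most once per column, so the lines will sit at 1; on real data with repeated dates they will vary.

```r
library(data.table)
library(ggplot2)

set.seed(123)  # reproducible sample
days <- seq(as.Date("2021-01-01"), as.Date("2021-04-01"), by = "day")
dat <- data.frame(ID = 1:80,
                  dat1 = sample(days, 80),
                  dat2 = sample(days, 80),
                  dat3 = sample(days, 80))

# Long format: one row per ID and date column
long <- melt(setDT(dat), id.vars = "ID")

# stat = "count" counts rows per date, one line per original column
p <- ggplot(long, aes(x = value, colour = variable)) +
  geom_line(stat = "count")
p
```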

Spaghetti plot using ggplot in R?

I would like to produce a spaghetti plot with the days of the year on the x-axis and the data on the y-axis, one line per year. I would then like a separate year that has data for only 3 months (PCPNewData) plotted on the same figure in a different color and with a bold line. Here is my sample code, which produces a graph (attached) where the data for each year on a particular day are stacked. I don't want a bar-like graph; I would like a line graph. Thanks
library(tidyverse)
library(tidyr)
myDates=as.data.frame(seq(as.Date("2000-01-01"), to=as.Date("2010-12-31"),by="days"))
colnames(myDates) = "Date"
Dates = myDates %>% separate(Date, sep = "-", into = c("Year", "Month", "Day"))
LatestDate=as.data.frame(seq(as.Date("2011-01-01"), to=as.Date("2011-03-31"),by="days"))
colnames(LatestDate) = "Date"
NewDate = LatestDate %>% separate(Date, sep = "-", into = c("Year", "Month", "Day"))
PCPDataHis = data.frame(total_precip = runif(4018, 0,70), Dates)
PCPNewData = data.frame(total_precip = runif(90, 0,70), NewDate)
PCPDataHisPlot =PCPDataHis %>% group_by(Year) %>% gather(key = "Variable", value = "Value", -Year, -Day,-Month)
ggplot(PCPDataHisPlot, aes(Day, Value, colour = Year))+
geom_line()+
geom_line(data = PCPNewData, aes(Day, total_precip))
I would like to have a figure like the one below, where each line represents the data for a particular year.
UPDATE:
I drew my desired figure by hand (see attached). I would like all the days of the year on the x-axis, with the corresponding data on the y-axis.
You have a few errors in your code.
First, your days are stored as character. You need to convert them to numeric to get a continuous line.
Then, you have multiple values for each day (because each year has 12 months), so you need to summarise the data first:
library(tidyverse)
Pel2 <- Pelly2Data %>% group_by(year, day) %>% summarise(Value = mean(Value, na.rm = TRUE))
Pel3 <- Pelly2_2011_3months %>% group_by(year, day) %>% summarise(total_precip = mean(total_precip, na.rm = TRUE))
ggplot(Pel2, aes(as.numeric(day), Value, color = year)) +
  geom_line() +
  # use the summarised 2011 data (Pel3), not the raw data frame
  geom_line(data = Pel3, aes(as.numeric(day), y = total_precip), linewidth = 2)
It looks better, but it is hard to apply a specific color pattern.
In my opinion, it would be less confusing to compare the mean of each dataset, for example:
library(tidyverse)
Pel2 <- Pelly2Data %>% group_by(day) %>%
summarise(Mean = mean(Value, na.rm = TRUE),
SEM = sd(Value,na.rm = TRUE)/sqrt(n())) %>%
mutate(Name = "Pel_ALL")
Pel3 <- Pelly2_2011_3months %>% group_by(day) %>%
summarise(Mean = mean(total_precip, na.rm = TRUE),
SEM = sd(total_precip, na.rm = TRUE)/sqrt(n())) %>%
mutate(Name = "Pel3")
Pel <- bind_rows(Pel2,Pel3)
ggplot(Pel, aes(x = as.numeric(day), y = Mean, color = Name)) +
  geom_ribbon(aes(ymin = Mean - SEM, ymax = Mean + SEM), alpha = 0.2) +
  geom_line(linewidth = 2)
EDIT: New graph based on update
To get the graph you posted as a drawing, you need the day of the year rather than the day of the month. We can get this by creating a date sequence and extracting the day of the year with the yday() function from the lubridate package.
library(tidyverse)
library(lubridate)
Pelly2$Date = seq(ymd("1990-01-01"),ymd("2010-12-31"), by = "day")
Pelly2$Year_day <- yday(Pelly2$Date)
Pelly2_2011_3months$Date <- seq(ymd("2011-01-01"), ymd("2011-03-31"), by = "day")
Pelly2_2011_3months$Year_day <- yday(Pelly2_2011_3months$Date)
Pelly2$Dataset = "ALL"
Pelly2_2011_3months$Dataset = "2011_Dataset"
Pel <- bind_rows(Pelly2, Pelly2_2011_3months)
Then, you can plot the combined dataset and distinguish the two sources with different colors, line widths and transparency (alpha), as shown here:
ggplot(Pel, aes(x = Year_day, y = total_precip, color = year, linewidth = Dataset, alpha = Dataset)) +
  geom_line() +
  scale_linewidth_manual(values = c(2, 0.5)) +
  scale_alpha_manual(values = c(1, 0.5))
Does this answer your question?
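For readers working from the question's sample data rather than the Pelly2 data frames, the same day-of-year idea might look like the sketch below. It assumes random values generated like the question's (runif between 0 and 70), keeps a real Date column instead of splitting it with separate(), and uses fixed grey and red colors as one possible styling. linewidth needs ggplot2 3.4.0 or later (use size on older versions). Note that yday() counts 29 February in leap years, so from March onwards leap years are shifted by one day relative to other years.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

set.seed(1)
# Historical daily data 2000-2010 and the 3-month 2011 data (random values)
his <- data.frame(Date = seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "day"))
his$total_precip <- runif(nrow(his), 0, 70)
new <- data.frame(Date = seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by = "day"))
new$total_precip <- runif(nrow(new), 0, 70)

# Year as a factor (one line per year), day of year for the x-axis
his <- mutate(his, Year = factor(year(Date)), Year_day = yday(Date))
new <- mutate(new, Year = factor(year(Date)), Year_day = yday(Date))

p <- ggplot(his, aes(x = Year_day, y = total_precip, group = Year)) +
  # one thin grey line per historical year
  geom_line(colour = "grey60", alpha = 0.5) +
  # the 2011 data highlighted as a bold red line
  geom_line(data = new, colour = "red", linewidth = 1.2)
p
```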

geom_area plot with min and max values

I have been trying to plot the minimum and maximum temperature values, and I wanted to use geom_area for this. My data can be downloaded from here.
library(dplyr)
library(ggplot2)
dat <- read.csv("energydata_complete.csv", stringsAsFactors = FALSE)
#renaming attributes meaningfully
#names(dat)[] <- 'temp_kitchen'
dat <- dat %>%
dplyr::rename('temp_kitchen'=T1,'temp_living'=T2,'temp_laundry'=T3,
'temp_office'=T4,'temp_bath'=T5,'temp_build'=T6,'temp_iron'=T7,
'temp_teen'=T8,'temp_parent'=T9,'hum_kitchen'=RH_1,'hum_living'=RH_2,
'hum_laundry'=RH_3,'hum_office'=RH_4,'hum_bath'=RH_5,'hum_build'=RH_6,
'hum_iron'=RH_7,'hum_teen'=RH_8,'hum_parent'=RH_9)
dat$month <- as.factor(months(dat$date))
dat$date <- strptime(dat$date, format = "%Y-%m-%d %H:%M:%S")
dat$date <- as.POSIXct(dat$date, format = "%Y-%m-%d %H:%M:%S")
I have created another data frame with the month and the minimum and maximum temperature of each room.
temparature <- dat %>% group_by(month) %>% dplyr::summarise(min_temp_kitch=min(temp_kitchen),
max_temp_kitch=max(temp_kitchen),
min_temp_living=min(temp_living),
max_temp_living=max(temp_living),
min_temp_laundry=min(temp_laundry),
max_temp_laundry=max(temp_laundry),
min_temp_iron=min(temp_iron),
max_temp_iron=max(temp_iron),
min_temp_office=min(temp_office),
max_temp_office=max(temp_office),
min_temp_bath=min(temp_bath),
max_temp_bath=max(temp_bath),
min_temp_parent=min(temp_parent),
max_temp_parent=max(temp_parent),
min_temp_teen=min(temp_teen),
max_temp_teen=max(temp_teen))
Now I am trying to plot the minimum and maximum temperatures from this data frame for each room.
The code below didn't produce any plot.
ggplot() + geom_area(data = temparature,aes(x=month,y=min_temp_kitch), position = 'stack') +
geom_area(data = temparature,aes(x=month, y=max_temp_kitch), position = 'stack')
I also tried geom_ribbon, as below.
ggplot(temparature) +
geom_ribbon(aes(x=month, ymin = min_temp_kitch, ymax = max_temp_kitch), color='blue', alpha = 0.5)
This gave the following result:
But I want a plot similar to this one, with points for each value.
Can someone suggest how to do this, please?
You don't need to convert your dates to a factor; instead, reshape the temperature data frame into long format:
library(dplyr)
library(ggplot2)
library(lubridate)
dat <- read.csv("energydata_complete.csv", stringsAsFactors = FALSE)
dat <- dat %>%
rename('temp_kitchen'=T1,'temp_living'=T2,'temp_laundry'=T3,
'temp_office'=T4,'temp_bath'=T5,'temp_build'=T6,'temp_iron'=T7,
'temp_teen'=T8,'temp_parent'=T9,'hum_kitchen'=RH_1,'hum_living'=RH_2,
'hum_laundry'=RH_3,'hum_office'=RH_4,'hum_bath'=RH_5,'hum_build'=RH_6,
'hum_iron'=RH_7,'hum_teen'=RH_8,'hum_parent'=RH_9) %>%
mutate(month = floor_date(date(date), unit = 'months'))
temparature <- dat %>%
group_by(month) %>%
summarise(min_temp_kitch=min(temp_kitchen),
max_temp_kitch=max(temp_kitchen),
min_temp_living=min(temp_living),
max_temp_living=max(temp_living),
min_temp_laundry=min(temp_laundry),
max_temp_laundry=max(temp_laundry),
min_temp_iron=min(temp_iron),
max_temp_iron=max(temp_iron),
min_temp_office=min(temp_office),
max_temp_office=max(temp_office),
min_temp_bath=min(temp_bath),
max_temp_bath=max(temp_bath),
min_temp_parent=min(temp_parent),
max_temp_parent=max(temp_parent),
min_temp_teen=min(temp_teen),
max_temp_teen=max(temp_teen))
temp2 <- temparature %>%
tidyr::gather(temp_min_max, Temp, -month)
ggplot() +
geom_area(data = temp2 %>%
filter(temp_min_max %in% c('min_temp_kitch', 'max_temp_kitch')),
aes(x=month,y=Temp,fill = temp_min_max, color = temp_min_max),
position = 'identity')
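To get the points the question asks for, one option is a shaded min-max band with geom_ribbon() plus a geom_point() layer from the long-format data. This is a sketch assuming the summary has the shape of temparature (a Date month column plus min_* and max_* columns). The numbers below are made up purely so the example runs on its own; they are not values from energydata_complete.csv.

```r
library(ggplot2)
library(tidyr)

# Illustrative stand-in for `temparature` (made-up values, kitchen only)
temp_demo <- data.frame(
  month = seq(as.Date("2016-01-01"), by = "month", length.out = 5),
  min_temp_kitch = c(16.8, 16.5, 17.2, 18.0, 19.1),
  max_temp_kitch = c(22.9, 23.1, 23.5, 24.2, 26.0)
)

# Long format: one row per month and statistic, used for the points
temp_long <- pivot_longer(temp_demo, -month,
                          names_to = "stat", values_to = "Temp")

p <- ggplot(temp_demo, aes(x = month)) +
  # shaded band between the monthly minimum and maximum
  geom_ribbon(aes(ymin = min_temp_kitch, ymax = max_temp_kitch),
              fill = "steelblue", alpha = 0.3) +
  # one point per min/max value, coloured by statistic
  geom_point(data = temp_long, aes(y = Temp, colour = stat), size = 2)
p
```

For all rooms at once, the same long data could be split into room and statistic columns and drawn with facet_wrap(), but that reshaping step is left out here.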

ggplot using grouped date variables (such as year_month)

I feel like this should be an easy task for ggplot, tidyverse, lubridate, but I cannot seem to find an elegant solution.
GOAL: Create a bar graph of my data aggregated/summarized/grouped_by year and month.
#Libraries
library(tidyverse)
library(lubridate)
# Data
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by="day"), 10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value)
# Summarise
df2 <- df %>%
mutate(year = year(date), month = month(date)) %>%
unite(year_month,year,month) %>%
group_by(year_month) %>%
summarise(avg = mean(value),
cnt = n())
# Plot
ggplot(df2) +
geom_bar(aes(x=year_month, y = avg), stat = 'identity')
When I create the year_month variable, it naturally becomes a character variable instead of a date variable. I have also tried grouping by year(date), month(date) but then I can't figure out how to use two variables as the x-axis in ggplot. Perhaps this could be solved by flooring the dates to the first day of the month...?
You were really close. The missing pieces are floor_date() and scale_x_date():
library(tidyverse)
library(lubridate)
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by = "day"),
               10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value) %>%
  group_by(month = floor_date(date, unit = "month")) %>%
  summarize(avg = mean(value))
ggplot(df, aes(x = month, y = avg)) +
  geom_col() +
  scale_x_date(NULL, date_labels = "%b %y", date_breaks = "1 month")
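Because the question also tried grouping by year(date) and month(date), here is a sketch of that route: group by the two numeric columns and rebuild a Date with lubridate's make_date() before plotting. It should give the same monthly bars as floor_date(); the seed and the summary column names are only for illustration.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

set.seed(1)  # reproducible random sample
date <- sample(seq(as_date("2013-06-01"), as_date("2014-05-31"), by = "day"),
               10000, replace = TRUE)
df <- tibble(date, value = rnorm(10000))

# Group by year and month, then rebuild a real Date (first of the month)
# so the x-axis stays a date scale instead of a character one
df2 <- df %>%
  group_by(year = year(date), month = month(date)) %>%
  summarise(avg = mean(value), cnt = n(), .groups = "drop") %>%
  mutate(year_month = make_date(year, month, 1))

p <- ggplot(df2, aes(x = year_month, y = avg)) +
  geom_col() +
  scale_x_date(NULL, date_labels = "%b %y", date_breaks = "1 month")
p
```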

Plot subscribers using start and end dates in R

I wish to plot the number of subscribers over time using their start and end dates.
My current method creates a row for each day per subscriber, then counts the subscribers per day, then plots that count by day.
This works fine for small data but does not scale to large numbers of subscribers, because the one-row-per-day-per-subscriber step produces too many rows.
Is there an efficient method? Many thanks for any help.
library(ggplot2)
library(dplyr)
# create dummy dataset
subscribers <- data.frame(id = seq(1:10),
start = sample(seq(as.Date('2016/01/01'), as.Date('2016/06/01'), by="day"), 10),
end = sample(seq(as.Date('2017/01/01'), as.Date('2017/06/01'), by="day"), 10))
# creates a row for each day per user - OK for small datasets, but not scalable
date_map <- Map(seq, subscribers$start, subscribers$end, by = "day")
date_rows <- data.frame(
org = rep.int(subscribers$id, vapply(date_map, length, 1L)),
date = do.call(c, date_map))
# finds the frequency of users for each day
date_rows %>%
group_by(date) %>%
dplyr::summarise(users = n()) -> plot_data
ggplot(data = plot_data,
aes(x = date, y = users)) +
geom_line(size = 1.2,alpha = .6)
How's this?
library(tidyverse)
df <- subscribers %>%
gather(key, value, start, end) %>%
mutate(key = ifelse(key == "start",1,-1)) %>%
arrange(value)
df$cum <- cumsum(df$key)
ggplot(data = df,
aes(x = value, y = cum)) +
geom_step()
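If the curve should match the day-by-day counts from the question exactly, a variation on this idea (a sketch, not benchmarked on large data) is to add +1 on each start date and -1 on the day after each end date, collapse those changes to one row per date, and take a running sum. Shifting the -1 by one day keeps each subscriber counted on their end date, as seq(start, end) does in the question. The work grows with the number of subscribers, not with the number of subscribed days.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # reproducible dummy data, built like the question's
subscribers <- data.frame(
  id = 1:10,
  start = sample(seq(as.Date("2016-01-01"), as.Date("2016-06-01"), by = "day"), 10),
  end = sample(seq(as.Date("2017-01-01"), as.Date("2017-06-01"), by = "day"), 10)
)

# +1 when a subscription starts, -1 the day after it ends
events <- bind_rows(
  data.frame(date = subscribers$start, change = 1),
  data.frame(date = subscribers$end + 1, change = -1)
) %>%
  group_by(date) %>%
  summarise(change = sum(change)) %>%  # one row per distinct date
  arrange(date) %>%
  mutate(users = cumsum(change))       # active subscribers from that date on

p <- ggplot(events, aes(x = date, y = users)) +
  geom_step()
p
```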
