R ggplot barplot with multiple date columns

I have a data frame with multiple date columns and I want to make a single plot with 3 bar charts (one each for ID/dat1, ID/dat2 and ID/dat3). Does anyone know how to do this?
EDIT: I'm looking for a plot with the date on the x-axis and count of ID on the y-axis.
Example data frame:
dat <- data.frame(ID = c(1:80),
dat1 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80),
dat2 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80),
dat3 = sample(seq(as.Date('2021/01/01'), as.Date('2021/04/01'), by="day"), 80))

Are you after this?
library(data.table)  # setDT(), melt()
library(ggplot2)
library(magrittr)    # %>%
melt(setDT(dat), id.vars = "ID") %>%
  ggplot(aes(x = value, fill = variable)) +
  geom_bar()
If you want a line plot instead, you can try
melt(setDT(dat),id.vars = "ID") %>%
ggplot(aes(x = value, y = ID, group = variable, color = variable)) +
geom_line()
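If you would rather stay within the tidyverse, here is a minimal sketch of the same reshaping idea using tidyr's pivot_longer, with one bar-chart panel per date column. The set.seed call, the names days, dat_long and p, and the choice of facet_wrap are my own assumptions for illustration, not part of the answer above.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
days <- seq(as.Date("2021-01-01"), as.Date("2021-04-01"), by = "day")
dat <- data.frame(ID   = 1:80,
                  dat1 = sample(days, 80),
                  dat2 = sample(days, 80),
                  dat3 = sample(days, 80))

# Stack the three date columns into one "value" column,
# keeping track of the source column in "variable"
dat_long <- pivot_longer(dat, cols = c(dat1, dat2, dat3),
                         names_to = "variable", values_to = "value")

# geom_bar() counts the IDs per date; facet_wrap() gives one panel per column
p <- ggplot(dat_long, aes(x = value, fill = variable)) +
  geom_bar() +
  facet_wrap(~variable, ncol = 1)
p
```

Dropping the facet_wrap line gives a single stacked panel, as in the data.table version.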

Related

R Plotting three timeseries in two facet_grids in ggplot

Is it possible to plot three timeseries in only two grids using ggplot and facet_grid()?
# Create some fake data
stock1 = cumprod(1+c(0, rnorm(99, 0, .05)))
stock2 = cumprod(1+c(0, rnorm(99, 0, .075)))
indicator = sample(1:50, 100, replace = TRUE)
date_seq = seq.Date(as.Date("2023-01-01"), length.out = 100, by = 1)
df = data.frame(date = date_seq, stock1 = stock1, stock2 = stock2, indicator = indicator)
Now I would like to see an upper graph with the two stocks and one lower graph with the indicator using facet_grid().
The only result I get is a three-grid plot
grid_df = pivot_longer(df, c(stock1, stock2, indicator), names_to = "underlying", values_to = "values")
ggplot(grid_df, aes(x = date, y = values, colour = underlying)) +
geom_line() +
facet_grid(vars(underlying), scales = "free")
I don't know how to group the two stocks so that they share one grid.
Thanks for help!
You could add an extra column to the long-format data that maps stock1 and stock2 to a single label ("stock") and leaves the indicator on its own, using ifelse, and then pass that column to facet_grid like this:
library(ggplot2)
library(dplyr)
library(tidyr)
grid_df = pivot_longer(df, c(stock1, stock2, indicator), names_to = "underlying", values_to = "values") %>%
mutate(grids = ifelse(underlying == "indicator", "indicator", "stock"))
ggplot(grid_df, aes(x = date, y = values, colour = underlying)) +
geom_line() +
facet_grid(vars(grids), scales = "free")
Created on 2023-02-19 with reprex v2.0.2
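For reference, a self-contained sketch of the same idea that also checks which series ends up in which panel. The seed is my own addition, and using if_else and scales = "free_y" (so only the y-axis differs between panels) are assumptions on my part rather than something from the answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: seed added only for reproducibility
df <- data.frame(
  date      = seq.Date(as.Date("2023-01-01"), length.out = 100, by = 1),
  stock1    = cumprod(1 + c(0, rnorm(99, 0, 0.05))),
  stock2    = cumprod(1 + c(0, rnorm(99, 0, 0.075))),
  indicator = sample(1:50, 100, replace = TRUE)
)

grid_df <- df %>%
  pivot_longer(c(stock1, stock2, indicator),
               names_to = "underlying", values_to = "values") %>%
  # Both stocks share the panel label "stock"; the indicator keeps its own
  mutate(grids = if_else(underlying == "indicator", "indicator", "stock"))

# Quick check of the panel assignment
count(grid_df, grids, underlying)

p <- ggplot(grid_df, aes(x = date, y = values, colour = underlying)) +
  geom_line() +
  facet_grid(rows = vars(grids), scales = "free_y")
p
```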

Spaghetti plot using ggplot in R?

I would like to produce a spaghetti plot with days of the year on the x-axis and data on the y-axis, one line for each Year. I would then like a separate year that has data for only 3 months (PCPNewData) plotted on the same figure in a different color and with a bold line. Here is my sample code, which produces a graph (attached) where the data for each Year for a particular Day are stacked. I don't want a bar graph; I would like a line graph. Thanks
library(tidyverse)
library(tidyr)
myDates=as.data.frame(seq(as.Date("2000-01-01"), to=as.Date("2010-12-31"),by="days"))
colnames(myDates) = "Date"
Dates = myDates %>% separate(Date, sep = "-", into = c("Year", "Month", "Day"))
LatestDate=as.data.frame(seq(as.Date("2011-01-01"), to=as.Date("2011-03-31"),by="days"))
colnames(LatestDate) = "Date"
NewDate = LatestDate %>% separate(Date, sep = "-", into = c("Year", "Month", "Day"))
PCPDataHis = data.frame(total_precip = runif(4018, 0,70), Dates)
PCPNewData = data.frame(total_precip = runif(90, 0,70), NewDate)
PCPDataHisPlot =PCPDataHis %>% group_by(Year) %>% gather(key = "Variable", value = "Value", -Year, -Day,-Month)
ggplot(PCPDataHisPlot, aes(Day, Value, colour = Year))+
geom_line()+
geom_line(data = PCPNewData, aes(Day, total_precip))
I would like to have a figure like the one below, where each line represents the data for a particular year
UPDATE:
I drew my desired figure by hand (see attached). I would like to have all the days of the year on the x-axis with the data on the y-axis
You have a few errors in your code.
First, your days are in character format. You need to pass them as numbers for the line to be continuous.
Then, you have multiple values for each day (because you have 12 months per year), so you need to summarise these data a little:
Pel2 <- Pelly2Data %>% group_by(year,day) %>% summarise(Value = mean(Value, na.rm = TRUE))
Pel3 <- Pelly2_2011_3months %>% group_by(year, day) %>% summarise(total_precip = mean(total_precip, na.rm = TRUE))
ggplot(Pel2, aes(as.numeric(day), Value, color = year))+
geom_line()+
geom_line(data = Pelly2_2011_3months, aes(as.numeric(day), y= total_precip),size = 2)
It looks better, but it is hard to apply a specific color pattern.
In my opinion, it would be less confusing to compare the mean of each dataset, such as:
library(tidyverse)
Pel2 <- Pelly2Data %>% group_by(day) %>%
summarise(Mean = mean(Value, na.rm = TRUE),
SEM = sd(Value,na.rm = TRUE)/sqrt(n())) %>%
mutate(Name = "Pel_ALL")
Pel3 <- Pelly2_2011_3months %>% group_by(day) %>%
summarise(Mean = mean(total_precip, na.rm = TRUE),
SEM = sd(total_precip, na.rm = TRUE)/sqrt(n())) %>%
mutate(Name = "Pel3")
Pel <- bind_rows(Pel2,Pel3)
ggplot(Pel, aes(x = as.numeric(day), y = Mean, color = Name))+
geom_ribbon(aes(ymin = Mean-SEM, ymax = Mean+SEM), alpha = 0.2)+
geom_line(size = 2)
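Since the Pelly2 objects above are not available here, this is a rough, self-contained sketch of the mean ± SEM idea on simulated data shaped like the question's (2000 to 2010, uniform values between 0 and 70). The names pcp_hist and daily, the seed, and grouping by day of the year instead of day of the month are assumptions of mine.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

set.seed(42)  # assumption: seed added for reproducibility
hist_dates <- seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "day")
pcp_hist <- data.frame(date = hist_dates,
                       total_precip = runif(length(hist_dates), 0, 70))

# Mean and standard error of the mean for each day of the year, across years
daily <- pcp_hist %>%
  mutate(year_day = yday(date)) %>%
  group_by(year_day) %>%
  summarise(Mean = mean(total_precip, na.rm = TRUE),
            SEM  = sd(total_precip, na.rm = TRUE) / sqrt(n()))

p <- ggplot(daily, aes(x = year_day, y = Mean)) +
  geom_ribbon(aes(ymin = Mean - SEM, ymax = Mean + SEM), alpha = 0.2) +
  geom_line()
p
```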
EDIT: New graph based on update
To get the graph you posted as a drawing, you need the day of the year and not the day of the month. We can get this information by building a date sequence and extracting the day of the year with the yday function from the lubridate package.
library(tidyverse)
library(lubridate)
Pelly2$Date = seq(ymd("1990-01-01"),ymd("2010-12-31"), by = "day")
Pelly2$Year_day <- yday(Pelly2$Date)
Pelly2_2011_3months$Date <- seq(ymd("2011-01-01"), ymd("2011-03-31"), by = "day")
Pelly2_2011_3months$Year_day <- yday(Pelly2_2011_3months$Date)
Pelly2$Dataset = "ALL"
Pelly2_2011_3months$Dataset = "2011_Dataset"
Pel <- bind_rows(Pelly2, Pelly2_2011_3months)
Then, with both datasets combined, you can represent them with different colors, sizes and transparency (alpha) as shown here:
ggplot(Pel, aes(x = Year_day, y = total_precip, color = year, size = Dataset, alpha = Dataset))+
geom_line()+
scale_size_manual(values = c(2,0.5))+
scale_alpha_manual(values = c(1,0.5))
Does this answer your question?
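For anyone who wants to run the spaghetti plot without the original Pelly2 data, here is a hedged sketch on simulated values matching the question's date ranges. The data frame pcp and its columns year, year_day and dataset are names I made up for the example, and I only vary transparency to make 2011 stand out.

```r
library(ggplot2)
library(lubridate)

set.seed(42)  # assumption: seed added for reproducibility
hist_dates <- seq(as.Date("2000-01-01"), as.Date("2010-12-31"), by = "day")
new_dates  <- seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by = "day")

pcp <- data.frame(
  date = c(hist_dates, new_dates),
  total_precip = runif(length(hist_dates) + length(new_dates), 0, 70)
)
pcp$year     <- factor(year(pcp$date))  # one line per year
pcp$year_day <- yday(pcp$date)          # 1..366, shared x-axis for all years
pcp$dataset  <- ifelse(pcp$year == "2011", "2011", "historical")

p <- ggplot(pcp, aes(x = year_day, y = total_precip,
                     group = year, colour = year, alpha = dataset)) +
  geom_line() +
  # Fade the historical years so the partial 2011 series stands out
  scale_alpha_manual(values = c("2011" = 1, "historical" = 0.4))
p
```

A thicker line for 2011 can be added with a manual size scale in the same way as in the answer above.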

How to plot a(n unknown) number of data series as geom_line in same chart

My first Q here, so please go lightly if I'm out of step anywhere.
I'm trying to code R to produce a single chart to contain a number of data series lines. The number of data series may vary but will be provided in the data frame. I have tried to rearrange another thread's content to print the geom_line , but not successfully.
The logic is:
#desire to replace loop of 1:5 with ncol(df)
print(ggplot(df,aes(x=time))
for (i in 1:5) {
print (+ geom_line(aes(y=df[,i]))
}
#functioning geom point loops ggplot production:
for (i in 1:5) {
print(ggplot(df,aes(x=time,y=df[,i]))+geom_point())
}
#functioning multi-line ggplot where n is explicit:
ggplot(data=df, aes(x=time), group=1) +
geom_line(aes(y=df$`3`))+
geom_line(aes(y=df$`4`))
The functioning example code produces n number of point charts, 5 in this case. I would like just one chart to contain n line series.
This may be similar to How to plot n dimensional matrix? for which there are currently no relevant answers
Any contributions much appreciated, thanks
You can use gather from the tidyverse to do that.
As you didn't supply sample data, I used mtcars.
I created two data frames, one with 3 variables besides mpg and one with 9. In each of them I plotted all of the variables against mpg.
library(tidyverse)
df3Columns <- mtcars[, 1:4]
df9Columns <- mtcars[, 1:10]
df3Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
df9Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
Edit - using the sample data in comments.
library(tidyverse)
df %>%
rownames_to_column("time") %>%
gather(var, value, -time) %>%
ggplot(aes(time, value, group = var, color = var)) +
geom_line()
Sample data:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
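In current tidyr, gather is superseded by pivot_longer, so here is a small sketch of the same reshaping on the sample data. It works for any number of columns because every column except time is stacked. The name df_long and the conversion of the row names to integers are my own choices.

```r
library(tidyr)
library(tibble)
library(ggplot2)

df <- structure(list("39083" = c(96, 100, 100),
                     "39090" = c(99, 100, 100),
                     "39097" = c(99, 100, 100)),
                row.names = 3:5, class = "data.frame")

# Move the row names into a "time" column, then stack all other columns
df_time <- rownames_to_column(df, "time")
df_time$time <- as.integer(df_time$time)
df_long <- pivot_longer(df_time, cols = -time,
                        names_to = "var", values_to = "value")

# One line per original column, however many there are
p <- ggplot(df_long, aes(x = time, y = value, group = var, colour = var)) +
  geom_line()
p
```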
To strictly answer your question, you can simply store your ggplot in a variable and add the geom_line one by one:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
g <- ggplot(df, aes(x = 1:nrow(df)))
for (i in colnames(df))
{
g <- g + geom_line(y = df[,i])
}
g <- g + scale_y_continuous(limits = c(min(df), max(df)))
print(g)
However, this is not a very convenient solution. I would highly recommend reshaping your data frame into the long format that ggplot expects.
df.ultimate <- data.frame(time = numeric(), value = numeric(), group = character())
for (i in colnames(df))
{
df.ultimate <- rbind(df.ultimate, data.frame(time = 1:nrow(df), value = df[, i], group = i))
}
g <- ggplot(df.ultimate, aes(x = time, y = value, color = group))
g <- g + geom_line()
print(g)
A one-line solution:
ggplot(data.frame(time = rep(1:nrow(df), ncol(df)),
value = as.vector(as.matrix(df)),
group = rep(colnames(df), each = nrow(df))),
aes(x = time, y = value, color = group)) + geom_line()

Displaying a stacked bar plot with a condition

I have this dataframe with numbers being percentages:
df <- data.frame(spoken = c(10, 90, 30, 70),
                 lexicon = c(10, 90, 50, 50),
                 row.names = c("consonant_initial",
                               "vowel_initial",
                               "consonant_final", "vowel_final"))
I want to display that in a nice way so that I get
a stacked barplot for the distribution of vowel vs consonant initial words
and the distribution of vowel vs consonant final words,
including facet_wrap to show the two conditions lexicon vs. spoken.
I have tried to reshape the data:
df$row <- seq_len(nrow(df))
df <- melt(df, id.vars = "row")
However, I can't wrap my head around how I would need to reshape the data in order to display it accordingly
You need to split the row names since the information you need to color code the stacked bars is encoded within, if I understand your desired graph correctly.
library(tidyverse)
df$label <- row.names(df)
df %>%
separate(label, c("lettertype", "position"), "_") %>%
gather(key = 'condition', value = 'prop', -lettertype, -position) %>%
ggplot() +
aes(x = position, y = prop, fill = lettertype) +
geom_bar(stat = 'identity') +
facet_wrap(~condition)
# Alternative: split the row names with base R, then reshape with reshape2
df$row1 <- sapply(strsplit(row.names(df), "_"), function(x) x[1])
df$row2 <- sapply(strsplit(row.names(df), "_"), function(x) x[2])
library(reshape2)
df <- melt(df, id.vars = c("row1", "row2"))
library(ggplot2)
ggplot(df, aes(x = row2, y = value, fill = row1)) +
geom_col() +
facet_wrap(~variable)
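Either way, the reshaped data should hold one row per letter type, position and condition, and the percentages within each bar should add up to 100. Here is a minimal sketch that checks this before plotting. It uses pivot_longer instead of gather or melt, and the names df_sep, df_long and totals are hypothetical.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(spoken  = c(10, 90, 30, 70),
                 lexicon = c(10, 90, 50, 50),
                 row.names = c("consonant_initial", "vowel_initial",
                               "consonant_final", "vowel_final"))

# Split "consonant_initial" into lettertype = "consonant", position = "initial"
df$label <- row.names(df)
df_sep <- separate(df, label, into = c("lettertype", "position"), sep = "_")

# Stack the two conditions into one column
df_long <- as.data.frame(
  pivot_longer(df_sep, cols = c(spoken, lexicon),
               names_to = "condition", values_to = "prop")
)

# Each stacked bar (position within condition) should total 100 percent
totals <- aggregate(prop ~ position + condition, data = df_long, FUN = sum)
totals

p <- ggplot(df_long, aes(x = position, y = prop, fill = lettertype)) +
  geom_col() +
  facet_wrap(~condition)
p
```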

Time series multiple plot for different group in R

I have a large data frame of several variables (around 50) with first column as date and second column id.
My data roughly look like this:
df <- data.frame(date = c("01-04-2001 00:00","01-04-2001 00:00","01-04-2001 00:00",
"01-05-2001 00:00","01-05-2001 00:00","01-05-2001 00:00",
"01-06-2001 00:00","01-06-2001 00:00","01-06-2001 00:00",
"01-07-2001 00:00","01-07-2001 00:00","01-07-2001 00:00"),
id = c(1,2,3,1,2,3,1,2,3,1,2,3), a = c(1,2,3,4,5,6,7,8,9,10,11,12),
b = c(2,2.5,3,3.2,4,4.6,5,5.6,8,8.9,10,10.6))
I want time series plots with all three ids as separate lines in the same graph, and with variables a and b in different graphs.
I tried ggplot but it didn't work. Please help me.
Do you mean something like this?
library(reshape)
library(lattice)
df2 <- melt(df, id.vars = c("date", "id"), measure.vars = c("a", "b"))
xyplot(value ~ date | variable, group = id, df2, t='l')
Addendum
# The following is from a comment by jbaums.
# It will create a single plot/file for each variable of df2
png('plots%02d.png')
xyplot(value ~ date | variable, group = id, df2, t='l', layout=c(1, 1),
scales=list(alternating=FALSE, tck=1:0))
dev.off()
You can also add relation='free' to scales so that y-axis limits are calculated separately for each plot.
Edit: After reading the comments, maybe you should try something like this:
library(tidyr)
df2 <- gather(df, variable, value, -date, -id)
vars <- unique(df2$variable)
library(ggplot2)
for (i in seq_along(vars)) {
  # Build the plot first, then save it: recent ggplot2 versions reject `+ ggsave()`
  p <- ggplot() +
    geom_line(data = subset(df2, variable == vars[[i]]),
              aes(date, value, group = id, color = factor(id))) +
    ylab(as.character(vars[[i]]))
  ggsave(filename = paste0(vars[[i]], ".png"), plot = p)
}
This should save a PNG for each variable in your dataframe (and will change y label of every plot to variable name, as per your request)
Here's how to do it in ggplot, using the tidyr package to get it in the right format:
library(ggplot2)
library(tidyr)
library(dplyr)
df <- data.frame(date = c("01-04-2001 00:00","01-04-2001 00:00","01-04-2001 00:00",
"01-05-2001 00:00","01-05-2001 00:00","01-05-2001 00:00",
"01-06-2001 00:00","01-06-2001 00:00","01-06-2001 00:00",
"01-07-2001 00:00","01-07-2001 00:00","01-07-2001 00:00"),
id = c(1,2,3,1,2,3,1,2,3,1,2,3), a = c(1,2,3,4,5,6,7,8,9,10,11,12),
b = c(2,2.5,3,3.2,4,4.6,5,5.6,8,8.9,10,10.6))
Then using dplyr's group_by and do functions, we can save multiple plots.
df %>%
  gather(variable, value, -date, -id) %>%
  mutate(id = factor(id)) %>%
  group_by(variable) %>%
  do(saved = {
    # Build one plot per variable, then save it under that variable's name
    p <- ggplot(., aes(x = date, y = value, group = id, color = id)) +
      geom_line() +
      ggtitle(paste("variable =", .$variable[1]))
    ggsave(filename = paste0(.$variable[1], ".png"), plot = p)
  })
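One thing none of the versions above does is parse the date column, so the x-axis is treated as text. Below is a hedged sketch that converts it to a date-time first and uses one facet per variable instead of separate files. Note the assumption: I read "01-04-2001" as day-month-year (1 April 2001). If your data are month-day-year, swap %d and %m in the format string. The name df_long is also mine.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  date = rep(c("01-04-2001 00:00", "01-05-2001 00:00",
               "01-06-2001 00:00", "01-07-2001 00:00"), each = 3),
  id   = rep(1:3, times = 4),
  a    = 1:12,
  b    = c(2, 2.5, 3, 3.2, 4, 4.6, 5, 5.6, 8, 8.9, 10, 10.6)
)

# ASSUMPTION: the strings are day-month-year; adjust the format if not
df$date <- as.POSIXct(df$date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Long format: one row per date, id and variable
df_long <- pivot_longer(df, cols = c(a, b),
                        names_to = "variable", values_to = "value")

# One panel per variable, one line per id, each panel with its own y-axis
p <- ggplot(df_long, aes(x = date, y = value,
                         group = id, colour = factor(id))) +
  geom_line() +
  facet_wrap(~variable, ncol = 1, scales = "free_y")
p
```

With around 50 variables a single faceted figure gets crowded, so the per-file loop above is probably the better fit there.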

Resources