Problem plotting a time series graph in R with dates

I need to plot a time series graph but the data that I'm using is proving to be quite challenging.
Ideally, I'd like a graph that looks something like this:
But mine looks like this:
I have tried a series of different things but none of them have worked.
The dataset can be found here and I'll attach a picture of what the dataset itself looks like:
Some code I have tried includes:
ggplot( aes(x=date, y=northEast)) +
geom_area(fill="#69b3a2", alpha=0.5) +
geom_line(color="#69b3a2") +
ylab("test") +
theme_ipsum()
ggplot(covidData2) +
geom_line(
mapping = aes(x = weekBeginning, y=northEast, group=northEast)
)
Any help would be greatly appreciated!

You need to tidy your data up before plotting it. If you look at your data frame, all of the "numeric" columns have been interpreted as character vectors because the column names are nested and therefore appear in the first couple of rows. You need to consolidate these and convert them to column names. Then, you need to convert the numeric columns to numbers. Finally, you need to parse the dates, as ggplot will simply read the periods as character vectors:
library(readxl)
library(lubridate)
library(ggplot2)
library(hrbrthemes)
wb <- read_xlsx(path.expand("~/covid.xlsx"), sheet = "Table 9")
df <- as.data.frame(wb)
df[1, 1] <- ""
for(i in 2:length(df)) {
if(is.na(df[1, i])) df[1, i] <- df[1, i - 1]
}
nms <- trimws(paste(df[1,], df[2,]))
df <- df[-c(1:2),]
names(df) <- nms
df <- df[sapply(df, function(x) !all(is.na(x)))]
df[-1] <- lapply(df[-1], as.numeric)
df <- head(df, -3)
df$Period <- dmy(substr(df$Period, 1, 10))
Now we can plot:
ggplot(df, aes(x = Period, y = `North East Rate`)) +
geom_area(fill = "#69b3a2", alpha=0.5) +
geom_line(color = "#69b3a2") +
ylab("Rate per 100,000") +
xlab("") +
theme_ipsum()
Created on 2022-03-08 by the reprex package (v2.0.1)
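The header clean-up steps above can be tried without the spreadsheet. The following is a minimal sketch on a toy data frame; the column layout, the values, and the dd/mm/yyyy date format are assumptions made for illustration, not taken from the real file. It uses base R's as.Date() in place of lubridate::dmy() so it runs without extra packages.

```r
# Toy stand-in for a sheet with a two-row nested header (hypothetical values)
raw <- data.frame(
  V1 = c(NA, "Period", "01/03/2021 - 07/03/2021", "08/03/2021 - 14/03/2021"),
  V2 = c("North East", "Rate", "10.5", "12.1"),
  V3 = c(NA, "Cases", "100", "120"),
  stringsAsFactors = FALSE
)

# Row 1 holds the region, which only appears above the first of its columns
top <- unname(unlist(raw[1, ]))
top[1] <- ""
for (i in 2:length(top)) {
  if (is.na(top[i])) top[i] <- top[i - 1]  # carry the region name forward
}

# Row 2 holds the measure; combine both rows into one column name
sub <- unname(unlist(raw[2, ]))
nms <- trimws(paste(top, sub))

# Drop the two header rows, apply the names, and fix the column types
tidy <- raw[-(1:2), ]
names(tidy) <- nms
tidy[-1] <- lapply(tidy[-1], as.numeric)
tidy$Period <- as.Date(substr(tidy$Period, 1, 10), format = "%d/%m/%Y")

str(tidy)
```

After this, Period is a Date column and the remaining columns are numeric, which is what ggplot needs for a continuous time axis.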

Related

How to apply ggplot2 to each row in a data frame

I want to code a ggplot2 visualization as a function, and then apply the function on each row of a dataframe (I want to use apply to avoid a for loop, as suggested here.)
The data:
library(ggplot2)
point1 <- c(1,2)
point2 <- c(2,2)
points <-as.data.frame(rbind(point1,point2))
I saved points as a data frame and it runs fine in ggplot2:
ggplot(data = points) +
geom_point(aes(x = points[, 1], y = points[, 2])) +
xlim(-3, 3) +
ylim(-3, 3) +
theme_bw()
That's not really the plot I want though: I would like two plots, each one with one point.
Now I build a function that will loop through the rows of the data frame:
plot_data <- function(data) {
ggplot(data) +
geom_point(aes(x = data[, 1], y = data[, 2])) +
xlim(-3, 3) +
ylim(-3, 3) +
theme_bw()
}
I create a list to store the plots:
myplots <- list()
And here is the call to apply, following this suggestion:
myplots <- apply(points, 1, plot_data)
But I get the following error:
#> Error: `data` must be a data frame, or other object coercible by `fortify()`,
not a numeric vector
But my data are a data frame.
Is this because: "apply() will try to convert the data.frame into a matrix (see the help docs). If it does not gracefully convert due to mixed data types, I'm not quite sure what would result" as noted in a comment to the answer I referred to?
Still, if I check the data class after the call to apply, the data are still a dataframe:
class(points)
#> [1] "data.frame"
Created on 2021-04-09 by the reprex package (v0.3.0)
As suggested by Gregor Thomas in the comment:
library(ggplot2)
point1 <- c(1, 2)
point2 <- c(2, 2)
points <- as.data.frame(rbind(point1, point2))
plot_data <- function(data) {
ggplot(data) +
geom_point(aes(x = data[, 1], y = data[, 2])) +
xlim(-3, 3) +
ylim(-3, 3) +
theme_bw()
}
myplots <- list()
myplots <- lapply(1:nrow(points), function(i) plot_data(points[i, ]))
myplots
#> [[1]]
#>
#> [[2]]
Created on 2021-04-09 by the reprex package (v0.3.0)
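To see why apply() triggers the error while the lapply() version works, it may help to look at what each approach actually hands to the plotting function. This is a small base R sketch using the same points data; the helper names are made up for illustration.

```r
point1 <- c(1, 2)
point2 <- c(2, 2)
points <- as.data.frame(rbind(point1, point2))

# apply() first converts the data frame to a matrix, so each "row"
# reaches the function as a plain numeric vector
classes_via_apply <- apply(points, 1, function(r) class(r)[1])

# Indexing rows with points[i, ] keeps each row as a one-row data frame
classes_via_lapply <- vapply(
  seq_len(nrow(points)),
  function(i) class(points[i, ])[1],
  character(1)
)

classes_via_apply
classes_via_lapply
```

class(points) still reports "data.frame" after the call because apply() never modifies its input; only the pieces it passes along are vectors.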

ggplot in R to add significance asterisk vs control group over multiple variables

I have barplots, but would like to run a Wilcoxon test (wilcox.test) within each "grp1", comparing the bars to the control for that group, and then put an asterisk on a bar if the difference is significant.
I've seen "compare_means" to get the comparisons, but I'm trying to make it automated and not so manual. Would "geom_signif" or "stat_compare_means" do this? Can someone help with this? Thank you very much.
I need the comparison to be made using the full dataset, not just the means (which is only one value per bar). I added a line at the end of the code running one of the comparisons so you can see where I need the p-values from.
y <- c(runif(100,0,4.5),runif(100,3,6),runif(100,4,7))
grp1 <- sample(c("A","B","C","D"),size = 300, replace = TRUE)
grp2 <- rep(c("High","Med","Contrl"),each=100)
dataset <- data.frame(y,grp1,grp2)
means <- aggregate(y~grp1+grp2,data=dataset,mean)
sd <- aggregate(y~grp1+grp2,data=dataset,function(x){sd(x)})
means.all <- merge(sd,means,by=c("grp1","grp2"))
names(means.all)[3:4] <- c("sd","y.mean")
library(ggplot2)
p<- ggplot(means.all, aes(x=grp1, y=y.mean, fill=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))
p
library(ggpubr) # provides compare_means()
compare_means(y~grp2,data = dataset[dataset$grp1=="A",],method="wilcox.test")
Maybe this is not the optimal way but you can create a list splitting the data and applying the stat_compare_means() function individually at each level of your data. After that you can arrange the plots in one using patchwork:
library(ggplot2)
library(ggpubr)
library(patchwork)
#Split data
List <- split(means.all,means.all$grp1)
#Function for plot
myfun <- function(x)
{
#Ref group
rg <- paste0(unique(x$grp1),'.','Contrl')
#Plot
G <- ggplot(x, aes(x=interaction(grp1,grp2), y=y.mean, fill=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))+
stat_compare_means(ref.group = rg,label = "p.signif",method = "wilcox.test",label.y = 7)+
theme(axis.text.x = element_blank())+
xlab(unique(x$grp1))
return(G)
}
#Apply
Lplot <- lapply(List, myfun)
#Wrap plots
wrap_plots(Lplot,nrow = 1)+plot_layout(guides = 'collect')
Output:
Consider this update that takes the values for asterisks stored in a new dataframe:
#Create p-vals dataset
List2 <- split(dataset,dataset$grp1)
#p-val function
mypval <- function(x)
{
y <- compare_means(y~grp2,data = x,method="wilcox.test")
y <- y[,c('group2', 'group1','p.signif')]
names(y)<-c('grp2','grp1','p.signif')
y <- y[y$grp2=='Contrl',]
y$grp2 <- y$grp1
y <- rbind(y,data.frame(grp2='Contrl',grp1='',p.signif=''))
y$grp1 <- unique(x$grp1)
y$y.mean=7
return(y)
}
#Apply
dfpvals <- lapply(List2, mypval)
df <- do.call(rbind,dfpvals)
#Plot
ggplot(means.all, aes(x=grp1, y=y.mean, fill=grp2,group=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))+
geom_text(data=df,aes(x=grp1, y=y.mean,group=grp2,label=p.signif),
position=position_dodge(0.9))
Output:
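If it is useful to check where the asterisks come from, the comparisons against the control can also be computed with base R alone. The sketch below is an illustration, not the ggpubr implementation: the function name pvals_vs_control, the fixed seed, and the significance cut-offs (which follow the common ns / * / ** / *** / **** convention) are assumptions.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
y <- c(runif(100, 0, 4.5), runif(100, 3, 6), runif(100, 4, 7))
grp1 <- sample(c("A", "B", "C", "D"), size = 300, replace = TRUE)
grp2 <- rep(c("High", "Med", "Contrl"), each = 100)
dataset <- data.frame(y, grp1, grp2)

# Hypothetical helper: Wilcoxon test of every non-control level of grp2
# against the control, within one level of grp1, using the raw observations
pvals_vs_control <- function(d, control = "Contrl") {
  others <- setdiff(unique(d$grp2), control)
  p <- vapply(others, function(g) {
    wilcox.test(d$y[d$grp2 == g], d$y[d$grp2 == control])$p.value
  }, numeric(1))
  data.frame(grp1 = d$grp1[1], grp2 = others, p = p, row.names = NULL)
}

res <- do.call(rbind, lapply(split(dataset, dataset$grp1), pvals_vs_control))

# Map p-values to significance symbols
res$p.signif <- as.character(cut(
  res$p,
  breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, 1),
  labels = c("****", "***", "**", "*", "ns"),
  include.lowest = TRUE
))
res
```

The resulting data frame has one row per grp1 and non-control grp2 combination, so it could be passed to geom_text() in the same way as the p-value data frame built above.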

Plot data using loop in R

I want to make a plot of the Daily Streamflow in each Station and save it in png format. I want a separate png for each station, something like the image below:
I have a list with the data frame for each station, as shown in the figure below:
I am trying the following code, but it does not work: the R session aborts. I am not sure whether this is because of the quantity of data:
for (i in 1:length(listDF2))
{
df1 <- as.data.frame(listDF2[[i]])
df1[is.na(df1)] <- 0
temp_plot <- ggplot(df1, aes(x = day, y = DailyMeanStreamflow, colour=Station)) +
geom_line(size = 1) +
geom_point(size=1.5, shape=21, fill="white") +
facet_wrap(~ month, ncol = 3) +
labs(title = "Daily Mean Streamflow",
subtitle = "Data plotted by month",
y = "Daily Mean Streamflow [m3/s]", x="Days") +
scale_y_continuous (breaks=seq(0,max(df1$DailyMeanStreamflow, na.rm=TRUE),by=1500)) +
scale_x_continuous (breaks=seq(1,max(df1$day),by=1)) + theme(axis.text.x = element_text(size=9))
print(temp_plot)
name4<- paste("DailyStreamflow_byMonth","_", siteNumber[i], ".png", sep="")
ggsave(temp_plot,filename = name4,width=22,height=11,units="in",dpi=500)
#while (!is.null(dev.list()))
dev.off()
}
I have also a "big" data frame with the data for each station one after the other. This data frame is useful when I want to apply functions like data_frame %>% group_by(station) %>% summarise(...)
Any idea in how to make the plots for each station? Is it better to use the list or the "big" data frame for this purpose?
I am not sure where the problem in your workflow occurs. It is quite hard to help you, as we have no minimal working example. Also, I am not sure if you just want to produce your plots in a loop or if you (also) want to put them together in one visualization.
Anyways ... I tried to give you a starting point ... maybe this will help?
"%>%" <- magrittr::"%>%"
df_list <- list(
A=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)),
B=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)),
C=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)),
D=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)))
# Lapply approach
lapply(df_list, function(dat){
p <- dat %>%
ggplot2::ggplot(ggplot2::aes(x=x,y=y)) +
ggplot2::geom_point()
print(p)
})
# Loop approach
for (i in 1:length(df_list)){
p <- df_list[[i]] %>%
ggplot2::ggplot(ggplot2::aes(x=x,y=y)) +
ggplot2::geom_point()
print(p)
fname <- paste("test","_", i, ".png", sep="")
ggplot2::ggsave(p,
filename=fname,
width=22,
height=11,
units="in",
dpi=500)
}
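On the question of list versus "big" data frame: one option is to keep the big data frame and split it by station just before plotting. The sketch below assumes hypothetical column names (Station, day, DailyMeanStreamflow) and made-up values, and writes to a temporary directory. ggsave() opens and closes its own graphics device, so neither print() nor dev.off() is needed to write the files.

```r
library(ggplot2)

# Hypothetical stand-in for the "big" data frame (values are made up)
big_df <- data.frame(
  Station = rep(c("S1", "S2"), each = 10),
  day = rep(1:10, times = 2),
  DailyMeanStreamflow = c(seq(100, 190, by = 10), seq(50, 95, by = 5))
)

out_dir <- tempdir()  # assumption: write to a temporary folder

# One data frame per station
by_station <- split(big_df, big_df$Station)

# Build and save one plot per station; return the file paths
files <- vapply(names(by_station), function(st) {
  p <- ggplot(by_station[[st]], aes(x = day, y = DailyMeanStreamflow)) +
    geom_line() +
    labs(title = paste("Daily Mean Streamflow:", st),
         x = "Days", y = "Daily Mean Streamflow [m3/s]")
  f <- file.path(out_dir, paste0("DailyStreamflow_", st, ".png"))
  ggsave(filename = f, plot = p, width = 8, height = 4, units = "in", dpi = 100)
  f
}, character(1))

files
```

A lower dpi and smaller page size than in the question are used here only to keep the example quick; they can be raised once the loop is known to work.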

Plot multiple distributions by year using ggplot Boxplot

I'm trying to evaluate the above data in a boxplot similar to this: https://www.r-graph-gallery.com/89-box-and-scatter-plot-with-ggplot2.html
I want the x axis to reflect my "Year" variable and each boxplot to evaluate the 8 methods as a distribution. Eventually I'd like to pinpoint the "Selected" variable in relation to that distribution but currently I just want this thing to render!
I can't figure out how to code my y variable, and I get various errors no matter what I try. I think the PY needs to be as.factor, but I've tried some code that way and I just get other errors.
anyway here is my code (Send Help):
# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(ggplot2)
library(readxl) # For reading in Excel files
library(lubridate) # For handling dates
library(dplyr) # for mutate and pipe functions
# Path to current and prior data folders
DataPath_Current <- "C:/R Projects/Box Plot Test"
Ult_sum <- read_excel(path = paste0(DataPath_Current, "/estimate.XLSX"),
sheet = "Sheet1",
range = "A2:J12",
guess_max = 100)
# just want to see what my table looks like
Ult_sum
# create a dataset - the below is code I commented out
# data <- data.frame(
# name=c(Ult_sum[,1]),
# value=c(Ult_sum[1:11,2:8])
#)
value <- Ult_sum[2,]
# Plot
Ult_sum %>%
ggplot( aes(x= Year, y= value, fill=Year)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.6) +
geom_jitter(color="black", size=0.4, alpha=0.9) +
theme_ipsum() +
theme(
legend.position="none",
plot.title = element_text(size=11)
) +
ggtitle("A boxplot with jitter") +
xlab("")
I do not see how your code matches the screenshot of your dataset. However, just a general hint: ggplot likes data in long format. I suggest you reshape your data using tidyr::pivot_longer or data.table::melt. This way you get 3 columns: year, method, value, of which the first two should be factors. The resulting dataset can then be neatly used in aes() as aes(x=year, y=value, fill=method).
Edit: Added an example. Does this do what you want?
library(data.table)
library(magrittr)
library(ggplot2)
DT <- data.table(year = factor(rep(2010:2014, 10)),
method1 = rnorm(50),
method2 = rnorm(50),
method3 = rnorm(50))
DT_long <- DT %>% melt(id.vars = "year")
ggplot(DT_long, aes(x = year, y = value, fill = variable)) +
geom_boxplot()
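If the goal is one box per year with the methods forming the distribution, as the question describes, the same long format works without the fill aesthetic. This is a sketch using tidyr::pivot_longer(); the wide table is a made-up stand-in for the spreadsheet (one row per year, one column per method), so the column names and values are assumptions.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide table: one row per year, one column per method
wide <- data.frame(
  Year = 2010:2012,
  method1 = c(10, 12, 9),
  method2 = c(11, 13, 8),
  method3 = c(9.5, 12.5, 10),
  method4 = c(10.5, 11, 9.5)
)

# Wide to long: one row per (Year, method) pair
long <- pivot_longer(wide, cols = -Year, names_to = "method", values_to = "value")
long$Year <- factor(long$Year)  # discrete x axis, one box per year

p <- ggplot(long, aes(x = Year, y = value)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, size = 1)
p
```

Each box then summarises the method estimates for that year, and the jittered points show the individual methods.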

Assigning plot to a variable in a loop

I am trying to create 2 line plots.
But I noticed that using a for loop will generate two plots with y=mev2 (instead of a plot based on y=mev1 and another one based on y=mev2).
The code below shows the observation here.
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
library(ggplot2)
# Method 1: Creating plot1 and plot2 without using "for" loop (hard-code)
plot1 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[2])))) + geom_line()
plot2 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[3])))) + geom_line()
# Method 2: Creating plot1 and plot2 using "for" loop
for (i in 1:2) {
y_var <- unlist(as.list(df[i+1]))
assign(paste("plot", i, sep = ""), ggplot(data = df, aes(x=Period, y=y_var)) + geom_line())
}
It seems this is due to something about the way ggplot() works that I am not aware of.
Question:
If I want to use Method 2, how should I modify the logic?
People said that using assign() is not an "R-style", so I wonder what's an alternate way to do this? Say, using list?
One possible answer with no tidyverse command added is :
library(ggplot2)
y_var <- colnames(df)
for (i in 1:2) {
assign(paste("plot", i, sep = ""),
ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line())
}
plot1
plot2
You may use aes_string. I hope it helps.
EDIT 1
If you want to store your plots in a list, you can use this:
Initialize your list :
n <- 2 # number of plots
list_plot <- vector(mode = "list", length = n)
names(list_plot) <- paste("plot", 1:n)
Fill it :
for (i in 1:2) {
list_plot[[i]] <- ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line()
}
Display :
list_plot[[1]]
list_plot[[2]]
For lines in different "plots", you can simplify it with facet_wrap():
library(tidyverse)
df %>%
gather(variable, value, -c(Period)) %>% # wide to long format
ggplot(aes(Period, value)) + geom_line() + facet_wrap(vars(variable))
You can also put it in a loop if necessary and store the results in a list:
# empty list
listed <- list()
# fill the list with the plots
for (i in c(2:3)){
# keep Period and the i-th column, so listed[[1]] shows mev1 and listed[[2]] shows mev2
listed[[i-1]] <- df[, c(1, i)] %>%
gather(variable, value, -c(Period)) %>%
ggplot(aes(Period, value)) + geom_line()
}
# to get the plots
listed[[1]]
listed[[2]]
Why do you want 2 separate plots? The ggplot2 way to do this would be to get the data into long format and then plot.
library(tidyverse)
df %>%
pivot_longer(cols = -Period) %>%
ggplot() + aes(Period, value, color = name) + geom_line()
Here is an alternative approach using a function and lapply. I recognize that you asked how to solve this using a loop. Still, I think it might be useful to consider this approach.
library(ggplot2)
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
myplot <- function(yvar){
plot <- ggplot(df, aes(Period, !!sym(yvar))) + geom_line()
return(plot)
}
colnames <- c("mev1","mev2")
list <- lapply(colnames, myplot)
names(list) <- paste0("plot_", colnames)
# Alternative naming: names(list) <- paste0("plot", 1:2)
Using this approach you can easily apply your plot function to whatever columns you like. You can specify the columns by name, which may be preferable to specifying them by position. Plots are saved in a list, and they are named afterwards using the names attribute. In my example I named the plots plot_mev1 and plot_mev2, but you can easily switch to some other naming, e.g. write names(list) <- paste0("plot", 1:2) to get plot1 and plot2.
Note that I used !!sym() in the ggplot call. This is essentially an alternative to aes_string, which was used in the answer of Rémi Coulaud. This way ggplot understands, even inside a function or a loop, that "mev1" is a column of your dataset and not just a text string.
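As for why Method 2 in the question draws mev2 twice: aes() does not evaluate y_var when the plot is created, only when it is drawn, and by then the loop has finished and y_var holds the last column. A sketch of the loop that avoids this by injecting the column name immediately with !! is shown below. It uses rlang::sym() explicitly (rlang is installed as a ggplot2 dependency) and the same df as the question; the list name plots is an arbitrary choice.

```r
library(ggplot2)

df <- data.frame(
  Period = c(1960, 1970, 1980),
  mev1 = c(1, 3, 7),
  mev2 = c(9, 8, 2)
)

plots <- list()
for (v in c("mev1", "mev2")) {
  # !! substitutes the column symbol right now, so each stored plot keeps
  # its own column instead of looking up `v` again when it is printed
  plots[[v]] <- ggplot(df, aes(x = Period, y = !!rlang::sym(v))) +
    geom_line()
}

plots$mev1
plots$mev2
```

Storing the plots in a named list also removes the need for assign(), and the list can be passed on to lapply() or a plot-arranging function later.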
