I am looking to create a cycle plot of hours within months, something like the plot below. I want the plot to show the mean temperature for each month as a horizontal line and then, within each month, show the temperature fluctuations across a typical day of that month. I was trying to use monthplot(), but it doesn't seem to be working:
library(nycflights13)
library(dplyr)
tempdata <- weather %>% group_by(hour)
monthplot(tempdata, labels = NULL, ylab = "temp")
It keeps returning the warning "argument is not numeric or logical: returning NA", but I am not sure where the code is going wrong.
Hopefully this ggplot2 solution works for you:
library(nycflights13)
library(ggplot2)
library(dplyr)
# Prepare data: daily mean temperature, then the mean of those per month
tempdata <- weather %>%
  group_by(month, day) %>%
  summarise(temp = mean(temp, na.rm = TRUE), .groups = "drop")
meanMonth <- tempdata %>%
  group_by(month) %>%
  summarise(temp = mean(temp, na.rm = TRUE))
# Plot using ggplot2: one panel per month, horizontal line at the monthly mean
ggplot(tempdata, aes(day, temp)) +
  geom_hline(data = meanMonth, aes(yintercept = temp)) +
  geom_line() +
  facet_grid(~ month, switch = "x") +
  labs(x = "Month",
       y = "Temperature") +
  theme_classic() +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        axis.line.x = element_blank())
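The code above averages by day of the month. If you want the hour-of-day cycle the question describes (a typical day within each month), the same two summaries can be computed per hour instead. Below is a minimal base-R sketch, assuming a data frame with month, day, hour and temp columns like nycflights13::weather. The weather_demo data frame is synthetic and hypothetical, so the block runs without the package:

```r
# Hypothetical stand-in for nycflights13::weather: month, day, hour, temp
set.seed(1)
weather_demo <- expand.grid(hour = 0:23, day = 1:28, month = 1:12)
weather_demo$temp <- 55 - 20 * cos((weather_demo$month - 1) * pi / 6) -
  8 * cos((weather_demo$hour - 3) * pi / 12) +
  rnorm(nrow(weather_demo), sd = 3)
weather_demo$temp[5] <- NA   # one missing reading, mirroring the NA in the real data

# Typical day per month: mean temperature for every month x hour combination
# (the formula interface of aggregate() drops the NA row via na.omit)
hourly <- aggregate(temp ~ month + hour, data = weather_demo, FUN = mean)

# Monthly means for the horizontal reference lines
monthly <- aggregate(temp ~ month, data = weather_demo, FUN = mean)
```

With the real data, passing hourly in place of tempdata, monthly in place of meanMonth, and aes(hour, temp) in place of aes(day, temp) should give one panel per month showing the daily cycle.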
temp has a missing value, which causes the error. You also need to set the times and phase arguments.
library(nycflights13)
# Daily mean temperature. The formula method of aggregate() drops rows with NA
# (na.action = na.omit), which removes the observation that was causing the error
tempdata <- aggregate(temp ~ month + day, data = weather, FUN = mean)
with(tempdata, monthplot(temp, times = day, phase = month, ylab = "temp"))
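monthplot() can also draw the hour-of-day version directly. Its base argument defaults to mean, so it adds a horizontal line at the mean of each phase (here, each month), which is the reference line the question asks for. A sketch assuming hourly data shaped like nycflights13::weather; wx is hypothetical synthetic data, and pdf(NULL) is only there so the sketch runs without a screen device:

```r
# Hypothetical stand-in for nycflights13::weather (month, day, hour, temp)
wx <- expand.grid(hour = 0:23, day = 1:28, month = 1:12)
wx$temp <- 55 - 20 * cos((wx$month - 1) * pi / 6) - 8 * cos((wx$hour - 3) * pi / 12)
wx$temp[5] <- NA   # a missing value; the formula interface drops it (na.omit)

# Mean temperature for each hour of the day within each month
hourly <- aggregate(temp ~ month + hour, data = wx, FUN = mean)
hourly <- hourly[order(hourly$month, hourly$hour), ]

pdf(NULL)  # off-screen device for this sketch; omit it to plot interactively
# times = hour orders points within each month, phase = month picks the panel
with(hourly, monthplot(temp, times = hour, phase = month, ylab = "temp"))
invisible(dev.off())
```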
I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize the total number of submissions on one graph, showing the running total as a line and the daily totals as a histogram (or bar chart) on top of it. I'm not sure how to set the breaks or bins of the histogram so that it takes the submission datetime column and makes each bar one day's total.
I tried adding a column that converts the datetime to just a date and plotting based on that, but I'd really like the line graph to keep the time of day.
Here's what I have so far:
df <- df %>%
  mutate(datetime = lubridate::mdy_hm(datetime)) %>%
  mutate(date = lubridate::as_date(datetime))
# sort by datetime
df <- df %>%
  arrange(datetime)
# add total number of submissions
df <- df %>%
  mutate(total = row_number())
# ggplot
line_plus_histo <- df %>%
  ggplot() +
  geom_histogram(data = df, aes(x = datetime)) +
  geom_line(data = df, aes(x = datetime, y = total), col = "red") +
  stat_bin(data = df, aes(x = date), geom = "bar") +
  labs(
    title = "Submissions by Day",
    x = "Date",
    y = "Submissions",
    legend = NULL)
line_plus_histo
As you can see, I'm also calculating the running total of submissions by sorting by time and then adding a column with the row number, so if you can suggest a better method I'd really appreciate it.
Please find below the line-plus-histogram plot of submissions over time:
Here's the pastebin link with my data
You can extend your data manipulation like this:
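library(dplyr)
library(ggplot2)
library(lubridate)
df <- df |>
  mutate(datetime = mdy_hm(datetime)) |>
  arrange(datetime) |>
  # noon of each day, so each daily bar sits over its own date
  mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
  mutate(totals = row_number()) |>   # running total after sorting
  group_by(midday) |>
  mutate(N = n()) |>                 # submissions per day
  ungroup()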
Then use midday for the bars and datetime for the line:
df %>%
  ggplot() +
  geom_bar(aes(x = midday)) +                               # daily counts
  geom_line(aes(x = datetime, y = totals), col = "red") +   # running total
  labs(
    title = "Submissions by Day",
    x = "Date",
    y = "Submissions")
PS. Sorry for the Polish locale on the x axis.
PS2. It looks much better with geom_bar().
Created on 2022-02-03 by the reprex package (v2.0.1)
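If you'd rather not depend on lubridate, the running total and the noon-centred daily counts can also be built in base R. Below is a sketch with a few made-up timestamps; the real data is only linked, so both the values and the "m/d/Y H:M" format are assumptions:

```r
# Hypothetical timestamps in the question's month/day/year hour:minute format
submissions <- data.frame(
  datetime = c("2/1/2022 17:40", "2/1/2022 09:15", "2/2/2022 11:05",
               "2/3/2022 08:30", "2/3/2022 19:45", "2/3/2022 12:00")
)
submissions$datetime <- as.POSIXct(submissions$datetime,
                                   format = "%m/%d/%Y %H:%M", tz = "UTC")

# Running total: sort by time, then number the rows
submissions <- submissions[order(submissions$datetime), , drop = FALSE]
submissions$total <- seq_len(nrow(submissions))

# Daily totals, placed at noon so each bar sits over its own day
daily <- as.data.frame(table(date = as.Date(submissions$datetime)),
                       responseName = "n")
daily$date <- as.Date(daily$date)
daily$midday <- as.POSIXct(daily$date, tz = "UTC") + 12 * 3600
```

daily$midday and daily$n could then feed geom_col(), and submissions$datetime and submissions$total the red geom_line().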
I'm trying to show, in a line chart in R, the maximum date for each Y value (project), but the chart only shows the highest date in the whole data frame rather than the maximum for each project, in order.
Example of the data frame:
Chart:
Code:
output$grafica5 <- renderPlotly({
  p <- ggplot(data(), aes(x = max(as.Date(data()[, names(data())[3]], format = "%d/%m/%Y")),
                          y = as.factor(data()[, names(data())[2]]),
                          group = 1)) +
    geom_line() +
    geom_point() +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_color_viridis(discrete = TRUE) +
    labs(
      y = "Proyectos",
      x = "Fecha")
})
Can you help me? Thanks in advance
You should do the data manipulation (converting with as.Date() and filtering the max values) before plotting:
library(dplyr)
library(ggplot2)
set.seed(1)  # make the random example reproducible
# creation of a data.frame with repeated projects with due dates
df <- data.frame(
  Projects = paste0("projects", sample(1:9, 20, TRUE)),
  Duedate = paste0(sample(10:30, 20, TRUE), "/01/2020")
)
# creation of a variable `dates` and filtering of the max date per project
df <- df %>%
  mutate(dates = as.Date(Duedate, format = "%d/%m/%Y")) %>%
  group_by(Projects) %>%
  filter(dates == max(dates)) %>%
  ungroup()
ggplot(df, aes(x = dates, y = Projects, group = 1)) +
  geom_line() +
  geom_point()
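For comparison, here is the same convert-then-filter idea in base R, with ave() computing each project's maximum date. The data is made up to match the question's day/month/year format. Ordering the factor levels by date is an extra step (not in the answer above) so the projects appear in timeline order on the y axis:

```r
# Hypothetical example data in the question's day/month/year format
df <- data.frame(
  Projects = c("projectA", "projectB", "projectA", "projectC", "projectB"),
  Duedate  = c("10/01/2020", "15/01/2020", "28/01/2020", "12/01/2020", "11/01/2020")
)

# Convert before plotting, not inside aes()
df$dates <- as.Date(df$Duedate, format = "%d/%m/%Y")

# Keep the row(s) holding each project's latest date (tied maxima keep all rows)
latest <- df[df$dates == ave(df$dates, df$Projects, FUN = max), ]

# Order projects by that date so the y axis follows the timeline
latest <- latest[order(latest$dates), ]
latest$Projects <- factor(latest$Projects, levels = unique(latest$Projects))
```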
The country names are long and overlap each other in the x-axis labels. How can I make them readable?
ggplot(results, aes(x = Nationality, horiz = TRUE)) +
  theme_solarized() +
  geom_bar() +
  labs(y = "Number of Medals",
       title = "Number of Medals by Country")
Welcome to Stack Overflow. Here are two ways to deal with the many values. Both use the forcats package from the tidyverse; you can read more about it here: https://r4ds.had.co.nz/factors.html
First, some fake data that replicates your problem:
library(tidyverse)
df <-
  mpg %>%
  arrange(manufacturer) %>%
  mutate(
    n = row_number(),
    vehicle = paste(year, manufacturer, model)
  ) %>%
  uncount(n)
# this replicates your problem
ggplot(df, aes(vehicle)) +
  geom_bar() +
  coord_flip()
Option 1: consolidate
df %>%
  mutate(
    vehicle = # making heavy use of forcats here
      fct_lump(vehicle, 35) %>% # keep only the 35 most frequent values, others in "Other" category
      fct_infreq() %>%          # order them by frequency
      fct_rev()                 # reverse the order
  ) %>%
  ggplot(aes(vehicle)) +
  geom_bar() +
  coord_flip()
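To see what the forcats chain in Option 1 does, here is a rough base-R approximation. lump_and_order() is a hypothetical helper, not part of forcats. One difference: fct_lump() can keep more than n levels when counts tie at the cut-off (see its ties.method argument), while this sketch simply takes the first n after sorting:

```r
# Hypothetical helper approximating fct_lump(x, n) %>% fct_infreq() %>% fct_rev()
lump_and_order <- function(x, n) {
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  # everything outside the n most frequent values becomes "Other"
  lumped <- ifelse(x %in% keep, as.character(x), "Other")
  # least frequent level first, so coord_flip() puts the most frequent on top
  factor(lumped, levels = names(sort(table(lumped))))
}

x <- c(rep("a", 4), rep("b", 3), "c", "d")
res <- lump_and_order(x, 2)
levels(res)
```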
Option 2: facet
Someone may have a more elegant way of creating these groups, but I use this method quite a bit:
df %>%
  mutate(
    vehicle = # similar methods to earlier
      fct_infreq(vehicle) %>%
      fct_rev(),
    num_fct = as.integer(vehicle), # generates a number for each factor
    facet = (max(num_fct) - num_fct) %/% 20 # will make groups of 20, but they need to be in descending order within each facet
  ) %>%
  ggplot(aes(vehicle)) +
  geom_bar() +
  coord_flip() +
  facet_wrap(~facet, scales = "free_y", nrow = 1) +
  theme(
    strip.background = element_blank(),
    strip.text = element_blank()
  )
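The facet arithmetic in Option 2 is easier to see on plain integers. After fct_infreq() and fct_rev(), the most frequent vehicle has the highest integer code, so (max(num_fct) - num_fct) %/% 20 puts the 20 most frequent levels in facet 0, the next 20 in facet 1, and so on. The 45 levels here are a number chosen only for illustration:

```r
# 45 hypothetical factor levels, numbered the way as.integer() numbers them
num_fct <- 1:45

# Highest codes (most frequent after fct_rev) land in facet 0, the first panel
facet <- (max(num_fct) - num_fct) %/% 20

table(facet)
```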
Hope this helps.
For each variable in my data frame I make a histogram, a line plot and a box plot to assess its distribution, and I arrange these graphs in one window.
For the variable VARIABLE my code looks like this:
library(dplyr)
library(ggplot2)
library(grid)       # grid.text(), gpar()
library(gridExtra)  # grid.arrange()
library(anytime)    # anydate()
variable_name_string = "VARIABLE"
hist = qplot(VARIABLE, data = full_data_noNO, geom = "histogram",
             fill = I("lightblue")) +
  theme_light()
avg_price = full_data_noNO %>%
  group_by(Month, Country) %>%
  dplyr::summarize(avg = mean(VARIABLE, na.rm = TRUE))
#line graph for different countries over time (the summarised column is avg)
line = ggplot(data = avg_price, aes(x = anydate(Month), y = avg,
                                    group = Country)) +
  xlab("Date") +
  ylab(variable_name_string) +
  geom_line(aes(color = Country), size = 1) +
  theme_light()
#boxplot over different years
avg_price2 = avg_price
avg_price2$Month = format(as.Date(anydate(avg_price$Month), "%Y-%m-%d"),
                          "%Y")
box = ggplot(avg_price2, aes(x = Month, y = avg, fill = Month)) +
  geom_boxplot() +
  xlab("Date") +
  ylab(variable_name_string) +
  guides(fill = FALSE) +
  theme_light()
var_name = grid.text(variable_name_string, gp = gpar(fontsize = 20))
#merge plot into one window
grid.arrange(var_name, hist, line, box, ncol = 2)
This works fine for one variable, but now I want to do this for every variable in my data frame and save the combined plot window for each variable. I have been looking for almost the entire day but cannot find a solution. Can anyone help me?
Without a reproducible example it is hard to help, but you could wrap your plotting code in a function and use lapply() to call it for each of your variables.
library(dplyr)
library(ggplot2)
library(grid)       # textGrob(), gpar(), grid.draw()
library(gridExtra)  # arrangeGrob()
library(anytime)    # anydate()
make_plots <- function(variable_string) {
  var_sym <- rlang::sym(variable_string)
  # histogram of the raw values
  hist <- ggplot(full_data_noNO, aes(x = !!var_sym)) +
    geom_histogram(fill = "lightblue") +
    theme_light()
  avg_price <- full_data_noNO %>%
    group_by(Month, Country) %>%
    summarize(avg = mean(!!var_sym, na.rm = TRUE), .groups = "drop")
  # line graph for different countries over time (the summary column is avg)
  line <- ggplot(avg_price, aes(x = anydate(Month), y = avg, group = Country)) +
    geom_line(aes(color = Country), size = 1) +
    xlab("Date") +
    ylab(variable_string) +
    theme_light()
  # boxplot over different years
  avg_price2 <- avg_price
  avg_price2$Month <- format(anydate(avg_price$Month), "%Y")
  box <- ggplot(avg_price2, aes(x = Month, y = avg, fill = Month)) +
    geom_boxplot() +
    xlab("Date") +
    ylab(variable_string) +
    guides(fill = "none") +
    theme_light()
  # title panel from the plain string (!! only works inside tidy-eval functions)
  var_name <- textGrob(variable_string, gp = gpar(fontsize = 20))
  # combine without drawing, then save as e.g. VARIABLE1_plots.pdf
  combined <- arrangeGrob(var_name, hist, line, box, ncol = 2)
  ggsave(paste0(variable_string, "_plots.pdf"), combined)
  combined
}
# Make sure to pass the variable names as a character vector
vars <- c("VARIABLE1", "VARIABLE2")
# OR every column except the grouping columns
# vars <- setdiff(colnames(full_data_noNO), c("Month", "Country"))
plots <- setNames(lapply(vars, make_plots), vars)
# Plots can also be accessed and drawn individually
grid.newpage()
grid.draw(plots[["VARIABLE1"]])
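If gridExtra is not an option, the same wrap-in-a-function-and-lapply() pattern also works with base graphics. This sketch swaps in hist(), plot() and boxplot() for the ggplot2 layers and uses the built-in mtcars data. make_base_plots() and its group argument are hypothetical, chosen for illustration:

```r
# Hypothetical helper: writes <variable>_plots.pdf with four panels, returns the path
make_base_plots <- function(variable_string, data, group, out_dir = tempdir()) {
  path <- file.path(out_dir, paste0(variable_string, "_plots.pdf"))
  pdf(path, width = 8, height = 6)
  on.exit(dev.off())                      # close the file even if a plot fails
  par(mfrow = c(2, 2))
  plot.new()                              # title panel
  text(0.5, 0.5, variable_string, cex = 2)
  x <- data[[variable_string]]
  hist(x, col = "lightblue", main = "", xlab = variable_string)
  plot(x, type = "l", xlab = "Row", ylab = variable_string)
  boxplot(split(x, data[[group]]), xlab = group, ylab = variable_string)
  path
}

vars <- c("mpg", "hp", "wt")
files <- setNames(lapply(vars, make_base_plots, data = mtcars, group = "cyl"), vars)
```

setNames() is what makes files[["mpg"]] work, since lapply() on an unnamed character vector returns an unnamed list.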
I'm trying to use ggplot2 to create sequence plots so that the figures in my sequence-analysis paper share the same visual style. Here is what I do:
library(ggplot2)
library(TraMineR)
library(dplyr)
library(tidyr)
data(mvad)
mvad_seq <- seqdef(mvad, 15:length(mvad))
mvad_trate <- seqsubm(mvad_seq, method = "TRATE")
mvad_dist <- seqdist(mvad_seq, method = "OM", sm = mvad_trate)
cluster <- cutree(hclust(d = as.dist(mvad_dist), method = "ward.D2"), k = 6)
mvad$cluster <- cluster
mvad_long <- gather(select(mvad, id, contains("."), -matches("N.Eastern"), -matches("S.Eastern")),
                    key = "Month", value = "state",
                    Jul.93, Aug.93, Sep.93, Oct.93, Nov.93, Dec.93, Jan.94, Feb.94, Mar.94,
                    Apr.94, May.94, Jun.94, Jul.94, Aug.94, Sep.94, Oct.94, Nov.94, Dec.94, Jan.95,
                    Feb.95, Mar.95, Apr.95, May.95, Jun.95, Jul.95, Aug.95, Sep.95, Oct.95, Nov.95,
                    Dec.95, Jan.96, Feb.96, Mar.96, Apr.96, May.96, Jun.96, Jul.96, Aug.96, Sep.96,
                    Oct.96, Nov.96, Dec.96, Jan.97, Feb.97, Mar.97, Apr.97, May.97, Jun.97, Jul.97,
                    Aug.97, Sep.97, Oct.97, Nov.97, Dec.97, Jan.98, Feb.98, Mar.98, Apr.98, May.98,
                    Jun.98, Jul.98, Aug.98, Sep.98, Oct.98, Nov.98, Dec.98, Jan.99, Feb.99, Mar.99,
                    Apr.99, May.99, Jun.99)
mvad_long <- left_join(mvad_long, select(mvad, id, cluster))
ggplot(data = mvad_long, aes(x = Month, y = id, fill = state)) + geom_tile() + facet_wrap(~cluster)
I try to plot the sequences by cluster, and this gives me the following plot:
As you can see, there are gaps for the ids that don't belong to the cluster shown in each facet. I would like to get rid of these gaps so that the sequences are stacked, as with TraMineR's seqIplot() function in the next figure:
Any suggestions of how to proceed?
Two small changes:
mvad_long$id <- as.factor(mvad_long$id)
ggplot(data = mvad_long, aes(x = Month, y = id, fill = state)) +
  geom_tile() + facet_wrap(~cluster, scales = "free_y")
ggplot was treating id as a numeric variable rather than a factor, and the y scales were fixed across facets.
An update: I also needed to convert the month into a date for it to work. The full solution follows:
library(ggplot2)
library(TraMineR)
library(dplyr)
library(tidyr)
library(lubridate)
data(mvad)
mvad_seq <- seqdef(mvad, 15:length(mvad))
mvad_trate <- seqsubm(mvad_seq, method = "TRATE")
mvad_dist <- seqdist(mvad_seq, method = "OM", sm = mvad_trate)
cluster <- cutree(hclust(d = as.dist(mvad_dist), method = "ward.D2"), k = 6)
mvad$cluster <- cluster
mvad_long <- mvad %>%
  select(id, matches("\\.\\d\\d")) %>%
  gather(key = "month", value = "state", -id) %>%
  inner_join(
    mvad %>%
      select(id, cluster),
    by = "id"
  ) %>%
  mutate(
    id = factor(id),
    date = myd(paste0(month, "01"))
  )
mvad_long %>%
  ggplot(aes(x = date, y = id, fill = state, color = state)) +
  geom_tile() +
  facet_wrap(~cluster, scales = "free_y", ncol = 2) +
  theme_bw() +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    panel.grid = element_blank()
  ) +
  scale_fill_brewer(palette = "Accent") +
  scale_colour_brewer(palette = "Accent") +
  labs(x = "", y = "")
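The update doesn't say exactly why the date conversion was needed. One likely reason is that a character x axis is ordered alphabetically (Apr.94 before Aug.93), whereas a Date axis is chronological. Below is a base-R sketch of the conversion that does not rely on lubridate or on the locale's month names; to_date() is a hypothetical helper, and it assumes every two-digit year is in the 1900s, as in mvad (1993 to 1999):

```r
# Column names in the mvad style: three-letter English month, dot, two-digit year
month <- c("Jul.93", "Aug.93", "Jan.94", "Apr.94", "Jun.99")

# Sorted as text, the months come out alphabetically rather than chronologically
sort(month)

# Hypothetical conversion; month.abb is R's built-in English month abbreviations
to_date <- function(m) {
  mon <- match(substr(m, 1, 3), month.abb)
  as.Date(sprintf("19%s-%02d-01", substr(m, 5, 6), mon))
}
month_date <- to_date(month)
month_date
```

In the ggplot2 code above, a column built this way would play the same role as the date column created with myd().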