I have a dataset with 4000 categoric variables which are city names arranged by date. I can do a plot of the entire dataset with an overall count.
What I need to do is be able to plot aggregates of specific cities by specific date ranges. I cannot use by quarter or anything like that because the required date ranges every year are different. I need to be able to, say, select 2016/4/1 to 2016/6/23 to get a count of how many are Denver.
How can I do this?
library(ggplot2)
library(ggpubr)
theme_set(theme_classic())
df <- log %>%
group_by(Location) %>%
summarise(counts = n())
df
ggplot(df, aes(x = Location, y = counts)) +
geom_bar(fill = "#0073C2FF", stat = "identity",width = .65) +
geom_text(aes(label = counts), vjust = -0.3) +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
labs(title="Locations of Library Instruction",
subtitle="2016-2020")
Related
I have a dataframe of Lots, Time, Value with the same structure as the sample data below.
df <- tibble(Lot = c(rep(123,4),rep(265,5),rep(132,3),rep(455,4)),
time = c(seq(4), seq(5), seq(3), seq(4)), Value = runif(16))
I'd like to split the dataframe by every N Lots and plot them. The Lots are different sizes so I can't subset the data by every n rows!
I've been using an approach like this but it's not scalable for a large dataset.
df %>% filter(Lot == c(123, 265)) %>% ggplot(., aes(x = time, y = Value)) +
geom_point() + stat_smooth()
How can I do this?
Create a lot number column and create a list of plots for every n unique lot values.
This would give you list of plots.
library(tidyverse)
lot_n <- 2
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
group_split(group) %>%
map(~ggplot(.x, aes(x = time, y = Value)) +
geom_point() + stat_smooth()) -> list_plots
list_plots
Individual plots can be accessed via list_plots[[1]], list_plots[[2]] etc.
You can also plot the data with facets.
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
ggplot(aes(x = time, y = Value)) +
geom_point() + stat_smooth() +
facet_wrap(~group, scales = 'free')
I have made a time series plot for total count data of 4 different species. As you can see the results with sharksucker have a much higher count than the other 3 species. To see the trends of the other 3 species they need to plotted separately (or on a smaller y axis). However, I have a figure limit in my masters paper. So, I was trying to create a dual axis plot or have the y axis split into two. Does anyone know of a way I could do this?
library(tidyverse)
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The thing is, the problem you're trying to solve doesn't seem like a 2nd Y axis issue. The problem here is of relative scale of the species. You might want to think of something like standardizing the initial species presence to 100 and showing growth or decline from there.
Another option would be faceting by species.
I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to
Create a stacked percentage bar graph with the various ID on the verticle-axis (arranged alphabetically with A starting on top), and the percent on the horizontal-axis. I can't seem to get a code that will automatically sort the ID alphabetically as my original dataset has quite a number of categories and will be difficult to arrange them manually.
I also hope to have the infected category (1) to be red and towards the left of the blue non-infected category (0). Is it also possible to change the sub-heading of the legend box from "Non_infected" to "Non-infected"?
I hope that the displayed ID in the plot will include the count of the number of times the ID appeared in the dataset. E.g. "A (n=6)", "B (n=3)"
My sample code is as follow:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, we need to add a count to your original data.frame.
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
Then, we augment our ID column, turn the Infection Status into a factor variable, use forcats::fct_rev to reverse the ID ordering, and use scale_fill_manual to control your legend.
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
I want to build a graph to determine the seasonality of the influenza virus from January to December and group it per year (2011-2018), but I have problems with my results.
I would like graphics like the one that appear in this publication : https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193263
outcome<- factor(c(rep("positive", 60)))
month<- sample(1:12,60,replace=T)
year <- sample(2011:2018,60,replace=T)
data<- data.frame(outcome, month, year)
ggplot(data, aes(x=month, y= frequency(data$outcome),
group = year, fill = year)) + geom_col(position = "dodge")
library(dplyr)
data %>%
count(year, month , wt = outcome == "positive") %>%
ggplot(aes(x=month, y= n, group = year, fill = year)) +
geom_col(position = "dodge") +
scale_x_continuous(breaks = 1:12, labels = month.abb)
Assuming you mean the first graphic in the paper the easiest way to do this is a facet_wrap or facet_grid.
Edit- I would also like to add that R does not do block comments like *** if that's what you were trying.
I have data that looks kinda like this:
id = rep(1:33,3)
year = rep(1:3,33)
group = sample(c(1:3),99, replace=T)
test_result = sample(c(TRUE,FALSE), size=99, replace = T)
df = data.frame(id, year, group, test_result)
df$year = as.factor(year)
df$group = as.factor(group)
My goal is to visualize it so that I can see how group number and year relate to test_result.
df %>%
group_by(id,year) %>%
summarize(x=sum(test_result)) %>%
ggplot() +
geom_histogram(aes(fill = year,
x = x),
binwidth = 1,
position='dodge') +
theme_minimal()
gets me almost all the way there. What I want is to be able to add something like facet_wrap(group~.) to the end of this to show how these change by group but obviously group is not part of the aggregated dataframe.
Right now my best solution is just to show multiple plots like
df %>% filter(group==1) # Replace group number here
group_by(id,year) %>%
summarize(x=sum(test_result)) %>%
ggplot() +
geom_histogram(aes(fill = year,
x = x),
binwidth = 1,
position='dodge') +
theme_minimal()
but I'd love to figure out how to put them all in one figure and I'm wondering if maybe the way to do that is to put more of the grouping logic into ggplot?