I want to show a separate barplot for each combination of year and gender, with the mean values of the variables Q1 to Q5 as bars, so that each plot looks like a density.
I have data that looks like this:
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
gender = c(1,1,1,0,0,0,0,0,0),
Q1 = c(3,2,3,1,0,1,2,1,0),
Q2 = c(4,3,4,2,0,2,1,4,3),
Q3 = c(1,2,1,3,5,4,5,4,5),
Q4 = c(1,2,1,2,4,3,2,2,1),
Q5 = c(1,1,1,2,1,0,0,0,1))
My solution was to filter() for year and gender first, then use summarise()
to get a vector of the means and pass it to the barplot() function:
data %>% filter(gender==1,year==2013) %>% select(-userid,-gender,-year) %>% summarise_all(mean) %>%
as.numeric() %>%
barplot()
Instead of doing this for every combination of year and gender,
is there a more elegant way, using ggplot and facet_wrap()?
I may have misunderstood how you want the plot arranged, but if you want to show the mean score per question for each year and gender group, you could use facets like this:
library(tidyverse)
data %>%
pivot_longer(starts_with("Q")) %>%
group_by(year, gender, name) %>%
summarize(value = mean(value)) %>%
ggplot(aes(name, value)) +
geom_col(fill = 'deepskyblue4') +
facet_grid(year ~ gender) +
labs(x = 'Question', y = 'Average score') +
theme_minimal(base_size = 16)
Maybe you want something like this, using facet_wrap() and geom_col(), where the mean is calculated with rowMeans():
library(dplyr)
library(ggplot2)
data %>%
mutate(mean = rowMeans(select(., starts_with("Q")), na.rm = TRUE)) %>%
ggplot(aes(x = year, y = mean, fill = factor(gender))) +
geom_col() +
labs(x = 'Year', y = 'Mean Q1 to Q5', fill = 'Gender') +
theme_bw() +
facet_wrap(~userid)
Created on 2022-10-28 with reprex v2.0.2
First, pivot your data from wide to long format, group by year, gender, and Q, and summarize to mean values.
library(tidyr)
library(dplyr)
library(ggplot2)
data_long <- data %>%
pivot_longer(Q1:Q5, names_to = "Q", values_to = "value") %>%
group_by(year, gender, Q) %>%
summarize(value = mean(value), .groups = "drop")
data_long
# A tibble: 30 × 4
    year gender Q     value
   <dbl>  <dbl> <chr> <dbl>
 1  2013      0 Q1      1.5
 2  2013      0 Q2      1.5
 3  2013      0 Q3      4
 4  2013      0 Q4      2
 5  2013      0 Q5      1
 6  2013      1 Q1      3
 7  2013      1 Q2      4
 8  2013      1 Q3      1
 9  2013      1 Q4      1
10  2013      1 Q5      1
# … with 20 more rows
Then plot using ggplot2::facet_grid().
ggplot(data_long, aes(Q, value)) +
geom_col() +
facet_grid(year ~ gender)
In base R: aggregate() the means by gender for each year, then pass each result to barplot().
par(mfrow=c(1, 4))
sapply(unique(data$year), \(x) {
as.matrix(aggregate(cbind(Q1, Q2, Q3, Q4, Q5) ~ gender, data[data$year == x, ], FUN=mean)[-1]) |>
barplot(beside=TRUE, col=c(2, 4), main=x)
})
plot.new()
legend('left', legend=c('m', 'f'), col=c(2, 4), cex=1.2, pch=15, bty='n')
This approach does not require you to calculate the mean first; stat_summary() handles that when you specify fun = "mean".
library(tidyverse)
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
gender = c(1,1,1,0,0,0,0,0,0),
Q1 = c(3,2,3,1,0,1,2,1,0),
Q2 = c(4,3,4,2,0,2,1,4,3),
Q3 = c(1,2,1,3,5,4,5,4,5),
Q4 = c(1,2,1,2,4,3,2,2,1),
Q5 = c(1,1,1,2,1,0,0,0,1))
data %>%
select(starts_with("Q")) %>% # Selects each column that starts with "Q"
pivot_longer(cols = everything()) %>% # Pivot to long format
ggplot(aes(x = name, y = value, fill = name)) +
stat_summary(geom = "bar", fun = "mean") + # Geom and function can be changed easily
theme_classic() +
labs(x = "Q", y = "Mean value", fill = NULL)
Created on 2022-10-28 by the reprex package (v2.0.1)
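The stat_summary() idea can also be combined with facets, so that the means are computed separately for each year and gender panel. The following is only a sketch under assumptions: the column names `question` and `score` and the object names `long` and `p` are made up for illustration, and `label_both` is used only to make the panel strips readable.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                   year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                   gender = c(1,1,1,0,0,0,0,0,0),
                   Q1 = c(3,2,3,1,0,1,2,1,0),
                   Q2 = c(4,3,4,2,0,2,1,4,3),
                   Q3 = c(1,2,1,3,5,4,5,4,5),
                   Q4 = c(1,2,1,2,4,3,2,2,1),
                   Q5 = c(1,1,1,2,1,0,0,0,1))

# Keep year and gender so they can be used as facet variables
long <- data %>%
  pivot_longer(starts_with("Q"), names_to = "question", values_to = "score")

# stat_summary() computes the mean per question within each facet panel
p <- ggplot(long, aes(x = question, y = score)) +
  stat_summary(geom = "bar", fun = mean) +
  facet_grid(year ~ gender, labeller = label_both) +
  labs(x = "Q", y = "Mean value")
p
```

Because the summary is computed per panel, no group_by() and summarize() step is needed before plotting.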
City Year Month Deaths Guns Shootings
Miami 2010 1 69 73800 701
Miami 2010 2 99 85050 738
Miami 2010 3 122 92650 784
Houston 2010 1 98 92100 789
Houston 2010 2 146 103900 799
Houston 2010 3 162 136100 772
For each city, I want to create a layered line plot with the Month and Year on the x-axis and colored lines corresponding to deaths, guns, and shootings. But I don't see how to do this.
So far I tried
df <- shootings %>%
select(city, date, sales, volume, median, listings, inventory) %>%
gather(key = "variable", value = "value", -c(city,date))
head(df)
df<-df[df$city=='Miami',]
ggplot(df, aes(x = date, y = value)) +
geom_line(aes(color = variable, linetype = variable))
And that can give me the layered line plot for one specific city, but is there a way to loop this through all the cities in my dataframe? Also how do I have this show the month and year labeled on the x-axis?
Last...
when I do
library(ggplot2)
library(reshape2)
library(tidyverse)
for (city_name in df$city){
df %>% filter(city == city_name) %>%
pivot_longer(-c(city, year, month, date),
names_to = "Statistic") %>%
ggplot(aes(x = date, y = log(value))) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic))
}
I get no output. Why is this?
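On the "no output" part: inside a for loop R does not auto-print the value of an expression, so a ggplot object built there is created and then discarded unless you call print() on it. Looping over df$city also visits every row rather than every distinct city, so unique() is needed. Below is a minimal sketch, assuming the column names from the table posted above (City, Year, Month) instead of the lowercase names used in the loop; the `plots` list is a hypothetical helper for keeping the results.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data copied from the table in the question
df <- data.frame(
  City = rep(c("Miami", "Houston"), each = 3),
  Year = 2010,
  Month = rep(1:3, times = 2),
  Deaths = c(69, 99, 122, 98, 146, 162),
  Guns = c(73800, 85050, 92650, 92100, 103900, 136100),
  Shootings = c(701, 738, 784, 789, 799, 772)
)

plots <- list()  # hypothetical container so the plots can be reused later
for (city_name in unique(df$City)) {  # one iteration per distinct city
  p <- df %>%
    filter(City == city_name) %>%
    pivot_longer(-c(City, Year, Month), names_to = "Statistic") %>%
    ggplot(aes(x = Month, y = log(value))) +
    geom_line(aes(color = Statistic, linetype = Statistic, group = Statistic)) +
    ggtitle(city_name)
  print(p)  # explicit print(): nothing is drawn inside a loop without it
  plots[[city_name]] <- p
}
```

If all cities should appear in one figure, faceting is usually simpler than a loop.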
Using facet_wrap():
library(tidyverse)
df <- read.table(text = "City Year Month Deaths Guns Shootings
Miami 2010 1 69 73800 701
Miami 2010 2 99 85050 738
Miami 2010 3 122 92650 784
Houston 2010 1 98 92100 789
Houston 2010 2 146 103900 799
Houston 2010 3 162 136100 772 ",
header = TRUE)
df %>%
mutate(Month = factor(Month, labels = month.name[1:3])) %>%
pivot_longer(-c(City, Year, Month),
names_to = "Statistic") %>%
ggplot(aes(x = Month, y = value)) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic)) +
facet_wrap(City ~ Year)
# The difference between the number of guns and the number of
# deaths is pretty large; perhaps plot 'log(value)' instead?
df %>%
mutate(Month = factor(Month, labels = month.name[1:3])) %>%
pivot_longer(-c(City, Year, Month),
names_to = "Statistic") %>%
ggplot(aes(x = Month, y = log(value))) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic)) +
facet_wrap(~ City)
# For Month and Year on the x-axis, one option would be:
df %>%
mutate(Month = factor(Month, labels = month.abb[1:3])) %>%
mutate(Month_Year = paste(Month, Year)) %>%
mutate(Date = factor(Month_Year,
levels = paste(
month.abb,
2010:2010,
sep = " "
), ordered = TRUE)) %>%
select(-Month_Year) %>%
pivot_longer(-c(City, Year, Month, Date),
names_to = "Statistic") %>%
ggplot(aes(x = Date, y = log(value))) +
geom_line(aes(color = Statistic,
linetype = Statistic,
group = Statistic)) +
facet_wrap(~ City)
Created on 2022-07-03 by the reprex package (v2.0.1)
I want to plot the average annual stream flow by WATER YEAR,
which starts in October and ends in September (for example, 10/01/1983 to 09/30/1984 is defined as water year 1984).
I looked for solutions elsewhere but did not find one.
At the moment I use the following script to plot the annual average flow:
library(tidyverse)
library(lubridate)
library(ggplot2)
#df <- read_csv('dataframe.csv')
df <- df %>%
mutate(date = mdy(date))
df <- df %>%
mutate(year = floor_date(date, "year")) %>%
group_by(year) %>%
summarize(avg = mean(flow))
y <- df$avg
x <- as.Date(df$year, format = "Y")
d <- data.frame(x = x, y = y)
# interpolate values from zero to y and create corresponding number of x values
vals <- lapply(d$y, function(y) seq(0, y, by = 0.1))
y <- unlist(vals)
mid <- rep(d$x, lengths(vals))
d2 <- data.frame(x = mid - 100,
xend = mid + 100,
y = y,
yend = y)
ggplot(data = d2, aes(x = x, xend = xend, y = y, yend = yend, color = y)) +
geom_segment(size = 2) +
scale_color_gradient2(low = "midnightblue", mid = "deepskyblue", high = "aquamarine",
midpoint = max(d2$y)/2)+
scale_x_date(date_breaks = "1 year",date_labels = "%Y", expand = c(0,0)) +
theme(axis.text.x = element_text(angle=90, vjust=.5))+
labs(x = "Years", y = "Mean Annual Flow (cms)")+
ggtitle("Mean Annual Flow, Rancho River at ELdorado (1983-2020)")+
theme(plot.title = element_text(hjust = 0.5))
With this script I got the following results, which use the calendar year.
If I used the water year, there would be no results for 1983.
The data frame can be found in the following link
https://drive.google.com/file/d/11PVub9avzMFhUz02cHfceGh9DrlVQDbD/view?usp=sharing
Kindly assist.
If the date falls on or after 10/01 of its calendar year, it belongs to the next year (in water years):
df %>%
mutate(date=mdy(date), year=year(date), year = year + (date >= mdy(paste0("10/01/", year))))
# A tibble: 5,058 x 3
   date        flow  year
   <date>     <dbl> <dbl>
 1 1983-10-01  3.31  1984
 2 1983-10-02  3.19  1984
 3 1983-10-03  3.7   1984
 4 1983-10-04  3.83  1984
 5 1983-10-05  3.44  1984
 6 1983-10-06  4.37  1984
 7 1983-10-07  6.78  1984
 8 1983-10-08  6.3   1984
 9 1983-10-09  6.46  1984
10 1983-10-10  6.62  1984
# … with 5,048 more rows
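To get from the water-year column to the annual averages the question asks for, you could group by it and summarize. This is a minimal sketch on a made-up four-row data frame: `flow_df`, `water_year`, and `annual` are hypothetical names and the flow values are invented. It uses month(date) >= 10 as an equivalent way of writing the comparison against October 1.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for the real stream flow file
flow_df <- data.frame(
  date = c("09/29/1984", "09/30/1984", "10/01/1984", "10/02/1984"),
  flow = c(2, 4, 6, 8)
)

annual <- flow_df %>%
  mutate(date = mdy(date),
         # October to December dates roll over into the next water year
         water_year = year(date) + (month(date) >= 10)) %>%
  group_by(water_year) %>%
  summarize(avg = mean(flow), .groups = "drop")

annual
```

The `avg` column can then replace the calendar-year average in the plotting code from the question.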
I am a beginner in R and would like to plot a bar chart of my rainfall and solar radiation data for two years side by side, from January to December (data attached).
data to plot:
I am trying to plot the first row (January), but I get this error:
Error in -0.01 * height : non-numeric argument to binary operator
How do I deal with that, and which script should I use to plot my data?
Regards,
Here is an example
library(tidyverse)
set.seed(123456)
df <- data.frame(Month = month.abb,
R_2014 = runif(n = 12, min = 0, max = 195),
R_2015 = runif(n = 12, min = 0, max = 295),
S_2014 = runif(n = 12, min = 3, max = 10),
S_2015 = runif(n = 12, min = 4, max = 10))
df
#> Month R_2014 R_2015 S_2014 S_2015
#> 1 Jan 155.56794 267.06645 6.344445 9.714178
#> 2 Feb 146.94519 259.85035 7.903533 9.229704
#> 3 Mar 76.29486 293.18178 9.159223 8.272923
#> 4 Apr 66.60356 264.30712 9.144556 7.632427
#> 5 May 70.45235 259.19979 8.977157 5.352593
#> 6 Jun 38.67722 58.29370 4.161913 8.437571
#> 7 Jul 104.29730 98.82311 6.660781 9.373255
#> 8 Aug 18.82262 229.27586 9.083897 5.766779
#> 9 Sep 192.63015 47.08010 4.618097 7.092115
#> 10 Oct 32.67605 23.79035 3.833566 6.607897
#> 11 Nov 155.60788 39.13185 8.767659 7.450991
#> 12 Dec 115.78983 50.71209 3.561939 8.445736
# convert from wide to long format
# separate columns to get variable and year
df_long <- df %>%
gather(key, value, -Month) %>%
separate(key, into = c("variable", "Year"), "_") %>%
mutate(Month = factor(Month, levels = month.abb))
head(df_long)
#> Month variable Year value
#> 1 Jan R 2014 155.56794
#> 2 Feb R 2014 146.94519
#> 3 Mar R 2014 76.29486
#> 4 Apr R 2014 66.60356
#> 5 May R 2014 70.45235
#> 6 Jun R 2014 38.67722
# facet by year
plt1 <- ggplot(df_long, aes(x = Month, y = value, fill = variable)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap(~ Year)
plt1
# facet by variable
plt2 <- ggplot(df_long, aes(x = Month, y = value, fill = Year)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
facet_wrap( ~ variable, scales = "free_y")
plt2
Created on 2018-06-01 by the reprex package (v0.2.0).
I have a data frame with data covering a period of at most 2 years for different objects:
ISBN Date Quantity
3457 2004-06-15 10
3457 2004-08-16 6
3457 2004-08-19 10
3457 2005-04-19 7
3457 2005-04-20 12
9885 2013-01-15 10
9885 2013-03-16 6
9855 2013-08-19 10
9885 2014-09-19 7
9885 2014-09-20 12
How can I plot Jan to Dec for the 1st year, continued by Jan to Dec for the 2nd year?
I guess the idea is to normalize the years (to have 1st, 2nd), but not the months. (here's an example)
Number of Items Sold over 2 Years Period Since Release
I'd use the lubridate package for something like this. Note that I call the data frame df because you didn't give it a name.
So for example:
library(lubridate)
First format the date like so:
df$Date <- ymd(df$Date)
Then extract the month and the year:
df$Month <- month(df$Date, label=TRUE, abbr=TRUE)
df$Year <- year(df$Date)
From there you can plot your results with ggplot2:
library(ggplot2)
ggplot(df, aes(x=Month, y=Quantity, colour=Year)) +
geom_point()
Note that your question would be easier to answer if you provided a reproducible example.
You could try:
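library(dplyr)
library(lubridate)
library(ggplot2)
library(toOrdinal)  # provides toOrdinal(), e.g. 1 -> "1st"
data <- df %>%
group_by(ISBN) %>%
arrange(Date) %>%
mutate(Year = year(Date),
Month = month(Date, label = TRUE),
# number the years per ISBN: "1st Year", "2nd Year", ...
Rank = paste(sapply(cumsum(Year != lag(Year,default=0)), toOrdinal), "Year")) %>%
group_by(Rank, Month, .add = TRUE) %>%  # 'add' was renamed to '.add' in dplyr 1.0.0
summarise(Sum = sum(Quantity))
ggplot(data = data, aes(x = Month, y = Sum,
group = factor(ISBN),
colour = factor(ISBN))) +
geom_line(stat = "identity") +
facet_grid(. ~ Rank) +
scale_colour_discrete(name = "ISBN") +
theme(panel.spacing = unit(0, "lines"),  # 'panel.margin' is deprecated in favour of 'panel.spacing'
axis.text.x = element_text(angle = 90))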
Assuming the following df:
df <- data.frame(
ISBN = sample(c(3457, 9885), 1000, replace = TRUE),
Date = sample(seq(as.Date('2004/01/01'),
as.Date('2011/12/31'), by = "month"),
1000, replace = TRUE),
Quantity = sample(1:12, 1000, replace = TRUE)
)
This would produce: