Creating row-wise pairs in R

I am trying to pair rows for use in a dumbbell plot. I have a df that looks like this:
Year Species Tonnes
1960 Cod 123
1961 Cod 456
1970 Cod 124
1971 Cod 457
I want to pair up the results that are 10 years apart, resulting in this df:
Year Species Tonnes Pair
1960 Cod 123 1
1961 Cod 456 2
1970 Cod 124 1
1971 Cod 457 2
I would very much appreciate help. I wasn't too sure where to begin with the problem.

You could do
df <- structure(list(Year = c(1960L, 1961L, 1970L, 1971L), Species = c("Cod",
"Cod", "Cod", "Cod"), Tonnes = c(123, 150, 256, 450)), row.names = c(NA,
-4L), class = "data.frame")
library(tidyverse)
df %>%
  mutate(year = Year %% 10,
         decade = 10 * Year %/% 10) %>%
  select(-Year) %>%
  group_by(Species, year) %>%
  summarize(from = Tonnes[which.min(decade)],
            to = Tonnes[which.max(decade)],
            year = paste(min(year + decade), max(year + decade), sep = '-')) %>%
  ggplot(aes(from, year)) +
  geom_linerange(aes(xmin = from, xmax = to), alpha = 0.5) +
  geom_point(color = 'green4', size = 3) +
  geom_point(aes(x = to), color = 'red3', size = 3) +
  xlab('Tonnes') +
  theme_minimal(base_size = 16)

Using data.table, a join will get the pairs in wide format:
library(data.table)
dt <- setDT(df)[
, `:=`(Year2 = Year + 10, Pair = rleid(Year, Species))
][
df,
.(Year1 = i.Year, Year2 = x.Year, Species, Tonnes1 = i.Tonnes, Tonnes2 = Tonnes, Pair = i.Pair),
on = .(Year = Year2, Species), nomatch = 0
]
dt
#> Year1 Year2 Species Tonnes1 Tonnes2 Pair
#> 1: 1960 1970 Cod 123 124 1
#> 2: 1961 1971 Cod 456 457 2
which can be melted to long format, if desired:
setcolorder(
melt(dt, c("Species", "Pair"), list(c("Year1", "Year2"), c("Tonnes1", "Tonnes2")), value.name = c("Year", "Tonnes")),
c("Year", "Species", "Tonnes", "Pair")
)[, variable := NULL][]
#> Year Species Tonnes Pair
#> 1: 1960 Cod 123 1
#> 2: 1961 Cod 456 2
#> 3: 1970 Cod 124 1
#> 4: 1971 Cod 457 2
Data:
df <- data.frame(Year = c(1960, 1961, 1970, 1971), Species = "Cod", Tonnes = c(123, 456, 124, 457))
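If only the Pair column from the question is needed, without plotting or reshaping, a base R sketch along these lines may be enough. It assumes that paired rows share the same last digit of Year within a Species, as in the example data; the `offset` variable is introduced here purely for illustration.

```r
# Example data from the question
df <- data.frame(Year = c(1960, 1961, 1970, 1971), Species = "Cod",
                 Tonnes = c(123, 456, 124, 457))

# Assumption: rows that belong together share the same position within
# their decade (1960 and 1970 both have offset 0, 1961 and 1971 offset 1)
offset <- df$Year %% 10

# Number the distinct offsets in order of appearance, separately per species
df$Pair <- ave(offset, df$Species, FUN = function(x) match(x, unique(x)))

df  # Pair is 1, 2, 1, 2 for the example data
```

If the gap between paired years is not always a multiple of 10, the join-based approach above is likely the safer choice.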

Related

R: Is there a way to select a column according to the current year?

Say you have a database like gapminder with the population per country. Even though the current year is 2021, you also have predictions for the following years to come.
location 2020.0 2021.0 2022.0
Canada 5 7 9
China 23 34 54
Congo 1 2 3
and another database like this, vaccins
location date amount_of_vaccins
Canada 2020-01-02 50
China 2021-05-03 59
Congo 2022-03-05 34
How can I merge the population of each country into the second database, using the year of each date in the second database to pick the matching population column?
I managed to merge them by country like this:
merge(gapminder,vaccins, by = "location")
but I'm getting this
location date amount_of_vaccins 2020.0 2021.0 2022.0
Canada 2020-01-02 50 5 7 9
China 2021-05-03 59 23 34 54
Congo 2022-03-05 34 1 2 3
I'd like to have only a new variable giving the population of the country according to the year. Thank you.
You could do something like this with tidyverse.
library(tidyverse)
df1 <- df1 %>%
pivot_longer(!location, names_to = "date", values_to = "population") %>%
dplyr::mutate(year = str_sub(date, 1, 4))
df2 %>%
dplyr::mutate(year = str_sub(date, end = 4)) %>%
dplyr::left_join(., df1, by = c("location", "year")) %>%
dplyr::select(-c(date.y, year)) %>%
dplyr::rename(date = date.x)
Output
location date amount_of_vaccins population
1 Canada 2020-01-02 50 5
2 China 2021-05-03 59 34
3 Congo 2022-03-05 54 3
Data
df1 <-
structure(
list(
location = c("Canada", "China", "Congo"),
`2020.0` = c(5, 23, 1),
`2021.0` = c(7, 34, 2),
`2022.0` = c(9, 54, 3)
),
class = "data.frame",
row.names = c(NA,-3L)
)
df2 <-
structure(
list(
location = c("Canada", "China", "Congo"),
date = c("2020-01-02",
"2021-05-03", "2022-03-05"),
amount_of_vaccins = c(50, 59, 54)
),
class = "data.frame",
row.names = c(NA,-3L)
)
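As a dependency-free alternative, the same lookup can be sketched in base R with matrix indexing: find the row for each location and the column for each year, then pull out one cell per row. This assumes the population columns are always named like "2020.0" (year followed by ".0"), as in the example; the names `pop`, `row_idx` and `col_idx` are illustrative only.

```r
# Same example data as above; check.names = FALSE keeps "2020.0" as a column name
df1 <- data.frame(location = c("Canada", "China", "Congo"),
                  `2020.0` = c(5, 23, 1),
                  `2021.0` = c(7, 34, 2),
                  `2022.0` = c(9, 54, 3),
                  check.names = FALSE)
df2 <- data.frame(location = c("Canada", "China", "Congo"),
                  date = c("2020-01-02", "2021-05-03", "2022-03-05"),
                  amount_of_vaccins = c(50, 59, 54))

# Population values as a numeric matrix (drop the location column)
pop <- as.matrix(df1[, -1])

# Row = position of the country, column = position of the year of each date
row_idx <- match(df2$location, df1$location)
col_idx <- match(paste0(substr(df2$date, 1, 4), ".0"), colnames(pop))

# Indexing with a two-column matrix picks one (row, column) cell per record
df2$population <- pop[cbind(row_idx, col_idx)]
df2
```

A location or year with no match yields NA for that row rather than an error, which may or may not be what you want.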

Find percentage of NA values per month and year in a list of data frames

I have a list of 83 csv files with three variables.
I have created new date columns, including month and year.
One of my dataframes from the list looks like this:
> head(estaciones$AeropuertodeBocas_93002)
Date Tx2m Tn2m Pr year month day
1 1988-01-01 27.4 23.1 41.3 1988 1 1
2 1988-01-02 29.8 24.0 0.3 1988 1 2
3 1988-01-03 30.4 24.0 0.4 1988 1 3
4 1988-01-04 30.0 24.2 2.4 1988 1 4
5 1988-01-05 29.6 23.2 9.1 1988 1 5
6 1988-01-06 30.0 23.1 5.2 1988 1 6
I would like to create a new file with the percentage of NA values per variable and per month and year. For example Jun 1988: 2% of missing values for variable "Pr" and dataframe "x".
I have tried using:
na_by_month <- map(estaciones, ~ .x %>%
mutate(Month=month(Date), Mis = rowSums(is.na(.))) %>%
group_by(Month) %>%
summarise(Sum=sum(Mis), Percentage=mean(Mis)))
This is only calculating missing values percentage for each month for the whole series and not per year.
Data (one of several dfs):
df <- structure(list(Date = structure(c(6574,
6575, 6576, 6577, 6578, 6579), class = "Date"),
Tx2m = c(27.4, 29.8, 30.4, 30, 29.6, 30),
Tn2m = c(23.1, 24, 24, 24.2, 23.2, 23.1),
Pr = c(41.3, 0.3, 0.4, 2.4, 9.1, 5.2),
year = c(1988, 1988, 1988, 1988, 1988, 1988 ),
month = c(1, 1, 1, 1, 1, 1), day = 1:6),
row.names = c(NA, 6L), class = "data.frame")
How can I create a new file containing percentage of missing values for each of my data frames inside the list, per month and per year? Thank You
If you're trying to calculate the percentage of missing values by month/year and just by year you could write a function that you can then map to your list of dataframes:
library(dplyr)
library(purrr)
library(openxlsx)
library(rlang)
ldf <- list(df, df, df)
f <- function(data, ...){
v <- enquos(...)
data %>%
group_by(!!! v) %>%
summarize(across(Tx2m:Pr,
list(missing = ~ mean(is.na(.))),
.names = paste0("{.col}_{.fn}_", quo_name(v[[1]]))),
.groups = "drop")
}
miss <- imap(ldf, ~ left_join(f(.x, month, year), f(.x, year), by = "year"))
write.xlsx(miss, "output.xlsx")
How it works
You provide the function f your dataframe and the variables you want to group by and it will calculate the percentage of missing values for those group by variables. For example, f(df, month, year) will group your data by month and year and calculate the percentage of missing values for each variable in the range Tx2m:Pr.
f(df, month, year)
month year Tx2m_missing_month Tn2m_missing_month Pr_missing_month
<int> <int> <dbl> <dbl> <dbl>
1 1 1988 0 0 0
f(df, year)
year Tx2m_missing_year Tn2m_missing_year Pr_missing_year
<int> <dbl> <dbl> <dbl>
1 1988 0 0 0
Note: the order of your grouping variables matters here. The first group by variable is used to construct the output variable names (eg Tn2m_missing_month).
If you want the percentage missing by month/year and by year for each element of your list, then we can apply this function using imap and merge the results by year.
left_join(f(df, month, year), f(df, year), by = "year")
month year Tx2m_missing_month Tn2m_missing_month Pr_missing_month
<int> <int> <dbl> <dbl> <dbl>
1 1 1988 0 0 0
# ... with 3 more variables: Tx2m_missing_year <dbl>,
# Tn2m_missing_year <dbl>, Pr_missing_year <dbl>
Note: The missing by year will be repeated for each month within the year.
Lastly, write.xlsx will write a list of dataframes to an Excel workbook, where each sheet will be an element of your list.
If I've misunderstood your post and you only want the percentage missing by month within year then you can simplify this to:
miss <- imap(ldf, ~ f(.x, month, year))
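Because the example data contains no missing values, every percentage above is 0. The following base R sketch uses a small made-up data frame (`toy`, not from the original post) to show the same mean(is.na(x)) idea producing non-zero results; `pct_missing` is a hypothetical helper name.

```r
# Made-up data with some missing values (illustration only)
toy <- data.frame(
  year  = c(1988, 1988, 1988, 1988, 1989, 1989),
  month = c(1, 1, 2, 2, 1, 1),
  Tx2m  = c(27.4, NA, 30.4, 30.0, NA, NA),
  Pr    = c(41.3, 0.3, NA, 2.4, 9.1, 5.2)
)

# The mean of a logical vector is the share of TRUE values,
# so mean(is.na(x)) is the fraction of missing entries
pct_missing <- function(x) 100 * mean(is.na(x))

# One row per year/month combination, one column per variable
res <- aggregate(toy[c("Tx2m", "Pr")],
                 by = list(year = toy$year, month = toy$month),
                 FUN = pct_missing)
res
```

Here the values are percentages (0 to 100) rather than the proportions (0 to 1) returned by f above.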
Plot
To plot you could do something like this:
library(ggplot2)
library(tidyr)
library(scales)
library(lubridate)
plots <- imap(miss, ~ .x %>%
select(ends_with("year")) %>%
distinct() %>%
pivot_longer(cols = -year,
names_pattern = "(.*?)_(.*)",
names_to = c("var", NA)) %>%
mutate(date = ymd(year, truncated = 2L)) %>%
ggplot(aes(x = date, y = value, color = var, group = var)) +
geom_point() +
geom_line() +
scale_y_continuous(labels = percent_format()) +
scale_x_date(date_breaks = "1 year",
date_labels = "%Y")
)
plots[[1]]
where each variable is a line, its y-axis value is the percent missing, and the x-axis is the year.
Note: with the given data in the example, the graphic is not that interesting and gives a warning about there being only one point. Additionally, all the points are overlapping on the same (x,y) coordinate with the given data.
df <- structure(list(Date = structure(c(6574, 6575, 6576, 6577, 6578, 6579), class = "Date"),
Tx2m = c(27.4, 29.8, 30.4, 30, 29.6, 30), Tn2m = c(23.1, 24, 24, 24.2, 23.2, 23.1),
Pr = c(41.3, 0.3, 0.4, 2.4, 9.1, 5.2),
year = c(1988, 1988, 1988, 1988, 1988, 1988 ),
month = c(1, 1, 1, 1, 1, 1), day = 1:6),
row.names = c(NA, 6L), class = "data.frame")
nongroup_vars <- setdiff(colnames(df),c('year','month'))
nongroup_vars_mr <- paste0(nongroup_vars,'_missing_ratio')
df %>%
group_by(month,year) %>%
summarise_all(function(x) mean(is.na(x))) %>%
ungroup %>%
rename_with(~nongroup_vars_mr,all_of(nongroup_vars))
This gives the missing ratio of each variable for each group.
Output:
# A tibble: 1 × 7
month year Date_missing_ratio Tx2m_missing_ratio Tn2m_missing_ratio Pr_missing_ratio day_missing_ratio
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1988 0 0 0 0 0

Calculation of "average sales share " with dplyr::mutate

My data concerns a company and includes Total Sales and the amount of sales in three states: CA, TX and WI.
Data :
> dput(head(WalData))
structure(list(CA = c(11047, 9925, 11322, 12251, 16610, 14696
), TX = c(7381, 5912, 9006, 6226, 9440, 9376), WI = c(6984, 3309,
8883, 9533, 11882, 8664), Total = c(25412, 19146, 29211, 28010,
37932, 32736), date = structure(c(1296518400, 1296604800, 1296691200,
1296777600, 1296864000, 1296950400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), event_type = c("NA", "NA", "NA", "NA", "NA", "Sporting"
), snap_CA = c(1, 1, 1, 1, 1, 1), snap_TX = c(1, 0, 1, 0, 1,
1), snap_WI = c(0, 1, 1, 0, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
With the following code I am trying to calculate the average share of each of the three states in the company's total sales.
In addition, I need the same average percentages for each year, month of the year and day of the week.
install.packages("dplyr")
install.packages("lubridate")
library(dplyr)
library(lubridate)
df1 <- df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
# Average per Year
df1 %>%
dplyr::group_by(YEAR) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
# Average per Month
df1 %>%
dplyr::group_by(MONTH) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
# Average per Weekday
df1 %>%
dplyr::group_by(WEEKDAY) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
Output :
> df1 <- df %>%
+ dplyr::mutate(YEAR = lubridate::year(date),
+ MONTH = lubridate::month(date),
+ WEEKDAY = lubridate::wday(date),
+ P_CA = CA / Total,
+ P_TX = TX / Total,
+ P_WI = WI / Total)
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
> # Average per Year
> df1 %>%
+ dplyr::group_by(YEAR) %>%
+ dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
+ AV_TX = mean(P_TX, na.rm = TRUE),
+ AV_WI = mean(P_WI, na.rm = TRUE))
Error in eval(lhs, parent, parent) : object 'df1' not found
It fails with an error: Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
I can't figure out what's wrong; I double-checked the code and the correctness of the data.
Please suggest a solution.
The issue is that df was never created as an object in the global environment, so R resolves the name df to the built-in function stats::df (the density of the F distribution), which is what ?df shows:
df(x, df1, df2, ncp, log = FALSE)
In other words, the error comes from applying mutate to the function df rather than to a data frame.
Checking in a fresh R session with no objects created:
df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
Now, we define 'df' as
df <- WalData
df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
# A tibble: 6 x 15
# CA TX WI Total date event_type snap_CA snap_TX snap_WI YEAR MONTH WEEKDAY P_CA P_TX P_WI
# <dbl> <dbl> <dbl> <dbl> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 11047 7381 6984 25412 2011-02-01 00:00:00 NA 1 1 0 2011 2 3 0.435 0.290 0.275
#2 9925 5912 3309 19146 2011-02-02 00:00:00 NA 1 0 1 2011 2 4 0.518 0.309 0.173
#3 11322 9006 8883 29211 2011-02-03 00:00:00 NA 1 1 1 2011 2 5 0.388 0.308 0.304
#4 12251 6226 9533 28010 2011-02-04 00:00:00 NA 1 0 0 2011 2 6 0.437 0.222 0.340
#5 16610 9440 11882 37932 2011-02-05 00:00:00 NA 1 1 1 2011 2 7 0.438 0.249 0.313
#6 14696 9376 8664 32736 2011-02-06 00:00:00 Sporting 1 1 1 2011 2 1 0.449 0.286 0.265
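To catch this kind of name clash early, one option is a small guard that fails with a clearer message before the pipeline starts. The sketch below is base R only; `check_input` is a hypothetical helper, not part of dplyr or any package mentioned above.

```r
# stats::df is a function (density of the F distribution), not a data frame
is.function(stats::df)

# Hypothetical helper: stop early if the input is not a data frame
check_input <- function(x) {
  if (!is.data.frame(x)) {
    stop("expected a data frame, got an object of class: ", class(x)[1])
  }
  invisible(x)
}

# Passing the function instead of a data frame triggers the guard
ok <- tryCatch({
  check_input(stats::df)
  TRUE
}, error = function(e) FALSE)
ok

# A real data frame passes through unchanged
passed <- check_input(data.frame(CA = 1, Total = 2))
```

Choosing a name other than df for your data (for example sales or WalData) avoids the clash entirely.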

Creating min. date and max. date columns based on Quarter, Month, YTD

I have a data frame like the following:
Frequency Period Period No. Year
Monthly 1 1 2018
Quarterly Q1 3 2018
YTD YTD-Feb 2 2019
Based on these columns, I'd like to add a min. date and max. date column so that the data frame looks like this:
Frequency Period Period No. Year Min. Date Max. Date
Monthly 1 1 2018 1/1/2018 1/31/2018
Quarterly Q1 3 2018 1/1/2018 3/31/2018
YTD YTD-Feb 2 2019 1/1/2019 2/28/2019
If the min and max dates should be based on the 'PeriodNo.' column, create a sequence of dates by month starting from the 'Year' column, then extract the min and max:
library(dplyr)
library(purrr)
library(lubridate)
library(stringr)
df1 %>%
mutate(date = map2(as.Date(str_c(Year, '-01-01')),
PeriodNo., ~ seq(.x, length.out = .y, by = '1 month')),
Min.Date = do.call(c, map(date, min)),
Max.Date = do.call(c, map(date, ~ceiling_date(max(.x), 'month')-1))) %>%
select(-date)
# Frequency Period PeriodNo. Year Min.Date Max.Date
#1 Monthly 1 1 2018 2018-01-01 2018-01-31
#2 Quarterly Q1 3 2018 2018-01-01 2018-03-31
#3 YTD YTD-Feb 2 2019 2019-01-01 2019-02-28
Or an option with Map
lst1 <- Map(function(x, y) seq(as.Date(paste0(x, "-01-01")),
length.out = y, by = '1 month'), df1$Year, df1$PeriodNo.)
df1$Min.Date <- do.call(c, lapply(lst1, min))
df1$Max.Date <- do.call(c, lapply(lst1, function(x) (max(x) + months(1) -1)) )
data
df1 <- structure(list(Frequency = c("Monthly", "Quarterly", "YTD"),
Period = c("1", "Q1", "YTD-Feb"), PeriodNo. = c(1L, 3L, 2L
), Year = c(2018L, 2018L, 2019L)), class = "data.frame",
row.names = c(NA,
-3L))
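For comparison, here is a base R sketch of the same idea without lubridate or purrr. It assumes, as the answers above do, that every period starts on 1 January of 'Year' and spans 'PeriodNo.' whole months; `start` and `end` are illustrative names.

```r
df1 <- data.frame(Frequency = c("Monthly", "Quarterly", "YTD"),
                  Period = c("1", "Q1", "YTD-Feb"),
                  PeriodNo. = c(1L, 3L, 2L),
                  Year = c(2018L, 2018L, 2019L))

# Assumption: each period begins on 1 January of its year
start <- as.Date(paste0(df1$Year, "-01-01"))

# Step forward PeriodNo. months to the first day after the period,
# then subtract one day to land on the last day of the period
end <- do.call(c, lapply(seq_len(nrow(df1)), function(i) {
  n <- df1$PeriodNo.[i]
  seq(start[i], by = "month", length.out = n + 1)[n + 1] - 1
}))

df1$Min.Date <- start
df1$Max.Date <- end
df1
```

Looping over row indices rather than over the Date vector itself keeps the Date class intact inside the function.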

Group data into multiple season and boxplot side by side using ggplot in R?

I would like to group data into multiple seasons such that my seasons are Winter: Dec - Feb; Spring: Mar - May; Summer: Jun - Aug; and Fall: Sep - Nov. I would then like to boxplot the Winter and Spring seasonal data, comparing A to B and then A to C. Here is my laborious code so far. I would appreciate an efficient way of grouping and plotting the data.
library(tidyverse)
library(reshape2)
Dates30s = data.frame(seq(as.Date("2011-01-01"), to= as.Date("2040-12-31"),by="day"))
colnames(Dates30s) = "date"
FakeData = data.frame(A = runif(10958, min = 0.5, max = 1.5), B = runif(10958, min = 1.6, max = 2), C = runif(10958, min = 0.8, max = 1.8))
myData = data.frame(Dates30s, FakeData)
myData = separate(myData, date, sep = "-", into = c("Year", "Month", "Day"))
myData$Year = as.numeric(myData$Year)
myData$Month = as.numeric(myData$Month)
SeasonalData = myData %>% group_by(Year, Month) %>% summarise_all(funs(mean)) %>% select(Year, Month, A, B, C)
Spring = SeasonalData %>% filter(Month == 3 | Month == 4 |Month == 5)
Winter1 = SeasonalData %>% filter(Month == 12)
Winter1$Year = Winter1$Year+1
Winter2 = SeasonalData %>% filter(Month == 1 | Month == 2 )
Winter = rbind(Winter1, Winter2) %>% filter(Year >= 2012 & Year <= 2040) %>% group_by(Year) %>% summarise_all(funs(mean)) %>% select(-"Month")
BoxData = gather(Winter, key = "Variable", value = "value", -Year )
ggplot(BoxData, aes(x=Variable, y=value,fill=factor(Variable)))+
geom_boxplot() + labs(title="Winter") +facet_wrap(~Variable)
I would like to have two figures. Figure 1 is split in two: one panel for the Winter season and one for the Summer season (see BoxPlot 1). Figure 2 shows the monthly annual average, i.e. the average monthly values across the entire time period (2011-2040) (see Boxplot 2).
This is how I usually do it. All calculations and plots are based on the water year (WY), or hydrologic year, which runs from October to September.
library(tidyverse)
library(lubridate)
set.seed(123)
Dates30s <- data.frame(seq(as.Date("2011-01-01"), to = as.Date("2040-12-31"), by = "day"))
colnames(Dates30s) <- "date"
FakeData <- data.frame(A = runif(10958, min = 0.3, max = 1.5),
B = runif(10958, min = 1.2, max = 2),
C = runif(10958, min = 0.6, max = 1.8))
### Calculate Year, Month then Water year (WY) and Season
myData <- data.frame(Dates30s, FakeData) %>%
mutate(Year = year(date),
MonthNr = month(date),
Month = month(date, label = TRUE, abbr = TRUE)) %>%
mutate(WY = case_when(MonthNr > 9 ~ Year + 1,
TRUE ~ Year)) %>%
mutate(Season = case_when(MonthNr %in% 9:11 ~ "Fall",
MonthNr %in% c(12, 1, 2) ~ "Winter",
MonthNr %in% 3:5 ~ "Spring",
TRUE ~ "Summer")) %>%
select(-date, -MonthNr, -Year) %>%
as_tibble()
myData
#> # A tibble: 10,958 x 6
#> A B C Month WY Season
#> <dbl> <dbl> <dbl> <ord> <dbl> <chr>
#> 1 0.645 1.37 1.51 Jan 2011 Winter
#> 2 1.25 1.79 1.71 Jan 2011 Winter
#> 3 0.791 1.35 1.68 Jan 2011 Winter
#> 4 1.36 1.97 0.646 Jan 2011 Winter
#> 5 1.43 1.31 1.60 Jan 2011 Winter
#> 6 0.355 1.52 0.708 Jan 2011 Winter
#> 7 0.934 1.94 0.825 Jan 2011 Winter
#> 8 1.37 1.89 1.03 Jan 2011 Winter
#> 9 0.962 1.75 0.632 Jan 2011 Winter
#> 10 0.848 1.94 0.883 Jan 2011 Winter
#> # ... with 10,948 more rows
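The season and water-year assignment above can also be sketched in base R with a 12-element lookup vector, which may be easier to audit than nested conditions. The dates below are made up for illustration, and `season_lookup` is a hypothetical name; the mapping follows the same rules as the case_when calls (September to November is Fall, and months after September belong to the next water year).

```r
# A few made-up dates, one or two per season
dates <- as.Date(c("2011-01-15", "2011-04-01", "2011-07-04",
                   "2011-10-31", "2011-12-25"))

m <- as.integer(format(dates, "%m"))  # month number, 1 to 12
y <- as.integer(format(dates, "%Y"))  # calendar year

# Position i holds the season of month i (Jan = 1, ..., Dec = 12)
season_lookup <- c("Winter", "Winter",
                   "Spring", "Spring", "Spring",
                   "Summer", "Summer", "Summer",
                   "Fall", "Fall", "Fall",
                   "Winter")
season <- season_lookup[m]

# Water year: October to December count towards the following year
wy <- ifelse(m >= 10, y + 1L, y)

data.frame(date = dates, season = season, WY = wy)
```

With this scheme, October to December of 2011 are assigned to water year 2012, matching the MonthNr > 9 rule used above.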
Calculate seasonal and monthly average by WY
### Seasonal Avg by WY
SeasonalAvg <- myData %>%
select(-Month) %>%
group_by(WY, Season) %>%
summarise_all(mean, na.rm = TRUE) %>%
ungroup() %>%
gather(key = "State", value = "MFI", -WY, -Season)
SeasonalAvg
#> # A tibble: 366 x 4
#> WY Season State MFI
#> <dbl> <chr> <chr> <dbl>
#> 1 2011 Fall A 0.939
#> 2 2011 Spring A 0.907
#> 3 2011 Summer A 0.896
#> 4 2011 Winter A 0.909
#> 5 2012 Fall A 0.895
#> 6 2012 Spring A 0.865
#> 7 2012 Summer A 0.933
#> 8 2012 Winter A 0.895
#> 9 2013 Fall A 0.879
#> 10 2013 Spring A 0.872
#> # ... with 356 more rows
### Monthly Avg by WY
MonthlyAvg <- myData %>%
select(-Season) %>%
group_by(WY, Month) %>%
summarise_all(mean, na.rm = TRUE) %>%
ungroup() %>%
gather(key = "State", value = "MFI", -WY, -Month) %>%
mutate(Month = factor(Month))
MonthlyAvg
#> # A tibble: 1,080 x 4
#> WY Month State MFI
#> <dbl> <ord> <chr> <dbl>
#> 1 2011 Jan A 1.00
#> 2 2011 Feb A 0.807
#> 3 2011 Mar A 0.910
#> 4 2011 Apr A 0.923
#> 5 2011 May A 0.888
#> 6 2011 Jun A 0.876
#> 7 2011 Jul A 0.909
#> 8 2011 Aug A 0.903
#> 9 2011 Sep A 0.939
#> 10 2012 Jan A 0.903
#> # ... with 1,070 more rows
Plot seasonal and monthly data
### Seasonal plot
s1 <- ggplot(SeasonalAvg, aes(x = Season, y = MFI, color = State)) +
geom_boxplot(position = position_dodge(width = 0.7)) +
geom_point(position = position_jitterdodge(seed = 123))
s1
### Monthly plot
m1 <- ggplot(MonthlyAvg, aes(x = Month, y = MFI, color = State)) +
geom_boxplot(position = position_dodge(width = 0.7)) +
geom_point(position = position_jitterdodge(seed = 123))
m1
Bonus
### https://stackoverflow.com/a/58369424/786542
# if (!require(devtools)) {
# install.packages('devtools')
# }
# devtools::install_github('erocoar/gghalves')
library(gghalves)
s2 <- ggplot(SeasonalAvg, aes(x = Season, y = MFI, color = State)) +
geom_half_boxplot(nudge = 0.05) +
geom_half_violin(aes(fill = State),
side = "r", nudge = 0.01) +
theme_light() +
theme(legend.position = "bottom") +
guides(fill = guide_legend(nrow = 1))
s2
s3 <- ggplot(SeasonalAvg, aes(x = Season, y = MFI, color = State)) +
geom_half_boxplot(nudge = 0.05, outlier.color = NA) +
geom_dotplot(aes(fill = State),
binaxis = "y", method = "histodot",
dotsize = 0.35,
stackdir = "up", position = PositionDodge) +
theme_light() +
theme(legend.position = "bottom") +
guides(color = guide_legend(nrow = 1))
s3
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2019-10-16 by the reprex package (v0.3.0)
