Plotting multiple box plots as a single graph in R

I am trying to plot multiple box plots in a single graph. The data are the scores on which I have already run a Wilcoxon test. It should look like this
I have four or five questions, and for each question I want to plot the respondent scores of two groups as box plots (two boxes per question).
I am thinking of using ggplot2. My data look like this:
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I also want to know how to arrange these box plots as needed, for example in one row or in two rows.

This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017)
df <- cbind.data.frame(
  value = rnorm(1000),
  Label = sample(c("Good", "Bad"), 1000, replace = TRUE),
  variable = sample(paste0("F", 5:11), 1000, replace = TRUE))
# ggplot
library(tidyverse)
df %>%
  mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
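As a minimal sketch of these layout arguments, assuming the four score vectors from the question (the names scores, long, question and group are made up for this illustration), the panels can be placed in one row with nrow = 1 or in two rows with nrow = 2:

```r
library(ggplot2)

# Score vectors copied from the question
scores <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
)

# Base R reshape to long format: one row per response (columns: values, ind)
long <- stack(scores)
key <- as.character(long$ind)
long$question <- substr(key, 1, 2)  # "q1" or "q2"
long$group <- substr(key, 3, 3)     # "o" or "s"

# One panel per question, all panels in a single row
p_one_row <- ggplot(long, aes(group, values, fill = group)) +
  geom_boxplot() +
  facet_wrap(~ question, nrow = 1)

# Adding a new facet specification replaces the old one: two rows instead
p_two_rows <- p_one_row + facet_wrap(~ question, nrow = 2)
```

Printing p_one_row or p_two_rows draws the respective layout.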
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4))
df %>%
  gather(key, value, 1:4) %>%
  mutate(
    variable = ifelse(grepl("q1", key), "F1", "F2"),
    Label = ifelse(grepl("o$", key), "Bad", "Good")) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
Update 2
One way of visualising discrete data would be in a mosaicplot.
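# df2 is not defined in this post; it is assumed to hold the long-format
# columns variable, Label and value
mosaicplot(table(df2))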
The plot shows the count of value (as filled rectangles) per Variable per Label. See ?mosaicplot for details.
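Since df2 is never defined above, here is one way it might be built. This is a sketch under the assumption that df2 is the long-format version of the question's data, with the same variable and Label recoding as in Update 1; the names long, key and counts are only for illustration.

```r
# Score vectors copied from the question
df <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
)

# Long format via base R, then the same recoding as in Update 1
long <- stack(df)
key <- as.character(long$ind)
df2 <- data.frame(
  variable = ifelse(grepl("q1", key), "F1", "F2"),
  Label = ifelse(grepl("o$", key), "Bad", "Good"),
  value = long$values
)

# Three-way contingency table: variable x Label x value
counts <- table(df2)
mosaicplot(counts, main = "Scores per variable and label")
```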

Related

Creating Stacked Density Plot with Weightings

I am attempting to use ggplot2 to create a weighted density plot showing the distribution of two groups that each account for a fraction of a certain distribution. The difficulty that I am encountering stems from the fact that although both groups have the same number of observations in the data, they have different weightings, and I would like for each group's area in the graph to reflect this difference in weightings.
My data look something like this.
var <- sort(rnorm(1000, mean = 5, sd = 2))
df <- tibble(id = c(rep(1, 1000), rep(2, 1000)),
var = c(var,var),
weight = c(rep(.1, 500), rep(.2, 500), rep(.9, 500), rep(.8, 500)))
Observe that group 1 is given low weightings (.1 or .2) while group 2 is given high weightings (.9 or .8). Also observe that, for any given value of var, the weightings of the two groups add up to 1. In the real data, the shares accounted for by each group differ in a more complex manner across the distribution of var.
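Both properties can be checked numerically. The sketch below rebuilds the example data with a plain data.frame and an arbitrary seed (the seed and the names pair_sums and shares are not from the original post) and computes the share of total weight held by each group:

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
var <- sort(rnorm(1000, mean = 5, sd = 2))
df <- data.frame(
  id = rep(1:2, each = 1000),
  var = c(var, var),
  weight = c(rep(.1, 500), rep(.2, 500), rep(.9, 500), rep(.8, 500))
)

# At each value of var, the weights of group 1 and group 2 sum to 1
pair_sums <- df$weight[df$id == 1] + df$weight[df$id == 2]

# Share of the total weight per group: 0.15 for group 1, 0.85 for group 2
shares <- tapply(df$weight, df$id, sum) / sum(df$weight)
```

These shares are the areas each group would be expected to occupy in the plot.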
I have tried plotting this data as follows, and although using weight captures the way that the distributions vary within each group, it does not capture the way that the distribution varies between groups.
library(ggplot2)
library(dplyr)  # provides %>% (and re-exports tibble())
var <- rnorm(1000, mean = 5, sd = 2)
df %>%
ggplot(aes(x = var, group = id, fill = factor(id), weight = weight)) +
geom_density(position = 'stack')
The resulting plot looks something like this.
It is clear that the groups do not account for around 15% and 85% of the area under the density curve respectively, but the issue is clearer to see when we use position = 'fill'.
Each group seems to take up a similar area, apparently because the weighting is applied before grouping is accounted for. I would like to see a solution that results in the area associated with group 1 being commensurate with its weight (i.e. much smaller than the area associated with group 2).
To clarify, it is the height associated with each group that should differ. In the above plot, the line of demarcation between group 1 and group 2 should be significantly higher, making the area taken up by group 1 significantly smaller.
Dealing with the relative density of the two groups is a bit ambiguous. Clearly, each group's density needs to have an integral of 1 for it to be a true density. The closest you can come is probably to have the integral of both curves sum to 1, which I think requires you to do the density calculation yourself then plot as a stacked geom_area:
library(tidyverse)
df %>%
nest(data = -id) %>%
summarize(id = factor(id),
weight = unlist(map(data, ~sum(.x$weight))),
dens = map(data, function(.x) {
x <- density(.x$var, weights = .x$weight/sum(.x$weight))
data.frame(x = x$x, y = x$y)
})) %>%
mutate(weight = weight / sum(weight)) %>%
unnest(dens) %>%
mutate(y = y * weight) %>%
ggplot(aes(x, y, fill = id)) +
geom_area(position = 'stack', color = 'black') +
labs(y = 'density', x = 'var')
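To see that this scaling gives each group an area equal to its share of the total weight, the same calculation can be sketched in base R and integrated numerically. The seed, the explicit bw.nrd0() bandwidth (passed so that density() does not have to pick a bandwidth while weights are supplied) and the name areas are choices made for this illustration only.

```r
set.seed(1)  # arbitrary seed for this sketch
var <- sort(rnorm(1000, mean = 5, sd = 2))
df <- data.frame(
  id = rep(1:2, each = 1000),
  var = c(var, var),
  weight = c(rep(.1, 500), rep(.2, 500), rep(.9, 500), rep(.8, 500))
)

total_weight <- sum(df$weight)

areas <- sapply(split(df, df$id), function(g) {
  # Weighted density for one group; weights are normalised to sum to 1
  d <- density(g$var, bw = bw.nrd0(g$var), weights = g$weight / sum(g$weight))
  # Scale the curve by the group's share of the total weight
  share <- sum(g$weight) / total_weight
  # Riemann-sum approximation of the area under the scaled curve
  step <- d$x[2] - d$x[1]
  sum(d$y * share) * step
})
```

Up to the error of the numerical integration, areas should come out close to 0.15 and 0.85, and their sum close to 1.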
I am not completely sure if I understand you correctly, but maybe you can calculate the value beforehand based on the weight and then stack it like this:
library(ggplot2)
library(dplyr)
# Stacked
df %>%
mutate(weighted_var = var*weight) %>%
ggplot(aes(x = weighted_var, fill = factor(id), group = id)) +
geom_density(position = 'stack')
And check the groups with fill like this:
# Fill
df %>%
mutate(weighted_var = var*weight) %>%
ggplot(aes(x = weighted_var, fill = factor(id), group = id)) +
geom_density(position = 'fill')
Created on 2022-11-01 with reprex v2.0.2

pie charts in R where slices represent the frequency of values in the columns of the data set

I want to make a pie chart for each column of my data frame, where the slices represent how often each value appears in that column. For instance, the following produces a data set with 3 columns and rounds the numbers to whole numbers.
test1<-rnorm(200,mean = 20, sd = 2)
test2<-rnorm(200,mean=20, sd =1)
test3<-rnorm(200,mean=20, sd =3)
testdata<-cbind(test1,test2,test3)
testdata <-round(testdata,0)
So I would need 3 pie charts, where the slices represent the number of times a given value appears in the respective column (with the name of the column on top of the pie chart, if possible).
So far, I have tried pie(frame(testdata$test1)), but it only creates a single pie chart, and my real data has 25 columns. On top of that, trying to pass a "main=" argument to name it results in an error.
Thank you in advance.
ggplot2 is the go-to library to make nice plots. To have 3 different pie-plots one needs to adjust the data a bit, which is done with some tidyverse-functions.
test1<-rnorm(200,mean = 20, sd = 2)
test2<-rnorm(200,mean=20, sd =1)
test3<-rnorm(200,mean=20, sd =3)
testdata<-cbind(test1,test2,test3)
testdata <-round(testdata,0)
library(ggplot2)
library(tidyverse)
plot_data <- testdata %>%
as_tibble() %>%
pivot_longer(names(.),names_to = "data1", values_to = "value") %>%
group_by(data1) %>%
count(value)
ggplot(plot_data, aes( x = "", y = n, fill = factor(value))) +
geom_col(width = 1, show.legend = TRUE) +
coord_polar("y", start = 0) +
facet_wrap(~data1)
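For comparison, a base R sketch of the same idea: tabulate each column, then loop over the columns and draw one pie per column with the column name passed as main. The seed and the names counts and old_par are not from the original post.

```r
set.seed(42)  # arbitrary seed for this sketch
testdata <- round(cbind(
  test1 = rnorm(200, mean = 20, sd = 2),
  test2 = rnorm(200, mean = 20, sd = 1),
  test3 = rnorm(200, mean = 20, sd = 3)
), 0)

# One frequency table per column
counts <- lapply(as.data.frame(testdata), table)

# One pie per column, side by side, titled with the column name
old_par <- par(mfrow = c(1, length(counts)))
for (name in names(counts)) {
  pie(counts[[name]], main = name)
}
par(old_par)  # restore the previous layout
```

With 25 columns, a layout such as par(mfrow = c(5, 5)) would be the corresponding choice.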

ggplot2 - Two color series in area chart

I've got a question regarding an edge case with ggplot2 in R.
They don't like you adding multiple legends, but I think this is a valid use case.
I've got a large economic dataset with the following variables.
year = year of observation
input_type = *labor* or *supply chain*
input_desc = specific type of labor (eg. plumbers OR building supplies respectively)
value = percentage of industry spending
And I'm building an area chart over approximately 15 years. There are 39 different input descriptions, so I'd like the user to see the two major components (internal employee spending OR outsourcing/supply spending) in two major color brackets (say green and blue), but ggplot won't let me group my colors in that way.
Here are a few things I tried.
Junk code to reproduce
spec_trend_pie<- data.frame("year"=c(2006,2006,2006,2006,2007,2007,2007,2007,2008,2008,2008,2008),
"input_type" = c("labor", "labor", "supply", "supply", "labor", "labor","supply","supply","labor","labor","supply","supply"),
"input_desc" = c("plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck", "plumber" ,"manager", "pipe", "truck"),
"value" = c(1,2,3,4,4,3,2,1,1,2,3,4))
spec_broad <- ggplot(data = spec_trend_pie, aes(y = value, x = year, group = input_type, fill = input_desc)) + geom_area()
Which gave me
Error in f(...) : Aesthetics can not vary with a ribbon
And then I tried this
sff4 <- ggplot() +
geom_area(data=subset(spec_trend_pie, input_type="labor"), aes(y=value, x=variable, group=input_type, fill= input_desc)) +
geom_area(data=subset(spec_trend_pie, input_type="supply_chain"), aes(y=value, x=variable, group=input_type, fill= input_desc))
Which gave me this image...so closer...but not quite there.
To give you an idea of what is desired, here's an example of something I was able to do in GoogleSheets a long time ago.
It's a bit of a hack but forcats might help you out. I did a similar post earlier this week:
How to factor sub group by category?
First some base data
library(tidyverse)  # tibble(), dplyr verbs, forcats and ggplot2
set.seed(123)
raw_data <-
tibble(
x = rep(1:20, each = 6),
rand = sample(1:120, 120) * (x/20),
group = rep(letters[1:6], times = 20),
cat = ifelse(group %in% letters[1:3], "group 1", "group 2")
) %>%
group_by(group) %>%
mutate(y = cumsum(rand)) %>%
ungroup()
Now, use factor levels to create gradients within colors
df <-
raw_data %>%
# create factors for group and category
mutate(
group = fct_reorder(group, y, max),
cat = fct_reorder(cat, y, max) # ordering in the stack
) %>%
arrange(cat, group) %>%
mutate(
group = fct_inorder(group), # takes the category into account first
group_fct = as.integer(group), # factor as integer
hue = as.integer(cat)*(360/n_distinct(cat)), # base hue values
light_base = 1-(group_fct)/(n_distinct(group)+2), # lightness fraction: falls as the group index rises, +2 keeps it above 0
light = floor(light_base * 100) # new L value for hcl()
) %>%
mutate(hex = hcl(h = hue, l = light))
Create a lookup table for scale_fill_manual()
area_colors <-
df %>%
distinct(group, hex)
Lastly, make your plot
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "stack") +
scale_fill_manual(
values = area_colors$hex,
labels = area_colors$group
)
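Applied to the sample data from the question, the same idea might look like the sketch below: one base hue per input_type and a different lightness for each input_desc within it. The hue values, the chroma and lightness steps, and the names types, base_hue, rank_in_type and fill_cols are illustrative assumptions, not part of the answer above.

```r
library(ggplot2)

# Sample data from the question
spec_trend_pie <- data.frame(
  year = rep(2006:2008, each = 4),
  input_type = rep(c("labor", "labor", "supply", "supply"), times = 3),
  input_desc = rep(c("plumber", "manager", "pipe", "truck"), times = 3),
  value = c(1, 2, 3, 4, 4, 3, 2, 1, 1, 2, 3, 4)
)

# One row per description, ordered so each input_type forms a contiguous block
types <- unique(spec_trend_pie[, c("input_type", "input_desc")])
types <- types[order(types$input_type), ]
spec_trend_pie$input_desc <- factor(spec_trend_pie$input_desc,
                                    levels = types$input_desc)

# Base hue per type (120 is green, 240 is blue in hcl());
# lightness increases with the position of the description within its type
base_hue <- c(labor = 120, supply = 240)
rank_in_type <- ave(seq_along(types$input_desc), types$input_type,
                    FUN = seq_along)
fill_cols <- hcl(h = unname(base_hue[types$input_type]),
                 c = 60, l = 30 + 25 * rank_in_type)
names(fill_cols) <- types$input_desc

p <- ggplot(spec_trend_pie, aes(year, value, fill = input_desc)) +
  geom_area(position = "stack") +
  scale_fill_manual(values = fill_cols)
```

Naming the colour vector by input_desc means the mapping does not depend on the order of the factor levels.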

several plots (scatter plot of a specific variable vs other variables) in R

How can I modify this code to show all plots together on one page (for example with a loop)? In my real data set, I want scatter plots of the dependent variable against each of 10 independent variables, one plot per IV.
plot(rock$area, rock$peri)
plot(rock$area, rock$shape)
plot(rock$area, rock$perm)
Since you tagged your question with ggplot2, I'll assume that you are interested in a ggplot2 solution.
rock <- data.frame(area = sample(1:100, 10, replace = TRUE),
peri = sample(1:100, 10, replace = TRUE),
shape = sample(1:100, 10, replace = TRUE),
perm = sample(1:100, 10, replace = TRUE))
Now we can make the data tidy (a column for y variable names, another column for y variable values) and use facets to create separate plots per y variable.
library(tidyr)
library(ggplot2)
rock %>%
gather(yvar, val, -area) %>%
ggplot(aes(area, val)) +
geom_point() +
facet_grid(yvar ~ .)
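If a base R loop is preferred, as the question hints, a minimal sketch could look like this. It uses the built-in rock data set explicitly (via datasets::rock) rather than the random sample data above; the names rock_data, yvars and old_par are only for illustration.

```r
# Built-in data set with columns area, peri, shape and perm
rock_data <- datasets::rock

# Every column except area is plotted against area
yvars <- setdiff(names(rock_data), "area")

# One row of panels, one panel per y variable
old_par <- par(mfrow = c(1, length(yvars)))
for (v in yvars) {
  plot(rock_data$area, rock_data[[v]],
       xlab = "area", ylab = v, main = paste(v, "vs area"))
}
par(old_par)  # restore the previous layout
```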

Multiplot of multiplots in ggplot2

I recently discovered the multiplot function from the Rmisc package to produce stacked plots using ggplot2 plots/objects. What I am trying to do now is to create a multiplot of multiplots. Unfortunately, unlike the ggplot function, multiplot does not produce objects, so my issue cannot be resolved by simply nesting multiplot.
I will create a dataframe to make my point clear. In my dataframe named df, I have 3 columns: period, group and value. A certain value is recorded for each of 3 groups over 10 periods. (Note: I don't use a seed number below despite the use of the sample function because the focus is not numerical, it is graphical)
# Create a data frame for illustration purposes
df <- data.frame(period = rep(1:10, 3),
group = rep(LETTERS[1:3], each = 10),
value = sample(100, 30, replace = TRUE))
I then add a fourth column to df, which is the exponential transformation of the value column.
df$exp.value = exp(df$value)
I would like to create stacked plots allowing me to compare the values in each group to their exponential counterparts.
# Split dataframe by group
df_split <- split(df, df$group)
# Plots of values in each group
plots <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = value)) + geom_line()
})
# Plots of exponentiated values in each group
plots_exp <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = exp.value)) + geom_line()
})
plots and plots_exp are both lists of 3 elements each containing ggplot objects. The first element of each list corresponds to group A, the second element corresponds to group B and the third element corresponds to group C.
In order to compare each group's values to the exponential values, I can use the multiplot function. Following is an example with group A:
multiplot(plots[[1]], plots_exp[[1]], cols = 1)
How can I create a grid which will include the multiplot above as well as the ones for groups B and C? As if the code included ... + facet_grid(. ~ group)?
We can use cowplot package:
library(cowplot)
plot_grid(plots[[1]], plots_exp[[1]],
plots[[2]], plots_exp[[2]],
plots[[3]], plots_exp[[3]],
labels = c("A", "A", "B", "B", "C", "C"),
ncol = 1, align = "v")
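Because plot_grid returns a plot object, grids can also be nested, which is close to a "multiplot of multiplots". The following is a sketch under the assumption that one column per group is wanted; the data frame is rebuilt here so the block is self-contained, and the names columns and combined are made up.

```r
library(ggplot2)
library(cowplot)

# Same structure as the data frame in the question
df <- data.frame(period = rep(1:10, 3),
                 group = rep(LETTERS[1:3], each = 10),
                 value = sample(100, 30, replace = TRUE))
df$exp.value <- exp(df$value)

# Inner grids: for each group, value on top and exp.value below
columns <- lapply(split(df, df$group), function(d) {
  plot_grid(
    ggplot(d, aes(period, value)) + geom_line(),
    ggplot(d, aes(period, exp.value)) + geom_line(),
    ncol = 1, align = "v"
  )
})

# Outer grid: the three inner grids side by side, labelled by group
combined <- plot_grid(plotlist = columns, nrow = 1, labels = names(columns))
```

Printing combined draws the three stacked pairs next to each other.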
We can output to a pdf looping through plots and plots_exp list objects. Every page will contain 2 plots. This is a better option when we have a lot of groups:
pdf("myPlots.pdf")
# print() each grid explicitly so a page is drawn even when the script is sourced
for (i in seq_along(plots)) {
  print(plot_grid(plots[[i]], plots_exp[[i]], ncol = 1, align = "v"))
}
dev.off()
Another option is to prepare the data for ggplot and use facet as usual:
library(dplyr)
library(tidyr)
library(ggplot2)
gather(df, valueType, value, -c(group, period)) %>%
mutate(myGroup = paste(group, valueType)) %>%
ggplot(aes(period, value)) +
geom_line() +
facet_grid(myGroup ~ ., scales = "free_y")
