I am, in R and using ggplot2, plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (using long data suitable for plotting) is this:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_y" allows the Y axes to vary across rows, while
passing scales = "free_x" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest way would be to create one plot per facet column and bind them with something like {patchwork}. To keep the facet look, you can still add a faceting layer to each plot.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
  group_split(groups1) %>%
  map(
    ~ ggplot(.x, aes(x = x, y = values)) +
      geom_line() +
      facet_grid(groups2 ~ groups1)
  ) %>%
  wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
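If a transposed layout is acceptable, a single facet_grid() call may also be enough. The sketch below assumes the same simulated data as above and puts the two variables on rows and the seven groups on columns, so that scales = "free_y" gives each variable its own Y axis while every panel in a row shares it. The object name p_transposed is only illustrative.

```r
library(ggplot2)

# Same simulated long data as above
set.seed(42)
data <- data.frame(
  x       = rep(1:100, times = 14),
  groups1 = rep(1:2, each = 7 * 100),
  groups2 = rep(rep(1:7, times = 2), each = 100),
  values  = c(rnorm(n = 700), rgamma(n = 700, shape = 2))
)

# Variables on rows, groups on columns: in facet_grid(), "free_y" frees the
# Y axis between rows but keeps it shared within each row
p_transposed <- ggplot(data, aes(x = x, y = values)) +
  geom_line() +
  facet_grid(groups1 ~ groups2, scales = "free_y")

p_transposed
```

The trade-off is that the groups now run left to right instead of top to bottom, which may or may not suit the comparison you have in mind.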
I have 400 columns with dynamic names (t_namelist1_namelist2). There are 20 names in each namelist1 and namelist2. I have to create a histogram with facets 20 by 20, with labels -
along row - namelist1
along col - namelist2
Can someone please show how to transform the data with pivot_longer(), splitting the column names on the _ that separates the three parts (t, namelist1 and namelist2)?
In the sample problem, I have a tibble with 4 columns, and I want to create 4 individual histograms in 2 by 2 facets with labels -
along row - a and b
along col - x and y
Thanks
library(tidyverse)
t_a_x <- rnorm(100)
t_b_x <- rnorm(100)
t_a_y <- rnorm(100)
t_b_y <- rnorm(100)
tbl <- tibble(t_a_x, t_a_y, t_b_x, t_b_y)
# create a histogram in 2 by 2 facets with labels -
# along row - a and b
# along col - x and y
The first chunk of code below rearranges the example data into a three-column data frame, with one column each for "ab", "xy", and the value ("t"); it does this by splitting the original column names on "_". You can then plot the values and facet on "ab" and "xy".
# Rearrange table by separating the column names by "_" using pivot_longer()
tbl_formatted <- tbl %>%
  pivot_longer(
    everything(),
    names_to = c(".value", "ab", "xy"),
    names_sep = "_"
  )
# Plot
tbl_formatted %>% ggplot(aes(x = t)) +
  geom_histogram() +
  facet_grid(ab ~ xy)
This is a very basic version of the plot; you can customize it with colors and more.
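To see what the reshaping step produces, the following sketch rebuilds the example tibble and inspects the result. It assumes every column name has exactly three parts separated by "_"; the name tbl_long is only illustrative.

```r
library(tidyr)
library(tibble)

set.seed(1)
tbl <- tibble(
  t_a_x = rnorm(100), t_a_y = rnorm(100),
  t_b_x = rnorm(100), t_b_y = rnorm(100)
)

# ".value" keeps the first name part ("t") as the value column;
# the second and third parts become the columns "ab" and "xy"
tbl_long <- pivot_longer(
  tbl,
  cols      = everything(),
  names_to  = c(".value", "ab", "xy"),
  names_sep = "_"
)

names(tbl_long)                   # "ab" "xy" "t"
nrow(tbl_long)                    # 400: 100 values per original column
unique(tbl_long[c("ab", "xy")])   # the four facet combinations
```

The same call should carry over to the 400-column case, provided none of the 20 names in either list contains an underscore itself.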
To get it, you need to properly rearrange your dataframe:
library(tidyverse)
tbl <- tibble(value = c(t_a_x, t_b_x, t_a_y, t_b_y),
lab1 = rep(c("a", "b", "a", "b"), each = 100),
lab2 = rep(c("x", "x", "y", "y"), each = 100))
ggplot(tbl) +
geom_histogram(aes(value), binwidth = 0.5) +
facet_grid(lab1~lab2)
I am trying to plot multiple box plots in a single graph, using data on which I have run a Wilcoxon test. It should look like this
I have four/five questions and I want to plot the respondent score for two sets as a box plot. This should be done for all questions (Two groups for each question).
I am thinking of using ggplot2. My data is like
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I also want to know how to align these stacked box plots based on my need. Like one row or two rows.
This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017)
df <- cbind.data.frame(
  value = rnorm(1000),
  Label = sample(c("Good", "Bad"), 1000, replace = TRUE),
  variable = sample(paste0("F", 5:11), 1000, replace = TRUE))
# ggplot
library(tidyverse)
df %>%
  mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
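As a sketch of the layout arguments, the two plots below use the same kind of random sample data and differ only in nrow. The names p_one_row and p_two_rows are illustrative, and Label is put on the x axis here purely to keep the panels compact.

```r
library(ggplot2)

set.seed(2017)
df <- data.frame(
  value    = rnorm(1000),
  Label    = sample(c("Good", "Bad"), 1000, replace = TRUE),
  variable = sample(paste0("F", 5:11), 1000, replace = TRUE)
)

base_plot <- ggplot(df, aes(Label, value, fill = Label)) +
  geom_boxplot()

# All seven panels side by side in a single row
p_one_row <- base_plot + facet_wrap(~ variable, nrow = 1)

# The same panels wrapped onto two rows
p_two_rows <- base_plot + facet_wrap(~ variable, nrow = 2)

p_one_row
p_two_rows
```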
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4));
df %>%
  gather(key, value, 1:4) %>%
  mutate(
    variable = ifelse(grepl("q1", key), "F1", "F2"),
    Label = ifelse(grepl("o$", key), "Bad", "Good")) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
Update 2
One way of visualising discrete data would be in a mosaicplot.
# df2 is assumed to be the long-format data with columns Variable, Label and value
mosaicplot(table(df2))
The plot shows the count of value (as filled rectangles) per Variable per Label. See ?mosaicplot for details.
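The object df2 is not defined in the snippet above. One way it might be built, assuming it is simply the long-format version of df with the columns Variable, Label and value described here, is sketched below; the column names and the recoding are assumptions carried over from Update 1.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
)

# Assumed construction of df2: one row per answer, labelled by question and set
df2 <- df %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  mutate(
    Variable = ifelse(grepl("q1", key), "F1", "F2"),
    Label    = ifelse(grepl("o$", key), "Bad", "Good")
  ) %>%
  select(Variable, Label, value)

# Three-way contingency table of counts, drawn as a mosaic plot
counts <- table(as.data.frame(df2))
mosaicplot(counts)
```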
I have a simple dataframe containing three columns:
ST_CODE | VALUE | HEIGHT
... ... ...
factor continuous continuous
I want a VALUE boxplot for each ST_CODE, but I want the order on the x axis to be determined by the ascending order of HEIGHT.
This is the code:
ggplot(ozone, aes(x = ST_CODE, y = VALUE)) +
geom_boxplot(notch=TRUE)
Ordering ozone inside the ggplot function by doing ozone[order(ozone$HEIGHT),] was useless, because the order is determined by ST_CODE. What should I do?
Here's the dataset: https://www.dropbox.com/s/kf0jcv50oaa5my9/ozone_example.csv?dl=0
I have found this question, but I didn't really get it: Rearrange x axis according to a variable in ggplot
The solution should be to order the levels of the factor variable ST_CODE according to the HEIGHT column.
Until you provide example data this is my best guess :-)
Edit 1: I have added read.csv to read your data, and I would say it works. (To make the result easier to check, I have used only the first 1000 rows, which contain only three different ST_CODEs.)
library(ggplot2)
# example data
# data <- data.frame( ST_CODE = rep(c("A", "B", "C"), 2), VALUE = rep(3:1, 2), HEIGHT = rep(c(2, 1, 3), 2))
# data
# Your data
data <- read.csv("ozone_example.csv")
data <- data[1:1000,]
table(data$ST_CODE, data$HEIGHT) # indicates how to order ST_CODEs
# plot (not sorted by HEIGHT)
ggplot(data, aes(x = ST_CODE, y = VALUE)) +
geom_boxplot(notch=TRUE)
# Plot sorted by HEIGHT by changing the factor level order
ordered.data <- data[order(data$HEIGHT),]
data$ST_CODE <- factor(data$ST_CODE, levels = unique(ordered.data$ST_CODE))
ggplot(data, aes(x = ST_CODE, y = VALUE)) +
geom_boxplot(notch=TRUE)
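A more compact alternative may be to reorder the factor on the fly with reorder(). The sketch below uses a small made-up data frame (toy) in place of the ozone file and assumes HEIGHT is constant within each ST_CODE.

```r
library(ggplot2)

# Hypothetical toy data standing in for ozone_example.csv
set.seed(1)
toy <- data.frame(
  ST_CODE = rep(c("A", "B", "C"), each = 20),
  HEIGHT  = rep(c(2, 1, 3), each = 20),
  VALUE   = c(rnorm(20, 30), rnorm(20, 40), rnorm(20, 50))
)

# reorder() sorts the levels of ST_CODE by the mean HEIGHT of each level
levels(reorder(toy$ST_CODE, toy$HEIGHT))   # "B" "A" "C"

ggplot(toy, aes(x = reorder(ST_CODE, HEIGHT), y = VALUE)) +
  geom_boxplot() +
  labs(x = "ST_CODE (ordered by HEIGHT)")
```

Because the reordering happens inside aes(), the original data frame is left unchanged.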
df1 <- data.frame(a=c(1,4,7),
b=c(3, 5, 6),
c=c(1, 1, 4),
d=c(2 ,6 ,3))
df2<-data.frame(id=c("a","f","f","b","b","c","c","c","d","d"),
var=c(12,20,15,18,10,30,5,8,5,5))
mediorder <- with(df2, reorder(id, -var, median))
boxplot(var~mediorder, data = df2)
fc = levels(as.factor(mediorder))
ndf1= df1[,intersect(fc, colnames(df1))]
ln<-lm( #confused here
boxplot(ndf1)
abline(ln)
I have the above boxplot (ndf1) with an x-axis ordered according to medians from another data frame, and I would like to add a trendline to it.
I am confused since it doesn't have an x and y variable to refer to, just columns with counts. Also the ordering is causing me problems.
EDITED for clarification...
I am building on the question here: How to match an ordered list (e.g., levels(as.factor(x)) ) to another dataframe in which only some columns match?
All I would like to do is fit a trend line to ndf1
Something like this should do. It's fairly easy using ggplot2. However, your data and question are a bit confusing: for example, a has only one data point and d has two identical ones. Is this what you want?
df2$id <- factor(df2$id , levels = levels(mediorder))
library(ggplot2)
ggplot(data = df2, aes(x = id, y = var)) + geom_boxplot() +
  geom_smooth(method = "lm", aes(group = 1), se = FALSE)
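For a base-graphics version closer to the attempt in the question, one possibility is to regress var on the integer position of each box. This is only a sketch: it fits the line to df2 (not ndf1), and it treats the ordered factor levels as equally spaced positions 1, 2, ..., which is an assumption. The name fit is illustrative.

```r
df2 <- data.frame(
  id  = c("a", "f", "f", "b", "b", "c", "c", "c", "d", "d"),
  var = c(12, 20, 15, 18, 10, 30, 5, 8, 5, 5)
)

# Order the ids by decreasing median, as in the question
mediorder <- with(df2, reorder(id, -var, median))
df2$id <- factor(df2$id, levels = levels(mediorder))

# boxplot() draws the boxes at x = 1, 2, ..., so regress on the level index
fit <- lm(var ~ as.numeric(id), data = df2)

boxplot(var ~ id, data = df2)
abline(fit)   # add the fitted straight line on top of the boxes
```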