EDIT: Added the boxplot generated with standard boxplot() function.
Given the iris dataset, the following code:
boxplot(iris[,])
Creates a boxplot with five boxes, one for each variable, without splitting them into categories such as, for instance, species. While this is simple enough, I have been unable to do the same in ggplot2.
My question, then, is simple: how can I achieve this?
Species is a factor with three levels (setosa, versicolor and virginica), so I don't think it makes sense to plot it alongside the other variables.
It makes more sense to plot the other 4 variables (Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width) in one plot, as below:
library(dplyr)
library(tidyr)
library(ggplot2)
iris %>% dplyr::select(Species, everything()) %>% tidyr::gather("id", "value",2:5) %>%
ggplot(., aes(x = id, y = value))+geom_boxplot()
If you want to plot all 5 variables in the same plot, you need to convert Species to numeric first:
iris %>% dplyr::mutate(Species = as.numeric(Species)) %>% tidyr::gather("id", "value",1:5) %>%
ggplot(., aes(x = id, y = value))+geom_boxplot()
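As a side note, gather() is superseded in current tidyr releases, so the same reshaping could also be written with pivot_longer(). The following is only a minimal sketch of the four-variable plot above; the object name iris_long is an assumption made for illustration.

```r
library(tidyr)
library(ggplot2)

# Reshape the four numeric columns to long format.
# Species is excluded from `cols`, so it stays as its own column.
iris_long <- pivot_longer(
  iris,
  cols = -Species,
  names_to = "id",
  values_to = "value"
)

# One box per original column, with the column names on the x axis
ggplot(iris_long, aes(x = id, y = value)) +
  geom_boxplot()
```

Because Species is kept in iris_long, it remains available for a later fill or facet if a per-species split is wanted.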
Related
I am relatively new to ggplot, and I am interested in visualizing a categorical variable with 11 groups/levels. I ran the code below to produce a bar graph showing the frequency of each group. However, given that some groups within the categorical variable "active" only occur once or zero times, they clutter the graph. Therefore, is it possible to directly exclude groups in ggplot within the categorical variable with < 2 observations?
I am also open to recommendations on how to visualize a categorical variable with multiple groups/levels if a bar graph isn't suitable here.
Data type
sapply(df,class)
username active
"character" "character"
ggplot(data = df, aes(x = active)) +
geom_bar()
You can count() the categories first, and then filter(), before feeding the result to ggplot. In this case you would use geom_col() instead of geom_bar(). To exclude groups with fewer than 2 observations, keep those with n >= 2:
df %>% count(active) %>% filter(n>=2) %>%
ggplot(aes(x=active,y=n)) +
geom_col()
Alternatively, you could group_by() / filter() directly within your ggplot() call, like this:
ggplot(df %>% group_by(active) %>% filter(n()>=2), aes(x=active)) +
geom_bar()
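To make the filtering step easy to check, here is a small self-contained sketch on made-up data. The data frame contents, the column name n_obs and the object df_kept are assumptions for illustration only; the cutoff of 2 follows the question's wording (drop groups with fewer than 2 observations).

```r
library(dplyr)
library(ggplot2)

# Hypothetical data standing in for the asker's `df`
df <- data.frame(
  username = paste0("user", 1:8),
  active   = c("a", "a", "a", "b", "b", "c", "d", "d"),
  stringsAsFactors = FALSE
)

df_kept <- df %>%
  add_count(active, name = "n_obs") %>%  # attach each group's size to its rows
  filter(n_obs >= 2)                     # drop groups with fewer than 2 rows

# geom_bar() counts the remaining rows per group
ggplot(df_kept, aes(x = active)) +
  geom_bar()
```

In this toy data only group "c" has a single observation, so it is the only bar that disappears.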
I created a data frame called "Pivot_long" by using the pivot_longer function to combine 3 variables from another dataset ("Leaves") into one column. Now I need to create a figure with multiple boxplots to display this "Pivot_long" data. How would I go about doing that?
Formula for the new data frame:
Pivot_long<- pivot_longer(data = Leaves, names_to = "Type", values_to = "Values", cols = -X)
Do you mean how to visualize 4 plots next to each other?
The par() setting allows you to combine multiple base-graphics plots.
par(mfrow=c(2,2))
Run this before your plot() function, and it will create a 2x2 matrix with 4 separate plots.
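A minimal sketch of that approach, using the built-in iris data rather than the asker's data (an assumption for illustration). Note that par() only affects base graphics, so this does not apply to plots built with ggplot2.

```r
# Switch to a 2x2 layout and remember the previous settings
op <- par(mfrow = c(2, 2))

# Draw one base-R boxplot per numeric column of iris
for (col in names(iris)[1:4]) {
  boxplot(iris[[col]], main = col)
}

# Restore the previous layout so later plots are not affected
par(op)
```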
Here is an example with the iris data set:
library(tidyverse)
iris_long <- iris %>%
select(-4, -5) %>%
pivot_longer(
cols = everything()
)
ggplot(iris_long, aes(x = name, y=value)) +
geom_boxplot()
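Applied to the data frame from the question, the same idea might look like the sketch below. The contents of Leaves are not shown in the question, so the columns Length, Width and Mass and their values are invented here purely to make the example runnable; only X, Type and Values come from the asker's pivot_longer() call.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for `Leaves`: an id column X plus three measurements
Leaves <- data.frame(
  X      = 1:6,
  Length = c(5.1, 4.9, 6.2, 5.8, 5.5, 6.0),
  Width  = c(2.0, 1.8, 2.4, 2.2, 2.1, 2.3),
  Mass   = c(0.8, 0.7, 1.1, 1.0, 0.9, 1.0)
)

# Same call as in the question
Pivot_long <- pivot_longer(data = Leaves, names_to = "Type", values_to = "Values", cols = -X)

# One box per level of Type
ggplot(Pivot_long, aes(x = Type, y = Values)) +
  geom_boxplot()
```

If the three variables are on very different scales, adding facet_wrap(~Type, scales = "free_y") gives each box its own y axis.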
When using the simple R boxplot function, I can easily place my dataframe directly into the parentheses and a perfect boxplot emerges, e.g.:
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)
[Image: the boxplot made with the simple R boxplot function]
However, I need to make this boxplot slightly more aesthetic and so I need to use ggplot. When I place the dataframe itself in, the boxplot cannot form as I need to specify x, y and fill coordinates, which I don't have. My y coordinates are the values for each vector in the dataframe and my x coordinates are just the name of the vector. How can I do this using ggplot? Is there a way to reform my dataframe so I can split it into coordinates, or is there a way ggplot can read my data?
geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...
library(tidyverse)
naive_capqx %>%
pivot_longer(everything(), values_to="Value", names_to="Variable") %>%
ggplot() +
geom_boxplot(aes(x=Variable, y=Value))
This gives one box per original column.
Turn the df into a long format df. Below, I use gather() to lengthen the df; I use group_by() to ensure boxplot calculation by key (formerly column name).
pacman::p_load(ggplot2, tidyverse)
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
gather("key", "value") %>%
group_by(key)
ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
geom_boxplot()
I am new to R and am struggling to understand how to create a matrix line plot (or plot with line subplots) given a data set with let's say one x and 5 y-columns such that:
-the first subplot is a plot of variables 1 and 2 (function of x)
-the second subplot variables 1 and 3 and so on
The idea is to use one of the variables (in this example number 1) as a reference and pair it with the rest so that they can be easily compared.
Thank you very much for your help.
Here's an example of one way to do that using tidyr and ggplot. tidyr::gather can pull the non-mpg columns into long format, each matched with its respective mpg. Then the data is mapped in ggplot so that x is mpg and y is the other value, and the name of the column it came from is mapped to facets.
library(tidyverse)
mtcars %>%
rownames_to_column() %>%
select(rowname, mpg, cyl, disp, hp) %>%
gather(stat, value, cyl:hp) %>%
ggplot(aes(mpg, value)) +
geom_point() +
facet_grid(stat~., scales = "free")
I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, where the rows are one factor (levels x and y, plotted in different colors) and the columns are another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.
ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")
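To get the rows and columns of the grid from two separate factors, as the question asks, the series name can be split into its letter and its digit while reshaping. This is only a sketch under the assumption that every series name is a letter followed by a number; the column names variable and index and the object df_long are made up for illustration.

```r
library(tidyr)
library(ggplot2)

# Same data as in the question
df <- data.frame(t = 1:10)
df$x1 <- df$t^2
df$x2 <- sqrt(df$t)
df$y1 <- sin(df$t)
df$y2 <- cos(df$t)

# Split names such as "x1" into variable = "x" and index = "1" while pivoting
df_long <- pivot_longer(
  df,
  cols = -t,
  names_to = c("variable", "index"),
  names_pattern = "([a-z]+)([0-9]+)",
  values_to = "value"
)

# Rows: x / y (colour); columns: 1 / 2 (linetype)
ggplot(df_long, aes(x = t, y = value, colour = variable, linetype = index)) +
  geom_line() +
  facet_grid(variable ~ index, scales = "free_y")
```

With facet_grid(), scales = "free_y" frees the y axis per row only, so x1 and x2 still share one scale; facet_wrap(variable ~ index, scales = "free_y") is an alternative if each panel needs its own axis.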