Generate boxplots for multiple variables in ggplot2 without factoring [duplicate]

This question already has answers here:
Building a box plot from all columns of data frame with column names on x in ggplot2 [duplicate]
(1 answer)
Multiple boxplots using ggplot
(1 answer)
Closed 5 years ago.
EDIT: Added the boxplot generated with standard boxplot() function.
Given the iris dataset, the following code:
boxplot(iris[,])
Creates a boxplot with five boxes, one for each variable, without splitting them into categories such as, for instance, species. While this is simple enough, I have been unable to do the same in ggplot2.
My question, then, is simple: how can I achieve this?

Species is a factor with three levels (setosa, versicolor and virginica). I think it doesn't make sense if you plot it with the other variables.
It makes more sense to plot the other four variables (Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width) in one plot, as below:
library(dplyr)
library(tidyr)
library(ggplot2)
iris %>% dplyr::select(Species, everything()) %>% tidyr::gather("id", "value",2:5) %>%
ggplot(., aes(x = id, y = value))+geom_boxplot()
If you want to plot all 5 variables in the same plot, you need to convert Species to numeric first:
iris %>% dplyr::mutate(Species = as.numeric(Species)) %>% tidyr::gather("id", "value",1:5) %>%
ggplot(., aes(x = id, y = value))+geom_boxplot()
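gather() still works, but tidyr now documents it as superseded by pivot_longer(). A minimal sketch of the same four-variable plot using pivot_longer(), assuming tidyr 1.0.0 or later; the names iris_long, variable and value are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Reshape the four numeric columns to long format: one row per measurement.
# Species is kept as a column but is not used in the plot.
iris_long <- iris %>%
  pivot_longer(cols = -Species, names_to = "variable", values_to = "value")

# One box per original column, with the column name on the x axis.
ggplot(iris_long, aes(x = variable, y = value)) +
  geom_boxplot()
```

Selecting with cols = -Species avoids depending on column positions such as 2:5, so the code does not break if the column order changes.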

Related

Excluding levels/groups within categorical variable (ggplot graph)

I am relatively new to ggplot, and I am interested in visualizing a categorical variable with 11 groups/levels. I ran the code below to produce a bar graph showing the frequency of each group. However, given that some groups within the categorical variable "active" only occur once or zero times, they clutter the graph. Therefore, is it possible to directly exclude groups in ggplot within the categorical variable with < 2 observations?
I am also open to recommendations on how to visualize a categorical variable with multiple groups/levels if a bar graph isn't suitable here.
Data type
sapply(df,class)
username active
"character" "character"
ggplot(data = df, aes(x = active)) +
geom_bar()

You can count() the categories first, and then filter(), before feeding to ggplot. In this way, you would use geom_col() instead:
df %>% count(active) %>% filter(n >= 2) %>%
ggplot(aes(x=active,y=n)) +
geom_col()
Alternatively, you could group_by() / filter() directly within your ggplot() call, like this:
ggplot(df %>% group_by(active) %>% filter(n() >= 2), aes(x=active)) +
geom_bar()
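Since the question's df is not shown, here is a small self-contained sketch of the count-then-filter approach. The data frame contents and the min_n threshold are made up for illustration; only the column names username and active come from the question:

```r
library(dplyr)
library(ggplot2)

# Hypothetical data standing in for the asker's df (values are invented).
df <- data.frame(
  username = paste0("user", 1:8),
  active   = c("daily", "daily", "daily", "weekly",
               "weekly", "monthly", "never", "daily"),
  stringsAsFactors = FALSE
)

# Count each level, then keep only levels with at least min_n observations.
min_n <- 2
counts <- df %>%
  count(active) %>%
  filter(n >= min_n)

# Order the bars by frequency so the most common level comes first.
ggplot(counts, aes(x = reorder(active, -n), y = n)) +
  geom_col() +
  labs(x = "active", y = "count")
```

Ordering the bars by count is one possible answer to the "other ways to visualize" part of the question: with many levels, a sorted bar chart is usually easier to read than an alphabetical one.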

How do I create multiple boxplots in the same figure using pivot long data?

I created a data frame called "Pivot_long" by using the pivot_longer function to combine 3 variables from another dataset ("Leaves") into one column. Now I need to create a figure with multiple boxplots to display this "Pivot_long" data. How would I go about doing that?
Formula for the new data frame:
Pivot_long<- pivot_longer(data = Leaves, names_to = "Type", values_to = "Values", cols = -X)

Do you mean how to visualize 4 plots next to each other?
The setting par() allows for combining multiple plots.
par(mfrow=c(2,2))
Run this before your plot() function, and this will create a 2x2 matrix with 4 separate plots.

Here is an example with the iris data set:
library(tidyverse)
iris_long <- iris %>%
select(-4, -5) %>%
pivot_longer(
cols = everything()
)
ggplot(iris_long, aes(x = name, y=value)) +
geom_boxplot()

Making a ggplot boxplot where each column is its own boxplot

When using the simple R boxplot function, I can place my data frame directly in the parentheses and a perfect boxplot emerges, e.g.:
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)
[Image: the boxplot produced by the simple R boxplot function]
However, I need to make this boxplot slightly more aesthetic and so I need to use ggplot. When I place the dataframe itself in, the boxplot cannot form as I need to specify x, y and fill coordinates, which I don't have. My y coordinates are the values for each vector in the dataframe and my x coordinates are just the name of the vector. How can I do this using ggplot? Is there a way to reform my dataframe so I can split it into coordinates, or is there a way ggplot can read my data?

geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...
library(tidyverse)
naive_capqx %>%
pivot_longer(everything(), values_to="Value", names_to="Variable") %>%
ggplot() +
geom_boxplot(aes(x=Variable, y=Value))

Turn the df into a long format df. Below, I use gather() to lengthen the df; I use group_by() to ensure boxplot calculation by key (formerly column name).
pacman::p_load(ggplot2, tidyverse)
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
gather("key", "value") %>%
group_by(key)
ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
geom_boxplot()

Multiple line subplots in R

I am new to R and am struggling to understand how to create a matrix line plot (or plot with line subplots) given a data set with let's say one x and 5 y-columns such that:
-the first subplot is a plot of variables 1 and 2 (function of x)
-the second subplot variables 1 and 3 and so on
The idea is to use one of the variables (in this example number 1) as a reference and pair it with the rest so that they can be easily compared.
Thank you very much for your help.

Here's an example of one way to do that using tidyr and ggplot. tidyr::gather can pull the non-mpg columns into long format, each matched with its respective mpg. Then the data is mapped in ggplot so that x is mpg and y is the other value, and the name of the column it came from is mapped to facets.
library(tidyverse)
mtcars %>%
rownames_to_column() %>%  # mtcars keeps car names as row names, not a column
select(rowname, mpg, cyl, disp, hp) %>%
gather(stat, value, cyl:hp) %>%
ggplot(aes(mpg, value)) +
geom_point() +
facet_grid(stat~., scales = "free")
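To match the question more literally (a reference series drawn as a line in every subplot, paired with one other series per panel), here is a minimal sketch. The data frame dat and its columns y1 to y5 are hypothetical; y1 plays the role of "variable 1". It relies on the ggplot2 behaviour that a layer whose data lacks the faceting variable is repeated in every facet:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data: one x column and five y columns (y1 is the reference).
x <- seq(0, 10, length.out = 50)
dat <- data.frame(x = x, y1 = sin(x), y2 = cos(x), y3 = sin(2 * x),
                  y4 = 0.1 * x, y5 = exp(-x / 5))

# Long format for the non-reference columns; "panel" names the facet.
others <- dat %>%
  select(-y1) %>%
  pivot_longer(cols = -x, names_to = "panel", values_to = "value")

# The reference series has no "panel" column, so ggplot2 draws it in
# every facet.
reference <- dat %>%
  select(x, value = y1)

ggplot(others, aes(x = x, y = value)) +
  geom_line(data = reference, colour = "grey50", linetype = "dashed") +
  geom_line() +
  facet_wrap(~ panel, ncol = 1)
```

Each panel then shows the dashed reference line (y1) together with one of y2 to y5, which makes the pairwise comparison direct.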

How to use ggplot to create facets with two factors?

I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, with the rows as one factor (levels x and y, plotted with different colors) and the columns as another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.

ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")
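For the 2x2 layout the question asks for, the series name can be split into its letter and its digit and the two parts mapped to the rows and columns of facet_grid(). A sketch under that assumption; the column names variable and index are illustrative, and the regular expression assumes every series name is exactly one lowercase letter followed by one digit:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

t <- 1:10
df <- data.frame(t = t, x1 = t^2, x2 = sqrt(t), y1 = sin(t), y2 = cos(t))

# Split names such as "x1" into variable = "x" and index = "1" while
# reshaping to long format.
df_grid <- df %>%
  pivot_longer(cols = -t,
               names_to = c("variable", "index"),
               names_pattern = "([a-z])([0-9])",
               values_to = "value")

# Rows: variable (x / y, by colour). Columns: index (1 / 2, by linetype).
ggplot(df_grid, aes(t, value, colour = variable, linetype = index)) +
  geom_line() +
  facet_grid(variable ~ index, scales = "free_y")
```

With scales = "free_y", facet_grid() shares the y axis within each row, so x1 and x2 use one scale and y1 and y2 another.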
