When using the base R boxplot function, I can pass my dataframe directly inside the parentheses and a perfect boxplot emerges, e.g.:
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)
[Image: the boxplot made with the base R boxplot function]
However, I need to make this boxplot slightly more aesthetic and so I need to use ggplot. When I place the dataframe itself in, the boxplot cannot form as I need to specify x, y and fill coordinates, which I don't have. My y coordinates are the values for each vector in the dataframe and my x coordinates are just the name of the vector. How can I do this using ggplot? Is there a way to reform my dataframe so I can split it into coordinates, or is there a way ggplot can read my data?
geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...
library(tidyverse)
naive_capqx %>%
pivot_longer(everything(), values_to="Value", names_to="Variable") %>%
ggplot() +
geom_boxplot(aes(x=Variable, y=Value))
giving one box per original column (baseline, post_cap, qx314).
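Since the question also mentions a fill, here is a minimal sketch of the same reshape with the fill mapped to the former column names. The names Variable and Value are just the ones chosen in the answer above, and hiding the legend is an assumption about what looks better, not a requirement.

```r
library(tidyr)
library(ggplot2)

# Same data as in the question
naive_capqx <- data.frame(
  baseline = c(0, 0, 0, 0, 1),
  post_cap = c(1, 5, 5, 6, 11),
  qx314    = c(0, 0, 0, 3, 7)
)

# One row per (column name, value) pair
long <- pivot_longer(naive_capqx, everything(),
                     names_to = "Variable", values_to = "Value")

# x and fill both come from the former column names, y from the values
p <- ggplot(long, aes(x = Variable, y = Value, fill = Variable)) +
  geom_boxplot() +
  theme(legend.position = "none")  # legend would only repeat the x-axis labels
p
```

The long data frame has 15 rows (3 columns times 5 values), which is all ggplot needs to draw the three boxes.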
Turn the df into a long format df. Below, I use gather() to lengthen the df; I use group_by() to ensure boxplot calculation by key (formerly column name).
pacman::p_load(ggplot2, tidyverse)
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
gather("key", "value") %>%
group_by(key)
ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
geom_boxplot()
What I'm currently stuck on is trying to plot each column of my dataframe as its own histogram in ggplot. I attached a screenshot below:
Ideally I would be able to compare the values in every 'Esteem' column side-by-side by plotting multiple histograms.
I tried using the melt() function to reshape my dataframe, and then feed into ggplot() but somewhere along the way I'm going wrong...
You could pivot to long, then facet by column:
library(tidyr)
library(ggplot2)
esteem81_long <- esteem81 %>%
pivot_longer(
Esteem81_1:Esteem81_10,
names_to = "Column",
values_to = "Value"
)
ggplot(esteem81_long, aes(Value)) +
geom_bar() +
facet_wrap(vars(Column))
Or for a list of separate plots, just loop over the column names:
plots <- list()
for (col in names(esteem81)[-1]) {
plots[[col]] <- ggplot(esteem81) +
geom_bar(aes(.data[[col]]))
}
plots[["Esteem81_4"]]
Example data:
set.seed(13)
esteem81 <- data.frame(Subject = c(2,6,7,8,9))
for (i in 1:10) {
esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}
esteem_long <- esteem81 %>% pivot_longer(cols = -c(Subject))
plot <- ggplot(esteem_long, aes(x = value)) +
geom_histogram(binwidth = 1) +
facet_wrap(vars(name))
plot
I'm using pivot_longer() from tidyr and ggplot2 for the plotting.
The line pivot_longer(cols = -c(Subject)) reads as "apart from the "Subject" column, all the others should be pivoted into long form data." I've left the default new column names ("name" and "value") - if you rename them then be sure to change the downstream code.
geom_histogram automates the binning and tallying of the data into histogram format - change the binwidth parameter to suit your desired outcome.
facet_wrap() allows you to specify a grouping variable (here name) and will replicate the plot for each group.
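As a sketch of the renaming mentioned above: if the new columns are renamed, every downstream reference has to change with them. The names Item and Score below are hypothetical, chosen only for illustration, and fixing the facet order with factor() is an optional extra.

```r
library(tidyr)
library(ggplot2)

# Example data, built the same way as in the other answer
set.seed(13)
esteem81 <- data.frame(Subject = c(2, 6, 7, 8, 9))
for (i in 1:10) {
  esteem81[[paste0("Esteem81_", i)]] <- sample(1:4, 5, replace = TRUE)
}

# Rename the default "name"/"value" columns (Item and Score are made-up names)
esteem_long <- pivot_longer(esteem81, cols = -Subject,
                            names_to = "Item", values_to = "Score")

# Optional: keep the original column order instead of alphabetical facets
esteem_long$Item <- factor(esteem_long$Item, levels = names(esteem81)[-1])

# The downstream code now refers to Score and Item
p <- ggplot(esteem_long, aes(x = Score)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(Item))
p
```

Without the factor() step the panels would be sorted as text, so Esteem81_10 would appear right after Esteem81_1.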
I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, being the rows one factor (levels x and y, plotted with different colors) and the columns another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.
ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")
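For the 2x2 layout the question asks for, one possible sketch is to split each series name into its letter and its digit while reshaping, then use facet_grid. This swaps gather for pivot_longer, and the column names variable and index are assumptions made for illustration.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(t = 1:10)
df$x1 <- df$t^2
df$x2 <- sqrt(df$t)
df$y1 <- sin(df$t)
df$y2 <- cos(df$t)

# names_sep = 1 splits each column name after its first character:
# "x1" becomes variable = "x", index = "1"
df_grid <- pivot_longer(df, -t,
                        names_to = c("variable", "index"),
                        names_sep = 1,
                        values_to = "value")

# Rows: x / y (colour); columns: 1 / 2 (linetype)
p <- ggplot(df_grid, aes(t, value, colour = variable, linetype = index)) +
  geom_line() +
  facet_grid(variable ~ index, scales = "free_y")
p
```

Note that facet_grid shares one y scale per row, so x2 will look flat next to x1. If each panel should have its own scale, facet_wrap(vars(variable, index), scales = "free_y") is an alternative, at the cost of the strict row/column labelling.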
Data :
Cat <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
variable <- c("IL_1_Flag_p", "IL_1_Flag_p", "IL_1_Flag_p", "IL_1_Flag_p", "IL_2_Flag_p", "IL_2_Flag_p", "IL_2_Flag_p","IL_2_Flag_p", "IL_3_Flag_p", "IL_3_Flag_p", "IL_3_Flag_p", "IL_3_Flag_p", "IL_4_Flag_p", "IL_4_Flag_p", "IL_4_Flag_p", "IL_4_Flag_p", "IL_5_Flag_p", "IL_5_Flag_p", "IL_5_Flag_p", "IL_5_Flag_p")
value <- c(21,17,16,210,20,17,15,189,20,17,15,188,19,17,15,188,20,17,15,194)
agg_melt_p <- data.frame(Cat, variable, value)
I want to plot a line chart for only "IL_5_Flag_p", which is in the variable column. I tried using subset from the plyr package, but it is not working and shows an error. I am combining 2 plots (a bar chart and this line chart). The original data uses a melted dataframe from melt in reshape2.
For ggplot I am using this piece:
ggplot() + geom_line(data = agg_melt_p, aes(x=Cat, y=value, colour=variable))
Please help
One solution using dplyr:
library(dplyr)
library(ggplot2)
agg_melt_p %>% filter(variable == "IL_5_Flag_p") %>%
ggplot() +
geom_line(aes(x=Cat, y=value, colour = variable))
This subsets the data frame the way you want without altering the object itself and then passes it to your ggplot command. The colour=variable bit in your code is not necessary, but you can leave it in if you want to generate a legend automatically.
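Because the question mentions combining a bar chart with this line, here is a rough sketch that gives each layer its own data argument instead of piping one subset into ggplot(). Which series belong in the bars is not stated in the question, so drawing the other four as dodged bars is purely an assumption.

```r
library(ggplot2)

# Data from the question, built compactly
agg_melt_p <- data.frame(
  Cat = rep(1:4, times = 5),
  variable = rep(paste0("IL_", 1:5, "_Flag_p"), each = 4),
  value = c(21, 17, 16, 210, 20, 17, 15, 189, 20, 17, 15, 188,
            19, 17, 15, 188, 20, 17, 15, 194)
)

# Base R subset() is enough here; no extra package is needed
il5    <- subset(agg_melt_p, variable == "IL_5_Flag_p")
others <- subset(agg_melt_p, variable != "IL_5_Flag_p")  # assumed bar data

# Each layer gets its own data, so only the line is restricted to IL_5_Flag_p
p <- ggplot() +
  geom_col(data = others, aes(x = Cat, y = value, fill = variable),
           position = "dodge") +
  geom_line(data = il5, aes(x = Cat, y = value), colour = "black")
p
```

The original agg_melt_p is left untouched, as in the dplyr version.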
I am having difficulty plotting a parallel coordinates plot using ggparcoord from the GGally package. As there are two categorical variables, what I want to show in the visualisation is like the image below. I've found that in ggparcoord, groupColumn accepts only a single variable to group (colour) by, and I can certainly use showPoints to mark the values on the axes, but I also need to vary the shape of these markers according to the categorical variables. Is there another package that can help me realise my idea?
Any response will be appreciated! Thanks!
It's not that difficult to roll your own parallel coordinates plot in ggplot2, which will give you the flexibility to customize the aesthetics. Below is an illustration using the built-in diamonds data frame.
To get parallel coordinates, you need to add an ID column so you can identify each row of the data frame, which we'll use as a group aesthetic in ggplot. You also need to scale the numeric values so that they'll all be on the same vertical scale when we plot them. Then you need to take all the columns that you want on the x-axis and reshape them to "long" format. We do all that on the fly below with the tidyverse/dplyr pipe operator.
Even after limiting the number of category combinations, the lines are probably too intertwined for this plot to be easily interpretable, so consider this merely a "proof of concept". Hopefully, you can create something more useful with your data. I've used colour (for the lines) and fill (for the points) aesthetics below. You can use shape or linetype instead, depending on your needs.
library(tidyverse)
theme_set(theme_classic())
# Get 20 random rows from the diamonds data frame after limiting
# to two levels each of cut and color
set.seed(2)
ds = diamonds %>%
filter(color %in% c("D","J"), cut %in% c("Good", "Premium")) %>%
sample_n(20)
ggplot(ds %>%
mutate(ID = 1:n()) %>% # Add ID for each row
mutate_if(is.numeric, scale) %>% # Scale numeric columns
gather(key, value, c(1,5:10)), # Reshape to "long" format
aes(key, value, group=ID, colour=color, fill=cut)) +
geom_line() +
geom_point(size=2, shape=21, colour="grey50") +
scale_fill_manual(values=c("black","white"))
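To illustrate the shape alternative mentioned above, here is a hedged variant of the same plot that maps cut to point shape rather than fill. It uses pivot_longer and across in place of gather and mutate_if, and selects the columns by name; those substitutions are mine, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

num_cols <- c("carat", "depth", "table", "price", "x", "y", "z")

set.seed(2)
ds <- diamonds %>%
  filter(color %in% c("D", "J"), cut %in% c("Good", "Premium")) %>%
  slice_sample(n = 20) %>%
  mutate(ID = row_number(),
         # plain (unordered) factor so the shape scale does not warn
         cut = factor(as.character(cut)))

ds_long <- ds %>%
  # put every numeric column on a common vertical scale
  mutate(across(all_of(num_cols), ~ as.numeric(scale(.x)))) %>%
  pivot_longer(all_of(num_cols), names_to = "key", values_to = "value")

# colour distinguishes color, point shape distinguishes cut
p <- ggplot(ds_long, aes(key, value, group = ID, colour = color, shape = cut)) +
  geom_line() +
  geom_point(size = 2)
p
```

With only two levels in each variable the legend stays readable; with more levels, shapes become hard to tell apart quickly.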
I haven't used ggparcoord before, but the only option that seemed straightforward (at least on my first try with the function) was to paste together two columns of data. Below is an example. Even with just four category combinations, the plot is confusing, but maybe it will be interpretable if there are strong patterns in your data:
library(GGally)
ds$group = with(ds, paste(cut, color, sep="-"))
ggparcoord(ds, columns=c(1, 5:10), groupColumn=11) +
theme(panel.grid.major.x=element_line(colour="grey70"))
Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that your data to be plotted on the y-axis are all in one column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
The coord_cartesian(ylim = c(0,6)) call sets the visible y-axis range to between 0 and 6 without dropping any data.
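The choice between coord_cartesian(ylim = ...) and ylim() matters when values fall outside the range: coord_cartesian only zooms, while ylim() removes out-of-range values before the boxplot statistics are computed. A small sketch with made-up data, using layer_data() to read back the computed medians:

```r
library(ggplot2)

# Toy data with one value (20) outside the 0-6 range
df <- data.frame(group = "a", value = c(1, 2, 3, 4, 20))

# Zooming: all five values still feed the boxplot statistics
p_zoom <- ggplot(df, aes(group, value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 6))

# Scale limits: the value 20 is dropped (with a warning) before the stats
p_clip <- ggplot(df, aes(group, value)) +
  geom_boxplot() +
  ylim(0, 6)

median_zoom <- layer_data(p_zoom)$middle                    # median of 1, 2, 3, 4, 20
median_clip <- suppressWarnings(layer_data(p_clip)$middle)  # median of 1, 2, 3, 4
c(zoom = median_zoom, clip = median_clip)
```

Here the zoomed plot keeps a median of 3, while the clipped plot reports 2.5, so the two approaches can draw different boxes from the same data.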
If your data are in separate columns, you can get them into long form using melt from reshape2 or gather from tidyr (other methods are also available).
You can do this if you reshape your data into long format
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)