How to use ggplot to create facets with two factors? - r

I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, being the rows one factor (levels x and y, plotted with different colors) and the columns another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.

ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")

Related

Making a ggplot boxplot where each column is it's own boxplot

when using the simple R boxplot function, I can easily place my dataframe directly into the parenthesis and a perfect boxplot emerges, eg:
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314)
boxplot(naive_capqx)
this is an image of the boxplot made with the simple R boxplot function
However, I need to make this boxplot slightly more aesthetic and so I need to use ggplot. When I place the dataframe itself in, the boxplot cannot form as I need to specify x, y and fill coordinates, which I don't have. My y coordinates are the values for each vector in the dataframe and my x coordinates are just the name of the vector. How can I do this using ggplot? Is there a way to reform my dataframe so I can split it into coordinates, or is there a way ggplot can read my data?
geom_boxplot expects tidy data. Your data isn't tidy because the column names contain information. So the first thing to do is to tidy your data by using pivot_longer...
library(tidyverse)
naive_capqx %>%
pivot_longer(everything(), values_to="Value", names_to="Variable") %>%
ggplot() +
geom_boxplot(aes(x=Variable, y=Value))
giving
Turn the df into a long format df. Below, I use gather() to lengthen the df; I use group_by() to ensure boxplot calculation by key (formerly column name).
pacman::p_load(ggplot2, tidyverse)
baseline <- c(0,0,0,0,1)
post_cap <- c(1,5,5,6,11)
qx314 <- c(0,0,0,3,7)
naive_capqx <- data.frame(baseline, post_cap, qx314) %>%
gather("key", "value")) %>%
group_by(key)
ggplot(naive_capqx, mapping = aes(x = key, y = value)) +
geom_boxplot()

Graphing different variables in the same graph R- ggplot2

I have several datasets and my end goal is to do a graph out of them, with each line representing the yearly variation for the given information. I finally joined and combined my data (as it was in a per month structure) into a table that just contains the yearly means for each item I want to graph (column depicting year and subsequent rows depicting yearly variation for 4 different elements)
I have one factor that is the year and 4 different variables that read yearly variations, thus I would like to graph them on the same space. I had the idea to joint the 4 columns into one by factor (collapse into one observation per row and the year or factor in the subsequent row) but seem unable to do that. My thought is that this would give a structure to my y axis. Would like some advise, and to know if my approach to the problem is effective. I am trying ggplot2 but does not seem to work without a defined (or a pre defined range) y axis. Thanks
I would suggest next approach. You have to reshape your data from wide to long as next example. In that way is possible to see all variables. As no data is provided, this solution is sketched using dummy data. Also, you can change lines to other geom you want like points:
library(tidyverse)
set.seed(123)
#Data
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
#Plot
df %>% pivot_longer(-year) %>%
ggplot(aes(x=factor(year),y=value,group=name,color=name))+
geom_line()+
theme_bw()
Output:
We could use melt from reshape2 without loading multiple other packages
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.var = 'year'), aes(x = factor(year), y = value,
group = variable, color = variable)) +
geom_line()
-output plot
Or with matplot from base R
matplot(as.matrix(df[-1]), type = 'l', xaxt = 'n')
data
set.seed(123)
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))

Multiple line subplots in R

I am new to R and am struggling to understand how to create a matrix line plot (or plot with line subplots) given a data set with let's say one x and 5 y-columns such that:
-the first subplot is a plot of variables 1 and 2 (function of x)
-the second subplot variables 1 and 3 and so on
The idea is to use one of the variables (in this example number 1) as a reference and pair it with the rest so that they can be easily compared.
Thank you very much for your help.
Here's an example of one way to do that using tidyr and ggplot. tidyr::gather can pull the non-mpg columns into long format, each matched with its respective mpg. Then the data is mapped in ggplot so that x is mpg and y is the other value, and the name of the column it came from is mapped to facets.
library(tidyverse)
mtcars %>%
select(rowname, mpg, cyl, disp, hp) %>%
gather(stat, value, cyl:hp) %>%
ggplot(aes(mpg, value)) +
geom_point() +
facet_grid(stat~., scales = "free")

RStudio one numeric variable against n numeric variables in n plots

I'm eviews user and eviews very basically draws scatter plots matrix.
In the following graph, I have 13 different group datas and Eviews draws one group data against 12 groups' data in 12 plots in one graph with regression line.
How can I realize same graph with Rstudio?
Here is an example on how to do the requested plot in ggplot:
First some data:
z <- matrix(rnorm(1000), ncol= 10)
The basic idea here is to convert the wide matrix to long format where the variable that is compared to all others is duplicated as many times as there are other variables. Each of these other variables gets a specific label in the key column. ggplot likes the data in this format
library(tidyverse)
z %>%
as.tibble() %>% #convert matrix to tibble or data.frame
gather(key, value, 2:10) %>% #convert to long format specifying variable columns 2:10
mutate(key = factor(key, levels = paste0("V", 1:10))) %>% #specify levels so the facets go in the correct order to avoid V10 being before V2
ggplot() +
geom_point(aes(value, V1))+ #plot points
geom_smooth(aes(value, V1), method = "lm", se = F)+ #plot lm fit without se
facet_wrap(~key) #facet by key

How to plot parallel coordinates with multiple categorical variables in R

I am facing a difficulty while plotting a parallel coordinates plot using the ggparcoord from the GGally package. As there are two categorical variables, what I want to show in the visualisation is like the image below. I've found that in ggparcoord, groupColumn is only allowed to a single variable to group (colour) by, and surely I can use showPoints to mark the values on the axes, but i also need to vary the shape of these markers according to the categorical variables. Is there other package that can help me to realise my idea?
Any response will be appreciated! Thanks!
It's not that difficult to roll your own parallel coordinates plot in ggplot2, which will give you the flexibility to customize the aesthetics. Below is an illustration using the built-in diamonds data frame.
To get parallel coordinates, you need to add an ID column so you can identify each row of the data frame, which we'll use as a group aesthetic in ggplot. You also need to scale the numeric values so that they'll all be on the same vertical scale when we plot them. Then you need to take all the columns that you want on the x-axis and reshape them to "long" format. We do all that on the fly below with the tidyverse/dplyr pipe operator.
Even after limiting the number of category combinations, the lines are probably too intertwined for this plot to be easily interpretable, so consider this merely a "proof of concept". Hopefully, you can create something more useful with your data. I've used colour (for the lines) and fill (for the points) aesthetics below. You can use shape or linetype instead, depending on your needs.
library(tidyverse)
theme_set(theme_classic())
# Get 20 random rows from the diamonds data frame after limiting
# to two levels each of cut and color
set.seed(2)
ds = diamonds %>%
filter(color %in% c("D","J"), cut %in% c("Good", "Premium")) %>%
sample_n(20)
ggplot(ds %>%
mutate(ID = 1:n()) %>% # Add ID for each row
mutate_if(is.numeric, scale) %>% # Scale numeric columns
gather(key, value, c(1,5:10)), # Reshape to "long" format
aes(key, value, group=ID, colour=color, fill=cut)) +
geom_line() +
geom_point(size=2, shape=21, colour="grey50") +
scale_fill_manual(values=c("black","white"))
I haven't used ggparcoords before, but the only option that seemed straightforward (at least on my first try with the function) was to paste together two columns of data. Below is an example. Even with just four category combinations, the plot is confusing, but maybe it will be interpretable if there are strong patterns in your data:
library(GGally)
ds$group = with(ds, paste(cut, color, sep="-"))
ggparcoord(ds, columns=c(1, 5:10), groupColumn=11) +
theme(panel.grid.major.x=element_line(colour="grey70"))

Resources