Multiple line subplots in R - r

I am new to R and am struggling to understand how to create a matrix line plot (or plot with line subplots) given a data set with let's say one x and 5 y-columns such that:
-the first subplot is a plot of variables 1 and 2 (function of x)
-the second subplot variables 1 and 3 and so on
The idea is to use one of the variables (in this example number 1) as a reference and pair it with the rest so that they can be easily compared.
Thank you very much for your help.

Here's an example of one way to do that using tidyr and ggplot. tidyr::gather can pull the non-mpg columns into long format, each matched with its respective mpg. Then the data is mapped in ggplot so that x is mpg and y is the other value, and the name of the column it came from is mapped to facets.
library(tidyverse)
mtcars %>%
select(rowname, mpg, cyl, disp, hp) %>%
gather(stat, value, cyl:hp) %>%
ggplot(aes(mpg, value)) +
geom_point() +
facet_grid(stat~., scales = "free")

Related

Excluding levels/groups within categorical variable (ggplot graph)

I am relatively new to ggplot, and I am interested in visualizing a categorical variable with 11 groups/levels. I ran the code below to produce a bar graph showing the frequency of each group. However, given that some groups within the categorical variable "active" only occur once or zero times, they clutter the graph. Therefore, is it possible to directly exclude groups in ggplot within the categorical variable with < 2 observations?
I am also open to recommendations on how to visualize a categorical variable with multiple groups/levels if a bar graph isn't suitable here.
Data type
sapply(df,class)
username active
"character" "character"
ggplot(data = df, aes(x = active)) +
geom_bar()
You can count() the categories first, and then filter(), before feeding to ggplot. In this way, you would use geom_col() instead:
df %>% count(active) %>% filter(n>2) %>%
ggplot(aes(x=active,y=n)) +
geom_col()
Alternatively, you could group_by() / filter() directly within your ggplot() call, like this:
ggplot(df %>% group_by(active) %>% filter(n()>2), aes(x=active)) +
geom_bar()

Box plots not appearing properly in RStudio

I am creating box plots within R, however, they are appearing incorrectly. My data is based off of German Credit Dataset on Kaggle.
My code with two different attributes trying to be tested:
data %>%
ggplot(aes(x = Creditability, y = Purpose, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Purpose")
data %>%
ggplot(aes(x = Creditability, y = Account.Balance, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Account Balance")
I've tried a few of the different attributes for it, but results in the same error
Edited info: Is it because the attributes have too much information? I have split the sample into test (300) vs train (700) and I am currently using train. Would it simply be because there's too much info?
Edit picture:
Factors
Edit for graph error:
Error
As others have explained in the comments, you cannot show boxplots where the y axis is set to be a factor. Factors are by their nature discrete variables, even if the levels are named as numbers. In order to utilize the stat function for the boxplot geom, you need the y axis to be continuous and the x axis to be discrete (or able to be separated into discrete values via the group= aesthetic).
Let me demonstrate with the mtcars dataset built into ggplot2:
library(ggplot2)
ggplot(mtcars, aes(x=factor(carb), y=mpg)) + geom_boxplot()
Here we can draw boxpots because the x aesthetic is forced to be discrete (via factor(carb)), while the y axis is using mpg which is a numeric column in the mtcars dataset.
If you set both carb and mpg to be factors, you get something that should look pretty similar to what you're seeing:
ggplot(mtcars, aes(x=factor(carb), y=factor(mpg))) + geom_boxplot()
In your case, all your columns in your dataset are factors. If they are factors that can be coerced to be numbers, you can turn them into continuous vectors via using as.numeric(levels(column_name)[column_name]). Alternatively, you can use as.numeric(as.character(column_name)). Here's what it looks like to first convert the mtcars$mpg column to a factor of numeric values, and then back to being only numeric via this method.
df <- mtcars
# convert to a factor
df$mpg <- factor(df$mpg)
# back to numeric!
df$mpg <- as.numeric(levels(df$mpg)[df$mpg])
# this plot looks like it did before when we did the same with mtcars
ggplot(df, aes(x=factor(carb), y=mpg)) + geom_boxplot()
So, for your case, do this two step process:
data$Purpose <- as.numeric(levels(data$Purpose)[data$Purpose])
data %>%
ggplot(aes(x = Creditability, y = Purpose, fill = Creditability)) +
geom_boxplot() +
ggtitle("Creditability vs Purpose")
That should work. You can follow in a similar fashion for your other variables.

How to add legends in this context?

songs %>% group_by(year) %>% summarise(count=nth(pop,1))%>%
ggplot(aes(x=factor(year),y=count,fill=year))+geom_bar(stat ='identity' )+theme_classic()
1.How can I adjust my legends to show years(2010:2019) rather than what it is showing right now?
2.Scale_size_manual is not working.
You need to set year as a factor each time (or externally), not just once. I don't have your data, so I'll use mtcars.
library(ggplot2)
library(dplyr)
# first plot
mtcars %>%
ggplot(aes(factor(carb), disp, fill=carb)) +
geom_bar(stat="identity")
# second plot
mutate(mtcars, carb = factor(carb)) %>%
ggplot(aes(carb, disp, fill=carb)) +
geom_bar(stat="identity")
# alternate code for second plot, not shown
mtcars %>%
ggplot(aes(factor(carb), disp, fill=factor(carb))) +
# both ^^^^^^ and ^^^^^^
geom_bar(stat="identity")
(There are numerous ways to convert to a factor. I'm using dplyr here, but it can easily be done in base or data.table.)
I included the "alternate" code above that shows the manual factor being applied to each use of carb; this is not the preferred method in my mind, since if you're doing it multiple times, just do it once before the plotting and use it multiple times. If you need both the ordinal year and the numeric version, you can add a new field, such as ordinal_year=factor(year).

How to use ggplot to create facets with two factors?

I'm trying to do a plot with facets with some data from a previous model. As a simple example:
t=1:10;
x1=t^2;
x2=sqrt(t);
y1=sin(t);
y2=cos(t);
How can I plot this data in a 2x2 grid, being the rows one factor (levels x and y, plotted with different colors) and the columns another factor (levels 1 and 2, plotted with different linetypes)?
Note: t is the common variable for the X axis of all subplots.
ggplot will be more helpful if the data can be first put into tidy form. df is your data, df_tidy is that data in tidy form, where the series is identified in one column that can be mapped in ggplot -- in this case to the facet.
library(tidyverse)
df <- tibble(
t=1:10,
x1=t^2,
x2=sqrt(t),
y1=sin(t),
y2=cos(t),
)
df_tidy <- df %>%
gather(series, value, -t)
ggplot(df_tidy, aes(t, value)) +
geom_line() +
facet_wrap(~series, scales = "free_y")

Iterate over a dataframe and create one plot for each column

I would like to iterate over a data frame and plot each column against a particular column such as price.
What I have done so far is:
for(i in ncol(dat.train)) {
ggplot(dat.train, aes(dat.train[[,i]],price)) + geom_point()
}
What I want is to have the first introduction to my data (Approximately 300 columns) by plotting against the decision variable (i.e., price)
I know that there is a similar question, though I cannot really understand why the above is not really working.
You can do this, I have used mtcars data to plot other continuous variables with mpg. You have to melt the data into long form (use gather) and then use ggplot to plot these contiuous variables (disp,drat,qsec etc) against mpg. In your case instead of mpg you would take price and all the other continuous variables to be melted (like here disp,drat,qsec etc), the rest categorical variables can be taken for shape and colors etc (optional).
library(tidyverse)
mtcars %>%
gather(-mpg, -hp, -cyl, key = "var", value = "value") %>%
ggplot(aes(x = value, y = mpg, color = hp, shape = factor(cyl))) +
geom_point() +
facet_wrap(~ var, scales = "free") +
theme_bw()
EDIT:
This is another solution in case we need separate graphs for each of the variables.
Create a list of variables like this: lyst <- list("disp","hp") , you can use colnames function to get all the variable names. Use lapply to to loop through all the "lyst" objects on your data frame.
setwd("path") ###set the working directory here, This is the place where all the files are saved.
pdf(file=paste0("one.pdf"))
lapply(lyst, function(i)ggplot(mtcars, aes_string(x=i, y="mpg")) + geom_point())
dev.off()
A pdf file wil. be generated with all the graphs pdfs at your working directory which you have set
Output from solution first:

Resources