I am missing some basics in R.
How do I make a plot for each column in a data frame?
I have tried making plots for each column separately. I was wondering if there was an easier way?
library(dplyr)
library(ggplot2)
data(economics)
#scatter plots
ggplot(economics,aes(x=pop,y=pce))+
geom_point()
ggplot(economics,aes(x=pop,y=psavert))+
geom_point()
ggplot(economics,aes(x=pop,y=uempmed))+
geom_point()
ggplot(economics,aes(x=pop,y=unemploy))+
geom_point()
#boxplots
ggplot(economics,aes(y=pce))+
geom_boxplot()
ggplot(economics,aes(y=pop))+
geom_boxplot()
ggplot(economics,aes(y=psavert))+
geom_boxplot()
ggplot(economics,aes(y=uempmed))+
geom_boxplot()
ggplot(economics,aes(y=unemploy))+
geom_boxplot()
All I'm looking for is one 2*2 grid of box plots and one 2*2 grid of scatter plots with ggplot2. I understand there is facet_grid, but I have failed to understand how to implement it. (I believe this can be achieved easily with par(mfrow) and base R plots.) I also saw an answer somewhere else that reshaped the data (widening?), which I didn't understand.
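In cases like this, the solution is almost always to reshape the data from wide to long format, so that one column holds the variable name and another holds its value, and then facet on the variable name.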
economics %>%
select(-date) %>%
tidyr::gather(variable, value, -pop) %>%
ggplot(aes(x = pop, y = value)) +
geom_point(size = 0.5) +
facet_wrap(~ variable, scales = "free_y")
economics %>%
tidyr::gather(variable, value, -date) %>%
ggplot(aes(y = value)) +
geom_boxplot() +
facet_wrap(~ variable, scales = "free_y")
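gather() still works but is superseded in current tidyr. Below is a minimal sketch of the same reshape with pivot_longer(), assuming tidyr 1.0 or later. The names long and p_scatter are only illustrative, and ncol = 2 is one way to get the 2*2 layout asked for in the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Wide to long: one row per (pop, variable) pair; date is dropped first
long <- economics %>%
  select(-date) %>%
  pivot_longer(-pop, names_to = "variable", values_to = "value")

# Four variables remain, so ncol = 2 yields a 2 x 2 grid of scatter plots
p_scatter <- ggplot(long, aes(x = pop, y = value)) +
  geom_point(size = 0.5) +
  facet_wrap(~ variable, ncol = 2, scales = "free_y")

print(p_scatter)
```

The box plot version needs five panels (pop included), so it cannot fill a 2*2 grid exactly; facet_wrap will add a third row.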
I am new to R and have the following example code that I wish to apply for every column in my data.
data(economics, package="ggplot2")
economics$index <- 1:nrow(economics)
loessMod10 <- loess(uempmed ~ index, data=economics, span=0.10)
smoothed10 <- predict(loessMod10)
plot(economics$uempmed, x=economics$date, type="l", main="Loess Smoothing and Prediction", xlab="Date", ylab="Unemployment (Median)")
lines(smoothed10, x=economics$date, col="red")
Could someone please suggest how this would be possible?
It's possible to perform loess smoothing within ggplot.
library(data.table)
library(ggplot2)
df <- economics
gg.melt <- setDT(df) |> melt(id='date', variable.name = 'KPI')
ggplot(gg.melt, aes(x=date, y=value))+
geom_line()+
stat_smooth(method=loess, color='red', size=0.5, se=FALSE, method.args = list(span=0.1))+
facet_wrap(~KPI, scales = 'free_y')
Regarding combining everything on one plot, I'm not seeing how you would do that, as the y-scales are so different. If the point is to see how the peaks line up, etc., you could stack the facets vertically like this:
ggplot(gg.melt, aes(x=date, y=value))+
geom_line()+
stat_smooth(method=loess, color='red', size=0.5, se=FALSE, method.args = list(span=0.1))+
facet_grid(KPI~., scales = 'free_y')
There is also the dygraphs package, which creates interactive graphics that can be saved to HTML. Here each series is first divided by its range so that all of them fit on one axis:
gg.melt[, scaled:=scale(value, center = FALSE, scale=diff(range(value))), by=.(KPI)]
gg.melt[, pred:=predict(loess(scaled~as.integer(date), .SD, span=0.1)), by=.(KPI)]
gg.dt <- dcast(gg.melt, date~KPI, value.var = list('scaled', 'pred'))
library(dygraphs)
dygraph(gg.dt) |>
dyCrosshair(direction = 'vertical') |>
dyRangeSelector()
It's possible to create a dygraph(...) version of the second plot, where the different KPI are in different facets, but you have to use RMarkdown for that.
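If you would rather stay with the base R code from the question, a minimal sketch is to loop over the column names. This assumes every column other than date is numeric (true for economics). The names econ, value_cols and smoothed are only illustrative.

```r
data(economics, package = "ggplot2")
econ <- as.data.frame(economics)
econ$index <- seq_len(nrow(econ))

# Every column except the date and the helper index gets smoothed
value_cols <- setdiff(names(econ), c("date", "index"))

# One loess fit per column; the result is a matrix with one column per variable
smoothed <- sapply(value_cols, function(col) {
  fit <- loess(reformulate("index", response = col), data = econ, span = 0.10)
  predict(fit)
})

# One panel per variable: raw series in black, loess fit in red
op <- par(mfrow = c(3, 2), mar = c(4, 4, 2, 1))
for (col in value_cols) {
  plot(econ$date, econ[[col]], type = "l", main = col, xlab = "Date", ylab = col)
  lines(econ$date, smoothed[, col], col = "red")
}
par(op)
```

reformulate() builds the formula (for example pce ~ index) from the column name, which avoids pasting strings together by hand.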
You can reshape your data from wide to long format, keeping date as the id column, and use facet_wrap. Maybe you want something like this:
library(ggplot2)
library(reshape2)
library(dplyr)
economics %>%
melt(., "date") %>%
ggplot(., aes(date, value)) +
geom_line() +
facet_wrap(~variable, scales = "free")
Comment: All plots in one graph
If you mean all plots in one graph, you can give the variables a color like this:
economics %>%
melt(., "date") %>%
ggplot(., aes(date, value, color = variable)) +
geom_line() +
scale_y_log10()
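A log scale keeps the original units but still leaves the series far apart. As an alternative sketch (not part of the answer above), each variable can be rescaled to the interval [0, 1] before plotting, so the shapes can be compared directly. The column name scaled and the object p_scaled are only illustrative.

```r
library(ggplot2)
library(reshape2)

long <- melt(as.data.frame(economics), id.vars = "date")

# Min-max rescale each variable separately to the interval [0, 1]
long$scaled <- ave(long$value, long$variable,
                   FUN = function(v) (v - min(v)) / (max(v) - min(v)))

p_scaled <- ggplot(long, aes(date, scaled, color = variable)) +
  geom_line()

print(p_scaled)
```

The trade-off is that the y-axis no longer shows the original units, so this is only useful for comparing timing and shape.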
I am plotting a box plot that shows the height of students. However, I am unsure what to use as x and y. I have only measurements, so one should be the height and the other the number of students that have that height.
x=N, y=Height
My code:
# Library
library(ggplot2)
library(tidyverse)
# 1. Read data (comma separated)
data = read.table(text = "184,180,183,184,184,160,173",
sep=",",stringsAsFactors=F, na.strings="unknown")
# 2. Print table
print(data)
# 3. Plot box plot
data %>%
pivot_longer(cols = everything()) %>%
ggplot(aes(x=value, y=value)) +
geom_boxplot() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")
I think the best plot to represent a single vector of data is a histogram. However, you could use a boxplot by creating a dummy factor that groups your observations, i.e.
data %>%
pivot_longer(cols = everything()) %>%
mutate(type="student") %>%
ggplot(aes(x=type, y=value)) +
geom_boxplot() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")
If you want a histogram (I think it is much better for your situation), you don't need the dummy factor and you could do something like:
data %>%
pivot_longer(cols = everything()) %>%
ggplot(aes(x=value)) +
geom_histogram() +
theme_classic() +
xlab("Height") +
ylab("Number of students") +
ggtitle("Height of students")
To use a boxplot correctly, you need one categorical variable and one continuous variable. Put the categorical one (e.g. male, female, etc.) on the x-axis and the continuous one on the y-axis (height in your case).
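A minimal sketch of that layout follows. The heights are the ones from the question, but the group column is invented purely for illustration, since the original data has no categorical variable.

```r
library(ggplot2)

# Heights from the question; the group labels "A" and "B" are hypothetical
heights <- data.frame(
  height = c(184, 180, 183, 184, 184, 160, 173),
  group  = c("A", "A", "A", "A", "B", "B", "B")
)

# Categorical variable on x, continuous variable on y: one box per group
p_box <- ggplot(heights, aes(x = group, y = height)) +
  geom_boxplot() +
  theme_classic() +
  labs(x = "Group", y = "Height", title = "Height of students by group")

print(p_box)

# The thick line inside each box is the group median
group_medians <- tapply(heights$height, heights$group, median)
```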
I have uploaded a data frame and done a quick plot of all variables using:
df %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Reference: https://drsimonj.svbtle.com/quick-plot-of-all-variables
I have split this data frame into two data frames based on a binary variable (in my case, Smoker/Non-smoker) in one of the columns. I would like to perform the same quick plot of all variables but have overlaid, differently coloured histograms for each of the new data frames (to see if they differ significantly).
I found the following:
Overlaying two ggplot facet_wrap histograms
But it only does the facet_wrap over a single variable. Is there a way to do this by filtering the gathered data frame by the binary value, something like:
df %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(subset(df,Smoker==1), fill = "Red", alpha=0.3) +
geom_histogram(subset(df,Smoker==2), fill = "Blue", alpha=0.3)
Idea would be to overlay the following:
df_s %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(fill = "Red", alpha=0.3)
df_ns %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(fill = "Blue", alpha=0.3)
I could do this with a loop but would like to do it with the df key-value pairs if possible.
df %>%
select(Smoker, where(is.numeric)) %>% # keep Smoker plus the numeric columns (needs dplyr >= 1.0)
mutate(Smoker = factor(Smoker)) %>% # a discrete fill needs a factor for grouping to work
tidyr::gather(key, value, -Smoker) %>% # preserve Smoker and use it to colour
ggplot(aes(value, fill = Smoker)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(alpha = 0.30, position = "identity") + # overlay the groups instead of stacking them
scale_fill_manual(values = c("red","blue"))
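Since the original df was not posted, here is a self-contained sketch on simulated data. The columns age and bmi and all values are invented, and Smoker is coded 1/2 as in the question. It uses pivot_longer(), the current tidyr replacement for gather().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Simulated stand-in for the real data: 50 rows per Smoker code
set.seed(1)
df <- data.frame(
  Smoker = rep(c(1, 2), each = 50),
  age    = rnorm(100, mean = 45, sd = 10),
  bmi    = rnorm(100, mean = 26, sd = 4)
)

# Keep Smoker as an id column and stack every other column into key/value pairs
long <- df %>%
  mutate(Smoker = factor(Smoker)) %>%
  pivot_longer(-Smoker, names_to = "key", values_to = "value")

# position = "identity" draws the two groups on top of each other
p_overlay <- ggplot(long, aes(value, fill = Smoker)) +
  geom_histogram(alpha = 0.3, position = "identity", bins = 20) +
  facet_wrap(~ key, scales = "free") +
  scale_fill_manual(values = c("red", "blue"))

print(p_overlay)
```

Without position = "identity", geom_histogram() stacks the groups, which makes the two distributions harder to compare.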
I'm new to R. I was trying to plot the last value of each variable in a data frame on top of a boxplot. I tried the following without success:
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot() +
geom_point(iris, aes(x=unique(iris$Species), y=tail(iris,n=1)))
Thanks, Bill
One approach is
library(tidyverse)
iris1 <- iris %>%
group_by(Species) %>%
summarise(LastVal = last(Sepal.Length))
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot() +
geom_point(data = iris1, aes(x = Species, y = LastVal))
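A variant sketch, assuming dplyr 1.0 or later for slice_tail(): keep the whole last row of each group, so the layer can reuse the aesthetics of the main plot. Note that the first positional argument of geom_point() is the mapping, not the data, which is why the data must be passed by name. The name last_rows and the red colour are only illustrative.

```r
library(dplyr)
library(ggplot2)

# Last row of each species, with all original columns kept
last_rows <- iris %>%
  group_by(Species) %>%
  slice_tail(n = 1) %>%
  ungroup()

# The point layer inherits x and y from the main aes(); only the data differs
p_last <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  geom_point(data = last_rows, colour = "red", size = 3)

print(p_last)
```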
I want to plot a subset of my dataframe. I am working with dplyr and ggplot2. My code only works with version 1, not version 2 via piping. What's the difference?
Version 1 (plotting is working):
data <- dataset %>% filter(type=="type1")
ggplot(data, aes(x=year, y=variable)) + geom_line()
Version 2 with piping (plotting is not working):
data %>% filter(type=="type1") %>% ggplot(data, aes(x=year, y=variable)) + geom_line()
Error:
Error in ggplot.data.frame(., data, aes(x = year, :
Mapping should be created with aes or aes_string
Thanks for your help!
Solution for version 2: use a dot . instead of data. The pipe already supplies the filtered data frame, and the dot refers to it:
data %>%
filter(type=="type1") %>%
ggplot(., aes(x=year, y=variable)) +
geom_line()
I usually do this, which also dispenses with the need for the .:
library(dplyr)
library(ggplot2)
mtcars %>%
filter(cyl == 4) %>%
ggplot +
aes(
x = disp,
y = mpg
) +
geom_point()
When piping, the pipe already passes the data as the first argument. If you enter the data name again, as I have marked with asterisks below, ggplot() receives it as the second argument (the mapping), which causes the error.
data %>% filter(type=="type1") %>% ggplot(***data***, aes(x=year, y=variable)) + geom_line()
Hope it works for you.
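To make the argument order concrete, here is a minimal sketch using mtcars as stand-in data: x %>% f(y) is evaluated as f(x, y), so the piped and nested forms below build the same plot. The names p_piped and p_nested are only illustrative.

```r
library(dplyr)
library(ggplot2)

# Piped form: the filtered data frame becomes ggplot()'s first argument
p_piped <- mtcars %>%
  filter(cyl == 4) %>%
  ggplot(aes(x = disp, y = mpg)) +
  geom_point()

# Equivalent nested form without the pipe
p_nested <- ggplot(filter(mtcars, cyl == 4), aes(x = disp, y = mpg)) +
  geom_point()

# Compare the data stored in the two plot objects
same_data <- identical(p_piped$data, p_nested$data)
```

Writing ggplot(data, aes(...)) after the pipe is therefore read as ggplot(., data, aes(...)), which puts the data frame in the mapping slot.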