I need to keep variable names in a vector, and I use a function to plot density graphs of my variables.
In summary, my problem is as follows:
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
plotter <- function(x){
plot(density(x),
main = "",
xlab = "",
ylab = "")
title(var_names)
}
par(mfrow=c(4,3),mar=c(1,1,1,1))
apply(mtcars,2,plotter)
I can't figure out how to match each variable name to its plot.
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
plotter <- function(x, var){
plot(density(x[[var]]),
main = var,
xlab = "",
ylab = "")
}
par(mfrow=c(4,3),mar=c(2.1,2.1,2.1,1))
for(vn in var_names) plotter(mtcars, vn)
This yields one density plot per variable, each titled with the variable's name.
for loops are often discouraged in R because they can be slow. However, plotting is slow in its own right and the loop runs only 11 times in this example, so a for loop is perfectly fine here and beginner friendly.
If you really need an apply-family function, or need plotter to take only one argument, the following will do:
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
plotter <- function(x){
plot(density(x[[1]]),
main = names(x),
xlab = "",
ylab = "")
}
par(mfrow=c(4,3),mar=c(2.1,2.1,2.1,1))
sapply(1:11,function(n) plotter(mtcars[n]))
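Another way to pair each column with its name is to iterate over both at once with Map(). This is only a sketch: the second argument `title` and the `titles` result are names introduced for this illustration and are not part of the original code.

```r
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")

# `title` is an extra argument added for this example
plotter <- function(x, title) {
  plot(density(x), main = title, xlab = "", ylab = "")
  invisible(title)  # return the title so the pairing can be checked
}

# par() returns the previous settings, so they can be restored afterwards
old_par <- par(mfrow = c(4, 3), mar = c(2.1, 2.1, 2.1, 1))
# Map() walks over the columns and the names in parallel
titles <- Map(plotter, mtcars[var_names], var_names)
par(old_par)
```

Each call receives one column and the matching name, so no index bookkeeping is needed.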
I would suggest a tidyverse approach with ggplot2 and the vector of names you have. You can reshape your data to long format and then filter the desired variables. Using facets and geom_density(), each panel is labeled with its variable name, so you avoid the issue with titles. Here is the code:
library(tidyverse)
#Vector
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
#Data
mtcars %>% pivot_longer(cols = everything()) %>%
filter(name %in% var_names) %>%
ggplot(aes(x=value))+
geom_density()+
facet_wrap(.~name,scales = 'free')+
theme_bw()
This sounds like a very trivial question at first, but no one has managed to help me so far, hence I'm reaching out to you all.
I'd like to do the following:
I'm writing a simple function that plots two variables against each other, with a third variable coloring the observation points according to its value. The code looks like this:
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes(x=x.variable, y = y.variable)) +
geom_point(aes_string(color = color.variable))
}
scatterplot(data_used = example_data, x.variable = example_data$education,
y.variable = example_data$wages,
color.variable = example_data$sex)
What I would like R to do now is to label the x- and y-axis (respectively) by the corresponding variable's name that I decide to be plotted. In this example here, x-axis would be 'education', y-axis would be 'wages'.
I tried to simply put + labs (x = x.variable, y = y.variable) and it doesn't work (when doing that, R labels the axes by the variable values!). By default, R just names the axes "x.variable" and "y.variable".
Can someone help me achieve what I'm trying to do?
Best regards,
xifrix
jpenzer's answer is a good one. Here it is without the quasi-quotation stuff.
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes_string(x=x.variable, y = y.variable)) +
geom_point(aes_string(color = color.variable)) +
labs(x=x.variable, y=y.variable, colour=color.variable)
}
mtcars %>%
mutate(am = as.factor(am)) %>%
scatterplot(., x.variable = "hp",
y.variable = "mpg",
color.variable = "am")
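Note that aes_string() has been soft-deprecated since ggplot2 3.0.0. A possible alternative that still takes column names as strings is the .data pronoun. The sketch below assumes a recent ggplot2; the function name scatterplot_data and the helper object cars are made up for this example.

```r
library(ggplot2)

# Hypothetical variant of the function above, using .data[[...]] instead of aes_string()
scatterplot_data <- function(data_used, x.variable, y.variable, color.variable) {
  ggplot(data_used, aes(x = .data[[x.variable]], y = .data[[y.variable]])) +
    geom_point(aes(color = .data[[color.variable]])) +
    # The strings double as axis and legend labels
    labs(x = x.variable, y = y.variable, colour = color.variable)
}

cars <- transform(mtcars, am = as.factor(am))
p <- scatterplot_data(cars, x.variable = "hp", y.variable = "mpg", color.variable = "am")
p
```

The explicit labs() call is still needed here, because otherwise the default labels would show the .data[[...]] expressions rather than the column names.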
I'm not sure the quasi-quotation stuff is 100% necessary in hindsight, but this is the pattern I use for similar needs:
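my_scatterplot <- function(data, x, y){
  # Capture the bare column names as quosures
  .x = rlang::enquo(x)
  .y = rlang::enquo(y)
  data %>%
    # Unquote the captured columns with !! so aes() maps them
    ggplot(aes(x = !!.x, y = !!.y))+
    geom_point()+
    # as_label() turns each quosure into its column name as text
    labs(x = rlang::as_label(.x),
         y = rlang::as_label(.y))
}
Let me know if it doesn't work for you, though it should. Edit: following DaveArmstrong's answer, I should add that this function is called without quotes around the x / y variables, e.g.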
diamonds %>% my_scatterplot(price, table)
To pass a bare (unquoted) column name to the function, you could wrap the argument in double curly braces {{...}} inside aes(). Note that {{...}} has no effect inside aes_string(), which expects strings:
library(dplyr)
library(ggplot2)
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
  # {{ }} forwards the unquoted column names to aes();
  # ggplot2 then uses those names as the default axis and legend labels
  ggplot(data_used, aes({{x.variable}}, {{y.variable}})) +
    geom_point(aes(color = {{color.variable}}))
}
scatterplot(mtcars %>% mutate(am = as.factor(am)), x.variable = mpg,
            y.variable = hp,
            color.variable = am)
With the amazing help of @Tung, we have created a function that creates ggplots in a loop using purrr::pwalk. However, the problem is that the plots are printed automatically, and I have not been able to save them as a list of plots. I am coming from this post: Passing labels to xlab and ylab in ggplot2
NOTE: I need to change the ylab and xlab labels from each plot.
The function to plot is as follows:
library(tidyverse)
plot_scatter_with_label <- function(dat,
var_x,
var_y,
label_x,
label_y,
geom_smooth = FALSE,
point_shape = 16,
point_color = "#EB3300",
point_size = 1,
point_alpha = 1,
smooth_method = "loess",
smooth_se = FALSE,
smooth_color = "navy") {
if (is.character(var_x)) {
print('character column names supplied, use rlang::sym()')
var_x <- rlang::sym(var_x)
} else {
print('bare column names supplied, use dplyr::enquo()')
var_x <- enquo(var_x)
}
if (is.character(var_y)) {
var_y <- rlang::sym(var_y)
} else {
var_y <- enquo(var_y)
}
p <- ggplot(dat, aes(x = !! var_x, y = !! var_y)) +
geom_point(shape = point_shape, color = point_color,
size = point_size, alpha = point_alpha) +
ylab(label_y) +
xlab(label_x) +
ggtitle(paste0(label_x, " ~ ", label_y))
print(p)
}
Create a data frame so that we can loop through every row and column
var_y = c("mpg", "hp")
label_y = c("Miles per gallon [Mpg]", "Horse power [CV]")
var_x = c("cyl", "gear")
label_x = c("Cylinders [n]", "Gear [n]")
var_xy <- expand.grid(var_x, var_y, stringsAsFactors = FALSE)
label_xy <- expand.grid(label_x, label_y, stringsAsFactors = FALSE)
select_dat <- data.frame(var_xy, label_xy, stringsAsFactors = FALSE)
pwalk(select_dat, ~ plot_scatter_with_label(mtcars, ..1, ..2,..3,..4))
The problem is that with pwalk, and I am guessing because of the print(p) in plot_scatter_with_label, the plots are displayed automatically. Instead, I would like to save them in a list of plots.
For example, I would like:
p_list = pwalk(select_dat, ~ plot_scatter_with_label(mtcars, ..1, ..2,..3,..4))
where p_list is a list of plots to "play with" using some function to arrange them like
cowplot::plot_grid(plotlist=p_list, nrow=3,ncol=2)
or
ggpubr::ggarrange(plotlist=p_list, nrow=3,ncol=2)
@Tung has recommended that I have a look at this post: Multiple plots in for loop ignoring par
However, I am still unable to find the solution.
Any help will be highly appreciated.
Thanks a lot in advance,
Best regards,
Juan Antonio
Edit: corrected the spelling of the plotlist argument
The pwalk function is explicitly designed not to return an output, but instead to focus on side-effects (like printing, reading/writing, or plotting). It is an alternative function to pmap, which does return its output.
You could return the plots in a list like this:
1. Change the last line of your custom function from print(p) to return(p)
2. Use the pmap function rather than pwalk
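The two steps can be sketched as follows. This is a trimmed-down illustration, not the original function: make_scatter is a hypothetical stand-in that takes column names as strings only, and the columns of select_dat are named here so that pmap can match them to the function's arguments.

```r
library(ggplot2)
library(purrr)

# Hypothetical, simplified stand-in for plot_scatter_with_label()
make_scatter <- function(dat, var_x, var_y, label_x, label_y) {
  p <- ggplot(dat, aes(x = .data[[var_x]], y = .data[[var_y]])) +
    geom_point() +
    xlab(label_x) +
    ylab(label_y) +
    ggtitle(paste0(label_x, " ~ ", label_y))
  return(p)  # step 1: return the plot instead of printing it
}

var_xy <- expand.grid(var_x = c("cyl", "gear"), var_y = c("mpg", "hp"),
                      stringsAsFactors = FALSE)
label_xy <- expand.grid(label_x = c("Cylinders [n]", "Gear [n]"),
                        label_y = c("Miles per gallon [Mpg]", "Horse power [CV]"),
                        stringsAsFactors = FALSE)
select_dat <- cbind(var_xy, label_xy)

# Step 2: pmap() collects one return value per row into a list
p_list <- pmap(select_dat, function(var_x, var_y, label_x, label_y) {
  make_scatter(mtcars, var_x, var_y, label_x, label_y)
})
```

The resulting p_list can then be handed to an arranging function through its plotlist argument.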
I'm having trouble putting a correlation coefficient on my scatter plot after using facet_wrap on another variable.
Below is an example I made with the mtcars dataset for illustration.
When I plot it, both facets show the same correlation number. It seems the correlation coefficient is not calculated separately for each facet. I could not figure out a way to achieve that. I would really appreciate it if anyone could help with this.
library(ggplot2)
library(dplyr)
corr_eqn <- function(x,y, method='pearson', digits = 2) {
corr_coef <- round(cor.test(x, y, method=method)$estimate, digits = digits)
corr_pval <- tryCatch(format(cor.test(x,y, method=method)$p.value,
scientific=TRUE),
error=function(e) NA)
paste(method, 'r = ', corr_coef, ',', 'pval =', corr_pval)
}
sca.plot <- function (cor.coef=TRUE) {
df<- mtcars %>% filter(vs==1)
p<- df %>%
ggplot(aes(x=hp, y=mpg))+
geom_point()+
geom_smooth()+
facet_wrap(~cyl, ncol=3)
if (cor.coef) {
p<- p+geom_text(x=0.9*max(df$hp, na.rm=TRUE),
y=0.9*max(df$mpg, na.rm=TRUE),
label = corr_eqn(df[['hp']],df[['mpg']],
method='pearson'))
}
return (p)
}
sca.plot(cor.coef=TRUE)
Select the facets through the variable inputFacet, loop over the values of this variable to calculate corr_eqn, and plot the facets by looking up the variable name with get.
In shiny you'll probably have the user input as input$facet; here it's called inputFacet. We draw the main plot by getting this variable in facet_wrap(~ get(inputFacet), ncol = 3). Next we loop over all facet values with for(i in seq_along(resCor$facets)) and store the results in resCor.
This should solve the "correlation coef is not calculated for each facet" problem.
library(dplyr)
library(ggplot2)
inputFacet <- "cyl"
cor.coef = TRUE
df <- mtcars
p <- df %>%
ggplot(aes(hp, mpg))+
geom_point()+
geom_smooth()+
facet_wrap(~ get(inputFacet), ncol = 3)
if (cor.coef) {
resCor <- data.frame(facets = unique(mtcars[, inputFacet]))
for(i in seq_along(resCor$facets)) {
foo <- mtcars[mtcars[, inputFacet] == resCor$facets[i], ]
resCor$text[i] <- corr_eqn(foo$hp, foo$mpg)
}
colnames(resCor)[1] <- inputFacet
p <- p + geom_text(data = resCor,
aes(0.9 * max(df$hp, na.rm = TRUE),
0.9 * max(df$mpg, na.rm = TRUE),
label = text))
}
p
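The per-facet loop could also be written as a grouped summary. The following is a minimal sketch with dplyr: cor_labels and its columns r, text, x and y are names chosen for this example, and it uses plain cor() rather than the corr_eqn helper to stay self-contained.

```r
library(dplyr)
library(ggplot2)

# One row per facet value, with its own correlation and label position
cor_labels <- mtcars %>%
  group_by(cyl) %>%
  summarise(r = cor(hp, mpg), .groups = "drop") %>%
  mutate(text = paste("pearson r =", round(r, 2)),
         x = 0.9 * max(mtcars$hp),
         y = 0.9 * max(mtcars$mpg))

p <- ggplot(mtcars, aes(hp, mpg)) +
  geom_point() +
  facet_wrap(~ cyl, ncol = 3) +
  # The label data carries the facet variable, so each text lands in its own panel
  geom_text(data = cor_labels, aes(x = x, y = y, label = text),
            inherit.aes = FALSE)
p
```

Because cor_labels contains the cyl column, facet_wrap places each label in the matching panel.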
I am trying to learn purrr from the tidyverse.
I have set up a piece of code to plot all variables in the iris dataset against each other to see if they are linearly related. Unfortunately, I don't get anything back except blank plots. Below is my example. Can anyone help?
library(tidyverse)
mydf <- iris %>%
as_tibble %>%
dplyr::select(everything(), -Species)
# Create a grid of names of columns
mynames <- names(mydf)
mygrid <- expand.grid(x=mynames, y =mynames)
# Define function
plot_my_data <- function(mydata, x, y){
ggplot(mydata, aes(x, y)) +
geom_smooth()}
map2(.x = mygrid$x,
.y = mygrid$y,
.f = ~ plot_my_data(mydf, .x,.y))
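You have two issues in your code.
The first is that you use aes where you should use aes_string, since x and y are passed as strings. The second is that mygrid contains factors rather than characters, because expand.grid converts strings to factors by default.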
This works:
mygrid <- expand.grid(x=mynames, y =mynames,stringsAsFactors = FALSE)
# Define function
plot_my_data <- function(mydata, x, y){
ggplot(mydata, aes_string(x, y)) +
geom_smooth()}
map2(.x = mygrid$x,
.y = mygrid$y,
.f = ~ plot_my_data(mydf, .x,.y))
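Since aes_string() is soft-deprecated in current ggplot2, the same fix can be sketched with the .data pronoun. This is an illustration under assumptions: geom_point() and the linear smoother are additions for this example, and plots is a new name for the result.

```r
library(ggplot2)
library(purrr)

mydf <- iris[, names(iris) != "Species"]
mynames <- names(mydf)

# Without stringsAsFactors = FALSE, expand.grid() returns factor columns
mygrid <- expand.grid(x = mynames, y = mynames, stringsAsFactors = FALSE)

plot_my_data <- function(mydata, x, y) {
  # .data[[x]] looks up the column whose name is stored in the string x
  ggplot(mydata, aes(.data[[x]], .data[[y]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x)
}

plots <- map2(mygrid$x, mygrid$y, ~ plot_my_data(mydf, .x, .y))
```

map2 returns the 16 plots as a list, which prints one plot after another at the console.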
I have data frames in a nested list after splitting them by a given threshold. Now I want to generate a stacked bar plot to make the data more informative and easier to understand. I think ggplot2 would be a good choice, but I am quite new to the package and find it unintuitive. How can I get a stacked bar plot for the data frames in the nested list? Is there an easy way to get a bar plot or pie chart from them?
Mini data:
myList <- list(
hola= data.frame( from=seq(1, by=4, len=15), to=seq(3, by=4, len=15), value=sample(30, 15)),
boo = data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), value=sample(45, 20)),
meh = data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), value=sample(36, 25))
)
Helper function:
splitter <- function(mlist, threshold) {
res <- lapply(mlist, function(x) {
splt <- split(x, ifelse(x$value >= threshold, "pass", "fail"))
})
return(res)
}
# example
splitMe <- splitter(myList, threshold = 10)
I want to generate a stacked bar plot or a pie chart with ggplot2. Can anyone point me to how to do this for the data frames in the nested list? Thanks a lot.
You may not get exactly these charts with this, but I think you will get the idea. The key is to get your data into a proper format before plotting.
Here I do some data manipulation to get the data into a data frame.
df=as.data.frame(unlist(lapply(splitMe,function(x) unlist(x))))
df$col=row.names(df)
names(df)[1]='val';row.names(df)=NULL
You can make the creation of the new columns more dynamic:
df$col=gsub(paste("\\.|*[0-9]",lapply(splitMe[[1]], function(x) paste(names(x), collapse = "|"))[1], collapse = "", sep = "|"),"",df$col)
df$col1=gsub(paste(lapply(splitMe, function(x) paste(names(x), collapse = "|"))[1], collapse = "", sep = "|"),"",df$col)
df$col2=gsub(paste(names(splitMe), collapse = "|"),"",df$col)
Now the data is in a format that ggplot can easily work with.
library(ggplot2)
ggplot(data = df, aes(x = col1, fill = col2)) + geom_bar()
And you will get a plot like this.
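A possibly simpler route to the same long format is to stack the original (unsplit) list with dplyr::bind_rows and flag each row afterwards. This is a sketch under assumptions: flat, sample and status are names invented for this example, the seed is only there for reproducibility, and the threshold of 10 mirrors the splitter call above.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # arbitrary seed so the random example data is reproducible
myList <- list(
  hola = data.frame(from = seq(1, by = 4, len = 15), to = seq(3, by = 4, len = 15), value = sample(30, 15)),
  boo  = data.frame(from = seq(3, by = 7, len = 20), to = seq(6, by = 7, len = 20), value = sample(45, 20)),
  meh  = data.frame(from = seq(4, by = 8, len = 25), to = seq(7, by = 8, len = 25), value = sample(36, 25))
)

# Stack the data frames; .id stores the list name of each one in a column
flat <- bind_rows(myList, .id = "sample") %>%
  mutate(status = ifelse(value >= 10, "pass", "fail"))

# Stacked bars: one bar per status, filled by the originating list element
p <- ggplot(flat, aes(x = status, fill = sample)) +
  geom_bar()
p
```

Keeping one row per observation with the grouping stored in columns is what lets geom_bar() do the counting and stacking.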
I took inspiration from Chirayu Chamoli's solution:
library(dplyr)
plot_data <- df %>%
group_by(col1, col, col2) %>%
tally %>%
group_by(col, col2) %>%
mutate(percentage = n/sum(n), cumsum = cumsum(percentage))
library(ggplot2)
ggplot(data = plot_data, aes(x = col1, y=n ,fill = col2, width = .85)) +
geom_bar(stat = "identity")+
geom_text(aes(label=n), position = position_stack(vjust = .5))