I have a dataset with numeric and factor variables. I want to do one page with numeric and other with factor var. First of all, i select factor var with his index.
My df is IRIS dataset.
df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)
Then i calculate rest of it (numerics)
nm<-ncol(df)-length(fact)
Next step is create loop
i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()
for (i in 1:length(df)){
plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i])
if (is.factor(df[,i])){
p_factor<-plot+geom_bar()
list_plotF[[i_F]]<-p_factor
i_F=i_F+1
}else{
p_numeric <- plot+geom_histogram()
list_plotN[[i_N]]<-p_numeric
i_N=i_N+1
}
}
When i see list_plotF and list_plot_N,it didn't well. It always have same vars. i don't know what i'm doing wrong.
thanks!!!
I don't really follow your for loop code all that well. But from what I see it seems to be saving the last plot in every loop you make. I've reconstructed what I think you need using lapply. I generally prefer lapply to for loops whenever I can.
Lapply takes a list of values and a function and applies that function to every value. you can define your function separately like I have so everything looks cleaner. Then you just mention the function in the lapply command.
In our case the list is a list of columns from your dataframe df. The function it applies first creates our base plot. Then it does a quick check to see if the column it is looking at is a factor.. If it's a factor it creates a bar graph, else it creates a histogram.
histOrBar <- function(var) {
basePlot <- ggplot(df, aes_string(var))
if ( is.factor(df[[var]]) ) {
basePlot + geom_bar()
} else {
basePlot + geom_histogram()
}
}
loDFs <- lapply(colnames(df), histOrBar)
Consider passing column names with aes_string to better align x with df:
for (i in 1:length(df)){
plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) +
xlab(names(df)[i])
...
}
To demonstrate the problem using aes() and solution using aes_string() in OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.
Data
library(ggplot2)
set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")
random_df <- data.frame(
group = sample(data_tools, 500, replace=TRUE),
int = as.numeric(sample(1:15, 500, replace=TRUE)),
num = rnorm(500),
char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()),
500, replace=TRUE), origin='1970-01-01')
)
Graph
fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)
i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL
for (i in 1:length(random_df)){
# aes() VERSION
#plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
# xlab(names(random_df)[i])
# aes_string() VERSION
plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
xlab(names(random_df)[i])
if (is.factor(random_df[,i])){
p_factor <- plot + geom_bar()
list_plotF[[i_F]] <- p_factor
i_F=i_F+1
}else{
p_numeric <- plot + geom_histogram()
list_plotN[[i_N]] <- p_numeric
i_N=i_N+1
}
}
Problem (using aes() where graph outputs DO NOT change according to type)
Solution (using aes_string() where graphs DO change according to type)
Related
I am trying to write a function that returns a series of ggplot scatterplots from a data frame. Below is a reproducible data frame as well as the function I've written
a <- sample(0:20,20,rep=TRUE)
b <- sample(0:300,20,rep=TRUE)
c <- sample(0:3, 20, rep=TRUE)
d <- rep("dog", 20)
df <- data.frame(a,b,c,d)
loopGraph <- function(dataFrame, y_value, x_type){
if(is.numeric(dataFrame[,y_value]) == TRUE && x_type == "numeric"){
dataFrame_number <- dataFrame %>%
dplyr::select_if(is.numeric) %>%
filter(y_value!=0) %>%
dplyr::select(-y_value)
x <- 1
y<-ncol(dataFrame_number)
endList <- list()
for (i in x:y)
{
i
x_value <-colnames(dataFrame_number)[i]
plotDataFrame <- cbind(dataFrame_number[,x_value], dataFrame[,y_value]) %>% as.data.frame()
r2 <- summary(lm(plotDataFrame[,2]~plotDataFrame[,1]))$r.squared
ggplot <- ggplot(plotDataFrame, aes(y=plotDataFrame[,2],x=plotDataFrame[,1])) +
geom_point()+
geom_smooth(method=lm) +
labs(title = paste("Scatterplot of", y_value, "vs.", x_value), subtitle= paste0("R^2=",r2), x = x_value,y=y_value)
endList[[i]] <- ggplot
}
return(endList)
}
else{print("Try again")}
}
loopGraph(df, "a", "numeric")
What I want is to return the object endList so I can look at the multiple scatterplots generated by this function. What happens is that the function prints each scatterplot in the plots window without giving me access to the endList object.
How can I get this function to return the endList object? Is there a better way to go about this? Thanks in advance!
Update
Thanks to #GordonShumway for solving my first issue. Now, when I define plots as plots <- loopGraph(df, "a", "numeric"), I can view all the outputs. However, all the graphs are of the first ggplot feature, even though the labels change. Any intuition as to why this is happening? Or how to fix it? I tried adding dev.set(dev.next()) to no avail.
I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
Boxplot Format
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots
I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
I kept most of your code the same and only changed what needed to be changed.
myboxplot2 <- function(mydata = df, myexposure, myoutcome){
bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
geom_boxplot()
print(bp)
}
myboxplot2(myexposure = "label", myoutcome = "y")
Because aes() uses non-standard evaluation, you need to use aes_(). Again, read the links above.
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did. I am only storing the plot object though.
allplots <- list()
for (i in seq_along(outcomes)){
allplots[[i]]<- myboxplot2(myexposure = "label", myoutcome = outcomes[i])$plot
}
allplots
I'm trying to make multiple ggplot charts from multiple data frames. I have developed the code below but the final loop doesn't work.
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x) {
x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b)) +
ggsave(paste0(substitute(x),".png"))
}
ll <- list(df1,df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]])
}
I know its something to do with
ll[[i]]
but I dont understand why because when I put that in the console it gives the dataframe I want. Also, is there a way do this the tidyverse way with the map functions instead of a for loop?
I assume you want to see two files called df1.png and df2.png at the end.
You need to somehow pass on the names of the dataframes to the function. One way of doing it would be through named list, passing the name along with the content of the list element.
library(ggplot2)
library(purrr)
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x, nm) {
p <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
ggsave(paste0(nm,".png"), p, device = "png")
}
ll <- list(df1=df1,df2=df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]], names(ll[i]))
}
In tidyverse you could just replace the loop with the following command without modifying the function.
purrr::walk2(ll, names(ll),chart_it)
or simply
purrr::iwalk(ll, chart_it)
There's also imap and lmap, but they will leave some output in the console, which is not what you would like to do, I guess.
The problem is in your chart_it function. It doesn't return a ggplot. Try saving the result of the pipe into a variable and return() that (or place it as the last statement in the function).
Something along the lines of
chart_it <- function(x) {
chart <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
ggsave(paste0(substitute(x),".png")) # this will save the last ggplot figure
return(chart)
}
My toy dataframe:
d <- data.frame(
value = sample(1:10),
class = sample(c("a","b"), 20, replace = TRUE)
)
I split my data frame up by values of 'class' and put them in a list where each list element is named after its class:
l <- dlply(d, .(class), function(x)return(x))
Then I want to lapply over each class and make a histogram. Note that I do NOT want a facet. I want as many individual files saved as classes. So I define a function doPlots that makes histograms, then ggsaves them (as a_hist.png and b_hist.png, in this example):
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name,"hist.png",sep="_"))
}
However, when I lapply:
lapply(l, FUN=doPlots, name=names(l))
I get Error: device must be NULL, a string or a function.
Thanks in advance.
Two problems with your code, one is that you are passing the entire vector of names to the function. Second, you have not added a plot to save to the ggsave function. You can use mapply to iterate over two or more lists.
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name, "hist.png", sep="_"), g)
}
mapply(doPlots, l, names(l))
Consider base R's by which slices a dataframe by factor levels into a list of elements. You can even pass your user-defined function into it all in one call:
dlist <- by(d, d$class, FUN=function(i) {
name <- max(as.character(i$class))
doPlots(i, name)
})
I have a dataframe with lots of possible combinations of variables and for exploratory purposes I need to see univariate distributions from these combinations of variables. I succeeded doing it with for loops but would like to find a better and a faster way of doing it. Anybody has an idea?
I have produced a following code:
library(ggplot2)
library(dplyr)
SubjectID <- c(3772113,3772468)
Group <- c("Easy","Hard")
Object <- c("A","B")
dat <- data.frame(expand.grid(SubjectID,Group,Object))
dat$RT <- rnorm(8,1500,700)
colnames(dat) <- c("SubjectID","Group","Object","RT")
# GGplot function
pl <- function(x,group, object){
x <- filter(x, Group==group, Object==object)
print(ggplot(x,aes(x=RT)) +
geom_histogram(binwidth = 0.05) +
xlab("Reactions per second") +
ggtitle(paste(as.character(group),"_",as.character(object)), sep=""))
ggsave(paste(as.character(group),"_",as.character(object),".png"), path = "...")
}
for (group in unique(dat$Group)){
for (object in unique(dat$Object)){
pl(dat,group,object)
}
}
How can I replace the nested for loops in this graph printing?
You can try with lapply:
all_comb <- with(dat, expand.grid(levels(Group), levels(Object)))
lapply(1:nrow(all_comb),
function(i) pl(dat, group = all_comb[i, 1], object=all_comb[i, 2]))