Continuing my quest to work with functions and ggplot:
I have sorted out a basic way to use lapply and ggplot to cycle through a list of y columns and make individual plots:
require(ggplot2)
# using lapply with ggplot
df <- data.frame(x=c("a", "b", "c"), col1=c(1, 2, 3), col2=c(3, 2, 1), col3=c(4, 2, 3))
cols <- colnames(df[2:4])
myplots <- vector('list', 3)
plot_function <- function(y_column, data) {
ggplot(data, aes_string(x="x", y=y_column, fill = "x")) +
geom_col() +
labs(title=paste("lapply:", y_column))
}
myplots <- lapply(cols, plot_function, df)
myplots[[3]]
I now want to bring in a second variable that I will use to select rows. In my minimal example I am skipping the selection and just reusing the same plots and data frame as before; I simply add three iterations. So I would like to generate the same three plots as above, but now labelled as iteration A, B, and C.
It took me a while to sort out the syntax, but I now get that mapply needs two vectors of identical length that get passed on to the function as matched pairs. So I am using expand.grid to generate all pairs of variable 1 and variable 2 as a data frame, and then pass its first and second columns on via mapply. The next problem to sort out was that I need to pass the data frame as a list via MoreArgs =. So it seems like everything should be good to go. I am using the same syntax for aes_string() as in my lapply example above.
However, for some reason it is now not evaluating y_column properly: it simply takes it as a value to plot, not as an indicator to plot the values contained in df$col1.
HELP!
require(ggplot2)
# using mapply with ggplot
df <- data.frame(x=c("a", "b", "c"), col1=c(1, 2, 3), col2=c(3, 2, 1), col3=c(4, 2, 3))
cols <- colnames(df[2:4])
iteration <- c("Iteration A", "Iteration B", "Iteration C")
multi_plot_function <- function(y_column, iteration, data) {
plot <- ggplot(data, aes_string(x="x", y=y_column, fill = "x")) +
geom_col() +
labs(title=paste("mapply:", y_column, "___", iteration))
}
# mapply call
combo <- expand.grid(cols=cols, iteration=iteration)
myplots <- mapply(multi_plot_function, combo[[1]], combo[[2]], MoreArgs = list(df), SIMPLIFY = F)
myplots[[3]]
We may need to use rowwise here
out <- lapply(asplit(combo, 1), function(x)
multi_plot_function(x[1], x[2], df))
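In the OP's code, the only issue is that the columns of 'combo' are factors (expand.grid() converts character vectors to factors by default), so aes_string() does not receive a character string and the column name is not parsed. If we convert them to character, it works: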
out2 <- mapply(multi_plot_function, as.character(combo[[1]]),
as.character(combo[[2]]), MoreArgs = list(df), SIMPLIFY = FALSE)
# testing
out2[[1]]
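An alternative to converting afterwards is to stop expand.grid() from creating factors in the first place. The sketch below is a minimal illustration, not the OP's full plotting code: it uses only base R, and the pasted titles stand in for the plots.

```r
cols <- c("col1", "col2", "col3")
iteration <- c("Iteration A", "Iteration B", "Iteration C")

# Default: character inputs become factor columns
combo_default <- expand.grid(cols = cols, iteration = iteration)
sapply(combo_default, class)

# stringsAsFactors = FALSE keeps them as character vectors
combo_chr <- expand.grid(cols = cols, iteration = iteration,
                         stringsAsFactors = FALSE)
sapply(combo_chr, class)

# mapply now hands plain strings to the function, one matched pair per call
titles <- mapply(function(y_column, iteration) paste(y_column, "___", iteration),
                 combo_chr$cols, combo_chr$iteration, USE.NAMES = FALSE)
```

With character columns, the same mapply call as in the question can pass combo_chr[[1]] and combo_chr[[2]] directly.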
reprod:
df1 <- data.frame(X = c(0:9), Y = c(10:19))
df2 <- data.frame(X = c(0:9), Y = c(10:19))
df3 <- data.frame(X = c(0:9), Y = c(10:19))
list_of_df <- list(A = df1, B = df2, C = df3)
list_of_df
I'm trying to apply the rollmean function from zoo to every 'Y' column in this list of data frames.
I've tried lapply with no success. It seems that no matter which way I spin it, there is no way to get around specifying the data frame you want to apply it to at some point.
This works for one of the data frames:
roll_mean <- rollmean(list_of_df$A, 2)
roll_mean
obviously this doesn't work:
roll_mean1 <- rollmean(list_of_df, 2)
roll_mean1
I also tried this:
Subset (may not be necessary):
Sub1 <- lapply(list_of_df, "[", 2)
roll_mean1 <- rollmean(Sub1, 2)
roll_mean1
There doesn't seem to be a way to do it without having to specify the particular data frame in the rollmean function:
lapply(list_of_df), function(x) rollmean(list_of_df, 2))
A for loop? Also no success:
For (i in list_of_df) {roll_mean1 <- rollmean(Sub1, 2)
Exp
}
Stating the obvious but I'm very new to coding in general and would appreciate some pointers.
It has occurred to me that even if it did work, the column that has been averaged would be one value longer than the rest of the dataframe; how would I get around that?
The question says at one point that rollmean should be applied only to Y, and at another point that roll_mean <- rollmean(list_of_df$A, 2) works, but that call processes all columns.
1) Assuming that you want to apply rollmean to all columns:
Use lapply like this:
lapply(list_of_df, rollmean, 2)
This also works:
for(i in seq_along(list_of_df)) list_of_df[[i]] <- rollmean(list_of_df[[i]], 2)
2) If you only want to apply it to the Y column:
lapply(list_of_df, transform, Y = rollmean(Y, 2, fill = NA))
or
for(i in seq_along(list_of_df)) {
list_of_df[[i]]$Y <- rollmean(list_of_df[[i]]$Y, 2, fill = NA)
}
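On the length question: the sketch below, assuming the zoo package is installed, shows on a plain vector why fill = NA is used in option 2. Without it the result is shorter than the input; with it the result is padded back to the input length, so it fits into the data frame.

```r
library(zoo)

y <- 10:19

# A window of 2 over 10 values gives 9 means
short <- rollmean(y, 2)
length(short)

# fill = NA pads the result to the length of the input
padded <- rollmean(y, 2, fill = NA)
length(padded)
```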
I have a dataset with numeric and factor variables. I want to make one page of plots for the numeric variables and another for the factor variables. First of all, I select the factor variables and their indices.
My df is the iris dataset.
df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)
Then I calculate the number of remaining (numeric) variables:
nm<-ncol(df)-length(fact)
The next step is to create the loop:
i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()
for (i in 1:length(df)){
plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i])
if (is.factor(df[,i])){
p_factor<-plot+geom_bar()
list_plotF[[i_F]]<-p_factor
i_F=i_F+1
}else{
p_numeric <- plot+geom_histogram()
list_plotN[[i_N]]<-p_numeric
i_N=i_N+1
}
}
When I look at list_plotF and list_plotN, the result is not right: every plot shows the same variable. I don't know what I'm doing wrong.
thanks!!!
I don't really follow your for loop all that well, but from what I can see it seems to be saving the last plot in every iteration. I've reconstructed what I think you need using lapply. I generally prefer lapply to for loops whenever I can.
lapply takes a list of values and a function and applies that function to every value. You can define your function separately, as I have, so everything looks cleaner. Then you just name the function in the lapply call.
In our case the list is the list of column names of your data frame df. The function first creates the base plot, then checks whether the column it is looking at is a factor. If it is, it creates a bar chart; otherwise it creates a histogram.
histOrBar <- function(var) {
basePlot <- ggplot(df, aes_string(var))
if ( is.factor(df[[var]]) ) {
basePlot + geom_bar()
} else {
basePlot + geom_histogram()
}
}
loDFs <- lapply(colnames(df), histOrBar)
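If the two separate lists from the question are still wanted, the column names can be split by type first and each group passed to lapply. This is a sketch under a few assumptions: it rebuilds the iris-based df from the question, leaves out the colour and fill mappings to stay short, and uses the .data pronoun in aes() instead of aes_string(), which recent ggplot2 versions mark as deprecated. plot_column is a hypothetical helper name.

```r
library(ggplot2)

df <- iris
df$y <- sample(0:1, nrow(iris), replace = TRUE)

# Split the column names by type
is_fact <- sapply(df, is.factor)
fact_cols <- names(df)[is_fact]
num_cols <- names(df)[!is_fact]

# Hypothetical helper: one plot per column name, geom chosen by the caller
plot_column <- function(var, geom) {
  ggplot(df, aes(x = .data[[var]])) + geom() + xlab(var)
}

list_plotF <- lapply(fact_cols, plot_column, geom = geom_bar)
list_plotN <- lapply(num_cols, plot_column, geom = geom_histogram)
names(list_plotF) <- fact_cols
names(list_plotN) <- num_cols
```

Because each plot is built from a column name rather than from the loop index, every plot keeps its own variable.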
Consider passing column names with aes_string to better align x with df:
for (i in 1:length(df)){
plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) +
xlab(names(df)[i])
...
}
To demonstrate the problem using aes() and solution using aes_string() in OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.
Data
library(ggplot2)
set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")
random_df <- data.frame(
group = sample(data_tools, 500, replace=TRUE),
int = as.numeric(sample(1:15, 500, replace=TRUE)),
num = rnorm(500),
char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()),
500, replace=TRUE), origin='1970-01-01')
)
Graph
fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)
i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL
for (i in 1:length(random_df)){
# aes() VERSION
#plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
# xlab(names(random_df)[i])
# aes_string() VERSION
plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
xlab(names(random_df)[i])
if (is.factor(random_df[,i])){
p_factor <- plot + geom_bar()
list_plotF[[i_F]] <- p_factor
i_F=i_F+1
}else{
p_numeric <- plot + geom_histogram()
list_plotN[[i_N]] <- p_numeric
i_N=i_N+1
}
}
Problem (using aes() where graph outputs DO NOT change according to type)
Solution (using aes_string() where graphs DO change according to type)
I'm building a function that performs some actions on a dataset, and I want it to return a variety of plots that may or may not be needed at any given time.
My approach so far is to return a list of objects, including some ggplot objects.
The issue is that when I call the function without assignment, or execute the resulting list, the ggplots are plotted alongside the printing of the list summary, a behaviour I want to avoid as the object may include many ggplot objects.
Example:
library(ggplot2)
df <- data.frame(
x = 1:10,
y = 10:1,
g = rep(c('a', 'b'), each=5)
)
df_list <- split(df, df$g)
plot_list <- lapply(df_list, function(d){
ggplot(d) +
geom_point(aes(x=x, y=y))
})
plot_list
# $`a` # plots plot_list$a
#
# $`b` # plots plot_list$b
I don't want to modify the default behaviour of ggplot2 objects, though I am open to a more advanced S3 solution; I just have no idea how to go about it.
You can simply override the default behaviour by having your function return an object from a custom class.
Option 1: subclass your plots
Here the plots get the additional class quiet_plot, whose print method does not draw them.
To actually draw them you'll have to explicitly call print.ggplot.
library(ggplot2)
df <- data.frame(
x = 1:10,
y = 10:1,
g = rep(c('a', 'b'), each=5)
)
df_list <- split(df, df$g)
plot_list <- lapply(df_list, function(d){
out <- ggplot(d) +
geom_point(aes(x=x, y=y))
class(out) <- c("quiet_plot", class(out))
out
})
print.quiet_plot <- function(x, ...) {
print("A plot not displayed!")
}
plot_list
Option 2: subclass your list
This allows you to specify how you want the list to be printed when you just type plot_list in the console. Here I had it print the names of the list instead of the full list content.
plot_list <- lapply(df_list, function(d){
ggplot(d) +
geom_point(aes(x=x, y=y))
})
class(plot_list) <- c("quiet_list", class(plot_list))
print.quiet_list <- function(x, ...) {
cat("A list with names:\n")
print(names(x))
}
plot_list
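The same S3 mechanism can be tried without ggplot2. The sketch below is base R only; quiet_list() is a hypothetical constructor added for illustration, and capture.output() is used just to show what the print method writes.

```r
# Hypothetical constructor: prepend a class, leave the contents alone
quiet_list <- function(x) {
  class(x) <- c("quiet_list", class(x))
  x
}

# Print method: summarise instead of printing every element
print.quiet_list <- function(x, ...) {
  cat("A quiet list with elements:", paste(names(x), collapse = ", "), "\n")
  invisible(x)
}

pl <- quiet_list(list(a = 1:3, b = letters))
printed <- capture.output(print(pl))

# Individual elements are unchanged and print as usual
pl[["a"]]
```

Returning invisible(x) from the print method follows the usual convention for print methods, so the object is not printed a second time.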
My toy dataframe:
d <- data.frame(
value = sample(1:10),
class = sample(c("a","b"), 20, replace = TRUE)
)
I split my data frame up by values of 'class' and put them in a list where each list element is named after its class:
l <- dlply(d, .(class), function(x)return(x))
Then I want to lapply over each class and make a histogram. Note that I do NOT want a facet. I want as many individual files saved as classes. So I define a function doPlots that makes histograms, then ggsaves them (as a_hist.png and b_hist.png, in this example):
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name,"hist.png",sep="_"))
}
However, when I lapply:
lapply(l, FUN=doPlots, name=names(l))
I get Error: device must be NULL, a string or a function.
Thanks in advance.
There are two problems with your code. First, you are passing the entire vector of names to the function on every call. Second, you have not given ggsave a plot to save. You can use mapply to iterate over two or more lists in parallel.
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name, "hist.png", sep="_"), plot=g)
}
mapply(doPlots, l, names(l))
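To see why passing names(l) through lapply's extra arguments fails, it can help to look at what each call receives. A base R sketch, with split() standing in for dlply and a hypothetical file_name() helper in place of the plotting and saving:

```r
d <- data.frame(
  value = rep(1:10, times = 2),
  class = rep(c("a", "b"), each = 10)
)
l <- split(d, d$class)

# Hypothetical stand-in for doPlots: only builds the file name
file_name <- function(d, name) paste(name, "hist.png", sep = "_")

# lapply passes the whole names vector to every call
via_lapply <- lapply(l, file_name, name = names(l))

# mapply pairs each data frame with its own name
via_mapply <- mapply(file_name, l, names(l))
```

Each element of via_lapply holds both file names, while via_mapply holds exactly one name per data frame.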
Consider base R's by, which slices a data frame by factor levels into a list of elements. You can even pass your user-defined function to it, all in one call:
dlist <- by(d, d$class, FUN=function(i) {
name <- max(as.character(i$class))
doPlots(i, name)
})
Here is a simple made up data set:
df1 <- data.frame(x = c(1,2,3),
y = c(4,6,8),
z= c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6),
y = c(3,4,9),
z= c(6, 7, 7))
What I want to do is to create a new variable "a" which is just the sum of all three variables (x,y,z)
Instead of doing this separately for each dataframe I thought it would be more efficient to just create a loop. So here is the code I wrote:
my.list<- list(df1, df2)
for (i in 1:2) {
my.list[i]$a<- my.list[i]$x +my.list[i]$y + my.list[i]$z
}
or alternatively
for (i in 1:2) {
my.list[i]<- transform(my.list[i], a= x+ y+ z)
}
In both cases it does not work and the error "number of items to replace is not a multiple of replacement length" is returned.
What would be the best solution to writing a loop code where I can loop through dataframes?
See ?Extract:
Recursive (list-like) objects
Indexing by [ is similar to atomic vectors and selects a list of the
specified element(s).
Both [[ and $ select a single element of the list.
In short, my.list[i] returns a list of length 1, and you are trying to assign it a data.frame, so that doesn't work; whereas my.list[[i]] returns the data.frame #i in your list, which you can replace with a data.frame.
So you can use either:
for (i in 1:2) {
my.list[[i]]$a<- my.list[[i]]$x +my.list[[i]]$y + my.list[[i]]$z
}
or
for (i in 1:2) {
my.list[[i]]<- transform(my.list[[i]], a= x+ y+ z)
}
But it would be even simpler to use lapply, where you don't need [[:
my.list <- lapply(my.list, function(df) { df$a <- df$x + df$y + df$z; df })
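Two details are easy to check in the console: the difference between [ and [[, and the fact that the function given to lapply has to return the whole data frame, since an assignment on its own evaluates to the assigned value. A small base R sketch using the data from the question:

```r
df1 <- data.frame(x = c(1, 2, 3), y = c(4, 6, 8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3, 5, 6), y = c(3, 4, 9), z = c(6, 7, 7))
my.list <- list(df1, df2)

class(my.list[1])    # "list"
class(my.list[[1]])  # "data.frame"

# Ending the function with df returns the modified data frame
with_df <- lapply(my.list, function(df) {
  df$a <- df$x + df$y + df$z
  df
})

# Ending on the assignment returns only the new column's values
only_a <- lapply(my.list, function(df) df$a <- df$x + df$y + df$z)
```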
Rather than using an explicit loop to extract the data.frames from the list, just use lapply. It takes a list of data.frames (or any object) and a function, applies the function to every element of the list, and returns a list with the results.
# Sample data
df1 <- data.frame(x = c(1,2,3), y = c(4,6,8), z = c(1, 6, 7))
df2 <- data.frame(x = c(3,5,6), y = c(3,4,9), z = c(6, 7, 7))
# Put them in a list
df_list <- list(df1, df2)
# Use lapply to iterate. FUN takes the function you want, and
# then its arguments (a = x + y + z) are just listed after it.
result_list <- lapply(df_list, FUN = transform, a = x + y + z)