Iterate through list of DFs, grab their listname and apply a function? - r

I have a list of data frames which are named correctly in the list. I want to create circos plots using those data frames and save them using a for loop.
Here is the script where I create the plot:
jpeg("df-name.jpeg")
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$cell.ylim[2] + mm_y(5),
CELL_META$sector.index)
circos.axis(labels.cex = 0.6)
})
dev.off()
I think I need to do something like this but it doesn't work because it can't grab list names:
for (df in dflist) {
jpeg(paste0(df, ".jpeg"))
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$cell.ylim[2] + mm_y(5),
CELL_META$sector.index)
circos.axis(labels.cex = 0.6)
})
dev.off()
}
Although this grabs list names, it can't grab data frames:
dfnames <- names(dflist)
for (i in seq_along(dflist)) {
print(dfnames[i])
}
How can I access both the list names and the data frames in a for loop in R? I would very much appreciate any answers.
I found the answer, and it was quite simple:
for (name in names(dflist)) {
df <- dflist[[name]]
drawCircos(df, name)
}
I wrapped the plotting script in a function, drawCircos(df, name), and applied the for loop above.

We can loop over the names of dflist, extract each data frame into a temporary object df, and reuse the same code:
for (nm in names(dflist)) {
jpeg(paste0(nm, ".jpeg"))
df <- dflist[[nm]]
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$cell.ylim[2] + mm_y(5),
CELL_META$sector.index)
circos.axis(labels.cex = 0.6)
})
dev.off()
}
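The same idea can also be written with base R's Map, which hands the function one data frame and one name per call. This is only a sketch: draw_plot and the example data are hypothetical, plot() stands in for the circos.* calls, and pdf() replaces jpeg() purely because it is available in every R build.

```r
# Stand-in data: two small data frames in a named list (hypothetical)
dflist <- list(
  first  = data.frame(x = 1:5, y = c(2, 4, 1, 5, 3)),
  second = data.frame(x = 1:5, y = c(5, 3, 4, 1, 2))
)

# Hypothetical helper: draws one data frame and saves it under the given name.
# plot() is a placeholder for the circos.initialize()/circos.track() calls.
draw_plot <- function(df, name, out_dir = tempdir()) {
  path <- file.path(out_dir, paste0(name, ".pdf"))
  pdf(path)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(df$x, df$y, main = name)
  path
}

# Map() pairs each data frame with its name, one pair per call
saved <- Map(draw_plot, dflist, names(dflist))
print(names(saved))
```

Wrapping the device handling in a function keeps the loop body to a single call, which is essentially what the asker's drawCircos(df, name) does.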

Related

R: How to carry over colnames attribute using sapply?

The following toy code yields a density plot for each column of the y data frame. However, sapply does not carry over the column names. I'd like to title each plot with the name of the column its data comes from. Any help is appreciated!
y <- data.frame(sample(1:50), sample(1:50), sample(1:50))
colnames(y) <- c("col1", "col2", "col3")
toy.func <- function(y) {
X11()
plot = plot(density(y), main = colnames(y))
return(plot)
}
result <- sapply(y, toy.func)
You are right, and it makes sense: y is treated as a list, and sapply iterates over its elements, which are plain vectors. A vector passed this way does not carry its column name, so colnames(y) inside the function returns NULL. A minimal deviation from your approach that achieves what you want is to use mapply:
toy.func <- function(y, name) {
X11()
plot = plot(density(y), main = name)
return(plot)
}
mapply(toy.func, y, colnames(y))
It applies toy.func by taking one element from y and one from colnames(y) in every step.
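To see that pairing in isolation, here is a small sketch with no plotting involved. The data frame and the label_sum function are made up for illustration.

```r
# Hypothetical two-column data frame with fixed values
y <- data.frame(col1 = 1:3, col2 = 4:6)

# The function receives one column and the matching name on each call
label_sum <- function(column, name) paste0(name, ": ", sum(column))

res <- mapply(label_sum, y, colnames(y))
print(res)
```

Each element of res is built from one column and its own name, so the first entry reads "col1: 6" and the second "col2: 15".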
Another option is to iterate over the column names while passing the data frame as an additional argument:
toy.func <- function(name, data) {
X11()
plot = plot(density(data[, name]), main = name)
return(plot)
}
sapply(colnames(y), toy.func, y)
Also note that, in this case, your function can be simplified to:
toy.func <- function(name, data) {
X11()
plot(density(data[, name]), main = name)
}

Sending dataframes within list to a plot function

I'm trying to make multiple ggplot charts from multiple data frames. I have developed the code below but the final loop doesn't work.
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x) {
x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b)) +
ggsave(paste0(substitute(x),".png"))
}
ll <- list(df1,df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]])
}
I know it's something to do with ll[[i]], but I don't understand why, because when I put that in the console it gives the data frame I want. Also, is there a way to do this the tidyverse way, with the map functions instead of a for loop?
I assume you want to see two files called df1.png and df2.png at the end.
You need to somehow pass the names of the data frames to the function. One way to do this is with a named list, passing each element's name along with its content.
library(ggplot2)
library(purrr)
library(tibble) # provides tibble()
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x, nm) {
p <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
ggsave(paste0(nm,".png"), p, device = "png")
}
ll <- list(df1=df1,df2=df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]], names(ll[i]))
}
In the tidyverse, you could replace the loop with the following command without modifying the function:
purrr::walk2(ll, names(ll),chart_it)
or simply
purrr::iwalk(ll, chart_it)
There are also imap and lmap, but they will leave some output in the console, which is probably not what you want.
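A likely reason the original version cannot produce df1.png and df2.png: substitute(x) returns the expression written at the call site, not the name the data frame had when it was created. The sketch below shows this with a hypothetical helper, show_expr, and no plotting.

```r
# Hypothetical helper that reports the expression it was called with
show_expr <- function(x) deparse(substitute(x))

df1 <- data.frame(a = 1)
ll <- list(df1 = df1)
i <- 1

direct  <- show_expr(df1)      # called with the variable itself
in_loop <- show_expr(ll[[i]])  # called the way the for loop calls it

print(direct)   # "df1"
print(in_loop)  # "ll[[i]]"
```

Inside the loop the function only ever sees the expression ll[[i]], which is why the name has to be passed in explicitly.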
The problem is in your chart_it function. It doesn't return a ggplot. Try saving the result of the pipe into a variable and return() that (or place it as the last statement in the function).
Something along the lines of
chart_it <- function(x) {
chart <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
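# caveat: when called as chart_it(ll[[i]]), substitute(x) is the expression ll[[i]], not a data frame name
ggsave(paste0(substitute(x),".png")) # this will save the last ggplot figure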
return(chart)
}

lapply when both list elements and their names are arguments of the function

My toy dataframe:
d <- data.frame(
value = sample(1:10),
class = sample(c("a","b"), 20, replace = TRUE)
)
I split my data frame up by values of 'class' and put them in a list where each list element is named after its class:
l <- dlply(d, .(class), function(x)return(x))
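If plyr is not available, base R's split should give a roughly equivalent named list (without the extra attributes dlply attaches). The data below is a deterministic stand-in for the toy data frame, chosen so the result is predictable.

```r
# Stand-in for the toy data frame, with fixed values instead of sample()
d <- data.frame(
  value = rep(1:10, times = 2),
  class = rep(c("a", "b"), each = 10)
)

# One data frame per class, named after the class
l <- split(d, d$class)
print(names(l))
```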
Then I want to lapply over each class and make a histogram. Note that I do NOT want a facet. I want as many individual files saved as classes. So I define a function doPlots that makes histograms, then ggsaves them (as a_hist.png and b_hist.png, in this example):
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name,"hist.png",sep="_"))
}
However, when I lapply:
lapply(l, FUN=doPlots, name=names(l))
I get Error: device must be NULL, a string or a function.
Thanks in advance.
There are two problems with your code. First, you are passing the entire vector of names to the function on every call. Second, you have not passed the plot to ggsave. You can use mapply to iterate over two or more lists in parallel.
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name, "hist.png", sep="_"), g)
}
mapply(doPlots, l, names(l))
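The first problem is easy to see without any plotting. In this sketch, show_name is a hypothetical function that simply reports what it received as name.

```r
l <- list(a = 1:3, b = 4:6)

# Hypothetical function: collapses whatever arrives in `name` into one string
show_name <- function(d, name) paste(name, collapse = "+")

# lapply passes the whole names vector to every call
via_lapply <- lapply(l, show_name, name = names(l))

# mapply passes one name per element
via_mapply <- mapply(show_name, l, names(l))

print(unlist(via_lapply))
print(via_mapply)
```

With lapply both calls see "a+b", while with mapply the calls see "a" and "b" respectively.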
Consider base R's by, which slices a data frame by factor levels into a list of elements. You can even pass your user-defined function to it, all in one call:
dlist <- by(d, d$class, FUN=function(i) {
name <- max(as.character(i$class))
doPlots(i, name)
})

When Using Map(), How to Select the Name of Each Column?

I intend to write a function that plots a histogram for numeric columns in the dataframe using R. However, the problem is I do not know how to choose the name of that column as histogram's title. For example, the title of "age" column should be "Histogram of age". Can you guys give me some advice? Thanks very much.
# Plot histograms for x
hist_numeric <- function(x){
if (is.numeric(x) | is.integer(x)){
hist(x, main = "???")
} else {
message("Not integer or numeric variable")
}
}
# plot histograms for every column in the dataframe
map(df, hist_numeric)
You can use colnames to collect the column names into a vector before the function call, and then use map2 to pass both the column and its name to the function:
hist_numeric <- function(x, name){
if (is.numeric(x)) { # is.numeric() is also TRUE for integer vectors
hist(x, main = paste("Histogram of", name))
} else {
message("Not integer or numeric variable")
}
}
df <- data.frame(x = rnorm(50),
y = letters[1:10],
z = runif(50))
names_col <- colnames(df)
map2(.x = df, .y = names_col, .f = hist_numeric)
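A base R variant of the same idea, sketched under a few assumptions: the example data frame is made up, non-numeric columns are dropped up front with Filter instead of being reported with a message, and pdf(NULL) is used only to draw without opening a window or writing a file.

```r
# Hypothetical data frame with one non-numeric column
df <- data.frame(
  age    = c(21, 35, 48, 52, 29),
  name   = c("a", "b", "c", "d", "e"),
  income = c(10.5, 20, 31, 27.5, 18)
)

# Keep only the numeric columns, then build one title per column
numeric_cols <- Filter(is.numeric, df)
titles <- paste("Histogram of", names(numeric_cols))

pdf(NULL)  # null device: plots are drawn but nothing is saved
invisible(Map(function(x, title) hist(x, main = title), numeric_cols, titles))
invisible(dev.off())

print(titles)
```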

How to simplify functions in a for loop

I have a for loop containing the following:
for (i in 1:100) {
#calculate correlation
correlationList1a[[i]] <- sapply(seq(1,14),
function(x) cor(validationSetsA.list[[i]][,x], medianListA[[i]]))
correlationList2a[[i]] <- sapply(seq(1,14),
function(x) cor(validationSetsA.list[[i]][,x], medianListB[[i]]))
}
How can I simplify this? correlationList1a and correlationList2a do basically the same thing; the only difference is that correlationList1a uses medianListA and correlationList2a uses medianListB.
It looks like this is a case for mapply.
mapply(function(x, y) apply(x[,seq(1,14)], 2, cor, y=y),
x = validationSetsA.list,
y = medianListA,
SIMPLIFY = FALSE)
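Since the two results differ only in which list of medians is used, one way to remove the duplication is a small helper called once per median list. This is a sketch on synthetic stand-in data; cor_with_medians and the lower-case variable names are hypothetical, and the sizes (3 sets, 20 rows) are arbitrary.

```r
set.seed(42)  # only so the stand-in data is reproducible
n_sets <- 3

# Stand-ins for validationSetsA.list, medianListA and medianListB
validation_sets <- replicate(n_sets, matrix(rnorm(20 * 14), ncol = 14), simplify = FALSE)
medians_a <- replicate(n_sets, rnorm(20), simplify = FALSE)
medians_b <- replicate(n_sets, rnorm(20), simplify = FALSE)

# Hypothetical helper: correlate columns 1 to 14 of each set with its median vector.
# cor() on a matrix and a vector returns a 14 x 1 matrix; drop() makes it a vector.
cor_with_medians <- function(sets, medians) {
  Map(function(x, y) drop(cor(x[, 1:14], y)), sets, medians)
}

correlation_a <- cor_with_medians(validation_sets, medians_a)
correlation_b <- cor_with_medians(validation_sets, medians_b)

print(length(correlation_a))       # one element per validation set
print(length(correlation_a[[1]]))  # one correlation per column
```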
