R: How to carry over colnames attribute using sapply?

The following toy code yields a density plot for each column of the data frame y. However, sapply does not carry over the column names. I'd like to title each plot with the name of the column its data comes from. Any help is appreciated!
y <- data.frame(sample(1:50), sample(1:50), sample(1:50))
colnames(y) <- c("col1", "col2", "col3")
toy.func <- function(y) {
X11()
plot = plot(density(y), main = colnames(y))
return(plot)
}
result <- sapply(y, toy.func)

You are right, and it makes sense: y is treated as a list, and sapply iterates over its elements, which are plain vectors. Inside toy.func each vector no longer knows which column it came from, so colnames(y) returns NULL there. A minimal deviation from your approach that achieves what you want is to use mapply:
toy.func <- function(y, name) {
X11()
plot = plot(density(y), main = name)
return(plot)
}
mapply(toy.func, y, colnames(y))
It applies toy.func by taking one element from y and one from colnames(y) in every step.
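To make the pairing concrete without opening a graphics device, here is a minimal sketch. The function describe and the small data frame are hypothetical stand-ins for toy.func and y; the point is only that mapply hands over one column and one name per call.

```r
# Hypothetical stand-in for the data frame in the question.
y <- data.frame(col1 = 1:3, col2 = 4:6)

# Receives one column and the matching column name on each call.
describe <- function(values, name) {
  paste0(name, ": mean = ", mean(values))
}

# mapply walks over y and colnames(y) in parallel.
labels <- mapply(describe, y, colnames(y))
print(labels)
```

Because the first argument (y) is named, the result is also named by column.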
Another option is to iterate over the column names and pass the data frame as an additional argument:
toy.func <- function(name, data) {
X11()
plot = plot(density(data[, name]), main = name)
return(plot)
}
sapply(colnames(y), toy.func, y)
Also note that, in this case, your function can be simplified to:
toy.func <- function(name, data) {
X11()
plot(density(data[, name]), main = name)
}

Related

Iterate through list of DFs, grab their listname and apply a function?

I have a list of data frames which are named correctly in the list. I want to create circos plots using those data frames and save them using a for loop.
Here is the script where I create the plot:
jpeg("df-name.jpeg")
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$cell.ylim[2] + mm_y(5),
CELL_META$sector.index)
circos.axis(labels.cex = 0.6)
})
dev.off()
I think I need to do something like this but it doesn't work because it can't grab list names:
for (df in dflist) {
jpeg(paste0(df, ".jpeg"))
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$cell.ylim[2] + mm_y(5),
CELL_META$sector.index)
circos.axis(labels.cex = 0.6)
})
dev.off()
}
Although this grabs list names, it can't grab data frames:
dfnames <- names(dflist)
for (i in seq_along(dflist)) {
print(dfnames[i])
}
How can I both grab list names and data frames and use them in a for loop in R? I appreciate very much the answers.
I found the answer; it was that simple:
for (name in names(dflist)) {
df <- dflist[[name]]
drawCircos(df, name)
}
I wrapped the script in a function (drawCircos) and applied the for loop above.
We may loop over the names of dflist, subset the list by name to create a temporary object df, and then reuse the same code:
for (nm in names(dflist)) {
jpeg(paste0(nm, ".jpeg"))
df <- dflist[[nm]]
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
panel.fun = function(x, y) {
circos.text(CELL_META$xcenter,
CELL_META$cell.ylim[2] + mm_y(5),
CELL_META$sector.index)
circos.axis(labels.cex = 0.6)
})
dev.off()
}
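The same name-driven loop can also be written with Map, which hands each data frame and its name to a function in lockstep. The sketch below is an illustration under assumptions, not the asker's actual drawCircos: save_plot is a hypothetical helper, base plot() stands in for the circlize calls, and pdf() with tempdir() replaces jpeg() so the example runs without extra packages.

```r
# Hypothetical list of named data frames.
dflist <- list(
  first  = data.frame(x = 1:5, y = (1:5)^2),
  second = data.frame(x = 1:5, y = 5:1)
)

# Hypothetical helper: draw one data frame and save it under its list name.
save_plot <- function(df, name, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".pdf"))
  pdf(path)
  on.exit(dev.off())            # close the device even if plotting fails
  plot(df$x, df$y, main = name)
  path
}

# Map pairs each data frame with its name and returns a named list of paths.
saved <- Map(save_plot, dflist, names(dflist))
```

Wrapping the plotting code in a function keeps the device handling in one place, so the loop itself only has to supply the data and the name.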

Save ggplot in loop with R

I have a dataset with numeric and factor variables. I want one page of plots for the numeric variables and another for the factor variables. First, I select the factor variables and their indices.
My data frame is the iris dataset:
df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)
Then I calculate the number of remaining (numeric) columns:
nm<-ncol(df)-length(fact)
The next step is to create the loop:
i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()
for (i in 1:length(df)){
plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i])
if (is.factor(df[,i])){
p_factor<-plot+geom_bar()
list_plotF[[i_F]]<-p_factor
i_F=i_F+1
}else{
p_numeric <- plot+geom_histogram()
list_plotN[[i_N]]<-p_numeric
i_N=i_N+1
}
}
When I look at list_plotF and list_plotN, they are not right: every plot shows the same variable. I don't know what I'm doing wrong.
thanks!!!
I don't really follow your for loop code all that well, but from what I can see it seems to end up with the last plot on every iteration. I've reconstructed what I think you need using lapply, which I generally prefer to for loops whenever I can.
lapply takes a list of values and a function and applies that function to every value. You can define the function separately, as I have, so everything looks cleaner, and then just pass it by name to lapply.
In our case the list is the column names of your data frame df. The function first creates the base plot, then checks whether the column it is looking at is a factor. If it is, it creates a bar chart; otherwise it creates a histogram.
histOrBar <- function(var) {
basePlot <- ggplot(df, aes_string(var))
if ( is.factor(df[[var]]) ) {
basePlot + geom_bar()
} else {
basePlot + geom_histogram()
}
}
loDFs <- lapply(colnames(df), histOrBar)
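Since aes_string() is deprecated in recent ggplot2 releases, a variant of the same idea can use the .data pronoun instead. This is a sketch under assumptions rather than the answerer's code: hist_or_bar and the explicit data argument are illustrative choices, and bins = 30 is set only to silence the default-bins message.

```r
library(ggplot2)

df <- iris

# Builds a bar chart for factor columns and a histogram otherwise.
# `var` is a column name (string); .data[[var]] looks it up in `data`.
hist_or_bar <- function(var, data) {
  base_plot <- ggplot(data, aes(x = .data[[var]])) + xlab(var)
  if (is.factor(data[[var]])) {
    base_plot + geom_bar()
  } else {
    base_plot + geom_histogram(bins = 30)
  }
}

# setNames(nm = ...) names the input so the resulting list is named by column.
plots <- lapply(setNames(nm = colnames(df)), hist_or_bar, data = df)
```

Individual plots can then be retrieved by column name, for example plots[["Species"]].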
Consider passing column names with aes_string to better align x with df:
for (i in 1:length(df)){
plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) +
xlab(names(df)[i])
...
}
To demonstrate the problem using aes() and solution using aes_string() in OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.
Data
library(ggplot2)
set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")
random_df <- data.frame(
group = sample(data_tools, 500, replace=TRUE),
int = as.numeric(sample(1:15, 500, replace=TRUE)),
num = rnorm(500),
char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()),
500, replace=TRUE), origin='1970-01-01'),
stringsAsFactors = TRUE  # R >= 4.0 no longer converts strings to factors by default
)
Graph
fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)
i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL
for (i in 1:length(random_df)){
# aes() VERSION
#plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
# xlab(names(random_df)[i])
# aes_string() VERSION
plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
xlab(names(random_df)[i])
if (is.factor(random_df[,i])){
p_factor <- plot + geom_bar()
list_plotF[[i_F]] <- p_factor
i_F=i_F+1
}else{
p_numeric <- plot + geom_histogram()
list_plotN[[i_N]] <- p_numeric
i_N=i_N+1
}
}
Problem (using aes() where graph outputs DO NOT change according to type)
Solution (using aes_string() where graphs DO change according to type)
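A likely reason the aes() version repeats the same variable is that aes() stores the expression random_df[,i] and only evaluates it when the plot is drawn, by which time i holds its final value. The base-R analogy below is a sketch of that deferred-evaluation effect only; it does not use ggplot2, and the names exprs, late and early are illustrative.

```r
# Store the unevaluated expression `i * 10` three times.
exprs <- list()
for (i in 1:3) {
  exprs[[i]] <- quote(i * 10)
}

# Evaluate after the loop has finished: i is 3 for every stored expression.
env <- environment()
late <- vapply(exprs, function(e) eval(e, envir = env), numeric(1))

# Resolving the value during each iteration keeps the results distinct.
early <- vapply(1:3, function(i) i * 10, numeric(1))

print(late)   # 30 30 30
print(early)  # 10 20 30
```

Passing the column name as a string, as aes_string() does, fixes the value at the time the plot object is created, which is why the second version behaves as expected.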

Using for() over variables that need to be changed

I'd like to be able to use a for() loop to automate the same operation over many variables, modifying them.
Here's the simplest example I could design:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10]) # assume that it has many more than just 3 elements
varToChange
for (i in varToChange ) {
if (is.character(i)) i <- as.integer(as.ordered(i))
if (is.factor(i)) i <- as.integer(i)
}
varToChange # <-- Here I want to see my elements as integers now
Here's the actual example that led me to this question, taken from: Best way to plot automatically all data.table columns using ggplot2
In the following function
f <- function(dt, x,y,k) {
if (is.numeric(x)) x <- names(dt)[x]
if (is.numeric(y)) y <- names(dt)[y]
if (is.numeric(k)) k <- names(dt)[k]
ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
Instead of repeating the same line many times, as a programmer I would rather have a loop do it for me.
Something like this one:
for (i in c(x,y,k)) {
if (is.numeric(i)) i <- names(dt)[i]
}
In C/C++ this would be done using pointers. Is it possible at all in R?
UPDATE: Very nice idea to use Map below. However it does not work for this example
getColName <- function(dt, x) {
if (is.numeric(x)) {
x <- names(dt)[x]
}
x
}
f<- function(dt, x,y,k) {
list(x,y,k) <- Map(getColName, list(x,y,k), dt)
# if (is.numeric(x)) x <- names(dt)[x]
# if (is.numeric(y)) y <- names(dt)[y]
# if (is.numeric(k)) k <- names(dt)[k]
ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2) # Brrr..
No need for a for loop; just Map a function over each of your list items:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
myfun <- function(y) {
if (is.character(y)) y <- as.integer(as.ordered(y))
if (is.factor(y)) y <- as.integer(y)
y
}
varToChange <- Map(myfun, varToChange)
UPDATE: Map never modifies variables in place; this is simply not done in R. Use the new values returned by Map:
f<- function(dt, x, y, k) {
args <- Map(function(x) getColName(dt, x), list(x=x,y=y,k=k))
ggplot(dt, aes_string(args$x,args$y, col=args$k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
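The difference between changing a loop variable and using returned values can be seen in a small sketch. The list values and the helper logic here are hypothetical; they only illustrate that the loop variable is a copy.

```r
values <- list(a = "x", b = 2L)

# The loop variable v is a copy, so this leaves `values` untouched.
for (v in values) {
  if (is.character(v)) v <- toupper(v)
}

# lapply (like Map) returns new values, which must be assigned to be kept.
converted <- lapply(values, function(v) {
  if (is.character(v)) toupper(v) else v
})

print(values$a)     # still "x"
print(converted$a)  # "X"
```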
You have two choices for iteration in R: iterate over the variables themselves, or over their indices. I generally recommend iterating over indices. This case illustrates a strong advantage of that approach: your question becomes a non-issue when you use indices.
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
for (i in seq_along(varToChange)) {
if (is.character(varToChange[[i]])) varToChange[[i]] <- as.integer(as.factor(varToChange[[i]]))
if (is.factor(varToChange[[i]])) varToChange[[i]] <- as.integer(varToChange[[i]])
}
I also replaced as.ordered() with as.factor(). Both assign the same integer codes; an ordered factor differs from a regular one in supporting level comparisons and in the default contrasts used in modeling. As you are just coercing to integer, it doesn't matter.
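A quick check of that claim, using a small hypothetical character vector:

```r
x <- c("b", "a", "c", "a")

# Levels are the sorted unique values in both cases, so the codes agree.
codes_factor  <- as.integer(as.factor(x))
codes_ordered <- as.integer(as.ordered(x))
print(identical(codes_factor, codes_ordered))  # TRUE

# Ordered factors additionally allow comparisons between levels.
ox <- as.ordered(x)
b_before_c <- ox[1] < ox[3]
print(b_before_c)  # TRUE
```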

lapply when both list elements and their names are arguments of the function

My toy dataframe:
d <- data.frame(
value = sample(1:10),
class = sample(c("a","b"), 20, replace = TRUE)
)
I split my data frame up by values of 'class' and put them in a list where each list element is named after its class:
l <- dlply(d, .(class), function(x)return(x))
Then I want to lapply over each class and make a histogram. Note that I do NOT want a facet. I want as many individual files saved as classes. So I define a function doPlots that makes histograms, then ggsaves them (as a_hist.png and b_hist.png, in this example):
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name,"hist.png",sep="_"))
}
However, when I lapply:
lapply(l, FUN=doPlots, name=names(l))
I get Error: device must be NULL, a string or a function.
Thanks in advance.
There are two problems with your code. First, name=names(l) passes the entire vector of names to every call instead of one name per element. Second, you never pass the plot g to ggsave. You can use mapply to iterate over two or more lists in parallel:
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name, "hist.png", sep="_"), plot=g)
}
mapply(doPlots, l, names(l))
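The first problem is easy to see without any plotting. In this sketch the list l and the anonymous functions are hypothetical; they only show what each call receives as name.

```r
l <- list(a = 1:3, b = 4:6)

# Extra arguments to lapply are passed whole to every call:
# each call sees both names, so length(name) is 2 both times.
seen_by_lapply <- lapply(l, function(d, name) length(name), name = names(l))

# mapply pairs one element with one name per call.
seen_by_mapply <- mapply(function(d, name) paste(name, sum(d)), l, names(l))

print(seen_by_lapply)
print(seen_by_mapply)
```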
Consider base R's by which slices a dataframe by factor levels into a list of elements. You can even pass your user-defined function into it all in one call:
dlist <- by(d, d$class, FUN=function(i) {
name <- max(as.character(i$class))
doPlots(i, name)
})

When Using Map(), How to Select the Name of Each Column?

I intend to write a function that plots a histogram for numeric columns in the dataframe using R. However, the problem is I do not know how to choose the name of that column as histogram's title. For example, the title of "age" column should be "Histogram of age". Can you guys give me some advice? Thanks very much.
# Plot histograms for x
hist_numeric <- function(x){
if (is.numeric(x) | is.integer(x)){
hist(x, main = "???")
} else {
message("Not integer or numeric variable")
}
}
# plot histograms for every column in the dataframe
map(df, hist_numeric)
You can use colnames to collect the column names before the function call and then use map2 to pass each column together with its name to the function:
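library(purrr)  # provides map2()
# Plot a histogram titled with the column name; skip non-numeric columns
hist_numeric <- function(x, name){
if (is.numeric(x) || is.integer(x)){
hist(x, main = paste("Histogram of", name))
} else {
message("Not integer or numeric variable")
}
}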
df <- data.frame(x = rnorm(50),
y = letters[1:10],
z = runif(50))
names_col <- colnames(df)
map2(.x = df, .y = names_col, .f = hist_numeric)
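purrr also offers imap(), which passes each element together with its name, so the separate vector of column names is not needed. The sketch below assumes purrr is installed; title_for and the small data frame are hypothetical, and it builds titles rather than drawing histograms so that it runs without a graphics device.

```r
library(purrr)

df <- data.frame(
  age  = c(21, 35, 48),
  city = c("Oslo", "Lima", "Pune")
)

# Receives a column and its name; returns a title only for numeric columns.
title_for <- function(x, name) {
  if (is.numeric(x)) paste("Histogram of", name) else NA_character_
}

# imap_chr calls title_for(column, column_name) for every column.
titles <- imap_chr(df, title_for)
print(titles)
```

The same pattern works for the plotting function: a function of the form function(x, name) can be passed to imap() or iwalk() directly.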
