When Using Map(), How to Select the Name of Each Column? - r

I want to write a function in R that plots a histogram for each numeric column of a data frame. The problem is that I do not know how to use the column's name as the histogram's title. For example, the title for the "age" column should be "Histogram of age". Can you guys give me some advice? Thanks very much.
# Plot histograms for x
hist_numeric <- function(x){
  if (is.numeric(x) | is.integer(x)){
    hist(x, main = "???")
  } else {
    message("Not an integer or numeric variable")
  }
}
# plot histograms for every column in the dataframe
map(df, hist_numeric)

You can use colnames() to store the column names in a character vector before the call, then use map2() from purrr to pass each column together with its name to the function:
library(purrr)

hist_numeric <- function(x, name){
  if (is.numeric(x)){  # is.numeric() is also TRUE for integer vectors
    hist(x, main = paste("Histogram of", name))
  } else {
    message("Not an integer or numeric variable: ", name)
  }
}
df <- data.frame(x = rnorm(50),
                 y = letters[1:10],
                 z = runif(50))
names_col <- colnames(df)
map2(.x = df, .y = names_col, .f = hist_numeric)
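The same pairing of columns and names can also be done without purrr. The following is only a sketch in base R, assuming it is acceptable to drop the non-numeric columns up front with Filter() rather than print a message for them; the pdf(NULL) line is an assumption added so the plots are discarded when the snippet runs non-interactively.

```r
# Toy data: two numeric columns and one character column.
df <- data.frame(x = rnorm(50), y = rep(letters[1:10], 5), z = runif(50))

# Keep only the numeric columns (integer columns count as numeric).
num_cols <- Filter(is.numeric, df)

pdf(NULL)  # assumption: throw the plots away; remove this line to draw on screen
# Map() walks over the columns and their names in parallel.
plots <- Map(
  function(col, nm) hist(col, main = paste("Histogram of", nm), xlab = nm),
  num_cols, names(num_cols)
)
invisible(dev.off())
```

Map() keeps the names of its first argument, so plots is a list named after the numeric columns.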

Related

Iterate through list of DFs, grab their listname and apply a function?

I have a list of data frames which are named correctly in the list. I want to create circos plots using those data frames and save them using a for loop.
Here is the script where I create the plot:
jpeg("df-name.jpeg")
circos.initialize(df$sectors, x = df$x)
circos.track(df$sectors, y = df$y,
  panel.fun = function(x, y) {
    circos.text(CELL_META$xcenter,
      CELL_META$cell.ylim[2] + mm_y(5),
      CELL_META$sector.index)
    circos.axis(labels.cex = 0.6)
  })
dev.off()
I think I need to do something like this but it doesn't work because it can't grab list names:
for (df in dflist) {
  jpeg(paste0(df, ".jpeg"))
  circos.initialize(df$sectors, x = df$x)
  circos.track(df$sectors, y = df$y,
    panel.fun = function(x, y) {
      circos.text(CELL_META$xcenter,
        CELL_META$cell.ylim[2] + mm_y(5),
        CELL_META$sector.index)
      circos.axis(labels.cex = 0.6)
    })
  dev.off()
}
This, on the other hand, gets the list names but not the data frames:
dfnames <- names(dflist)
for (i in seq_along(dflist)) {
  print(dfnames[i])
}
How can I get both the list names and the data frames inside a for loop in R? I would very much appreciate any answers.
I found the answer; it was simple:
for (name in names(dflist)) {
  df <- dflist[[name]]
  drawCircos(df, name)
}
I wrapped the plotting script in a function (drawCircos) and applied the for loop above.
We may loop over the names of dflist, subset the list by name to create a temporary object df, and reuse the same plotting code:
for (nm in names(dflist)) {
  jpeg(paste0(nm, ".jpeg"))
  df <- dflist[[nm]]
  circos.initialize(df$sectors, x = df$x)
  circos.track(df$sectors, y = df$y,
    panel.fun = function(x, y) {
      circos.text(CELL_META$xcenter,
        CELL_META$cell.ylim[2] + mm_y(5),
        CELL_META$sector.index)
      circos.axis(labels.cex = 0.6)
    })
  dev.off()
}
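Looping over positions instead of names works as well, since the name and the data frame can both be looked up from the same index. This is a minimal sketch with a made-up two-element list standing in for dflist; it only builds the file names and reports row counts, and leaves out the circos plotting.

```r
# Toy list standing in for dflist (hypothetical data, not real circos input).
dflist <- list(first = data.frame(x = 1:3), second = data.frame(x = 4:6))

outfiles <- character(0)
for (i in seq_along(dflist)) {
  nm <- names(dflist)[i]   # the list name at position i
  df <- dflist[[i]]        # the data frame at position i
  outfiles[nm] <- paste0(nm, ".jpeg")  # file name the plot would be saved under
  message(nm, ": ", nrow(df), " rows")
}
```

Indexing by position also works when two list elements happen to share a name, which looking up by name does not handle.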

Looping over variables of a data.frame leading to one final data.frame in R

I have written a function to change any one variable (i.e., column) in a data.frame to its unique levels and return the changed data.frame.
I wonder how to change multiple variables at once using my function and get one final data.frame with all the changes?
I have tried the following, but this gives multiple data.frames while only the last data.frame is the desired output:
data <- data.frame(sid = c(33, 33, 41), pid = c('Bob', 'Bob', 'Jim'))
#== My function for ONE variable:
f <- function(data, what){
  data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
  return(data)
}
# Looping over `what`:
what <- c('sid', 'pid')
lapply(seq_along(what), function(i) f(data, what[i]))
We could change the function to return only the modified column, data[[what]], and then assign the results back to those columns of data:
f <- function(data, what){
  data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
  data[[what]]
}
data[what] <- lapply(seq_along(what), function(i) f(data, what[i]))
Or do
data[what] <- lapply(what, function(x) f(data, x))
Or simply
data[what] <- lapply(what, f, data = data)
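If the function should keep returning the whole data frame, as in the question, another option is to feed each result into the next call. The sketch below uses base R's Reduce() for that; it is an alternative to the approach above, not part of it.

```r
data <- data.frame(sid = c(33, 33, 41), pid = c("Bob", "Bob", "Jim"))

# The one-variable function from the question, returning the whole data frame.
f <- function(data, what) {
  data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
  data
}

what <- c("sid", "pid")

# Reduce() starts from `data` and calls f(current_result, what[i]) for each name,
# so every change accumulates in a single data frame.
result <- Reduce(f, what, data)
```

This also covers functions where a later change depends on an earlier one, which the column-wise lapply() assignment does not.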

How do I calculate the correlation between variables x and y in data frame data for each category of a factor cat?

I am trying to generate a function to calculate the correlation between variables age and views in data frame data for each category of gender.
My data frame is called tv_viewing and has 5 columns: adhd (numeric), sex (factor, boy/girl), famsize (factor with 4 levels: 1 child, 2 child, 3 child, 4+child), age (numeric) and views (numeric, amount of television watched).
I have gotten this far:
partcorr <- function(tv_viewing, age, views, sex) {
  corrs <- list()
  for(i in levels(tv_viewing[,sex])) {
    corrs[i] <- round(cor(tv_viewing[tv_viewing[,sex] == i, age], tv_viewing[tv_viewing[,sex] == i, views], method = "pearson"), digits = 2)
  }
  return()
}
Or, more generally,
partcorr <- function(data, x, y, cat) {
  corrs <- list()
  for(i in levels(data[,cat])) {
    corrs[i] <- round(cor(data[data[,cat] == i, x], data[data[,cat] == i, y], method = "pearson"), digits = 2)
  }
  return()
}
But this is not working. What am I doing wrong?
You can use the function by() and do not need a for loop. With your data and columns, it should work like this:
by(tv_viewing[, c("age", "views")], tv_viewing$sex, cor)
Or, as I do not have access to your data, here is a reproducible example using the iris data set:
by(iris[, 1:2], iris$Species, cor)
Maybe your code did not work because you did not quote your column names (tv_viewing[, sex] will not read your column sex, but tv_viewing[, "sex"] does).
EDIT: If you need the correlations in one vector, you can use this code:
c(by(iris[, 1:2], iris$Species, function(x) cor(x)[2, 1]))
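For comparison, here is a sketch of the loop version from the question with one change that is certain to matter: a bare return() gives back NULL, so the result has to be returned explicitly. The name partcorr_loop is made up for this example, and it assumes the column names are passed as quoted strings; the iris data stands in for tv_viewing.

```r
# Loop version with the result actually returned.
# Assumption: x, y and cat are column names given as quoted strings.
partcorr_loop <- function(data, x, y, cat) {
  groups <- levels(factor(data[[cat]]))
  corrs <- setNames(numeric(length(groups)), groups)
  for (g in groups) {
    rows <- data[[cat]] == g   # rows belonging to this category
    corrs[g] <- round(cor(data[rows, x], data[rows, y], method = "pearson"),
                      digits = 2)
  }
  corrs  # return the filled vector; a bare return() would give NULL
}

res <- partcorr_loop(iris, "Sepal.Length", "Sepal.Width", "Species")
```

The result is a named numeric vector with one correlation per category.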
partcorr <- function(df,
                     col1,
                     col2,
                     cat,
                     digits = 2,
                     method = "pearson",
                     ... ) {
  dfs <- split(df[, c(col1, col2)], df[, cat])
  sapply(dfs, function(sdf) round(cor(sdf[, col1],
                                      sdf[, col2],
                                      method = method,
                                      ...),
                                  digits = digits))
}
This function splits the original data by category into a list of
sub-data frames (dfs), each holding only the columns given as col1 and col2.
It then loops over that list and calculates the correlation between
the col1 and col2 columns of each sub-data frame with the given method.
The ... lets you pass any additional arguments to partcorr,
which forwards them to the cor function.
The number of digits can also be changed.

R: How to carry over colnames attribute using sapply?

The following toy code yields a density plot for each column of the data frame y. However, sapply does not carry over the column names. I'd like to title each plot with the name of the column the data comes from. Any help is appreciated!
y <- data.frame(sample(1:50), sample(1:50), sample(1:50))
colnames(y) <- c("col1", "col2", "col3")
toy.func <- function(y) {
  X11()
  plot = plot(density(y), main = colnames(y))
  return(plot)
}
result <- sapply(y, toy.func)
You are right, and it makes sense: y is treated as a list, and sapply iterates over its elements, which are plain vectors that do not carry their column names (so colnames(y) is NULL inside the function). A minimal deviation from your approach that achieves what you want is to use mapply:
toy.func <- function(y, name) {
  X11()
  plot = plot(density(y), main = name)
  return(plot)
}
mapply(toy.func, y, colnames(y))
It applies toy.func by taking one element from y and one from colnames(y) in every step.
Another option is to iterate over the column names and pass the data frame as an extra argument:
toy.func <- function(name, data) {
  X11()
  plot = plot(density(data[, name]), main = name)
  return(plot)
}
sapply(colnames(y), toy.func, y)
Also note that, in this case, your function can be simplified to
toy.func <- function(name, data) {
  X11()
  plot(density(data[, name]), main = name)
}

lapply when both list elements and their names are arguments of the function

My toy dataframe:
d <- data.frame(
value = sample(1:10),
class = sample(c("a","b"), 20, replace = TRUE)
)
I split my data frame up by values of 'class' and put them in a list where each list element is named after its class:
l <- dlply(d, .(class), function(x)return(x))
Then I want to lapply over each class and make a histogram. Note that I do NOT want a facet. I want as many individual files saved as classes. So I define a function doPlots that makes histograms, then ggsaves them (as a_hist.png and b_hist.png, in this example):
doPlots <- function(d, name){
  g <- ggplot(data = d, aes(x=value)) +
    geom_histogram(binwidth=1)
  ggsave(filename=paste(name,"hist.png",sep="_"))
}
However, when I lapply:
lapply(l, FUN=doPlots, name=names(l))
I get Error: device must be NULL, a string or a function.
Thanks in advance.
There are two problems with your code. First, you are passing the entire vector of names to the function on every call. Second, you do not pass the plot g to ggsave, so it has no plot to save. You can use mapply to iterate over two or more lists in parallel:
doPlots <- function(d, name){
  g <- ggplot(data = d, aes(x=value)) +
    geom_histogram(binwidth=1)
  ggsave(filename=paste(name, "hist.png", sep="_"), g)
}
mapply(doPlots, l, names(l))
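The first problem is easy to see without any plotting. In the sketch below, make_filename is a hypothetical stand-in for doPlots that only builds the file name, and l is a toy list rather than the output of dlply.

```r
# Toy list standing in for the split data (hypothetical values).
l <- list(a = 1:3, b = 4:6)

# Stand-in for doPlots: only builds the file name it would save to.
make_filename <- function(d, name) paste(name, "hist.png", sep = "_")

# lapply() hands the whole vector names(l) to every call ...
whole <- lapply(l, make_filename, name = names(l))

# ... while Map() (a wrapper around mapply) pairs each element with its own name.
paired <- Map(make_filename, l, names(l))
```

Here each element of whole holds two file names, while each element of paired holds exactly one.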
Consider base R's by which slices a dataframe by factor levels into a list of elements. You can even pass your user-defined function into it all in one call:
dlist <- by(d, d$class, FUN=function(i) {
  name <- max(as.character(i$class))
  doPlots(i, name)
})
