Extracting the column names in R with lapply

So here is my code
h <- lapply(select(winedata, -quality), function(variable){
return(ggplot(aes(x = variable), data = winedata) +
geom_histogram(bins = 30) + xlab(variable))})
There is one problem: xlab(variable) displays the value of the first column as the x-axis title, and if I choose variable[2] it displays the value of the second column as the x-axis title. How do I get it to use the column names as the x-axis titles? names(variable) does not seem to work.

You can use Map:
library(ggplot2)
library(dplyr)
Map(function(var, names){
return(ggplot(iris, aes(x = var)) +
geom_histogram(bins = 30) + xlab(names))},
select(iris, -Species), names(iris)[1:4])
Map is essentially mapply with SIMPLIFY=FALSE, which takes multiple inputs and returns a list.
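To see why Map has access to the names while lapply does not, here is a minimal base R sketch. The scores data frame and the summary string are made up for illustration; only Map and names come from the answer above.

```r
# Hypothetical toy data frame; any data frame with named columns would do.
scores <- data.frame(height = c(150, 170), weight = c(60, 72))

# lapply() passes only the column values to the function, so the name is lost.
# Map() walks the columns and their names in parallel, one pair per call.
labels <- Map(function(values, col_name) {
  paste0(col_name, ": mean = ", mean(values))
}, scores, names(scores))

# The result is a list named after the columns of the first input.
print(labels$height)
print(labels$weight)
```

The same pattern applies to the plotting case: the second argument is what gets passed to xlab().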

Related

R: Programmatically changing ggplot scale labels to Greek letters with expressions

I am trying to change the labels in a ggplot object to Greek symbols for an arbitrary number of labels. Thanks to this post, I can do this manually when I know the number of labels in advance and the number is not too large:
# Simulate data
df <- data.frame(name = rep(c("alpha1","alpha2"), 50),
value = rnorm(100))
# Create a plot with greek letters for labels
ggplot(df, aes(x = value, y = name)) + geom_density() +
scale_y_discrete(labels = c("alpha1" = expression(alpha[1]),
"alpha2" = expression(alpha[2])))
For our purposes, assume I need to change k default labels, where each of the k labels is the prefix "alpha" followed by a number 1:k. The corresponding updated labels would substitute the Greek letter for "alpha" and use the number as a subscript. An example of this is below:
# default labels
paste0("alpha", 1:k)
# desired labels
for (i in 1:k) { expression(alpha[i]) }
I was able to hack together the below programmatic solution that appears to produce the desired result thanks to this post:
ggplot(df, aes(x = value, y = name)) + geom_density() +
scale_y_discrete(labels = parse(text = paste("alpha[", seq_along(unique(df$name)), "]")))
However, I do not understand this code and am seeking clarification about:
What is parse() doing here that expression() otherwise would do?
While I understand everything to the right-hand side of =, what is text doing on the left-hand side of the =?
Another option to achieve your desired result would be to add a new column to your data which contains the ?plotmath expression as a string and map this new column on y. Afterwards you could use scales::label_parse() to parse the expressions:
set.seed(123)
df <- data.frame(name = rep(c("alpha1","alpha2"), 50),
value = rnorm(100))
df$label <- gsub("^(.*?)(\\d+)$", "\\1[\\2]", df$name)
library(ggplot2)
library(scales)
ggplot(df, aes(x = value, y = label)) + geom_density() +
scale_y_discrete(labels = scales::label_parse())
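On the two questions about parse(): text is simply a named argument of parse() that tells it to read R code from character strings instead of from a file. The sketch below, which uses base R only and a made-up k of 3, suggests that parsing the strings yields the same unevaluated calls as typing them into expression() by hand. The difference is that expression() does not evaluate i, so expression(alpha[i]) in a loop would keep the symbol i rather than the number.

```r
k <- 3  # hypothetical number of labels

# Build the plotmath source as plain strings, then parse them into expressions.
label_strings <- paste0("alpha[", seq_len(k), "]")
labels_parsed <- parse(text = label_strings)

# The hand-written equivalent, which only works when k is known in advance.
labels_manual <- expression(alpha[1], alpha[2], alpha[3])

# Compare the two element by element.
same <- all(mapply(identical, as.list(labels_parsed), as.list(labels_manual)))
print(same)
```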

how to loop a geographic mapping function over a list of dataframes (or a subsetted dataframe)

I have a dataframe consisting of species names and longitude and latitude coordinates. There are 115 different species with 25000 lat/long coordinates. I need to make individual maps that show the observations for each specific species.
First, I created a function that would generate the kind of map that I want, called platmaps. When I call the function for my full dataset (platmaps(df1)), it creates a map displaying all lat/long observations.
Then I constructed a for loop which was supposed to subset my df by species name and pass that subsetted dataframe to my platmaps function. It runs for a couple of minutes and then nothing happens.
So then I split the dataframe by species name, created a list of dataframes (out1), and used lapply(out1, platmaps), but it only returned a list of the names of my dfs.
Then I tried a variation of an example that I saw here, but it also did not work.
function
platmaps <- function(df1){
wm <- borders("world", colour="gray50", fill="gray50")
ggplot()+
coord_fixed()+
wm +
geom_point(data =df1 , aes(x = decimalLongitude, y = decimalLatitude),
colour = "pink", size = 0.5)
}
subset
for(i in 1:nrow(PP)){
query<-paste(PP$species[i])
p<-subset(df1, df1$species== query)
platmaps(p)
}
list
for (i in 1:length(out1)){
pp<-out1[[i]]
platmaps(pp)
}
applied example
p =
wm <- wm <- borders("world", colour="gray50", fill="gray50")
ggplot()+
coord_fixed()+
wm +
geom_point(data =df1 , aes(x = decimalLongitude, y = decimalLatitude),
colour = "pink", size = 0.5)
plots = df1 %>%
group_by(species) %>%
do(plots = p %+% . + facet_wrap(~species))
the error for the applied example is:
Error: Cannot add ggproto objects together. Did you forget to add this
object to a ggplot object?
As I'm new to R (and coding), I assume I'm getting the syntax wrong, or am not applying my function correctly to/within either of my loops, or I fundamentally misunderstand the way looping works.
data frame sample
species decimalLongitude decimalLatitude
Platanthera lacera -71.90000 42.80000
Platanthera lacera -90.54861 40.12083
Platanthera lacera -71.00889 42.15500
Platanthera lacera -93.20833 45.20028
Platanthera lacera -72.45833 41.91666
Platanthera bifolia 5.19800 59.64310
Platanthera sparsiflora -117.67472 34.36278
fixed platmaps function
ggplot(data=df1 %>% filter(species == s))+
coord_fixed()+
borders("world", colour="gray50", fill="gray50")+
geom_point(aes(x = decimalLongitude, y = decimalLatitude),
colour = "pink", size = 0.5)+
labs(title=as.character(s))
Because you didn't provide a test data set, let me give you a general idea how to make multiple plots you can inspect later. The code below will plot a parameter for a number of countries and save plot pdfs to a given path. You can replace the code behind the pl variable in the loop with your function.
library(ggplot2)
library(dplyr)
df <- data.frame(country = c(rep('USA',20), rep('Canada',20), rep('Mexico',20)),
wave = c(1:20, 1:20, 1:20),
par = c(1:20 + 5*runif(20), 21:40 + 10*runif(20), 1:20 + 15*runif(20)))
countries <- unique(df$country)
plot_list <- list()
i <- 1
for (c in countries){
pl <- ggplot(data = df %>% filter(country == c)) +
geom_point(aes(wave, par), size = 3, color = 'red') +
labs(title = as.character(c), x = 'wave', y = 'value') +
theme_bw(base_size = 16)
plot_list[[i]] <- pl
i <- i + 1
}
# Set the page size when opening the device; pdf.options() only affects devices opened afterwards
pdf('path/to/pdf', width = 9, height = 7)
for (i in 1:length(plot_list)){
print(plot_list[[i]])
}
dev.off()
Once the plots are collected in the plot_list variable, we open the PDF device, print each plot to it, and finally close the device with dev.off().
There is a neat way to apply any function to a list of items. I have outlined a way to do this with the data you added. I could not get platmaps to work, so I have just made a scatter plot.
The method is to split your data frame into individual subsets using split() and then apply the plotting function to the resulting list using lapply(). Since lapply() returns a list, this can be passed directly to a function such as ggpubr::ggarrange() for visualizing.
library(ggplot2)
library(dplyr) # provides the %>% pipe used below
plot_function <- function(x){
p <- ggplot(x, aes(x = decimalLongitude, y = decimalLatitude)) + geom_point()
p
}
plot_list <-
df %>%
split(.$species) %>% # Separate df into subset dfs based on species column
lapply(., plot_function) # map plot_function to list
# Display on a grid (many ways to do this - I just find this package simple)
ggpubr::ggarrange(plotlist = plot_list)
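One likely reason the original for loops appeared to do nothing is that R does not auto-print values inside a loop, so a ggplot object built there is never drawn unless print() is called on it. The sketch below combines the split()/lapply() approach with an explicit print. It uses a few rows from the question's sample data; the names obs and plot_species are hypothetical, and the world-map layer is left out to keep the example free of the maps package.

```r
library(ggplot2)

# A few sample rows copied from the question's data frame.
obs <- data.frame(
  species = c("Platanthera lacera", "Platanthera lacera",
              "Platanthera bifolia", "Platanthera sparsiflora"),
  decimalLongitude = c(-71.90000, -90.54861, 5.19800, -117.67472),
  decimalLatitude = c(42.80000, 40.12083, 59.64310, 34.36278)
)

# Hypothetical helper: one scatter plot per subset, titled with the species.
plot_species <- function(d) {
  ggplot(d, aes(x = decimalLongitude, y = decimalLatitude)) +
    geom_point(colour = "pink", size = 0.5) +
    coord_fixed() +
    labs(title = as.character(d$species[1]))
}

# split() gives a named list of data frames; lapply() keeps those names.
plots <- lapply(split(obs, obs$species), plot_species)

# Inside a loop nothing is auto-printed, so print() is needed to draw each plot.
for (p in plots) print(p)
```

Because the list is named by species, a single map can also be drawn with plots[["Platanthera lacera"]].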

R - Reorder a bar plot in a function using ggplot2

I have the following plot function using ggplot2.
Function_Plot <- function(Fun_Data, Fun_Color)
{
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
The result is :
I need to upgrade my function to reorder the bars according to the frequencies of the words (in descending order). Following the answer to another question about reordering, I tried to use the reorder function inside aes_string, but it doesn't work.
A reproducible example :
a <- c("G1","G1","G1","G1","G1","G1","G1","G1","G1","G1","G2","G2","G2","G2","G2","G2","G2","G2")
b <- c("happy","sad","happy","bravery","bravery","God","sad","happy","freedom","happy","freedom",
"God","sad","happy","freedom",NA,"money","sad")
MyData <- data.frame(Cluster = a, Word = b)
MyColor <- c("red","blue")
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)
Well, if reordering doesn't work inside aes_string, let's try it beforehand.
Function_Plot <- function(Fun_Data, Fun_Color)
{
Fun_Data[[2]] <- reorder(Fun_Data[[2]], Fun_Data[[2]], length)
MyPlot <- ggplot(data = na.omit(Fun_Data), aes_string(x = colnames(Fun_Data[2]), fill = colnames(Fun_Data[1]))) +
geom_bar(stat = "count") +
coord_flip() +
scale_fill_manual(values = Fun_Color)
return(MyPlot)
}
Function_Plot(Fun_Data = MyData, Fun_Color = MyColor)
A couple of other notes: I'd recommend you use a more consistent style. Mixing whether or not you use _ to separate words in variable names is confusing and asking for bugs.
It won't matter much unless your data is really big, but extracting names from a data frame is very efficient, whereas subsetting a data frame is less efficient. Your code subsets a data frame and then extracts the column names remaining, e.g., colnames(Fun_Data[1]). It will be cleaner to extract the names and then subset that vector: colnames(Fun_Data)[1]
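For readers unsure what reorder(x, x, length) does, here is a small base R sketch on a made-up word vector. It groups the values by word, counts each group with length, and sorts the factor levels by that count. Negating the count reverses the order.

```r
# Hypothetical word vector: "happy" x3, "sad" x2, "money" x1.
words <- c("sad", "happy", "happy", "money", "happy", "sad")

# Levels sorted by increasing frequency.
ascending <- reorder(words, words, length)

# Negating the count sorts the levels by decreasing frequency.
descending <- reorder(words, words, function(w) -length(w))

print(levels(ascending))
print(levels(descending))
```

Which of the two is wanted depends on the plot: with coord_flip() the first level is drawn at the bottom, so the ascending version puts the most frequent word at the top.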

Loops, dataframes and ggplot

I would like to display multiple plots on the same page using ggplot, and the multiplot function described here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/. My data is stored in a large dataframe with the first column corresponding to period. I want to visualize columns 2:26. My issue is reproducible using:
rawdata1 <- data.frame("Period" = 1:34, "Sample" = sample(x = c(1,2),34, replace = TRUE),"Runif" = runif(n = 34))
Intuitively, I would use the following code: (With 2:3 replaced with 2:26)
out <- NULL
for (i in 2:3){
out[[i-1]] <- ggplot(rawdata1, aes("Period", y = value)) + geom_line(aes(x = Period, y = rawdata1[[i]])) + ggtitle(label = colnames(rawdata1)[i])
}
multiplot(plotlist = out, cols = 2)
This succeeds in plotting multiple graphs, however my problem is that each graph that is plotted uses data from the same column (column 3 in the above example, column 26 in my dataset). I've puzzled out that this is because my "out" list stores the ggplot list with the y values stored dynamically.
i's final value is 26, and when I call an item from "out", it uses the current value of i to create the graph, so every graph is drawn from the same column. As I am new to R, my guess is that I am not managing my variables correctly. Any help would be appreciated.
Below you find an alternative: using the melt function from reshape2 and then faceting with facet_wrap.
require(ggplot2)
require(reshape2)
data.melt <- melt(rawdata1, id.var='Period')
ggplot(data.melt, aes(Period, value)) +
geom_line() +
facet_wrap(~variable, scales='free_y')
If you want to use multiplot instead, you could do the following:
out <- lapply(names(rawdata1)[-1],
function(index) ggplot(rawdata1) +
geom_line(aes_string(x = 'Period', y = index)) +
ggtitle(label = index))
multiplot(plotlist = out, cols = 2)
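As an alternative to aes_string, which is deprecated in recent ggplot2 releases, the .data pronoun can look up a column by name. The sketch below is one way this might be written, assuming ggplot2 3.0.0 or later; the seed is added only to make the example reproducible. The key point is unchanged: each call of the function passed to lapply has its own col, so every plot keeps its own column.

```r
library(ggplot2)

set.seed(1)  # assumption: added for reproducibility
rawdata1 <- data.frame(Period = 1:34,
                       Sample = sample(x = c(1, 2), 34, replace = TRUE),
                       Runif = runif(n = 34))

# One plot per column; col is local to each function call, so it is not
# overwritten the way the loop counter i was in the question.
out <- lapply(names(rawdata1)[-1], function(col) {
  ggplot(rawdata1, aes(x = Period, y = .data[[col]])) +
    geom_line() +
    ylab(col) +
    ggtitle(label = col)
})
```

The resulting list can be passed to multiplot(plotlist = out, cols = 2) exactly as before.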

How to specify columns in facet_grid OR how to change labels in facet_wrap

I have a large number of data series that I want to plot using small multiples. A combination of ggplot2 and facet_wrap does what I want, typically resulting a nice little block of 6 x 6 facets. Here's a simpler version:
The problem is that I don't have adequate control over the labels in facet strips. The names of the columns in the data frame are short and I want to keep them that way, but I want the labels in the facets to be more descriptive. I can use facet_grid so that I can take advantage of the labeller function but then there seems to be no straightforward way to specify the number of columns and a long row of facets just doesn't work for this particular task. Am I missing something obvious?
Q. How can I change the facet labels when using facet_wrap without changing the column names? Alternatively, how can I specify the number of columns and rows when using facet_grid?
Code for a simplified example follows. In real life I am dealing with multiple groups each containing dozens of data series, each of which changes frequently, so any solution would have to be automated rather than relying on manually assigning values.
require(ggplot2)
require(reshape)
# Random data with short column names
set.seed(123)
myrows <- 30
mydf <- data.frame(date = seq(as.Date('2012-01-01'), by = "day", length.out = myrows),
aa = runif(myrows, min=1, max=2),
bb = runif(myrows, min=1, max=2),
cc = runif(myrows, min=1, max=2),
dd = runif(myrows, min=1, max=2),
ee = runif(myrows, min=1, max=2),
ff = runif(myrows, min=1, max=2))
# Plot using facet wrap - we want to specify the columns
# and the rows and this works just fine, we have a little block
# of 2 columns and 3 rows
mydf <- melt(mydf, id = c('date'))
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variable, ncol = 2)
print (p1)
# Problem: we want more descriptive labels without changing column names.
# We can change the labels, but doing so requires us to
# switch from facet_wrap to facet_grid
# However, in facet_grid we can't specify the columns and rows...
mf_labeller <- function(var, value){ # lifted bodily from the R Cookbook
value <- as.character(value)
if (var=="variable") {
value[value=="aa"] <- "A long label"
value[value=="bb"] <- "B Partners"
value[value=="cc"] <- "CC Inc."
value[value=="dd"] <- "DD Company"
value[value=="ee"] <- "Eeeeeek!"
value[value=="ff"] <- "Final"
}
return(value)
}
p2 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_grid( ~ variable, labeller = mf_labeller)
print (p2)
I don't quite understand. You've already written a function that converts your short labels to long, descriptive labels. What is wrong with simply adding a new column and using facet_wrap on that column instead?
mydf <- melt(mydf, id = c('date'))
mydf$variableLab <- mf_labeller('variable',mydf$variable)
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variableLab, ncol = 2)
print (p1)
To change the label names, just change the factor levels of the factor you use in facet_wrap. These will be used in facet_wrap on the strips. You can use a similar setup as you would using the labeller function in facet_grid. Just do something like:
new_labels = sapply(levels(df$factor_variable), custom_labeller_function)
levels(df$factor_variable) = new_labels # rename the levels; passing new_labels as levels= to factor() would turn every value into NA
Now you can use factor_variable in facet_wrap.
Just add labeller = label_wrap_gen(width = 25, multi_line = TRUE) to the facet_wrap() arguments.
E.g.: ... + facet_wrap( ~ variable, labeller = label_wrap_gen(width = 25, multi_line = TRUE))
More info: ?ggplot2::label_wrap_gen
Simply add labeller = label_both to the facet_wrap() arguments.
... + facet_wrap( ~ variable, labeller = label_both)
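Another option that keeps both the column names and the data untouched is to pass a named character vector through as_labeller(), which facet_wrap accepts via its labeller argument. This is a minimal sketch, assuming a ggplot2 version with the labeller argument (2.0.0 or later); long_df and strip_labels are hypothetical names, and the data is a cut-down version of the question's example already in long form.

```r
library(ggplot2)

set.seed(123)
# Long-format stand-in for the melted data frame from the question.
long_df <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 10), times = 3),
  variable = rep(c("aa", "bb", "cc"), each = 10),
  value = runif(30, min = 1, max = 2)
)

# Lookup table: short column name -> descriptive strip label.
strip_labels <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.")

p <- ggplot(long_df, aes(x = date, y = value, group = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2, labeller = as_labeller(strip_labels))
```

Since the lookup is an ordinary named vector, it can be generated programmatically when the set of series changes.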
