I have a question about convert variable name into strings to work as a x-axis name.
I tried to apply the deparse(substitute(input)), but unfortunately, it doesn't work well when I called this function within another function.
plot_CI <- function(input){
nm <- deparse(substitute(input))
if (substring(nm,1,1) == 'u') {
prior <- 'uniform'
} else if ((substring(nm,1,1) == 'l')) {
prior <- 'logit_Normal'
} else {
prior <- paste(strsplit(nm,"_")[[1]][1:2],collapse="_")
}
plot <- ggplot(temp_data, aes(x = x, y = mean)) +
geom_point(size = 2) +
geom_errorbar(aes(ymax = high, ymin = low)) +
geom_hline(yintercept = true_value, col = 'blue') +
labs(x=prior, y='value')
return(plot)
}
sen_plot <- function(variable){
# variable <- deparse(substitute(var))
file_name <- paste0('C:/Users/Qiangsuper/Dropbox/Papers/1/plot/sensitivity_', variable, '.png')
png(filename = file_name, width = 1000, height = 400)
p1 <- plot_CI(eval(parse(text = paste0('uniform_', variable))))
p2 <- plot_CI(eval(parse(text = paste0('logitN_', variable))))
multiplot(p1,p2,cols=2)
dev.off()
}
for (i in c("beta_1", "beta_2", "beta_3", "phi", "p", "delta")) {
sen_plot(i)
}
I expect 'uniform' as X-axis name, however, I only received eval(parse(text = paste0('Uniform_', variable))).
Thank you very much for your help.
UPDATE ABOUT QUESTION:
I think I should make the question more concise. Here is an easily-understood scenario.
uniform_beta_1 is a data frame or data table, which stores the my results. I try to develop an automatic plotting algorithm which will automatically identify which prior distribution I applied and name the X-axis with this prior distribution. For uniform_beta_1, the prior distribution is uniform, then the X-axis's name will be uniform. Here is what I try to do:
input <- uniform_beta_1
nm <- deparse(substitute(input))
Then apply the substring command to judge the prior distribution. However, in this case, nm will return 'input' rather than 'uniform_beta_1'. I am wondering if there is any way I can return 'uniform_beta_1'.
Thank you very much for your help.
The final solution for my question is that I create a list at very beginning and store all these strings into this list to avoid the transformation in the function.
Related
Suppose you have data:
df = data.frame(A = rep(1:10,each=10),B = rep(1:10,times=10),C = runif(100,0,1))
I've written a function that takes a column name as an argument:
plotFill<-function(dataframe,variable){
if(!(variable %in% names(dataframe))) stop("Variable not in data.frame")
plot = ggplot(data=dataframe,aes(x=A,y=B,z=get(variable))) +
geom_tile(aes(fill=get(variable)))
return(plot)
}
You can therefore run this doing: plotFill(df,"C")
I'm trying to label the legend with the name of the variable passed, but adding labs(colour=variable) doesn't work, which I think it should since variable is a string...
If its only about the label name, you could use plot$labels$fill:
plotFill<-function(dataframe,variable){
if(!(variable %in% names(dataframe))) stop("Variable not in data.frame")
plot = ggplot(data=dataframe,aes(x=A,y=B,z=get(variable))) +
geom_tile(aes(fill=get(variable)))
plot$labels$fill <- variable
return(plot)
}
You shouldn't use get here. Instead, use aes_string.
plotFill<-function(dataframe,variable){
if(!(variable %in% names(dataframe))) stop("Variable not in data.frame")
plot = ggplot(data=dataframe,aes_string(x="A",y="B",z=variable)) +
geom_tile(aes_string(fill=variable))
return(plot)
}
plotFill(df,"C")
I'm dealing with several outputs I obtain from QIIME, texts which I want to manipulate for obtaining boxplots. Every input is formatted in the same way, so the manipulation is always the same, but it changes the source name. For each input, I want to extract the last 5 rows, have a mean for each column/sample, associate the values to sample experimental labels (Group) taken from the mapfile and put them in the order I use for making a boxplot of all the 6 data obtained.
In bash, I do something like "for i in GG97 GG100 SILVA97 SILVA100 NCBI RDP; do cp ${i}/alpha/collated_alpha/chao1.txt alpha_tot/${i}_chao1.txt; done" to do a command various times changing the names in the code in an automatic way through ${i}.
I'm struggling to find a way to do the same with R. I thought creating a vector containing the names and then using a for cycle by moving the i with [1], [2] etc., but it doesn't work, it stops at the read.delim line not finding the file in the wd.
Here's the manipulation code I wrote. After the comment, it will repeat itself 6 times with the 6 databases I'm using (GG97 GG100 SILVA97 SILVA100 NCBI RDP).
PLUS, I repeat this process 4 times because I have 4 metrics to use (here I'm showing shannon, but I also have a copy of the code for chao1, observed_species and PD_whole_tree).
library(tidyverse)
library(labelled)
mapfile <- read.delim(file="mapfile_HC+BV.txt", check.names=FALSE);
mapfile <- mapfile[,c(1,4)]
colnames(mapfile) <- c("SampleID","Pathology_group")
#GG97
collated <- read.delim(file="alpha_diversity/GG97_shannon.txt", check.names=FALSE);
collated <- tail(collated,5); collated <- collated[,-c(1:3)]
collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]
labels <- t(mapfile)
colnames(collated_reorder) <- labels[2,]
mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
mean = as.matrix(mean); mean <- t(mean)
GG97_shannon <- as.data.frame(rbind(labels[2,],mean))
GG97_shannon <- t(GG97_shannon);
DB_type <- list(DB = "GG97"); DB_type <- rep(DB_type, 41)
GG97_shannon <- as.data.frame(cbind(DB_type,GG97_shannon))
colnames(GG97_shannon) <- c("DB","Group","value")
rm(collated,collated_reorder,DB_type,labels,mean)
Here I paste all the outputs together, freeze the order and make the boxplot.
alpha_shannon <- as.data.frame(rbind(GG97_shannon,GG100_shannon,SILVA97_shannon,SILVA100_shannon,NCBI_shannon,RDP_shannon))
rownames(alpha_shannon) <- NULL
rm(GG97_shannon,GG100_shannon,SILVA97_shannon,SILVA100_shannon,NCBI_shannon,RDP_shannon)
alpha_shannon$Group = factor(alpha_shannon$Group, unique(alpha_shannon$Group))
alpha_shannon$DB = factor(alpha_shannon$DB, unique(alpha_shannon$DB))
library(ggplot2)
ggplot(data = alpha_shannon) +
aes(x = DB, y = value, colour = Group) +
geom_boxplot()+
labs(title = 'Shannon',
x = 'Database',
y = 'Diversity') +
theme(legend.position = 'bottom')+
theme_grey(base_size = 16)
How do I keep this code "DRY" and don't need 146 rows of code to repeat the same things over and over? Thank you!!
You didn't provide a Minimal reproducible example, so this answer cannot guarantee correctness.
An important point to note is that you use rm(...), so this means some variables are only relevant within a certain scope. Therefore, encapsulate this scope into a function. This makes your code reusable and spares you the manual variable removal:
process <- function(file, DB){
# -> Use the function parameter `file` instead of a hardcoded filename
collated <- read.delim(file=file, check.names=FALSE);
collated <- tail(collated,5); collated <- collated[,-c(1:3)]
collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]
labels <- t(mapfile)
colnames(collated_reorder) <- labels[2,]
mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
mean = as.matrix(mean); mean <- t(mean)
# -> rename this variable to a more general name, e.g. `result`
result <- as.data.frame(rbind(labels[2,],mean))
result <- t(result);
# -> Use the function parameter `DB` instead of a hardcoded string
DB_type <- list(DB = DB); DB_type <- rep(DB_type, 41)
result <- as.data.frame(cbind(DB_type,result))
colnames(result) <- c("DB","Group","value")
# -> After the end of this function, the variables defined in this function
# vanish automatically, you just need to specify the result
return(result)
}
Now you can reuse that block:
GG97_shannon <- process(file = "alpha_diversity/GG97_shannon.txt", DB = "GG97")
GG100_shannon <- process(file =...., DB = ....)
SILVA97_shannon <- ...
SILVA100_shannon <- ...
NCBI_shannon <- ...
RDP_shannon <- ...
Alternatively, you can use looping structures:
General-purpose for:
datasets <- c("GG97_shannon", "GG100_shannon", "SILVA97_shannon",
"SILVA100_shannon", "NCBI_shannon", "RDP_shannon")
files <- c("alpha_diversity/GG97_shannon.txt", .....)
DBs <- c("GG97", ....)
result <- list()
for(i in seq_along(datasets)){
result[[datasets[i]]] <- process(files[i], DBs[i])
}
mapply, a "specialized for" for looping over several vectors in parallel:
# the first argument is the function from above, the other ones are given as arguments
# to our process(.) function
results <- mapply(process, files, DBs)
I am writing a function that feeds an extra argument to a function if certain condition is met otherwise leave that argument as empty.
The code below is an example that plots "Sepal.Length" and if fn_y is not NULL then the color argument will be feed into the function as well (i.e. split the scatter plot according to fn_y ).
fn_plotly <- function(fn_data, fn_x, fn_y){
if(is.null(fn_y)){
p <- plotly::plot_ly(data = fn_data, x = ~fn_data[[fn_x]],
type = "scatter")
} else {
p <- plotly::plot_ly(data = fn_data, x =~ fn_data[[fn_x]],
type = "scatter", color = fn_data[[fn_y]])
}
return(p)
}
fn_plotly(iris, "Sepal.Length", NULL)
fn_plotly(iris, "Sepal.Length", "Species")
The code above does work but I was wondering if there is any other way that could use pipe function %>% to write the code a bit shorter, i.e. something like this
plotly::plot_ly(data = fn_data, x =~ fn_data[[fn_x]],type="scatter") %>% ifelse(is.null(fn_y),"",color = fn_data[[fn_y]] )
I would like to use this functionality not only on plotly so please do not suggest me to use other plotting packages.
Are you aware that you can get the same result without any if then else?
See this:
fn_plotly<-function(fn_data,fn_x,fn_y){
p<-plotly::plot_ly(data = fn_data, x =~ fn_data[[fn_x]],type="scatter", color=fn_data[,fn_y])
return(p)
}
fn_plotly(iris,"Sepal.Length",NULL)
fn_plotly(iris,"Sepal.Length","Species")
I want to draw a hexbin plot with ggplot, but with log scale "pretty" breaks for the frequency. Consider
df = data.frame(a=rnorm(1000)); df$b <- df$a+rnorm(1000);
I used this answer to get pretty breaks on linear scale
ggplot(df, aes(a,b)) +
geom_hex(aes(fill=cut(..value..,breaks=pretty(..value.., n=10)))) +
scale_fill_discrete("Frequency")
This works. Now say I want to use log scale pretty breaks. So I used the idea from another answer to define
base_breaks <- function(n = 10){
function(x) {
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
}
and try to do
ggplot(df, aes(a,b)) +
geom_hex(aes(fill=cut(..value..,breaks=base_breaks(n=10)(..value..))))
but it is not able to find the function. It says:
Error in cut.default(value, breaks = base_breaks(n = 10)(value)) :
could not find function "base_breaks"
Even though base_breaks is defined.
> base_breaks(n=10)(c(1:1000))
[1] 1 5 10 50 100 500 1000
How can I make my function visible in whatever environment ggplot is calling it? I even defined it as a global variable with
base_breaks <<- function(n = 10){
function(x) {
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
}
but I still get the same error.
I am not sure about it, but you could try simplifying the function like this:
base_breaks <<- function(n = 10, x){
axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
Maybe the problem is that you have a function whose result is another function, and that could be causing the error. With this aproach you would have the values more directly. Check it out!
I can't check it myself, since I get an error object 'value' not found...
I would like to change the facet labels, such that I have greek letters on the y axis and normal text on the x axis. The testing data is:
testdata<-data.frame(Method2=c("a","a","b","b"),
gamma=c(0,1,0,1),values=c(1,2,3,4),x=rep(1,4),y=rep(1,4))
testplot2<-ggplot(data=testdata,aes(x=Method2,y=gamma))
testplot2<-testplot2+facet_grid(gamma~Method2 )
testplot2+geom_point()
So far I have tried the following in many differnt constellations and I'm getting rather desperate:
1) To change the names in the data frame with the paste expression and I used label_parsed without much success.
gamma<- factor(sapply(testdata$gamma, function(n){
if(n %in% c("0")) paste(expression(gamma),"0") else
if(n %in% c("1")) paste(expression(gamma),"1")
}), levels=c(paste(expression(gamma),"0"),
paste(expression(gamma),"1")
))
testdata$gamma <- gamma
2) And I have tried to use a labeller with
my.label_bquote <- function (expr1 = (gamma == .(x)),expr2 = x)
{
quoted1<- substitute(expr1)
function(variable, value) {
value <- as.character(value)
browser()
if(variable == gamma)
lapply(value, function(x) eval(substitute(bquote(expr1, list(x = x)),list(expr1 = quoted1))))
else if(variable == Method2){
value[value=="a"] <- "Whatever"
value[value=="b"] <- "Whatever2"
}
return(value)
}
}
which is a changed form of a previous answer given to a similar question:
Facet labels involving a greek symbol
Would be grateful for any help!
How about we create a label swapper factory of sorts. Here's a generic function that will allow for renaming of levels of values in facet labels
get_label_swap <- function(...) {
dots<-list(...)
function(variable, value) {
if(variable %in% names(dots)) {
swaps <- dots[[variable]]
vals <- as.character(value)
lapply(vals, function(v) {if(v %in% names(swaps)) swaps[[v]] else v })
} else {
label_value(variable, value)
}
}
}
Then, to get one specific to your problem, we would do
label_swap <- get_label_swap(gamma=list("0"=expression(gamma*0),
"1"=expression(gamma*1)))
So the function looks at the named parameters and expects a list where the names of the list are the values of the factor, and the values in the list are what you want to replace them with. So only for the variable "gamma" will it swap "0" and "1" with the appropriate expressions. Then it returns a function we can pass as a labeller= parameter to ggplot. Thus we plot with
testplot2 <- ggplot(data=testdata,aes(x=Method2,y=gamma))
testplot2 <- testplot2+facet_grid(gamma~Method2, labeller=label_swap)
testplot2 + geom_point()
which results in
The newest version of ggplot no longer supports these types of labellers, here's a possible alternative solution using the new labeller_bquote function
ggplot(data=testdata,aes(x=Method2,y=gamma)) +
geom_point() +
facet_grid(gamma~Method2, labeller=label_bquote(rows=gamma*.(gamma)))