Dynamic variable names in plots, files and compatibility with loop - r

I am trying to write a function that makes a plot and saves it into a file automatically.
The trick I struggle with it to do both dynamically [plotname=varname & filename=varname &],
and to make it compatible with calling it from a loop.
# Create data
my_df = cbind(uni=runif (100),norm=rnorm (100),bino=rbinom(100,20, 0.5)); head (my_df)
my_vec = my_df[,'uni'];
# How to make plot and file-name meaningful if you call the variable in a loop?
# if you call by name, the plotname is telling. It is similar what I would like to see.
hist(my_df[,'bino'])
for (plotit in colnames(my_df)) {
hist(my_df[,plotit])
print (plotit)
# this is already not meaningful
}
# step 2 write it into files
hist_auto <- function(variable, col ="gold1", ...) {
if ( length (variable) > 0 ) {
plotname = paste(substitute(variable), sep="", collapse = "_"); print (plotname); is (plotname)
# I would like to define plotname, and later tune it according to my needs
FnP = paste (getwd(),'/',plotname, '.hist.pdf', collapse = "", sep=""); print (FnP)
hist (variable, main = plotname)
#this is apparently not working: I do not get my_df[, "bino"] or anything similar
dev.copy2pdf (file=FnP )
} else { print ("var empty") }
}
hist_auto (my_vec)
# name works, and is meaningful [as much as the var name ... ]
hist_auto (my_df[,'bino'])
# name sort of works, but falls apart
assign (plotit, my_df[,'bino'])
hist_auto (get(plotit))
# name works, but meaningless
# Now in a loop
for (plotit in colnames(my_df)) {
my_df[,plotit]
hist(my_df[,plotit])
## name works, but meaningless and NOT UNIQUE > overwritten by next
}
for (plotit in colnames(my_df)) {
hist_auto(my_df[,plotit])
## name works, but meaningless and NOT UNIQUE > overwritten by next
}
for (plotit in colnames(my_df)) {
assign (plotit, my_df[,plotit])
hist_auto (get(plotit))
## name works, but meaningless and NOT UNIQUE > overwritten by next
}
My aim is to have a function that iterates over eg. columns of a matrix, plots and saves each with a unique and meaningful name.
The solution will probably involve a smart combination of substitute() parse() eval() and paste (), but lacking solid understanding I failed to figure out.
My basis of experimentation was:
how to dynamically call a variable?

How about something like this? You may need to install.packages("ggplot2")
library(ggplot2)
my_df <- data.frame(uni=runif(100),
norm=rnorm(100),
bino=rbinom(100, 20, 0.5))
get_histogram <- function(df, varname, binwidth=1, save=T) {
stopifnot(varname %in% names(df))
title <- sprintf("Histogram of %s", varname)
p <- (ggplot(df, aes_string(x=varname)) +
geom_histogram(binwidth=binwidth) +
ggtitle(title))
if(save) {
filename <- sprintf("histogram_%s.png", gsub(" ", "_", varname))
ggsave(filename, p, width=10, height=8)
}
return(p)
}
for(var in names(my_df))
get_histogram(my_df, var, binwidth=0.5) # If you want to save them
get_histogram(my_df, "uni", binwidth=0.1, save=F) # If you want to look at a specific one

So I ended up with 2 functions, one that can iterate over data frames, and another that takes a single vectors. Using parts of Adrian's [thanks!] solution:
hist_dataframe <- function(variable, col ="gold1", ...) {
stopifnot(colName %in% colnames(df))
variable = df[,colName]
stopifnot(length (variable) >1 )
plotname = paste(substitute(df),'__', colName, sep="")
FnP = paste (getwd(),'/',plotname, '.hist.pdf', collapse = "", sep=""); print (FnP)
hist (variable, main = plotname)
dev.copy2pdf (file=FnP )
}
And the one for simple vectors stays as in Q.

Related

R : How to create objects with a function which name and value depend on an argument, and that these objects are found in the global environment?

I have the following situation: I have different dataframes, I would like to be able, for each dataframe, to create 2 dataframes according to the value of one of the columns (log2FoldChange>1 and logFoldChange<-1).
For this I use the following code:
DJ29_T0_Overexpr = DJ29_T0[which(DJ29_T0$log2FoldChange > 1),]
DJ29_T0_Underexpr = DJ29_T0[which(DJ21_T0$log2FoldChange < -1),]
DJ229_T0 being one of my dataframe.
First problem: the sign for the dataframe where log2FoldChange < -1 is not taken into account.
But the main problem is at the time of making the function, I wrote the following:
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),])
assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),])
}
Which I then ran with :
spliteOverUnder(DJ29_T0)
No error message, but my objects are not exported in my global environment. I tried with return(paste(nm,"_Overexpr", sep="") but it only returns the object name but not the associated dataframe.
Using paste() forces the use of assign(), so I can't do :
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
paste(nm,"_Overexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > 1),]
paste(nm,"_Underexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) < -1),]
}
spliteOverUnder(DJ24_T0)
I encounter the following error:
Error in paste(nm, "_Overexpr", sep = "") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > :
could not find function "paste<-"
If you've encountered this difficulty before, I'd appreciate a little help.
And if you knew, once the function works, how to use a For loop going through a list containing all my dataframes to apply this function to each of them, I'm also a taker.
Thanks
When assigning, use the pos argument to hoist the new objects out of the function.
function(){
assign(x = ..., value = ...,
pos = 1 ## see below
)
}
... where 0 = the function's local environment, 1 = the environment next up (in which the function is defined) etc.
edit
A general function to create the split dataframes in your global environment follows. However, you might rather want to save the new dataframes (from within the function) or just forward them to downstream functions than cram your workspace with intermediary objects.
splitOverUnder <- function(the_name_of_the_frame){
df <- get(the_name_of_the_frame)
df$cat <- cut(df$log2FoldChange,
breaks = c(-Inf, -1, 1, Inf),
labels = c('underexpr', 'normal', 'overexpr')
)
split_data <- split(df, df$cat)
sapply(c('underexpr', 'overexpr'),
function(n){
new_df_name <- paste(the_name_of_the_frame, n, sep = '_')
assign(x = new_df_name,
value = split_data$n,
envir = .GlobalEnv
)
}
)
}
## say, df1 and df2 are your initial dataframes to split:
sapply(c('df1', 'df2'), function(n) splitOverUnder(n))

Cant manipulate global/local variables inside a function in R

dna = c("A","G","C","T")
x =sample(dna,50,replace =TRUE)
dna_f = function(x){
dnastring <- ""
for (val in x){
paste(dnastring,val,sep="")
}
return(dnastring)
}
dna_f(x)
I'm trying to produce a single string that contains all the randomly sampled letters. x contains all 50 letters and im trying to combine them into one string using the paste function. but when i run this, the output is an empty string. I tried placing dnastring as a global variable because i thought maybe the scope of a function operates differently in R(I'm new to R) but i got the same output. some help would be appreciated thanks.
You don't need for loop here. Try paste with collapse argument.
dna_f = function(x){
paste0(x, collapse = '')
}
dna_f(x)
#[1] "CCTACCAACCCTTTCTAGCCCACTATGCATCACAACTGCGGTCTCATCAC"
You forgot the dnastring <-
dna = c("A","G","C","T")
x =sample(dna,50,replace =TRUE)
dna_f = function(x){
dnastring <- ""
for (val in x){
dnastring <- paste(dnastring,val,sep="")
}
return(dnastring)
}
Output:
> dna_f(x)
[1] "GGTCTGGCCGAACTACTGTACACCCCAAAGACAACGCCCCCGACGCTCTA"

Function to abbreviate scientific names

Could you please help me?
I'm trying to modify an R function written by a colleague. This function receives a character vector with scientific names (Latin binomes), just like this one:
Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
Then, it should abbreviate the scientific names, using only the first three letters of the genus (first term) and the epithet (second term) to make a short code. For instance, Cerradomys scotti should become Cersco.
This is the original function:
AbbreviatedNames <- function(vector) {
abbreviations <- character(length = length(vector))
splitnames <- strsplit(vector, " ")
for (i in 1:length(vector)) {
vector[i] <- if(splitnames[[i]][2] == "^sp") {
paste(substr(splitnames[[i]][1],1,3),
splitnames[[i]][2], sep = "")
}
else {
paste(substr(splitnames[[i]][1],1,3),
substr(splitnames[[i]][2],1,3), sep = "")
}
}
vector
}
With a simple list like that one, the function works perfectly. However, when the list has some missing or extra elements, it does not work. The loop stops when it meets the first row that does not match the pattern. Let's take this more complex list as an example:
Name
Cerradomys scotti
Oligoryzomys sp
Philander frenatus
Byrsonima sp
Campomanesia adamantium
Cecropia pachystachya
Cecropia sp
Erythroxylum sp
Ficus sp
Leandra aurea
Morfosp1
Vismia cf brasiliensis
See that Morfosp1 has only 1 term. And Vismia cf brasiliensis has an additional term (cf) in the middle.
I've tried adapting the function, for instance, this way:
AbbreviatedNames <- function(vector) {
abbreviations <- character(length = length(vector))
splitnames <- strsplit(vector, " ")
for (i in 1:length(vector)) {
vector[i] <- if(splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2]))) {
paste(substr(splitnames[[i]][1],1,3),
splitnames[[i]][2], sep = "")
}
else {
paste(substr(splitnames[[i]][1],1,3),
substr(splitnames[[i]][2],1,3), sep = "")
}
}
vector
}
Nevertheless, it does not work. I get this error message:
Error in if (splitnames[[i]][2] == "^sp" & is.na(splitnames[[i]][2])) { :
valor ausente onde TRUE/FALSE necessário
How could I make the function:
Deal also with names that have only 1 term?
Expected outcome: Morfosp1 -> Morfosp1 (stays the same)
Deal also with names that have an additional term in the middle?
Expected outcome: Vismia cf brasiliensis -> Visbra (term in the middle is ignored)
Thank you very much!
Something like this is pretty concise:
test <- c("Cerradomys scotti", "Oligoryzomys sp", "Latingstuff", "Latin staff more")
# function to truncate a given name
trunc_str <- function(latin_name) {
# split it on a space
name_split <- unlist(strsplit(latin_name, " ", fixed = TRUE))
# if one name, just return it
if (length(name_split) == 1) return(name_split)
# truncate to first 3 letters
name_trunc <- substr(name_split, 1, 3)
# paste the first and last term together (skipping any middle ones)
paste0(head(name_trunc, 1), tail(name_trunc, 1))
}
# iterate over all
vapply(test, trunc_str, "")
# Cerradomys scotti Oligoryzomys sp Latingstuff Latin staff more
# "Cersco" "Olisp" "Latingstuff" "Latmor"
If you don't want a named vector output, you can use USE.NAMES = FALSE in vapply(). Or feel free to use a loop here.
AbbreviatedNames <- function(vector) {
abbreviations <- character(length = length(vector))
splitnames <- strsplit(vector, " ")
for (i in 1:length(vector)){
# One name
if(length(splitnames[[i]])==1){
vector[i] <- paste(substr(splitnames[[i]][1],1,3),
substr(splitnames[[i]][2],1,3), sep = "")
}
# Two names
else if(length(splitnames[[i]])==2){
vector[i] <- if(splitnames[[i]][2] == "^sp") {
paste(substr(splitnames[[i]][1],1,3),
splitnames[[i]][2], sep = "")
}
else {
paste(substr(splitnames[[i]][1],1,3),
substr(splitnames[[i]][2],1,3), sep = "")
}
}
# Three names
else if(length(splitnames[[i]])==3){
vector[i] <- paste(substr(splitnames[[i]][1],1,3),
substr(splitnames[[i]][3],1,3), sep = "")
# Assuming that the unwanted word is always in the middle
}
}
return(vector)
}
I tested on the list you gave and it seems to work, tell me if you need a more general code
Thank you very much for the help, Ricardo and Adam! I've made the code available on GitHub to other people who work with interaction networks, and need to abbreviate scientific names to be used in graphs.

Delete data frame column within function

I have the following code:
df<- iris
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
for (z in column.D) {
x[[z]] <- NULL #with this part I wanted to delete the above selected columns
}
}
}
columnFunction(df)
So how is it possible to address data.frame columns "dynamically" so: x[[z]] <- NULL should translate to:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and that for every selected column in every data.frame chosen for the function.
Well does anyone know how to archive something like that? I tried several things like with the paste command or sprintf, deparse but i didnt get it working. I also tied to address the data.frame as a global variable by using <<- but didn`t help, too. (Well its the first time i even heard about that). It looks like i miss the right method transferring x and z to the variable assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[,!names(x) %in% column.D]
}
return(x)
}
df <- columnFunction(df)

Entering variables into regression function

I have this feature_list that contains several possible values, say "A", "B", "C" etc. And there is time in time_list.
So I will have a loop where I will want to go through each of these different values and put it in a formula.
something like for(i in ...) and then my_feature <- feature_list[i] and my_time <- time_list[i].
Then I put the time and the chosen feature to a dataframe to be used for regression
feature_list<- c("GPRS")
time_list<-c("time")
calc<-0
feature_dim <- length(feature_list)
time_dim <- length(time_list)
data <- read.csv("data.csv", header = TRUE, sep = ";")
result <- matrix(nrow=0, ncol=5)
errors<-matrix(nrow=0, ncol=3)
for(i in 1:feature_dim) {
my_feature <- feature_list[i]
my_time <- time_list[i]
fitdata <- data.frame(data[my_feature], data[my_time])
for(j in 1:60) {
my_b <- 0.0001 * (2^j)
for(k in 1:60) {
my_c <- 0.0001 * (2^k)
cat("Feature: ", my_feature, "\t")
cat("b: ", my_b, "\t")
cat("c: ", my_c, "\n")
err <- try(nlsfit <- nls(GPRS ~ 53E5*exp(-1*b*exp(-1*c*time)), data=fitdata, start=list(b=my_b, c=my_c)), silent=TRUE)
calc<-calc+1
if(class(err) == "try-error") {
next
}
else {
coefs<-coef(nlsfit)
ess<-deviance(nlsfit)
result<-rbind(result, c(coefs[1], coefs[2], ess, my_b, my_c))
}
}
}
}
Now in the nls() call I want to be able to call my_feature instead of just "A" or "B" or something and then to the next one on the list. But I get an error there. What am I doing wrong?
You can use paste to create a string version of your formula including the variable name you want, then use either as.formula or formula functions to convert this to a formula to pass to nls.
as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
Another option is to use the bquote function to insert the variable names into a function call, then eval the function call.
I worked with R a while ago, maybe you can give this a try:
What you want is create a formula with a list of variables right?
so if the response variable is the first element of your list and the others are the explanatory variables you could create your formula this way:
my_feature[0] ~ reduce("+",my_feature[1:]) . This might work.
this way you can create formulae that depends on the variables in my_features.

Resources