Print dataframe name in function output - r

I have a function that looks like this:
removeRows <- function(dataframe, rows.remove){
dataframe <- dataframe[-rows.remove,]
print(paste("The", paste0(rows.remove, "th"), "row was removed from", "xxxxxxx"))
}
I can use the function like this to remove the 5th row from the dataframe:
removeRows(mtcars, 5)
The function outputs this message:
"The 5th row was removed from xxxxxxx"
How can I replace xxxxxxx with the name of the dataframe I have used, so in this case mtcars?

You need to access the variable name in an unevaluated context. We can use substitute for this:
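removeRows <- function(dataframe, rows.remove) {
  # Capture the caller's variable name before the argument is modified
  df.name <- deparse(substitute(dataframe))
  # A negative index drops the given row(s)
  dataframe <- dataframe[-rows.remove, ]
  print(paste("The", paste0(rows.remove, "th"), "row was removed from", df.name))
}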
In fact, that is its main use; as per the documentation,
The typical use of substitute is to create informative labels for data sets and plots.
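As a minimal sketch of the idea (the helper names arg_label and remove_row_verbose are hypothetical, not from the question), the argument's expression can be captured first and the trimmed data frame returned so the caller can keep it:

```r
# Hypothetical helper: returns the expression the caller passed, as a string.
arg_label <- function(x) deparse(substitute(x))

# Hypothetical variant of removeRows that also returns the trimmed data frame.
remove_row_verbose <- function(dataframe, row.remove) {
  # Capture the caller's expression before touching the argument.
  df.name <- deparse(substitute(dataframe))
  dataframe <- dataframe[-row.remove, ]
  message("Row ", row.remove, " was removed from ", df.name)
  dataframe
}

label <- arg_label(mtcars)
trimmed <- remove_row_verbose(mtcars, 5)
```

Here label holds the string "mtcars", and trimmed has one row fewer than mtcars. Note that deparse can return more than one string when the caller passes a long expression rather than a plain variable name.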

Note that df.name <- deparse(substitute(dataframe)) should be placed at the top of your function, before any transformation is done. I called it right at the end of my function, just before ggsave, and it returned the contents of the data frame rather than its name, which is not what you want: once the argument has been reassigned inside the function, substitute sees an ordinary local variable and returns its value. This gave me a lot of headaches.
So something like this:
function(df) {
  df.name <- deparse(substitute(df))  # capture the name first
  ggplot()
  ggsave()
}
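The ordering issue can be reproduced without ggplot2. The following is a small sketch with two hypothetical functions, label_early and label_late, that differ only in when substitute is called:

```r
# Captures the name before the argument is reassigned.
label_early <- function(x) {
  x.name <- deparse(substitute(x))
  x <- x + 1
  x.name
}

# Reassigns the argument first, so x becomes an ordinary local variable
# and substitute() returns its current value instead of the caller's expression.
label_late <- function(x) {
  x <- x + 1
  deparse(substitute(x))
}

v <- 1
early <- label_early(v)
late <- label_late(v)
```

With v set to 1, early is "v" while late is "2", the deparsed value of the modified variable.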

Related

How to get an R function to have a global effect on a dataframe?

I have been trying to create a function which will permanently change a value in specific cells of my data frame. I insert the data frame name, row index I wish to change, and the new name as a string. However, the function seems to change the value name within the local environment but not global.
The function is as follows:
#change name function
name_change <- function(df, row, name) {
df[row, 1] = name
return(df[row, 1])
}
E.g. if the data frame was:
Name    Column B
Mark    2
Beth    4
The function name_change(df, 2, 'Jess') would change Beth to Jess.
Run as raw code, the line below does permanently change the value, but it does not work when wrapped in a function.
df[2, 1] = 'Jess'
Thanks in advance for your time
If you change your function like this:
name_change <- function(df, row, name) {
df[row, 1] = name
return(df)
}
and then assign the result of the function back to the original df, you will get the change you are looking for:
df = name_change(df,2,'Jess')
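To illustrate why the reassignment matters, here is a short sketch using a made-up two-row data frame called people (the name and values are assumptions for the demo):

```r
name_change <- function(df, row, name) {
  df[row, 1] <- name  # modifies the function's local copy only
  df                  # return the whole modified data frame
}

people <- data.frame(Name = c("Mark", "Beth"), ColumnB = c(2, 4),
                     stringsAsFactors = FALSE)

discarded <- name_change(people, 2, "Jess")  # result not assigned back to people
before <- people$Name[2]                     # people itself is unchanged

people <- name_change(people, 2, "Jess")     # assign the result back
after <- people$Name[2]
```

before is still "Beth", and after is "Jess" once the result has been assigned back.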
An alternative to the solutions already provided is to use the superassignment operator <<-. The ordinary assignment <- (or the = you used) operates in your function's environment only. The superassignment reaches beyond your function's environment and can thus modify the data frame residing in the global environment. Note, though, that this is only a quick-and-dirty fix: it modifies whichever variable named df is found in the enclosing environment, regardless of what was passed as the argument.
That said, the code would read like this:
#change name function
dirty_name_change <- function(df, row, name) {
df[row, 1] <<- name ## note the double arrow
}
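One caveat is worth seeing in action. In the sketch below (the data frames df and other are invented for the demo, and the function is assumed to be defined at top level), the superassignment looks up df in the enclosing environment, so the data frame passed as the argument is ignored:

```r
dirty_name_change <- function(df, row, name) {
  df[row, 1] <<- name  # reads and writes `df` in the enclosing environment
}

df <- data.frame(Name = c("Mark", "Beth"), ColumnB = c(2, 4),
                 stringsAsFactors = FALSE)
other <- data.frame(Name = c("Ann", "Bob"), ColumnB = c(1, 3),
                    stringsAsFactors = FALSE)

# Passing `other` does not help: the outer `df` is the one that changes.
dirty_name_change(other, 2, "Jess")
changed_df <- df$Name[2]
untouched_other <- other$Name[2]
```

After the call, changed_df is "Jess" and untouched_other is still "Bob", which is why returning the modified data frame and reassigning it is usually the safer pattern.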
You are returning the value of the cell, not the mutated df. R passes arguments by value, so you can think of the function as modifying a copy of the df that was passed in. The solution is to return the mutated df and reassign it.
See also: Can you pass-by-reference in R?

Convert R list to Pythonic list and output as a txt file

I'm trying to convert these lists into something like Python's lists. I've used this code:
library(GenomicRanges)
library(data.table)
library(Repitools)
pcs_by_tile<-lapply(as.list(1:length(tiled_chr)) , function(x){
obj<-tileSplit[[as.character(x)]]
if(is.null(obj)){
return(0)
} else {
runs<-filtered_identical_seqs.gr[obj]
df <- annoGR2DF(runs)
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
#print(score)
return(score)
}
})
dt_text <- unlist(lapply(tiled_chr$score, paste, collapse=","))
writeLines(tiled_chr, paste0("x.txt"))
The following line of code iterates through each row of the DataFrame (only 2 columns) and splits them into the list. However, its output is different from what I desired.
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
But I want output like the following:
[20350, 20355], [20357, 20359], [20361, 20362], ........
If I understand your question correctly, using as.tuple from the package 'sets' might help. Here's what the code might look like:
library(sets)
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
....
df_text = unlist(lapply(score, as.tuple),recursive = F)
This will return a list of tuples (and zeroes) that looks more like what you are looking for. You can filter out the zeroes by checking the type of each element in the resulting list and removing the ones that match. Note that the check needs sapply rather than lapply, because ! cannot negate a list. For example, you could do something like this:
df_text_trimmed <- df_text[!sapply(df_text, is.double)]
to get rid of all your zeroes.
Edit: Now that I think about it, you probably don't even need to convert your dataframes to tuples if you don't want to. You just need to make sure to include the 'recursive = F' option when you unlist things to get a list of 0s and dataframes containing the numbers you want.
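If the goal is only the bracketed text shown in the question, a base R alternative is to format each row as a string directly. This is a sketch under the assumption that df has integer-valued start and end columns; the sample values and the temporary output file are made up for the demo:

```r
# Stand-in for the data frame returned by annoGR2DF(runs).
df <- data.frame(start = c(20350, 20357, 20361),
                 end   = c(20355, 20359, 20362))

# Format each row as "[start, end]" and join the pairs on one line.
pairs <- sprintf("[%d, %d]", as.integer(df$start), as.integer(df$end))
line <- paste(pairs, collapse = ", ")

# Write the line to a text file (a temporary file here).
out_file <- tempfile(fileext = ".txt")
writeLines(line, out_file)
```

line then contains "[20350, 20355], [20357, 20359], [20361, 20362]", which Python can parse once it is wrapped in an outer pair of brackets.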

Assign dataframe name to a variable in a function

I have created a function and am passing a data frame to it as a parameter. Now I would like to take that data frame's name and store it in a string variable.
Code used:
RFun <- function(a){
args=(commandArgs(TRUE))
l<<-80
h<<-85
fname<<-paste(a,"_Temp.csv")
a_R<-filter(a_RW,cs==2|cs==3)
a_R<-a_Rinse[-c(2,3)]
write.csv(a_R,file=fname,row.names=FALSE)
a_Rinse_Temperature_Deviations <- read.csv(paste("~/",fname))
}
RFun(df)
When I run the function above, it creates the numeric variables l and h with the values I specified, but fname ends up holding the contents of the complete data frame (rows and columns) rather than the name I want.
It also takes a long time to execute.
The expected fname is df_Temp.csv, where df is the data frame.
Looks like assign(String varName , obj Value) might get you where you need to be.
RFun<-function(a){
args=(commandArgs(TRUE))
l<<-80
h<<-85
fname <<- "File_Name_Text"
assign (fname,paste(a,"_Temp.csv"))
a_R<-filter(a_RW,cs==2|cs==3)
a_R<-a_Rinse[-c(2,3)]
write.csv(a_R,file=fname,row.names=FALSE)
a_Rinse_Temperature_Deviations <- read.csv(paste("~/",fname))
}
It's hard to follow without a working example. But try to assign only the "name" of your df instead of the complete df. Try this:
fname <<- paste(deparse(substitute(a)),"_Temp.csv",sep="")
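A small self-contained sketch of that suggestion follows; the helper name temp_file_name is hypothetical. It returns the file name instead of superassigning it, and it also shows why the separator matters:

```r
# Hypothetical helper: builds "<argument name>_Temp.csv" from the caller's expression.
temp_file_name <- function(a) {
  paste0(deparse(substitute(a)), "_Temp.csv")
}

df <- data.frame(x = 1:3)
fname <- temp_file_name(df)

# paste() without sep = "" inserts a space between the pieces.
with_space <- paste("df", "_Temp.csv")
```

fname is "df_Temp.csv", whereas with_space is "df _Temp.csv" because paste separates its arguments with a space by default.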

how to pass unknown argument to a function

I'm assigning a data frame to a variable name taken from a string.
So when I run the code I don't know what the variable name will be.
I want to pass that data frame to another function to plot it. How can I pass it to the function without knowing its name?
file_name <- file.choose()
fname <- unlist (strsplit (file_name, "\\", fixed = TRUE))
fname <- fname[length(fname)]
waf_no <- unlist (strsplit (fname, "\\s"))
waf_no <- waf_no[grep(waf_no, pattern="WAF")]
data <- read_WAF_file (file_name)
assign(waf_no, flux_calc(data)) # flux_calc() calculates and manipulates the data frame
plot_waf(?)
My plot_waf function is very simple:
plot_waf <- function (dataframe) {
library("ggplot2")
qplot(dist,n2o,data=dataframe,shape=treat)
}
The inverse for assign is get:
Search by name for an object (get) or zero or more objects (mget).
Therefore, you'll need to run your plot function like this:
plot_waf(get(waf_no))
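The round trip through assign and get can be sketched as below. The name "WAF_12" and the flux data frame are stand-ins for the values produced by the question's code. The second half shows an alternative that is often easier to manage: storing the data frame in a named list instead of creating a variable whose name is only known at run time.

```r
waf_no <- "WAF_12"  # hypothetical name, only known at run time
flux <- data.frame(dist = c(1, 2, 3), n2o = c(0.1, 0.2, 0.3))

# Create a variable whose name is the string in waf_no, then fetch it by name.
assign(waf_no, flux)
by_name <- get(waf_no)

# Alternative: keep results in a named list and index it with the same string.
results <- list()
results[[waf_no]] <- flux
from_list <- results[[waf_no]]
```

Either by_name or from_list could then be passed to a plotting function such as plot_waf.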

R store output from lapply with multiple functions

I'm using lapply to loop through a list of dataframes and apply the same set of functions. This works fine when lapply has just one function, but I'm struggling to see how I store/print the output from multiple functions - in that case, I seem to only get output from one 'loop'.
So this:
output <- lapply(dflis,function(lismember) vss(ISEQData,n=9,rotate="oblimin",diagonal=F,fm="ml"))
works, while the following doesn't:
output <- lapply(dflis,function(lismember){
outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
})
I think this dummy example is an analogue, so in other words:
nbs <- list(1==1,2==2,3==3,4==4)
nbsout <- lapply(nbs,function(x) length(x))
Gives me something I can access, while I can't see how to store output using the below (e.g. the attempt to use nbsout[[x]][2]):
nbs <- list(1==1,2==2,3==3,4==4)
nbsout <- lapply(nbs,function(x){
nbsout[[x]][1]<-typeof(x)
nbsout[[x]][2]<-length(x)
}
)
I'm using RStudio and will then be printing outputs/knitting html (where it makes sense to display the results from each dataset together, rather than each function-output for each dataset sequentially).
You should return a structure that includes all your outputs. A named list is the better choice. You can also return a data.frame if your outputs all have the same dimensions.
output <- lapply(dflis,function(lismember){
outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
list(outputvss=outputvss,nefa=nefa)
## or data.frame(outputvss=outputvss,nefa=nefa)
})
When you return a data.frame, you can try sapply, which attempts to simplify the final result. Or you can use the classical:
do.call(rbind,output)
to aggregate the result.
A function should always have an explicit return value, e.g.
output <- lapply(dflis,function(lismember){
outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
#return value:
list(outputvss, nefa)
})
output is then a list of lists.
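Applied to the dummy example from the question, the pattern might look like the following sketch. Each call returns a named list, and the per-element results are then stacked into one data frame (the variable names nbsout and summary_df are chosen for the demo):

```r
nbs <- list(1 == 1, 2 == 2, 3 == 3, 4 == 4)

# Return both results from every iteration as a named list.
nbsout <- lapply(nbs, function(x) {
  list(type = typeof(x), length = length(x))
})

first_type <- nbsout[[1]]$type      # access by position, then by name
first_length <- nbsout[[1]]$length

# Stack the per-element results into one data frame, one row per element.
summary_df <- do.call(rbind, lapply(nbsout, as.data.frame))
```

first_type is "logical" and first_length is 1, and summary_df has four rows with the columns type and length, which is convenient when the results for each dataset should be displayed together.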

Resources