So I have a wrapper function that calls a lot of sub-functions. Rather than writing out every potential argument of each sub-function in the wrapper's signature, I want to use ... (dots) so that any number of arguments can be passed through to change the behaviour of the sub-functions when necessary.
The problem is that different sub-functions accept different arguments, so I keep getting "unused argument" errors.
So I've tried using do.call, updating the output of formals() with any matching entries from the dots. See the function below:
elipRead <- function(type, path, ...) {
  if (type == "csv") {
    ar <- list(...)
    args <- formals(readr::read_csv)
    args$file <- path
    args[which(names(args) %in% names(ar))] <- ar[na.omit(match(names(args), names(ar)))]
    out <- do.call(readr::read_csv, args = args)
  } else {
    ar <- list(...)
    args <- formals(readxl::read_xlsx)
    args$path <- path
    args[which(names(args) %in% names(ar))] <- ar[na.omit(match(names(args), names(ar)))]
    out <- do.call(readxl::read_xlsx, args)
  }
  return(out)
}
However, even though I have checked that the args list is updated correctly, I still get errors:
csv <- "csv_Filename.csv"
test1 <- elipRead("csv", file.path(getwd(), csv), sheet = "Sheet1", col_names = FALSE)
# Error in default_locale() : could not find function "default_locale"
xlsx <- "xlsx_Filename.xlsx"
test2 <- elipRead("xlsx", file.path(getwd(), xlsx), sheet = "Sheet1", col_names = TRUE)
# Error: `guess_max` must be a positive integer
For the xlsx attempt, the error comes from the guess_max default, which cannot find the n_max object. I assume this has to do with do.call's envir argument and n_max not being in the parent environment. The csv attempt fails for a similar reason: the default_locale() function cannot be found.
Error in check_non_negative_integer(guess_max, "guess_max") :
object 'n_max' not found
6.
check_non_negative_integer(guess_max, "guess_max")
5.
check_guess_max(guess_max)
4.
read_excel_(path = path, sheet = sheet, range = range, col_names = col_names,
col_types = col_types, na = na, trim_ws = trim_ws, skip = skip,
n_max = n_max, guess_max = guess_max, progress = progress,
.name_repair = .name_repair, format = "xlsx")
3.
(function (path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
guess_max = min(1000, n_max), progress = readxl_progress(),
.name_repair = "unique") ...
2.
do.call(readxl::read_xlsx, args)
1.
elipRead("xlsx", paste0(add, xlsx), sheet = "Sheet1", col_names = TRUE)
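The guess about do.call's environment can be checked with a toy function. This is a minimal sketch: `f` below is a made-up stand-in that copies the shape of read_xlsx's `guess_max = min(1000, n_max)` default, not readxl code. formals() returns default values as unevaluated expressions, and do.call() places them into the call it builds, so they are evaluated in the caller's frame rather than inside `f`. The csv failure plausibly has the same cause, with read_csv's `locale = default_locale()` default evaluated where readr is not attached.

```r
# Toy stand-in (hypothetical) whose second default depends on the first,
# mimicking the shape of readxl's `guess_max = min(1000, n_max)`
f <- function(n_max = Inf, guess_max = min(1000, n_max)) guess_max

args <- formals(f)
class(args$guess_max)  # "call": an unevaluated expression, not a number

# do.call() builds f(n_max = Inf, guess_max = min(1000, n_max)) and evaluates it
# in parent.frame(), so min(1000, n_max) is looked up there, not inside f
err <- tryCatch(do.call(f, as.list(args)),
                error = function(e) conditionMessage(e))
err

# Omitting the default lets f evaluate it in its own frame, where n_max exists
ok <- do.call(f, list())
ok  # 1000
```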
In the end, there are three kinds of answer I'm hoping for:
1. Recommended changes to my current code so that the do.call approach works.
2. An alternative way of using ... that passes only the relevant arguments from the dots to each function.
3. A completely different approach for passing arguments from a wrapper to its internal functions.
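For option 2, one approach is to stop copying formals() altogether and forward only those entries of the dots whose names match a formal argument of the target function. This is a sketch under assumptions: `call_with_matching_args()` is a hypothetical helper (not part of readr or readxl), unnamed entries of the dots are silently dropped, and the rewritten elipRead only needs readr and readxl installed when it is actually called. Because no defaults are copied, read_xlsx and read_csv should evaluate them in their own environments, which sidesteps the n_max and default_locale() lookups that failed.

```r
# Hypothetical helper: call `fun`, forwarding only those entries of `...` whose
# names match one of fun's formal arguments. Defaults are never copied from
# formals(), so `fun` evaluates its own defaults inside its own environment.
call_with_matching_args <- function(fun, ..., .required = list()) {
  dots <- list(...)
  keep <- names(dots) %in% names(formals(fun))  # unnamed entries never match
  do.call(fun, c(.required, dots[keep]))
}

# elipRead rebuilt on the helper; readr / readxl are only needed when it is called
elipRead <- function(type, path, ...) {
  if (type == "csv") {
    call_with_matching_args(readr::read_csv, ..., .required = list(file = path))
  } else {
    call_with_matching_args(readxl::read_xlsx, ..., .required = list(path = path))
  }
}
```

With this version, elipRead("csv", path, sheet = "Sheet1", col_names = FALSE) forwards only col_names to read_csv, because read_csv has no sheet argument. For option 3, a common alternative design is to give the wrapper one list argument per sub-function (for example a hypothetical `csv_args = list()`) and splice it in with do.call(), which makes it explicit which function receives which arguments.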
Related
I'm working on functions that take the character string argument GSE_expt. I have written 4 separate functions that take GSE_expt and produce output that I can save as a variable in the R environment.
The code block below contains 2 of those functions. I use paste0() with GSE_expt to build a file name that the here and rio packages then use to import the file.
library(dplyr)
library(readr)
library(tidyr)
library(rio)
library(here)

# Extracting metadata from 2 different sources and combining them into a single file
extract_metadata <- function(GSE_expt) {
  GSE_expt <- deparse(substitute(GSE_expt)) # make sure it is a character string
  metadata_1 <- rnaseq_metadata_allsamples %>% # subset a larger metadata file
    as_tibble %>%
    dplyr::filter(GSE == GSE_expt)
  # metadata from ENA imported using rio and here packages
  metadata_2 <- import(here("metadata", "rnaseq", paste0(GSE_expt, ".txt"))) %>%
    as_tibble %>%
    select("run_accession", "library_layout", "library_strategy", "library_source",
           "read_count", "base_count", "sample_alias", "fastq_md5")
  metadata <- full_join(metadata_1, metadata_2, by = c("Run" = "run_accession"))
  return(metadata)
}
# Extracting coverage stats obtained from samtools
clean_extract_coverage <- function(GSE_expt) {
  coverage <- read_tsv(
    file = here("results", "rnaseq", "2022-01-11", "coverage",
                paste0("coverage_stats_", deparse(substitute(GSE_expt)), "_percent.txt")),
    col_names = FALSE
  )
  # odd rows become Run, even rows become the stats line
  coverage <- data.frame("Run" = coverage$X1[c(TRUE, FALSE)],
                         "stats" = coverage$X1[c(FALSE, TRUE)])
  coverage <- separate(coverage, stats,
                       into = c("num_reads", "covered_bases", "coverage_percent"),
                       convert = TRUE)
  return(coverage)
}
Each function works fine on its own when I pass GSE118008 as the GSE_expt argument.
I am trying to create a combined (nested) function so that I can run both (or more) functions on GSE118008 in one call and save the output as a list.
When I ran this combined function:
extract_coverage_metadata <- function(GSE_expt) {
  coverage <- clean_extract_coverage(GSE_expt)
  metadata <- extract_metadata(GSE_expt)
  return(list(coverage = coverage, metadata = metadata))
}
extract_coverage_metadata(GSE118008)
This is the error message I got.
Error: 'results/rnaseq/2022-01-11/coverage/coverage_stats_GSE_expt_percent.txt' does not exist.
Rather than building the file name coverage_stats_GSE118008_percent.txt (which the individual function does correctly), the combined function produces coverage_stats_GSE_expt_percent.txt instead.
Traceback
8. stop("'", path, "' does not exist", if (!is_absolute_path(path)) { paste0(" in current working directory ('", getwd(), "')") }, ".", call. = FALSE)
7. check_path(path)
6. (function (path, write = FALSE) { if (is.raw(path)) { return(rawConnection(path, "rb")) ...
5. vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, col_types = col_types, id = id, skip = skip, col_select = col_select, name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws, escape_double = escape_double, escape_backslash = escape_backslash, ...
4. vroom::vroom(file, delim = "\t", col_names = col_names, col_types = col_types, col_select = { { col_select ...
3. read_tsv(file = here("results", "rnaseq", "2022-01-11", "coverage", paste0("coverage_stats_", deparse(substitute(GSE_expt)), "_percent.txt")), col_names = FALSE) at rnaseq_functions.R#30
2. clean_extract_coverage(GSE_expt)
1. extract_coverage_metadata(GSE118008)
I would appreciate any recommendations on how to solve this.
Thanks in advance!
Husain
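A likely cause is that deparse(substitute(GSE_expt)) only captures the expression passed at the current call level. When extract_coverage_metadata() calls clean_extract_coverage(GSE_expt), the expression clean_extract_coverage sees is the symbol GSE_expt, hence coverage_stats_GSE_expt_percent.txt. The sketch below reproduces this with two toy functions (inner_nse, outer_nse and coverage_file_name are hypothetical names) and shows one possible fix: the inner functions take a plain character string, and the unquoted name is captured once, in the function the user calls. It only builds file names and does not touch the here or rio packages.

```r
# Toy reproduction: substitute() only sees the expression passed at its own call level
inner_nse <- function(id) deparse(substitute(id))
outer_nse <- function(id) inner_nse(id)

inner_nse(GSE118008)  # "GSE118008"
outer_nse(GSE118008)  # "id": inner_nse sees the symbol `id`, not what the user typed

# Possible fix: inner helpers take a character string
coverage_file_name <- function(GSE_expt) {
  stopifnot(is.character(GSE_expt), length(GSE_expt) == 1)
  paste0("coverage_stats_", GSE_expt, "_percent.txt")
}

# The wrapper takes a string and passes it straight through
combined <- function(GSE_expt) {
  list(coverage_file = coverage_file_name(GSE_expt))
}

# Optional: keep the unquoted interface, but capture the name once at the top level
combined_nse <- function(GSE_expt) {
  combined(deparse(substitute(GSE_expt)))
}

combined("GSE118008")$coverage_file    # "coverage_stats_GSE118008_percent.txt"
combined_nse(GSE118008)$coverage_file  # "coverage_stats_GSE118008_percent.txt"
```

extract_metadata() has the same issue, so with this approach both functions would receive "GSE118008" as a string and drop their own deparse(substitute()) calls.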
I need to add 2 columns to each file in a set of CSV files and then write the CSVs back out to a folder. So I used llply from the plyr package:
library(plyr)
data_files <- list.files(pattern = "\\.csv$", recursive = TRUE, full.names = FALSE)
x <- llply(data_files, read.csv, header = TRUE)
y <- llply(x, within, Cf <- var1 * 8)
z <- llply(y, within, Pc <- Cf + 1)
When I then tried to write the files back out with write.table inside lapply:
lapply(z, FUN = function(eachPath) {
  b <- read.csv(eachPath, header = F)
  write.table(b, file = eachPath, row.names = F, col.names = T, quote = F)
})
I get the following error, and I think it is because z is a list of lists:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
I think z needs to be converted into a list of data frames. I would appreciate advice on how to do that, and also on how to name each output file from a column containing the sample ID.
Thanks
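The error does not come from z itself: within() returns a data frame, so z is already a list of data frames. It comes from handing each data frame to read.csv(), which expects a file path. One fix, sketched below, is to write each data frame directly instead of re-reading it. To stay self-contained the sketch creates two small CSV files in tempdir() and uses base lapply() instead of plyr; the sample_id column name is an assumption standing in for whichever column holds the sample ID, and the code assumes each file holds a single sample.

```r
# Demo input: two small CSV files in a temporary directory
in_dir  <- file.path(tempdir(), "csv_in")
out_dir <- file.path(tempdir(), "csv_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(sample_id = "S1", var1 = 1:3), file.path(in_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(sample_id = "S2", var1 = 4:6), file.path(in_dir, "b.csv"), row.names = FALSE)

data_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and add the two derived columns: the result is a list of data frames
z <- lapply(data_files, function(path) {
  df <- read.csv(path, header = TRUE)
  df$Cf <- df$var1 * 8
  df$Pc <- df$Cf + 1
  df
})

# Write each data frame itself (no re-reading), named after its sample ID
out_files <- vapply(z, function(df) {
  out_path <- file.path(out_dir, paste0(df$sample_id[1], ".csv"))
  write.csv(df, out_path, row.names = FALSE)
  out_path
}, character(1))
```

To overwrite the original files instead, Map(function(df, path) write.csv(df, path, row.names = FALSE), z, data_files) pairs each data frame with the path it was read from.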
Apologies in advance, as I am new to R. The following code is meant to process multiple files in one go and place the output in a separate folder.
require(highfrequency)
require(quantmod)
require(readxl)
input_files1 = list("file_path1.xlsx", "file_path2.xlsx", "file_path3.xlsx") # making list of file paths
for (i in length(input_files1)) {
  bid_df <- read_excel(input_files1[i], sheet = 1, col_names = TRUE, col_types = NULL, na = "", skip = 0)
  # read_excel takes file path as first argument
  ask_df <- read_excel(input_files1[i], sheet = 2, col_names = TRUE, col_types = NULL, na = "", skip = 0)
  trade_df <- read_excel(input_files1[i], sheet = 3, col_names = TRUE, col_types = NULL, na = "", skip = 0)
  qdata_df <- merge(ask_df, bid_df, by = "TIMESTAMP")
  qdata_xts_raw <- xts(qdata_df[, -1], order.by = qdata_df[, 1])
  qdata_xts_m <- mergeQuotesSameTimestamp(qdata_xts_raw, selection = "median")
  trade_xts_raw <- xts(trade_df[, -1], order.by = trade_df[, 1])
  trade_xts_m <- mergeTradesSameTimestamp(trade_xts_raw, selection = "median")
  tqdata = matchTradesQuotes(trade_xts_m, qdata_xts_m)
  quoted_spread <- tqLiquidity(tqdata, trade_xts_m, qdata_xts_m, type = "qs")
  qs_30 <- aggregatets(quoted_spread, FUN = "mean", on = "minutes", k = 30)
  indexTZ(qs_30) <- "UTC"
  write.csv(qs_30, file = file.path("output_file_path", paste0("CAN_out", i)))
}
When the code is run, it gives the following error:
Error in file.exists(path) : invalid 'file' argument
Please help me fix the error so that the code runs.
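Access the list elements as input_files1[[i]] rather than input_files1[i]: single brackets return a one-element list, not the character string that read_excel expects as its path.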
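The difference between [ and [[ on a list, and a second problem in the loop header, can be seen without any Excel files. In this sketch, `for (i in length(input_files1))` iterates over the single value 3, so only the last file would be processed, whereas seq_along() gives 1, 2, 3. The output names also gain a .csv extension, which paste0("CAN_out", i) alone does not add. The read_excel() call is only indicated in a comment, and in_paths/out_paths are illustrative names.

```r
input_files1 <- list("file_path1.xlsx", "file_path2.xlsx", "file_path3.xlsx")

class(input_files1[1])    # "list": single brackets return a sub-list
class(input_files1[[1]])  # "character": double brackets return the element itself

length(input_files1)      # 3, so `for (i in length(input_files1))` runs once, with i = 3
seq_along(input_files1)   # 1 2 3

in_paths  <- character(0)
out_paths <- character(0)
for (i in seq_along(input_files1)) {
  # a plain string, suitable for read_excel(in_paths[i], sheet = 1, ...)
  in_paths[i]  <- input_files1[[i]]
  out_paths[i] <- file.path("output_file_path", paste0("CAN_out", i, ".csv"))
}
```

A character vector, c("file_path1.xlsx", ...), would also work with input_files1[i], since single brackets on a character vector return a string.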
I would be most grateful if anyone could explain why I am having trouble exporting data from a function that extracts coefficients from a linear model. I have hundreds of models to process, so I’m hoping to build a loop to handle them, but I have fallen at an earlier hurdle.
I am using methods borrowed from someone much smarter at this stuff:
https://stat.ethz.ch/pipermail/r-sig-ecology/2008-May/000062.html
The relevant parts (data creation, the function, and finally my attempt to export the data) are below. First, though, note that the data, “export”, is written to experiment.csv as a single COLUMN. I am told that the append argument of write.table only works with rows. Consequently, each run overwrites the output of previous runs of the same commands instead of appending to it.
The messages (warnings rather than errors) are all of the form below, one for each setting:
Warning messages:
1: In write.csv(export, file = "experiment.csv", append = TRUE, quote = TRUE, :
attempt to set 'append' ignored
# DATA CREATION
# create an empty list
mod <- list()
# loop to create 5 objects of class 'lm'
for (i in 1:5) {
  x <- rnorm(i * 10)
  y <- rnorm(i * 10)
  mod[[paste("run", i, sep = "")]] <- lm(y ~ x)
}
# FUNCTION TO EXTRACT DATA
myFun <- function(lm) {
  out <- c(lm$coefficients[1],
           lm$coefficients[2],
           length(lm$model$y),
           summary(lm)$coefficients[2, 2],
           pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
              summary(lm)$fstatistic[3], lower.tail = FALSE),
           summary(lm)$r.squared)
  names(out) <- c("intercept", "slope", "n", "slope.SE", "p.value", "r.squared")
  return(out)
}
# FAILED ATTEMPT TO EXPORT
export <-myFun(mod$run1)
write.csv(export, file = "experiment.csv", append = TRUE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = FALSE, qmethod = c("escape", "double"),
fileEncoding = "")
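The warnings arise because write.csv() deliberately ignores append, sep, dec, col.names and qmethod (its help page states that attempts to change them are ignored with a warning), and a named numeric vector is written as one column with one value per row. The sketch below shows two alternatives: collect one row per model with rbind() and write once, or use write.table(), which does honour append, writing the header only on the first call. To be self-contained it rebuilds the toy models with a fixed seed and a compact equivalent of myFun(); the output files are placed in tempdir() for illustration.

```r
set.seed(1)
mod <- list()
for (i in 1:5) {
  x <- rnorm(i * 10)
  y <- rnorm(i * 10)
  mod[[paste0("run", i)]] <- lm(y ~ x)
}

# Compact equivalent of myFun(): a named numeric vector for one model
myFun <- function(fit) {
  s <- summary(fit)
  out <- c(coef(fit)[1], coef(fit)[2], length(fit$model$y), s$coefficients[2, 2],
           pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE),
           s$r.squared)
  names(out) <- c("intercept", "slope", "n", "slope.SE", "p.value", "r.squared")
  out
}

# Option A: one row per model, written in a single call
results <- do.call(rbind, lapply(mod, myFun))  # matrix with row names run1..run5
results <- data.frame(model = rownames(results), results)
csv_path <- file.path(tempdir(), "experiment.csv")
write.csv(results, csv_path, row.names = FALSE)

# Option B: append one row per model with write.table(), which honours `append`
append_path <- file.path(tempdir(), "experiment_appended.csv")
unlink(append_path)  # start from an empty file for this demo
for (nm in names(mod)) {
  row <- data.frame(model = nm, t(myFun(mod[[nm]])))
  has_file <- file.exists(append_path)
  write.table(row, append_path, sep = ",", row.names = FALSE,
              col.names = !has_file, append = has_file)
}
```

Option A is usually simpler for hundreds of models; option B suits the case where models are fitted over separate sessions and the file must grow over time.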
Hi guys, I wrote a function that splits a large data.frame into two parts and then performs some calculations on each sub-data.frame. It then sums the results from the two sub-data.frames row by row (the line z = x+y).
A piece of the function:
Myfun <- function(fileName,
                  check.names = FALSE, header = FALSE,
                  stringsAsFactors = FALSE,
                  sep = "\t", ...) {
  Data <- read.delim(fileName,
                     header = header,
                     check.names = check.names,
                     stringsAsFactors = stringsAsFactors, sep = sep, ...)
  newdata_a <- Data[which(Data[, 1] == 1), ]
  newdata_b <- Data[which(Data[, 1] == -1), ]
  ...............
  ...............
  z = x+y
  return(tryCatch((z), error = function(e) NULL))
}
I apply this function to a set of files using the following code:
library(gtools) # for mixedsort()
source("Myfun.R")
files <- list.files(pattern = ".txt")
files = mixedsort(sort(files))
for (i in 1:length(files)) {
  b <- lapply(files, Myfun)
}
The problem is that for some data.frames, x and y have different lengths because of the nature of the data, and this causes the following error:
Error in Ops.data.frame(x, y) :
+ only defined for equally-sized data frames
Calls: lapply -> FUN -> Ops.data.frame
Execution halted
To get around this, I added return(tryCatch((z), error=function(e) NULL)) to Myfun so that R would ignore the error and carry on, but the script still stops.
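tryCatch() only handles errors raised while evaluating the expression it wraps. In Myfun, z = x+y has already failed before return(tryCatch((z), ...)) is reached, so there is nothing left for it to catch. A sketch of the usual fix is to wrap the call that can fail; fake_myfun() below is a hypothetical stand-in that reproduces the size mismatch with toy data frames, not the original Myfun.

```r
# Hypothetical stand-in for Myfun: fails when the two parts differ in size
fake_myfun <- function(n_a, n_b) {
  x <- data.frame(v = seq_len(n_a))
  y <- data.frame(v = seq_len(n_b))
  z <- x + y  # the error is raised HERE, before any later tryCatch(z) could run
  z
}

# Wrap the call that can fail, so one bad input does not stop the whole lapply()
inputs <- list(c(3, 3), c(3, 2), c(2, 2))
b <- lapply(inputs, function(p) {
  tryCatch(fake_myfun(p[1], p[2]), error = function(e) {
    message("skipped: ", conditionMessage(e))
    NULL
  })
})
```

Applied to the original code, this would be b <- lapply(files, function(f) tryCatch(Myfun(f), error = function(e) NULL)), after which Filter(Negate(is.null), b) drops the failed files. The surrounding for loop is not needed, since lapply() already iterates over every file.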