I ship a text file that lists all exported functions. To make sure that all functions are listed, I would like to create a unit test via testthat and compare the exported functions with the ones in the text file. My current approach reads in the file and compares it with ls("package:myPackage"), but this call returns a long list containing the functions of all imported packages as well. Any ideas how to solve this?
A completely different approach would be to generate this file automatically, but I think the first approach is easier to realise. Hopefully.
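For reference, the automatic route could be a one-liner (a sketch; it assumes the package can be loaded and writes into the source tree at the path used below):

# regenerate the shipped list from the live namespace
writeLines(sort(getNamespaceExports("myPackage")), "inst/subdirectory/functionList.txt")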
Thanks to @Emmanuel-Lin, here is my solution:
library(testthat)
library(data.table)
test_that("Function List", {
  # read the NAMESPACE and pull out the export() directives
  funnamespace = data.table(read.table(system.file("NAMESPACE", package = "PackageName"), stringsAsFactors = FALSE))
  funnamespace[, c("status", "fun") := tstrsplit(V1, "\\(")]
  funnamespace[, fun := tstrsplit(fun, "\\)")]
  # read the shipped function list
  funlist = read.csv2(system.file("subdirectory", "functionList.txt", package = "PackageName"), stringsAsFactors = FALSE)
  # every exported function must appear in the shipped list, and vice versa
  expect_equal(funnamespace[status == "export", fun], funlist[, 1])
})
Obviously, I was too lazy to work out the correct regular expression to replace the two tstrsplit calls with a single one.
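For the record, a single split on either parenthesis should do it (a sketch; it assumes every NAMESPACE line has the form directive(value)):

# split on "(" or ")" and keep directive and name in one pass
funnamespace[, c("status", "fun") := tstrsplit(V1, "[()]", keep = 1:2)]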
I need some suggestions on how to solve this problem. I have a number of zoo objects on which I want to perform a causal impact analysis in R, using the CausalImpact package developed by Google. To automate the process, I want to run a loop over the zoo objects and automatically save the results in a file that can be exported to either Word or CSV.
So far, my solution has been to include the zoo objects into a zoo list by
zoolist <- list(ts1, ts2, ts3)
and then run a for loop like:
for (i in zoolist) {
  experiment_impact <- CausalImpact(i,
                                    pre.period,
                                    post.period,
                                    model.args = list(nseasons = 7, season.duration = 1))
  summary(experiment_impact)
}
The code seems to work, but I have no idea how to export all the outputs to CSV, Word, or any other format, provided it is compact and readable.
Any ideas? Thank you for your help!
If the only thing you want to do is capture the summary, exactly as printed to the screen, you can use capture.output. Replace the second line in your loop with:
capture.output(summary(experiment_impact), file = "example.txt", append = TRUE)
A more elegant solution might be to use lapply to run the analysis on each item in the list, so that you end up with a list of output items:
resultList <- lapply(
  zoolist,
  CausalImpact,
  pre.period,
  post.period,
  model.args = list(nseasons = 7, season.duration = 1)
)
You could then extract the desired values from each of the CausalImpact objects in the list, format them as a data.frame, and write the result out with write.csv.
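A minimal sketch of that last step (it assumes each CausalImpact object exposes a $summary data frame with an "Average" row; the exact columns may vary across package versions):

# stack the average-effect row of every result into one table
summaries <- do.call(rbind, lapply(resultList, function(x) x$summary["Average", ]))
write.csv(summaries, file = "impact_summaries.csv")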
I have some R code using the readr package that works well on a local computer: I use list.files to find files with a specific extension and then use readr to operate on those files.
My question: I want to do something similar with files in AWS S3, and I am looking for some pointers on how to adapt my current R code to do the same.
Thanks in advance.
What I want:
Given AWS folder/file structure like this
- /folder1/subfolder1/quant.sf
- /folder1/subfolder2/quant.sf
- /folder1/subfolder3/quant.sf
and so on, where every subfolder contains the same file quant.sf, I would like to get a data frame of the S3 paths so that I can use the R code shown below to operate on all the quant.sf files.
Below is the R code that currently works with the data on a Linux machine.
get_quants <- function(path1, ...) {
  additionalPath = list(...)
  suppressMessages(library(tximport))
  suppressMessages(library(readr))
  # collect every quant.sf file below path1
  salmon_filepaths = file.path(path1, list.files(path1, recursive = TRUE, pattern = "quant.sf"))
  # extract the sample name from each path
  samples = data.frame(samples = gsub(".*?quant/salmon_(.*?)/quant.sf", "\\1", salmon_filepaths))
  row.names(samples) = samples[, 1]
  names(salmon_filepaths) = samples$samples
  # if no tx2gene is available, we will only get isoform-level counts
  salmon_tx_data = tximport(salmon_filepaths, type = "salmon", txOut = TRUE)
  ## transcript count summarization
  write.csv(as.data.frame(salmon_tx_data$counts), file = "tx_NumReads.csv")
  ## TPM
  write.csv(as.data.frame(salmon_tx_data$abundance), file = "tx_TPM_Abundance.csv")
  if (length(additionalPath) > 0) {
    tx2geneFile = additionalPath[[1]]
    my_tx2gene = read.csv(tx2geneFile, sep = "\t", stringsAsFactors = F, header = F)
    salmon_tx2gene_data = tximport(salmon_filepaths, type = "salmon", txOut = FALSE, tx2gene = my_tx2gene)
    ## gene count summarization
    write.csv(as.data.frame(salmon_tx2gene_data$counts), file = "tx2gene_NumReads.csv")
    ## TPM
    write.csv(as.data.frame(salmon_tx2gene_data$abundance), file = "tx2gene_TPM_Abundance.csv")
  }
}
I find it easiest to use the aws.s3 R package for this. In this case you would use the s3read_using() and s3write_using() functions to read from and save to S3. Like this:
library(aws.s3)
my_tx2gene = s3read_using(FUN = read.csv, object = "[path_in_s3_to_file]", sep = "\t", stringsAsFactors = F, header = F)
It is basically a wrapper around whatever function you want to use for file input/output. It works great with read_json, saveRDS, or anything else!
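To replicate the list.files step on S3, you can list the bucket contents and filter the keys (a sketch; the bucket name and prefix are placeholders, and it assumes your AWS credentials are already configured):

library(aws.s3)
# list everything under folder1/ and keep only the quant.sf keys
objects <- get_bucket_df(bucket = "my-bucket", prefix = "folder1/")
quant_keys <- objects$Key[grepl("quant\\.sf$", objects$Key)]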
I'm trying to update an xml file with new nodes using xml2. It's easy if I just write everything manually as text,
library(xml2)
oldXML <- read_xml("<Root><Trial><Number>3.14159 </Number><Adjective>Fast </Adjective></Trial></Root>")
but I'm developing an application that will run calculations and then put those values into the XML, so I need a mix of character strings and variables. It ends up looking like:
var1 <- 4.567
var2 <- "Slow"
newLine <- read_xml(paste0("<Trial><Number>", var1, " </Number><Adjective>", var2, " </Adjective></Trial>"))
xml_add_child(oldXML, newLine)
I suspect there's a much less kludgy way to do this than using paste0, but I can't get anything else to work. I'd like to be able to just instruct it to update the XML by reference to the data frame, so that it can create new trials:
<Trial>
<Number>df$number[1]</Number>
<Adjective>df$adjective[1]</Adjective>
</Trial>
<Trial>
<Number>df$number[2]</Number>
<Adjective>df$adjective[2]</Adjective>
</Trial>
Is there any way to create new Trial nodes in approximately that fashion, or at least more naturally than using paste0 to insert variables? Is this something the XML package does better than xml2?
If you have your new values in a data.frame like this:
vars <- data.frame(Number = c(4.567, 3.211),
                   Adjective = c("Slow", "Slow"),
                   stringsAsFactors = FALSE)
you can convert it to a list of xml_document's as follows:
vars_xml <- lapply(purrr::transpose(vars),
                   function(x) {
                     as_xml_document(list(Trial = lapply(x, as.list)))
                   })
Then you can add the new nodes to the original xml:
for (trial in vars_xml) xml_add_child(oldXML, trial)
I don't know that this is better than your paste approach. Either way, you can wrap it in a function so you only have to write the ugly code once.
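Such a wrapper could look like this (a sketch; the name add_trials is made up):

# add one <Trial> child per row of df to the xml document doc
add_trials <- function(doc, df) {
  trials <- lapply(purrr::transpose(df),
                   function(x) as_xml_document(list(Trial = lapply(x, as.list))))
  for (trial in trials) xml_add_child(doc, trial)
  invisible(doc)
}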
Here's a solution that builds on @Ista's excellent answer. Basically, I've dropped the first lapply in favor of purrr::map (we could probably replace the second lapply with a map as well, but I couldn't find a more readable way to accomplish that).
library(purrr)
vars_xml <- transpose(vars) %>%
  map(~as_xml_document(list(Trial = lapply(.x, as.list))))
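The new nodes still have to be attached to the document; staying in purrr style, something like this should do it (a sketch):

# append each freshly built Trial node to the original document
walk(vars_xml, ~xml_add_child(oldXML, .x))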
I would like to tabulate how often a function is used in one or more R script files. I have found the function NCmisc::list.functions.in.file, and it is very close to what I want:
library(NCmisc)
library(stringr)
cat("median(iris$Sepal.Length)\n median(iris$Sepal.Width)\n library(stringr); str_length(iris$Species) \n", file = "script.R")
list.functions.in.file("script.R")
package:base package:stats package:stringr
"library" "median" "str_length"
Note that median is used twice in the script, but list.functions.in.file does not use this information, and only lists each unique function. Are there any packages out there that can produce such frequencies? And bonus for the ability to analyze a corpus of multiple R scripts, not just a single file.
(note this is NOT about counting function calls, e.g. in recursion, and I want to avoid executing the scripts)
That NCmisc function is just a wrapper around base::parse and utils::getParseData, so you can write your own function (and then you don't need the dependency on NCmisc):
count.function.names <- function(file) {
  # parse without executing; keep.source is needed for getParseData
  data <- getParseData(parse(file = file, keep.source = TRUE))
  # SYMBOL_FUNCTION_CALL tokens are the names being called
  function_names <- data$text[which(data$token == "SYMBOL_FUNCTION_CALL")]
  occurrences <- data.frame(table(function_names))
  # return a named vector: function name -> number of uses
  result <- occurrences$Freq
  names(result) <- occurrences$function_names
  result
}
Should do what you want...
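For the bonus, the same idea extends to a corpus of scripts (a sketch; the folder path is a placeholder):

files <- list.files("path/to/scripts", pattern = "\\.R$", full.names = TRUE)
# collect the call tokens from every file and tabulate them together
all_calls <- unlist(lapply(files, function(f) {
  parsed <- getParseData(parse(file = f, keep.source = TRUE))
  parsed$text[parsed$token == "SYMBOL_FUNCTION_CALL"]
}))
sort(table(all_calls), decreasing = TRUE)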
I've imported some data files with an unequal number of columns and was hoping to create a data frame out of them. I've used lapply to convert them into vectors, and now I'm trying to put these vectors into a data frame.
I'm using rbind.na from the qpcR package to try to fill out the remaining elements of each vector with NA so they all become the same size. For some reason the function isn't being recognized by do.call. Can anyone figure out why this is the case?
library(plyr)
library(qpcR)
files <- list.files(path = "C:/documents", pattern = "*.txt", full.names = TRUE)
readdata <- function(x) {
  con <- file(x, open = "rt")
  mydata <- readLines(con, warn = FALSE, encoding = "UTF-8")
  close(con)
  return(mydata)
}
all.files <- lapply(files, readdata)
combine <- do.call(rbind.na, all.files)
If anyone can think of any potential alternatives, I'm open to that too. I actually tried using a function from here, but my output didn't give me any columns.
Here is the error:
Error in do.call(rbind.na, all.files) : object 'rbind.na' not found
The package has definitely been installed too.
EDIT: changed cbind.na to rbind.na in the error message.
It appears that the function is not exported by the package. Using qpcR:::rbind.na will allow you to access the function.
The triple colon allows you to access the internal variables of a namespace. Be aware though that ?":::" advises against using it in your code, presumably because objects that aren't exported can't be relied upon in future versions of a package. It suggests contacting the package maintainer to export the object if it is stable and useful.
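Applied to the call from the question, that would be:

# call the unexported function via the triple colon
combine <- do.call(qpcR:::rbind.na, all.files)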