Does uproot4 not support tree.pandas.df()function anymore? - root

I used to retrieve pandas dataframe from ROOT file using tree.pandas.df() function in Uproot4(2 years ago). However, I got the below errors when I ran my code recently. Could anyone tell me what the problem is?
f = uproot.open(inputFile)
treeName = "myTreeName"
tree = f[treeName]
myDf = tree.pandas.df('branchName',entrystop=nEvent, flatten = False)
AttributeError: 'Model_TTree_v19' object has no attribute 'pandas'

In Uproot version 3, a special function named TTree.pandas.df created Pandas DataFrames.
In Uproot version 4 (and above), all of the functions that produce arrays have a library argument that specifies which library to use to represent the arrays. library="pd" makes Pandas DataFrames.
This change is described in the Uproot 3 → 4 cheat-sheet, the new argument is described in several places in the Getting Started Guide, as well as in all the reference documentation for array-fetching functions, such as TTree.arrays.

Related

Obtaining Metadata Information using ee_print function from RGEE

I am using the package RGEE (R wrapper for the Google Earth Engine Python API). The function ee_print() seems to work perfectly for an ImageCollection of just one variable, but seems to fail for ImageCollection with different variables where one needs to select the variable of interest. Any ideas on how to approach this issues with the latter kind of data.
Here's an example code:
GRIDMET = ee$ImageCollection("IDAHO_EPSCOR/GRIDMET")
ee_print(GRIDMET)
Where I get the following error message in return:
Error in strsplit(code, ":") : non-character argument
Have you considered the following approach?
GRIDMET = ee$ImageCollection("IDAHO_EPSCOR/GRIDMET")
print(GRIDMET, type = getOption("rgee.print.option"))
And play with the list of all metadata properties
GRIDMET$propertyNames()$getInfo()# Get a list of all metadata properties
(GRIDMET$get("product_tags")$getInfo()) # you can choose to show a characteristic like "product_tags"

ViSEAGO tutorial: visualising topGO object

Earlier, I had posted a question and was able to load in my data successfully and create a topGO object after some help. I'm trying to visualise GO terms that are significantly associated with the list of differentially expressed genes that I have from mouse RNA-seq data.
Now, I'd want to raise a concern about ViSEAGO's tutorial. The tutorial initially specifies loading two files: 'selection.txt' and 'background.txt'. The origin of these files is not clearly stated. However, after a lot of digging into topGO's documentation, I was able to find the datatypes for each of the files. But, even after following these, I have a problem running the following code. Does anyone have any insights to share?
WORKING CODE:
mysampleGOdata <- new("topGOdata",
description = "my Simple session",
ontology = "BP",
allGenes = geneList_new,
nodeSize = 1,
annot = annFUN.org,
mapping="org.Mm.eg.db",
ID = "SYMBOL")
resultFisher <- runTest(mysampleGOdata, algorithm = "classic", statistic = "fisher")
head(GenTable(mysampleGOdata,fisher=resultFisher),20)
myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
PROBLEMS:
> head(myNewBP,2)
GO.ID Term Annotated Significant Expected fisher
1 GO:0006006 glucose metabolic process 194 12 0.19 1.0e-19
2 GO:0019318 hexose metabolic process 223 12 0.22 5.7e-19
> ###################
> # merge results
> myBP_sResults<-ViSEAGO::merge_enrich_terms(
+ Input=list(
+ condition=c("mysampleGOdata","resultFisher")
+ )
+ )
Error in setnames(x, value) :
Can't assign 3 names to a 2 column data.table
> myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
> ###################
> # display the merged table
> ViSEAGO::show_table(myNewBP)
Error in ViSEAGO::show_table(myNewBP) :
object must be enrich_GO_terms, GO_SS, or GO_clusters class objects
According to the tutorial, the printed table contains for each enriched GO terms, additional columns including the list of significant genes and frequency (ratio of the number of significant genes to the number of background genes) evaluated by comparison. I think I have that, but it's definitely not working.
Can someone see why? I'm not very clear on this.
Thanks!
I think you try to circumvent an error you made at the beginning. You receive the error due to the fact that you did not use the wrapper function from the ViSEAGO package. As you stated in your last question, you had initial problems formatting your data.
Here are some tips:
The "selection" file is a character vector with your DEGs names or IDs. I recommend using EntrezID's.
The "Background" file is a character vector with known genes. I recommend using EntrezID's as well. You can easily generate this character vector with:
background=keys(org.Hs.eg.db, keytype ='ENTREZID').
With these two files, you can easily proceed to the next steps of the package as described in the vignette.
# connect to EntrezGene
EntrezGene<-ViSEAGO::EntrezGene2GO()
# load GO annotations from EntrezGene
# with the add of GO annotations from orthologs genes (see above)
#id = "9606" = homo sapiens
myGENE2GO<-ViSEAGO::annotate(id="9606", EntrezGene)
BP<-ViSEAGO::create_topGOdata(
geneSel = selection, #your DEG vector
allGenes = background, #your created background vector
gene2GO=myGENE2GO,
ont="BP",
nodeSize=5
)
classic<-topGO::runTest(
BP,
algorithm ="classic",
statistic = "fisher"
)
# merge results
BP_sResults<-ViSEAGO::merge_enrich_terms(
Input=list(
condition=c("BP","classic")
)
)
You should get a merged list of your enriched GO terms with the corresponding statistical tests you prefer.
I have faced this problem recently, it was very frustrating. In my case the whole issue seemed to be related to the package version I was using.
I used conda to install ViSEAGO. Nevertheless, R's version in my conda environment was a bit old (i.e. 3.6.1 to be specific). Therefore, when installing ViSEAGO with conda, the version 1.0.0 of the package was installed. Please note that the most recent version of ViSEAGO is 1.4.0.
Therefore, I created a conda environment with R version 4.0.3, and repeated the procedure to install ViSEAGO by using conda. When doing this, ViSEAGO's 1.4.0 version was installed, and everything went fine.
I've tried to backtrack the error, and only find one thing: in the older ViSEAGO version, the function Custom2GO loaded tables with 4 columns; in the most recent version it admits 5 columns (the new one being 'gene_symbol'). I think this disagreement might be part of the issue, as the source code of the function merge_enrich_terms seems to deal with the columns 'gene_id' and 'gene_symbol' at some point, but I'm not sure.
Hope you find my comment helpful!
Cheers,
Mauricio

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because of the use of a data set that is included in the package in a function which is not exported (i.e. the user cannot use it unless they use the ::: operator. A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
set.seed(seed)
np = 2 * nc[2]
freqs = USCaucs$freqs
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However I get this note from https://win-builder.r-project.org which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables::
USCaucs
I find this especially frustrating, since, as I said, this function is not even exported (it also works without complaint because the package loads this dataset). All help appreciated
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.

How does the PACKAGE argument to .Call work?

.Call seems rather poorly documented; ?.Call gives an explanation of the PACKAGE argument:
PACKAGE: if supplied, confine the search for a character string .NAME to the DLL given by this argument (plus the conventional extension, ‘.so’, ‘.dll’, ...).
This argument follows ... and so its name cannot be abbreviated.
This is intended to add safety for packages, which can ensure by using this argument that no other package can override their external symbols, and also speeds up the search (see ‘Note’).
And in the Note:
If one of these functions is to be used frequently, do specify PACKAGE (to confine the search to a single DLL) or pass .NAME as one of the native symbol objects. Searching for symbols can take a long time, especially when many namespaces are loaded.
You may see PACKAGE = "base" for symbols linked into R. Do not use this in your own code: such symbols are not part of the API and may be changed without warning.
PACKAGE = "" used to be accepted (but was undocumented): it is now an error.
But there are no usage examples.
It's unclear how the PACKAGE argument works. For example, in answering this question, I thought the following should have worked, but it doesn't:
.Call(C_BinCount, x, breaks, TRUE, TRUE, PACKAGE = "graphics")
Instead this works:
.Call(graphics:::C_BinCount, x, breaks, TRUE, TRUE)
Is this simply because C_BinCount is unexported? I.e., if the internal code of hist.default had added PACKAGE = "graphics", this would have worked?
This seems simple but is really rare to find usage of this argument; none of the sources I found give more than passing mention (1, 2, 3, 4, 5)... Examples of this actually working would be appreciated (even if it's just citing code found in an existing package)
(for self-containment purposes, if you don't want to copy-paste code from the other question, here are x and breaks):
x = runif(100000000, 2.5, 2.6)
nB <- 99
delt <- 3/nB
fuzz <- 1e-7 * c(-delt, rep.int(delt, nB))
breaks <- seq(0, 3, by = delt) + fuzz
C_BinCount is an object of class "NativeSymbolInfo", rather than a character string naming a C-level function, hence PACKAGE (which "confine(s) the search for a character string .NAME") is not relevant. C_BinCount is made a symbol by its mention in useDynLib() in the graphics package NAMESPACE.
As an R symbol, C_BinCount's resolution is subject to the same rules as other symbols -- it's not exported from the NAMESPACE, so only accessible via graphics:::C_BinCount. And also, for that reason, off-limits for robust code development. Since the C entry point is imported as a symbol, it is not available as a character string, so .Call("C_BinCount", ...) will not work.
Using a NativeSymbolInfo object tells R where the C code is located, so there is no need to do so again via PACKAGE; the choice to use the symbol rather than character string is made by the package developer, and I think would generally be considered good practice. Many packages developed before the invention of NativeSymbolInfo use the PACKAGE argument, if I grep the Bioconductor source tree there are 4379 lines with .Call.*PACKAGE, e.g., here.
Additional information, including examples, is in Writing R Extensions section 1.5.4.

Kindly check the R command

I am doing following in Cooccur library in R.
> fb<-read.table("Fb6_peaks.bed")
> f1<-read.table("F16_peaks.bed")
everything is ok with the first two commands and I can also display the data:
> fb
> f1
But when I give the next command as given below
> explore_pairs(c("fb", "f1"))
I get an error message:
Error in sum(sapply(tf1_s, score_sample, tf2_hits = tf2_s, hit_list = hit_l)) :
invalid 'type' (list) of argument
Could anyone suggest something?
Despite promising to release a version to the Bioconductor depository in the article the authors published over a year ago, they have still not delivered. The gz file that is attached to the article is not of a form that my installation recognizes. Your really should be corresponding with the authors for this question.
The nature of the error message suggests that the function is expecting a different data class. You should be looking at the specification for the arguments in the help(explore_pairs) file. If it is expecting 2 matrices, then wrapping data.matrix around the arguments may solve the problem, but if it is expecting a class created by one of that packages functions then you need to take the necessary step to construct the right objects.
The help file for explore_pairs does exist (at least in the MAN directory) and says the first argument should be a character vector with further provisos:
\arguments{
\item{factornames}{an vector of character strings, each naming a GFF-like
data frame containing the binding profile of a DNA-binding factor.
There is also a load utility, load_GFF, which I assume is designed for creation of such files.
Try rename your data frame:
names(fb)=c("seq","start","end")
Check the example datasets. The column names are as above. I set the names and it worked.

Resources