What the function in PyGeoprocessing library that allow to reclassify raster values? - raster

I am trying to reclassify raster values using QGIS Reclassify by Table but it's falling. I came across PyGeoprocessing Library and apparently it is much faster than R reclassify function. However, I can't find documentation of this library nor tutorials anywhere.
Thanks for the help

PyGeoProcessing's reclassification function is called reclassify_raster, and can be called like so:
import pygeoprocessing
from osgeo import gdal
raster_path = 'the/path/to/your/input/raster.tif'
# You'll need to load your source and destination pixel values into a dictionary
# like this. Oftentimes, this will be loaded in from a table.
reclassification_map = {
1: 0.35
2: 0.6
}
pygeoprocessing.reclassify_raster(
(raster_path, 1), # Use band 1
reclassification_map,
'reclassified.tif',
gdal.GDT_Float32, # You could use any gdal.GDT_* datatype, this example just uses Float32
target_nodata=-1 # You could set the nodata value to whatever you wanted it to be
)
It's possible that the maintainers might release some API documentation, but until they do that, you can always just take a look at the function docstrings. `reclassify_raster`, for example, is found in this file: https://github.com/natcap/pygeoprocessing/blob/main/src/pygeoprocessing/geoprocessing.py

Related

Defining a Torch Class in R package "torch"

this post is related to my earlier How to define a Python Class which uses R code, but called from rTorch? .
I came across the torch package in R (https://torch.mlverse.org/docs/index.html) which allows to Define a DataSet class definition. Yet, I also need to be able to define a model class like class MyModelClass(torch.nn.Module) in Python. Is this possible in the torch package in R?
When I tried to do it with reticulate it did not work - there were conflicts like
ImportError: /User/homes/mreichstein/miniconda3/envs/r-torch/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZTINSt6thread6_StateE
It also would not make much sense, since torch isn't wrapping Python.
But it is loosing at lot of flexibility, which rTorch has (but see my problem in the upper post).
Thanks for any help!
Markus
You can do that directly using R's torch package which seems quite comprehensive at least for the basic tasks.
Neural networks
Here is an example of how to create nn.Sequential like this:
library(torch)
model <- nn_sequential(
nn_linear(D_in, H),
nn_relu(),
nn_linear(H, D_out)
)
Below is a custom nn_module (a.k.a. torch.nn.Module) which is a simple dense (torch.nn.Linear) layer (source):
library(torch)
# creates example tensors. x requires_grad = TRUE tells that
# we are going to take derivatives over it.
dense <- nn_module(
clasname = "dense",
# the initialize function tuns whenever we instantiate the model
initialize = function(in_features, out_features) {
# just for you to see when this function is called
cat("Calling initialize!")
# we use nn_parameter to indicate that those tensors are special
# and should be treated as parameters by `nn_module`.
self$w <- nn_parameter(torch_randn(in_features, out_features))
self$b <- nn_parameter(torch_zeros(out_features))
},
# this function is called whenever we call our model on input.
forward = function(x) {
cat("Calling forward!")
torch_mm(x, self$w) + self$b
}
)
model <- dense(3, 1)
Another example, using torch.nn.Linear layers to create a neural network this time (source):
two_layer_net <- nn_module(
"two_layer_net",
initialize = function(D_in, H, D_out) {
self$linear1 <- nn_linear(D_in, H)
self$linear2 <- nn_linear(H, D_out)
},
forward = function(x) {
x %>%
self$linear1() %>%
nnf_relu() %>%
self$linear2()
}
)
Also there are other resources like here (using flow control and weight sharing).
Other
Looking at the reference it seems most of the layers are already provided (didn't notice transformer layers at a quick glance, but this is minor).
As far as I can tell basic blocks for neural networks, their training etc. are in-place (even JIT so sharing between languages should be possible).

ViSEAGO tutorial: visualising topGO object

Earlier, I had posted a question and was able to load in my data successfully and create a topGO object after some help. I'm trying to visualise GO terms that are significantly associated with the list of differentially expressed genes that I have from mouse RNA-seq data.
Now, I'd want to raise a concern about ViSEAGO's tutorial. The tutorial initially specifies loading two files: 'selection.txt' and 'background.txt'. The origin of these files is not clearly stated. However, after a lot of digging into topGO's documentation, I was able to find the datatypes for each of the files. But, even after following these, I have a problem running the following code. Does anyone have any insights to share?
WORKING CODE:
mysampleGOdata <- new("topGOdata",
description = "my Simple session",
ontology = "BP",
allGenes = geneList_new,
nodeSize = 1,
annot = annFUN.org,
mapping="org.Mm.eg.db",
ID = "SYMBOL")
resultFisher <- runTest(mysampleGOdata, algorithm = "classic", statistic = "fisher")
head(GenTable(mysampleGOdata,fisher=resultFisher),20)
myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
PROBLEMS:
> head(myNewBP,2)
GO.ID Term Annotated Significant Expected fisher
1 GO:0006006 glucose metabolic process 194 12 0.19 1.0e-19
2 GO:0019318 hexose metabolic process 223 12 0.22 5.7e-19
> ###################
> # merge results
> myBP_sResults<-ViSEAGO::merge_enrich_terms(
+ Input=list(
+ condition=c("mysampleGOdata","resultFisher")
+ )
+ )
Error in setnames(x, value) :
Can't assign 3 names to a 2 column data.table
> myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
> ###################
> # display the merged table
> ViSEAGO::show_table(myNewBP)
Error in ViSEAGO::show_table(myNewBP) :
object must be enrich_GO_terms, GO_SS, or GO_clusters class objects
According to the tutorial, the printed table contains for each enriched GO terms, additional columns including the list of significant genes and frequency (ratio of the number of significant genes to the number of background genes) evaluated by comparison. I think I have that, but it's definitely not working.
Can someone see why? I'm not very clear on this.
Thanks!
I think you try to circumvent an error you made at the beginning. You receive the error due to the fact that you did not use the wrapper function from the ViSEAGO package. As you stated in your last question, you had initial problems formatting your data.
Here are some tips:
The "selection" file is a character vector with your DEGs names or IDs. I recommend using EntrezID's.
The "Background" file is a character vector with known genes. I recommend using EntrezID's as well. You can easily generate this character vector with:
background=keys(org.Hs.eg.db, keytype ='ENTREZID').
With these two files, you can easily proceed to the next steps of the package as described in the vignette.
# connect to EntrezGene
EntrezGene<-ViSEAGO::EntrezGene2GO()
# load GO annotations from EntrezGene
# with the add of GO annotations from orthologs genes (see above)
#id = "9606" = homo sapiens
myGENE2GO<-ViSEAGO::annotate(id="9606", EntrezGene)
BP<-ViSEAGO::create_topGOdata(
geneSel = selection, #your DEG vector
allGenes = background, #your created background vector
gene2GO=myGENE2GO,
ont="BP",
nodeSize=5
)
classic<-topGO::runTest(
BP,
algorithm ="classic",
statistic = "fisher"
)
# merge results
BP_sResults<-ViSEAGO::merge_enrich_terms(
Input=list(
condition=c("BP","classic")
)
)
You should get a merged list of your enriched GO terms with the corresponding statistical tests you prefer.
I have faced this problem recently, it was very frustrating. In my case the whole issue seemed to be related to the package version I was using.
I used conda to install ViSEAGO. Nevertheless, R's version in my conda environment was a bit old (i.e. 3.6.1 to be specific). Therefore, when installing ViSEAGO with conda, the version 1.0.0 of the package was installed. Please note that the most recent version of ViSEAGO is 1.4.0.
Therefore, I created a conda environment with R version 4.0.3, and repeated the procedure to install ViSEAGO by using conda. When doing this, ViSEAGO's 1.4.0 version was installed, and everything went fine.
I've tried to backtrack the error, and only find one thing: in the older ViSEAGO version, the function Custom2GO loaded tables with 4 columns; in the most recent version it admits 5 columns (the new one being 'gene_symbol'). I think this disagreement might be part of the issue, as the source code of the function merge_enrich_terms seems to deal with the columns 'gene_id' and 'gene_symbol' at some point, but I'm not sure.
Hope you find my comment helpful!
Cheers,
Mauricio

Walsh-Hadamard Transform in r

I search for a command to compute Walsh-Hadamard Transform of an image in R, but I don't find anything. In MATLAB fwht use for this. this command implement Walsh-Hadamard Tranform to each row of matrix. Can anyone introduce a similar way to compute Walsh-Hadamard on rows or columns of Matrix in R?
I find a package here:
http://www2.uaem.mx/r-mirror/web/packages/boolfun/boolfun.pdf
But why this package is not available when I want to install it?
Packages that are not maintained get put in the Archive. They get put there when that aren't updated to match changing requirements or start making errors with changing R code base. https://cran.r-project.org/web/packages/boolfun/index.html
It's possible that you might be able to extract useful code from the archive version, despite the relatively ancient version of R that package was written under.
The R code for walshTransform calls an object code routine:
walshTransform <- function ( truthTable ) # /!\ should check truthTable values are in {0,1}
{
len <- log(length(truthTable),base=2)
if( len != round(len) )
stop("bad truth table length")
res <- .Call( "walshTransform",
as.integer(truthTable),
as.integer(len))
res
}
Installing the package succeeded on my Mac, but would require the appropriate toolchain on whatever OS you are working in.

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because of the use of a data set that is included in the package in a function which is not exported (i.e. the user cannot use it unless they use the ::: operator. A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
set.seed(seed)
np = 2 * nc[2]
freqs = USCaucs$freqs
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However I get this note from https://win-builder.r-project.org which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables::
USCaucs
I find this especially frustrating, since, as I said, this function is not even exported (it also works without complaint because the package loads this dataset). All help appreciated
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.

Reading circular read mapping (BAM) into R with readGAlignments()

I am trying to create a circular genome map in R using the ggbio package. I am new to ggbio and related packages like GenomicAlignments and GenomicRanges.
I exported my read mapping as a BAM file (with associated index file) and tried to use readGAlignmentsFromBam() to read in the file.
myreads <- readGAlignmentsFromBam("final.assembly", index = "final.assembly", use.names = TRUE)
But I always get
Warning message:
In GenomicRanges:::valid.GenomicRanges.seqinfo(x) :
GAlignments object contains 12006 out-of-bound ranges located on sequence
Consensus. Note that only ranges located on a non-circular sequence whose
length is not NA can be considered out-of-bound (use seqlengths() and
isCircular() to get the lengths and circularity flags of the underlying
sequences).
Which makes sense - it's a circular chromosome, so some reads will be outside of the "linear" reference sequence. The question is, how do I fix it? I've attempted adding isCircular = c(TRUE) as an argument, but that did not help. It would seem there is a flag somewhere (in the BAM file? in the R code?) that should be set which isn't, but I can't figure out where.
Apologies for not having a reproducible example, but this is a huge BAM file and I am not familiar enough with the file type to mock up the data.

Resources