Earlier, I posted a question and, after some help, was able to load my data successfully and create a topGO object. I'm trying to visualise GO terms that are significantly associated with the list of differentially expressed genes I have from mouse RNA-seq data.
Now I'd like to raise a concern about ViSEAGO's tutorial. The tutorial starts by loading two files, 'selection.txt' and 'background.txt', whose origin is not clearly stated. After a lot of digging into topGO's documentation, I was able to work out the data type of each file. But even after following these, I have a problem running the code below. Does anyone have any insights to share?
WORKING CODE:
mysampleGOdata <- new("topGOdata",
                      description = "my Simple session",
                      ontology = "BP",
                      allGenes = geneList_new,
                      nodeSize = 1,
                      annot = annFUN.org,
                      mapping = "org.Mm.eg.db",
                      ID = "SYMBOL")
resultFisher <- runTest(mysampleGOdata, algorithm = "classic", statistic = "fisher")
head(GenTable(mysampleGOdata, fisher = resultFisher), 20)
myNewBP <- GenTable(mysampleGOdata, fisher = resultFisher)
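For reference, geneList_new (used above as allGenes) is the standard topGO input: a factor named by gene symbols, with the two levels separating background (0) from significant (1) genes. A minimal sketch of how such an object can be built, assuming all_genes and deg are character vectors of background and DE gene symbols (placeholder names, not from the code above):
# named factor marking which background genes are differentially expressed
geneList_new <- factor(as.integer(all_genes %in% deg), levels = c(0, 1))
names(geneList_new) <- all_genes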
PROBLEMS:
> head(myNewBP,2)
GO.ID Term Annotated Significant Expected fisher
1 GO:0006006 glucose metabolic process 194 12 0.19 1.0e-19
2 GO:0019318 hexose metabolic process 223 12 0.22 5.7e-19
> ###################
> # merge results
> myBP_sResults<-ViSEAGO::merge_enrich_terms(
+ Input=list(
+ condition=c("mysampleGOdata","resultFisher")
+ )
+ )
Error in setnames(x, value) :
Can't assign 3 names to a 2 column data.table
> myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
> ###################
> # display the merged table
> ViSEAGO::show_table(myNewBP)
Error in ViSEAGO::show_table(myNewBP) :
object must be enrich_GO_terms, GO_SS, or GO_clusters class objects
According to the tutorial, the printed table contains, for each enriched GO term, additional columns including the list of significant genes and the frequency (the ratio of the number of significant genes to the number of background genes) evaluated by comparison. I think I have that, but it's definitely not working.
Can someone see why? I'm not very clear on this.
Thanks!
I think you are trying to work around an error you made at the beginning. You get the error because you did not use the wrapper function from the ViSEAGO package. As you stated in your last question, you had initial problems formatting your data.
Here are some tips:
The "selection" file is a character vector with your DEGs names or IDs. I recommend using EntrezID's.
The "Background" file is a character vector with known genes. I recommend using EntrezID's as well. You can easily generate this character vector with:
background=keys(org.Hs.eg.db, keytype ='ENTREZID').
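For the "selection" side, a minimal sketch, assuming your differential-expression results sit in a data frame deg_table with an ENTREZID column (both names are placeholders):
# selection: character vector of your significant gene IDs
selection <- as.character(deg_table$ENTREZID)
Since your data come from mouse, substitute org.Mm.eg.db for org.Hs.eg.db here and id = "10090" (Mus musculus) for "9606" in the annotation step below.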
With these two files, you can easily proceed to the next steps of the package as described in the vignette.
# connect to EntrezGene
EntrezGene <- ViSEAGO::EntrezGene2GO()
# load GO annotations from EntrezGene,
# with the addition of GO annotations from ortholog genes (see above)
# id = "9606" = Homo sapiens
myGENE2GO <- ViSEAGO::annotate(id = "9606", EntrezGene)
BP <- ViSEAGO::create_topGOdata(
    geneSel = selection,    # your DEG vector
    allGenes = background,  # your created background vector
    gene2GO = myGENE2GO,
    ont = "BP",
    nodeSize = 5
)
classic<-topGO::runTest(
BP,
algorithm ="classic",
statistic = "fisher"
)
# merge results
BP_sResults<-ViSEAGO::merge_enrich_terms(
Input=list(
condition=c("BP","classic")
)
)
You should get a merged list of your enriched GO terms with the corresponding statistical tests you prefer.
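Once merge_enrich_terms succeeds, the result is an enrich_GO_terms object, which is exactly what show_table expects; your original call failed because a plain data frame from GenTable is not one of the supported classes:
# display the merged table (works on the enrich_GO_terms object,
# not on a GenTable data frame)
ViSEAGO::show_table(BP_sResults)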
I faced this problem recently, and it was very frustrating. In my case, the whole issue seemed to be related to the package version I was using.
I used conda to install ViSEAGO. However, the R version in my conda environment was a bit old (3.6.1, to be specific), so conda installed version 1.0.0 of the package. Please note that the most recent version of ViSEAGO is 1.4.0.
I therefore created a conda environment with R version 4.0.3 and repeated the conda installation of ViSEAGO. This time version 1.4.0 was installed, and everything went fine.
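To confirm which versions you ended up with in a given environment:
# check the R and ViSEAGO versions in the active environment
R.version.string
packageVersion("ViSEAGO")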
I tried to backtrack the error and found only one thing: in the older ViSEAGO version, the function Custom2GO loaded tables with 4 columns; the most recent version accepts 5 columns (the new one being 'gene_symbol'). I think this disagreement might be part of the issue, as the source code of merge_enrich_terms seems to deal with the columns 'gene_id' and 'gene_symbol' at some point, but I'm not sure.
Hope you find my comment helpful!
Cheers,
Mauricio
Related
I use the maxnet function (maxnet package) as one of the model algorithms in an ensemble model. Sometimes the code executes without an error; other times it gives me the error message you see below. I am working on Windows 10 Pro (R version 3.6.1, RStudio version 1.2.5042).
Code:
dm.Maxent <- maxnet(p = train$species, data = train[-train$species],
maxnet.formula(p = train$species,
data = train[-train$species],
classes = "default"))
Error:
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
index larger than maximal 185
train is a data frame with 621 rows (one row for every occurrence/absence point) and 29 columns (28 variable columns and 1 column "species" that indicates presence or absence of the species as 0/1).
I am having the same issue. It is unpredictable: it ran fine for several species, then all of a sudden it stopped.
I found a response at this link: https://github.com/jamiemkass/ENMeval/issues/62
In the new version of maxnet (check the GitHub repo, as it looks like the CRAN version has not been updated yet), there is a new argument "addsamplestobackground". When set to TRUE, it solves some of these errors. Currently, you will have to use install_github to reinstall maxnet to get this argument. Once you do, use install_github to get the dev-branch version of ENMeval (v2), which will implement this by default. Hopefully that fixes these problems.
I reinstalled maxnet from GitHub:
install.packages("remotes")
remotes::install_github("mrmaxent/maxnet")
and set addsamplestobackground = TRUE. Maybe this will help you.
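For reference, a sketch of the full call with the new argument, reusing the train data frame from the question; note that it selects the covariate columns by name, since train[-train$species] indexes columns by the 0/1 values in species rather than dropping the species column itself:
library(maxnet)
pres <- train$species
covs <- train[, setdiff(names(train), "species")]
dm.Maxent <- maxnet(p = pres, data = covs,
                    f = maxnet.formula(p = pres, data = covs, classes = "default"),
                    addsamplestobackground = TRUE)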
I have a data frame as follows. I'm putting a single row here because it is enough for the example.
df <- structure(list(issue_url = "https://api.github.com/repos/ropensci/software-review/issues/357",
user = "kauedesousa", body = "Dear #cvitolo thank you so much for your comments and suggestions. It helped a lot! We have worked in incorporating the comments to *chirps* v0.0.6 https://github.com/agrobioinfoservices/chirps. Here we have included a point-to-point response to your comments:\r\n\r\n# general comments\r\n\r\n>README, it seems the code in your README file is only visualised but not executed. In this case, you could keep the README.md and remove README.Rmd (as this is basically redundant). \r\n\r\nThe file README.Rmd was removed\r\n\r\n> Your R folder contains a file called sysdata.rda. Is there a reason to keep these data there? Usually data should be placed under the root folder in a subfolder called data. I would suggest to move sysdata under the chirps/data folder, document the datasets (documentation is currently missing for both tapajos and tapajos_geom) and load the data using data(\"sysdata\") or chirps::tapajos. If you make this change, please change the call to chirps:::tapajo in get_esi() accordingly. \r\n\r\nThe sf polygon is exported as 'tapajos', the sf POINT object is not necessary and can be generated in the examples with sf. Also, functions dataframe_to_geojson and sf_to_geojson are exported since they had the same issue using ::: in the examples and could be useful for the users\r\n\r\n> the man folder contains a figure folder. Is this good practice? I thought the man folder should be used only for documentation. Would it make sense to move the figure folder under inst, for instance? \r\n\r\nThe /man structure is part of using the pre-built vignettes. We not aware of any issues with it. CRAN hasn't baulked at it and Adam (one of the co-authors) just submitted two package updates in the last month that use this method\r\n\r\n#\ttests folder:\r\n\r\n> each of your test files contain a call to library(chirps). This is superflous because you load the package in testthat.R and that should suffice. My suggestion is to load all the packages needed by the tests in testthat.R and remove the commands library(package_name) in the individual test files.\r\n\r\nDONE\r\n\r\n> When you call library() sometimes you use library(package_name), other times library(\"package_name\"). I would suggest to consistently use the latter. \r\n\r\nDONE\r\n\r\n> test-get_chirps.R: you only test that you get the right object class but do not test the returned values, is there a reason for that? In my experience, it is very important to test the correctness of the actual data and I would suggest you to develop tests for that. \r\n\r\nWe have updated the tests so it checks if the functions return the correct values. For this we downloaded the data from ClimateSERV and compared it with the ones retrieved by get_chirps, get_esi and precip_indices to validate it. The tests still have a skip_on_cran option as a CRAN policy. But we have opened an issue in the package repo and will keep it there until we figure out how to make 'vcr' works with 'chirps' https://github.com/agrobioinfoservices/chirps/issues/7\r\n\r\n>test-get_esi.R: as above, you only test that you get the right object class but do not test the returned values. I would suggest to also test the data values. \r\n\r\nSame as above\r\n\r\n> test-precip_indicesl.R: the file name seems to contain an extra \"l\", should that not be test-precip_indices.R? you only test the dimensions of the output data frame but not the values themselves. As, above, I would suggest to also test the data values. 
\r\n\r\nDONE\r\n\r\n# vignettes:\r\n>The file Overview.Rmd.orig seems redundant and can be removed. \r\n\r\nWe use this file to speed up the vignette creation, here Jeroen Ooms shows how it works https://ropensci.org/technotes/2019/12/08/precompute-vignettes/\r\n\r\n>It's not necessary to run twice the command precip_indices(dat, timeseries = TRUE, intervals = 15), the second one can be removed.\r\n\r\nDONE\r\n\r\n>When running the command get_chirps, I get the following warning message: In st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle = endCapStyle,: st_buffer does not correctly buffer longitude/latitude data. Can this warning be eliminated? Maybe adding a note in the documentation? \r\n\r\nThese warning messages comes from 'sf', but we don't know if is a good practice to suppress that. We added a note to the documentation\r\n\r\n>When running the command get_chirps, I get the following warning message: Warning messages: 1: In st_centroid.sfc(x$geometry) : st_centroid does not give correct centroids for longitude/latitude data. 2: In st_centroid.sfc(x$geometry) : st_centroid does not give correct centroids for longitude/latitude data. Can this warning be eliminated? Maybe adding a note in the documentation? \r\n\r\nSame as above\r\n\r\n>more in-depth discussion of the functionalities included in the package will make it easier for the reader to understand if the chirps dataset is suitable for a given purpose. I would also mention that requests may take a long time to be executed. Is it feasible to use these functions to download large amount of data (for instance to perform a global scale analysis)? In general, a mention of the limitations of this package would be valuable. \r\n\r\nWe added a section for the package limitations. And a better explanation about CHIRPS application into the paper. Also, W. Ashmall says here https://github.com/agrobioinfoservices/chirps/issues/12 that they are planning to upgrade the API service which will make it better to request queries to ClimateSERV.\r\n\r\n# LICENSE = GPL-3\r\n>when you use a widely known license you should not need to add a copy of the license to your repo. The files LICENSE and LICENSE.md are redundant and can be removed. \r\n\r\nDONE\r\n\r\n>When I got my packages reviewed I was made aware that GPL-3 is a strongly protective license and, if you want your package to be used widely (also commercially), MIT or Apache licenses are more suitable. I just wanted to pass on this very valuable suggestion I received. \r\n\r\nThank you, we changed to MIT as suggested\r\n\r\n# inst/paper folder:\r\n\r\n>it seems the code in your paper is only visualised but not executed. In this case, you could keep the paper.md and remove paper.Rmd. Also paper.pdf could be removed. \r\n\r\nDONE\r\n\r\n>Fig1.svg is redundant (Fig1.png is used for rendering the paper). \r\n\r\nDONE\r\n\r\n>in the paper, I would move the introduction to the CHIRPS data at the beginning as readers might not be familiar with these data. \r\n\r\nDONE\r\n\r\n>in the paper you use the command chirps:::tapajos to load data in your sysdata.rda. This is not good practice. The ::: operator should not be used as it exposes non-exported functionalities. If you move sysdata under the chrips/data folder (as suggested above), the dataset can be loaded using data(\"sysdata\") or chirps::tapajos. 
\r\n\r\nDONE\r\n\r\n>Towards the end of your paper you state: Overall, these indices proved to be an excellent proxy to evaluate the climate variability using precipitation data [#DeSousa2018], the effects of climate change [#Aguilar2005], crop modelling [#Kehel2016] and to define strategies for climate adaptation [#vanEtten2019]. Maybe you could expand a bit, perhaps on the link with crop modelling? \r\n\r\nWe updated this section with more examples, and hopefully a better explanation on CHIRPS applications and how *chirps* can help\r\n\r\n# goodpractice::goodpractice():\r\n\r\n>write unit tests for all functions, and all package code in general. 34% of code lines are covered by test cases. This differs from what is stated on GitHub (codecv badge = ~73% code coverage). The reason might be due to the fact you skip most of your tests on cran, is this because tests take too long to run? If so, is there a way you could modify the tests so that they take less time?\r\n\r\nSame as above in the tests section\r\n\r\n>fix this R CMD check WARNING: Missing link or links in documentation object 'precip_indices.Rd': ‘[tidyr]{pivot_wider}’ See section 'Cross-references' in the 'Writing R Extensions' manual. Maybe you could substitute \\code{\\link[tidyr]{pivot_wider}} with \\code{tidyr::pivot_wider()}. \r\n\r\nThe code was removed from seealso in the documentation.\r\n\r\nAgain, thank you for your time reviewing this package. We hope you like its new version.\r\n\r\n\r\n\r\n\r\n" ,
id = 582430287L, number = 357), row.names = 48L, class = "data.frame")
Issue: Split each row into multiple rows, according to condition. My conditions:
By paragraph (\r\n\r\n)
If a paragraph starts with ">", keep it together with the following one in the same row.
I did the following to achieve (1), but the problem is that the result does not fulfil condition (2):
# This is what I did for condition (1)
split <- tidyr::separate_rows(df, "body", sep = "\r\n\r\n")
For example, for condition (2) I will want the following two paragraphs to remain together in the same row:
>When running the command get_chirps, I get the following warning message: Warning messages: 1: In st_centroid.sfc(x$geometry) : st_centroid does not give correct centroids for longitude/latitude data. 2: In st_centroid.sfc(x$geometry) : st_centroid does not give correct centroids for longitude/latitude data. Can this warning be eliminated? Maybe adding a note in the documentation?
Same as above
Question: How can I split each row, with both conditions?
You can separate the rows based on "\r\n\r\n", assign an id so that any row whose preceding row begins with ">" stays in the same group, then collapse by id:
library(tidyr)
library(dplyr)
library(stringr)
separate_rows(df, "body", sep = "\r\n\r\n") %>%
mutate(id = cumsum(str_detect(lag(body, default = ""), "^>", negate = TRUE))) %>%
group_by_at(vars(-body)) %>%
summarise(body = str_flatten(body, "\n"))
# A tibble: 30 x 5
# Groups: issue_url, user, id [30]
issue_url user id number body
<chr> <chr> <int> <dbl> <chr>
1 https://api.github.com/repos/ropensci/softw~ kauedeso~ 1 357 "Dear #cvitolo thank you so much for your comments and suggestions. It helped a lot! We have worked in incorporating th~
2 https://api.github.com/repos/ropensci/softw~ kauedeso~ 2 357 "# general comments"
3 https://api.github.com/repos/ropensci/softw~ kauedeso~ 3 357 ">README, it seems the code in your README file is only visualised but not executed. In this case, you could keep the R~
4 https://api.github.com/repos/ropensci/softw~ kauedeso~ 4 357 "> Your R folder contains a file called sysdata.rda. Is there a reason to keep these data there? Usually data should be~
5 https://api.github.com/repos/ropensci/softw~ kauedeso~ 5 357 "> the man folder contains a figure folder. Is this good practice? I thought the man folder should be used only for doc~
6 https://api.github.com/repos/ropensci/softw~ kauedeso~ 6 357 "#\ttests folder:"
7 https://api.github.com/repos/ropensci/softw~ kauedeso~ 7 357 "> each of your test files contain a call to library(chirps). This is superflous because you load the package in testth~
8 https://api.github.com/repos/ropensci/softw~ kauedeso~ 8 357 "> When you call library() sometimes you use library(package_name), other times library(\"package_name\"). I would sugg~
9 https://api.github.com/repos/ropensci/softw~ kauedeso~ 9 357 "> test-get_chirps.R: you only test that you get the right object class but do not test the returned values, is there a~
10 https://api.github.com/repos/ropensci/softw~ kauedeso~ 10 357 ">test-get_esi.R: as above, you only test that you get the right object class but do not test the returned values. I wo~
# ... with 20 more rows
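To see why the id assignment works, here is the lag()/cumsum() trick on a toy vector (contents hypothetical). A new group starts wherever the previous element does not begin with ">", so a ">" paragraph and the paragraph after it share an id and get flattened together:
library(dplyr)
library(stringr)
x <- c("intro", "> a question", "an answer", "another plain paragraph")
cumsum(str_detect(lag(x, default = ""), "^>", negate = TRUE))
#> [1] 1 2 2 3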
I'm searching for a command to compute the Walsh-Hadamard transform of an image in R, but I can't find anything. In MATLAB, fwht is used for this; it applies the Walsh-Hadamard transform to each row of a matrix. Can anyone suggest a similar way to compute the Walsh-Hadamard transform on the rows or columns of a matrix in R?
I found a package here:
http://www2.uaem.mx/r-mirror/web/packages/boolfun/boolfun.pdf
But why is this package not available when I try to install it?
Packages that are not maintained get put in the Archive. They end up there when they aren't updated to match changing requirements or start producing errors against the changing R code base. https://cran.r-project.org/web/packages/boolfun/index.html
It's possible that you might be able to extract useful code from the archived version, despite the relatively ancient version of R the package was written under.
The R code for walshTransform calls an object code routine:
walshTransform <- function(truthTable)  # /!\ should check truthTable values are in {0,1}
{
    len <- log(length(truthTable), base = 2)
    if (len != round(len))
        stop("bad truth table length")
    res <- .Call("walshTransform",
                 as.integer(truthTable),
                 as.integer(len))
    res
}
Installing the package succeeded on my Mac, but it would require the appropriate toolchain on whatever OS you are working on.
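If you only need the transform itself and not the rest of boolfun, a plain-R implementation is short. Below is a minimal sketch of the iterative fast Walsh-Hadamard transform in natural (Hadamard) ordering; note that MATLAB's fwht additionally scales by 1/n and defaults to sequency ordering:
# in-place butterfly FWHT; input length must be a power of two
fwht <- function(x) {
  n <- length(x)
  stopifnot(bitwAnd(n, n - 1L) == 0L)
  h <- 1L
  while (h < n) {
    for (i in seq(1L, n, by = 2L * h)) {
      idx <- i:(i + h - 1L)
      a <- x[idx]
      b <- x[idx + h]
      x[idx] <- a + b
      x[idx + h] <- a - b
    }
    h <- 2L * h
  }
  x
}
# apply it to each row of a matrix, as MATLAB's fwht does
m <- matrix(rnorm(8 * 4), nrow = 4)
res <- t(apply(m, 1, fwht))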
I am using the MatchIt package on the LaLonde data set, and the discard argument is generating two types of errors (the code works if I do not use the discard argument). In both cases, it is not clear how to resolve the problems.
The first issue is when I try discard = "hull.control"
m.opt1 <- matchit(treat ~ inc.re74 + inc.re75 + education + nonwhite +
                      age + nodegree,
                  data = cps_controls, method = "optimal", ratio = 1,
                  discard = "hull.control")
This error message is produced:
Loading required namespace: WhatIf
Preprocessing data ...
Performing convex hull test ...
Error in mclapply(1:m, in_ch, mc.cores = mc.cores) :
'mc.cores' > 1 is not supported on Windows
The second issue is when I try discard = "control":
Error in d[i, ] <- abs(d1[i] - d0) :
number of items to replace is not a multiple of replacement length
Is there a way to address either of these? Thanks!!
Your issue seems to be a bug in the MatchIt package, as noted on SO here and here. I've submitted a ticket on GitHub.
Regarding the discard = "hull.control" issue:
Download the source code of MatchIt from here and edit discard.R: add the argument mc.cores = 1 to the calls of WhatIf::whatif. This hard-codes the number of cores to 1 and thus eliminates the issue.
Uninstall the MatchIt package and build the new one by opening a command line and typing R CMD build C:\path\to\MatchIt-master. This creates a .tar.gz file. In RStudio, click Tools -> Install Packages... and select the local package.
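Alternatively, if you have devtools installed, you can build and install the edited source in one step (the path is hypothetical; point it at your edited copy):
# install the locally edited package source directly
install.packages("devtools")
devtools::install("C:/path/to/MatchIt-master")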
You may need to restart RStudio if the library was loaded previously.
Enjoy.
I've searched the web for this without much luck; more or less you always end up at the example from the VariantAnnotation package. And since that example works fine on my computer, I have no idea why the VCF I created does not.
The problem: I want to determine the number and location of SNPs in selected genes. I have a large VCF file (over 5 GB) with info on all SNPs on all chromosomes for several mouse strains. Obviously my computer freezes if I try to do anything at the whole-genome scale, so I first determined the genomic locations of my genes of interest on chromosome 1. I then used the VariantAnnotation package to extract only the data relating to my genes of interest from the VCF file:
library(VariantAnnotation)
param <- ScanVcfParam(
    info = c("AC1","AF1","DP","DP4","INDEL","MDV","MQ","MSD",
             "PV0","PV1","PV2","PV3","PV4","QD"),
    geno = c("DP","GL","GQ","GT","PL","SP","FI"),
    samples = strain,
    fixed = "FILTER",
    which = gnrng
)
The code above is taken from a function I wrote that takes strain as an argument. gnrng refers to a GRanges object containing the genomic locations of my genes of interest.
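For completeness, gnrng was built along these lines (the coordinates shown here are made up for illustration):
library(GenomicRanges)
# genomic locations of the genes of interest on chromosome 1
gnrng <- GRanges(seqnames = "1",
                 ranges = IRanges(start = c(3000000, 4500000),
                                  end   = c(3100000, 4600000)))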
vcf <- readVcf(file, "mm10", param)
This works fine and I get my VCF (dim: 21783 1), but when I try to save it, it won't work:
file.vcf<-tempfile()
writeVcf(vcf, file.vcf)
Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
I even tried a side-by-side comparison, running the example from the package first and then substituting my own VCF:
#This is the example:
out1.vcf<-tempfile()
in1<-readVcf(fl,"hg19")
writeVcf(in1,out1.vcf)
This works just fine, but if I substitute my vcf for in1, I get the same error.
I hope I made myself clear. Any help will be greatly appreciated! Thanks in advance!
Thanks for reporting this bug. The problem is fixed in version 1.9.47 (devel branch). The fix will be available in the release branch after April 14.
The problem was that you selectively imported 'FILTER' from the 'fixed' field but not 'ALT'. writeVcf() was throwing an error because there was no ALT value to write out. If you don't have access to the version with the fix, a workaround is to import the ALT field:
ScanVcfParam(fixed = c("ALT", "FILTER"))
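Applied to the original code, that means rebuilding the param with ALT included (same objects as above):
param <- ScanVcfParam(
    info = c("AC1","AF1","DP","DP4","INDEL","MDV","MQ","MSD",
             "PV0","PV1","PV2","PV3","PV4","QD"),
    geno = c("DP","GL","GQ","GT","PL","SP","FI"),
    samples = strain,
    fixed = c("ALT", "FILTER"),
    which = gnrng
)
vcf <- readVcf(file, "mm10", param)
writeVcf(vcf, tempfile())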
You can see which values were imported with the fixed() accessor:
fixed(vcf)
Please report any bugs or problems on the Bioconductor mailing list Martin referenced. More Bioc users will see the question and you'll get help more quickly.
Valerie
Here's a reproducible example
library(VariantAnnotation)
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
param <- ScanVcfParam(fixed="FILTER")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
## Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList
The problem seems to be that writeVcf expects the object to have an 'ALT' field, so
param <- ScanVcfParam(fixed="ALT")
writeVcf(readVcf(fl, "hg19", param=param), tempfile())
succeeds.