Getting associated GO IDs for a given gene name using an R Bioconductor annotation package - r

I am trying to play around with the hgu95av2.db and GO.db annotation packages from Bioconductor.
I have a list of gene names:
Genename1
Genename2
Genename3
These are standard gene names from GeneDB.
I now want to get the GO IDs associated with them; I would like to use this information to bin the data in the future.
I have attempted to go through the AnnotationDbi help file, where they say one way of getting the IDs is to use the select() interface as follows:
select(hgu95av2.db, keys = keys, cols = c("GO"), keytype = "GENENAME")
I am not sure how to set up the keys. Do I just set up a vector holding the list of keys?
When I try to run the command above, I get the following error:
Error in .testIfKeysAreOfProposedKeytype(x, keys, keytype) :
  None of the keys entered are valid keys for the keytype specified.
I have played around with the keytype and always get the same error, which makes me think I don't really understand, fundamentally, how to use this database query tool.
I have searched Bioconductor, but the examples assume that I have expression data in an Affy matrix format, whereas I just have a list of the gene names.
I would appreciate your help, and apologies as I am a newbie and not really clear on the R Bioconductor interface.
Many thanks
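keys is just a character vector of identifiers. A minimal sketch, assuming the names really are GENENAME keys for this chip (if they are actually gene symbols, keytype = "SYMBOL" would be the thing to try):

library(hgu95av2.db)
keys <- c("Genename1", "Genename2", "Genename3")   # a plain character vector
keytypes(hgu95av2.db)                              # list the supported keytypes
head(keys(hgu95av2.db, keytype = "GENENAME"))      # see what valid keys look like
select(hgu95av2.db, keys = keys, columns = "GO", keytype = "GENENAME")

Note that newer versions of AnnotationDbi call the argument columns= rather than cols=. The error above usually just means that none of the supplied strings matched any key of the stated keytype, which the keys() call above lets you check.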

Related

Similar package to Enchant in R

I have a query.
There is a Python package called "Enchant".
Enchant is a module which checks the spelling of a word and gives suggestions to correct it; it also gives antonyms and synonyms of words, and it checks whether a word exists in the dictionary or not.
In this module, we can define our own dictionary.
Can you tell me if there is a similar package available in R?
I checked hunspell, but I am not able to define my own dictionary.
Can you please help me out?
R has a spelling package. See https://cran.r-project.org/package=spelling. It doesn't provide synonyms and antonyms.
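On the custom-dictionary point: recent versions of hunspell do let you add your own words on top of a base dictionary via dictionary(add_words = ...); hedged, as this argument may not exist in older versions of the package. A minimal sketch:

library(hunspell)
# extend the en_US dictionary with our own words
dict <- dictionary("en_US", add_words = c("Bioconductor", "tidyverse"))
hunspell_check(c("Bioconductor", "speling"), dict = dict)  # TRUE FALSE
hunspell_suggest("speling", dict = dict)                   # e.g. "spelling"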

reliably extract srclines and srcfile from a function

I need to extract the exact lines of source that were parsed to create an R function, for use in coverage analysis. deparse is not accurate enough, because in doing coverage analysis with the covr package, exact line numbers matter.
If there is a srcfile, I just need the filename. If there isn't, e.g. the function was created at the console, I need to create an equivalent temporary file that could have been, line by line, the source file for that function.
I see several functions to extract src information from a function, like getSrcFilename or getSrcref, but none specifically to get the source code.
getSrcLines looked promising, but it doesn't take functions as arguments. I tried to use attributes() to get to the srcref and reach the information that way, but it doesn't seem to be stored consistently -- clearly I am missing something.
Sometimes attributes(body(cover.fun))$srcfile works, and sometimes attributes(attributes(cover.fun)$srcref)$srcfile does; in the srcref itself I have found the source in srcfile$lines or srcfile$original$lines. Of course, these look just like experiments and not The Right Way to implement this.
I need something that takes care of functions created in a package, sourced from a file, or defined interactively. If the filename is available, that's all I need; otherwise I need the source lines. Thanks
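A rough sketch that seems to cover both cases. Assumptions: the function was created with options(keep.source = TRUE) in effect, and get_fun_source is a made-up helper name, not an existing API:

get_fun_source <- function(fn) {
  sr <- utils::getSrcref(fn)          # NULL if no source references were kept
  if (is.null(sr)) return(NULL)
  sf <- attr(sr, "srcfile")
  fname <- sf$filename
  if (!is.null(fname) && nzchar(fname) && file.exists(fname))
    return(fname)                     # a real source file on disk: done
  tmp <- tempfile(fileext = ".R")
  writeLines(as.character(sr), tmp)   # the exact lines that were parsed
  tmp
}

as.character() on a srcref returns the original source lines, which sidesteps the srcfile$lines versus srcfile$original$lines guessing.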

Can I load an RData file while bypassing loading the namespaces?

Let's say some of my users cannot alter their R environments, but I need them to be able to open RData files. These files require a package to be loaded (httpuv, to be exact). We don't care about the package and don't need its capabilities; we just need to get at the data. Is there a way either to force R to bypass loading namespaces when loading the RData file, or to force it to save without namespace dependencies at the originating end? Thanks.
To reproduce: install Shiny, then create and save some R objects to the server's file system from within a Shiny app as an RData file. Copy the file over to a computer that doesn't have Shiny or the httpuv package installed. Try loading the RData file, even if the actual objects you saved are completely ordinary data.frames that have nothing to do with Shiny or httpuv.
I ran strings on the RData file, and the damn thing is full of references to httpuv. The software is loading the file and then actively deciding not to continue, in the internal loadFromConn2() function; therefore there must be a way to make it stop doing so.
Really, @baptiste should get credit for the link in his comment to some general solutions, especially the R CMD INSTALL --fake trick, and I will accept that if he reposts it as an answer. That is why I am not accepting the following answer of my own to the specific problem that caused this in my case, but I am posting it in case it helps someone else.
Some of the objects I was saving were lm fitted objects. Those contain formula/terms objects (at least two each, for some reason; maybe because they have been through stepAIC), and those formulas in turn each have an environment attribute. That environment is .GlobalEnv, which probably does contain copies of package functions someplace. When I dug through the objects inside the fitted models, then the objects inside all the attributes of those objects, then the objects inside the attributes of the attributes of those objects... and set every environment attribute I could find to NULL, I was eventually able to save the fitted model to a file that could be opened from a different R installation without the error about not being able to load a namespace.
I suppose I could also write a function to iterate through the objects within a fitted model, and their attributes, and remove environments (see the sketch below), but that sounds ugly and dangerous. Maybe there is a way to force formulas and fitted models not to retain environments, which would be better. For the time being, instead of saving fitted models, I will save their call attributes after scrubbing any environment attributes I find there. If that doesn't work, I'll deparse them into character strings.
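For what it's worth, a rough sketch of such a scrubber. strip_envs is a hypothetical helper, not from any package, and a scrubbed model may no longer work with predict() and friends that look variables up in the formula environment:

strip_envs <- function(x) {
  # drop the formula/terms environment if present
  if (!is.null(attr(x, ".Environment"))) attr(x, ".Environment") <- NULL
  # the model frame carries a "terms" attribute with its own environment
  if (!is.null(attr(x, "terms"))) attr(x, "terms") <- strip_envs(attr(x, "terms"))
  # recurse into list-like objects (lm fits are lists); x[] keeps attributes
  if (is.list(x)) x[] <- lapply(x, strip_envs)
  x
}

fit_clean <- strip_envs(fit)   # 'fit' standing in for one of the lm objects
saveRDS(fit_clean, "fit.rds")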
PS: I used the RDS format and haven't yet tested this with RData, but I suspect that the problem was the saving of the evaluation environment in some of the attributes, and had nothing to do with the format in which the objects get saved. I'll post an update if it turns out that this doesn't also work with RData.
PPS: I suspect I'm not the only one here who's hearing about the R CMD INSTALL --fake trick for the first time, and perhaps the word should be spread about this... because to the extent other R users don't know about it, this remains an obvious vector for denial-of-service attacks against R!
I will accept my own answer to get rid of the SO auto-nagger, but will unaccept it and accept @baptiste's if he makes it possible for me to do so by posting it as an answer. Thanks.

Specify my dataset as working dataset

I am a newbie to R.
I can successfully load my dataset into RStudio, and I can see my dataset in the workspace.
When I run the command summary(mydataset), I get the expected summary of all my variables.
However, when I run
data(mydataset)
I get the following warning message:
In data(mydataset) : data set ‘mydataset’ not found
I need to run the data() command because it is recommended for the fitLogRegModel() command, which is part of the PredictABEL package.
Does anybody have a hint on how I can specify mydataset as working dataset?
You don't need to use the data() command; you can just pass your data to the function:
riskmodel <- fitLogRegModel(data=mydataset, cOutcome=2,
cNonGenPreds=3:10, cNonGenPredsCat=6:8,
cGenPreds=c(11, 13:16), cGenPredsCat=0)
The example uses data(ExampleData) so that it can make data that is in the package available to you. Since you already have your data, you don't need to load it.
An alternative, although it has its drawbacks, is to use attach(mydataset). You can then refer to variables without the mydataset$ prefix. The main drawback, as far as I know (although I'd welcome the views of more expert R users), is that if you modify a variable after attaching, the change is not reflected in the dataset, as illustrated below. This can cause confusion and lead to "gotchas". In any case, many expert R users counsel against the use of attach.
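To illustrate the gotcha (assuming, purely for illustration, that mydataset has a numeric column called age):

attach(mydataset)
age <- age + 1     # creates a NEW object 'age' in the global environment
mydataset$age[1]   # unchanged: the column inside mydataset is not modified
detach(mydataset)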

How can I remove a lock from a linked environment in R?

I tried to run a Bioconductor package (truncateCDF) that modifies an environment (hgu133plus2cdf) to remove unwanted probe sets from an Affymetrix chip.
Everything went fine until I got the following message (translated from French):
> assign(cdfname, cdf.env, env=CDF.env)
Error in assign(cdfname, cdf.env, env = CDF.env) :
  cannot change the value of a locked binding for 'hgu133plus2cdf'
The assign call is the final step of the code: it saves the changes made to the working copy CDF.env back to the original environment (hgu133plus2cdf) before it is used in analyses of Affymetrix chip results, so it is essential.
My question: what is this locked binding in the hgu133plus2cdf environment, and how can I bypass it?
The author of this package ran it successfully around 2005, so I suppose the lock is a feature introduced in R since then (probably not related to Bioconductor, since assign is a base R function, which is why I am asking this question on this forum instead of Biostar).
I tried to read the docs, but I am overwhelmed when it comes to environments.
Thanks in advance for any help.
I don't think truncateCDF is from a Bioconductor package; at least, it is not current. This sounds like this post and the next two from the same thread on the Bioconductor mailing list. It is a result of a change in R: packages now have not-easily-modified name spaces, and these are implemented by locking the environment in which name space symbols are defined. Removing probes is not an essential part of a typical microarray work flow. Please ask on the Bioconductor mailing list (no subscription required) if you'd like more help.
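If you really do need to overwrite the binding, base R can unlock it first; a sketch using the names from the question (this deliberately subverts the namespace lock, so use it with care):

unlockBinding(cdfname, CDF.env)             # remove the lock on that symbol
assign(cdfname, cdf.env, envir = CDF.env)   # the assignment now succeeds
lockBinding(cdfname, CDF.env)               # restore the lock afterwards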
