Specify my dataset as working dataset - r

I am a newbie to R.
I can successfully load my dataset into R-Studio, and I can see my dataset in the workspace.
When I run the command summary(mydataset), I get the expected summary of all my variables.
However, when I run
data(mydataset)
I get the following warning message:
In data(mydataset) : data set ‘mydataset’ not found
I need to run the data() command as recommended in the fitLogRegModel() command, which is part of the PredictABEL package.
Does anybody have a hint on how I can specify mydataset as working dataset?

You don't need to use the data command. You can just pass your data to the function
riskmodel <- fitLogRegModel(data=mydataset, cOutcome=2,
cNonGenPreds=3:10, cNonGenPredsCat=6:8,
cGenPreds=c(11, 13:16), cGenPredsCat=0)
The example uses data(ExampleData) so that it can make data that is in the package available to you. Since you already have your data, you don't need to load it.
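A minimal sketch of the difference, using only base R. The small data frame and the lm() call are hypothetical stand-ins for your own data and for fitLogRegModel(). data() looks for datasets shipped with packages (or in a data/ subdirectory of the working directory), so it warns about an object that exists only in your workspace, yet that object can be passed straight to any function through its data argument.

```r
# Hypothetical stand-in for a dataset you loaded yourself (e.g. with read.csv()).
mydataset <- data.frame(outcome = c(1.2, 2.9, 3.1, 4.8),
                        age     = c(29, 34, 47, 51))

# data() does not look at workspace objects, so it warns here.
msg <- tryCatch(
  { data(mydataset); "found" },
  warning = function(w) conditionMessage(w)
)

# The object is in the workspace all the same ...
in_workspace <- exists("mydataset")

# ... and can be handed to a modelling function directly.
fit <- lm(outcome ~ age, data = mydataset)
```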

An alternative, although it has its drawbacks, is to use attach(mydataset). You can then refer to variables without a mydataset$ prefix. The main drawback, as far as I know (although I'd welcome the views of more expert R users), is that if you modify a variable after attaching, the change is not made to the dataset itself. This can cause confusion and lead to "gotchas". In any case, many expert R users counsel against the use of attach.
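The gotcha can be sketched in base R as follows. The data frame and column name are made up for illustration. Assigning to an attached variable creates a separate copy and leaves the data frame untouched; with() is one commonly suggested alternative that avoids attaching at all.

```r
# Hypothetical data frame, used only for illustration.
measurements <- data.frame(height = c(1.60, 1.75, 1.82))

attach(measurements)
# This creates a NEW variable called `height` outside the data frame;
# measurements$height is not modified.
height <- height * 100
detach(measurements)

unchanged <- measurements$height

# with() evaluates an expression inside the data frame without attaching it.
height_cm <- with(measurements, height * 100)
```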

Related

I cannot obtain scores for a metaMDS object in RStudio (package: vegan)

I'm using vegan 2.6.4 in RStudio, and have had an unusual error message pop up when I run the following:
nmds11 = metaMDS(m_com11, distance = "bray")
data.scores11 = as.data.frame(scores(nmds11)$sites)
Error in UseMethod("scores") :
no applicable method for 'scores' applied to an object of class "c('metaMDS', 'monoMDS')"
I can safely say this has never happened to me, and I was using the exact same code on a different dataset 5 minutes ago with no issues. I have also previously run this same script on at least a dozen other matrices with no errors.
I have tried calling scores.metaMDS as suggested when looking up the scores function (to help specify what type of object I'm trying to get scores from), but that function apparently does not exist. I've also tried running some old scripts that always worked in the past, with the same unfortunate results.
Any idea what I can do to address this?
Try using vegan::scores(); it could be that some other package you have loaded also has a scores() generic that is overwriting vegan::scores(). You can also try the much more specific vegan:::scores.metaMDS() if the whole S3 system has gotten clobbered.
Beyond that, restart R (in RStudio, find the Restart R option in the menus) so you get a clean session and try running your code again.
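A small base R sketch of how masking works and why the pkg:: prefix helps. Because the vegan case needs the package and the conflicting package to be loaded, this uses a deliberately masked stats::sd as a stand-in; the replacement function is hypothetical. The same checks would apply to scores, e.g. find("scores").

```r
# Stand-in for another package (or your own script) defining a function
# with the same name as one you rely on.
sd <- function(x) "masked"

# find() lists every place on the search path where the name is defined,
# in the order R searches them.
where_defined <- find("sd")

# The unqualified call picks up the first definition found ...
masked_result <- sd(c(1, 2, 3))

# ... while the namespace-qualified call always reaches the stats version.
qualified_result <- stats::sd(c(1, 2, 3))

# Remove the masking definition again.
rm(sd)
```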
I tried vegan:::scores.metaMDS() without restarting RStudio and it works! Thanks!

R CMD check: no visible binding for global variable (when using a data/ dataset in the package)

Slightly different versions of this question have been asked before but I haven't seen a good answer yet.
I have a very simple repro using the very good source code of ggplot2:
Go into any file in ggplot2/R/ and add a line that references the "diamonds" dataset included in ggplot2/data/diamonds.r.
Then attempt to build/check the package (i.e., R CMD build .; R CMD check --as-cran ggplot2_3.0.0.9000.tar.gz)
In my arbitrary example I added diamonds to line 436 in theme.r and got this note when trying to check:
* checking R code for possible problems ... NOTE
plot_theme: no visible binding for global variable ‘diamonds’
Undefined global functions or variables:
diamonds
I run into this problem in our package which we want to submit to CRAN. AFAIK we are following best practices by using data/ourdataset.r and then "ourdataset" in our R/ code. And yet, we get this NOTE failure.
What are we doing wrong? If this NOTE comes up for a package like ggplot2, I am at a loss as to whether we are doing something wrong or this is something that should be fixed in CHECK. CHECK has been fantastic so far but I am stumped on this one.
Thanks!
Usually, to get rid of that Note you just have to add a reference like this:
ggplot2::diamonds
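As a runnable sketch of the same pattern without needing ggplot2 installed, the function below uses datasets::mtcars as a stand-in for a package's own dataset. The function name is hypothetical. Inside your package's R/ code the equivalent would be yourpackage::yourdataset, which gives the code checker an explicit binding for the name.

```r
# Hypothetical package function. Referring to the dataset with an explicit
# namespace prefix means the name is never a free global variable.
mean_mpg <- function() {
  mean(datasets::mtcars$mpg)
}

result <- mean_mpg()
```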

How to use/install merge method for data.sets from memisc package?

I have two data.sets (from the memisc package) all set for merge, and the merge goes through without error or warning, but the output is a data.frame, not a data.set. The command is:
datTS <- merge(datT1, datT2, by.x="ryear", by.y="ryear")
(Sorry I don't have a more convenient example with toy data handy.) The following pages seem to make it very clear that there should be a method built into memisc that properly merges the data.sets into one data.set:
http://rpackages.ianhowson.com/rforge/memisc/man/dataset-manip.html
https://github.com/melff/memisc/blob/master/pkg/R/dataset-methods.R
...but it just doesn't seem to be properly triggering on my machine (sorry also for my clumsy lingo). Note the similarity of my code and the example code from the very end of the first page I linked:
ds6 <- merge(ds1,ds5,by.x="a",by.y="c")
I've verified that I have the most recent versions of R, RStudio, memisc, and all dependencies. I've used a number of other memisc methods so far (within, transform, missing.values, etc.) without issue.
So my question is: what else does one need to do to get the merge function to properly produce a data.set when the source data are in data.set form, as per the memisc package? (There's no explicit addressing of this merge capability in the official package documentation.) Since the code in the second link above seems to provide the method for this, is there some workaround, at least, for installing and utilizing that code? Maybe there's just some separate "methods installation" I'm not aware of (but why would it be separate from the main package?).
The help page for pkg:memisc in the released version 0.97 does not describe a merge function method for data.sets. You are pointing us to the github version which may not be the one that has been released. You need to install the github version. See: https://github.com/melff/memisc/releases

Can I load an RData file while bypassing loading the namespaces?

Let's say some of my users cannot alter their R environments, but I need them to be able to open up RData files. These environment files require a package to be loaded (httpuv to be exact). We don't care about the package, we don't need its capabilities, we just need to get at the data. Is there a way to either force R to bypass loading namespaces when loading the RData file, or force it to save it without namespace dependencies at the originating end? Thanks.
To reproduce, install Shiny. Create some R objects and save them to the server's file system as an RData file from within a Shiny applet. Copy the file over to a computer that doesn't have Shiny or the httpuv package installed. Try loading the RData file, even if the actual objects you saved are completely ordinary data.frames that have nothing to do with Shiny or httpuv.
I did strings on the RData, and the damn thing is full of references to httpuv. The software is loading the file and then actively deciding to not continue in the internal loadFromConn2() function. Therefore there must be a way to make it stop doing so.
Really #baptiste should get credit for the link in his comment to some general solutions, especially the R CMD INSTALL --fake trick, and I will accept that if he reposts it as an answer. That is why I am not accepting the following answer of my own to the specific problem that caused it in my case, but I am posting my answer in case it helps someone else.
Some of the objects I was saving were lm fitted objects. Those contain formula/terms objects (at least two each, for some reason... maybe because they've been through stepAIC), and those formulas in turn each have an environment attribute. The environment attribute is .GlobalEnv which probably does contain copies of package functions someplace. When I dug through the objects inside the fitted models, and then the objects inside all the attributes of those objects, and then the objects inside the attributes of the attributes of those objects... and set every environment attribute I could find to NULL, eventually I was able to save that fitted model to a file that could be opened from a different R installation without getting the error about not being able to load a namespace.
I suppose I could also write a function to iterate through the objects within a fitted model, and their attributes, and remove environments but that sounds ugly and dangerous. Maybe there is a way to force formulas and fitted models not to retain environments, and that will be better. For the time being, instead of saving fitted models, I will save their call attributes after scrubbing any environment attributes I might find there. If that doesn't work, I'll deparse them into character strings.
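A minimal sketch of the manual scrubbing described above, assuming a plain lm fit on the built-in mtcars data rather than the original models. It shows the two terms objects in an lm fit that carry an environment attribute and sets both to NULL before saving. This is an illustration only: some downstream functions may rely on that environment, so a stripped model may not behave like the original in every respect.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Two places in an lm object where a terms object (and its environment) lives.
env_terms <- attr(fit$terms, ".Environment")
env_model <- attr(attr(fit$model, "terms"), ".Environment")

# Drop both environment references.
attr(fit$terms, ".Environment") <- NULL
attr(attr(fit$model, "terms"), ".Environment") <- NULL

# Save and reload the stripped object in RDS format.
tmp <- tempfile(fileext = ".rds")
saveRDS(fit, tmp)
restored <- readRDS(tmp)
```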
PS: I used the RDS format and haven't yet tested it with RData, but I suspect that the problem was the saving of the evaluation environment in some of the attributes, and had nothing to do with the format in which the objects get saved. I'll post an update if it turns out that this doesn't also work with RData.
PPS: I suspect I'm not the only one here who's hearing about the R CMD INSTALL --fake trick for the first time, and perhaps the word should be spread about this... because to the extent other R users don't know about it, this remains an obvious vector for denial-of-service attacks against R!
I will accept my own answer to get rid of the SO auto-nagger, but will unaccept it and accept #baptiste if they make it possible for me to do so by posting it as an answer. Thanks.

Using glmm_funs.R?

I am fitting a GLMM and have seen some examples that use the function overdisp_fun, defined in glmm_funs.R, but I don't know which package contains it or how I can call it from R. Can somebody help me?
Thanks,
If you google for glmm_funs.R, you'll find links to the script (e.g., here: http://glmm.wdfiles.com/local--files/trondheim/glmm_funs.R).
You can save the file on your local machine, then call it in your R session with source("path to file/glmm_funs.R").
You will then be able to use the functions contained in the script, including overdisp_fun().
You can think of it a little bit like loading a package, except the functions are just presented in a script.
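A self-contained sketch of how source() works. The script name and the double_it() function are hypothetical; they only stand in for glmm_funs.R and overdisp_fun(), whose contents are not reproduced here.

```r
# Write a tiny script file to a temporary location (stand-in for a
# downloaded file such as glmm_funs.R).
script_path <- file.path(tempdir(), "my_funs.R")
writeLines("double_it <- function(x) 2 * x", script_path)

# source() runs the script, so the functions it defines become available.
source(script_path)

result <- double_it(21)
```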
