data dictionary package - r

Is there a data dictionary package that will work for R?
I found a data_dict function for R at the following link, but it will not run on the version of R I am using.
http://optimumsportsperformance.com/blog/creating-a-data-dictionary-function-in-r/
I am looking for a data dictionary package that will make light work of a number of large and complex data tables... I have heard these elusive data dictionary packages exist.

I wrote a package that might be what you're after. It's on CRAN as datadictionary, but it might be better to use the dev version on GitHub because it handles difftimes and dates better.
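For readers who cannot install a package on their R version, here is a minimal base-R sketch of the kind of summary a data dictionary gives: one row of metadata per column. The helper name make_dictionary and the chosen columns are assumptions made for illustration; this is not the API or the output format of the datadictionary package.

```r
# Hypothetical helper (not the datadictionary API): one row of metadata per column.
make_dictionary <- function(df) {
  data.frame(
    variable  = names(df),
    # First class only, so columns with several classes still give one value
    class     = vapply(df, function(x) class(x)[1], character(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_unique  = vapply(df, function(x) length(unique(x)), integer(1)),
    # First non-missing value, shown as text
    example   = vapply(df, function(x) as.character(x[!is.na(x)][1]), character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# iris ships with base R, so this runs without extra packages
dict <- make_dictionary(iris)
print(dict)
```

The sketch only handles atomic columns (numbers, text, factors, dates); list columns would need their own handling.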

Related

using the 'ptw' package in R

I am working on applying the ptw package to my GC-MS wine data. So far I have been able to use this package correctly on the apples example data described in the vignette (MTBLS99). Since I am new to R, I am unable to get my .CDF files into the format used at the start of the vignette, which begins with three data frames (All.pks, All.tics, All.xset). I assume these were generated using the xcms package, but I cannot recreate the specific steps used to format the data this way. Has anyone successfully applied 'ptw' to their LC/GC-MS data? Can someone share the code used for generating the All.pks, All.tics, All.xset data frames?

Should one load the dataset before or inside of the function?

Q. How does one write an R package function that references an R package dataset in a way that is simple/friendly/efficient for an R user? For example, how does one handle afunction() that uses adataset from a package?
What I think may not be simple/friendly/efficient: the user is required to run data(adataset) before running afunction(...), or else receives an Error: ... object 'adataset' not found. I have noticed that some packages have built-in datasets that can be used any time the package is loaded, for example iris, which one can use without first bringing it into the Global Environment.
Possible options which I have entertained:
Write data(NamedDataSet) directly into the function. Is this a bad idea? I thought it might be, considering memory use and given my limited understanding of function environments.
Code the structure of the dataset directly into the function. I think this works depending on the size of the data but it makes me wonder about how to go about proper documentation in the package.
Change Nothing. Given a large enough dataset, maybe it does not make sense to implement a way different from reading it before calling the function.
Any comments are appreciated.
You might find these resources about writing R data packages useful:
the "External Data" section of R Packages
Creating an R data package
Creating an R data package (overview)
In particular, take note of the DESCRIPTION file and usage of the line LazyData: true. This is how datasets are made available without having to use data(), as in the iris example that you mention.
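As a rough illustration of the two approaches discussed here, the sketch below uses iris from the base datasets package as a stand-in for adataset. The function names are hypothetical. The first function relies on lazy loading (what LazyData: true enables) and refers to the dataset through its namespace; the second calls data() inside the function but directs it to a private environment so that nothing is written to the Global Environment.

```r
# Option A: with lazy-loaded data, refer to the dataset via its package namespace.
# datasets::iris stands in for mypackage::adataset (an assumed name).
mean_sepal_lazy <- function() {
  mean(datasets::iris$Sepal.Length)
}

# Option B: call data() inside the function, loading into a private environment
# instead of the caller's workspace.
mean_sepal_data <- function() {
  e <- new.env()
  data("iris", package = "datasets", envir = e)
  mean(e$iris$Sepal.Length)
}

a <- mean_sepal_lazy()
b <- mean_sepal_data()
print(c(lazy = a, data_call = b))
```

Both functions return the same value; Option A is shorter and needs no data() call at all, which is why the LazyData route tends to be friendlier for users.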

Stealing methods and data from other R packages

I am currently developing an R package that makes use of different datasets from other R packages. As a result, my package has a large number of dependencies, and the user is required to install various unrelated packages in order for my package to work.
I would prefer to copy these datasets to my own package and give proper credit in the documentation, but is there a problem with that?
And what about simple functions from other packages? For example, I need the Matern function from the fields package, and it seems much simpler to just copy that function to my own package instead of having a dependency on a whole package full of unrelated functionality.
Why not just ask the authors/maintainers of those packages for their permission or thoughts on the matter? They may know something that the rest of us don't about how the functions are implemented and how easy they are to copy.
Two different people asked me whether they could include a function from my package in theirs. They explained why they wanted to and what they were doing, and I agreed that having the user install my whole package for just one function would be overkill, so I gave them my blessing (and the original source code) to include the functions in their packages (technically, due to the license, they did not need my permission). Now when I update either of the functions, I also send the updated source code to those two authors so that they can keep their copies up to date if they want to.

Where to store data for an R package hosted on GitHub

I'm working on building a package in R and have a couple of very large data sets that I would like to make available to package users without having to re-run the code that extracted the data initially. My package (which is still a work in progress) is hosted on GitHub. It's primarily for my own use as I work on a larger research project. Is there a way to include a .csv of a data set so that it stays stored on GitHub? Ideally it would be something like the default data sets mtcars or diamonds. Is there a way to dput() the data set and then store it in my package function file?
Additional information: I've been using a combination of roxygen2 and devtools to build and launch. This question is related but is one step ahead of what I need.
I hope my question is clear!
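One common convention, sketched below under assumptions, is to save each object as an .rda file in the package's data/ directory rather than pasting dput() output into a function file; the file is then versioned on GitHub like any other. The package directory, the object name big_table, and its contents are placeholders, and a temporary directory is used so the sketch can run anywhere. In a devtools workflow, usethis::use_data() performs a similar save step.

```r
# Stand-in for the package source directory (hypothetical; use your real path).
pkg_dir  <- file.path(tempdir(), "mypkg")
data_dir <- file.path(pkg_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for the result of the expensive extraction code.
big_table <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.8, 1.2, 3.9))

# One object per file, with the file named after the object.
rda_path <- file.path(data_dir, "big_table.rda")
save(big_table, file = rda_path, compress = "xz")

# Check the round trip by loading into a fresh environment.
check_env <- new.env()
load(rda_path, envir = check_env)
print(all.equal(check_env$big_table, big_table))
```

Combined with LazyData: true in DESCRIPTION, an object stored this way becomes available to users the same way mtcars is.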

Update the dataset in an installed package

Is it possible to update a dataset in a local, installed package?
A package that I maintain has a dataset based on periodically-updated data. I would like to update the local version of my dataset and save the changes back to the package such that next time I load the data, i.e. data(xxx), the updated version of the dataset will load.
In the medium and long term I will update the package and then upload a new version to CRAN, but I'm looking for a short term solution.
If it is possible, how would I do it?
You could do it in one of two ways:
By updating the source and re-installing: yes, preferably with a new, distinctive version number.
By forcefully overwriting the installed copy: possibly, but that is not the proper way to do it.
What I would try to do is put a mechanism to acquire this data in the package, but keep the (changing?) data separate from the code.
Packages are not first and foremost a means to direct data acquisition, in particular for changing data sets. Most packages include fixed data to demonstrate or illustrate a method or implementation.
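A minimal sketch of such a mechanism, assuming the package exports a function that fetches the data and caches it outside the installed package. The names get_dataset and fake_fetch, the cache location, and the 30-day refresh window are all hypothetical; fake_fetch stands in for a real download step.

```r
# Hypothetical helper: return cached data, refreshing it when missing or stale.
get_dataset <- function(fetch, cache_file, max_age_days = 30) {
  is_fresh <- file.exists(cache_file) &&
    as.numeric(difftime(Sys.time(), file.mtime(cache_file), units = "days")) < max_age_days
  if (!is_fresh) {
    dir.create(dirname(cache_file), recursive = TRUE, showWarnings = FALSE)
    saveRDS(fetch(), cache_file)
  }
  readRDS(cache_file)
}

# Stand-in for a real acquisition step, such as reading a CSV from a URL.
fake_fetch <- function() data.frame(id = 1:3, value = c(10, 20, 30))

# A temp dir keeps the sketch self-contained; a real package would pick a
# persistent per-user location instead.
cache_file <- file.path(tempdir(), "mypkg-cache", "xxx.rds")
xxx <- get_dataset(fake_fetch, cache_file)
print(xxx)
```

With this design the installed package never changes; only the cached file is replaced, and a CRAN release can update the fetching code on its own schedule.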
