R calling a dataset in the package itself - r

I have created a package containing a dataset called mydata.
I want to use my dataset in my functions, but I don't know how to call it.
When I use data("mydata") to call my dataset and avoid warning messages, I get another message during the build process:
See section ‘Good practice’ in ‘?data’.
@importFrom mypackage mydata doesn't work either. What's the best way to call a dataset in the package itself?

Use package_name::mydata, as in the car package: car::Angell.
The dataset lives in the data folder, for example as mydata.rda.
To add help, create a data.R file in the R folder.
Personally, I put this inside:
# 1. mydata ----
#' Here is a long title
#'
#' Description of the dataset: length, columns
#'
#' @format one object of class XXX
#'
#' @source where it comes from
#' @name mydata
NULL
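As a minimal sketch of the package_name::mydata pattern inside a function, the example below uses the built-in datasets package in place of your own package, so datasets::mtcars stands in for package_name::mydata. The helper name mean_mpg is hypothetical. Note that the :: form relies on the data being lazy-loaded, which is the case for the datasets package.

```r
# Hypothetical helper: reads a dataset through its package namespace
# instead of calling data("...") inside the function body.
mean_mpg <- function() {
  # datasets::mtcars plays the role of package_name::mydata here
  mean(datasets::mtcars$mpg)
}

mean_mpg()
```

Because the dataset is referenced by its fully qualified name, the function does not modify the caller's environment the way data("mydata") would.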

Related

Can't access dataset documentation in R

I have created a package using devtools and roxygen2, which works perfectly fine. Now I want to include two data frames termed mydata1 and mydata2 into the package.
I have saved the data frames in mypackage/data as mydata1.rda and mydata2.rda using devtools::use_data(mydata1, mydata2)
Then I created mydata1.R and mydata2.R files in mypackage, which contain the description as follows:
#' My temperature data
#'
#' A dataset containing the temperature
#'
#' @format A data frame with 153 rows and 1 variable:
#' \describe{
#' \item{temperature}{relative air temperature in degree Celsius}
#' }
"mydata1"
However, after I install and restart in the package build mode, I cannot access the description via help(mydata1). I can access the data via mypackage::mydata1 though.
What am I doing wrong? It might be a stupid mistake, however, I could not find a solution so far. Thank you very much in advance.
Ok I found the solution, I just needed to save the description files in mypackage/R and not in mypackage.

How to store frequently used data or parameters within an R package?

I am authoring an R package and there are several numerical vectors that users will frequently use as arguments to various package functions. What would be the best way to store these vectors within the package so that users can easily access them?
One idea I had was to save each vector as a data file in inst/data. Then users would be able to use the data file's name in place of the vector when needed (at least, I can do this during development). I like this idea, but am not sure if this solution would violate CRAN rules/norms or cause any problems.
# To create one such vector as a data file
octants <- c(90, 135, 180, 225, 270, 315, 360, 45)
devtools::use_data(octants)
# To access this vector in usage
my_function(data, octants)
Another idea I had was to create a separate function that returns the desired vector. Then users would be able to call the appropriate function when needed. This might be better than data for some reason, but I worry about users forgetting the () after the function name.
# To create the vector within a function
octants <- function() c(90, 135, 180, 225, 270, 315, 360, 45)
# To access this vector in usage
my_function(data, octants()) # works
my_function(data, octants) # doesn't work
Does anyone have ideas on which solution would be preferable or any better alternatives?
I'll be honest, I spent quite a long time carefully reading the manual asking myself the same questions. Do it, it's a good idea, it's useful, and there are tools to help you. The Writing R Extensions manual describes in what format you can save your data, and how to follow R standards.
What I would advise for providing data within a package is to use:
devtools::use_data(...,internal=FALSE,overwrite=TRUE)
where ... are unquoted names of the datasets you want to save.
https://www.rdocumentation.org/packages/devtools/versions/1.13.3/topics/use_data
You just create a file in the inst subdirectory of your package to create your datasets. My own example is there https://github.com/cran/stacomiR/blob/master/inst/config/generate_data.R
For instance I use it to create the r_mig dataset
#################################
# generates dataset for report_mig
# from the vertical slot fishway located at the estuary of the Vilaine (Brittany)
# Taxa Liza Ramada (Thinlip grey mullet) in 2015
##################################
#{ here some stuff necessary to generate this dataset from my package
# and database}
setwd("C:/workspace/stacomir/pkg/stacomir")
devtools::use_data(r_mig,internal=FALSE,overwrite=TRUE)
This will save your dataset in the appropriate format. Using internal = FALSE allows access to all users using data(). I suggest that you read the data() help file. You can use data() to access your files including when you are not in a package provided they are in a data subdirectory.
If lib.loc and package are both NULL (the default), the data sets are
searched for in all the currently loaded packages then in the ‘data’
directory (if any) of the current working directory.
If you are using Roxygen, create an R file called data.R where you store the description of all your datasets. Below an example of the Roxygen naming of one of the datasets in the stacomiR package.
#' Video counting of thin lipped mullet (Liza ramada) in 2015 in the Vilaine (France)
#'
#' This dataset corresponds to the data collected at the vertical slot fishway
#' in 2015, video recording of the thin lipped mullet Liza ramada migration
#'
#' @format An object of class report_mig with 8 slots:
#' \describe{
#' \item{dc}{the \code{ref_dc} object with 4 slots filled with data corresponding to the iav postgres schema}
#' \item{taxa}{the \code{ref_taxa} the taxa selected}
#' \item{stage}{the \code{ref_stage} the stage selected}
#' \item{timestep}{the \code{ref_timestep_daily} calculated for all 2015}
#' \item{data}{ A dataframe with 10304 rows and 11 variables
#' \describe{
#' \item{ope_identifiant}{operation id}
#' \item{lot_identifiant}{sample id}
#' \item{lot_identifiant}{sample id}
#' \item{ope_dic_identifiant}{dc id}
#' \item{lot_tax_code}{species id}
#' \item{lot_std_code}{stage id}
#' \item{value}{the value}
#' \item{type_de_quantite}{either effectif (number) or poids (weights)}
#' \item{lot_dev_code}{destination of the fishes}
#' \item{lot_methode_obtention}{method of data collection, measured, calculated...}
#' }
#' }
#' \item{coef_conversion}{A data frame with 0 observations : no quantity are reported for video recording of mullets, only numbers}
#' \item{time.sequence}{A time sequence generated for the report, used internally}
#' }
#' @keywords data
"r_mig"
The full file is there :
https://github.com/cran/stacomiR/blob/master/R/data.R
For another example, read: http://r-pkgs.had.co.nz/data.html#documenting-data
Then you can use those data in tests as follows, by calling data("r_mig"):
test_that("Summary method works",
  {
    ... #some other code
    data("r_mig")
    r_mig<-calcule(r_mig,silent=TRUE)
    summary(r_mig,silent=TRUE)
    rm(list=ls(envir=envir_stacomi),envir=envir_stacomi)
  })
Most importantly you can use those in the manuals to describe how to use functions in your package.

'data' is not an exported object from 'namespace:my_package'

I'm writing a function that uses an external data as follow:
First, it checks if the data is in the data/ folder, if it is not, it creates the data/ folder and then downloads the file from github;
If the data is already in the data/ folder, it reads it, and perform the calculations.
The question is, when I run:
devtools::check()
it returns:
Error: 'data' is not an exported object from 'namespace:my_package'
Should I manually put something on NAMESPACE?
An example:
my_function <- function(x){
  if(file.exists("data/data.csv")){
    my_function_calculation(x = x)
  } else {
    print("Downloading source data...")
    require(RCurl)
    url_base <-
      getURL("https://raw.githubusercontent.com/my_repository/data.csv")
    dir.create(paste0(getwd(),"/data"))
    write.table(url_base,"data/data.csv", sep = ",", quote = FALSE)
    my_function_calculation(x = x)
  }
}
my_function_calculation <- function(x = x){
  data <- NULL
  data <- suppressMessages(fread("data/data.csv"))
  #Here, I use data...
  return(data)
}
It may not be the same in every case, but I solved the problem by removing the data.R file from the R/ folder.
data.R is a file describing all data presented in the package. I had it from the previous version of my code, which had the data built in, not remote (to be downloaded).
Removing the file solved my problem.
Example of data.R:
#' Name_of_the_data
#'
#' Description_of_the_Data
#'
#' @format A data frame with 10000 rows and 2 variables:
#' \describe{
#' \item{Col1}{description of Col1}
#' \item{Col2}{description of Col2}
#' }
"data_name"
No need to remove data.R in the R/ folder; you just need to decorate the documentation around the NULL keyword as follows:
#' Name_of_the_data
#'
#' Description_of_the_Data
#'
#' @format A data frame with 10000 rows and 2 variables:
#' \describe{
#' \item{Col1}{description of Col1}
#' \item{Col2}{description of Col2}
#' }
NULL
Generally, this happens when you have a mismatch between the names of one of the rda files in data folder and what is described in R/data.R.
In this case, the data reference in the error message is for data.csv, not the data folder. You need to have rda files in the data folder of an R package. If you want to download csv files, you need to put them in inst/extdata.
This being said, you might want to consider using tempdir() to save those files in the temp folder of your session instead.
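The tempdir() suggestion can be sketched roughly as below. The function name cached_csv and the fetch argument are hypothetical, introduced only for illustration; fake_fetch writes a small local file so the example runs without network access. In real use, fetch could wrap utils::download.file(url, path) with your own URL.

```r
# Hypothetical helper: keep a downloaded CSV in the session's temp folder
# instead of writing into a data/ directory under the working directory.
cached_csv <- function(name, fetch) {
  path <- file.path(tempdir(), name)
  if (!file.exists(path)) {
    message("Downloading source data...")
    fetch(path)  # fetch() must create the file at 'path'
  }
  utils::read.csv(path)
}

# Stand-in for a real download (assumption: no network access here).
fake_fetch <- function(path) {
  utils::write.csv(
    data.frame(Col1 = 1:3, Col2 = c("a", "b", "c")),
    path,
    row.names = FALSE
  )
}

dat <- cached_csv("data.csv", fake_fetch)
```

The file is fetched once per session and reused on later calls, and nothing is written into the user's working directory.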
There are 3 things to check:
1. The documentation is appropriately named:
#' Name_of_the_data
#'
#' Description_of_the_Data
#'
#' @format A data frame with 10000 rows and 2 variables:
#' \describe{
#' \item{Col1}{description of Col1}
#' \item{Col2}{description of Col2}
#' }
"data"
2. The RData file is appropriately named for export in the data/ folder.
3. The object in the RData file is loaded with the name data.
If the documentation (1) is A and the RData file is A.RData (2), but the object (when loaded with load()) is named B, you're going to get exactly this error.
The problem is probably due to how your object was named when you saved it.
Suppose I load a file and call the object "d", then save it (as suggested) with save() in the data/ directory as "data":
save(d, file = "data/data.rda")
Then, when you run clean and install on the package, you will get the following error:
Error: 'data' is not an exported object from 'namespace:YourPakage'
It looks like it does not matter how you declare your object in the roxygen documentation. I guess you must give your OBJECT the same name as the file you save it to and load it from.
For example, load your dataset as a "pib" object, then save it as "pib.rda" and declare your "pib" in a roxygen file ("loadData.R", for example).
#' Datos del PIB
#'
#' @docType data
#'
#' @usage data(pib)
#'
#' @format An object of class ...
#'
#' @keywords datasets
#'
#' @references ----
#'
#' @source ----
#'
#' @examples
#' data(pib)
"pib"
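The mismatch described above can be reproduced outside a package. This is only a sketch: it writes to a temporary folder rather than a real data/ directory, and the object d is made-up illustration data.

```r
# Save an object called 'd' into a file called data.rda
rda_path <- file.path(tempdir(), "data.rda")
d <- data.frame(Col1 = 1:3)
save(d, file = rda_path)

# Load it into a fresh environment and list what actually came back
loaded <- new.env()
load(rda_path, envir = loaded)
ls(loaded)
```

The only object in the environment is d; nothing called data exists, so documentation written for "data" has no object to attach to.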
I had this issue because I copied the .rda file into the R\data folder.
Issue was resolved by using usethis::use_data(DataObject) which automatically takes the raw-data (DataObject) file and adds it to the R\data folder within the R package directory.
When I was stumped by the error
Error: 'data' is not an exported object from 'namespace:my_package'
MrFlick's comment above saved me. I had simply changed the name of an .rda file in my data folder. I was unable to get devtools::document() to recreate the NAMESPACE file. The solution was to re-save the data into the .rda file. (Of course I should have remembered that when one loads from an .rda file the name of the R object(s) has nothing to do with the name of the .rda file so renaming the .rda file doesn't do much.)
I spent a few hours trying to fix this. Finally got it to work.
Notes:
Data files have to be of type "rda". "rds" won't work.
File names had to be lower case.
NULL in documentation name didn't work for me. Had to be a lower case string.
In general, it seems the same error message is caused by several things. Anything the checker doesn't like related to data files, it will issue the same error. Hard to debug under those circumstances.
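Since the error message does not say which file is at fault, a small diagnostic can help. The sketch below is a hypothetical helper (check_rda_names is not part of devtools or usethis) that loads every .rda/.RData file in a folder and reports whether it holds a single object named after the file, which is the layout usethis::use_data() produces. The demo folder and objects are made up.

```r
# Hypothetical diagnostic: compare object names stored in each data file
# with the file's base name.
check_rda_names <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.(rda|RData)$", full.names = TRUE)
  rows <- lapply(files, function(f) {
    env <- new.env()
    objects <- load(f, envir = env)  # load() returns the object names
    expected <- tools::file_path_sans_ext(basename(f))
    data.frame(
      file = basename(f),
      objects = paste(objects, collapse = ", "),
      matches = identical(objects, expected),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Demo in a temporary folder: one well-named file, one mismatched file
demo_dir <- file.path(tempdir(), "demo_data")
dir.create(demo_dir, showWarnings = FALSE)
pib <- data.frame(year = 2000:2002)
d <- pib
save(pib, file = file.path(demo_dir, "pib.rda"))
save(d, file = file.path(demo_dir, "data.rda"))

report <- check_rda_names(demo_dir)
report
```

Rows with matches equal to FALSE are the files worth re-saving under a consistent object name; the objects column shows the names the documentation would have to use instead.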
I will add another trap. Working in RStudio
I have assigned a string to MyString and saved in the data folder of my package project:
save(MyString, file="./data/MyString.RData")
My ./R/data.R file contains documentation for this:
#' A character string
#'
"MyString"
This works. But you must use one file per object and not do save(X, Y, Z, file="BitsAndPieces.RData") and then document BitsAndPieces. If you do then you will get the error of this question. Which I did, needless to say.
I had the same error and was able to overcome it as follows.
The data file located at: data/df.RData
The R documentation file located at: R/df.R
I have created the df.RData file by importing the df.txt file into R and using the save() function to create the .RData file. I used the following code block to create .RData file.
x=read.table("df.txt")
save(x,file="df.RData")
Then, after running R CMD check, I got the same error: df is not an exported object from namespace "package name".
I overcame the error by changing the name of the variable saved in the df.RData file:
df=read.table("df.txt")
save(df,file="df.RData")
Restarting the session solved the problem for me. Somehow the environment was empty and after restart all objects were back, hence solving the diff.
I had the same issue with one of my packages, and I needed to add
LazyData: true
to my DESCRIPTION file.
I had this problem, and even renaming the variables and uninstalling the problematic packages didn't work.
What was happening:
I was trying to carry out the process in an R session (tab) that had already been in use, where the terra package had already been loaded. This session was not saved explicitly, but it was being saved automatically to an image in ~/.RData every time RStudio was closed. So every time I opened RStudio it retrieved that session (image) and reloaded the previous state, causing the conflict between packages.
I solved it by creating a new blank R Markdown file and closing all previously opened sessions, as well as clearing all saved data in the RStudio "Global Environment".
I encountered the error "Error: 'weekly' is not an exported object from 'namespace:ISLR'" when I was trying the following:
library(ISLR)
w <- ISLR::weekly
The problem is somehow fixed by changing it to:
w = ISLR::weekly
The = sign made all the difference here.

Trouble loading R package data

I've developed a R package but for some reason the dataset that goes with the package is not being loaded properly when the package is Roxygenised and installed. I have a .R script in the R folder of the package that looks like this
#' Score Card
#' @docType data
#' @name scoreCard
#' @aliases scoreCard
#' @format An object of class \code{data.frame} with 119 rows and 3 columns.
#' \describe{
#' \item{Category}{The Category for which an observation is made}
#' \item{Observation}{The possible responses given for each category}
#' \item{Score}{The score allocated against a response for each category}
#' }
#' @source Internal
#' @usage scoreCard
#' @keywords datasets
NULL
This creates an .Rd file for the dataset when Roxygenise is called, but when I try to call the dataset using packageName::scoreCard it states 'scoreCard' is not an exported object from 'namespace:packageName'. Can anyone spot what I might have done wrong in the above script, or suggest any other ideas about what might be going wrong? I'm at a bit of a loss. (The dataset is stored in the data folder of the package as per normal.) I'm afraid I can't share the data or the package, sorry.
Right I didn't realise I needed LazyData: true in my description file. Should have read this more carefully: http://r-pkgs.had.co.nz/data.html

Cannot find documented data

I'm creating a new package and documenting lookup tables stored in the data/ folder using an R script as per the instructions here http://r-pkgs.had.co.nz/data.html. I have two .rda tables, one for looking up the product based on a product code and another looking up the category based on the category code. (e.g. data/productlookup.rda)
Here's an excerpt of my documented data, which is stored in the R/ folder of the package.
#' ProductDecodes: Extract info from Product Codes
#'
#' This package contains functions for the extraction of information
#' from Product codes.
#'
#' @docType package
#' @name productDecoding
NULL
#' Product lookup
#'
#' @source internal
"productlookup"
#' Category lookup
#'
#' @source internal
"categorylookup"
However, when I come to roxygenise the package I get an error stating the .rda tables cannot be found.
Error in get(name, envir = env) : object 'productlookup' not found
The error doesn't occur when the tables are loaded into the global environment first. What I want to know is whether it is possible to roxygenise the package without having to load the .rda into the global environment first? I don't understand why roxygenise needs the lookup tables to be in the global environment in order to find them. Any help/explanation of why this error is occurring would be appreciated.
This is what I usually do and I've just tested successfully with roxygen2 5.0.1:
#' Product lookup
#'
#' @format A data.frame with 200 rows and 2 variables:
#' \itemize{
#' \item prod: product name
#' \item val: product value in US$
#' }
#'
#' @source internal
#' @name productlookup
NULL
Of course, the resulting help page needs more information.
