checking DESCRIPTION meta-information ... NOTE

I am developing a package in R and when I run devtools::check() I am getting the following note.
checking DESCRIPTION meta-information ... NOTE
Malformed Description field: should contain one or more complete sentences.
I am not using the name of the package or the word "package" in the description. I am also using complete sentences in the description, yet I am getting this NOTE repeatedly. So I am wondering what a complete sentence means in this case.

Try adding periods to the ends of the sentences; that is, turn your existing Description field into "Functions to analyze methylation data can be found here. A highlight of this workflow is the comprehensive quality control report."
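For instance, a minimal sketch using the desc package (an assumption on my part; you can just as well edit DESCRIPTION by hand). The field wording below is only an illustration of "complete sentences ending in periods"; run it from the package root:

library(desc)  # assumption: the desc package is installed
desc_set(Description = paste(
  "Functions to analyze methylation data can be found here.",
  "A highlight of this workflow is the comprehensive quality control report."
))
devtools::check()  # the Description NOTE should no longer appear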
I've always found it a bit curious that the Description field shouldn't contain the word "package" or the name of the package, and also requires complete sentences (which need both a subject and a verb, leading to grammatically awkward constructs if you want your introductory sentence to have a subject without using the no-no words). I'm pretty sure that, grammatically speaking, very few packages have truly "complete" sentences in their Description fields.
I'm pretty sure that it just checks for capital letters and periods, and I'd avoid any special characters, just to be on the safe side.
The Description field of my package on CRAN is "Reads river network shape files and computes network distances. Also included are a variety of computation and graphical tools designed for fisheries telemetry research, such as minimum home range, kernel density estimation, and clustering analysis using empirical k-functions with a bootstrap envelope. Tools are also provided for editing the river networks, meaning there is no reliance on external software." Definitely not the greatest, but apparently it worked and CRAN liked it!

Adding a full stop/period (.) at the end of the Description text will remove this note.

Related

Should R package functions not include comments?

I'm in the process of creating a small R package containing a set of functions that should be useful in a specialized area of biology. I currently have the package on GitHub, but want to submit it to CRAN soon. One thing I have noticed when digging around in other packages is that the code often includes no comments at all (e.g. short comments describing what different parts of the code do), which makes it more difficult to understand. I'm not a programmer or an expert in R, so I don't understand why comments are often not included, and Hadley Wickham's "R Packages" book makes no mention of this.
Edit: I'm not referring to the object documentation that one accesses with ?function, but to comments that are interspersed within the function code, which a normal user wouldn't see but which could be helpful for people trying to figure out exactly how a function works.
Is there a specific reason to not include comments within the functions of an R package? If so, should I remove all the comments from my code before submitting to CRAN?
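For what it's worth, here is a minimal sketch (a hypothetical function, loosely echoing the biology setting) of the two things being contrasted: the roxygen block is the object documentation a user sees via ?gc_content, while the inline comments in the body are only visible to someone reading the source.

#' Compute the GC content of a DNA sequence.
#'
#' @param seq a character string such as "ATGC".
#' @return The proportion of bases that are G or C.
#' @export
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]  # split the string into single characters
  mean(bases %in% c("G", "C"))              # fraction of bases that are G or C
}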

What does CRAN mean by "significant notes"?

In CRAN Repository Policy they write:
In principle, packages must pass R CMD check without warnings or significant notes to be admitted to the main CRAN package area.
The term "significant notes" seems a bit vague. Any idea what they mean by this in practice? Or perhaps the question should be turned around: what is an "insignificant note" in R CMD check that still allows publication on CRAN?
I have a feeling that "significant notes" might be those they mention indirectly (package size and processing time of examples), but I still feel a bit confused by this term. Hadley Wickham explains that each note has to be manually checked by a person, and therefore notes should either be eliminated or explained in the submission comments. The whole point is to save the time of volunteers, which is "CRAN's most precious resource", as they write. Following this advice, it would be nice to know which R CMD check notes will definitely lead to the rejection of a package.

Italian Stemmer alternative to Snowball

I'm trying to analyze texts in Italian in R.
As one does in textual analysis, I have removed all punctuation, special characters, and Italian stopwords.
But I have a problem with stemming: there is only one Italian stemmer (Snowball), and it is not very precise.
To do the stemming I used the tm library, in particular the stemDocument function; I also tried the SnowballC library, and both lead to the same result.
library(tm)  # stemDocument() is provided by tm (which uses the SnowballC stemmer)
stemDocument(content(myCorpus[[1]]), language = "italian")
The problem is that the resulting stemming is not very precise. Are there other, more precise Italian stemmers?
Or is there a way to extend the stemmer already available in the tm library by adding new terms?
Another alternative you can check out is the package from this person; he has it for many different languages. Here is the link for Italian.
Whether it will help your case or not is another matter, but this can also be implemented via the corpus package. A sample example (for an English use case; tweak it for Italian) is given in their documentation, under the Dictionary Stemmer section.
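For example, a minimal sketch of that dictionary-stemmer idea, assuming the corpus package's new_stemmer() and text_tokens() functions; the Italian term/stem pairs here are made up purely for illustration:

library(corpus)
# a tiny hand-made mapping from inflected forms to their roots (illustrative only)
dict <- data.frame(
  term = c("andiamo", "andavo", "libri", "libro"),
  stem = c("andare",  "andare", "libro", "libro"),
  stringsAsFactors = FALSE
)
my_stemmer <- new_stemmer(dict$term, dict$stem, default = NULL)  # unmatched tokens are left as-is
text_tokens("andiamo a comprare libri", stemmer = my_stemmer)

The same mapping could be built from an external lemma list and extended with new terms over time.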
Alternatively, similar to the above approach, you can also consider the stemmers or lemmatizers (if you haven't considered lemmatizers, they are worth considering) from Python libraries such as NLTK or spaCy and check whether you get better results. After all, they are just files containing mappings of root words to child words. Download them, fine-tune the file to your requirements, and use the mappings as you see fit by passing them through a custom-made function.

R package, size of dataset vis-a-vis code

I am designing an R package (http://github.com/bquast/decompr) to run the Wang-Wei-Zhu export decomposition (http://www.nber.org/papers/w19677).
The complete package is only about 79 kilobytes.
I want to supply an example dataset especially because the input objects are somewhat complex. A relevant real world dataset is available from http://www.wiod.org, however, the total size of the .Rdata object would come to about 1 megabyte.
My question therefore is, would it be a good idea to include the relevant dataset that is so much larger than the package itself?
It is not unusual for code to be significantly smaller than data. However, I will not be the only one to suggest the following (especially if you want to submit to CRAN):
Consult the Writing R Extensions manual. In particular, make sure that the data file is in a compressed format and use LazyData when applicable.
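For instance, a minimal sketch run from the package root (the object name wwz_example is made up as a stand-in for the real example data):

wwz_example <- data.frame(x = 1:3)  # stand-in for the real decomposition input
save(wwz_example, file = "data/wwz_example.rda", compress = "xz")  # strong compression
tools::resaveRdaFiles("data")  # re-compress any existing .rda files under data/
# and in DESCRIPTION:  LazyData: true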
The CRAN Repository Policies also have a thing or two to say about data files. There is a hard maximum of 5MB for documentation and data. If the code is likely to change and the data are not, consider creating a separate data package.
PDF documentation can also be distributed, so it is possible to write a "vignette" that is not built by running code when the package is bundled, but instead illustrates usage with static code snippets that show how to download the data. Downloading in the vignette itself is prohibited, as the manual states that all files necessary to build it must be available on the local file system.
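A minimal sketch of such a static chunk in a Sweave (.Rnw) vignette, where eval=FALSE keeps the code from running at build time; the URL path and file names are illustrative only:

<<get-data, eval=FALSE>>=
## shown to the reader as a recipe, never executed during R CMD build
download.file("http://www.wiod.org/path/to/data.zip", destfile = "wiod.zip")  # illustrative path
wwz_input <- read.csv(unz("wiod.zip", "wiod.csv"))                            # illustrative file name
@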
I would also ask whether including a subset of the data would be sufficient to illustrate the use of the package.
Finally, if you don't intend to submit to a package repository, I can't imagine a megabyte download being a breach of etiquette.

How to cross-reference an equation in an R help file/roxygen2

I'm in the process of documenting some of my functions for an R package I'm making.
I'm using roxygen markup, though that is largely irrelevant to my question.
I have put equations into my documentation using \deqn{...}. My question is:
Is there a way to cross-reference this equation later on?
For example, in my Rd file:
\deqn{\label{test}
y = mx + b
}
Can I later do something like:
Referring to equation \ref{test}, ...
I've tried \eqref{test} and \ref{test} (which both give "unknown macro" errors and don't get linked), and also \link{test} (which complains that it can't find function test, because it's really just for linking to other functions).
Otherwise I fear I may have to do something hacky and add the "-- (1)" and "Refer to equation (1)" manually within the \deqn etc. in the Rd file...
Update
General answer appears to be "no". (awww...)
However, I can write a vignette and use "normal" LaTeX and packages there. In any case, I've just noticed that the matrix equations I spent ages putting into my roxygen/Rd file look awful in the ?myFunction version of the help (they show up as just-about literal LaTeX source), which is a shame, because they look beautiful in the PDF version of the help.
@Iterator has pointed out the existence of conditional text, so I'll do ASCII maths in the .Rd files, but LaTeX maths in the PDF manual/vignette.
I'm compiling my comments above into an answer, for the benefit of others.
First, I do not actually know whether or not .Rd supports tagging of equations. However, the .Rd format is such a strict subset of LaTeX, and produces very primitive text output, that shoehorning extensive equations into its format could be a painful undertaking without much benefit to the user.
The alternative is to use package vignettes, or even externally hosted documentation (as is done by Hadley Wickham, for some of his packages). This will allow you to use PDFs or other documentation, to your heart's content. In this way, you can include screenshots, plots, all of the funkiest LaTeX extensions that only you have, and, most significantly, the AMS extensions that we all know and love.
Nonetheless, one can specify different rendering of a given section of documentation (in .Rd) based on the interface, such as text for the console, nice characters for HTML, etc., and conditional text supports that kind of format variation.
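As a concrete illustration, here is a minimal sketch in roxygen comments (a hypothetical function; the "(1)" numbering is written by hand, since Rd offers no \label/\ref mechanism). The two-argument forms \deqn{latex}{ascii} and \eqn{latex}{ascii} give LaTeX maths in the PDF manual and plain ASCII in the text help, and \if{latex}{...} is one of the conditional-text macros mentioned above:

#' Fit a straight line.
#'
#' @details
#' The model is
#' \deqn{y = m x + b \quad (1)}{y = m*x + b    (1)}
#' Referring to equation (1), the slope \eqn{m}{m} is estimated by least squares.
#' \if{latex}{The PDF manual typesets equation (1) with full LaTeX.}
#' @param x numeric predictor.
#' @param y numeric response.
#' @export
fit_line <- function(x, y) lm(y ~ x)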
It's a good question. I don't know the answer regarding feasibility, but I had similar questions about documenting functions and equations together, and this investigation into what's feasible with .Rd files has convinced me to use PDF vignettes rather than .Rd files.
