Change default code highlight in Sphinx documentation - r

I'm writing a small R guide with Sphinx.
The default code highlighting language is Python, so I have to add .. highlight:: r at the beginning of the file in order to set R as the default code language.
Is there a way to change this option in the conf.py file?
I tried editing pygments_style = 'sphinx', but without success.

Have you tried setting this variable in conf.py?
highlight_language = 'r'
(or 'R'?)
https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-highlight_language
Note that R is easy to overlook in the Pygments language list (http://pygments.org/languages/): it is filed under the S / S-Plus / R lexer, whose aliases are splus, s and r.
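A minimal conf.py sketch of the setting suggested above. It assumes that the Pygments version installed alongside Sphinx resolves the alias r to its S/R lexer. The pygments_style line is only there to show that the two options are independent: one picks the color theme, the other picks the default language.

```python
# conf.py (excerpt): a sketch, not a complete Sphinx configuration.

# Default language for literal blocks that do not name one explicitly.
# Assumption: the installed Pygments knows the lexer alias "r".
highlight_language = "r"

# Color theme for highlighted code; this does not select a language.
pygments_style = "sphinx"
```

With this in place, the per-file .. highlight:: r directive should no longer be needed, although it can still be used to override the default in individual files.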

Related

how to change tesseract config to recognize § and apply with pdftools::pdf_ocr_text in R?

I am using pdftools in R to extract text from both scanned and text based PDF files. One problem is with the § character. This is not recognized by tesseract.
I looked at the following links:
CRAN tesseract package vignette
SO link of a similar question
and this github page
And I tried the following:
I found the configuration files using tesseract_info() and edited the digits file under configs.
The digits file content was like this:
tessedit_char_whitelist 0123456789.
After editing it looks like this:
tessedit_char_whitelist 0123456789-$§.
This did not change anything at all, I am still not able to extract §. They still appear as 8.
After the 1st step failed, I tried the following:
filepng <- pdftools::pdf_convert(filePathPDF, dpi = 600)
specs <- tesseract("deu", options = list(tessedit_char_whitelist = "1234567890-.,;:qwertzuiopüasdfghjklöäyxcvbnmQWERTZUIOPÜASDFGHJKLÖÄYXCVBNM#߀!$%&§/()=?+"))
text <- tesseract::ocr(filepng, engine = specs)
This one failed too. I am by no means an expert on OCR and tesseract has room for improvements when it comes to documentation.
How can I add § to the list of characters to be recognized in the right way, so that it applies?
Update
The following works to recognize §, when I remove language from the argument list:
charlist <- tesseract(options = list(tessedit_char_whitelist = " 1234567890-.,;:qwertzuiopüasdfghjklöäyxcvbnmQWERTZUIOPÜASDFGHJKLÖÄYXCVBNM#߀!$%&§/()=?+"))
text <- tesseract::ocr(filepng, engine = charlist)
But this time, I am losing German umlauts. I cannot find out how I can specify the language and the char_whitelist at the same time. According to the documentation, tesseract() accepts language argument and options argument. But this does not seem to work. Any ideas?
Update:
I tried using tesseract in command line (MacOS Catalina 10.15.7).
I converted a scanned PDF file first to an image then used this:
tesseract fileConverted.tiff fileToText
It creates fileToText.txt. It does recognize §. All of them are correctly recognized. But German umlauts are not recognized correctly, since I did not specify language at all. When I use the same command with the language argument
tesseract fileConverted.tiff fileToText -l deu
German umlauts are recognized properly but § is not.
The digits config file I changed is here:
/usr/local/Cellar/tesseract/4.1.1/share/tessdata/configs
My understanding is: it is not a problem specific to R, but it occurs with tesseract itself. Setting tessedit_char_whitelist and the language at the same time does not seem to be possible or I am missing something horribly.
tesseract 4 does not support setting a whitelist. To work around that problem, you can use command-line switches: set the OCR Engine Mode to "Original Tesseract only" with --oem 0, then pass your whitelist directly with -c tessedit_char_whitelist=abc...
Overall, it should look something like this (the whitelist is quoted so that the shell leaves the $ alone):
tesseract fileConverted.tiff fileToText --oem 0 -l deu -c 'tessedit_char_whitelist=0123456789-$§'
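If the command is launched from a script rather than typed into a shell, passing the arguments as a list sidesteps shell quoting of characters such as $ altogether. The sketch below only assembles the same command line as above. The helper name build_tesseract_cmd is hypothetical, and the flags are the ones quoted in this answer.

```python
from typing import List


def build_tesseract_cmd(image: str, out_base: str, lang: str, whitelist: str) -> List[str]:
    """Assemble the tesseract call from the answer as an argument list.

    Hypothetical helper: it only builds the list, it does not run anything.
    """
    return [
        "tesseract", image, out_base,
        "--oem", "0",  # "Original Tesseract only" engine mode
        "-l", lang,    # recognition language, e.g. "deu"
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


cmd = build_tesseract_cmd("fileConverted.tiff", "fileToText", "deu", "0123456789-$§")

# To actually run it (requires tesseract on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Because no shell is involved when a list is handed to subprocess.run, the whitelist reaches tesseract exactly as written.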

Notepad++ not recognizing R language

I was reviewing a colleague's .rmd file today and realized that the syntax styling was not working. I am new to this way of writing R and wanted to read it in the form I am used to, so I renamed the script to .r to enable the styling.
The styling appeared, but at some point it stopped working again. Now I can't get Notepad++ to recognize my R scripts (files with a .r extension) and can't figure out why.
Should R be a choice in the language selection drop-down menu?
When I go to Settings -> Style Configurator, the recent style is selected, but the colors (for the comments, for instance) don't match: they should be green and they are not. I am using Notepad++ 6.5.5. I would update, but our IT department makes that very difficult.
Remove the r extension from the REBOL entry in the file langs.xml (path: user\AppData\Roaming\Notepad++\langs.xml).
Line 301 by default is:
<Language name="rebol" ext=" r reb" commentLine=";" commentStart="" commentEnd="">
The broken part is ext=" r reb". It should just say ext="reb".
Change that, save the file, and you are all set.
See:
https://notepad-plus-plus.org/community/topic/14363/r-file-association-messed-up
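The same edit can be scripted. The sketch below is a hypothetical helper (drop_r_from_rebol is not part of Notepad++) that removes r from the ext attribute of the rebol entry and leaves every other Language element alone. It assumes langs.xml uses Language elements with name and ext attributes, as in the line quoted above. Note that Python's ElementTree drops XML comments and the XML declaration when it writes the document back out, so back up the file first.

```python
import xml.etree.ElementTree as ET


def drop_r_from_rebol(xml_text: str) -> str:
    """Return xml_text with the 'r' extension removed from the rebol entry."""
    root = ET.fromstring(xml_text)
    for lang in root.iter("Language"):
        if lang.get("name") == "rebol":
            # ext is a whitespace-separated list of extensions, e.g. " r reb"
            kept = [e for e in lang.get("ext", "").split() if e.lower() != "r"]
            lang.set("ext", " ".join(kept))
    return ET.tostring(root, encoding="unicode")


# A shortened stand-in for langs.xml, for illustration only
sample = (
    "<NotepadPlus><Languages>"
    '<Language name="rebol" ext=" r reb" commentLine=";" />'
    '<Language name="r" ext="r s splus" commentLine="#" />'
    "</Languages></NotepadPlus>"
)
patched = drop_r_from_rebol(sample)
```

Close Notepad++ before replacing the real file so that it does not overwrite the change on exit.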
1. Check whether the checkbox Enable global foreground colour is off in Settings -> Style Configurator -> Global Styles -> Global override.
2. Try changing the color theme.
3. Check R's syntax colors in Style Configurator.

Change default skeletons for RStudio

RStudio has a wonderful set of skeletons for packages and Rmd documents. But, I'd like to know if it's possible to change the defaults to a "skeleton" of your own design. If, like me, you package your research for yourself/clients, you quickly find yourself deleting and copying the same work over and over.
I suppose there are two related questions here:
Can you change the default package skeleton?
Can you change the default Rmd skeleton?
There is no supported way to do this. However, the skeletons are stored as ordinary files in your filesystem, so there's nothing stopping you from modifying them. For instance, if you're on the Mac, this file provides the default Rmd skeleton:
/Applications/RStudio.app/Contents/Resources/resources/templates/r_markdown_v2.Rmd
On Windows, it's here:
C:\Program Files\RStudio\resources\templates\r_markdown_v2.Rmd
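Since this is an unsupported modification, it seems prudent to keep a copy of the stock file before overwriting it. The sketch below uses the two paths quoted above. The helper names and the .bak suffix are hypothetical, the keys match what platform.system() returns on macOS and Windows, and writing into the installation directory will usually need administrator rights.

```python
import shutil
from pathlib import Path

# Template locations as given in the answer above
TEMPLATE_PATHS = {
    "Darwin": "/Applications/RStudio.app/Contents/Resources/resources/templates/r_markdown_v2.Rmd",
    "Windows": r"C:\Program Files\RStudio\resources\templates\r_markdown_v2.Rmd",
}


def default_rmd_skeleton(system: str) -> str:
    """Look up the stock Rmd skeleton path for a platform.system() value."""
    return TEMPLATE_PATHS[system]


def replace_skeleton(template: Path, custom: Path) -> Path:
    """Back up the stock template once, then overwrite it with a custom one."""
    backup = template.with_name(template.name + ".bak")
    if not backup.exists():  # never clobber an earlier backup
        shutil.copy2(template, backup)
    shutil.copy2(custom, template)
    return backup
```

A call might look like replace_skeleton(Path(default_rmd_skeleton("Darwin")), Path("my_skeleton.Rmd")), where my_skeleton.Rmd is a placeholder for your own file. Reinstalling or updating RStudio may restore the stock template.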
I didn't realize it then, but I was actually looking for a custom format, the details of which are documented extensively on the R Markdown site:
http://rmarkdown.rstudio.com/developer_custom_formats.html

Auto-format R code in RStudio

Is there any way to auto-format code in RStudio?
I found this, but it is not connected with RStudio.
Ideally, the formatting would also be customizable.
update: June-22-2018
Thank you @Lorenz @kirill @yuhi for the styler package. I have used it for a while. The simplest approach after installing the package is to go to
Addins --> Style active file
Customization options via the interface would give some control over the styling we prefer.
RStudio can now format code to look neat. Select the lines of interest and then navigate to Code >> Reformat Code, or use the keyboard shortcut Ctrl + Shift + A.
Or just run the style directory command to style all the files in the directory:
styler::style_dir()
update:
This is a good way to restructure the code, but it breaks the line at every comma between the elements of a vector. For a few elements this is OK, but for a vector with many elements it is overkill:
x <- c(
  "p.G12C",
  "p.F121S",
  "p.P124S",
  "p.P124L",
  "p.E13D",
  "p.E203K",
  "p.Q209P",
  "p.Q209P",
  "p.Q209L"
)
Update: RStudio Version 0.99.893
A new feature, RStudio Addins, has been added. With it, you can now add @yuhi's formatR as an Addin. This is a tidier and cleaner way to structure code than the built-in Code >> Reformat Code. However, the drawback is that the Addin Reformat R Code throws an error for Shiny code.
First CTRL+A, then CTRL+SHIFT+A.
If on a Mac, use ⌘ instead of CTRL.
Go to the Code menu and select
Reindent Lines
Under my OS, this has the shortcut Ctrl + I.
The package styler can format R code, and you can access it via an RStudio Addin that allows formatting the active file, the highlighted code, the package and more. A distinguishing feature is its flexibility, as the transformation of code according to a style guide is done separately from specifying the style guide. This allows styling according to an arbitrary style guide. As of version 1.2.0, this also holds for the Addin.
We've implemented the tidyverse style guide while allowing for quite some flexibility in styling. Also, the pipe, tidyeval syntax and more are handled properly. You can read an introduction in this blog post.
If you don't want to follow the tidyverse style guide, you can have a look at the vignette 'Customizing Styler' that describes how you can implement an arbitrary style guide. In this vignette, I show how you can implement a style guide consisting of one rule: Always break the line before {. Hope that helps.
Disclosure: I am the maintainer of styler.
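Use the formatR package (see documentation):
install.packages("formatR")
library("formatR")
# tidy_file() reformats the script and overwrites it, so back it up first.
# (tidy_eval() would instead evaluate the code and print its output.)
tidy_file("filename.R")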
To add to the great answers that were already given: You can use the styler package in combination with the shrtcts package to enable Format on Save which is still not officially supported by RStudio.
Use the command shrtcts::edit_shortcuts() in the RStudio Console to open the file where you define your custom shortcuts.
Paste the following code inside that file (set your preferred keybinding in the @shortcut line).
#' Format on Save
#'
#' @description
#' Format Document with styler Package and Save Document.
#' @interactive
#' @shortcut Cmd+S
function() {
  # format only .R, .Rmd and .qmd files, but save all file types
  file_type <- tools::file_ext(rstudioapi::getActiveDocumentContext()$path)
  if (file_type %in% c("R", "Rmd", "qmd")) {
    styler:::style_active_file() |>
      capture.output() |>
      invisible()
  }
  rstudioapi::documentSave() |>
    capture.output() |>
    invisible()
}
This solution uses the native pipe |> and thus requires R 4.1 or later.
You can of course just define separate variables in each line or use the magrittr pipe if you use earlier versions of R.
Use the command shrtcts::add_rstudio_shortcuts(set_keyboard_shortcuts = TRUE) in the RStudio Console to add the new shortcut with its assigned keybinding. Then restart RStudio.
With this configuration, pressing Cmd+S formats the active .R, .Rmd or .qmd document with the styler package and then saves the formatted version.
Files of all other types are saved without formatting, but you could easily extend the code above with a package that formats e.g. .md or .py files as well.
There are cases where this approach does not have the desired effect: for instance, it does not work for new Untitled files or when your current R session is busy.

Referencing figures with numbers in Sphinx and reStructuredText

When writing RST that will be processed with Sphinx, I can't get Sphinx LaTeX output to use figure numbers when referencing figures. For instance, this code:
The lemmings are attacking, as can be seen in :ref:`figlem`.

.. _figlem:

.. figure:: _static/lemming_invasion.*

   They're coming!

Will be converted into this:
The lemmings are attacking, as can be
seen in They're coming!
/image goes here/
Figure 1.1: They're coming!
But what I want is the "standard" LaTeX way of referencing figures, like this:
The lemmings are attacking, as can be
seen in Figure 1.1
How do I achieve this? The code I'm currently using is what the Sphinx manual recommends, but it doesn't produce the output I want.
In recent versions of Sphinx (1.3+), numbering figures and referencing them from the text got a bit easier, as support for it is now built in.
In your text, you can do something like:

.. _label:

.. figure:: images/figure.*

At :numref:`label` you can see...

The end result should be something like "At Fig. 1.1 you can see...". This technique works both with the default HTML output and the LaTeX output.
In your conf.py file, make sure to set the flag numfig = True. There are also configuration options for the references' text format (numfig_format and numfig_secnum_depth).
References:
http://www.sphinx-doc.org/en/stable/config.html#confval-numfig
https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#cross-referencing-figures-by-figure-number
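A conf.py sketch of the options named above. The values are illustrative assumptions and are not taken from the original answer: the "Figure %s" format is chosen so that :numref: renders as "Figure 1.1" (the wording asked for in the question) instead of the default "Fig. 1.1".

```python
# conf.py (excerpt): a sketch, not a complete Sphinx configuration.

# Turn on automatic numbering so that :numref: has a number to show.
numfig = True

# Text used for references; %s is replaced by the number.
# Assumption: entries not listed here keep their Sphinx defaults.
numfig_format = {"figure": "Figure %s"}

# How many section levels prefix the number (1 gives "1.1", "1.2", ...).
numfig_secnum_depth = 1
```

Section-prefixed numbers such as 1.1 only appear when the sections themselves are numbered, for example through the :numbered: option of the toctree directive.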
The numfig extension does exactly this. I tried it and it worked for me.
To expand on the accepted answer, you can quickly get this set up as follows. Put the numfig.py file in your source directory. Then open conf.py and uncomment the line that says
sys.path.insert(0, os.path.abspath('.'))
Then add 'numfig' to the extensions list.
To use it in your rst document, first label your figure (e.g., fig-main):

.. _fig-main:

.. figure:: main.png

   This is the figure caption.

Finally, you can reference its figure number using the :num: role, like this:
Refer to the main figure (Figure :num:`fig-main`).
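Put together, the conf.py changes for this third-party numfig extension might look like the sketch below. It assumes numfig.py has been copied next to conf.py, as described above, and that no other extensions are in use. If your conf.py already defines an extensions list, append 'numfig' to it instead of replacing it.

```python
# conf.py (excerpt): a sketch, not a complete Sphinx configuration.
import os
import sys

# Make modules in the source directory (including numfig.py) importable.
sys.path.insert(0, os.path.abspath("."))

# Load the third-party numfig extension by module name.
extensions = ["numfig"]
```

The :num: role comes from this extension and is separate from the numfig = True option and the :numref: role built into Sphinx 1.3 and later.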
I think referencing Figures is not yet implemented in reST, but here is workaround http://article.gmane.org/gmane.text.docutils.user/5623 that gets you closer.
One can use raw LaTeX code inline. For the example above, a role for raw LaTeX code is first defined and then used to refer to the figure with the \ref{} LaTeX command, and to attach a label to the figure with the \label{} LaTeX command.
The following should work:

.. role:: raw-latex(raw)
   :format: latex

The lemmings are attacking, as can be seen in :ref:`figlem`
on figure :raw-latex:`\ref{pic:lem}`.

.. _figlem:

.. figure:: _static/lemming_invasion.*

   They're coming! :raw-latex:`\label{pic:lem}`

Note that the \label{} command will appear inside the caption in the tex file, but it is still acceptable, at least by pdflatex. Also note that there should be at least one space before :raw-latex.
