Dynamic Rstudio Code Snippet - r

I tend to use a lot of line breaks in my code like the following:
# Data =========================================================================
where the entire comment is always 80 characters long (including the hashtag). What I would like to do is write a code snippet for RStudio that inserts the hashtag, then a space, then lets the user type a series of words, then inserts another space, and finally fills in "=" characters until the 80-character limit is reached.
I'm not familiar with how snippets work at all so I'm not sure how difficult this is.
I have this much:
snippet lb
# ${1:name}
but I have no idea how to add a dynamic number of "=" signs. Also, lb = linebreak.

You can't do this with snippets, unfortunately; a snippet is a text template that contains fixed text with slots for user-inserted text.
There is a command built into RStudio to do something very similar, however; from the Code menu, choose Insert Section (or Ctrl+Shift+R). This will do exactly what you're describing, with two small differences:
The line will extend to 5 characters before the print margin (you can adjust the print margin in Tools -> Global Options -> Code).
The line is composed of - rather than = characters.
One advantage to sections marked in this way is that you can use them to fold and navigate inside the file (look at the editor status bar after adding one).

You can use the rstudioapi (which can return column position) inside the snippet to get something like what you want.
Below is a snippet I use called endhead. I use it by commenting my header title and then applying the snippet, eg:
# Section name endhead
which results in:
# Section name -----------------------------------------------------------------
snippet endhead
`r paste0(rep.int("-", 88 - rstudioapi::primary_selection(rstudioapi::getActiveDocumentContext())$range$start[2]), collapse = "")`

You can write a snippet to manipulate text (somewhat). I wrote the snippet below to do something similar to what you want to do. I'm still ironing out the issues (just asked this question).
snippet comm
`r paste0(
"#######################################><###################\n## ",
date(),
" -------------------------------\n## ",
eval(
paste0(
gsub(
".{1,51}\\s?\\K\\b",
"\n## ",
gsub("\\.", " ", paste0(text)),
perl = T
)
)
),
"###################################><###################\n"
)`
I think if you write an R code snippet using an anonymous function that accepts text input via $$, counts the nchar in the text, calculates the number of -'s needed at the end, and then uses eval(paste0()) to insert the comment you should be able to make it work. I'll post a comment or answer here if I figure it out. Please do the same on my question if you get it to work. Thanks. (P.S. Go Badgers!)
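The arithmetic described above (count the characters typed so far, then fill the rest of the line) can be sketched as a plain R function, independent of the snippet machinery. The name pad_header and its arguments are hypothetical, and the width of 80 is taken from the question; treat this as a minimal sketch, not a finished snippet.

```r
# Hypothetical helper: pad a header comment to a fixed width.
pad_header <- function(title, width = 80, fill = "=") {
  prefix <- paste0("# ", title, " ")
  # Number of fill characters needed to reach the target width (never negative).
  n_fill <- max(width - nchar(prefix), 0)
  paste0(prefix, strrep(fill, n_fill))
}

line <- pad_header("Data")
cat(line, "\n")
nchar(line)
```

If the title is already longer than the width, the sketch simply adds no fill characters rather than failing.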

Inspired by nick's answer above I designed two snippets that allow the user to choose what level section to insert.
The first will fill-in the rest of the line with #, =, or -.
snippet end
`r strrep(ifelse(substr("$$", 1, 1) %in% c("-", "="), substr("$$", 1, 1), "#"), 84 - rstudioapi::primary_selection(rstudioapi::getActiveDocumentContext())$range$start[2])`
Just specify the character you want to use after end (will default to # if nothing or any other character is given). For example:
## Level 1 Header end<shift+tab>
## Level 2 Header end=<shift+tab>
## Level 3 Header end-<shift+tab>
end<shift+tab>
end=<shift+tab>
end-<shift+tab>
Produces the following lines:
## Level 1 Header ##############################################################
## Level 2 Header =============================================================
## Level 3 Header -------------------------------------------------------------
################################################################################
===============================================================================
-------------------------------------------------------------------------------

Similarly to what Josh was suggesting, the following snippet uses the $$ notation to pass the text following the snippet as described here.
snippet !
`r paste("##", substr("$$", 4, nchar("$$")), strrep(substr("$$", 2, 2), 79-nchar("$$")))`
Again this allows user to select the section level (#, =, or -). The first character after !# should be the header level character you want followed by a space and the header text. For example:
!## Level 1 Header<shift+tab>
!#= Level 2 Header<shift+tab>
!#- Level 3 Header<shift+tab>
Produces the following lines:
## Level 1 Header ##############################################################
## Level 2 Header ==============================================================
## Level 3 Header --------------------------------------------------------------
I prefer the end snippet above because it is more robust and only allows the characters #, =, or - to be inserted, whereas ! will accept any character (see the example below). On the other hand, ! is shorter and, I think, easier to understand than calls to the rstudioapi.
!loon<shift+tab>
## n ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
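One way to keep the brevity of ! while restricting the fill character, as end does, would be to validate the second character before building the line. The function below is a hypothetical plain-R sketch of that logic (section_line is not part of any package); inside a snippet, the "$$" placeholder would take the place of the input argument, and I have not tested it in that form.

```r
# Hypothetical sketch: same parsing as the `!` snippet, plus a whitelist.
section_line <- function(input, width = 80) {
  fill <- substr(input, 2, 2)
  # Fall back to "#" unless the fill character is one of the allowed ones.
  if (!fill %in% c("#", "=", "-")) fill <- "#"
  title <- substr(input, 4, nchar(input))
  prefix <- paste0("## ", title, " ")
  paste0(prefix, strrep(fill, max(width - nchar(prefix), 0)))
}

section_line("#= Level 2 Header")
section_line("loon")  # the "o" is rejected, so the line is filled with "#"
```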

Related

How to add a line in the PDF generated by `exams` using `exams2nops`?

We are generating a PDF through exams2nops using items in blocks of choice, and we would like to delimit the blocks in the PDF by adding a horizontal line after the last exercise of each block. With that in mind we added a ***, ---, or <hr/>; however, the behavior was always the same:
I would like a single line without adding the exercise number that's next in the exam:
It is not so easy to solve this by putting the horizontal line into the exercise file. The reason is that the line is needed after the answerlist but the answerlist is not formatted in the exercise but by exams2nops.
A workaround is to tweak the definition of the {question} environment in the LaTeX template used by exams2nops. By default this is simply:
\newenvironment{question}{\item}{}
where \item is executed at the beginning of the {question} and nothing at the end of it. Changing this to
\renewenvironment{question}{\item}{\hrulefill}
would insert a horizontal line after every question. If you just want it after selected questions you need to insert if/else statements for certain enumerated items. For example, for inserting the horizontal rule after the second item only, you can redefine:
\renewenvironment{question}{\item}{\ifnum\value{enumi}=2 {\hrulefill} \else {} \fi}
Thus, you get the enumi counter from the {enumerate} environment that you use and compare it with 2. If true, you insert the horizontal line, and otherwise you do nothing.
Adding escapes for the backslashes you can pass this re-definition to exams2nops through the header argument:
exams2nops(c("swisscapital", "switzerland", "tstat2", "deriv2"),
header = "\\renewenvironment{question}{\\item}{\\ifnum\\value{enumi}=2 {\\hrulefill} \\else {} \\fi}")
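If several blocks need a rule, the same redefinition can be generated rather than typed by hand. The sketch below only builds the header string; hrule_header is a hypothetical helper name, it simply repeats the \ifnum test shown above once per item number, and I have not checked the combined definition against exams2nops itself.

```r
# Hypothetical helper: build a header that draws a rule after the given items.
hrule_header <- function(items) {
  # One \ifnum test per item number, using the same form as the answer above.
  checks <- paste0(
    "\\ifnum\\value{enumi}=", items, " {\\hrulefill} \\else {} \\fi",
    collapse = ""
  )
  paste0("\\renewenvironment{question}{\\item}{", checks, "}")
}

hdr <- hrule_header(c(2, 4))
cat(hdr, "\n")
```

The resulting string would then be passed as header = hdr in the exams2nops() call, in the same way as the hand-written version.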
The resulting output is:

How to extract sections of specific text from PDF files into R data frames? Complex

Any advice would be appreciated; this is time sensitive. I have PDF reports that are mostly blocks of text. They are long reports (~50-100 pages). I'm trying to write an R script that is capable of extracting specific sections of these PDF reports using start/stop positional strings. NOTE: Reports vary in length. Short example:
DOCUMENT TITLE
01. SECTION 1
This is a test section that I DONT want to extract.
This text would normally be much longer... Over 100 words.
Sample Text Text Text Text Text Text Text Text
02. SECTION 2
This is a test section that I do want to extract.
This text would normally be much longer... Over 100 words.
Sample Text Text Text Text Text Text Text Text
...
11. SECTION 11
This is a test section that I do want to extract.
This text would normally be much longer... Over 100 words.
Sample Text Text Text Text Text Text Text Text
...
12. SECTION 12
This is a test section that I DONT want to extract.
This text would normally be much longer... Over 100 words.
Sample Text Text Text Text Text Text Text Text
...
So the goal in this example is to extract the paragraph below Section 2 and store it as a field/data point. I also want to store Section 11 as a field/data point. Note that the document is in PDF format.
I have tried using pdftools, tm, and stringr, and I've literally spent 20+ hours searching for solutions and tutorials on how to do this. I know it is possible as I have done it using SAS before...
Please see the code below; I added comments with questions. I believe RegEx will be part of the solution but I'm so lost.
# Init Step
libs <- c("tm","class","stringr","testthat",
"pdftools")
lapply(libs, require, character.only= TRUE)
# File name & location
filename = "~/pdf_test/test.pdf"
# converting PDF to text
textFile <- pdf_text(filename)
cat(textFile[1]) # Text of pg. 1 of PDF
cat(textFile[2]) # Text of pg. 2 of PDF
# I'm at a loss as to how to parse the values I want. I have seen things like:
# sectionxyz <- str_extract_all(textFile, ???)
# rm_between()
# 1) How do I loop through each page of PDF file?
# 2) How do I identify start/stop positions for section to be extracted?
# 3) How do I add logic to extract text between start/stop positions
# and then add the result to a data field?
# 4) Sections in PDF will be long sections of text (i.e. 100+ words into a field)
NEW------
So I have been able to:
-Prep doc correctly
-Identify the correct start/stop patterns:
length(grep("^11\\. LIMITS OF LIABILITY( +){1}$",source_main2))
length(grep("Applicable\\s+[Ll]imits\\s+[Oo]f",source_main2))
pat_st_lol <- "^11\\. LIMITS OF LIABILITY( +){1}$"
pat_ed_lol <- "Applicable\\s+[Ll]imits\\s+[Oo]f"
The length(grep()) statements verify only 1 instance is being found. From here I am kind of lost based on how to use gsub or similar to extract the portion of data I want. I tried:
pat <- paste0(".*",pat_st_lol,"(.*)",pat_ed_lol,".*")
test <- gsub(".*^11\\. LIMITS OF LIABILITY( +){1}$(.*)\n",
"Applicable\\s+[Ll]imits\\s+[Oo]f", source_main2)
test2 <-gsub(".*pat_st_lol(.*)\npat_ed_lol.*")
So far, little progress, but progress anyways.
Provided you can come up with a systematic way to identify the sections you need, you could, as you indicated, use Regex to extract the text you want.
In your above example, something like gsub(".*SECTION 11(.*)\n12\\..*","\\1",string) ought to work.
Now you could define patterns dynamically using paste and iterate through all files. Each result can then be saved in your data.frame, list,....
Here is a brief more detailed explanation of the pattern:
Firstly, .* is a way of matching "anything". If you want to match digits you can use \\d or equivalently [0-9]. Here is a short intro to Regex in R (which I found to be quite useful) where you can find several character classes.
.* at the edges of the pattern means that there can be text before/after
(.*) denotes the content we want (so here matching any content as .* is used). Basically it means extract "anything" between SECTION 11 and 12.
\\. means the dot and \n is the "newline" metacharacter (as before "12.", a new line is started)
In Regex you can create groupings within your pattern using parentheses, e.g. gsub(".*(\\d{2}\\:\\d{2})", "\\1","18.05.2018, 21:37") will return 21:37, and gsub("([A-Za-z]) \\d+","\\1","hello 123") will give hello.
Now the second argument in gsub can be, and often is, used to provide a substitute, i.e. something to replace the matched pattern with. Here, however, we do not want any substitute; we want to extract something. \\1 means extract the first grouping, i.e. what is inside the first pair of parentheses (you could have multiple groupings).
Finally, string is the string from which we want to extract, i.e. the text of the PDF file.
Now if you want to perform something similar in a loop you could do the following:
# we are in the loop
# first is your starting point in the extraction, i.e. "SECTION 11"
# last is your end point, i.e. "12."
first <- "SECTION 11" # first and last can be dynamically assigned
last <- "12\\." # "\\" is added before the dot as "." is a Regex metachar
# If last doesn't systematically contain a dot
# you could use gsub to add "\\" before the dot when needed:
# gsub("\\.","\\\\.",".") returns "\\."
# so gsub("\\.","\\\\.","12.") returns "12\\."
pat <- paste0(".*",first,"(.*)","\n",last,".*") #"\n" is added to stop before the newline, but it could be omitted (then "\n" might appear in the extraction)
gsub(pat,"\\1",string) # returns the same as above
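To connect this back to the questions about looping over pages and storing the results, here is a minimal sketch under a few assumptions: the character vector pages stands in for the output of pdf_text() (one string per page), the pages are collapsed into a single string so that a section may span a page break, and extract_between is a hypothetical helper wrapping the pattern above. It relies on R's default regex engine, where "." also matches newlines.

```r
# Hypothetical helper: return the text between a start and a stop pattern.
extract_between <- function(text, first, last) {
  pat <- paste0(".*", first, "(.*)\n", last, ".*")
  # Return NA instead of the whole document when the pattern is not found.
  if (!grepl(pat, text)) return(NA_character_)
  trimws(sub(pat, "\\1", text))
}

# Made-up stand-in for pdf_text(filename): one element per page.
pages <- c(
  "DOCUMENT TITLE\n01. SECTION 1\nSkip this text.\n02. SECTION 2\nKeep this text.",
  "03. SECTION 3\nSkip this too.\n11. SECTION 11\nKeep this as well.\n12. SECTION 12\nSkip again."
)
doc <- paste(pages, collapse = "\n")

# One row per extracted section.
sections <- data.frame(
  section = c("02", "11"),
  text = c(
    extract_between(doc, "02\\. SECTION 2", "03\\."),
    extract_between(doc, "11\\. SECTION 11", "12\\.")
  ),
  stringsAsFactors = FALSE
)
print(sections)
```

The grepl() check matters because sub() returns its input unchanged when nothing matches, which would otherwise put the entire document into the data frame.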

R indent output

is it possible to indent output in R?
e.g.
cat("text1\n")
indent.switch(indent=4)
cat("random text\n")
print("another random text")
indent.switch(indent=0)
cat("text2\n")
resulting in
text1
    random text
    another random text
text2
I searched for this a few months ago, found nothing and am now searching again.
My current idea is to "overwrite" (I forgot the special term) the functions cat and/or print with an additional argument like:
cat("random text", indent=4)
Only I'm stuck with this and I don't like this procedure very much.
Any ideas?
Edit:
I should be more particular, nevertheless thank you for the \t (omg, i totally forgot this -.-) and that I can format it inside cat.
The given solutions work, but only solve my second-choice-path.
A switch as shown in my first codeexample does not exist I suppose?
My problem is that I have parts of a bigger program which have multiple subscripts, and the output of each subscript should be indented. This is absolutely possible with "\t" or just blanks inside cat(), but it has to be done in every command, which I don't like very much.
Solution
I used Chris C's code and extended it in a very easy way. (Thank you very much Chris!)
define.catt <- function(ntab = NULL, nspace=NULL){
catt <- function(input = NULL){
if(!is.null(ntab)) cat(paste0(paste(rep("\t", ntab), collapse = ""), input))
if(!is.null(nspace)) cat(paste0(paste(rep(" ", nspace), collapse = ""), input))
if(is.null(ntab) && is.null(nspace)) cat(input)
}
return(catt)
}
The same way you used \n to print a newline, you can use \t to print a tab.
E.g.
cat("Parent level \n \t Child level \n \t \t Double Child \n \t Child \n Parent level")
Evaluates to
Parent level
         Child level
                 Double Child
         Child
 Parent level
As an alternative, you can create a derivative of cat called catt and alter options depending on the script. For example.
define.catt <- function(ntab = NULL){
catt <- function(input = NULL){
cat(paste0(paste(rep("\t", ntab), collapse = ""), input))
}
return(catt)
}
You would then set catt with however many tabs you wanted by
catt <- define.catt(ntab = 1)
catt("hi")
        hi
catt <- define.catt(ntab = 2)
catt("hi")
                hi
And just use catt() instead of cat().
You may consider the very versatile function capture.output(...), which evaluates the '...' list of expressions provided as main input arguments, and stores the text output (as if it would be displayed in the console) into a character vector instead. Then, you simply have to modify the strings as desired: here you want to add some leading spaces to each string. Finally, you write the strings to the console.
These can be done all in one line of nested calls. For example:
writeLines(paste(" ", capture.output(print(head(iris))), sep=""))
I therefore recommend reading the help for capture.output and trying it out for various purposes. Since the main input has the usual flexibility of the '...' argument list, you are free to include, for instance, a call to a home-made function, and thus do almost anything. The indentation itself is simply done with paste once capture.output has collected the text.
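As a sketch of how this could stand in for the "switch" asked about, the capture.output call can be wrapped in a small function that indents everything printed by a block of code. The name with_indent is hypothetical. The invisible() wrapper is there so that capture.output does not print the value of the block a second time.

```r
# Hypothetical wrapper: run a block of code and indent all of its console output.
with_indent <- function(expr, indent = 4) {
  # expr is evaluated lazily, i.e. while capture.output is collecting output.
  out <- capture.output(invisible(expr))
  indented <- paste0(strrep(" ", indent), out)
  writeLines(indented)
  invisible(indented)
}

cat("text1\n")
res <- with_indent({
  cat("random text\n")
  print("another random text")
})
cat("text2\n")
```

Note that print() still adds its usual [1] prefix and quotes; only the leading spaces are new.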

How to add a space to an object name in R

Piston_Rings<-diameter[1:25,]
I want my quality control graph NOT to have the underscore in the object name.
At the moment there is an underscore (not a hyphen) in that object name. It is possible to construct objects whose names have spaces in them but in order to access them you will then always need to use backticks in order to get the interpreter to understand what you want:
> `Piston Rings` <- list(1,2)
> `Piston Rings`[[1]]
[1] 1
> `Piston Rings`[[2]]
[1] 2
The problem you incur is cluttering up your code, at least relative to obeying the usual conventions in R where a space is a token-ending marker to the parser. Hyphens (at least short-hyphens) are actually minus signs.
If on the other hand you only want to use a modified version of a name that contains an underscore as the title for a graph, then try something like this:
Piston_Rings <- list() # just for testing purposes so there will be an object.
plot( 1:10,10:1, main = sub("_", " ", quote(Piston_Rings)) )
@BondedDust's answer is correct, but (guessing, since you haven't been very specific) a simpler way to get what you want is just to specify xlab or ylab arguments to the plot() function. Let's say you have variables stuff (x) and Piston_Rings (y). If you just
plot(stuff,Piston_Rings)
then the plot will have "Piston_Rings" as the y-axis label. But if you
plot(stuff,Piston_Rings,ylab="Piston Rings")
you'll get the label you want. You can also include lots more information this way:
plot(stuff,Piston_Rings,
xlab="Important stuff (really)",
ylab="Piston Rings (number per segment)")
See ?plot.default for many more options.

Implementing syntax highlighting for markdown titles in PySide/PyQt

I am trying to implement a syntax highlighter for markdown for my project in PySide. The current code covers the basic, with bold, italic, code blocks, and some custom tags. Below is an extract of the relevant part of the current code.
What is blocking me right now is how to implement the highlighting for titles (underlined with ===, for the main title, or --- for sub-titles). The method that is used by Qt/PySide to highlight the text is highlightBlock, which processes only one line at a time.
class MySyntaxHighlighter(QtGui.QSyntaxHighlighter):
    def highlightBlock(self, text):
        # do something with this line of text
        self.setCurrentBlockState(0)
        startIndex = 0
        if self.previousBlockState() != 1:
            startIndex = self.blockStartExpression.indexIn(text)
        while startIndex >= 0:
            endIndex = self.blockEndExpression.indexIn(
                text, startIndex)
            ...
There is a way to recover the previousBlockState, which is useful when a block has a defined start (for instance, the ~~~ syntax at the beginning of a code block). Unfortunately, there is nothing that defines the start of a title, except for the underlining with === or --- that takes place on the next line. All the examples I found only handle cases where there is a defined start of the expression, so that the previousBlockState gives you useful information (as in the example above).
The question is then: is there a way to recover the text of the next line inside highlightBlock? To perform a look-ahead, in some sense.
I thought about recovering the document currently being worked on, finding the current block in the document, then finding the next line and running the regular expression check on it. This would however break if there is a line in the document that has the exact same wording as the title. Plus, it would become quite slow to systematically do this for all lines in the document. Thanks in advance for any suggestion.
If self.currentBlock() gives you the block being highlighted, then:
self.currentBlock().next().text()
should give you the text of the following block.
