Annotating Adobe Reader PDFs with math symbols

Many of the math textbooks and other literature I read are in PDF format, so I frequently find myself annotating them with the Adobe Reader comments tool.
I did find a helpful guide here, but sometimes I'd like the option of inserting math symbols, too. Has anyone found a reliable way to insert math symbols, TeX, or other arbitrary formatting into the annotations?
So far, the best I've come up with is to enter the Unicode code point prefixed by "0x" and press Alt+X after it. Perhaps with the Adobe JavaScript SDK you could write a script to shortcut this.

I don't think any of the current commercial editors make this easy, which is too bad. I am sure the vendors monitor this site, so there is hope.
In the meantime, here is a manual workaround.
Use TikZ to create your comment boxes. Here are the two examples I found to be most relevant: Boxes and Positioning. Play around with the options to get both the shape and the placement you want. Generate a PDF file from the LaTeX source that contains your comments.
IMPORTANT: if your comments end before the last page of the original document, insert:
\pagebreak{} % create empty page
\thispagestyle{empty} % get rid of page numbers et al
~ % put a space so the page gets generated
before your \end{document} to get an empty last page. The following command reuses the last page of your comments document on all subsequent pages of the original document, so you want that page to be blank.
Use a recent version of pdftk with the multistamp command to overlay your comments file onto your original file, like so:
pdftk original.pdf multistamp comments.pdf output out.pdf
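As a concrete starting point, here is a minimal sketch of what a comments.tex might look like; the file name, paper size, colors, coordinates, and box contents are all placeholder assumptions to adapt:
\documentclass{article}
\usepackage[a4paper]{geometry} % assumed to match the original's paper size
\usepackage{tikz}
\pagestyle{empty}
\begin{document}
% page 1 of the overlay: one comment box, positioned relative to the page
% ("remember picture" needs two pdflatex runs to settle the coordinates)
\begin{tikzpicture}[remember picture, overlay]
  \node[draw=red, fill=yellow!30, text width=4cm, anchor=north west]
    at ([xshift=2cm, yshift=-3cm]current page.north west)
    {Check this step: $\int_0^1 x^2\,dx = \frac{1}{3}$};
\end{tikzpicture}
\pagebreak{} % create empty page
\thispagestyle{empty} % get rid of page numbers et al
~ % put a space so the page gets generated
\end{document}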
Also see this question.

The free (as in speech) PDF tool Okular supports this functionality: you can put a LaTeX formula directly between $$...$$ in an annotation.
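For example, typing the following into an Okular inline note gets rendered as a typeset formula (assuming Okular can find a LaTeX installation):
$$\int_0^1 x^2\,dx = \frac{1}{3}$$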

Related

Edit text/comments of rmarkdown and knitr reports without rerunning code

I love knitr & rmarkdown, but I often find myself with a lengthy report that takes a nontrivial amount of time to run. After it's generated, I notice my inevitable typos in the text. However, re-knitting everything just to fix a couple of typos (in text only, not code) takes a long time and seems avoidable. I was about to start hacking on my own solution to this, but I'm thinking it's the kind of thing that could already have a mature solution, which would likely be more robust than the one I'd build.
I'm wondering if there is a solution out there, within knitr or from a third party, that would allow me to edit just the text of my reports without rerunning code, generating plots and outputs, etc. I know I can simply edit the generated HTML text, but then those changes must be replicated in the R/Rmd code that generated it, or they get out of sync. I'm envisioning a function like this:
argument 1: the R/Rmd script with text edits (no code changes)... perhaps a warning is generated when code chunks change
argument 2: the html output file from the last time the R script in argument 1 was knitted, without the text edits.
return: the html report (argument 2) updated with the text edits from the R/Rmd script (argument 1).
I sometimes use the cache option for large datasets, and I toggle eval and echo on and off while developing if I'm just working on the text of my report. However, I'm looking for a function that takes care of all this, so one doesn't have to mess with code and chunk options just to make small edits to the text.
Here's an interim solution that lets you retain the speed of making changes directly to the rendered text, but you have to do a little work after you're done making changes.
Assuming the following files:
input.knitr is the original Knitr file with text and code integrated.
output.html is the resulting HTML code that has been rendered by Knitr.
Consider making direct text edits to output.html and then running a visual merge tool such as Meld:
meld output.html input.knitr
Then manually select the edits in output.html that are new and should be fixed in the original source input.knitr. Tools such as Meld do a pretty good job of aligning the texts so that the chunks and knitted output will appear as large "changes" that, in practice, you would ignore. You would focus on the small changes in the non-chunk sections.

Doxygen auto number formulas

I would like to enter several formulas into my Doxygen documentation. Is there any way to create a label of some type that automatically numbers each formula? Ideally, the automatic numbering would work both in the HTML output and in the LaTeX output generated by Doxygen. I am looking for something similar to the Caption feature in MS Word.
Example:
You can see the results of my example in Equation 1.1 below.
{Some Formula}
Equation 1.1
{Some other formula}
Equation 1.2
In the example above, the number after "Equation" is generated automatically, and I can then reference it in the text.
The \anchor feature in Doxygen would allow me to link to the locations. But I don't think that it would generate the auto-numbering correctly.
The option I thought of was to modify the CSS that I use for Doxygen. I have already modified it to number my headings automatically, and LaTeX numbers heading levels 1-4 on its own. I could change my CSS to format Heading 4 so that it looks like a right-justified equation label, but I don't know how to get LaTeX to use the same formatting.
Any helpful suggestions?
Thanks.
From what I learned, this cannot be done in Doxygen. I am now considering two authoring systems to do this:
Doxygen to document the code.
OpenOffice / LibreOffice to document user manuals, which is where the bulk of the equations are.
Both authoring apps write HTML output, which I will then combine with the Qt Help Project system.
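One partial avenue, untested here and covering only the LaTeX side: Doxygen's \f{environment}{ ... \f} block lets you choose which LaTeX environment wraps a formula, so using amsmath's equation environment would at least number formulas automatically in the LaTeX output (HTML numbering would still depend on how MathJax is configured):
/// You can see the result in the equation below.
/// \f{equation}{
///   e^{i\pi} + 1 = 0
/// \f}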

LyX - how to create sections without titles

I'm currently looking into LyX because I'm starting to get fed up with writing LaTeX by hand. The problem is, LyX seems to be a bit opinionated about how sections should be written - I'm used to writing
\subsection{}
\subsubsection{}
...
\subsubsection{}
etc., because in the documents I'm writing I don't want titles for my sections. I just want them to be numbered. LyX doesn't seem to like that though, and ends up deleting my sections (and subsections, and subsubsections) when I don't have any text in their titles.
I can just insert a hard space, but this feels kind of weird. I don't understand why LyX wouldn't just let me hit enter and be done with it.
Try using a 'hard space' as the title of the sections, via <Ctrl>+<space>; that should do the trick (it's the keyboard shortcut for LaTeX's "~").
You can insert Evil Red Text (raw LaTeX commands) in LyX. Look for the Insert -> TeX Code menu. I just tried it: I inserted \subsection{} inside the ERT, and it compiled successfully and showed the section numbers.
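For reference, the exported LaTeX then contains empty-titled sectioning commands like the ones below; the counters still advance, so you get numbers such as 1.1 and 1.2 with no title text (a minimal self-contained example):
\documentclass{article}
\begin{document}
\section{}    % prints just "1"
\subsection{} % prints just "1.1"
Text under the first numbered subsection.
\subsection{} % prints just "1.2"
Text under the second numbered subsection.
\end{document}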

How can I retain the initial white space in a line when writing Rd documentation?

In conjunction with trying to find a solution for myself regarding this question, I find myself plunged into trying to write valid Rd markup. What I want is to add a section named Raw Function Code and put the code of the function under it. I've achieved limited success in this regard by writing a script that modifies the Rd files to include
\section{Raw Function Code}{\code{
# some piece of R script will eventually provide this part of the text
}}
However, even if I properly space the text in the .Rd file manually (using either spaces or tabs), the initial white space of each line gets stripped away, leaving an undesirable-looking function. I've noticed that if I put a character before the white space, the white space is retained. However, I don't want a leading character, because I'd like people to be able to copy and paste directly from the produced PDF.
I have reviewed parseRd and I know there are three types of text: LaTeX-like, R-like, and verbatim. I have tried to put my function code in \code and in \verb, and neither yielded the desired results. What can I do to hold onto my initial white space?
The \section macro contains LaTeX-like text, but as you want to write code, you could use the \synopsis macro, i.e.
\synopsis{
# some piece of R script will eventually provide this part of the text
}
There is one problem with this, though: you cannot give this section a name; it is automatically rendered as another Usage section. The same thing could be achieved with the \examples macro, but then the name of the section is Examples, which is probably even more dubious (not to mention that you probably already have an Examples section).
It isn't possible without modifying the usage or examples sections of your Rd code. See Hemmo's answer for a usable workaround. It produces text in verbatim mode, which is sub-optimal but far better than nothing.
(This answer is set community Wiki in case this state of affairs changes. This result is current as of R-2.15.1)
If you want a super hacky way to do it, you can use \Sexpr to emit zero-width characters and add spaces between them:
#' first line \cr
#'\Sexpr{"\u200B"} \Sexpr{"\u200B"} \Sexpr{"\u200B"} \Sexpr{"\u200B"} indented line
A warning, however: your package will build fine, but R CMD check will throw a fit.

Converting .pdf files to Excel (.xls)

A friend of mine doing an internship asked me two hours ago if I could help him avoid manually converting 462 PDF files to .xls with free online software.
I thought of a shell script using unoconv, but I couldn't figure out how to use it properly, and I am not sure unoconv can solve this problem, since it mainly converts files to PDF, not the reverse.
Conversion from PDF to any other structured format is not always possible and not generally recommended.
Having said that, this does look like a one-off job and there's a fair few of them (462).
It's worth pursuing if you can reliably extract text from most of them and it's reasonably structured. It's a matter of getting regular text output across a sample of the PDFs that you can reliably parse into a table structure.
There are plenty of tools around that do either direct or OCR-based text extraction; just google around.
One I like is pstotext from the Ghostscript suite; its -bboxes option gives me the coordinates of each word and leaves it up to me to re-assemble the structure. Despite its name, it does work on input PDFs. The downside is that it can be a bit flaky, working on some PDFs but not others.
If you get this far, you'd then most likely need to write a shell script or program to convert that to CSV. You can either open the CSV directly in a spreadsheet or look for tools to convert it to XLS.
PS: If he hasn't already, get the intern to ask if there's any possible way of getting at the original data that was used to create the PDFs. It will save a lot of time and effort and lead to a far more accurate result.
Update: An alternative to pstotext is the renderpdf.pl command included in the Perl CAM::PDF module. It is more robust, but it reports only the (x,y) position of each piece of text, not bounding boxes.
Other responses on a linked question suggest Tabula, too.
https://github.com/tabulapdf/tabula
I tried it, and it works very well.
