How to create accurate line breaks using nbconvert in Jupyter notebook markdown cells to create PDFs

Thanks for checking my question. I am struggling to get nbconvert to generate the correct layout with accurate line breaks.
Say I have the following text in a Jupyter notebook markdown cell:
B) How far do you agree or disagree with the following statements? (Strongly Disagree to Strongly Agree, 1 to 5)\n
B1. The software used in this course are easy to learn and we got familiar with them fast\n
B2. We have enough technical support to adapt to the usage of the software\n
I use nbformat.v4.new_markdown_cell and nbformat.write to create multiple notebooks, edit the content, and then run
!jupyter nbconvert --to pdf --no-input --PDFExporter.latex_count=1 --PDFExporter.dpi=500 "<sheet_name>.ipynb"
to create the PDF reports (the sheet name is interpolated into the command for each notebook).
However, the result will be something like this:
B) How far do you agree or disagree with the following statements? (Strongly Disagree to Strongly Agree, 1 to 5) B1. The software used in this course are easy to learn and we got familiar with them fast B2. We have enough technical support to adapt to the usage of the software
The final page layout is therefore not what I want, and because there will be multiple PDFs, editing them one by one is not practical. Is there any way to produce the format below?
B) How far do you agree or disagree with the following statements? (Strongly Disagree to Strongly Agree, 1 to 5)
B1. The software used in this course are easy to learn and we got familiar with them fast
B2. We have enough technical support to adapt to the usage of the software
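For reference, Markdown folds a single \n into the surrounding paragraph, which is why the lines run together in the PDF; a blank line between items (or an explicit trailing double space) forces a real break that survives the LaTeX conversion. A minimal sketch with nbformat (file name illustrative):

import nbformat

# Markdown treats a single "\n" as a soft wrap: consecutive source lines are
# merged into one paragraph, which is why B), B1. and B2. run together.
# A blank line between items ("\n\n") starts a new paragraph, and nbconvert's
# LaTeX output preserves it.
lines = [
    "B) How far do you agree or disagree with the following statements? (Strongly Disagree to Strongly Agree, 1 to 5)",
    "B1. The software used in this course are easy to learn and we got familiar with them fast",
    "B2. We have enough technical support to adapt to the usage of the software",
]

nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("\n\n".join(lines)))
nbformat.write(nb, "sheet1.ipynb")  # then run the nbconvert command as above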

Related

Are there any examples of isabelle sources for academic papers?

I’ve completed the better part of a major development in Isabelle and am wondering how best to go about writing the corresponding academic paper.
From Isabelle sources I can generate a somewhat idiosyncratic take on such a paper. However, the default rendering of theorems, lemmas and definitions seems almost certain to alienate reviewers.
The LaTeX-sugar theories help, but apparently only if I manually restate the entire theory using anti-quotations.
Are there examples of Isabelle developments underpinning publications that I can look to for inspiration on how best to proceed here?
I have done that in the past (for my master's thesis). There is really only one case where you should do it: the documentation of Isabelle itself and the documentation of Isabelle developments (like the AFP).
There are some people who do that (Makarius Wenzel, e.g., https://sketis.net/2019/11), similarly the "Concrete Semantics". However, this is not a great solution.
The reasons not to do that:
The compilation takes much longer than using pdflatex, even if you base your work on an image of your development.
Unless you type LaTeX macros directly, you are much more limited in what you can do (LaTeX-wise). And if you do type LaTeX macros directly, you can no longer produce HTML output. So the gain from Isabelle is limited.
Many conferences want to see LaTeX sources and they don't run Isabelle, so you will have to generate LaTeX at some point anyhow (and possibly even apply some post-production effects, because Isabelle cannot do some things).
You rarely want to use the exact theorem statement from Isabelle (LaTeXsugar can help, but it is not perfect).
What if you write the paper now and discover a typo you want to fix in 5 years? In 5 years, LaTeX will still work; Isabelle2020 probably will not.
Do all your co-authors use Isabelle on all their computers, including your laptop if you are on vacation and have an emergency fix to do?
And you will fight Isabelle a lot, for example:
text "
\begin{counterexample}
"
lemma True
by auto
text "
\end{counterexample}
"
does not work, because text is its own environment, so you need post-production effects.
Basically, use the snippets mechanism to extract LaTeX out of the theories and then use your favorite LaTeX editor.
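To make the snippets suggestion concrete, here is a hedged sketch of a common community pattern (the DefineSnippet/Snippet macro names are a convention, not a built-in Isabelle API). In the theory, wrap an antiquotation in raw LaTeX markers:

text_raw \<open>\DefineSnippet{assoclemma}{%\<close>
text \<open>@{thm add.assoc}\<close>
text_raw \<open>}%EndSnippet\<close>

Then \input the generated session .tex from the paper, define the two macros in the preamble, and recall the snippet with \Snippet{assoclemma}:

% \DefineSnippet{name}{body} stores body under a name; \Snippet{name} recalls it.
\newcommand{\DefineSnippet}[2]{%
  \expandafter\newcommand\csname snippet--#1\endcsname{#2}}
\newcommand{\Snippet}[1]{\csname snippet--#1\endcsname}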

How to make publishable tables and plots using R? [duplicate]

There are a range of tools available for creating publication quality tables using R, Sweave, and LaTeX.
In particular, there are helper functions like latex in the Hmisc package, and xtable in the xtable package. I've also often written my own code so that I could have complete control over table formatting (e.g., see this example).
However, when preparing publication quality tables a range of issues often arise:
how and when to apply numeric formatting
how to precisely control alignment of columns and cells
how to precisely control cell borders
how to convert variable labels to variable names
and so on
Beyond the high level issues of specifying the desired table format, there are issues of implementation.
When should a helper function such as xtable be used?
Which helper function should be used in a given situation?
How can the default output of helper functions be customised to particular requirements?
Question
It seems to me that the above issues are deserving of a detailed textbook-style introduction.
Are there any online or offline resources that provide a detailed overview of how to produce publication quality tables using R, Sweave, and LaTeX, and that address the issues discussed above?
Just to tie this up with a nice little bow: at the time of writing, the best existing tutorials on publication-quality tables and usage scenarios appear to be an amalgamation of these documents:
A Sweave example
The Joy of Sweave: A Beginner's Guide to Reproducible Research with Sweave
LaTeX and R via Sweave: an example document showing how to use Sweave
Sweave = R · LaTeX2
The xtable gallery
The Sweave Homepage
LaTeX documentation
Going beyond the scope of what currently exists, you may want to ask the author of The Joy of Sweave for a document on publication-quality tables specifically. It seems like he's gone above and beyond this problem in his research. In addition to the questions you've raised, this space specifically could use a style guide that, flatly, does not currently exist.
And, as mentioned in the question errata, this is a perfect example of a question for https://tex.stackexchange.com/. I encourage you to continue to ask specific questions there when you run into any difficulties in your current projects.
The stargazer package can create publication-quality tables (including templates designed to resemble existing academic journals) from commonly used R statistical functions and packages (lm, glm, plm, svyglm, survival, pscl, AER, and others). It is also good for creating summary-statistics tables, and it can output data-frame content directly as well.
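A minimal sketch of the kind of call involved (models and labels invented for the example):

library(stargazer)  # install.packages("stargazer") if needed

# Two nested linear models rendered as a single LaTeX regression table;
# type = "text" gives a quick console preview instead of LaTeX source.
m1 <- lm(mpg ~ wt + hp, data = mtcars)
m2 <- lm(mpg ~ wt + hp + cyl, data = mtcars)
stargazer(m1, m2, title = "Regression results", label = "tab:models")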
There is a tabular function in the tables package which addresses formatting, alignment and label operations. The package has a vignette which is a good starting point.
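To give a flavour of the interface, loosely following the vignette's iris example:

library(tables)  # install.packages("tables") if needed

# Rows: each Species plus an overall margin; columns: a count, then the mean
# and sd of two measurements, formatted to two significant digits.
tab <- tabular((Species + 1) ~ (n = 1) +
                 Format(digits = 2) * (Sepal.Length + Sepal.Width) * (mean + sd),
               data = iris)
latex(tab)  # emits the LaTeX source; newer versions also offer toLatex(tab)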
xtable has worked fine for me so far.
In combination with siunitx, and when necessary, longtable, it can produce pretty effective tables, in my opinion. With packages like booktabs and caption, the aesthetics can be pleasing too.
I am not sure this level of detail was asked for by the OP, but for what it's worth, the basic implementation could be something along these lines: https://tex.stackexchange.com/questions/41067/caption-for-longtable-in-sweave/41183#41183 (my own answer to another question).
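For what it's worth, a small sketch of that combination (data and caption invented; booktabs and siunitx live on the LaTeX side):

library(xtable)

# Coefficient table with booktabs-style rules; paste the output into a
# document whose preamble contains \usepackage{booktabs}.
m <- lm(mpg ~ wt + hp, data = mtcars)
xt <- xtable(summary(m)$coefficients, caption = "Model coefficients", digits = 3)
print(xt, booktabs = TRUE)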
I highly recommend ConTeXt which makes use of the TABLE package. There is a Table overview in contextgarden and an exhaustive manual.
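As a taste of the natural-table syntax (content invented):

\starttext
\bTABLE
  \bTR \bTD Variable \eTD \bTD Mean \eTD \eTR
  \bTR \bTD x \eTD \bTD 1.23 \eTD \eTR
\eTABLE
\stoptext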

R and SPSS difference

I will be analysing a vast amount of network-traffic-related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts, so I was wondering what the basic differences between these two programs are.
I am not asking which one is better; I just want to know what the differences in workflow between the two are (besides the fact that SPSS has a GUI). I will mostly be working with scripts in either case anyway, so I want to know about the other differences.
Here is something that I posted to the R-help mailing list a while back, but I think that it gives a good high level overview of the general difference in R and SPSS:
When talking about the user-friendliness of computer software, I like the analogy of cars vs. buses:
Buses are very easy to use; you just need to know which bus to get on, where to get on, and where to get off (and you need to pay your fare). Cars, on the other hand, require much more work: you need to have some type of map or directions (even if the map is in your head), you need to put gas in every now and then, and you need to know the rules of the road (have some type of driver's licence). The big advantage of the car is that it can take you a bunch of places that the bus does not go, and it is quicker for some trips that would require transferring between buses.
Using this analogy, programs like SPSS are buses: easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.
R is a 4-wheel-drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain-climbing and spelunking gear in the back.
R can take you anywhere you want to go if you take the time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.
There are GUIs for R that make it a bit easier to use, but they also limit the functionality that can be used that easily. SPSS does have scripting, which takes it beyond being a mere bus, but the general philosophy of SPSS steers people towards the GUI rather than the scripts.
I work at a company that uses SPSS for the majority of our data analysis, and for a variety of reasons - I have started trying to use R for more and more of my own analysis. Some of the biggest differences I have run into include:
Output of tables: SPSS has basic tables, general tables, custom tables, etc. that are all output to that nifty data viewer or whatever they call it. These can relatively easily be transported to Word documents or Excel sheets for further analysis/presentation. The equivalent in R involves learning LaTeX or using odfWeave or LyX or something of that nature.
Labeling of data: SPSS does a pretty good job with variable labels and value labels. I haven't found a robust solution in R to accomplish this same task.
You mention that you are going to be scripting most of your work, and personally I find SPSS's scripting syntax absolutely horrendous, to the point that I've stopped working with SPSS whenever possible. R syntax seems much more logical and follows programming standards more closely AND there is a very active community to rely on should you run into trouble (SO for instance). I haven't found a good SPSS community to ask questions of when I run into problems.
Others have pointed out some of the big differences in terms of cost and functionality of the programs. If you have to collaborate with others, their comfort level with SPSS or R should play a factor as you don't want to be the only one in your group that can work on or edit a script that you wrote in the future.
If you are going to be learning R, this post on the stats exchange website has a bunch of great resources for learning R: https://stats.stackexchange.com/questions/138/resources-for-learning-r
The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.
R has a single language for 'scripting', but don't think of it like that, R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax', 'Scripts' and is also scriptable in Python.
Another biggie is that SPSS squeezes its data into a spreadsheety table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network graph type data in SPSS, but there's a package to do it for R.
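For instance, with the igraph package (my example; any of R's graph packages would make the point):

library(igraph)  # install.packages("igraph") if needed

# A small directed graph built straight from an edge list --
# a structure that has no natural home in a flat SPSS-style table.
g <- graph_from_literal(A -+ B, B -+ C, A -+ C)
degree(g)   # per-node connectivity
plot(g)     # quick network diagram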
Also with R you can integrate your workflow with your reporting by using Sweave - you write a document with embedded bits of R code that generate plots or tables, run the file through the system, and out comes the report as a PDF. Great for when you want to do a weekly report, or you do a body of work and then the boss gives you an updated data set. Re-run, read it over, and it's done.
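A minimal sketch of such a report source (file names invented); run it with R CMD Sweave report.Rnw and then pdflatex report.tex:

% report.Rnw: LaTeX with embedded R chunks
\documentclass{article}
\begin{document}
Summary of this week's data:
<<echo=FALSE, results=tex>>=
library(xtable)
dat <- read.csv("weekly.csv")  # swap in the updated data set and re-run
print(xtable(summary(dat)))
@
\end{document}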
But you know, your call...
Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.
There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?
There's an interesting (and reasonably fair) comparison between a number of stats tools here
http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
I work with both in a company and can say the following:
If you have a large team of different people (not all data scientists), SPSS is useful because it is relatively plain to understand. For example, if users are going to run a model to get an output (sales estimates, etc.), SPSS is clear and easy to use.
That said, I find R better in almost every other sense:
R is faster (although this is sometimes debatable)
As stated previously, the syntax in SPSS is awful (I can't stress this enough). On the other hand, R can be painful to learn, but there are tons of resources online, and in the end it pays off much more because of the different things you can do.
Again, like everyone else says, the sky is the limit with R. Tons of packages, resources and, more importantly, independence to do as you please. In my organization we have some very high-level functions that get a lot done. The hard part is creating them once, but then they perform complicated tasks that SPSS would tangle into a never-ending web of canvases. This is especially true for things like loops.
It is often overlooked, but R also has plenty of features for collaboration between teams (GitHub integration with RStudio, and easy package building with devtools).
Actually, if everyone in your organization knows R, all you need is to maintain a basic package on GitHub to share everything. This of course is not the norm, which is why I think SPSS, although a worse product, still has a market.
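As a sketch of that workflow (package and repository names invented):

library(devtools)   # also attaches helpers like install_github

usethis::create_package("ourtools")  # scaffold a package skeleton once
install_github("ourorg/ourtools")    # colleagues install straight from GitHub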
I have no data for it, but from my experience I can tell you one thing:
SPSS is a lot slower than R. (And by a lot, I really mean a lot.)
The magnitude of the difference is probably as big as the one between C++ and R.
For example, I never have to wait longer than a couple of seconds in R. Using SPSS and similar data, I had calculations that took longer than 10 minutes.
As an unrelated side note: In my eyes, in the recent discussion on the speed of R, this point was somehow overlooked (i.e., the comparison with SPSS). Furthermore, I am astonished how this discussion popped up for a while and silently disappeared again.
There are some great responses above, but I will try to provide my 2 cents. My department completely relies on SPSS for our work, but in recent months, I have been making a conscious effort to learn R; in part, for some of the reasons itemized above (speed, vast data structures, available packages, etc.)
That said, here are a few things I have picked up along the way:
Unless you have some experience programming, I think creating summary tables in CTABLES destroys any available option in R. To date, I am unaware of any package that can replicate what can be created using Custom Tables.
SPSS does appear to be slower when scripting, and yes, SPSS syntax is terrible. That said, I have found that scripts in SPSS can always be improved by using the EXECUTE command sparingly.
SPSS and R can interface with each other, although it appears to be one-way (only when using R inside of SPSS, not the other way around). That said, I have found this to be of little use other than when I want to use ggplot2 or some other advanced data management techniques. (I despise SPSS macros.)
I have long felt that "reporting" work created in SPSS is far inferior to other solutions. As mentioned above, if you can leverage LaTeX and Sweave, you will be very happy with your efficient workflows.
I have been able to do some advanced analysis by leveraging OMS in SPSS. Almost everything can be routed to a new dataset, but I have found that most SPSS users don't use this functionality. Also, when looking at examples in R, it just feels "easier" than using OMS.
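A hedged sketch of the OMS pattern (file name invented; the subtype labels must match the output table's OMS identifiers):

* Route the descriptives table into a new SPSS data file instead of the viewer.
oms
 /select tables
 /if commands=['Descriptives'] subtypes=['Descriptive Statistics']
 /destination format=sav outfile='desc.sav'.
descriptives variables=all.
omsend.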
In short, I find myself using SPSS when I can't figure it out quickly in R, but I sincerely have every intention of getting away from SPSS and using R entirely at some point in the near future.
SPSS provides a GUI to easily integrate existing R programs or develop new ones. For more info, see the SPSS Community on IBM Developer Works.
@Henrik, I did the same task you mentioned (C++ and R) in SPSS, and it turned out that SPSS is faster than R on this one. In my case SPSS was approx. 7 times faster, which surprised me.
Here is the code I used in SPSS.
* Time 10 repetitions of a one-million-iteration loop.
data list free
 /x (f8.3).
begin data
1
end data.
comp n = 1e6.
comp t1 = $time.
loop #rep = 1 to 10.
  comp x = 1.
  loop #i = 1 to n.
    comp x = 1/(1 + x).
  end loop.
end loop.
comp t2 = $time.
comp elapsed = t2 - t1.
form elapsed (f8.2).
exe.
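For comparison, a direct R translation of that benchmark (timings will of course vary by machine):

# Same computation in R: 10 repetitions of a million-step recurrence,
# timed with system.time().
n <- 1e6
system.time({
  for (rep in 1:10) {
    x <- 1
    for (i in 1:n) {
      x <- 1 / (1 + x)
    }
  }
})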
Check out this video on why it is good to combine SPSS and R:
http://bluemixanalytics.wordpress.com/2014/08/29/7-good-reasons-to-combine-ibm-spss-analytics-and-r/
If you have a compatible copy of R installed, you can connect to it from IBM SPSS Modeler and carry out model building and model scoring using custom R algorithms deployed in IBM SPSS Modeler. You must also install IBM SPSS Modeler - Essentials for R, which provides the tools you need to start developing custom R applications for use with IBM SPSS Modeler.
The truth is that both packages are useful if you do data analysis professionally. Sure, R/RStudio has more statistical methods implemented than SPSS, but SPSS is much easier to use and gives more information per button click, so it is faster to work with whenever a particular analysis is implemented in both.
In the modern age, neither CPU nor memory is the most valuable resource; the researcher's time is. Also, tables in SPSS are more visually pleasing, in my opinion.
In summary, R and SPSS complement each other well.

What word processor do you use for technical papers?

I've been looking for some time for a word processor to use for writing technical papers and haven't really found one. What would be really nice to have is an editor that handles mathematical expressions, code, and pseudo-code well. I have yet to find one that does.
Does anyone have any recommendations?
I personally believe in LaTeX.
Benefits:
You can focus on content over form.
Use logical rather than visual formatting (e.g., \methodname vs. just italics).
Easier to assemble large documents from multiple files.
Use text-based version control (CVS/SVN/etc.)
Widely used
Much more stable even on super-weak machines
Programmable. For example, I use macros to hide material, highlight material, and obfuscate names (a macro named after the real name that expands to an obfuscated replacement).
See all the tips and tricks available on SO.
Output looks the same no matter which platform you compile on. I never had that luck with Word: each version and each machine produces something slightly different.
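To illustrate the logical-formatting and obfuscation points, a small sketch (macro names invented for the example):

% One definition controls every method name; switch styles in one place.
\newcommand{\methodname}[1]{\textit{#1}}
% Draft-only highlighting that can be disabled globally.
\newcommand{\todo}[1]{\textbf{[TODO: #1]}}
% For blind review: obfuscate the system's name by redefining one macro.
\newcommand{\sysname}{AnonSystem} % swap back to the real name after review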
My answer's long, so I want to say up front: I think you want OpenOffice Writer (I use v2.4, haven't tried 3.0 yet).
I've used Word with the equation editor and LaTeX heavily in the past, and OpenOffice Writer more recently. I used the former two while writing my thesis.
LaTeX may still have advantages in quality of the output and in the ability to use text-based version control, but they're sharply diminished by OO Writer at this point.
Microsoft Word with the equation editor, even in the most recent versions, still seems very weak.
What I like about OpenOffice is that you can use the equation-formatting mechanisms in a mode where the window is split between the document you're writing and another area where you can type very LaTeX-like formatting instructions. One of the big strengths of LaTeX is that you get to type up something like $x \in S$ for "x is an element of S". OO Writer lets you do this and see the result.
Back when I wrote my thesis, LaTeX was preferable to Word with Eqn. Editor because of the length of my document (over 200 pages), the quality of the results, and the ease of specifying equations. LaTeX does have a disadvantage in simplicity of use, which is made more acute by OO Writer.
That said, I'm sure I'd use OO Writer for conference to journal length articles (~8-15 pages v. ~15-40 pages) and also for shorter work. For thesis-length work, I'm not sure which I'd end up using: Word never worked so well for me on longer matter; I suspect OO Writer is better behaved but I don't have enough experience of it to make a firm judgement.
I like LyX (http://www.lyx.org/) -- it's a good tradeoff between "spending all your time writing your document" and "spending all your time writing markup". The most recent versions are even useable!
Apart from that, Word 2008 is actually pretty darn good, provided you use the styles and other "advanced" features.
I fully agree that LaTeX is a good choice. I've used it for papers at university, including my master's thesis. For LaTeX I've been using Kile.
But nowadays there is an interesting alternative: DocBook with the MathML extension.
LaTeX with TexMaker got me through grad school.
Depends on what you mean by "Word Processor". If you don't mind not having a WYSIWYG interface, I'd recommend LaTeX (http://www.latex-project.org/).
I wrote my final year Master's dissertation using it, which contained a lot of pseudocode, formulas, etc. Also outputs in a format fairly typical of technical papers.
I use FrameMaker.
MS Word with MathType. It has a number of advantages over the default equation editor, including, but not limited to:
keyboard shortcuts
writing equations in tex mode then converting them
converting equations from "normal" to "linear" mode (the one you can use in your programs, you know a=b/c and such)
templates
no more LaTeX. I can concentrate on the material, not the typesetting.
Word with MS Equation for the mathematical sections.
I like DocBook and use FOP to create PDFs from it.
I use reStructuredText because it can be used in Trac, converted to PDF and HTML, have little markup overhead, and looks nice in its plain form too.
Microsoft Word is considered the market-standard word processor.
My suggestion is for you to use Authorea.
As a former postdoc (astrophysics) and Ph.D. (informatics) with 12+ years of research experience (Harvard, CERN, UCLA), I have been writing technical papers for a long time. I have loved and hated LaTeX. For the past 2 years, I have worked with friends and colleagues on developing the next-generation platform for writing technical/research documents collaboratively. It is called Authorea. From a technical standpoint, Authorea is built on Git and takes LaTeX, Markdown, HTML (even JS, to include fancy d3.js in your papers). Bonus: you don't need to know LaTeX (or any other format), but you can easily add equations, tables, citations, and data to your papers. I hope you'll find it useful.
