R/exams: prevent page breaks between two paragraphs in exams2nops

I am currently working with the R/exams package, creating exams with the exams2nops function; the order of the questions is randomized. Everything works fine except for one detail: the samepage = TRUE option only prevents page breaks within a paragraph, but for my purpose no exercise may be split at all. In a plain-text exercise I could work around this by writing everything as a single paragraph (although this is not helpful for clarity). Unfortunately, whenever I need to include a table, I am forced to start a new paragraph, which is not "protected" against page breaks. Due to the randomization there are always some copies where text and tables are split across multiple pages. I tried .Rmd as well as .Rnw files and also tried to integrate LaTeX commands such as \nopagebreak and \needspace, without success. So far I am not too experienced with LaTeX, and even with Google's help I did not find a solution.
Here is a minimal example of what kind of exercises I am talking about:
.Rmd
Question
========

Some kind of question:

| A | B | C |
|:-:|:-:|:-:|
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |

Some further informational text.

Answerlist
----------

* First option
* Second option
* Third option
.Rnw
\begin{question}
Some kind of question:
\begin{center}
\begin{tabular}{ccc}
A & B & C \\
1 & 2 & 3 \\
1 & 2 & 3 \\
1 & 2 & 3 \\
1 & 2 & 3 \\
1 & 2 & 3 \\
\end{tabular}
\end{center}
Some further informational text.
\begin{answerlist}
\item First option
\item Second option
\item Third option
\end{answerlist}
\end{question}
I am not sure what to try.

The samepage = TRUE option only enforces that the {answerlist} is wrapped in a {samepage} environment, but not the entire {question}.
The easiest way to accomplish what you want seems to be to redefine the {question} environment in the header = argument. One option is to put everything into a {samepage} environment via
exams2nops(...,
header = "\\renewenvironment{question}{\\item \\begin{samepage}}{\\end{samepage}}")
This may have to be coupled, though, with some \nopagebreak commands in between paragraphs. (See: Make an unbreakable block in TeX)
A simpler solution might be to put every exercise on its own page by including a page break at the end of each exercise:
exams2nops(...,
header = "\\renewenvironment{question}{\\item}{\\newpage}")
In case you are not familiar with the LaTeX syntax above:
\renewenvironment{foo}{...}{...} re-defines the environment "foo".
The first ... is what is executed at the beginning of the environment.
The second ... is executed at the end.
By default, only \item is executed at the beginning to increase the enumerated counter for the exercises.
The double backslashes (e.g., \\item) are required in R to escape the backslash, which is a special character in R strings.
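Putting this together, a minimal sketch of a complete call could look as follows (the exercise file names, n, and language are placeholders for illustration, not taken from the question):

library("exams")

## Hypothetical sketch: redefine {question} so that each exercise is wrapped
## in a {samepage} environment and kept on one page where possible.
exams2nops(c("exercise1.Rmd", "exercise2.Rmd"),
  n = 10,
  language = "en",
  header = "\\renewenvironment{question}{\\item \\begin{samepage}}{\\end{samepage}}")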

Related

Is there a way to add some basic scripting to MkDocs when writing in Markdown?

I have a documentation site using MkDocs. It contains a list of documents, all with the same basic structure (we only have a list of experimental notebooks). Each notebook (written in a separate Markdown file) should contain an author and a date at the beginning, using the following:
<style>table, td, th {border:none!important; font-size:15px}</style>
| | |
| -------------- | :-------------------------- |
| **Author(s):** | First Author, Second Author |
| **Date:** | 2024-01-23 |
What I would like to have is some kind of templating that enables the users to avoid adding the above lines (e.g., avoid adding CSS) and instead just specify the list of authors and a date. This would enable me, as the documentation owner, to change the formatting later if needed without changing the content of each .md file.
I read about Jinja templates and wonder whether they could be used to achieve this?
My notes after studying a problem that is similar to yours:
One example using global blocks
In your case you should look at MkDocs with Jinja macros.
The metadata can be accessed directly in your Markdown. So you can place this at the start of the document:
---
authors: First A, Second A
docdate: 2024-01-23
---
and then use {{ authors }} in the document text; it will be replaced.
You requested to use this globally without changing the Markdown files. This can be done by including a block of Markdown from an external file in your document. First set up the mkdocs include-markdown plugin and then use something like this:
---
authors: First, second
docdate: 2024-01-23
---
{% include-markdown "../authortable.md" %}
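For the plugin setup mentioned above, a minimal mkdocs.yml sketch might look like this (plugin names assume the mkdocs-macros-plugin and mkdocs-include-markdown-plugin packages are installed; the list is an illustration, not from the original question):

plugins:
  - search
  - macros            # mkdocs-macros-plugin: expands {{ ... }} variables
  - include-markdown  # mkdocs-include-markdown-plugin: {% include-markdown %}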
In the authortable you can directly put your layout for the author table and include the metadata. The include is done before variable expansion.
Your include file could be:
<style>table, td, th {border:none!important; font-size:15px}</style>
| | |
| -------------- | :-------------------------- |
| **Author(s):** | {{authors}} |
| **Date:** | {{docdate}} |
Other approaches
There are other ways of course.
You can write a Python macro and call it with the metadata variables:
{{ make_the_table(authors, docdate) }}
The advantage of the macro approach is that you have the full power of Python, and can use databases, conditions, file system properties, and so on.
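A minimal sketch of such a macro in main.py (the function name make_the_table comes from the call above; the table layout mirrors the include file and is otherwise an assumption):

def define_env(env):
    """Hook called by mkdocs-macros-plugin to register custom macros."""

    @env.macro
    def make_the_table(authors, docdate):
        # Render the author/date table; the layout is a sketch, adjust as needed.
        return (
            '<style>table, td, th {border:none!important; font-size:15px}</style>\n\n'
            '| | |\n'
            '| -------------- | :-------------------------- |\n'
            f'| **Author(s):** | {authors} |\n'
            f'| **Date:** | {docdate} |\n'
        )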
You can also automatically modify the raw Markdown without having any tags in the documents, using the on_post_page_macros(env) hook as described in the advanced macros documentation.
You may want to use metadata, also called front matter, as described in the yaml-style-meta-data section. Such information is placed at the beginning of your Markdown file, and its content can then be used to generate content.
The macro plugin is made for this and can be found at https://mkdocs-macros-plugin.readthedocs.io

How do I flatten a large, complex, deeply nested JSON file into multiple CSV files with a linking identifier

I have a complex JSON file (~8GB) containing publicly available data for businesses. We have decided to split the file up into multiple CSV files (or tabs in a .xlsx) so clients can easily consume the data. These files will be linked by the NZBN column/key.
I'm using R and jsonlite to read a small sample in (before scaling up to the full file). I'm guessing I need some way to specify which keys/columns go in each file (e.g., the first file will have the headers australianBusinessNumber, australianCompanyNumber, australianServiceAddress; the second file will have the headers annualReturnFilingMonth, annualReturnLastFiled, countryOfOrigin...)
Here's a sample of two businesses/entities (I've bunged some of the data as well so ignore the actual values): test file
I've read almost every post on Stack Overflow asking similar questions and none seem to give me any luck. I've tried variations of purrr, *apply commands, custom flattening functions, and jqr (an R version of jq, which looks promising, but I can't seem to get it to run).
Here's an attempt at creating my separate files, but I'm unsure how to include the linking identifier (NZBN), and I keep running into further nested lists (I'm unsure how many levels of nesting there are):
## read a small sample of the bulk data
bulk <- jsonlite::fromJSON("bd_test.json")

## top-level entity table, keeping only the non-list columns
coreEntity <- data.frame(bulk$companies)
coreEntity <- coreEntity[, sapply(coreEntity, is.list) == FALSE]

## drill down into the nested company records
company <- bulk$companies$entity$company
company <- purrr::reduce(company, dplyr::bind_rows)

## ... and into the shareholdings and share allocations nested inside them
shareholding <- company$shareholding
shareholding <- purrr::reduce(shareholding, dplyr::bind_rows)
shareAllocation <- shareholding$shareAllocation
shareAllocation <- purrr::reduce(shareAllocation, dplyr::bind_rows)
I'm not sure if it's easier to split the files up during the flattening/wrangling process, or to completely flatten the whole file so I have just one line per business/entity (and then gather columns as needed). My only concern is that I need to scale this up to ~1.3 million nodes (an 8GB JSON file).
Ideally I would want the csv files split every time there is a new collection, and the values in the collection would become the columns for the new csv/tab.
Any help or tips would be much appreciated.
------- UPDATE ------
Updated, as my question was a little vague. I think all I need is some code to produce one of the CSVs/tabs, and I can replicate it for the other collections.
Say for example, I wanted to create a csv of the following elements:
entityName (unique linking identifier)
nzbn (unique linking identifier)
emailAddress__uniqueIdentifier
emailAddress__emailAddress
emailAddress__emailPurpose
emailAddress__emailPurposeDescription
emailAddress__startDate
How would I go about that?
"I'm unsure how many levels of nesting there are"
This will provide an answer to that quite efficiently:
jq '
def max(s): reduce s as $s (null;
if . == null then $s elif $s > . then $s else . end);
max(paths|length)' input.json
(With the test file, the answer is 14.)
To get an overall view (schema) of the data, you could run:
jq 'include "schema"; schema' input.json
where schema.jq is available at this gist. This will produce a structural schema.
"Say for example, I wanted to create a csv of the following elements:"
Here's a jq solution, apart from the headers:
.companies.entity[]
| [.entityName, .nzbn]
+ (.emailAddress[] | [.uniqueIdentifier, .emailAddress, .emailPurpose, .emailPurposeDescription, .startDate])
| @csv
shareholding
The shareholding data is complex, so in the following I've used the to_table function defined elsewhere on this page.
The sample data does not include a "company name" field so in the following, I've added a 0-based "company index" field:
.companies.entity[]
| [.entityName, .nzbn] as $ix
| .company
| range(0;length) as $cix
| .[$cix]
| $ix + [$cix] + (.shareholding[] | to_table(false))
jqr
The above solutions use the standalone jq executable, but all going well, it should be trivial to use the same filters with jqr, though to use jq's include, it might be simplest to specify the path explicitly, as for example:
include "schema" {search: "~/.jq"};
If the input JSON is sufficiently regular, you might find the following flattening function helpful, especially as it can emit a header in the form of an array of strings based on the "paths" to the leaf elements of the input, which can be arbitrarily nested:
# to_table produces a flat array.
# If hdr == true, then ONLY emit a header line (in prettified form, i.e. as an array of strings);
# if hdr is an array, it should be the prettified form and is used to check consistency.
def to_table(hdr):
  def prettify: map( (map(tostring) | join(":")) );
  def composite: type == "object" or type == "array";
  def check:
    select(hdr | type == "array")
    | if prettify == hdr then empty
      else error("expected header is \(hdr) but imputed header is \(.)")
      end;
  . as $in
  | [paths(composite|not)]     # the paths in array-of-array form
  | if hdr == true then prettify
    else check, map(. as $p | $in | getpath($p))
    end;
For example, to produce the desired table (without headers) for .emailAddress, one could write:
.companies.entity[]
| [.entityName, .nzbn] as $ix
| $ix + (.emailAddress[] | to_table(false))
| @tsv
(Adding the headers and checking for consistency are left as an exercise for now, but are dealt with below.)
Generating multiple files
More interestingly, you could select the level you want, and produce multiple tables automagically. One way to partition the output into separate files efficiently would be to use awk. For example, you could pipe the output obtained using this jq filter:
["entityName", "nzbn"] as $common
| .companies.entity[]
| [.entityName, .nzbn] as $ix
| (to_entries[] | select(.value | type == "array") | .key) as $key
| ($ix + [$key] | join("-")) as $filename
| (.[$key][0]|to_table(true)) as $header
# First emit the line giving all the headers:
| $filename, ($common + $header | @tsv),
# Then emit the rows of the table:
(.[$key][]
| ($filename, ($ix + to_table(false) | @tsv)))
to
awk -F\\t 'fn {print >> fn; fn=0;next} {fn=$1".tsv"}'
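Concretely, assuming the filter above (together with the to_table definition) is saved in a file named split.jq (a hypothetical name), the full pipeline might look like this; jq's -r flag is needed so that @tsv emits raw tab-separated text:

jq -r -f split.jq input.json | awk -F\\t 'fn {print >> fn; fn=0;next} {fn=$1".tsv"}'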
This will produce headers in each file; if you want consistency checking, change to_table(false) to to_table($header).

Google Sheets FILTER() and QUERY() not working with SUM()

I'm trying to pull and sum data from one sheet on another. This is GA data being built into a report, so I have sessions split up by landing page and device type, and would like to group them in different ways.
I usually use FILTER() for this sort of thing, but it keeps returning a 0 sum. Thinking this may be an odd edge case with FILTER(), I switched to using QUERY() instead. That gave me an error, but a Google search doesn't offer much documentation about what the error actually means. Taking a guess that it could be indicating an issue with the data type (i.e. not numeric), I changed the format of the source from "Automatic" to "Number", but to no avail.
Maybe it's a lack of coffee, but I'm at a loss as to why neither function manages a simple lookup and sum by criteria.
FILTER() function
SUM(FILTER(AllData!C:C,AllData!A:A="/chestnut/",AllData!B:B="desktop"))
No error, but returns 0 regardless of filter parameters.
QUERY() function
QUERY(AllData!A:G, "SELECT SUM(C) WHERE A='/chestnut/' AND B='desktop'",1)
Error returned:
Unable to parse query string for Function QUERY parameter 2: AVG_SUM_ONLY_NUMERIC
Sample data:
landingPage | deviceCategory | sessions
-------------|----------------|----------
/chestnut/ | desktop | 4
/chestnut/ | desktop | 2
/chestnut/ | tablet | 5
/chestnut/ | tablet | 1
/maple/ | desktop | 1
/maple/ | desktop | 2
/maple/ | mobile | 3
/maple/ | mobile | 1
I think the summing doesn't work because your numbers are text formatted.
See if any of these work (change ranges to suit):
using FILTER()
=SUM(FILTER(VALUE(AllData!C:C),AllData!A:A="/chestnut/",AllData!B:B="desktop"))
using QUERY()
=ArrayFormula(QUERY({AllData!A:B, VALUE(AllData!C:C)}, "SELECT SUM(Col3) WHERE Col1='/chestnut/' AND Col2='desktop' label SUM(Col3)''",1))
using SUMPRODUCT()
=SUMPRODUCT(VALUE(AllData!C2:C),AllData!A2:A="/chestnut/",AllData!B2:B="desktop")

LaTeX: stack three lines in math mode

Hey,
I'm writing a formula with three indexes i,j,k.
At the end of the line I'd like to put this:
i=1,...,a
j=1,...,b
k=1,...,n
But I'd like it in a smaller font and stacked on top of each other. Can someone tell me a command which can accomplish this? \mbox can't do math mode, I think.
Try the \substack command (provided by the amsmath package):
z_i = a_j + b_k \qquad \substack{
i=1,\dots,a \\
j=1,\dots,b \\
k=1,\dots,n}
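For reference, a minimal compilable sketch (the document class and the displayed-equation wrapper are arbitrary choices):

\documentclass{article}
\usepackage{amsmath} % provides \substack and \dots
\begin{document}
\begin{equation*}
  z_i = a_j + b_k \qquad \substack{i=1,\dots,a \\ j=1,\dots,b \\ k=1,\dots,n}
\end{equation*}
\end{document}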

Unix diff to only print relevant diff

I have these two files
File: 11
11
456123
File: 22
11
789
Output of diff 11 22
2c2
< 456123
---
> 789
Output to be
< 456123
> 789
I want it to not print the 2c2 and --- lines. I looked at the man page but could not locate any help. Any ideas? The file has more than 1000 lines.
What about diff 11 22 | grep "^[<|>]"?
Update: As knitti pointed out the correct pattern is ^[<>]
diff has a whole host of useful options, such as --old-group-format, that are described very briefly in the help output. They are expanded on at http://www.network-theory.co.uk/docs/diff/Line_Group_Formats.html
The following produces something similar to what you want:
diff 11.txt 22.txt --unchanged-group-format="" --changed-group-format="<%<>%>"
<456123
>789
You might also need to play with --old-group-format=format (which groups hunks containing only lines from the first file), --new-group-format=format, --old-line-format=format (which formats lines just from the first file), --new-line-format=format, and so on.
Disclaimer: I have not used these for real before; in fact, I have only just come to understand them. If you have further questions I am happy to look at it later.
