How to generate PDF documents using Julia - julia

I want to generate student feedback reports using a loop like this:
for student in studentList
# Report/feedback content goes here:
# Here I want to use text with variables for example
student * " received " * xPoints
"Q1"
"Good effort but missing units"
"Q2"
"More text ..."
# end of the feedback
end
My goal is to generate 30+ PDF files for all students, with the score for each question, supplemented with some free text for each student. One way I thought of would be to write to multiple TeX files and compile them to PDFs at the end.
I'm not determined to output PDFs, if there is a better method of generating multiple, human-readable reports programmatically in Julia.

For now we can start with the basics and output an HTML file, which is faster. You can use a templating library; here we use Mustache. The template is hardcoded in the script, but it is easy to move it to an external file.
Remember to install the templating library Mustache:
import Pkg; Pkg.add("Mustache")
The basic idea is the following:
have a list of dictionaries with the data
have a template of the report, where parts to be substituted are in {{ ... }} guards
save the report of each student to an HTML file by iterating over the list.
You can add some code to send the report to the student by mail directly, without even saving the file, if your computer is configured to do so (as long as you do not include external CSS, the mail will be formatted according to the HTML instructions).
using Mustache
students = [
    Dict( "name" => "John", "surname" => "Smith", "mark" => 30 ),
    Dict( "name" => "Elisa", "surname" => "White", "mark" => 100 )
]
tmpl = mt"""
<html>
<body>
Hello <b>{{name}}, {{surname}}</b>. Your mark is {{mark}}
</body>
</html>
"""
for student in students
    rendered = render(tmpl, student)
    filename = string("/tmp/", student["name"], "_", student["surname"], ".html")
    open(filename, "w") do file
        write(file, rendered)
    end
end
The result for a single student is something like:
<html>
<body>
Hello <b>Elisa, White</b>. Your mark is 100
</body>
</html>
If you prefer a PDF, I think the fastest way is to use a piece of LaTeX as the template (in place of the HTML template), write the output of Mustache to a file, and then compile it from the script with a system call:
using Mustache
students = [
    Dict( "name" => "John", "surname" => "Smith", "mark" => 30 ),
    Dict( "name" => "Elisa", "surname" => "White", "mark" => 100 )
]
tmpl = mt"""
\documentclass{standalone}
\begin{document}
Hello \textbf{ {{name}}, {{surname}}}. Your mark is ${{mark}}$.
\end{document}
"""
for student in students
    rendered = render(tmpl, student)
    filename = string("/tmp/", student["name"], "_", student["surname"], ".tex")
    open(filename, "w") do file
        write(file, rendered)
    end
    run(`pdflatex $filename`)
end
The Mustache.jl documentation explains how to iterate over several questions with a single line of template. Here is an example in which the marks are an array of values (again for TeX):
using Mustache
students = [
    Dict( "name" => "John", "surname" => "Smith", "marks" => [25, 32, 40, 38] ),
    Dict( "name" => "Elisa", "surname" => "White", "marks" => [40, 40, 36, 35] )
]
tmpl = """
\\documentclass{article}
\\begin{document}
Hello \\textbf{ {{name}}, {{surname}} }. Your marks are:
\\begin{itemize}
{{#marks}}
\\item Mark for question is {{.}}
{{/marks}}
\\end{itemize}
\\end{document}
"""
for student in students
    rendered = render(tmpl, student)
    filename = string("/tmp/", student["name"], "_", student["surname"], ".tex")
    open(filename, "w") do file
        write(file, rendered)
    end
    run(`pdflatex $filename`)
end

Related

Judge whitespace or number using pyparsing

I am working on parsing structured text files by pyparsing and I have a problem judging whitespace or numerical number. My file looks like this:
RECORD 0001
TITLE (Main Reference Title)
AUTHOR   (M.Brown)
Some files have more than one author:
RECORD 0002
TITLE (Main Reference Title 1)
AUTHOR  1(S.Red)
        2(B.White)
I would like to parse files and convert them into dictionary format.
{"RECORD": "001",
"TITLE": "Main Reference Title 1",
"AUTHOR": {"1": "M.Brown"}
}
{"RECORD": "002",
"TITLE": "Main Reference Title 2",
"AUTHOR": {"1": "S.Red", "2": "B.White"}
}
I tried to parse the AUTHOR field with pyparsing (I tried both 2.4.7 and 3.0.0b3). The following is a simplified version of my code.
from pyparsing import *
flag = White(" ", exact=1).set_parse_action(replace_with("1")) | Word(nums, exact=1)
flaged_field = Group(flag + restOfLine)
next_line = White(" ", exact=8).suppress() + flaged_field
authors_columns = (
    Keyword("AUTHOR").suppress()
    + White(" ", exact=2).suppress()
    + flaged_field           # parse first row
    + ZeroOrMore(next_line)  # parse next row
)
authors = authors_columns.search_string(f)
Here 'f' contains all lines read from the file. With this code, I could only parse the author names that have numbering flags:
[]
[[['1', '(S.Red)'],['2','(B.White)']]]
However, if I only parse with whitespace
flag = White(" ",exact=1).set_parse_action(replace_with("1"))
it worked correctly for the files without numbering flags.
['1', '(M.Brown)']
[]
The character at [9:10] (a digit or whitespace) has a meaning in my format, and I want to determine whether it is whitespace or a digit (limited to a single digit, up to 9). I also tried replacing "|" with "^", swapping the order of the alternatives, and using
flag = Word(nums+" ")
but none of these worked for me. Why doesn't matching White(" ") or Word(nums) work in my code? Could someone help me or give me an idea for how to solve this?
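This was solved by adding leave_whitespace(). By default pyparsing skips leading whitespace before it tries to match an expression, so the blank is presumably consumed before White(" ") is tried inside the alternation; leave_whitespace() disables that skipping for the flag expression.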
flag = (White(" ",exact=1).set_parse_action(replace_with("0")) | Word(nums,exact=1)).leave_whitespace()
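For comparison, the same "blank or digit" decision can be sketched without pyparsing, using plain string slicing. This swaps in a different technique and is not the accepted fix above. The column layout (keyword in columns 0-7, flag at index 8, value from index 9) is an assumption inferred from the exact=2 and exact=8 widths in the question's code, and parse_author_lines is a hypothetical helper name.

```python
def parse_author_lines(lines):
    """Collect authors from fixed-column lines into a {number: name} dict.

    Assumed layout (inferred from the question, not confirmed):
    columns 0-7 hold the keyword (blank on continuation lines),
    index 8 holds the flag (a digit or a blank), the value starts at index 9.
    """
    authors = {}
    in_author_block = False
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        keyword, flag, value = line[:8].strip(), line[8:9], line[9:].strip()
        if keyword == "AUTHOR":
            in_author_block = True
        elif keyword:
            in_author_block = False  # some other field starts here
        if not in_author_block:
            continue
        # A blank flag means there is a single author; number it "1".
        number = flag if flag.isdigit() else "1"
        authors[number] = value.strip("()")
    return authors


single = ["AUTHOR".ljust(8) + " (M.Brown)"]
several = ["AUTHOR".ljust(8) + "1(S.Red)", " " * 8 + "2(B.White)"]
print(parse_author_lines(single))   # {'1': 'M.Brown'}
print(parse_author_lines(several))  # {'1': 'S.Red', '2': 'B.White'}
```

Slicing avoids the whitespace-skipping question entirely, at the cost of hardcoding the column positions.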

Producing files in dagster without caring about the filename

In the dagster tutorial, in the Materializations section, we choose a filename (sorted_cereals_csv_path) for our intermediate output, and then yield it as a materialization:
@solid
def sort_by_calories(context, cereals):
    # Sort the data (removed for brevity)
    sorted_cereals_csv_path = os.path.abspath(
        'calories_sorted_{run_id}.csv'.format(run_id=context.run_id)
    )
    with open(sorted_cereals_csv_path, 'w') as fd:
        writer = csv.DictWriter(fd, fieldnames)
        writer.writeheader()
        writer.writerows(sorted_cereals)
    yield Materialization(
        label='sorted_cereals_csv',
        description='Cereals data frame sorted by caloric content',
        metadata_entries=[
            EventMetadataEntry.path(
                sorted_cereals_csv_path, 'sorted_cereals_csv_path'
            )
        ],
    )
    yield Output(None)
However, this relies on being able to use the local filesystem (which may not be true), the file will likely get overwritten by later runs (which is not what I want), and it forces us to come up with a filename that will never be used.
What I'd like to do in most of my solids is just say "here is a file object, please store it for me", without concerning myself with where it's going to be stored. Can I materialize a file without considering all these things? Should I use Python's tempfile facility for this?
Actually it seems this is answered in the output_materialization example.
You basically define a type:
@usable_as_dagster_type(
    name='LessSimpleDataFrame',
    description='A more sophisticated data frame that type checks its structure.',
    input_hydration_config=less_simple_data_frame_input_hydration_config,
    output_materialization_config=less_simple_data_frame_output_materialization_config,
)
class LessSimpleDataFrame(list):
    pass
This type has an output_materialization strategy that reads the config:
def less_simple_data_frame_output_materialization_config(
    context, config, value
):
    csv_path = os.path.abspath(config['csv']['path'])
    # Save data to this path
And you specify this path in the config:
execute_pipeline(
    output_materialization_pipeline,
    {
        'solids': {
            'sort_by_calories': {
                'outputs': [
                    {'result': {'csv': {'path': 'cereal_out.csv'}}}
                ],
            }
        }
    },
)
You still have to come up with a filename for each intermediate output, but you can do it in the config, which can differ per-run, instead of defining it in the pipeline itself.
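On the tempfile idea raised in the question: a minimal sketch, using only the standard library, of writing rows to a uniquely named file so that no filename has to be invented. This is not part of dagster's API. write_rows_to_temp_csv and its prefix parameter are hypothetical names, and the approach still assumes a usable local filesystem, so it only addresses the naming and overwriting concerns.

```python
import csv
import os
import tempfile


def write_rows_to_temp_csv(rows, fieldnames, prefix="sorted_cereals_"):
    """Write dict rows to a uniquely named CSV file and return its path."""
    # mkstemp creates the file with a unique name, so separate runs never collide.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

The returned path could then be passed to EventMetadataEntry.path in the same way as sorted_cereals_csv_path in the question. The file is not deleted automatically, so cleaning it up is left to the caller.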

find duplicate records by searching specific column values in a file

I am new to Unix. Can you please help me find duplicate records?
Records count as duplicates when they have the same Name, EmpId and designation.
Input File:
"Name" , "Address", "EmpId"," designation", "office location"
"NameValue","AddressValue","EmpIdValue","designationValue","office locationValue"
"NameValue1","AddressValue1","EmpIdValue1","designationValue1","office locationValue1"
"NameValue","AddressValue1","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressValue2","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressVal4ue","EmpIdValue1","designationValue","office locationValue"
Output file:
"NameValue","AddressValue","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressValue1","EmpIdValue","designationValue","office locationValue"
"NameValue","AddressValue2","EmpIdValue","designationValue","office locationValue"
A Python script is probably the best tool for this:
import fileinput
seen = {}
for line in fileinput.input():
    tokens = line.split(",")
    # Key on Name, EmpId and designation (columns 0, 2 and 3)
    key = tokens[0] + "###" + tokens[2] + "###" + tokens[3]
    if key in seen:
        # print the previous duplicate, if it wasn't printed yet
        if seen[key]:
            print(seen[key], end="")
        seen[key] = ""
        print(line, end="")
    else:
        seen[key] = line
For production use you will probably want a more robust way of building the keys (for example, one that copes with commas inside quoted fields), but the general idea is the same.
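One possible way to make the keys more robust, sketched under assumptions: parse each line with the standard csv module so that quoted fields are handled properly, and key on a tuple of column values instead of a joined string. find_duplicates and KEY_COLUMNS are hypothetical names, and the sketch assumes one record per line.

```python
import csv
from collections import defaultdict

KEY_COLUMNS = (0, 2, 3)  # Name, EmpId, designation


def find_duplicates(lines, key_columns=KEY_COLUMNS):
    """Return the lines whose key columns occur more than once, grouped by key."""
    groups = defaultdict(list)
    for line in lines:
        raw = line.rstrip("\n")
        if not raw.strip():
            continue  # skip empty lines
        # Parse one line at a time; csv copes with commas inside quoted fields.
        row = next(csv.reader([raw], skipinitialspace=True))
        key = tuple(row[i].strip() for i in key_columns)
        groups[key].append(raw)
    # Keep only the groups that contain at least two records.
    return [raw for rows in groups.values() if len(rows) > 1 for raw in rows]
```

Unlike the script above, this collects all lines before reporting, so the duplicates come out grouped by key instead of in strict input order.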

creating multi-dimensional array from textfile in perl [closed]

The file temp.txt has contents like this:
ABC 1234 56 PQR
XYZ 8672 12 RQP
How can I store the temp.txt file in a two-dimensional array, so that I can access its values through array indices?
use File::Slurp;
use Data::Dumper;
my @arr = map [split], read_file("temp.txt");
print Dumper \@arr;
Output:
$VAR1 = [
          [
            'ABC',
            '1234',
            '56',
            'PQR'
          ],
          [
            'XYZ',
            '8672',
            '12',
            'RQP'
          ]
        ];
At a minimum, you could do this:
my @file = load_file($filename);
sub load_file {
    my $filename = shift;
    open my $fh, "<", $filename or die "load_file cannot open $filename: $!";
    my @file = map [ split ], <$fh>;
    return @file;
}
This will read an argument file, split the content on whitespace and put it inside an array ref (one per line), then return the array with array refs. On exiting the subroutine, the file handle will be closed.
This is a somewhat clunky solution, in some ways. It loads the entire file into memory, it does not have a particularly fast lookup when you are looking for a specific value, etc. If you have a unique key in each row, you can use a hash instead of an array, to make lookup faster:
my %file = map { my ($key, @vals) = split; $key => \@vals; } <$fh>;
Note that the keys must be unique, or they will overwrite each other.
Or you can use Tie::File to only look up the values you want:
use Tie::File;
tie my @file, 'Tie::File', $filename or die "Cannot tie file: $!";
my $line = [ split ' ', $file[0] ];
Or, if the lines of your file use a specific delimiter and comply with the CSV format, you can use Tie::Array::CSV:
use Tie::Array::CSV;
tie my @file, 'Tie::Array::CSV', $filename, sep_char => ' '
    or die "Cannot tie file: $!";
my $line = $file[0];
Note that using this module might be overkill, and might cause problems if you do not have a strict CSV format. Also, Tie::File has a reputation for poor performance. Which solution is best depends largely on your needs and preferences.

How to Display Vocabulary Title in Archetypes?

I want my custom type to display the stored vocabulary's title. The field definition looks like:
atapi.LinesField(
    'member_field',
    searchable=1,
    index='KeywordIndex',
    multiValued=1,
    storage=atapi.AnnotationStorage(),
    vocabulary_factory='member_name',
    widget=AutocompleteWidget(
        label=_(u"Member Name"),
        description=_(u"Multiple Lines, One Per Line."),
        actb_timeout=-1,
        actb_expand_onfocus=0,
        actb_filter_bogus=0,
    ),
    enforceVocabulary=0,
),
The vocabulary definition looks like:
class member_name(object):
    implements(IVocabularyFactory)
    def __call__(self, context=None):
        items = (
            SimpleTerm(value='john', title=u'John Doe'),
            SimpleTerm(value='paul', title=u'Paul Smith'),
            ... ...
        )
        return SimpleVocabulary(items)
member_nameFactory = member_name()
The corresponding page template looks like:
<div tal:define="mbrs context/member_field|nothing"
     tal:condition="mbrs">
  Member List:
  <span tal:repeat="mbr mbrs">
    <span tal:replace="mbr">Member Name</span>
    <span class="separator"
          tal:condition="not: repeat/mbr/end" tal:replace="string:, " />
  </span>
</div>
The example result, showing only values, looks like: Member List: paul , john. How can I display their titles instead, like: Member List: Paul Smith , John Doe?
Vocabularies (in Zope3 style) are just named utilities and you can retrieve them like this:
from zope.component import getUtility
from zope.schema.interfaces import IVocabularyFactory
factory = getUtility(IVocabularyFactory, vocabularyname)
vocabulary = factory(self.context)
and then you can get the term's title like this:
fieldvalue = self.context.getField('myfield').get(self.context)
term = vocabulary.getTerm(fieldvalue)
print("Term value is %s token is %s and title is %s" % (term.value, term.token, term.title))
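Because the field in the question is multi-valued and does not enforce its vocabulary, a small helper could map each stored value to its title and fall back to the raw value when no term is found. This is a sketch under assumptions: titles_for is a hypothetical helper, and DemoTerm and DemoVocabulary are stand-ins that only imitate the getTerm behaviour of a zope.schema vocabulary (raising LookupError for unknown values) so that the example runs without Zope installed.

```python
from collections import namedtuple


def titles_for(values, vocabulary):
    """Map stored values to vocabulary titles, keeping unknown values as-is."""
    titles = []
    for value in values:
        try:
            term = vocabulary.getTerm(value)
        except LookupError:
            # The value is not in the vocabulary; show the raw value instead.
            titles.append(value)
        else:
            titles.append(term.title or value)
    return titles


# Hypothetical stand-ins for a real vocabulary and its terms.
DemoTerm = namedtuple("DemoTerm", "value title")


class DemoVocabulary(object):
    def __init__(self, titles_by_value):
        self._titles = titles_by_value

    def getTerm(self, value):
        if value not in self._titles:
            raise LookupError(value)
        return DemoTerm(value, self._titles[value])


demo = DemoVocabulary({"john": u"John Doe", "paul": u"Paul Smith"})
print(", ".join(titles_for(["paul", "john"], demo)))  # Paul Smith, John Doe
```

In a real view, the vocabulary obtained from getUtility above would be passed in place of the stand-in, and the page template could iterate over the returned titles instead of context/member_field.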
More info:
http://collective-docs.readthedocs.org/en/latest/forms/vocabularies.html
