Viewing MS Word .docx files in Midnight Commander

I want to be able to quickly view (with F3) the content of Word doc/docx files in Midnight Commander. MC's extensions file calls /usr/lib/mc/ext.d/doc.sh, which uses wv, antiword, catdoc, and word2x as helper programs. On my system (Debian), the first three are available, but none of them can handle the newer docx format.
The obvious solution is to use LibreOffice:
libreoffice --headless --convert-to "txt:Text (encoded):UTF8" filename.docx
This works well, but how do I tell MC to use it and display the result of the conversion? If I put this in ~/.config/mc/mc.ext, replacing the lines
View=%view{ascii} /usr/lib/mc/ext.d/doc.sh view msdoc
with
View=libreoffice --headless --convert-to "txt:Text (encoded):UTF8" "${MC_EXT_FILENAME}"
then I end up with a filename.txt file in the current directory, and nothing is displayed. What I want to happen is for mc to do the conversion when I press F3 and discard it when I quit the viewer. (I guess the converted file would be written to /tmp/ and removed on quit.)
Bonus: it would be nice if the displayed file were word-wrapped. I suppose that could be done with the wrap command?
Can I do this without having to modify /usr/lib/mc/ext.d/doc.sh, in my ~/.config/mc/mc.ext?

I use docx2txt:
View=%view{ascii} docx2txt %f -
You also don't need such a long conversion string with LibreOffice. This is enough to dump the document text to the console:
libreoffice --cat %f
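If neither docx2txt nor LibreOffice is at hand, the same idea can be sketched in Python, which also covers the word-wrapping bonus. This is only a minimal sketch, not a full converter: it assumes the document body lives in word/document.xml inside the .docx zip archive, it ignores headers, footnotes and formatting, and the function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Print the text of a .docx file, word-wrapped, to stdout."""
import sys
import textwrap
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Yield the plain text of each paragraph found in the document body."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for para in root.iter(W + "p"):
        # A paragraph's text is spread over one or more <w:t> elements.
        yield "".join(node.text or "" for node in para.iter(W + "t"))


def docx_to_text(path, width=80):
    """Return the document text with every paragraph wrapped to `width` columns."""
    wrapped = (textwrap.fill(p, width=width) for p in docx_paragraphs(path))
    return "\n".join(wrapped)


# Only run when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(docx_to_text(sys.argv[1]))
```

Saved under a hypothetical name such as ~/bin/docx2wrapped.py, it could be hooked into ~/.config/mc/mc.ext in the same way as docx2txt above, with something like:
View=%view{ascii} python3 ~/bin/docx2wrapped.py %f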

Related

Looping through the content of a file in Zsh

I'm trying to loop through the contents of a file in zsh. In my loop I want to get user input. Going off of this answer for Bash, I'm attempting to do:
while read -u 10 line; do
echo $line;
# TODO read from stdin here, etc.
done 10<myfile.txt
However I get an error:
zsh: parse error near `10'
The error refers to the 10 after the done. Obviously I'm not getting the file descriptor syntax right, but I'm having trouble figuring out the docs.
Use a file descriptor number less than 10. If you want to hard-code file descriptor numbers, stick to the range 3-9 (plus 0-2 for stdin, stdout and stderr). When zsh needs file descriptors itself, it uses them in the 10+ range.
If you're even getting close to needing more than the 7 available hard-coded file descriptors, you should really think about using variables to name them. Syntax like exec {myfd}<myfile.txt will open a file with zsh allocating a file descriptor numbered 10 or higher and assigning it to $myfd.
Bourne shell syntax is not entirely unambiguous for file descriptors numbered 10 and over, and even in bash I'd advise against using them. I'm not entirely sure how bash avoids conflicts if it needs to open any for internal use; I guess it never needs to leave any open. This may look like a zsh limitation at first sight, but it is actually a sensible feature.

How do I convert my 5GB 1 liner file to lines based on pattern?

I have a 5GB one-line file with JSON data, and each record starts with this pattern: "{"created". I need to be able to use Unix commands on my Mac to convert this monster of a one-liner into as many lines as it deserves. Any commands? The file command reports:
ASCII English text, with very long lines, with no line terminators
If you have enough memory, you can open the file once with the TextWrangler application (the free BBEdit cousin) and use regular search/replace on the whole file. Use \r in the replacement to add a return. It will be very slow at opening the file and may even hang if memory is low, but in the end it will probably work, with no scripting and no commands. I did this with big SQL files and sometimes it did the job.
You have to replace your line-start string with the same string with \n or \r or \r\n in front of it.
It is unclear how it can be a "one liner" file if each line starts with "{"created", but perhaps python -mjson.tool can help you get started:
cat your_source_file.json | python -mjson.tool > nicely_formatted_file.json
Piping raw JSON through python -mjson.tool reformats it to be more human readable. Note that json.tool parses the whole input as a single JSON document, so it needs enough memory for the file and will reject input that is several objects concatenated back to back.
OS X ships with both flex and bison, you can use those to write a parser for your data.
You can use PHP as a shell command (if PHP is installed), just save a text file with name "myscript" and appropriate code (I cannot test code now, but the idea is as follows)
UNTESTED CODE
#!/usr/bin/php
<?php
$REPLACE_STRING = '{"created'; // pattern that starts each record
// number of bytes held back per chunk so a pattern split across chunks is not missed
$keep = strlen($REPLACE_STRING) - 1;
// open input file in read mode and output file in write mode
$inFp = fopen('big_in_file.txt', 'r');
$outFp = fopen('big_out_file.txt', 'w');
$carry = '';
// read the file in chunks until end of file
while (!feof($inFp)) {
    $chunk = $carry . fread($inFp, 8192);
    // put a newline in front of every occurrence of the pattern
    // (use "\r\n" for Windows end of lines)
    $chunk = str_replace($REPLACE_STRING, "\n" . $REPLACE_STRING, $chunk);
    // problem: the chunk may end with the first part of the pattern,
    // so keep its last bytes for the next iteration and write the rest
    $carry = substr($chunk, -$keep);
    fwrite($outFp, substr($chunk, 0, -$keep));
}
// write what is left over and close both files
fwrite($outFp, $carry);
fclose($inFp);
fclose($outFp);
?>
After you save it, make it executable with chmod a+x ./myscript (sudo is not needed for a file you own)
and then launch it as ./myscript in a terminal.
After this, the myscript file can be run like any other Unix command.
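The chunked approach above, including the case where the pattern is cut in half at a chunk boundary, can be sketched in Python as follows. This is a hedged sketch rather than a tuned tool: the function name split_records and the 1 MiB chunk size are arbitrary choices, and the record-start pattern is taken from the question. Only one chunk plus a few held-back bytes is in memory at any time.

```python
#!/usr/bin/env python3
"""Insert a newline before every record-start marker, reading in chunks."""
import sys

MARKER = b'{"created'  # record-start pattern from the question


def split_records(src, dst, marker=MARKER, chunk_size=1 << 20):
    """Copy binary stream src to dst with a newline in front of each marker.

    A marker at the very start of the input gets no newline, so the
    output does not begin with an empty line.
    """
    keep = len(marker) - 1  # longest marker prefix that can dangle at a chunk end
    tail = b""              # bytes held back from the previous chunk
    wrote_any = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = (tail + chunk).replace(marker, b"\n" + marker)
        if not wrote_any and data.startswith(b"\n" + marker):
            data = data[1:]  # the input itself starts with the marker
        # Hold back the last `keep` bytes: they may be the beginning of a
        # marker whose remainder only arrives with the next chunk.
        cut = max(len(data) - keep, 0)
        if cut:
            dst.write(data[:cut])
            wrote_any = True
        tail = data[cut:]
    dst.write(tail)


# Usage: script.py big_in_file.txt big_out_file.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        split_records(src, dst)
```

Because the held-back tail is always shorter than the marker, a marker that was already given its newline can never be matched a second time.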

unix: can i write to the same file in parallel without missing entries?

I wrote a script that executes commands in parallel. I let them all write an entry to the same log file. It does not matter if the order is wrong or entries are interleaved, but I noticed that some entries are missing. I should probably lock the file before writing. However, is it true that if multiple processes try to write to a file simultaneously, it will result in missing entries?
Yes, if different processes independently open and write to the same file, it may result in overlapping writes and missing data. This happens because each process will get its own file pointer, that advances only by local writes.
Instead of locking, a better option might be to open the log file once in an ancestor of all worker processes, have it inherited across fork(), and used by them for logging. This means that there will be a single shared file pointer, that advances when any of the processes writes a new entry.
In a script you should use ">> file" (double greater than) to append output to that file. The interpreter will open the destination in "append" mode. If your program also wants to append, follow the directives below:
Open a text file in "append" mode ("a+") and give preference to printing only full lines (don't do multiple 'print' followed by a final 'println', but print the entire line with a single 'println').
The fopen documentation states this:
DESCRIPTION
The fopen() function opens the file whose pathname is the
string pointed to by filename, and associates a stream with
it.
The argument mode points to a string beginning with one of
the following sequences:
r or rb             Open file for reading.
w or wb             Truncate to zero length or create file for writing.
a or ab             Append; open or create file for writing at end-of-file.
r+ or rb+ or r+b    Open file for update (reading and writing).
w+ or wb+ or w+b    Truncate to zero length or create file for update.
a+ or ab+ or a+b    Append; open or create file for update, writing at end-of-file.
The character b has no effect, but is allowed for ISO C
standard conformance (see standards(5)). Opening a file with
read mode (r as the first character in the mode argument)
fails if the file does not exist or cannot be read.
Opening a file with append mode (a as the first character in
the mode argument) causes all subsequent writes to the file
to be forced to the then current end-of-file, regardless of
intervening calls to fseek(3C). If two separate processes
open the same file for append, each process may write freely
to the file without fear of destroying output being written
by the other. The output from the two processes will be
intermixed in the file in the order in which it is written.
It is because of this intermixing that you want to give preference to
using only 'println' (or its equivalent).
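The advice above (append mode plus one write per complete line) can be illustrated with a small Python sketch. It rests on assumptions not stated in the question: a local file system, log entries short enough to go out in a single write() call, and a made-up default log file name. The names append_entry and worker are hypothetical helpers.

```python
#!/usr/bin/env python3
"""Several processes appending complete lines to one shared log file."""
import os
import sys
from multiprocessing import Process


def append_entry(path, entry):
    """Append one log line using O_APPEND and a single write() call."""
    line = (entry.rstrip("\n") + "\n").encode("utf-8")
    # O_APPEND makes the kernel position every write at the current end
    # of the file, whichever process wrote last.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line)  # the whole line goes out in one call
    finally:
        os.close(fd)


def worker(path, worker_id, count):
    """Write `count` entries tagged with this worker's id."""
    for i in range(count):
        append_entry(path, f"worker {worker_id} entry {i}")


# Usage: script.py parallel.log   (the log file name is an example)
if __name__ == "__main__" and len(sys.argv) > 1:
    procs = [Process(target=worker, args=(sys.argv[1], n, 100)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Running it and counting the lines of the log (for example with wc -l) should give 400 entries in some interleaved order.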

Printing hard copies of code

I have to hand in a software project that requires either a paper or .pdf copy of all the code included.
One solution I have considered is grouping classes by context and doing a cat *.extension > out.txt to provide all the code, then by catting the final text files I should have a single text file that has classes grouped by context. This is not an ideal solution; there will be no page breaks.
Another idea I had was a shell script that injects LaTeX page breaks between the files to be joined, which would be more acceptable, although I'm not too adept at scripting or LaTeX.
Are there any tools that will do this for me?
Take a look at enscript (or nenscript), which will convert to PostScript, render in columns, add headers/footers and perform syntax highlighting. If you want to print code in a presentable fashion, this works very nicely.
e.g. here's my setting (within a zsh function)
# -2 = 2 columns
# -G = fancy header
# -E = syntax filter
# -r = rotated (landscape)
# syntax is picked up from .enscriptrc / .enscript dir
enscript -2GrE "$@"
For a quick solution, see a2ps, followed by ps2pdf. For a nicer, more complex solution I would go for a simple script that puts each file in a LaTeX listings environment and combines the result.
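The "simple script" variant might look like the sketch below. It is one possible take, not a finished tool: it uses \lstinputlisting from the LaTeX listings package instead of pasting each file into a listings environment, starts a new page after every file, and assumes file names without spaces or backslashes. The preamble settings and the function name build_latex are illustrative choices.

```python
#!/usr/bin/env python3
"""Generate a LaTeX document that prints each source file on its own pages."""
import sys

PREAMBLE = r"""\documentclass[a4paper,10pt]{article}
\usepackage[margin=2cm]{geometry}
\usepackage{listings}
\lstset{basicstyle=\ttfamily\small, breaklines=true, numbers=left}
\begin{document}
"""


def build_latex(paths):
    """Return LaTeX source that lists every file in `paths`, in the given order."""
    parts = [PREAMBLE]
    for path in paths:
        name = str(path)
        # \detokenize keeps characters such as _ in file names from breaking LaTeX.
        parts.append(r"\section*{\texttt{\detokenize{%s}}}" % name)
        parts.append(r"\lstinputlisting{%s}" % name)
        parts.append(r"\clearpage")  # page break between files
    parts.append(r"\end{document}")
    return "\n".join(parts) + "\n"


# Usage: script.py File1.java File2.java > code.tex
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(build_latex(sys.argv[1:]))
```

The order of the arguments decides the grouping, so classes can be grouped by context simply by listing them in that order; running pdflatex on the generated file then produces the PDF.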

Convert asp.net project pages from Windows-1251 to Utf-8

I can do that file-by-file with Save As Encoding in Visual Studio, but I'd like to make this in one click. Is it possible?
I know some will start bashing me for this, but:
download a smalltalk IDE (such as ST/X),
open a workspace,
type in:
'yourDirectoryHere' asFilename directoryContentsAsFilenamesDo:[:oldFileName |
|cyrString utfString newFile|
cyrString := oldFileName contentsAsString.
utfString := CharacterEncoder encodeString:cyrString from:#'iso8859-5' into:#'utf'.
newFile := oldFileName withSuffix:'utf'.
newFile contents:utfString.
].
That will convert all files in the given directory and create corresponding .utf files without affecting the original files. Note that the snippet names iso8859-5 as the source encoding, which is a different Cyrillic encoding from Windows-1251, so the source encoding name has to be adjusted for Windows-1251 files. Even if you normally do not use Smalltalk, it is a perfect scripting environment for this type of action.
I know most of you don't read Smalltalk, but the code should be readable even for non-Smalltalkers, and a corresponding Perl/Python/Java/C# piece of code can also be written and executed in a minute or so, taking the above as a guide. I guess all current languages provide something similar to the CharacterEncoder above.
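For comparison, a Python version of the same idea could look like this. It is a sketch under assumptions: the file patterns are guesses at what an ASP.NET project contains, every matching file is assumed to really be Windows-1251 (a file that is already UTF-8 would be garbled), and like the Smalltalk code it writes a separate .utf file instead of overwriting the original. The function name convert_tree is made up.

```python
#!/usr/bin/env python3
"""Convert Windows-1251 text files below a directory to UTF-8 copies."""
import sys
from pathlib import Path


def convert_tree(root, patterns=("*.aspx", "*.cs"), src="cp1251", dst="utf-8"):
    """Write a UTF-8 copy next to every matching file and return the new paths."""
    converted = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            # Work on bytes so that line endings are passed through untouched.
            text = path.read_bytes().decode(src)
            target = path.with_name(path.name + ".utf")
            target.write_bytes(text.encode(dst))
            converted.append(target)
    return converted


# Usage: script.py path/to/project
if __name__ == "__main__" and len(sys.argv) > 1:
    for new_file in convert_tree(sys.argv[1]):
        print(new_file)
```

If the files should carry a byte order mark, Python's "utf-8-sig" codec can be passed as dst instead.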
