I am calling a shell program from R Markdown like this
```{sh}
SomeShellProgram -options
```
and render the file as HTML. The calculation that the program does takes some time, which is why the author included a self-updating progress "bar" which looks something like this:
45Mb 12.4% 935 OTUs, 3485 chimeras (6.7%)
However, especially if the progress is slow, it will update this line every 0.1% or so, and each update is rendered as a separate line in the HTML, which can add up to 1000 lines of progress bars.
I don't want to suppress the output completely, e.g. with echo=FALSE in the chunk options. I am producing a report, and the information that is printed is important.
I am looking for a hack that would somehow capture only the last X lines and render those, or maybe use grep or something similar to capture only the lines that contain 100% or so.
I tried redirecting the output with > output.txt but the progress wasn't printed to the file (although other information was).
I can't think of a way to provide a reproducible example without giving the full example, sorry for that.
For those that are interested: I am trying to produce a report on the analysis of 16S Illumina sequencing data and I'm using Usearch and the command that gives me the most headaches is the usearch -cluster_otus command.
UPDATE
There is an additional problem with rendering the last X lines: the progress bar in the output is delimited by ^M (carriage return) characters and not by line breaks, so tail only recognises it as a single line. Therefore my final solution includes:
- redirecting the output from the progress bar into a file with 2>
- replacing the ^M characters with line breaks using sed
- rendering the last X lines with tail
My (pseudo)code to do this on Mac OS X is the following (where X = the number of lines):
FunctionWithProgressBar -option 2> tempfile.tmp
sed -ibak $'s/\x0D/\\\n/g' tempfile.tmp
tail -nX tempfile.tmp
and in R Markdown:
```{sh, results="hide"}
FunctionWithProgressBar -option 2> tempfile.tmp
```
```{sh, echo=FALSE}
sed -ibak $'s/\x0D/\\\n/g' tempfile.tmp
tail -nX tempfile.tmp
```
Note that matching the carriage return is a pain in the butt (especially on OS X) and changes between platforms.
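If the platform-specific sed escaping becomes a problem, a more portable sketch is to let tr translate the carriage returns instead. This assumes the progress output has already been captured to tempfile.tmp as above, that 20 lines of context are enough, and the "100.0%" pattern is only a guess at what the finished bar prints:
```
# translate carriage returns to newlines, then keep only the last 20 updates
tr '\r' '\n' < tempfile.tmp | tail -n 20

# or keep only the updates that report completion (pattern is an assumption)
tr '\r' '\n' < tempfile.tmp | grep '100\.0%'
```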
The progress bar is probably on the stderr stream, so you capture it with "2>" and not ">". You could capture stderr and stdout separately, e.g.:
usearch blablabla 2> only_err > only_stdout
Or, if you want all of the output together, you have to redirect stderr to stdout and append, like this:
usearch blablabla >> total_output 2>&1
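A quick way to confirm which stream the progress bar actually uses is to throw stdout away and see whether the bar still shows up on the terminal (a sketch, using the same placeholder command as above):
```
# if the progress bar is still visible, it is being written to stderr
usearch blablabla > /dev/null
```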
As for the R Markdown part, I cannot really help; I've never used it, sorry.
regards,
Moritz
Related
I am using the tikzmagic extension with a Jupyter notebook to embed some TikZ diagrams into the notebook. (I am open to alternatives if there is a better way.)
In one cell, I create an IPython variable preamble like so:
preamble=r'''\tikzset{terminal/.style={
rectangle, minimum size=6mm, rounded corners=3mm, very thick, draw=black!50,
top color=white, bottom color=black!20, font=\ttfamily}}'''
In a subsequent cell, I try to use that variable like this:
%%tikz -f svg -l calc,positioning,shapes.misc -x $preamble
But that ends up generating LaTeX code like
% ⋮
\usetikzlibrary{shapes.misc}
\tikzset{terminal/.style={rectangle,
\begin{document}
% ⋮
It seems to terminate the argument at the ␣ (<space>). If I use
%%tikz -f svg -l calc,positioning,shapes.misc -x "$preamble"
It generates LaTeX code like
% ⋮
\usetikzlibrary{shapes.misc}
"\tikzset{terminal/.style={rectangle, minimum size=6mm, rounded corners=3mm, very thick, draw=black!50,
top color=white, bottom color=black!20, font=\ttfamily}}"
\begin{document}
% ⋮
My apologies if this is the wrong place to ask, but I thought TeX people might have encountered this problem, even though the fault is probably mine or in the Python source.
I have a constantly updating huge log file (MainLog).
I want to create another file which contains only the last n lines of the log file BUT which also keeps updating.
If I use:
tail -f MainLog > RecentLog
I get ALMOST what I want, except that RecentLog is written as MainLog data becomes available and might at any point contain only part of the last MainLog line.
How can I specify to tail that I only want it to write when a WHOLE line is available?
By default, tail outputs whole lines unless you use the -c switch to count characters. Something like
tail -n 20 -f MainLog > RecentLog
(substituting the number of lines you want written to the second file for "20") should work as you want.
But if it doesn't, it is possible that using grep to line-buffer your output will fix this condition. See this question.
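For example, a sketch of that line-buffering idea with GNU grep (the empty pattern matches every line, so nothing is filtered out):
```
tail -n 20 -f MainLog | grep --line-buffered '' > RecentLog
```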
After many attempts, the only solution for multiple files that worked (fantastically well) for me is the fdlinecombine command. It's a small binary that reads multiple file descriptors and prints data to stdout linewise.
My use case is spawning multiple long-running ssh commands in the background and following their output, without having the lines garbled or interrupted in between.
As I'm learning more about UNIX commands, I started working with sed at work. sed is designed to read a file line by line and execute commands on each line individually.
How does grep process files? I've tried various ways of googling "does grep process line by line" and nothing really concrete shows up.
From Why GNU grep is fast:
Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!
and then
Don't look for newlines in the input until after you've found a match.
EDIT:
I will correct myself. It is neither line by line nor the full file; it works in terms of chunks of data which are placed into the buffer.
More details are here http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
The regular expression you pass to grep doesn't have any way of specifying newlines (although you can specify matches against the start or end of a line).
So it appears to work line by line, even though actually it may not treat line ends differently to other characters.
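For example, the anchors still behave per line regardless of how grep buffers its input (hypothetical sample input):
```
printf 'foo\nbar\nfoobar\n' | grep '^foo$'
# prints only: foo
```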
I am trying to make a bunch of files in my directory, but the files are generating ~200 lines of errors, so they fly past my terminal screen too quickly and I have to scroll up to read them.
I'd like to pipe the output that displays on the screen to a pager that will let me read the errors starting at the beginning. But when I try
make | less
less does not display the beginning of the output - it displays the end of the output that's usually piped to the screen, and then tells me the output is 1 line long. When I try typing Gg, the only line on the screen is the line of the makefile that executed, and the regular screen output disappears.
Am I using less incorrectly? I haven't really ever used it before, and I'm having similar problems with something like, sh myscript.sh | less where it doesn't immediately display the beginning of the output file.
The errors from make appear on the standard error stream (stderr in C), which is not redirected by normal pipes. If you want to have it redirected to less as well, you need either make |& less (csh, etc.) or make 2>&1 | less (sh, bash, etc.).
Error output is sent to a slightly different place which isn't caught by normal pipelines, since you often want to see errors but not have them intermixed with data you're going to process further. For things like this you use a redirection:
$ make 2>&1 | less
In bash and zsh (and csh/tcsh, which is where they borrowed it from) this can be shortened to
$ make |& less
With things like make, which are prone to produce lots of errors I will want to inspect later, I generally capture the output to a file and then less that file:
$ make |& tee make.log
$ less make.log
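In a plain POSIX sh, where the |& shorthand is unavailable, the same capture-then-read pattern looks like this:
```
make 2>&1 | tee make.log
less make.log
```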
I have a text file (more correctly, a “German style” CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
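Applied to the sample data in the question, the single-pair form might look like this (the file names are placeholders; the dots are escaped so they match literally):
```
sed '/^28\.01\.2005 14:52:38/,/^01\.02\.2005 00:11:43/d' measurements.csv > cleaned.csv
```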
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write >> tmp.txt | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
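That clean-up step might look like this (assuming duplicate boundary lines end up adjacent in tmp.txt; boundaries.txt is a placeholder name):
```
uniq tmp.txt > boundaries.txt
# if the duplicates are not adjacent, sort first:
sort -u tmp.txt > boundaries.txt
```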
You can also use awk. A range pattern selects the block, and skipping those lines prints everything else:
awk '/start/,/end/ {next} 1' file
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which are pretty simple if you are used to using awk, sed, grep, etc.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
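For this particular task, a perl one-liner sketch could use the same range idea as the sed answer above (patterns taken from the sample data in the question; the file names are placeholders):
```
perl -ne 'print unless /^28\.01\.2005 14:52:38/ .. /^01\.02\.2005 00:11:43/' measurements.csv > cleaned.csv
```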
Use grep -v (print non-matching lines).
Sorry - I thought you just wanted lines without 0,000 at the end.