How can I tail -f but only in whole lines? - unix

I have a constantly updating huge log file (MainLog).
I want to create another file which is only the last n lines of the log file BUT also updating.
If I use:
tail -f MainLog > RecentLog
I get ALMOST what I want, except that RecentLog is written as MainLog data becomes available, so at any point it might contain only part of MainLog's last line.
How can I specify to tail that I only want it to write when a WHOLE line is available?

By default, tail outputs whole lines unless you use the -c switch to count characters. Something like
tail -n 20 -f MainLog > RecentLog
(substituting the number of trailing lines you want to start RecentLog with for "20") should work as you want.
But if it doesn't, it is possible that using grep to line-buffer your output will fix this condition. See this question.
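For example, a hedged sketch of that line-buffering approach, assuming GNU grep (whose --line-buffered option flushes output only at newline boundaries):
# Seed RecentLog with the last 20 lines, then append new data
# only once each whole line has arrived.
tail -n 20 -f MainLog | grep --line-buffered '' > RecentLog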

After many attempts, the only solution for multiple files that worked (fantastically well) for me is the fdlinecombine command. It's a small binary that reads multiple file descriptors and prints data to stdout linewise.
My use case is spawning multiple long-running ssh commands in the background and following their output, without the lines from different commands being garbled or cut off mid-line.

Related

Is there a way to skip first x lines of a bz2 file in Python without calling next()?

I'm trying to read the latest Wikidata dump while skipping the first, say, 100 lines.
Is there a better way to do this than calling next() repeatedly?
import bz2

# Open the compressed dump in text mode and skip the first 100 lines
WIKIDATA_JSON_DUMP = bz2.open('latest-all.json.bz2', 'rt')
for n in range(100):
    next(WIKIDATA_JSON_DUMP)
Alternatively, is there a way to split up the file in bash by, say, using bzcat to pipe select chunks to smaller files?
If it was compressed using something like bgzip, you can skip blocks, but they will contain a variable number of lines, depending on the compression ratio. For raw bzip files which are a single stream, I don't think you have any choice but to read and throw away the lines to be skipped, due to the nature of the compression format.
You can try the following in bash, to skip the first 10 lines for example:
bzcat -d -c /tmp/myfile.bz2 | tail -n +11
Note that tail's argument is N+1, where N is the number of lines you want to skip.
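If the goal is instead to split the remainder into smaller files, a minimal sketch using split (the chunk size of 1,000,000 lines and the chunk_ prefix are arbitrary assumptions):
# Skip the first 100 lines, then write the rest as chunk_aa, chunk_ab, ...
bzcat /tmp/myfile.bz2 | tail -n +101 | split -l 1000000 - chunk_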

The best way in Unix to add a header to multiple files in a directory?

Before anyone else checks, I am confident this is not a duplicate of the existing question of how to add a header in Unix to multiple files (the question is here: Adding header into multiple text files). This is more about optimising a solution I am currently using for this issue.
I have numerous directories in which I have over 20000 files and for each file I want to add the same header.
What I have been doing is:
sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt
Now, this does work exactly as I want it to, but there have been a couple of issues.
First is that this seems to be an extremely slow method of doing this and it can take a pretty long time to get through all 20K+ files.
Second, and more frustratingly, my connection to the server has occasionally timed out during this long process, so the command doesn't finish and I end up with half the files having the header and half not. If I simply started again from the top, a number of files would end up with the header twice, so I actually have to recreate those files before I can add the header to everything in one pass.
So, what I am wondering is whether there is a better/quicker solution to this problem. The question I linked above looks like it would actually be slower (it loops over the files and does more work per file), so it doesn't seem like it would fix this.
Don't use -i. It confuses things when you get interrupted. Instead, use
mkdir -p ../output-dir
for file in *.txt; do
sed '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' "$file" > ../output-dir/"$file"
done
When you're done, you can rename the directories if you wish. This doesn't address the connection issue (ThoriumBR's suggestion of nohup is good for that), but when it happens you can recover state more easily.
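For example, a hedged refinement of the loop above that makes it safe to restart (it assumes an output file's existence means it was fully written, so the one file in progress at the moment of interruption is worth re-checking):
mkdir -p ../output-dir
for file in *.txt; do
    # Skip files already produced by an earlier, interrupted run
    [ -e ../output-dir/"$file" ] && continue
    sed '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' "$file" > ../output-dir/"$file"
done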
First, adding a header is slow. You have to move the entire file contents to add something at the start. Adding a trailer would be very fast.
Second, use nohup:
nohup - run a command immune to hangups, with output to a non-tty
Using nohup sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt will keep the command running even if the server times out your session (append & if you also want your prompt back while it runs).
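If you would rather stay with in-place editing, a hedged sketch combining nohup with a restart-safe check (it assumes none of your data files legitimately start with "chr", and it relies on your sed expanding \t in the insert text, just as the original command does):
# Run detached from the terminal so a dropped connection cannot kill it;
# the head/grep check skips files that already carry the header, so the
# loop can simply be re-run after an interruption without doubling headers.
nohup bash -c '
  for f in *.txt; do
    head -n 1 "$f" | grep -q "^chr" && continue
    sed -i "1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene" "$f"
  done
' > add_header.log 2>&1 &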

Does grep process line by line or entire file?

As I'm learning more about UNIX commands I have started working with sed at work. Sed is designed to read a file line by line and execute commands on each line individually.
How does grep process files? I've tried various ways of googling "does grep process line by line" and nothing really concrete shows up.
From Why GNU grep is fast :
Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!
and then
Don't look for newlines in the input until after you've found a match.
EDIT:
I will correct myself. It is neither line by line nor the full file; it works in chunks of data which are read into a buffer.
More details are here http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
The regular expression you pass to grep doesn't have any way of specifying newlines (although you can specify matches against the start or end of a line).
So it appears to work line by line, even though actually it may not treat line ends differently to other characters.
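A small illustration of that behaviour, assuming GNU grep (its -z option makes NUL, rather than newline, the record separator):
# With the default newline separator, a pattern cannot span two lines:
printf 'foo\nbar\n' | grep -c 'foo.bar'    # prints 0
# With -z the whole input is a single record, so the same pattern matches:
printf 'foo\nbar\n' | grep -zc 'foo.bar'   # prints 1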

viewing the updated data in a file

If I have a file, for example a log file created by a background process on Unix, how do I view the data that gets appended to it as it happens?
I know that I can use the tail command to see the file. But let's say I have used
tail -10 file.txt
which gives the last 10 lines. If at one point 10 lines were added and at the next instant only 2 lines were added, the tail command would still show the previous 8 lines as well. I don't want this; I want only the two lines that were just added.
In short, I want to view only the lines that were appended. How can I do this?
tail -f file.txt
That prints newly appended lines as soon as they are written to the file.
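If you want to skip what is already in the file and see only lines appended from now on, a small variation:
# Start with zero lines of existing content, then follow new appends
tail -n 0 -f file.txt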

Remove lines which are between given patterns from a file (using Unix tools)

I have a text file (more correctly, a “German style“ CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else besides 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
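Since the aim is to keep the cuts documented and replayable, one hedged option is to collect the begin/end pairs in a sed script file and apply it with -f (the file names cuts.sed, input.csv and cleaned.csv are just placeholders):
# cuts.sed -- one delete command per faulty stretch, dots escaped
/28\.01\.2005 14:52:38/,/01\.02\.2005 00:11:43/d

sed -f cuts.sed input.csv > cleaned.csv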
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write >> tmp.txt | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
You can also use awk. Note that a bare range pattern prints the matching lines, so to remove them instead, skip them and print everything else:
awk '/start/,/end/{next} 1' file
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which are pretty simple if you are already used to awk, sed, grep, etc.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
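For reference, a hedged perl equivalent of the sed range delete above, using the flip-flop operator (file names are placeholders):
# Print every line except those between (and including) the two patterns
perl -ne 'print unless /28\.01\.2005 14:52:38/ .. /01\.02\.2005 00:11:43/' input.csv > cleaned.csv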
Use grep -v (print non-matching lines).
Sorry - I thought you just wanted the lines without 0,000 at the end.
