Converting a file of dates into unix time with awk


I have a file that contains dates and lat longs and I want to convert the dates that are in UTC to unix time. I decided to use the unix date function within an awk script to do this. I tried:
awk 'BEGIN {t=$3; c="date -j -f %Y%j%H%M%S "t" +%s"; c|getline; close( c ); print $1, $2, c; }' test.txt
where t is the third field (column) of each line in my file, in the format 2014182120311. I am new to awk/scripting and not sure whether it is possible to embed an awk variable in a unix command inside an awk script.
When I run the script I get the error:
usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
So I think I am not defining "t" properly. Any help is much appreciated!
On a side note: I have tried mktime and the other awk time functions, but they do not work for me, I believe because I do not have gawk.

Do you not have the option of installing gawk? You'll thank yourself later (e.g. see Converting list of dates from a file to timestamp with bash).
The BEGIN section is executed before any file is opened, so $3 (the 3rd field of the current record read from the current file) has no meaning there, and t=$3 just sets the variable t to the null string. date is then invoked with -f but without a date string, which is what triggers the usage message.
You don't want to print the command, c; you want to print the result of executing it, as read by getline. A plain (undirected) getline overwrites $0, $1, etc., so save the result in a variable instead (see http://awk.info/?tip/getline).
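To make both points concrete, here is a small sketch (the function names are made up for illustration) showing that $3 is empty inside BEGIN, and that a plain cmd | getline replaces the current record while cmd | getline var leaves it alone:

```shell
# Hypothetical demo helpers; the names are illustrative only.

# Prints the value of $3 as seen in BEGIN and in the main rule.
begin_vs_main() {
    awk 'BEGIN { print "BEGIN: [" $3 "]" } { print "main: [" $3 "]" }'
}

# Plain getline overwrites $0, so $1 becomes the first word the command printed.
getline_plain() {
    awk '{ cmd = "echo X Y"; cmd | getline; close(cmd); print $1 }'
}

# getline into a variable leaves $0 (and so $1) untouched.
getline_var() {
    awk '{ cmd = "echo X Y"; cmd | getline out; close(cmd); print $1, out }'
}
```

With the input line a b c on stdin, begin_vs_main prints BEGIN: [] and then main: [c], getline_plain prints X, and getline_var prints a X Y.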
You probably want something like this:
awk '{
    cmd = "date -j -f %Y%j%H%M%S " $3 " +%s"
    if ( (cmd | getline time) > 0 ) {
        print $1, $2, time
    }
    close(cmd)
}' test.txt
but if not, post some sample input and expected output so we can help you.
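If neither gawk nor a date that accepts -j -f is available (date -j is a BSD/macOS option), the conversion can also be done with arithmetic in plain awk. This is a different technique from the date call above, sketched under the assumptions that the third field is always 13 digits in the form YYYYjjjHHMMSS, is in UTC, and is not earlier than 1970. The function names are made up for illustration:

```shell
# Hypothetical helper: prints field 1, field 2 and the epoch seconds for field 3.
to_epoch() {
    awk '
    function is_leap(y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 }
    function yj_to_epoch(ts,    y, doy, days, i) {
        y    = substr(ts, 1, 4) + 0
        doy  = substr(ts, 5, 3) + 0
        days = doy - 1                      # whole days since Jan 1 of year y
        for (i = 1970; i < y; i++)          # add the whole years since 1970
            days += 365 + is_leap(i)
        return days * 86400 + substr(ts, 8, 2) * 3600 + substr(ts, 10, 2) * 60 + substr(ts, 12, 2)
    }
    { printf "%s %s %d\n", $1, $2, yj_to_epoch($3) }' "$@"
}
```

For example, printf '10.5 -20.25 2014182120311\n' | to_epoch prints 10.5 -20.25 1404216191, which is 2014-07-01 12:03:11 UTC (the two coordinates are made-up sample values). It also accepts file names, as in to_epoch test.txt.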

Related

Awk command to perform action on lines excluding 1st and last

I have multiple MS excel files in csv format in a particular directory.
I want to update the value of one particular column in all the rows of the csv files.
Also, the action should not be operated on 1st and last line.
So far I have come up with below code for one row:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But i am facing difficulty in excluding the last line.
Also, i need to perform the same for all the files in the directory.
So far tried the below but not able to succeed to replace that string value using awk.
1)
2)

This may do:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv

$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f

You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 if you only have two files, then set c=NR-1. It won't perform as well.
I haven't tested the speed difference between these two, but I suspect it would be negligible except in cases of truly obscenely large files.
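None of the commands above cover the second part of the question, applying the change to every file in the directory. A minimal sketch that reuses the two-pass awk command inside a shell loop; the function name, its arguments and the use of mktemp are assumptions for illustration:

```shell
# Hypothetical helper: update_middle_rows DIR VALUE
# Sets column 2 to VALUE on every line except the first and last of each DIR/*.csv.
update_middle_rows() {
    dir=$1
    value=$2
    for f in "$dir"/*.csv; do
        [ -f "$f" ] || continue             # no matching files: skip the literal pattern
        tmp=$(mktemp) || return 1
        awk -v v="$value" 'BEGIN { FS = OFS = "," }
            NR == FNR { c = NR; next }      # first pass: count the lines
            !(FNR == 1 || FNR == c) { $2 = v }
            1' "$f" "$f" > "$tmp" && mv "$tmp" "$f"
    done
}
```

It would be called as update_middle_rows /path/to/dir 300. Since each file is replaced by a temporary file, its permissions and ownership may change, so try it on a copy of the directory first.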

Thanks all,
I got to make it work. Below is the command:
awk -v sq="" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv

Unix command to replace first column of a .csv file

I want a unix command (that I will call in a ControlM job) that changes the value of the first column of my .csv file (not the header line), with the date of the previous day (expected format : YYYY-MM-DD).
I tried many commands but none of them do want I want :
tmp=$(mktemp) && awk -F\| -v val=`date -d yesterday +%F` 'NR>1 {gsub($1,val)}' file.csv > "$tmp" && mv "$tmp" file.csv
or :
awk -F\| -v val=`date -d yesterday +%F` '{gsub($1, val)}1' file.csv
even tried gensub but not working.
Example of what I want :
Input :
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-05;2017-11-15;BRIDGE;HELLO
2019-03-05;2018-03-17;WORK;DATA
Output I want (as today is 2019-03-07):
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-06;2017-11-15;BRIDGE;HELLO
2019-03-06;2018-03-17;WORK;DATA
Can you help please and give me examples of commands that should work, I'm not finding a solution.
Thanks a lot

Could you please try the following first? It does not save the output into file.csv itself but prints it on the terminal; once you are happy with it, you can use the command given at the end of this post.
awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv
Problems identified in the OP's code (and fixed in my suggestion):
1- Backticks are the legacy form of command substitution and are discouraged; use val=$(date ...) instead to set awk's variable val.
2- Field separator: -F\| sets the field separator to a pipe, but your sample Input_file is delimited with ; (semicolon), not |, which is one of the reasons the change is not reflected in the output.
3- gsub($1,val) replaces the whole line with the value of val. The syntax of gsub is gsub(regex_to_match, replacement, target), where the target defaults to the current line. Because the wrong field separator was defined, the whole line is treated as $1, so awk -F\| -v val=$(date -d yesterday +%F) 'NR>1 {gsub($1,val)} 1' file.csv prints only the previous day's date on each line after the header.
4- The fourth and main issue is that your first command does NOT print anything, so even with the other mistakes fixed you will NOT see any output, either on the terminal or in the output file.
If you are happy with the result, you can then run your own command to write the changes into Input_file itself (I am assuming that you have a proper value in your tmp variable here):
tmp=$(mktemp) && awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv > "$tmp" && mv "$tmp" file.csv

Create file name on basis of output of the command

I have a question on creating file name on basis of output we get on running a command. Below is the example
Have 2 records like below
cat test1.txt
Unable to find Solution
Can you please help
I am running the command below to get the last word from the first line, and I want the file name to be that last word:
cat test1.txt | head -1 | awk '{ print $NF }'
Solution
Can you please help me create a file named after that last word?

When you want to redirect your output to a file and have the filename from the first line of your output, you can use
outfile=""
your_command | while IFS= read -r line; do
    if [ -z "${outfile}" ]; then
        outfile=$(echo "${line}" | awk '{ print $NF }')
    fi
    echo "${line}" >> "${outfile}"
done
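The same thing can be done in awk alone, since awk can redirect print to a file whose name is computed at run time. A sketch, assuming the first line is non-empty and its last word is a usable file name (the function name is made up for illustration):

```shell
# Hypothetical helper: copy stdin into a file named after the last word of line 1.
save_by_first_line() {
    awk '
        NR == 1 { out = $NF }    # remember the last field of the first line
        { print > out }          # the file is truncated once, then stays open
    '
}
```

Used as your_command | save_by_first_line, this writes every line of the output, including the first, to the file in the current directory.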

Variable pattern matching awk

I have an input file "myfile" and a shell variable "VAR" whose
contents are shown below:
# cat myfile
Hello World
Hello Universe
# echo $VAR
World
I need to process all lines in "myfile" which contains pattern "World"
using awk. It works if i do as shown below:
# awk '/World/ { print $0 }' myfile
But i could not use VAR to do the same operation. I tried the
following even knowing that these will not work:
# awk '/$VAR/ { print $0 }' myfile
and
# awk -v lvar=$VAR '/lvar/ { print $0 }' myfile
and
# awk -v lvar=$VAR 'lvar { print $0 }' myfile
Please let me know how to match the contents of VAR in awk.
Thanks in advance

You could use:
awk -v lvar="$VAR" '$0~lvar {print}' myfile
Or (works but not recommended):
awk "/$VAR/ {print}" myfile

This is probably what you want:
awk -v lvar="$VAR" '$0~lvar' myfile
Never do
awk "/$VAR/" myfile
or use any other syntax that expands shell variables inline within the awk script so that they become part of the script. It can produce all sorts of bizarre failures and error messages depending on the shell variable's contents, because with that syntax there is no variable inside awk; the shell variable's contents are instead integrated into the script as if you had hard-coded them.
Always do the above using -v or
awk 'BEGIN{lvar=ARGV[1]; delete ARGV[1]} $0~lvar' "$VAR" myfile
depending on your requirements for the shell expanding backslashes. See http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details.
Only do
awk '$0~lvar' lvar="$VAR" myfile
if you are processing multiple files, need to change initial values of the variable between files, and do not have gawk for BEGINFILE or need to populate those values from environment variables.
Note that all of the above do a regexp match on lvar; if you need a string match instead, then use awk -v lvar="$VAR" 'index($0,lvar)'.
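The difference between the regexp and string forms shows up as soon as the variable contains a regexp metacharacter. A small sketch with made-up function names and sample data:

```shell
# Hypothetical helpers; each takes the pattern as $1 and filters stdin.
match_regex()   { awk -v lvar="$1" '$0 ~ lvar'; }          # lvar is used as a regexp
match_literal() { awk -v lvar="$1" 'index($0, lvar)'; }    # lvar is used as a plain string
```

Given the two input lines abc and a.c, match_regex 'a.c' prints both lines, because . matches any character, while match_literal 'a.c' prints only a.c.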

If you export your variable, it becomes a simpler solution:
export VAR
awk '$0~ENVIRON["VAR"]' myfile
This avoids any issues with the variable's value being reprocessed: awk expands backslash escape sequences in -v assignments, but not in ENVIRON values.

Is there a Unix utility to prepend timestamps to stdin?

I ended up writing a quick little script for this in Python, but I was wondering if there was a utility you could feed text into which would prepend each line with some text -- in my specific case, a timestamp. Ideally, the use would be something like:
cat somefile.txt | prepend-timestamp
(Before you answer sed, I tried this:
cat somefile.txt | sed "s/^/`date`/"
But that only evaluates the date command once when sed is executed, so the same timestamp is incorrectly prepended to each line.)

ts from moreutils will prepend a timestamp to every line of input you give it. You can format it using strftime too.
$ echo 'foo bar baz' | ts
Mar 21 18:07:28 foo bar baz
$ echo 'blah blah blah' | ts '%F %T'
2012-03-21 18:07:30 blah blah blah
$
To install it:
sudo apt-get install moreutils

Could try using awk:
<command> | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
You may need to make sure that <command> produces line buffered output, i.e. it flushes its output stream after each line; the timestamp awk adds will be the time that the end of the line appeared on its input pipe.
If awk shows errors, then try gawk instead.

annotate, available via that link or as annotate-output in the Debian devscripts package.
$ echo -e "a\nb\nc" > lines
$ annotate-output cat lines
17:00:47 I: Started cat lines
17:00:47 O: a
17:00:47 O: b
17:00:47 O: c
17:00:47 I: Finished with exitcode 0

Distilling the given answers to the simplest one possible:
unbuffer $COMMAND | ts
On Ubuntu, they come from the expect-dev and moreutils packages.
sudo apt-get install expect-dev moreutils

How about this?
cat somefile.txt | perl -pne 'print scalar(localtime()), " ";'
Judging from your desire to get live timestamps, maybe you want to do live updating on a log file or something? Maybe
tail -f /path/to/log | perl -pne 'print scalar(localtime()), " ";' > /path/to/log-with-timestamps

Kieron's answer is the best one so far. If you have problems because the first program is buffering its output, you can use the unbuffer program:
unbuffer <command> | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; }'
It's already installed on many Linux systems. If you need to build it yourself, it is part of the expect package:
http://expect.nist.gov

Just gonna throw this out there: there are a pair of utilities in daemontools called tai64n and tai64nlocal that are made for prepending timestamps to log messages.
Example:
cat file | tai64n | tai64nlocal

Use the read(1) command to read one line at a time from standard input, then output the line prepended with the date in the format of your choosing using date(1).
$ cat timestamp
#!/bin/sh
while read line
do
echo `date` $line
done
$ cat somefile.txt | ./timestamp

I'm not a Unix guy, but I think you can use
gawk '{print strftime("%d/%m/%y",systime()), $0 }' < somefile.txt

#! /bin/sh
unbuffer "$@" | perl -e '
use Time::HiRes qw(gettimeofday);
while(<>) {
    # gettimeofday returns seconds and microseconds
    ($s,$us) = gettimeofday();
    printf "%d.%06d %s", $s, $us, $_;
}'

$ cat somefile.txt | sed "s/^/`date`/"
you can do this (with gnu/sed):
$ some-command | sed "x;s/.*/date +%T/e;G;s/\n/ /g"
example:
$ { echo 'line1'; sleep 2; echo 'line2'; } | sed "x;s/.*/date +%T/e;G;s/\n/ /g"
20:24:22 line1
20:24:24 line2
of course, you can use other options of the program date. just replace date +%T with what you need.

Here's my awk solution (from a Windows/XP system with MKS Tools installed in the C:\bin directory). It is designed to add the current date and time in the form mm/dd hh:mm to the beginning of each line having fetched that timestamp from the system as each line is read. You could, of course, use the BEGIN pattern to fetch the timestamp once and add that timestamp to each record (all the same). I did this to tag a log file that was being generated to stdout with the timestamp at the time the log message was generated.
/"pattern"/ "C\:\\\\bin\\\\date '+%m/%d %R'" | getline timestamp;
print timestamp, $0;
where "pattern" is a string or regex (without the quotes) to be matched in the input line, and is optional if you wish to match all input lines.
This should work on Linux/UNIX systems as well, just get rid of the C\:\\bin\\ leaving the line
"date '+%m/%d %R'" | getline timestamp;
This, of course, assumes that the command "date" gets you to the standard Linux/UNIX date display/set command without specific path information (that is, your environment PATH variable is correctly configured).
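One caveat with this pattern: awk keeps the pipe from a command open, so unless it is closed after each read, later getline calls on the same command string hit end-of-file and the timestamp stays at its first value. A sketch for Linux/UNIX with the close() added (the function name and the date format are illustrative):

```shell
# Hypothetical helper: prefix each line of stdin with the time it was read.
prepend_time() {
    awk '{
        cmd = "date +%H:%M:%S"
        if ((cmd | getline ts) > 0)   # read the single line that date prints
            print ts, $0
        close(cmd)                    # so the next input line runs date again
    }'
}
```

Used as some-command | prepend_time. This starts one date process per input line, so it suits low-volume logs better than large files.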

Mixing some answers above from natevw and Frank Ch. Eigler.
It has sub-second precision (gettimeofday actually returns microseconds, despite the variable name $ms), performs better than calling an external date command each time, and perl can be found on most servers.
tail -f log | perl -pne '
use Time::HiRes (gettimeofday);
use POSIX qw(strftime);
($s,$ms) = gettimeofday();
print strftime "%Y-%m-%dT%H:%M:%S+$ms ", gmtime($s);
'
Alternative version with flush and read in a loop (run with perl -e rather than -pne, since the script has its own read loop):
tail -f log | perl -e '
use Time::HiRes qw(gettimeofday); use POSIX qw(strftime);
$|=1;
while(<>) {
    ($s,$ms) = gettimeofday();
    print strftime("%Y-%m-%dT%H:%M:%S", gmtime($s)), "+$ms $_";
}'

caerwyn's answer can be wrapped in a shell function, which avoids needing a separate script file (date is still run once per line):
timestamp(){
while read line
do
echo `date` $line
done
}
echo testing 123 |timestamp
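Note that echo `date` $line also lets the shell split and glob the contents of each line, so runs of spaces collapse and a * may expand to file names. A sketch of a variant that keeps the line intact (it still runs date once per line; the date format is just an example):

```shell
timestamp() {
    # IFS= and -r keep leading whitespace and backslashes as they are;
    # the || part also handles a last line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

It is used the same way, for example echo testing 123 | timestamp.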

Disclaimer: the solution I am proposing is not a Unix built-in utility.
I faced a similar problem a few days ago. I did not like the syntax and limitations of the solutions above, so I quickly put together a program in Go to do the job for me.
You can check the tool here: preftime
There are prebuilt executables for Linux, MacOS, and Windows in the Releases section of the GitHub project.
The tool handles incomplete output lines and has (from my point of view) a more compact syntax.
<command> | preftime
It's not ideal, but I thought I'd share it in case it helps someone.

The other answers mostly work, but have some drawbacks. In particular:
1. Many require installing a command not commonly found on linux systems, which may not be possible or convenient.
2. Since they use pipes, they don't put timestamps on stderr, and lose the exit status.
3. If you use multiple pipes for stderr and stdout, then some do not have atomic printing, leading to intermingled lines of output like [timestamp] [timestamp] stdout line \nstderr line
4. Buffering can cause problems, and unbuffer requires an extra dependency.
To solve (4), we can use stdbuf -i0 -o0 -e0 which is generally available on most linux systems (see How to make output of any shell command unbuffered?).
To solve (3), you just need to be careful to print the entire line at a time.
Bad: ruby -pe 'print Time.now.strftime("[%Y-%m-%d %H:%M:%S] ")' (Prints the timestamp, then prints the contents of $_.)
Good: ruby -pe '$_ = Time.now.strftime("[%Y-%m-%d %H:%M:%S] ") + $_' (Alters $_, then prints it.)
To solve (2), we need to use multiple pipes and save the exit status:
alias tslines-pipe="stdbuf -i0 -o0 ruby -pe '\$_ = Time.now.strftime(\"[%Y-%m-%d %H:%M:%S] \") + \$_'"
function tslines() (
stdbuf -o0 -e0 "$@" 2> >(tslines-pipe) > >(tslines-pipe)
status="$?"
exit $status
)
Then you can run a command with tslines some command --options.
This almost works, except sometimes one of the pipes takes slightly longer to exit and the tslines function has exited, so the next prompt has printed. For example, this command seems to print all the output after the prompt for the next line has appeared, which can be a bit confusing:
tslines bash -c '(for (( i=1; i<=20; i++ )); do echo stderr 1>&2; echo stdout; done)'
There needs to be some coordination method between the two pipe processes and the tslines function. There are presumably many ways to do this. One way I found is to have the pipes send some lines to a pipe that the main function can listen to, and only exit after it's received data from both pipe handlers. Putting that together:
alias tslines-pipe="stdbuf -i0 -o0 ruby -pe '\$_ = Time.now.strftime(\"[%Y-%m-%d %H:%M:%S] \") + \$_'"
function tslines() (
# Pick a random name for the pipe to prevent collisions.
pipe="/tmp/pipe-$RANDOM"
# Ensure the pipe gets deleted when the method exits.
trap "rm -f $pipe" EXIT
# Create the pipe. See https://www.linuxjournal.com/content/using-named-pipes-fifos-bash
mkfifo "$pipe"
# echo will block until the pipe is read.
stdbuf -o0 -e0 "$@" 2> >(tslines-pipe; echo "done" >> "$pipe") > >(tslines-pipe; echo "done" >> "$pipe")
status="$?"
# Wait until we've received data from both pipe commands before exiting.
linecount=0
while [[ $linecount -lt 2 ]]; do
read line
if [[ "$line" == "done" ]]; then
((linecount++))
fi
done < "$pipe"
exit $status
)
That synchronization mechanism feels a bit convoluted; hopefully there's a simpler way to do it.

doing it with date and tr and xargs on OSX:
alias predate="xargs -I{} sh -c 'date +\"%Y-%m-%d %H:%M:%S\" | tr \"\n\" \" \"; echo \"{}\"'"
<command> | predate
if you want milliseconds:
alias predate="xargs -I{} sh -c 'date +\"%Y-%m-%d %H:%M:%S.%3N\" | tr \"\n\" \" \"; echo \"{}\"'"
but note that on OSX, date doesn't give you the %N option, so you'll need to install gdate (brew install coreutils) and so finally arrive at this:
alias predate="xargs -I{} sh -c 'gdate +\"%Y-%m-%d %H:%M:%S.%3N\" | tr \"\n\" \" \"; echo \"{}\"'"

No need to specify all the parameters in strftime() unless you really want to customize the outputting format :
echo "abc 123 xyz\njan 765 feb" \
\
| gawk -Sbe 'BEGIN {_=strftime()" "} sub("^",_)'
Sat Apr 9 13:14:53 EDT 2022 abc 123 xyz
Sat Apr 9 13:14:53 EDT 2022 jan 765 feb
works the same if you have mawk 1.3.4. Even on awk-variants without the time features, a quick getline could emulate it :
echo "abc 123 xyz\njan 765 feb" \
\
| mawk2 'BEGIN { (__="date")|getline _;
close(__)
_=_" " } sub("^",_)'
Sat Apr 9 13:19:38 EDT 2022 abc 123 xyz
Sat Apr 9 13:19:38 EDT 2022 jan 765 feb
If you wanna skip all that getline and BEGIN { }, then something like this :
mawk2 'sub("^",_" ")' \_="$(date)"

If the value you are prepending is the same on every line, fire up emacs with the file, then:
Ctrl + <space>
at the beginning of the file (to mark that spot), then scroll down to the beginning of the last line (Alt + > will go to the end of file... which probably will involve the Shift key too, then Ctrl + a to go to the beginning of that line) and:
Ctrl + x r t
Which is the command to insert at the rectangle you just specified (a rectangle of 0 width).
2008-8-21 6:45PM <enter>
Or whatever you want to prepend... then you will see that text prepended to every line within the 0 width rectangle.
UPDATE: I just realized you don't want the SAME date, so this won't work... though you may be able to do this in emacs with a slightly more complicated custom macro, but still, this kind of rectangle editing is pretty nice to know about...
