I have two text files:
f1.txt
boom Boom pow
Lazy dog runs.
The Grass is Green
This is TEST
Welcome
and
f2.txt
Welcome
I am lazy
Welcome, Green
This is my room
Welcome
bye
In Ubuntu Command Line I am trying:
awk 'BEGIN {RS=" "}FNR==NR {a[$1]=NR; next} $1 in a' f1.txt f2.txt
and getting output:
Green
This
is
My desired output is:
lazy
Green
This is
Welcome
Description: I want to compare two text files line by line and output all the words that appear in both. The matching should be case-insensitive. Comparing line by line is preferred over looking for a match from f1.txt anywhere in f2.txt. For example, the word "Welcome" should not be in the desired output if it were on line 6 instead of line 5 in f2.txt.
Well, then. With awk:
awk 'NR == FNR { for(i = 1; i <= NF; ++i) { a[NR,tolower($i)] = 1 }; next } { flag = 0; for(i = 1; i <= NF; ++i) { if(a[FNR,tolower($i)]) { printf("%s%s", flag ? OFS : "", $i); flag = 1 } } if(flag) print "" }' f1.txt f2.txt
This works as follows:
NR == FNR { # While processing the first file:
for(i = 1; i <= NF; ++i) { # Remember which fields were in
a[NR,tolower($i)] = 1 # each line (lower-cased)
}
next # Do nothing else.
}
{ # After that (when processing the
# second file)
flag = 0 # reset flag so we know we haven't
# printed anything yet
for(i = 1; i <= NF; ++i) { # wade through fields (words)
if(a[FNR,tolower($i)]) { # if this field was in the
# corresponding line in the first
# file, then
printf("%s%s", flag ? OFS : "", $i) # print it (with a separator if it
# isn't the first)
flag = 1 # raise flag
}
}
if(flag) { # and if we printed anything
print "" # add a newline at the end.
}
}
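For readers less familiar with awk, the same line-by-line logic can be sketched in Python. This is only an illustration and not part of the original answer: the function name common_words_by_line is made up here, and it takes lists of lines rather than file names.

```python
def common_words_by_line(lines_a, lines_b):
    """For each pair of corresponding lines, keep the words of the second
    line that also occur (ignoring case) in the first line."""
    results = []
    # zip() stops at the shorter input; extra lines have nothing to match.
    for line_a, line_b in zip(lines_a, lines_b):
        seen = {word.lower() for word in line_a.split()}
        matches = [word for word in line_b.split() if word.lower() in seen]
        if matches:
            results.append(" ".join(matches))
    return results


# The sample data from the question (f1.txt and f2.txt).
f1 = ["boom Boom pow", "Lazy dog runs.", "The Grass is Green",
      "This is TEST", "Welcome"]
f2 = ["Welcome", "I am lazy", "Welcome, Green", "This is my room",
      "Welcome", "bye"]
print("\n".join(common_words_by_line(f1, f2)))
```

As in the awk version, words are split on whitespace only, so "Welcome," with its trailing comma does not match "Welcome".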
I need to transform elements from an array to column index and return the value of $3 for each column index.
I don't have access to gawk 4 so I cannot work with real multidimensional arrays.
Input
Name^Code^Count
Name1^0029^1
Name1^0038^1
Name1^0053^1
Name2^0013^3
Name2^0018^3
Name2^0023^5
Name2^0025^1
Name2^0029^1
Name2^0038^1
Name2^0053^1
Name3^0018^1
Name3^0060^1
Name4^0018^2
Name4^0025^5
Name5^0018^2
Name5^0025^1
Name5^0060^1
Desired output
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1
Any suggestions on how to tackle this task without using real multidimensional arrays?
The following solution uses the asorti() function from GNU awk 3.1.x for sorting, so it does not need gawk 4. It does not use true multidimensional arrays; it only simulates one with a composite subscript.
awk -F"^" '
NR>1{
map[$1,$2] = $3
name[$1]++
value[$2]++
}
END{
printf "Name"
n = asorti(value, v_s)
for(i=1; i<=n; i++) {
printf "%s%s", FS, v_s[i]
}
print ""
m = asorti(name, n_s)
for(i=1; i<=m; i++) {
printf "%s", n_s[i]
for(j=1; j<=n; j++) {
printf "%s%s", FS, map[n_s[i],v_s[j]]
}
print ""
}
}' file
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1
This will work with any awk. It orders the code columns numerically while keeping the names in the order they occur in your input file:
$ cat tst.awk
BEGIN{FS="^"}
NR>1 {
if (!seenNames[$1]++) {
names[++numNames] = $1
}
if (!seenCodes[$2]++) {
# Insertion Sort - start at the end of the existing array and
# move everything greater than the current value down one slot
# leaving open the slot for the current value to be inserted between
# the last value smaller than it and the first value greater than it.
for (j=++numCodes;codes[j-1]>$2+0;j--) {
codes[j] = codes[j-1]
}
codes[j] = $2
}
count[$1,$2] = $3
}
END {
printf "%s", "Name"
for (j=1;j<=numCodes;j++) {
printf "%s%s",FS,codes[j]
}
print ""
for (i=1;i<=numNames;i++) {
printf "%s", names[i]
for (j=1;j<=numCodes;j++) {
printf "%s%s",FS,count[names[i],codes[j]]
}
print ""
}
}
...
$ awk -f tst.awk file
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1
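The insertion-sort step in tst.awk may be easier to follow in isolation. Below is a rough Python sketch of the same idea: shift every larger code one slot down, then drop the new code into the gap. The function name insert_sorted is hypothetical, and the codes are assumed to be strings of digits, as in the sample input.

```python
def insert_sorted(codes, new_code):
    """Insert new_code into the already sorted list codes, comparing
    numerically, by shifting larger entries one slot towards the end."""
    codes.append(new_code)              # make room at the end
    j = len(codes) - 1
    while j > 0 and int(codes[j - 1]) > int(new_code):
        codes[j] = codes[j - 1]         # move the larger value down one slot
        j -= 1
    codes[j] = new_code                 # fill the slot that was left open


codes = []
for code in ["0029", "0038", "0053", "0013", "0018"]:
    insert_sorted(codes, code)
print(codes)
```

Keeping the list sorted as codes arrive avoids needing a separate sort function, which is what makes the awk script portable.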
Since you only have two "dimensions", it is easy enough to use one array for each dimension and a joining array with a calculated column name. I didn't do the sorting of columns or rows, but the idea is pretty basic.
#!/usr/bin/awk -f
#
BEGIN { FS = "^" }
(NR == 1) {next}
{
rows[$1] = 1
columns[$2] = 1
join_table[$1 "-" $2] = $3
}
END {
printf "Name"
for (col_name in columns) {
printf "^%s", col_name
}
printf "\n"
for (row_name in rows) {
printf "%s", row_name
for (col_name in columns) {
printf "^%s", join_table[row_name "-" col_name]
}
printf "\n"
}
}
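The idea shared by the answers above, a flat table keyed by the (row, column) pair plus separate lists of row and column labels, can also be sketched in Python. This is only an illustration: the function name pivot is made up, and the codes are assumed to be zero-padded to equal width so that a plain string sort matches numeric order.

```python
def pivot(lines, sep="^"):
    """Turn Name^Code^Count records into one row per name and one
    column per code, leaving missing combinations empty."""
    names = []      # row labels in first-seen order
    codes = set()   # distinct column labels
    counts = {}     # (name, code) -> count, the simulated 2-D array
    for line in lines[1:]:              # skip the header record
        name, code, count = line.split(sep)
        if name not in names:
            names.append(name)
        codes.add(code)
        counts[name, code] = count
    columns = sorted(codes)
    out = [sep.join(["Name"] + columns)]
    for name in names:
        cells = [counts.get((name, code), "") for code in columns]
        out.append(sep.join([name] + cells))
    return out


sample = ["Name^Code^Count", "Name1^0029^1", "Name2^0013^3", "Name2^0029^1"]
print("\n".join(pivot(sample)))
```

The tuple key plays the same role as awk's map[$1,$2] subscript: one flat dictionary stands in for a two-dimensional array.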
I have the following YACC parser
%start Start
%token _DTP_LONG // Any number; Max upto 4 Digits.
%token _DTP_SDF // 17 Digit number indicating SDF format of Date Time
%token _DTP_EOS // end of input
%token _DTP_MONTH //Month names e.g Jan,Feb
%token _DTP_AM //Is A.M
%token _DTP_PM //Is P.M
%%
Start : DateTimeShortExpr
| DateTimeLongExpr
| SDFDateTimeExpr EOS
| DateShortExpr EOS
| DateLongExpr EOS
| MonthExpr EOS
;
DateTimeShortExpr : DateShortExpr TimeExpr EOS {;}
| DateShortExpr AMPMTimeExpr EOS {;}
;
DateTimeLongExpr : DateLongExpr TimeExpr EOS {;}
| DateLongExpr AMPMTimeExpr EOS {;}
;
DateShortExpr : Number { rc = vDateTime.SetDate ((Word) $1, 0, 0);
}
| Number Number { rc = vDateTime.SetDate ((Word) $1, (Word) $2, 0); }
| Number Number Number { rc = vDateTime.SetDate ((Word) $1, (Word) $2, (Word) $3); }
;
DateLongExpr : Number AbsMonth { // case : number greater than 31, consider as year
if ($1 > 31) {
rc = vDateTime.SetDateFunc (1, (Word) $2, (Word) $1);
}
// Number is considered as days
else {
rc = vDateTime.SetDateFunc ((Word) $1, (Word) $2, 0);
}
}
| Number AbsMonth Number {rc = vDateTime.SetDateFunc((Word) $1, (Word) $2, (Word) $3);}
;
TimeExpr : Number { rc = vDateTime.SetTime ((Word) $1, 0, 0);}
| Number Number { rc = vDateTime.SetTime ((Word) $1, (Word) $2, 0); }
| Number Number Number { rc = vDateTime.SetTime ((Word) $1, (Word) $2, (Word) $3); }
;
AMPMTimeExpr : TimeExpr _DTP_AM { rc = vDateTime.SetTo24hr(TP_AM) ; }
| TimeExpr _DTP_PM { rc = vDateTime.SetTo24hr(TP_PM) ; }
| _DTP_AM TimeExpr { rc = vDateTime.SetTo24hr(TP_AM) ; }
| _DTP_PM TimeExpr { rc = vDateTime.SetTo24hr(TP_PM) ; }
;
SDFDateTimeExpr : SDFNumber { rc = vDateTime.SetSDF ($1);}
;
MonthExpr : AbsMonth { rc = vDateTime.SetNrmMth ($1);}
| AbsMonth Number { rc = vDateTime.Set ($1,$2);}
;
Number : _DTP_LONG { $$ = $1; }
;
SDFNumber : _DTP_SDF { $$ = $1; }
;
EOS : _DTP_EOS { $$ = $1; }
;
AbsMonth : _DTP_MONTH { $$ = $1; }
;
%%
It is giving three shift/reduce conflicts. How can I remove them?
The shift-reduce conflicts are inherent in the "little language" that your grammar describes. Consider the stream of input tokens
_DTP_LONG _DTP_LONG _DTP_LONG EOS
Each _DTP_LONG can be reduced as a Number. But should
Number Number Number
be reduced as a 1-number DateShortExpr followed by a 2-number TimeExpr or as a 2-number DateShortExpr followed by a 1-number TimeExpr? The ambiguity is built in.
If possible, redesign your language by adding additional symbols to distinguish dates from times--colons to set off the parts of a time and slashes to set off the parts of a date, for instance.
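To make the ambiguity concrete, here is a small Python sketch (not yacc, and not part of the original grammar) that enumerates the ways a run of plain numbers can be split into a DateShortExpr of one to three numbers followed by an optional TimeExpr of one to three numbers. The function name possible_splits is made up for illustration, and month names and AM/PM markers are deliberately ignored.

```python
def possible_splits(n_numbers):
    """List the (date_len, time_len) pairs the grammar accepts for
    n_numbers consecutive numbers; time_len == 0 means no time part."""
    splits = []
    for date_len in range(1, 4):        # DateShortExpr: 1 to 3 numbers
        time_len = n_numbers - date_len
        if 0 <= time_len <= 3:          # TimeExpr: absent, or 1 to 3 numbers
            splits.append((date_len, time_len))
    return splits


for n in range(1, 7):
    print(n, possible_splits(n))
```

Any input with more than one split (two to five numbers) is ambiguous; only one number or six numbers leave a single reading, which is why distinct separators for dates and times help.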
Update
I don't think that you can use yacc/bison's precedence features here, because the tokens are indistinguishable.
You will have to rely on yacc/bison's default behavior when it encounters a shift/reduce conflict, that is, to shift rather than reduce. Consider this example in your output:
+------------------------- STATE 9 -------------------------+
+ CONFLICTS:
? sft/red (shift & new state 12, rule 11) on _DTP_LONG
+ RULES:
DateShortExpr : Number^ (rule 11)
DateShortExpr : Number^Number
DateShortExpr : Number^Number Number
DateLongExpr : Number^AbsMonth
DateLongExpr : Number^AbsMonth Number
+ ACTIONS AND GOTOS:
_DTP_LONG : shift & new state 12
_DTP_MONTH : shift & new state 13
: reduce by rule 11
Number : goto state 26
AbsMonth : goto state 27
What the parser will do is shift and go to state 12, rather than reduce by rule 11 (DateShortExpr : Number), whenever the next token is _DTP_LONG. This means the parser will never interpret a single Number followed by another number as a DateShortExpr; it will always shift.
And a difficulty with relying on the default behavior is that it might change as you make modifications to your grammar.
I have an experimental flex source file(lex.l):
%option noyywrap
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{letter}+ { words++; chars += strlen(yytext); printf("Word\n"); }
\n { chars++; lines++; printf("Line\n"); }
. { chars++; printf("SomethingElse\n"); }
%%
int main(argc, argv)
int argc;
char **argv;
{
if(argc > 1)
{
if(!(yyin = fopen(argv[1], "r")))
{
perror(argv[1]);
return (1);
}
}
yylex();
printf("lines: %8d\nwords: %8d\nchars: %8d\n", lines, words, chars);
}
I created an input file called "input.txt" with "red apple" written in it. Command line:
$ flex lex.l
$ cc lex.yy.c
$ ./a.out < input.txt
Word
SomethingElse
Word
Line
lines: 1
words: 2
chars: 10
Since there is no newline character in the input file, why is the "\n" pattern in lex.l matched? (I expected "lines" to be 0 and "chars" to be 9.)
(I am using OS X.)
Thanks for your time.
It is very possible that your text editor has automatically inserted a newline at the end of the file.
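One way to check this is to look at the last byte of the file. The following is a small Python sketch; the helper name ends_with_newline is made up for illustration, and the demonstration writes its own temporary input.txt rather than reading yours.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file's last byte is a newline character."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.endswith(b"\n")


# Demonstration: "red apple" plus the newline an editor may append.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "input.txt")
    with open(sample, "wb") as handle:
        handle.write(b"red apple\n")
    print(ends_with_newline(sample), os.path.getsize(sample))
```

If the real input.txt also ends with a newline, it holds 10 bytes rather than 9, which would account for the counts the scanner reported.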
On Unix, is there a command to display a file's modification time, precise to the second?
On Linux this is easily done with a "stat -c %y", which returns something like 2009-11-27 11:36:06.000000000 +0100. I found no equivalent on Unix.
I found this:
ls --time-style='+%d-%m-%Y %H:%M:%S' -l
Which exports something like this:
root:~# ls --time-style='+%d-%m-%Y %H:%M:%S' -l
total 0
-rw-r--r-- 1 root root 0 16-04-2015 23:14:02 other-file.txt
-rw-r--r-- 1 root root 0 16-04-2015 23:13:58 test.txt
According to the man page on my Mac (which has the BSD standard version of stat) you can get the epoch time version of the modification in seconds with:
stat -f %m /etc/passwd
Or if you want to print that out in hours:mins:secs you can do this:
perl -e "print scalar(localtime(`stat -f %m /etc/passwd`))"
The following gives you last modified time in seconds since Epoch:
stat -c%Y <file>
The find command is a good source for all kinds of file information, including modification time to the second:
find /etc/passwd -maxdepth 0 -printf "%TY/%Tm/%Td %TH:%TM:%.2TS\n"
2011/11/21 13:41:36
The first argument can be a file. The maxdepth prevents searching if a directory name is given. The %T instructs it to print last modification time.
Some systems interpret %TS as a floating point seconds (e.g. 36.8342610). If you want fractional seconds use "%TS" instead of "%.2TS", but you may not see fractional seconds on every system.
For anyone facing the same issue, I found no solution (on HP-UX 11i anyway).
Ended up coding a personalized "ls -lh" for my needs. It's not that hard..
Prints something like :
- 664 rw-/rw-/r-- 1L expertNoob adm 8.37 kB 2010.08.24 12:11:15 findf1.c
d 775 rwx/rwx/r-x 2L expertNoob adm 96 B 2010.08.24 15:17:37 tmp/
- 775 rwx/rwx/r-x 1L expertNoob adm 16 kB 2010.08.24 12:35:30 findf1
- 775 rwx/rwx/r-x 1L expertNoob adm 24 kB 2010.09.14 19:45:20 dir_info
- 444 r--/r--/r-- 1L expertNoob adm 9.01 kB 2010.09.01 11:23:41 getopt.c
- 664 rw-/rw-/r-- 1L expertNoob adm 6.86 kB 2010.09.01 11:24:47 getopt.o
- 664 rw-/rw-/r-- 1L expertNoob adm 6.93 kB 2010.09.14 19:37:44 findf1.o
l 775 rwx/rwx/r-x 1L expertNoob adm 6 B 2010.10.06 17:09:01 test1 -> test.c
- 664 rw-/rw-/r-- 1L expertNoob adm 534 B 2009.03.26 15:34:23 > test.c
d 755 rwx/r-x/r-x 25L expertNoob adm 8 kB 2009.05.20 15:36:23 zip30/
Here it is :
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <dirent.h>
#include <pwd.h>
#include <grp.h>
#include <time.h>
#include <locale.h>
#include <langinfo.h>
#include <stdio.h>
#include <stdlib.h> // malloc, realloc, free, exit
#include <string.h> // strlen, strcmp
#include <unistd.h> // readlink
//#include <stdint.h>
#include <limits.h> // PATH_MAX
#include <stdarg.h>
#include "getopt.h"
static short START_VSNBUFF=16;
// This is bformat from Better String library (bstrlib), customized
int strformat (char ** str, const char * fmt, ...) {
va_list arglist;
char * buff;
int n, r;
/* Since the length is not determinable beforehand, a search is
performed using the truncating "vsnprintf" call (to avoid buffer
overflows) on increasing potential sizes for the output result. */
if ((n = (int) (2*strlen (fmt))) < START_VSNBUFF) n = START_VSNBUFF;
if ( NULL == ( buff = (char *) malloc((n + 2)*sizeof(char)) ) ) {
n = 1;
if ( NULL == ( buff = (char *) malloc((n + 2)*sizeof(char)) ) ) {
fprintf( stderr, "strformat: not enough memory to format string\n" );
return -1;
}
}
for (;;) {
va_start (arglist, fmt);
r = vsnprintf (buff, n + 1, fmt, arglist); // n+1 chars: buff[0]..buff[n], n chars from arglist: buff[n]='\0'
va_end (arglist);
buff[n] = (unsigned char) '\0'; // doesn't hurt, especially strlen!
if ( strlen(buff) < n ) break;
if (r > n) n = r; else n += n;
char * tmp = (char *) realloc( buff, (n + 2)*sizeof(char) );
if ( NULL == tmp ) {
free(buff); // realloc() failed, so the old block is still allocated
fprintf( stderr, "strformat: not enough memory to format string\n" );
return -1;
}
buff = tmp;
}
if( NULL != *str ) free(*str);
*str = buff;
return 0;
}
int printFSObjectInfo( const char * path, const char * name ) {
struct stat statbuf;
struct passwd *pwd;
struct group *grp;
struct tm *tm;
char datestring[256];
char *type = "? ";
char *fbuf = NULL;
double size = 0;
const char *units[] = {"B ", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"};
int i = 0;
char owner[] = "---", group[] = "---", others[] = "---";
/* Get entry's information. */
if ( -1 == lstat( path, &statbuf ) ) {
fprintf( stderr, "printFSObjectInfo: error: can't stat %s\n", path );
if( 0 == strformat( &fbuf, "lstat() said: %s", path ) ) perror(fbuf);
return -1; // statbuf is not valid, so stop here in every case
}
// File type
if( S_ISREG(statbuf.st_mode) ) type = "-"; // regular file
if( S_ISDIR(statbuf.st_mode) ) { // directory
type="d";
if( S_ISCDF(statbuf.st_mode) ) type = "hd"; // hidden dir
}
if( S_ISBLK(statbuf.st_mode) ) type = "b"; // block special
if( S_ISCHR(statbuf.st_mode) ) type = "c"; // character special
if( S_ISFIFO(statbuf.st_mode) ) type = "f"; // pipe or FIFO
if( S_ISLNK(statbuf.st_mode) ) type = "l"; // symbolic link
if( S_ISSOCK(statbuf.st_mode) ) type = "s"; // socket
if( S_ISNWK(statbuf.st_mode) ) type = "n"; // network special
printf( "%2s ", type );
/* Print out type, permissions, and number of links. */
//printf("%10.10s", sperm (statbuf.st_mode));
if( S_IRUSR & statbuf.st_mode ) owner[0] = 'r';
if( S_IWUSR & statbuf.st_mode ) owner[1] = 'w';
if( S_IXUSR & statbuf.st_mode ) owner[2] = 'x';
if( S_IRGRP & statbuf.st_mode ) group[0] = 'r';
if( S_IWGRP & statbuf.st_mode ) group[1] = 'w';
if( S_IXGRP & statbuf.st_mode ) group[2] = 'x';
if( S_IROTH & statbuf.st_mode ) others[0] = 'r';
if( S_IWOTH & statbuf.st_mode ) others[1] = 'w';
if( S_IXOTH & statbuf.st_mode ) others[2] = 'x';
//printf( "\n%o\n", statbuf.st_mode );
printf( "%3o %s/%s/%s ", 0777 & statbuf.st_mode, owner, group, others );
printf("%4dL", statbuf.st_nlink);
/* Print out owner's name if it is found using getpwuid(). */
if ((pwd = getpwuid(statbuf.st_uid)) != NULL)
printf(" %-8.8s", pwd->pw_name);
else
printf(" %-8d", statbuf.st_uid);
/* Print out group name if it is found using getgrgid(). */
if ((grp = getgrgid(statbuf.st_gid)) != NULL)
printf(" %-8.8s", grp->gr_name);
else
printf(" %-8d", statbuf.st_gid);
/* Print size of file. */
//printf(" %9d", (int)statbuf.st_size);
i = 0;
size = (double) statbuf.st_size;
while (size >= 1024) {
size /= 1024;
i++;
}
if( 0 == (double)(size - (long) size) )
printf( "%7ld %-2s", (long)size, units[i] );
else printf( "%7.2f %-2s", size, units[i] );
tm = localtime(&statbuf.st_mtime);
/* Get localized date string. */
strftime(datestring, sizeof(datestring), "%Y.%m.%d %T", tm); // nl_langinfo(D_T_FMT)
if ( 0 == strcmp(name, "\n") )
printf(" %s > %s", datestring, path);
else {
if( 0 == strcmp(type, "d") ) printf(" %s %s/", datestring, name);
else printf(" %s %s", datestring, name);
}
if( 0 == strcmp(type, "l") ) {
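char buf[1+PATH_MAX];
ssize_t len = readlink( path, buf, PATH_MAX ); // readlink() does not null-terminate
if( -1 == len ) {
fprintf( stderr, "printFSObjectInfo: error: can't read symbolic link %s\n", path);
if( 0 == strformat( &fbuf, "readlink() said: %s:", path ) ) { perror(fbuf); return -2; }
}
else {
buf[len] = '\0'; // terminate the link target ourselves
errno = 0; // clear any stale error before the check below
lstat( buf, &statbuf ); // want errno, a symlink may point to non-existing object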
if(errno == ENOENT) printf(" -> %s [!no such file!]\n", buf );
else {
printf(" -> %s\n", buf );
if ( 0 != strcmp(name, "\n") ) printFSObjectInfo( buf, "\n" );
}
}
}
else printf("\n");
return 0;
}
int main(int argc, char **argv) {
struct dirent *dp;
struct stat statbuf;
char *path = NULL; //[1+PATH_MAX];
char *fbuf = NULL;
char *pathArg = NULL;
if( argc == 1 || 0 == strlen(argv[1]) ) pathArg = ".";
else pathArg = argv[1];
if ( lstat( pathArg, &statbuf ) == -1 ) {
printf("%s: error: can't stat %s\n", argv[0], pathArg);
if( 0 == strformat( &fbuf, "stat() said: %s", pathArg ) ) perror(fbuf);
exit(2);
}
if( S_ISDIR(statbuf.st_mode) ) {
DIR *dir = opendir( pathArg );
if( NULL == dir ) {
fprintf( stderr, "%s: error: can't open %s\n", argv[0], pathArg );
if( 0 != strformat( &fbuf, "opendir() said: %s", pathArg ) ) exit(5);
perror(fbuf);
exit(4);
}
/* Loop through directory entries. */
while ( (dp = readdir(dir)) != NULL ) {
if( 0!= strformat( &path, "%s/%s", pathArg, dp->d_name ) ) continue;
printFSObjectInfo( path, dp->d_name );
}
closedir(dir);
} else printFSObjectInfo( pathArg, pathArg );
return 0;
}
In printFSObjectInfo() you have the full functionality of the lstat() system call, so you can customize it to your needs.
Be well.
Try a perl one-liner:
perl -e '@d=localtime ((stat(shift))[9]); printf "%02d-%02d-%04d %02d:%02d:%02d\n", $d[3],$d[4]+1,$d[5]+1900,$d[2],$d[1],$d[0]' your_file_to_show_the_date_for.your_extension
ls -le works if you need only HH:MM:SS
On Mac OS X (tested on 10.10.5 Yosemite thru 10.12.4 Sierra):
prompt> ls -lT
total 0
-rw-r--r-- 1 youruser staff 0 Sep 24 10:28:30 2015 my_file_1.txt
-rw-r--r-- 1 youruser staff 0 Sep 24 10:28:35 2015 my_file_2.txt
If you are using HP-UX:
Ok let's say that the name of the file is "junk". On HP-UX you can do:
perl -e '@d=localtime ((stat(shift))[9]); printf "%4d-%02d-%02d %02d:%02d:%02d\n", $d[5]+1900,$d[4]+1,$d[3],$d[2],$d[1],$d[0]' junk
And yes, perl comes with HP-UX. It is in /usr/contrib. But you may have a more recent version in /usr/local or /opt.
Source: Perderabo
Today I encountered the same issue on an old version of HP-UX. The stat program was not part of the installation. (just the C version)
The quickest solution for me was to use a tool such as Tectia file transfer running on my laptop. Without actually copying anything, it converts the last-modification times from HP-UX for you and shows dates and times for all files once you have logged into the UNIX host.
Possibly this works with other similar graphic based file transfer tools, but I have not tried yet.
On AIX the istat command does this:
machine:~/support> istat ../core
Inode 30034 on device 32/3 File
Protection: rw-rw-r--
Owner: 500(group) Group: 500(user)
Link count: 1 Length 10787748 bytes
Last updated: Wed Feb 22 13:54:28 2012
Last modified: Wed Feb 22 13:54:28 2012
Last accessed: Wed Feb 22 19:58:10 2012
Seconds:
date +%s -r /etc/passwd
Or, with more precision (up to nanosecond precision), if your filesystem supports it:
date +%s.%N -r /etc/passwd
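If a Python interpreter happens to be installed (an assumption; older Unix systems may not ship one), os.stat() offers another way to get the modification time to the second, independent of which stat or ls flavour the system has. A minimal sketch, with the helper name mtime_to_second made up for illustration:

```python
import os
import time


def mtime_to_second(path):
    """Return the file's last modification time in local time,
    formatted down to the second."""
    seconds = os.stat(path).st_mtime
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(seconds))


# Example: the current directory always exists, so use it as a demo.
print(mtime_to_second("."))
```

The format string can be changed freely; st_mtime is seconds since the Epoch, so printing it directly gives the same kind of value as the stat and date commands above.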