I have a list of files that get added to my work stream. They are CSV files with a date-time stamp in the name that indicates when they were created. I need to pick up each file in the order of the datetime in the file name and process it. Here is a sample list that I get:
Workprocess_2016_11_11T02_00_12.csv
Workprocess_2016_11_11T06_50_45.csv
Workprocess_2016_11_11T10_06_18.csv
Workprocess_2016_11_11T14_23_00.csv
How would I compare the files to find the oldest one and then work towards the chronologically newer files? The files are all dumped on the same day, so I can only use the timestamp in the file name.
The beneficial aspect of that date time format is that it sorts the same lexically and chronologically. So all you need is
for f in *.csv; do
    mv "$f" xyz
    process xyz
done
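As a cross-check of that claim, here is a minimal Python sketch. It assumes every name follows the Workprocess_YYYY_MM_DDTHH_MM_SS.csv pattern from the question; the function name `oldest_first` is made up for illustration. It sorts by the parsed timestamp, and the result is the same as a plain string sort because every field is zero-padded and ordered from year down to second.

```python
from datetime import datetime

def oldest_first(names):
    """Return file names ordered oldest to newest by their embedded timestamp."""
    def stamp(name):
        # Literal text in the format must match the file name exactly.
        return datetime.strptime(name, "Workprocess_%Y_%m_%dT%H_%M_%S.csv")
    return sorted(names, key=stamp)

names = [
    "Workprocess_2016_11_11T14_23_00.csv",
    "Workprocess_2016_11_11T02_00_12.csv",
    "Workprocess_2016_11_11T10_06_18.csv",
    "Workprocess_2016_11_11T06_50_45.csv",
]

# Chronological order and lexical order agree for this naming scheme.
assert oldest_first(names) == sorted(names)
```

If the names ever lose their zero padding (for example 2_00_12 instead of 02_00_12), the plain string sort stops being reliable and parsing the timestamp becomes necessary.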
My log file name contains the current date, like my_log_210616.log, and I need to tail the file in fluent-bit. I tried this:
[INPUT]
    Name tail
    Path /var/log/my-service/my_log_%y%m%d.log
[OUTPUT]
    Name stdout
    Match *
but it doesn't watch the file. I replaced my_log_%y%m%d.log with my_log_210616.log, then it works.
How can I use strftime in the path?
One solution is to use a path that matches any date. Since fluent-bit will read the log files from their tail you won’t get data from the older files.
You could also add 'Ignore_Older 24h' to the input config. This will ignore files with modified times older than 24 hours. Using 'Ignore_Older' with a parser that extracts the event time works even better.
You could also do more elaborate filtering by file name in a lua filter.
I have a Python3 script that reads the first eight characters of every filename in a directory in order to determine whether the file was created before or after 180 days ago based on each file's name. The file names all begin with YYYYMMDD or eerasedd_YYYYMMDD_etc.xls. I can collect all these filenames already.
I need to tell my script to ignore any filename that does not conform to the standard eight leading numerical characters, example: 20180922 or eerasedd_20171207_1oIkZf.so.
if name.startswith('eerasedd_'):
    fileDate = datetime.strptime(name[9:17], DATEFMT).date()
else:
    fileDate = datetime.strptime(name[0:8], DATEFMT).date()
I need logic to prevent the script from choking on files that don't fit the desired pattern. The script needs to carry on with its work and forget about non-conformant filenames. Do I need to add code that causes an exception or just add an elif block?
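One possible approach, sketched below, is to do both: check the shape of the name first, then catch the `ValueError` that `strptime` raises when eight digits are not a real calendar date. This is only a sketch. It assumes `DATEFMT` is `"%Y%m%d"` (the question does not show its value), and the names `NAME_RE` and `file_date` are made up for illustration.

```python
import re
from datetime import datetime

DATEFMT = "%Y%m%d"  # assumption: the question does not show this value

# Optional "eerasedd_" prefix, then exactly eight digits not followed by a ninth.
NAME_RE = re.compile(r"^(?:eerasedd_)?(\d{8})(?!\d)")

def file_date(name):
    """Return the date encoded in the file name, or None if it does not conform."""
    match = NAME_RE.match(name)
    if match is None:
        return None  # wrong shape: skip this file
    try:
        return datetime.strptime(match.group(1), DATEFMT).date()
    except ValueError:
        return None  # eight digits, but not a valid calendar date
```

The caller can then skip anything that comes back as `None` and carry on with the remaining files.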
I have a function to get only the names of those files I need based on their extensions.
from pathlib import Path

def get_files(extensions):
    all_files = []
    for ext in extensions:
        all_files.extend(Path('/Users/mrh/Python/calls').glob(ext))
    return all_files

for file in get_files(('*.wav', '*.xml')):
    print(file.name)
Now I need to figure out how to check each 'file.name' for the date string in its filename. i.e. now I need to run something like
if name.startswith('eerasedd_'):
    fileDate = datetime.strptime(name[9:17], DATEFMT).date()
else:
    fileDate = datetime.strptime(name[0:8], DATEFMT).date()
against 'file.name' to see whether the files are 180 days old or less.
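A minimal sketch of that age check, written so that it takes a plain name string such as `file.name`. The helper name `is_within_days` and the `"%Y%m%d"` format are assumptions, and passing `today` in explicitly is just a choice that makes the function easy to test.

```python
import re
from datetime import date, datetime, timedelta

def is_within_days(name, today, days=180):
    """True if the date in the file name is no more than `days` before `today`."""
    match = re.match(r"(?:eerasedd_)?(\d{8})", name)
    if not match:
        return False  # no leading date: treat as not matching
    try:
        file_date = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return False  # digits that are not a real date
    return today - file_date <= timedelta(days=days)
```

Inside the loop over `get_files(...)`, calling `is_within_days(file.name, date.today())` would pick out the files that are 180 days old or less. Note that a name dated in the future also counts as recent in this sketch.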
I have a requirement in which I need to subtract x number of days from dates present in a delimited file if the date exists excluding the first and last row. If the date does not exist in the specified field, ignore the same.
For example, aaa.txt contains
header
abc|20160431|dhadjs|20160325|hjkkj|kllls
ddd||dhajded|20160320|dwdas|hfehf
footer
I want the modified file to have 10 days subtracted from each date, something like this:
header
abc|20160421|dhadjs|20160315|hjkkj|kllls
ddd||dhajded|20160310|dwdas|hfehf
footer
I don't want to use a programming language like Java to read the file but rather use a scripting language on unix. Any suggestions on how this can be done?
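If Python counts as an acceptable scripting language on the box, the logic could be sketched as follows. The function name `shift_dates` and its parameters are made up for illustration, and the date columns are assumed to be the second and fourth fields (0-based indexes 1 and 3), as in the sample. Real calendar arithmetic is used so that month and year boundaries are handled.

```python
from datetime import datetime, timedelta

def shift_dates(lines, days=10, date_fields=(1, 3), sep="|"):
    """Subtract `days` from YYYYMMDD values in the given 0-based fields.

    The first and last lines (header and footer) pass through unchanged.
    """
    out = []
    last = len(lines) - 1
    for i, line in enumerate(lines):
        if i == 0 or i == last:
            out.append(line)
            continue
        fields = line.split(sep)
        for idx in date_fields:
            if idx < len(fields) and fields[idx]:
                try:
                    parsed = datetime.strptime(fields[idx], "%Y%m%d")
                except ValueError:
                    continue  # not a valid calendar date: leave it untouched
                fields[idx] = (parsed - timedelta(days=days)).strftime("%Y%m%d")
        out.append(sep.join(fields))
    return out
```

One caveat about the sample: 20160431 is not a valid calendar date (April has 30 days), so this sketch leaves that field unchanged instead of producing 20160421. Empty fields are skipped, which covers the "if the date exists" requirement.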
I want to create a file with system date in another directory and copy data difference of two files into it.
NOW=$(date +"%H_%D")
file="log_$NOW.txt"
diff tmp1.txt tmp2.txt > $temp/log_$NOW.txt
I am using the code above, but the file is not generated. If I create the file with a simple name, i.e. without using $NOW, it is generated. Please help me.
The format string passed to date produces something like 16_12/03/13. This contains directory separators, so the name is treated as a path into directories that do not exist and the file cannot be created. Instead, use dots to separate the date:
NOW=$(date +"%H_%m.%d.%y")
which should produce strings like 16_12.03.13
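The difference between the two formats can be seen in a small Python sketch. `%D` is shorthand for `%m/%d/%y`, which is written out here; the helper name `log_name` and the fixed example time are assumptions chosen to match the strings above.

```python
from datetime import datetime

def log_name(now):
    """Build a log file name with no '/' in it."""
    return "log_" + now.strftime("%H_%m.%d.%y") + ".txt"

when = datetime(2013, 12, 3, 16, 0)

# What %H_%D expands to: the slashes would be read as directory separators.
print(when.strftime("%H_%m/%d/%y"))  # 16_12/03/13

# The dotted variant is a single path component.
print(log_name(when))  # log_16_12.03.13.txt
```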
I want to generate a unique sequence number for each row in a file in unix. I cannot use an identity column in the database because other sources also insert data into it. I tried using NR in awk, but since my script has filters it may skip rows in the file, so I may not get sequential numbers.
My requirements are: the sequence number needs to be persistent, since I receive this file every day and numbering should start from where I left off. The number also needs to be preceded by "EMP_" for each line in the file.
Please suggest.
Thanks in advance.
To obtain a unique ID in UNIX you can store the value in a file and read it back. However, this method is tedious and requires a file locking mechanism. The easiest way is to use the date and time as the ID, for example:
#!/bin/sh
uniqueVal=`date '+%Y%m%d%H%M%S'`
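Keep in mind that a timestamp with one-second resolution repeats for every row produced within the same second, so on its own it does not give one distinct value per row. For the counter-file approach mentioned first, here is a minimal Python sketch. The function name `next_ids` and the counter file are made up for illustration, and no locking is done, so it assumes only one process runs at a time.

```python
def next_ids(counter_path, count, prefix="EMP_"):
    """Return `count` new IDs, continuing from the number saved in counter_path.

    Assumption: a single process at a time; there is no file locking here.
    """
    try:
        with open(counter_path) as fh:
            last = int(fh.read().strip() or 0)
    except FileNotFoundError:
        last = 0  # first run: start numbering at 1

    ids = [f"{prefix}{n}" for n in range(last + 1, last + count + 1)]

    # Persist the last number used so the next run continues from it.
    with open(counter_path, "w") as fh:
        fh.write(str(last + count))
    return ids
```

Calling it with the number of rows that survive the filters gives consecutive IDs with no gaps, and the next day's run picks up from the stored value.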