Stat converted into printf of `find` command - unix

What would be the find ... printf conversion for the following timestamps returned from the stat command?
$ stat myfile
[...]
Access: 2019-03-13 01:00:52.918888000 -0700
Modify: 2019-03-13 01:00:52.918888000 -0700
Change: 2020-12-02 00:31:25.958001000 -0800
Birth: -
For example, how could I grab those four values in doing something similar to:
find . -printf "%p,%s,Access?,Modify?,Change?,Birth?\n"
Finally, what is the use (and meaning) of Birth and how could that be empty ?

Related

How to interprete an iso8601 string with daylight saving info?

The following two iso8601 strings both mean the beginning of the respective day. But since EST has been changed to EDT between the two dates, the second time string is interpreted as 1am instead of 0am. Is there a way to let jq be aware of the daylight saving when processing an iso8601 string?
$ jq fromdateiso8601 <<< '"2022-03-11T05:00:00Z"'
1646974800
$ jq fromdateiso8601 <<< '"2022-03-14T04:00:00Z"'
1647234000
$ TZ=America/New_York date --date=#1646974800
Fri Mar 11 00:00:00 EST 2022
$ TZ=America/New_York date --date=#1647234000
Mon Mar 14 01:00:00 EDT 2022
$ jq --version
jq-1.6
I run jq on macOSX.
EDIT:
$ jq -r todateiso8601 <<<'1647230400'
2022-03-14T04:00:00Z
$ date -d '2022-03-14T04:00:00Z' '+%s'
1647230400
$ date -d #1647230400 --iso=seconds --utc
2022-03-14T04:00:00+00:00
2022-03-14T04:00:00Z is 1647230400, not 1647234000.
I get the correct result with jq 1.6 on Ubuntu:
$ jq fromdateiso8601 <<<'"2022-03-14T04:00:00Z"'
1647230400
I get the same incorrect result with jq 1.6 on Cygwin:
$ jq fromdateiso8601 <<<'"2022-03-14T04:00:00Z"'
1647234000
(fromdateiso8601 is simply not supported on Windows.)
There's a bug somewhere, but I don't know where.

How to convert datestring to local time?

2021-08-04T22:55:12+0000
I want to convert the above string to local time.
But fromdateiso8601 does not work on this format. What is the best way to convert this kind of string to local time?
EDIT: I tried the following but the part of 23:55:12 does not change when the timezone is changed. I'd expect that this should be changed as TZ is changed.
$ TZ=America/New_York jq '.x | gsub("[+]0000"; "Z") | fromdateiso8601| gmtime | strftime("%Y-%m-%dT%H:%M:%S%Z")' <<< '{"x": "2021-08-04T22:55:12+0000" }'
"2021-08-04T23:55:12EST"
$ TZ=Europe/Madrid jq '.x | gsub("[+]0000"; "Z") | fromdateiso8601| gmtime | strftime("%Y-%m-%dT%H:%M:%S%Z")' <<< '{"x": "2021-08-04T22:55:12+0000" }'
"2021-08-04T23:55:12CET"
The first problem here is that jq's handling of timezones (TZ) is buggy; the second is that jq's built-ins do not recognize timezone offsets.
Unfortunately, there is to my knowledge not much that can easily be
done about jq's TZ-related bugginess other than using gojq, the Go
implementation of jq, instead.
Fortunately, time offsets can be handled quite easily, e.g. using the datetime_to_seconds filter as defined below.
So, for the time being, the following solution to the stated problem assumes the use of gojq rather than stedolan/jq. It has two main steps:
Use the generic filter datetime_to_seconds to convert the timestamp with an offset to "seconds since the epoch";
Use strflocaltime, which recognizes the environment variable TZ.
The solution is embedded in a bash script to facilitate a comparison between different versions of jq and gojq.
bash script
#!/bin/bash
# Syntax: go TZ
function go {
TZ="$1" $jq -Rr '
# Convert a timestamp with a possibly empty timezone offset to seconds since the Epoch.
# Input should be a string of the form yyyy-mm-ddThh:mm:ss or yyyy-mm-ddThh:mm:ss<OFFSET>
# where <OFFSET> is Z, or has the form [-+]hh:mm or [-+]hhmm
# If no timezone offset is explicitly given, it is taken to be Z.
def datetime_to_seconds:
if test("[-+]")
then
sub("(?<s>[-+])(?<d1>[0-9]{2})(?<d2>[0-9]{2})$"; "\(.s)\(.d1):\(.d2)")
| capture("(?<datetime>^.*T[0-9:]+)(?<s>[-+])(?<hh>[0-9]+):?(?<mm>[0-9]*)")
| (.datetime +"Z" | fromdateiso8601) as $seconds
| (if .s == "+" then -1 else 1 end) as $plusminus
| (.mm | if . == "" then 0 else . end) as $mm
| ([.hh,$mm] | map(tonumber) |.[0] *= 60 | add * 60 * $plusminus) as $offset
| ($seconds + $offset)
else . + (if test("Z") then "" else "Z" end) | fromdateiso8601
end;
datetime_to_seconds
| strflocaltime("%Y-%m-%dT%H:%M:%S %Z")
'
}
for jq in jq-1.6 jqMaster gojq ; do
echo $jq is $($jq --version)
done
echo
for TZ in America/New_York Europe/Madrid ;do
for jq in jq-1.6 jqMaster gojq ; do
for time in 2021-08-04T22:55:12+0000 ; do
echo $jq $TZ $time
echo $time | go $TZ
echo
done
done
done
Output
Here is the output with some "#" annotations.
jq-1.6 is jq-master-2e01ff1
jqMaster is jq-1.6-129-g80052e5-dirty
gojq is gojq 0.12.4 (rev: 244f9f7/go1.16.4)
jq-1.6 America/New_York 2021-08-04T22:55:12+0000
2021-08-04T19:55:12 EST # wrong
jqMaster America/New_York 2021-08-04T22:55:12+0000
2021-08-04T18:55:12 EST # wrong
gojq America/New_York 2021-08-04T22:55:12+0000
2021-08-04T18:55:12 EDT # correct
jq-1.6 Europe/Madrid 2021-08-04T22:55:12+0000
2021-08-05T01:55:12 CET # wrong
jqMaster Europe/Madrid 2021-08-04T22:55:12+0000
2021-08-05T00:55:12 CET # wrong
gojq Europe/Madrid 2021-08-04T22:55:12+0000
2021-08-05T00:55:12 CEST # correct

Snakemake: MissingInputException in snakemake pipeline

I'm trying a SnakeMake pipeline and I'm stucked on an error I really don't understand.
I've got a directory (raw_data) in which I have the input files :
ll /home/nico/labo/etudes/Optimal/data/raw_data
total 41M
drwxrwxr-x 2 nico nico 4,0K mars 6 16:09 ./
drwxrwxr-x 11 nico nico 4,0K mars 6 16:14 ../
-rw-rw-r-- 1 nico nico 15M févr. 27 12:21 sampleA_R1.fastq.gz
-rw-rw-r-- 1 nico nico 19M févr. 27 12:22 sampleA_R2.fastq.gz
-rw-rw-r-- 1 nico nico 3,4M févr. 27 12:21 sampleB_R1.fastq.gz
-rw-rw-r-- 1 nico nico 4,3M févr. 27 12:22 sampleB_R2.fastq.gz
This directory contains 4 files for 2 samples.
I created a config json file for the SnakeMake pipeline named config_snakemake_Optimal_mapping_BaL.json:
{
"fastqExtension": "fastq.gz",
"fastqDir": "/home/nico/labo/etudes/Optimal/data/raw_data",
"outputDir": "/home/nico/labo/etudes/Optimal/data/mapping_BaL",
"logDir": "logs",
"reference": {
"fasta": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta",
"index": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta.bwt"
}
}
And finally the SnakeMake file snakefile_bwa_samtools.py:
import subprocess
from os.path import join
### Globals ---------------------------------------------------------------------
# A Snakemake regular expression matching fastq files.
SAMPLES, = glob_wildcards(join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]))
print(SAMPLES)
### Rules -----------------------------------------------------------------------
# Pipeline output files
rule all:
input: expand(join(config["outputDir"], "{sample}.bam.bai"), sample=SAMPLES)
# Reads alignment on reference genome and BAM file creation
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1_ID = "{sample}_R1."+config["fastqExtension"],
fq2_ID = "{sample}_R2."+config["fastqExtension"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1_ID} and {input.fq2_ID} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
# Sorting the BAM files on genomic positions
rule bam_sort:
input:
join(config["outputDir"], "{sample}.bamUnsorted")
output:
join(config["outputDir"], "{sample}.bam")
log:
join(config["outputDir"], config["logDir"], "{sample}.samtools_sort.log")
version:
subprocess.getoutput(
"samtools --version | "
"head -1 | "
"cut -d' ' -f2"
)
message:
"Genomic sorting of {input} with samtools version {version}."
shell:
"samtools sort -f {input} {output} 2> {log}"
# Indexing the BAM files
rule bam_index:
input:
join(config["outputDir"], "{sample}.bam")
output:
join(config["outputDir"], "{sample}.bam.bai")
message:
"Indexing {input}."
shell:
"samtools index {input}"
I run this pipeline:
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
and I've got the following error outputs:
['sampleB', 'sampleA']
MissingInputException in line 18 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
Missing input files for rule bwa_mem_to_bam:
sampleB_R1.fastq.gz
sampleB_R2.fastq.gz
or depending the moment:
['sampleB', 'sampleA']
PeriodicWildcardError in line 40 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
The value _unsorted in wildcard sample is periodically repeated (sampleB_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted). This would lead to an infinite recursion. To avoid this, e.g. restrict the wildcards in this rule to certain values.
The samples are correctly detected as they appear in the list (first line of kind of outputs) and I'm surely messing around with the wildcards in the rule bwa_mem_to_bam, but I really don't get why..
Any clue?
I quickly looked your code.
Why didn't the first one work out?
Look when you declare fq1_ID and fq1, same for sample 2. You didn't assign the same string. For fq1 you add a repertory for the file witch is not present for fq1_ID so snakemake is searching it in the workdir (current directory if -d option is not set) a file name with your string. Beacuse these variables are in input section.
So by removing the two fq1/2_ID, it will erase all files searching problems.
Hugo
Finally, I succed with the pipeline removing the fq1_ID and fq2_ID variables in the rule bwa_mem_to_bam and replacing in the message of the rule input.fq1_ID and input.fq2_ID by input.fq1 and input.fq2.
The message is less elegant, but the pipeline is running correctly. Still doesn't understand exactly where was the mistake, if someone can explain, I'm still listening!
The correct code for rule bwa_mem_to_bam:
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1} and {input.fq2} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
Thanks Hugo for checking my code and your explanation, it makes sense!
I finally get a flash idea waking up this morning (the best ones), and realized that I neglected the params part of the rule, fq1_ID and fq2_ID are not inputs but params..
I changed the code to that:
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1 = join(config["fastqDir"], "{sample}_R1.fastq.gz"),
fq2 = join(config["fastqDir"], "{sample}_R2.fastq.gz")
output:
temp(join(config["outputDir"],"{sample}_unsorted.bam"))
params:
fq1_ID = "{sample}_R1.fastq.gz",
fq2_ID = "{sample}_R2.fastq.gz",
ref_ID = os.path.basename(config["reference"]["fasta"])
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {params.fq1_ID} and {params.fq2_ID} on {params.ref_ID} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
And it works just fine!
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
Provided cores: 3
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
2 bam_index
2 bam_sort
2 bwa_mem_to_bam
7
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta with BWA version 0.7.12.
1 of 7 steps (14%) done
Genomic sorting of sampleB_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB_unsorted.bam.
2 of 7 steps (29%) done
Indexing sampleB.bam.
3 of 7 steps (43%) done
4 of 7 steps (57%) done
Genomic sorting of sampleA_unsorted.bam with samtools version 1.2.
Removing temporary output file /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA_unsorted.bam.
5 of 7 steps (71%) done
Indexing sampleA.bam.
6 of 7 steps (86%) done
localrule all:
input: /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleB.bam.bai, /home/nico/labo/etudes/Optimal/data/mapping_BaL/sampleA.bam.bai
7 of 7 steps (100%) done
And finally get my correct messages:
Alignment of sampleB_R1.fastq.gz and sampleB_R2.fastq.gz on
BaL_AY713409.fasta with BWA version 0.7.12.
Alignment of sampleA_R1.fastq.gz and sampleA_R2.fastq.gz on BaL_AY713409.fasta
with BWA version 0.7.12.

Reading file mtime in UNIX in particular format (2013-06-25 09:04:32)

hi i have this file on my UNIX box (SunOS 5.10).
-rwxr-xr-x 1 phnxep siebel 917 Feb 1 02:52 crontest.sh
Here date and time are given like Feb1 02:52.Can i just read these values in UNIX for my file ignoring the rest of the details.In the required format-:
2017-02-1 02:52
And convert them to integer values later on???Please help guys i am really stuck on this.
If you are using gnu ls, try
ls -l --time-style=full-iso
E.g.:
$ touch x
$ ls -l --time-style=full-iso x
-rw-r--r-- 1 max max 0 2017-02-06 13:18:56.498920000 +0000 x
If you are using Sun ls, try -E option, e.g. ls -E.

How can I find the current date minus seven days in Unix?

I am trying to find the date that was seven days before today.
CURRENT_DT=`date +"%F %T"`
diff=$CURRENT_DT-7
echo $diff
I am trying stuff like the above to find the 7 days less than from current date. Could anyone help me out please?
GNU date will to the math for you:
date --date "7 days ago"
Other version will require you to covert the current date into seconds since the UNIX epoch first, manually subtract 7 days' worth of seconds, and convert that back into the desired form. Consult the documentation for your version of date for details on how to convert to and from Unix timestamps. Here's an example using GNU date again:
x=$(date +%s)
x=$((x - 7 * 24 * 60 * 60))
date --date #$x
Here is a simple Perl script which (unlike the other examples) works with Unix:
perl -e 'use POSIX qw(ctime); printf "%s", ctime(time - (7 * 24 * 60 * 60));'
(Tested with Solaris 10, and a token Linux system, of course - with the caveat that Perl is not necessarily part of one's configuration, merely very likely).
Adding this one for shells on OSX:
date -v-7d
> Tue Apr 3 15:16:31 EDT 2018
date
> Tue Apr 10 15:16:33 EDT 2018
Need that formated?
date -v-7d +%Y-%m-%d
> 2018-04-03
Ksh's printf can do time calculation:
$ printf '%(%Y-%m-%d)T\n'
2015-04-07
$ printf '%(%Y-%m-%d)T\n' '7 days ago'
2015-03-31
$
I haven't used unix in a while but I found this in one of my scripts
echo `date +%s`-604800 | bc
DATE=$(date --date "7 days ago" | awk '{print$1,$2,$3}')
echo "$DATE"
if [ -z "$(grep -i "$DATE" test.log)" ]; then
exit 1
fi
sed -i "1,/$DATE/d" test.log

Resources