os.walk: ignore directories and their contents

I'm trying to ignore some directories, and the files in them, under a specific path. This is my code:
x = open(wbCMD, 'a')
x.write('set path="C:\Program Files\WinRAR\";%path% c:/Program Files/WinRAR/\n')
x.write('Rar.exe a -r "Backup.rar" -m5 -ep1')
chkdict = {}
setdef = chkdict.setdefault
for root, dirs, files in os.walk(foldername):
    if ignoreddirs in dirs:
        continue
    for file in files:
        ext = path.splitext(file)[1]
        if ext in ignored:
            continue
        if not ext in chkdict:
            print("%s" % setdef(ext,ext))
            x.write(" *%s" % setdef(ext,ext))
x.write(" *makefile *Depend *readme\npause")
x.close()
del chkdict
The ignoreddirs list looks like this:
ignoreddirs = ["bin"]

dirs and ignoreddirs are both lists of strings, so dirs never contains ignoreddirs itself. It may, however, contain some of its elements. One way to check this is to test their intersection:
if len(set(ignoreddirs).intersection(set(dirs))) > 0:
    continue
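One caveat with the check above: `continue` only skips the files of the directory currently being visited, and os.walk will still descend into the ignored subdirectory on a later iteration. In top-down mode, os.walk lets the caller prune the walk by editing `dirs` in place. A minimal sketch of that idea follows; the function name `collect_extensions` and its parameters are hypothetical and not part of the original code.

```python
import os


def collect_extensions(top, ignored_dirs=("bin",), ignored_exts=()):
    """Return the sorted file extensions under `top`, skipping ignored directories."""
    found = set()
    for root, dirs, files in os.walk(top):
        # Prune in place: top-down os.walk will not descend into removed names.
        dirs[:] = [d for d in dirs if d not in ignored_dirs]
        for name in files:
            ext = os.path.splitext(name)[1]
            # Skip files without an extension and explicitly ignored extensions.
            if ext and ext not in ignored_exts:
                found.add(ext)
    return sorted(found)
```

The slice assignment `dirs[:] = ...` matters here: rebinding `dirs` to a new list would leave the list that os.walk holds untouched.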

Related

snakemake Wildcards in input files cannot be determined from output files:

I use Snakemake to create a pipeline that splits a BAM file by chromosome, but I ran into this problem:
Wildcards in input files cannot be determined from output files:
'OutputDir'
Can someone help me figure it out?
if config['ref'] == 'hg38':
    ref_chr = []
    for i in range(1,23):
        ref_chr.append('chr'+str(i))
    ref_chr.extend(['chrX','chrY'])
elif config['ref'] == 'b37':
    ref_chr = []
    for i in range(1,23):
        ref_chr.append(str(i))
    ref_chr.extend(['X','Y'])
rule all:
    input:
        expand(f"{OutputDir}/split/{name}.{{chr}}.bam",chr=ref_chr)
rule minimap2:
    input:
        TargetFastq
    output:
        Sortbam = "{OutputDir}/{name}.sorted.bam",
        Sortbai = "{OutputDir}/{name}.sorted.bam.bai"
    resources:
        mem_mb = 40000
    threads: nt
    singularity:
        OntSoftware
    shell:
        """
        minimap2 -ax map-ont -d {ref_mmi} --MD -t {nt} {ref_fasta} {input} | samtools sort -O BAM -o {output.Sortbam}
        samtools index {output.Sortbam}
        """
rule split_bam:
    input:
        rules.minimap2.output.Sortbam
    output:
        splitBam = expand(f"{OutputDir}/split/{name}.{{chr}}.bam",chr=ref_chr),
        splitBamBai = expand(f"{OutputDir}/split/{name}.{{chr}}.bam.bai",chr=ref_chr)
    resources:
        mem_mb = 30000
    threads: nt
    singularity:
        OntSoftware
    shell:
        """
        samtools view -# {nt} -b {input} {chr} > {output.splitBam}
        samtools index -# {nt} {output.splitBam}
        """
I changed the wildcard {OutputDir}, but it does not help.
expand(f"{OutputDir}/split/{name}.{{chr}}.bam",chr=ref_chr),
splitBamBai = expand(f"{OutputDir}/split/{name}.{{chr}}.bam.bai",chr=ref_chr),
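A couple of comments on these lines:
In a plain string passed to expand, double braces such as {{chr}} are an escape that tells expand to leave chr alone. Your strings are f-strings, though, so Python resolves {OutputDir} and {name} from variables first and reduces {{chr}} to {chr}, which expand then fills in. As a result, the output of split_bam contains no wildcards at all, which I doubt is what you want. I suspect you want something like:
expand("{{OutputDir}}/split/{{name}}.{chr}.bam",chr=ref_chr),
The input of split_bam, rules.minimap2.output.Sortbam, still contains the wildcards {OutputDir} and {name}, and Snakemake cannot infer them from an output that has no wildcards, hence the error you get.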
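The brace handling in these strings can be checked in plain Python, without Snakemake. The sketch below uses str.format as a stand-in for expand (an assumption made purely for illustration; it is not Snakemake's implementation), and the values of `OutputDir` and `name` are hypothetical.

```python
OutputDir = "results"   # hypothetical value, for illustration only
name = "sample1"        # hypothetical value, for illustration only

# With the f prefix, Python fills {OutputDir} and {name} immediately and
# turns the doubled braces {{chr}} into a single-braced {chr} placeholder.
pattern = f"{OutputDir}/split/{name}.{{chr}}.bam"
print(pattern)

# str.format stands in for expand here: it fills the remaining placeholder.
paths = [pattern.format(chr=c) for c in ("chr1", "chrX")]
print(paths)

# Without the f prefix, doubled braces survive one round of formatting as
# single braces, so OutputDir and name remain available as wildcards.
template = "{{OutputDir}}/split/{{name}}.{chr}.bam".format(chr="chr1")
print(template)
```

The first pattern ends up with no braces left after formatting, while the last one still carries {OutputDir} and {name}, which is the form a rule output needs if its input refers to those wildcards.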
As an aside, when you create a BAM file and its index in the same rule, the index file's timestamp can end up older than that of the BAM file itself. This can later trigger spurious warnings from samtools/bcftools. See https://github.com/snakemake/snakemake/issues/1378 (not sure if it's been fixed).
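For the timestamp issue mentioned in the aside, one way to inspect and work around it outside of Snakemake is to compare modification times and refresh the index's timestamp. This is only a sketch under the assumption that touching the index is acceptable in your workflow; the helper names `index_is_stale` and `refresh_index_timestamp` are hypothetical.

```python
import os


def index_is_stale(bam_path, index_path):
    """Return True when the index file is older than the BAM file."""
    return os.path.getmtime(index_path) < os.path.getmtime(bam_path)


def refresh_index_timestamp(index_path):
    """Set the index file's access and modification times to the current time."""
    os.utime(index_path, None)
```

Calling `refresh_index_timestamp` has the same effect on the modification time as running `touch` on the index file.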

Find a specific file in a directory and delete directory

I'm new to Python and trying things out. Is it possible to walk through a directory tree searching for a specific file name and, after finding that file, delete the whole folder where it was found?
The following works, but it only deletes the file; I want to delete the whole folder when addon.sxm is found.
if os.path.exists(Addons):
    for root, dirs, files in os.walk(Addons):
        package_count = 0
        package_count += len(files)
        if package_count > 0:
            for f in files:
                if fnmatch.fnmatch(f, 'addon.sxm'):
                    try:
                        os.remove(os.path.join(root, f))
                    except:
                        pass
        else:
            pass
Instead of os.remove(os.path.join(root, f)), use shutil.rmtree(root); it removes the directory in which the file is located.
import os
import fnmatch
import shutil
Addons = "/path/to/my/folder/"
if os.path.exists(Addons):
    for root, dirs, files in os.walk(Addons):
        for f in files:
            if fnmatch.fnmatch(f, 'addon.sxm'):
                try:
                    # Remove the whole directory that contains the match.
                    shutil.rmtree(root)
                except OSError:
                    pass
                # The directory is gone: skip its subdirectories and remaining files.
                dirs[:] = []
                break
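Because deleting directories while os.walk is still iterating over them is easy to get wrong, an alternative is to collect the matching directories first and delete them afterwards. The following is a minimal sketch of that variant; the function name `remove_dirs_containing` and the `dry_run` parameter are hypothetical additions for illustration.

```python
import os
import shutil


def remove_dirs_containing(top, filename="addon.sxm", dry_run=False):
    """Delete every directory under `top` that directly contains `filename`.

    Returns the list of matched directories. With dry_run=True nothing is deleted.
    """
    matched = []
    for root, dirs, files in os.walk(top):
        if filename in files:
            matched.append(root)
            # The whole subtree is going away, so do not descend into it.
            dirs[:] = []
    if not dry_run:
        for path in matched:
            shutil.rmtree(path, ignore_errors=True)
    return matched
```

Running it once with `dry_run=True` shows what would be removed before anything is deleted. Note that if `top` itself contains the file, `top` is removed as well.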

qmake and generated qm files

What is the best (proper) way to organize compiled translations (*.qm) into resources?
The *.qm files are referenced in the qrc file and generated by two (three) extra targets this way:
trans_update.commands = lupdate $$_PRO_FILE_
trans_update.depends = $$_PRO_FILE_
trans_release.commands = lrelease $$_PRO_FILE_
trans_release.depends = trans_update $$TRANSLATIONS
translate.depends = trans_release
QMAKE_EXTRA_TARGETS += trans_update trans_release translate deploy
CONFIG(release, debug|release) {
    DESTDIR=release
    PRE_TARGETDEPS += translate
}
The problem is that when qmake runs for the first time, no qm files have been generated yet, and make prints errors like:
RCC: Error in 'qml.qrc': Cannot find file ...
I don't like the idea of committing compiled qm files to the VCS.
Is there a way to organize it nicely?
I'd like to point out a solution that I use in some projects. It may be far from perfect, but it works out nicely.
CONFIG(release, debug|release) {
    TRANSLATION_TARGET_DIR = $${OUT_PWD}/release/translations
    LANGUPD_OPTIONS = -locations relative -no-ui-lines
    LANGREL_OPTIONS = -compress -nounfinished -removeidentical
} else {
    TRANSLATION_TARGET_DIR = $${OUT_PWD}/debug/translations
    LANGUPD_OPTIONS =
    LANGREL_OPTIONS = -markuntranslated "MISS_TR "
}
isEmpty(QMAKE_LUPDATE) {
    win32:LANGUPD = $$[QT_INSTALL_BINS]\lupdate.exe
    else:LANGUPD = $$[QT_INSTALL_BINS]/lupdate
}
isEmpty(QMAKE_LRELEASE) {
    win32:LANGREL = $$[QT_INSTALL_BINS]\lrelease.exe
    else:LANGREL = $$[QT_INSTALL_BINS]/lrelease
}
langupd.commands = \
    $$LANGUPD $$LANGUPD_OPTIONS $$shell_path($$_PRO_FILE_) -ts $$_PRO_FILE_PWD_/$$TRANSLATIONS
langrel.depends = langupd
langrel.input = TRANSLATIONS
langrel.output = $$TRANSLATION_TARGET_DIR/${QMAKE_FILE_BASE}.qm
langrel.commands = \
    $$LANGREL $$LANGREL_OPTIONS ${QMAKE_FILE_IN} -qm $$TRANSLATION_TARGET_DIR/${QMAKE_FILE_BASE}.qm
langrel.CONFIG += no_link
QMAKE_EXTRA_TARGETS += langupd
QMAKE_EXTRA_COMPILERS += langrel
PRE_TARGETDEPS += langupd compiler_langrel_make_all
A sensible tweak to the lupdate options might be needed, because the various builds (release and debug) generate different *.ts files, which then show up as changes in the VCS.
I'd also like to point the interested reader to an example where experts use it.
The recommended way, which may not have been available when this question was originally asked, is to use
TRANSLATIONS += <your *.ts files>
CONFIG += lrelease embed_translations
If you really need/want to build the qm files separately, I'd point to what qmake does with the above config and adapt it according to your needs. See https://github.com/qt/qtbase/blob/5.15.2/mkspecs/features/lrelease.prf
(Basically, it creates and adds a list of resources to RESOURCES).

How to merge multiple files from multiple directories/folders

I have 300 directories, and each directory contains a single two-column file (xxx.gz). I want to merge the files from all directories into a single file. The first column of every file is an identifier (ID), which is the same across files.
How can I merge all the files into a single file?
I also want the header of each column to be the name of the file in the respective directory.
Directory names look like 68a7eb0a-123, b5694957-764, etc., and file names look like a5c403c2, 292c4a2f, etc.
The directory name and the name of the file inside it are not the same; I want the file name as the header.
All directories:
ls
6809b1c3-75a5
68e9b641-0cc9
71ae07b8-8bde
b7815cd2-1e69
..
..
Each directory contains a single file:
cd 6809b1c3-75a5
ls bd21dc2e.txt.gz
Try this:
for i in * ; do for j in $i/*.gz ; do echo $j >> ../final.txt ; gunzip -c $j >> ../final.txt ; done ; done
Annotated version:
for i in *                          # for each directory under the current working directory
do                                  # (which must contain nothing else)
  for j in $i/*.gz                  # for each gzipped file in that directory
  do
    echo $j >> ../final.txt         # write the path/file name to the final file
    gunzip -c $j >> ../final.txt    # append the decompressed contents to the final file
  done
done
Result:
$ head -8 ../final.txt
6809b1c3-75a5/bd21dc2e.txt.gz
blabla
whatever
you
have
in
those
files
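The loop above concatenates the files one after another. If the goal is instead a single table with one ID column and one value column per file, headed by the file name, a join on the first column is needed. Below is a minimal Python sketch of that idea. It assumes whitespace-separated columns, no header line inside the input files, and unique file names; the function name `merge_columns`, the `ID` header, and the `NA` placeholder for missing IDs are all hypothetical choices.

```python
import glob
import gzip
import os


def merge_columns(top, out_path):
    """Join the second column of every */*.gz file under `top` on the first column."""
    columns = {}   # header (file name without extensions) -> {id: value}
    ids = []       # IDs in first-seen order
    seen = set()
    for path in sorted(glob.glob(os.path.join(top, "*", "*.gz"))):
        header = os.path.basename(path).split(".")[0]
        values = {}
        with gzip.open(path, "rt") as handle:
            for line in handle:
                parts = line.split()
                if len(parts) < 2:
                    continue  # skip blank or malformed lines
                values[parts[0]] = parts[1]
                if parts[0] not in seen:
                    seen.add(parts[0])
                    ids.append(parts[0])
        columns[header] = values
    with open(out_path, "w") as out:
        out.write("\t".join(["ID"] + list(columns)) + "\n")
        for key in ids:
            row = [columns[h].get(key, "NA") for h in columns]
            out.write("\t".join([key] + row) + "\n")
```

A call such as `merge_columns(".", "../final.txt")` from the directory that holds the 300 subdirectories would write a tab-separated table; the output path should lie outside the `*/*.gz` pattern so that it is not picked up as input.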

How to use awk for multiple file search in two directories, print records only from files with matching string in second directory

Remade a previous question so that it is more clear. I'm trying to search files in two directories and print matching character strings (+ line immediately following) into a new file from the second directory only if they match a record in the first directory. I have found similar examples but nothing quite the same. I don't know how to use awk for multiple files from different directories and I've tortured myself trying to figure it out.
Directory 1, 28,000 files, formatted viz.:
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
Directory 2, 15 files, formatted viz.:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Desired output:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Directories 1 and 2 are located in my home directory: (./Test1 & ./Test2)
If anyone could advise on the command to specify the different directories, I'd be immensely grateful! Currently, when I include the file path (e.g., /Test1/*.fa), I get the following error:
awk: can't open file /Test1/*.fa
You'll want something like this (untested):
awk '
FNR==1 {
    dirname = FILENAME
    sub("/.*","",dirname)
    if (NR==1) {
        dirname1 = dirname
    }
}
dirname == dirname1 {
    if (FNR % 2) {
        key = $0
    }
    else {
        map[key] = $0
    }
    next
}
(FNR % 2) && ($0 in map) && !seen[$0,map[$0]]++ {
    print $0 ORS map[$0]
}
' Test1/* Test2/*
Given that you're getting the error message /usr/bin/awk: Argument list too long, which means you're exceeding your shell's maximum argument length for a command, and that 28,000 of your files are in the Test1 directory, try this:
find Test1 -type f -exec cat {} \; |
awk '
NR == FNR {
    if (FNR % 2) {
        key = $0
    }
    else {
        map[key] = $0
    }
    next
}
(FNR % 2) && ($0 in map) && !seen[$0,map[$0]]++ {
    print $0 ORS map[$0]
}
' - Test2/*
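For comparison, the same two-pass idea (collect the headers from the first directory, then scan the second) can be sketched in Python, which sidesteps the shell's argument-length limit because the file lists are built inside the program. This sketch assumes that every file strictly alternates header and sequence lines, as in the samples; the function names `read_records` and `filter_matching` are hypothetical. It emits the sequence line from the second directory, which is what the desired output in the question shows.

```python
import glob
import os


def read_records(path):
    """Yield (header, sequence) pairs from a file of alternating lines."""
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    # Pair line 1 with line 2, line 3 with line 4, and so on.
    yield from zip(lines[0::2], lines[1::2])


def filter_matching(dir1, dir2):
    """Return the dir2 records whose header also occurs in some dir1 file."""
    wanted = set()
    for path in glob.glob(os.path.join(dir1, "*")):
        wanted.update(header for header, _ in read_records(path))
    result, seen = [], set()
    for path in sorted(glob.glob(os.path.join(dir2, "*"))):
        for record in read_records(path):
            # Keep each (header, sequence) pair only once.
            if record[0] in wanted and record not in seen:
                seen.add(record)
                result.append(record)
    return result
```

Writing the result to a new file is then a matter of looping over the returned pairs and printing the header followed by the sequence.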
Solution in TXR:
Data:
$ ls dir*
dir1:
file1 file2
dir2:
file1 file2
$ cat dir1/file1
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
$ cat dir1/file2
>XYZ
SDOIWEUROIUOIWUEROIWUEROIWUEROIWUEROUIEIDIDIIDFIFI
>MNO
OOIWEPOIUWERHJSDHSDFJSHDF
$ cat dir2/file1
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
$ cat dir2/file2
>STP
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
$
Run:
$ txr filter.txr dir1/* dir2/*
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
Code in filter.txr:
@(bind want @(hash :equal-based))
@(next :args)
@(all)
@dir/@(skip)
@(and)
@ (repeat :gap 0)
@dir/@file
@ (next `@dir/@file`)
@ (repeat)
>@key
@ (do (set [want key] t))
@ (end)
@ (end)
@(end)
@(repeat)
@path
@ (next path)
@ (repeat)
>@key
@datum
@ (require [want key])
@ (output)
>@key
@datum
@ (end)
@ (end)
@(end)
To separate the dir1 paths from the rest, we use an @(all) match (try multiple pattern branches, which must all match) with two branches. The first branch matches one @dir/@(skip) pattern, binding the variable dir to the text that precedes the slash and ignoring the rest. The second branch matches a whole consecutive sequence of @dir/@file patterns via @(repeat :gap 0). Because the same dir variable already has a binding from the first branch of the all, this constrains the matches to the same directory name. Inside this repeat we recurse into each file via next and gather the >-delimited keys into the want hash. After that, we process the remaining arguments as path names of files to process; they don't all have to be in the same directory. We scan through each one for the >@key pattern followed by a line of @datum. The @(require ...) directive fails the match if key is not in the want hash; otherwise we fall through to the @(output).
