Snakemake: Wildcards in input files cannot be determined from output files

I am using Snakemake to create a pipeline that splits a BAM file by chromosome, but I ran into this problem:
Wildcards in input files cannot be determined from output files:
'OutputDir'
Can someone help me figure it out?
if config['ref'] == 'hg38':
    ref_chr = []
    for i in range(1,23):
        ref_chr.append('chr'+str(i))
    ref_chr.extend(['chrX','chrY'])
elif config['ref'] == 'b37':
    ref_chr = []
    for i in range(1,23):
        ref_chr.append(str(i))
    ref_chr.extend(['X','Y'])

rule all:
    input:
        expand(f"{OutputDir}/split/{name}.{{chr}}.bam",chr=ref_chr)

rule minimap2:
    input:
        TargetFastq
    output:
        Sortbam = "{OutputDir}/{name}.sorted.bam",
        Sortbai = "{OutputDir}/{name}.sorted.bam.bai"
    resources:
        mem_mb = 40000
    threads: nt
    singularity:
        OntSoftware
    shell:
        """
        minimap2 -ax map-ont -d {ref_mmi} --MD -t {nt} {ref_fasta} {input} | samtools sort -O BAM -o {output.Sortbam}
        samtools index {output.Sortbam}
        """

rule split_bam:
    input:
        rules.minimap2.output.Sortbam
    output:
        splitBam = expand(f"{OutputDir}/split/{name}.{{chr}}.bam",chr=ref_chr),
        splitBamBai = expand(f"{OutputDir}/split/{name}.{{chr}}.bam.bai",chr=ref_chr)
    resources:
        mem_mb = 30000
    threads: nt
    singularity:
        OntSoftware
    shell:
        """
        samtools view -@ {nt} -b {input} {chr} > {output.splitBam}
        samtools index -@ {nt} {output.splitBam}
        """
I changed the wildcard {OutputDir}, but it does not help.

expand(f"{OutputDir}/split/{name}.{{chr}}.bam",chr=ref_chr),
splitBamBai = expand(f"{OutputDir}/split/{name}.{{chr}}.bam.bai",chr=ref_chr),
A couple of comments on these lines:
You escape chr by using double braces, {{chr}}. This means you don't want chr to be expanded, which I doubt is correct. I suspect you want something like:
expand("{{OutputDir}}/split/{{name}}.{chr}.bam", chr=ref_chr),
The rule minimap2 does not contain the {chr} wildcard, hence the error you get.
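For what it's worth, here is a minimal sketch of how split_bam could instead produce one output per {chr} wildcard, letting rule all drive one job per chromosome. This assumes OutputDir and name are plain Python variables, and that minimap2's output paths are likewise written as f-strings so OutputDir and name are not treated as wildcards:

rule split_bam:
    input:
        f"{OutputDir}/{name}.sorted.bam"
    output:
        splitBam = f"{OutputDir}/split/{name}.{{chr}}.bam",
        splitBamBai = f"{OutputDir}/split/{name}.{{chr}}.bam.bai"
    shell:
        """
        samtools view -b {input} {wildcards.chr} > {output.splitBam}
        samtools index {output.splitBam}
        """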
As an aside, when you create a bam file and its index in the same rule, the time stamp of the index file can end up older than the bam file itself. This can later generate spurious warnings from samtools/bcftools. See https://github.com/snakemake/snakemake/issues/1378 (not sure if it's been fixed).
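If that warning bites, one workaround (my own habit, not taken from the linked issue) is to touch the index as the last command of the rule, so its timestamp is guaranteed to be newer than the BAM:

samtools index {output.Sortbam}
touch {output.Sortbai}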


path not being detected by Nextflow

I'm new to nf-core/nextflow and, needless to say, the documentation does not always reflect what is actually implemented. I'm defining the basic pipeline below:
nextflow.enable.dsl=2

process RUNBLAST {
    input:
    val thr
    path query
    path db
    path output

    output:
    path output

    script:
    """
    blastn -query ${query} -db ${db} -out ${output} -num_threads ${thr}
    """
}

workflow {
    //println "I want to BLAST $params.query to $params.dbDir/$params.dbName using $params.threads CPUs and output it to $params.outdir"
    RUNBLAST(params.threads, params.query, params.dbDir, params.output)
}
Then I'm executing the pipeline with
nextflow run main.nf --query test2.fa --dbDir blast/blastDB
Then I get the following error:
N E X T F L O W ~ version 22.10.6
Launching `main.nf` [dreamy_hugle] DSL2 - revision: c388cf8f31
Error executing process > 'RUNBLAST'
Caused by:
Not a valid path value: 'test2.fa'
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
I know test2.fa exists in the current directory:
(nfcore) MN:nf-core-basicblast jraygozagaray$ ls
CHANGELOG.md conf other.nf
CITATIONS.md docs pyproject.toml
CODE_OF_CONDUCT.md lib subworkflows
LICENSE main.nf test.fa
README.md modules test2.fa
assets modules.json work
bin nextflow.config workflows
blast nextflow_schema.json
I also tried "file" instead of path, but that is deprecated and raises other kinds of errors.
It'll be helpful to know how to fix this to get myself started with the pipeline building process.
Shouldn't nextflow copy the file to the execution path?
Thanks
You get the above error because params.query is not actually a path value. It's probably just a simple String or GString. The solution is to instead supply a file object, for example:
workflow {
    query = file(params.query)
    BLAST( query, ... )
}
Note that a value channel is implicitly created by a process when it is invoked with a simple value, like the above file object. If you need to be able to BLAST multiple query files, you'll instead need a queue channel, which can be created using the fromPath factory method, for example:
params.query = "${baseDir}/data/*.fa"
params.db = "${baseDir}/blastdb/nt"
params.outdir = './results'

db_name = file(params.db).name
db_path = file(params.db).parent

process BLAST {
    publishDir(
        path: "${params.outdir}/blast",
        mode: 'copy',
    )

    input:
    tuple val(query_id), path(query)
    path db

    output:
    tuple val(query_id), path("${query_id}.out")

    """
    blastn \\
        -num_threads ${task.cpus} \\
        -query "${query}" \\
        -db "${db}/${db_name}" \\
        -out "${query_id}.out"
    """
}

workflow {
    Channel
        .fromPath( params.query )
        .map { file -> tuple(file.baseName, file) }
        .set { query_ch }

    BLAST( query_ch, db_path )
}
Note that the usual way to specify the number of threads/cpus is the cpus directive, which can be configured using a process selector in your nextflow.config. For example:
process {
    withName: BLAST {
        cpus = 4
    }
}
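With the queue-channel version above, the pipeline might then be invoked along these lines (the single quotes stop the shell from expanding the glob so Nextflow can; the paths are only illustrative):

nextflow run main.nf --query 'data/*.fa' --db blastdb/nt --outdir results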

In snakemake, how do you use wildcards with scatter-gather processes?

I am trying to use snakemake's scatter-gather functionality to parallelize a slow step in my workflow. However, I cannot figure out how to apply it in situations where I am using wildcards. For example, I have defined the wildcard library in rule all, but this does not seem to apply to the scatter function in ScatterIntervals:
import re

SCATTER_COUNT = 100

scattergather:
    split=SCATTER_COUNT

rule all:
    input:
        expand("{library}_output.txt", library=["FC19271512", "FC19271513"])

rule ScatterIntervals:
    input:
        "{library}_baits.interval_list"
    output:
        temp(scatter.split("tmp/{library}_baits.scatter_{scatteritem}.interval_list"))
    params:
        output_prefix = (
            lambda wildcards, output:
                re.sub(r"\.scatter_\d+\.interval_list", "", output[0])
        ),
        scatter_count = SCATTER_COUNT
    shell:
        """
        python ScatterIntervals.py \
            -i {input} \
            -o {params.output_prefix} \
            -s {params.scatter_count}
        """

rule ProcessIntervals:
    input:
        bam = "{library}.bam",
        baits = "tmp/{library}_baits.scatter_{scatteritem}.interval_list"
    output:
        temp("tmp/{library}_output.scatter_{scatteritem}.txt")
    shell:
        """
        python ProcessIntervals.py \
            -b {input.bam} \
            -l {input.baits} \
            -o {output}
        """

rule GatherIntervals:
    input:
        gather.split("tmp/{library}_output.scatter_{scatteritem}.txt")
    output:
        "{library}_output.txt"
    run:
        inputs = "-i ".join(input)
        command = f"python GatherOutputs.py {inputs} -o {output[0]}"
        shell(command)
WildcardError in line 16 of Snakefile:
No values given for wildcard 'library'.
Evidently this works like expand, in that you can escape the wildcards that aren't scatteritem with double braces if you want DAG resolution to deal with them:
temp(scatter.split("tmp/{{library}}_baits.scatter_{scatteritem}.interval_list"))
The same logic applies for gather.split.
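That is, the GatherIntervals input would presumably become:

gather.split("tmp/{{library}}_output.scatter_{scatteritem}.txt")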

Unable to use -C of grep in Unix Shell Script

I am able to use grep on the normal command line:
grep "ABC" Filename -C4
This gives me the desired output, which is each matching line plus the 4 lines above and below it.
But if I use the same command in a Unix shell script, I am unable to get the lines above and below the pattern. It gives me only the lines where the pattern matches, plus an error at the end that says cannot open grep : -C4.
The results are similar if I use -A4 or -B4.
I'll assume you need a portable POSIX solution without the GNU extensions (-C NUM, -A NUM, and -B NUM are all GNU, as are arguments following the pattern and/or file name).
POSIX grep can't do this, but POSIX awk can. This can be invoked as e.g. grepC -C4 "ABC" Filename (assuming it is named "grepC", is executable, and is in your $PATH):
#!/bin/sh
die() { printf '%s\nUsage: %s [-C NUMBER] PATTERN [FILE]...\n' "$*" "$0" >&2; exit 2; }

CONTEXT=0 # default value
case $1 in
  -C ) CONTEXT="$2"; shift 2 ;;     # extract "4" from "-C 4"
  -C* ) CONTEXT="${1#-C}"; shift ;; # extract "4" from "-C4"
  --|-) shift ;;                    # no args or use std input (implicit)
  -* ) [ -f "$1" ] || die "Illegal option '$1'" ;; # non-option non-file
esac
[ "$CONTEXT" -ge 0 ] 2>/dev/null || die "Invalid context '$CONTEXT'"
[ "$#" = 0 ] && die "Missing PATTERN"
PATTERN="$1"
shift

awk '
/'"$PATTERN"'/ {
    remaining='$CONTEXT'
    for(i='$CONTEXT'; i>=1; i--) if(NR>i) print last[i];
    print
    next
}
remaining { print; remaining-- }
{ for(i='$CONTEXT'; i>1; i--) last[i] = last[i-1]; last[1] = $0 }
' "$@"
This sets up die as a fatal error function, then finds the desired lines of context from your arguments (either -C NUMBER or -CNUMBER), with an error for unsupported options (unless they're files).
If the context is not a number or there is no pattern, we again fatally error out.
Otherwise, we save the pattern, shift it away, and reserve the rest of the arguments for handing to awk as files ("$@").
There are three stanzas in this awk call:
Match the pattern itself. This requires ending the single-quote portion of the string in order to incorporate the $PATTERN variable (which may not behave correctly if imported via awk -v). Upon a match, we store the number of lines of context in the remaining variable, loop through the previous lines saved in the last array (if we've gone far enough to have had them) from oldest to newest, and print them. We then skip to the next line without evaluating the other two stanzas.
If there was a match, we need the next few lines for context. As this stanza prints them, it decrements the counter. A new match (previous stanza) will reset that count.
We need to save previous lines for recalling upon a match. This loops through the number of lines of context we care about and stores them in the last array. The current line ($0) is stored in last[1].
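As a quick sanity check, here is what the script should produce (file name and contents invented for illustration):

$ printf 'one\ntwo\nMATCH\nthree\nfour\n' > sample.txt
$ grepC -C1 MATCH sample.txt
two
MATCH
three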

How can I merge PDF files (or PS if not possible) such that every file will begin on an odd page?

I am working on a UNIX system and I'd like to merge thousands of PDF files into one file in order to print it. I don't know in advance how many pages each file has.
I'd like to print it double sided, such that two files will not share the same sheet of paper.
Therefore I'd like the merged file to be aligned such that every file begins on an odd page, with a blank page added whenever the next file would otherwise start on an even page.
Here's the solution I use (it's based on @Dingo's basic principle, but uses an easier approach for the PDF manipulation):
Create a PDF file with a single blank page
First, create a PDF file with a single blank page somewhere (in my case, it is located at /path/to/blank.pdf). This command should work (from this thread):
touch blank.ps && ps2pdf blank.ps blank.pdf
Run Bash script
Then, from the directory that contains all my PDF files, I run a little script that appends the blank.pdf file to each PDF file with an odd page number:
#!/bin/bash
for f in *.pdf; do
  let npages=$(pdfinfo "$f"|grep 'Pages:'|awk '{print $2}')
  let modulo="($npages %2)"
  if [ $modulo -eq 1 ]; then
    pdftk "$f" "/path/to/blank.pdf" output "aligned_$f"
    # or
    # pdfunite "$f" "/path/to/blank.pdf" "aligned_$f"
  else
    cp "$f" "aligned_$f"
  fi
done
Combine the results
Now, all aligned_-prefixed files have even page numbers, and I can join them using
pdftk aligned_*.pdf output result.pdf
# or
pdfunite aligned_*.pdf result.pdf
Tool info:
ps2pdf is in the ghostscript package in most Linux distros
pdfinfo, pdfunite are from the Poppler PDF rendering library (usually the package name is poppler-utils or poppler_utils)
pdftk is usually its own package, the pdftk package
Your problem can be solved more easily if you look at it from another point of view.
To ensure that, in printing, page 1 of the second PDF file is not attached to the last page of the first PDF file on the same sheet of paper (and, more generally, that the first page of each subsequent PDF file is not printed on the back of the same sheet as the last page of the preceding file), you need to selectively add one blank page only to PDF files having an odd number of pages.
I wrote a simple script named addblankifneeded that you can put in a file, copy to /usr/bin or /usr/local/bin,
and then invoke in the folder where you have your PDFs with this syntax:
for f in *.pdf; do addblankifneeded $f; done
This script adds a blank page at the end of PDF files having an odd number of pages, skips PDF files already having an even number of pages, and then joins all PDFs into one.
Requirements: pdftk, pdfinfo
NOTE: depending on your bash environment, you may need to replace the sh interpreter with the bash interpreter in the first line of the script.
#!/bin/sh
# script to automatically add a blank page at the end of a pdf document if the count of its pages is not a multiple of 2, and then to join all pdfs into one
#
# made by Dingo
#
# dokupuppylinux.co.cc
#
# http://pastebin.com/u/dingodog (my pastebin toolbox for pdf scripts)
#
filename=$1
altxlarg="`pdfinfo -box $filename| grep MediaBox | cut -d : -f2 | awk '{print $3 FS $4}'`"
echo "%PDF-1.4
%µí®û
3 0 obj
<<
/Length 0
>>
stream
endstream
endobj
4 0 obj
<<
/ProcSet [/PDF ]
/ExtGState <<
/GS1 1 0 R
>>
>>
endobj
5 0 obj
<<
/Type /Halftone
/HalftoneType 1
/HalftoneName (Default)
/Frequency 60
/Angle 45
/SpotFunction /Round
>>
endobj
1 0 obj
<<
/Type /ExtGState
/SA false
/OP false
/HT /Default
>>
endobj
2 0 obj
<<
/Type /Page
/Parent 7 0 R
/Resources 4 0 R
/Contents 3 0 R
>>
endobj
7 0 obj
<<
/Type /Pages
/Kids [2 0 R ]
/Count 1
/MediaBox [0 0 595 841]
>>
endobj
6 0 obj
<<
/Type /Catalog
/Pages 7 0 R
>>
endobj
8 0 obj
<<
/CreationDate (D:20110915222508)
/Producer (libgnomeprint Ver: 2.12.1)
>>
endobj
xref
0 9
0000000000 65535 f
0000000278 00000 n
0000000357 00000 n
0000000017 00000 n
0000000072 00000 n
0000000146 00000 n
0000000535 00000 n
0000000445 00000 n
0000000590 00000 n
trailer
<<
/Size 9
/Root 6 0 R
/Info 8 0 R
>>
startxref
688
%%EOF" | sed -e "s/595 841/$altxlarg/g">blank.pdf
pdftk blank.pdf output fixed.pdf
mv fixed.pdf blank.pdf
pages="`pdftk $filename dump_data | grep NumberOfPages | cut -d : -f2`"
if [ $(( $pages % 2 )) -eq 0 ]
then
  echo "$filename already has a multiple of 2 pages ($pages). Script will be skipped for this file" >>report.txt
else
  pdftk A=$filename B=blank.pdf cat A B output blankadded.pdf
  mv blankadded.pdf $filename
  pdffiles=`ls *.pdf | grep -v -e blank.pdf -e joinedtogether.pdf | xargs -n 1`
  pdftk $pdffiles cat output joinedtogether.pdf
fi
exit 0
You can use PDFsam:
gratis
runs on Microsoft Windows, Mac OS X and Linux
portable version available (at least on Windows)
can add a blank page after each merged document if the document has an odd number of pages
Disclaimer: I'm the author of the tools I'm mentioning here.
sejda-console
It's a free and open source command line interface for performing pdf manipulations such as merge or split. The merge command has an option stating:
[--addBlanks] : add a blank page after each merged document if the number of pages is odd (optional)
Since you just need to print the pdf, I'm assuming you don't care about the order in which your documents are merged. This is the command you can use:
sejda-console merge -d /path/to/pdfs_to_merge -o /outputpath/merged_file.pdf --addBlanks
It can be downloaded from the official website sejda.org.
sejda.com
This is a web application backed by Sejda and has the same functionalities mentioned above but through a web interface. You are required to upload your files so, depending on the size of your input set, it might not be the right solution for you.
If you select the merge command and upload your pdf documents you will have to flag the checkbox Add blank page if odd page number to get the desired behaviour.
Here is a PowerShell version of the most popular solution using pdftk. I did this for Windows, but you can use PowerShell Core on other platforms.
# install pdftk server if on windows
# https://www.pdflabs.com/tools/pdftk-server/

$blank_pdf_path = ".\blank.pdf"
$input_folder = ".\input\"
$aligned_folder = ".\aligned\"
$final_output_path = ".\result.pdf"

foreach($file in (Get-ChildItem $input_folder -Filter *.pdf))
{
    # easy but might break if pdfinfo output changes:
    # takes the 8th line (index 7), which reads e.g. "Pages: 2", and matches only the numbers
    (pdfinfo $file.FullName)[7] -match "(\d+)" | Out-Null
    $npages = $Matches[1]
    $modulo = $npages % 2
    if($modulo -eq 1)
    {
        $output_path = Join-Path $aligned_folder $file.Name
        pdftk $file.FullName $blank_pdf_path output $output_path
    }
    else
    {
        Copy-Item $file.FullName -Destination $aligned_folder
    }
}

$aligned_pdfs = Join-Path $aligned_folder "*.pdf"
pdftk $aligned_pdfs output $final_output_path
Preparation
Install Python and make sure you have the pyPdf package.
Create a PDF file with a single blank page at /path/to/blank.pdf (I've created blank PDF pages here).
Save the script below as pdfmerge.py in any directory of your $PATH. (I'm not a Windows user. This is straightforward under Linux. Please let me know if you get errors / if it works.)
Make pdfmerge.py executable.
Every time you need it
Run pdfmerge.py in a directory that contains only the PDF files you want to merge.
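For example, to merge the PDFs in the current directory (the option names are the ones defined in the script below; paths are illustrative):

pdfmerge.py -p . -b /path/to/blank.pdf -o merged.pdf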
pdfmerge.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
from glob import glob
import os.path
from pyPdf import PdfFileReader, PdfFileWriter

def merge(path, blank_filename, output_filename):
    blank = PdfFileReader(file(blank_filename, "rb"))
    output = PdfFileWriter()
    for pdffile in glob(os.path.join(path, '*.pdf')):
        if pdffile == output_filename:
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))
        if document.getNumPages() % 2 == 1:
            output.addPage(blank.getPage(0))
            print("Add blank page to '%s' (had %i pages)" % (pdffile, document.getNumPages()))
    print("Start writing '%s'" % output_filename)
    output_stream = file(output_filename, "wb")
    output.write(output_stream)
    output_stream.close()

if __name__ == "__main__":
    parser = ArgumentParser()
    # Add more options if you like
    parser.add_argument("-o", "--output", dest="output_filename", default="merged.pdf",
                        help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-b", "--blank", dest="blank_filename", default="blank.pdf",
                        help="path to blank PDF file", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                        help="path of source PDF files")
    args = parser.parse_args()
    merge(args.path, args.blank_filename, args.output_filename)
Testing
Please leave a comment if this works on Windows and Mac.
Please always leave a comment if it doesn't work / if it could be improved.
It works on Linux. Joining 3 PDFs into a single 200-page PDF took less than a second.
Martin had a good start. I updated it to PyPDF2 and made a few tweaks, like sorting the output by filename.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
from glob import glob
from PyPDF2 import PdfFileReader, PdfFileWriter
import os.path

def merge(pdfpath, blank_filename, output_filename):
    with open(blank_filename, "rb") as f:
        blank = PdfFileReader(f)
        output = PdfFileWriter()
        filelist = sorted(glob(os.path.join(pdfpath, '*.pdf')))
        for pdffile in filelist:
            if pdffile == output_filename:
                continue
            print("Parse '%s'" % pdffile)
            document = PdfFileReader(open(pdffile, 'rb'))
            for i in range(document.getNumPages()):
                output.addPage(document.getPage(i))
            if document.getNumPages() % 2 == 1:
                output.addPage(blank.getPage(0))
                print("Add blank page to '%s' (had %i pages)" % (pdffile, document.getNumPages()))
        print("Start writing '%s'" % output_filename)
        with open(output_filename, "wb") as output_stream:
            output.write(output_stream)

if __name__ == "__main__":
    parser = ArgumentParser()
    # Add more options if you like
    parser.add_argument("-o", "--output", dest="output_filename", default="merged.pdf",
                        help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-b", "--blank", dest="blank_filename", default="blank.pdf",
                        help="path to blank PDF file", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                        help="path of source PDF files")
    args = parser.parse_args()
    merge(args.path, args.blank_filename, args.output_filename)
The code by @Chris Lercher in https://stackoverflow.com/a/12761103/1369181 did not quite work for me. I do not know whether that is because I am working on Cygwin/mintty. Also, I had to use qpdf instead of pdftk. Here is the code that worked for me:
#!/bin/bash
for f in *.pdf; do
  npages=$(pdfinfo "$f"|grep 'Pages:'|sed 's/[^0-9]*//g')
  modulo=$(($npages %2))
  if [ $modulo -eq 1 ]; then
    qpdf --empty --pages "$f" "path/to/blank.pdf" -- "aligned_$f"
  else
    cp "$f" "aligned_$f"
  fi
done
Now, all "aligned_" files have even page numbers, and I can join them using qpdf (thanks to https://stackoverflow.com/a/51080927):
qpdf --verbose --empty --pages aligned_* -- all.pdf
And here the useful code from https://unix.stackexchange.com/a/272878 that I have used for creating the blank page:
echo "" | ps2pdf -sPAPERSIZE=a4 - blank.pdf
This one worked for me. I used pdfcpu on macOS.
It can be installed this way:
brew install pdfcpu
And I slightly adjusted the code from https://stackoverflow.com/a/12761103/1369181:
#!/bin/bash
mkdir aligned
for f in *.pdf; do
  let npages=$(pdfcpu info "$f"|grep 'Page count:'|awk '{print $3}')
  let modulo="($npages %2)"
  if [ $modulo -eq 1 ]; then
    pdfcpu page insert -pages l -mode after "$f" "aligned/$f"
  else
    cp "$f" "aligned/$f"
  fi
done
pdfcpu merge merged-aligned.pdf aligned/*.pdf
rm -rf aligned
NB! It creates and removes an "aligned" directory in the current directory, so feel free to improve it to make it safe for use.

Are there any good programs for ActionScript/Flex that'll count lines of code, number of functions, files, packages, etc.?

Doug McCune had created something that was exactly what I needed (http://dougmccune.com/blog/2007/05/10/analyze-your-actionscript-code-with-this-apollo-app/) but alas, it was for AIR beta 2. I would just like some tool that I can run that would provide some decent metrics... any ideas?
There is a Code Metrics Explorer in the Enterprise Flex Plug-in below:
http://www.deitte.com/archives/2008/09/flex_builder_pl.htm
A simple tool called LocMetrics can work for .as files too...
Or
find . -name '*.as' -or -name '*.mxml' | xargs wc -l
Or if you use zsh
wc -l **/*.{as,mxml}
It won't give you what fraction of those lines are comments, or blank lines, but if you're only interested in how one project differs from another and you've written them both, it's a useful metric.
Here's a small script I wrote for finding the total number of occurrences of different source code elements in ActionScript 3 code (it is written in Python simply because I'm familiar with it, while Perl would probably be better suited for a regex-heavy script like this):
#!/usr/bin/python
import sys, os, re

# might want to improve on the regexes used here
codeElements = {
    'package':{
        'regex':re.compile(r'^\s*[(private|public|static)\s]*package\s+([A-Za-z0-9_.]+)\s*', re.MULTILINE),
        'numFound':0
    },
    'class':{
        'regex':re.compile(r'^\s*[(private|public|static|dynamic|final|internal|(\[Bindable\]))\s]*class\s', re.MULTILINE),
        'numFound':0
    },
    'interface':{
        'regex':re.compile(r'^\s*[(private|public|static|dynamic|final|internal)\s]*interface\s', re.MULTILINE),
        'numFound':0
    },
    'function':{
        'regex':re.compile(r'^\s*[(private|public|static|protected|internal|final|override)\s]*function\s', re.MULTILINE),
        'numFound':0
    },
    'member variable':{
        'regex':re.compile(r'^\s*[(private|public|static|protected|internal|(\[Bindable\]))\s]*var\s+([A-Za-z0-9_]+)(\s*\:\s*([A-Za-z0-9_]+))*\s*', re.MULTILINE),
        'numFound':0
    },
    'todo note':{
        'regex':re.compile(r'[*\s/][Tt][Oo]\s?[Dd][Oo][\s\-:_/]', re.MULTILINE),
        'numFound':0
    }
}

totalLinesOfCode = 0
filePaths = []
for i in range(1, len(sys.argv)):
    if os.path.exists(sys.argv[i]):
        filePaths.append(sys.argv[i])

for filePath in filePaths:
    thisFile = open(filePath, 'r')
    thisFileContents = thisFile.read()
    thisFile.close()
    totalLinesOfCode = totalLinesOfCode + len(thisFileContents.splitlines())
    for codeElementName in codeElements:
        matchSubStrList = codeElements[codeElementName]['regex'].findall(thisFileContents)
        codeElements[codeElementName]['numFound'] = codeElements[codeElementName]['numFound'] + len(matchSubStrList)

for codeElementName in codeElements:
    print(str(codeElements[codeElementName]['numFound']) + ' instances of element "' + codeElementName + '" found')
print('---')
print(str(totalLinesOfCode) + ' total lines of code')
print('')
Pass paths to all of the source code files in your project as arguments to this script to get it to process all of them and report the totals.
A command like this:
find /path/to/project/root/ -name "*.as" -or -name "*.mxml" | xargs /path/to/script
will output something like this:
1589 instances of element "function" found
147 instances of element "package" found
58 instances of element "todo note" found
13 instances of element "interface" found
2033 instances of element "member variable" found
156 instances of element "class" found
---
40822 total lines of code
CLOC - http://cloc.sourceforge.net/. Even though it is Windows command-line based, it works with AS3.0, has all the features you would want, and is well documented. Here is the BAT file setup I am using:
REM =====================
echo off
cls
REM set variables
set ASDir=C:\root\directory\of\your\AS3\code\
REM run the program
REM See docs for different output formats.
cloc-1.09.exe --by-file-by-lang --force-lang="ActionScript",as --exclude_dir=.svn --ignored=ignoredFiles.txt --report-file=totalLOC.txt %ASDir%
REM show the output
totalLOC.txt
REM end
pause
REM =====================
To get a rough estimate, you could always run find . -type f -exec cat {} \; | wc -l in the project directory if you're using Mac OS X.
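If you only care about ActionScript/MXML sources and want to skip blank lines, an untested variant of the same idea:

find . \( -name '*.as' -o -name '*.mxml' \) -exec cat {} + | grep -cv '^[[:space:]]*$'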
