expand() function: Wildcards in input files/params cannot be determined from output files - wildcard

I have been using Snakemake for a considerable amount of time and I still get the same errors regarding the expand function which I am not able to solve or either to find a clear explanation. In the Snakemake documentation it is neither explained.
Snakemake version
I am using version 7.18.2.
In my Snakemake code I usually put the expand function in the rule all so all the variables are full-filled there. This usually works, but sometime I just receive the following error: Wildcards in input files cannot be determined from output files.
This was my code last time that I got that error:
rule all:
input:
snp = expand(project_dir + plink_output_dir + "target_data/{target_name}.valid.snp", target_name = TARGET_files),
snp_ex = expand(project_dir + plink_output_dir + "external_data/{ext_name}.valid.snp", ext_name = EXTERNAL_file),
pvalue = project_dir + plink_output_dir + "target_data/SNP.pvalue",
rule plink_score_ex:
input:
...
valid_snps = project_dir + plink_output_dir + "external_data/{ext_name}.valid.snp",
params:
...
bfile_dir = project_dir + "data/{ext_name}"
output:
status_ex = project_dir + plink_output_dir + "external_data/{ext_name}_prs_status",
shell: ...
I got the following error: once for the wildcards placed in the input and another for the wildcards in the params. Here are both logs:
WildcardError in line 251 of <path>/xxx_snakefile:
Wildcards in input files cannot be determined from output files:
'ext_name'
WildcardError in line 251 of <path>/xxx_snakefile:
Wildcards in params cannot be determined from output files. Note that you have to use a function to deactivate automatic wildcard expansion in params strings, e.g., `lambda wildcards: '{test}'`. Also see https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#non-file-parameters-for-rules:
'ext_name'
And this is the code with which I solved the error:
rule plink_score_ex:
input:
valid_snps = expand(project_dir + plink_output_dir + "external_data/{ext_name}.valid.snp", ext_name = EXTERNAL_file),
params:
bfile_dir = expand(project_dir + "data/{ext_name}", ext_name = EXTERNAL_file),
output:
plink_status = expand(project_dir + plink_output_dir + "external_data/plink_status_ext.txt", ext_name = EXTERNAL_file),
shell: ...
With this, I solved the error corresponding to this specific rule. However, I needed to add the expand function in all the input files and params and in all the different rules.
My question is: once that I have placed the corresponding expand function in the rule all, where do I necessary have to put it again? In case I place it in some specific rule, as it is rule plink_score_ex, do I need to put it in the input? in the output? In both places?
Thank you very much!

Related

Scilab - gui - many unknown variables error messages

Still trying to understand the logic of Scilab, I created a small calculation tool for a mechanical element. The main problem I have is finding the right order (or syntax) for the calculation code... I get a lot of "unknown variable" errors and I don't understand why?
I tried to change the order of definitions for the functions, declare the variables as global, etc. but nothing seems to help.
The code for the calculation is not long and also not complicated, but the gui was built using guibuilder, so the uicontrols definitions are probably much longer than they need to be.
Could somebody help me make this code working, as I would learn and understand a lot by this example, althought it contains more than one "problem zones"?
Here what I've done:
G = 78500;
table_titles = ["" "Wire diameter" "Wp" "Tau alwd" "M alwd" "Angle alwd"];
f=figure('figure_position',[910,163],'figure_size',
[903,537],'auto_resize','on','background',[33],'figure_name','Graphic
window number %d');
//////////
delmenu(f.figure_id,gettext('File'))
delmenu(f.figure_id,gettext('?'))
delmenu(f.figure_id,gettext('Tools'))
toolbar(f.figure_id,'off')
handles.dummy = 0;
handles.sl_dwire=uicontrol(f,'unit','normalized','BackgroundColor',
[-1,-1,-1],'Enable','on','FontAngle','normal','FontName','Tahoma',
'FontSize',[12],'FontUnits','points','FontWeight','normal',
'ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left','ListboxTop',
[],'Max',[12],'Min',[0],'Position',
[0.0058208,0.77875,0.124375,0.06875],'Relief','default','SliderStep',
[0.1,1],'Style','slider','String',"Wire diameter",'Value',
[6],'VerticalAlignment','middle','Visible','on','Tag','sl_dwire',
'Callback','sl_dwire_callback(handles)')
handles.ed_dwire=uicontrol(f,'unit','normalized','BackgroundColor',
[-1,-1,-1],'Enable','off','FontAngle','normal','FontName','Tahoma',
'FontSize',[12],'FontUnits','points','FontWeight','normal',
'ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left','ListboxTop',
[],'Max',[1],'Min',[0],
'Position',[0.0058208,0.71875,0.124375,0.06875],'Relief',
'default','SliderStep',[0.01,0.1],'String',"wire diameter: " +
msprintf('%2.1f',handles.sl_dwire.Value) + "mm",'Style','text',
'Value',[0],'VerticalAlignment','middle','Visible','on','Tag',
'ed_dwire','Callback','auto')
handles.sl_wangle=uicontrol(f,'unit','normalized','BackgroundColor',
[-1,-1,-1],'Enable','on','FontAngle','normal','FontName','Tahoma',
'FontSize',[12],'FontUnits','points','FontWeight','normal',
'ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[180],'Min',[5],'Position',
[0.0090625,0.5191667,0.25625,0.0645833],'Relief','default',
'SliderStep',[0.1,1],'String','Working angle','Style','slider','Value',
[50],'VerticalAlignment','middle','Visible','on','Tag','sl_wangle',
'Callback','sl_wangle_callback(handles)')
handles.ed_wangle=uicontrol(f,'unit','normalized','BackgroundColor',
[-1,-1,-1],'Enable','off','FontAngle','normal','FontName','Tahoma',
'FontSize',[12],'FontUnits','points','FontWeight','normal',
'ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[1],'Min',[0],'Position',
[0.0090625,0.4591667,0.25625,0.0645833],'Relief','default',
'SliderStep',[0.01,0.1],'String',"Working angle: " +
msprintf('%2.1f',handles.sl_wangle.Value) + "°",'Style','text',
'Value',[0],'VerticalAlignment','middle','Visible','on','Tag',
'ed_wangle','Callback','auto')
handles.sl_activel=uicontrol(f,'unit','normalized',
'BackgroundColor',[-1,-1,-1],'Enable','on','FontAngle','normal',
'FontName','Tahoma','FontSize',[12],'FontUnits','points','FontWeight',
'normal','ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[1000],'Min',[10],'Position',
[0.0090625,0.365,0.25625,0.0645833],'Relief','default',
'SliderStep',[0.1,1],'String','Active length' ,'Style','slider',
'Value',[10],'VerticalAlignment','middle','Visible','on','Tag',
'sl_activel','Callback','sl_activel_callback(handles)')
handles.ed_activel=uicontrol(f,'unit','normalized',
'BackgroundColor',[-1,-1,-1],'Enable','off','FontAngle','normal',
'FontName','Tahoma','FontSize',[12],'FontUnits','points','FontWeight',
'normal','ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[1],'Min',[0],'Position',
[0.0090625,0.305,0.25625,0.0645833],'Relief','default',
'SliderStep',[0.01,0.1],'String','Active length: ' +
msprintf('%2.1f',handles.sl_activel.Value) +
"mm",'Style','text','Value',[0],'VerticalAlignment','middle','Visible',
'on','Tag','ed_activel','Callback','auto')
handles.ax_graph= newaxes();handles.ax_graph.margins = [ 0 0 0 0];
handles.ax_graph.axes_bounds = [0.4274266,0.0619266,0.3995485,0.5191743];
handles.tab_param=uicontrol(f,'unit','normalized','BackgroundColor',
[-1,-1,-1],'Enable','on','FontAngle','normal','FontName',
'Tahoma','FontSize',[12],'FontUnits','points','FontWeight','normal',
'ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[1],'Min',[0],
'Position',[0.4308126,0.1690826,0.3950339,0.2178899],'Relief',
'default','SliderStep',[0.01,0.1],'String',string(table_param),'Style',
'table','Value',[0],'VerticalAlignment','middle','Visible',
'on','Tag','tab_param','Callback','tab_param_callback(handles)')
handles.sl_sfactor=uicontrol(f,'unit','normalized',
'BackgroundColor',[-1,-1,-1],'Enable','on','FontAngle','normal',
'FontName','Tahoma','FontSize',[12],'FontUnits','points','FontWeight',
'normal','ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[1],'Min',[0],
'Position',[0.0058208,0.6525688,0.124375,0.06875],'Relief','default',
'SliderStep',[0.01,0.1],'String',"Safety factor",'Style',
'slider','Value',[0.58],'VerticalAlignment','middle','Visible','on',
'Tag','ed_sfactor','Callback','sl_sfactor_callback(handles)')
handles.ed_sfactor=uicontrol(f,'unit','normalized',
'BackgroundColor',[-1,-1,-1],'Enable','off','FontAngle','normal',
'FontName','Tahoma','FontSize',[12],'FontUnits','points','FontWeight',
'normal','ForegroundColor',[-1,-1,-1],'HorizontalAlignment',
'left','ListboxTop',[],'Max',[1],'Min',[0],'Position',
[0.0058208,0.5925688,0.124375,0.06875],'Relief','default',
'SliderStep',[0.01,0.1],'String',"Safety factor : " +
msprintf('%2.1f',handles.sl_sfactor.Value),'Style',
'text','Value',[0.58],'VerticalAlignment','middle','Visible','on',
'Tag','ed_dwire','Callback','auto')
handles.popm_wtype=uicontrol(f,'unit','normalized',
'BackgroundColor',[-1,-1,-1],'Enable','on','FontAngle','normal',
'FontName','Tahoma','FontSize',[12],'FontUnits','points','FontWeight',
'normal','ForegroundColor',[-1,-1,-1],'HorizontalAlignment','left',
'ListboxTop',[],'Max',[1],'Min',[0],'Position',
[0.0058208,0.8618349,0.124375,0.0639450],'Relief','default',
'SliderStep',[0.01,0.1],'String',gettext("SL/DL|SM/DM|SH/DH"),'Style',
'popupmenu','Value',[2],'VerticalAlignment','middle','Visible','on',
'Tag','popm_wtype','Callback','popm_wtype_callback(handles)')
//////////
// Callbacks are defined as below. Please do not delete the comments
as it will be used in coming version
//////////
function sl_sfactor_callback(handles)
sf=handles.sl_sfactor.Value;
handles.ed_sfactor.String="Safety factor: " + msprintf('%3.2f',sf);
endfunction
function sl_dwire_callback(handles)
wd=handles.sl_dwire.Value;
Wp = %pi*wd^3/16;
Ip =%pi*wd^4/32;
Kt = G*%pi*Ip/(180*L);
Talwd = sf * calcform;
Malwd = Wp * Talwd;
alphaalwd = Malwd / Kt;
x=0:0.1:alphaalwd*1.5;
plot(x,Kt*x);
handles.ed_dwire.String="Wire diameter: " +
msprintf('%2.1f',wd) + "mm";
endfunction
function popm_wtype_callback(handles)
//Write your callback for popm_wtype here
if selected == 1 then
calcform =(1845 - 700*log10(wd));
elseif selected == 2 then
calcform =(2105 - 780*log10(wd));
elseif selected == 3 then
calcform = (2220 - 820*log10(wd));
end
endfunction
function sl_wangle_callback(handles)
handles.ed_wangle.String="Working angle: " +
msprintf('%2.1f',handles.sl_wangle.Value) + "°";
endfunction
function sl_activel_callback(handles)
//Write your callback for sl_activel here
handles.ed_activel.String="Active length: " +
msprintf('%2.1f',handles.sl_activel.Value) + "mm";
L=handles.sl_activel.Value;
endfunction
function tab_param_callback(handles)
//Write your callback for tab_param here
table_values = string([ wd Wp Talwd Malwd alphaalwd]);
table_param = [table_titles; [table_values]];
endfunction
I expect the code to dynamically update the graph and the parameters table according to the positions of input sliders and popup-menu.
Again, it would be very helpful if somebody could help me get this code working, as I would get answers for a lot of my questions concerning programming with scilab.
Thank you very much in advance!
First of all, your code is not directly executable, caused by the linebreaks. Please add ... after each line of a statement. This improves also the readability.
As mentioned by #luispauloml, you try to use variables, which exist just inside of another function. For example you try to reach table_param, which is just alive in tab_param_callback(handles).
To get rid of this, you have to define the output for the function:
function [table_values, table_param] = tab_param_callback(handles)
table_values = string([ wd Wp Talwd Malwd alphaalwd]);
table_param = [table_titles; [table_values]];
endfunction
Now you can call this function to get the variable:
string(tab_param_callback(handles))
I corrected just this case and added the dots as explained. For the other variables, it can be done analogously. Furthermore, I moved the function definition to the beginning of the code. Because if your scrips crashed in the middle, the compiler has no chance to read the function definitions.
Please find the code in this file .
I hope this helps. Good luck!

Does U-SQL support extracting files based on date of creation in ADLS

We know U-SQL supports directory and filename pattern matching while extracting the files. What I wanted to know does it support pattern matching based on date of creation of the file in ADLS (without implementing custom extractors).
Say a folder contains files created across months (filenames don't have date as part of the filename), is there a way to pull only files of a particular month.
The U-SQL EXTRACT operator is not aware of any metadata (such as create date) about a file - only the filename.
You could probably build a solution using the .NET SDK. For something rather simple you could use PowerShell to create a file which will contain all the files that meet your date time criteria. Then consume the content as desired.
# Log in to your Azure account
Login-AzureRmAccount
# Modify variables as required
$DataLakeStoreAccount = "<yourDataLakeStoreAccountNameHere>";
$DataLakeAnalyticsAccount = <yourDataLakeAnalyticsAccountNameHere>";
$DataLakeStorePath = "/Samples/Data/AmbulanceData/"; #modify as desired
$outputFile = "Samples/Outputs/ReferenceGuide/filteredFiles.csv"; #modify as desired
$filterDate = "2016-11-22";
$jobName = "GetFiles";
# Query directory and build main body of script. Note, there is a csv filter.
[string]$body =
"#initial =
SELECT * FROM
(VALUES
" +
(Get-AzureRmDataLakeStoreChildItem -Account $DataLakeStoreAccount -Path $DataLakeStorePath |
Where {$_.Name -like "*.csv" -and $_.Type -eq "FILE"} | foreach {
"(""" + $DataLakeStorePath + $_.Name + """, (DateTime)FILE.CREATED(""" + $DataLakeStorePath + $_.Name + """)), `r`n" });
# formattig, add column names
$body =
$body.Substring(0,$body.Length-4) + "
) AS T(fileName, createDate);";
# U-SQL query and OUTPUT statement
[string]$output =
"
// filter results based on desired time frame
#filtered =
SELECT fileName
FROM #initial
WHERE createDate.ToString(""yyyy-MM-dd"") == ""$filterDate"";
OUTPUT #filtered
TO ""$outputFile""
USING Outputters.Csv();";
# bring it all together
$script = $body + $output;
#Execute job
$jobInfo = Submit-AzureRmDataLakeAnalyticsJob -Account $DataLakeAnalyticsAccount -Name $jobName -Script $script -DegreeOfParallelism 1
#check job progress
Get-AzureRmDataLakeAnalyticsJob -Account $DataLakeAnalyticsAccount -JobId $jobInfo.JobId -ErrorAction SilentlyContinue;
Write-Host "You now have a list of desired files to check # " $outputFile
Currently there is no way to access or use file meta data properties. Please add your vote and use case to the following feedback item: https://feedback.azure.com/forums/327234-data-lake/suggestions/10948392-support-functionality-to-handle-file-properties-fr
it's been a while since this question was asked, and I'm not sure if this is what you were looking for originally, but now you can use the FILE.MODIFIED U-SQL function:
DECLARE #watermark string = "2018-08-16T18:12:03";
SET ##FeaturePreviews="InputFileGrouping:on";
DECLARE #file_set_path string = "adl://adls.azuredatalakestore.net/stage/InputSample.tsv";
#input =
EXTRACT [columnA] int?,
[columnB] string
FROM #file_set_path
USING Extractors.Tsv(skipFirstNRows : 1, silent : true);
#result =
SELECT *, FILE.MODIFIED(#file_set_path) AS FileModifiedDate
FROM #input
WHERE FILE.MODIFIED(#file_set_path) > DateTime.ParseExact(#watermark, "yyyy-MM-ddTHH:mm:ss", NULL);
OUTPUT #result TO "adl://ADLS.azuredatalakestore.net/stage/OutputSample.tsv" USING Outputters.Tsv(outputHeader:true);
The U-SQL built-in function is documented here:
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/file-modified-u-sql

Javascript / Googlescript using set.Formula with complex formulas

I have researching this but cannot find a suitable solution. The following formula works fine at the formula level when placed in a sheet cell. The issue is I want the formula to run at the script level. The options I am aware of include running a script to:
(1) set.Formula('=complex formula')
or
(2) rewriting the entire formula as a script
I am new to GAS, and have messed around with both methods. There seems to be a syntax error when using option (1), usually in the form of a missing ")" that I cannot debug. Employing option (2) is currently above my skill level. Any help on either option would be greatly appreciated.
Here is the formula in question:
=ARRAYFORMULA(QUERY({UI!A:G,YEAR(UI!A:A),MONTH(UI!A:A), TEXT(UI!A:A, "MMMM"), TEXT(UI!A:A, "MMM-YY"), REPLACE(UI!A:A,1,1000,"GRAND
TOTAL")}, "SELECT * WHERE Col1 IS NOT NULL AND Col2 IS NOT NULL LABEL
Col8 'Year',Col9 'MonthMO#',Col10 'MonthMO',Col11 'MonthMOYR',Col12
'GRAND TOTAL'"))
Answer
Formulas doesn't run "at script level" so you will have to rewrite your formula as a Google Apps Script / JavaScript function.
Escaping formula apostrophes in scripts
In case that you want a script to add your complex formula to a cell bear in mind that writing complex formulas in one line makes harder to debug them.
Try dividing your formula by functions and parameters and use a tab to align them. IMHO this prevents that the use of \ makes the script unreadable (Leaning toothpick syndrome).
Below is a onEdit() function that inserts the complex formula in the question to the cell to the right of a cell where 'Yes' is wrote. Look to the use of \ to escape the apostrophes used in the second parameter of the query function (select statement).
function onEdit() {
var ss = SpreadsheetApp.getActive();
var rng = ss.getActiveRange();
var trg = rng.offset(0, 1);
var formula =
'=ARRAYFORMULA('
+ 'QUERY({'
+ 'UI!A:G,YEAR(UI!A:A),MONTH(UI!A:A),'
+ 'TEXT(UI!A:A, "MMMM"), '
+ 'TEXT(UI!A:A, "MMM-YY"), '
+ 'REPLACE(UI!A:A,1,1000,"GRAND TOTAL")'
+ '},'
+ '"SELECT * WHERE Col1 IS NOT NULL AND Col2 IS NOT NULL LABEL Col8 \'Year\', '
+ 'Col9 \'MonthMO#\', Col10 \'MonthMO\', Col11 \'MonthMOYR\', '
+ 'Col12 \'GRAND TOTAL\'"'
+ ')'
+ ')';
if(rng.getValue() == 'Yes') trg.setFormula(formula);
}

Simplifying device creation in sip.conf

I often have to define many similar devices in sip.conf like this:
[device](!)
; setting some parameters
[device01](device)
callerid=dev01 <01>
[device02](device)
callerid=dev02 <02>
; ...
[deviceXX](device)
callerid=devXX <XX>
The question is perhaps I could avoid setting device-name specific parameters by using some variable like following?
[device](!)
callerid=dev${DEVICE_NAME:-2} <${DEVICE_NAME:-2}>
; setting some parameters
[device01](device)
[device02](device)
; ...
[deviceXX](device)
P.S.
It would be perfect, if there was some device constructor, so I could reduce the script to following, but, I think, that is not possible in Asterisk.
[device](!)
callerid=dev${DEVICE_NAME:-2} <${DEVICE_NAME:-2}>
; setting some parameters
;[device${MAGIC_LOOP(1,XX,leading_zeroes)}](device)
I've had good results writing a small program that takes care of it. It checks for a line saying something like
------- Automatically generated -------
and whatever is after that line, it's going to be regenerated as soon as it detects that there are new values for it (it could be from a database or from a text file). Then, I run it with supervisor and it checks every XX seconds if there are changes.
If there are changes, it issues a sip reload command after updating the sip.conf file
I wrote it in python, but whatever language you feel comfortable with should work just fine.
That's how I managed that and has been working fine so far (after a couple of months). I'd be extremely interested in learning about other approaches though. It's basically this (called from another script with supervisor):
users = get_users_logic()
#get the data that will me used on the sip.conf file
data_to_be_hashed = reduce(lambda x, y: x + y, map(lambda x: x['username'] + x['password'] + x['company_prefix'], users))
m = hashlib.md5()
m.update(str(data_to_be_hashed).encode("ascii"))
new_md5 = m.hexdigest()
last_md5 = None
try:
file = open(os.path.dirname(os.path.realpath(__file__)) + '/lastMd5.txt', 'r')
last_md5 = file.read().rstrip()
file.close()
except:
pass
# if it changed...
if new_md5 != last_md5:
#needs update
with open(settings['asterisk']['path_to_sip_conf'], 'r') as file:
sip_content = file.read().rstrip()
parts = sip_content.split(";-------------- BEYOND THIS POINT IT IS AUTO GENERATED --------------;")
sip_content = parts[0].rstrip()
sip_content += "\n\n;-------------- BEYOND THIS POINT IT IS AUTO GENERATED --------------;\n\n"
for user in users:
m = hashlib.md5()
m.update(("%s:sip.ellauri.it:%s" % (user['username'], user['password'])).encode("ascii"))
md5secret = m.hexdigest()
sip_content += "[%s]\ntype = friend\ncontext = %sLocal\nmd5secret = %s\nhost = dynamic\n\n" % (
user['username'], user['company_prefix'], md5secret)
#write the sip.conf file
f = open(settings['asterisk']['path_to_sip_conf'], 'w')
print(sip_content, file=f)
f.close()
subprocess.call('asterisk -x "sip reload"', shell=True)
#write the new md5
f = open(os.path.dirname(os.path.realpath(__file__)) + '/lastMd5.txt', 'w')
print(new_md5, file=f)
f.close()

pyparsing multiple lines optional missing data in result set

I am quite new pyparsing user and have missing match i don't understand:
Here is the text i would like to parse:
polraw="""
set policy id 800 from "Untrust" to "Trust" "IP_10.124.10.6" "MIP(10.0.2.175)" "TCP_1002" permit
set policy id 800
set dst-address "MIP(10.0.2.188)"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 724 from "Trust" to "Untrust" "IP_10.16.14.28" "IP_10.24.10.6" "TCP_1002" permit
set policy id 724
set src-address "IP_10.162.14.38"
set dst-address "IP_10.3.28.38"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 233 name "THE NAME is 527 ;" from "Untrust" to "Trust" "IP_10.24.108.6" "MIP(10.0.2.149)" "TCP_1002" permit
set policy id 233
set service "TCP_1002-1005"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
"""
I setup grammar this way:
KPOL = Suppress(Keyword('set policy id'))
NUM = Regex(r'\d+')
KSVC = Suppress(Keyword('set service'))
KSRC = Suppress(Keyword('set src-address'))
KDST = Suppress(Keyword('set dst-address'))
SVC = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
ADDR = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
EXIT = Suppress(Keyword('exit'))
EOL = LineEnd().suppress()
P_SVC = KSVC + SVC + EOL
P_SRC = KSRC + ADDR + EOL
P_DST = KDST + ADDR + EOL
x = KPOL + NUM('PId') + EOL + Optional(ZeroOrMore(P_SVC)) + Optional(ZeroOrMore(P_SRC)) + Optional(ZeroOrMore(P_DST))
for z in x.searchString(polraw):
print z
Result set is such as
['800', 'MIP(10.0.2.188)']
['724', 'IP_10.162.14.38', 'IP_10.3.28.38']
['233', 'TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
The 800 is missing service tag ???
What's wrong here.
Thanks by advance
Laurent
The problem you are seeing is that in your expression, DST's are only looked for after having skipped over optional SVC's and SRC's. You have a couple of options, I'll go through each so you can get a sense of what all is going on here.
(But first, there is no point in writing "Optional(ZeroOrMore(anything))" - ZeroOrMore already implies Optional, so I'm going to drop the Optional part in any of these choices.)
If you are going to get SVC's, SRC's, and DST's in any order, you could refactor your ZeroOrMore to accept any of the three data types, like this:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC|P_SRC|P_DST)
This will allow you to intermix different types of statements, and they will all get collected as part of the ZeroOrMore repetition.
If you want to keep these different types of statements in groups, then you can add a results name to each:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC("svc*")|
P_SRC("src*")|
P_DST("dst*"))
Note the trailing '*' on each name - this is equivalent to calling setResultsName with the listAllMatches argument equal to True. As each different expression is matched, the results for the different types will get collected into the "svc", "src", or "dst" results name. Calling z.dump() will list the tokens and the results names and their values, so you can see how this works.
set policy id 233
set service "TCP_1002-1005"
set dst-address "IP_10.3.28.38"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
shows this for z.dump():
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: [['IP_10.3.28.38']]
- svc: [['TCP_1002-1005'], ['TCP_1006-1008'], ['TCP_1786']]
If you wrap ungroup on the P_xxx expressions, maybe like this:
P_SVC,P_SRC,P_DST = (ungroup(expr) for expr in (P_SVC,P_SRC,P_DST))
then the output is even cleaner-looking:
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: ['IP_10.3.28.38']
- svc: ['TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
This is actually looking pretty good, but let me pass on one other option. There are a number of cases where parsers have to look for several sub-expressions in any order. Let's say they are A,B,C, and D. To accept these in any order, you could write something like OneOrMore(A|B|C|D), but this would accept multiple A's, or A, B, and C, but not D. The exhaustive/exhausting combinatorial explosion of (A+B+C+D) | (A+B+D+C) | etc. could be written, or you could maybe automate it with something like
from itertools import permutations
mixNmatch = MatchFirst(And(p) for p in permutations((A,B,C,D),4))
But there is a class in pyparsing called Each that allows to write the same kind of thing:
Each([A,B,C,D])
meaning "must have one each of A, B, C, and D, in any order". And like And, Or, NotAny, etc., there is an operator shortcut too:
A & B & C & D
which means the same thing.
If you want "must have A, B, and C, and optionally D", then write:
A & B & C & Optional(D)
and this will parse with the same kind of behavior, looking for A, B, C, and D, regardless of the incoming order, and whether D is last or mixed in with A, B, and C. You can also use OneOrMore and ZeroOrMore to indicate optional repetition of any of the expressions.
So you could write your expression as:
x = KPOL + NUM('PId') + EOL + (ZeroOrMore(P_SVC) &
ZeroOrMore(P_SRC) &
ZeroOrMore(P_DST))
I looked at using results names with this expression, and the ZeroOrMore's seem to be confusing things, maybe still a bug in how this is done. So you may have to reserve using Each for more basic cases like my A,B,C,D example. But I wanted to make you aware of it.
Some other notes on your parser:
dblQuotedString.setParseAction(lambda t: t[0].replace('"','')) is probably better written
dblQuotedString.setParseAction(removeQuotes). You don't have any embedded quotes in your examples, but it's good to be aware of where your assumptions might not translate to a future application. Here are a couple of ways of removing the defining quotes:
dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \ and an ending quote \'
# removed leading and trailing "s, but also internal ones too, which are
# really part of the quoted string
dblQuotedString.setParseAction(lambda t: t[0].strip('"'))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \'
# removed leading and trailing "s, and leaves the one internal ones but strips off
# the escaped ending quote
dblQuotedString.setParseAction(removeQuotes)
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \"'
# just removes leading and trailing " characters, leaves escaped "s in place
KPOL = Suppress(Keyword('set policy id')) is a bit fragile, as it will break if there are any extra spaces between 'set' and 'policy', or between 'policy' and 'id'. I usually define these kind of expressions by first defining all the keywords individually:
SET,POLICY,ID,SERVICE,SRC_ADDRESS,DST_ADDRESS,EXIT = map(Keyword,
"set policy id service src-address dst-address exit".split())
and then define the separate expressions using:
KSVC = Suppress(SET + SERVICE)
KSRC = Suppress(SET + SRC_ADDRESS)
KDST = Suppress(SET + DST_ADDRESS)
Now your parser will cleanly handle extra whitespace (or even comments!) between individual keywords in your expressions.

Resources