Best practice to parse multiple config files - pyparsing

What would be the best practice - if there is any - to parse multiple config files?
I want to parse the MySQL server configuration and also write the configuration back out.
The configuration may contain multiple lines like:
!includedir /etc/mysql.d/
So the interesting thing is that some configuration may be located in the main file while other parts may be located in a sub file.
I think pyparsing only works on ONE single file or one content string.
So I probably first need to read all the files and maybe restructure the contents, for example by adding headers for the different files...
====main file====
[mysql]
....
!includedir /etc/mysql.d/
====/etc/mysql.d/my.cnf====
[client]
.....
I would only have one pyparsing call.
Then I could parse everything into one big data object, group the file sections and have the file names as keys. This way I could also write the data back to the disk...
The other possibility would be to parse the main file and programmatically parse all other files that were found in the main file.
Thus I would have several pyparsing calls.
What do you think?

In your pyparsing code, attach a parse action to the expression that matches the include statements, have it parse the contents of the referenced files or directory of files, then merge those results into the current parse output. The parse action makes the successive calls to parseString; your code only makes a single call.
See this new example added to the pyparsing examples directory: https://github.com/pyparsing/pyparsing/blob/master/examples/include_preprocessor.py
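The recursion described in this answer, where handling an include line triggers parsing of the referenced files and the results are merged into one structure, can also be sketched without pyparsing. The following is a minimal plain-Python sketch, not the linked pyparsing example. The function name `parse_mysql_config`, the result layout (file name, then section, then option) and the assumption that `!includedir` picks up `*.cnf` files are illustrative choices.

```python
import glob
import os


def parse_mysql_config(path, result=None):
    """Parse `path` and every file it includes.

    Returns {file name: {section: {option: value}}}, so each option
    stays associated with the file it came from.
    """
    if result is None:
        result = {}
    path = os.path.abspath(path)
    if path in result:  # already parsed; also guards against include cycles
        return result
    sections = result[path] = {}
    current = None
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line[0] in "#;":  # skip blanks and comments
                continue
            if line.startswith("!includedir"):
                directory = line.split(None, 1)[1]
                # Assumption: only *.cnf files in the directory are read.
                for child in sorted(glob.glob(os.path.join(directory, "*.cnf"))):
                    parse_mysql_config(child, result)
            elif line.startswith("!include"):
                parse_mysql_config(line.split(None, 1)[1], result)
            elif line.startswith("[") and line.endswith("]"):
                current = sections.setdefault(line[1:-1], {})
            elif current is not None:
                key, _, value = line.partition("=")
                current[key.strip()] = value.strip()
    return result
```

Because the result is keyed by file name, there is enough information to write each group of sections back to the file it was read from.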

Related

Need to handle the spaces in the filename in a Control-M File Watcher command

I have a File Watcher job which is looking for a certain file name (Membership Daily 20191230.xslx). Could someone share some insight into how to handle the spaces in the file name when I provide the path with the file name?
Usually I would use * as a wildcard, but I have other files whose names are close to this one.
The File Watcher runs on a UNIX server.
Enclose the full path name in quotation marks (for example, “c:\ctm\My Example.txt”). The only exception is a file name in a Rules file that contains a wildcard; it should not be enclosed in quotation marks.
If you don't want to use spaces, a single ? matches any one character, for example c:\ctm\My?File?Example.txt.

Overwrite an existing file programmatically

I have a QDialogBox where there is an option to upload a file.
I can upload files and save them to a folder. It works fine.
But if a file with the same name already exists in the folder, I am not sure how to handle that scenario.
I want to warn the user that the file with same name already exists.
Is there a Windows API that I can use in this case? (because when we manually save an existing file, we get a warning, how can I use that?)
If someone can point me to that documentation, it will be great.
If you are using a QFileDialog, confirmOverwrite is activated by default, so if getSaveFileName() returns a non-empty QString, the user either chose a new name or agreed to overwrite the existing file. Otherwise, you get an empty QString.
Then you can check whether the file exists and remove it in that case, knowing that the user was OK with that.
There is always a potential race condition when saving files. Checking to see if the file exists first is not safe, because some other process could create a file with the same name in between the check and when you actually write the file.
To avoid problems, the file must be opened with exclusive access, and in such a way that it immediately fails if it already exists.
If you want to do things properly, take a look at these two answers:
How do I create a file in python without overwriting an existing file
Safely create a file if and only if it does not exist with python
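Since the linked answers are about Python, here is a minimal sketch of the "fail immediately if it already exists" approach using the built-in `open()` with mode `"x"` (exclusive creation). The helper name `save_new_file` and its return convention are hypothetical, chosen only for illustration.

```python
def save_new_file(path, data):
    """Write `data` (bytes) to `path` only if no file exists there yet.

    Returns True if the file was created, False if it already existed.
    """
    try:
        # Mode "x" creates the file and raises FileExistsError if it is
        # already there, so the check and the creation are a single step.
        with open(path, "xb") as handle:
            handle.write(data)
    except FileExistsError:
        return False
    return True
```

A caller could show the overwrite warning when this returns False, and only then reopen the file in a mode that replaces it.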
You can use QDir::entryList() to get the file names in a directory if you're not using a QFileDialog.
QDir dir("/path/to/directory");
QStringList fileNames = dir.entryList();
Then iterating through file names, you can see if there's a file with the same name. If you need it, I can give an example for that too. It'd be C++, but easily adaptable to Python.
Edit: Smasho just suggested using the QDir::exists() method. You can check whether the file name exists in the directory with this method instead of iterating as I suggested.
if(dir.exists(uploadedFileName))

Transforming the Default URI when using MLCP

I have a delimited file as the input source for ingesting data into MarkLogic with MarkLogic Content Pump (mlcp) on UNIX. No column in the file is unique throughout, so none can serve as the URI. Since duplicate URIs are not possible, records that share a URI are skipped or overwritten.
The syntaxes available are:
-delimited_uri_id *my_column_name*
-output_uri_prefix *my_prefix_string*
-output_uri_suffix *my_suffix_string*
-output_uri_replace pattern,'string'
The command for mlcp is:
bin/mlcp.sh import -host localhost -port 8042 -username name -password password -input_file_path hdfs://path/to/file -delimiter '|' -delimited_uri_id column_name -input_file_type delimited_text -mode distributed
The problem that lies here is that if I modify the above command and include:
-output_uri_prefix $(date +%s%N)
This takes the time (in nanoseconds) at which the command is executed and prefixes every URI with it. But that doesn't solve my problem, since the value is the same for every record. The same would happen with the other available options. What can be done to construct a unique URI for each record so that all records are ingested?
One way or another it is up to you to provide unique ids. For a delimited file the easiest answer might be to add a new column and populate it with a unique id, generated however you like.
Or you could use http://marklogic.github.io/recordloader/ DelimitedDataLoader with the special option ID_NAME=#AUTO. But keep in mind that ID_NAME=#AUTO will single-thread ingestion.
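The first suggestion, adding a column populated with a unique id, can be done as a preprocessing step before running mlcp. The following is a minimal Python sketch under stated assumptions: the file has a header row, the delimiter is `|` as in the command above, and a running row number is unique enough. The function name `add_id_column` and the column name `row_id` are hypothetical.

```python
import csv


def add_id_column(src, dst, delimiter="|", id_name="row_id"):
    """Copy a delimited file, prepending a column with a unique row number."""
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")
        header = next(reader)  # assumption: the first line is a header row
        writer.writerow([id_name] + header)
        for number, row in enumerate(reader, start=1):
            writer.writerow([str(number)] + row)
```

The new column name can then be passed to `-delimited_uri_id`, optionally combined with `-output_uri_prefix` so that ids from different files do not collide.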

How to read invariant csv files using c#

I am working on a Windows application in C#. I want to read a CSV file from a directory and import it into a SQL Server database table. I can read and import the data successfully when the file content is uniform, but I am unable to insert data that varies in form. For example, my CSV file delimiter is tab ('\t'), and after getting the individual fields I have a field that contains data like:
Name
----
xxx
xxx yyy
xx yy zz
and I retrieved the data as xxx,yyy and xx,yy,zz, so the insertion becomes a problem.
How can I insert the data uniformly into a database table?
It's pretty easy.
Just read file line-by-line. Example on MSDN here:
How to: Read Text from a File
For each line use String.Split Method with your tab as delimiter. Method documentation and sample are here:
String.Split Method (Char[], StringSplitOptions)
Then insert your data.
If a CSV (or TSV) value contains a delimiter inside of it, then it should be surrounded by quotes. See the spec for more details: https://www.rfc-editor.org/rfc/rfc4180#page-3
So your input file is incorrectly formatted. If you can convince the input provider to fix this issue, that will be the best way to fix the problem. If not, other solutions may include:
visually inspecting and editing the file to fix errors, or
writing your parser program to have enough knowledge of your data expectations that it can correctly "guess" where the real delimiters are.
If I'm understanding you correctly, the problem is that your code is splitting on spaces instead of on tabs. Given you have read in the lines from the file, all you need to do is:
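string[] fileLines = System.IO.File.ReadAllLines(path); // path is a placeholder for your file
foreach (string line in fileLines)
{
    // Split on tabs only, so spaces inside a field are preserved.
    string[] lineParts = line.Split(new char[] { '\t' });
}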
and then do whatever you want with each lineParts. The \t is the tab character.
If you're also asking about writing the lines to a database file...you can just read in tab-delimited files with the Import Export Wizard (assuming you're using Sql Server Mgmt Studio, but I'm sure there are comparable ways to import using other db management software).
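The difference between splitting on tabs and splitting on any whitespace is not specific to C#. As a small illustration in Python (the helper name `split_fields` is hypothetical), splitting only on the tab character keeps a value such as "xx yy zz" in one field:

```python
def split_fields(line):
    """Split one tab-delimited line into fields, keeping inner spaces."""
    return line.rstrip("\r\n").split("\t")


row = "1\txx yy zz\n"
print(split_fields(row))  # ['1', 'xx yy zz']
print(row.split())        # ['1', 'xx', 'yy', 'zz'] (splits on spaces too)
```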

Getting extension of the file in FileUpload Control

At the moment i get file extension of the file like :
string fileExt = System.IO.Path.GetExtension(filUpload.FileName);
But if the user changes the file extension (for example, renames "test.txt" to "test.jpg"), I can't get the real extension. What's the solution?
You seem to be asking if you can identify file-type from its content.
Most solutions will indeed try the file extension first, because there are too many possible file types for all of them to be reliably identified from content.
Most approaches use the first several bytes of the file to determine what they are.
Here is one list, here another.
If you are only worried about text vs binary, see this SO question and answers.
See this SO answer for checking if a file is a JPG - this approach can be extended to use other file headers as in the first two links in this answer.
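Checking the first several bytes can be sketched as a lookup against known file signatures. The following minimal Python sketch covers only a handful of common formats; the table `SIGNATURES` and the function name `sniff_type` are hypothetical, and a real list of signatures would be much longer.

```python
# A few well-known file signatures ("magic numbers") and a label for each.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
]


def sniff_type(data):
    """Return a type label based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

Only the first few bytes of the upload need to be read for this check, and a result of None means "unknown", not "plain text".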
Whatever the user renames the file extension to, that is the real file extension.
You should never depend on the file extension to tell you what's in the file, since it can be renamed.
See "how can we check file types before uploading them in asp.net?"
There's no way to get the 'real' file extension: the file extension that you get from the filename is the real one. If file content is your concern, you can retrieve the content type using the .ContentType property and verify that it is a content type that you are expecting, e.g. image/jpeg.
