When writing a web application that allows file uploads, one must be conscious of the fact that its users may be on any operating system, including Unix-like systems, which use <lf> for line endings, classic Mac OS systems, which use <cr>, and Windows-like systems, which use <cr><lf>.
Assuming that I want to parse uploaded files (such as CSVs) to access the data within them (for instance, to import it into my application), is there a standard OS-agnostic method of breaking a file up into its constituent lines for further parsing?
You could use a StreamReader. The ReadLine method handles the various types of line break:
A line is defined as a sequence of characters followed by a line feed
("\n"), a carriage return ("\r"), or a carriage return immediately
followed by a line feed ("\r\n"). The string that is returned does not
contain the terminating carriage return or line feed.
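For comparison, here is the same OS-agnostic splitting sketched in Python (used only for illustration; the answer itself is about .NET's StreamReader). The helper names `split_lines` and `parse_csv` are hypothetical; the `newline` parameter of `io.TextIOWrapper` and the `csv` module are standard library.

```python
import csv
import io


def split_lines(data: bytes, encoding: str = "utf-8") -> list[str]:
    """Split uploaded bytes into lines, whatever the sender's OS."""
    # newline=None is Python's "universal newlines" mode: "\n", "\r" and
    # "\r\n" all end a line and are translated to "\n" when read.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None) as text:
        return [line.rstrip("\n") for line in text]


def parse_csv(data: bytes, encoding: str = "utf-8") -> list[list[str]]:
    """Parse an uploaded CSV file into rows of fields."""
    # The csv module wants newline="" so it sees the original line endings
    # itself; it accepts "\n", "\r" and "\r\n" as record terminators and
    # keeps newlines that appear inside quoted fields.
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="") as text:
        return list(csv.reader(text))
```

`str.splitlines()` would also work for plain text, but it additionally splits on characters such as form feed, `\x1c` and U+2028, so universal-newlines mode is the closer match to the three line endings ReadLine recognises.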
I have a File Watcher job that looks for a specific file name (Membership Daily 20191230.xlsx). Could someone share some insights on how to handle the spaces in the file name when I provide the path with the file name?
Usually I would use * as a wildcard, but I have other files whose names are very close to this one (they also start with "Member").
The File Watcher runs on a UNIX server.
Enclose the full path name in quotation marks (for example, "c:\ctm\My Example.txt"). The one exception: if the file name is in a Rules file and contains a wildcard, it should not be enclosed in quotation marks.
If you'd rather not put spaces in the pattern, use ?, which matches exactly one character; for example, c:\ctm\My?File?Example.txt.
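To illustrate why ? is more precise than * here, a small Python sketch using standard shell-style wildcard rules via `fnmatch.fnmatchcase`. The File Watcher applies its own matcher, which may differ in detail, and the file names below are hypothetical examples, not a real directory listing.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing; these names are for illustration only.
CANDIDATES = (
    "Membership Daily 20191230.xlsx",
    "Membership Monthly 20191230.xlsx",
    "Member Daily 20191230.xlsx",
)


def matching(pattern: str, names: tuple[str, ...] = CANDIDATES) -> list[str]:
    """Return the names that a shell-style wildcard pattern would select."""
    # fnmatchcase uses shell glob rules without case folding (as on UNIX):
    # "?" matches exactly one character, a space included; "*" matches any run.
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching("Member*"))                         # too broad: all three names
print(matching("Membership?Daily?20191230.xlsx"))  # only the daily file
```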
What would be the best practice - if there is any - to parse multiple config files?
I want to parse the MySQL server configuration and also write the configuration back out.
The configuration can contain (possibly several) include lines such as:
!includedir /etc/mysql.d/
So, interestingly, some of the configuration may be located in the main file and some in a sub-file.
I think pyparsing only works on a single file or a single content string.
So I probably first need to read all the files and maybe restructure the contents, for example by adding a header for each file:
====main file====
[mysql]
....
!includedir /etc/mysql.d/
====/etc/mysql.d/my.cnf====
[client]
.....
That way I would only have one pyparsing call.
Then I could parse everything into one big data object, group the sections by file, and use the file names as keys. This way I could also write the data back to disk...
The other possibility would be to parse the main file and then programmatically parse every other file referenced in it.
That would mean several pyparsing calls.
What do you think?
In your pyparsing code, attach a parse action to the expression that matches the include statements. Have it parse the contents of the referenced file (or directory of files), then merge those results into the current parse output. The parse action makes the successive calls to parseString, so your own code only makes a single call.
See this new example added to the pyparsing examples directory: https://github.com/pyparsing/pyparsing/blob/master/examples/include_preprocessor.py
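The answer's approach relies on pyparsing. As a standard-library-only comparison, here is a sketch of the question's second option (parse the main file, then follow its includes programmatically) that keeps the results keyed by file name, as the question wanted. It swaps in Python's `configparser` in place of pyparsing; `load_option_files` is a hypothetical name, and reading only `*.cnf` files from an `!includedir` directory assumes Unix behaviour.

```python
import configparser
from pathlib import Path


def load_option_files(path, seen=None):
    """Parse a MySQL-style option file and every file it includes.

    Returns {resolved file path: ConfigParser}, so each group of sections
    stays keyed by the file it came from.
    """
    seen = set() if seen is None else seen
    path = Path(path).resolve()
    if path in seen:  # guard against include cycles
        return {}
    seen.add(path)

    body, children = [], []
    for line in path.read_text().splitlines():
        stripped = line.strip()
        if stripped.startswith("!includedir "):
            # Assumption: Unix behaviour, where only *.cnf files are read;
            # sorted() just makes the order deterministic for this sketch.
            directory = Path(stripped.split(None, 1)[1])
            children.extend(sorted(directory.glob("*.cnf")))
        elif stripped.startswith("!include "):
            children.append(Path(stripped.split(None, 1)[1]))
        else:
            body.append(line)

    # allow_no_value accepts bare options such as "skip-external-locking";
    # strict=False lets a group like [mysqld] appear more than once.
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string("\n".join(body), source=str(path))

    result = {str(path): parser}
    for child in children:
        result.update(load_option_files(child, seen))
    return result
```

Each parser can be written back with `ConfigParser.write()`, but note that `configparser` discards comments and this sketch strips the include lines, so a faithful round trip would need to re-emit those.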
I'm trying to build a channel to read an HL7 ADT text file, extract an MRN and write output to a SQLite table (Database Writer).
My SQLite table contains my data, but all my PatientIDs appear as a single concatenated string in one very wide column, as opposed to one PatientID per row/record.
I'm noticing some weird illegal(?) characters in my HL7 file (which come from a Meditech EMR). In QuickViewHL7 they appear in the MSH-22 and MSH-30.
[Screenshot: the characters as displayed in the Vim editor]
My question is: are these supposed to be delimiters? If so, what are they? Carriage returns?
I've posted this question on the Mirth Connect forums but seen little but tumbleweeds. I'm hoping someone here might have seen this before and tell me what's going on.
UPDATE: A hex dump suggests it's 0x7f (0111 1111), which is the ASCII DEL (delete) control character rather than a backspace (0x08). Should I simply strip it, or substitute it with something?
This illegal character should be a carriage return (0x0D), which is the HL7 v2 segment terminator and so marks the start of the next segment.
Using Vim, put the cursor on the illegal character and press ga. This shows the character's value in decimal, hex and octal; in my case 0x7f (the DEL control character).
Again in Vim, substitute every occurrence with \r, which in the replacement part of :s inserts a line break:
:%s/\%x7f/\r/g
Then save the file.
Everything parses out nicely now.
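The same repair can be done in code instead of in Vim. A minimal Python sketch, assuming (as in this case) that every stray 0x7F byte stands in for the segment terminator and that the file is Latin-1 compatible. `normalise_segments` and `patient_ids` are hypothetical names, and taking the MRN from the first component of the first repetition of PID-3 is also an assumption: which field carries the MRN varies between sending systems.

```python
def normalise_segments(raw: bytes) -> list[str]:
    """Split a raw HL7 v2 message into segments."""
    # Assumption: each stray 0x7F byte stands in for the <CR> (0x0D) that
    # should terminate an HL7 v2 segment.
    text = raw.replace(b"\x7f", b"\r").decode("latin-1")  # assumed encoding
    # Also tolerate <CR><LF> or bare <LF>, then drop empty segments.
    text = text.replace("\r\n", "\r").replace("\n", "\r")
    return [segment for segment in text.split("\r") if segment]


def patient_ids(segments: list[str]) -> list[str]:
    """Return one patient identifier per PID segment."""
    msh = segments[0]
    if not msh.startswith("MSH"):
        raise ValueError("message must start with an MSH segment")
    # MSH-1 is the character right after "MSH"; MSH-2 begins with the
    # component and repetition separators (usually "^" and "~").
    field_sep, component_sep, repetition_sep = msh[3], msh[4], msh[5]

    ids = []
    for segment in segments:
        fields = segment.split(field_sep)
        if fields[0] == "PID" and len(fields) > 3:
            # Assumption: the MRN is the first component of the first
            # repetition of PID-3 (Patient Identifier List).
            first_repetition = fields[3].split(repetition_sep)[0]
            ids.append(first_repetition.split(component_sep)[0])
    return ids
```

Once the terminators are fixed, each PID segment is a separate item, which is what one-row-per-patient output needs.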
I dropped an XML file into a receive location that uses the Microsoft BizTalk default pipeline XMLReceive, and then used PassThruTransmit to write the file out to a directory.
However, when I checked the output file in a hex editor, I found three special characters, , at the beginning of the file.
Their byte values are EF BB BF.
Does anyone know why these three characters are added at the beginning of the output file?
Those characters are the byte order mark (BOM), which tells the receiving application how to interpret the text stream. They are not junk, but they are optional.
I recommend you always send the BOM unless the receiving system cannot accept it (which is really their problem ;).
I googled the solution myself and am sharing it for others:
Removing the BOM from Outgoing BizTalk Files
http://mindovermessaging.com/2013/08/06/removing-the-bom-from-outgoing-biztalk-files/
The three special characters are the BOM (byte order mark). Setting PreserveBOM to false on the XMLTransmit pipeline in the send port removes them.
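If the receiving side is the one that has to cope, here is a Python sketch of recognising and dropping the UTF-8 BOM there. This is an alternative to the PreserveBOM setting, not a BizTalk feature; `strip_utf8_bom` is a hypothetical helper, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if there is one."""
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf": the EF BB BF seen in the hex editor.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The three characters seen in the output file are those bytes read as Windows-1252.
print(codecs.BOM_UTF8.decode("cp1252"))  # prints ï»¿

# Decoding with "utf-8-sig" drops the BOM while decoding.
print(b"\xef\xbb\xbf<root/>".decode("utf-8-sig"))  # prints <root/>
```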
Context: ASP.NET MVC running in IIS, with a UTF-8 %-encoded URL.
Using the standard project template and a test action in HomeController like:
public ActionResult Test(string id)
{
    return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is the %-encoded UTF-8 form of code point U+E4FB, which lies in the Private Use Area of the Basic Multilingual Plane but is ultimately a valid Unicode code point. You can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
code-point 0x93 is “ (in Windows-1252)
code-point 0x201c is “ (in Unicode)
It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.Raw also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs query portion of the url.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses the default (ANSI) encoding for path characters. Your URL-encoded string is decoded using that encoding, which is why you get garbled characters back.
To get the original id, convert it back to bytes with that same ANSI encoding and then decode those bytes as UTF-8.
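To see why the round trip works, here is the same sequence in Python (illustration only; the actual fix is the C# one-liner above). It assumes the server's ANSI code page is Windows-1252, which depends on the machine's locale and is not something IIS guarantees.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding the path segment gives the UTF-8 bytes of U+E4FB.
utf8_bytes = unquote_to_bytes("%ee%93%bb")

# Misreading those bytes with an ANSI code page reproduces the question's id.
# Assumption: the ANSI code page is Windows-1252 (depends on the server).
misdecoded = utf8_bytes.decode("cp1252")
print([hex(ord(ch)) for ch in misdecoded])  # ['0xee', '0x201c', '0xbb']

# The answer's repair: back to bytes with the same code page, then UTF-8.
repaired = misdecoded.encode("cp1252").decode("utf-8")
print(repaired == "\ue4fb")  # True
```

The repair is not lossless for every input: Windows-1252 leaves five bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), and Python's `cp1252` codec raises `UnicodeDecodeError` on them.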
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters, I'm assuming that IIS has some sort of encoding-detection mechanism that fails in this case. As a workaround, you can prefix your id with this character and then easily detect whether the problem occurred (if the character is missing). Not an ideal solution, but it will work. You can then write a custom model binder and a wrapper class in ASP.NET MVC to keep your consuming code cleaner.
Once Upon A Time, URLs themselves were not in UTF-8; they were in the ANSI code page. This reflects the fact that they are often used to select, well, pathnames in the server's file system. In ancient times, IE had an option controlling whether it sent URLs as UTF-8.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.
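A Python sketch of that kind of manual parsing, for illustration only: percent-decode each segment of the raw, undecoded path yourself, preferring UTF-8 and falling back to a legacy code page only when the bytes are not valid UTF-8. `decode_path_segment` and `decode_raw_path` are hypothetical helpers, and Windows-1252 as the fallback is an assumption.

```python
from urllib.parse import unquote_to_bytes


def decode_path_segment(segment: str) -> str:
    """Percent-decode one raw URL path segment (hypothetical helper).

    UTF-8 is tried first; only if the bytes are not valid UTF-8 does it fall
    back to Windows-1252, an assumed legacy ANSI code page.
    """
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def decode_raw_path(raw_path: str) -> list[str]:
    """Split an undecoded path (e.g. the HTTP_URL value) into decoded segments."""
    path = raw_path.split("?", 1)[0]  # drop any query string
    return [decode_path_segment(seg) for seg in path.split("/") if seg]
```

Decoding per segment, after splitting on "/", also keeps an encoded %2F inside a segment from being mistaken for a path separator.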