How can I parse a quoted string with Parsers.jl - julia

Julia’s CSV.jl parses csv files with quoted strings. It uses Parsers.jl to do this. Yet from the documentation of Parsers.jl it is not clear how to parse a double-quoted string on its own. How would I do that? As a secondary question, what is the supported set of escape sequences that Parsers.jl uses?

You can pass arbitrary characters to indicate quotation and escape characters via Parsers.Options. For example,
using Parsers
str = "{-1}"
oq, cq, e = UInt8('{'), UInt8('}'), UInt8('\\')
res = Parsers.xparse(Int64, str; openquotechar=oq, closequotechar=cq, escapechar=e)
x, code, tlen = res.val, res.code, res.tlen
print(x)
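To illustrate what the open-quote, close-quote, and escape-character settings mean, here is a minimal hand-written sketch in Python. It is not how Parsers.jl is implemented; the function name parse_quoted is hypothetical, and the sketch assumes single-character quotes and an escape character that differs from both quote characters.

```python
def parse_quoted(text, open_quote='"', close_quote='"', escape="\\"):
    """Return (contents, chars_consumed) for a quoted field at the start of text.

    Hypothetical helper for illustration only; assumes the escape character
    is different from the quote characters.
    """
    if not text.startswith(open_quote):
        raise ValueError("text does not start with the opening quote")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # The escape character makes the next character literal.
            out.append(text[i + 1])
            i += 2
        elif ch == close_quote:
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("missing closing quote")


# Mirrors the Julia example: '{' and '}' act as the quote characters.
contents, consumed = parse_quoted("{-1}", open_quote="{", close_quote="}")
value = int(contents)
```

The same helper handles an ordinary double-quoted string with a backslash-escaped quote inside it, for example parse_quoted('"a\\"b"').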

Related

Use Replace function to remove "{ characters from string

I would like to remove "{ and replace it with {. The following is the line of code that I'm currently using.
var MyString = DataString.Replace(@""{", "");
The following is the error message I'm getting
Please advise
Thanks
You need to escape the quote that you want to replace using two quotes, so for your example:
var MyString = DataString.Replace(@"""{", "{");
Also see How to include quotes in a string for alternatives to use quotes in strings.
If you are expecting JSON data, then what you really need is a JSON parser. If you just want to replace "{ with {, you only need to escape the quote and replace the string as shown below:
// Suppose the variable named str has a value of Hello"{ wrapped in double quotes
var strReplaced = str.Replace("\"{", "{");
Console.WriteLine($"strReplaced: {strReplaced}");
// This will result in strReplaced: Hello{

combining strings to one string in r

I'm trying to combine some strings into one. In the end, this string should be generated:
//*[@id="coll276"]
The inner part of the string is a vector: tag <- 'coll276'
I already used the paste() function like this:
paste('//*[@id="',tag,'"]', sep = "")
But my result looks like the following: //*[@id=\"coll276\"]
I don't know why R is putting some \ into my string. How can I fix this problem?
Thanks a lot!
tldr: Don't worry about them, they're not really there. It's just something added by print
Those \ are escape characters that tell R to ignore the special properties of the characters that follow them. Look at the output of your paste function:
paste('//*[@id="',tag,'"]', sep = "")
[1] "//*[@id=\"coll276\"]"
You'll see that the output, since it is a string, is enclosed in double quotes "". Normally, the double quotes inside your string would break the string up into two strings with bare code in the middle:
"//*[@id=" coll276 "]"
To prevent this, R "escapes" the quotes in your string so they don't do this. This is just a visual effect. If you write your string to a file, you'll see that those escaping \ aren't actually there:
write(paste('//*[@id="',tag,'"]', sep = ""), 'out.txt')
This is what is in the file:
//*[@id="coll276"]
You can use cat to print the exact value of the string to the console (Thanks @LukeC):
cat(paste('//*[@id="',tag,'"]', sep = ""))
//*[@id="coll276"]
Or use single quotes (if possible):
paste('//*[@id=\'',tag,'\']', sep = "")
[1] "//*[@id='coll276']"
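The difference between a quoted, escaped display of a string and its actual characters is not specific to R. As a rough analogy in Python (json.dumps stands in for R's print and a plain print for R's cat; this is only an illustration, not R's mechanism):

```python
import json

tag = "coll276"
xpath = '//*[@id="' + tag + '"]'

# A quoted rendering: the inner double quotes are shown with backslashes.
displayed = json.dumps(xpath)

print(displayed)
print(xpath)  # the raw characters, with no backslashes

# The backslashes exist only in the rendering, not in the string itself.
has_backslash = "\\" in xpath
```

Here has_backslash is False: the string holds only the XPath characters, and the escapes appear only when it is rendered inside quotes.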

Show all text of a docx in a stringBuilder with docx4j

I need to put all the text of a docx into a StringBuilder, including tabs and hyphens.
I've tried org.docx4j.TextUtils, but tabs don't appear in the resulting string.
String inputfilepath = System.getProperty("user.home") + "/test.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
org.docx4j.wml.Document wmlDocumentEl = (org.docx4j.wml.Document)documentPart.getJaxbElement();
Writer out = new OutputStreamWriter(System.out);
extractText(wmlDocumentEl, out);
out.close();
As per my answer at http://www.docx4java.org/forums/docx-java-f6/is-it-possible-to-extract-all-text-also-tab-and-hyphen-t1996.html#p6933?sid=b0d58fec2ba349d0f3f49cf66411397c
The problem with tab and hyphen, as I guess you know, is that they aren't represented in the docx as normal characters.
Tab is w:tab
A hyphen might be a hyphen character, or it might be displayed (without being actually in the docx), or it might be:
http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/noBreakHyphen.html
or http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/softHyphen.html
Replicating Word's hyphenation behaviour would be a challenge.
But for the others, there are three approaches which occur to me:
1. generalising your traverse approach (are you using TraversalUtil.getChildrenImpl?)
2. doing it in XSLT (you can do this in docx4j, but XSLT is probably slower, and a mix of technologies)
3. marshal the main document part to a string, do suitable string replacements, then unmarshal, then use TextUtils
For (3), assuming MainDocumentPart mdp, to get it as a String:
String stringContent = mdp.getXML();
Then to inject the modified content:
mdp.setContents((Document)XmlUtils.unmarshalString(stringContent) );
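To make the point about w:tab concrete, here is a minimal sketch of the traversal idea in Python using only the standard library. It does not use docx4j; the sample XML is a hand-written fragment standing in for a real main document part, the function name extract_text is hypothetical, and mapping w:noBreakHyphen to a plain "-" is an assumption. Soft hyphens and Word's automatic hyphenation are ignored.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical fragment standing in for a real main document part.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    "<w:r><w:t>Name</w:t><w:tab/><w:t>non</w:t><w:noBreakHyphen/><w:t>breaking</w:t></w:r>"
    "</w:p></w:body></w:document>"
)


def extract_text(document_xml):
    """Collect run text, turning w:tab into a tab and w:noBreakHyphen into '-'."""
    root = ET.fromstring(document_xml)
    parts = []
    for paragraph in root.iter(W + "p"):
        # Only look at children of runs, so tab-stop definitions
        # (w:tab inside w:tabs in paragraph properties) are skipped.
        for run in paragraph.iter(W + "r"):
            for child in run:
                if child.tag == W + "t":
                    parts.append(child.text or "")
                elif child.tag == W + "tab":
                    parts.append("\t")
                elif child.tag == W + "noBreakHyphen":
                    parts.append("-")
        parts.append("\n")  # one line per paragraph
    return "".join(parts)


text = extract_text(SAMPLE)
```

For a real file, the XML would first have to be read from the docx package (a zip archive whose main part is typically word/document.xml). Nested structures such as text boxes would need extra care.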

Find word (not containing substrings) in comma separated string

I'm using a LINQ query where I do something like this:
viewModel.REGISTRATIONGRPS = (From a In db.TABLEA
Select New SubViewModel With {
.SOMEVALUE1 = a.SOMEVALUE1,
...
...
.SOMEVALUE2 = If(commaseparatedstring.Contains(a.SOMEVALUE1), True, False)
}).ToList()
Now my problem is that this doesn't search for whole words but for substrings. For example:
commaseparatedstring = "EWM,KI,KP"
SOMEVALUE1 = "EW"
It returns true because "EW" is contained in "EWM".
What I need is to find whole words (not substrings) in the comma-separated string.
Option 1: Regular Expressions
Regex.IsMatch(commaseparatedstring, @"\b" + Regex.Escape(a.SOMEVALUE1) + @"\b")
The \b parts are called "word boundaries" and tell the regex engine that you are looking for a "full word". The Regex.Escape(...) ensures that the regex engine will not try to interpret "special characters" in the text you are trying to match. For example, if you are trying to match "one+two", the Regex.Escape method will return "one\+two".
Also, be sure to import the System.Text.RegularExpressions namespace at the top of your code file.
See Regex.IsMatch Method (String, String) on MSDN for more information.
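The same word-boundary idea can be sketched in Python, whose re module also supports \b and has re.escape as a counterpart to Regex.Escape. The function name contains_word is hypothetical.

```python
import re


def contains_word(haystack, word):
    """True if word occurs in haystack delimited by word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, haystack) is not None


codes = "EWM,KI,KP"
ew_found = contains_word(codes, "EW")    # "EW" is only a prefix of "EWM"
ki_found = contains_word(codes, "KI")    # "KI" is a complete entry
```

One caveat: \b sits between a word character and a non-word character, so values that start or end with punctuation may not match as expected. In that case, splitting on the comma is the more predictable route.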
Option 2: Split the String
You could also try splitting the string which would be a bit simpler, though probably less efficient.
commaseparatedstring.Split(new Char[] { ',' }).Contains( a.SOMEVALUE1 )
what about:
- separating the commaseparatedstring by comma
- calling equals() on each substring instead of contains() on whole thing?
.SOMEVALUE2 = If(commaseparatedstring.Split(',').Contains(a.SOMEVALUE1), True, False)
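For comparison, the split-and-compare approach looks like this as a small Python sketch. The function name in_csv_list is hypothetical, and trimming whitespace around each entry is an assumption that the original code does not make.

```python
def in_csv_list(csv_text, value):
    """True if value equals one whole entry of a comma-separated string."""
    # Assumption: surrounding whitespace in entries should be ignored.
    items = [item.strip() for item in csv_text.split(",")]
    return value in items


codes = "EWM,KI,KP"
ew_listed = in_csv_list(codes, "EW")
kp_listed = in_csv_list(codes, "KP")
```

Because each entry is compared for equality, "EW" no longer matches "EWM", and no regex escaping is needed.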

What is the regular expression for "No quotes in a string"?

I am trying to write a regular expression that doesn't allow single or double quotes in a string (could be single line or multiline string). Based on my last question, I wrote like this ^(?:(?!"|').)*$, but it is not working. Really appreciate if anybody could help me out here.
Just use a character class that excludes quotes:
^[^'"]*$
(Within the [] character class specifier, the ^ prefix inverts the specification, so [^'"] means any character that isn't a ' or ".)
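A minimal Python sketch of the negated character class follows. It uses re.fullmatch in place of the ^ and $ anchors; the function name has_no_quotes is hypothetical. A negated class also matches newline characters, so multi-line input needs no extra flags.

```python
import re

# Any run of characters that are neither a single nor a double quote.
NO_QUOTES = re.compile(r"[^'\"]*")


def has_no_quotes(text):
    """True if text contains no ' or " characters (the empty string passes)."""
    return NO_QUOTES.fullmatch(text) is not None


ok_multiline = has_no_quotes("line one\nline two")
bad_single = has_no_quotes("foo'baa")
bad_double = has_no_quotes('say "hi"')
```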
Just use a regex that matches for quotes, and then negate the match result:
var regex = new Regex("\"|'");
bool noQuotes = !regex.IsMatch("My string without quotes");
Try this:
string myStr = "foo'baa";
bool HasQuotes = myStr.Contains("'") || myStr.Contains("\""); //faster solution , I think.
bool HasQuotes2 = Regex.IsMatch(myStr, "['\"]");
if (!HasQuotes)
{
//not has quotes..
}
The regular expression below allows alphanumeric characters and the listed special characters, but not quotes (' and "):
@"^[a-zA-Z-0-9~+:;,/#&_#*%$!()\[\] ]*$"
You can use it like this:
[RegularExpression(@"^[a-zA-Z-0-9~+:;,/#&_#*%$!()\[\] ]*$", ErrorMessage = "Should not allow quotes")]
Note that [ and ] are escaped with a backslash (\[ and \]) inside the character class.
