Let's pretend we have the following types:
type Message {
text : Option<string>
}
type Update {
msg : Option<Message>
}
How do I match it in one line, the way C# does with the null-conditional operator, i.e. update?.msg?.text?
Is there a way to do it like this?
match msg, msg.text with
| Some msg, Some txt -> ...
| None -> ...
because I don't want to write two nested match expressions.
You have two record types (the "=" is missing in your example). To match a value of type Update, you could do the following:
type Message = { text : Option<string> }
type Update = { msg : Option<Message> }
let u = {msg = Some({text = Some "text"})}
//all 3 possible cases
match u with
| {msg = Some({text = Some t})} -> t
| {msg = Some({text = None})} -> ""
| {msg = None} -> ""
My program reads a line from a file. This line contains comma-separated text like:
123,test,444,"don't split, this",more test,1
I would like the result of a split to be this:
123
test
444
"don't split, this"
more test
1
If I use the String.split(","), I would get this:
123
test
444
"don't split
this"
more test
1
In other words: the comma in the substring "don't split, this" is not a separator. How do I deal with this?
You can try out this regex:
str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
This splits the string on , that is followed by an even number of double quotes. In other words, it splits on comma outside the double quotes. This will work provided you have balanced quotes in your string.
Explanation:
, // Split on comma
(?= // Followed by
(?: // Start a non-capture group
[^"]* // 0 or more non-quote characters
" // 1 quote
[^"]* // 0 or more non-quote characters
" // 1 quote
)* // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
[^"]* // Finally 0 or more non-quotes
$ // Till the end (This is necessary, else every comma will satisfy the condition)
)
You can even write it like this in your code by using the (?x) modifier with your regex. The modifier ignores any whitespace in your regex, so a regex broken into multiple lines becomes easier to read:
String[] arr = str.split("(?x) " +
", " + // Split on comma
"(?= " + // Followed by
" (?: " + // Start a non-capture group
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" )* " + // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
" [^\"]* " + // Finally 0 or more non-quotes
" $ " + // Till the end (This is necessary, else every comma will satisfy the condition)
") " // End look-ahead
);
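To check the look-ahead split above against the sample line from the question, here is a minimal, self-contained sketch. The class name SplitOutsideQuotesDemo and the helper splitOutsideQuotes are made up for this illustration; the pattern itself is the one from this answer.

```java
final class SplitOutsideQuotesDemo {
    // Pattern from the answer: a comma followed by an even number of quotes
    // up to the end of the line, i.e. a comma outside double quotes.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper name, used only for this illustration.
    static String[] splitOutsideQuotes(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Prints one field per line; the quoted field stays in one piece.
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

As noted above, this relies on the quotes in the line being balanced.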
Why Split when you can Match?
Resurrecting this question because for some reason, the easy solution wasn't mentioned. Here is our beautifully compact regex:
"[^"]*"|[^,]+
This will match all the desired fragments (see demo).
Explanation
With "[^"]*", we match complete "double-quoted strings"
or |
we match [^,]+ any characters that are not a comma.
A possible refinement is to improve the string side of the alternation to allow the quoted strings to include escaped quotes.
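The refinement mentioned above could look like the following sketch. It assumes that quotes inside a quoted field are escaped with a backslash (\"), which the question does not specify; files that double the quote ("") instead would need a different pattern. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedQuoteMatchDemo {
    // "(?:[^"\\]|\\.)*"   a quoted string that may contain backslash escapes
    // |[^,]+              or a run of characters that are not a comma
    // Assumption: escapes use a backslash, e.g. \" inside a quoted field.
    private static final Pattern FIELD =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|[^,]+");

    // Collects every match of FIELD in the given line, in order.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // The second field contains two escaped quotes and a comma.
        fields("a,\"b \\\"c\\\", d\",e").forEach(System.out::println);
    }
}
```

Because [^,]+ needs at least one character, empty fields (two commas in a row) produce no match at all with this approach.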
Building upon @zx81's answer, because the matching idea is really nice, I've added the Java 9 results() call, which returns a Stream. Since the OP wanted to use split, I've collected to String[], as split does.
Caution if you have spaces after your comma separators (a, b, "c,d"): then you need to change the pattern.
Jshell demo
$ jshell
-> String so = "123,test,444,\"don't split, this\",more test,1";
| Added variable so of type String with initial value "123,test,444,"don't split, this",more test,1"
-> Pattern.compile("\"[^\"]*\"|[^,]+").matcher(so).results();
| Expression value is: java.util.stream.ReferencePipeline$Head@2038ae61
| assigned to temporary variable $68 of type java.util.stream.Stream<MatchResult>
-> $68.map(MatchResult::group).toArray(String[]::new);
| Expression value is: [Ljava.lang.String;@6b09bb57
| assigned to temporary variable $69 of type String[]
-> Arrays.stream($69).forEach(System.out::println);
123
test
444
"don't split, this"
more test
1
Code
String so = "123,test,444,\"don't split, this\",more test,1";
Pattern.compile("\"[^\"]*\"|[^,]+")
.matcher(so)
.results()
.map(MatchResult::group)
.toArray(String[]::new);
Explanation
Regex "[^"]" matches: a quote, anything but a quote, a quote.
Regex "[^"]*" matches: a quote, anything but a quote 0 (or more) times, a quote.
That regex needs to go first to "win", otherwise matching anything but a comma 1 or more times - that is: [^,]+ - would "win".
results() requires Java 9 or higher.
It returns Stream<MatchResult>, which I map using group() call and collect to array of Strings. Parameterless toArray() call would return Object[].
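For the caution about spaces after the separators, one possible adjustment is sketched below. It is an assumption about what "change the pattern" could mean, not the only option: a leading \s* skips whitespace before each field, and the field itself is read from capture group 1. The class and method names are hypothetical, and a plain find() loop is used so the sketch also runs on Java 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpacedFieldsDemo {
    // \s* skips whitespace in front of a field; group 1 is the field itself:
    // either a complete quoted string or a run of non-comma characters.
    private static final Pattern FIELD =
            Pattern.compile("\\s*(\"[^\"]*\"|[^,]+)");

    // Returns group 1 of every match, collected into a String[] like split does.
    static String[] fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String field : fields("a, b, \"c,d\"")) {
            System.out.println(field);
        }
    }
}
```

Without the \s* prefix, the space in front of "c,d" would be picked up by [^,]+ first, and the quoted field would be cut at its inner comma.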
You can do this very easily without a complex regular expression:
Split on the character ". You get a list of strings.
Process each string in the list: split every string at an even position in the list (indexing from zero) on "," (you get a list inside a list), and leave every odd-positioned string alone (putting it directly in a list inside the list).
Join the list of lists, so you get a single list.
If you want to handle quoting of '"', you have to adapt the algorithm a little (joining parts you have incorrectly split off, or changing the split to a simple regexp), but the basic structure stays.
So basically it is something like this:
public class SplitTest {
public static void main(String[] args) {
final String splitMe="123,test,444,\"don't split, this\",more test,1";
final String[] splitByQuote=splitMe.split("\"");
final String[][] splitByComma=new String[splitByQuote.length][];
for(int i=0;i<splitByQuote.length;i++) {
String part=splitByQuote[i];
if (i % 2 == 0){
splitByComma[i]=part.split(",");
}else{
splitByComma[i]=new String[1];
splitByComma[i][0]=part;
}
}
for (String parts[] : splitByComma) {
for (String part : parts) {
System.out.println(part);
}
}
}
}
This will be much cleaner with lambdas, promised!
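One way the lambda version could look, as a sketch rather than the answerer's own code: the class and helper names are hypothetical. It also filters out empty strings, because splitting a part such as ",more test,1" on "," yields a leading empty string; note that this filter drops genuinely empty fields as well.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class QuoteSplitStreamsDemo {
    // Parts at odd positions were inside quotes and are kept whole;
    // parts at even positions are split on commas.
    private static Stream<String> pieces(String part, boolean quoted) {
        return quoted ? Stream.of(part) : Arrays.stream(part.split(","));
    }

    static List<String> split(String line) {
        String[] byQuote = line.split("\"");
        return IntStream.range(0, byQuote.length)
                .mapToObj(i -> pieces(byQuote[i], i % 2 == 1))
                .flatMap(s -> s)
                // Added assumption: empty strings are artifacts of the split.
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        split("123,test,444,\"don't split, this\",more test,1")
                .forEach(System.out::println);
    }
}
```

As in the loop version, the surrounding quotes are removed from the quoted field.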
Please see the code snippet below. It only handles the happy path; change it according to your requirements.
public static String[] splitWithEscape(final String str, char split,
char escapeCharacter) {
final List<String> list = new LinkedList<String>();
char[] cArr = str.toCharArray();
boolean isEscape = false;
StringBuilder sb = new StringBuilder();
for (char c : cArr) {
if (isEscape && c != escapeCharacter) {
sb.append(c);
} else if (c != split && c != escapeCharacter) {
sb.append(c);
} else if (c == escapeCharacter) {
if (!isEscape) {
isEscape = true;
if (sb.length() > 0) {
list.add(sb.toString());
sb = new StringBuilder();
}
} else {
isEscape = false;
}
} else if (c == split) {
list.add(sb.toString());
sb = new StringBuilder();
}
}
if (sb.length() > 0) {
list.add(sb.toString());
}
String[] strArr = new String[list.size()];
return list.toArray(strArr);
}
I have a multiline textbox in which the user may type whatever he wants, for example:
"Hello my name is #Konstantinos and i am 20 #years old"
Now I want to add a button that, when pressed, outputs #Konstantinos and #years.
Can this be done using Substring, or is there a better idea?
Thank you in advance
If all you want is the hashtags (#) from the entire string, a simple .Split() plus LINQ will do. Try this:
C#
string a = "Hello my name is #Konstantinos and i am 20 #years old";
var data = a.Split(' ').Where(s => s.StartsWith("#")).ToList();
VB
Dim a As String = "Hello my name is #Konstantinos and i am 20 #years old"
Dim data = a.Split(" ").Where(Function(s) s.StartsWith("#")).ToList()
Using regex will give you more flexibility.
You can define a pattern to search for strings starting with #.
.Net regex cheat sheet
Dim searchPattern = "#(\S+)" '\S matches any non-whitespace character
Dim searchString = "Hello my name is #Konstantinos and i am 20 #years old"
For Each match As Match In Regex.Matches(searchString, searchPattern, RegexOptions.Compiled)
Console.WriteLine(match.Value)
Next
Console.Read()
This will work. Try this:
string str = "Hello my name is #Konstantinos and i am 20 #years old asldkfjklsd #kumod";
int i=0;
int k = 0;
while ((i = str.IndexOf('#', i)) != -1)
{
string strOutput = str.Substring(i);
k = strOutput.IndexOf(' ');
if (k != -1)
{
Console.WriteLine(strOutput.Substring(0, k));
}
else
{
Console.WriteLine(strOutput);
}
i++;
}
Hey, I'm looking for a way to do the following when populating a text file:
If I need to fill an alphanumeric column with field size 20 and I only have 18 characters, I want to append two blanks.
The same goes for numeric values: if the field size is 10, for example, and I have a value of 5 characters, I want to fill the remaining space with 5 zeros,
i.e. instead of 10000 I would have 0000010000.
string s = "10000";
string t = s.PadLeft(20, '0');
The PadLeft method should do the trick. Something like this:
var output = myTextString.PadLeft(20);
or
var output = myNumericString.PadLeft(10, '0');
Here's some pseudocode that should do it:
int size = mystring.length();
int padding = 20 - size;
string pad = "";
for (int i = 0; i < padding; i++) {
pad += "0";
}
string newstring = pad + mystring;
http://jsfiddle.net/qtzTu/
How to Pad a Number with Leading Zeroes:
http://msdn.microsoft.com/en-us/library/dd260048.aspx
The OP implied the string lengths would vary, so here is a Func where you won't have to hard code values like "10" or "20"
Func<string, int, string> PadStringToSize = (x,y)
=> (x.Length < y ? x.PadLeft(y, '0') : x);
You can then do things like:
Console.WriteLine(PadStringToSize("10000", 10)); // Pads to 10
Console.WriteLine(PadStringToSize("10000", 30)); // Pads to 30
Then you can just wrap the call in a method that takes the size as a parameter instead of hard-coding the desired length.
According to MSDN on DateTime.ToString, ToString("s") should always return a string in the sortable XML Schema style format, e.g.: 2008-10-01T17:04:32.0000000
In Reflector I found this pattern inside DateTimeFormatInfo:
public string SortableDateTimePattern
{
get
{
return "yyyy'-'MM'-'dd'T'HH':'mm':'ss";
}
}
Does DateTime.ToString("s") always return a string in this format, regardless of the culture, region, ...?
Yes, it does.
Code to test that:
var dateTime = DateTime.Now;
var originalString = dateTime.ToString("s");
string testString;
foreach (var c in System.Globalization.CultureInfo.GetCultures(CultureTypes.AllCultures))
{
Thread.CurrentThread.CurrentUICulture = c;
if (c.IsNeutralCulture == false)
{
Thread.CurrentThread.CurrentCulture = c;
}
testString = dateTime.ToString("s");
Console.WriteLine("{0} ", testString);
if (originalString != testString)
{
throw new ApplicationException(string.Format("ToString(s) is returning something different for {0} " , c));
}
}
Yes it does. As others have said it only contains numeric values and string literals (e.g. 'T' and ':'), nothing that is altered by region or culture settings.
Yep. Breaking that pattern down, it's only numeric properties, there's no reference to anything like month or day names in there.
yyyy - 4 digit year
MM - 2 digit month, with leading zero
dd - 2 digit day, with leading zero
T - a literal T
HH - 2 digit hour, with leading zero, 24 hour format
mm - 2 digit minute, with leading zero
ss - 2 digit second, with leading zero