I have a control file, header.cntrl, that holds the header details. Example below:
cat header.cntrl
id, name, age, location, phone number
Now I am getting files from different sources,
Source 1 is sending input.dat file in the following format
cat input.dat
id, name, age, location, status, phone number
1,Abc, 34,India, active, 9999999999
Source 2 is sending data in the following format
cat input_2.dat
id, age, name, qualification, status, phone number, location
2,24,xyz, L L B, Active, 88888-88888, India
So different sources send files in different formats, and we need to convert those input files to the header.cntrl file format.
I tried this with awk, but that way I need a separate awk script for each source. Can it be done with a single script that also works for any future source?
This reformat_data script can reformat the two "non-standard" input formats and any future source formats. The key idea is to use Perl hashes to store the appropriate headings and only print those that are needed as specified in the header.cntrl file.
cat $* | perl -ne '
BEGIN {
@std_header = ("id","name","age","location","phone number");
print join(",", @std_header), ",\n";
chomp($firstline=<>);
$firstline =~ s/,\s+/,/g;
@inputfile_header=split(/,/, $firstline);
%hash=();
}
chomp;
@row = split(/,/);
$i=0;
for $cell (@row) {
$cell =~ s/\s+//;
$header=$inputfile_header[$i];
$hash{$header} = $row[$i];
$i++;
}
foreach $cell (@std_header) {
print "$hash{$cell},";
}
print "\n";
'
Here are the results of running the reformat_data script using the two sample input files:
cat input.dat
id, name, age, location, status, phone number
1,Abc, 34,India, active, 9999999999
reformat_data input.dat
id,name,age,location,phone number,
1,Abc,34,India,9999999999,
cat input_2.dat
id, age, name, qualification, status, phone number, location
2,24,xyz, L L B, Active, 88888-88888, India
reformat_data input_2.dat
id,name,age,location,phone number,
2,xyz,24,India,88888-88888,
In this particular case you can check the number of fields in lines (provided that all lines of a file have the same number of fields) (awk code):
{
n = split($0, a, "[ \t]*,[ \t]*");
if (n < 7) {
print a[1] ", " a[2] ", " a[3] ", " a[4] ", " a[6];
}
else {
print a[1] ", " a[3] ", " a[2] ", " a[7] ", " a[6];
}
}
A more sophisticated solution is to use the first line as key identifier and take remaining fields "by name":
{
n = split($0, a, "[ \t]*,[ \t]*");
if (FNR == 1) {
for (i = 1; i <= n; ++i) {
lbl[a[i]] = i;
}
}
print a[lbl["id"]] ", " a[lbl["name"]] ", " a[lbl["age"]] ", " a[lbl["location"]] ", " a[lbl["phone number"]];
}
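The "by name" idea can also be driven by the control file itself, so the target column list is not hard-coded. The following is only a sketch in Python, assuming the control file and the input are small enough to pass around as strings; the function name reformat and the in-memory samples are illustrative and not part of the original scripts.

```python
import csv
import io

def reformat(control_text, input_text):
    """Reorder the columns of input_text to match the header in control_text."""
    # skipinitialspace drops the blank that follows each comma in the samples
    target = next(csv.reader(io.StringIO(control_text), skipinitialspace=True))
    reader = csv.DictReader(io.StringIO(input_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(target)
    for row in reader:
        # Columns missing from a source are written as empty fields
        writer.writerow([row.get(col, "") for col in target])
    return out.getvalue()

control = "id, name, age, location, phone number\n"
source_1 = ("id, name, age, location, status, phone number\n"
            "1,Abc, 34,India, active, 9999999999\n")
source_2 = ("id, age, name, qualification, status, phone number, location\n"
            "2,24,xyz, L L B, Active, 88888-88888, India\n")

result_1 = reformat(control, source_1)
result_2 = reformat(control, source_2)
print(result_1)
print(result_2)
```

A new source then needs no code change as long as its first line names the columns with the same labels as header.cntrl.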
I don't get any error, but I expected that when I type an integer other than 1, 2, 3, or 4, the code would enter the else branch and print what the paste0 call builds. What is wrong?
escolha <- as.integer(readline(prompt="Enter your choice: "))
if(escolha == 1){
print("Cool you choose addition!")
} else if (escolha == 2) {
print("Cool, you choose subtraction!")
} else if (escolha == 3) {
print("Cool, you choose multiplication!")
} else if (escolha == 4){
print("Cool, you choose division!")
} else{
paste0("It's not possible to use ", escolha," as input.")
escolha<- as.integer(readline(prompt="Choose a valid number (1 a 4): "))
}
num1 <- as.double(readline(prompt="What is the first number? "))
num2 <- as.double(readline(prompt="What is the second number? "))
resultado <- switch (escolha, (num1+num2), (num1-num2), (num1*num2), (num1/num2))
cat("The result is: ", resultado)
paste0() (and paste()) assemble a string and return it. You still need to print the result to the screen with print() or cat(), like this:
cat(paste0("It's not possible to use ", escolha," as input.\n"))
(added the \n at the end, so the readline() prompt that follows will be on a separate line)
My program reads a line from a file. This line contains comma-separated text like:
123,test,444,"don't split, this",more test,1
I would like the result of a split to be this:
123
test
444
"don't split, this"
more test
1
If I use the String.split(","), I would get this:
123
test
444
"don't split
this"
more test
1
In other words: The comma in the substring "don't split, this" is not a separator. How to deal with this?
You can try out this regex:
str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
This splits the string on , that is followed by an even number of double quotes. In other words, it splits on comma outside the double quotes. This will work provided you have balanced quotes in your string.
Explanation:
, // Split on comma
(?= // Followed by
(?: // Start a non-capture group
[^"]* // 0 or more non-quote characters
" // 1 quote
[^"]* // 0 or more non-quote characters
" // 1 quote
)* // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
[^"]* // Finally 0 or more non-quotes
$ // Till the end (This is necessary, else every comma will satisfy the condition)
)
You can even write it like this in your code, using the (?x) modifier with your regex. The modifier ignores any whitespace in your regex, so a regex broken into multiple lines becomes easier to read:
String[] arr = str.split("(?x) " +
", " + // Split on comma
"(?= " + // Followed by
" (?: " + // Start a non-capture group
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" )* " + // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
" [^\"]* " + // Finally 0 or more non-quotes
" $ " + // Till the end (This is necessary, else every comma will satisfy the condition)
") " // End look-ahead
);
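For comparison, the same lookahead works unchanged in other regex engines. This is a minimal sketch in Python, assuming balanced quotes as stated above; the name OUTSIDE_QUOTES is made up for the example.

```python
import re

# Same lookahead as in the answer: split on a comma that is followed by
# an even number of double quotes up to the end of the string.
OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

line = '123,test,444,"don\'t split, this",more test,1'
fields = OUTSIDE_QUOTES.split(line)
for field in fields:
    print(field)
```

The quoted field keeps its surrounding quotes, exactly as in the desired output of the question.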
Why Split when you can Match?
Resurrecting this question because for some reason, the easy solution wasn't mentioned. Here is our beautifully compact regex:
"[^"]*"|[^,]+
This will match all the desired fragments (see demo).
Explanation
With "[^"]*", we match complete "double-quoted strings"
or |
we match [^,]+ any characters that are not a comma.
A possible refinement is to improve the string side of the alternation to allow the quoted strings to include escaped quotes.
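A sketch of both the plain pattern and the refinement, written in Python. It assumes that a quote inside a quoted string is escaped with a backslash, which the question does not specify; the names PLAIN and WITH_ESCAPES are illustrative.

```python
import re

# The compact pattern from above: a quoted string, or a run of non-commas.
PLAIN = re.compile(r'"[^"]*"|[^,]+')

# Refinement (assumption: backslash escapes): inside quotes, accept either
# a character that is neither quote nor backslash, or a backslash pair.
WITH_ESCAPES = re.compile(r'"(?:[^"\\]|\\.)*"|[^,]+')

line = '123,test,444,"don\'t split, this",more test,1'
plain_fields = PLAIN.findall(line)

escaped_line = r'1,"say \"hi\", ok",3'
escaped_fields = WITH_ESCAPES.findall(escaped_line)

# Matching instead of splitting silently skips empty fields.
sparse_fields = PLAIN.findall('a,,b')

print(plain_fields)
print(escaped_fields)
print(sparse_fields)
```

Note the last case: because [^,]+ needs at least one character, an empty field between two commas produces no match at all, so field positions shift if the data can contain empty columns.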
Building upon @zx81's answer, because the matching idea is really nice, I've added the Java 9 results() call, which returns a Stream. Since the OP wanted to use split, I've collected to String[], as split does.
Caution if you have spaces after your comma-separators (a, b, "c,d"). Then you need to change the pattern.
Jshell demo
$ jshell
-> String so = "123,test,444,\"don't split, this\",more test,1";
| Added variable so of type String with initial value "123,test,444,"don't split, this",more test,1"
-> Pattern.compile("\"[^\"]*\"|[^,]+").matcher(so).results();
| Expression value is: java.util.stream.ReferencePipeline$Head@2038ae61
| assigned to temporary variable $68 of type java.util.stream.Stream<MatchResult>
-> $68.map(MatchResult::group).toArray(String[]::new);
| Expression value is: [Ljava.lang.String;@6b09bb57
| assigned to temporary variable $69 of type String[]
-> Arrays.stream($69).forEach(System.out::println);
123
test
444
"don't split, this"
more test
1
Code
String so = "123,test,444,\"don't split, this\",more test,1";
Pattern.compile("\"[^\"]*\"|[^,]+")
.matcher(so)
.results()
.map(MatchResult::group)
.toArray(String[]::new);
Explanation
Regex "[^"]" matches: a quote, one character that is not a quote, a quote.
Regex "[^"]*" matches: a quote, anything but a quote 0 (or more) times, a quote.
That regex needs to go first to "win", otherwise matching anything but a comma 1 or more times - that is: [^,]+ - would "win".
results() requires Java 9 or higher.
It returns Stream<MatchResult>, which I map using group() call and collect to array of Strings. Parameterless toArray() call would return Object[].
You can do this very easily without complex regular expression:
Split on the character ". You get a list of Strings
Process each string in the list: Split every string that is on an even position in the List (starting indexing with zero) on "," (you get a list inside a list), leave every odd positioned string alone (directly putting it in a list inside the list).
Join the list of lists, so you get only a list.
If you want to handle escaped '"' characters, you have to adapt the algorithm a little (rejoining parts that were split off incorrectly, or changing the split to a simple regexp), but the basic structure stays.
So basically it is something like this:
public class SplitTest {
public static void main(String[] args) {
final String splitMe="123,test,444,\"don't split, this\",more test,1";
final String[] splitByQuote=splitMe.split("\"");
final String[][] splitByComma=new String[splitByQuote.length][];
for(int i=0;i<splitByQuote.length;i++) {
String part=splitByQuote[i];
if (i % 2 == 0){
splitByComma[i]=part.split(",");
}else{
splitByComma[i]=new String[1];
splitByComma[i][0]=part;
}
}
for (String parts[] : splitByComma) {
for (String part : parts) {
System.out.println(part);
}
}
}
}
This will be much cleaner with lambdas, promised!
Please see the code snippet below. This code only considers the happy flow. Change it according to your requirements.
public static String[] splitWithEscape(final String str, char split,
char escapeCharacter) {
final List<String> list = new LinkedList<String>();
char[] cArr = str.toCharArray();
boolean isEscape = false;
StringBuilder sb = new StringBuilder();
for (char c : cArr) {
if (isEscape && c != escapeCharacter) {
sb.append(c);
} else if (c != split && c != escapeCharacter) {
sb.append(c);
} else if (c == escapeCharacter) {
if (!isEscape) {
isEscape = true;
if (sb.length() > 0) {
list.add(sb.toString());
sb = new StringBuilder();
}
} else {
isEscape = false;
}
} else if (c == split) {
list.add(sb.toString());
sb = new StringBuilder();
}
}
if (sb.length() > 0) {
list.add(sb.toString());
}
String[] strArr = new String[list.size()];
return list.toArray(strArr);
}
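The character-by-character approach boils down to a small state machine with a single "inside quotes" flag. Below is a hedged sketch of that idea in Python. It is not a port of the Java method above: this variant keeps empty fields and drops the quote characters, which are design choices made for the example, and the function name split_outside_quotes is hypothetical.

```python
def split_outside_quotes(text, sep=",", quote='"'):
    """Split text on sep, ignoring separators between quote characters."""
    fields, current, in_quotes = [], [], False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes      # toggle state; the quote is dropped
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))        # last field has no trailing separator
    return fields

fields = split_outside_quotes('123,test,444,"don\'t split, this",more test,1')
sparse = split_outside_quotes('a,,b')
print(fields)
print(sparse)
```

Like the Java version, this only covers the happy flow: unbalanced quotes simply leave the rest of the line inside one field.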
I am trying to loop through a csv file pushing each column into an array but I am not sure how to do that, I know that the tag {{!COL1}} will give me the data I want but I can't figure out how to save it into a variable I can use to push inside an array.
csvToArray = "CODE:";
csvToArray += "SET !DATASOURCE artist.csv" + "\n";
csvToArray += "SET !ERRORIGNORE YES" + "\n";
csvToArray += "SET !DATASOURCE_LINE {{CSV}}" + "\n";
csvToArray += "SET !VAR1 {{!COL1}}" + "\n";
for(i = 0; i < 10; i++){
iimSet("CSV", i);
iimPlay(csvToArray);
}
This code lets me loop through a csv file, and the {{!COL1}} tag gives me the data I want. How do I save it into a variable I can use? I can't figure this one out.
To get the data of COL1 you will need to set EXTRACT = COL1, then assign the result of iimGetLastExtract() to your variable.
Add this to your macro:
csvToArray += "SET !EXTRACT {{!COL1}}";
Then add this to your js file:
var myArray = []; // place at top of file for readability
myArray.push(iimGetLastExtract()); // place at end of for loop
// puts each extraction into myArray
Try the below code block.
function read_file(path) {
var content = '', i = 1, f, res = '',spl;
while (1){
content = res;
if (content != "") {
spl = content.split("|");
//spl[0] will contain the Row1 and Column1 value, spl[1] will contain Row 1 and Column 2 value etc.,
/*Here you can specify your script/conditions
Example:
iimPlay('CODE:TAG POS=1 TYPE=INPUT:TEXT ATTR=NAME:login[username] CONTENT='+spl[0]);
iimPlay('CODE:TAG POS=1 TYPE=INPUT:PASSWORD ATTR=NAME:login[password] CONTENT='+spl[1]);
I used the infinite loop, so in order to come out of the loop, you need to add a condition like below
if (spl[0]=="End") {//In your input sheet if the Row1,Col1 contains the value 'End', then it will exit the loop
iimDisplay("All Rows Completed");
return;
}
*/
f = "CODE: "+"\n";
f += "SET !EXTRACT null" + "\n";
f += "SET !LOOP 3" + "\n";
f += "SET !DATASOURCE \""+path+"\" "+"\n";
f += "SET !DATASOURCE_COLUMNS 5" + "\n"; //Assume that you have five columns in your input file
f += "SET !DATASOURCE_LINE " + i + "\n";
f += "SET !EXTRACT {{!col1}}|{{!col2}}|{{!col3}}|{{!col4}}|{{!col5}}" + "\n";
iimPlay(f);
res = iimGetLastExtract();
i++;
}
return content;
}
var file_conten = read_file("yourinputfilename.csv");
I have data in format yyyy-MM-dd hh:mm:ss. It is stored as text.
I want a query that matches a given day(yyyy-MM-dd).
E.g
Select * from test where test MATCH '2014-03-30*'
When I try that, it returns all data excluding that of March 2014.
If i try
Select * from test where test MATCH '2014-04-30*'
it returns all data excluding that of April 2014.
I am puzzled: I am getting the opposite of what I want!
Any reason for the strange behavior?
This is my full code....testing for date pattern
public List <Transactions> GetTransactionsVirtual(String token)
{
List<Transactions> trans = new ArrayList<Transactions>();
SQLiteDatabase db;
String sql= " Select " + MESSAGE + "," + TDATE + ","+ SERVICE_PROVIDER +
" from " + TABLE_NAME_VIRTUAL + " Where "+ TABLE_NAME_VIRTUAL + " MATCH " + "?" +
" Order by " + TDATE + " DESC";
//check entered string...if date string strip it ..
String pattern="^(19|20)\\d\\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$";
String tokenedit=null;
String newtoken=null;
if (token.matches(pattern))
{
tokenedit= token.replace("-", " ");
Log.e("testtoken", tokenedit);
newtoken = tokenedit+"*" ;
}else
{
newtoken= token+ "*";
}
String [] args= new String[]{newtoken};
Log.e("sqlamatch", sql);
db= this.getReadableDatabase();
Cursor c = db.rawQuery(sql, args);
if(c.moveToFirst())
{
do{
Transactions t=new Transactions();
t.setTransactiondate(c.getString(c.getColumnIndex(TDATE)));
t.setMessage(c.getString(c.getColumnIndex(MESSAGE)));
t.setServiceprovider(c.getString(c.getColumnIndex(SERVICE_PROVIDER)));
//Log.e("msg",t.getMessage().toString());
trans.add(t);
}while(c.moveToNext());
}
return trans;
}
The full-text search mechanism is designed for searching words inside texts.
With the default tokenizer, the three fields of a date (yyyy, MM, dd) are parsed as single, independent words, and the delimiters are ignored.
An FTS query like 2014-03-30* searches for documents (=records) that
contain the word 2014, and
do not contain the word 03, and
do not contain any word beginning with 30.
You need to do a phrase search:
SELECT * FROM test WHERE test MATCH '"2014 03 30"'
If all your data has a fixed format, you should not use an FTS table in the first place.
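As an illustration of that last point, a plain table with an ordinary prefix comparison is enough to select one day. The sketch below uses Python's sqlite3 module with an in-memory database; the table name transactions, its columns, and the sample rows are invented for the example and are not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary (non-FTS) table; timestamps are stored as yyyy-MM-dd hh:mm:ss text
conn.execute("CREATE TABLE transactions (tdate TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("2014-03-30 08:15:00", "payment received"),
        ("2014-03-30 17:42:10", "airtime purchase"),
        ("2014-04-30 09:00:00", "withdrawal"),
    ],
)

def transactions_on(day):
    """Return rows whose timestamp starts with the given yyyy-MM-dd day."""
    cur = conn.execute(
        "SELECT tdate, message FROM transactions "
        "WHERE tdate LIKE ? || '%' ORDER BY tdate DESC",
        (day,),
    )
    return cur.fetchall()

rows = transactions_on("2014-03-30")
for row in rows:
    print(row)
```

Because the text format sorts chronologically, a range condition on the same column (from the day up to, but not including, the next day) would select the same rows.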
I am trying to read a text file using Dynamics AX. However, the following code replaces any spaces in the lines with commas:
// Open file for read access
myFile = new TextIo(fileName , 'R');
myFile.inFieldDelimiter('\n');
fileRecord = myFile.read();
while (fileRecord)
{
line = con2str(fileRecord);
info(line);
…
I have tried various combinations of the above code, including specifying a blank '' field delimiter, but with the same behaviour.
The following code works, but seems like there should be a better way to do this:
// Open file for read access
myFile = new TextIo(fileName , 'R');
myFile.inRecordDelimiter('\n');
myFile.inFieldDelimiter('_stringnotinfile_');
fileRecord = myFile.read();
while (fileRecord)
{
line = con2str(fileRecord);
info(line);
The format of the file is field format. For example:
DATAFIELD1 DATAFIELD2 DATAFIELD3
DATAFIELD1 DATAFIELD3
DATAFIELD1 DATAFIELD2 DATAFIELD3
So what I end up with unless I use the workaround above is something like:
line=DATAFIELD1,DATAFIELD2,DATAFIELD3
The underlying problem here is that I have mixed input formats. Some of the files just have line feeds {LF} and others have {CR}{LF}. Using my workaround above seems to work for both. Is there a way to deal with both, or to strip \r from the file?
Con2Str:
Con2Str will retrieve a list of values from a container and by default uses comma (,) to separate the values.
client server public static str Con2Str(container c, [str sep])
If no value for the sep parameter is specified, the comma character will be inserted between elements in the returned string.
Possible options:
If you would like the space to be the default separator, you can pass space as the second parameter to the method Con2Str.
One other option is that you can also loop through the container fileRecord to fetch the individual elements.
Code snippet 1:
The code snippet below loads the file contents into a TextBuffer and replaces the carriage returns (\r) with the newline (\n) character. The condition if (strlen(line) > 1) skips the empty strings caused by consecutive newline characters.
TextBuffer textBuffer;
str textString;
str clearText;
int newLinePos;
str line;
str field1;
str field2;
str field3;
counter row;
;
textBuffer = new TextBuffer();
textBuffer.fromFile(@"C:\temp\Input.txt");
textString = textBuffer.getText();
clearText = strreplace(textString, '\r', '\n');
row = 0;
while (strlen(clearText) > 0 )
{
row++;
newLinePos = strfind(clearText, '\n', 1, strlen(clearText));
line = (newLinePos == 0 ? clearText : substr(clearText, 1, newLinePos));
if (strlen(line) > 1)
{
field1 = substr(line, 1, 14);
field2 = substr(line, 15, 12);
field3 = substr(line, 27, 10);
info('Row ' + int2str(row) + ', Column 1: ' + field1);
info('Row ' + int2str(row) + ', Column 2: ' + field2);
info('Row ' + int2str(row) + ', Column 3: ' + field3);
}
clearText = (newLinePos == 0 ? '' : substr(clearText, newLinePos + 1, strlen(clearText) - newLinePos));
}
Code snippet 2:
You could use the File macro instead of hard-coding the values \r\n and R (which denotes the read mode).
TextIo inputFile;
container fileRecord;
str line;
str field1;
str field2;
str field3;
counter row;
;
inputFile = new TextIo(@"c:\temp\Input.txt", 'R');
inputFile.inFieldDelimiter("\r\n");
row = 0;
while (inputFile.status() == IO_Status::Ok)
{
row++;
fileRecord = inputFile.read();
line = con2str(fileRecord);
if (line != '')
{
field1 = substr(line, 1, 14);
field2 = substr(line, 15, 12);
field3 = substr(line, 27, 10);
info('Row ' + int2str(row) + ', Column 1: ' + field1);
info('Row ' + int2str(row) + ', Column 2: ' + field2);
info('Row ' + int2str(row) + ', Column 3: ' + field3);
}
}
I have never tried using the default RecordDelimiter as the FieldDelimiter without setting another RecordDelimiter explicitly. Normally rows (records) are delimited by \n and fields are delimited by a comma, tab, semicolon or some other symbol. You might also be hitting some weird behaviour when TextIO assumes a correct UTF format. You didn't supply an example of some rows from your data file, so guessing is hard.
Read more about TextIO here: http://msdn.microsoft.com/en-us/library/aa603840.aspx
EDIT:
With the additional example of file content, it seems to me the file is a fixed width file, where each column has its own fixed width. I would rather recommend using subStr if that is the case. Read about substr here: http://msdn.microsoft.com/en-us/library/aa677836.aspx
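The fixed-width idea, independent of X++, can be sketched as follows. This is a Python illustration only: the column widths 14, 12 and 10 are borrowed from the substr calls in the snippets earlier in this thread and may not match the real file, and the function name parse_fixed_width is made up.

```python
WIDTHS = (14, 12, 10)  # assumed column widths, taken from the substr calls

def parse_fixed_width(text, widths=WIDTHS):
    """Cut every non-empty line into fixed-width, whitespace-trimmed fields."""
    rows = []
    # splitlines() treats "\n", "\r\n" and a lone "\r" as line breaks
    for line in text.splitlines():
        if not line.strip():
            continue
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows

sample = (
    "A1".ljust(14) + "B1".ljust(12) + "C1".ljust(10) + "\r\n"   # CRLF line
    + "A2".ljust(14) + "".ljust(12) + "C2".ljust(10) + "\n"     # LF line, empty column 2
)
rows = parse_fixed_width(sample)
print(rows)
```

Cutting by position keeps an empty middle column as an empty field, which a space-delimited split cannot do.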
Use StrAlpha to strip blank values after you convert with Con2Str.