SKIP not working at the beginning of the line - javacc

Having this
|83.56|
|as.63|
|as.lk|
|as45.34|
as input in a *.txt file, I need to skip the character "|" at the beginning but also at the end of the line, because the output should be
<83.56> :only numbers
<as.63>
<as.lk> :only letters
<as45.34> :numbers/letters together
I got this as my code declaration, without "|" at the beginning, and nothing appears as a result. This is strange, because if I include the character "|" this way, the result is almost the expected one:
<|83.56> :only numbers
<|as.63>
<|as.lk> :only letters
<as45.34> :numbers/letters together
So the issue is that the "|" at the end of the line is being skipped properly, but the one at the beginning is not.
Note: I have also declared Numeros and Letras_minusculas at the beginning, this way:
TOKEN:{<Numeros:["0"-"9"]>}
TOKEN:{<Letras_minusculas:["a"-"z"]>}

JavaCC provides a SKIP section where unnecessary characters are discarded. An example SKIP block:
SKIP : {
"|"
}
TOKEN : {
  < NUMERIC : (["0"-"9","."])+ >
| < ALPHA : (["a"-"z","."])+ >
| < ALPHA_NUM : (<NUMERIC> | <ALPHA>)+ >
}
sample inputs:
|83.56| --> NUMERIC without "|"
|as.63| --> ALPHA_NUM without "|"
|as.lk| --> ALPHA without "|"
|as45.34| --> ALPHA_NUM without "|"
Note that SKIP will not skip a character once a possible token match has already started.
Eg: |83|3.56|
The characters "83" start to match NUMERIC and ALPHA_NUM, so the next "|" cannot be skipped. In our case this causes the error, because none of the possible matches accommodates the "|" symbol.
If we change the rule as below:
SKIP : {
"|"
}
TOKEN : {
  < NUMERIC : (["0"-"9","."])+ >
| < ALPHA : (["a"-"z","."])+ >
| < ALPHA_NUM : (<NUMERIC> | <ALPHA> | "|")+ >
}
Then the input matches ALPHA_NUM including the "|", i.e. 83|3.56.
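Outside JavaCC, the intended behaviour (skip "|" between tokens, then classify each field) can be simulated; a minimal Python sketch, where the token names mirror the grammar above and the helper functions are purely illustrative:

```python
import re

def classify(field):
    """Mirror the grammar's three token classes."""
    if re.fullmatch(r"[0-9.]+", field):
        return "NUMERIC"
    if re.fullmatch(r"[a-z.]+", field):
        return "ALPHA"
    if re.fullmatch(r"[a-z0-9.]+", field):
        return "ALPHA_NUM"
    return "ERROR"

def tokenize(line):
    # Dropping the "|" separators here plays the role of the SKIP block.
    return [(classify(f), f) for f in line.split("|") if f]

print(tokenize("|83.56|"))    # [('NUMERIC', '83.56')]
print(tokenize("|as45.34|"))  # [('ALPHA_NUM', 'as45.34')]
```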


Parse error when text is split over multiple lines

I'm getting a parse error when I split a text line over multiple lines and show the JSON file on screen with the command "jq . words.json".
The JSON file with the text value on a single line looks like this:
{
"words" : "one two three four five"
}
The command "jq . words.json" works fine and shows the JSON file on screen.
But when I split the value "one two three four five" over two lines and run the same command, I get a parse error:
{
"words" : "one two
three four five"
^
}
parse error: Invalid string: control characters from U+0000 through
U+001F must be escaped at line 3, column 20
The parse error points to the " at the end of the third line.
How can I solve this?
Tia,
Anthony
That's because the JSON format is invalid. It should look like this:
{
"words" : "one two \nthree four five"
}
You have to escape the end of line in JSON:
{
"words" : "one two\nthree four five"
}
To convert the text with the multi-line string to valid JSON, you could use any-json (https://www.npmjs.com/package/any-json), and pipe that into jq:
$ any-json --input-format=cson split-string.txt
{
"words": "one two three four five"
}
$ any-json --input-format=cson split-string.txt | jq length
1
For more on handling almost-JSON texts, see the jq FAQ: https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
The parse error points to the " at the end of the third line.
The way jq flags this error may be counterintuitive, but the error in the JSON precedes the indicated quote-mark.
If the error is non-obvious, it may be that an end-quote is missing on the prior key or value. In this case, the value that matches the criteria U+0000 through U+001F could be U+000A, which is the line feed character in ASCII.
In the case of this question, the line feed was inserted intentionally. But, unescaped, this is invalid JSON.
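The rule is easy to verify with any strict JSON parser; for example, in Python:

```python
import json

# A literal newline inside a JSON string is an unescaped control
# character (U+000A), so a strict parser rejects it.
try:
    json.loads('{"words": "one two\nthree four five"}')
except json.JSONDecodeError as e:
    print("rejected:", e.msg)

# The escaped form "\n" parses fine.
doc = json.loads('{"words": "one two\\nthree four five"}')
print(doc["words"])
```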
In case it helps somebody, I had this error:
E: parse error: Invalid string: control characters from U+0000 through
U+001F must be escaped at line 3, column 5
jq was parsing a file containing this data, with a missing " after "someKey:
{
"someKey: {
"someData": "someValue"
}
}

unix shell scripting to find and remove unwanted string in a pipe delimited file in a particular column

{
I have a requirement, where the file is pipe "|" delimited.
The first row contains the headers, and the count of columns is 5.
I have to delete only the string in the 3rd column if it matches the pattern.
Also note that the 3rd column can contain strings with commas (,), semicolons (;) or colons (:), but it will never contain a pipe (|), which is why we chose the pipe delimiter.
Input File:
COL1|COL2|COL3|COL4|COL5
1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3|C1|D1
2|CRIC|TEST|CRIC1:TEST_M2,CRIC2:ODI_M1;IPL_M1;TEST_M2;IPL_M3;T20_M1|C2|D2
The output should change only in COL3; no other column should be changed. In COL3, only the strings that match the pattern 'IPL_' should be kept.
Any other strings, like "TEST_M1" or "ODI_M1", should be made null,
and any leftover semicolons should be removed.
eg
Question - CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3
result - CRIC1:IPL_M1;IPL_M2,CRIC2:IPL_M3
Another scenario where if only strings that do not match "IPL_" are present then
Question - CRIC1:TEST_M1,CRIC2:ODI_M1
Result - CRIC1:,CRIC2:
Output File:
COL1|COL2|COL3|COL4|COL5
1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2,CRIC2:IPL_M3|C1|D1
2|CRIC|TEST|CRIC1:,CRIC2:IPL_M1;IPL_M3|C2|D2
The basic requirement is to find and replace a string:
INPUT
COL1|COL2|COL3|COL4|COL5
1|A1|A12|A13|A14|A15
Replace A13 with B13 in column 3 (A13 can change; we have to match any pattern like A13).
OUTPUT
COL1|COL2|COL3|COL4|COL5
1|A1|A12|B13|A14|A15
Thanks in advance.
Reformatting the scenario in simpler terms, taking only 2 columns: I need to search for "IPL_" and keep only those strings; any other string, like "ODI_M3;TEST_M5", should be deleted.
{
I/P:
{
COL1|COL2
CRIC1|IPL_M1;IPL_M2;TEST_M1
CRIC2|ODI_M1;IPL_M3
CRIC3|ODI_M3;TEST_M5
CRIC4|IPL_M5;ODI_M5;IPL_M6
}
O/P:
{
COL1|COL2
CRIC1|IPL_M1;IPL_M2
CRIC2|IPL_M3
CRIC3|
CRIC4|IPL_M5;IPL_M6
}
Awaiting your suggestions. Please help, I'm new to this platform.
Thanks,
Saquib
}
If I'm reading this correctly (and I'm not entirely sure I am; I'm going mostly by the provided examples), then this could be done relatively sanely with Perl:
#!/usr/bin/perl
while (<>) {
    if ($. > 1) {
        local @F = split /\|/;
        $F[3] = join(",", map {
            local @H = split /:/;
            $H[1] = join(";", grep(/IPL_/, split(";", $H[1])));
            join ":", @H;
        } split(/,/, $F[3]));
        $_ = join "|", @F;
    }
    print;
}
Put this code into a file, say foo.pl; then, if your data is in a file data.txt, you can run
perl -f foo.pl data.txt
This works as follows:
#!/usr/bin/perl
# Read lines from input (in our case: data.txt)
while (<>) {
    # In all except the first line (the header line):
    if ($. > 1) {
        # Apply the transformation. To do this, first split the line into fields
        local @F = split /\|/;
        # Then edit the third field. This has to be read right-to-left at the top
        # level, which is to say: first the field is split along commas, then the
        # tokens are mapped according to the code in the inner block, then they
        # are joined with commas between them again.
        $F[3] = join(",", map {
            # the map block does a similar thing. The inner tokens (e.g.,
            # "CRIC1:IPL_M1;IPL_M2") are split at the colon into the CRIC# part
            # (which is to be unchanged) and the value list we want to edit.
            local @H = split /:/;
            # This value list is again split along semicolons, filtered so that
            # only those elements that match /IPL_/ remain, and then joined with
            # semicolons again.
            $H[1] = join(";", grep(/IPL_/, split(";", $H[1])));
            # The map result is the CRIC# part joined to the edited list with a colon.
            join ":", @H;
        } split(/,/, $F[3]));
        # When all is done, rejoin the outermost fields with pipe characters
        $_ = join "|", @F;
    }
    # and print the result.
    print;
}

writing output to a text file R

I have the code below. It works fine.
But I want to write the text within square braces, ["gaugeid" :
"gauge1234",], as a line in the file. How can I do that?
I also want to write the text within square braces to a line,
["abc" : 5,], where 5 is the actual value of the variable abc. How could I do
that?
I am confused, as my line starts with " and ends with '.
abc=5
sink("output.txt")
cat("\n")
cat("abc : ")
#cat(""gaugeid" : "gauge1234",")
sink()
You cannot have naked double quotes in an R string unless the string is surrounded by single quotes.
> cat('"gaugeid" : "gauge1234",')
"gaugeid" : "gauge1234",
Or you can escape the double quotes inside your original effort:
> cat("\"gaugeid\" : \"gauge1234\",")
"gaugeid" : "gauge1234",
The second question is as simple as adding a comma and the variable name, which will then be evaluated before writing to the output device:
> cat("abc : ", abc)
abc : 5
Try:
abc=5
sink("output.txt")
cat("\n")
cat("abc : ")
cat(abc)
cat(",")
sink()
The first cat("abc : ") writes the literal string, while the second cat(abc) writes the value of the variable abc to the output file.
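For comparison, the same two quoting options exist in most languages; here is a Python sketch of the lines requested in the question (the file name and variable are taken from it):

```python
abc = 5
with open("output.txt", "w") as f:
    # Single-quoted string: the inner double quotes need no escaping.
    f.write('"gaugeid" : "gauge1234",\n')
    # Double-quoted string: the inner double quotes are backslash-escaped.
    f.write("\"abc\" : %d,\n" % abc)
```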

Split a string by a plus sign (+) character

I have a string in a data frame: "(1)+(2)".
I want to split it on the delimiter "+" so that I get one element as (1) and the other as (2), preserving the parentheses. I used strsplit, but it does not preserve the parentheses.
Use
strsplit("(1)+(2)", "\\+")
or
strsplit("(1)+(2)", "+", fixed = TRUE)
The idea of using strsplit("(1)+(2)", "+") doesn't work because, unless specified otherwise, the split argument is a regular expression, and the + character is special in regex. Other characters that also need extra care are:
?
*
.
^
$
\
|
{ }
[ ]
( )
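Rather than memorizing that list, you can let the regex library escape a literal for you; in Python, for example, re.escape does exactly this:

```python
import re

# re.escape backslash-escapes every regex metacharacter in the literal.
pattern = re.escape("+")
print(pattern)                       # \+
print(re.split(pattern, "(1)+(2)"))  # ['(1)', '(2)']
```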
The below worked for me in Python:
import re
re.split('\\+', 'ABC+CDE')
Output:
['ABC', 'CDE']

Lexical error at line 0, column 0

In the grammar below, I am trying to configure any line that starts with ' as a single-line comment, and anything between /' and '/ as a multiline comment. The single-line comment works OK. But for some reason, as soon as I press /, ', ;, < or >, I get the errors below. I don't have the above characters configured. Shouldn't they be considered default and skipped?
Error
Lexical error at line 0, column 0. Encountered: "\"" (34), after : ""
Lexical error at line 0, column 0. Encountered: ">" (62), after : ""
Lexical error at line 0, column 0. Encountered: "\n" (10), after : "-"
I have only included part of the code below for conciseness. For full Lexer definition please visit the link
TOKEN :
{
< WHITESPACE:
" "
| "\t"
| "\n"
| "\r"
| "\f">
}
/* COMMENTS */
MORE :
{
<"/'"> { input_stream.backup(1); } : IN_MULTI_LINE_COMMENT
}
<IN_MULTI_LINE_COMMENT>
TOKEN :
{
<MULTI_LINE_COMMENT: "'/" > : DEFAULT
}
<IN_MULTI_LINE_COMMENT>
MORE :
{
< ~[] >
}
TOKEN :
{
<SINGLE_LINE_COMMENT: "'" (~["\n", "\r"])* ("\n" | "\r" | "\r\n")?>
}
I can't reproduce every aspect of your problem. You say there is an error "as soon as" you enter certain characters. Here is what I get:
/ : There is no error if the next character is '. If the next character is not ', there is an error.
' : I see no error. This is correctly treated as the start of a comment.
; : There is always an error. No token can start with ;.
< : There is only an error if the next characters are not - or <-.
> : There is always an error. No token can start with >.
I'm not exactly sure why you would expect these not to be errors, since your lexer has no rules to cover these cases. Generally, when no rule matches a prefix of the input and the input is not exhausted, a TokenMgrError is thrown.
If you want to eliminate all these TokenMgrErrors, add a catch-all rule (as explained in the FAQ):
TOKEN: { <UNEXPECTED_CHARACTER: ~[] > }
Make sure this is the very last rule in the .jj file. This rule says that, when no other rule applies, the next character is treated as an UNEXPECTED_CHARACTER token. Of course, this just pushes the problem up to the parsing level. If you really want the tokenizer to skip all characters that don't belong, use the following rule as the very last rule instead:
SKIP : { < ~[] > }
For most languages, that would be an odd thing to do, which is why it is not the default.
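The difference between the two fallback rules can be mimicked in any hand-written lexer; here is a hypothetical Python sketch (the token set and names are illustrative, not taken from the grammar above):

```python
import re

TOKENS = [("NUMBER", r"[0-9]+"), ("NAME", r"[a-z]+")]

def tokenize(text, skip_unknown=False):
    """Match known tokens; an unknown character either becomes an
    UNEXPECTED_CHARACTER token (catch-all TOKEN rule) or is silently
    dropped (catch-all SKIP rule)."""
    out, i = [], 0
    while i < len(text):
        for name, pattern in TOKENS:
            m = re.match(pattern, text[i:])
            if m:
                out.append((name, m.group()))
                i += m.end()
                break
        else:
            if not skip_unknown:
                out.append(("UNEXPECTED_CHARACTER", text[i]))
            i += 1
    return out

print(tokenize("ab;12"))                     # includes ('UNEXPECTED_CHARACTER', ';')
print(tokenize("ab;12", skip_unknown=True))  # [('NAME', 'ab'), ('NUMBER', '12')]
```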
