Parse error when text is split on multiple lines - jq

I'm getting a parse error when I split a text value over multiple lines and display the JSON file on screen with the command "jq . words.json".
The JSON file with the text value on a single line looks like this:
{
"words" : "one two three four five"
}
The command "jq . words.json" works fine and shows the JSON file on screen.
But when I split the value "one two three four five" across two lines and run the same command, I get a parse error:
{
"words" : "one two
three four five"
               ^
}
parse error: Invalid string: control characters from U+0000 through
U+001F must be escaped at line 3, column 20
The parse error points to the " at the end of the third line.
How can I solve this?
Tia,
Anthony

That's because the JSON format is invalid. It should look like this:
{
"words" : "one two \nthree four five"
}

You have to escape the end-of-line character in JSON:
{
"words" : "one two\nthree four five"
}
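With the newline escaped, jq accepts the file again. As a quick check (assuming the file is still called words.json, as in the question), something like the following should work; the -r flag makes jq print the raw string, so the \n escape comes out as a real line break:
$ jq . words.json
{
  "words": "one two\nthree four five"
}
$ jq -r .words words.json
one two
three four five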

To convert the text with the multi-line string to valid JSON, you could use any-json (https://www.npmjs.com/package/any-json), and pipe that into jq:
$ any-json --input-format=cson split-string.txt
{
"words": "one two three four five"
}
$ any-json --input-format=cson split-string.txt | jq length
1
For more on handling almost-JSON texts, see the jq FAQ: https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json

The parse error points to the " at the end of the third line.
The way jq flags this error may be counterintuitive, but the error in the JSON precedes the indicated quote-mark.
If the error is non-obvious, it may be that an end-quote is missing on the prior key or value. In this case, the character that falls in the range U+0000 through U+001F is U+000A, the line feed character in ASCII.
In the case of this question, the line feed was inserted intentionally. But, unescaped, this is invalid JSON.
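To see that it really is the raw newline that trips jq up, the error can be reproduced from the command line. This is just an illustrative sketch, assuming a POSIX shell where printf expands \n into an actual newline:
$ printf '{"words": "one two\nthree four five"}' | jq .     # raw newline in the string: same parse error
$ printf '{"words": "one two\\nthree four five"}' | jq .    # two-character escape \n: parses fine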

In case it helps somebody, I had this error:
E: parse error: Invalid string: control characters from U+0000 through
U+001F must be escaped at line 3, column 5
jq was parsing a file containing this data, with a missing " after "someKey:
{
  "someKey: {
    "someData": "someValue"
  }
}

Related

Matching dict key to text file and returning Test Pass/fail

I'm a novice at Python, and am currently working on a small test case assignment where I am to find and match the dictionary keys to a small text file, and see if the keys are present in the text file.
The dictionary is as follows:
dict = {"description, translation": "test_translation(serial,",
"unit": "test_unit(",}
The text in the text file, henceforth called "requirement.txt", is as follows:
The description shall display the translation of XXX.
The unit shall be hidden.
The value is read from the file "version.txt".
For each key, I am to find and match whether it is present or absent: a match should return a "Test Pass", no match should return a skip.
Keys from the dictionary are to be sorted into a list, then iterated over and matched against the text. (Values from the dictionary are to be sorted into a separate list and iterated over a separate file, which I shall not delve into here.)
This is the code that I currently have (and where I am stuck):
list = sorted(key_words.keys(), key=lambda d: d[0])
with open('C:/Users-------/requirement.txt', 'r') as outfile:
    lines = outfile.readlines()
    for line in lines:
        line = line.strip()
        if line == '':
            continue
        line_strings = line.split(' ')
        for word in list:
            if word in line:
                print("Test Pass")
                print(word)
                break
            else:
                print("Test Fail")
        print(line + "\n")
Result currently obtained:
Test Fail
Test Pass
display
The description shall display the translation of XXX.
Test Fail
Test Fail
Test Fail
Test Pass
unit
The unit shall be hidden.
Test Fail
Test Fail
Test Fail
Test Fail
The value is read from the file "version.txt".
Running my current code (where I am stuck) returns "Test Pass" and "Test Fail" multiple times, suggesting that the keys are iterated over for each line and a result is printed for every iteration.
I am stuck at two fronts:
After separating the keys into a list, how do I order them in the sequence "description, translation", "unit"?
How do I modify the code so that the result is returned only once per line, as "Test Pass" or "Test Fail"?
Results should ideally return in the following format:
Ideal outcome:
('Text:', 'The description shall display the translation of XXX.')
('Key:', 'description, translation')
Test Pass
('Text:', 'The unit shall be hidden.')
('Key:', 'unit')
Test Pass
('Text:', 'The value is read from the file "version.txt".')
('Key:', (none))
Test Fail
For your kind enlightenment please, thank you!
Try with this:
list = sorted(key_words.keys(), key=lambda d: d[0])
with open('C:/Users-------/requirement.txt', 'r') as outfile:
    lines = outfile.readlines()
    for line in lines:
        line = line.strip()
        if line == '':
            continue
        # Create an empty list which will contain all the words that match
        words_found = []
        for word in list:
            # if the word matches then add it to the list words_found
            if word in line:
                words_found.append(word)
        print("('Text:', \"{}\")".format(line))
        print("('Keys:', \"{}\")".format(words_found))
        # if the list of words found is not empty then the test passed
        if(words_found):
            print("Test Passed")
        else:
            print("Test Failed")
The idea is to create a list of the words found and then print them all.
I'm using the format operation; you can find a guide on how to use it here. The line if(words_found): checks whether the list is empty.
Additional Notes
In this case you won't need it, but if you wanted to solve only the second point, you could use the for ... else statement as explained in the docs:
4.4 break and continue Statements, and else Clauses on Loops
Loop statements may have an else clause; it is executed when the loop terminates through exhaustion of the list (with for) or when the condition becomes false (with while), but not when the loop is terminated by a break statement.
By reducing the indentation of the else of your if statement by one tab, it becomes the else of the for statement, so it is executed only if the for loop never hit a break, and the problem is solved.
list = sorted(key_words.keys(), key=lambda d: d[0])
with open('C:/Users-------/requirement.txt', 'r') as outfile:
    lines = outfile.readlines()
    for line in lines:
        line = line.strip()
        if line == '':
            continue
        line_strings = line.split(' ')
        for word in list:
            if word in line:
                print(word)
                print("Test Pass")
                break
        else:
            print("Test Fail")
        print(line + "\n")
Edit
To split a key such as "description, translation" into its parts, we just have to split it at the comma with the built-in function split. Since a key like "unit" has no comma, the code below loops over whatever split returns instead of unpacking exactly two values:
list = sorted(key_words.keys(), key=lambda d: d[0])
with open('C:/Users-------/requirement.txt', 'r') as outfile:
    lines = outfile.readlines()
    for line in lines:
        line = line.strip()
        if line == '':
            continue
        # Create an empty list which will contain all the words that match
        words_found = []
        for word in list:
            # a key such as "description, translation" holds several words,
            # so split it at the comma and check each part separately
            for part in word.split(","):
                part = part.strip()
                # if the word matches then add it to the list words_found
                if part in line:
                    words_found.append(part)
        print("('Text:', \"{}\")".format(line))
        print("('Keys:', \"{}\")".format(words_found))
        # if the list of words found is not empty then the test passed
        if(words_found):
            print("Test Passed")
        else:
            print("Test Failed")

SKIP not working at the beginning of the line

Having this
|83.56|
|as.63|
|as.lk|
|as45.34|
as input in a *.txt file, I need to skip the character "|" at the beginning but also at the end of the line, because the output should be:
<83.56> :only numbers
<as.63>
<as.lk> :only letters
<as45.34> :numbers/letters together
I got this as my code declaration,
without "|" at the beginning,
and nothing appears as a result. This is strange, because if I put the character "|" in this way, the result is almost the expected one:
<|83.56> :only numbers
<|as.63>
<|as.lk> :only letters
<|as45.34> :numbers/letters together
So the issue is that the "|" at the end of the line is being skipped properly, but the one at the beginning is not.
Note: I have also declared Numeros and Letras_minusculas at the beginning, in this way:
TOKEN:{<Numeros:["0"-"9"]>}
TOKEN:{<Letras_minusculas:["a"-"z"]>}
JavaCC provides a SKIP section where unnecessary characters are skipped.
Please find an example SKIP block below.
SKIP : {
  "|"
}
TOKEN : {
  < NUMERIC : (["0"-"9", "."])+ >
| < ALPHA : (["a"-"z", "."])+ >
| < ALPHA_NUM : (<NUMERIC> | <ALPHA>)+ >
}
Sample inputs:
|83.56| --> NUMERIC without "|"
|as.63| --> ALPHA_NUM without "|"
|as.lk| --> ALPHA without "|"
|as45.34| --> ALPHA_NUM without "|"
Note that SKIP will not skip the characters once a possible token match has already started.
E.g.: |83|3.56|
The characters "83" start to match NUMERIC and ALPHA_NUM, so the next character cannot be skipped. In our case this causes the error, because none of the possible matches accommodates the "|" symbol.
If we change the rule as below:
SKIP : {
  "|"
}
TOKEN : {
  < NUMERIC : (["0"-"9", "."])+ >
| < ALPHA : (["a"-"z", "."])+ >
| < ALPHA_NUM : (<NUMERIC> | <ALPHA> | "|")+ >
}
then the input matches ALPHA_NUM including the "|", i.e. 83|3.56.

writing output to a text file R

I have the code below. It works fine.
But I want to write the text within square brackets - ["gaugeid" : "gauge1234",] - to a line in the file. How can I do that?
I also want to write the text within square brackets - ["abc" : 5,] - to a line, where 5 is the actual value of the variable abc. How could I do that?
I am confused, as my line starts with " and ends with '.
abc=5
sink("output.txt")
cat("\n")
cat("abc : ")
#cat(""gaugeid" : "gauge1234",")
sink()
You cannot have naked double quotes in an R string unless the string is surrounded by single quotes.
> cat('"gaugeid" : "gauge1234",')
"gaugeid" : "gauge1234",
Or you can escape the double quotes inside your original effort:
> cat("\"gaugeid\" : \"gauge1234\",")
"gaugeid" : "gauge1234",
The second question is as simple as adding a comma and the variable name, which will then be evaluated before writing to the output device:
> cat("abc : ", abc)
abc : 5
Try:
abc=5
sink("output.txt")
cat("\n")
cat("abc : ")
cat(abc)
cat(",")
sink()
The first call, cat("abc : "), adds the literal string, while cat(abc) adds the value of the variable abc to the output file.
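Putting the two pieces together, a minimal sketch of what the question asks for might look like this (the file name output.txt and the trailing commas are taken from the question; sep = "" simply stops cat from inserting spaces between its arguments):
abc <- 5
sink("output.txt")
# single quotes around the string let the inner double quotes stand as-is
cat('"gaugeid" : "gauge1234",', "\n", sep = "")
# abc is evaluated, so its value 5 is written
cat('"abc" : ', abc, ",", "\n", sep = "")
sink()
output.txt should then contain the lines "gaugeid" : "gauge1234", and "abc" : 5, as requested.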

Split line into multiple lines of 42 Unix after last given char

I have a text file in Unix formed from multiple long lines:
ALTER Tit como(titel('42423432;434235111;757567562;2354679;5543534;6547673;32322332;54545453'))
ALTER Mit como(Alt('432322;434434211;754324237562;2354679;5543534;6547673;32322332;54545453'))
I need to split each line into multiple lines of no more than 42 characters.
The split should be done after the last ";" that fits, so my ideal output file would be:
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
I used fold -w 42 givenfile.txt | sed 's/ $/ -/g'
It splits the line but doesn't add the "-" at the end of the line and doesn't split after the ";".
Any help is much appreciated.
Thanks!
awk -F';' '
w{
  print""
}
{
  w=length($1)
  printf "%s",$1
  for (i=2;i<=NF;i++){
    if ((w+length($i)+1)<42){
      w+=length($i)+1
      printf";%s",$i
    } else {
      w=length($i)
      printf"; -\n%s",$i
    }
  }
}
END{
  print""
}
' file
This produces the output:
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
How it works
Awk implicitly loops through each line of its input and each line is divided into fields. This code uses a single variable w to keep track of the current width of the output line.
-F';'
Tell awk to break fields on semicolons.
w{print""}
If the previous line was not completed (w>0), print a newline to terminate it before starting a new one.
w=length($1); printf "%s",$1
Print the first field of the new line and set w according to its length.
Loop over the remaining fields:
for (i=2;i<=NF;i++){
  if ((w+length($i)+1)<42){
    w+=length($i)+1
    printf";%s",$i
  } else {
    w=length($i)
    printf"; -\n%s",$i
  }
}
This loops over the second to final fields of this line. Whenever we reach the point where we can't print another field without exceeding the 42 character limit, we print ; -\n.
END{print""}
Print a newline at the end of the file.
This might work for you (GNU sed):
sed -r 's/.{1,42}$|.{1,41};/& -\n/g;s/...$//' file
This globally replaces either 1 to 41 characters followed by a ";", or 1 to 42 characters followed by end of line, with the matched text followed by " -" and a newline. The last chunk also gets " -" and a newline appended, which is three characters too many, so the second substitution deletes them.

Lexical error at line 0, column 0

In the grammar below, I am trying to configure any line that starts with ' as a single-line comment, and anything between /' and '/ as a multi-line comment. The single-line comment works OK. But for some reason, as soon as I press /, ', ;, < or >, I get the errors below. I don't have the above characters configured. Shouldn't they be considered default and skipped during parsing?
Error
Lexical error at line 0, column 0. Encountered: "\"" (34), after : ""
Lexical error at line 0, column 0. Encountered: ">" (62), after : ""
Lexical error at line 0, column 0. Encountered: "\n" (10), after : "-"
I have only included part of the code below for conciseness. For the full lexer definition, please visit the link.
TOKEN :
{
  < WHITESPACE:
      " "
    | "\t"
    | "\n"
    | "\r"
    | "\f" >
}
/* COMMENTS */
MORE :
{
  < "/'" > { input_stream.backup(1); } : IN_MULTI_LINE_COMMENT
}
<IN_MULTI_LINE_COMMENT>
TOKEN :
{
  < MULTI_LINE_COMMENT: "'/" > : DEFAULT
}
<IN_MULTI_LINE_COMMENT>
MORE :
{
  < ~[] >
}
TOKEN :
{
  < SINGLE_LINE_COMMENT: "'" (~["\n", "\r"])* ("\n" | "\r" | "\r\n")? >
}
I can't reproduce every aspect of your problem. You say there is an error "as soon as" you enter certain characters. Here is what I get:
/ There is no error if the next character is a '. If the next character is not ', there is an error.
' I see no error. This is correctly treated as the start of a comment.
; There is always an error. No token can start with ;.
< There is only an error if the next characters are not - or <-.
> There is always an error. No token can start with >.
I'm not exactly sure why you would expect these not to be errors, since your lexer has no rules to cover these cases. Generally, when there is no rule to match a prefix of the input and the input is not exhausted, a TokenMgrError will be thrown.
If you want to eliminate all these TokenMgrErrors, make a catch-all rule (as explained in the FAQ):
TOKEN: { <UNEXPECTED_CHARACTER: ~[] > }
Make sure this is the very last rule in the .jj file. This rule says that, when no other rule applies, the next character is treated as an UNEXPECTED_CHARACTER token. Of course, this just pushes the problem up to the parsing level. If you really want the tokenizer to skip all characters that don't belong, just use the following rule as the very last rule:
SKIP : { < ~[] > }
For most languages, that would be an odd thing to do, which is why it is not the default.
