Writing output to a text file in R

I have the code below. It works fine.
But I want to write the text within the square brackets, ["gaugeid" :
"gauge1234",], to a line in the file. How can I do that?
I also want to write the text within the square brackets, ["abc" : 5,],
to a line, where 5 is the actual value of the variable abc. How could I do
that?
I am confused because my line starts with " and ends with '
abc=5
sink("output.txt")
cat("\n")
cat("abc : ")
#cat(""gaugeid" : "gauge1234",")
sink()

You cannot have bare double quotes inside an R string unless the string is delimited by single quotes.
> cat('"gaugeid" : "gauge1234",')
"gaugeid" : "gauge1234",
Or you can escape the double quotes inside your original effort:
> cat("\"gaugeid\" : \"gauge1234\",")
"gaugeid" : "gauge1234",
The second question is as simple as adding a comma and the variable name, which will then be evaluated before being written to the output device:
> cat("abc : ", abc)
abc : 5

Try:
abc=5
sink("output.txt")
cat("\n")
cat("abc : ")
cat(abc)
cat(",")
sink()
The first call, cat("abc : "), writes the literal string abc : to the output file, while cat(abc) writes the value of the variable abc.
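Combining both answers, here is a minimal sketch that writes the two requested lines to a file. It uses writeLines() and paste0() instead of sink() and cat(), which is a substitution on my part rather than something taken from the answers above; out_file is a hypothetical temporary path used only for illustration.

```r
abc <- 5

# Hypothetical output path for this sketch; use "output.txt" in practice.
out_file <- tempfile(fileext = ".txt")

# A single-quoted R string may contain bare double quotes.
line1 <- '"gaugeid" : "gauge1234",'

# paste0() concatenates without a separator; the number 5 is converted to "5".
line2 <- paste0('"abc" : ', abc, ',')

# writeLines() writes each element on its own newline-terminated line.
writeLines(c(line1, line2), out_file)

# Read the file back to check what was written.
written <- readLines(out_file)
print(written)
```

Building each line with paste0() first avoids the extra space that cat() inserts between its arguments by default (sep = " ").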

Related

How to remove double quotes(") and new lines in between ," and ", in a unix file

I am getting a comma-delimited file in which string and date fields are enclosed in double quotes. Some string columns contain stray " characters and line feeds, like below.
"1234","asdf","with"doublequotes","new line
feed","withmultiple""doublequotes"
The desired output is
"1234","asdf","withdoublequotes","new linefeed","withmultipledoublequotes"
I have tried
sed 's/\([^",]\)"\([^",]\)/\1\2/g;s/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g' < infile > outfile
It removes the double quotes inside the strings, but it also removes the last double quote, as below
"1234","asdf","withdoublequotes","new line
feed","withmultiple"doublequotes
Is there a way to remove the " and line feeds that occur between ", and ,"?
Your substitutions for two consecutive quotes didn't work because they are placed after the substitution for a single quote, by which point only one of the two quotes is left.
We can remove the " by repeating the substitution until nothing changes (otherwise a quote brought together by a previous substitution would stay), and remove the line feed by joining the next input line whenever the current one does not end with a quote:
sed ':1;/[^"]$/{;N;s/\n//;b1;};:0;s/\([^,]\)"\([^,]\)/\1\2/g;t0' <infile >outfile

Parse error when text is split on multi lines

I'm getting a parse error when I split a text value across multiple lines and display the JSON file on screen with the command "jq . words.json".
The JSON file with the text value on a single line looks like this
{
"words" : "one two three four five"
}
The command "jq . words.json" works fine and shows the JSON file on screen.
But when I split the value "one two three four five" across two lines and run the same command, I get a parse error
{
"words" : "one two
three four five"
^
}
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 3, column 20
The parse error points to the " at the end of the third line.
How can I solve this?
Tia,
Anthony
That's because the JSON format is invalid. It should look like this:
{
"words" : "one two \nthree four five"
}
You have to escape the end of line in JSON:
{
"words" : "one two\nthree four five"
}
To convert the text with the multi-line string to valid JSON, you could use any-json (https://www.npmjs.com/package/any-json), and pipe that into jq:
$ any-json --input-format=cson split-string.txt
{
"words": "one two three four five"
}
$ any-json --input-format=cson split-string.txt | jq length
1
For more on handling almost-JSON texts, see the jq FAQ: https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json
On the observation that the parse error points to the " at the end of the third line:
The way jq flags this error may be counterintuitive, but the error in the JSON precedes the indicated quote mark.
If the error is not obvious, it may be that a closing quote is missing on the preceding key or value. In this case, the character that falls in the range U+0000 through U+001F could be U+000A, which is the line feed character in ASCII.
In the case of this question, the line feed was inserted intentionally. But, unescaped, this is invalid JSON.
In case it helps somebody, I had this error:
E: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 3, column 5
jq was parsing a file containing the data below, which is missing the closing " after "someKey
{
"someKey: {
"someData": "someValue"
}
}

combining strings to one string in r

I'm trying to combine some strings into one. In the end, this string should be generated:
//*[@id="coll276"]
The inner part of the string is a vector: tag <- 'coll276'
I already used the paste() function like this:
paste('//*[@id="',tag,'"]', sep = "")
But my result looks like this: //*[@id=\"coll276\"]
I don't know why R is putting \ into my string. How can I fix this problem?
Thanks a lot!
tl;dr: Don't worry about them, they're not really there. They are just something added by print.
Those \ are escape characters that tell R to ignore the special properties of the characters that follow them. Look at the output of your paste call:
paste('//*[@id="',tag,'"]', sep = "")
[1] "//*[@id=\"coll276\"]"
You'll see that the output, since it is a string, is enclosed in double quotes "". Without escaping, the double quotes inside your string would appear to break it up into two strings with bare code in the middle:
"//*[@id=" coll276 "]"
To prevent this, R "escapes" the quotes when it prints the string. This is just a visual effect. If you write your string to a file, you'll see that those escaping \ aren't actually there:
write(paste('//*[@id="',tag,'"]', sep = ""), 'out.txt')
This is what is in the file:
//*[@id="coll276"]
You can use cat to print the exact value of the string to the console (thanks @LukeC):
cat(paste('//*[@id="',tag,'"]', sep = ""))
//*[@id="coll276"]
Or use single quotes (if possible):
paste('//*[@id=\'',tag,'\']', sep = "")
[1] "//*[@id='coll276']"
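To check for yourself that the backslashes are only part of the printed representation, here is a small sketch. It uses a shortened, hypothetical string (just the id="..." part rather than the full expression from the question), and the variable names s, printed and raw are mine.

```r
tag <- 'coll276'

# Shortened, hypothetical string for illustration: id="coll276"
s <- paste0('id="', tag, '"')

# print() shows the string as R source, so inner quotes appear escaped.
printed <- capture.output(print(s))

# cat() emits the characters exactly as they are stored.
raw <- capture.output(cat(s))

print(printed)  # the print() form, with \" escapes
print(raw)      # the cat() form, without them

# The stored string has 12 characters and contains no backslash at all.
print(nchar(s))
print(grepl("\\", s, fixed = TRUE))
```

nchar() counts only the characters that are really stored, so it is a quick way to confirm that the escapes are a display detail.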

SKIP not working at the beginning of the line

having this
|83.56|
|as.63|
|as.lk|
|as45.34|
as input in a *.txt file, I need to skip the character "|" at the beginning and also at the end of each line, because the output should be
<83.56> :only numbers
<as.63>
<as.lk> :only letters
<as45.34> :numbers/letters together
I have this as my code declaration
without "|" at the beginning
and nothing appears as a result. This is strange, because if I include the character "|" this way, the result is almost the expected one:
<|83.56> :only numbers
<|as.63>
<|as.lk> :only letters
<|as45.34> :numbers/letters together
So the issue is that the "|" at the end of the line is being skipped properly, but the one at the beginning is not.
Note: I have also declared numeros and letras_minusculas at the beginning, this way:
TOKEN:{<Numeros:["0"-"9"]>}
TOKEN:{<Letras_minusculas:["a"-"z"]>}
JavaCC provides a SKIP section in which unnecessary characters are skipped.
Please find an example SKIP block below.
SKIP : {
"|"
}
TOKEN : {
<NUMERIC : (["0"-"9","."])+ >
| <ALPHA : (["a"-"z","."])+ >
| <ALPHA_NUM : (<NUMERIC> | <ALPHA>)+ >
}
Sample inputs:
|83.56| --> NUMERIC without "|"
|as.63| --> ALPHA_NUM without "|"
|as.lk| --> ALPHA without "|"
|as45.34| --> ALPHA_NUM without "|"
Note that SKIP does not drop a character from the middle of a token once a match has already started.
E.g.: |83|3.56|
The characters "83" start a match for NUMERIC and ALPHA_NUM, so the next character, "|", cannot be skipped as part of that token. None of the token definitions accommodates the "|" symbol, so the token ends there and the input is not read as a single token, which is what causes the problem in this case.
If we change the rules as below:
SKIP : {
"|"
}
TOKEN : {
<NUMERIC : (["0"-"9","."])+ >
| <ALPHA : (["a"-"z","."])+ >
| <ALPHA_NUM : (<NUMERIC> | <ALPHA> | "|")+ >
}
then the input matches ALPHA_NUM including the interior "|", i.e. 83|3.56. Note that because the lexer prefers the longest match, the leading and trailing "|" are then also absorbed into the token rather than skipped.

Removing punctuations from text using R

I need to remove punctuation from the text. I am using the tm package, but there is a catch.
E.g. the text is something like this:
data <- "I am a, new comer","to r,"please help","me:out","here"
Now when I run
library(tm)
data<-removePunctuation(data)
in my code, the result is :
I am a new comerto rplease helpmeouthere
but what I expect is:
I am a new comer to r please help me out here
Here's how I take your question, and an answer that is very close to @David Arenburg's in the comment above.
data <- '"I am a, new comer","to r,"please help","me:out","here"'
gsub('[[:punct:] ]+',' ',data)
[1] " I am a new comer to r please help me out here "
The extra space after [:punct:] adds the space character to the bracket expression, so spaces are matched along with punctuation, and the + matches one or more consecutive matching characters. Each such run is replaced by a single space, which has the side effect, desirable in some cases, of shortening any sequence of spaces to a single space.
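The result above still has a leading and a trailing space, left over from the outer quotes. If those are unwanted, one option (my addition, not part of the answer above) is to follow the gsub() call with base R's trimws(); the name cleaned is just for illustration.

```r
data <- '"I am a, new comer","to r,"please help","me:out","here"'

# Replace every run of punctuation and/or spaces with a single space.
cleaned <- gsub('[[:punct:] ]+', ' ', data)

# trimws() strips leading and trailing whitespace (available since R 3.2.0).
cleaned <- trimws(cleaned)

print(cleaned)
# [1] "I am a new comer to r please help me out here"
```

This gives exactly the string the question expects.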
If you had something like
string <- "hello,you"
> string
[1] "hello,you"
You could do this:
> gsub(",", "", string)
[1] "helloyou"
It replaces each "," with "" in the variable called string and returns the result; string itself is unchanged unless you assign the result back to it.

Resources