R: dealing with " " symbols

I am using the R programming language. I am copying text data from a website that contains many quotation marks, i.e. "" . When I try to create a data frame that contains this text, I will get an error because of conflicting "" symbols.
For example:
a <- " "blah" blah blah"
Error: unexpected symbol in "a <- " "blah"
Normally, I would use the gsub() function to remove these quotation marks from the data frame, but I cannot even create the data frame to begin with. Of course, I could paste this text into a word processor and press Ctrl+H to replace all quotation marks (") with nothing. But is there a way to do this in R itself?
Thanks

The typical way you would handle this would be to escape the literal double quotes with backslash:
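a <- " \"blah\" blah blah"
a
[1] " \"blah\" blah blah"
You could also wrap your string literal in single quotes, in which case the double quotes need no escaping at all:
a <- ' "blah" blah blah'
a
[1] " \"blah\" blah blah"
The backslashes in the printed output are only how print() displays the inner quotes; they are not part of the string.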
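As a further option not mentioned in the answer above, R 4.0.0 and later support raw string literals, which can hold double quotes with no escaping. The following is a minimal sketch, assuming R >= 4.0.0; the variable name cleaned is only illustrative.

```r
# Raw string literal (requires R >= 4.0.0): quotes inside need no escaping.
a <- r"( "blah" blah blah)"

# The value is the same as the escaped form shown in the answer.
identical(a, " \"blah\" blah blah")

# Remove the double quotes; fixed = TRUE treats the pattern as plain text.
cleaned <- gsub('"', "", a, fixed = TRUE)

# cat() shows the actual characters, without print()'s escaping.
cat(cleaned, "\n")
```

For larger amounts of copied text, saving it to a plain text file and reading it with readLines() avoids the quoting problem entirely, because the text never appears as a literal in R code.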

Related

In R, what is the regex for removing parentheses with a specific word at the start, which can also sometimes have nested parentheses within them?

I really suck with regex so apologies in advance.
I know that if I do
StringFeat = "I am a feat with (feat including feat feet))"
removefeatTest = stringr::str_replace_all(StringFeat, "\\(.*?\\)", "")
this produces "I am a feat with )". I would like it to be "I am a feat with " instead.
Another example input string is "(feat. The D.O.C., Dr. Rock, Fresh K (Fila Fresh Crew)) blah blah blah", I would like the output to be " blah blah blah".
At the same time, hopefully this regex can also handle parentheses without nested parentheses, like "(feat. The D.O.C.)". But, I would like to specifically look out for "(feat" so that another parentheses like "(this" should not be triggered by the regex.
Is this feasible with regex?
Thank you in advance.
Here you have 2 closing brackets. The code you are using will remove only one. You can use something like:
removefeatTest = stringr::str_replace_all(StringFeat, "\\(.*?\\)+", "")
which also works with the second example you mentioned:
StringFeat = "(feat. The D.O.C., Dr. Rock, Fresh K (Fila Fresh Crew)) blah blah blah"
stringr::str_replace_all(StringFeat, "\\(.*?\\)+", "")
[1] " blah blah blah"
and you can target an expression that starts with "(feat" at the beginning of the string (note the ^ anchor) using:
stringr::str_replace_all(StringFeat, "^\\(feat.*?\\)+", "")
[1] " blah blah blah"
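The "(feat" pattern can also be tried on all the inputs from the question at once. The following is a sketch in base R rather than stringr, assuming the ^ anchor is dropped so that a "(feat" group in the middle of a string is matched as well; remove_feat is a hypothetical helper name.

```r
# Hypothetical helper: drop any "(feat ...)" group together with the run of
# closing parentheses that ends it. perl = TRUE selects the PCRE engine.
remove_feat <- function(x) {
  gsub("\\(feat.*?\\)+", "", x, perl = TRUE)
}

inputs <- c(
  "I am a feat with (feat including feat feet))",
  "(feat. The D.O.C., Dr. Rock, Fresh K (Fila Fresh Crew)) blah blah blah",
  "(feat. The D.O.C.)",
  "(this) stays"
)

remove_feat(inputs)
```

This relies on the lazy .*? stopping at the first closing parenthesis, so it is not a true nested-parenthesis matcher: an input such as "(feat. A (B) C)" would be cut after "(B)" and leave " C)" behind.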

Remove double quotes from text in r

I want to eliminate double quotes from text in R. Is there a better way to do it?
I tried below code but it's still not removing double quotes:
gsub("\"", "", a$answer)
The problem with what you tried is that you want the regular expression (i.e. pattern) to be \", but backslashes are special to R, so you need to write it twice in R so it ends up as a single backslash in the pattern.
For example,
withquotes <- ' this is a double quote: " '
gsub('\\"', "gone!", withquotes)
# [1] " this is a double quote: gone! "
We can also do this without escaping the double quote, by wrapping the pattern in single quotes:
gsub('"', "gone!", withquotes)
# [1] " this is a double quote: gone! "
data
withquotes <- ' this is a double quote: " '
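One more thing worth checking, as an assumption about the original problem since the question does not show it: gsub() returns a new vector and does not modify a$answer in place, so the result has to be assigned back. A minimal sketch, using a made-up data frame a with an answer column:

```r
# Made-up data standing in for the question's data frame.
a <- data.frame(
  answer = c('say "hi"', "no quotes"),
  stringsAsFactors = FALSE
)

# Calling gsub() alone only prints the cleaned values; a$answer is unchanged.
gsub('"', "", a$answer, fixed = TRUE)

# Assign the result back to actually update the column.
a$answer <- gsub('"', "", a$answer, fixed = TRUE)
a$answer
```

With fixed = TRUE the pattern is treated as a literal character rather than a regular expression, so no escaping questions arise.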

combining strings to one string in r

I'm trying to combine some strings into one. In the end, this string should be generated:
//*[#id="coll276"]
The inner part of the string is a vector: tag <- 'coll276'
I already used the paste() method like this:
paste('//*[#id="',tag,'"]', sep = "")
But my result looks like the following: //*[#id=\"coll276\"]
I don't know why R is putting \ into my string. How can I fix this problem?
Thanks a lot!
tl;dr: Don't worry about them; they're not really there. They are just something added by print().
Those \ are escape characters that tell R to ignore the special properties of the characters that follow them. Look at the output of your paste function:
paste('//*[#id="',tag,'"]', sep = "")
[1] "//*[#id=\"coll276\"]"
You'll see that the output, since it is a string, is enclosed in double quotes "". Without escaping, the double quotes inside your string would break it up into two strings with bare code in the middle:
"//*[#id=" coll276 "]"
To prevent this, R "escapes" the quotes in your string so they don't do this. This is just a visual effect. If you write your string to a file, you'll see that those escaping \ aren't actually there:
write(paste('//*[#id="',tag,'"]', sep = ""), 'out.txt')
This is what is in the file:
//*[#id="coll276"]
You can use cat to print the exact value of the string to the console (Thanks #LukeC):
cat(paste('//*[#id="',tag,'"]', sep = ""))
//*[#id="coll276"]
Or use single quotes (if possible):
paste('//*[#id=\'',tag,'\']', sep = "")
[1] "//*[#id='coll276']"
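A quick way to confirm that the backslashes are not stored in the string is to count its characters. This is a small sketch using the string from the question; the variable name xpath is only illustrative.

```r
tag <- "coll276"

# paste0() is shorthand for paste(..., sep = "").
xpath <- paste0('//*[#id="', tag, '"]')

# print() shows the inner quotes escaped with backslashes.
print(xpath)

# cat() writes the raw characters of the string.
cat(xpath, "\n")

# The length counts only the real characters, with no backslashes.
nchar(xpath)

# A literal backslash is not found anywhere in the string.
grepl("\\", xpath, fixed = TRUE)
```

Here nchar() gives 18, which is the length of //*[#id="coll276"] with no backslashes, and grepl() returns FALSE.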

How can I delete every single letter of a row after a certain character in R?

I am having a problem cleaning transaction data. I have an Excel file with every transaction that clients make, with the number, the gloss and the industry code. I convert this Excel file to text separated by ";", and then I only need to clean the gloss and convert it back into an Excel file.
tolower(tabla1)
lapply(tabla1, tolower)
tabla1[] <- lapply(tabla1, tolower)
str(tabla1)
tabla1
tabla1_texto <- gsub("[.]", "", tabla1)
tabla1_texto <- gsub("[(]", " ", tabla1_texto)
I know that I need to use gsub(), but I'm not sure how to use it. Also, does anyone know how to build a proper dictionary so that only certain words are kept and every other word is deleted?
If you have a string like this one:
string <- "Some text here; and some text here; and some more text here"
Then you can delete everything after the first ; with:
gsub(";.*$", "", string)
[1] "Some text here"
Explanation of ;.*$, which you will be replacing with "" (the empty string):
starting with ;
any character . zero or more times *
up until the end of the line $
If you have a table, apply this to the relevant column: gsub() is vectorized, so a single call processes every row of that column.
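For a table, one way to apply the same pattern to every row is to pass the whole column to gsub(), which is vectorized over its input. A minimal sketch, assuming a data frame tabla1 with a text column; the column name glosa and the sample values are made up for illustration.

```r
# Made-up data; the column name "glosa" is an assumption.
tabla1 <- data.frame(
  glosa = c("Compra tienda; ref 123", "Pago servicio; cod 9", "Sin separador"),
  stringsAsFactors = FALSE
)

# Delete everything from the first ";" to the end, for every row at once.
tabla1$glosa <- gsub(";.*$", "", tabla1$glosa)

tabla1$glosa
```

Rows without a ";" are left unchanged, because the pattern simply does not match them.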

Escape character for " in ValidationExpression in ASP.NET

I am using a regular expression to filter out invalid input entered by the end user.
The acceptable input is word characters, spaces, digits and . / # , # & $ _ : ? ' % ! – ~ " | + ; ” { } - \.
Below is my code.
<asp:RegularExpressionValidator ID="rgVEditTB1" runat="server" ControlToValidate="txtEditTB1"
ValidationExpression="^[\w\s\d\-\.\/\#\,\#\&\$\:\?\"\'\%\!\–\~\|\+\;\”\{\}\-\\]+$" ErrorMessage="Invalid Special Character" />
However, I am having trouble escaping " in the ValidationExpression; it fails with the error
Server Tag is not well formed.
I tried to change the escape character to:
\""
\"
""
It also gives me the same error.
What should be the correct escape character to put in the ValidationExpression?
You should be able to pass in the HTML-encoded values. So, passing " would be done by passing &quot;. Something like this: ValidationExpression="^[^&quot;]+$". In this regex I am saying: match, from the beginning to the end of the string, any characters that are not a quotation mark (").
The same applies to the other special symbols. You can take a look here for more encoding values.
