I have a variable called Price which is supposed to be numeric but is stored as a string because every 5 has been replaced by +.
It looks like this:
"99000" "98300" "98300" "98290" "98310" " 9831+ " "98310" " 9830+ " " 9830+ " " 9830+ " " 9829+ " " 9828+ " " 9827+ " "98270"
I used the gsub function in R to try to replace + with 5. The code I wrote is:
finalPrice<-gsub("+",5,Price)
However, the output is a jumble of digits, not what I intended:
"59595050505,5 59585350505,5 59585350505,5 59585259505,5 59585351505,5 5 5 595853515+5 5,5 59585351505,5 5 5 595853505+5 5,5 5 5 595853505+5
How can I fix this?
The + sign is a regex metacharacter, so it must be escaped. Try this:
finalPrice<-gsub("\\+",5, Price)
Besides double-escaping to force the pattern argument to match a literal character, you can also use either the fixed = TRUE parameter or a character class such as "[+]" (inside brackets, + is literal). See the ?regex help page for more details (txt below holds the Price values):
> gsub("+", "5", txt, fixed=TRUE)
[1] "99000" "98300" "98300" "98290" "98310"
[6] " 98315 " "98310" " 98305 " " 98305 " " 98305 "
[11] " 98295 " " 98285 " " 98275 " "98270"
> gsub("[+]", "5", txt)
[1] "99000" "98300" "98300" "98290" "98310"
[6] " 98315 " "98310" " 98305 " " 98305 " " 98305 "
[11] " 98295 " " 98285 " " 98275 " "98270"
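A self-contained sketch that rebuilds the Price vector from the question and checks that the three spellings of the pattern agree. The final conversion with trimws() and as.numeric() is an extra step, not part of the answers above, assuming the goal is a numeric vector:

```r
# Price values from the question; every 5 was replaced by "+"
Price <- c("99000", "98300", "98300", "98290", "98310", " 9831+ ", "98310",
           " 9830+ ", " 9830+ ", " 9830+ ", " 9829+ ", " 9828+ ", " 9827+ ", "98270")

escaped   <- gsub("\\+", "5", Price)              # escape the metacharacter
literal   <- gsub("+", "5", Price, fixed = TRUE)  # treat the pattern as plain text
charclass <- gsub("[+]", "5", Price)              # + is literal inside [ ]

# All three approaches produce the same strings
stopifnot(identical(escaped, literal), identical(literal, charclass))

# Drop the surrounding spaces, then convert to numbers
finalPrice <- as.numeric(trimws(literal))
finalPrice
```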
In a regular expression, + means "match the preceding item one or more times". In your pattern nothing precedes the +, so the preceding item is effectively empty and gsub matches the empty string at every position in the target.
As a result, 5 is inserted at each of these positions.
To avoid this, escape the +; in an R string this requires a double backslash:
finalPrice<-gsub("\\+",5,Price)
I don't understand what I am doing wrong.
I have a data frame, and one of its variables looks like this:
ss <- c("F00020 " , "F13975 " , "F13976 " , "F15334 " , "F12490 " , "F09787 " , "F14675 " ,
"F12129 " , "F04641 " , "F04680 " , "F04715 " , "F04753 " , "F08868 " , "F14031 " ,
"F14033 " , "F12585 " , "F14663 ")
I want to remove the extra blank spaces.
gsub("[[:space:]]","",ss)
The above code works, but when I apply it directly to the variable from the data frame, it doesn't work:
gsub("[[:space:]]","",df$Variable)
I also checked the types of the vector and the variable; both are character vectors.
So what is happening here?
I cannot reproduce your error:
ss <- c("F00020 " , "F13975 " , "F13976 " , "F15334 " , "F12490 " , "F09787 " , "F14675 " ,
"F12129 " , "F04641 " , "F04680 " , "F04715 " , "F04753 " , "F08868 " , "F14031 " ,
"F14033 " , "F12585 " , "F14663 ")
gsub("[[:space:]]","",ss)
[1] "F00020" "F13975" "F13976" "F15334" "F12490" "F09787" "F14675" "F12129" "F04641" "F04680" "F04715"
[12] "F04753" "F08868" "F14031" "F14033" "F12585" "F14663"
df <- data.frame(Variable = ss)
gsub("[[:space:]]","",df$Variable)
[1] "F00020" "F13975" "F13976" "F15334" "F12490" "F09787" "F14675" "F12129" "F04641" "F04680" "F04715"
[12] "F04753" "F08868" "F14031" "F14033" "F12585" "F14663"
An easy solution for your use case is trimws:
trimws(ss)
[1] "F00020" "F13975" "F13976" "F15334" "F12490" "F09787" "F14675" "F12129" "F04641" "F04680" "F04715"
[12] "F04753" "F08868" "F14031" "F14033" "F12585" "F14663"
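If only the trailing spaces should go, trimws() also takes a which argument ("both", the default, "left" or "right"). A short sketch on a shortened copy of the question's vector:

```r
ss <- c("F00020 ", "F13975 ", "F13976 ")  # shortened copy of the question's vector

# which = "right" strips trailing whitespace only; the default "both" strips both ends
right_only <- trimws(ss, which = "right")
right_only
```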
Yes, as noted by others, your solution does work, and so does this shorter one:
sub("\\s", "", ss) # sub() suffices only when each string has at most one whitespace character (in any position)
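The caveat in that comment comes from sub() replacing only the first match in each string, while gsub() replaces all of them. A minimal sketch with one made-up string that contains two spaces:

```r
x <- c("F00020 ", "a b c")  # "a b c" is a made-up example with two spaces

# sub() replaces only the first match in each string ...
first_only <- sub("\\s", "", x)
# ... while gsub() replaces every match
all_spaces <- gsub("\\s", "", x)
```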
Good afternoon. I am not an expert on atomic vectors, but I would like some ideas.
I have the script for the movie "Coco", in which the scenes are numbered in the form 1., 2., ... (130 scenes throughout the movie). I want to turn each scene-number line into a line that reads "Scene 1", "Scene 2", up to "Scene 130", numbered sequentially.
library(readr)  # provides read_lines()
url <- "https://www.imsdb.com/scripts/Coco.html"
coco <- read_lines("coco2.txt")  # script text after cleaning
class(coco)
typeof(coco)
" 48."
[782] " arms full of offerings."
[783] " Once the family clears, Miguel is nowhere to be seen."
[784] " INT. NEARBY CORRIDOR"
[785] " Miguel and Dante hide from the patrolman. But Dante wanders"
[786] " off to inspect a side room."
[787] " INT. DEPARTMENT OF CORRECTIONS"
[788] " Miguel catches up to Dante. He overhears an exchange in a"
[789] " nearby cubicle."
[797] " 49."
[798] " And amigos, they help their amigos."
[799] " worth your while."
[800] " workstation."
[801] " Miguel perks at the mention of de la Cruz."
[809] " Miguel follows him."
[810] " 50." # Its scene number
[811] " INT. HALLWAY"
s <- grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)
My attempt is wrong because the numbering is not sequential:
v <- gsub(s, pattern = "[^Level].[0-9].$", replacement = paste("Scene", sequence(1:130)))
[1] " Scene1"
[2] " Scene1"
[3] " Scene1"
[4] " Scene1"
[5] " Scene1"
[6] " Scene1"
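I'm not clear on what [^Level] is meant to do: as a regex it is a character class that matches any single character except L, e, v and l, not the word "Level". However, if the numbers at the end of the lines in the text are the scene numbers, you can use ( ) to capture them and refer to the captured group as \\1 in the replacement text, as shown below: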
v <- gsub(s, pattern = " ([0-9]{1,3})\\.$", replacement = "Scene \\1")
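A self-contained sketch on a few hypothetical lines shaped like the cleaned script. Option 1 is the capture-group approach above; Option 2, numbering matched lines 1, 2, 3, ... by position with seq_len(), is an alternative that only makes sense if every scene line is present and in order:

```r
# Hypothetical excerpt in the shape of the cleaned script
coco <- c(" 48.", " arms full of offerings.", " 49.", " 50.", " INT. HALLWAY")

pattern  <- " ([0-9]{1,3})\\.$"  # space, 1-3 digits (captured), literal dot at line end
is_scene <- grepl(pattern, coco)

# Option 1: reuse the number captured by ( ); non-matching lines are left unchanged
by_number <- gsub(pattern, "Scene \\1", coco)

# Option 2: number the matched lines 1, 2, 3, ... in order of appearance
by_position <- coco
by_position[is_scene] <- paste("Scene", seq_len(sum(is_scene)))
```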
I'm curious whether you might offer advice on the following.
I have data in a text file in this form:
"var1"
" var1a"
" var1a_descrp1"
" thing"
" var1b"
" var1b_descrp2"
" thing"
" var1b_descrp3"
" thing1"
" thing2"
" var1b_descrp4"
"poobarvar"
" var2a"
" var2a_descrp1"
" var2b"
" var2b_descrp1"
" thing"
" var2b_descrp1"
" thing1"
" thing2"
" thing3"
The whitespace goes to a maximum depth of 12 spaces, or "three levels" deep.
I'd love to cleanly parse this into a list with a structure something like the following:
$var1
$var1$var1a
$var1$var1a$var1a_descrp1
$var1$var1a$var1a_descrp1[[1]]
[1] "thing"
$var1$var2a
$var1$var2a$var2a_descrp2
$var1$var2a$var2a_descrp2[[1]]
[1] "thing"
$var1$var2a$var2a_descrp3
$var1$var2a$var2a_descrp3[[1]]
[1] "thing1"
$var1$var2a$var2a_descrp3[[2]]
[1] "thing2"
$poobarvar
$poobarvar$var2a
list()
$poobarvar$var2b
$poobarvar$var2b$var2b_descrp1
$poobarvar$var2b$var2b_descrp1[[1]]
[1] "thing1"
$poobarvar$var2b$var2b_descrp1[[2]]
[1] "thing2"
$poobarvar$var2b$var2b_descrp1[[3]]
[1] "thing3"
I have a pretty convoluted set of while loops and if-else statements I'd love to clean up.
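One possible approach, sketched under stated assumptions: each level is indented by exactly 4 spaces (so 12 spaces is level 3), and lines at the deepest level are plain strings rather than names. The idea is to compute every line's depth from its leading spaces and then build the list recursively, with each named entry owning the lines up to its next sibling. parse_indented is a hypothetical helper, not an existing function, and it does not try to reproduce every detail of the target structure above.

```r
# Hypothetical helper: parse space-indented lines into a nested named list.
# Assumes `width` spaces per level; lines at `max_depth` become unnamed leaves.
parse_indented <- function(lines, width = 4, max_depth = 3) {
  depth <- (nchar(lines) - nchar(sub("^ +", "", lines))) %/% width
  text  <- trimws(lines)

  # Build the entries found among line positions `idx` at nesting `level`
  build <- function(idx, level) {
    heads <- idx[depth[idx] == level]
    if (level == max_depth) return(as.list(text[heads]))
    out <- vector("list", length(heads))
    names(out) <- text[heads]
    for (k in seq_along(heads)) {
      # a child block runs until the next sibling (or the end of idx)
      end  <- if (k < length(heads)) heads[k + 1] - 1 else max(idx)
      kids <- if (end > heads[k]) seq(heads[k] + 1, end) else integer(0)
      out[[k]] <- build(kids, level + 1)
    }
    out
  }
  build(seq_along(lines), 0)
}

# Small made-up input using 4 spaces per level
txt <- c("var1",
         "    var1a",
         "        var1a_descrp1",
         "            thing",
         "poobarvar",
         "    var2a")
str(parse_indented(txt))
```

With a real file, readLines() would supply `lines`; if the indentation width differs, adjust `width`.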
I want to print double quotes (") to the screen in R, but it is not working. The usual escape characters don't help:
> print('"')
[1] "\""
> print('\"')
[1] "\""
> print('/"')
[1] "/\""
> print('`"')
[1] "`\""
> print('"xml"')
[1] "\"xml\""
> print('\"xml\"')
[1] "\"xml\""
> print('\\"xml\\"')
[1] "\\\"xml\\\""
I want it to return:
" "xml" "
which I will then use downstream.
Any ideas?
Use cat:
cat("\" \"xml\" \"")
OR
cat('" "', 'xml', '" "', sep = "")
Output:
" "xml" "
Alternative using noquote:
noquote(" \" \"xml\" \" ")
Output :
" "xml" "
Another option using dQuote:
dQuote(" xml ")
Output :
"“ xml ”"
Using the quote parameter of print:
print("\" \"xml\" \"", quote = FALSE)
[1] " "xml" "
or
cat('"')
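The backslashes in the print() output are only how R displays a quote inside a quoted string; the string itself holds plain " characters. A short sketch showing this with nchar() and writeLines() from base R:

```r
x <- '" "xml" "'   # single quotes let " appear without escaping

nchar(x)           # 9: the backslashes shown by print() are not stored
writeLines(x)      # writes the raw characters, like cat()
print(x)           # shows the escaped form "\" \"xml\" \""
```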
I would like to extract the first value from this list:
[[1]]
[1] " \" 0.0337302" " -0.000248016" " -0.000496032" " -0.000744048"
[5] " -0.000992063" " -0.00124008" " -0.0014881" " -0.00173611"
[9] " -0.00198413" " -0.00223214" " -0.00248016" " -0.00272817"
[13] " -0.00297619" " -0.00322421" " -0.00347222" " -0.00372024"
[17] " -0.00396825" " -0.00421627" " -0.00446429" " -0.0047123"
[21] " -0.00496032" " -0.00520833" " -0.00545635" " -0.00570437"
The name of this list is M. I have tried M[1] and M[[1]], but I don't get the correct answer.
How can I do that?
You need to subset the list, and then the vector in the list:
M[[1]][1]
In other words, M is a list of 1 element, a character vector of length 24.
You may want to use unlist on M to convert it to a plain vector:
M <- unlist(M)
Then you can just use M[1].
To remove the \" you can use sub:
sub("\"","",M[1])
[1] " 0.0337302"
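A self-contained sketch on a shortened copy of M that contrasts M[1] with M[[1]][1]. The last step, stripping the stray quote from every element with sub() and converting with as.numeric() after trimws(), goes beyond the answers above and assumes numbers are the end goal:

```r
# Shortened copy of the list from the question
M <- list(c(" \" 0.0337302", " -0.000248016", " -0.000496032"))

is.list(M[1])   # TRUE: single brackets return a sub-list
M[[1]][1]       # the first string inside the vector

# Flatten, strip the stray quote and spaces, then convert to numbers
vals <- as.numeric(trimws(sub("\"", "", unlist(M))))
vals[1]
```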
The first element of the list you've shown is the entire vector:
[1] " \" 0.0337302" " -0.000248016" " -0.000496032" " -0.000744048"
[5] " -0.000992063" " -0.00124008" " -0.0014881" " -0.00173611"
[9] " -0.00198413" " -0.00223214" " -0.00248016" " -0.00272817"
[13] " -0.00297619" " -0.00322421" " -0.00347222" " -0.00372024"
[17] " -0.00396825" " -0.00421627" " -0.00446429" " -0.0047123"
[21] " -0.00496032" " -0.00520833" " -0.00545635" " -0.00570437"
You get that vector with M[[1]].
To get the first element of that vector, note that M[[1]] is itself the vector, so use ordinary subsetting on it: M[[1]][1]
> M[[1]][1]
[1] " \" 0.0337302"