Add text to an atomic (character) vector in R

Good afternoon. I am not an expert on atomic vectors, and I would like some ideas.
I have the script for the movie "Coco", in which each scene is marked by a line holding only a number in the form 1., 2., ... (130 scenes throughout the movie). I want to convert each of those scene lines into "Scene 1", "Scene 2", up to "Scene 130", in sequence.
library(readr) # provides read_lines()
url <- "https://www.imsdb.com/scripts/Coco.html"
coco <- read_lines("coco2.txt") # script text after cleaning
class(coco)
typeof(coco)
" 48."
[782] " arms full of offerings."
[783] " Once the family clears, Miguel is nowhere to be seen."
[784] " INT. NEARBY CORRIDOR"
[785] " Miguel and Dante hide from the patrolman. But Dante wanders"
[786] " off to inspect a side room."
[787] " INT. DEPARTMENT OF CORRECTIONS"
[788] " Miguel catches up to Dante. He overhears an exchange in a"
[789] " nearby cubicle."
[797] " 49."
[798] " And amigos, they help their amigos."
[799] " worth your while."
[800] " workstation."
[801] " Miguel perks at the mention of de la Cruz."
[809] " Miguel follows him."
[810] " 50." # Its scene number
[811] " INT. HALLWAY"
s <- grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)
My attempt below is wrong because the result is not sequential:
v <- gsub(s, pattern = "[^Level].[0-9].$", replacement = paste("Scene", sequence(1:130)))
[1] " Scene1"
[2] " Scene1"
[3] " Scene1"
[4] " Scene1"
[5] " Scene1"
[6] " Scene1"

I'm not clear on what [^Level] is meant to do. However, if the numbers at the end of those lines are the scene numbers, you can use ( ) to capture them and reuse them in the replacement text, as shown below:
v <- gsub(s, pattern = " ([0-9]{1,3})\\.$", replacement = "Scene \\1")
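As a side note, [^Level] is a negated character class: it matches any single character other than L, e, v, or l, so it does not exclude the word "Level". Here is a minimal sketch of relabelling the scene lines in place, inside the full vector. The coco_sample vector is made-up stand-in data, and the second variant assumes the scene lines appear in order, so it numbers them by position instead of reusing the number found in the text.

```r
# Made-up stand-in for a few lines of the cleaned script
coco_sample <- c("  48.", " INT. NEARBY CORRIDOR", "  49.",
                 " Miguel follows him.", "  50.")

# A scene line holds only optional spaces, 1-3 digits, and a final dot
scene_pat <- "^[[:space:]]*([0-9]{1,3})\\.$"
is_scene <- grepl(scene_pat, coco_sample)

# Variant 1: reuse the captured number via the \\1 backreference
relabelled <- coco_sample
relabelled[is_scene] <- sub(scene_pat, "Scene \\1", coco_sample[is_scene])

# Variant 2: number the scene lines sequentially by their position
renumbered <- coco_sample
renumbered[is_scene] <- paste("Scene", seq_len(sum(is_scene)))
```

Working on a logical index keeps the non-scene lines untouched, which a plain grep(..., value = TRUE) followed by gsub would not do.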

Related

How do I get "twenty one" wrapped as <<twenty one>> and not as <<twenty>> <<one>>?

stuff= c("my favoiet number is 23","zev is the best","i love 23,456", "twenty one", "10", "123,123,123" ,"dfghjklkjhgfghj",
"three is my numner" ,"this cost $1.23" , "roman numeral VI is awesome ")
# paste0() keeps line breaks out of the pattern itself
WordNumber= paste0("(one|two|three|four|five|six|seven|eight|nine|ten|",
"eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty|",
"thirty|forty|fifty|sixty|seventy|eighty|ninety|",
"hundred|thousand|million|billion|trillion)")
gsub(WordNumber,"<<\\1>>" , stuff)
You need to re-arrange your parentheses and add optional spaces:
# paste0() keeps line breaks out of the pattern itself
WordNumber= paste0("((?:(?:one|two|three|four|five|six|seven|eight|nine|ten|",
"eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty|",
"thirty|forty|fifty|sixty|seventy|eighty|ninety|",
"hundred|thousand|million|billion|trillion)\\s*)+)")
gsub(WordNumber,"<<\\1>>" , stuff)
This yields
[1] "my favoiet number is 23" "zev is the best"
[3] "i love 23,456" "<<twenty one>>"
[5] "10" "123,123,123"
[7] "dfghjklkjhgfghj" "<<three >>is my numner"
[9] "this cost $1.23" "roman numeral VI is awesome "
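Note that the pattern above also captures the trailing space, as in "<<three >>is my numner". A possible variation, sketched here under the assumption that PCRE matching (perl = TRUE) is acceptable, only consumes whitespace between two number words and anchors the match on word boundaries. The names words, alt, and WordNumber2 are introduced purely for illustration.

```r
words <- c("one", "two", "three", "four", "five", "six", "seven", "eight",
           "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
           "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
           "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
           "eighty", "ninety", "hundred", "thousand", "million",
           "billion", "trillion")
alt <- paste(words, collapse = "|")

# One number word, then zero or more "<whitespace><number word>" repeats
WordNumber2 <- paste0("\\b((?:", alt, ")(?:\\s+(?:", alt, "))*)\\b")

tagged <- gsub(WordNumber2, "<<\\1>>",
               c("twenty one", "three is my numner", "zev is the best"),
               perl = TRUE)
```

Building the alternation with paste() also avoids typing the word list twice or breaking a long string literal across lines.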

R - Converting a text file with white spaces / tabs and newline to list

Curious if you might offer advice on the following.
I have data in a text file in this form:
"var1"
"    var1a"
"        var1a_descrp1"
"            thing"
"    var1b"
"        var1b_descrp2"
"            thing"
"        var1b_descrp3"
"            thing1"
"            thing2"
"        var1b_descrp4"
"poobarvar"
"    var2a"
"        var2a_descrp1"
"    var2b"
"        var2b_descrp1"
"            thing"
"        var2b_descrp1"
"            thing1"
"            thing2"
"            thing3"
The leading white space goes to a maximum depth of 12 spaces, or "three levels" deep.
I'd love to cleanly parse this into a nested list with something like the following structure:
$var1
$var1$var1a
$var1$var1a$var1a_descrp1
$var1$var1a$var1a_descrp1[[1]]
[1] "thing"
$var1$var2a
$var1$var2a$var2a_descrp2
$var1$var2a$var2a_descrp2[[1]]
[1] "thing"
$var1$var2a$var2a_descrp3
$var1$var2a$var2a_descrp3[[1]]
[1] "thing1"
$var1$var2a$var2a_descrp3[[2]]
[1] "thing2"
$poobarvar
$poobarvar$var2a
list()
$poobarvar$var2b
$poobarvar$var2b$var2b_descrp1
$poobarvar$var2b$var2b_descrp1[[1]]
[1] "thing1"
$poobarvar$var2b$var2b_descrp1[[2]]
[1] "thing2"
$poobarvar$var2b$var2b_descrp1[[3]]
[1] "thing3"
I have a pretty convoluted set of while loops and if-else statements I'd love to clean up.
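One way to replace the loops is a small recursive function. The sketch below is only an illustration: parse_indented is a hypothetical helper, and it assumes four spaces per level, spaces rather than tabs, and that entries at depth 3 are leaf values while shallower entries are named nodes.

```r
parse_indented <- function(lines, width = 4, leaf_depth = 3) {
  indent <- nchar(lines) - nchar(sub("^ +", "", lines))  # leading spaces
  depth  <- indent %/% width                             # nesting level
  label  <- trimws(lines)

  # Build the list for the lines in `idx` whose direct entries sit at depth d
  build <- function(idx, d) {
    out <- list()
    heads <- idx[depth[idx] == d]
    for (k in seq_along(heads)) {
      h <- heads[k]
      # Children run up to the line before the next sibling (or the block end)
      last <- if (k < length(heads)) heads[k + 1] - 1 else max(idx)
      kids <- if (last > h) (h + 1):last else integer(0)
      if (d >= leaf_depth) {
        out <- c(out, list(label[h]))                    # unnamed leaf value
      } else {
        out <- c(out, setNames(list(build(kids, d + 1)), label[h]))
      }
    }
    out
  }
  build(seq_along(lines), 0)
}

sample_lines <- c("var1", "    var1a", "        var1a_descrp1",
                  "            thing", "    var1b", "poobarvar")
parsed <- parse_indented(sample_lines)
```

Appending with c() rather than assigning by name keeps repeated names, such as the two var2b_descrp1 entries, as separate elements. Entries with no children come back as empty lists.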

Using R package ggmap with Google Directions API token

I'm using ggmap's "route" function to plot driving directions using Google's Driving Directions API. I may need to exceed the 2,500 daily API limit. I know that to exceed that threshold, I'll have to pay the $0.50/1000 hits fee and sign up for a Google Developers API token. However, I don't see any parameters in the ggmap library or route function that allow me to enter my token and key information so that I can exceed the threshold. What am I missing?
I've written the googleway package to access the Google Maps APIs; it lets you specify your API key.
For example:
library(googleway)
key <- "your_api_key"
google_directions(origin = "MCG, Melbourne",
                  destination = "Flinders Street Station, Melbourne",
                  key = key,
                  simplify = FALSE) ## use simplify = TRUE to return a data.frame
[1] "{"
[2] " \"geocoded_waypoints\" : ["
[3] " {"
[4] " \"geocoder_status\" : \"OK\","
[5] " \"partial_match\" : true,"
[6] " \"place_id\" : \"ChIJIdtrbupC1moRMPT0CXZWBB0\","
[7] " \"types\" : ["
[8] " \"establishment\","
[9] " \"point_of_interest\","
[10] " \"train_station\","
[11] " \"transit_station\""
[12] " ]"
[13] " },"
[14] " {"
[15] " \"geocoder_status\" : \"OK\","
[16] " \"place_id\" : \"ChIJSSKDr7ZC1moRTsSnSV5BnuM\","
[17] " \"types\" : ["
[18] " \"establishment\","
[19] " \"point_of_interest\","
[20] " \"train_station\","
[21] " \"transit_station\""
[22] " ]"
[23] " }"
[24] " ],"
[25] " \"routes\" : ["
... etc

Replacing + by 5 in R

I have a dataset called Price that is supposed to be numeric but is read in as strings because every 5 has been replaced by +.
It looks like this:
"99000" "98300" "98300" "98290" "98310" " 9831+ " "98310" " 9830+ " " 9830+ " " 9830+ " " 9829+ " " 9828+ " " 9827+ " "98270"
I used the gsub function in R to try and replace + by 5. The code I wrote is:
finalPrice<-gsub("+",5,Price)
However, the output is just a bunch of numbers which doesn't make sense for what I intended:
"59595050505,5 59585350505,5 59585350505,5 59585259505,5 59585351505,5 5 5 595853515+5 5,5 59585351505,5 5 5 595853505+5 5,5 5 5 595853505+5
How can I fix this?
The + sign should be escaped. Try this:
finalPrice<-gsub("\\+",5, Price)
Besides double-escaping the + so that the pattern argument matches it literally, you can either pass fixed=TRUE or put the character in a character class, "[+]". See the ?regex page for more details:
> gsub("+", "5", txt, fixed=TRUE)
[1] "99000" "98300" "98300" "98290" "98310"
[6] " 98315 " "98310" " 98305 " " 98305 " " 98305 "
[11] " 98295 " " 98285 " " 98275 " "98270"
> gsub("[+]", "5", txt)
[1] "99000" "98300" "98300" "98290" "98310"
[6] " 98315 " "98310" " 98305 " " 98305 " " 98305 "
[11] " 98295 " " 98285 " " 98275 " "98270"
In a regex, + means "match the preceding item one or more times". Because nothing precedes the + in your pattern, gsub matches the empty string at every position in the target.
The result is that 5 is inserted at each of these positions.
To avoid this, escape the +, which in R requires a double backslash:
finalPrice<-gsub("\\+",5,Price)
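Since the goal is a numeric column, the replacement can be followed by a conversion. Here is a minimal sketch on a shortened, made-up subset of the values from the question; fixed_price is a name introduced only for this example.

```r
# Shortened sample of the Price vector from the question
Price <- c("99000", "98300", " 9831+ ", " 9830+ ", "98270")

# Treat "+" as a literal string, not as a regex quantifier
fixed_price <- gsub("+", "5", Price, fixed = TRUE)

# Strip the surrounding spaces and convert to numeric
finalPrice <- as.numeric(trimws(fixed_price))
```

With fixed = TRUE there is no regex interpretation at all, so no escaping is needed.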

extracting first value from a list

I would like to extract the first value from this list:
[[1]]
[1] " \" 0.0337302" " -0.000248016" " -0.000496032" " -0.000744048"
[5] " -0.000992063" " -0.00124008" " -0.0014881" " -0.00173611"
[9] " -0.00198413" " -0.00223214" " -0.00248016" " -0.00272817"
[13] " -0.00297619" " -0.00322421" " -0.00347222" " -0.00372024"
[17] " -0.00396825" " -0.00421627" " -0.00446429" " -0.0047123"
[21] " -0.00496032" " -0.00520833" " -0.00545635" " -0.00570437"
The list is named M. I have tried M[1] and M[[1]], but neither gives me the first value.
How can I do that?
You need to subset the list, and then the vector in the list:
M[[1]][1]
In other words, M is a list of 1 element, a character vector of length 24.
You may want to use unlist(M) to convert it to a plain vector.
M <- unlist(M)
Then you can just use M[1].
To remove the \" you can use sub:
sub("\"","",M[1])
[1] " 0.0337302"
The first element of the list you've shown is the entire vector:
[1] " \" 0.0337302" " -0.000248016" " -0.000496032" " -0.000744048"
[5] " -0.000992063" " -0.00124008" " -0.0014881" " -0.00173611"
[9] " -0.00198413" " -0.00223214" " -0.00248016" " -0.00272817"
[13] " -0.00297619" " -0.00322421" " -0.00347222" " -0.00372024"
[17] " -0.00396825" " -0.00421627" " -0.00446429" " -0.0047123"
[21] " -0.00496032" " -0.00520833" " -0.00545635" " -0.00570437"
You get that vector with M[[1]].
To get the first element of that vector, recognize that M[[1]] is the vector you want, and use normal subsetting on it: M[[1]][1]
> M[[1]][1]
[1] " \" 0.0337302"
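The difference between the subsetting forms can be checked on a small stand-in for M. This is only a sketch: the list below keeps just the first three values from the question, and first and value are names introduced for the example.

```r
# Shortened stand-in for the list in the question
M <- list(c(" \" 0.0337302", " -0.000248016", " -0.000496032"))

class(M[1])    # "list": still a list of length 1
class(M[[1]])  # "character": the vector stored in the list

# First element of that vector, as a string
first <- M[[1]][1]

# Drop the stray quote and convert to a number
value <- as.numeric(gsub("\"", "", first, fixed = TRUE))
```

Single brackets always return a list, so M[1] can never yield the value itself; double brackets extract the stored vector first.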

Resources