How to remove parentheses but keep the text inside them in R - r

I am trying to clean a dataset with the column: ltaCpInfoDF$weekdays_rate_1
For some of the rows, I would like to do this:
input: Daily(7am-11pm): $1.20 ; output: 7am-11pm: $1.20
The timings inside the parentheses differ from row to row.
Initially, I was thinking of removing the pieces one at a time: first removing "Daily(" with gsub, then removing ")". However, the first step already fails.
ltaCpInfoDF$weekdays_rate_1 <- gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1)
Here is the error shown:
Error in gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
invalid regular expression 'Daily(', reason 'Missing ')''
In addition: Warning message:
In gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
TRE pattern compilation error 'Missing ')''
Could someone share with me a better way? Thank you in advance!

Use gsub with a capture group:
input <- "Daily(7am-11pm): $1.20"
output <- gsub("\\S+\\s*\\((.*?)\\)", "\\1", input)
output
[1] "7am-11pm: $1.20"

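We can also do this without a capture group, by removing everything up to and including the opening parenthesis as well as the closing parenthesis: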
gsub("^[^(]+\\(|\\)", "", str1)
[1] "7am-11pm: $1.20"
data
str1 <- "Daily(7am-11pm): $1.20"
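The error in the question comes from "(" being a regex metacharacter: in "Daily(" it opens a group that is never closed. A minimal sketch of two ways around that, assuming the values look like the example in the question (the second value in x is made up for illustration):

```r
# Sample values; the second one is hypothetical
x <- c("Daily(7am-11pm): $1.20", "Daily(8am-10pm): $0.60")

# Option 1: fixed = TRUE treats the pattern as a literal string,
# so "(" and ")" need no escaping
step1 <- gsub("Daily(", "", x, fixed = TRUE)
step2 <- sub(")", "", step1, fixed = TRUE)

# Option 2: one regex with the literal parentheses escaped as \\( and \\),
# keeping whatever sits between them via the capture group
one_step <- sub("Daily\\(([^)]*)\\)", "\\1", x)

step2
one_step
```

Both approaches leave rows that do not contain "Daily(" unchanged, apart from the first ")" being dropped by option 1.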

Related

paste specific text to strings that do not have it

I would like to paste "miR" to strings that do not have "miR" already, and skipping those that have it.
paste("miR", ....)
in
c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
out
c("miR-26b", "miR-26a", "miR-1297", "miR-4465", "miR-26b", "miR-26a")
One way is to remove "miR-" if it is present at the beginning of the string using sub, and then paste it onto every string unconditionally.
paste0("miR-", sub("^miR-","", x))
#[1] "miR-26b" "miR-26a" "miR-1297" "miR-4465" "miR-26b" "miR-26a"
data
x <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
vec <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
sub("^(?!miR)(.*)$", "miR-\\1", vec, perl = T)
#[1] "miR-26b" "miR-26a" "miR-1297" "miR-4465" "miR-26b" "miR-26a"
If you want to learn more:
type ?sub into the R console
learn regex, and have a closer look at negative lookahead and capturing groups
I've used perl = T because lookaheads such as (?!miR) need the Perl-compatible regex engine; without it, sub throws an error.
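For comparison, the same result can be reached without any regex. This is only a sketch of an alternative, not either answerer's method; it uses base R's startsWith (available since R 3.3.0) on the vector from the question:

```r
x <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")

# TRUE where the string already begins with "miR"
has_prefix <- startsWith(x, "miR")

# Keep prefixed strings as they are, prepend "miR-" to the rest
out <- ifelse(has_prefix, x, paste0("miR-", x))
out
```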

Sparse.model.matrix error message

I'm trying to create a sparse matrix and get this error message:
Error: fnames == names(mf) are not all TRUE
I think it has something to do with the column names of my data, maybe you can help.
Here are the column names:
colnames(trainDataShrinkage) <- c("Bildungsgrad2_Lower_secondary_education"
,"Bildungsgrad3_Upper_secondary_education"
,"Bildungsgrad4_Post-secondary_non-tertiary_education"
,"Bildungsgrad5_Short-cycle_tertiary_education"
,"Bildungsgrad6_Bachelors_or_equivalent_level"
,"Bildungsgrad7_Masters_or_equivalent_level"
,"Bildungsgrad8_Doctoral_or_equivalent_level"
,"Familienstand2_Verheiratet,_getrenntlebend"
,"Familienstand3_Ledig"
,"Familienstand4_Geschieden,_eing._gleichg._Partn._aufgehoben"
,"Familienstand5_Verwitwet,_Lebenspartner/in_verstorben"
,"Familienstand6_Eing._gleichg._Partn.,_zusammenlebend"
,"Geschlecht2_Weiblich"
,"Migrationshintergrund2_direkter_Migrationshintergrund"
,"Migrationshintergrund3_indirekter_Migrationshintergrund"
,"Bundesland2_Hamburg"
,"Bundesland3_Niedersachsen"
,"Bundesland4_Bremen"
,"Bundesland5_Nordrhein-Westfalen"
,"Bundesland6_Hessen"
,"Bundesland7_Rheinland-Pfalz"
,"Bundesland8_Baden-Wuerttemberg"
,"Bundesland9_Bayern"
,"Bundesland10_Saarland"
,"Bundesland11_Berlin_(West_und_Ost)"
,"Bundesland12_Brandenburg"
,"Bundesland13_Mecklenburg-Vorpommern"
,"Bundesland14_Sachsen"
,"Bundesland15_Sachsen-Anhalt"
,"Bundesland16_Thueringen"
,"Unternehmengroesse2_5 bis_10"
,"Unternehmengroesse3_11_bis_unter_20"
,"Unternehmengroesse6_20_bis_unter_100"
,"Unternehmengroesse7_100_bis_unter_200"
,"Unternehmengroesse9_200_bis_unter_2000"
,"Unternehmengroesse10_2000_und_mehr"
,"Erwerbsstatus2_Teilzeitbeschaeftigung"
,"Erwerbsstatus4_Geringfuegig_beschaeftigt"
,"Stundenlohn"
,"AlterEins"
,"AlterZwei"
,"AlterDrei"
,"AlterFuenF"
,"AlterSechs"
,"BildungsjahreEins"
,"BildungsjahreZwei"
,"BildungsjahreDrei"
,"BildungsjahreVier"
,"BildungsjahreFuenf"
,"ArbeitsmarkterfahrungVollzeitEins"
,"ArbeitsmarkterfahrungVollzeitZwei"
,"ArbeitsmarkterfahrungVollzeitDrei"
,"ArbeitsmarkterfahrungVollzeitVier"
,"ArbeitsmarkterfahrungVollzeitFuenf"
,"ArbeitsmarkterfahrungTeilzeitEins"
,"ArbeitsmarkterfahrungTeilzeitZwei"
,"ArbeitsmarkterfahrungTeilzeitDrei"
,"ArbeitsmarkterfahrungTeilzeitVier"
,"ArbeitsmarkterfahrungTeilzeitFuenf"
,"BruttoverdienstLetztenMonatEins"
,"BruttoverdienstLetztenMonatZwei"
,"BruttoverdienstLetztenMonatDrei"
,"BruttoverdienstLetztenMonatVier"
,"BruttoverdienstLetztenMonatFuenf")
sparse.model.matrix does not like some special characters in the column names. I ran into issues with column names starting with #, 1, . and /. Try replacing these characters with '_'.
The simplest fix is to strip any special characters from your column names. Let me know if some limitation prevents you from renaming them.
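A minimal sketch of that renaming, using a few of the column names from the question. It assumes that replacing every character other than letters, digits and underscores is acceptable; whether this removes the error for the full data set is not verified here.

```r
# A few of the problematic names from the question
cols <- c("Bildungsgrad4_Post-secondary_non-tertiary_education",
          "Familienstand2_Verheiratet,_getrenntlebend",
          "Bundesland11_Berlin_(West_und_Ost)",
          "Unternehmengroesse2_5 bis_10")

# Replace anything that is not a letter, digit or underscore with "_"
clean <- gsub("[^A-Za-z0-9_]", "_", cols)

# Guard against two names collapsing into the same string
clean <- make.unique(clean)

# make.names() leaves syntactically valid names untouched,
# so this comparison shows whether every cleaned name is valid
clean == make.names(clean)
```

On the real data this would be applied as colnames(trainDataShrinkage) <- clean after computing clean from colnames(trainDataShrinkage). Names that start with a digit would still need a letter in front, which make.names adds automatically.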

Extracting part of the string using strapplyc in R

I have the following sentence, sent:
> "#stance=iPhone : The next revolution of Apple"
And I would like to extract the stance.
After using
strapplyc(sent, "stance=(.*)", simplify = TRUE)
I obtained the following:
> "iPhone : The next revolution of Apple"
Does anyone know a better way to extract just "iPhone" in this case?
Try this (add " :" to the regex so the match stops before the colon):
strapplyc(sent, "stance=(.*) :", simplify = TRUE)
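If the stance never contains whitespace (an assumption, based on the single example above), capturing a run of non-space characters also works and needs only base R:

```r
sent <- "#stance=iPhone : The next revolution of Apple"

# \\S+ captures the non-space characters right after "stance=";
# the surrounding .* consume the rest so only the capture is returned
stance <- sub(".*stance=(\\S+).*", "\\1", sent)
stance
```

The same pattern, "stance=(\\S+)", could be passed to strapplyc instead of the version ending in " :".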

Error only when running whole block of code

I have code that came with a dataset that I downloaded. This code is supposed to convert factor variables to numeric. When I run each line individually, it works fine, but if I try to highlight a whole section, then I get the following error:
Error: unexpected input in ...
It gives me this error for every line of code, but again if I run each line individually, then it works fine. I've never run into this before. What's going on?? Thanks!
Here's the code that I'm trying to run:
library(prettyR)
lbls <- sort(levels(DF$myVar))
lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
DF$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", DF$myVar))
DF$myVar <- add.value.labels(DF$myVar, lbls)
And here is the output with the errors:
> library(prettyR)
"rror: unexpected input in "library(prettyR)
> lbls <- sort(levels(DF$myVar))
"rror: unexpected input in "lbls <- sort(levels(DF$myVar))
> lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
"rror: unexpected input in "lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
> surv.df$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", DF$myVar))
"rror: unexpected input in "DF$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1",DF$myVar))
> surv.df$BATTLEGROUND <- add.value.labels(DF$myVar, lbls)
Error in add.value.labels(surv.df$myVar, lbls) :
object 'lbls' not found
I figured out the issue (actually someone told me what the problem was)
The code was downloaded as a .R file and must have been written in a text editor that uses non-standard newline encoding. So I copied the code into a text editor, used replace-all to switch "\n" to "#####", then used replace-all again to switch back to newlines and copied it back into RStudio.
And everything works!
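The same clean-up can be done from R itself. The sketch below assumes the stray characters are carriage returns ("\r"), which would fit the console output above where the "E" of "Error" is overwritten; the code string is a made-up stand-in for the downloaded script.

```r
# Hypothetical script text with Windows-style (CRLF) line endings
code <- "x <- 1\r\ny <- x + 1\r\n"

# Turn CRLF pairs and lone CRs into plain newlines
fixed <- gsub("\r\n?", "\n", code)

# The cleaned text parses into two expressions
exprs <- parse(text = fixed)
length(exprs)
```

For a file on disk, reading it with readLines and writing it back with writeLines should have a similar effect, since readLines accepts LF, CRLF or CR as the end-of-line marker.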

How to replace "unexpected escaped character" in R

When I try to parse JSON from the character object from a Facebook URL I got "Error in fromJSON(data) : unexpected escaped character '\o' at pos 130". Check this out:
library(RCurl)
library(rjson)
data <- getURL("https://graph.facebook.com/search?q=multishow&type=post&limit=1500", cainfo="cacert.perm")
fbData <- fromJSON(data)
Error in fromJSON(data) : unexpected escaped character '\o' at pos 130
# with RJSONIO there is also an error
> fbData <- fromJSON(data)
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input
Is there any way to replace this '\o' character before I try to parse the JSON? I tried gsub but it didn't work (or I'm doing something wrong).
datafixed <- gsub('\o',' ',data)
Error: '\o' is an unrecognized escape sequence in string starting with "\o"
Can somebody help me with this one? Thanks.
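You need to escape the \ in your pattern. In R source, '\\o' is the two-character string \o; pass fixed = TRUE so that it is matched literally instead of being interpreted as a regex.
Try
gsub('\\o',' ',data, fixed = TRUE)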
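A quick check on a made-up string shows two ways to match a literal backslash followed by "o". Without fixed = TRUE the backslash has to be escaped once for the R string and once more for the regex engine, which gives four backslashes in the source.

```r
# Hypothetical test string; it prints as: foo \o bar
s <- "foo \\o bar"

# Literal matching: the pattern is exactly backslash + o
a <- gsub("\\o", " ", s, fixed = TRUE)

# Regex matching: \\\\ in R source is \\ for the regex engine,
# which stands for one literal backslash
b <- gsub("\\\\o", " ", s)

a
b
```

Both calls remove only the backslash sequence and leave the other "o" characters in place.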
You could also tell rjson to keep unexpected escapes:
fbData <- fromJSON(data,unexpected.escape = "keep")
You will see a warning:
Warning message:
In fromJSON(individual_page, unexpected.escape = "keep") :
unexpected escaped character '\m' at pos 10. Keeping value.
If you want, you can suppress the warning using
suppressWarnings(fromJSON(data,unexpected.escape = "keep"))
unexpected.escape: changed handling of unexpected escaped characters. Handling value should be one of "error", "skip", or "keep"; on unexpected characters issue an error, skip the character, or keep the character
You can find more details here - http://cran.r-project.org/web/packages/rjson/rjson.pdf
