I am trying to clean a dataset with the column: ltaCpInfoDF$weekdays_rate_1
For some of the rows, I would like to do this:
input: Daily(7am-11pm): $1.20 ; output: 7am-11pm: $1.20
The values within the bracket can be different timings for the rows.
Initially, I was thinking of removing it in parts: first removing "Daily(" with gsub, then removing ")". However, I am running into issues with that.
ltaCpInfoDF$weekdays_rate_1 <- gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1)
Here is the error shown:
Error in gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
invalid regular expression 'Daily(', reason 'Missing ')''
In addition: Warning message:
In gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
TRE pattern compilation error 'Missing ')''
Could someone share with me a better way? Thank you in advance!
Use gsub with a capture group:
input <- "Daily(7am-11pm): $1.20"
output <- gsub("\\S+\\s*\\((.*?)\\)", "\\1", input)
output
[1] "7am-11pm: $1.20"
We can also do this without a capture group:
gsub("^[^(]+\\(|\\)", "", str1)
[1] "7am-11pm: $1.20"
data
str1 <- "Daily(7am-11pm): $1.20"
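The error in the question arises because "(" opens a group in a regular expression, so the pattern "Daily(" is incomplete. As a minimal sketch of the two-step idea from the question, the pattern can either be matched as literal text with fixed = TRUE, or the parentheses can be escaped. The rates vector below is hypothetical sample data standing in for ltaCpInfoDF$weekdays_rate_1.

```r
# Hypothetical sample values standing in for ltaCpInfoDF$weekdays_rate_1
rates <- c("Daily(7am-11pm): $1.20", "Daily(8am-5pm): $0.60", "$1.20 per hour")

# Option 1: treat both patterns as literal text, so "(" and ")" need no escaping
no_prefix <- gsub("Daily(", "", rates, fixed = TRUE)
two_step  <- sub(")", "", no_prefix, fixed = TRUE)

# Option 2: escape the parentheses and keep only the text between them
one_step <- sub("^Daily\\(([^)]*)\\)", "\\1", rates)

two_step
one_step
```

Rows that do not start with "Daily(" are left unchanged by the second option; the first option would also strip the first ")" found in any other row.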
I would like to paste "miR" onto strings that do not have "miR" already, skipping those that already have it.
paste("miR", ....)
in
c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
out
c("miR-26b", "miR-26a", "miR-1297", "miR-4465", "miR-26b", "miR-26a")
One way is to remove "miR-" with sub if it is present at the beginning of the string, and then paste it onto every string regardless.
paste0("miR-", sub("^miR-","", x))
#[1] "miR-26b" "miR-26a" "miR-1297" "miR-4465" "miR-26b" "miR-26a"
data
x <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
vec <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
sub("^(?!miR)(.*)$", "miR-\\1", vec, perl = TRUE)
#[1] "miR-26b" "miR-26a" "miR-1297" "miR-4465" "miR-26b" "miR-26a"
If you want to learn more:
type ?sub into the R console
learn regex, and have a closer look at negative lookahead and capturing groups
I've used perl = TRUE because lookaheads are only supported by the Perl-compatible regex engine; without it I get an error.
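For comparison, the same result can be obtained without any regex. This is only a sketch of an alternative, not either answerer's method: it checks each string for the prefix with startsWith (base R) and pastes "miR-" only where it is missing.

```r
x <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")

# TRUE where the string already carries the prefix
has_prefix <- startsWith(x, "miR")

# Keep prefixed strings as they are; prepend "miR-" to the rest
out <- ifelse(has_prefix, x, paste0("miR-", x))
out
```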
I'm trying to create a sparse-matrix and get this error message:
Error: fnames == names(mf) are not all TRUE
I think it has something to do with the column names of my data, maybe you can help.
Here are the column names:
colnames(trainDataShrinkage) <- c("Bildungsgrad2_Lower_secondary_education"
,"Bildungsgrad3_Upper_secondary_education"
,"Bildungsgrad4_Post-secondary_non-tertiary_education"
,"Bildungsgrad5_Short-cycle_tertiary_education"
,"Bildungsgrad6_Bachelors_or_equivalent_level"
,"Bildungsgrad7_Masters_or_equivalent_level"
,"Bildungsgrad8_Doctoral_or_equivalent_level"
,"Familienstand2_Verheiratet,_getrenntlebend"
,"Familienstand3_Ledig"
,"Familienstand4_Geschieden,_eing._gleichg._Partn._aufgehoben"
,"Familienstand5_Verwitwet,_Lebenspartner/in_verstorben"
,"Familienstand6_Eing._gleichg._Partn.,_zusammenlebend"
,"Geschlecht2_Weiblich"
,"Migrationshintergrund2_direkter_Migrationshintergrund"
,"Migrationshintergrund3_indirekter_Migrationshintergrund"
,"Bundesland2_Hamburg"
,"Bundesland3_Niedersachsen"
,"Bundesland4_Bremen"
,"Bundesland5_Nordrhein-Westfalen"
,"Bundesland6_Hessen"
,"Bundesland7_Rheinland-Pfalz"
,"Bundesland8_Baden-Wuerttemberg"
,"Bundesland9_Bayern"
,"Bundesland10_Saarland"
,"Bundesland11_Berlin_(West_und_Ost)"
,"Bundesland12_Brandenburg"
,"Bundesland13_Mecklenburg-Vorpommern"
,"Bundesland14_Sachsen"
,"Bundesland15_Sachsen-Anhalt"
,"Bundesland16_Thueringen"
,"Unternehmengroesse2_5 bis_10"
,"Unternehmengroesse3_11_bis_unter_20"
,"Unternehmengroesse6_20_bis_unter_100"
,"Unternehmengroesse7_100_bis_unter_200"
,"Unternehmengroesse9_200_bis_unter_2000"
,"Unternehmengroesse10_2000_und_mehr"
,"Erwerbsstatus2_Teilzeitbeschaeftigung"
,"Erwerbsstatus4_Geringfuegig_beschaeftigt"
,"Stundenlohn"
,"AlterEins"
,"AlterZwei"
,"AlterDrei"
,"AlterFuenF"
,"AlterSechs"
,"BildungsjahreEins"
,"BildungsjahreZwei"
,"BildungsjahreDrei"
,"BildungsjahreVier"
,"BildungsjahreFuenf"
,"ArbeitsmarkterfahrungVollzeitEins"
,"ArbeitsmarkterfahrungVollzeitZwei"
,"ArbeitsmarkterfahrungVollzeitDrei"
,"ArbeitsmarkterfahrungVollzeitVier"
,"ArbeitsmarkterfahrungVollzeitFuenf"
,"ArbeitsmarkterfahrungTeilzeitEins"
,"ArbeitsmarkterfahrungTeilzeitZwei"
,"ArbeitsmarkterfahrungTeilzeitDrei"
,"ArbeitsmarkterfahrungTeilzeitVier"
,"ArbeitsmarkterfahrungTeilzeitFuenf"
,"BruttoverdienstLetztenMonatEins"
,"BruttoverdienstLetztenMonatZwei"
,"BruttoverdienstLetztenMonatDrei"
,"BruttoverdienstLetztenMonatVier"
,"BruttoverdienstLetztenMonatFuenf")
It does not like some special characters in the column names. I have run into issues with column names starting with #, 1, . and /. Can you try replacing these occurrences with '_'?
The simplest way would be to strip any special characters from your column names. Let me know if you cannot rename them due to some limitation.
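A minimal sketch of that renaming, assuming it is acceptable to replace every character other than letters, digits and underscores with "_". The cols vector holds a few of the names from the question; with the real data it would be colnames(trainDataShrinkage).

```r
# A few of the problematic names from the question (hyphen, comma, parentheses, space)
cols <- c("Bildungsgrad4_Post-secondary_non-tertiary_education",
          "Familienstand2_Verheiratet,_getrenntlebend",
          "Bundesland11_Berlin_(West_und_Ost)",
          "Unternehmengroesse2_5 bis_10")

# Replace anything that is not a letter, digit or underscore with "_"
clean <- gsub("[^A-Za-z0-9_]", "_", cols)

# Guard against two names collapsing into the same string
clean <- make.unique(clean)
clean
```

The result would then be assigned back with colnames(trainDataShrinkage) <- clean. Base R's make.names is another option, although it substitutes "." rather than "_".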
I have the following sentence, sent:
> "#stance=iPhone : The next revolution of Apple"
And I would like to extract the stance.
After using
strapplyc(sent, "stance=(.*)", simplify = TRUE)
I obtained the following:
> "iPhone : The next revolution of Apple"
Does anyone know if there is a better way to extract just the "iPhone" in this case?
Try this (add " :" to the regex):
strapplyc(sent, "stance=(.*) :", simplify = TRUE)
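As a base R sketch of the same idea, assuming the stance is a single run of non-whitespace characters after "stance=", sub can capture just that run. Note that a greedy (.*) followed by " :" would run up to the last " :" in the sentence if there were more than one.

```r
sent <- "#stance=iPhone : The next revolution of Apple"

# Capture the non-whitespace run that follows "stance=" and drop everything else
stance <- sub("^.*stance=(\\S+).*$", "\\1", sent)
stance
```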
I have code that came with a dataset that I downloaded. This code is supposed to convert factor variables to numeric. When I run each line individually, it works fine, but if I try to highlight a whole section, then I get the following error:
Error: unexpected input in ...
It gives me this error for every line of code, but again if I run each line individually, then it works fine. I've never run into this before. What's going on?? Thanks!
Here's the code that I'm trying to run:
library(prettyR)
lbls <- sort(levels(DF$myVar))
lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
DF$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", DF$myVar))
DF$myVar <- add.value.labels(DF$myVar, lbls)
And here is the output with the errors:
> library(prettyR)
"rror: unexpected input in "library(prettyR)
> lbls <- sort(levels(DF$myVar))
"rror: unexpected input in "lbls <- sort(levels(DF$myVar))
> lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
"rror: unexpected input in "lbls <- (sub("^\\([0-9]+\\) +(.+$)", "\\1", lbls))
> surv.df$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", DF$myVar))
"rror: unexpected input in "DF$myVar <- as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1",DF$myVar))
> surv.df$BATTLEGROUND <- add.value.labels(DF$myVar, lbls)
Error in add.value.labels(surv.df$myVar, lbls) :
object 'lbls' not found
I figured out the issue (actually someone told me what the problem was)
The code was downloaded as a .R file and must have been written using a text editor with a non-standard newline encoding. So I just copied the code to a text editor and did a replace-all to switch "\n" to "#####". Then I used replace-all again to switch back to newlines and copied it back into RStudio.
And everything works!
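The same normalization can be sketched in R itself. This assumes the stray characters are carriage returns ("\r"), which the overwritten "rror:" messages above suggest but the post does not confirm. The temporary file below is a hypothetical stand-in for the downloaded script.

```r
# Hypothetical stand-in for the downloaded .R file, written with CRLF line endings
path <- tempfile(fileext = ".R")
writeBin(charToRaw("x <- 1\r\ny <- x + 1\r\n"), path)

# Read the whole file as one string, bytes untouched
raw_text <- readChar(path, file.info(path)$size, useBytes = TRUE)

# Turn CRLF (or a lone CR) into a plain newline
clean_text <- gsub("\r\n?", "\n", raw_text)

# The cleaned code now parses and runs as a block
eval(parse(text = clean_text))
y
```

Writing clean_text back to the file (for example with writeBin(charToRaw(clean_text), path)) would make the fix permanent.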
When I try to parse JSON from the character object from a Facebook URL I got "Error in fromJSON(data) : unexpected escaped character '\o' at pos 130". Check this out:
library(RCurl)
library(rjson)
data <- getURL("https://graph.facebook.com/search?q=multishow&type=post&limit=1500", cainfo="cacert.perm")
fbData <- fromJSON(data)
Error in fromJSON(data) : unexpected escaped character '\o' at pos 130
# RJSONIO also gives an error
> fbData <- fromJSON(data)
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input
Is there any way to replace this '\o' character before I try to parse the JSON? I tried gsub but it didn't work (or I'm doing something wrong).
datafixed <- gsub('\o',' ',data)
Error: '\o' is an unrecognized escape sequence in string starting with "\o"
Can somebody help me with this one? Thanks.
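You need to escape the \ in your pattern. In an R string literal, '\\o' is the two characters \o; add fixed = TRUE so that they are matched literally rather than interpreted as a regex escape.
Try
gsub('\\o', ' ', data, fixed = TRUE)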
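To make the escaping levels concrete, here is a small sketch on a made-up JSON string (not the actual Facebook response). A literal backslash followed by "o" can be matched either as fixed text or with the backslash doubled again for the regex engine. The replacement "o" is chosen for illustration; the question used a space.

```r
# Made-up input containing the invalid JSON escape \o (written "\\o" in R source)
data <- "{\"message\": \"multish\\ow\"}"

# Literal matching: the pattern is exactly the two characters \ and o
fixed_literal <- gsub("\\o", "o", data, fixed = TRUE)

# Regex matching: the backslash must itself be escaped for the regex engine
fixed_regex <- gsub("\\\\o", "o", data)

cat(fixed_literal, "\n")
```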
You could do
fbData <- fromJSON(data,unexpected.escape = "keep")
You will see a warning:
Warning message:
In fromJSON(individual_page, unexpected.escape = "keep") :
unexpected escaped character '\m' at pos 10. Keeping value.
If you want, you can suppress the warning using
suppressWarnings(fromJSON(data,unexpected.escape = "keep"))
unexpected.escape: changed handling of unexpected escaped characters. Handling value should be one of "error", "skip", or "keep"; on unexpected characters issue an error, skip the character, or keep the character
You can find more details here - http://cran.r-project.org/web/packages/rjson/rjson.pdf