R gsub regular expression syntax error

If I have a string such as 2017-01-12T19:00:00.000+000 and I want 2017-01-12, that is, to delete everything from the "T" onward, how do I proceed? I tried:
gsub("$.*T"," ","2017-01-12T19:00:00.000+000")
Would this not work? I am following http://www.endmemo.com/program/R/gsub.php
Thank you!

One approach is to match and capture the date portion of your string using gsub() and then replace the entire string with what was captured.
gsub("(\\d{4}-\\d{2}-\\d{2}).*","\\1","2017-01-12T19:00:00.000+000")
[1] "2017-01-12"
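Your original approach also works once the pattern is fixed. In "$.*T" the $ anchors the end of the string, so nothing can follow it and the pattern never matches. Match from the "T" to the end of the string and replace that with an empty string instead: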
gsub("T.*","","2017-01-12T19:00:00.000+000")
[1] "2017-01-12"
As others have said, if you need to handle timestamps beyond this one particular string, you should consider using a date API instead.
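As a minimal sketch of the date-API route mentioned above, base R's as.Date can parse the leading date directly. The variable names are only illustrative, and the sketch assumes the date always comes first in YYYY-MM-DD form.

```r
# Illustrative variable name; the value is the timestamp from the question.
stamp <- "2017-01-12T19:00:00.000+000"

# as.Date() parses the part that matches the format and ignores what follows,
# so the time and offset are dropped without any regex.
day <- as.Date(stamp, format = "%Y-%m-%d")

format(day)  # "2017-01-12"
```

The result is a Date object rather than a string, so it can be compared or used in date arithmetic directly.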

Related

Regular Expression using R Programming Language

String<- "46,XX,t(1;19)(p32;q13.3),t(6;9)(p22;q34),del(32)t(12;16)(p12;q21)[cp20]"
The values I want to extract are t(1;19)(p32;q13.3), t(6;9)(p22;q34) and t(12;16)(p12;q21).
The regex I'm using:
ABC<-str_extract(String, regex("t.{1,16}"))
The output I get: t(1;19)(p32;q13.3
I know my code is incomplete, but I'm unable to figure out a way to extract this information.
Thank you in advance.
Assuming your String is:
String<- "46,XX,t(1;19)(p32;q13.3),t(6;9)(p22;q34),del(32)t(12;16)(p12;q21)[cp20]"
We can use str_extract_all as follows:
stringr::str_extract_all(String, "t\\(.*?\\)\\(.*?\\)")[[1]]
#[1] "t(1;19)(p32;q13.3)" "t(6;9)(p22;q34)" "t(12;16)(p12;q21)"
This matches "t" followed by a parenthesized group and then a second parenthesized group immediately after it. The lazy .*? stops at the first closing parenthesis, so each match ends where its second group closes.
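If you would rather not load stringr, the same idea can be sketched in base R with gregexpr and regmatches. The names pattern and translocations are illustrative, and [^)]* is swapped in for the lazy .*? as another way to stop at the first closing parenthesis.

```r
String <- "46,XX,t(1;19)(p32;q13.3),t(6;9)(p22;q34),del(32)t(12;16)(p12;q21)[cp20]"

# "t", then two consecutive parenthesized groups; [^)]* stops at the first ")".
pattern <- "t\\([^)]*\\)\\([^)]*\\)"

# gregexpr() finds every match; regmatches() pulls the matched text out.
translocations <- regmatches(String, gregexpr(pattern, String))[[1]]
translocations
#[1] "t(1;19)(p32;q13.3)" "t(6;9)(p22;q34)" "t(12;16)(p12;q21)"
```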

Match everything up until first instance of a colon

I'm trying to write a regex in R that matches everything before the first occurrence of a colon.
Let's say I have:
time = "12:05:41"
I'm trying to extract just the 12. My strategy was to do something like this:
grep(".+?(?=:)", time, value = TRUE)
But I'm getting the error that it's an invalid Regex. Thoughts?
Your regex is fine, but grep is not the right tool here. You are getting the error because perl=TRUE is missing: lookaheads such as (?=:) are only supported by the PCRE engine.
I would recommend using:
stringr::str_extract( time, "\\d+?(?=:)")
grep works differently from how it is being used here: it is good for filtering the elements of a vector that match a pattern, but it cannot pluck a substring out of a string.
If you want to stay in base R, you can also use sub:
sub("^(\\d+?)(?=:)(.*)$","\\1",time, perl=TRUE)
You can also split the string with strsplit and take the first piece:
strsplit(time, ":")[[1]][1]
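To make the difference between filtering and extracting concrete, here is a small base R sketch that keeps the question's original pattern and adds perl = TRUE. The name hour is illustrative, and regmatches with regexpr is one of several ways to extract a match.

```r
time <- "12:05:41"

# The lookahead (?=:) needs the PCRE engine, hence perl = TRUE.
# regexpr() locates the first match and regmatches() extracts it.
hour <- regmatches(time, regexpr(".+?(?=:)", time, perl = TRUE))
hour  # "12"

# grep() with the same pattern only reports which elements match;
# with value = TRUE it returns the whole element, not the matched part.
whole <- grep(".+?(?=:)", time, perl = TRUE, value = TRUE)
whole  # "12:05:41"
```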

How to convert special characters into unicode in R?

When doing some textual data cleaning in R, I find some special characters. In order to get rid of them, I have to know their Unicode code points; for example, € is \u20AC. Is there a function that takes a string containing the special character as input and shows its code points?
Referring to Cath's comment, iconv can do the job:
iconv("é", toRaw = TRUE)
Then, you may want to unlist and paste with \u00.
special_char <- "%"
Unicode::as.u_char(utf8ToInt(special_char))
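If installing the Unicode package is not an option, a rough base R sketch can produce similar labels from utf8ToInt and sprintf. The names special_chars, code_points and labels are illustrative, and the U+XXXX layout is just one common way to print code points.

```r
# "\u20ac" is the euro sign, written as an escape so the example does not
# depend on the encoding of the script file.
special_chars <- "a\u20ac"

# utf8ToInt() returns one integer code point per character.
code_points <- utf8ToInt(special_chars)

# Format each code point as U+XXXX (hexadecimal, at least four digits).
labels <- sprintf("U+%04X", code_points)
labels  # "U+0061" "U+20AC"
```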

Searching for an exact String in another String

I'm dealing with a very simple question: searching for a string inside another string. Consider the example below:
bigStringList <- c("SO1.A", "SO12.A", "SO15.A")
strToSearch <- "SO1."
bigStringList[grepl(strToSearch, bigStringList)]
I'm looking for something that when I search for "SO1.", it only returns "SO1.A".
I saw many related questions on SO but most of the answers include grepl() which does not work in my case.
Thanks very much for your help in advance.
When you want the search string matched literally, with no characters treated as regex metacharacters, you can set fixed=TRUE:
grep("SO1.", bigStringList, fixed=TRUE, value=TRUE)
# [1] "SO1.A"
Otherwise, as Frank notes, you'll need to escape the period (so that it'll be interpreted as an actual . rather than as a symbol meaning "any single character"):
grep("SO1\\.", bigStringList, value=TRUE)
# [1] "SO1.A"
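The following sketch shows side by side why the plain grepl call in the question returns too much. The result names are illustrative, and the startsWith variant assumes the search string is always expected at the start of each element, which the question does not state.

```r
bigStringList <- c("SO1.A", "SO12.A", "SO15.A")
strToSearch <- "SO1."

# As a regex, "." matches any character, so every element matches.
as_regex <- grepl(strToSearch, bigStringList)
as_regex    # TRUE TRUE TRUE

# fixed = TRUE compares the pattern literally.
as_literal <- grepl(strToSearch, bigStringList, fixed = TRUE)
as_literal  # TRUE FALSE FALSE

# Assumption: the search string is expected at the start of each element.
prefixed <- bigStringList[startsWith(bigStringList, strToSearch)]
prefixed    # "SO1.A"
```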

R how to get a list of characters in a string

In R, how could I get a list of the characters contained in a string? I have provided an example below:
somefunction("abc")
should return
"a","b","c"
=============================update1
I am planning to get all the characters, sort them and join them back. I tried the paste function but it didn't work :( Any input?
sort(strsplit('cba','')[[1]])
paste(sort(strsplit('cba','')[[1]]))
You can do:
strsplit('abc','')[[1]]
You could try
paste(sort(strsplit('cba','')[[1]]), collapse='')
#[1] "abc"
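The split, sort and join steps can be wrapped in a small function. This is only a sketch: sort_chars is a hypothetical helper name, and the second call shows why paste without collapse appeared not to work.

```r
# Hypothetical helper name, not from the original posts.
sort_chars <- function(s) {
  chars <- strsplit(s, "")[[1]]        # split into single characters
  paste(sort(chars), collapse = "")    # sort, then join into one string
}

joined <- sort_chars("cba")
joined    # "abc"

# Without collapse, paste() returns the sorted characters unjoined.
unjoined <- paste(sort(strsplit("cba", "")[[1]]))
unjoined  # "a" "b" "c"
```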

Resources