In my R data frame, I have a column that looks like this:
df$Year
"Cumulative.12.2013.Actual"
"Cumulative.12.2014.Actual"
"Cumulative.12.2015.Actual"
"Cumulative.12.2016.Actual"
"Cumulative.12.2017.Actual"
"Cumulative.12.2018.Actual"
"Cumulative.12.2019.Actual"
"Cumulative.5.2020.Actual"
I'm trying to re-format the column such that I only include the dates. It should look like:
df$Year
"12/2013"
"12/2014"
"12/2015"
"12/2016"
"12/2017"
"12/2018"
"12/2019"
"5/2020"
How can I achieve this? I tried doing it all in one line, but it returns "12/2013" for every row of df$Year:
df$Year <- paste(strsplit(df$Year, ".", fixed=TRUE)[[1]][2], strsplit(df$Year, ".", fixed=TRUE)[[1]][3], sep="/")
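A likely reason the attempt above gives the same value everywhere: strsplit() returns a list with one element per input string, and [[1]] keeps only the pieces of the first string, so paste() builds a single "12/2013" that is then recycled over the column. A minimal sketch of a per-element version, assuming a short stand-in vector named year_raw (a hypothetical name, not from the question):

```r
# Hypothetical stand-in for df$Year
year_raw <- c("Cumulative.12.2013.Actual", "Cumulative.5.2020.Actual")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(year_raw, ".", fixed = TRUE)

# Pick the 2nd and 3rd piece from every list element, not only from parts[[1]]
month <- vapply(parts, `[`, character(1), 2)
year  <- vapply(parts, `[`, character(1), 3)

result <- paste(month, year, sep = "/")
result
# [1] "12/2013" "5/2020"
```

vapply() is used here instead of sapply() only so that the result is guaranteed to be a character vector.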
Here is an option.
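library(magrittr)  # provides %>% and the assignment pipe %<>%
df$Year %<>%
  as.character %>%
  strsplit(., "([a-zA-Z]\\.)|(\\.[a-zA-Z])") %>%   # split off the text on both sides
  sapply(., function(i) i[2] %>%                   # keep the middle piece, e.g. "12.2013"
    gsub(".", "/", ., fixed = TRUE))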
I wouldn't use strsplit for this. It would be better to use something like gsub.
gsub("^[^0-9]+(\\d+)\\.(\\d+)[^0-9]+$", "\\1/\\2", x)
## [1] "12/2013" "12/2014" "12/2015" "12/2016" "12/2017" "12/2018" "12/2019" "5/2020"
There are many other alternatives. For example, this is a hack I picked up from @akrun that would be useful here (and probably the fastest, given the nature of the data):
sub(".", "/", trimws(x, whitespace = "[^0-9]"), fixed = TRUE)
## [1] "12/2013" "12/2014" "12/2015" "12/2016" "12/2017" "12/2018" "12/2019" "5/2020"
Or you could do this:
sub(".", "/", gsub("Cumulative.|.Actual", "", x), fixed = TRUE)
## [1] "12/2013" "12/2014" "12/2015" "12/2016" "12/2017" "12/2018" "12/2019" "5/2020"
If you didn't need the forward slash for the date, you could even do something like this instead:
gsub("Cumulative.|.Actual", "", x)
## [1] "12.2013" "12.2014" "12.2015" "12.2016" "12.2017" "12.2018" "12.2019" "5.2020"
Sample data:
x <- c("Cumulative.12.2013.Actual", "Cumulative.12.2014.Actual", "Cumulative.12.2015.Actual",
"Cumulative.12.2016.Actual", "Cumulative.12.2017.Actual", "Cumulative.12.2018.Actual",
"Cumulative.12.2019.Actual", "Cumulative.5.2020.Actual")
I have a list called samples_ID with 116 vectors; each vector has three elements like these:
"11" "GT20-16829" "S27"
I want to keep the 116 vectors, but combine the elements of each into a single element like this:
"11_GT20-16829_S27"
I tried something like this
samples_ID_ <- paste(samples_ID, collapse = "_")
it returns a single vector, below is just a part of it:
..._c(\"33\", \"GT20-16846\", \"S24\")_c(\"33\", \"GT20-18142\", \"S72\")_c(\"34\", \"GT20-16819\", \"S50\")_c...
What am I doing wrong?
Can you help me please?
Thanks
A tidyverse option.
library(stringr)
library(purrr)
map(samples_ID, ~ str_c(., collapse = '_'))
# [[1]]
# [1] "11_GT20-16829_S27"
#
# [[2]]
# [1] "12_GT20-16830_S28"
Data
samples_ID <- list(c("11", "GT20-16829", "S27"), c("12", "GT20-16830", "S28"
))
In base R, we can use sapply
sapply(samples_ID, paste, collapse="_")
Another base R option using paste
do.call(paste, c(data.frame(t(list2DF(samples_ID))), sep = "_"))
or
do.call(paste, c(data.frame(do.call(rbind, samples_ID)), sep = "_"))
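As to what went wrong in the question: paste() coerces each list element with as.character(), which turns a whole vector into one deparsed "c(...)" string, and collapse then glues those strings together. A small sketch of the difference, using the two-element sample data from above (the names combined and combined_list are only illustrative):

```r
samples_ID <- list(c("11", "GT20-16829", "S27"), c("12", "GT20-16830", "S28"))

# What paste() sees when it is handed the list itself
deparsed <- as.character(samples_ID)
deparsed[1]

# Collapsing inside each element gives one string per vector
combined <- vapply(samples_ID, paste, character(1), collapse = "_")
combined
# [1] "11_GT20-16829_S27" "12_GT20-16830_S28"

# lapply() does the same but keeps a list, one element per original vector
combined_list <- lapply(samples_ID, paste, collapse = "_")
```

Whether a character vector or a list is the better result depends on how samples_ID is used afterwards.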
I tried to recode values such as (5,10],(20,20] to 5-10%,20-20% using gsub. So the opening parenthesis should be removed, the comma should become a dash, and the closing bracket should become %. All I managed so far was
x<-c("(5,10]","(20,20]")
gsub("\\,","-",x)
Then the comma is changed to the dash. How can I change others as well?
Thanks.
Keeping it very simple, a set of gsubs.
x <- c("(5,10]","(20,20]")
x <- gsub(",", "-", x) # replace comma by dash
x <- gsub("\\(", "", x) # remove opening parenthesis
x <- gsub("]", "%", x) # replace ] by %
x
[1] "5-10%" "20-20%"
Here's another alternative:
> gsub("\\((\\d+),(\\d+)\\]", "\\1-\\2%", x)
[1] "5-10%" "20-20%"
Other solution.
Using regmatches we extract all the numbers. We then combine every first and second number.
nrs <- regmatches(x, gregexpr("[[:digit:]]+", x))
nrs <- as.numeric(unlist(nrs))
i <- 1:length(nrs); i <- i[(i%%2)==1]
for(h in i){print(paste0(nrs[h],'-',nrs[h+1],'%'))}
[1] "5-10%"
[1] "20-20%"
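Combining every first and second number can also be done without a loop, so that the result is returned as a vector rather than printed. A sketch under the assumption that every element of x contains exactly two numbers (the names lower, upper and labels are illustrative):

```r
x <- c("(5,10]", "(20,20]")

# All numbers in order: lower1, upper1, lower2, upper2, ...
nrs <- unlist(regmatches(x, gregexpr("[[:digit:]]+", x)))

# A recycled logical index picks the odd and the even positions
lower <- nrs[c(TRUE, FALSE)]
upper <- nrs[c(FALSE, TRUE)]

labels <- paste0(lower, "-", upper, "%")
labels
# [1] "5-10%" "20-20%"
```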
Just for fun, an ugly one-liner:
sapply(regmatches(x, gregexpr("\\d+", x)), function(x) paste0(x[1], "-", x[2], "%"))
[1] "5-10%" "20-20%"
I have a string "c(\"AV\", \"IM\")", which I'm trying to transform into the string "AV IM".
My issue is that I can't unlist() or flatten() this, as it's a character, and neither paste() nor stringr::str_c() work, since it's technically still 1 character value.
Any ideas how I can do this?
Tidyverse solutions preferred, if possible.
EDIT: I know this can be solved via regex, but I feel like this is more a "fundamental" problem to be solved string-level than it is a regex problem, if that makes any sense.
Not sure how you got here, but this as presented would be an eval/parse situation. However, as noted in many other answers on this site, there's almost always a better way of preparing your data so you end up in a more R-friendly form. See, for starters, What specifically are the dangers of eval(parse(...))?.
> a <- "c(\"AV\", \"IM\")"
> (b <- eval(parse(text=a)))
[1] "AV" "IM"
> paste(b, collapse=" ")
[1] "AV IM"
You can also use a regular expression to remove all punctuation and the leading c.
s <- "c(\"AV\", \"IM\")"
s_vec <- strsplit(s, split = ",")[[1]]
gsub("[[:punct:]]|^c", "", s_vec)
# [1] "AV" " IM"
It is not quite clear how you got here. You can use eval-parse, though it is not vectorized and it is also slow. Thus you need a regular expression:
a <- "c(\"AV\", \"IM\")"
stringr::str_extract_all(a,"\\w+(?!\\()")
[[1]]
[1] "AV" "IM"
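The same pattern can be applied to a whole vector and collapsed to the space-separated form the question asks for. The sketch below uses base R (regmatches() and gregexpr() with perl = TRUE) in place of stringr, and the second input string is a made-up example:

```r
# Second element is hypothetical, added only to show the vectorized behaviour
a <- c("c(\"AV\", \"IM\")", "c(\"XY\")")

# Runs of word characters that are not directly followed by "("
tokens <- regmatches(a, gregexpr("\\w+(?!\\()", a, perl = TRUE))

# One space-separated string per input element
joined <- vapply(tokens, paste, character(1), collapse = " ")
joined
# [1] "AV IM" "XY"
```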
Other answers output a vector. My understanding is you want a space-delimited list of your strings.
library(dplyr)
a <- "c(\"AV\", \"IM\")"
a %>%
gsub("c(", "", ., fixed=TRUE) %>%
gsub("\"", "", ., fixed=TRUE) %>%
gsub(",", "", ., fixed=TRUE) %>%
gsub(")", "", ., fixed=TRUE)
Output
"AV IM"
EDIT Or simply (from @www's answer):
a %>%
gsub("[[:punct:]]|^c", "", .)
I have a vector of time values. How can I remove the colons and convert the text to a numeric value, i.e. from "10:01:02" (character) to 100102 (numeric)? All that I could find is presented below.
> x <- c("10:01:02", "11:01:02")
> strsplit(x, split = ":")
[[1]]
[1] "10" "01" "02"
[[2]]
[1] "11" "01" "02"
If you want to do everything in one line, you can use the destring() function from taRifx to remove everything that isn't a number and convert the result to numeric.
taRifx::destring(x)
This will also work if some of your data's formatted in a different way, such as "10-01-02", though you may have to set the value of keep.
destring("10-10-10", keep = "0-9")
And if you don't want to have to install the taRifx package you can define the destring() function locally.
destring <- function(x, keep = "0-9.-")
{
return(as.numeric(gsub(paste("[^", keep, "]+", sep = ""),
"", x)))
}
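A short usage sketch of the local definition, repeated here so the example is self-contained; the inputs are the ones used in the question and above:

```r
# Local stand-in for taRifx::destring(), as defined above
destring <- function(x, keep = "0-9.-") {
  as.numeric(gsub(paste0("[^", keep, "]+"), "", x))
}

x <- c("10:01:02", "11:01:02")
times_num <- destring(x)
times_num
# [1] 100102 110102

# With dashes in the data, the default keep would retain them, so narrow it
dashed_num <- destring("10-01-02", keep = "0-9")
dashed_num
# [1] 100102
```

Note that the result is a plain number, so a time such as "01:02:03" becomes 10203 and the leading zero is lost.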
We can use gsub to replace : with "". After that, use as.numeric to do the conversion.
x <- as.numeric(gsub(":", "", x, fixed = TRUE))
Or we can use the regex suggested by Soto
x <- as.numeric(gsub('\\D+', '', x))
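Try as.numeric(), but only after the colons have been stripped; applied directly to strings such as "10:01:02" it returns NA with a warning:
x <- as.numeric(gsub(":", "", x, fixed = TRUE))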
and then to make sure
class(x)
Let's say I have the following string:
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
I would like to recover the strings between ";" and "=" to get the following output:
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
Can I use strsplit() with more than one split element?
1) strsplit with matrix Try this:
> matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
2) strsplit with gsub or this use of strsplit with gsub:
> strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
3) strsplit with sub or this use of strsplit with sub:
> sub(".*=", "", strsplit(s, ";")[[1]])
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
4) strapplyc or this which extracts consecutive non-semicolons after equal signs:
> library(gsubfn)
> strapplyc(s, "=([^;]+)", simplify = unlist)
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
ADDED additional strsplit solutions.
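If the keys are needed as well, the two-step split can be turned into a named vector. This is only a sketch and assumes that no value contains "=" or ";" itself (pairs and values are illustrative names):

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# First split into "key=value" fields, then split each field at "="
pairs <- strsplit(strsplit(s, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Values become the vector, keys become its names
values <- vapply(pairs, `[`, character(1), 2)
names(values) <- vapply(pairs, `[`, character(1), 1)

values[["Name"]]
# [1] "hsa-miR-6859-5p"
```

unname(values) gives back the plain output requested in the question.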
I know this is an old question, but I found the usage of lookaround regular expressions quite elegant for this problem:
library(stringr)
your_string <- '/this/file/name.txt'
result <- str_extract(string = your_string, pattern = "(?<=/)[^/]*(?=\\.)")
result
In words,
The (?<=...) part looks before the desired string for a ... (in this case a forward slash).
The [^/]* then matches as many consecutive characters as possible that are not a forward slash (in this case name, because the match has to stop where the lookahead below succeeds).
The (?=...) then looks after the desired string for a ... (in this case a literal period, which needs to be escaped as \\.).
This also works on dataframes:
library(dplyr)
strings <- c('/this/file/name1.txt', 'tis/other/file/name2.csv')
df <- as.data.frame(strings) %>%
mutate(name = str_extract(string = strings, pattern = "(?<=/)[^/]*(?=\\.)"))
# Optional
names <- df %>% pull(name)
Or, in your case:
your_string <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
result <- str_extract(string = your_string, pattern = "(?<=;Alias=)[^;]*(?=;)")
result # Outputs 'MIMAT0027618'
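The same lookbehind idea also works in base R by passing perl = TRUE, and it can return every value at once. A sketch, assuming values never contain ";" (all_values and last_value are illustrative names):

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Every run of non-semicolons that is directly preceded by "="
all_values <- regmatches(s, gregexpr("(?<==)[^;]+", s, perl = TRUE))[[1]]
all_values
# [1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"

# A single field; no trailing lookahead, so the last field matches too
last_value <- regmatches(s, regexpr("(?<=Derives_from=)[^;]*", s, perl = TRUE))
last_value
# [1] "MI0022705"
```

Dropping the (?=;) lookahead matters for the final field, which is followed by the end of the string rather than by a semicolon.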