R: How to split string and keep part of it?

I have a lot of strings like shown below.
> x=c("cat_1_2_3", "dog_2_6_3", "cow_2_8_6")
> x
[1] "cat_1_2_3" "dog_2_6_3" "cow_2_8_6"
I would like to separate each string while keeping its first part, as demonstrated below.
"cat_1" "cat_2" "cat_3" "dog_2" "dog_6" "dog_3" "cow_2" "cow_8" "cow_6"
Any suggestions?

We can use sub
scan(text=sub("([a-z]+)_(\\d+)_(\\d+)_(\\d+)", "\\1_\\2,\\1_\\3,\\1_\\4",
x), what ='', sep=",", quiet = TRUE)
#[1] "cat_1" "cat_2" "cat_3" "dog_2" "dog_6" "dog_3" "cow_2" "cow_8" "cow_6"
Another option is to split each string on "_" and paste the first piece back onto each of the others
unlist(lapply(strsplit(x, "_"), function(x) paste(x[1], x[-1], sep="_")))

You could try to split the string, then re-combine using paste.
f <- function(x) {
res <- strsplit(x,'_')[[1]]
paste(res[1], res[2:4], sep='_')
}
x <- c("cat_1_2_3", "dog_2_6_3", "cow_2_8_6")
unlist(lapply(x, f))
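
The helper f above hard-codes res[2:4], so it assumes exactly three parts after the prefix. As a minimal sketch, assuming every string has at least one "_"-separated part after the prefix, the negative index p[-1] handles any number of parts. The name split_keep_prefix and the sample input are made up for illustration.

```r
# Hypothetical helper: paste the first "_"-separated piece onto every other piece.
split_keep_prefix <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  unlist(lapply(parts, function(p) paste(p[1], p[-1], sep = "_")))
}

# Illustrative input with a different number of parts per string
animals <- c("cat_1_2", "dog_5_6_7")
result <- split_keep_prefix(animals)
print(result)
```

Because the prefix is taken from p[1] and everything else from p[-1], no part count needs to be known in advance.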

Related

A way to strsplit and replace all of one character with several variations of alternate strings?

I am sure there is a simple solution and I am just getting too frustrated to work through it but here is the issue, simplified:
I have a string, ex: AB^AB^AB^^BAAA^^BABA^
I want to replace the ^s (so, 7 characters in the string), but iterate through many variants and be able to retain them all as strings
for example:
replacement 1: CCDCDCD to get: ABCABCABDCBAAADCBABAD
replacement 2: DDDCCCD to get: ABDABDABDCBAAACCBABAD
I imagine strsplit is the way, and I would like to do it in a for loop, any help would be appreciated!
The positions of the "^" characters can be found using gregexpr and stored in tmp
x <- "AB^AB^AB^^BAAA^^BABA^"
y <- c("CCDCDCD", "DDDCCCD")
tmp <- gregexpr(pattern = "^", text = x, fixed = TRUE)
You can then split the replacements character by character using strsplit, which gives a list. Finally, iterate over that list and replace the "^" characters with the characters from each replacement, one after the other. Because tmp comes from gregexpr, value has to be a list holding one character vector per element of x; with a bare character vector only its first character is used for every match.
sapply(strsplit(y, split = ""), function(i) {
`regmatches<-`(x, m = tmp, value = list(i))
})
Result
# [1] "ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"
You don't really need a for loop. You can strsplit your string and the replacement, and then replace the "^" entries with the replacement vector.
x <- "AB^AB^AB^^BAAA^^BABA^"
str <- unlist(strsplit(x, ""))
pat <- unlist(strsplit("CCDCDCD", ""))
str[str == "^"] <- pat
paste(str, collapse = "")
# [1] "ABCABCABDCBAAADCBABAD"
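
To cover several replacement variants at once, the split-and-assign idea can be wrapped in a small function and applied over the vector of replacements. This is only a sketch: the name fill_carets is hypothetical, and the length check is an added assumption that each replacement has exactly one character per "^".

```r
# Hypothetical helper: fill each "^" in `template` with successive characters of `replacement`.
fill_carets <- function(template, replacement) {
  chars <- strsplit(template, "", fixed = TRUE)[[1]]
  repl <- strsplit(replacement, "", fixed = TRUE)[[1]]
  slots <- chars == "^"
  # Assumption: one replacement character per "^"
  stopifnot(sum(slots) == length(repl))
  chars[slots] <- repl
  paste(chars, collapse = "")
}

x <- "AB^AB^AB^^BAAA^^BABA^"
y <- c("CCDCDCD", "DDDCCCD")

# One filled string per replacement variant
filled <- vapply(y, fill_carets, character(1), template = x, USE.NAMES = FALSE)
print(filled)
```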
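Another option is gsubfn with a proto object; gsubfn maintains count as the index of the current match
library(gsubfn) # attaches proto as well
f1 <- Vectorize(function(str1, str2) {
p <- proto(fun = function(this, x) substr(str2, count, count))
gsubfn::gsubfn("\\^", p, str1)
})
Testing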
> unname(f1(x, y))
[1] "ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"
data
x <- "AB^AB^AB^^BAAA^^BABA^"
y <- c("CCDCDCD", "DDDCCCD")
Given x <- "AB^AB^AB^^BAAA^^BABA^" and y <- c("CCDCDCD", "DDDCCCD"), we can try utf8ToInt + intToUtf8 + replace like below
sapply(
y,
function(s) {
intToUtf8(
replace(
u <- utf8ToInt(x),
u == utf8ToInt("^"),
utf8ToInt(s)
)
)
}
)
which gives
CCDCDCD DDDCCCD
"ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"

combining words in tm R is not achieving desired result

I am trying to combine a few words so that they count as one.
In this example I want val and valuatin to be counted as valuation.
The code I have been using to try and do this is below:
#load in package
library(tm)
replaceWords <- function(x, from, keep){
regex_pat <- paste(from, collapse = "|")
gsub(regex_pat, keep, x)
}
oldwords <- c("val", "valuati")
newword <- c("valuation")
TextDoc2 <- tm_map(TextDoc, replaceWords, from=oldwords, keep=newword)
However, this does not work as expected. Any time val appears inside a word, it is replaced with valuation. For example, equivalent becomes equivaluation. How do I get around this and achieve my desired result?
Try this function -
replaceWords <- function(x, from, keep){
regex_pat <- sprintf('\\b(%s)\\b', paste(from, collapse = '|'))
gsub(regex_pat, keep, x)
}
val matches inside equivalent. Adding word boundaries stops that from happening.
grepl('val', 'equivalent')
#[1] TRUE
grepl('\\bval\\b', 'equivalent')
#[1] FALSE
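
As a quick check of the word-boundary version, here is a minimal sketch on a plain character vector, outside of tm. The sample sentences are made up for illustration.

```r
# Same idea as the answer: wrap the alternatives in \b ... \b so only whole words match.
replaceWords <- function(x, from, keep) {
  regex_pat <- sprintf("\\b(%s)\\b", paste(from, collapse = "|"))
  gsub(regex_pat, keep, x)
}

# Illustrative input (not from the original corpus)
txt <- c("the val is high", "equivalent valuati", "valuation")
cleaned <- replaceWords(txt, from = c("val", "valuati"), keep = "valuation")
print(cleaned)
```

Words that merely contain val, such as equivalent, and the already correct valuation are left alone because no word boundary falls inside them.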

How to move patterns in a string in r?

I am trying to code a function that would allow me to move certain patterns in a string in r. For example, if my strings are pattern_string1, pattern_string2, pattern_string3, pattern_string4, I want to mutate them to string1_pattern, string2_pattern, string3_pattern, string4_pattern.
In order to achieve this, I tried the following:
string_flip <- function(x, pattern){
if(str_detect(x, pattern)==TRUE){
str_remove(x, pattern) %>%
paste(x, "pattern", sep = "_")
}
}
However, when I apply this to a vector of strings with the following code:
stringvector <- c("pattern_string1", "pattern_string2", "pattern_string3", "pattern_string4", "string5", "string6")
string_flip(stringvector, "pattern")
it returns a warning and changes all elements, not only those that contain "pattern". In addition, it does not just add pattern to the end of the string; it also doubles the string itself, so I get the following result:
[1] "_string1_pattern_string1_pattern" "_string2_pattern_string2_pattern" "_string3_pattern_string3_pattern"
[4] "_string4_pattern_string4_pattern" "string5_string5_pattern" "string6_string6_pattern"
Can anybody help me with this?
Thanks a lot in advance!
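Your function string_flip is not vectorised: if() expects a single TRUE or FALSE, so it works for only one string at a time.
The pipe already passes the result of str_remove to paste as its first argument, so the additional x is why the string is doubled.
In paste, pattern should not be in quotes.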
Try this function.
library(stringr)
string_flip <- function(x, pattern){
trimws(ifelse(str_detect(x, pattern),
str_remove(x, pattern) %>% paste(pattern, sep = "_"), x), whitespace = '_')
}
stringvector <- c('pattern_string1', 'pattern_string2', 'pattern_string3', 'pattern_string4')
string_flip(stringvector, "pattern")
#[1] "string1_pattern" "string2_pattern" "string3_pattern" "string4_pattern"
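
A base R alternative, offered only as a sketch, is to capture the prefix and the rest in one regular expression and swap them with backreferences. It assumes the pattern is a literal prefix followed by "_" and contains no regex metacharacters; the name flip_prefix is hypothetical.

```r
# Hypothetical helper: move a leading "<pattern>_" to the end as "_<pattern>".
flip_prefix <- function(x, pattern) {
  # Group 1 is the prefix, group 2 is everything after the underscore
  sub(sprintf("^(%s)_(.*)$", pattern), "\\2_\\1", x)
}

stringvector <- c("pattern_string1", "pattern_string2", "string5", "string6")
flipped <- flip_prefix(stringvector, "pattern")
print(flipped)
```

Strings that do not start with the prefix do not match and are returned unchanged, so no ifelse is needed.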

Assigning new strings with conditional match

I have an issue about replacing strings with the new ones conditionally.
Below is a short version of my real problem. It works so far, but I need a better solution since the real data has many rows.
strings <- c("ca_A33","cb_A32","cc_A31","cd_A30")
Basically, I want to replace the part after the underscore in strings with replace_strings: the first item in strings gets the first item in replace_strings, and so on.
replace_strings <- c("A1","A2","A3","A4")
So the final strings should look like
final_string <- c("ca_A1","cb_A2","cc_A3","cd_A4")
I wrote a simple function, assign_new:
assign_new <- function(x){
ifelse(grepl("A33",x),gsub("A33","A1",x),
ifelse(grepl("A32",x),gsub("A32","A2",x),
ifelse(grepl("A31",x),gsub("A31","A3",x),
ifelse(grepl("A30",x),gsub("A30","A4",x),x))))
}
assign_new(strings)
[1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
OK, this seems to work. But suppose I have A1000 down to A1 and want to replace them with A1 to A1000: I would need 1000 nested ifelse calls. How can we tackle that?
If your vectors are ordered to be matched, then you can use:
> paste0(gsub("(.*_)(.*)","\\1", strings ), replace_strings)
[1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
You can use regmatches. First locate everything that follows the _ using regexpr, then replace it as shown below
`regmatches<-`(strings,regexpr("(?<=_).*",strings,perl = TRUE),value=replace_strings)
[1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
Not the fastest, but very tractable and easy to maintain:
for (i in seq_along(strings)) {
strings[i] <- gsub("\\d+$", i, strings[i])
}
"\\d+$" just matches any number at the end of the string, which is then replaced by the position i.
EDIT: Per @Onyambu's comment, removing map2_chr as paste is a vectorized function.
foo <- function(x, y){
x <- unlist(lapply(strsplit(x, "_"), '[', 1))
paste(x, y, sep = "_")
}
foo(strings, replace_strings)
with x being strings and y being replace_strings. You first split the strings object at the _ character, and paste with the respective replace_strings object.
EDIT:
For objects where there is no positional relationship you could create a reference table (dataframe, list, etc.) and match your values.
reference_tbl <- data.frame(strings, replace_strings)
foo <- function(x){
y <- reference_tbl$replace_strings[match(x, reference_tbl$strings)]
x <- unlist(lapply(strsplit(x, "_"), '[', 1))
paste(x, y, sep = "_")
}
foo(strings)
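
When the mapping is keyed on the old code rather than on position, a named character vector can play the role of the reference table. The sketch below assumes each string has the form prefix_code; the names recode_suffix and lookup are hypothetical, and leaving unknown codes untouched is a design choice made here, not something stated in the question.

```r
# Hypothetical helper: replace the code after the underscore via a named lookup vector.
recode_suffix <- function(x, lookup) {
  prefix <- sub("_.*$", "", x)   # text before the first underscore
  key <- sub("^.*_", "", x)      # text after the last underscore
  new <- unname(lookup[key])     # NA where the code is not in the lookup
  ifelse(is.na(new), x, paste(prefix, new, sep = "_"))
}

lookup <- c(A33 = "A1", A32 = "A2", A31 = "A3", A30 = "A4")

# Order of the input no longer matters; "ce_A99" has no entry and is kept as is.
shuffled <- c("cc_A31", "ca_A33", "ce_A99", "cd_A30", "cb_A32")
recoded <- recode_suffix(shuffled, lookup)
print(recoded)
```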
Using the dplyr package:
library(dplyr)
strings <- c("ca_A33","cb_A32","cc_A31","cd_A30")
replace_strings <- c("A1","A2","A3","A4")
df <- data.frame(strings, replace_strings)
df <- mutate(rowwise(df),
strings = gsub("_.*",
paste0("_", replace_strings),
strings)
)
df <- select(df, strings)
Output:
# A tibble: 4 x 1
strings
<chr>
1 ca_A1
2 cb_A2
3 cc_A3
4 cd_A4
yet another way:
mapply(function(x,y) gsub("(\\w\\w_).*",paste0("\\1",y),x),strings,replace_strings,USE.NAMES=FALSE)
# [1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"

change the sequence of numbers in a filename using R

I am sorry, I could not find an answer to this question anywhere and would really appreciate your help.
I have .csv files for each hour of a year. The filename is written in the following way:
hh_dd_mm.csv (e.g. for February 1st 00:00--> 00_01_02.csv). In order to make it easier to sort the hours of a year I would like to change the filename to mm_dd_hh.csv
How can I write in R to change the filename from the pattern HH_DD_MM to MM_DD_HH?
a <- list.files(path = ".", pattern = "HH_DD_MM")
b<-paste(pattern="MM_DD_HH")
file.rename(a,b)
Or you could do:
a <- c("00_01_02.csv", "00_02_02.csv")
gsub("(\\d{2})\\_(\\d{2})\\_(\\d{2})(.*)", "\\3_\\2_\\1\\4", a)
#[1] "02_01_00.csv" "02_02_00.csv"
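
To apply the same substitution to real files, the new names can be passed to file.rename. The following is a sketch that works in a throwaway temporary directory with two made-up file names, so it does not touch any existing data; the variable names are illustrative.

```r
# Work in a scratch directory so nothing real is renamed
demo_dir <- tempfile("rename_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("00_01_02.csv", "13_28_02.csv"))))

# Pick up only files named hh_dd_mm.csv
old_names <- list.files(demo_dir, pattern = "^\\d{2}_\\d{2}_\\d{2}\\.csv$")

# Swap the first and third two-digit groups: hh_dd_mm -> mm_dd_hh
new_names <- sub("^(\\d{2})_(\\d{2})_(\\d{2})(\\.csv)$", "\\3_\\2_\\1\\4", old_names)

# file.rename returns one TRUE/FALSE per file
ok <- file.rename(file.path(demo_dir, old_names), file.path(demo_dir, new_names))
renamed <- sort(list.files(demo_dir))
print(renamed)
```

Checking ok before relying on the result is worthwhile, since file.rename reports failures through its return value.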
Not sure if this is the best solution, but it seems to work
a <- c("00_01_02.csv", "00_02_02.csv")
b <- unname(sapply(a, function(x) {temp <- strsplit(x, "(_|[.])")[[1]] ; paste0(temp[[3]], "_", temp[[2]], "_", temp[[1]], ".", temp[[4]])}))
b
## [1] "02_01_00.csv" "02_02_00.csv"
You can use chartr to create the new file name. Here's an example:
> write.csv(c(1,1), "12_34_56")
> list.files()
# [1] "12_34_56"
> file.rename("12_34_56", chartr("1256", "5612", "12_34_56"))
# [1] TRUE
> list.files()
# [1] "56_34_12"
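chartr translates characters one for one, so the result always has the same number of characters as the original string. In the above code, I basically just swapped "12" with "56", which is what it looks like you are trying to do. Note that the mapping applies to every occurrence of those characters, so this only fits names where the swapped parts share no characters with each other or with the middle part.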
Or, you can write a short string swapping function
> strSwap <- function(x) paste(rev(strsplit(x, "[_]")[[1]]), collapse = "_")
> ( files <- c("84_15_45", "59_95_21", "31_51_49",
"51_88_27", "21_39_98", "35_27_14") )
# [1] "84_15_45" "59_95_21" "31_51_49" "51_88_27" "21_39_98" "35_27_14"
> sapply(files, strSwap, USE.NAMES = FALSE)
# [1] "45_15_84" "21_95_59" "49_51_31" "27_88_51" "98_39_21" "14_27_35"
You could also do it with the substr<- assignment function
> s1 <- substr(files,1,2)
> substr(files,1,2) <- substr(files,7,8)
> substr(files,7,8) <- s1
> files
# [1] "45_15_84" "21_95_59" "49_51_31" "27_88_51" "98_39_21" "14_27_35"
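
The strSwap and substr<- examples above use names without an extension. As a sketch for names like 00_01_02.csv, the extension can be set aside with tools::file_ext and tools::file_path_sans_ext before reversing the parts. The helper name swap_keep_ext is hypothetical.

```r
# Hypothetical helper: reverse the "_"-separated parts of the stem, keep the extension.
swap_keep_ext <- function(f) {
  ext <- tools::file_ext(f)             # "" when there is no extension
  stem <- tools::file_path_sans_ext(f)
  swapped <- vapply(
    strsplit(stem, "_", fixed = TRUE),
    function(p) paste(rev(p), collapse = "_"),
    character(1)
  )
  # Re-attach the extension only where one was present
  ifelse(nzchar(ext), paste(swapped, ext, sep = "."), swapped)
}

swapped_names <- swap_keep_ext(c("00_01_02.csv", "84_15_45"))
print(swapped_names)
```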
