How to replace the certain character in certain position in the string? - r

I have a question that is how to replace a character which is in a certain place. For example:
str <- c("abcdccc","hijklccc","abcuioccc")
#I want to replace character "c" which is in position 3 to "X" how can I do that?
#I know the function gsub and substr, but the only idea I have got so far is
#use if() to make it. How can I do it quickly?
#ideal result
>str
"abXdccc" "hijklccc" "abXuioccc"

It's a bit awkward, but you can replace a single character dependent on that single character's value like:
ifelse(substr(str,3,3)=="c", `substr<-`(str,3,3,"X"), str)
#[1] "abXdccc" "hijklccc" "abXuioccc"
If you are happy to overwrite the value, you could do it a bit cleaner:
substr(str[substr(str,3,3)=="c"],3,3) <- "X"
str
#[1] "abXdccc" "hijklccc" "abXuioccc"

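You can use a regex lookahead here. Note that a lookahead does not consume the c, so the pattern has to match that character explicitly (the trailing .) for it to be replaced rather than having X inserted in front of it:
str <- c("abcdccc","hijklccc","abcuioccc")
sub("^(.{2})(?=c).", "\\1X", str, perl = TRUE)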
Or using a positive lookbehind as suggested by thelatemail:
sub("(?<=^.{2})c", "X", str, perl = TRUE)
#[1] "abXdccc" "hijklccc" "abXuioccc"
This matches the letter c that comes right after the first two characters of the string and replaces it with X.
(?<= is the start of a positive lookbehind
^.{2} means any two characters from the start of the string
)c closes the lookbehind and requires a c right after those two characters
Additionally a generalised function:
switch_letter <- function(x, letter, position, replacement) {
stopifnot(position > 1)
pattern <- paste0("(?<=^.{", position - 1, "})", letter)
sub(pattern, replacement, x, perl = TRUE)
}
switch_letter(str, "c", 3, "X")
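For comparison, here is a minimal regex-free sketch of the same generalised idea, assuming a plain single-character comparison is enough. The name replace_at is hypothetical (it does not come from any answer here); it only relies on base R substr:

```r
# Hypothetical helper: replace `letter` with `replacement` only where
# `letter` sits at `position`; all other strings are returned unchanged.
replace_at <- function(x, letter, position, replacement) {
  hit <- substr(x, position, position) == letter
  substr(x[hit], position, position) <- replacement
  x
}

str <- c("abcdccc", "hijklccc", "abcuioccc")
replace_at(str, "c", 3, "X")
#[1] "abXdccc" "hijklccc" "abXuioccc"
```

Because no pattern is built, letter does not need escaping when it happens to be a regex metacharacter such as ".".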

This should work too:
str <- c("abcdefg","hijklnm","abcuiowre")
a <- strsplit(str[1], "")[[1]]
a[3] <- "X"
a <- paste(a, collapse = '')
str[1] <- a

How about this idea:
# first two characters, then the (possibly replaced) third character, then the rest
c2Xon3 <- function(x){sprintf("%s%s%s",substring(x,1,2),gsub("c","X",substring(x,3,3)),substring(x,4,nchar(x)))}
str <- c("abcdccc","hijklccc","abcuioccc")
strNew <- sapply(str,c2Xon3 )

This should work:
str <- c("abcdefg","hijklnm","abcuiowre")
for (i in seq_along(str)) {
  if (substr(str[i], 3, 3) == "c") {
    substr(str[i], 3, 3) <- "X"
  }
}

You can just use ifelse with substring and paste0, i.e.
ifelse(substr(str, 3, 3) == 'c', paste0(substring(str, 1, 2),'X', substring(str, 4)), str)
#[1] "abXdccc" "hijklccc" "abXuioccc"

Related

extract the shortest and first encounter match between two strings in R

I want the function to return the string that follows the below condition.
after "def"
in the parentheses right before the first %ile after "def"
So the desirable output is "4", not "5". So far, I was able to extract "2)(3)(4". If I change the function to str_extract_all, the output became "2)(3)(4" and "5" . I couldn't figure out how to fix this problem. Thanks!
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"
string.after.match <- str_match(string = x,
pattern = "(?<=def)(.*)")[1, 1]
parentheses.value <- str_extract(string.after.match, # get value in ()
"(?<=\\()(.*?)(?=\\)\\%ile)")
parentheses.value
Here is a one liner that will do the trick using gsub()
gsub(".*def.*(\\d+)\\)%ile.*%ile", "\\1", x, perl = TRUE)
Here's an approach that will work with any number of "%ile"s, based on str_split():
library(stringr)  # also provides the %>% pipe
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile(9)%ile"
x %>%
  str_split("def", simplify = TRUE) %>%
  subset(TRUE, 2) %>%
  str_split("%ile", simplify = TRUE) %>%
  subset(TRUE, 1) %>%
  str_replace(".*(\\d+)\\)$", "\\1")
A base R alternative with lazy quantifiers:
sub(".*?def.*?(\\d)\\)%ile.*", "\\1", x)
[1] "4"
You can use
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"
library(stringr)
result <- str_match(x, "\\bdef(?:\\((\\d+)\\))+%ile")
result[,2]
Details:
\b - word boundary
def - a def string
(?:\((\d+)\))+ - one or more occurrences of ( + one or more digits (captured into Group 1) + ); only the last repetition is kept in Group 1
%ile - an %ile string.
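The "last repetition wins" behaviour of a repeated capturing group can also be checked without stringr. This is only a sketch, assuming the PCRE engine (perl = TRUE) and dropping the \b for brevity:

```r
x <- "abc(0)(1)%ile, def(2)(3)(4)%ile(5)%ile"

# regexec() returns the positions of the full match and of each capture group;
# regmatches() turns them into the matched substrings.
m <- regmatches(x, regexec("def(?:\\((\\d+)\\))+%ile", x, perl = TRUE))[[1]]

m[1]  # the whole match
#[1] "def(2)(3)(4)%ile"
m[2]  # Group 1: only the last repetition is kept
#[1] "4"
```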

A way to strsplit and replace all of one character with several variations of alternate strings?

I am sure there is a simple solution and I am just getting too frustrated to work through it but here is the issue, simplified:
I have a string, ex: AB^AB^AB^^BAAA^^BABA^
I want to replace the ^s (so, 7 characters in the string), but iterate through many variants and be able to retain them all as strings
for example:
replacement 1: CCDCDCD to get: ABCABCABDCBAAADCBABAD
replacement 2: DDDCCCD to get: ABDABDABDCBAAACCBABAD
I imagine strsplit is the way, and I would like to do it in a for loop, any help would be appreciated!
The positions of the "^" can be found using gregexpr, see tmp
x <- "AB^AB^AB^^BAAA^^BABA^"
y <- c("CCDCDCD", "DDDCCCD")
tmp <- gregexpr(pattern = "^", text = x, fixed = TRUE)
You can then split the 'replacements' character by character using strsplit, this gives a list. Finally, iterate over that list and replace the "^" with the characters from your replacements one after the other.
sapply(strsplit(y, split = ""), function(i) {
  # for gregexpr matches, value must be a list with one character vector per element of x
  `regmatches<-`(x, m = tmp, value = list(i))
})
Result
# [1] "ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"
You don't really need a for loop. You can strsplit your string and pattern, and then replace the "^" with the vector.
str <- unlist(strsplit("AB^AB^AB^^BAAA^^BABA^", ""))
pat <- unlist(strsplit("CCDCDCD", ""))
str[str == "^"] <- pat
paste(str, collapse = "")
# [1] "ABCABCABDCBAAADCBABAD"
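To keep every variant as its own string, the same split-and-assign idea can be wrapped in a small function. This is a sketch only; fill_carets is a hypothetical name, and it assumes each replacement has exactly as many characters as there are "^" in the input:

```r
# Hypothetical helper: fill the "^" placeholders of one string with the
# characters of each replacement string, returning one result per replacement.
fill_carets <- function(x, replacements) {
  chars <- strsplit(x, "")[[1]]
  vapply(replacements, function(r) {
    out <- chars
    out[out == "^"] <- strsplit(r, "")[[1]]
    paste(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

fill_carets("AB^AB^AB^^BAAA^^BABA^", c("CCDCDCD", "DDDCCCD"))
#[1] "ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"
```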
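Another option is gsubfn:
library(gsubfn)  # also attaches proto
f1 <- Vectorize(function(str1, str2) {
  # count is maintained by gsubfn and holds the index of the current match
  p <- proto(fun = function(this, x) substr(str2, count, count))
  gsubfn::gsubfn("\\^", p, str1)
})
Testing: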
> unname(f1(x, y))
[1] "ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"
data
x <- "AB^AB^AB^^BAAA^^BABA^"
y <- c("CCDCDCD", "DDDCCCD")
Given x <- "AB^AB^AB^^BAAA^^BABA^" and y <- c("CCDCDCD", "DDDCCCD"), we can try utf8ToInt + intToUtf8 + replace like below
sapply(
y,
function(s) {
intToUtf8(
replace(
u <- utf8ToInt(x),
u == utf8ToInt("^"),
utf8ToInt(s)
)
)
}
)
which gives
CCDCDCD DDDCCCD
"ABCABCABDCBAAADCBABAD" "ABDABDABDCBAAACCBABAD"

How to mask a string based on a pattern of string of same length

I have the following set of string:
core_string <- "AFFVQTCRE"
mask_string <- "*KKKKKKKK"
What I want to do is to mask core_string with mask_string.
Whenever the * coincide with character in core_string, we will keep that character,
otherwise replace it.
So the desired result is:
AKKKKKKKK
Other example
core_string <- "AFFVQTCRE"
mask_string <- "*KKKK*KKK"
# result AKKKKTKKK
The length of both strings is always the same.
How can I do that with R?
Here's a helper function that will do just that
apply_mask <- function(x, mask) {
unlist(Map(function(z, m) {
m[m=="*"] <- z[m=="*"]
paste(m, collapse="")
}, strsplit(x, ""), strsplit(mask, "")))
}
basically you just split up the string into characters and replace the characters that have a "*" then paste the strings back together.
I used the Map to make sure the function is still vectorized over the inputs. For example
core_string <- c("AFFVQTCRE", "ABCDEFGHI")
mask_string <- "*KKKK*KKK"
apply_mask(core_string, mask_string)
# [1] "AKKKKTKKK" "AKKKKFKKK"
regmatches in replacement form <- can be handy here:
regmatches(core_string, gregexpr("K", mask_string)) <- "K"
core_string
#[1] "AKKKKKKKK"
If it's a 1:1 match of characters rather than a constant, then it has to be changed up a little:
ss <- strsplit(mask_string, "")[[1]]
# wrap the replacement characters in a list so they are used one per match
regmatches(core_string, gregexpr("[^*]", mask_string)) <- list(ss[ss != "*"])
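For a single pair of strings, the masking rule can also be written as one ifelse over the split characters. A minimal sketch, assuming both strings have the same length as stated in the question; apply_mask2 is a hypothetical name:

```r
# Hypothetical helper: keep the core character wherever the mask has "*",
# otherwise take the mask character.
apply_mask2 <- function(core, mask) {
  core_chars <- strsplit(core, "")[[1]]
  mask_chars <- strsplit(mask, "")[[1]]
  stopifnot(length(core_chars) == length(mask_chars))
  paste(ifelse(mask_chars == "*", core_chars, mask_chars), collapse = "")
}

apply_mask2("AFFVQTCRE", "*KKKK*KKK")
#[1] "AKKKKTKKK"
```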

Replacing the nth number in a string

I have a set of files which I had named incorrectly. The file name is as follows.
Generation_Flux_0_Model_200.txt
Generation_Flux_101_Model_43.txt
Generation_Flux_11_Model_3.txt
I need to replace the second number (the model number) by adding 1 to the existing number. So the correct names would be
Generation_Flux_0_Model_201.txt
Generation_Flux_101_Model_44.txt
Generation_Flux_11_Model_4.txt
This is the code I wrote. I would like to know how to specify the position of the number (replace second number in the string with the new number)?
reNameModelNumber <- function(modelName){
#get the current model number
modelNumber = as.numeric(unlist(str_extract_all(modelName, "\\d+"))[2])
#increment it by 1
newModelNumber = modelNumber + 1
#building the new name with gsub
newModelName = gsub(" regex ", newModelNumber, modelName)
#rename
file.rename(modelName, newModelName)
}
reactionFiles = list.files(pattern = "^Generation_Flux_\\d+_Model_\\d+\\.txt$")
sapply(reactionFiles, function(x) reNameModelNumber(x))
We can use gsubfn to increment by 1. Capture the digits ((\\d+))
followed by a . and 'txt' at the end ($) of the string, and replace the match with the incremented number plus the extension
library(gsubfn)
gsubfn("(\\d+)\\.txt$", ~ paste0(as.numeric(x) + 1, ".txt"), str1)
#[1] "Generation_Flux_0_Model_201.txt" "Generation_Flux_101_Model_44.txt"
#[3] "Generation_Flux_11_Model_4.txt"
data
str1 <- c("Generation_Flux_0_Model_200.txt", "Generation_Flux_101_Model_43.txt",
"Generation_Flux_11_Model_3.txt")
Answering the question, if you want to increment a certain number inside a string, you may use
> library(gsubfn)
> nth = 2
> reactionFiles <- c("Generation_Flux_0_Model_200.txt", "Generation_Flux_101_Model_43.txt", "Generation_Flux_11_Model_3.txt")
> gsubfn(paste0("^((?:\\D*\\d+){", nth-1, "}\\D*)(\\d+)"), function(x,y,z) paste0(x, as.numeric(y) + 1), reactionFiles)
[1] "Generation_Flux_0_Model_201.txt" "Generation_Flux_101_Model_44.txt" "Generation_Flux_11_Model_4.txt"
nth here is the ordinal of the digit chunk to increment.
Pattern details (n stands for nth - 1)
^((?:\\D*\\d+){n}\\D*) - Capturing group 1 (the value is accessed in the gsubfn method via x):
  (?:\\D*\\d+){n} - n occurrences of
    \\D* - 0 or more chars other than digits
    \\d+ - 1+ digits
  \\D* - 0+ non-digits
(\\d+) - Capturing group 2 (the value is accessed in the gsubfn method via y): one or more digits
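The same "increment the nth number" idea can be sketched in base R with gregexpr and the replacement form of regmatches. The function name increment_nth is hypothetical, and the sketch assumes every string contains at least nth digit chunks:

```r
# Hypothetical helper: add 1 to the nth run of digits in each string.
increment_nth <- function(x, nth = 2) {
  m <- gregexpr("[0-9]+", x)          # positions of every digit chunk
  nums <- regmatches(x, m)            # list: one character vector per string
  nums <- lapply(nums, function(v) {
    v[nth] <- as.character(as.numeric(v[nth]) + 1)
    v
  })
  regmatches(x, m) <- nums            # write the chunks back in place
  x
}

reactionFiles <- c("Generation_Flux_0_Model_200.txt",
                   "Generation_Flux_101_Model_43.txt",
                   "Generation_Flux_11_Model_3.txt")
increment_nth(reactionFiles, nth = 2)
#[1] "Generation_Flux_0_Model_201.txt" "Generation_Flux_101_Model_44.txt"
#[3] "Generation_Flux_11_Model_4.txt"
```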
Using base-R.
data <- c( # Just an example
"Generation_Flux_0_Model_200.txt",
"Generation_Flux_101_Model_43.txt",
"Generation_Flux_11_Model_3.txt"
)
fixNameModel <- function(data){
n <- length(data)
# get the current model number and increment it by 1
newn = as.integer(sub(".+_(\\d+)\\.txt", "\\1", data)) + 1L
#building the new name with gsub
newModelName <- vector(mode = "character", length = n)
for (i in 1:n) {
newModelName[i] <- gsub("\\d+\\.txt$", paste0(newn[i], ".txt"), data[i])
}
newModelName
}
fixNameModel(data)
[1] "Generation_Flux_0_Model_201.txt" "Generation_Flux_101_Model_44.txt"
[3] "Generation_Flux_11_Model_4.txt"
You can now do something like file.rename(modelName, fixNameModel(modelName))
EDIT:
Here is a bit neater version but makes stronger assumptions instead:
fixNameModel2 <- function(data) {
sapply(
strsplit(data, "_|\\."),
function(x) {
x[5] <- as.integer(x[5]) + 1L
x <- paste0(x, collapse = "_")
gsub("_txt", ".txt", x, fixed = TRUE)
}
)
}
Assuming that the digit always occurs before the extension, as is mentioned in the comments, here is another base R solution that is a little bit simpler.
sapply(regmatches(tmp, regexec("\\d+(?=\\.)", tmp, perl=TRUE), invert=NA),
function(x) paste0(c(x[1], as.integer(x[2]) + 1L, x[3]), collapse=""))
This returns
[1] "Generation_Flux_0_Model_201.txt" "Generation_Flux_101_Model_44.txt"
[3] "Generation_Flux_11_Model_4.txt"
regexec returns a list of match positions, one element per string. regmatches with invert=NA uses this information and returns a list of character vectors that break up each original string around the match: the text before the match, the matched digits as the second element, and the text after the match. Feed this list to sapply, convert the second element to integer and increment. Then paste the pieces back together to return an atomic vector.
The regex "\d+(?=\.)" uses a perl lookahead, "(?=\.)", which requires a dot right after the digits without making it part of the match, while "\d+" matches the digits themselves.
data
tmp <- c("Generation_Flux_0_Model_200.txt", "Generation_Flux_101_Model_43.txt",
"Generation_Flux_11_Model_3.txt")
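The same lookahead can also be combined with regexpr and the replacement form of regmatches, which avoids pasting the pieces back together by hand. A minimal sketch, assuming every file name ends in digits followed by ".txt" (the variable name files is only for illustration):

```r
files <- c("Generation_Flux_0_Model_200.txt",
           "Generation_Flux_101_Model_43.txt",
           "Generation_Flux_11_Model_3.txt")

# Match the digits that sit directly in front of the ".txt" extension
m <- regexpr("\\d+(?=\\.txt$)", files, perl = TRUE)

# Overwrite each match with its incremented value
regmatches(files, m) <- as.integer(regmatches(files, m)) + 1L
files
#[1] "Generation_Flux_0_Model_201.txt" "Generation_Flux_101_Model_44.txt"
#[3] "Generation_Flux_11_Model_4.txt"
```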

How to capitalize all but some letters in R

I have a dataframe in R with a column of strings, e.g. v1 <- c('JaStADmmnIsynDK', 'laUksnDTusainS')
My goal is to capitalize all letters in each string except 's', 't' and 'y'.
So the result should end up being: 'JAStADMMNIsyNDK' and 'LAUKsNDTUsAINS'.
Thus not changing any of the said letters: 's', 't' and 'y'.
As of now I do it by simply having 25x
levels(df$strings) <- sub('n', 'N', levels(df$strings))
But that seems to be overkill! How can I do this easily in R?
Try
v2 <- gsub("[sty]", "", paste(letters, collapse=""))
chartr(v2, toupper(v2), v1)
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
data
v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")
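If the set of letters to leave alone may change, the chartr approach can be wrapped in a small function. This is just a sketch; upper_except and its keep argument are hypothetical names, and it only handles the 26 ASCII letters in letters:

```r
# Hypothetical helper: upper-case every lowercase letter except those in `keep`.
upper_except <- function(x, keep = c("s", "t", "y")) {
  from <- paste(setdiff(letters, keep), collapse = "")
  chartr(from, toupper(from), x)
}

upper_except(c("JaStADmmnIsynDK", "laUksnDTusainS"))
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
```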
The answer posted by @akrun is indeed brilliant. But here is my more direct brute force approach which I finished too late.
s <- "JaStADmmnIsynDK"
customUpperCase <- function(s,ignore = c("s","t","y")) {
u <- sapply(unlist(strsplit(s,split = "")),
function(x) if(!(x %in% ignore)) toupper(x) else x )
paste(u,collapse = "")
}
customUpperCase(s)
#[1] "JAStADMMNIsyNDK"
We can directly gsub() an uppercase replacement on each applicable lowercase letter, using the perl '\U' operator on the '\1' capture group (which @akrun reminded me of):
v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")
gsub('([a-ru-xz])', '\\U\\1', v1, perl = TRUE)
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
