change numbers in string vector [duplicate] - r

This question already has answers here:
R: gsub, pattern = vector and replacement = vector
(6 answers)
Closed 3 years ago.
I have a character vector containing numbers, like this:
x <- c("abc122", "73dj", "lo7833ll")
x
[1] "abc122" "73dj" "lo7833ll"
I want to change the numbers in the x vector and replace them with numbers I have stored in another vector:
right_numbers <- c(500, 700, 23)
> right_numbers
[1] 500 700 23
How can I do this even if the numbers are in different positions in the string (some at the beginning, some at the end, ...)?
This is how the x vector should look after the changes:
> x
[1] "abc500" "700dj" "lo23ll"

A vectorized solution with stringr:
library(stringr)
str_replace(x, "[0-9]+", as.character(right_numbers))
[1] "abc500" "700dj" "lo23ll"
Possibly a more efficient version with the stringi package, thanks to @sindri_baldur:
library(stringi)
stri_replace_first_regex(x, '[0-9]+', right_numbers)
[1] "abc500" "700dj" "lo23ll"

Here is an idea using mapply and gsub:
mapply(function(i, y)gsub('[0-9]+', y, i), x, right_numbers)
# abc122 73dj lo7833ll
#"abc500" "700dj" "lo23ll"
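For comparison, here is a minimal base R sketch that needs no extra packages. It uses the replacement form of regmatches to overwrite the first digit run in each element. It assumes every element of x contains at least one digit run, so that the number of matches equals the length of right_numbers; the copy y is a name introduced here purely for illustration.

```r
x <- c("abc122", "73dj", "lo7833ll")
right_numbers <- c(500, 700, 23)

# Work on a copy so the original x stays untouched (y is an illustrative name)
y <- x

# regexpr() locates the first run of digits in each element;
# `regmatches<-` swaps each matched run for the corresponding replacement.
# Assumption: every element has a match, so the lengths line up.
regmatches(y, regexpr("[0-9]+", y)) <- as.character(right_numbers)
y
# [1] "abc500" "700dj"  "lo23ll"
```

If some elements might contain no digits at all, the mapply or stringr versions above are probably the safer choice, since they pair each string with its own replacement.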

Related

How to get the most frequent character within a character string? [duplicate]

This question already has answers here:
Finding the most repeated character in a string in R
(2 answers)
Closed 1 year ago.
Suppose the next character string:
test_string <- "A A B B C C C H I"
Is there any way to extract the most frequent value within test_string?
Something like:
extract_most_frequent_character(test_string)
Output:
#C
We can use scan to read the string as a vector of individual elements by splitting at the spaces, get the frequency counts with table, find the named index that has the maximum count (which.max), and return its name:
extract_most_frequent_character <- function(x) {
names(which.max(table(scan(text = x, what = '', quiet = TRUE))))
}
Testing:
extract_most_frequent_character(test_string)
[1] "C"
Or with strsplit
extract_most_frequent_character <- function(x) {
names(which.max(table(unlist(strsplit(x, "\\s+")))))
}
Here is another base R option (not as elegant as @akrun's answer):
> intToUtf8(names(which.max(table(utf8ToInt(gsub("\\s", "", test_string))))))
[1] "C"
One possibility involving stringr could be:
names(which.max(table(str_extract_all(test_string, "[A-Z]", simplify = TRUE))))
[1] "C"
Or marginally shorter:
names(which.max(table(str_extract_all(test_string, "[A-Z]")[[1]])))
Here is a solution using the stringr package, table and which.max:
library(stringr)
test_string <- str_split(test_string, " ")
test_string <- table(test_string)
names(test_string)[which.max(test_string)]
[1] "C"
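One thing the answers above share is that which.max returns only the first maximum, so ties are silently dropped. A minimal sketch of a tie-aware variant follows; the function name most_frequent_all is hypothetical and the input is assumed to be a single whitespace-separated string.

```r
# Hypothetical helper: return every value that reaches the maximum count
most_frequent_all <- function(x) {
  # Split on runs of whitespace and tabulate the pieces
  counts <- table(strsplit(x, "\\s+")[[1]])
  # Keep all names whose count equals the maximum, not just the first
  names(counts)[counts == max(counts)]
}

most_frequent_all("A A B B C C C H I")
# [1] "C"
most_frequent_all("A A B B")
# [1] "A" "B"
```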

Padding lost zeros not universally in a column [duplicate]

This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 1 year ago.
I have a list of US postal zip codes of 5 digits, but some lost their leading zeros. How do I add those zeros back in, while keeping the others without leading 0s intact? I tried formatC, sprintf, str_pad, and none of them worked, because I am not adding 0s to all values.
We can use sprintf
sprintf('%05d', as.integer(zipcodes))
In which way did str_pad not work?
https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_pad
df<-data.frame(zip=c(1,22,333,4444,55555))
df$zip <- stringr::str_pad(df$zip, width=5, pad = "0")
[1] "00001" "00022" "00333" "04444" "55555"
Update:
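Following the valuable comment from r2evans:
My solution is not very efficient, and to get a leading 0 the paste0 call has to be modified slightly. Here it is with a data frame example (note that paste0(0, x) prepends a single zero, so it only fully restores 4-digit values):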
sapply(df$zip, function(x){if(nchar(x)<5){paste0(0,x)}else{x}})
Data (tribble() is from the tibble package):
df <- tribble(
~zip,
7889,
2345,
45567,
4394,
34566,
4392,
4599)
df
Output:
[1] "07889" "02345" "45567" "04394" "34566" "04392" "04599"
First answer:
This will add a trailing zero to each integer with fewer than 5 digits (see the update above for leading zeros).
Where zip is a vector:
sapply(zip, function(x){if(nchar(x)<5){paste0(x,0)}else{x}})
If they start as strings and you don't want to (or cannot) convert to integers first, then an alternative to sprintf is
vec <- c('1','11','11111')
paste0(strrep('0', pmax(0, 5 - nchar(vec))), vec)
# [1] "00001" "00011" "11111"
This will handle strings of any length, and is a no-op for strings of 5 or more characters.
In a frame, that would be
dat$colname <- paste0(strrep('0', pmax(0, 5 - nchar(dat$colname))), dat$colname)
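To illustrate that fixed-width formatting pads only the short values, here is a small sketch comparing sprintf with formatC. The sample zip codes are made up, and the values are assumed to be purely numeric strings so that as.integer does not produce NA.

```r
# Made-up sample: two codes lost their leading zeros, one is already complete
zipcodes <- c("501", "2134", "90210")

# Both calls pad to width 5 with zeros and leave 5-digit values unchanged
padded_sprintf <- sprintf("%05d", as.integer(zipcodes))
padded_formatc <- formatC(as.integer(zipcodes), width = 5, flag = "0")

padded_sprintf
# [1] "00501" "02134" "90210"
```

Both results are character vectors, which is what a zip code column generally needs; converting back to numeric would drop the zeros again.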

I want to combine a concatenation of two lists [duplicate]

This question already has answers here:
Creating a sequential list of numbers and letters with R
(2 answers)
Closed 2 years ago.
I need to create a vector combining the numbers c(1:10) and the terms c("-KM","-COX"), so that it would turn out like this:
c("1-KM", "1-COX", "2-KM", "2-COX", "3-KM", "3-COX", ...)
I have tried using expand.grid to do that, however it returns a data frame, and I would need it to return a vector. Any help in how I could do that?
Try this version:
apply(expand.grid(v1, v2), 1, function(x) trimws(paste0(x[1], x[2])))
[1] "1-KM" "2-KM" "3-KM" "4-KM" "5-KM" "6-KM" "7-KM" "8-KM"
[9] "9-KM" "10-KM" "1-COX" "2-COX" "3-COX" "4-COX" "5-COX" "6-COX"
[17] "7-COX" "8-COX" "9-COX" "10-COX"
Data:
v1 <- c(1:10)
v2 <- c("-KM", "-COX")
After expand.grid you can use paste to get a vector from the returned data.frame.
do.call(paste0, expand.grid(1:10, c("-KM","-COX")))
# [1] "1-KM" "2-KM" "3-KM" "4-KM" "5-KM" "6-KM" "7-KM" "8-KM"
# [9] "9-KM" "10-KM" "1-COX" "2-COX" "3-COX" "4-COX" "5-COX" "6-COX"
#[17] "7-COX" "8-COX" "9-COX" "10-COX"
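Both outputs above list all the "-KM" values first, whereas the question asks for the suffixes to alternate within each number. A minimal sketch of two ways to get that order follows; the result names by_rep and by_grid are illustrative only.

```r
# Repeat each number twice; paste0 recycles the two suffixes along the result
by_rep <- paste0(rep(1:10, each = 2), c("-KM", "-COX"))
head(by_rep, 4)
# [1] "1-KM"  "1-COX" "2-KM"  "2-COX"

# Same idea with expand.grid: the first argument varies fastest,
# so putting the suffixes first makes them alternate
grid <- expand.grid(suffix = c("-KM", "-COX"), number = 1:10)
by_grid <- paste0(grid$number, grid$suffix)
```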

How to add leading zeros in a dataframe [duplicate]

This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 2 years ago.
I'm trying to change the format of my data. I have a centre number which ranges from 1 to 15 and a participant number which ranges from 1 to about 3000.
I would like them to start with zeros, so that the centre-number will have two digits and the participant-number will have 4 digits. (For example participant number 1 would then be 0001).
Thank you!
You can use the str_pad function in the 'stringr' package.
library(stringr)
values <- c(1, 5, 23, 123, 43, 7)
str_pad(values, 3, pad='0')
Output:
[1] "001" "005" "023" "123" "043" "007"
In your case as you have two parts to your strings, you can apply the function like this to pad your strings correctly.
# dummy data
centre_participants <- c('1-347', '13-567', '9-7', '15-2507')
# split the strings on "-"
centre_participants <- strsplit(centre_participants, '-')
# apply the right string padding to each component and join together
centre_participants <- sapply(centre_participants, function(x)
paste0(str_pad(x[1], 2, pad='0'),'-',str_pad(x[2], 4, pad='0')))
Output:
[1] "01-0347" "13-0567" "09-0007" "15-2507"
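If the centre and participant numbers are stored as separate integer columns rather than as one combined string, a single sprintf call can pad both parts at once. This is a sketch under that assumption; the vector names centre and participant are made up for illustration.

```r
# Assumed layout: two integer vectors of equal length
centre      <- c(1L, 13L, 9L, 15L)
participant <- c(347L, 567L, 7L, 2507L)

# %02d pads the centre to 2 digits, %04d pads the participant to 4 digits
ids <- sprintf("%02d-%04d", centre, participant)
ids
# [1] "01-0347" "13-0567" "09-0007" "15-2507"
```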

Finding number of r's in the vector (Both R and r) before the first u

rquote <- "R's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
In the above code, we need to find the number of r's (both R and r) in rquote.
You could use substrings.
## find position of first 'u'
u1 <- regexpr("u", rquote, fixed = TRUE)
## get count of all 'r' or 'R' before 'u1'
lengths(gregexpr("r", substr(rquote, 1, u1), ignore.case = TRUE))
# [1] 5
This follows what you ask for in the title of the post. If you want the count of all the "r", case insensitive, then simplify the above to
lengths(gregexpr("r", rquote, ignore.case = TRUE))
# [1] 6
Then there's always stringi
library(stringi)
## count before first 'u'
stri_count_regex(stri_sub(rquote, 1, stri_locate_first_regex(rquote, "u")[,1]), "r|R")
# [1] 5
## count all R or r
stri_count_regex(rquote, "r|R")
# [1] 6
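One caveat with the lengths(gregexpr(...)) idiom: when there is no match, gregexpr returns -1, so the length is 1 rather than 0. A minimal sketch of a variant that goes through regmatches and therefore reports 0 in that case; the helper name count_r is hypothetical.

```r
rquote <- "R's internals are irrefutably intriguing"

# Hypothetical helper: extract the matches first, then count them
count_r <- function(s) {
  lengths(regmatches(s, gregexpr("r", s, ignore.case = TRUE)))
}

count_r(rquote)
# [1] 6
count_r("xyz")
# [1] 0

# The plain idiom reports 1 here, because the result holds a single -1
lengths(gregexpr("r", "xyz", ignore.case = TRUE))
# [1] 1
```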
To get the number of R's before the first u, you need to make an intermediate step. (You probably don't need to. I'm sure akrun knows some incredibly cool regular expression to get the job done, but it won't be as easy to understand as this).
rquote <- "R's internals are irrefutably intriguing"
before_u <- gsub("u[[:print:]]+$", "", rquote)
length(stringr::str_extract_all(before_u, "(R|r)")[[1]])
You may try this,
> length(str_extract_all(rquote, '[Rr]')[[1]])
[1] 6
To get the count of all r's before the first u
> length(str_extract_all(rquote, perl('u.*(*SKIP)(*F)|[Rr]'))[[1]])
[1] 5
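The perl() wrapper comes from older stringr releases and, as far as I know, is not available in current versions, so the call above may not run as written. A rough base R equivalent of the same pattern, relying on perl = TRUE so that the PCRE verbs (*SKIP)(*F) are understood, might look like this:

```r
rquote <- "R's internals are irrefutably intriguing"

# From the first 'u' onward the pattern matches and is then discarded
# via (*SKIP)(*F); before it, only R or r can match
m <- gregexpr("u.*(*SKIP)(*F)|[Rr]", rquote, perl = TRUE)

# Extract the matches and count them
lengths(regmatches(rquote, m))
# [1] 5
```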
EDIT: I just noticed the "before the first u" requirement. In that case, we can get the position of the first 'u' with either which or match.
Then, using the strsplit output from the OP's code, apply grepl with ignore.case=TRUE to 'chars' up to that position (ind) to get a logical index of 'r'/'R', and sum it.
ind <- which(chars=='u')[1]
Or
ind <- match('u', chars)
sum(grepl('r', chars[seq(ind)], ignore.case=TRUE))
#[1] 5
Or we can use two substitutions on the original string ('rquote'). The inner sub removes everything from the first u to the end of the string (u.*$), and the outer gsub replaces every character other than R or r ([^Rr]) with ''. nchar then counts the characters that remain.
nchar(gsub('[^Rr]', '', sub('u.*$', '', rquote)))
#[1] 5
Or, if we want to count the 'r' in the entire string, use gregexpr to get the positions of the matching characters in the original string ('rquote') and take the length:
length(gregexpr('[rR]', rquote)[[1]])
#[1] 6
