How to add leading zeros in a dataframe [duplicate] - r

I'm trying to change the format of my data. I have a centre number that runs from 1 to 15 and a participant number that runs from 1 to about 3000.
I would like them to start with zeros, so that the centre-number will have two digits and the participant-number will have 4 digits. (For example participant number 1 would then be 0001).
Thank you!

You can use the str_pad function in the 'stringr' package.
library(stringr)
values <- c(1, 5, 23, 123, 43, 7)
str_pad(values, 3, pad='0')
Output:
[1] "001" "005" "023" "123" "043" "007"
In your case as you have two parts to your strings, you can apply the function like this to pad your strings correctly.
# dummy data
centre_participants <- c('1-347', '13-567', '9-7', '15-2507')
# split the strings on "-"
centre_participants <- strsplit(centre_participants, '-')
# apply the right string padding to each component and join together
centre_participants <- sapply(centre_participants, function(x)
paste0(str_pad(x[1], 2, pad='0'),'-',str_pad(x[2], 4, pad='0')))
Output:
[1] "01-0347" "13-0567" "09-0007" "15-2507"
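Since the question mentions a data frame, here is a minimal base R sketch of the same idea using sprintf on separate columns. The data frame df and the column names centre and participant are assumptions, because the question does not show the actual data.

```r
# hypothetical data; the column names are assumptions
df <- data.frame(centre = c(1, 13, 9, 15),
                 participant = c(347, 567, 7, 2507))

# %02d / %04d zero-pad an integer to 2 / 4 digits
df$centre_id <- sprintf("%02d", as.integer(df$centre))
df$participant_id <- sprintf("%04d", as.integer(df$participant))

df$centre_id
# [1] "01" "13" "09" "15"
df$participant_id
# [1] "0347" "0567" "0007" "2507"
```

The padded values are character strings; converting them back to numbers drops the zeros again.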

Related

Padding lost zeros not universally in a column [duplicate]

I have a list of 5-digit US postal zip codes, but some have lost their leading zeros. How do I add those zeros back in while leaving the codes that never had leading zeros intact? I tried formatC, sprintf, and str_pad, and none of them worked, because I am not adding 0s to all values.
We can use sprintf
sprintf('%05d', as.integer(zipcodes))
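For a self-contained check, assume zipcodes is a character vector in which some entries lost their leading zeros (the values below are made up for illustration):

```r
# made-up zip codes; only the short ones should gain zeros
zipcodes <- c("1234", "501", "90210")

# %05d pads to width 5 with zeros and leaves 5-digit values alone
padded <- sprintf("%05d", as.integer(zipcodes))
padded
# [1] "01234" "00501" "90210"
```

Values that already have 5 digits come back unchanged, so the padding does not need to be applied selectively.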
In which way did str_pad not work?
https://www.rdocumentation.org/packages/stringr/versions/1.4.0/topics/str_pad
df<-data.frame(zip=c(1,22,333,4444,55555))
df$zip <- stringr::str_pad(df$zip, width=5, pad = "0")
[1] "00001" "00022" "00333" "04444" "55555"
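Update:
Following the valuable comment from r2evans:
My solution is not very efficient, and to get a leading 0 the paste0 part has to be modified slightly. Note that it prepends only a single zero, so it only suits values that are missing exactly one digit. Here it is with a data frame example: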
sapply(df$zip, function(x){if(nchar(x)<5){paste0(0,x)}else{x}})
data:
library(tibble)
df <- tribble(
~zip,
7889,
2345,
45567,
4394,
34566,
4392,
4599)
df
Output:
[1] "07889" "02345" "45567" "04394" "34566" "04392" "04599"
First answer:
This will add a trailing zero to each integer with fewer than 5 digits,
where zip is a vector:
sapply(zip, function(x){if(nchar(x)<5){paste0(x,0)}else{x}})
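To illustrate the limit of the single-zero approach, here is a small sketch comparing it with formatC, which pads to a fixed width. The vector zip below is made up, and the 3-digit value is included on purpose.

```r
# made-up values, including one that is missing two digits
zip <- c(7889, 45567, 123)

# prepends exactly one zero when the value has fewer than 5 characters
one_zero <- sapply(zip, function(x) {
  if (nchar(x) < 5) paste0(0, x) else as.character(x)
})
one_zero
# [1] "07889" "45567" "0123"

# pads with as many zeros as needed to reach width 5
full_pad <- formatC(zip, width = 5, format = "d", flag = "0")
full_pad
# [1] "07889" "45567" "00123"
```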
If they start as strings and you don't want to (or cannot) convert to integers first, then an alternative to sprintf is
vec <- c('1','11','11111')
paste0(strrep('0', pmax(0, 5 - nchar(vec))), vec)
# [1] "00001" "00011" "11111"
This will handle strings of any length, and is a no-op for strings of 5 or more characters.
In a frame, that would be
dat$colname <- paste0(strrep('0', pmax(0, 5 - nchar(dat$colname))), dat$colname)
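A quick worked example of the data frame version, assuming a frame dat with a character column zip (both names and all values are hypothetical):

```r
# hypothetical frame; zip is stored as character
dat <- data.frame(zip = c("501", "2138", "90210", "123456"),
                  stringsAsFactors = FALSE)

# number of zeros needed per element, never negative
n_missing <- pmax(0, 5 - nchar(dat$zip))

# strrep is vectorized over its 'times' argument
dat$zip <- paste0(strrep("0", n_missing), dat$zip)
dat$zip
# [1] "00501"  "02138"  "90210"  "123456"
```

The 6-character entry is left as it is, which matches the no-op behaviour described above.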

I want to combine a concatenation of two lists [duplicate]

I need to create a vector combining the numbers c(1:10) and the terms c("-KM","-COX"), so that it would turn out like this:
c("1-KM", "1-COX", "2-KM", "2-COX", "3-KM", "3-COX", ...)
I have tried using expand.grid to do that, however it returns a data frame, and I would need it to return a vector. Any help in how I could do that?
Try this version:
apply(expand.grid(v1, v2), 1, function(x) trimws(paste0(x[1], x[2])))
[1] "1-KM" "2-KM" "3-KM" "4-KM" "5-KM" "6-KM" "7-KM" "8-KM"
[9] "9-KM" "10-KM" "1-COX" "2-COX" "3-COX" "4-COX" "5-COX" "6-COX"
[17] "7-COX" "8-COX" "9-COX" "10-COX"
Data:
v1 <- c(1:10)
v2 <- c("-KM", "-COX")
After expand.grid you can use paste to get a vector from the returned data.frame.
do.call(paste0, expand.grid(1:10, c("-KM","-COX")))
# [1] "1-KM" "2-KM" "3-KM" "4-KM" "5-KM" "6-KM" "7-KM" "8-KM"
# [9] "9-KM" "10-KM" "1-COX" "2-COX" "3-COX" "4-COX" "5-COX" "6-COX"
#[17] "7-COX" "8-COX" "9-COX" "10-COX"
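Both outputs above list every "-KM" value before the "-COX" values. If the interleaved order shown in the question matters, one possible sketch (not taken from the answers above) repeats each number once per suffix and lets paste0 recycle the suffixes:

```r
v1 <- 1:10
v2 <- c("-KM", "-COX")

# each number appears length(v2) times in a row; v2 is recycled across them
interleaved <- paste0(rep(v1, each = length(v2)), v2)
head(interleaved, 4)
# [1] "1-KM"  "1-COX" "2-KM"  "2-COX"
```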

change numbers in string vector [duplicate]

I have a string vector that includes numbers, like this:
x <- c("abc122", "73dj", "lo7833ll")
x
[1] "abc122" "73dj" "lo7833ll"
I want to replace the numbers in the x vector with numbers I have stored in another vector:
right_numbers <- c(500, 700, 23)
> right_numbers
[1] 500 700 23
How can I do this even if the numbers are in different positions in the string (some at the beginning, some at the end)?
This is how the x vector should look after the changes:
> x
[1] "abc500" "700dj" "lo23ll"
A vectorized solution with stringr -
str_replace(x, "[0-9]+", as.character(right_numbers))
[1] "abc500" "700dj" "lo23ll"
Possibly a more efficient version with the stringi package, thanks to @sindri_baldur -
stri_replace_first_regex(x, '[0-9]+', right_numbers)
[1] "abc500" "700dj" "lo23ll"
Here is an idea,
mapply(function(i, y)gsub('[0-9]+', y, i), x, right_numbers)
# abc122 73dj lo7833ll
#"abc500" "700dj" "lo23ll"
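If you would rather avoid extra packages, a base R sketch with regexpr and the regmatches replacement function gives the same result. It assumes every element of x contains at least one run of digits, because the replacement vector must have one value per match.

```r
x <- c("abc122", "73dj", "lo7833ll")
right_numbers <- c(500, 700, 23)

# locate the first run of digits in each element
m <- regexpr("[0-9]+", x)

# replace each match in place; one replacement per matched element
regmatches(x, m) <- as.character(right_numbers)
x
# [1] "abc500" "700dj"  "lo23ll"
```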

slicing and replacing a number in R

I have a number like this example:
fferc114
and would like to:
1- remove the first 3 elements
2- keep the 2nd 3 elements
the expected output would look like this:
expected output:
dfer**
I am trying to do that in R using the following code, but it does not return what I want. Do you know how to fix it?
trying to that
You can try:
x <- "E431250000326"
paste0(substr(x, 4, 6), "-", substr(x, 11, nchar(x)))
[1] "125-326"
Or if you want to subtract the numbers:
as.numeric(substr(x, 4, 6)) - as.numeric(substr(x, 11, nchar(x)))
A regex approach
string <- "E431250000326"
sub(".{3}(.{3})(.{4})(.{3})", "\\1-\\3", string)
#[1] "125-326"
As described in the question, this removes the first 3 elements, selects the next 3 elements (using a capture group), replaces the next 4 elements with "-", and selects the next 3 elements.
We can also match digits specifically inside the capture groups:
sub(".{3}(\\d{3})\\d{4}(\\d{3})", "\\1-\\2", string)
#[1] "125-326"
data
string <- "E431250000326"
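Because substr is vectorized, the positional approach also works on a whole vector at once. A short sketch, assuming every ID has the same 13-character layout as the example (the second ID is made up):

```r
# the second value is hypothetical, same layout as the first
ids <- c("E431250000326", "E439990000001")

# characters 4-6 and 11-13, joined with "-"
parts <- paste0(substr(ids, 4, 6), "-", substr(ids, 11, 13))
parts
# [1] "125-326" "999-001"
```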

Count string length and remove characters if a certain length [duplicate]

There are functions in Excel called left, right, and mid, which extract part of the entry in a cell. For example, =left(A1, 3) returns the 3 leftmost characters in cell A1, and =mid(A1, 3, 4) starts at the third character in cell A1 and returns characters 3 to 6. Are there similar functions in R, or similarly straightforward ways to do this?
As a simplified sample problem I would like to take a vector
sample<-c("TRIBAL","TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")
and create 3 new vectors that contain the first 3 characters in each entry, the middle 2 characters in each entry, and the last 4 characters in each entry.
A slightly more complicated question that Excel doesn't have a function for (that I know of) would be how to create a new vector with the 1st, 3rd, and 5th characters from each entry.
You are looking for the function substr or its close relative substring.
The leading characters are straightforward:
substr(sample, 1, 3)
[1] "TRI" "TRI" "RHO" "EUG" "BYR"
So is extracting some characters at a defined position:
substr(sample, 2, 3)
[1] "RI" "RI" "HO" "UG" "YR"
To get the trailing characters, you have two options:
substr(sample, nchar(sample)-3, nchar(sample))
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
substring(sample, nchar(sample)-3)
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
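If you want Excel-style names, the substr calls above can be wrapped in small helpers. The functions left, right, and mid below are hypothetical wrappers, not part of base R, and the vector is named codes here only to avoid masking base::sample.

```r
codes <- c("TRIBAL", "TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")

# hypothetical helpers mirroring Excel's LEFT, RIGHT and MID
left  <- function(x, n) substr(x, 1, n)
right <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))
mid   <- function(x, start, n) substr(x, start, start + n - 1)

left(codes, 3)
# [1] "TRI" "TRI" "RHO" "EUG" "BYR"
mid(codes, 3, 2)
# [1] "IB" "IS" "OS" "GF" "RR"
right(codes, 4)
# [1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
```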
And your final "complicated" question:
characters <- function(x, pos){
  sapply(x, function(x)
    paste(sapply(pos, function(i) substr(x, i, i)), collapse=""))
}
characters(sample, c(1,3,5))
TRIBAL TRISTO RHOSTO EUGFRI BYRRAT
"TIA" "TIT" "ROT" "EGR" "BRA"

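An alternative sketch for the 1st, 3rd, and 5th characters splits each string into single characters and indexes them. It assumes every string has at least 5 characters; the names codes and picked are mine, and the result is unnamed, unlike the output of characters().

```r
codes <- c("TRIBAL", "TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")

# strsplit with "" yields one vector of single characters per string
picked <- sapply(strsplit(codes, ""), function(ch) {
  paste(ch[c(1, 3, 5)], collapse = "")
})
picked
# [1] "TIA" "TIT" "ROT" "EGR" "BRA"
```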