Slicing and replacing a number in R

I have a number like this example:
fferc114
and would like to:
1- remove the first 3 elements
2- keep the next 3 elements
The expected output would look like this:
dfer**
I am trying to do that in R, but my code does not return what I want. Do you know how to fix it?

You can try:
x <- "E431250000326"
paste0(substr(x, 4, 6), "-", substr(x, 11, nchar(x)))
[1] "125-326"
Or if you want to subtract the numbers:
as.numeric(substr(x, 4, 6)) - as.numeric(substr(x, 11, nchar(x)))
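Since substr() is vectorised, the same expression can be wrapped in a small helper and applied to a whole vector of IDs. This is only a sketch, assuming every ID follows the same 13-character layout as the example; the name slice_id and the second ID are made up for illustration:

```r
# Hypothetical helper: keep characters 4-6, then everything from character 11 on.
slice_id <- function(x) {
  paste0(substr(x, 4, 6), "-", substr(x, 11, nchar(x)))
}

ids <- c("E431250000326", "E439990000001")  # second ID is invented test data
slice_id(ids)
#[1] "125-326" "999-001"
```

If the IDs can differ in length, the fixed start position 11 would need to be computed from nchar(x) instead.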

A regex approach
string <- "E431250000326"
sub(".{3}(.{3})(.{4})(.{3})", "\\1-\\3", string)
#[1] "125-326"
As described in the question, this drops the first 3 characters, captures the next 3 (first capture group), replaces the following 4 with "-", and captures the 3 characters after that (third capture group).
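To see what each capture group actually matched, one option is to pull the groups out with regexec() and regmatches() from base R. A small sketch using the same pattern; the variable name parts is just for illustration:

```r
string <- "E431250000326"

# regexec() reports the full match first, followed by one entry per capture group
parts <- regmatches(string, regexec(".{3}(.{3})(.{4})(.{3})", string))[[1]]
parts
# full match, then the groups "125", "0000" and "326"

# Rebuild the result from groups 1 and 3 (elements 2 and 4 of parts)
paste(parts[2], parts[4], sep = "-")
#[1] "125-326"
```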

We can also match digits specifically, capturing only the two groups we want to keep:
sub(".{3}(\\d{3})\\d{4}(\\d{3})", "\\1-\\2", string)
#[1] "125-326"
data
string <- "E431250000326"

Related

How to add leading zeros in a dataframe [duplicate]

I'm trying to change the format of my data. I have a centre number that ranges from 1 to 15 and a participant number that ranges from 1 to about 3000.
I would like them to be padded with leading zeros, so that the centre number has two digits and the participant number has four digits. (For example, participant number 1 would then be 0001.)
Thank you!
You can use the str_pad function in the 'stringr' package.
library(stringr)
values <- c(1, 5, 23, 123, 43, 7)
str_pad(values, 3, pad='0')
Output:
[1] "001" "005" "023" "123" "043" "007"
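If you would rather not load stringr, base R can do the same padding. A sketch with formatC() and sprintf(), assuming the values are whole numbers (sprintf's %d needs integers, hence the as.integer() call):

```r
values <- c(1, 5, 23, 123, 43, 7)

# formatC: pad to width 3 with zeros, formatting as integers
padded_formatc <- formatC(values, width = 3, format = "d", flag = "0")

# sprintf: "%03d" means an integer, at least 3 wide, zero-padded
padded_sprintf <- sprintf("%03d", as.integer(values))

padded_sprintf
#[1] "001" "005" "023" "123" "043" "007"
```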
In your case as you have two parts to your strings, you can apply the function like this to pad your strings correctly.
# dummy data
centre_participants <- c('1-347', '13-567', '9-7', '15-2507')
# split the strings on "-"
centre_participants <- strsplit(centre_participants, '-')
# apply the right string padding to each component and join together
centre_participants <- sapply(centre_participants, function(x)
  paste0(str_pad(x[1], 2, pad='0'), '-', str_pad(x[2], 4, pad='0')))
Output:
[1] "01-0347" "13-0567" "09-0007" "15-2507"
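Since the question is about a data frame, here is a sketch of the same idea when the two numbers live in separate columns. The data frame df and its column names are assumptions for illustration, not taken from the question:

```r
# Hypothetical data frame with one column per number
df <- data.frame(centre = c(1, 13, 9, 15),
                 participant = c(347, 567, 7, 2507))

# Pad each column to its own width, then join the two parts
df$centre_id <- sprintf("%02d", as.integer(df$centre))
df$participant_id <- sprintf("%04d", as.integer(df$participant))
df$id <- paste(df$centre_id, df$participant_id, sep = "-")

df$id
#[1] "01-0347" "13-0567" "09-0007" "15-2507"
```

Note that the padded columns are character, not numeric, so keep the original columns around if you still need to do arithmetic on them.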

R: Sorting a vector alphabetically after nth character

I would like to sort the elements (string) of a vector alphabetically, but only considering characters after the nth. The strings can contain both numbers and characters, for example:
> v <- c("ENCSR529JNJ_HNR35NPK_21_K562", "ENCSR529MBZ_AP22IG_11_K562", "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562", "ENCSR530BOP_DUPT6H_21_K562")
and after sorting after the 11th character, v would become:
"ENCSR529MBZ_AP22IG_11_K562", "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562", "ENCSR530BOP_DUPT6H_21_K562", "ENCSR529JNJ_HNR35NPK_21_K562"
Any help will be greatly appreciated! Thanks
v[order(substr(v, start = 12, stop = max(nchar(v))))]
# [1] "ENCSR529MBZ_AP22IG_11_K562" "ENCSR529MBZ_AP22IG_21_K562" "ENCSR530BOP_DUPT6H_11_K562" "ENCSR530BOP_DUPT6H_21_K562"
# [5] "ENCSR529JNJ_HNR35NPK_21_K562"
substr(v, start = 12, stop = max(nchar(v))) gives the substring omitting the first 11 characters. So we order by that.
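The same idea can be wrapped in a small function so that n is a parameter. A sketch only; sort_after is a made-up name, and it uses substring(), whose end position defaults to the end of the string, so max(nchar(v)) is not needed:

```r
# Hypothetical helper: sort v by whatever follows the first n characters
sort_after <- function(v, n) {
  v[order(substring(v, n + 1))]
}

v <- c("ENCSR529JNJ_HNR35NPK_21_K562", "ENCSR529MBZ_AP22IG_11_K562",
       "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562",
       "ENCSR530BOP_DUPT6H_21_K562")

sort_after(v, 11)
```

Strings that share the same tail keep their original relative order, because order() breaks ties by position.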

Replace character at certain location within string

Given a certain string, e.g., s = "tesX123", how can I replace a certain character at a certain location?
In this example, the character at position 4 should be changed to "t".
Does a method exist in the style of setChar(s, 4, "t") which would result in test123?
Try substr()
substr(s, 4, 4) <- "t"
> s
#[1] "test123"
We can use sub
sub("(.{3}).", "\\1t", s)
#[1] "test123"
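If you want something that reads like the setChar(s, 4, "t") from the question, the substr() assignment can be wrapped in a function. A minimal sketch; set_char is a hypothetical name, not an existing R function:

```r
# Hypothetical helper: return a copy of s with the character at pos replaced by ch
set_char <- function(s, pos, ch) {
  substr(s, pos, pos) <- ch  # modifies the local copy only
  s
}

s <- "tesX123"
set_char(s, 4, "t")
#[1] "test123"
s
#[1] "tesX123"
```

Unlike the direct assignment, this leaves the original s unchanged and returns the modified string.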

Extracting characters from entries in a vector in R

There are functions in Excel called left, right, and mid, with which you can extract part of the entry from a cell. For example, =left(A1, 3) would return the 3 leftmost characters in cell A1, and =mid(A1, 3, 4) would start with the third character in cell A1 and give you characters 3 to 6. Are there similar functions in R, or similarly straightforward ways to do this?
As a simplified sample problem I would like to take a vector
sample<-c("TRIBAL","TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")
and create 3 new vectors that contain the first 3 characters in each entry, the middle 2 characters in each entry, and the last 4 characters in each entry.
A slightly more complicated question that Excel doesn't have a function for (that I know of) would be how to create a new vector with the 1st, 3rd, and 5th characters from each entry.
You are looking for the function substr or its close relative substring.
The leading characters are straightforward:
substr(sample, 1, 3)
[1] "TRI" "TRI" "RHO" "EUG" "BYR"
So is extracting some characters at a defined position:
substr(sample, 2, 3)
[1] "RI" "RI" "HO" "UG" "YR"
To get the trailing characters, you have two options:
substr(sample, nchar(sample)-3, nchar(sample))
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
substring(sample, nchar(sample)-3)
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
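If you miss the Excel names, the calls above can be wrapped into look-alike helpers. This is just a sketch: left, right and mid are not base R functions, and the vector is called codes here to avoid masking base::sample:

```r
# Hypothetical Excel-style wrappers around substr()
left  <- function(x, n) substr(x, 1, n)
right <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))
mid   <- function(x, start, n) substr(x, start, start + n - 1)

codes <- c("TRIBAL", "TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")

left(codes, 3)
#[1] "TRI" "TRI" "RHO" "EUG" "BYR"
mid(codes, 3, 2)
#[1] "IB" "IS" "OS" "GF" "RR"
right(codes, 4)
#[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
```

As in Excel, mid takes a start position and a number of characters, whereas substr takes a start and a stop position.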
And your final "complicated" question:
characters <- function(x, pos){
  sapply(x, function(x)
    paste(sapply(pos, function(i) substr(x, i, i)), collapse=""))
}
characters(sample, c(1,3,5))
TRIBAL TRISTO RHOSTO EUGFRI BYRRAT
 "TIA"  "TIT"  "ROT"  "EGR"  "BRA"

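Another way to pick characters at arbitrary positions is to split each string into single characters and index the result. A sketch under the assumption that every position exists in every string (a position past the end would show up as "NA" in the output); pick_chars and codes are made-up names:

```r
# Hypothetical helper: split into characters, keep the requested positions, re-join
pick_chars <- function(x, pos) {
  vapply(strsplit(x, ""),
         function(ch) paste(ch[pos], collapse = ""),
         character(1))
}

codes <- c("TRIBAL", "TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")
pick_chars(codes, c(1, 3, 5))
#[1] "TIA" "TIT" "ROT" "EGR" "BRA"
```

This version returns an unnamed character vector, which is usually easier to assign to a data frame column.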