I have a number like this example:
fferc114
and would like to:
1- remove the first 3 elements
2- keep the next 3 elements
The expected output would look like this:
dfer**
I am trying to do that in R, but my code does not return what I want. Do you know how to fix it?
You can try:
x <- "E431250000326"
paste0(substr(x, 4, 6), "-", substr(x, 11, nchar(x)))
[1] "125-326"
Or if you want to subtract the numbers:
as.numeric(substr(x, 4, 6)) - as.numeric(substr(x, 11, nchar(x)))
[1] -201
A regex approach
string <- "E431250000326"
sub(".{3}(.{3})(.{4})(.{3})", "\\1-\\3", string)
#[1] "125-326"
As described in the question, this drops the first 3 characters, captures the next 3, skips the following 4, and captures the last 3. The replacement joins the first and third capture groups with "-".
We can also match digits specifically, so that only two capture groups are needed:
sub(".{3}(\\d{3})\\d{4}(\\d{3})", "\\1-\\2", string)
#[1] "125-326"
data
string <- "E431250000326"
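One caveat with the regex approach is that sub() returns its input unchanged when the pattern does not match. A minimal sketch of a guard against that follows; the helper name extract_parts and the anchored pattern are assumptions for illustration, not part of the answers above.

```r
# Hypothetical helper (not from the answers above): drop 3 characters,
# keep 3 digits, skip 4 digits, keep the last 3 digits.
extract_parts <- function(x) {
  pattern <- "^.{3}(\\d{3})\\d{4}(\\d{3})$"
  # sub() leaves non-matching strings untouched, so return NA for them instead
  ifelse(grepl(pattern, x), sub(pattern, "\\1-\\2", x), NA_character_)
}

extract_parts(c("E431250000326", "too short"))
# [1] "125-326" NA
```

Anchoring with ^ and $ is a design choice here: it rejects strings that are longer than the expected 13 characters rather than silently rewriting part of them.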
I'm trying to change the format of my data. I have a centre number that ranges from 1 to 15 and a participant number that ranges from 1 to about 3000.
I would like both to be padded with leading zeros, so that the centre number has two digits and the participant number has four digits. (For example, participant number 1 would become 0001.)
Thank you!
You can use the str_pad function in the 'stringr' package.
library(stringr)
values <- c(1, 5, 23, 123, 43, 7)
str_pad(values, 3, pad='0')
Output:
[1] "001" "005" "023" "123" "043" "007"
Since your strings have two parts, you can split them, pad each part to its own width, and join them back together:
# dummy data
centre_participants <- c('1-347', '13-567', '9-7', '15-2507')
# split the strings on "-"
centre_participants <- strsplit(centre_participants, '-')
# apply the right string padding to each component and join together
centre_participants <- sapply(centre_participants, function(x)
  paste0(str_pad(x[1], 2, pad='0'), '-', str_pad(x[2], 4, pad='0')))
Output:
[1] "01-0347" "13-0567" "09-0007" "15-2507"
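If the centre and participant numbers are still held as separate numeric columns, base R's sprintf can produce the same padding without an extra package. This is a sketch under that assumption; the vectors centre and participant are made-up example data.

```r
# Assumed example data: centre and participant numbers stored as integers
centre <- c(1L, 13L, 9L, 15L)
participant <- c(347L, 567L, 7L, 2507L)

# %02d pads to width 2 with zeros, %04d pads to width 4
ids <- sprintf("%02d-%04d", centre, participant)
ids
# [1] "01-0347" "13-0567" "09-0007" "15-2507"
```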
I would like to sort the elements (strings) of a vector alphabetically, but considering only the characters after the nth. The strings can contain both digits and letters, for example:
> v <- c("ENCSR529JNJ_HNR35NPK_21_K562", "ENCSR529MBZ_AP22IG_11_K562", "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562", "ENCSR530BOP_DUPT6H_21_K562")
After sorting on the characters that follow the 11th, v would become:
"ENCSR529MBZ_AP22IG_11_K562", "ENCSR529MBZ_AP22IG_21_K562", "ENCSR530BOP_DUPT6H_11_K562", "ENCSR530BOP_DUPT6H_21_K562", "ENCSR529JNJ_HNR35NPK_21_K562"
Any help will be greatly appreciated! Thanks
v[order(substr(v, start = 12, stop = max(nchar(v))))]
# [1] "ENCSR529MBZ_AP22IG_11_K562" "ENCSR529MBZ_AP22IG_21_K562" "ENCSR530BOP_DUPT6H_11_K562" "ENCSR530BOP_DUPT6H_21_K562"
# [5] "ENCSR529JNJ_HNR35NPK_21_K562"
substr(v, start = 12, stop = max(nchar(v))) returns each string without its first 11 characters, and order() on those substrings gives the permutation used to index v.
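The same idea can be wrapped in a small function for an arbitrary n. This is only a sketch: the name sort_after is hypothetical, and it uses substring(), whose end position defaults to the end of the string, in place of the explicit stop argument.

```r
# Hypothetical helper: sort strings by everything after the first n characters
sort_after <- function(v, n) {
  # substring() reads to the end of each string when no end is given
  v[order(substring(v, n + 1))]
}

v <- c("ENCSR529JNJ_HNR35NPK_21_K562", "ENCSR529MBZ_AP22IG_11_K562")
sort_after(v, 11)
# [1] "ENCSR529MBZ_AP22IG_11_K562"   "ENCSR529JNJ_HNR35NPK_21_K562"
```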
There are functions in Excel called left, right, and mid, where you can extract part of the entry from a cell. For example, =left(A1, 3) would return the 3 leftmost characters in cell A1, and =mid(A1, 3, 4) would start with the third character in cell A1 and give you characters 3 to 6. Are there similar functions in R or similarly straightforward ways to do this?
As a simplified sample problem I would like to take a vector
sample<-c("TRIBAL","TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")
and create 3 new vectors that contain the first 3 characters in each entry, the middle 2 characters in each entry, and the last 4 characters in each entry.
A slightly more complicated question that Excel doesn't have a function for (that I know of) would be how to create a new vector with the 1st, 3rd, and 5th characters from each entry.
You are looking for the function substr or its close relative substring:
The leading characters are straightforward:
substr(sample, 1, 3)
[1] "TRI" "TRI" "RHO" "EUG" "BYR"
So is extracting some characters at a defined position:
substr(sample, 2, 3)
[1] "RI" "RI" "HO" "UG" "YR"
To get the trailing characters, you have two options:
substr(sample, nchar(sample)-3, nchar(sample))
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
substring(sample, nchar(sample)-3)
[1] "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
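If you prefer the Excel names, the calls above can be wrapped in thin helpers. The functions left, right and mid below are hypothetical wrappers around substr, not part of base R, and mid takes a start position and a length as in Excel.

```r
# Hypothetical helpers named after the Excel functions; not part of base R
left  <- function(x, n) substr(x, 1, n)
right <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))
mid   <- function(x, start, n) substr(x, start, start + n - 1)

sample <- c("TRIBAL", "TRISTO", "RHOSTO", "EUGFRI", "BYRRAT")
left(sample, 3)    # "TRI" "TRI" "RHO" "EUG" "BYR"
mid(sample, 3, 2)  # "IB" "IS" "OS" "GF" "RR"
right(sample, 4)   # "IBAL" "ISTO" "OSTO" "GFRI" "RRAT"
```

Because substr is vectorised over its first argument, all three helpers work on whole vectors without an explicit loop.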
And your final "complicated" question:
characters <- function(x, pos){
  sapply(x, function(x)
    paste(sapply(pos, function(i) substr(x, i, i)), collapse=""))
}
characters(sample, c(1,3,5))
TRIBAL TRISTO RHOSTO EUGFRI BYRRAT
"TIA" "TIT" "ROT" "EGR" "BRA"
Given a certain string, e.g., s = "tesX123", how can I replace a certain character at a certain location?
In this example, the character at position 4 should be changed to "t".
Does a method exist in the style of setChar(s, 4, "t") which would result in test123?
Try substr()
substr(s, 4, 4) <- "t"
> s
#[1] "test123"
We can use sub
sub("(.{3}).", "\\1t", s)
#[1] "test123"
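For something closer to the setChar(s, 4, "t") style from the question, the substr replacement form can be wrapped in a function. This is a minimal sketch; setChar is a hypothetical helper, not an existing R function.

```r
# Hypothetical helper mirroring the setChar(s, 4, "t") call from the question
setChar <- function(s, pos, ch) {
  # The replacement form of substr modifies the local copy, so return it explicitly
  substr(s, pos, pos) <- ch
  s
}

s <- "tesX123"
setChar(s, 4, "t")
# [1] "test123"
s
# [1] "tesX123"
```

Unlike the direct assignment, the wrapper leaves the original s unchanged, so the result has to be assigned if you want to keep it.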