Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a set of strings in R of the form "X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt". I want to keep only the first occurrence of each string, where strings are grouped by the 4th field (the numeric id). For example, the set contains several strings starting with "X-Y-Z.3000."; I want only the first one with id = 3000, and likewise for the other ids.
For reproducibility:
X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
X-Y-Z.3000.F.PP0016-C.A-SL-0433.F-N.fC-G.txt
X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
X-Y-Z.3008.F.PP0016-C.B-SX-0433.P-N.fC-G.txt
So in the end I would keep only the first and third strings:
X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
Extract the "4th field" (which is the 2nd field if we split on "."), then exclude duplicated items:
# data
x <- c("X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
"X-Y-Z.3000.F.PP0016-C.A-SL-0433.F-N.fC-G.txt",
"X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
"X-Y-Z.3008.F.PP0016-C.B-SX-0433.P-N.fC-G.txt")
x[!duplicated(sapply(strsplit(x, ".", fixed = TRUE), "[", 2))]
# [1] "X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt"
# [2] "X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt"
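The one-liner above can be unpacked into separate steps, which makes it easier to check which field is used as the key. This is only a sketch of the same idea; the names `fields`, `id` and `first_per_id` are illustrative and not part of the original answer.

```r
# data (same four strings as in the question)
x <- c("X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
       "X-Y-Z.3000.F.PP0016-C.A-SL-0433.F-N.fC-G.txt",
       "X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
       "X-Y-Z.3008.F.PP0016-C.B-SX-0433.P-N.fC-G.txt")

# split every string on a literal "." (fixed = TRUE disables regex matching)
fields <- strsplit(x, ".", fixed = TRUE)

# take the 2nd piece of each split as the id; vapply guarantees a character vector
id <- vapply(fields, function(f) f[2], character(1))

# duplicated() flags every repeat of an id after its first appearance
first_per_id <- x[!duplicated(id)]
first_per_id
```

Because `duplicated()` keeps the first element of each group, the result follows the original order of `x`.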
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a vector myvec. In each value, I want to keep only the part that begins with "NAC" and delete everything else, including the "_" separator and whatever is on the other side of it.
myvec = c("NAC1001_09ADAA", "TI09AA_NAC02111", "NACT10099_099AD")
Result I want:
NAC1001, NAC02111, NACT10099
What do I need to do for this?
We can use str_extract
library(stringr)
str_extract(myvec, '(?<=\\b|_)NACT?\\d+')
#[1] "NAC1001" "NAC02111" "NACT10099"
Or with sub from base R
sub(".*(NACT?\\d+).*", "\\1", myvec)
Split on underscore "_", then keep the one that starts with "N":
sapply(strsplit(myvec, "_"), function(i) i[ startsWith(i, "N") ])
# [1] "NAC1001" "NAC02111" "NACT10099"
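A slightly stricter variant of the split approach is sketched below. It tests for the full "NAC" prefix rather than just "N" and returns NA when no part qualifies, so the result always has one entry per input. The helper name `keep_nac` is hypothetical.

```r
myvec <- c("NAC1001_09ADAA", "TI09AA_NAC02111", "NACT10099_099AD")

# keep_nac() is an illustrative helper, not part of the original answers
keep_nac <- function(v) {
  # split each value on a literal underscore
  parts <- strsplit(v, "_", fixed = TRUE)
  # keep the first piece starting with "NAC"; indexing with [1] yields NA if none does
  vapply(parts, function(p) p[startsWith(p, "NAC")][1], character(1))
}

keep_nac(myvec)
# [1] "NAC1001"   "NAC02111"  "NACT10099"
```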
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Using the first.df data frame, separate the DoB column into 3 new columns (date, month, year) with the separate() function. I tried the last line below, but it does not give the desired result.
fname <- c("Martina", "Monica", "Stan", "Oscar")
lname <- c("Welch", "Sobers", "Griffith", "Williams")
DoB <- c("1-Oct-1980", "2-Nov-1982", "13-Dec-1979", "27-Jan-1988")
first.df <- data.frame(fname,lname,DoB)
print(first.df)
separate(first.df,DoB,c('date','month','year'),sep = '-')
Moved my comment to an actual answer.
To retain the original DoB column you need to add the remove = FALSE argument, and to discard one of the separated columns simply pass NA instead of a column name. Note that separate() comes from the tidyr package, which has to be loaded first. The correct command is then
library(tidyr)
separate(first.df, DoB, c(NA, 'month', 'year'), sep = '-', remove = FALSE)
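For comparison, the same split can be sketched in base R without tidyr. This is a swapped-in technique (strsplit plus rbind), not the separate() call the question asks for; it keeps all three pieces alongside the original DoB column. The names `parts` and `result` are illustrative.

```r
first.df <- data.frame(
  fname = c("Martina", "Monica", "Stan", "Oscar"),
  lname = c("Welch", "Sobers", "Griffith", "Williams"),
  DoB   = c("1-Oct-1980", "2-Nov-1982", "13-Dec-1979", "27-Jan-1988"),
  stringsAsFactors = FALSE
)

# split each DoB on "-" and stack the pieces into a 3-column character matrix
parts <- do.call(rbind, strsplit(first.df$DoB, "-", fixed = TRUE))
colnames(parts) <- c("date", "month", "year")

# bind the new columns to the original data frame; DoB is kept
result <- data.frame(first.df, parts, stringsAsFactors = FALSE)
result
```

The new columns are character vectors; convert them with as.integer() if numeric day and year values are needed.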
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have this string, for instance:
str1 = "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"
I would like to extract this substring, for instance:
TCGA-CV-7254
I tried something like this:
gsub(pattern = "(*.)(TCGA*)(.*)",
replacement = "\\2",
x = nameArq)
But it returns:
[1] "UNCID_999277TCGA"
Thanks for any help!
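You almost had it. In the first parentheses, the period needs to come first: ".*" means "match any character, any number of times", whereas "*." puts the quantifier before anything it could repeat. You also need some unique endpoint for the second part of your regex; here the "4" that ends "7254" works because no other "4" appears later in the string.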
gsub(pattern = "(.*)(TCGA.*4)(.*)",
replacement = "\\2",
x = str1)
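Relying on a literal "4" as the endpoint only works for this particular string. A sketch that does not depend on the last digit is shown below, under the assumption (not stated in the question) that the wanted substring is always "TCGA" followed by exactly two hyphen-separated fields. The variable name `barcode` is illustrative.

```r
str1 <- "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"

# "TCGA-", then two runs of non-hyphen characters separated by a hyphen
m <- regexpr("TCGA-[^-]+-[^-]+", str1)

# regmatches() returns the matched text itself instead of rewriting the whole string
barcode <- regmatches(str1, m)
barcode
# [1] "TCGA-CV-7254"
```

Extracting the match with regexpr() and regmatches() also avoids having to describe the text before and after it with extra capture groups.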
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I need to anonymize names but in a very specific way so that the format of the entire string is still the same (spaces, hyphens, periods are preserved) but all the letters are scrambled. I want to consistently replace say all A's with C's, all D's with Z's, and so on. How would I do that?
We can use chartr
chartr('AD', 'CZ', str1)
#[1] "CZ,ZC. C"
data
str1 <- c('AD,DA. C')
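To scramble every letter rather than just two, the same chartr call can be given a full permutation of the alphabet. The sketch below assumes plain ASCII letters and that upper and lower case should share one mapping; the helper name `scramble` and the seed value are arbitrary choices for illustration.

```r
set.seed(42)  # assumption: any fixed seed, so the mapping is reproducible

# one random permutation of 1..26, applied to both cases
perm <- sample(26)
old  <- paste(c(LETTERS, letters), collapse = "")
new  <- paste(c(LETTERS[perm], letters[perm]), collapse = "")

# chartr() translates character by character, so spaces, hyphens and
# periods are left untouched and each letter always maps to the same letter
scramble <- function(s) chartr(old, new, s)

scramble(c("Anna-Maria O. Smith", "AD,DA. C"))
```

Because chartr applies all substitutions in a single pass, a letter is never replaced twice, which can happen when replacements are chained one after another.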
Maybe use gsub?
string <- "ABCDEFG"
string <- gsub('A', 'C', string)
string <- gsub('D', 'Z', string)
string
[1] "CBCZEFG"
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have my data as below
Idle|Idle|Idle|Idle|Idle|Idle|Idle
Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle
Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle
I want to extract the numbers in between, which are a phone number encoded as ASCII codes:
56|55|49|50|53|48|54|52 for the 2nd line and 49|51|48|50|50|49|50|57|56|57|56 for the 3rd line.
I then want to convert them to digits between 0 and 9 and concatenate them as a string/number in a new column, phone_number, in the same data set.
The 2nd row of the new column should be 87125064 and the 3rd row should be 13022129898.
In ASCII, 48 represents 0 and 57 represents 9.
Please help
Thanks,
Here's an approach with regular expressions:
res <- sapply(
  regmatches(x, gregexpr("^(?:Idle\\|)*\\K\\d+(?=\\|)|\\G(?!^)\\|\\K\\d+", x, perl = TRUE)),
  function(m) paste(as.integer(m) - 48, collapse = "")
)
# [1] "" "87125064" "13022129898"
If you want to exclude the empty strings, you can use the following command:
res[as.logical(nchar(res))]
# [1] "87125064" "13022129898"
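The same decoding can also be sketched without a regular expression for the run detection: split on "|", keep the first consecutive run of numeric fields, and turn the ASCII codes into characters. This is an alternative to the answer's method, not a restatement of it; the helper name `decode_phone` is hypothetical, and it returns NA rather than an empty string when a line has no numbers.

```r
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")

# decode_phone() is an illustrative helper for a single line
decode_phone <- function(line) {
  fields <- strsplit(line, "|", fixed = TRUE)[[1]]
  is_num <- grepl("^[0-9]+$", fields)
  if (!any(is_num)) return(NA_character_)
  # locate the first consecutive run of numeric fields
  runs  <- rle(is_num)
  ends  <- cumsum(runs$lengths)
  first <- which(runs$values)[1]
  idx   <- seq(ends[first] - runs$lengths[first] + 1, ends[first])
  # intToUtf8() maps code 48 to "0", ..., 57 to "9" and joins the result
  intToUtf8(as.integer(fields[idx]))
}

phone_number <- vapply(x, decode_phone, character(1), USE.NAMES = FALSE)
phone_number
# [1] NA            "87125064"    "13022129898"
```

If the lines live in a data frame column, the resulting vector can be assigned to a new phone_number column of that data frame.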
Here x is this vector:
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")