Given the string below, how can I remove the quotes around the numeric values to get the desired output?
('9','','','','','','','','31.23','testing7'),('10','','','','','','','','31.23','testing10')
Desired Output
(9,'','','','','','','',31.23,'testing7'),(10,'','','','','','','',31.23,'testing10')
Try the following regex:
s <- "('9','','','','','','','','31.23','testing7'),('10','','','','','','','','31.23','testing10')"
gsub("'(-?\\d+(?:[\\.,]\\d+)?)'", x = s, replacement = "\\1")
Regex explanation:
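' matches a literal single quote character
() capture group
-? matches an optional minus sign
? matches the preceding element 0 or 1 times
\\d+ matches a digit one or more times
(?:[\\.,]\\d+)? non-capturing group: an optional decimal part (a period or comma followed by digits)
\\1 refers to capture group 1 in the replacement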
This should allow for negative numbers and decimals.
Output
"(9,'','','','','','','',31.23,'testing7'),(10,'','','','','','','',31.23,'testing10')"
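As a quick check of the claim about negative numbers and decimals, the sketch below runs the same pattern on a made-up input (the values in `s2` are hypothetical, not from the question). Quoted text that is not purely numeric should be left alone.

```r
# Hypothetical input: a negative decimal, a plain integer, and two non-numeric strings
s2 <- "('-4.5','12','abc','7x')"

# Same pattern as above: strip the quotes around purely numeric values
res <- gsub("'(-?\\d+(?:[\\.,]\\d+)?)'", "\\1", s2)
res
# [1] "(-4.5,12,'abc','7x')"
```

Because the pattern requires a closing quote right after the digits, a value such as '7x' keeps its quotes.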
I have a set of strings in R of the form "X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt". I want to keep only the first occurrence of each string, where strings are grouped by the 4th field. For example, in this set I have multiple strings starting with "X-Y-Z.3000."; I want only the first one with id = 3000, and likewise for the other ids.
For reproducibility:
X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
X-Y-Z.3000.F.PP0016-C.A-SL-0433.F-N.fC-G.txt
X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
X-Y-Z.3008.F.PP0016-C.B-SX-0433.P-N.fC-G.txt
So in the end I would like only the first and third strings:
X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt
Extract the "4th field", which is the 2nd field if we split on ".", then exclude the duplicated items:
# data
x <- c("X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
"X-Y-Z.3000.F.PP0016-C.A-SL-0433.F-N.fC-G.txt",
"X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
"X-Y-Z.3008.F.PP0016-C.B-SX-0433.P-N.fC-G.txt")
x[ !duplicated(sapply(strsplit(x, ".", fixed = TRUE), "[", 2)) ]
# [1] "X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt"
# [2] "X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt"
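To make the steps in the one-liner visible, here is the same idea split into intermediate values. The names `parts`, `id`, and `first_per_id` are illustrative only, and `vapply` is swapped in for `sapply` simply to make the expected return type explicit.

```r
x <- c("X-Y-Z.3000.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
       "X-Y-Z.3000.F.PP0016-C.A-SL-0433.F-N.fC-G.txt",
       "X-Y-Z.3008.F.PP0016-C.A-SL-0433.P-N.fC-G.txt",
       "X-Y-Z.3008.F.PP0016-C.B-SX-0433.P-N.fC-G.txt")

# Split every string on a literal "." (fixed = TRUE disables regex matching)
parts <- strsplit(x, ".", fixed = TRUE)

# Take the 2nd piece of each split: the ID
id <- vapply(parts, `[`, character(1), 2)
id
# [1] "3000" "3000" "3008" "3008"

# duplicated() flags every repeat after the first occurrence; negate to keep the firsts
first_per_id <- x[!duplicated(id)]
```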
I have this string, for instance:
str1 = "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"
I would like to extract this substring, for instance:
TCGA-CV-7254
I tried something like this:
gsub(pattern = "(*.)(TCGA*)(.*)",
replacement = "\\2",
x = nameArq)
But it returns:
[1] "UNCID_999277TCGA"
Thanks for any help!
You almost had it. In the first parentheses, the period needs to come before the asterisk: ".*" means "repeat any character any number of times". You also need some unique endpoint for the second part of your regex; here the "4" that ends the ID serves as that endpoint.
gsub(pattern = "(.*)(TCGA.*4)(.*)",
replacement = "\\2",
x = str1)
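The `4` endpoint above works for this particular string, but it would not carry over to IDs that end in another digit. A sketch of an alternative, assuming the wanted piece is always `TCGA` followed by two hyphen-separated fields (an assumption about the naming scheme, not something stated in the question):

```r
str1 <- "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"

# Capture "TCGA", a hyphen, a run of non-hyphen characters, a hyphen,
# and another run of non-hyphen characters; everything else is dropped
id <- sub(".*(TCGA-[^-]+-[^-]+).*", "\\1", str1)
id
# [1] "TCGA-CV-7254"
```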
I need to anonymize names but in a very specific way so that the format of the entire string is still the same (spaces, hyphens, periods are preserved) but all the letters are scrambled. I want to consistently replace say all A's with C's, all D's with Z's, and so on. How would I do that?
We can use chartr
chartr('AD', 'CZ', str1)
#[1] "CZ,ZC. C"
data
str1 <- c('AD,DA. C')
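To scramble every letter rather than just two, one possible extension is to build a random permutation of the alphabet and pass it to `chartr`. This is only a sketch: the seed, the `key` variable, and the sample name are made up for illustration, and a simple substitution like this is easy to reverse, so it may not be adequate where real privacy is required.

```r
set.seed(42)  # arbitrary seed so the same mapping is reused across runs

# Random one-to-one mapping for the 26 letters (a letter may map to itself)
key <- sample(LETTERS)
old <- paste(c(LETTERS, letters), collapse = "")
new <- paste(c(key, tolower(key)), collapse = "")

person <- "Mary-Ann O. Smith"   # hypothetical name
scrambled <- chartr(old, new, person)

# Spaces, hyphens, and periods are not in `old`, so they pass through unchanged
scrambled
```

Because the mapping is one-to-one, `chartr(new, old, scrambled)` recovers the original, which is handy for checking the mapping but also shows why the key should be kept private.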
Maybe use gsub?
string <- "ABCDEFG"
string <- gsub('A', 'C', string )
string <- gsub('D', 'Z', string )
string
[1] "CBCZEFG"
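One caveat with chained `gsub` calls: each call sees the result of the previous one, so overlapping mappings cascade. The example below uses a made-up mapping (A to C, then C to E) to show the difference from `chartr`, which translates all characters in a single pass.

```r
string <- "ABC"

# Sequential replacement: the C produced from A is replaced again by E
step1 <- gsub("A", "C", string)    # "CBC"
chained <- gsub("C", "E", step1)   # "EBE"

# Single-pass translation: A -> C and C -> E are applied independently
single_pass <- chartr("AC", "CE", string)   # "CBE"
```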
I have a data frame of strings as below and would like to add the string "Market" to each of the elements of the data frame. Is there a function that would allow me to do this easily without having to use a for loop?
V1
1 PUBLIC_DISPATCHSCADA_20141221.zip
2 PUBLIC_DISPATCHSCADA_20141222.zip
3 PUBLIC_DISPATCHSCADA_20141223.zip
4 PUBLIC_DISPATCHSCADA_20141224.zip
5 PUBLIC_DISPATCHSCADA_20141225.zip
6 PUBLIC_DISPATCHSCADA_20141226.zip
We can use paste and specify the delimiter. In this case, I am using _ as the delimiter and pasting "Market" at the beginning of each string.
df1$V1 <- paste("Market", df1$V1, sep="_")
If we need to do this for every column:
df1[] <- lapply(df1, function(x) paste("Market", x, sep="_"))
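A reproducible sketch of both calls, using a small data frame rebuilt from the first two rows of the question (the second column `V2` is hypothetical, added only to show the all-columns case):

```r
df1 <- data.frame(
  V1 = c("PUBLIC_DISPATCHSCADA_20141221.zip", "PUBLIC_DISPATCHSCADA_20141222.zip"),
  V2 = c("a.zip", "b.zip"),   # hypothetical extra column
  stringsAsFactors = FALSE
)

# paste() is vectorized, so no loop is needed for a single column
one_col <- paste("Market", df1$V1, sep = "_")
one_col
# [1] "Market_PUBLIC_DISPATCHSCADA_20141221.zip"
# [2] "Market_PUBLIC_DISPATCHSCADA_20141222.zip"

# Assigning into df1[] keeps the data frame structure while replacing every column
df1[] <- lapply(df1, function(x) paste("Market", x, sep = "_"))
```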