Remove quotes from a string only around numeric values in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Given the following string, how can I remove the quotes around the numeric values only?
('9','','','','','','','','31.23','testing7'),('10','','','','','','','','31.23','testing10')
Desired Output
(9,'','','','','','','',31.23,'testing7'),(10,'','','','','','','',31.23,'testing10')

Try the following regex:
s <- "('9','','','','','','','','31.23','testing7'),('10','','','','','','','','31.23','testing10')"
gsub("'(-?\\d+(?:[\\.,]\\d+)?)'", x = s, replacement = "\\1", perl = TRUE)
Regex explanation:
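' match a literal single quote character
( ) capture group 1, holding the number
-? an optional minus sign (matched 0 or 1 times)
\\d+ one or more digits
(?:[\\.,]\\d+)? an optional non-capturing group: a decimal point or comma followed by one or more digits
' the closing single quote
\\1 in the replacement: group 1, i.e. the number without its quotes
This also handles negative numbers and decimals written with either a point or a comma.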
Output
"(9,'','','','','','','',31.23,'testing7'),(10,'','','','','','','',31.23,'testing10')"

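The substitution above can be wrapped in a small function and checked on a string that contains a negative number and a value that merely starts with digits. The name unquote_numbers is hypothetical; perl = TRUE is used so that \\d and the non-capturing group (?:...) are handled by PCRE.

```r
# Hypothetical helper: strip the single quotes around purely numeric fields
unquote_numbers <- function(s) {
  # group 1 = optional minus, digits, optional decimal part using '.' or ','
  gsub("'(-?\\d+(?:[.,]\\d+)?)'", "\\1", s, perl = TRUE)
}

unquote_numbers("('-3.5','abc','','42','12abc')")
# expected: "(-3.5,'abc','',42,'12abc')"
```

The field '12abc' keeps its quotes because the pattern requires the closing quote to follow the number immediately.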
Related

How can I extract a substring in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have this string, for instance:
str1 = "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"
I would like to extract this substring, for instance:
TCGA-CV-7254
I tried something like this:
gsub(pattern = "(*.)(TCGA*)(.*)",
replacement = "\\2",
x = str1)
But it returns:
[1] "UNCID_999277TCGA"
Thanks for any help!
You almost had it. In the first parentheses, the period needs to come first (this means "repeat any character any number of times"). You also need some unique endpoint for the second part of your regex.
gsub(pattern = "(.*)(TCGA.*4)(.*)",
replacement = "\\2",
x = str1)
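The .*4 endpoint works here only because this particular ID happens to end in 4. A sketch that does not depend on the last digit, assuming the ID is always TCGA followed by two hyphen-separated alphanumeric fields (as in TCGA-CV-7254), extracts the match directly with regexpr and regmatches instead of replacing around it:

```r
str1 <- "UNCID_999277.TCGA-CV-7254-01A-11R-2016-07.111118_UNC11-SN627_0167_AD09WDACXX_TAGCTT.txt"

# Assumption: the wanted ID is "TCGA-" plus two alphanumeric fields separated by "-"
pattern <- "TCGA-[[:alnum:]]+-[[:alnum:]]+"

# regexpr finds the first match; regmatches pulls out the matched text
regmatches(str1, regexpr(pattern, str1))
# [1] "TCGA-CV-7254"
```

Note that with a vector input, regmatches silently drops elements that do not match, so the result can be shorter than the input.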

In R - how do I replace all letters in a string with other letters? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I need to anonymize names but in a very specific way so that the format of the entire string is still the same (spaces, hyphens, periods are preserved) but all the letters are scrambled. I want to consistently replace say all A's with C's, all D's with Z's, and so on. How would I do that?
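We can use chartr, which replaces each character of its first argument with the character at the same position in its second argument: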
chartr('AD', 'CZ', str1)
#[1] "CZ,ZC. C"
data
str1 <- c('AD,DA. C')
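For anonymization the mapping is usually random rather than hand-picked. A minimal sketch, assuming a random permutation of the alphabet is acceptable: build the old and new character sets once and pass them to chartr, so every occurrence of a given letter gets the same replacement while spaces, hyphens and periods are untouched. The names old_chars, new_chars and anonymize are illustrative; the same permutation is used for upper and lower case, and a permutation may map a letter to itself.

```r
set.seed(1)  # fix the seed so the mapping is reproducible

perm <- sample(26)  # random permutation of 1..26
old_chars <- paste(c(LETTERS, letters), collapse = "")
new_chars <- paste(c(LETTERS[perm], letters[perm]), collapse = "")

# Each character in old_chars is replaced by the one at the same position in new_chars
anonymize <- function(x) chartr(old_chars, new_chars, x)

names_in <- c("Anna-Marie O. Smith", "J. R. R. Tolkien")
names_out <- anonymize(names_in)
names_out

# The mapping is a bijection, so swapping the arguments restores the originals
chartr(new_chars, old_chars, names_out)
```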
Maybe use gsub?
string <- "ABCDEFG"
string <- gsub('A', 'C', string)
string <- gsub('D', 'Z', string)
string
[1] "CBCZEFG"
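One caveat with chaining gsub calls: each call also rewrites letters produced by the previous one, so overlapping mappings (for example swapping A and D) go wrong, whereas chartr applies all replacements in a single pass. A small illustration:

```r
x <- "ADDA"

# Chained gsub: A -> D first, then the D -> A step also hits the new D's
step1 <- gsub("A", "D", x)          # "DDDD"
chained <- gsub("D", "A", step1)    # "AAAA"

# chartr translates every character simultaneously
swapped <- chartr("AD", "DA", x)    # "DAAD"

c(chained = chained, swapped = swapped)
```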

How can I get a specific part of a column? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a csv file with IPC data.
It looks like this:
year date applicant ... ipc number
1978 1/1 noel A43B 13/20
1979 2/2 liam B06C 14/20
1980 3/3 chris D01E 01/30
...
For example,
from the ipc number column I need only 'A43B', 'B06C', 'D01E',
not 'A43B 13/20', 'B06C 14/20', 'D01E 01/30'.
Could you please let me know how to deal with it?
Off the top of my head, there are two possibilities:
1. As Pascal wrote, use strsplit and sapply:
sapply(
strsplit(c("A43B 13/20", "B06C 14/20", "D01E 01/30"),split = " "),
"[[", 1)
2. Use a regular expression with gsub:
gsub(pattern = " [0-9]{2}\\/[0-9]{2}", replacement = "", c("A43B 13/20", "B06C 14/20","D01E 01/30"))
We can use sub. We match one or more spaces (\\s+) followed by zero or more characters (.*) up to the end ($) of the string, and replace the match with ''.
sub('\\s+.*$', '', str1)
#[1] "A43B" "B06C" "D01E"
data
str1 <- c('A43B 13/20', 'B06C 14/20', 'D01E 01/30')
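To put the result in a new column of the data set, the same sub call can be applied to the whole column. A sketch with a hypothetical data frame standing in for the CSV; the column name ipc.number is an assumption (read.csv with its default check.names = TRUE would turn a header named ipc number into ipc.number):

```r
# Hypothetical stand-in for the CSV contents
ipc <- data.frame(
  year = c(1978, 1979, 1980),
  applicant = c("noel", "liam", "chris"),
  ipc.number = c("A43B 13/20", "B06C 14/20", "D01E 01/30"),
  stringsAsFactors = FALSE
)

# Drop the first run of whitespace and everything after it
ipc$ipc.class <- sub("\\s+.*$", "", ipc$ipc.number)
ipc$ipc.class
# [1] "A43B" "B06C" "D01E"
```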

Extracting number from a non-consistent string in R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
My data look like this:
Idle|Idle|Idle|Idle|Idle|Idle|Idle
Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle
Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle
I want to extract the numbers in between (a phone number encoded as ASCII codes), i.e.
(56|55|49|50|53|48|54|52 for the 2nd line and 49|51|48|50|50|49|50|57|56|57|56 for the 3rd line),
convert them to the digits 0 to 9, and concatenate them as a string/number in a new column phone_number in the same data set.
The 2nd row of the new column should be 87125064 and the 3rd row should be 13022129898.
In ASCII, 48 represents 0 and 57 represents 9.
Please help
Thanks,
Here's an approach with regular expressions:
res <- sapply(regmatches(x, gregexpr("^(?:Idle\\|)*\\K\\d+(?=\\|)|\\G(?!^)\\|\\K\\d+",
x, perl = TRUE)),
function(x) paste(as.integer(x) - 48, collapse = ""))
# [1] "" "87125064" "13022129898"
If you want to exclude the empty strings, you can use the following command:
res[nzchar(res)]
# [1] "87125064" "13022129898"
Here x is this vector:
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
"Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")

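An alternative sketch without \G and \K: split each line on |, keep the first run of consecutive numeric fields, and let intToUtf8 turn the ASCII codes into characters (codes 48 to 57 become the digits 0 to 9 directly). The function name decode_phone is hypothetical and, like the regex above, it ignores any later numeric run such as 69|86|65.

```r
x <- c("Idle|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|56|55|49|50|53|48|54|52|Idle|Idle|Idle|Idle|Idle|Idle",
       "Idle|49|51|48|50|50|49|50|57|56|57|56|Idle|Idle|69|86|65|Idle|Idle|Idle|Idle")

decode_phone <- function(line) {
  fields <- strsplit(line, "|", fixed = TRUE)[[1]]
  is_num <- grepl("^[0-9]+$", fields)
  if (!any(is_num)) return("")                          # no numeric field at all
  start <- which(is_num)[1]                             # first numeric field
  run <- rle(is_num[start:length(is_num)])$lengths[1]   # length of that run
  codes <- as.integer(fields[start:(start + run - 1)])
  intToUtf8(codes)                                      # ASCII codes -> one string
}

phone_number <- vapply(x, decode_phone, character(1), USE.NAMES = FALSE)
phone_number
# expected: "", "87125064", "13022129898"
```

With a data frame, the result can be assigned to a new column in the same way, e.g. df$phone_number <- vapply(df$status, decode_phone, character(1)), where df and status are hypothetical names.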
Generate a vector m with the small letters from a to j in alphabetical order? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
How do I generate a vector m with the lowercase letters from a to j (in alphabetical order)?
R has the built-in constants letters and LETTERS, which hold the lower-case and upper-case letters of the Roman alphabet. The letters a to j are the first 10 elements of letters:
m <- letters[1:10]
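If the endpoints are given as letters rather than positions, match can look them up in letters. A sketch with a hypothetical helper letter_range:

```r
# Hypothetical helper: all lower-case letters from `from` to `to`, in order
letter_range <- function(from, to) {
  letters[match(from, letters):match(to, letters)]
}

m <- letter_range("a", "j")
m
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
```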
