How do I count the number of non-numeric characters in a string with both numeric and non-numeric characters in R?
For example:
c("ab34","f354","q64423","abf3")
I would like the output to be:
c(2,1,1,3)
1) gsub Replace digits with the empty string and count what is left. No packages are used.
x <- c("ab34","f354","q64423","abf3")
nchar(gsub("\\d", "", x))
## [1] 2 1 1 3
2) gregexpr Another possibility is to use gregexpr and lengths to count the non-numerics. We append a non-numeric to each component so that there is always at least one match (when there is no match, gregexpr returns -1, which still has length 1, so lengths would report 1 instead of 0) and then subtract 1 at the end.
lengths(gregexpr("\\D", paste0(x, "X"))) - 1
## [1] 2 1 1 3
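To see why the "X" is appended, here is a small check on two made-up strings: one with no non-digits and one with two. Without the sentinel, the no-match result (-1) is counted as length 1.

```r
# "123" has no non-digits, "ab1" has two
y <- c("123", "ab1")

# Without the sentinel, the -1 returned for "123" still has length 1
lengths(gregexpr("\\D", y))
## [1] 1 2

# Appending a non-digit guarantees one match per string; subtract it afterwards
lengths(gregexpr("\\D", paste0(y, "X"))) - 1
## [1] 0 2
```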
3) strsplit / %in% Split the strings into individual characters and sum the number that are not in 0:9.
sapply(strsplit(x, ""), function(chars) sum(!chars %in% 0:9))
## [1] 2 1 1 3
4) trimws If, as in the example in the question, the digits are always at the end (or at the beginning), we can trim them off and count what is left. This needs R 3.6.0 or later, where trimws has the whitespace argument.
nchar(trimws(x, whitespace = "\\d"))
## [1] 2 1 1 3
5) regexpr If the digits are always at the end, we can use regexpr to find the position of the first one. We append a 0 to ensure that there is at least one digit, subtract 1 at the end, and use c() to drop the match.length attributes.
c(regexpr("\\d", paste0(x, 0)) - 1)
## [1] 2 1 1 3
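Methods 4 and 5 rely on the digits sitting at the ends of each string. A quick sketch with two made-up strings where a digit sits in the middle shows how the results then diverge (the trimws call needs R 3.6.0 or later):

```r
x2 <- c("a1b2", "12ab")

# 1) gsub counts every non-digit, wherever it is
nchar(gsub("\\d", "", x2))
## [1] 2 2

# 4) trimws only strips digits at the ends, so "a1b" is left and counted as 3
nchar(trimws(x2, whitespace = "\\d"))
## [1] 3 2

# 5) regexpr gives the position of the first digit minus 1
c(regexpr("\\d", paste0(x2, 0)) - 1)
## [1] 1 0
```

So for arbitrary mixes of digits and letters, methods 1 to 3 are the ones to use.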
Related
Suppose I have a long vector with characters which is more or less like this:
vec <- c("32, 25", "5", "15, 24")
I want to apply a function that gives me the number of comma-separated strings in each element and returns a vector of these counts. Using lapply and my toy vector, this is my approach:
lapply(vec, function(x) {
a <- strsplit(x, ",")
y <- length(a[[1:length(a)]])
unlist(y[1:length(y)])
})
[[1]]
[1] 2
[[2]]
[1] 1
[[3]]
[1] 2
This almost gives me what I want, since the first element has 2 strings, the second 1 string and the third 2 strings. The problem is that I can't get my function to return a vector of the form c(2,1,2). I'm using this function to create a new variable in a data.frame I'm working with.
Any idea will be much appreciated.
You could do:
stringr::str_count(vec, ",") + 1
#> [1] 2 1 2
Or, in base R:
nchar(gsub("[^,]", "", vec)) + 1
#> [1] 2 1 2
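The asker's own strsplit idea also works once it is vectorised: strsplit already returns one list element per input string, and base R's lengths counts each element, so no lapply is needed. A minimal sketch; the data.frame df and its column col are hypothetical stand-ins for the asker's data. One caveat: strsplit drops a trailing empty piece, so "1," would count as 1 here but as 2 with the comma-counting approaches above.

```r
vec <- c("32, 25", "5", "15, 24")

# One list element per string, each holding the comma-separated pieces
pieces <- strsplit(vec, ",")

# lengths() returns the length of every list element as an integer vector
lengths(pieces)
## [1] 2 1 2

# Hypothetical data.frame, to show the new-variable use case from the question
df <- data.frame(col = vec, stringsAsFactors = FALSE)
df$n <- lengths(strsplit(df$col, ","))
```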
This question already has answers here: Convert comma separated string to integer in R
I am using a function where Timepoints need to be defined as
Timepoints = c(x,y,z)
Now I have a chr list
List
$ chr: "1,2,3,4,5,6,7"
with the timepoints I need to use, already separated by commas.
I want to use this list in the function and lose the quotation marks, so the function can read my timepoints as
Timepoints= c(1,2,3,4,5,6,7)
I tried using noquote(List), but this is not accepted.
Am I missing something? Printing the list with noquote() shows the desired sequence of characters 1,2,3,4,5,6,7.
1) Base R - scan Assuming that you have a list containing a single character string, as in L below, use scan:
L <- list("1,2,3,4,5,6")
scan(text = L[[1]], sep = ",", quiet = TRUE)
## [1] 1 2 3 4 5 6
2) gsubfn::strapply Another possibility is to use strapply to match each string of digits, convert them to numeric and return it as a vector. (We assume that the numbers have no signs or decimal points but that could readily be added if needed.)
library(gsubfn)
strapply(L[[1]], "\\d+", as.numeric, simplify = unlist)
## [1] 1 2 3 4 5 6
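Applied to the data as shown in the question (seven timepoints in a one-element list, rebuilt here by hand as List from the str() output), scan produces a plain numeric vector that can be passed as Timepoints. The function that takes Timepoints is not named in the question, so this sketch only builds the vector:

```r
# Rebuilt from the question's str() output: a list holding one string
List <- list("1,2,3,4,5,6,7")

# scan() reads comma-separated numbers (type double by default)
Timepoints <- scan(text = List[[1]], sep = ",", quiet = TRUE)
Timepoints
## [1] 1 2 3 4 5 6 7

identical(Timepoints, c(1, 2, 3, 4, 5, 6, 7))
## [1] TRUE
```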
Added
In a comment the poster indicated an interest in having a list of character strings as input. The output was not specified but if we assume we want a list of numeric vectors then
L2 <- list(A = "1,2,3,4,5,6", B = "1,2")
Scan <- function(x) scan(text = x, sep = ",", quiet = TRUE)
lapply(L2, Scan)
## $A
## [1] 1 2 3 4 5 6
##
## $B
## [1] 1 2
library(gsubfn)
strapply(L2, "\\d+", as.numeric)
## $A
## [1] 1 2 3 4 5 6
##
## $B
## [1] 1 2
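If neither scan nor gsubfn is wanted, a base-R alternative for the list case splits each string on commas and converts the pieces; lapply over the named list L2 keeps the names A and B. This is a sketch of a different technique (strsplit plus as.numeric), not the approach used in the answer above:

```r
L2 <- list(A = "1,2,3,4,5,6", B = "1,2")

# Split each string on commas, then convert the pieces to numbers
res <- lapply(L2, function(s) as.numeric(strsplit(s, ",")[[1]]))
res
## $A
## [1] 1 2 3 4 5 6
##
## $B
## [1] 1 2
```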
Here is an option with strsplit.
as.integer(unlist(strsplit(L[[1]], ",")))
#[1] 1 2 3 4 5 6
I want to keep the negative sign when converting a character vector to numeric, but I end up getting the warning "NAs introduced by coercion".
test <- c("001","00-2","0003")
test <- as.numeric(as.character(test))
The output I expect is:
1 -2 3
But the output I get is:
1 NA 3
Can anyone please help me on this? Thanks!
1) sub Remove leading 0's and then convert:
as.numeric(sub("^0+", "", test))
## [1] 1 -2 3
2) trimws In R 3.6 or later this would also work:
as.numeric(trimws(test, which = "left", whitespace = "0"))
## [1] 1 -2 3
Edge case
3) paste0 If a string can consist entirely of zeros, sub reduces it to the empty string, which as.numeric turns into NA. Assuming that there are no decimal points, we can append ".0" first:
test2 <- c(test, "00")
as.numeric(sub("^0+", "", paste0(test2, ".0")))
## [1] 1 -2 3 0
3a) or this:
as.numeric(sub("^0*(.)", "\\1", test2))
## [1] 1 -2 3 0
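A quick check of why the edge case matters: with the plain sub from 1), the all-zero string "00" becomes "", which as.numeric turns into NA, while 3a) always keeps one character. suppressWarnings is only there to keep this sketch quiet in case a coercion warning is raised.

```r
test2 <- c("001", "00-2", "0003", "00")

# 1) alone: "00" is reduced to "", which converts to NA
suppressWarnings(as.numeric(sub("^0+", "", test2)))
## [1]  1 -2  3 NA

# 3a) keeps at least one character, so "00" becomes "0"
as.numeric(sub("^0*(.)", "\\1", test2))
## [1]  1 -2  3  0
```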
I want to count how many commas are at the end of a string with a regex:
x <- c("w,x,,", "w,x,", "w,x", "w,x,,,")
I'd like to get:
[1] 2 1 0 3
This gives:
library(stringi)
stringi::stri_count_regex(x, ",+$")
## [1] 1 1 0 1
This counts matches rather than commas because of the quantifier, and I don't know how to count the actual number of times the single character is repeated at the end.
The "match.length" attribute of the regexpr result seems to get the job done (-1 signals no match, which distinguishes it from zero-width matches such as lookaheads):
attr(regexpr(",+$", x), "match.length")
## [1] 2 1 -1 3
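If 0 rather than -1 is wanted for strings without trailing commas, as in the output the question asks for, the -1 can be clipped with pmax. A minimal sketch:

```r
x <- c("w,x,,", "w,x,", "w,x", "w,x,,,")

# match.length is -1 where ",+$" does not match; clip it to 0
pmax(attr(regexpr(",+$", x), "match.length"), 0L)
## [1] 2 1 0 3
```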
Another option (with contribution from @JasonAizkalns) would be
nchar(x) - nchar(gsub(",+$", "", x))
## [1] 2 1 0 3
Or use the stringi package combined with nchar, specifying keepNA = TRUE (this way strings with no match are reported as NA):
library(stringi)
nchar(stri_extract_all_regex(x, ",+$"), keepNA = TRUE)
## [1] 2 1 NA 3
I have a string and need to count the number of appearances of a given value, which must appear consecutively. I tried the stringr package, but it counts every occurrence of the value/pattern. For example, say we have to count the appearances of "213" in the string "2132132132137889213": the output I need is 4, but I get 5 from the str_count function. Please help.
I'm not sure of my "regex" skills but, hopefully, you could make something out of this:
max_rep_pat = function(pat, text)
{
res = gregexpr(paste0("(", pat, ")+"), text)
sapply(res, function(x) max(attr(x, "match.length")) / nchar(pat))
}
max_rep_pat("213", c("2132132132137889213",
"21321321321378892132132132132132213213"))
#[1] 4 5
gregexpr returns the positions where a pattern occurs and, in the "match.length" attribute, the number of characters in each match. Wrapping the pattern as "(pattern)+" means 'match one or more consecutive repetitions of the pattern'. Compare the following two:
gregexpr("213", "2132132132137889213")
[[1]]
[1] 1 4 7 10 17
attr(,"match.length")
[1] 3 3 3 3 3
#attr(,"useBytes")
#[1] TRUE
gregexpr("(213)+", "2132132132137889213")
[[1]]
[1] 1 17
attr(,"match.length")
[1] 12 3
#attr(,"useBytes")
#[1] TRUE
In the first case, gregexpr finds the position of each "213", and the length of each match is just nchar(pat). In the second case, it finds each run of repeated "213": there are two runs, the first with 12 / 3 = 4 repetitions and the second with 3 / 3 = 1 repetition. max(attr(x, "match.length")) / nchar(pat) therefore gives 4.
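One caveat: when the pattern does not occur at all, match.length is -1, so max_rep_pat returns -1/3 rather than 0. Below is a sketch of a variant (the name max_run is made up here) that extracts the runs with regmatches and returns 0 in that case. Like max_rep_pat, it assumes pat contains no regex metacharacters.

```r
max_run <- function(pat, text) {
  # For each string, the matched runs, e.g. "213213213213"; character(0) if none
  runs <- regmatches(text, gregexpr(paste0("(", pat, ")+"), text))
  # Longest run in characters divided by the pattern length; 0 if no run
  vapply(runs, function(r) if (length(r)) max(nchar(r)) / nchar(pat) else 0,
         numeric(1))
}

max_run("213", c("2132132132137889213", "7889"))
## [1] 4 0
```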
Another way would be:
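fun1 <- function(pat, text) {
  # Longest run of consecutive `pat` tokens in a single string
  max_rep_pat1 <- function(pat, text) {
    # Surround every occurrence with spaces so it becomes its own token
    text1 <- gsub(pat, paste0(" ", pat, " "), text)
    # Split on whitespace and run-length encode "token equals pat"
    rl <- rle(scan(text = text1, what = "", quiet = TRUE) == pat)
    max(rl$lengths[rl$values])
  }
  # Apply to every string (recycling pat) and drop the names mapply adds
  setNames(mapply(max_rep_pat1, pat, text), NULL)
}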
str1 <- c("2132132132137889213", "21321321321378892132132132132132213213")
str2 <- "213421342134213477"
fun1("2134", str2)
#[1] 4
fun1("213", str1)
#[1] 4 5