I am using a function where Timepoints need to be defined as
Timepoints = c(x,y,z)
Now I have a chr list
List
$ chr: "1,2,3,4,5,6,7"
with the timepoints I need to use, already separated by commas.
I want to use this list in the function and lose the quotation marks, so the function can read my timepoints as
Timepoints= c(1,2,3,4,5,6,7)
I tried using noquote(List), but this is not accepted.
Am I missing something? Printing the list with noquote() results in the desired line of characters: 1,2,3,4,5,6,7
1) Base R - scan. Assuming that you have a list containing a single character string, as in L below, use scan as shown.
L <- list("1,2,3,4,5,6")
scan(text = L[[1]], sep = ",", quiet = TRUE)
## [1] 1 2 3 4 5 6
2) gsubfn::strapply Another possibility is to use strapply to match each string of digits, convert them to numeric and return it as a vector. (We assume that the numbers have no signs or decimal points but that could readily be added if needed.)
library(gsubfn)
strapply(L[[1]], "\\d+", as.numeric, simplify = unlist)
## [1] 1 2 3 4 5 6
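If signs or decimal points do need to be handled, one way to extend the pattern is sketched below. It uses base R (regmatches and gregexpr) rather than strapply, and the input string s and the pattern pat are made up for illustration.

```r
# Hypothetical input containing a negative number and a decimal
s <- "1,-2.5,3"
# Optional minus sign, one or more digits, optional fractional part
pat <- "-?[0-9]+(\\.[0-9]+)?"
# gregexpr locates every match; regmatches pulls out the matched text
nums <- as.numeric(regmatches(s, gregexpr(pat, s))[[1]])
nums
```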
Added
In a comment the poster indicated an interest in having a list of character strings as input. The output was not specified but if we assume we want a list of numeric vectors then
L2 <- list(A = "1,2,3,4,5,6", B = "1,2")
Scan <- function(x) scan(text = x, sep = ",", quiet = TRUE)
lapply(L2, Scan)
## $A
## [1] 1 2 3 4 5 6
##
## $B
## [1] 1 2
library(gsubfn)
strapply(L2, "\\d+", as.numeric)
## $A
## [1] 1 2 3 4 5 6
##
## $B
## [1] 1 2
Here is an option with strsplit.
as.integer(unlist(strsplit(L[[1]], ",")))
#[1] 1 2 3 4 5 6
Related
Suppose I have a long vector with characters which is more or less like this:
vec <- c("32, 25", "5", "15, 24")
I want to apply a function that counts the comma-separated strings in each element and returns a vector of those counts. Using lapply and my toy vector, this is my approach:
lapply(vec, function(x) {
a <- strsplit(x, ",")
y <- length(a[[1:length(a)]])
unlist(y[1:length(y)])
})
[[1]]
[1] 2
[[2]]
[1] 1
[[3]]
[1] 2
This almost gives me what I want, since the first element has 2 strings, the second 1 string, and the third 2 strings. The problem is that I can't get my function to return a vector of the form c(2,1,2). I'm using this function to create a new variable in a data.frame I'm working with.
Any idea will be much appreciated.
You could do:
stringr::str_count(vec, ",") + 1
#> [1] 2 1 2
Or, in base R:
nchar(gsub("[^,]", "", vec)) + 1
#> [1] 2 1 2
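Another base R sketch is to split each element and take the length of each piece with lengths(). It assumes no element is empty or ends in a comma, because strsplit drops a trailing empty field and those cases would be counted differently from the comma-counting approaches above.

```r
vec <- c("32, 25", "5", "15, 24")
# strsplit returns one character vector per element; lengths() sizes each
counts <- lengths(strsplit(vec, ",", fixed = TRUE))
counts
```

Because the result is a plain integer vector, it can be assigned directly as a data frame column.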
Say I have the following list (note the usage of non-syntactic names)
list <- list(A = c(1,2,3),
`2` = c(7,8,9))
So the following two ways of accessing the list work:
`$`(list,A)
## [1] 1 2 3
`$`(list,`2`)
## [1] 7 8 9
However, this way to proceed fails.
id <- 2
`$`(list,id)
## NULL
Could someone explain why the last way does not work and how I could fix it? Thank you.
Your id is a "computed index", which is not supported by the $ operator. From ?Extract:
Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]].
If you have a computed index, then use [[ to extract.
l <- list(a = 1:3)
id <- "a"
l[[id]]
## [1] 1 2 3
`[[`(l, id) # the same
## [1] 1 2 3
If you insist on using the $ operator, then you need to substitute the value of id in the $ call, like so:
eval(bquote(`$`(l, .(id))))
## [1] 1 2 3
It doesn't really matter whether id is non-syntactic:
l <- list(`!##$%^` = 1:3)
id <- "!##$%^"
l[[id]]
## [1] 1 2 3
`[[`(l, id)
## [1] 1 2 3
eval(bquote(`$`(l, .(id))))
## [1] 1 2 3
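One thing to watch with the id <- 2 from the question: [[ treats a number as a position and a string as a name. In the original list the element named 2 also happens to sit in position 2, so both lookups agree. The sketch below uses a reordered, hypothetical list lst to show the difference.

```r
# Hypothetical list where the element named "2" is in position 1
lst <- list(`2` = c(7, 8, 9), A = c(1, 2, 3))
id <- 2
by_position <- lst[[id]]            # second element, i.e. A
by_name <- lst[[as.character(id)]]  # element whose name is "2"
by_position
by_name
```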
I am also trying to get a better grasp of non-syntactic names. Unfortunately, more complex patterns of their use are hard to find. First, read ?Quotes and what backticks do.
For the purpose of learning here is some code:
list <- list(A = c(1,2,3),
`2` = c(7,8,9))
id <- 2
id_backtics <- paste0("`", id,"`")
text <- paste0("`$`(list, ", id_backtics, ")")
text
#> [1] "`$`(list, `2`)"
eval(parse(text = text))
#> [1] 7 8 9
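A variation that avoids pasting and parsing text is to build the call object directly. This is only a sketch; lst is a hypothetical stand-in for the list above, renamed so that it does not mask base::list.

```r
lst <- list(A = c(1, 2, 3), `2` = c(7, 8, 9))
id <- 2
# as.name() turns the string "2" into the symbol `2`,
# so the call built here is the same as writing lst$`2`
cl <- call("$", quote(lst), as.name(as.character(id)))
res <- eval(cl)
res
```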
Created on 2022-01-24 by the reprex package (v2.0.1)
Suppose I have the following:
X <- "1,2,3,4,5"
How do I get the sequence of numeric values
#[1] 1 2 3 4 5
I've already seen this example: https://statisticsglobe.com/convert-character-to-numeric-in-r/ but it doesn't quite match the problem above.
This is one way of doing this:
library(stringr)
l="1,2,3,4,5"
as.numeric(str_split(l, ',', simplify = TRUE))
1) scan will convert such a string to a numeric vector. Omit the quiet argument if you would like it to report the length of the result. No packages are used.
x <- "1,2,3,4,5"
scan(text = x, sep = ",", quiet = TRUE)
## [1] 1 2 3 4 5
2) If what you have is actually a vector of comma-separated character strings, xx, and a list of numeric vectors is wanted, then lapply over them.
xx <- c(x, x)
lapply(xx, function(x) scan(text = x, sep = ",", quiet = TRUE))
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] 1 2 3 4 5
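By default scan reads doubles. If an integer vector is wanted instead, the what argument can request it. A small sketch:

```r
x <- "1,2,3,4,5"
# Default: what = double()
dbl <- scan(text = x, sep = ",", quiet = TRUE)
# Ask for integers explicitly
int <- scan(text = x, sep = ",", what = integer(), quiet = TRUE)
typeof(dbl)
typeof(int)
```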
I want to count how many commas are at the end of a string with a regex:
x <- c("w,x,,", "w,x,", "w,x", "w,x,,,")
I'd like to get:
[1] 2 1 0 3
This gives:
library(stringi)
stringi::stri_count_regex(x, ",+$")
## [1] 1 1 0 1
Because I'm using a quantifier but don't know how to count actual number of times single character was repeated at end.
The "match.length" attribute of the regexpr result seems to get the job done (-1 is used to distinguish no match from zero-width matches such as lookaheads)
attr(regexpr(",+$", x), "match.length")
## [1] 2 1 -1 3
Another option (with a contribution from @JasonAizkalns) would be
nchar(x) - nchar(gsub(",+$", "", x))
## [1] 2 1 0 3
Or use the stringi package combined with nchar, specifying keepNA = TRUE (this way non-matches are reported as NA)
library(stringi)
nchar(stri_extract_all_regex(x, ",+$"), keepNA = TRUE)
## [1] 2 1 NA 3
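If a plain 0 is preferred for strings without trailing commas, the -1 from match.length can be clamped with pmax. A sketch using the same x:

```r
x <- c("w,x,,", "w,x,", "w,x", "w,x,,,")
# Length of the trailing run of commas, or -1 when there is none
len <- attr(regexpr(",+$", x), "match.length")
# Replace the -1 "no match" marker with 0
n_trailing <- pmax(len, 0L)
n_trailing
```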
I have a list of strings which contain random characters such as:
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
I'd like to know which numbers are present at least once (unique()) in this list. The solution of my example is:
solution: c(7,667,11,5,2)
If someone has a method that does not consider 11 as "eleven" but as "one and one", it would also be useful. The solution in this condition would be:
solution: c(7,6,1,5,2)
(I found this post on a related subject: Extracting numbers from vectors of strings)
For the second answer, you can use gsub to remove everything from the string that's not a number, then split the string as follows:
unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2
For the first answer, similarly using strsplit,
unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1] 7 667 11 5 2
PS: don't name your variable list (as there's an inbuilt function list). I've named your data as ll.
Here is yet another answer, this one using gregexpr to find the numbers, and regmatches to extract them:
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
temp1 <- gregexpr("[0-9]", l) # Individual digits
temp2 <- gregexpr("[0-9]+", l) # Numbers with any number of digits
as.numeric(unique(unlist(regmatches(l, temp1))))
# [1] 7 6 1 5 2
as.numeric(unique(unlist(regmatches(l, temp2))))
# [1] 7 667 11 5 2
A solution using stringi
library(stringi)
# extract the numbers:
nums <- stri_extract_all_regex(list, "[0-9]+")
# Make vector and get unique numbers:
nums <- unlist(nums)
nums <- unique(nums)
And that's your first solution
For the second solution I would use substr:
nums_first <- sapply(nums, function(x) unique(substr(x,1,1)))
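Note that substr(x, 1, 1) keeps only the first digit of each number, which happens to give the expected result for this example. To collect every individual digit, one option is to split each number into single characters. In this sketch nums is typed in by hand (the values from the first solution) so that it runs without stringi.

```r
# Values of nums from the first solution, written out by hand
nums <- c("7", "667", "11", "5", "2")
# Splitting on "" breaks each string into single characters
digits <- unique(unlist(strsplit(nums, "")))
as.numeric(digits)
```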
You could use ?strsplit (as suggested in @Arun's answer in Extracting numbers from vectors (of strings)):
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
## split string at non-digits
s <- strsplit(l, "[^[:digit:]]")
## convert strings to numeric ("" become NA)
solution <- as.numeric(unlist(s))
## remove NA and duplicates
solution <- unique(solution[!is.na(solution)])
# [1] 7 667 11 5 2
A stringr solution with str_match_all and piped operators. For the first solution:
library(stringr)
str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric
Second solution:
str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric
(Note: I've also called the list ll)
Use strsplit with a pattern that matches anything other than the digits 0-9.
For the example you have provided, do this:
tmp <- sapply(list, function (k) strsplit(k, "[^0-9]"))
Then simply take a union of all "sets" in the list, like so:
tmp <- Reduce(union, tmp)
Then you only have to remove the empty string.
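Put together, the steps above might look like the following sketch. The names strings, parts and all_parts are illustrative, and lapply with [[1]] is used in place of sapply so that each piece is clearly a plain character vector.

```r
strings <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
# Split every string at each non-digit character
parts <- lapply(strings, function(k) strsplit(k, "[^0-9]")[[1]])
# Union of all pieces (duplicates removed, first-seen order kept)
all_parts <- Reduce(union, parts)
# Drop the empty string left by consecutive non-digits, then convert
nums <- as.numeric(setdiff(all_parts, ""))
nums
```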
Check out the str_extract_numbers() function from the strex package.
pacman::p_load(strex)
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
charvec <- unlist(list)
print(charvec)
#> [1] "djud7+dg[a]hs667" "7fd*hac11(5)" "2tu,g7gka5"
str_extract_numbers(charvec)
#> [[1]]
#> [1] 7 667
#>
#> [[2]]
#> [1] 7 11 5
#>
#> [[3]]
#> [1] 2 7 5
unique(unlist(str_extract_numbers(charvec)))
#> [1] 7 667 11 5 2
Created on 2018-09-03 by the reprex package (v0.2.0).