R - Select elements from a list that meet a criterion

I had a tough time selecting the elements of a list that satisfy a predicate function, so I am documenting the problem together with a solution.
check.digits <- function(x){ grepl('^(\\d+)$' , x) }
x = "741 abc pqr street 71 15 41 510741"
lx = strsplit(x, split = " ", fixed = TRUE)
lapply(lx, check.digits)
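This does not work, because lapply returns a list and a list is not a valid subscript (R reports "invalid subscript type 'list'"):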
lx[[1]][c(lapply(lx, check.digits))]
Use sapply instead, which simplifies the result so that it can serve as a logical subscript:
lx[[1]][sapply(lx, check.digits)]
thanks!!!
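As a minimal sketch of the difference, the snippet below reuses the question's data and shows two alternatives that avoid the list subscript altogether. The names tokens, keep, picked and picked_filter are illustrative only and do not come from the original post.

```r
check.digits <- function(x) grepl('^(\\d+)$', x)
x <- "741 abc pqr street 71 15 41 510741"
lx <- strsplit(x, split = " ", fixed = TRUE)

# lapply always returns a list, and a list cannot be used as a subscript
is.list(lapply(lx, check.digits))

# grepl is vectorised, so the tokens can be tested directly
tokens <- lx[[1]]
keep <- check.digits(tokens)
picked <- tokens[keep]

# Filter() applies the predicate to each element and keeps the matches
picked_filter <- Filter(check.digits, tokens)
```

Both picked and picked_filter hold the purely numeric tokens, in their original order.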

Given what you're after, perhaps you should just use gregexpr + regmatches:
regmatches(x, gregexpr("\\d+", x))
# [[1]]
# [1] "741" "71" "15" "41" "510741"
Or, from "qdapRegex", use rm_number:
library(qdapRegex)
rm_number(x, extract = TRUE)
# [[1]]
# [1] "741" "71" "15" "41" "510741"
Or, from "stringi", use stri_extract_all_regex:
library(stringi)
stri_extract_all_regex(x, "\\d+")
# [[1]]
# [1] "741" "71" "15" "41" "510741"
Add an [[1]] at the end if you're just dealing with a single string and are just interested in the single vector.
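One caveat worth sketching: extracting digit runs is not quite the same as keeping whole tokens that consist only of digits. The input y below is a hypothetical string, chosen only to show where the two approaches diverge.

```r
check.digits <- function(x) grepl('^(\\d+)$', x)

# Hypothetical input in which some digits are embedded in words
y <- "room12 7 b4 2020"

# Whole-token test: keeps only tokens made up entirely of digits
tokens <- strsplit(y, " ", fixed = TRUE)[[1]]
whole_tokens <- tokens[check.digits(tokens)]

# Digit-run extraction: also pulls digits out of mixed tokens
digit_runs <- regmatches(y, gregexpr("\\d+", y))[[1]]
```

Here whole_tokens contains only "7" and "2020", while digit_runs also picks up "12" and "4". For the string in the question both approaches agree, because every digit sits in a purely numeric token.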

Use
lx[[1]][sapply(lx, check.digits)]
[1] "741" "71" "15" "41" "510741"

Related

How could I split a nested list using str_split without the use of for loop?

Now I have a nested list containing something like this
[[1]]
[1] "53" "682, 684" "677" "683"
[[2]]
[1] "40, 43" "10" "44, 47"
and I want to split each element on ", " so that every entry holds a single number, like this:
[[1]]
[1] "53" "682" "684" "677" "683"
[[2]]
[1] "40" "43" "10" "44" "47"
My current idea is to use apply() with my own recursive function, but how can I detect how many values each element contains?
I'm not allowed to use a for loop, so what should I do?
Something like this?
I am not sure, since the format of the sample data in the question is unknown.
#create sample data
v1 <- c("53", "682, 684", "677", "683")
v2 <- c("40, 43", "10", "44, 47")
l <- list( v1, v2 )
#code
lapply( l, function(x) trimws( unlist ( strsplit( x, ",") ) ) )
#output
# [[1]]
# [1] "53" "682" "684" "677" "683"
#
# [[2]]
# [1] "40" "43" "10" "44" "47"
#
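A possible variation, assuming the same sample data: let the split pattern absorb the whitespace after the comma, so no separate trimws step is needed, and convert to numbers afterwards. The names split_l and num_l are illustrative.

```r
l <- list(c("53", "682, 684", "677", "683"),
          c("40, 43", "10", "44, 47"))

# ",[[:space:]]*" matches a comma plus any whitespace that follows it
split_l <- lapply(l, function(x) unlist(strsplit(x, ",[[:space:]]*")))

# Optional: turn every element into a numeric vector
num_l <- lapply(split_l, as.numeric)
```

No recursion is needed here, because strsplit is vectorised over the strings in each element and unlist flattens the pieces back into one vector.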

How to pass the length of each element in an R vector to the substr function?

I have the following vector.
v <- c('X100kmph','X95kmph', 'X90kmph', 'X85kmph', 'X80kmph',
'X75kmph','X70kmph','X65kmph','X60kmph','X55kmph','X50kmph',
'X45kmph','X40kmph','X35kmph','X30kmph','X25kmph','X20kmph',
'X15kmph','X10kmph')
I want to extract the digits representing speed. They all start at the 2nd position, but end at different places, so I need (length of element i) - 4 as the ending position.
The following doesn't work as length(v) returns the length of the vector and not of each element.
vnum <- substr(v, 2, length(v)-4)
I tried lengths() as well, but it doesn't work either, since for a character vector it returns 1 for every element.
How can I supply the length of each element to substr?
Context:
v actually represents a character column (called Speed) in a tibble which I'm trying to mutate into the corresponding numeric column.
mytibble <- mytibble %>%
mutate(Speed = as.numeric(substr(Speed, 2, length(Speed) - 4)))
Using nchar() instead of length() as suggested by tmfmnk does the trick!
vnum <- substr(v, 2, nchar(v)-4)
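To make the distinction concrete, here is a small sketch on a shortened version of the vector, comparing the three functions side by side:

```r
v <- c('X100kmph', 'X95kmph', 'X10kmph')

n_elements <- length(v)    # one number: how many elements the vector has
per_element <- lengths(v)  # 1 for each element of an atomic vector
n_chars <- nchar(v)        # number of characters in each string

# substr is vectorised over start and stop, so a vector of stops works
vnum <- substr(v, 2, nchar(v) - 4)
```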
If you just want to extract the digits, then here is another option
vnum <- gsub("\\D","",v)
such that
> vnum
[1] "100" "95" "90" "85" "80" "75" "70" "65" "60" "55"
[11] "50" "45" "40" "35" "30" "25" "20" "15" "10"
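Since the end goal in the question is a numeric column, the result still has to be converted with as.numeric. A minimal base R sketch follows, using a hypothetical data frame named speeds in place of the asker's tibble.

```r
# Hypothetical stand-in for the tibble from the question
speeds <- data.frame(Speed = c("X100kmph", "X95kmph", "X10kmph"),
                     stringsAsFactors = FALSE)

# Drop every non-digit character, then convert the remainder to numeric
speeds$Speed <- as.numeric(gsub("\\D", "", speeds$Speed))
```

The same right-hand side should also work inside mutate(), as should as.numeric(substr(Speed, 2, nchar(Speed) - 4)).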

Remove elements from a list in R based on a condition

I have a list l in R as shown below. I want to remove elements where the only alphanumeric character is 0. How can I do that?
# Create list
l <- list(c('108', '50', '0]'), c('109','58','0','0]'), c('18','0'))
l
[[1]]
[1] "108" "50" "0]"
[[2]]
[1] "109" "58" "0" "0]"
[[3]]
[1] "18" "0"
# What I want:
l
[[1]]
[1] "108" "50"
[[2]]
[1] "109" "58"
[[3]]
[1] "18"
We can use grepl to match elements that are exactly 0 or that contain a ], and negate (!) the result to drop those values from the list elements
lapply(l, function(x) x[!grepl("^0$|\\]", x)])
#[[1]]
#[1] "108" "50"
#[[2]]
#[1] "109" "58"
#[[3]]
#[1] "18"
Or convert to numeric and remove the NA elements along with 0 (as.numeric warns that NAs are introduced by coercion)
lapply(l, function(x) x[!is.na(as.numeric(x)) & x != 0])
Or use setdiff
lapply(l, setdiff, c("0", "0]"))
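One thing to keep in mind with setdiff is that it treats its input as a set, so repeated values are collapsed. A short sketch with a hypothetical vector dup that contains a duplicate:

```r
# Hypothetical element with a repeated value
dup <- c("18", "18", "0", "0]")
unwanted <- c("0", "0]")

via_setdiff <- setdiff(dup, unwanted)  # duplicates are dropped
via_in <- dup[!dup %in% unwanted]      # duplicates are preserved
```

If duplicates matter, the %in% form is probably the safer choice.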
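I believe this is a general-purpose way: strip every non-digit character, then keep only the strings that still contain something besides a single 0.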
l2 <- lapply(l, function(s) {
s <- gsub('[^[:digit:]]', '', s)
s[nchar(sub('([^0]*)0([^0]*)', '\\1\\2', s)) != 0]
})
l2
#[[1]]
#[1] "108" "50"
#
#[[2]]
#[1] "109" "58"
#
#[[3]]
#[1] "18"
An even more general solution, which also removes potential elements like "&% 00]" (where the only alphanumeric characters are 0):
lapply(l, function(x) x[grep('^[0[:punct:][:blank:]]*$', x, invert = TRUE)])
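A quick way to check this pattern is to run it on a hypothetical vector with a few awkward cases. The sketch uses grepl with negation, which selects the same elements as grep with invert = TRUE.

```r
# Hypothetical test cases, including repeated zeros and punctuation
x <- c("108", "&% 00]", "0", "50", "0]", "00")

# TRUE where the string consists only of zeros, punctuation and blanks
only_zero <- grepl('^[0[:punct:][:blank:]]*$', x)
kept <- x[!only_zero]
```

Because the pattern ends in *, an empty string would also count as a match and be dropped.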

R: Extract first two digits from nested list elements

For the following vector, I would like to keep only the first two digits of each integer:
a <- c('1234 2345 345 234', '323 55432 443', '43 23345 321')
I've attempted to do this by converting the vector into a nested list using strsplit and then applying substr to the list:
a <- strsplit(a, ' ')
a <- substr(a, start = 1, stop = 2)
However, this seems to extract just the beginning of the concatenation command:
a
[1] "c(" "c(" "c("
Ideally, I would be able to coerce the vector into the following form:
[[1]]
[1] "12" "23" "34" "23"
[[2]]
[1] "32" "55" "44"
[[3]]
[1] "43" "23" "32"
How about
lapply(strsplit(a, " "), substr, 1, 2)
This explicitly does an lapply over the results of strsplit. Your original attempt fails because substr() tries to coerce your list to a character vector first (it doesn't expect a list as its first parameter). You can see what it's looking at if you do
as.character(strsplit(a, ' '))
# [1] "c(\"1234\", \"2345\", \"345\", \"234\")" "c(\"323\", \"55432\", \"443\")"
# [3] "c(\"43\", \"23345\", \"321\")"
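Putting the two behaviours side by side, as a sketch with illustrative names (parts, coerced, wrong, first_two):

```r
a <- c('1234 2345 345 234', '323 55432 443', '43 23345 321')
parts <- strsplit(a, ' ')

# What substr() effectively sees when it is handed the list
coerced <- as.character(parts)
wrong <- substr(coerced, 1, 2)

# Applying substr() to each list element separately gives the intended result
first_two <- lapply(parts, substr, start = 1, stop = 2)
```

Here wrong holds the first two characters of each deparsed c(...) string, while first_two keeps the list structure with the truncated numbers.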
We can also extract the first two digits that follow a word boundary
library(stringr)
str_extract_all(a, "\\b\\d{2}")
#[[1]]
#[1] "12" "23" "34" "23"
#[[2]]
#[1] "32" "55" "44"
#[[3]]
#[1] "43" "23" "32"
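If adding a package is not an option, the same pattern can be used from base R. This is a sketch that assumes perl = TRUE for the word-boundary handling:

```r
a <- c('1234 2345 345 234', '323 55432 443', '43 23345 321')

# gregexpr finds every match per string, regmatches extracts the text
first_two <- regmatches(a, gregexpr("\\b\\d{2}", a, perl = TRUE))
```

The result is a list with one character vector per input string, mirroring what str_extract_all returns.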

Extract from string on varying conditions

I am trying to extract characters only and numbers only from a string. Because the positions of these vary, I can't use syntax which relies on the position of the values.
For example, say I have the following column x where values are repeated, but with different numbers:
x <- c("dummy.DR57", "dummy.hour41", "dummy.MAV43", "dummy.SB1")
I want to create two columns:
1: A column with just the characters after the "." but before the numbers:
name <- c("DR", "hour", "MAV", "SB")
2: A column with just the numbers:
number <- c("57", "41", "43", "1")
I've mostly been trying substr and str_sub - but I'm not getting the results I need.
Any help is much appreciated!
x <- c("dummy.DR57", "dummy.hour41", "dummy.MAV43", "dummy.SB1")
(number <- gsub('[[:alpha:].]', '', x))
# [1] "57" "41" "43" "1"
(name <- gsub("[^.]*[.]|[[:digit:]]", "", x))
# [1] "DR" "hour" "MAV" "SB"
> gsub(x, pattern = '[0-9]|dummy\\.', replacement = '')
[1] "DR" "hour" "MAV" "SB"
> gsub(x, pattern = '[a-zA-Z]|\\.', replacement = '')
[1] "57" "41" "43" "1"
You may try this:
gsub(pattern = "(^.*\\.)([[:alpha:]]+)([[:digit:]]+)",
replacement = "\\2",
x = x)
# [1] "DR" "hour" "MAV" "SB"
gsub(pattern = "(^.*\\.)([[:alpha:]]+)([[:digit:]]+)",
replacement = "\\3",
x = x)
# [1] "57" "41" "43" "1"
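Both pieces can also be captured in a single pass with regexec, which returns the capture groups alongside the full match. A sketch under the assumption that every value has the form prefix, dot, letters, digits; the names m, parts and result are illustrative.

```r
x <- c("dummy.DR57", "dummy.hour41", "dummy.MAV43", "dummy.SB1")

# Each list element holds: full match, letters group, digits group
m <- regmatches(x, regexec("\\.([[:alpha:]]+)([[:digit:]]+)$", x))

# Stack the per-string results into a character matrix, one row per input
parts <- do.call(rbind, m)

result <- data.frame(name = parts[, 2],
                     number = parts[, 3],
                     stringsAsFactors = FALSE)
```

This yields the two requested columns directly. A value that does not fit the assumed form produces an empty match, so the rbind step would need extra care in that case.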
