I have hundreds of observations of census data. Each feature is stored within a list, and these lists are stored in a list named census. I am trying to perform two actions:
a) on all elements of all lists: I want to make all non-character elements numeric.
b) on a named element present in each list: I want to remove a prefix from that named element (POA_CODE) in every list.
A toy example is below; census is a list of lists:
library(tidyverse)
library(purrr)
POA_CODE = c("POA101","POA102")
dogs = c(4,4)
cats = c(3,2)
children = c(0, 1)
salary = c(100, 120)
employed.prop = c(1,0.5)
pets <- list(POA_CODE, as.integer(dogs), as.integer(cats))
children <-list(POA_CODE, as.integer(children))
employment <-list(POA_CODE, salary, employed.prop)
census <- list(pets, children, employment)
My attempt to change all non-character elements in every list to numeric:
#change all non-character elements to numeric
census_num <- census %>%
map(function(x){
ifelse(is.character == TRUE, x,
as.numeric(x))}
)
I get the following error message:
Error in is.character == TRUE :
comparison (1) is possible only for atomic and list types
My attempt to remove the "POA" prefix from every postcode in census[[i]]$POA_CODE:
#Remove "POA" prefix from every postcode
census_code <- pmap(census, ~.x[["POA_CODE"]],function(x){
str_replace(POA_CODE,"POA","")
})
I get the error
Error: Element 2 of `.l` must have length 1 or 3, not 2
You have a nested list, so you need nested maps:
library(purrr)
map(census, function(x) map_if(x, is.character, ~as.numeric(sub('POA', '', .x))))
#[[1]]
#[[1]][[1]]
#[1] 101 102
#[[1]][[2]]
#[1] 4 4
#[[1]][[3]]
#[1] 3 2
#[[2]]
#[[2]][[1]]
#[1] 101 102
#[[2]][[2]]
#[1] 0 1
#[[3]]
#[[3]][[1]]
#[1] 101 102
#[[3]][[2]]
#[1] 100 120
#[[3]][[3]]
#[1] 1.0 0.5
In base R, we can solve it with nested lapply:
lapply(census, function(x) lapply(x, function(y)
if(is.character(y)) as.numeric(sub('POA', '', y)) else y))
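These answers work by position because the toy lists in the question are unnamed, which is also why census[[i]]$POA_CODE returns NULL. A minimal base R sketch, assuming a hypothetical named version of the same data (census_named, with element names such as dogs and salary chosen only for illustration), that handles (b) by name and then (a) by converting every non-character element to double:

```r
# Hypothetical named version of the toy data: the question's inner lists
# are unnamed, so x$POA_CODE would return NULL on them.
codes <- c("POA101", "POA102")
census_named <- list(
  pets       = list(POA_CODE = codes, dogs = c(4L, 4L), cats = c(3L, 2L)),
  children   = list(POA_CODE = codes, children = c(0L, 1L)),
  employment = list(POA_CODE = codes, salary = c(100, 120),
                    employed.prop = c(1, 0.5))
)

census_clean <- lapply(census_named, function(x) {
  # (b) strip the leading "POA" from the named element only
  x$POA_CODE <- sub("^POA", "", x$POA_CODE)
  # (a) convert every element that is not character to double
  non_chr <- !vapply(x, is.character, logical(1))
  x[non_chr] <- lapply(x[non_chr], as.numeric)
  x
})

str(census_clean$pets)
```

Because the lookup is by name, this keeps working if the position of POA_CODE differs between the inner lists.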
You could use rapply() in base R:
rapply(
census,
function(x) if(is.character(x)) as.numeric(sub("^\\D+","", x)) else x,
how = "replace")
#> [[1]]
#> [[1]][[1]]
#> [1] 101 102
#>
#> [[1]][[2]]
#> [1] 4 4
#>
#> [[1]][[3]]
#> [1] 3 2
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 101 102
#>
#> [[2]][[2]]
#> [1] 0 1
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 101 102
#>
#> [[3]][[2]]
#> [1] 100 120
#>
#> [[3]][[3]]
#> [1] 1.0 0.5
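rapply() can also do the type check itself through its classes argument, so the function only ever sees character leaves. A sketch on the question's toy data, rebuilt inline so the block runs on its own; the "^POA" pattern is specific to this data, unlike the more general "^\\D+" used above:

```r
# Toy data from the question (unnamed inner lists)
POA_CODE <- c("POA101", "POA102")
census <- list(
  list(POA_CODE, c(4L, 4L), c(3L, 2L)),
  list(POA_CODE, c(0L, 1L)),
  list(POA_CODE, c(100, 120), c(1, 0.5))
)

# classes = "character" restricts f to character leaves; with
# how = "replace" every other leaf (the integer and double vectors)
# is returned unchanged and the nesting is preserved.
census_num <- rapply(
  census,
  function(x) as.numeric(sub("^POA", "", x)),
  classes = "character",
  how = "replace"
)

census_num[[1]]
```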
Or with purrr::map_depth():
library(purrr)
map_depth(census, 2, ~if(is.character(.)) as.numeric(sub("^\\D+","", .)) else .)
#> [[1]]
#> [[1]][[1]]
#> [1] 101 102
#>
#> [[1]][[2]]
#> [1] 4 4
#>
#> [[1]][[3]]
#> [1] 3 2
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 101 102
#>
#> [[2]][[2]]
#> [1] 0 1
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 101 102
#>
#> [[3]][[2]]
#> [1] 100 120
#>
#> [[3]][[3]]
#> [1] 1.0 0.5
We can use rrapply() from the rrapply package together with readr::parse_number():
library(rrapply)
library(readr)
rrapply(census, f = function(x) if(is.character(x)) readr::parse_number(x) else x)
#[[1]]
#[[1]][[1]]
#[1] 101 102
#[[1]][[2]]
#[1] 4 4
#[[1]][[3]]
#[1] 3 2
#[[2]]
#[[2]][[1]]
#[1] 101 102
#[[2]][[2]]
#[1] 0 1
#[[3]]
#[[3]][[1]]
#[1] 101 102
#[[3]][[2]]
#[1] 100 120
#[[3]][[3]]
#[1] 1.0 0.5
gregexpr returns a list containing a vector of match positions, with some additional data stored as attributes:
[[1]]
[1] 21 136 409 512 587 693
attr(,"match.length")
[1] 3 4 5 5 4 9
How do I extract just one element together with its corresponding attribute value in one step? For example, for the third match:
[[1]]
[1] 409
attr(,"match.length")
[1] 5
Update: the final object must be compatible with the regmatches() function.
In general, there's no way for R to know that elements of the vector correspond 1-1 with elements of one of its attributes.
If you know this is true (as it is with gregexpr results), then the way to tell R about it is to set a class on the object and write your own subsetting method for that class. For example:
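`[.gregexpr_result` <- function(x, i) {
  # For every element, keep all of its attributes but subset
  # match.length with the same index used on the positions
  attrs <- lapply(x, function(element) {
    allattrs <- attributes(element)
    allattrs[["match.length"]] <- allattrs[["match.length"]][i]
    allattrs
  })
  # Subset the match positions themselves (this drops the attributes)
  x <- lapply(x, `[`, i)
  # Put the adjusted attributes back onto each subsetted element
  for (j in seq_along(x))
    attributes(x[[j]]) <- attrs[[j]]
  x
}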
x <- paste(letters[1:2], letters[1:2])
result <- gregexpr("b", x)
class(result) <- "gregexpr_result"
result
#> [[1]]
#> [1] -1
#> attr(,"match.length")
#> [1] -1
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
#>
#> [[2]]
#> [1] 1 3
#> attr(,"match.length")
#> [1] 1 1
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
#>
#> attr(,"class")
#> [1] "gregexpr_result"
result[2]
#> [[1]]
#> [1] NA
#> attr(,"match.length")
#> [1] NA
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
#>
#> [[2]]
#> [1] 3
#> attr(,"match.length")
#> [1] 1
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
Created on 2022-11-20 with reprex v2.0.2
We may do
out <- lapply(lst1, `[`, 3)
attr(out, "match.length") <- attr(lst1, "match.length")[3]
-output
> out
[[1]]
[1] 409
attr(,"match.length")
[1] 5
data
lst1 <- structure(list(c(21, 136, 409, 512, 587, 693)),
                  match.length = c(3, 4, 5, 5, 4, 9))
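If the goal is only to feed one match per string into regmatches(), copying the relevant attributes by hand may be enough, without defining a class. A minimal sketch; pick_match() is a hypothetical helper and the strings are made up for illustration. It assumes every string has at least i matches:

```r
# Hypothetical helper: keep the i-th match of every element of a
# gregexpr() result, subsetting match.length in step with the positions.
pick_match <- function(m, i) {
  lapply(m, function(el) {
    out <- el[i]  # subsetting drops all attributes
    attr(out, "match.length") <- attr(el, "match.length")[i]
    # carry over the attributes regmatches() inspects
    attr(out, "index.type") <- attr(el, "index.type")
    attr(out, "useBytes")   <- attr(el, "useBytes")
    out
  })
}

x <- c("one two three", "four five")
m <- gregexpr("[a-z]+", x)
second <- pick_match(m, 2)

# The reduced object can be passed straight to regmatches()
regmatches(x, second)
```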
I want to generate all subsets of {1,2,3,4} with only consecutive numbers. (For example, I want the subsets {1}, {1,2} or {2,3,4} but not {2,4}.)
This is what I have been trying:
library(ggm)
p2<-powerset(1:4, sort = TRUE, nonempty = TRUE)
m2<-p2
for (i in 1:length(p2)){
  ifelse(length(p2[[i]]) < 2, m2 <- m2,
         ifelse(max(diff(as.numeric(p2[[i]]))) > 1, m2 <- m2[-c(i)], m2 <- m2))
}
I want to first generate the power set of {1,2,3,4} and then exclude the subsets with non-consecutive numbers. But when I run the
m2<-m2[- c(i)]
command in the second ifelse to exclude those subsets, I believe I change the indices of the power set, so I keep getting the wrong subsets.
Any suggestions on how to do it correctly?
Thanks!
You can get all unique ascending sequences between 1 and 4 in base R with the following one-liner:
apply(which(upper.tri(diag(4), TRUE), TRUE), 1, function(x) x[1]:x[2])
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 2
#>
#> [[4]]
#> [1] 1 2 3
#>
#> [[5]]
#> [1] 2 3
#>
#> [[6]]
#> [1] 3
#>
#> [[7]]
#> [1] 1 2 3 4
#>
#> [[8]]
#> [1] 2 3 4
#>
#> [[9]]
#> [1] 3 4
#>
#> [[10]]
#> [1] 4
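The loop in the question most likely goes wrong because deleting m2[i] shifts every later element down by one, so i no longer points at the subset being tested. A sketch of the filter-instead-of-delete approach, assuming base R's combn() as a stand-in for ggm::powerset() (the names power_set and consecutive are illustrative):

```r
# The non-empty power set of 1:4, built with base R's combn():
# all subsets of size 1, then size 2, and so on, each in increasing order.
v <- 1:4
power_set <- unlist(
  lapply(seq_along(v), function(k) combn(v, k, simplify = FALSE)),
  recursive = FALSE
)

# Keep a subset only if each element is exactly 1 more than the previous.
# Filter() returns a new list, so nothing is deleted while iterating and
# no index shifts. Single-element subsets pass: all(logical(0)) is TRUE.
consecutive <- Filter(function(s) all(diff(s) == 1), power_set)

length(consecutive)  # 10
```

Generating the runs directly, as in the one-liner above, avoids building the full power set, which matters for larger sets since the power set grows as 2^n.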
I was pondering on this after having come across another question.
library(tidyverse)
set.seed(42)
df <- data.frame(x = cut(runif(100), c(0,25,75,125,175,225,299)))
tidyr::extract does a nice job splitting into groups defined by the regex:
df %>%
extract(x, c("start", "end"), "(\\d+),(\\d+)") %>% head
#> start end
#> 1 0 25
#> 2 0 25
#> 3 0 25
#> 4 0 25
#> 5 0 25
#> 6 0 25
I want the same output from a character vector. I know I could just write a new function, but I wondered whether one already exists.
x_chr <- as.character(df$x)
des_res <- str_split(str_extract(x_chr, "(\\d+),(\\d+)"), ",")
head(des_res)
#> [[1]]
#> [1] "0" "25"
#>
#> [[2]]
#> [1] "0" "25"
#>
#> [[3]]
#> [1] "0" "25"
#>
#> [[4]]
#> [1] "0" "25"
#>
#> [[5]]
#> [1] "0" "25"
#>
#> [[6]]
#> [1] "0" "25"
You can use strcapture in base R:
strcapture("(\\d+),(\\d+)", x_chr,
proto = list(start = numeric(), end = numeric()))
# start end
#1 0 25
#2 0 25
#3 0 25
#4 0 25
#5 0 25
#6 0 25
#...
#...
You can also use stringr::str_match:
stringr::str_match(x_chr, "(\\d+),(\\d+)")[, -1]
In str_match, the first column holds the complete match and each subsequent column holds one capture group, so [, -1] keeps only the capture groups.
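In base R, the same split can be done with regexec(), which records the full match plus one entry per capture group, and regmatches(), which turns those positions into strings. A sketch with made-up interval labels; it assumes every string matches, because a non-matching string would yield character(0) and be silently dropped by rbind():

```r
x_chr <- c("(0,25]", "(25,75]", "(75,125]")

# Each element of parts is c(full_match, group1, group2)
parts <- regmatches(x_chr, regexec("(\\d+),(\\d+)", x_chr))

# Drop the full match and stack the capture groups into a matrix
m <- do.call(rbind, lapply(parts, `[`, -1))

# Convert the character groups to numbers, like extract(convert = TRUE)
out <- data.frame(start = as.numeric(m[, 1]), end = as.numeric(m[, 2]))
out
```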
How do I remove an element from a list in R?
Imagine this workflow:
# create list
my_list <- lapply(1:10, function(x) x)
# find which ones to exclude
my_list_boolean <- sapply(my_list, function(x) ifelse(x%%2>0,F,T))
# does not work like this!
my_list[[my_list_boolean]]
Is there a solution that does not require a for loop and a lot of extra logic around my statement?
Just use [] instead of [[]]:
my_list <- lapply(1:10, function(x) x)
# find which ones to exclude
my_list_boolean <- sapply(my_list, function(x) ifelse(x%%2>0,F,T))
# [] accepts a logical vector; [[]] does not
my_list[my_list_boolean]
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 6
#>
#> [[4]]
#> [1] 8
#>
#> [[5]]
#> [1] 10
Created on 2018-11-03 by the reprex package (v0.2.1)
[] selects elements of the list (returning a sub-list) and accepts a logical vector, whereas [[]] extracts the content of a single element.
Do you mean this?
my_list[my_list_boolean]
#[[1]]
#[1] 2
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 6
#
#[[4]]
#[1] 8
#
#[[5]]
#[1] 10
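Note that my_list_boolean is TRUE for the elements the question wants to exclude, so my_list[my_list_boolean] returns exactly those elements. To drop them instead, negate the index or assign NULL. A short sketch of three equivalent options; to_exclude is an illustrative name that replaces the sapply/ifelse flag with a direct logical test:

```r
my_list <- lapply(1:10, function(x) x)

# TRUE marks the elements to exclude (the even numbers)
to_exclude <- vapply(my_list, function(x) x %% 2 == 0, logical(1))

# Option 1: keep everything that is NOT flagged
kept <- my_list[!to_exclude]

# Option 2: assigning NULL through [] deletes the flagged elements
my_list2 <- my_list
my_list2[to_exclude] <- NULL

# Option 3: Filter() keeps the elements for which the predicate is TRUE
kept2 <- Filter(function(x) x %% 2 != 0, my_list)

kept
```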
I have a list of values called squares and would like to replace all values that are 0 with 40.
I tried:
replace(squares, squares==0, 40)
but the list remains unchanged
If it is a list of vectors, then loop through the list with lapply and use replace on each element:
squares <- lapply(squares, function(x) replace(x, x==0, 40))
squares
#[[1]]
#[1] 40 1 2 3 4 5
#[[2]]
#[1] 1 2 3 4 5 6
#[[3]]
#[1] 40 1 2 3
data
squares <- list(0:5, 1:6, 0:3)
I think for this purpose, you can just treat it as if it were a vector as follows:
squares=list(2,4,6,0,8,0,10,20)
squares[squares==0]=40
Output:
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 40
[[5]]
[1] 8
[[6]]
[1] 40
[[7]]
[1] 10
[[8]]
[1] 20
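One likely reason the question's replace() call appeared to do nothing is that replace() returns a modified copy rather than changing squares in place, so its result has to be assigned back. A sketch covering both shapes of data used in the answers; the variable names flat and nested, and the extra nested level, are illustrative assumptions:

```r
# Case 1: a flat list of single numbers, as in the second answer.
# replace() returns a modified copy, so assign the result back.
flat <- list(2, 4, 6, 0, 8, 0, 10, 20)
flat <- replace(flat, flat == 0, 40)

# Case 2: a list (possibly nested) of vectors. rapply() applies the
# function to every leaf vector and, with how = "replace", keeps the
# list structure intact.
nested <- list(0:5, 1:6, list(0:3, c(0, 7)))
nested <- rapply(nested, function(x) replace(x, x == 0, 40), how = "replace")

str(nested)
```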