Why is my unlist result shorter than my list? - r

Why does unlisting my list data structure result in a different length? The length of the list is 13951. The length of the unlisted result is 13654. There are no NULL's or NA's.
> class(price)
[1] "list"
> head(price)
$`570`
[1] 0
$`440`
[1] 0
$`730`
[1] 1499
$`304930`
[1] 0
$`550`
[1] 1999
$`230410`
[1] 0
length(names(price)) # 13951
length(price) # 13951
length(unlist(price)) # 13654
> sum(is.na(price))
[1] 0
> sum(is.null(price))
[1] 0
How do I ensure the unlist length is the same as the list length?
-- ATTEMPTED SOLUTION BELOW:
> out <- do.call(c, lapply(price, (function(x) {
+ if (is.null(x)) {NA} else { x }
+ })))
> length(out) #2
[1] 13654
> table(sapply(price, class))
numeric
13951

One thing is to count elements in the list. Another is to count their lengths...
a.list <- list(a= 1, b= NULL, c = list())
length(a.list) # 3
sum(lengths(a.list)) # 1 {as suggested by Nicola}
# same as: sum(sapply(a.list, length)) # 1
I assume some of your named list elements are NULL or have length 0. unlist() drops those, which is why the result is shorter.
length(unlist(a.list)) #1
If you want one value per named element (replacing NULL or empty elements with NA), you could do as follows.
out <- do.call(c, lapply(a.list, (function(x) {
if (is.null(x) || length(x) == 0) {NA} else { x }
})))
length(out) #3
This assumes you have no 2-level lists.
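To see why the checks in the question came back clean, note that is.null(price) tests the list itself rather than its elements, and is.na() on a list reports FALSE for zero-length elements. Since table(sapply(price, class)) shows only "numeric", the missing entries are most likely numeric(0). The sketch below uses a small made-up list (not the asker's data) to locate such elements with lengths() and fill them before unlisting.

```r
# Toy stand-in for the asker's list; the `440` entry is deliberately empty
price <- list(`570` = 0, `440` = numeric(0), `730` = 1499)

# lengths() reports the length of every element, so zero-length ones stand out
lengths(price)

# Names of the elements that unlist() would silently drop
empty <- names(price)[lengths(price) == 0]

# Replace the empty elements with NA so every name keeps exactly one value
price[lengths(price) == 0] <- NA_real_
out <- unlist(price)

length(out) == length(price)
```

After the replacement, length(out) equals length(price) and the previously empty entries show up as NA.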

Related

Keeping elements (from list of vectors) that do not have a proper subset within that list (in R)

Proper subset: A proper subset S' of a set S is a subset that is strictly contained in S and so excludes S itself (note I am also excluding the empty set).
Suppose you have the following vectors in a list:
a = c(1,2)
b = c(1,3)
c = c(2,4)
d = c(1,2,3,4)
e = c(2,4,5)
f = c(1,2,3)
My aim is to keep only vectors which have no proper subset within the list, which in this example would be a, b and c. The following code is my solution,
possibilities = list(a,b,c,d,e,f)
final.list <- possibilities
for (i in possibilities) {
for (j in rev(possibilities)) {
if (all(i %in% j) & !all(j %in% i)) {
final.list <- final.list[!(final.list %in% list(j))]
} else {
final.list <- final.list
}
}
}
which gives the intended output, though I am concerned with the scalability of this approach. Does anyone have an idea for a more efficient approach? Thanks!
* Note that for my true purpose the length of the possibilities list--and its sub-vectors--can grow quite large.
One purrr option (with library(purrr) loaded) could be:
map2(.x = possibilities,
.y = seq_along(possibilities),
~ !any(map_lgl(possibilities[-.y], function(z) all(z %in% .x))))
[[1]]
[1] TRUE
[[2]]
[1] TRUE
[[3]]
[1] TRUE
[[4]]
[1] FALSE
[[5]]
[1] FALSE
[[6]]
[1] FALSE
To keep only the target vectors:
keep(possibilities,
map2_lgl(.x = possibilities,
.y = seq_along(possibilities),
~ !any(map_lgl(possibilities[-.y], function(z) all(z %in% .x)))))
[[1]]
[1] 1 2
[[2]]
[1] 1 3
[[3]]
[1] 2 4
Here is a base R option
final.list <- subset(
possibilities,
sapply(
seq_along(possibilities),
function(k) {
!any(sapply(
possibilities[-k],
function(v) all(v %in% possibilities[[k]]) & length(v) < length(possibilities[[k]])
))
}
)
)
which gives
> final.list
[[1]]
[1] 1 2
[[2]]
[1] 1 3
[[3]]
[1] 2 4
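On the scalability concern, one possible refinement is sketched here. It assumes every vector holds distinct values (so it behaves like a set); under that assumption a proper subset must be strictly shorter, so each vector only needs to be compared against the shorter ones. The names n, has_proper_subset and minimal are made up for this illustration.

```r
possibilities <- list(c(1, 2), c(1, 3), c(2, 4), c(1, 2, 3, 4), c(2, 4, 5), c(1, 2, 3))

# Pre-compute the length of every vector once
n <- lengths(possibilities)

has_proper_subset <- vapply(seq_along(possibilities), function(k) {
  # Only strictly shorter vectors can be proper subsets of possibilities[[k]]
  shorter <- possibilities[n < n[k]]
  any(vapply(shorter, function(v) all(v %in% possibilities[[k]]), logical(1)))
}, logical(1))

# Keep the vectors that contain no other vector of the list as a proper subset
minimal <- possibilities[!has_proper_subset]
```

This is still quadratic in the worst case, but it skips every comparison that cannot succeed.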

Finding something with loops by indexing in R

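Create a for loop that checks whether a numeric vector has at least two zeros in a row. If so, it prints the position (index) of the first zero.
Here is a version using rle + inverse.rle. It treats x as a single number, splits it into digits, and returns the positions of all zeros that belong to a run of two or more: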
findConZeros <- function(x) {
u <- rle(unlist(strsplit(as.character(x),""))==0)
u$values <- with(u,lengths>=2&values)
which(inverse.rle(u))
}
such that
> findConZeros(1200011)
[1] 3 4 5
> findConZeros(40400)
[1] 4 5
Below is a for loop version
findConZeros_forloop <- function(x) {
s <- unlist(strsplit(as.character(x),""))
res <- c()
for (i in seq_along(s)) {
if (i < length(s) && all(s[i:(i+1)] == "0")) res <- c(res, i, i+1)
}
unique(res)
}
which gives
> findConZeros_forloop(1200011)
[1] 3 4 5
> findConZeros_forloop(40400)
[1] 4 5
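Both versions above work on the digits of a single number and return every position inside a run of zeros. If the input is an ordinary numeric vector and only the index of the first zero is wanted, as the task is worded, a plain loop along these lines may be closer. The function name first_double_zero is made up for this sketch, and it returns the index rather than printing it.

```r
# Hypothetical helper: index of the first zero that is directly followed by another zero
first_double_zero <- function(x) {
  # Stop one element before the end so that x[i + 1] always exists
  for (i in seq_len(max(length(x) - 1, 0))) {
    if (x[i] == 0 && x[i + 1] == 0) {
      return(i)
    }
  }
  # No two consecutive zeros were found
  NA_integer_
}

first_double_zero(c(1, 2, 0, 0, 0, 1, 1))
first_double_zero(c(4, 0, 4, 0, 0))
```

The sketch assumes x contains no NA values; an NA next to a zero would need an explicit check.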

problem applying a function to a list in R

I have created a function that converts "YYYYQQ" to integer YYYYMMDD. The function works well with individual values in a list but not on the whole list. I am unable to understand the warning message.
GetProperDate <- function(x) {
x <- as.character(x)
q<-substr(x, 5, 6)
y<-substr(x, 1,4) %>% as.numeric()
if(q=="Q1"){
x <- as.integer(paste0(y,"03","31"))
}
if(q=="Q2"){
x <- as.integer(paste0(y,"06","30"))
}
if(q=="Q3"){
x <- as.integer(paste0(y,"09","30"))
}
if(q=="Q4"){
x <- as.integer(paste0(y,"12","31"))
}
return(x)
}
> GetProperDate("2019Q1")
[1] 20190331
> GetProperDate("2019Q2")
[1] 20190630
> GetProperDate("2019Q3")
[1] 20190930
> GetProperDate("2019Q4")
[1] 20191231
> date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
> date.list.converted<- date.list %>% GetProperDate()
Warning messages:
1: In if (q == "Q1") { :
the condition has length > 1 and only the first element will be used
2: In if (q == "Q2") { :
the condition has length > 1 and only the first element will be used
3: In if (q == "Q3") { :
the condition has length > 1 and only the first element will be used
4: In if (q == "Q4") { :
the condition has length > 1 and only the first element will be used
> date.list.converted
[1] 20190331 20190331 20190331 20190331
>
As shown above I am getting a warning message and the output is not as expected.
The issue is that your function GetProperDate is not vectorised. if expects a single TRUE or FALSE as its condition, not a vector. You may switch to ifelse, which is vectorised, and rewrite your function.
Apart from that you can also use as.yearqtr from zoo which is used to handle quarterly dates and get the last date of the quarter by using frac = 1.
as.Date(zoo::as.yearqtr(date.list), frac = 1)
#[1] "2019-03-31" "2019-06-30" "2019-09-30" "2019-12-31"
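The ifelse rewrite mentioned above could look roughly like the following. This is only a sketch: the name GetProperDateVec is made up to avoid overwriting the original function, and returning NA for anything other than Q1 to Q4 is an assumption, since the question does not say what should happen for invalid input.

```r
# Hypothetical vectorised variant of GetProperDate built on nested ifelse()
GetProperDateVec <- function(x) {
  x <- as.character(x)
  q <- substr(x, 5, 6)
  y <- substr(x, 1, 4)
  # Map each quarter to the month and day of its last date; unknown quarters become NA
  month_day <- ifelse(q == "Q1", "0331",
               ifelse(q == "Q2", "0630",
               ifelse(q == "Q3", "0930",
               ifelse(q == "Q4", "1231", NA_character_))))
  res <- paste0(y, month_day)
  # paste0() would turn NA into the text "NA", so restore the missing values
  res[is.na(month_day)] <- NA_character_
  as.integer(res)
}

date.list <- c("2019Q1", "2019Q2", "2019Q3", "2019Q4")
GetProperDateVec(date.list)
```

Because ifelse evaluates the comparison for every element, the whole vector is converted in one call with no warning.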
When you pass a vector to the function, if compares a vector with a scalar, so the condition has length greater than 1. R then uses only the first element of the condition, which is why you get the warning "the condition has length > 1 and only the first element will be used" (since R 4.2.0 this is an error rather than a warning). Try this
date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
date.list.converted <- sapply(date.list, function(s) GetProperDate(s))
Try this:
library(tidyverse)
GetProperDate <- function(x) {
x <- as.character(x)
q <- substr(x, 5, 6)
y <- substr(x, 1,4) %>%
as.numeric()
x <- case_when(
q=="Q1" ~ as.integer(paste0(y,"03","31")),
q =="Q2" ~ as.integer(paste0(y,"06","30")),
q == "Q3" ~ as.integer(paste0(y,"09","30")),
TRUE ~ as.integer(paste0(y,"12","31")))
return(x)
}
date.list<-c("2019Q1","2019Q2","2019Q3","2019Q4")
GetProperDate(date.list)
> GetProperDate(date.list)
[1] 20190331 20190630 20190930 20191231

Match strings with partial matching allowed but only when there's a unique match

I have a vector of names and another vector v that I need to match against names. I want to receive the indices of names where v matches. Partial matching should be allowed, but only when the partial match is unique.
The following example covers all relevant cases:
names <- c("a", "b", "c", "ab", "def", "defg", "hij")
v1 <- c("a", "b")
v2 <- c("a", "ab")
v3 <- c("d")
v4 <- c("h")
v5 <- c("a", "b", "a")
I expect the following outputs:
match_names(v1, names)
# c(1, 2)
match_names(v2, names)
# c(1, 4)
match_names(v3, names)
# error
match_names(v4, names)
# 7
match_names(v5, names)
# c(1, 2, 1)
How can I write such a function? I thought about (combinations) of which and grep but did not find something useful till now?
What I tried
(Before knowing the requirement of partial matches..)
match_names1 <- function(v, names) {
sapply(v, function(i) which(i == names))
}
This worked fine for examples v1, v2 and v5.
After getting the requirement of partial matches
match_names2 <- function(v, names) {
sapply(v, function(i) grep(i, names))
}
..which of course only works for v4
To catch v3 worked with the following extension of match_names1:
match_names3 <- function(v, names) {
exact <- match_names1(v, names)
assertthat::assert_that(class(exact) != "list")
return(exact)
}
So this covers v1, v2, v3 and v5, but not v4
Thx in advance for any hints.
[[ can be used to partially match one name at a time:
f = function(v){
sapply(v, function(x) setNames(seq_along(names), names)[[x, exact=FALSE]])
}
# try it on the example
vs = list(v1,v2,v3,v4,v5)
for (i in seq_along(vs)){
cat("\nv", i, ":\n", sep="")
print(try( f(vs[[i]]) ))
}
which yields
v1:
a b
1 2
v2:
a ab
1 4
v3:
Error in setNames(seq_along(names), names)[[x, exact = FALSE]] :
subscript out of bounds
[1] "Error in setNames(seq_along(names), names)[[x, exact = FALSE]] : \n subscript out of bounds\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in setNames(seq_along(names), names)[[x, exact = FALSE]]: subscript out of bounds>
v4:
h
7
v5:
a b a
1 2 1
I opted for lists because your question does not say what should happen if a string from v is matched exactly more than once in names, and the only way to keep all exact matches is to return a list. If you don't like the lists you can simply unlist() the result.
match_names <- function(v, names){
# check exact matches:
resList <- lapply(v, function(elt) which(names == elt))
notMatched <- which(lengths(resList) == 0)
if (length(notMatched) == 0) return (resList)
#partial matching
else{
resNotMatched <- lapply(v[notMatched], grep, x = names)
matchedOnce <- which(lengths(resNotMatched) == 1)
}
resList[notMatched[matchedOnce]] <- resNotMatched[matchedOnce]
return (resList)
}
> match_names(v1, names)
[[1]]
[1] 1
[[2]]
[1] 2
> # c(1, 2)
> match_names(v2, names)
[[1]]
[1] 1
[[2]]
[1] 4
> # c(1, 4)
> match_names(v3, names)
[[1]]
integer(0)
> # error
> match_names(v4, names)
[[1]]
[1] 7
> # 7
> match_names(v5, names)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 1
>
The match function works in all cases except for the partial match v4.
To cater for partial matches, you could define a function something like:
match_names <- function(v, names) {
ind <- match(v, names)
# For elements without an exact match, try partial matching one at a time
for (i in which(is.na(ind))) {
# grep to find the indices of names that contain v[i]
hits <- grep(v[i], names)
# Accept the partial match only if it is unique, otherwise keep NA
if (length(hits) == 1) ind[i] <- hits
}
return(ind)
}
> match_names(v1, names)
[1] 1 2
> match_names(v2, names)
[1] 1 4
> match_names(v3, names)
[1] NA
> match_names(v4, names)
[1] 7
> match_names(v5, names)
[1] 1 2 1
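Base R also ships pmatch(), which prefers exact matches and accepts a partial match only when it is unique, returning NA otherwise. The sketch below wraps it so that an ambiguous or missing match raises an error, as the question expects for v3. The wrapper name match_names_pmatch and the variable nm are made up here, and note that pmatch() only matches at the start of a string, whereas the grep-based answers match anywhere.

```r
# Hypothetical wrapper around base R's pmatch()
match_names_pmatch <- function(v, names) {
  # duplicates.ok = TRUE lets the same name be matched more than once (needed for v5)
  idx <- pmatch(v, names, duplicates.ok = TRUE)
  if (anyNA(idx)) {
    stop("no unique match for: ", paste(v[is.na(idx)], collapse = ", "))
  }
  idx
}

nm <- c("a", "b", "c", "ab", "def", "defg", "hij")
match_names_pmatch(c("a", "ab"), nm)
match_names_pmatch("h", nm)
match_names_pmatch(c("a", "b", "a"), nm)
```

Calling match_names_pmatch("d", nm) stops with an error, because "d" is a prefix of both "def" and "defg".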

Filter list in R based on criteria within list objects

This is a trivial question, but I'm stumped. How can I filter a list of dataframes based on their length? The list is nested -- meaning there are lists of lists of dataframes of different lengths. Here is an example. I'd like to filter or subset the list to include only those objects that are length n, say 3.
Here is an example and my current approach.
library(tidyverse)
# list of list with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
list(starwars[1:8], starwars[4:6]),
starwars[1:2],
list(starwars[1:7], starwars[2:6]),
starwars[1:3])
# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).
# Here is my attempt, I'm stuck at how to obtain
# the number of variables in each dataframe and then filter by it.
map(star.wars_ls, function(x){
map(x, function(x){ ## Incorrectly returns 20 for all
length(y)
})
})
We can do
map(star.wars_ls, ~ if(is.data.frame(.x)) .x[length(.x) == 3] else map(.x, ~ .x[length(.x) == 3]))
You should be able to check whether the item in the star.wars_ls is a list or a data frame. Then, check the number of columns within each item. Try using:
library(tidyverse)
# list of list with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
list(starwars[1:8], starwars[4:6]),
starwars[1:2],
list(starwars[1:7], starwars[2:6]),
starwars[1:3])
# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).
datacols <- map(star.wars_ls, function(X) {
if (is.data.frame(X) == T) {
ncol(X) }
else {
map(X, function(Y) {
ncol(Y)
})
}
}
)
# > datacols
# [[1]]
# [1] 5
#
# [[2]]
# [[2]][[1]]
# [1] 8
#
# [[2]][[2]]
# [1] 3
#
#
# [[3]]
# [1] 2
#
# [[4]]
# [[4]][[1]]
# [1] 7
#
# [[4]][[2]]
# [1] 5
#
#
# [[5]]
# [1] 3
This will only give you the length (number of columns) of each data frame within the list. To get the indices (I'm sure there's a more efficient way to do this -- maybe someone else can help with that):
indexlist <- c()
for (i in 1:length(datacols)) {
if (length(datacols[[i]]) == 1) {
if (datacols[[i]][1] == 3) {
index <- i
indexlist <- c(indexlist, as.character(index))
}
} else {
for (j in 1:length(datacols[[i]])) {
if (datacols[[i]][[j]][1] == 3) {
index <- str_c(i, ",", j)
indexlist <- c(indexlist, index)
}
}
}
}
# > indexlist
# [1] "2,2" "5"
You could use recursion. It doesn't matter how deeply nested the list is:
ff = function(x)map(x,~if(is.data.frame(.x)){if(length(.x)==3) .x} else ff(.x))
ff(star.wars_ls)
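The recursive version above leaves a NULL wherever a data frame does not match. If those placeholders should be removed as well, a base R variant could look like the sketch below. The function name keep_ncol is made up, the built-in mtcars data stands in for starwars so the example runs without extra packages, and dropping sub-lists that end up empty is an assumption about the desired result.

```r
# Hypothetical helper: keep only data frames with exactly n columns, at any depth
keep_ncol <- function(x, n) {
  if (is.data.frame(x)) {
    # A data frame is kept as-is or replaced by NULL
    if (ncol(x) == n) x else NULL
  } else {
    kept <- lapply(x, keep_ncol, n = n)
    # Drop the NULL placeholders and any sub-list that ended up empty
    Filter(function(e) is.data.frame(e) || length(e) > 0, kept)
  }
}

nested <- list(mtcars[1:5],
               list(mtcars[1:8], mtcars[4:6]),
               mtcars[1:2],
               list(mtcars[1:7], mtcars[2:6]),
               mtcars[1:3])

res <- keep_ncol(nested, 3)
str(res, max.level = 2)
```

Here res keeps the nesting of the surviving data frames: one sub-list containing mtcars[4:6], followed by mtcars[1:3].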
