Why do identical data frames become different when their row names are set to the same values? - r

I've come across a strange behavior when playing with some dataframes: when I create two identical dataframes a,b, then swap their rownames around, they don't come out as identical:
rm(list=ls())
a <- data.frame(a=c(1,2,3),b=c(2,3,4))
b <- a
identical(a,b)
#TRUE
identical(rownames(a),rownames(b))
#TRUE
rownames(b) <- rownames(a)
identical(a,b)
#FALSE
Can anyone reproduce/explain why?

This is admittedly a bit confusing. Starting with ?data.frame we see that:
If row.names was supplied as NULL or no suitable component was found
the row names are the integer sequence starting at one (and such row
names are considered to be ‘automatic’, and not preserved by
as.matrix).
So initially a and b each have a row.names attribute that is an integer vector:
> str(attributes(a))
List of 3
$ names : chr [1:2] "a" "b"
$ row.names: int [1:3] 1 2 3
$ class : chr "data.frame"
But rownames() returns a character vector (as does dimnames(), which is called under the hood and actually returns a list of character vectors). So after assigning that result back as the row names, the attribute is stored as character:
> str(attributes(b))
List of 3
$ names : chr [1:2] "a" "b"
$ row.names: chr [1:3] "1" "2" "3"
$ class : chr "data.frame"
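The difference can be inspected and undone as in the minimal sketch below. The reset with rownames(b) <- NULL is not part of the answer above; it assumes the documented behavior that NULL row names fall back to the automatic integer sequence. The variable names row_types, differs_after_assign and same_after_reset are illustrative only.

```r
a <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
b <- a

# Assigning the character result of rownames() stores character row names in b
rownames(b) <- rownames(a)
row_types <- c(a = typeof(attr(a, "row.names")),
               b = typeof(attr(b, "row.names")))
differs_after_assign <- !identical(a, b)

# Resetting to NULL restores the automatic (integer) row names
rownames(b) <- NULL
same_after_reset <- identical(a, b)
```

When only the data matter, comparing the columns and the rownames() results separately avoids depending on how the row names are stored.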

Related

Search list of words in character string

I have a list of 3, named listWords:
$ : chr [1:6] "Maintenance" "repair" "installation" "activities" ...
$ : chr [1:19] "Manufacture" "specific" "equipment" "energy" ...
$ : chr [1:14] "Manufacture" "discharge" "lamps" "pressure" ...
I have another list, named wordsCP (for example)
$ : chr [1:3] "Cauliflowers" "and" "broccoli"
$ : chr "Lettuce"
and I would like to find the items in wordsCP that contain at least 2 or 3 words from listWords.
How can I do that?
As a result, I would like the row numbers for both lists, so that I can see that the words in row 1 of listWords are found in rows x, y or z of wordsCP.
The code below gives you a list of all elements of wordsCP that match 2 or more words in any single element of listWords:
listWords <- list(c("please", "make", "a", "reprex"),
c("check", "https://stackoverflow.com/a/5965451/5224236"))
wordsCP <- list(c("a", "reprex"),
c("will", "get", "you", "better", "answers"),
c("check", "https://stackoverflow.com/a/5965451/5224236"))
match_matrix <- as.data.frame(sapply(wordsCP, function(x) sapply(listWords, function(y) sum(x %in% y)>=2)))
matches <- sapply(match_matrix, any)
wordsCP[matches]
[[1]]
[1] "a" "reprex"
[[2]]
[1] "check"
[2] "https://stackoverflow.com/a/5965451/5224236"
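The question also asks for the row numbers in both lists. A possible sketch, assuming made-up sample data and using length(intersect()) to count distinct shared words (unlike sum(x %in% y), this ignores repeated words), is:

```r
# Hypothetical sample data, not from the question
listWords <- list(c("please", "make", "a", "reprex"),
                  c("check", "this", "link"))
wordsCP <- list(c("a", "reprex"),
                c("will", "get", "you", "better", "answers"),
                c("check", "this", "link", "please"))

# shared[i, j] = number of distinct words that listWords[[i]] and wordsCP[[j]] have in common
count_shared <- function(i, j) length(intersect(listWords[[i]], wordsCP[[j]]))
shared <- outer(seq_along(listWords), seq_along(wordsCP), Vectorize(count_shared))

# One row per (listWords index, wordsCP index) pair sharing at least 2 words
hits <- which(shared >= 2, arr.ind = TRUE)
colnames(hits) <- c("listWords", "wordsCP")
```

Each row of hits pairs an index into listWords with an index into wordsCP; the threshold 2 can be raised to 3 if a stricter match is wanted.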

converting nested list with partially empty values to R data.frame

For a single list of data.frames objects, I'd usually have no trouble converting that:
my_df <- do.call("rbind", lapply(my_list, data.frame))
However, the list object I currently have is nested. It is a list of lists of data.frames. A few points to note:
The elements of some child lists within the parent list are empty.
Among child lists with information, some lists have more than one data.frame object.
The number of data.frame objects can vary among child lists.
Here's a simplified example of what I'm dealing with:
List of 3
$ 1 :List of 2
..$ : NULL
..$ : NULL
$ 2 :List of 2
..$ :'data.frame': 4 obs. of 2 variables:
.. ..$ name : chr [1:4] "jack" "jim" "joe" "jon"
.. ..$ value : chr [1:4] "10" "12" "13" "14"
..$ :'data.frame': 4 obs. of 2 variables:
.. ..$ name : chr [1:4] "jacky" "jane" "juanita" "julia"
.. ..$ value : chr [1:4] "11" "9" "10" "14"
$ 3 :List of 1
..$ :'data.frame': 5 obs. of 2 variables:
.. ..$ name : chr [1:5] "adam" "ashley" "arnold" "avery" "arthur"
.. ..$ value : chr [1:5] "6" "7" "11" "12" "11"
The do.call approach above reports an error that the arguments imply differing numbers of rows, so it seems that the data.frames with different row counts in my lists are causing the issue?
I tried some strategies described in this post but each attempt had its own unique error.
The data.table "rbindlist" approach and dplyr "bind_rows" methods both reported:
fill=TRUE, but names of input list at position 1 is NULL
Thanks for any tips on how to deal with the situation.
Consider two do.call runs with Filter in between:
# APPEND ALL ITEMS TO SINGLE, FLAT LIST
df_list <- do.call("c", my_list)
# FILTER OUT NULL ITEMS
df_list <- Filter(NROW, df_list)
# CONCATENATE ALL DFs TO SINGLE DF
final_df <- do.call("rbind", df_list)
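As a quick check of the three steps above, here is a small self-contained sketch. The data are invented to mirror the shape of the question's list, and n_before is an illustrative helper variable.

```r
# Hypothetical nested list: one child with only NULLs, two with data frames
my_list <- list(
  list(NULL, NULL),
  list(data.frame(name = c("jack", "jim"), value = c("10", "12")),
       data.frame(name = "jane", value = "9")),
  list(data.frame(name = c("adam", "ashley"), value = c("6", "7")))
)

# Flatten one level: c() on lists concatenates their elements, NULLs included
df_list <- do.call("c", my_list)
n_before <- length(df_list)

# NROW(NULL) is 0, which Filter() treats as FALSE, so the NULL items are dropped
df_list <- Filter(NROW, df_list)

# Stack the remaining data frames
final_df <- do.call("rbind", df_list)
```

Note that this flattens exactly one level of nesting; deeper nesting would need repeated flattening or a recursive approach.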
Firstly, the NULL values do not matter because rbind(NULL, NULL, ..., A, B, C, ...) is just the same as rbind(A, B, C, ...).
Secondly, the structure of your list matters. If your nested list is just as simple as your example, then the answer is also straightforward. The code below could solve this problem:
# This list is the same as your example
test <- list(
list(NULL, NULL),
list(data.frame(name = c("jack", "jim", "joe", "jon"),
value = c("10", "12", "13", "14")),
data.frame(name = c("jacky", "jane", "juanita", "julia"),
value = c("11", "9", "10", "14"))),
list(data.frame(name = c("adam", "ashley", "arnold", "avery", "arthur"),
value = c("6", "7", "11", "12", "11")))
)
# This function rbinds the dataframes inside a list
ls_rbind <- function(ls_of_df) do.call(rbind, ls_of_df)
# Applying "ls_rbind" to each of your child lists creates a list, each of whose elements is either a NULL or a dataframe
# Applying "ls_rbind" again to the result list gives you a dataframe you want
result_df <- ls_rbind(lapply(test, ls_rbind))
However, if your nested list is, in fact, more complex, then you may need a more general way to handle it. For example, each child list could be either of the following items:
A non-list item i.e. a dataframe or a NULL
A list that may also contain lists, dataframes or NULLs
In this case, recursions could be helpful. Consider the following code:
# These two lines complicate the list structure
test[[4]] <- test
test[[1]][[3]] <- test
recr_ls_rbind <- function(ls_of_ls) {
# TRUE for plain child lists only; identical() also copes with items whose class vector has more than one element
are_lists <- vapply(ls_of_ls, function(x) identical(class(x), "list"), logical(1))
# Any child lists will be recursively evaluated. "part1" is always a list
part1 <- lapply(ls_of_ls[are_lists], recr_ls_rbind)
# Apply the above function "ls_rbind" to all non-list child items, then wrap the result in a list.
part2 <- list(ls_rbind(ls_of_ls[!are_lists]))
# Put part1 and part2 into the same list and apply "ls_rbind" to it.
ls_rbind(c(part1, part2))
}
result_df <- recr_ls_rbind(test)

How to change the class of a list from array to list?

I have a data frame with names and values that must be converted into a list of vectors. The names determine which vector each value goes into. To automate the creation of my list, I'm using tapply:
d_df <- data.frame(name=c(rep("a",5),rep("b",5)),value=LETTERS[1:10])
d_list_auto <- tapply(d_df$value,d_df$name, FUN=as.character)
d_list_auto <- unname(d_list_auto)
d_list_manual <- list(LETTERS[1:5],LETTERS[6:10])
For practical effects, d_list_auto and d_list_manual are the same thing, but their classes are different (and the function to which I pass the list complains about it).
class(d_list_auto) #array
class(d_list_manual) #list
I tried to coerce the class change with as.list() and different flavours of apply functions to no avail:
class(as.list(d_list_auto)) #array
apply(d_list_auto,1,as.list) #Creates a list of lists
How can I coerce d_list_auto into class list without losing the structure of my data?
EDIT
A very nasty solution:
class(apply(d_list_auto,1,as.list)) #list
Does anyone have a more elegant suggestion?
First let's look at the structures of each object:
str(d_list_auto)
# List of 2
# $ : chr [1:5] "A" "B" "C" "D" ...
# $ : chr [1:5] "F" "G" "H" "I" ...
# - attr(*, "dim")= int 2
str(d_list_manual)
# List of 2
# $ : chr [1:5] "A" "B" "C" "D" ...
# $ : chr [1:5] "F" "G" "H" "I" ...
Looks like the only difference is that d_list_auto has a dim attribute, left over from tapply(). We can remove that by assigning NULL as the new dimension.
dim(d_list_auto) <- NULL
Now let's see if it worked:
class(d_list_auto)
# [1] "list"
identical(d_list_auto, d_list_manual)
# [1] TRUE
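As an alternative that is not part of the answer above, split() returns a plain list directly, so no dim attribute has to be removed afterwards. A minimal sketch of both routes (d_list_split is an illustrative name):

```r
d_df <- data.frame(name = c(rep("a", 5), rep("b", 5)), value = LETTERS[1:10])
d_list_manual <- list(LETTERS[1:5], LETTERS[6:10])

# tapply route: drop the leftover dim attribute afterwards
d_list_auto <- unname(tapply(d_df$value, d_df$name, FUN = as.character))
dim(d_list_auto) <- NULL

# split route: already a plain list, only the names need removing
d_list_split <- unname(split(as.character(d_df$value), d_df$name))
```

The as.character() call keeps the sketch independent of whether value was read as a factor or as character.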

Dynamic list creation

I am vexed by the way R dynamically creates lists, and I am hoping someone can help me understand what is going on and how to fix my code. My problem is that assigning a vector of length one produces a named vector, while assigning a vector of length greater than one produces a list. I want a list to be created regardless of the length of the vector I assign. How do I achieve that?
For example,
types <- c("a", "b")
lst <- vector("list", length(types))
names(lst) <- types
str(lst)
List of 2
$ a: NULL
$ b: NULL
lst$a[["foo"]] <- "hi"
lst$b[["foo"]] <- c("hi", "SO")
str(lst)
List of 2
$ a: Named chr "hi"
..- attr(*, "names")= chr "foo"
$ b:List of 1
..$ foo: chr [1:2] "hi" "SO"
str(lst$a)
Named chr "hi"
- attr(*, "names")= chr "foo"
str(lst$b)
List of 1
$ foo: chr [1:2] "hi" "SO"
What I want to have as the outcome is a data structure that looks like this.
List of 2
$ a:List of 1
..$ foo: chr [1] "hi"
$ b:List of 1
..$ foo: chr [1:2] "hi" "SO"
While I also find it surprising, it is documented in ?"[[":
Recursive (list-like) objects:
[...]
When ‘$<-’ is applied to a ‘NULL’ ‘x’, it first coerces ‘x’ to
‘list()’. This is what also happens with ‘[[<-’ if the
replacement value ‘value’ is of length greater than one: if
‘value’ has length 1 or 0, ‘x’ is first coerced to a zero-length
vector of the type of ‘value’.
To override that behavior, you could specifically create empty lists before dynamically assigning to them:
lst$a <- list()
lst$b <- list()
or like Josh suggested below, replace your lst <- vector("list", length(types)) with lst <- replicate(length(types), list()).
Now that ‘x’ (lst$a or lst$b) is not ‘NULL’ but an empty list, your code should work as you expected:
lst$a[["foo"]] <- "hi"
lst$b[["foo"]] <- c("hi", "SO")
str(lst)
# List of 2
# $ a:List of 1
# ..$ foo: chr "hi"
# $ b:List of 1
# ..$ foo: chr [1:2] "hi" "SO"
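The coercion rule quoted above can be seen in isolation with the short sketch below. The variables x, y and z are illustrative, and the last part uses $<- instead of [[<-, which the quoted documentation says always coerces a NULL to a list.

```r
# [[<- on NULL with a length-1 value: NULL becomes a character vector
x <- NULL
x[["foo"]] <- "hi"

# [[<- on NULL with a longer value: NULL becomes a list
y <- NULL
y[["foo"]] <- c("hi", "SO")

# [[<- on an empty list: stays a list whatever the length of the value
z <- list()
z[["foo"]] <- "hi"

# $<- on a NULL element coerces it to a list first, so no pre-filling is needed
types <- c("a", "b")
lst <- vector("list", length(types))
names(lst) <- types
lst$a$foo <- "hi"
lst$b$foo <- c("hi", "SO")
```

The $ form only works when the element name is a literal; with a name held in a variable, pre-creating empty lists as shown above remains the safer option.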
I think you just need to create the types you want and assign them:
R> qq <- list( a=list(foo="Hi"), b=list(foo=c("Hi", "SO")))
R> qq
$a
$a$foo
[1] "Hi"
$b
$b$foo
[1] "Hi" "SO"
R>
where all your requirements are met:
R> class(qq)
[1] "list"
R> names(qq)
[1] "a" "b"
R> sapply(qq, names)
a b
"foo" "foo"
R>
