R Array creation with very specific structure

I am a little bit stuck with one exercise in a beginner R course that I need for the following exercises (we should replace values of the previously created object).
Create object A, which returns the following when the structure is queried:
> str(A)
num [1 : 2, 1 : 5, 1 : 3] TRUE FALSE TRUE FALSE TRUE FALSE ...
- attr(*, "dimnames")=List of 3
..$ : chr [1:2] "a" "b"
..$ : chr [1:5] "C1" "C2" "C3" "C4" ...
..$ : chr [1:3] "X" "Y" "Z"
Because I am a little bit clueless with the content, that is my beginning:
a <- rep(c(0,1),15)
A <- array(a, dim= c(2,5,3))
rownames(A) <- letters[1:2]
colnames(A) <- paste("C",1:5,sep="")
Unfortunately I struggle with the object itself, I don't see how the array should be filled to be numeric and have a TRUE/FALSE content. Also the naming of the third dimension is something where I didn't find sufficient information.
Can anyone help me here?

This exercise appears to be trying to teach you about array dimensions.
array has 3 arguments:
args(array)
#function (data = NA, dim = length(data), dimnames = NULL)
data = is the data to be put in the array. It is recycled if it is shorter than the array.
dim = an integer vector giving the "maximal indices in each dimension"
dimnames = is a list of character vectors each as long as the corresponding dimension. (As an aside, the character vectors themselves can also be named)
Thus, the following would get pretty close to your desired output:
A = array(data = c(TRUE,FALSE),
dim = c(2,5,3),
dimnames = list(c("a","b"), c("C1","C2","C3","C4","C5"),c("X","Y","Z")))
str(A)
# logi [1:2, 1:5, 1:3] TRUE FALSE TRUE FALSE TRUE FALSE ...
# - attr(*, "dimnames")=List of 3
# ..$ : chr [1:2] "a" "b"
# ..$ : chr [1:5] "C1" "C2" "C3" "C4" ...
# ..$ : chr [1:3] "X" "Y" "Z"
However, I do not see a way for str to print TRUE FALSE TRUE FALSE TRUE FALSE ... while also being class num. Perhaps the lesson is incorrect.
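To make the distinction concrete, here is a small sketch (the names A_logi and A_num are only illustrative, not part of the exercise). It converts a logical array to numeric storage so the two str() outputs can be compared; the numeric version holds 1 and 0 rather than TRUE and FALSE.

```r
# Logical array: the two values are recycled to fill all 30 cells
A_logi <- array(c(TRUE, FALSE), dim = c(2, 5, 3))

# Copy it and switch the storage mode to double (dim is kept)
A_num <- A_logi
storage.mode(A_num) <- "double"

str(A_logi)         # reported as logi
str(A_num)          # reported as num, with 1/0 values

is.numeric(A_logi)  # FALSE
is.numeric(A_num)   # TRUE
```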
You could also try your approach, but use dimnames(A)[3] to assign the third dimension's names:
dimnames(A)[3] <- list(c("X","Y","Z"))
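Put together with the code from the question, that approach could look like the sketch below. It uses dimnames(A)[[3]] with a plain character vector, which should be equivalent to the single-bracket form with a list; paste0 is just shorthand for paste(..., sep="").

```r
# Numeric 0/1 data, as in the question
a <- rep(c(0, 1), 15)
A <- array(a, dim = c(2, 5, 3))

# Names for the first two dimensions
rownames(A) <- letters[1:2]
colnames(A) <- paste0("C", 1:5)

# Names for the third dimension
dimnames(A)[[3]] <- c("X", "Y", "Z")

str(A)

# Elements can now be addressed by name in all three dimensions
A["b", "C2", "Z"]
```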

Related

Search list of words in character string

I have a list of 3, named listWords:
$ : chr [1:6] "Maintenance" "repair" "installation" "activities" ...
$ : chr [1:19] "Manufacture" "specific" "equipment" "energy" ...
$ : chr [1:14] "Manufacture" "discharge" "lamps" "pressure" ...
I have another list, named wordsCP (for example)
$ : chr [1:3] "Cauliflowers" "and" "broccoli"
$ : chr "Lettuce"
and I would like to find the items in wordsCP that contain at least 2 or 3 words from listWords.
How can I do that?
As a result, I should get a row number for both lists, so that I can see that the words in row 1 of listWords can be found in rows x, y or z of wordsCP.
The below will give you a list with all elements of wordsCP that match 2 words or more in any single element of listWords
listWords <- list(c("please", "make", "a", "reprex"),
c("check", "https://stackoverflow.com/a/5965451/5224236"))
wordsCP <- list(c("a", "reprex"),
c("will", "get", "you", "better", "answers"),
c("check", "https://stackoverflow.com/a/5965451/5224236"))
match_matrix <- as.data.frame(sapply(wordsCP, function(x) sapply(listWords, function(y) sum(x %in% y)>=2)))
matches <- sapply(match_matrix, any)
wordsCP[matches]
[[1]]
[1] "a" "reprex"
[[2]]
[1] "check"
[2] "https://stackoverflow.com/a/5965451/5224236"
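Since the question also asks for the row numbers in both lists, one possible extension is sketched below. It counts the shared words for every pair of elements and then uses which(..., arr.ind = TRUE) to report the matching pairs; the names shared and hits, and the threshold of 2, are assumptions for illustration.

```r
listWords <- list(c("please", "make", "a", "reprex"),
                  c("check", "https://stackoverflow.com/a/5965451/5224236"))
wordsCP <- list(c("a", "reprex"),
                c("will", "get", "you", "better", "answers"),
                c("check", "https://stackoverflow.com/a/5965451/5224236"))

# shared[i, j] = number of words of wordsCP[[j]] found in listWords[[i]]
count_shared <- function(i, j) sum(wordsCP[[j]] %in% listWords[[i]])
shared <- outer(seq_along(listWords), seq_along(wordsCP),
                Vectorize(count_shared))

# One row per matching pair: index in listWords, index in wordsCP
hits <- which(shared >= 2, arr.ind = TRUE)
colnames(hits) <- c("listWords", "wordsCP")
hits
```

Each row of hits pairs an element of listWords with an element of wordsCP that shares at least two words with it.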

Why do identical row names yield different results on barplot axis labels? [duplicate]

I've come across a strange behavior when playing with some dataframes: when I create two identical dataframes a,b, then swap their rownames around, they don't come out as identical:
rm(list=ls())
a <- data.frame(a=c(1,2,3),b=c(2,3,4))
b <- a
identical(a,b)
#TRUE
identical(rownames(a),rownames(b))
#TRUE
rownames(b) <- rownames(a)
identical(a,b)
#FALSE
Can anyone reproduce/explain why?
This is admittedly a bit confusing. Starting with ?data.frame we see that:
If row.names was supplied as NULL or no suitable component was found
the row names are the integer sequence starting at one (and such row
names are considered to be ‘automatic’, and not preserved by
as.matrix).
So initially a and b each have an attribute called row.names that is an integer vector:
> str(attributes(a))
List of 3
$ names : chr [1:2] "a" "b"
$ row.names: int [1:3] 1 2 3
$ class : chr "data.frame"
But rownames() returns a character vector (as does dimnames(), actually a list of character vectors, called under the hood). So after reassigning the row names you end up with:
> str(attributes(b))
List of 3
$ names : chr [1:2] "a" "b"
$ row.names: chr [1:3] "1" "2" "3"
$ class : chr "data.frame"
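The difference in the row.names attribute (integer versus character) is what makes identical() return FALSE. A minimal sketch to check this, and to restore the automatic row names by assigning NULL, could look as follows; the name b_reset is only illustrative.

```r
a <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
b <- a
rownames(b) <- rownames(a)        # stores the row names as character

typeof(attr(a, "row.names"))      # "integer"
typeof(attr(b, "row.names"))      # "character"
identical(a, b)                   # FALSE

# Assigning NULL resets the row names to the automatic integer sequence
b_reset <- b
rownames(b_reset) <- NULL
identical(a, b_reset)
```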

How to get dictionary functionality from a data.frame in R

R doesn't seem to have a dictionary structure. Let's say I have a data.frame of people who have unique first names (keys):
people = data.frame(c("Bob", "Jones"), c("Sally", "Smith"));
names(people) = c("Firstname", "Surname");
I want to know what Sally's Surname is, only knowing her Firstname.
I could write some ugly code that traverses people$Firstname, keeping track of an index, and then fetching people$Surname at that index once I find a match, but this probably isn't the right way.
What's the "R way" to do this?
I don't think your data frame is crafted the way you think it is. Using this one, it's pretty simple:
people <- data.frame(Firstname=c("Bob", "Sally"),
Surname=c("Jones", "Smith"),
stringsAsFactors=FALSE)
people[people$Firstname=="Sally",]$Surname
## [1] "Smith"
You could also craft it as a list:
ppl <- list("Bob"=list(Surname="Jones"),
"Sally"=list(Surname="Smith"))
ppl[["Bob"]]
## $Surname
## [1] "Jones"
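Another common idiom, sketched here as an alternative rather than as part of the answer above, is a named character vector used as a lookup table. The name surnames is illustrative.

```r
people <- data.frame(Firstname = c("Bob", "Sally"),
                     Surname = c("Jones", "Smith"),
                     stringsAsFactors = FALSE)

# Keys are the first names, values are the surnames
surnames <- setNames(people$Surname, people$Firstname)

surnames[["Sally"]]
## [1] "Smith"

# Several keys can be looked up at once
unname(surnames[c("Sally", "Bob")])
## [1] "Smith" "Jones"
```

This relies on the keys being unique, as stated in the question; with duplicated names only the first match is returned.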
For fun, for this particular example that you've provided, you can also use the person function that ships with R (in the utils package). Here, I've used as.person:
people <- c(as.person("Bob Jones"), as.person("Sally Smith"))
str(people)
# List of 2
# $ :Class 'person' hidden list of 1
# ..$ :List of 5
# .. ..$ given : chr "Bob"
# .. ..$ family : chr "Jones"
# .. ..$ role : NULL
# .. ..$ email : NULL
# .. ..$ comment: NULL
# $ :Class 'person' hidden list of 1
# ..$ :List of 5
# .. ..$ given : chr "Sally"
# .. ..$ family : chr "Smith"
# .. ..$ role : NULL
# .. ..$ email : NULL
# .. ..$ comment: NULL
# - attr(*, "class")= chr "person"
people$given
# [[1]]
# [1] "Bob"
#
# [[2]]
# [1] "Sally"
people[people$given == "Bob"]
# [1] "Bob Jones"
people[people$given == "Bob"]$family
# [1] "Jones"

Dynamic list creation

I am vexed by the way R dynamically creates lists and I am hoping someone can help me understand what is going on and what to do to fix my code. My problem is that when I assign a vector of length one, a named vector is created, but when I assign a vector of length greater than one, a list is created. My desired outcome is that a list is created no matter the length of the vector I am assigning. How do I achieve such a result?
For example,
types <- c("a", "b")
lst <- vector("list", length(types))
names(lst) <- types
str(lst)
List of 2
$ a: NULL
$ b: NULL
lst$a[["foo"]] <- "hi"
lst$b[["foo"]] <- c("hi", "SO")
str(lst)
List of 2
$ a: Named chr "hi"
..- attr(*, "names")= chr "foo"
$ b:List of 1
..$ foo: chr [1:2] "hi" "SO"
str(lst$a)
Named chr "hi"
- attr(*, "names")= chr "foo"
str(lst$b)
List of 1
$ foo: chr [1:2] "hi" "SO"
What I want to have as the outcome is a data structure that looks like this.
List of 2
$ a:List of 1
..$ foo: chr [1] "hi"
$ b:List of 1
..$ foo: chr [1:2] "hi" "SO"
While I also find it surprising, it is documented in ?"[[":
Recursive (list-like) objects:
[...]
When ‘$<-’ is applied to a ‘NULL’ ‘x’, it first coerces ‘x’ to
‘list()’. This is what also happens with ‘[[<-’ if the
replacement value ‘value’ is of length greater than one: if
‘value’ has length 1 or 0, ‘x’ is first coerced to a zero-length
vector of the type of ‘value’.
To override that behavior, you could specifically create empty lists before dynamically assigning to them:
lst$a <- list()
lst$b <- list()
or, as Josh suggested below, replace your lst <- vector("list", length(types)) with lst <- replicate(length(types), list()), then set names(lst) <- types as before.
Now that ‘x’ (lst$a or lst$b) is not ‘NULL’ but an empty list, your code should work as you expected:
lst$a[["foo"]] <- "hi"
lst$b[["foo"]] <- c("hi", "SO")
str(lst)
# List of 2
# $ a:List of 1
# ..$ foo: chr "hi"
# $ b:List of 1
# ..$ foo: chr [1:2] "hi" "SO"
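The documentation quoted above also suggests a variant that needs no pre-initialization: since $<- coerces a NULL to list(), assigning with $ instead of [[ should give a list regardless of the length of the value. The sketch below shows both variants; lst2 is an illustrative name, and simplify = FALSE is passed to replicate to be explicit about getting a list back.

```r
types <- c("a", "b")

# Variant 1: use $<- on the NULL elements, which are coerced to list()
lst <- setNames(vector("list", length(types)), types)
lst$a$foo <- "hi"
lst$b$foo <- c("hi", "SO")
str(lst)

# Variant 2: start from empty lists, then [[<- keeps them as lists
lst2 <- setNames(replicate(length(types), list(), simplify = FALSE), types)
lst2$a[["foo"]] <- "hi"
lst2$b[["foo"]] <- c("hi", "SO")

identical(lst, lst2)
```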
I think you just need to create the types you want and assign them:
R> qq <- list( a=list(foo="Hi"), b=list(foo=c("Hi", "SO")))
R> qq
$a
$a$foo
[1] "Hi"
$b
$b$foo
[1] "Hi" "SO"
R>
where all your requirements are met:
R> class(qq)
[1] "list"
R> names(qq)
[1] "a" "b"
R> sapply(qq, names)
a b
"foo" "foo"
R>
