Subsetting of Lists in R

I have a few questions about subsetting a named list in R using the [] operator.
For example, consider the list formals <- list(x = DOUBLE, y = DOUBLE, z = NULL). In this example, DOUBLE is treated as a symbol in R.
1) How should I retrieve all elements that are not equal to NULL? I tried formals[formals != NULL], but this only returns an object of type list with no members.
2) How should I retrieve elements whose names satisfy a condition? For example, how would I get all elements whose names are not z? I could use names(formals), but this is cumbersome and I was hoping for a quick solution using [].

Another option for the first question:
Filter(Negate(is.null), formals)
For the second case, you'll have to use names. Here's one way:
formals[names(formals) != 'z']
formals is actually a function in R. It's best to avoid names of functions when naming your variables.

This will work for your first question:
formals[!unlist(lapply(formals, is.null))]
I don't think you can avoid using names for the second question.
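To see why formals[formals != NULL] comes back empty, it may help to look at what the comparison itself returns. The sketch below uses a hypothetical list named args_list (to avoid masking the formals function) and quote(DOUBLE) as a stand-in for the DOUBLE symbol; vapply and setdiff are just one possible way to write the two subsets.

```r
# Hypothetical stand-in for the list in the question
args_list <- list(x = quote(DOUBLE), y = quote(DOUBLE), z = NULL)

# Comparing with NULL gives a zero-length logical vector,
# so indexing with it selects nothing
cmp <- args_list != NULL
length(cmp)

# Test each element with is.null instead
is_null <- vapply(args_list, is.null, logical(1))
non_null <- args_list[!is_null]
names(non_null)

# Select by name without writing out a comparison
not_z <- args_list[setdiff(names(args_list), "z")]
names(not_z)
```

Both subsets still go through names or a per-element test; [] alone has no way to inspect the elements.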

Related

When creating new data.frame column, what is the difference between `df$NewCol=` and `df[,"NewCol"]=` methods?

Using the default "iris" DataFrame in R, how come when creating a new column "NewCol"
iris[,'NewCol'] = as.POSIXlt(Sys.Date()) # throws Warning
BUT
iris$NewCol = as.POSIXlt(Sys.Date()) # is correct
This issue doesn't exist when assigning primitive types like chr, int, float, ....
First, notice that, as @sindri_baldur pointed out, as.POSIXlt returns a list.
From R help ($<-.data.frame):
There is no data.frame method for $, so x$name uses the default method which treats x as a list (with partial matching of column names if the match is unique, see Extract). The replacement method (for $) checks value for the correct number of rows, and replicates it if necessary.
So, if you try iris[, "NewCol"] <- as.POSIXlt(Sys.Date()), you get a warning that you're trying to assign a list object to a vector, so only the first element of the list is used.
Again, from R help:
"For [ the replacement value can be a list: each element of the list is used to replace (part of) one column, recycling the list as necessary".
And in your case, only one column is specified, meaning only the first element of the as.POSIXlt result (a list) will be used. And you are warned of that.
Using the $ syntax, the iris data.frame is treated as a list and the result of as.POSIXlt (again a list) is appended to it. The result is still a data.frame, but take a look at the type of the new column: it's a list.
iris[, "NewCol"] <- as.POSIXlt(Sys.Date()) # warning
iris$NewCol2 <- as.POSIXlt(Sys.Date())
typeof(iris$NewCol) # double
typeof(iris$NewCol2) # list
Suggestion: maybe you wanted to use as.POSIXct()?
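A minimal sketch of the as.POSIXct() suggestion, using a small hypothetical data frame df rather than iris. Because a POSIXct value is a numeric vector with a class attribute rather than a list, both assignment forms should store it as an ordinary column.

```r
# Hypothetical small data frame, used instead of iris
df <- data.frame(id = 1:3)

now_lt <- as.POSIXlt(Sys.Date())
now_ct <- as.POSIXct(Sys.Date())

is.list(unclass(now_lt))   # POSIXlt is a list of date-time components
typeof(now_ct)             # POSIXct is stored as a double

# Both forms recycle the single POSIXct value down the column
df$when1 <- now_ct
df[, "when2"] <- now_ct

class(df$when1)
class(df$when2)
```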

Question regarding using function c() in R coding

I am studying data analytics. My teacher gave the class this question: "use one-sigma to find any outliers in vector D". He gave his answer as below, but I do not understand why he called Out=c() before the for loop and used Out again in the call c(Out,o). Could you help me answer this question? Thank you!
D=c(4,6,1,2,8,11)
xbar=mean(D)
std=sd(D)
L=xbar-std
U=xbar+std
Out=c()
for(j in 1:length(D)){
if(D[j]<L | D[j]>U) {o=D[j]} else{o=NULL}
Out=c(Out,o)
}
Out=c() initializes your output. It is just an empty object at the beginning (c() with no arguments returns NULL). The for loop iterates j over the positions of D. For each j, it evaluates the conditional statement if(D[j]<L | D[j]>U) {o=D[j]} else{o=NULL} and then appends the result to Out. Hope this helps.
The need for an object named Out to exist before entering the for loop comes from how the c and [<- functions are designed. They need names to exist in the table of objects that the R interpreter maintains. You used "=", but in that context it is really the <- function, the assignment operator, that is being used. The code in the question doesn't appear to use that operator, but it is actually being called when the "=" sign in Out=c(Out,o) is used. Without the initial Out=c(), you cannot assign a value to the Out on the LHS by appending to it, because the Out on the RHS doesn't already have a value (not even a value of length 0) among R's data objects when the c function tries to access it.
The <- operator is really a function disguised as an infix operator. You can demonstrate this with:
`<-`(my.out , 4)
> my.out
[1] 4
It also has an indexed assignment version, [<-, which requires that the named object on the LHS exist. This is another source of errors for for-loop users. If the named LHS object given to [<- doesn't exist at the time the loop is run, then the first time through the loop you will get an error:
rm(my.out2) #make sure it doesn't exist
for (i in 1:10) { my.out2[i] <- 4 } # LHS doesn't exist, but RHS value exists
#Error: object 'my.out2' not found
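To illustrate the accumulator pattern discussed above, here is a small sketch. It reruns a condensed form of the loop from the question and then compares it with a vectorized logical subset, which is an alternative added here for comparison and not part of the teacher's answer. The names out_loop and out_vec are hypothetical.

```r
D <- c(4, 6, 1, 2, 8, 11)
L <- mean(D) - sd(D)
U <- mean(D) + sd(D)

# c() with no arguments returns NULL, and c(NULL, x) is just x,
# so NULL is a convenient empty starting value
is.null(c())

out_loop <- c()
for (j in seq_along(D)) {
  # Append D[j] only when it falls outside [L, U]
  if (D[j] < L | D[j] > U) out_loop <- c(out_loop, D[j])
}

# Vectorized alternative: subset D with a logical condition
out_vec <- D[D < L | D > U]

out_loop
out_vec
```

Both approaches pick out 1 and 11 here; the vectorized form needs no accumulator at all.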

mpfr'izing a data.frame in R

I'm trying to convert a data.frame in R to mpfr format by multiplying by an mpfr unit constant. This works, as demonstrated in the code below, when applied to a column (result variable 'mpfr_col'), but for both approaches shown for working with a data.frame, it does not. The relevant errors for each attempt are listed in comment.
library(Rmpfr)
prec <- 256
m1 <- mpfr(1,prec)
col_build <- 1:10
test_df <- data.frame(col_build, col_build, col_build)
mpfr_col <- m1*(col_build)
mpfr_df <- m1*test_df # (list) object cannot be coerced to type 'double'
for(colnum in 1:length(colnames(test_df))){
test_df[,colnum] <- m1*test_df[,colnum] # attempt to replicate an object of type 'S4'
}
Answer:
Use [[colnum]] to access the columns instead of [,colnum]:
for(colnum in 1:length(colnames(test_df))){
test_df[[colnum]] <- m1*test_df[[colnum]]
}
(Note: the print method of data.frame will fail, but the 'mpfr-izing' works. You can print it either by printing the columns individually or by using as_tibble(test_df) from the tibble package.)
Explanation
The original fails because the [,colnum] assignment doesn't coerce the argument, I think. Using [[ returns an element (aka a column) of the list (aka the data.frame).
See this bit of Hadley Wickham's Advanced R book:
[ selects sub-lists. It always returns a list; if you use it with a
single positive integer, it returns a list of length one. [[ selects
an element within a list. $ is a convenient shorthand: x$y is
equivalent to x[["y"]].
And the help from Extract.data.frame {base}:
When [ and [[ are used to add or replace a whole column, no coercion
takes place but value will be replicated (by calling the generic
function rep) to the right length if an exact number of repeats can be
used.
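The difference between [ and [[ described in the quotes above can be checked in base R without Rmpfr. This is only a sketch on a hypothetical two-column data frame df; it does not reproduce the mpfr case itself.

```r
# A data.frame is a list of columns, so list rules apply
df <- data.frame(a = 1:3, b = c("u", "v", "w"))

sub_single <- df["a"]     # still a data.frame (a list of length one)
sub_double <- df[["a"]]   # the column itself, an integer vector

class(sub_single)
class(sub_double)

# $ is shorthand for [[ with a literal name
identical(df$a, df[["a"]])

# Replacing a whole column through [[ stores the value as given
df[["a"]] <- c(10, 20, 30)
typeof(df$a)
```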

Convert character vector to numeric vector in R for value assignment?

I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10)
{
paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields "z$x1", "z$x2" instead of the actual values. I need the expression to be evaluated. I tried as.numeric and eval; neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be used merely as a name. If I try to assign it a value, such as paste(c('N'),5,sep="") -> 5, i.e. "N5" -> 5 instead of N5 -> 5, I get target of assignment expands to non-language object.
This task is pretty trivial since I can simply do:
N1 = x1...
N2 = x2...
etc, but I want to learn something new
I'd suggest using something like for( i in 1:10 ) z[,i] <- N[,i]...
BUT, since you said you want to learn something new, you can play around with parse and substitute.
NOTE: these little tools are funny, but experienced users (not me) avoid them.
This is called "computing on the language". It's very interesting, and it helps understanding the way R works. Let me try to give an intro:
The basic language construct is a constant, like a numeric or character vector. It is trivial because it is not different from its "unevaluated" version, but it is one of the building blocks for more complicated expressions.
The (officially) basic language object is the symbol, also known as a name. It's nothing but a pointer to another object, i.e., a token that identifies another object which may or may not exist. For instance, if you run x <- 10, then x is a symbol that refers to the value 10. In other words, evaluating the symbol x yields the numeric vector 10. Evaluating a nonexistent symbol yields an error.
A symbol looks like a character string, but it is not. You can turn a string into a symbol with as.symbol("x").
The next language object is the call. This is a recursive object, implemented as a list whose elements are either constants, symbols, or other calls. The first element must not be a constant, because it must evaluate to the real function that will be called. The other elements are the arguments to this function.
If the first element does not evaluate to an existing function, R will throw either Error: attempt to apply non-function or Error: could not find function "x" (if the first element is a symbol that is undefined or points to something other than a function).
Example: the code line f(x, y+z, 2) will be parsed as a list of 4 elements, the first being f (as a symbol), the second being x (another symbol), the third another call, and the fourth a numeric constant. The third element, y+z, is just a call to a function with two arguments, so it parses as a list of three names: '+', y and z.
Finally, there is also the expression object, that is a list of calls/symbols/constants, that are meant to be evaluated one by one.
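The structure of the f(x, y+z, 2) example can be inspected directly. This is a small sketch using quote() and as.list(); f, x, y and z do not need to exist, because nothing is evaluated.

```r
# quote() returns the unevaluated call
cl <- quote(f(x, y + z, 2))

class(cl)
parts <- as.list(cl)
length(parts)           # four elements: f, x, y + z, 2

is.symbol(parts[[1]])   # the function name f is a symbol
is.call(parts[[3]])     # y + z is itself a call
is.numeric(parts[[4]])  # 2 is a constant

# The nested call breaks down into `+`, y and z
inner <- as.list(parts[[3]])
inner_names <- vapply(inner, as.character, character(1))
inner_names
```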
You'll find lots of information here:
https://github.com/hadley/devtools/wiki/Computing-on-the-language
OK, now let's get back to your question :-)
What you have tried does not work because the output of paste is a character string, and the assignment function expects as its first argument something that evaluates to a symbol, to be either created or modified. Alternatively, the first argument can also evaluate to a call associated with a replacement function. These are a little trickier, but they are handled by the assignment function itself, not by the parser.
The error message you see, target of assignment expands to non-language object, is triggered by the assignment function, precisely because your target evaluates to a string.
We can fix that building up a call that has the symbols you want in the right places. The most "brute force" method is to put everything inside a string and use parse:
parse(text=paste('N',i," -> ",'z$x',i,sep=""))
Another way to get there is to use substitute:
substitute(x -> y, list(x=as.symbol(paste("N",i,sep="")), y=substitute(z$w, list(w=paste("x",i,sep="")))))
The inner substitute creates the calls z$x1, z$x2 etc. The outer substitute puts this call as the target of the assignment, and the symbols N1, N2 etc as the values.
parse results in an expression, and substitute in a call. Both can be passed to eval to get the same result.
Just one final note: I repeat that all this is intended as a didactic example, to help understanding the inner workings of the language, but it is far from good programming practice to use parse and substitute, except when there is really no alternative.
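Putting the pieces together, here is a sketch of how such a call could be built and evaluated in a loop. It assumes the goal is N1 <- z$x1, N2 <- z$x2 and so on, and uses a hypothetical two-column z. The names M1, P1 and P2 are also made up for the illustration. The last line shows assign(), a plainer alternative that avoids building language objects.

```r
z <- data.frame(x1 = 1:3, x2 = 4:6)   # hypothetical data

for (i in 1:2) {
  # Build the call  N<i> <- z$x<i>  from symbols, then evaluate it
  cl <- substitute(
    target <- z$col,
    list(target = as.symbol(paste0("N", i)),
         col    = as.symbol(paste0("x", i)))
  )
  eval(cl)
}
N1
N2

# The same kind of call built from a string with parse()
eval(parse(text = "M1 <- z$x1"))

# Plain alternative without computing on the language
for (i in 1:2) assign(paste0("P", i), z[[paste0("x", i)]])
```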
A data.frame is a named list. It is usually good practice, and idiomatically R-ish, not to have lots of objects in the global environment, but to keep related (or similar) objects in lists and to use lapply etc.
You could use list2env to multiassign the named elements of your list (the columns in your data.frame) to the global environment
DD <- data.frame(x = 1:3, y = letters[1:3], z = 3:1)
list2env(DD, envir = parent.frame())
## <environment: R_GlobalEnv>
## ta da, x, y and z now exist within the global environment
x
## [1] 1 2 3
y
## [1] a b c
## Levels: a b c
z
## [1] 3 2 1
I am not exactly sure what you are trying to accomplish. But here is a guess:
### Create a data.frame using the alphabet
data <- data.frame(x = 'a', y = 'b', z = 'c')
### Create a numerical index corresponding to the letter position in the alphabet
index <- match(unlist(data[1, ]), letters)
### Use an 'lapply' to apply a function to every element in 'index'; creates a list
val <- lapply(index, function(x) {
paste('N', x, sep = '')
})
### Assign names to our list
names(val) <- names(data)
### Observe the result
val$x

Different behavior of get and mget in aggregation (R)

I have a character vector (chr [1:5], named keynn) of column names on which I would like to perform an aggregation.
Every element of the vector is a valid column name of the data frame (mydata), but it is a string and not the variable ("YEAR" instead of mydata$YEAR).
I tried using get() to return the column from the name, and it works for the first element, like so:
attach(mydata)
aggregate(mydata, by=list(get(keynn, .GlobalEnv)), FUN=length)
I tried using mget() since my vector has more than one element, like this:
attach(mydata)
aggregate(mydata, by=list(mget(keynn, .GlobalEnv)), FUN=length)
but I get an error:
value for 'YEAR' not found.
How can I get the equivalent of get for multiple columns to aggregate by?
Thank you!
I would suggest not using attach in general
If you are just trying to get columns from mydata you can use [ to index the list
aggregate(mydata, by = mydata[keynn], FUN = length)
should work -- and is very clear that you want to get keynn from mydata
The problem with using attach is that it adds mydata to the search path (not copying to the global environment)
try
attach(mydata)
mget(keynn, .GlobalEnv)
so if you were to use mget and attach, you need
mget(keynn, .GlobalEnv, inherits = TRUE)
so that it will not just search in the global environment.
But that is more effort than it is worth (IMHO)
The reason get works is that inherits = TRUE by default. You could thus use lapply(keynn, get) if mydata were attached, but again this is ugly and unclear about what it is doing.
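The points above can be tried on a small example. This is a sketch with a hypothetical mydata and keynn. Instead of attach(), it imitates the search path with a parent environment created by list2env(), which is an assumption made to keep the example self-contained.

```r
mydata <- data.frame(YEAR   = c(2001, 2001, 2002),
                     REGION = c("a", "b", "a"),
                     VALUE  = 1:3)          # hypothetical data
keynn <- c("YEAR", "REGION")

# Index the data frame directly: no attach() needed
counts <- aggregate(mydata["VALUE"], by = mydata[keynn], FUN = length)
counts

# Imitate the search path: the columns live in a parent environment
parent_env <- list2env(mydata)
child_env  <- new.env(parent = parent_env)

# get() looks through enclosing environments by default
year_col <- get("YEAR", envir = child_env)

# mget() does not, unless inherits = TRUE is given
res_strict <- tryCatch(mget(keynn, envir = child_env),
                       error = function(e) "not found")
res_inherit <- mget(keynn, envir = child_env, inherits = TRUE)
names(res_inherit)
```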
another approach would be to use data.table, which will evaluate the by argument within the data.table in question
library(data.table)
DT <- data.table(mydata)
DT[, {what you want to aggregate} , by =keynn]
Note that keynn doesn't need to be a character vector of names, it can be a list of names or a named list of functions of names etc
