Different behavior of get and mget in aggregation (R)

I have a character vector (chr [1:5], named keynn) of column names by which I would like to perform an aggregation.
All elements of the vector are valid column names of the data frame (mydata), but they are strings, not the variables themselves ("YEAR" instead of mydata$YEAR).
I tried using get() to return the column from its name, and it works for the first element:
attach(mydata)
aggregate(mydata, by=list(get(keynn, .GlobalEnv)), FUN=length)
I tried using mget(), since my vector has more than one element:
attach(mydata)
aggregate(mydata, by=list(mget(keynn, .GlobalEnv)), FUN=length)
but I get an error:
value for 'YEAR' not found.
How can I get the equivalent of get for multiple columns to aggregate by?
Thank you!

I would suggest not using attach in general.
If you are just trying to get columns from mydata, you can use [ to index it like a list:
aggregate(mydata, by = mydata[keynn], FUN = length)
should work, and it makes clear that you want to get the keynn columns from mydata.
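A small sketch of this approach, using a hypothetical mydata with two grouping columns (YEAR and REGION) and one value column. Only the value column is aggregated here, so the count is not repeated for every column as it would be with the whole mydata:

```r
# Hypothetical stand-in for the question's mydata
mydata <- data.frame(YEAR = c(2020, 2020, 2021, 2020),
                     REGION = c("N", "N", "N", "S"),
                     value = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)
keynn <- c("YEAR", "REGION")

# mydata[keynn] is a data frame (i.e. a list) holding only the grouping columns
res <- aggregate(mydata["value"], by = mydata[keynn], FUN = length)
res
```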
The problem with attach is that it adds mydata to the search path; it does not copy the columns into the global environment.
Try:
attach(mydata)
mget(keynn, .GlobalEnv)
So if you were to use mget together with attach, you would need
mget(keynn, .GlobalEnv, inherits = TRUE)
so that the search continues past the global environment into the attached mydata.
But that is more effort than it is worth (IMHO).
get works because its inherits argument defaults to TRUE, whereas mget's defaults to FALSE. You could thus use lapply(keynn, get) if mydata were attached, but again this is ugly and unclear about what it is doing.
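The following sketch illustrates the search-path behavior described above; the two-column mydata is a hypothetical stand-in for the question's data:

```r
# Hypothetical stand-in for the question's mydata
mydata <- data.frame(YEAR = c(2020, 2021), REGION = c("N", "S"),
                     stringsAsFactors = FALSE)
keynn <- c("YEAR", "REGION")

attach(mydata)  # puts mydata's columns on the search path, not in .GlobalEnv

# mget() defaults to inherits = FALSE, so it only looks in .GlobalEnv and fails
err <- tryCatch(mget(keynn, envir = .GlobalEnv),
                error = function(e) conditionMessage(e))

# With inherits = TRUE the lookup continues up the search path to the attached mydata
cols <- mget(keynn, envir = .GlobalEnv, inherits = TRUE)

# get() already defaults to inherits = TRUE, so it finds the attached columns too
cols2 <- lapply(keynn, get)

detach(mydata)  # undo the attach
```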
Another approach is to use data.table, which evaluates the by argument within the data.table in question:
library(data.table)
DT <- data.table(mydata)
# replace .N (the number of rows per group) with what you want to aggregate
DT[, .N, by = keynn]
Note that keynn doesn't need to be a character vector of names; it can also be a list of column names, a named list of expressions computed from columns, etc.
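A sketch of both forms of by, assuming the data.table package is installed and using a hypothetical mydata; the decade grouping is only an illustration of passing a named expression instead of a column name:

```r
library(data.table)

# Hypothetical stand-in for the question's mydata
mydata <- data.frame(YEAR = c(2020, 2020, 2021, 2020),
                     REGION = c("N", "N", "N", "S"),
                     value = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)
keynn <- c("YEAR", "REGION")
DT <- as.data.table(mydata)

# by = a character vector of column names
by_names <- DT[, .(n = .N, total = sum(value)), by = keynn]

# by = a named list of expressions (.() is data.table's alias for list())
by_exprs <- DT[, .(n = .N), by = .(decade = YEAR %/% 10 * 10, REGION)]
```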

Related

When creating a new data.frame column, what is the difference between `df$NewCol=` and `df[,"NewCol"]=` methods?

Using the built-in "iris" data frame in R, why does creating a new column "NewCol" like this
iris[,'NewCol'] = as.POSIXlt(Sys.Date()) # throws Warning
BUT
iris$NewCol = as.POSIXlt(Sys.Date()) # is correct
This issue doesn't occur when assigning simple (atomic) types like chr, int, double, etc.
First, note that, as @sindri_baldur pointed out, as.POSIXlt returns a list.
From R help ($<-.data.frame):
There is no data.frame method for $, so x$name uses the default method which treats x as a list (with partial matching of column names if the match is unique, see Extract). The replacement method (for $) checks value for the correct number of rows, and replicates it if necessary.
So, if you try iris[, "NewCol"] <- as.POSIXlt(Sys.Date()), you get a warning that you're trying to assign a list object to a vector (a single column). So only the first element of the list is used.
Again, from R help:
"For [ the replacement value can be a list: each element of the list is used to replace (part of) one column, recycling the list as necessary".
In your case, only one column is specified, which means only the first element of as.POSIXlt's result (a list) is used, and you are warned of that.
With the $ syntax, the iris data.frame is treated as a list, and the result of as.POSIXlt (again a list) is appended to it. The final result is still a data.frame, but look at the type of the new column (NewCol2 below): it's a list.
iris[, "NewCol"] <- as.POSIXlt(Sys.Date()) # warning
iris$NewCol2 <- as.POSIXlt(Sys.Date())
typeof(iris$NewCol) # double
typeof(iris$NewCol2) # list
Suggestion: maybe you wanted to use as.POSIXct()?
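A minimal sketch of that suggestion, working on a copy of iris. as.POSIXct() returns an atomic double vector with a class attribute, so both assignment styles should produce an ordinary date-time column rather than a list column:

```r
iris2 <- iris  # work on a copy of the built-in data set

# A length-1 POSIXct value is recycled to all 150 rows in both cases
iris2[, "NewCol"] <- as.POSIXct(Sys.Date())
iris2$NewCol2 <- as.POSIXct(Sys.Date())

typeof(iris2$NewCol2)  # "double", not "list"
```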

Parameterize name of output dataframe in global environment, assigned to from a function

I'm trying to pass a function the name I want for the data frame it creates, and then save that data frame to the global environment.
I am trying to automate creating dataframes that are subsets of other dataframes by filtering for a value; since I'm creating 43 of these, I'm writing a function that can automatically:
a) subset the rows containing a certain string into their own data.frame, then
b) name a dataframe after that string and save it to my global environment. (The string in a) is also the suffix I want it to name the data.frame after in b))
I can do a) fine but am having trouble with b).
Say I have a dataset which includes a column named "Team" (detailing whose team that member belongs to):
original.df <- read_csv("../original_data_set")
I create a function to split that dataset according to values in one of its columns...
split.function <- function(string){
x <- original.df
as.name(string) <<- filter(x, str_detect(`Team`, string))
}
... then save the dataframe with the name:
split.function('Team.Curt')
I keep getting:
> Error in as.name(x) <<- filter(y, str_detect(`Receiving Committee`, x)) :
object 'x' not found
But I just want a data.frame named Team.Curt, containing the rows that include the term Team.Curt, saved in my global environment.
You can use assign to create objects based on a string:
library(dplyr)    # filter()
library(stringr)  # str_detect()
split.function <- function(string){
x <- original.df
assign(string, filter(x, str_detect(`Team`, string)), envir = .GlobalEnv)
}
Here, envir = .GlobalEnv is used to assign the value to the global environment.
Both <- and <<- assignments require that the statement hardcodes the object name. Since you want to parameterize the name, as in your case, you must use assign().
<<- is merely a variant of <- that can be used inside a function, and does a bottom-up search of environments until it either reaches the top (.GlobalEnv) or finds an existing object of that name. In your case that's unnecessary and slightly dangerous, since if an object of that name existed in some environment halfway up the hierarchy, you'd pick it up and assign to it instead.
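A small, hypothetical illustration of that bottom-up search: the <<- inside inner_fn finds the variable result in outer_fn and modifies it there, so nothing is created in the global environment:

```r
# Hypothetical nested functions showing where <<- actually assigns
outer_fn <- function() {
  result <- "outer"          # local variable of outer_fn
  inner_fn <- function() {
    result <<- "modified"    # search starts in outer_fn, finds 'result' there
  }
  inner_fn()
  result                     # returns the modified value
}
out <- outer_fn()
```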
So just use assign(..., envir = .GlobalEnv) instead.
But both <<- and assigning directly into .GlobalEnv from within functions are strongly discouraged as disasters in waiting, or "life by a volcano" (burns-stat.com/pages/Tutor/R_inferno.pdf). See the caveats at Assign multiple objects to .GlobalEnv from within a function. The tidyverse is probably a better approach for managing multiple data frames.
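One alternative, sketched here with a hypothetical original.df, is to keep the subsets in a named list instead of 43 separate global objects. split() groups by exact value; the second variant swaps the question's dplyr::filter()/str_detect() for base grepl(fixed = TRUE) to keep substring matching (fixed = TRUE treats the pattern literally, whereas str_detect() treats it as a regular expression by default):

```r
# Hypothetical stand-in for original.df
original.df <- data.frame(Team = c("Team.Curt", "Team.Ana", "Team.Curt"),
                          score = c(3, 5, 7),
                          stringsAsFactors = FALSE)

# One data frame per exact Team value, collected in a named list
teams <- split(original.df, original.df$Team)

# Substring matching for a (hypothetical) vector of patterns, named by pattern
patterns <- c("Team.Curt")
teams2 <- setNames(
  lapply(patterns, function(p) original.df[grepl(p, original.df$Team, fixed = TRUE), ]),
  patterns
)
teams2[["Team.Curt"]]
```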

Renaming an unnamed variable with dplyr

I have to read a bunch of .xlsx files into R, which I do with readxl::read_excel(). None of these files gives a variable name for the first column. Since there are plenty of files, I do not want to change those manually.
In order to process the data properly, it is necessary to give these first columns a name. In the end, I want to write a function that I can call for each of these .xlsx files (e.g. using purrr::map), and within this function I would prefer a solution that fits in a single pipe.
Unfortunately, dplyr::rename(df, timeseries = ``) throws the following error:
Error: attempt to use zero-length variable name
Using the column index (dplyr::rename(df, timeseries = 1)) does not work either:
Error: Arguments to rename() must be unquoted variable names.
Argument timeseries is not.
How can I rename the variable without interrupting the pipe with names(df)[1] <- "timeseries"?
This can be accomplished with dplyr::select() in the following way:
select(df, timeseries = 1, everything())
Obviously, dplyr::select() can handle column indices, which allows this solution.
Please comment if you are aware of any particular reason why this is not possible with dplyr::rename()!
If you want to use rename and a column index (in this case 1), you can do
rename_(df, timeseries = names(df)[1])
When chaining, use a dot:
df %>% ... %>% rename_(timeseries = names(.)[1])
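In recent dplyr versions the underscore verbs such as rename_() are deprecated, and rename() itself accepts column positions through tidyselect, so the index can be used directly inside the pipe. A sketch, where the `...1` name is a hypothetical stand-in for whatever placeholder the unnamed column gets when read:

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3), value = c(10, 20, 30))
names(df)[1] <- "...1"  # hypothetical placeholder name for the unnamed column

# Rename by position without leaving the pipe
out <- df %>% rename(timeseries = 1)
```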

Using string as object in R

I have a variable a that is a string:
a = "jul_0_baseline,jul_1_baseline,...jul_11_baseline,jul_12_baseline"
When I merge the following zoo series into one table using
temp <- merge(jul_0_baseline,jul_1_baseline,...jul_11_baseline,jul_12_baseline)
it works, however when I try to merge it using
temp <- merge(a)
I get an error because a is a string (even though its text is correct). I assume that it is effectively running
temp <- merge("jul_0_baseline,jul_1_baseline,...jul_11_baseline,jul_12_baseline")
Any help would be greatly appreciated.
a is a string because it is created using the code:
a <- paste("jul","0","baseline",sep = "_")
for (d in 1:12){ b <- paste("jul",d,"baseline",sep = "_")
a <- paste(a,b, sep=",")
}
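Since paste() is vectorized, the loop can be replaced by a single call. Keeping the intermediate vector of names (nms, a name introduced here) also means it could be passed to mget() directly, without strsplit():

```r
# One vectorized call builds all 13 names; collapse joins them with commas
nms <- paste("jul", 0:12, "baseline", sep = "_")
a <- paste(nms, collapse = ",")
```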
Form the entire command string (including merge) and then parse and evaluate it:
eval(parse(text = sprintf("merge(%s)", a)))
Since each jul_i_baseline listed in the string a corresponds to an actual object, you can do this:
temp <- Reduce(function(...) merge(..., all=T), mget(strsplit(a, ",")[[1]]))
The strsplit() function splits a into a vector of strings where each element is "jul_i_baseline". It returns a one-element list, so we can use [[1]] to get the vector of strings.
mget() looks up each name in that vector of strings and returns a list of the corresponding objects, so each element is the actual zoo object jul_i_baseline.
Reduce(function(...) merge(..., all=T), <list>) merges the objects stored in each element of the list. Assuming the objects have a common variable on which you want to merge, you can also add a by variable in merge().
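A self-contained sketch of the Reduce()/mget() pattern, using small hypothetical data frames instead of zoo series (the approach also works for non-zoo objects), with day as the shared merge column:

```r
# Hypothetical data frames standing in for the jul_i_baseline series
jul_0_baseline <- data.frame(day = 1:3, v0 = c(1, 2, 3))
jul_1_baseline <- data.frame(day = 2:4, v1 = c(4, 5, 6))
jul_2_baseline <- data.frame(day = 3:5, v2 = c(7, 8, 9))
a <- "jul_0_baseline,jul_1_baseline,jul_2_baseline"

# Look up the objects by name, then merge them pairwise, keeping all days
objs <- mget(strsplit(a, ",")[[1]])
temp <- Reduce(function(...) merge(..., by = "day", all = TRUE), objs)
```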
An alternative approach, suggested in the comments, is to use do.call(), which works here because merge() for zoo objects accepts any number of series at once. (The Reduce() approach also works with non-zoo objects, but this one does not.) The command would be structured like so:
temp <- do.call(merge, mget(strsplit(a, ",")[[1]]))
Again we're getting the objects using mget() and strsplit().
@G.Grothendieck's suggestion of using eval(parse(...)) also works in this situation. However, many R users discourage the use of eval(parse(...)) in general. See here.

Subsetting of Lists in R

I had a few questions about subsetting a named list in R using the [ operator:
For example, consider the list formals <- list(x = DOUBLE, y = DOUBLE, z = NULL). In this example, DOUBLE is treated as a symbol in R.
1) How should I retrieve all elements that are not NULL? I tried formals[formals != NULL], but this only returns a list with no members.
2) How should I retrieve elements whose names satisfy a condition? For example, how would I get all elements whose names are not z? I could use names(formals), but this is cumbersome, and I was hoping for a quick solution using [.
Another option for the first question:
Filter(Negate(is.null), formals)
For the second case, you'll have to use names. Here's one way:
formals[names(formals) != 'z']
formals is actually a function in R. It's best to avoid names of functions when naming your variables.
This will work for your first question:
formals[!unlist(lapply(formals, is.null))]
I don't think you can avoid using names for the second question.
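A sketch of the answers above, using a hypothetical list named fl (avoiding the name formals) and quote() so that DOUBLE is stored as a symbol rather than evaluated:

```r
# Hypothetical stand-in for the question's list; z = NULL is kept as an element
fl <- list(x = quote(DOUBLE), y = quote(DOUBLE), z = NULL)

# 1) drop NULL elements (two equivalent ways)
non_null  <- Filter(Negate(is.null), fl)
non_null2 <- fl[!vapply(fl, is.null, logical(1))]

# 2) select by a condition on the names
not_z <- fl[names(fl) != "z"]
```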
