Can "assign()" and "get()" be written more concisely? - r

Below is my code. I use an extra variable "tmp" to clean "ABC_DO". Because "Location_name" can change, I use the "assign()" and "get()" functions.
library(dplyr)
Location_name <- "ABC_"
tmp <- get(paste(Location_name, "DO", sep = "")) %>% filter(log.DO != -Inf)
assign(paste(Location_name, "DO", sep = ""), tmp)
My code achieves this goal, but it is not concise (it introduces a temporary variable). Is there a better way?

Assuming the inputs shown reproducibly in the Note at the end (next time please make sure your question includes complete reproducible code including inputs) we can make the following changes:
use paste0 instead of paste
create a variable locname to hold the name of the data frame and a variable e to be the environment where our data frame is located
use e[[...]] instead of get and assign
use magrittr %<>% two-way pipe
possibly use filter(is.finite(log.DO)) instead, which additionally drops +Inf values -- not shown below
giving this code:
library(dplyr)
library(magrittr)
e <- .GlobalEnv # change if our data frame is in some other environment
locname <- paste0(Location_name, "DO")
e[[locname]] %<>%
  filter(log.DO != -Inf)
The result is:
get(locname, e)
## log.DO
## 1 1
## 2 2
Alternative
This alternative only uses ordinary pipes. We use e and locname from above.
library(dplyr)
e[[locname]] <- e[[locname]] %>%
  filter(log.DO != -Inf)
Note
Test input:
ABC_DO <- data.frame(log.DO = c(1, -Inf, 2))
Location_name <- "ABC_"
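For comparison, here is a base-R sketch of the same e[[...]] idea that needs neither dplyr nor magrittr. It assumes the data frames live in a dedicated environment (the name data_env is hypothetical), which also keeps them out of the global workspace. subset() stands in for filter() here; both drop rows where the condition is NA.

```r
# data_env is a hypothetical environment holding the data frames (instead of .GlobalEnv)
data_env <- new.env()
data_env$ABC_DO <- data.frame(log.DO = c(1, -Inf, 2))

Location_name <- "ABC_"
locname <- paste0(Location_name, "DO")

# [[ on an environment reads and writes an object by its character name,
# so it covers both get() and assign(); subset() stands in for dplyr::filter()
data_env[[locname]] <- subset(data_env[[locname]], log.DO != -Inf)

data_env[[locname]]
```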

You only have a temporary variable because you store the data in tmp, and I don't see that as a problem. But in this case, the only thing I can see you doing is passing the code of tmp directly to assign(), like:
assign(
  paste(Location_name, "DO", sep = ""),
  get(paste(Location_name, "DO", sep = "")) %>% filter(log.DO != -Inf)
)
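If this get()/assign() pair shows up repeatedly, it can be wrapped in a small helper so the name is built only once. This is a sketch; modify_by_name() is a hypothetical function, not part of any package, and base subset() is used in place of filter() to keep the example dependency-free.

```r
# Hypothetical helper (not from any package): apply f to the object named `name`
# in `envir` and store the result back under the same name
modify_by_name <- function(name, f, envir = parent.frame()) {
  assign(name, f(get(name, envir = envir)), envir = envir)
  invisible(NULL)
}

ABC_DO <- data.frame(log.DO = c(1, -Inf, 2))
Location_name <- "ABC_"

# The default envir is the caller's environment, so ABC_DO is updated in place
modify_by_name(paste0(Location_name, "DO"), function(d) subset(d, log.DO != -Inf))
ABC_DO
```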

Related

Save a dataframe name and then reference that object in subsequent code

I would like to reference a data frame name stored in an object, such as:
dfName <- 'mydf1'
dfName <- data.frame(c(x = 5)) #want dfName to resolve to 'mydf1', not create a dataframe named 'dfName'
mydf1
Instead, I get: Error: object 'mydf1' not found
CORRECTED SCENARIO:
olddf <- data.frame(c(y = 8))
mydf1 <- data.frame(c(x = 5))
assign('dfName', mydf1)
dfName <- olddf # why isn't this the same as doing "mydf1 <- olddf"?
I don't want to reference an actual dataframe named "dfName", rather "mydf1".
UPDATE
I have found a clunky workaround for what I wanted to do. The code is:
olddf <- data.frame(x = 8)
olddfName <- 'olddf'
newdfName <- 'mydf1'
statement <- paste(newdfName, "<-", olddfName, sep = " ")
writeLines(statement, "mycode.R")
source("mycode.R")
Anyone have a more elegant way, especially without resorting to a write/source?
I am guessing you want to store multiple data frames in a loop or similar. In that case it is much better to store them in a named list. However, you can achieve your goal with assign():
assign('mydf1', data.frame(x = 5))
mydf1
##   x
## 1 5
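The UPDATE's write/source workaround can be replaced by combining get() and assign(), since both take the object name as a string. The sketch below also shows the named-list alternative, where the string is simply a list index (dfs is a hypothetical name).

```r
olddf <- data.frame(x = 8)
olddfName <- "olddf"
newdfName <- "mydf1"

# Copy the object named by one string into a variable named by another string;
# this does what the writeLines()/source() workaround does, without generating code
assign(newdfName, get(olddfName))
mydf1

# Named-list alternative: no new variables are created, the name is a list key
dfs <- list()
dfs[[newdfName]] <- get(olddfName)
dfs$mydf1
```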

Refer to a variable by pasting strings, then make changes and see them reflected in the original variable

my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify them and keep them in the environment.
So, if they must be data frames (in your environment and in a loop), you could do something like this:
for(i in 1:3) {
  obj_name = paste0('my_mtcars_', i)
  obj = get(obj_name)
  obj$blah = 1
  assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferable to the above loop). So, if you use a list and need the results in your environment, use what Duck suggested with list2env() and send everything back to .GlobalEnv, i.e. (in one ugly line):
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns by reference:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
  set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
             my_mtcars_2 = mtcars,
             my_mtcars_3 = mtcars)
#Variable
List2 <- lapply(List, function(x) {x$blah <- 1; return(x)})
And it is easy to collect your existing data frames into such a list with code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So there is no need to define each dataset individually.
We can use the tidyverse:
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')),
    ~ .x %>% mutate(blah = 1)) %>%
  list2env(.GlobalEnv)
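A base-R variant of the same collect, modify, write-back round trip is sketched below. Instead of matching names with an ls() pattern, it builds the exact object names with paste0(), so unrelated objects whose names happen to match the pattern are never picked up. The names obj_names and dfs are illustrative.

```r
my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars

# Build the exact names rather than matching a ls() pattern
obj_names <- paste0("my_mtcars_", 1:3)

# Collect into a named list, add the column, then write the objects back
dfs <- mget(obj_names)
dfs <- lapply(dfs, function(d) { d$blah <- 1; d })
invisible(list2env(dfs, envir = environment()))

head(my_mtcars_1$blah)
```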

Can I write a function to revalue levels of a factor?

I have a column 'lg_with_children' in my data frame that has 6 levels: 'Half and half', 'Mandarin', 'Shanghainese', 'Other', 'N/A', and 'Not important'. I want to condense the 6 levels down to just 2 levels, 'Shanghainese' and 'Other'.
In order to do this I used the revalue() function from the plyr package to successfully rename the levels. I used the code below and it worked fine.
library(plyr)
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("Mandarin" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("Half and half" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("N/A" = "Other"))
data$lg_with_children <- revalue(data$lg_with_children,
                                 c("Not important" = "Other"))
To condense the code a little, I went back to the data before I revalued the levels and attempted to write a function. I tried the following after doing research on how to write your own functions (I'm rather new at this).
revalue_factor_levels <- function(df, col, source, target) {df$col <- revalue(df$col, c("source" = "target"))}
I intentionally left the df, col, source, and target generic because I need to revalue some other columns in the same way.
Next, I tried to run the code with the arguments filled in and got a warning message.
I am not quite sure what the problem is. I tried the following adjustment to the code, but it still did not work.
revalue_factor_levels <- function(df, col, source, target) {df$col <- revalue(df$col, c(source = target))}
Any guidance is appreciated. Thanks.
You can write your function to recode the levels - the easiest way to do that is probably to change the levels directly with levels(fac) <- list(new_lvl1 = c(old_lvl1, old_lvl2), new_lvl2 = c(old_lvl3, old_lvl4))
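As a minimal illustration of the levels(fac) <- list(...) form, using a hypothetical factor built from the levels listed in the question, each name in the list becomes a new level and each element lists the old levels it absorbs:

```r
# Hypothetical sample with the levels listed in the question
fac <- factor(c("Shanghainese", "Mandarin", "Half and half",
                "Other", "N/A", "Not important"))

# A named list maps each new level (the name) to the old levels it absorbs
levels(fac) <- list(
  Shanghainese = "Shanghainese",
  Other = c("Half and half", "Mandarin", "Other", "N/A", "Not important")
)

levels(fac)
table(fac)
```

Every old level should appear in some list element; an old level that is left out becomes NA.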
But there are already several functions that do it out of the box. I typically use the forcats package to manipulate factors.
Check out fct_recode() from the forcats package (see its documentation).
There are also other functions that could help you - check out the comments below.
Now, as to why your code isn't working:
df$col looks for a column literally named col. The workaround is to do df[[col]] instead.
Don't forget to return df at the end of your function
c(source = target) will create a vector with one element named "source", regardless of what happens to be in the variable source.
The solution is to create the vector c(source = target) in 2 steps.
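The naming pitfall is easy to check directly. Inside c(), the argument name is taken literally, whereas base R's setNames() uses the value of a variable as the name. setNames(target, source) is therefore a one-step alternative to the two-step construction (an alternative suggested here, not part of the original answer):

```r
source <- "Mandarin"
target <- "Other"

# Inside c(), the argument name is taken literally, so the element is named "source"
names(c(source = target))

# setNames() uses the value stored in source as the name, in a single step
names(setNames(target, source))
```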
revalue_factor_levels <- function(df, col, source, target) {
  to_rename <- target
  names(to_rename) <- source
  df[[col]] <- revalue(df[[col]], to_rename)
  df
}
Returning the df means the syntax is:
data <- revalue_factor_levels(data, "lg_with_children", "Mandarin", "Other")
I like functions that take the data as the first argument and return the modified data because they are pipeable.
library(dplyr)
data <- data %>%
  revalue_factor_levels("lg_with_children", "Mandarin", "Other") %>%
  revalue_factor_levels("lg_with_children", "Half and half", "Other") %>%
  revalue_factor_levels("lg_with_children", "N/A", "Other")
Still, using forcats is easier and less prone to breaking on edge cases.
Edit:
There is nothing preventing you from both using forcats and creating your custom function. For example, this is closer to what you want to achieve:
revalue_factor_levels <- function(df, col, ref_level) {
  df[[col]] <- forcats::fct_other(df[[col]], keep = ref_level)
  df
}
# Keeps Shanghainese and revalues all other levels to "Other".
data <- revalue_factor_levels(data, "lg_with_children", "Shanghainese")
Here is what I ended up with thanks to help from the community.
library(forcats)
revalue_factor_levels <- function(df, col, ref_level) {
  df[[col]] <- fct_other(df[[col]], keep = ref_level)
  df
}
data <- revalue_factor_levels(data, "lg_with_children", "Shanghainese")

use outside variable inside of rename() function in R

I'm new to R and have a problem
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
Here is how I have tried to do this:
first, the .csv file is read in, let's say case1_case2.csv
then the name of the .csv file is split into two parts
each part is assigned to a variable
so it ends up like this:
xName <- "case1"
yName <- "case2"
After I have put my data into new columns I would like to rename each column to be case1 and case2
To do this I tried using the rename() function, but instead of renaming the columns to case1 and case2, it renamed them to xName and yName.
here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
How can I make sure that the values of xName and yName are used inside the rename() function?
thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !!, and you'll need to use the special := assignment operator to allow unquoting on the left-hand side.
library(tidyverse)
df <- tibble(x = 1:3) # data_frame() is deprecated in favor of tibble()
y <- "Foo"
df %>% rename(y=x) # Not what you want - need to unquote y
df %>% rename(!!y = x) # Gives error - need to use :=
df %>% rename(!!y := x) # Correct
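For comparison, base R avoids the quoting question entirely, because the new name is assigned as an ordinary character value through names<-. The sketch below also applies this to the output of stack(), whose columns are values and ind, as in the question's loop; xName is set by hand here, whereas in the loop it comes from the file name.

```r
df <- data.frame(x = 1:3)
y <- "Foo"

# Base R: the new name is an ordinary character value, so no unquoting is needed
names(df)[names(df) == "x"] <- y
names(df)

# Same idea on stack() output as in the question (xName set by hand for the example)
xName <- "case1"
st <- stack(data.frame(a = 1:2, b = 3:4))  # columns: values, ind
names(st)[names(st) == "values"] <- xName
names(st)
```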

zoo create new column with dynamic column name

I am trying to add a column to a zoo object. I found merge(), which works well:
test = zoo(data.frame('x' = c(1,2,3)))
test = merge(test, 'x1' = 0)
However when I try to name the column dynamically, it no longer works
test = merge(test, paste0('x',1) = 0)
Error: unexpected '=' in "merge(test,paste0('x',1) ="
I have been working with data frames, and there a computed column name works:
test = data.frame('x' = c(1,2,3))
test[paste0('x',1)] = 0
Can someone help explain what the problem is and how to get around this?
Try setNames():
setNames( merge(test, 0), c(names(test), paste0("x", 1)) )
or the names<- replacement method for zoo objects, like this:
test2 <- merge(test, 0)
names(test2) <- c(names(test), paste0("x", 1))
I found this solution very easy and elegant. It uses the eval() function to interpret a string as an R command. Thus, you are completely free to assemble the string exactly the way you want:
test = merge(test, paste0("x",1) = 0)
# does not work (see question)
test[,"x1"] <- 0
# does not work for uninitialized columns
test$x1 <- 0
# works to initialize a new column
# so let's trick R by assembling this command out of strings:
newcolumn <- "x1"
eval(parse(text=paste0("test$",newcolumn," <- 0")))
# welcome test$x1 :-)
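merge() takes each new column's name from its argument name, and R's parser only accepts a literal name or string there, not a function call such as paste0(). Storing the name in a variable first does not change that: var = 0 creates a column literally named var, so it has to be renamed afterwards:
test = zoo(data.frame('x' = c(1,2,3)))
var <- paste0('x',1)
test = merge(test, var = 0)
names(test)[ncol(test)] <- var # rename the column called "var" to "x1"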
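A general way around literal argument names is do.call(), which takes argument names from the names of a list, and those names can be computed at run time. The sketch below demonstrates this with data.frame() so that it runs without zoo. Assuming merge() on a zoo object names new columns after its named arguments (as the question's 'x1' = 0 example suggests), the analogous call would be do.call(merge, c(list(test), extra)), but that variant is not tested here. The names new_name and extra are illustrative.

```r
new_name <- paste0("x", 1)

# Argument names are taken literally by the parser, but do.call() takes them
# from the names of a list, and those names can be computed at run time
extra <- setNames(list(0), new_name)
df <- do.call(data.frame, c(list(x = c(1, 2, 3)), extra))
names(df)
```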
