Why is there different behaviour between the assignment symbols “=” and “<-” in R?

I'm assigning values within the output list of a function, like:
nofun = function(sth){
  # something happening here
  metrics = list(
    metric1 = value1,
    metric2 <- value2 )
  return(metrics)
}
Once I query metrics, I noticed that <- and = behave differently: <- only stores the value as an element with no name (i.e. "x1" = value1), while = also applies the correct name (i.e. metric1 = value1).
This behaviour is also mentioned for data.frame at the bottom of an older, more generic question, but there is no explanation of this specific use case.
It caused me quite a few headaches and wasted time before I noticed it, and I couldn't find any other useful information.
Thanks in advance!

To define a named list you have to use the syntax list(name1 = value1, name2 = value2, ...). A list defined in this way has a names attribute containing the element names.
Writing name2 <- value2 assigns value2 to a variable name2. If you write this inside a list definition (list(name2 <- value2)), the value is included in the list but no names attribute is set. So it is equivalent to:
name2 <- value2
list(name2)
You can compare both statements:
attributes(list(a=3))
# $names
# [1] "a"
attributes(list(a<-3))
# NULL
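As a minimal sketch of the difference, using the question's names with made-up values (value1 = 10 and value2 = 20 are assumptions for illustration), the following shows both the missing name and the side effect of using <- inside list():

```r
value1 <- 10   # hypothetical values, not from the question
value2 <- 20

# Mixed usage: `=` names the element, `<-` performs an ordinary assignment
metrics <- list(metric1 = value1, metric2 <- value2)
mixed_names <- names(metrics)   # the second name is the empty string
mixed_names

# Side effect: `<-` also created a variable called metric2 where list() was called
metric2

# Using `=` throughout gives both elements their names
fixed <- list(metric1 = value1, metric2 = value2)
fixed_names <- names(fixed)
fixed_names
```

So inside a call, = means "argument name", while <- is always an assignment whose value is then passed on as an unnamed argument.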

Related

How to get an R function to have a global effect on a dataframe?

I have been trying to create a function which will permanently change a value in specific cells of my data frame. I insert the data frame name, row index I wish to change, and the new name as a string. However, the function seems to change the value name within the local environment but not global.
The function is as follows:
#change name function
name_change <- function(df, row, name) {
df[row, 1] = name
return(df[row, 1])
}
E.g. if the data frame was:
Name    Column B
Mark    2
Beth    4
The function name_change(df, 2, 'Jess') would change Beth to Jess.
When run as raw code, the line below does permanently change the value, but it does not work when used inside a function.
df[2, 1] = 'Jess'
Thanks in advance for your time
If you change your function like this:
name_change <- function(df, row, name) {
df[row, 1] = name
return(df)
}
and then assign the result of the function back to the original df, you will get the change you are looking for:
df = name_change(df,2,'Jess')
An alternative to the solutions already provided is to use the superassignment operator <<-. Ordinary assignment with <- (or the = you used) operates in your function's environment only. Superassignment reaches beyond the function's own environment and can thus modify the data frame residing in the global environment. Note, though, that this is a quick-and-dirty fix only: it modifies whichever object named df it finds outside the function, regardless of the data frame you pass in.
That said, the code would read like this:
#change name function
dirty_name_change <- function(df, row, name) {
df[row, 1] <<- name ## note the double arrow
}
You are returning the value of the cell, not the mutated df. R passes arguments by value, so you can think of the function as modifying a copy of the df that was passed in. The solution is to return the mutated df and reassign it.
See also: Can you pass-by-reference in R?
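A small sketch of this pass-by-value behaviour, using the data from the question (the column name ColumnB and stringsAsFactors = FALSE are assumptions added so the example runs the same way on older R versions):

```r
df <- data.frame(Name = c("Mark", "Beth"), ColumnB = c(2, 4),
                 stringsAsFactors = FALSE)

name_change <- function(df, row, name) {
  df[row, 1] <- name   # changes the function's local copy only
  df                   # return the whole modified data frame
}

# Calling the function without reassigning leaves the original untouched
name_change(df, 2, "Jess")
before <- df$Name[2]
before

# Reassigning the result is what makes the change stick
df <- name_change(df, 2, "Jess")
after <- df$Name[2]
after
```

Returning the data frame and reassigning it keeps the function free of side effects, which is usually easier to reason about than <<-.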

Removing selected variables in R environment

This is certainly a simple question but I can't find a solution.
I want to clean my environment by removing some variables I don't need anymore and keep some others.
I understand ls() can list them and ls()[[i]] returns the name of the variable, as a string.
So if I want to remove the 10th, let's say it's the variable age, ls()[[10]] will return "age", and I would like to do something like rm(ls()[[10]]), but it does not work. I can't figure out how to force rm(ls()[[10]]) to be equivalent to rm(age).
I guess I need to force some evaluation of string "age" to return the variable age but can't find the proper function in R documentation.
Thanks if you can help.
The list argument of rm will help you. It accepts a character vector. Consider:
age <- 1
rm(list = "age") # Same effect as rm(age)
age
#Error: object 'age' not found
So running e.g.
rm(list = ls())
will clear all objects that ls() lists in the current environment (by default, names starting with a dot are not listed).
In your case rm(list = ls()[10]) will do what you want. However, note that ls() returns a sorted character vector by default, so the 10th entry can change rather easily. You probably want to do the following:
objects_to_remove <- c("age", "another_object") # etc
rm(list = objects_to_remove)
How about the following:
1: Grab the list in the environment,
2: Define the items you want to remove,
3: Filter the list by the items you want to remove
4: Then remove them
all_objects <- ls()   # named all_objects rather than list, to avoid confusion with base::list
to_remove <- c("Item1", "Item2")
list_to_remove <- all_objects[all_objects %in% to_remove]
list_to_remove
rm(list = list_to_remove)
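If it is easier to name what you want to keep, the same idea can be inverted with setdiff(). This is only a sketch: the object names are hypothetical, and a scratch environment is used (an assumption for illustration) so that running it does not touch your workspace. Dropping the envir arguments applies it to the current environment instead.

```r
# Scratch environment holding a few hypothetical objects
env <- new.env()
assign("age", 1, envir = env)
assign("height", 2, envir = env)
assign("weight", 3, envir = env)

keep <- c("age")                              # objects to keep
to_remove <- setdiff(ls(envir = env), keep)   # everything else
rm(list = to_remove, envir = env)

remaining <- ls(envir = env)
remaining
```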

How to name tables based on the loop they are part of in R

I am trying to create a loop which produces a new table on each iteration. I want each table to be called table_loopnumber, and each one needs to look at the table created in the previous iteration.
I've tested this code for i = 1 and it works fine, but it doesn't work as a loop. Any help would be appreciated, as I am very new to R.
for(i in 1:2) {
proj4810_op_iteration_[i+1]<-setDT(proj4810_op_iteration_i[, list(Median_High = median(unlist(.SD), na.rm = TRUE)),
by = list(item1, section,RL_Description_Full,
seed_dept,
Total_DOD,
england_DoD,
scotland_DoD,
wales_DoD,
IOM_DoD,
NI_DoD,
unknown_DoD,
turnover,
baskets,
items,
unit_price,
Ambient_Low,
Bakery_Low ,
Cleaning_Low,
FTN_Low,
Fresh_Low,
FrozPrep_Low,
current_seedprod)])
}
Thanks in advance
You can "Assign a value to a name in an environment" using assign. It takes as its first argument "a variable name, given as a character string" and as its second argument the object you want to assign to that variable name. See ?assign. The opposite (getting an object based on its name) is get. Hence, the following should work:
for(i in 1:2){
previous <- get(paste0("proj4810_op_iteration_", i-1)) # get previous data table
tmp<-setDT(previous[, list(Median_High = median(unlist(.SD), na.rm = TRUE)),
by = list(item1, section,RL_Description_Full,seed_dept,Total_DOD,england_DoD,scotland_DoD,
wales_DoD,IOM_DoD,NI_DoD,unknown_DoD,turnover,baskets,items, unit_price,
Ambient_Low, Bakery_Low ,Cleaning_Low, FTN_Low, Fresh_Low, FrozPrep_Low,
current_seedprod)])
vname <- paste0("proj4810_op_iteration_", i) # name of the current data table
assign(vname, tmp) # save the data table
}
Of course, for the first loop iteration you need to create an object proj4810_op_iteration_0 before the loop begins, otherwise it won't find anything.
As for the elegance of this approach, I agree more with the list solution someone else already posted, but if you really want to do it this way, this should work.
And next time you ask a question, please remember to provide a minimal reproducible example.
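Stripped of the data.table details, the get/assign pattern can be sketched as below. The object names (iteration_0 and so on) and the doubling step are hypothetical stand-ins for the real tables and the real computation:

```r
# Hypothetical starting object; it must exist before the loop begins
iteration_0 <- data.frame(value = 1)

for (i in 1:3) {
  previous <- get(paste0("iteration_", i - 1))        # look up the previous table by name
  current  <- data.frame(value = previous$value * 2)  # stand-in for the real computation
  assign(paste0("iteration_", i), current)            # store it under the new name
}

iteration_3$value
```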
Another approach is to use a list for the dataframes.
Here is a simplified version of your problem, solved. It involves calculating the elements of the Fibonacci sequence. The desired data tables are:
dfList[[1]]
# xnmo xn
# 1 1 2
dfList[[2]]
# xnmo xn
# 1 2 3
dfList[[3]]
# xnmo xn
# 1 3 5
So the first table contains the first and second elements of the sequence, the second table contains the second and third elements, etc.
A loop to calculate such tables can be written as follows:
dfList = list(data.frame(xnmo=1, xn=2)) # seed matches the first desired table above
for(i in 1:10) {
  dfList[[i+1]] = data.frame(
    xnmo = dfList[[i]]$xn,
    xn = dfList[[i]]$xnmo + dfList[[i]]$xn
  )
}

R: rename columns of time series data

I am trying to rename the columns of a time series using the assign function as follows:
assign(colnames(paste0(<logic_to_get_dataset>)),
c(<logic_to_get_column_names>))
I am getting a warning: In assign(colnames(get(paste0("xvars_", TopVars[j, 1], "_lag", :
only the first element is used as variable name
Also, the column name assignment does not happen. I think this is happening because of the colnames() function. Is there a workaround?
The issue is that assign only looks at the first element of the vector.
You can try this, for example:
df = data.frame(x = 1:3, y = 4:2)
within(df, assign(colnames(df), c('a','b')))
You'll notice that R only uses the first column name and tries to assign the whole vector c('a','b') to that one variable. This behavior is obviously not what you're looking for.
Unfortunately, it's kind of hacky, but you can always use something like this:
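data.frame.name = get_df()      # some function that returns the data frame's name as text
data.frame.columns = get_cols() # some function that returns the new column names as text
# the column names are wrapped in double quotes so that c(...) receives strings, not object names
eval(parse(text = paste0('colnames(', data.frame.name, ') = c(',
  paste0('"', data.frame.columns, '"', collapse = ','), ')')))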
I prefer to avoid doing these kinds of expressions, but it should work as intended.
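If you would rather avoid eval(parse()), one possible alternative is to fetch the object by name with get, rename the columns on that copy, and write it back with assign. This is a sketch under assumptions: the object name xvars_demo and its contents are made up for illustration, and in the real code the name would be built with paste0().

```r
df_name <- "xvars_demo"   # hypothetical name, e.g. the result of a paste0() call
assign(df_name, data.frame(x = 1:3, y = 4:2))

tmp <- get(df_name)            # fetch the object by its name
colnames(tmp) <- c("a", "b")   # rename the columns on the copy
assign(df_name, tmp)           # write the modified copy back under the same name

new_names <- colnames(get(df_name))
new_names
```

Because the result is read back with get, the renaming can be checked directly without opening the dataset.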
Here it goes -
temp_var <- paste0('colnames(var_',TopLines[j,1],'_lag',get(paste0('uniqLg_',TopLines[j,1]))[k,],'_',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12 ,
') <- c(gsub( "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'" , "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'__',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12,
'", colnames(var_',TopLines[j,1],'_xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],')))')
print(temp_var )
eval(parse( text=temp_var ))
where TopLines is a data frame with one column containing a list of lines. The only problem with this method is that I can't test the output of eval unless I actually open the dataset and check whether the changes have taken effect.

R equivalent to the MATLAB structure?

Is there an R type equivalent to the Matlab structure type?
I have a few named vectors and I try to store them in a data frame. Ideally, I would simply access one element of an object and it would return the named vectors (like a structure in Matlab). I feel that using a data frame is not the right thing to do since it can store the values of the named vectors but not the names when they differ from one vector to the other.
More generally, is it possible to store a bunch of different objects in a single one in R?
Edit: As Joran said I think that list does the job.
l = list()
l$vec1 = namedVector1
l$vec2 = namedVector2
...
If I have a list of names
name1 = 'vec1'
name2 = 'vec2'
is there any way for the interpreter to understand that when I use a variable name like name1, I am not referring to the variable name but to its content? I have tried get(name1) but it does not work.
I could still be wrong about what you're trying to do, but I think this is the best you're going to get in terms of accessing each list element by name:
l <- list(a = 1:3, b = 1:10)
ind <- "a"
l[[ind]]
# [1] 1 2 3
Namely, you're going to have to use [[ explicitly.
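Applied to the names from the question, a minimal sketch could look like this (the contents of namedVector1 and namedVector2 are made up for illustration). get(name1) does not help here because it looks for a variable called vec1, not for a list element of that name.

```r
# Hypothetical named vectors with different names and lengths
namedVector1 <- c(a = 1, b = 2)
namedVector2 <- c(x = 10, y = 20, z = 30)

l <- list(vec1 = namedVector1, vec2 = namedVector2)

name1 <- "vec1"
picked <- l[[name1]]   # element whose name is stored in name1
picked
names(picked)          # the vector's own names are preserved

# The same indexing works when looping over several names
for (nm in c("vec1", "vec2")) {
  print(length(l[[nm]]))
}
```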
