Adding column using a double loop - r

I have the following double loop:
indexnames = c(a, b, c, d, etc.)
# with
# length(indexnames) = 87
# class(indexnames) = "character"
# (indexnames = indexes I want to add in a column)
files = c(aname, bname, cname, dname, etc.)
# with
# length(files) = 87
# class(files) = "character"
# (files = name of files in the global environment)
Now I want to loop through the two vectors and add a column named "Index" to the object named by files[1], filled with indexnames[1], and so on for each pair. I implemented this the following way:
for(i in files){
for(j in indexnames){
files[i] = cbind(Index = indexnames[j], files[i])
}
}
When I run this, I get 50 or more warnings.
What am I doing wrong?
Appreciating any help, thanks.

You need to use the get() and assign() functions to get the behavior you want.
Also, you don't have to call your loop variables i or j. It's easier to debug a loop if you give them more human-readable names. Still, let's look at the inner part of your loop.
files[i]
Given that files is a character vector, you cannot pick out an element by its value this way (nor would you want to, since the vector only holds the names of the objects, not the objects themselves). Instead, make i cycle through the positions of the vector, for example for (i in seq_along(files)):
for (i in seq_along(files)) {
assign(files[i], `[[<-`(get(files[i]), 'index', value = indexnames[i]))
}
I found some help in this answer:
How to use `assign()` or `get()` on specific named column of a dataframe?
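A minimal sketch of the single-loop approach, assuming two small data frames (aname and bname here are hypothetical stand-ins for the 87 objects in the question). It also shows one possible alternative that keeps the data frames in a named list, so get() and assign() are not needed at all.

```r
# Hypothetical stand-ins for the data frames named in `files`.
aname <- data.frame(x = 1:3)
bname <- data.frame(x = 4:5)

files      <- c("aname", "bname")   # names of the objects, as strings
indexnames <- c("a", "b")           # one index value per object

# One loop is enough: position i pairs files[i] with indexnames[i].
for (i in seq_along(files)) {
  df <- get(files[i])                      # look the object up by its name
  df <- cbind(Index = indexnames[i], df)   # prepend the Index column
  assign(files[i], df)                     # store it back under the same name
}

# Alternative: keep the data frames in a named list and pair them with Map().
frames <- Map(function(df, idx) cbind(Index = idx, df),
              list(aname = data.frame(x = 1:3), bname = data.frame(x = 4:5)),
              indexnames)
```

After the loop, aname and bname each have Index as their first column; frames holds the same result as a list named after its first argument.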

Related

Access name of the object that "i" represents when it iterates in a for-loop through a list of objects in R

I have:
directories (let's say two: A and B) that contain files;
two character objects storing the directories (dir_A, dir_B);
a function that takes the directory as argument and returns the list of the names of the files found there (in a convenient way for me that is different from list.files()).
directories <- c(dir_A, dir_B)
read_names <- function(x) {foo}
Using a for-loop, I want to create objects that each contain the list of files of a different directory as given by read_names(). Essentially, I want to use a for-loop to do the equivalent of:
files_A <- read_names(dir_A)
files_B <- read_names(dir_B)
I wrote the loop as follows:
for (i in directories) {
assign(paste("files_", sub('.*\\_', '', deparse(substitute(i))), sep = ""), read_names(i))
}
However, although outside of the for-loop deparse(substitute(dir_A)) returns "dir_A" (and, consequently, the sub() function written as above would return "A"), it seems to me that in the for-loop substitute(i) makes i stop being one of the directories, and just being i.
It follows that deparse(substitute(i)) returns "i" and that the output of the for-loop above is only one object called files_i, which contains the list of the files in the last directory of the iteration because that is the last one that has been overwritten on files_i.
How can I make the for-loop read the name (or part of the name in my case, but it is the same) of the object that i is representing in that moment?
There are two issues here, I think:
How to reference both the name (or index) and the value of each element within a list; and
How to transfer data from a named list into the global (or any) environment.
1. Reference name/index with data
Once you index with for (i in directories), the full context (index, name) of i within directories is lost. Some alternatives:
for (ix in seq_along(directories)) {
directories[[ix]] # the *value*
names(directories)[ix] # the *name*
ix # the *index*
# ...
}
for (nm in names(directories)) {
directories[[nm]] # the *value*
nm # the *name*
match(nm, names(directories)) # the *index*
# ...
}
If you're amenable to Map-like functions (a more idiomatic way of dealing with lists of similar things), then
out <- Map(function(x, nm) {
x # the *value*
nm # the *name*
# ...
}, directories, names(directories))
out <- purrr::imap(directories, function(x, nm) {
x # the *value*
nm # the *name*
# ...
})
# there are other ways to identify the function in `purrr::` functions
Note: while it is quite easy to use match within these last two to get the index, it is a minor scope-breach that I prefer to avoid when reasonable. It works, I just prefer alternative methods. If you want the value, name, and index, then
out <- Map(function(x, nm, ix) {
x # the *value*
nm # the *name*
ix # the *index*
# ...
}, directories, names(directories), seq_along(directories))
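To make the three-argument Map call concrete, here is a small sketch. The directory paths and the body of read_names() are made up for illustration; only the shape of the call follows the text above.

```r
# Hypothetical directories; the names are what the loop needs to recover.
directories <- c(A = "path/to/A", B = "path/to/B")

# Hypothetical stand-in for read_names(): it just builds two file names.
read_names <- function(dir) paste0(basename(dir), c("_1.txt", "_2.txt"))

# Each call receives the value, its name, and its position.
out <- Map(function(x, nm, ix) {
  list(index = ix, name = nm, files = read_names(x))
}, directories, names(directories), seq_along(directories))

names(out)     # "A" "B"
out$B$index    # 2
out$A$files    # "A_1.txt" "A_2.txt"
```

Because the first vector passed to Map is named, the result carries the same names, so nothing has to be recovered with deparse(substitute()).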
2. Transfer list to env
In your question, you're doing this to assign variables within a list into another environment. Some thoughts on that effort:
If they are all similar (the same structure, different data), then don't. Keep them in a list and work on them in toto using lapply or similar. (How do I make a list of data frames?)
If you truly need to move them from a list to the global environment, then perhaps list2env is useful here.
# create my fake data
directories <- list(a=1, b=2)
# this is your renaming step, rename before storing in the global env
# ... not required unless you have no names or want/need different names
names(directories) <- paste0("files_", names(directories))
# here the bulk of the work; you can safely ignore the return value
list2env(directories, envir = .GlobalEnv)
# <environment: R_GlobalEnv>
ls()
# [1] "directories" "files_a" "files_b"
files_a
# [1] 1
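Applied to the original question, the whole task might look like the sketch below. It assumes the directories are stored in a named vector, and read_names() is replaced by a made-up placeholder, since the real function is not shown.

```r
dir_A <- "path/to/A"   # hypothetical paths
dir_B <- "path/to/B"

# Placeholder for the asker's read_names(); the real one is not shown.
read_names <- function(x) paste0(basename(x), "_file", 1:2)

# Naming the vector keeps "A" and "B" attached to each directory.
directories <- c(A = dir_A, B = dir_B)

# lapply() preserves the names, so the result is a list with elements A and B.
file_lists <- lapply(directories, read_names)
names(file_lists) <- paste0("files_", names(file_lists))

# Optional: copy the list elements into the global environment.
list2env(file_lists, envir = .GlobalEnv)
```

In many cases the last step can be skipped and file_lists$files_A used directly.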

R - Use names in a list to feed named objects to a loop?

I have a data frame of some 90 financial symbols (will use 3 for simplicity)
> View(syM)
symbol
1 APPL
2 YAHOO
3 IBM
I created a function that gets JSON data for these symbols and produce an output. Basically:
nX <- function(x) {
#get data for "x", format it, and store it in "nX"
nX <- x
return(nX)
}
I used a loop to get the data and store the zoo series named after each symbol accordingly.
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,]),
value = nX(x = syM[i,]))
Sys.sleep(time = 1)
}
Which results in:
[1] "APPL" "YAHOO" "IBM"
Each is a zoo series with 5 columns of data.
Further, I want to get some plotting done to each series and output the result, preferably using a for loop or something better.
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yN <- y[,2:3]
return(yN)
}
Following a similar logic to my previous loop I tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
But so far the data is not being sent to the function, only the name of the symbol, so I naturally get:
y[,2:3] : incorrect number of dimensions
I have also tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,],".plot"),
value = yN(y = ls(pattern = paste0(syM[i,]))))
}
With similar results. When I input the name of the series manually, it does save the plot of the first symbol as "APPL.Plot".
assign(paste0(syM[1,], ".Plot"),
value = yN(y = APPL))
Consider lapply with setNames to create a named list of nX returned objects:
nX_list <- setNames(lapply(syM$symbol, nX), syM$symbol)
# OUTPUT ZOO OBJECTS BY NAMED INDEX
nX_list$APPL
nX_list$YAHOO
nX_list$IBM
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(nX_list, envir=.GlobalEnv)
For the plot function, first retrieve the object inside the function from its string name (by indexing the list, or with get if you created separate objects), then similarly run lapply with setNames:
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yobj <- nX_list[[y]] # IF USING ABOVE LIST
# yobj <- get(y) # IF USING SEPARATE OBJECTS INSTEAD
yN <- yobj[,2:3]
return(yN)
}
plot_list <- setNames(lapply(syM$symbol, yN), paste0(syM$symbol, ".plot"))
# OUTPUT PLOTS BY NAMED INDEX
plot_list$APPL.plot
plot_list$YAHOO.plot
plot_list$IBM.plot
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(plot_list, envir=.GlobalEnv)
As you note, you're calling yN with a character argument in:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
paste0(syM[i,]) resolves to a character string, not the zoo object you are trying to reference. Instead, use something like get():
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = get(paste0(syM[i,]))))
}
Or perhaps just store your zoo objects in a list in the first place and then operate on all elements of the list with something like lapply()...
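A rough sketch of that list-based variant follows. To keep it self-contained, nX() here is a hypothetical stand-in that returns a 4 x 5 matrix rather than a zoo series, and yN() simply selects columns 2 and 3 as in the question.

```r
syM <- data.frame(symbol = c("APPL", "YAHOO", "IBM"), stringsAsFactors = FALSE)

# Hypothetical stand-in for nX(): a 4-row, 5-column matrix per symbol.
nX <- function(x) matrix(seq_len(20), nrow = 4, ncol = 5,
                         dimnames = list(NULL, paste0(x, "_", 1:5)))

# As in the question: keep columns 2 and 3 of whatever object is passed in.
yN <- function(y) y[, 2:3]

# One named list of series, then one named list of results.
series <- setNames(lapply(syM$symbol, nX), syM$symbol)
plots  <- setNames(lapply(series, yN), paste0(names(series), ".plot"))

# If the series already exist as separate objects, they could be collected with
# series <- mget(syM$symbol, envir = .GlobalEnv)

dim(plots$APPL.plot)   # 4 2
```

Here yN() always receives the object itself, never its name, so the "incorrect number of dimensions" error does not arise.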

Accessing ... function arguments by (string) name inside the function in R?

I'm trying to write a function with dynamic arguments (i.e. the function argument names are not determined beforehand). Inside the function, I can generate a list of possible argument names as strings and try to extract the function argument with the corresponding name (if given). I tried using match.arg, but that does not work.
As a (massively stripped-down) example, consider the following attempt:
# Override column in the dataframe. Dots arguments can be any
# of the column names of the data.frame.
dataframe.override = function(frame, ...) {
for (n in names(frame)) {
# Check whether this col name was given as an argument to the function
if (!missing(n)) {
vl = match.arg(n);
# DO something with that value and assign it as a column:
newval = vl
frame[,n] = newval
}
}
frame
}
AA = data.frame(a = 1:5, b = 6:10, c = 11:15)
dataframe.override(AA, b = c(5,6,6,6,6)) # Should override column b
Unfortunately, the match.arg apparently does not work:
Error in match.arg(n) : 'arg' should be one of
So, my question is: Inside a function, how can I check whether the function was called with a given argument and extract its value, given the argument name as a string?
Thanks,
Reinhold
PS: In reality, the "Do something..." part is quite complicated, so simply assigning the vector to the dataframe column directly without such a function is not an option.
You probably want to review the chapter on Non Standard Evaluation in Advanced-R. I also think Hadley's answer to a related question might be useful.
So: let's start from that other answer. The most idiomatic way to get the arguments to a function is like this:
get_arguments <- function(...){
match.call(expand.dots = FALSE)$`...`
}
That provides a list of the arguments with names:
> get_arguments(one, test=2, three=3)
[[1]]
one
$test
[1] 2
$three
[1] 3
You could simply call names() on the result to get the names.
Note that if you want the values as strings you'll need to use deparse, e.g.
deparse(get_arguments(one, test=2, three=3)[[2]])
[1] "2"
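If the evaluated values are wanted rather than the unevaluated expressions, one option is list(...). The helper below is a hypothetical sketch (get_dot_arg is not a base R function): it checks whether an argument with a given string name was supplied and returns its value.

```r
# Hypothetical helper: value supplied under `name` in ..., or NULL if absent.
get_dot_arg <- function(name, ...) {
  dots <- list(...)   # evaluates every ... argument into a (named) list
  if (name %in% names(dots)) dots[[name]] else NULL
}

get_dot_arg("b", a = 1, b = c(5, 6))   # 5 6
get_dot_arg("z", a = 1)                # NULL
```

Unlike match.call(), list(...) forces evaluation of all the arguments, so it only suits cases where every argument can safely be evaluated.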
P.S. Instead of looping through all columns, you might want to use intersect or setdiff, e.g.
dataframe.override = function(frame, ...) {
# names of the arguments passed through ...
columns = names(match.call(expand.dots = FALSE)$`...`)
matching.cols <- intersect(names(frame), columns)
dots = list(...) # evaluated values of the ... arguments
for (n in matching.cols) {
# n is a column name that was also given as an argument to the function
vl = dots[[n]]
# DO something with that value and assign it as a column:
newval = vl
frame[,n] = newval
}
frame
}
P.P.S: I'm assuming there's a reason you're not using dplyr::mutate for this.

R invoking a dynamically created list within for loop

I have the following dynamic list created with the names cluster_1, cluster_2... like so:
observedUserShifts <- vector("list")
cut <- 2
for (i in 1:cut) {
assign(paste('cluster_', i, sep=''), subset(sortedTestRTUser, cluster==i))
observedUserShifts[[i]] <- mean(cluster_1$shift_length_avg)
}
Notice that I have cut=2, so two objects are created dynamically by the 'assign' function, named cluster_1 and cluster_2.
I want to use each of these objects within the for loop. Notice that I have hard-coded cluster_1 in the for loop (2nd line inside the loop). How do I change this so that it is not hard-coded?
I tried:
> observedUserShifts[[i]] <- mean((paste('cluster_','k',sep='')$shift_length_avg)
+ )
Error in paste("cluster_", "k", sep = "")$shift_length_avg :
$ operator is invalid for atomic vectors
Agree this is suboptimal coding practice, but to answer the specific question, use get:
for (i in 1:cut) {
assign(paste('cluster_', i, sep=''), subset(sortedTestRTUser, cluster==i))
observedUserShifts[[i]] <-
mean( get(paste('cluster_', i, sep='') )[['shift_length_avg']] )
}
Notice that instead of using $ I chose to use [[ with a quoted column name.
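As an illustration of the alternative that avoids assign() and get() entirely, the sketch below splits the column by cluster and averages each piece. The data frame is made up; only its two column names come from the question.

```r
# Hypothetical data with the two columns the question uses.
sortedTestRTUser <- data.frame(
  cluster          = c(1, 1, 2, 2),
  shift_length_avg = c(4, 6, 10, 20)
)

# Split the column by cluster, then take the mean of each group.
observedUserShifts <- lapply(
  split(sortedTestRTUser$shift_length_avg, sortedTestRTUser$cluster),
  mean
)

observedUserShifts[["1"]]   # 5
observedUserShifts[["2"]]   # 15
```

The result is a list named by cluster value, so no cluster_1, cluster_2, ... objects need to exist.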

Substituting variables in a loop?

I am trying to write a loop in R but I think the nomenclature is not correct as it does not create the new objects, here is a simplified example of what I am trying to do:
for i in (1:8) {
List_i <-List
colsToGrab_i <-grep(predefinedRegex_i, colnames(List_i$table))
List_i$table <- List_i$table[,predefinedRegex_i]
}
I have created 'predefinedRegex'es 1:8 which the grep should use to search
The loop creates an object called "List_i" and then fails to find "predefinedRegex_i".
I have tried putting quotes around the "i" and $ in front of the i , also [i] but these do not work.
Any help much appreciated. Thank you.
Using @RyanGrammel's answer below:
#CREATING regular expressions for grabbing sets groups 1 -7 ::::
g_1 <- "DC*"
g_2 <- "BN_._X.*"
g_3 <- "BN_a*"
g_4 <- "BN_b*"
g_5 <- "BN_a_X.*"
g_6 <- "BN_b_X.*"
g_7 <- "BN_._Y.*"
for (i in 1:7)
{
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA)
assign(x = paste("Forgrep_", i, sep=""), value = colnames(get(x = paste("tableA_", i, sep=""))))
assign(x = paste("grab_", i, sep=""), value = grep((get(x = paste("g_",i, sep=""))), (get(x = paste("Forgrep_",i, sep="")))))
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA[,get(x = paste("grab_",i, sep=""))])
}
This loop is repeated for each table inside "BigList".
I found I could not extract column names from
(get(x = paste("BigList_", i, "$tableA", sep=""))))
or from
(get(x = paste("BigList_", i, "[[2]]", sep=""))))
so it was easier to extract the tables first. I will now write a loop to repack the lists.
Problem
Your syntax is off: you don't seem to understand how exactly R deals with variable names.
for(i in 1:10) name_i <- 1
The above code doesn't assign name_1, name_2, ..., name_10. It assigns name_i over and over again.
To create a list, you call list(), not List.
Creating a variable List_i in a loop doesn't assign List_1, List_2, ..., List_8.
It repeatedly assigns the same value to the single name 'List_i'. Think about it: if R named variables the way you tried to, it would be equally likely to name your variables L1st_1, L2st_2, ... See 'Solution' for some valid R code that does something similar.
'predefinedRegex_i' isn't interpreted as an attempt to get the variables 'predefinedRegex_1', 'predefinedRegex_2', and so on.
However, get(paste0("predefinedRegex_", i)) is interpreted in this way. Just make sure i actually has a value when using this. See below.
Solution:
In general, use this to dynamically assign variables (List_1, List_2, ...):
assign(x = paste0("prefix_", i), value = i)
If i is equal to 199, then this code assigns the value 199 to the variable prefix_199.
In general, use this to dynamically get the variables you assigned with the snippet above:
get(x = paste0("prefix_", i))
If i is equal to 199, then this code gets the variable prefix_199.
That should solve the crux of your problem; if you need any further help feel free to ask for clarification here, or contact me via my Twitter Feed.
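For comparison, here is a sketch of the same column-grabbing task done with a named vector of patterns and lapply(), so that no numbered variables are needed. The table and its column names are hypothetical, and the patterns are anchored variants of three of the asker's regexes (an assumption on my part, not the originals).

```r
# Hypothetical table; column names chosen to exercise the patterns below.
tableA <- data.frame(DC1 = 1:2, BN_a_X1 = 3:4, BN_b_Y1 = 5:6, other = 7:8)

# Anchored variants of g_1, g_5 and g_7, kept together in one named vector.
patterns <- c(g_1 = "^DC", g_5 = "^BN_a_X", g_7 = "^BN_._Y")

# One subset of tableA per pattern; drop = FALSE keeps each a data frame.
subsets <- lapply(patterns, function(p) {
  tableA[, grep(p, colnames(tableA)), drop = FALSE]
})

names(subsets)          # "g_1" "g_5" "g_7"
colnames(subsets$g_7)   # "BN_b_Y1"
```

Each result is reachable as subsets$g_1, subsets[["g_5"]], and so on, which replaces the paste()/get() round trip.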

Resources