I have the following double loop:
indexnames = c(a, b, c, d, etc.)
# with
# length(indexnames) = 87
# class(indexnames) = "character"
# (indexnames = indexes I want to add in a column)
files = c(aname, bname, cname, dname, etc.)
# with
# length(files) = 87
# class(files) = "character"
# (files = name of files in the global environment)
Now I want to loop through the two vectors and add to the object named by files[1] a column called "index" that holds indexnames[1], and likewise for every other pair. I implemented this the following way:
for(i in files){
for(j in indexnames){
files[i] = cbind(Index = indexnames[j], files[i])
}
}
When I run this, I get a message that there were 50 or more warnings.
What am I doing wrong?
Appreciating any help, thanks.
You need to use the get() and assign() functions to get the behavior you want.
You don't have to call your loop variables i or j. A loop is easier to debug if you give them more readable names. Still, let's look at the inner part of your loop.
files[i]
Since files is an unnamed character vector and i takes its values (the object names), you cannot select an element by its value this way (nor would you want to, since files only holds the names of the objects, not the objects themselves). The nested loops also pair every file with every index, whereas you want the i-th index for the i-th file. Instead, make i cycle through the positions in a single loop, 'for (i in seq_along(files))':
for (i in seq_along(files)) {
  assign(files[i], `[[<-`(get(files[i]), 'index', value = indexnames[i]))
}
I found some help in this answer:
How to use `assign()` or `get()` on specific named column of a dataframe?
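As a minimal sketch of the get()/assign() pattern described above, the following uses two small data frames as hypothetical stand-ins for the 87 objects (the names aname and bname and their contents are assumptions for illustration). It also shows a list-based variant with mget() and Map(), which avoids modifying the global objects in place.

```r
# Hypothetical stand-ins for the data frames in the global environment
aname <- data.frame(x = 1:2)
bname <- data.frame(x = 3:4)
files <- c("aname", "bname")        # names of the objects, as strings
indexnames <- c("a", "b")           # one index value per object

# List-based variant: collect the objects, add the column, keep them in a list
frames <- Map(function(df, idx) cbind(Index = idx, df),
              mget(files), indexnames)

# get()/assign() variant: overwrite each object under its own name
for (i in seq_along(files)) {
  df <- get(files[i])                       # fetch the object by name
  df <- cbind(Index = indexnames[i], df)    # prepend the constant column
  assign(files[i], df)                      # store it back under the same name
}
```

Both variants pair the i-th index with the i-th object; the list version is usually easier to work with afterwards.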
I have:
directories (let's say two: A and B) that contain files;
two character objects storing the directories (dir_A, dir_B);
a function that takes the directory as argument and returns the list of the names of the files found there (in a convenient way for me that is different from list.files()).
directories <- c(dir_A, dir_B)
read_names <- function(x) {foo}
Using a for-loop, I want to create objects that each contain the list of files of a different directory as given by read_names(). Essentially, I want to use a for-loop to do the equivalent of:
files_A <- read_names(dir_A)
files_B <- read_names(dir_B)
I wrote the loop as follows:
for (i in directories) {
assign(paste("files_", sub('.*\\_', '', deparse(substitute(i))), sep = ""), read_names(i))
}
However, although outside of the for-loop deparse(substitute(dir_A)) returns "dir_A" (and, consequently, the sub() function written as above would return "A"), it seems to me that in the for-loop substitute(i) makes i stop being one of the directories, and just being i.
It follows that deparse(substitute(i)) returns "i" and that the output of the for-loop above is only one object called files_i, which contains the list of the files in the last directory of the iteration because that is the last one that has been overwritten on files_i.
How can I make the for-loop read the name (or part of the name in my case, but it is the same) of the object that i is representing in that moment?
There are two issues here, I think:
How to reference both the name (or index) and the value of each element within a list; and
How to transfer data from a named list into the global (or any) environment.
1. Reference name/index with data
Once you index with for (i in directories), the full context (index, name) of i within directories is lost. Some alternatives:
for (ix in seq_along(directories)) {
directories[[ix]] # the *value*
names(directories)[ix] # the *name*
ix # the *index*
# ...
}
for (nm in names(directories)) {
directories[[nm]] # the *value*
nm # the *name*
match(nm, names(directories)) # the *index*
# ...
}
If you're amenable to Map-like functions (a more idiomatic way of dealing with lists of similar things), then
out <- Map(function(x, nm) {
x # the *value*
nm # the *name*
# ...
}, directories, names(directories))
out <- purrr::imap(directories, function(x, nm) {
x # the *value*
nm # the *name*
# ...
})
# there are other ways to identify the function in `purrr::` functions
Note: while it is quite easy to use match within these last two to get the index, it is a minor scope-breach that I prefer to avoid when reasonable. It works, I just prefer alternative methods. If you want the value, name, and index, then
out <- Map(function(x, nm, ix) {
x # the *value*
nm # the *name*
ix # the *index*
# ...
}, directories, names(directories), seq_along(directories))
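For a concrete illustration of the three-argument Map() call above, here is a small sketch. The paths in directories are made-up placeholders, and the function body simply formats the index, name, and value into one string.

```r
# Hypothetical named vector of directory paths
directories <- c(A = "path/to/A", B = "path/to/B")

out <- Map(function(x, nm, ix) {
  # x is the value, nm the name, ix the position
  sprintf("%d: %s -> %s", ix, nm, x)
}, directories, names(directories), seq_along(directories))

out$A
# [1] "1: A -> path/to/A"
```

Map() takes the names of its result from the first vector, so out can be indexed by the same names as directories.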
2. Transfer list to env
In your question, you're doing this to assign variables within a list into another environment. Some thoughts on that effort:
If they are all similar (the same structure, different data), then don't. Keep them in a list and work on them in toto using lapply or similar. (How do I make a list of data frames?)
If you truly need to move them from a list to the global environment, then perhaps list2env is useful here.
# create my fake data
directories <- list(a=1, b=2)
# this is your renaming step, rename before storing in the global env
# ... not required unless you have no names or want/need different names
names(directories) <- paste0("files_", names(directories))
# here the bulk of the work; you can safely ignore the return value
list2env(directories, envir = .GlobalEnv)
# <environment: R_GlobalEnv>
ls()
# [1] "directories" "files_a" "files_b"
files_a
# [1] 1
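Applied to the original question, the two steps might look like the sketch below. The read_names() here is only a hypothetical stub that fabricates file names (the asker's real function is not shown), and the directory strings are placeholders.

```r
# Hypothetical stub standing in for the asker's read_names()
read_names <- function(x) paste0(x, "/file", 1:2, ".txt")

# Name the elements so the name travels with the value
directories <- c(A = "dir_A", B = "dir_B")

# One list holding the result for every directory
files <- lapply(directories, read_names)

# Optional: rename and export to the global environment
names(files) <- paste0("files_", names(files))
invisible(list2env(files, envir = .GlobalEnv))
```

After this, files_A and files_B exist as separate objects, although keeping everything in the files list is normally enough.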
I have a data frame of some 90 financial symbols (will use 3 for simplicity)
> View(syM)
symbol
1 APPL
2 YAHOO
3 IBM
I created a function that gets JSON data for these symbols and produce an output. Basically:
nX <- function(x) {
#get data for "x", format it, and store it in "nX"
nX <- x
return(nX)
}
I used a loop to get the data and store the zoo series named after each symbol accordingly.
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,]),
value = nX(x = syM[i,]))
Sys.sleep(time = 1)
}
Which results in:
[1] "APPL" "YAHOO" "IBM"
Each is a zoo series with 5 columns of data.
Further, I want to get some plotting done to each series and output the result, preferably using a for loop or something better.
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yN <- y[,2:3]
return(yN)
}
Following a similar logic to my previous loop I tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
But so far the data is not being sent to the function, only the name of the symbol, so I naturally get:
y[,2:3] : incorrect number of dimensions
I have also tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,],".plot"),
value = yN(y = ls(pattern = paste0(syM[i,]))))
}
With similar results. When I input the name of the series manually it does save the plot of the first symbol as "APPL.Plot".
assign(paste0(syM[1,], ".Plot"),
value = yN(y = APPL))
Consider lapply with setNames to create a named list of nX returned objects:
nX_list <- setNames(lapply(syM$symbol, nX), syM$symbol)
# OUTPUT ZOO OBJECTS BY NAMED INDEX
nX_list$APPL
nX_list$YAHOO
nX_list$IBM
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(nX_list, envir=.GlobalEnv)
For the plot function, first retrieve the object inside the function by its string name (index the list, or use get() if you kept separate objects), then similarly run lapply with setNames:
yN <- function(y) {
  # plot "y" series, columns 2 and 3, and store it in "yN"
  yobj <- nX_list[[y]]   # IF USING ABOVE LIST
  # yobj <- get(y)       # IF USING SEPARATE OBJECTS (use instead of the line above)
  yN <- yobj[,2:3]
  return(yN)
}
plot_list <- setNames(lapply(syM$symbol, yN), paste0(syM$symbol, ".plot"))
# OUTPUT PLOTS BY NAMED INDEX
plot_list$APPL.plot
plot_list$YAHOO.plot
plot_list$IBM.plot
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(plot_list, envir=.GlobalEnv)
As you note, you're calling yN with a character argument in:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
paste0(syM[i,]) is going to resolve to a character and not the zoo object it appears you're trying to reference. Instead, use something like get():
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = get(paste0(syM[i,]))))
}
Or perhaps just store your zoo objects in a list in the first place and then operate on all elements of the list with something like lapply()...
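A rough sketch of that list-based approach follows. To stay self-contained it replaces the zoo series with plain 2 x 5 matrices, so the nX() and yN() bodies here are hypothetical stand-ins rather than the asker's real functions.

```r
symbols <- c("APPL", "YAHOO", "IBM")

# Stand-in for the data download: a 2 x 5 matrix per symbol
nX <- function(x) {
  matrix(seq_len(10), nrow = 2, ncol = 5,
         dimnames = list(NULL, paste0(x, "_", 1:5)))
}

# Stand-in for the plotting step: keep columns 2 and 3
yN <- function(y) y[, 2:3]

# One named list of series, then one named list of results
series_list <- setNames(lapply(symbols, nX), symbols)
plot_list   <- setNames(lapply(series_list, yN), paste0(symbols, ".plot"))

dim(plot_list$IBM.plot)
# [1] 2 2
```

Because yN() receives the object itself rather than its name, no get() call is needed.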
I'm trying to write a function with dynamic arguments (i.e. the function argument names are not determined beforehand). Inside the function, I can generate a list of possible argument names as strings and try to extract the function argument with the corresponding name (if given). I tried using match.arg, but that does not work.
As a (massively stripped-down) example, consider the following attempt:
# Override column in the dataframe. Dots arguments can be any
# of the column names of the data.frame.
dataframe.override = function(frame, ...) {
for (n in names(frame)) {
# Check whether this col name was given as an argument to the function
if (!missing(n)) {
vl = match.arg(n);
# DO something with that value and assign it as a column:
newval = vl
frame[,n] = newval
}
}
frame
}
AA = data.frame(a = 1:5, b = 6:10, c = 11:15)
dataframe.override(AA, b = c(5,6,6,6,6)) # Should override column b
Unfortunately, the match.arg apparently does not work:
Error in match.arg(n) : 'arg' should be one of
So, my question is: Inside a function, how can I check whether the function was called with a given argument and extract its value, given the argument name as a string?
Thanks,
Reinhold
PS: In reality, the "Do something..." part is quite complicated, so simply assigning the vector to the dataframe column directly without such a function is not an option.
You probably want to review the chapter on Non Standard Evaluation in Advanced-R. I also think Hadley's answer to a related question might be useful.
So: let's start from that other answer. The most idiomatic way to get the arguments to a function is like this:
get_arguments <- function(...){
match.call(expand.dots = FALSE)$`...`
}
That provides a list of the arguments with names:
> get_arguments(one, test=2, three=3)
[[1]]
one
$test
[1] 2
$three
[1] 3
You could simply call names() on the result to get the names.
Note that if you want the values as strings you'll need to use deparse, e.g.
deparse(get_arguments(one, test=2, three=3)[[2]])
[1] "2"
P.S. Instead of looping through all columns, you might want to use intersect or setdiff, e.g.
dataframe.override = function(frame, ...) {
  # Evaluate the dots into a named list of values
  overrides = list(...)
  # Keep only the columns that were actually passed as named arguments
  matching.cols <- intersect(names(frame), names(overrides))
  for (n in matching.cols) {
    vl = overrides[[n]]
    # DO something with that value and assign it as a column:
    newval = vl
    frame[,n] = newval
  }
  frame
}
P.P.S: I'm assuming there's a reason you're not using dplyr::mutate for this.
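To make the difference between argument expressions and argument values concrete, here is a small sketch. The helper show_dots() is a hypothetical name introduced only for this illustration: match.call() returns what the caller typed, unevaluated, while list(...) returns the evaluated values, and either can be checked by name.

```r
# Hypothetical helper: inspect the dots both ways
show_dots <- function(...) {
  exprs  <- match.call(expand.dots = FALSE)$`...`  # unevaluated expressions
  values <- list(...)                              # evaluated values
  list(exprs = exprs, values = values)
}

res <- show_dots(b = 1 + 1)

is.call(res$exprs$b)         # the expression 1 + 1, not yet evaluated
res$values$b                 # the value 2
"b" %in% names(res$values)   # was an argument named "b" supplied?
```

Checking names(list(...)) against a string is one way to test whether a given argument name was passed when the name is only known at run time.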
I have the following dynamic list created with the names cluster_1, cluster_2... like so:
observedUserShifts <- vector("list")
cut <- 2
for (i in 1:cut) {
assign(paste('cluster_', i, sep=''), subset(sortedTestRTUser, cluster==i))
observedUserShifts[[i]] <- mean(cluster_1$shift_length_avg)
}
Notice that I have cut = 2, so the assign call creates two objects dynamically: cluster_1 and cluster_2.
I want to refer to each of these objects within the for loop. Notice that I have hard-coded cluster_1 in the for loop (second line inside the loop). How do I change this so that it is not hard-coded?
I tried:
> observedUserShifts[[i]] <- mean((paste('cluster_','k',sep='')$shift_length_avg)
+ )
Error in paste("cluster_", "k", sep = "")$shift_length_avg :
$ operator is invalid for atomic vectors
Agree this is suboptimal coding practice, but to answer the specific question, use get:
for (i in 1:cut) {
assign(paste('cluster_', i, sep=''), subset(sortedTestRTUser, cluster==i))
observedUserShifts[[i]] <-
mean( get(paste('cluster_', i, sep='') )[['shift_length_avg']] )
}
Notice that instead of using $ I chose to use [[ with a quoted column name.
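For comparison, a sketch of a list-based alternative that avoids assign() and get() altogether. The small sortedTestRTUser data frame below is made-up data for illustration; only its column names come from the question.

```r
# Made-up data with the two columns used in the question
sortedTestRTUser <- data.frame(
  cluster          = c(1, 1, 2, 2),
  shift_length_avg = c(4, 6, 8, 10)
)

# One data frame per cluster, held in a named list
clusters <- split(sortedTestRTUser, sortedTestRTUser$cluster)

# Mean shift length per cluster
observedUserShifts <- lapply(clusters, function(d) mean(d$shift_length_avg))

observedUserShifts[["1"]]
# [1] 5
```

split() names the list elements after the cluster values, so no variable names have to be built with paste().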
I am trying to write a loop in R but I think the nomenclature is not correct as it does not create the new objects, here is a simplified example of what I am trying to do:
for (i in 1:8) {
List_i <-List
colsToGrab_i <-grep(predefinedRegex_i, colnames(List_i$table))
List_i$table <- List_i$table[,predefinedRegex_i]
}
I have created 'predefinedRegex'es 1:8 which the grep should use to search
The loop creates an object called "List_i" and then fails to find "predefinedRegex_i".
I have tried putting quotes around the "i" and $ in front of the i , also [i] but these do not work.
Any help much appreciated. Thank you.
Using @RyanGrammel's answer below:
#CREATING regular expressions for grabbing sets groups 1 -7 ::::
g_1 <- "DC*"
g_2 <- "BN_._X.*"
g_3 <- "BN_a*"
g_4 <- "BN_b*"
g_5 <- "BN_a_X.*"
g_6 <- "BN_b_X.*"
g_7 <- "BN_._Y.*"
for (i in 1:8)
{
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA)
assign(x = paste("Forgrep_", i, sep=""), value = colnames(get(x = paste("tableA_", i, sep=""))))
assign(x = paste("grab_", i, sep=""), value = grep((get(x = paste("g_",i, sep=""))), (get(x = paste("Forgrep_",i, sep="")))))
assign(x = paste("tableA_", i, sep=""), value = BigList$tableA[,get(x = paste("grab_",i, sep=""))])
}
This loop is repeated for each table inside "BigList".
I found I could not extract columnnames from
get(x = paste("BigList_", i, "$tableA", sep=""))
or from
get(x = paste("BigList_", i, "[[2]]", sep=""))
so it was easier to extract the tables first. I will now write a loop to repack the lists up.
Problem
Your syntax is off: you don't seem to understand how exactly R deals with variable names.
for(i in 1:10) name_i <- 1
The above code doesn't assign name_1, name_2, ..., name_10. It assigns to "name_i" over and over again.
To create a list, you call 'list()', not List.
Creating a variable List_i in a loop doesn't assign List_1, List_2, ..., List_8.
It repeatedly assigns to the single name 'List_i'. Think about it: if R named variables in the way you tried to, it'd be equally likely to name your variables L1st_1, L2st_2, and so on. See 'Solution' for some valid R code that does something similar.
'predefinedRegex_i' isn't interpreted as an attempt to get the variables 'predefinedRegex_1', 'predefinedRegex_2', and so on.
However, get(paste0("predefinedRegex_", i)) is interpreted in this way. Just make sure i actually has a value when using this. See below.
Solution:
In general, use this to dynamically assign variables (List_1, List_2,..)
assign(x = paste0("prefix_", i), value = i)
If i is equal to 199, then this code assigns the value 199 to the variable prefix_199.
In general, use this to dynamically get the variables you assigned using the above snippet of code.
get(x = paste0("prefix_", i))
If i is equal to 199, then this code gets the variable prefix_199.
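Putting the two snippets together, a short runnable sketch might look like this. The list-based part at the end is an alternative that is not in the answer above, and the patterns and column names in it are made up for illustration.

```r
# Dynamically create prefix_1, prefix_2, prefix_3
for (i in 1:3) {
  assign(x = paste0("prefix_", i), value = i * 10)
}

# Dynamically read one of them back
second <- get(x = paste0("prefix_", 2))

# Alternative: keep the patterns in a named list instead of g_1, g_2, ...
patterns     <- list(g_1 = "^DC", g_2 = "^BN_a")        # made-up regexes
column_names <- c("DC1", "BN_a_X1", "BN_b_Y2")          # made-up column names
grabbed <- lapply(patterns, function(p) grep(p, column_names, value = TRUE))
```

With the list version, grabbed$g_1 holds the column names matched by the first pattern, and no names need to be pasted together.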
That should solve the crux of your problem; if you need any further help feel free to ask for clarification here, or contact me via my Twitter Feed.