Writing a loop in R - r

I have written a loop in R. The code is expected to go through a list of variables defined in a list and then for each of the variables perform a function.
Problem 1 - I cannot loop through the list of variables
Problem 2 - I need to insert each output from the values into Mongo DB
Here is an example of the list:
121715771201463_626656620831011
121715771201463_1149346125105084
Based on this value - I am running a code and i want this output to be inserted into MongoDB. Right now only the first value and its corresponding output is inserted
test_list <-
C("121715771201463_626656620831011","121715771201463_1149346125105084","121715771201463_1149346125105999")
for (i in test_list)
{ //myfunction//
mongo.insert(mongo, DBNS, i)
}
I am able to only pick the values for the first value and not all from the list
Any help is appreciated.

Try this example, which prints the final characters
myfunction <- function(x){ print( substr(x, 27, nchar(x)) ) }
test_list <- c("121715771201463_626656620831011",
"121715771201463_1149346125105084",
"121715771201463_1149346125105999")
for (i in test_list){ myfunction(i) }
for (j in 1:length(test_list)){ myfunction(test_list[j]) }
The final two lines should each produce
[1] "31011"
[1] "105084"
[1] "105999"

It is not clear whether "variable" is the same as "value" here.
If what you mean by variable is actually an element in the list you construct, then I think Ilyas comment above may solve the issue.
If "variable" is instead an object in the workspace, and elements in the list are the names of the objects you want to process, then you need to make sure that you use get. Like this:
for(i in ls()){
cat(paste(mode(get(i)),"\n") )
}
ls() returns a list of names of objects. The loop above goes through them all, uses get on them to get the proper object. From there, you can do the processing you want to do (in the example above, I just printed the mode of the object).
Hope this helps somehow.

Related

Assign value to element of list using variable name in R

I wish to recursively modify the slot slot1 of several objects that share part of the name. I know how to modify the objects (obj_num1, obj_num2, obj_num3) using assign, but what I don't know how to do is to modify/access only part of the object by selecting the slot (and, ideally, a single column of the slot, if the object is a data frame/matrix).
## creating simpler objects
obj_num1 <- list(a=runif(10), b='original')
obj_num2 <- list(a=runif(10), b='original')
obj_num3 <- list(a=runif(10), b='original')
## modifying the objects with assign
for(it in 1:3){
assign(paste0('obj_num', it), list(a=runif(10), b='modified'))
}
## it works
## but what if the object has a slot? --this is what I don't know how to do
setClass("myclass",
representation = representation(
slot1="character"
))
obj_num1 <- new('myclass', slot1='original')
obj_num2 <- new('myclass', slot1='original')
obj_num3 <- new('myclass', slot1='original')
for(it in 1:3){
assign(paste0('obj_num', it, '#slot1'), 'modified') ## nothing changes
}
help would be appreciated!
How to assign a name to a specific element of a vector in R, although similar, does not answer the question
You should generally avoid get/assign. It's an anti-pattern in R. It's better to work with items in lists and apply functions over lists rather than create a bunch of objects in the global environment that have indexes in their name. Note that you can only use variable names with assign something like obj_num1#slot1 is not a name, it's an expression that evaluates to a value. You can kind of get around that with geting the value, then updating it with the slot<- function and assinging the result. For example
for(it in 1:3){
assign(paste0('obj_num', it), `slot<-`(get(paste0('obj_num', it)), 'slot1', TRUE, 'modified'))
}
But if you had things in a list you could just do
objs <- list(
obj_num1,
obj_num2,
obj_num3
)
lapply(objs, function(x) {
x#slot1 <- "modified"
x
})

How to remove the first row from multiple dataframes?

I have multiple dataframes and would like to remove the first row in all of them.
I have tried using a for loop but cannot understand what I am doing wrong
for (i in cities){
i <- i[-1, ]
}
I get the following error code:
Error in i[-1, ] : incorrect number of dimensions
If we assume that the only objects in your workspace are dataframes then this might succeed:
cities <- objects() )
for (i in cities) { assign(i, get(i)[-1,])}
Explanation:
Two thing wrong with original codes:
One was already mentioned in comments. "df" is not the same as df. You need to use get to convert a character value to a "true" R name that is used to retrieve an object having that name. The result of object() is only a character value. In R the term "name" means a "language object". See the help page: ?mode. (There is potential confusion about rownames and columnnames which are always "character"-class.) It's not like SAS which is a macro language that has no such distinction.
The second error was trying to get substitution for the i on the left-hand side of <-. The would have failed even if you were working with actual R names. The assign function is designed to handle character values that are then converted to R names.
say you get a list of all the tables in your environment, and you call that list cities. You can't just iterate over each value of cities and change things, because in the list they are just characters.
Here is what you need:
for (i in cities){
tmp <- get(i) # load the actual table
tmp <- tmp[-1, ] # remove first column
assign(i, tmp) # re-assign table to original table name
}

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")}
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop since lapply would require the <<- assignment symbol (<- doesn't work on lapply`). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible() otherwise lapply would print the output on the console. The <<- assigns the vector runif(...) to the new created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.

update a vector using assign in R

I am implementing k-means in R.
In a loop, I am initiating several vectors that will be used to store values that belong to a particular cluster, as seen here:
for(i in 1:k){
assign(paste("cluster",i,sep=""),vector())
}
I then want to add to a particular "cluster" vector, depending on the value I get for the variable getIndex. So if getIndex is equal to 2 I want to add the variable minimumDistance to the vector called cluster2. This is what I am attempting to do:
minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance,distanceList)
clusterName <- paste("cluster",getIndex,sep="")
name <- c(name, minimumDistance)
But obviously the above code does not work because in order to append to a vector that I'm naming I need to use assign as I do when I instantiate the vectors. But I do not know how to use assign, when using paste, when also appending to a vector.
I cannot use the index such as vector[i] because I don't know what index of that particular vector I want to add to.
I need to use the vector <- c(vector,newItem) format but I do not know how to do this in R. Or if there is any other option I would greatly, greatly appreciate it. If I were using Python I would simply use paste and then use append but I can't do that in R. Thank you in advance for your help!
You can do something like this:
out <- list()
for (i in 1:nclust) {
# assign some data (in this case a list) to a cluster
assign(paste0("N_", i), list(...))
# here I put all the clusters data in a list
# but you could use a similar statement to do further data manipulation
# ie if you've used a common syntax (here "N_" <index>) to refer to your elements
# you can use get to retrieve them using the same syntax
out[[i]] <- get(paste0("N_", i))
}
If you want a more complete code example, this link sounds like a similar problem emclustr::em_clust_mvn

Loop over a string variable in R

I would like to loop over a string variable. For example:
clist <- c("BMI", "trig", "hdl")
for (i in clist) {
data_FK_i<-subset(data_FK, subset= !is.na(FK) & (!is.na(i)))
}
The "i" should receive a different name from the list.
What am I doing wrong? It's not working? Adding "" doesn't seem to help.
Thank,
Einat
Thanks, the "assign" answer did the work!!!!!!!!!!
I agree with #Thomas. You should use a list. However, let me demonstrate how to modify your code to create multiple objects. You can use the function assign to create objects based on strings.
clist <- c("BMI", "trig", "hdl")
for (i in clist) {
assign(paste0("data_FK_", i), complete.cases(data[c("FK", i)]))
}
Try something like this instead, which will give you a list containing the three subsetted dataframes:
lapply(clist, function(x) data_FK[ !is.na(data_FK$FK) & !is.na(data_FK[,x]) ,])
The problem in your code is that i is a character string, specifically one of the values from clist in each iteration of the for-loop. So, when R reads !is.na(i) you're saying !is.na("BMI"), etc.
Various places on Stack Overflow advise against using subset at all in favor of extraction indices (i.e., [) like in the example code above because subset relies on non-standard evaluation that is confusing and sometimes leads you down bad rabbit holes.
Is this what you want?
You need to give the loop something to store the data into.
Also you need to tell the loop how long you want it to run.
clist <- c("BMI", "trig", "hdl")
#empty vector
data_FK<-c()
#I want a loop and it will 'loop' 3 times (1 to 3), which is the length of my list
for (i in 1:length(clist)) {
#each loop stores the corresponding item from the list into the vector
data_FK<-c(data_FK,clist[i])
}
## or if you want to store the values in a data frame
## there are other ways to create this, but here is a simple solution
data_FK<-data.frame(placer=1:length(clist))
for(i in 1:length(clist)){
data_FK$items[i]<-clist[i]
}
## or maybe you just want to print the names
for (i in 1:length(clist)){
print(clist[i])
}

Resources