Using $ to find a column in loops in R not working

I have been trying to find an answer to this on Stack Overflow, but I cannot.
It is a pretty simple question: I am trying to understand why the loop variable takes on its value in some cases but not in others.
For example:
for (i in colnames(df)) {
  print(unique(df$i))
}
Nothing appears; shouldn't it work? On the first iteration, shouldn't df$i refer to the first column of df? However, when I type df[i] instead, it does. I am trying to understand when i is replaced by the column name inside the loop and when it is not.
Here is another example I am trying to understand
for (var in var_names) {
  print(var)
  var_vector <- sum(case_when(df$x == var ~ df$y))
  table <- cbind(table, var_vector)
}
For this, I thought that var_vector would get a different name on each iteration, such as x_vector, y_vector, etc. Instead, it is just called var_vector every time. Is there a way to make every "var" inside the loop take on the current value? For example, if you are familiar with Stata, `var'_vector would give the vector a different name on each iteration.

I'm not totally sure that this is the best practice, but you could use assign. It really depends more on the context I think.
x <- LETTERS[1:3]
for(i in seq_along(x)){
assign(paste0(x[i], "_vector"), runif(10))
}
A_vector
#> [1] 0.4221484184 0.6695296315 0.3161487477 0.4168466690 0.1906193914
#> [6] 0.2252857985 0.0005740104 0.6336193492 0.7917131276 0.2764370542
B_vector
#> [1] 0.3575036 0.3554171 0.6053375 0.9268683 0.2017908 0.4303173 0.6608523
#> [8] 0.2539930 0.8057227 0.0895042
C_vector
#> [1] 0.1287253 0.4172858 0.2453591 0.2957820 0.2213195 0.2940916 0.6900414
#> [8] 0.5104015 0.8996254 0.7504864
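On the first part of the question: `$` takes the name after it literally, so `df$i` looks for a column called "i" rather than the column whose name is stored in `i`. A minimal sketch, assuming a small made-up data frame (the `df` and `results` below are hypothetical), that uses `[[` for the lookup and a named list in place of separately named objects:

```r
# Hypothetical data frame standing in for the asker's df
df <- data.frame(x = c("a", "b", "a"), y = c(1, 2, 3))

for (i in colnames(df)) {
  # df$i looks for a column literally named "i", which does not exist
  print(is.null(df$i))
  # df[[i]] evaluates i first, then extracts that column as a vector
  print(unique(df[[i]]))
}

# Collect one result per column in a named list instead of creating
# x_vector, y_vector, ... as separate objects
results <- list()
for (i in colnames(df)) {
  results[[paste0(i, "_vector")]] <- unique(df[[i]])
}
names(results)
```

A named list keeps the results together, so they can be looped over or combined later without needing `get()` to look each one up by name.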

Related

how to call a column name using loop:

I'm very new to R, and I would like to know what is the best way to call a different column using for loop.
My code goes like this:
Variables <- c("Var1","Var2","Var3","Var4","Var5","Var6","Var7")
Years <- c(2015,2016,2017,2018)
for (Year in Years) {
for (Var in Variables) {
TT = auc(data[data$Def_Year==Year,]$Good_Bad,
data[data$Def_Year==Year,]$Var)
print (TT)
}
}
I'm trying to calculate the AUC (area under the ROC curve) for each variable in each year in order to check the stability of the credit scoring model's performance.
The thing is, R does not understand the $Var command. In Excel I sometimes use & to overcome such obstacles. I would love to hear your recommendations.
Hi you could do something like this. See my sample code below
df <- data.frame(v1 = c(1,2,3), v2 = c(4,5,6))
variables <- c("v1", "v2")
for(var in variables) {
print(df[, var])
}
Output:
[1] 1 2 3
[1] 4 5 6
I have not solved your code directly, as on SO it is advised to give general guidance towards a solution rather than solve the task in full. I would suggest you go through this: https://stats.idre.ucla.edu/r/modules/subsetting-data/ to better understand subsetting in R.
Also see https://cran.r-project.org/doc/manuals/R-lang.html#Indexing to understand the indexing in R.
From the R Language Definition linked above:
The form using $ applies to recursive objects such as lists and pairlists. It allows only a literal character string or a symbol as the index. That is, the index is not computable: for cases where you need to evaluate an expression to find the index, use x[[expr]]. Applying $ to a non-recursive object is an error.
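Applied to the nested loop in the question, the same rule suggests replacing `$Var` with `[[Var]]`. The following is only a sketch: the data values are made up, and `metric()` is a hypothetical stand-in for `auc()`, whose package is not named in the question.

```r
# Toy data; column names follow the question, values are made up
data <- data.frame(
  Def_Year = c(2015, 2015, 2016, 2016),
  Good_Bad = c(0, 1, 1, 0),
  Var1 = c(0.2, 0.8, 0.6, 0.1),
  Var2 = c(0.5, 0.4, 0.9, 0.3)
)
Variables <- c("Var1", "Var2")
Years <- c(2015, 2016)

# Hypothetical stand-in for auc(): any function of (outcome, score) fits here
metric <- function(outcome, score) mean(score[outcome == 1])

# One row per year, one column per variable
res <- matrix(NA_real_, nrow = length(Years), ncol = length(Variables),
              dimnames = list(as.character(Years), Variables))

for (Year in Years) {
  rows <- data[data$Def_Year == Year, ]  # subset once per year
  for (Var in Variables) {
    # rows[[Var]] evaluates Var; rows$Var would look for a column named "Var"
    res[as.character(Year), Var] <- metric(rows$Good_Bad, rows[[Var]])
  }
}
print(res)
```

Storing the values in a matrix rather than only printing them makes it easier to compare years side by side afterwards.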

Add element in a vector while looping in R

I have a problem to solve in R language but I may need to add element in a loop while I am looping into it with a for, but the loop does not go through the new values.
I made a simple loop to explain the type of problem I have.
Here is the code:
c=c(1,2)
for(i in c){
c=c(c,i+2)
print(i)
}
And the result:
[1] 1
[1] 2
I would like this result:
[1] 1
[1] 2
[1] 3
[1] 4
It continues until I reach a condition.
Can someone tell me whether this is possible, perhaps in another way?
Thank you,
Robin
You could use a while loop instead:
test <- c(1,2)
n <- 1
while(n <= length(test)){
  if(n == 5){
    print(test)
    break
  }
  print(test[n])
  test <- c(test, n+2)
  n <- n + 1
}
Note that without a stopping condition this loop would keep printing forever, because the vector grows on every pass, so you should add some other condition to stop it at some point (here I quit when n reaches 5).
Sidenote: You use c as a name for c(1,2). That's generally a bad idea, because c is the base R function for combining values into vectors. c is not a reserved word, so R allows it, but reusing names that R itself already uses makes code confusing to read.
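The reason the original for loop stops after two iterations is that `for` evaluates its sequence once, before the first pass, so later changes to the vector are not seen. A minimal sketch of the difference; the names `values` and `queue` and the stopping rule (`< 5`) are arbitrary choices for this illustration:

```r
values <- c(1, 2)
for (i in values) {
  # The vector grows, but the loop still iterates over the original c(1, 2)
  values <- c(values, i + 2)
}
length(values)

# A while loop re-checks its condition on every pass, so it walks into
# the newly appended elements; appending stops once the new value
# would reach 5
queue <- c(1, 2)
n <- 1
while (n <= length(queue)) {
  if (queue[n] + 2 < 5) queue <- c(queue, queue[n] + 2)
  n <- n + 1
}
queue
```

Putting the stopping rule on the append step, rather than on a `break`, means the loop ends on its own once no new elements are added.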

Including different numbers in the names of dataframes

I need to include different numbers (result of a loop function) in the name of a data frame which I am trying to read at the same time. However, I do not know how.
Example, of what I am looking for are names of these dataframes:
data.1
data.2
data.3
An example of what I tried to do (which did not work, but illustrates better my question) is:
for (a in 1:3) data.(a) <- read.csv2(file.csv, header = TRUE)
Is it possible to include different numbers in the names of dataframes? If yes, how, please?
Sorry for beginner's question, but I have not found it anywhere.
Although I agree with joran in this case, sometimes it can be useful to use the assign() function, for instance as follows:
for (i in 1:3) {
assign(paste0("data.", i), i)
}
This results in the following:
> ls()
[1] "data.1" "data.2" "data.3" "i"
> data.1
[1] 1
> data.2
[1] 2
> data.3
[1] 3
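An alternative to `assign()` that is often easier to work with is a single named list holding all the data frames. The sketch below writes three small temporary files so that it is self-contained; the file names and contents are made up, and real paths would take their place.

```r
# Write three small files to a temporary directory (illustrative only)
paths <- file.path(tempdir(), paste0("file", 1:3, ".csv"))
for (i in seq_along(paths)) {
  write.csv2(data.frame(id = i, value = i * 10), paths[i], row.names = FALSE)
}

# One list element per file, named data.1, data.2, data.3
data_list <- lapply(paths, read.csv2, header = TRUE)
names(data_list) <- paste0("data.", seq_along(paths))

# Access a single data frame by name
data_list[["data.2"]]
```

Because everything lives in one list, a later step can run over all data frames with `lapply()` instead of reconstructing each object name.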

R- Please help. Having trouble writing for loop to lag date

I am attempting to write a for loop which will take subsets of a dataframe by person id and then lag the EXAMDATE variable by one for comparison. So a given row will have the original EXAMDATE and also a variable EXAMDATE_LAG which will contain the value of the EXAMDATE one row before it.
for (i in length(uniquerid))
{
temp <- subset(part2test, RID==uniquerid[i])
temp$EXAMDATE_LAG <- temp$EXAMDATE
temp2 <- data.frame(lag(temp, -1, na.pad=TRUE))
temp3 <- data.frame(cbind(temp,temp2))
}
It seems that I am creating the new variable just fine but I know that the lag won't work properly because I am missing steps. Perhaps I have also misunderstood other peoples' examples on how to use the lag function?
So that this can be fully answered: there are a handful of things wrong with your code. Lucaino has pointed one out. Each time through your loop you create temp, temp2, and temp3 (or overwrite the old ones), so you are left with only the output of the last iteration.
However, this isn't something that needs a loop. Instead, you can make use of the vectorized nature of R:
> x <- 1:10
> c(x[-1], NA)
 [1]  2  3  4  5  6  7  8  9 10 NA
So if you combine that notion with a library like plyr that splits data nicely you should have a workable solution. If I've missed something or this doesn't solve your problem, please provide a reproducible example.
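library(plyr)
myLag <- function(x) {
  # shifts values up by one: each row gets the next row's value
  c(x[-1], NA)
}
# group by the RID column (the person id used in the question's subset call)
ddply(part2test, .(RID), transform, EXAMDATE_LAG=myLag(EXAMDATE))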
You could also do this in base R using split or the data.table package using its by= argument.
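A base R sketch of the grouped shift, under some assumptions: the data frame below is made up, rows are taken to be already ordered by date within each person, and the shift goes in the direction the question describes (each row receives the EXAMDATE of the row before it, whereas `c(x[-1], NA)` yields the row after it). It shifts row indices rather than the dates themselves, so the Date class is kept.

```r
# Made-up example data: two people with three and two exams
part2test <- data.frame(
  RID = c(1, 1, 1, 2, 2),
  EXAMDATE = as.Date(c("2020-01-01", "2020-06-01", "2021-01-01",
                       "2019-03-01", "2019-09-01"))
)

# For every row, find the index of the previous row of the same person
# (NA for the first row of each person)
prev_idx <- ave(seq_len(nrow(part2test)), part2test$RID,
                FUN = function(i) c(NA, head(i, -1)))

# Indexing with NA gives NA, and indexing preserves the Date class
part2test$EXAMDATE_LAG <- part2test$EXAMDATE[prev_idx]
part2test
```

If the rows are not sorted, ordering the data frame by RID and EXAMDATE first would be needed for the previous row to mean the previous exam.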

Assigning output of a function to two variables in R [duplicate]

This seems like an easy question, but I can't figure it out and I haven't had luck in the R manuals I've looked at. I want to find dim(x), but I want to assign dim(x)[1] to a and dim(x)[2] to b in a single line.
I've tried [a b] <- dim(x) and c(a, b) <- dim(x), but neither has worked. Is there a one-line way to do this? It seems like a very basic thing that should be easy to handle.
This may not be as simple of a solution as you had wanted, but this gets the job done. It's also a very handy tool in the future, should you need to assign multiple variables at once (and you don't know how many values you have).
Output <- SomeFunction(x)
VariablesList <- letters[seq_along(Output)]
for (i in seq_along(Output)) {
  # create a variable named a, b, c, ... holding the i-th element
  assign(VariablesList[i], Output[i])
}
Loops aren't the most efficient things in R, but I've used this multiple times. I personally find it especially useful when gathering information from a folder with an unknown number of entries.
EDIT: And in this case, Output could be any length (as long as VariablesList is longer).
EDIT #2: Changed up the VariablesList vector to allow for more values, as Liz suggested.
You can also write your own function that will always make a global a and b. But this isn't advisable:
mydim <- function(x) {
out <- dim(x)
a <<- out[1]
b <<- out[2]
}
The "R" way to do this is to output the results as a list or vector just like the built in function does and access them as needed:
out <- dim(x)
out[1]
out[2]
R has excellent list and vector comprehension that many other languages lack and thus doesn't have this multiple assignment feature. Instead it has a rich set of functions to reach into complex data structures without looping constructs.
Doesn't look like there is a way to do this. Really the only way to deal with it is to add a couple of extra lines:
temp <- dim(x)
a <- temp[1]
b <- temp[2]
It depends what is in a and b. If they are just numbers try to return a vector like this:
dim <- function(x,y)
return(c(x,y))
dim(1,2)[1]
# [1] 1
dim(1,2)[2]
# [1] 2
If a and b are something else, you might want to return a list
dim <- function(x,y)
  return(list(item1=x:y,item2=(2*x):(2*y)))
dim(1,2)[[1]]
# [1] 1 2
dim(1,2)[[2]]
# [1] 2 3 4
EDIT:
try this: x <- c(1,2); names(x) <- c("a","b")
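Building on the names idea, here is a short sketch of two ways to get at both dimensions; the matrix `x` and the name `d` are made up for illustration. Base R's `setNames()` labels the result, and `list2env()` can turn the labelled values into variables in a single line if separate objects are really wanted.

```r
x <- matrix(0, nrow = 3, ncol = 4)

# Keep both values together and access them by name
d <- setNames(dim(x), c("a", "b"))
d[["a"]]
d[["b"]]

# One-line variant that creates variables a and b in the current environment
list2env(as.list(setNames(dim(x), c("a", "b"))), envir = environment())
a
b
```

The named vector is usually the tidier choice; the `list2env()` line is closest to the single-line assignment the question asks for.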
