Loop two variables one is conditional on another one - r

I want to write a loop over two variables, i and j. For each i in 1:24, j can take the values 1:24,
but I don't know how to write this loop:
i=1
while(i<=24)
{
j=seq(1,24,by=1)
for (j in j)
{
cor[i,j]
}
}
i=i+1
Is this right? My output is cor[i,j].

In order to accomplish your final goal try...
cor(myMatrix)
The result is a matrix containing all of the correlations of all of the columns in myMatrix.
If you want to try to go about it the way you were it's probably best to generate a matrix of all of the possible combinations of your items using combn. Try combn(1:4,2) and see what it looks like for a small example. For your example with 24 columns the best way to cycle through all combinations using a for loop is...
myMatrix <- matrix(rnorm(240), ncol = 24)
myIndex <- combn(1:24,2)
for(i in seq_len(ncol(myIndex))){
temp <- cor(myMatrix[,myIndex[1,i]],myMatrix[,myIndex[2,i]])
print(c(myIndex[,i],temp))
}
So, while it's possible to do it with a for loop, in R you'd never do it that way.
(and this whole answer is based on a wild guess about what you're actually trying to accomplish because the question, and your comments, are very hard to figure out)
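As a rough check that the two approaches agree, the sketch below computes every pairwise correlation in a loop and compares the values with the corresponding entries of cor(myMatrix). The names corMatrix and pairCor and the seed are assumptions added for illustration; they are not part of the original answer.

```r
set.seed(1)                                  # arbitrary seed, only for reproducibility
myMatrix <- matrix(rnorm(240), ncol = 24)

# Vectorized: the full 24 x 24 correlation matrix in one call
corMatrix <- cor(myMatrix)

# Loop: one correlation per unordered pair of columns
myIndex <- combn(1:24, 2)                    # 2 x 276 matrix of column pairs
pairCor <- numeric(ncol(myIndex))
for (k in seq_len(ncol(myIndex))) {
  pairCor[k] <- cor(myMatrix[, myIndex[1, k]], myMatrix[, myIndex[2, k]])
}

# Index the matrix by (row, column) pairs to pull out the same entries
agree <- isTRUE(all.equal(pairCor, corMatrix[t(myIndex)]))
```

If agree is TRUE, the loop adds nothing beyond what cor(myMatrix) already returns.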

Related

Problem with checking logical within for loop

Inspired by the LeetCode challenge Two Sum, I wanted to solve it in R. But while trying to solve it by brute force, I ran into an issue with my for loop.
The basic idea is: given a vector of integers, find which two integers in the vector sum to a given target integer.
First I create 10000 integers:
set.seed(1234)
n_numbers <- 10000
nums <- sample(-10^4:10^4, n_numbers, replace = FALSE)
Then I do a for loop within a for loop to check every single element against each other.
# ensure that it is actually solvable
target <- nums[11] + nums[111]
test <- 0
for (i in 1:(length(nums)-1)) {
for (j in 1:(length(nums)-1)) {
j <- j + 1
test <- nums[i] + nums[j]
if (test == target) {
print(i)
print(j)
break
}
}
}
My problem is that it starts wildly printing numbers before ever getting to the right condition of test == target. And I cannot seem to figure out why.
I think there are several issues with your code:
First, you don't have to increase your j manually, you can do this within the for-statement. So if you really want to increase your j by 1 in every step you can just write:
for (j in 2:(length(nums)))
Second, you are breaking out of only the inner loop, so the outer loop keeps running. See the question "Breaking out of nested loops in R" for further information on that.
Third, there are several pairs of entries in nums that give the "right" result target. Therefore, your if-condition works as written and prints every combination of nums[i]+nums[j] that is equal to target.
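Putting the three points together, one possible version of the brute-force search is sketched below. The found flag is an assumption introduced here to leave both loops at once; starting j at i + 1 is also a choice made for this sketch so that an element is never paired with itself.

```r
set.seed(1234)
n_numbers <- 10000
nums <- sample(-10^4:10^4, n_numbers, replace = FALSE)
target <- nums[11] + nums[111]   # ensure that it is actually solvable

found <- FALSE                   # hypothetical flag used to stop the outer loop too
n <- length(nums)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {         # let the for statement advance j
    if (nums[i] + nums[j] == target) {
      found <- TRUE
      break                      # leaves the inner loop only
    }
  }
  if (found) break               # leaves the outer loop as well
}
# i and j now hold the first matching pair
```

The pair found this way need not be (11, 111), because other pairs can sum to the same target.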

Control Flow: How to get the data to show and not the index

I have the code below that I'm trying to loop the condition over. I keep getting the indexes of the data frame instead of the elements (which is what I want) of the data frame.
airport <- airport_data
for (i in 1:135) {
if (airport$Scheduled[i] < airport$Performed[i])
print(i)
}
Airport          City      Scheduled  Performed
HARTSFIELD INTL  ATLANTA   280003     298003
BALTI INTL       BALTIMOR  56001      59000
It is hard to give a definitive answer without seeing your data frame, but one option is to specify where the loop should start. For example, if you wanted the loop to start at the second column of your data frame, the code would be:
airport <- airport_data
for (i in 2:ncol(airport)) {
if (airport$Scheduled[i] < airport$Performed[i])
print(i)
}
If you want to combine the rows then you shouldn't print them. I understand you're trying to practice for loops, but when you're working with matrices or data frames, you want to use vectorized operations rather than working through the rows one by one.
Vectorized operations are optimized to be much faster than a typical for loop, and you should always try a vectorized solution in a language like R or Matlab.
airport[airport$Scheduled < airport$Performed,]
That being said, if you really want to do it with a for loop and want to "merge" the rows, you can just rbind them:
result <- data.frame() # empty frame
for (i in 1:135) {
if (airport$Scheduled[i] < airport$Performed[i])
result <- rbind(result, airport[i,])
}
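To see that the vectorized subset and the rbind loop return the same rows, here is a small sketch. The first two rows come from the question; the third row ("EXAMPLE FIELD") is a made-up entry added only so that the condition excludes something, and the names by_subset and by_loop are likewise assumptions.

```r
airport <- data.frame(
  Airport   = c("HARTSFIELD INTL", "BALTI INTL", "EXAMPLE FIELD"),  # last row is hypothetical
  City      = c("ATLANTA", "BALTIMOR", "EXAMPLETOWN"),
  Scheduled = c(280003, 56001, 1000),
  Performed = c(298003, 59000, 900),
  stringsAsFactors = FALSE
)

# Vectorized: keep the rows where Scheduled < Performed
by_subset <- airport[airport$Scheduled < airport$Performed, ]

# Loop: collect the same rows one at a time
by_loop <- data.frame()
for (i in seq_len(nrow(airport))) {
  if (airport$Scheduled[i] < airport$Performed[i]) {
    by_loop <- rbind(by_loop, airport[i, ])
  }
}
```

Using seq_len(nrow(airport)) instead of a hard-coded 1:135 keeps the loop valid if the number of rows changes.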
Okay, now that I have a better idea of what you want, I actually think you're better off using filter (from library(dplyr)). Make sure both airport$Scheduled and airport$Performed are in numeric form.
new_df <- filter(airport, Scheduled < Performed)

="x" & i in loop in R

I am quite new to R and I know it is very simple, but I got stuck.
Could you please tell me how I can write the Excel formula ="X" & i (for i from, for instance, 1 to 10) in a loop in R?
For example, assume I have two data frames with a single column, "SUBSET1" and "SUBSET2". What I want is to save the result of the sum of each column in two different data frames.
For a reproducible example, please refer to the EDIT part below:
Illustration:
for (i in 1:2)
{
assign(paste0("sum_results", i),"")
}
for (i in 1:2)
{
sum_results & i<-sum(subset & i) ----something which works in this way
}
I would be very grateful for any hint.
EDIT: Proper example:
Let's assume I have the following data frames
a<-c(2,3,4)
b<-c(2,3,5)
subset1<-data.frame(a,b)
a<-c(2,7,5)
b<-c(4,8,15)
subset2<-data.frame(a,b)
So desired output is that I have two data frames: sum_results1 & sum_results2, where sum_results1
is the sum of the column "a" of the subset1, and sum_results2 is the sum of the column "a" of the subset2.
for (i in 1:2)
{
assign(paste0("sum_results", i),"")
}
for (i in 1:2)
{
sum_results & i<-sum(subset & i)$a --that is where the problem is
}
You were very close. Assuming I am understanding your question correctly, try this:
for (i in 1:2)
{
assign(paste0("sum_results", i),sum(get(paste0("subset",i))))
}
Generally, you want to avoid loops in R. See the comments on your question regarding lapply. There are probably much more efficient ways of solving this problem, but you have not provided a reproducible example, as also mentioned in the comments. Let me know if this helps!
EDIT: Below is how you would use sapply, followed by my solution above to name your results. sapply lets you use a more complicated function that could potentially work with more than one column. You will have to be specific.
N <- 2
res <- sapply(1:N, function(i) sum(get(paste0("subset",i))))
for (i in 1:N)
{
assign(paste0("sum_results", i),res[i])
}
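Since the question's EDIT asks for the sum of column "a" only, a sketch of both routes on the question's data follows. Collecting the data frames in a named list (called subsets here, a name assumed for illustration) avoids building variable names with paste0 altogether; the assign/get variant is shown for comparison.

```r
a <- c(2, 3, 4)
b <- c(2, 3, 5)
subset1 <- data.frame(a, b)
a <- c(2, 7, 5)
b <- c(4, 8, 15)
subset2 <- data.frame(a, b)

# Route 1: keep the data frames in a named list and sum column a of each
subsets <- list(subset1 = subset1, subset2 = subset2)
sum_results <- sapply(subsets, function(df) sum(df$a))

# Route 2: build the variable names, as in ="X" & i
for (i in 1:2) {
  assign(paste0("sum_results", i), sum(get(paste0("subset", i))$a))
}
```

With route 1 the results are read back as sum_results[["subset1"]] and sum_results[["subset2"]], so no numbered variables are needed.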

Double "for loops" in a dataframe in R

I need to do a quality control in a dataset with more than 3000 variables (columns). However, I only want to apply some conditions in a couple of them. A first step would be to replace outliers by NA. I want to replace the observations that are greater or smaller than 3 standard deviations from the mean by NA. I got it, doing column by column:
height = ifelse(abs(height-mean(height,na.rm=TRUE)) <
3*sd(height,na.rm=TRUE),height,NA)
And I also want to create other variables based on different columns. For example:
data$CGmark = ifelse(!is.na(data$mark) & !is.na(data$height) ,
paste(data$age, data$mark,sep=""),NA)
An example of my dataset would be:
name = factor(c("A","B","C","D","E","F","G","H","H"))
height = c(120,NA,150,170,NA,146,132,210,NA)
age = c(10,20,0,30,40,50,60,NA,130)
mark = c(100,0.5,100,50,90,100,NA,50,210)
data = data.frame(name=name,mark=mark,age=age,height=height)
data
I have tried this (for one condition):
d1=names(data)
list = c("age","height","mark")
ntraits=length(list)
nrows=dim(data)[1]
for(i in 1:ntraits){
a=list[i]
b=which(d1==a)
d2=data[,b]
for (j in 1:nrows){
d2[j] = ifelse(abs(d2[j]-mean(d2,na.rm=TRUE)) < 3*sd(d2,na.rm=TRUE),d2[j],NA)
}
}
Someone told me that I am not storing d2. How can I create for loops to apply the conditions I want? I know that there are similar questions, but I haven't understood it yet. Thanks in advance.
You pretty much wrote the answer in your first line. You're overthinking this one.
First, it's good practice to encapsulate this kind of operation in a function. Yes, function dispatch is a tiny bit slower than otherwise, but the code is often easier to read and debug. Same goes for assigning "helper" variables like mean_x: the cost of assigning the variable is very, very small and absolutely not worth worrying about.
NA_outside_3s <- function(x) {
mean_x <- mean(x, na.rm = TRUE)
sd_x <- sd(x, na.rm = TRUE)
x_outside_3s <- abs(x - mean_x) > 3 * sd_x # TRUE for values more than 3 SD from the mean
x[which(x_outside_3s)] <- NA # no need for ifelse here; which() skips existing NAs
x
}
Of course, you can choose any function name you want. More descriptive is better.
Then if you want to apply the function to every column, just loop over the columns. That function NA_outside_3s is already vectorized, i.e. it takes a numeric vector as an argument and returns a vector of the same length.
cols_to_loop_over <- 1:ncol(my_data) # or, some subset of columns.
for (j in cols_to_loop_over) {
my_data[, j] <- NA_outside_3s(my_data[, j])
}
I'm not sure why you wrote your code the way you did (and it took me a minute to even understand what you were trying to do), but looping over columns is usually straightforward.
In my comment I said not to worry about efficiency, but once you understand how the loop works, you should rewrite it using lapply:
my_data[cols_to_loop_over] <- lapply(my_data[cols_to_loop_over], NA_outside_3s)
Once you know how the apply family of functions works, they are very easy to read if written properly. And yes, they are somewhat faster than looping, but not as much as they used to be. It's more a matter of style and readability.
Also: do NOT name a variable list! This masks the function list, which is an R built-in function and a fairly important one at that. You also shouldn't generally name variables data because there is also a data function for loading built-in data sets.
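A small end-to-end sketch of the idea described above: values more than 3 standard deviations from the column mean become NA, applied with lapply to chosen columns only. The data frame below is invented for illustration (the question's nine-row sample is too small for any value to lie beyond 3 SD), and the function is restated here so that the block runs on its own.

```r
# Replace values more than 3 SD from the mean with NA (existing NAs are kept)
NA_outside_3s <- function(x) {
  mean_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  x[which(abs(x - mean_x) > 3 * sd_x)] <- NA
  x
}

# Hypothetical data: one extreme height, one missing height, unremarkable ages
my_data <- data.frame(
  name   = letters[1:21],
  height = c(rep(150, 19), NA, 1000),
  age    = 20:40
)

# Apply to the numeric columns of interest only; name is left untouched
cols_to_loop_over <- c("height", "age")
my_data[cols_to_loop_over] <- lapply(my_data[cols_to_loop_over], NA_outside_3s)
```

Selecting columns by name rather than 1:ncol(my_data) keeps the non-numeric name column out of the calculation.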

using a for loop to add columns to a data frame

I am new to R and it seems like this shouldn't be a difficult task but I cannot seem to find the answer I am looking for. I am trying to add multiple vectors to a data frame using a for loop. This is what I have so far and it works as far as adding the correct columns but the variable names are not right. I was able to fix them by using rename.vars but was wondering if there was a way without doing that.
for (i in 1:5) {
if (i==1) {
alldata<-data.frame(IA, rand1) }
else {
alldata<-data.frame(alldata, rand[[i]]) }
}
Instead of the variable names being rand2, rand3, rand4, rand5, they show up as rand..i.., rand..i...1, rand..i...2, and rand..i...3.
Any Suggestions?
You can set variable names using the colnames function. Therefore, your code would look something like:
newdat <- cbind(IA, rand1, rand[2:5])
colnames(newdat) <- c(colnames(IA), paste0("rand", 1:5))
If you're creating your variables in a loop, you can assign the names during the loop
alldata <- data.frame(IA)
for (i in 1:5) {alldata[, paste0('rand', i)] <- rand[[i]]}
However, growing a data frame one column at a time in a loop can be slow in R, so if you are trying to do this with tens of thousands of columns, the cbind and rename approach will be much faster.
Just do cbind(IA, rand1, rand[2:5]).
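The question does not show IA or rand, so the sketch below uses made-up stand-ins: IA as a one-column data frame and rand as a list of five numeric vectors. It shows the in-loop naming next to a loop-free version that names the list first.

```r
# Hypothetical stand-ins for the objects in the question
IA <- data.frame(id = 1:3)
rand <- list(c(0.1, 0.2, 0.3), c(1, 2, 3), c(4, 5, 6), c(7, 8, 9), c(10, 11, 12))

# Name each column as it is added
alldata <- IA
for (i in seq_along(rand)) {
  alldata[[paste0("rand", i)]] <- rand[[i]]
}

# Loop-free: name the list elements, then bind everything at once
names(rand) <- paste0("rand", seq_along(rand))
alldata2 <- cbind(IA, as.data.frame(rand))
```

Both data frames end up with the columns id, rand1, rand2, rand3, rand4 and rand5.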

Resources