How to use if else statements in loops within an existing tibble - r

I'm attempting to create a new variable which combines two existing variables. I've taught myself how to do this with an if else statement with a loop. However, all of the examples online create variables to use in their if else loops by assigning a few values to a variable name. I understand why the online examples do this, but it makes it hard for me to figure out how to incorporate the if else loop into an existing tibble.
The new variable should use values from one existing variable (b) if they are not missing and values from another existing variable (a) if the scores for the first variable (b) are missing. At time 1, participants might have taken test a, test b, or both. I've filtered out those participants who took neither test. Now I'm trying to create a new variable (c) which combines the two tests. If participants took test b (is not missing), the new variable should reflect test b scores. If participants did not take test b (is missing), the new variable should reflect test a scores. I can make an example work using the code below, but I can't get a similar format to work with the variables in my actual data.
a <- c(40, 50, 60, 70, 80, 90, 100)
b <- c(10, NA, 30, NA, NA, 40, 50)
c <- vector("double")
for (i in seq_along(b)) {
  if (!is.na(b[i])) {
    c[i] <- b[i]
  } else {
    c[i] <- a[i]
  }
}
c
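One way to carry this over to existing columns is to skip the loop and use a vectorized ifelse() on the columns themselves. The sketch below is only an illustration: the data frame name scores is hypothetical and stands in for the real tibble, and it uses base R so it runs without extra packages.

```r
# Hypothetical data frame standing in for the real tibble;
# columns a and b hold the two test scores from the example above.
scores <- data.frame(
  a = c(40, 50, 60, 70, 80, 90, 100),
  b = c(10, NA, 30, NA, NA, 40, 50)
)

# Vectorized version of the loop: use b where it is present, otherwise a.
scores$c <- ifelse(is.na(scores$b), scores$a, scores$b)

# If dplyr is installed, the same idea can be written as:
# scores <- dplyr::mutate(scores, c = dplyr::coalesce(b, a))

scores
```

Because ifelse() works on whole columns, no index variable is needed, and the same line works unchanged on a tibble.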

Related

R weird data frame subset formula vs. no formula

Sorry for a somewhat newbie question. I've been using R for years, but I hadn't noticed this behavior until a student pointed it out to me, and I can't explain it. First, build a little data frame. x-values greater than 100 are supposed to be illegal, but some have snuck in here. We also have a "group" independent variable:
x = c(20, 30, 50, 60, 150, 35, 55, 75, 45, 145)
g = c(1,1,1,1,1,2,2,2,2,2)
df = data.frame(cbind(x,g))
Now, box plots, both grouped and ungrouped, which show all the data, including the illegal values, as they should:
boxplot(x~g)
boxplot(x)
So, we want to remove the illegal values by selecting only those rows in the frame with x-values less than 100. The grouped version works exactly as expected:
boxplot(x~g, data=df[x < 100,])
But the ungrouped one doesn't! All the data, including the values over 100, are plotted. Why does the previous one work and this one doesn't?
boxplot(x, data=df[x < 100,])
I'm sure I'm missing something simple, but for the life of me I can't figure out what it is, and I couldn't find the answer via Google or searching here.
boxplot is an S3 generic, which means that depending on what the first argument is, totally different functions are actually being called. boxplot.formula has different arguments than boxplot.default. Specifically, boxplot.default has no data argument at all; it's probably being sucked into ... and is then ignored as an unknown graphical parameter.
Try boxplot(x[x < 100]) instead.
The reason is that boxplot is reading x from the global environment, not from the data frame.
Note that this does not work either:
df1 = df[x < 100, ]
boxplot(x, data=df1)
However, this works:
boxplot(df[df$x < 100, 'x'])
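To check which observations each call actually uses, one option is to subset the data frame first and ask boxplot for its statistics without drawing anything. This is a sketch under the assumption that filtering before plotting is acceptable; the name clean is made up for illustration.

```r
x <- c(20, 30, 50, 60, 150, 35, 55, 75, 45, 145)
g <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
df <- data.frame(x, g)

# Filter using the column inside the data frame, not the global x.
clean <- df[df$x < 100, ]

# plot = FALSE returns the computed statistics instead of drawing.
ungrouped <- boxplot(clean$x, plot = FALSE)
grouped <- boxplot(x ~ g, data = clean, plot = FALSE)

ungrouped$n  # number of observations in the ungrouped box
grouped$n    # number of observations per group
```

Dropping plot = FALSE draws the same boxes, now without the values above 100.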

Issues with nested while loop in for loop for R

I am using R to code simulations for a research project I am conducting in college. After creating relevant data structures and generating data, I seek to randomly modify a proportion P of observations (in increments of 0.02) in a 20 x 20 matrix by some effect K. In order to randomly determine the observations to be modified, I sample a number of integers equal to P*400 twice to represent row (rRow) and column (rCol) indices. In order to guarantee that no observation will be modified more than once, I perform this algorithm:
1. I create a matrix, alrdyModded, that is 20 x 20 and initialized to 0s.
2. I take the first value in rRow and rCol and check whether alrdyModded[rRow[1]][rCol[1]]==1; WHILE alrdyModded[rRow[1]][rCol[1]]==1, I randomly select new integers for the indices until it ==0.
3. When alrdyModded[rRow[1]][rCol[1]]==0, I modify the value in a treatment matrix with the same indices and change alrdyModded[rRow[1]][rCol[1]] to 1.
4. I repeat this for the entire length of the rRow and rCol vectors.
I believe a good method to perform this operation is a while loop nested in a for loop. However, when I enter the code below into R, I receive the following error code:
R CODE:
propModded<-1.0
trtSize<-2
numModded<-propModded*400
trt1<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
cont<- matrix(rnorm(400,0,1),nrow = 20, ncol = 20)
alrdyModded1<- matrix(0, nrow = 20, ncol = 20)
## data structures for computation have been intitialized and filled
rCol<-sample.int(20,numModded,replace = TRUE)
rRow<-sample.int(20,numModded,replace = TRUE)
## indices for modifying observations have been generated
for(b in 1:numModded){
while(alrdyModded1[rRow[b]][rCol[b]]==1){
rRow[b]<-sample.int(20,1)
rCol[b]<-sample.int(20,1)}
trt1[rRow[b]][rCol[b]]<-'+'(trt1[rRow[b]][rCol[b]],trtSize)
alrdyModded[rRow[b]][rCol[b]]<-1
}
## algorithm for guaranteeing no observation in trt1 is modified more than once
R OUTPUT
" Error in while (alrdyModded1[rRow[b]][rCol[b]] == 1) { :
missing value where TRUE/FALSE needed "
When I take out the for loop and run the code, the while loop evaluates the statement just fine, which implies an issue with accessing the correct values from the rRow and rCol vectors. I would appreciate any help in resolving this problem.
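It appears you're not indexing the matrix correctly. Instead of a condition like while(alrdyModded1[rRow[b]][rCol[b]]==1){, it should read: while(alrdyModded1[rRow[b], rCol[b]]==1){. Matrices are indexed as matrix[row, column], for example matrix[1, 1], and you're missing the commas. With two separate brackets, alrdyModded1[rRow[b]] first returns a single element, and indexing that length-one result with [rCol[b]] gives NA whenever rCol[b] is greater than 1, which is what triggers the "missing value where TRUE/FALSE needed" error. Note also that the last line of your loop assigns to alrdyModded rather than alrdyModded1. The for loop should be something closer to this: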
for(b in 1:numModded){
  while(alrdyModded1[rRow[b], rCol[b]]==1){
    rRow[b]<-sample.int(20,1)
    rCol[b]<-sample.int(20,1)
  }
  trt1[rRow[b], rCol[b]]<-trt1[rRow[b], rCol[b]]+trtSize
  alrdyModded1[rRow[b], rCol[b]]<-1
}
On a side note, why not make alrdyModded1 a boolean matrix (populated with just TRUE and FALSE values) with alrdyModded1<- matrix(FALSE, nrow = 20, ncol = 20) in line 7, and have the condition be just while(alrdyModded1[rRow[b], rCol[b]]){ instead?
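If the only goal is to modify each cell at most once, a different technique avoids the bookkeeping matrix entirely: sample cell positions without replacement and update them in one step. This is a sketch of that alternative, not the approach in the question; the seed, the 0.5 proportion, and the names idx and original are chosen here for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

propModded <- 0.5
trtSize <- 2
numModded <- propModded * 400

trt1 <- matrix(rnorm(400, 0, 1), nrow = 20, ncol = 20)
original <- trt1  # copy kept only to verify the result

# sample.int() draws without replacement by default, so every
# position (1..400, counted down the columns) appears at most once.
idx <- sample.int(400, numModded)

# A matrix can be indexed by a single position vector.
trt1[idx] <- trt1[idx] + trtSize

sum(trt1 != original)  # number of cells that changed
```

Since no position can be drawn twice, no while loop is needed, and the run time does not grow as the proportion approaches 1.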

Evaluating a function for given vectors with different lengths in R

I have written a predictor function in R, and I tried several combinations of inputs to see how the output would change.
The problem is that my function takes 4 numeric parameters, and I want to test it by plugging in all possible combinations of elements from specified vectors (the vectors have different lengths).
I've tried using the replicate, apply and sapply functions, but I couldn't get the output that I wanted. I can write a for loop for each parameter, but with several parameters I need several nested loops, and I don't know how to store the values after that many loops.
So my function looks like this;
predictVAR(Dataset, ColumnNumber, Correlation, Lags, FcastHorizon)
I want to keep the Dataset constant (or I can just remove it from the parameter list and assign it as the default data frame in the function), while:
ColumnNumber takes values between 1 and 20 ( each of these are the corresponding variables from Dataset)
Correlation will take values in seq(0.15,0.9,by=0.15)
Lags will take values in c(10, 20, 30, 50, 80, 100)
and finally FcastHorizon will take values from list c(20,252)
So if I started doing this manually and evaluate each combination from these specified vectors, it would look like
predictVAR(1, 0.1, 10, 20) => predictVAR(1, 0.1, 10, 252) => predictVAR(1, 0.1, 20, 20) => predictVAR(1, 0.1, 20, 252) . . . . and finally => predictVAR(20, 0.9, 100, 252)
By the end of process, I should obtain 20*6*6*2=1440 different outputs and the corresponding input specifications.
Could you help me about what function would help me to obtain the results? I have read topics about the family of apply functions but I need to evaluate the model with all cross combinations and I couldn't find a solution so far.
Regards
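One common way to evaluate a function over every cross combination is to build the combinations with expand.grid() and then call the function row by row with mapply(). The sketch below makes two assumptions: the predictVAR defined here is a hypothetical stand-in (the real function is not shown in the question), and it returns a single number per call.

```r
# Hypothetical stand-in for the asker's predictVAR(); replace the body
# with the real function. Dataset is assumed to be fixed inside it.
predictVAR <- function(ColumnNumber, Correlation, Lags, FcastHorizon) {
  ColumnNumber + Correlation + Lags + FcastHorizon
}

# One row per combination of the four parameter vectors.
grid <- expand.grid(
  ColumnNumber = 1:20,
  Correlation  = seq(0.15, 0.9, by = 0.15),
  Lags         = c(10, 20, 30, 50, 80, 100),
  FcastHorizon = c(20, 252)
)

# Call the function once per row and store the output next to its inputs.
grid$result <- mapply(
  predictVAR,
  grid$ColumnNumber, grid$Correlation, grid$Lags, grid$FcastHorizon
)

nrow(grid)  # number of combinations evaluated
head(grid)
```

If the real function returns something larger than a single number, passing SIMPLIFY = FALSE to mapply() keeps the outputs as a list with one element per combination.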

Dynamically adding values to dynamically created vectors

I just started learning to code in R. I have a requirement where I have to keep adding an unknown number of values to different vectors (the number of vectors is not known either). So, I tried to implement this using:
clust_oo = c()
clust_oo[k] = c(clust_oo[k],init_dataset[k,1])
Without the [k], the above code works, but since I don't know the number of vectors/lists, I have to use [k] as a differentiator. clust_oo[1] could have the values 1, 23, 45; clust_oo[2] could have the values 4, 40; and clust_oo[3] the values 44, 67, 455, 885. The values are added dynamically.
Is this the right way to proceed for this?
Try:
clust_oo = c()
for(i in 1:3)
clust_oo[length(clust_oo)+1] = i
clust_oo
[1] 1 2 3
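For the case described in the question, where each k needs its own growing set of values, a list of vectors is one possible structure, since a single element of an atomic vector can hold only one value. The sketch below is an assumption about the intended result; add_value is a hypothetical helper written for this illustration.

```r
# A list can hold one vector per cluster, each of a different length.
clust_oo <- list()

# Hypothetical helper: append `value` to the vector stored under key k.
# A missing key yields NULL, and c(NULL, value) is just value.
add_value <- function(clusters, k, value) {
  key <- as.character(k)
  clusters[[key]] <- c(clusters[[key]], value)
  clusters
}

clust_oo <- add_value(clust_oo, 1, 1)
clust_oo <- add_value(clust_oo, 2, 4)
clust_oo <- add_value(clust_oo, 1, 23)
clust_oo <- add_value(clust_oo, 1, 45)
clust_oo <- add_value(clust_oo, 2, 40)

clust_oo[["1"]]  # values collected for the first cluster
clust_oo[["2"]]  # values collected for the second cluster
```

Double brackets, [[k]], select the vector stored in the list, whereas single brackets would return a sub-list.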

Generating new random variable with for loop

A TA wants me to create a new random variable Y_n=sum(X_i), where X_i are n binomial random variables, with N = 4 and p = 1/3. This wasn't too bad; I just use the following for loop:
for(i in 1:100){yn[i] <- c(sum(rbinom(i, 4, (1/3))))}
However, he then wants me to recreate Y_n for every tenth number from 1 to 10,000 (i.e., 10, 20, 30,...,9990,10000). I tried to use this code:
yseq <- seq(10, 10000, by=10)
for(i in yseq){
Y2[i] <- c(sum(rbinom(i,4,(1/3))))}
It sorta works, but not really. It returns a list (I checked its class) with seemingly correct values, but a bunch of NAs. This has created two problems for me: 1) R won't let me reclass the list as a vector, and 2) R tells me that the list is length 1, which is a bunch of rubbish.
Can someone please tell me where I am going wrong? I've said it before: programming is not my forte, but I am always doing my best to learn!
Thanks!
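The gaps in the result follow from using the sample size itself as the position: Y2[i] with i = 10, 20, 30, ... fills only every tenth slot and leaves the ones in between empty. A minimal sketch of one fix is to loop over positions in yseq instead. It assumes Y2 should be a plain numeric vector; how Y2 became a list is not visible in the question, so that part is not addressed here. The seed is arbitrary.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

yseq <- seq(10, 10000, by = 10)

# Preallocate a numeric vector with one slot per sample size.
Y2 <- numeric(length(yseq))

# Loop over positions 1..length(yseq), not over the sample sizes.
for (j in seq_along(yseq)) {
  Y2[j] <- sum(rbinom(yseq[j], 4, 1/3))
}

length(Y2)  # one value per element of yseq
anyNA(Y2)   # no gaps
```

The same computation can be written without an explicit loop as sapply(yseq, function(n) sum(rbinom(n, 4, 1/3))).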
