Looping through rows until a criterion is met in R

I'm trying to loop over each row in my data frame, and if that row contains a 1, I want bf to change to TRUE so that the loop stops and then prints the index of the row. Here's the code I've tried:
bf <- FALSE
for(row in 1:nrow(df)){
  while(bf == FALSE){
    if(df[row, ] == 1){
      bf==TRUE
      print(row)
    }
  }
}
However, as far as I can tell, this code never reaches the if statement and executes it properly.

You can use the apply, any and which functions to identify rows that contain a 1, then select the first such row:
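# TRUE for each row that contains at least one 1
bdrows <- apply(df, 1, function(x) any(x == 1))
# Indices of those rows; the first one is the row you want
bd <- which(bdrows)
firstbdrow <- bd[1]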
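For comparison, here is a sketch of the loop from the question rewritten so it actually stops at the first matching row. It uses a small made-up data frame df; the changes are wrapping the row comparison in any() (df[row, ] == 1 compares every cell, so it yields more than one value), assigning with <- instead of comparing with ==, and leaving the loop with break instead of a while loop that can never exit.

```r
# Made-up example data: row 3 is the first row containing a 1
df <- data.frame(a = c(0, 5, 1, 1), b = c(2, 3, 4, 0))

bf <- FALSE
for (row in seq_len(nrow(df))) {
  # any() collapses the per-cell comparison into a single TRUE/FALSE
  if (any(df[row, ] == 1)) {
    bf <- TRUE   # assignment, not the comparison bf == TRUE
    print(row)
    break        # stop looping at the first match
  }
}

# Same answer without a loop
firstbdrow <- which(apply(df, 1, function(x) any(x == 1)))[1]
```

Both approaches should point at row 3 for this data; the vectorized one is usually preferred in R.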

bf==TRUE is a comparison; you are probably looking for the assignment bf <- TRUE (or bf = TRUE). Also, this operation doesn't require a for or while loop. Say you have a column called column_name in your data; you can do:
which.max(df$column_name == 1)
Or
which(df$column_name == 1)[1]
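One difference between the two options is worth checking: which.max() on a logical vector returns 1 even when nothing matches (all FALSE), whereas which(...)[1] returns NA. A quick check with a made-up column_name:

```r
df <- data.frame(column_name = c(3, 1, 2, 1))
which.max(df$column_name == 1)     # 2, the first match
which(df$column_name == 1)[1]      # 2, the first match

no_match <- c(3, 4, 5)
which.max(no_match == 1)           # 1, although no element equals 1
which(no_match == 1)[1]            # NA, which signals "no match"
```

So which(...)[1] is the safer choice when a match is not guaranteed.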

Related

Using if/else nested within a for loop in order to cycle through & reassign values within a column in R?

I know that this is not the most efficient way to achieve my goal; however, I am using this as a teaching moment (i.e., to show that you can use an if/else statement nested within a for loop). Specifically, I have a nominal variable that is currently coded as numbers. I want to use if/else combined with a for loop to reassign these numbers to their respective categories (class character). I have tried to do this in multiple ways; my current code is as follows:
# Take the original data and separate out the variable of interest
oasis_CDR <- oasis_final %>% select('CDR')
# transpose this data
oasis_CDR <- t(oasis_CDR)
# create the for loop
for(i in seq_along(oasis_CDR)){
  if(i == 0.0){
    oasis_CDR[1, i] <- "Normal"
  } else if(i == 0.5) {
    oasis_CDR[1 ,i] <- "Very Mild Dementia"
  } else if(i == 1.0){
    oasis_CDR[1 ,i] <- "Mild Dementia"
  } else if(i == 2.0){
    oasis_CDR[1 ,i] <- "Moderate Dementia"
  } else if(i == 3.0){
    oasis_CDR[1 ,i] <- "Severe Dementia"
  } else{
    oasis_CDR[1 ,i] <- "NA"
  }
}
When I look at oasis_CDR, it returns "NA" for all observations.
If I replace i with CDR in the loop, it only returns "Normal".
Is there any way to do this so that the reassigned labels match the data?
If you have a different label to assign to every number, you can use dplyr::recode:
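library(dplyr)
# Apply this to oasis_CDR while it is still a data frame (before t());
# CDR 2 (not 1.5) is "Moderate Dementia", as in the question
oasis_CDR <- oasis_CDR %>%
  mutate(new_col = recode(CDR,
                          `0`   = 'Normal',
                          `0.5` = 'Very Mild Dementia',
                          `1`   = 'Mild Dementia',
                          `2`   = 'Moderate Dementia',
                          `3`   = 'Severe Dementia',
                          .default = NA_character_))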
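If you want to try the mapping without dplyr, base R can do the same lookup with a named character vector indexed by the CDR values converted to character. This is a swapped-in technique, not the recode() call above, and the CDR values below are made up:

```r
# Made-up CDR scores, including a value (1.5) that has no label
cdr <- c(0, 0.5, 1, 2, 3, 0.5, 1.5)

# Lookup table: names are the CDR values written as character strings
cdr_labels <- c(`0`   = "Normal",
                `0.5` = "Very Mild Dementia",
                `1`   = "Mild Dementia",
                `2`   = "Moderate Dementia",
                `3`   = "Severe Dementia")

# as.character(0.5) is "0.5" and as.character(1) is "1", so the names match;
# values without a label (here 1.5) come back as NA
cdr_text <- unname(cdr_labels[as.character(cdr)])
```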
Run a check on your seq_along(oasis_CDR) expression! These will be your i values.
My guess is that you don't really want to compare 0.0, 0.5, 1 and 2 against the indices 1, 2, 3, ... (more than 220 of them), do you?
And if you really want to do this with a for loop rather than by indexing the vector,
isn't it more likely that you want to achieve something like this:
oasis_CDR$result <- NA_character_
j <- 1
for (i in oasis_CDR$CDR) {
  if (i == ...) oasis_CDR$result[j] <- 'Normal'
  ...
  j <- j + 1
}
But IMHO, while that can get the job done, it is not (very) nice R (or any similar language) code.
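For the teaching purpose in the question, here is a runnable sketch of that idea: loop over positions with seq_along(), but compare the value at each position, not the position itself. It uses a small made-up cdr vector in place of oasis_CDR and fills in the labels from the question; it assumes cdr has no NA values, since if (NA == 0) is an error.

```r
# Made-up CDR values standing in for the real column (no NAs)
cdr <- c(0, 1, 0.5, 3, 2, 0)

# Pre-allocate the result; values that match no branch stay NA
result <- rep(NA_character_, length(cdr))

for (i in seq_along(cdr)) {
  value <- cdr[i]          # compare the value, not the index i
  if (value == 0) {
    result[i] <- "Normal"
  } else if (value == 0.5) {
    result[i] <- "Very Mild Dementia"
  } else if (value == 1) {
    result[i] <- "Mild Dementia"
  } else if (value == 2) {
    result[i] <- "Moderate Dementia"
  } else if (value == 3) {
    result[i] <- "Severe Dementia"
  }
}
```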

Use if else statement for Dummy-Coding in R

I tried to create an if/else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if(df$iscd115==1){
  df$iscd1151 <- 1
} else {
  df$iscd1151 <- 0
}
This should be a reasonable solution.
First we find the positions of the two columns of interest, then we apply a function over the rows (MARGIN = 1) that checks whether the first column is 1 or 0 and modifies the other column accordingly. This assumes the iscd1151 column already exists in df; otherwise col2 is empty and nothing is assigned.
col1 <- which(names(df) == "iscd115")
col2 <- which(names(df) == "iscd1151")
# The argument is MARGIN (upper case); x[col2] <- 0 assigns, x[col2] == 0 only compares
mat <- apply(df, MARGIN = 1, function(x) {
  if (x[col1] == 1) {
    x[col2] <- 1
  } else {
    x[col2] <- 0
  }
  x
})
Unfortunately, apply() returns the result as a transposed matrix (and, because it first converts the data frame to a matrix, every column becomes character if any column is non-numeric). We can transpose the matrix back and turn it into a data frame with the following:
new_df <- as.data.frame( t(mat))
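It may also help to see why the code in the question fails: df$iscd115 == 1 is a vector with one element per row, but if() needs a single TRUE/FALSE. Older R versions warned and used only the first element; since R 4.2.0 this is an error. The vectorized approach below avoids both the loop and apply(); the small df is made up:

```r
# Made-up example data
df <- data.frame(iscd115 = c(1, 3, 1, 2, NA))

# ifelse() tests every element and returns a vector of the same length
df$iscd1151 <- ifelse(df$iscd115 == 1, 1, 0)

# Equivalent shortcut: TRUE/FALSE coerced to 1/0 (NA stays NA in both)
df$iscd1151_alt <- as.numeric(df$iscd115 == 1)
```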

Use mutate_at with nested ifelse

Used in a %>% pipeline, this sets values in the columns other than columnA to NA according to these conditions:
mutate_at(vars(-columnA), funs(((function(x) {
if (is.logical(x))
return(x)
else if (!is.na(as.numeric(x)))
return(as.numeric(x))
else
return(NA)
})(.))))
How can I achieve the same result using mutate_at and nested ifelse?
For example, this does not produce the same result:
mutate_at(vars(-columnA),funs(ifelse(is.logical(.),.,
ifelse(!is.na(as.numeric(.)),as.numeric(.),NA))))
Update (2018-1-5)
The question is confusing, in part because of a misconception I had about what was being passed to the function.
This is what I had intended to write:
mutate_at(vars(-columnA), funs(((function(x) {
  for(i in 1:length(x)) {
    if(!is.na(as.numeric(x[i])) && !is.logical(x[i])) {
      x[i] <- as.numeric(x[i]);
    } else if(!is.na(x[i])) {
      x[i] <- NA
    }
  }
  return(x)
})(.))))
This is a better solution:
mutate_at(vars(-columnA), function(x) {
if(is.logical(x))
return(x)
return(as.numeric(x))
})
ifelse may not be appropriate in this case, because it returns a value with the same shape as the condition. Here the condition, is.logical(.), has length 1, so the return value is a single element (the first element of the column passed to the function), which mutate_at then recycles down the whole column.
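A quick way to see this shape rule: ifelse() returns something as long as its test, so a length-1 test gives a length-1 answer no matter how long the yes and no arguments are, while if/else returns the whole object from the branch it picks. The vector x is made up:

```r
x <- c(10, 20, 30)

# Length-1 test: only the first element of the chosen branch comes back
ifelse(TRUE, x, 0)             # 10
ifelse(FALSE, 0, x * 2)        # 20

# if/else returns the whole vector from the chosen branch
if (TRUE) x else 0             # 10 20 30

# A test as long as x gives an element-wise result
ifelse(x > 15, x, 0)           # 0 20 30
```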
Update (2018-1-6)
Using ifelse, this works element by element: values that are TRUE, FALSE or NA are returned as-is, and as.numeric is applied to the others.
mutate_at(vars(-columnA),funs(
ifelse(. == TRUE | . == FALSE | is.na(.),.,as.numeric(.))))
The main issue is the
else if (!is.na(as.numeric(x)))
return(as.numeric(x))
if/else expects a condition of length 1. If the function is applied to a vector/column with more than one element, it is better to use ifelse. Above, !is.na(as.numeric(x)) returns a logical vector with one element per row (assuming the dataset has more than one row), which if() cannot use; recent versions of R treat this as an error. One way to make it work is to wrap the condition in all() or any(), depending on what we need:
library(dplyr)

f1 <- function(x) {
  if (is.logical(x))
    return(x)
  else if (all(!is.na(as.numeric(x))))
    return(as.numeric(x))
  else
    return(x) # change if needed
}

df1 %>%
  mutate_all(f1)
data
set.seed(24)
df1 <- data.frame(col1 = sample(c(TRUE, FALSE), 10, replace = TRUE),
col2 = c(1:8, "Good", 10), col3 = as.character(1:10),
stringsAsFactors = FALSE)
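If you'd rather not use dplyr, the same kind of function can be applied to every column with base R's lapply(); assigning into df1[] keeps the data frame structure. This sketch also wraps as.numeric() in suppressWarnings() to hide the "NAs introduced by coercion" warning triggered by "Good", which is a small change from the f1 above:

```r
f1 <- function(x) {
  if (is.logical(x))
    return(x)
  num <- suppressWarnings(as.numeric(x))
  # Convert only when every value parses as a number
  if (all(!is.na(num)))
    return(num)
  x
}

set.seed(24)
df1 <- data.frame(col1 = sample(c(TRUE, FALSE), 10, replace = TRUE),
                  col2 = c(1:8, "Good", 10), col3 = as.character(1:10),
                  stringsAsFactors = FALSE)

# lapply() returns a list; df1[] <- puts it back into the data frame
df1[] <- lapply(df1, f1)
str(df1)
```

With this data, col1 should stay logical, col2 stay character (because of "Good") and col3 become numeric.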

Optimize code to filter R dataframe

I have some R code that takes the args string from the command line and then filters a data frame based on the values in certain columns; args contains the column names. Right now I'm doing it by looping through the vector, but something tells me there has to be a better way. Is there a way to optimize this code?
args = c("col1","col2")
for(i in args){
df = df[df[,i]==0,]
}
If I understand correctly, you want to keep the rows where all of the args are equal to 0 (or any other given value).
First get the indices of the columns you're interested in:
idx <- match(args, colnames(df))
Then you can simply do:
df <- df[apply(df[, idx, drop = FALSE], 1, function(x) all(x == 0)), ]
Another possibility:
df <- df[rowSums(df[, idx, drop = FALSE] != 0) == 0, ]
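A small check, on a made-up df, that the vectorized filter keeps the same rows as the loop in the question. drop = FALSE keeps df[, idx] a data frame even when args names a single column, so rowSums() still gets a two-dimensional input:

```r
# Made-up data and column names
df <- data.frame(col1 = c(0, 1, 0, 0),
                 col2 = c(0, 0, 2, 0),
                 col3 = c(5, 6, 7, 8))
args <- c("col1", "col2")

# Loop from the question: subsets the data frame once per column
loop_df <- df
for (i in args) {
  loop_df <- loop_df[loop_df[, i] == 0, ]
}

# Vectorized: count non-zero values per row across the selected columns
idx <- match(args, colnames(df))
vec_df <- df[rowSums(df[, idx, drop = FALSE] != 0) == 0, ]

vec_df
```

Both versions should keep rows 1 and 4; the vectorized one builds the row mask in a single pass instead of subsetting repeatedly.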

Confused about if statement and for loop in R

So I have a data frame in R where one column is a factor with a few levels, and I want to create a dummy variable for each level, but when I write a loop to do this I get an error.
For example, if the column is made up of the levels a, b and c and I want to code a 1/0 dummy variable for each one, the code I have to create one of them is:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[,1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
cbind(data, h)
This gives me the error message "the condition has length > 1 and only the first element will be used". I have seen in other places on this site that I should try to write my own functions and avoid for loops, and I don't really understand a) how to solve this by writing a function (at least not immediately), or b) the benefit of doing this with a function rather than a loop.
In the end I used ifelse to create each vector and then cbind to add it to the data frame, but an explanation would really be appreciated.
Change if (data[,1] == "a") { to if (data[i,1] == "a") {
Aakash is correct in pointing out the problem in your loop. Your test is
if (data[,1] == "a")
Since your test doesn't depend on i, it will be the same for every iteration. You could fix your loop like this:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
We could simplify even further: since h is initialized to 0, there is no need to set it to 0 in the else branch, so we can drop it:
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  }
}
A more substantial improvement would be to introduce vectorization. This will speed up your code and is usually easier to write once you get the hang of it. if can only check a single condition, but ifelse is vectorized: it takes a vector of tests, a vector of "if true" results and a vector of "if false" results, and combines them element by element:
h = ifelse(data[, 1] == "a", 1, 0)
With this, there is no need to initialize h before the statement, and we could add it directly to a data frame:
data$h = ifelse(data[, 1] == "a", 1, 0)
In this case, the test and results are so simple that we can do even better:
data[, 1] == "a" ## run this and look at the output
The above code just produces a logical vector of TRUE and FALSE values. If we run as.numeric() on it, TRUE values are coerced to 1s and FALSE values to 0s. So we can just do:
data$h = as.numeric(data[, 1] == "a")
which will be even more efficient than ifelse.
This operation is so simple that there is no benefit in writing a function to do it.
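Since the question's goal is one dummy per level, here is a sketch that builds all of them at once by looping over the levels (a short vector) instead of the rows, reusing the as.numeric() trick. The data and the is_ column prefix are made up for illustration:

```r
# Made-up data: the first column holds the categories
data <- data.frame(group = c("a", "b", "c", "a", "b"),
                   stringsAsFactors = FALSE)

lvls <- sort(unique(data[, 1]))

# One column per level: 1 where the row matches that level, else 0
dummies <- sapply(lvls, function(l) as.numeric(data[, 1] == l))
colnames(dummies) <- paste0("is_", lvls)

data <- cbind(data, dummies)
data
```

Base R's model.matrix(~ group - 1, data = data) builds a similar indicator matrix, with column names formed from the variable name and the level.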

Resources