I have a data frame in R where one column is a factor with a few levels, and I want to create a dummy variable for each level. When I write a loop to do this, I get an error.
For example, if the column contains the levels a, b, and c, and I want a dummy variable coded 1 or 0 for each one, the code I have to create one is:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
if (data[,1] == "a") {
h[i] = 1
} else {
h[i] = 0
}
}
cbind(data, h)
This gives me the message "the condition has length > 1 and only the first element will be used". I have seen advice elsewhere on this site that I should write my own functions and avoid for loops, but I don't really understand a) how to solve this by writing a function (at least immediately) or b) the benefit of doing this with a function rather than with a loop.
Also, I ended up using ifelse to create each vector and then cbind to add it to the data frame, but an explanation would really be appreciated.
Change if (data[,1] == "a") { to if (data[i,1] == "a") {
Aakash is correct in pointing out the problem in your loop. Your test is
if (data[,1] == "a")
Since your test doesn't depend on i, it will be the same for every iteration. You could fix your loop like this:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
if (data[i, 1] == "a") {
h[i] = 1
} else {
h[i] = 0
}
}
We could even simplify: since h is initialized to 0, there is no need to set it to 0 in the else case, so we can just move on:
for (i in 1:nrow(data)) {
if (data[i, 1] == "a") {
h[i] = 1
}
}
A more substantial improvement would be to introduce vectorization. This will speed up your code and is usually easier to write once you get the hang of it. if can only check a single condition, but ifelse is vectorized: it takes a vector of tests, a vector of "if true" results, and a vector of "if false" results, and combines them:
h = ifelse(data[, 1] == "a", 1, 0)
With this, there is no need to initialize h before the statement, and we could add it directly to a data frame:
data$h = ifelse(data[, 1] == "a", 1, 0)
In this case, your test and results are so simple that we can do even better.
data[, 1] == "a" ## run this and look at the output
The above code is just a logical vector of TRUE and FALSE. If we run as.numeric() on it, TRUE values will be coerced to 1s and FALSE values will be coerced to 0s. So we can just do
data$h = as.numeric(data[, 1] == "a")
which will be even more efficient than ifelse.
This operation is so simple that there is no benefit in writing a function to do it.
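If you want a dummy for every level rather than just "a", the same comparison can be repeated over the levels. This is only a minimal sketch: the example data and the column names (grp, is_a, ...) are made up for illustration.

```r
# Hypothetical example data; the column name `grp` is an assumption.
data <- data.frame(grp = factor(c("a", "b", "c", "a", "b")))

# One 0/1 column per factor level, using the same comparison as above.
dummies <- sapply(levels(data$grp), function(lv) as.numeric(data$grp == lv))
colnames(dummies) <- paste0("is_", levels(data$grp))

# Attach the dummy columns to the original data frame.
data <- cbind(data, dummies)
data
```

Each row has exactly one 1 across the is_* columns, because every observation belongs to exactly one level.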
I know that this is not the most efficient way to achieve my goal; however, I am using this as a teaching moment (i.e., to show that you can use an if/else statement nested within a for loop). Specifically, I have a nominal variable that is currently coded with numbers. I want to use if/else combined with a for loop to reassign these numbers to their respective categories (class character). I have tried to do this in multiple ways; my current code is as follows:
# Take the original data and separate out the variable of interest
oasis_CDR <- oasis_final %>% select('CDR')
# transpose this data
oasis_CDR <- t(oasis_CDR)
# create the for loop
for(i in seq_along(oasis_CDR)){
if(i == 0.0){
oasis_CDR[1, i] <- "Normal"
} else if(i == 0.5) {
oasis_CDR[1 ,i] <- "Very Mild Dementia"
} else if(i == 1.0){
oasis_CDR[1 ,i] <- "Mild Dementia"
} else if(i == 2.0){
oasis_CDR[1 ,i] <- "Moderate Dementia"
} else if(i == 3.0){
oasis_CDR[1 ,i] <- "Severe Dementia"
} else{
oasis_CDR[1 ,i] <- "NA"
}
}
When I look at oasis_CDR it returns 'NA' for all observations.
If I replace 'i' with 'CDR' in each condition, it only returns 'Normal'.
Is there a way to make the reassignments match what is in the data?
If you have a different value to assign to every number you can use dplyr::recode
library(dplyr)
oasis_CDR <- oasis_CDR %>%
mutate(new_col = recode(CDR, `0` = 'Normal',
`0.5` = 'Very Mild Dementia',
`1` = 'Mild Dementia',
`2` = 'Moderate Dementia',
`3` = 'Severe Dementia',
.default = NA_character_))
Check what your seq_along(oasis_CDR) expression actually returns! Those will be your i values.
My guess is that you do not really want to compare 0.0, 0.5, 1 and 2 against the indices 1 up to > 220, do you?
And if you really want to work through this with a for loop rather than by indexing the vector, isn't it more likely that you want to achieve something like this:
oasis_CDR$result <- NA_character_
j <- 1
for (i in oasis_CDR) {
if (i == ...) oasis_CDR$result[j] <- 'Normal'
...
j <- j + 1
}
In my opinion, that gets the job done, but it is not (very) nice code in R (or in any similar language).
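For the teaching case, here is a small sketch of a loop that compares the values rather than the indices. The vector cdr is a made-up stand-in for the CDR column, and labels is just an illustrative name.

```r
# Hypothetical stand-in for the CDR column; the values are made up.
cdr <- c(0, 0.5, 1, 2, 3, NA, 0.5)

labels <- character(length(cdr))   # preallocate the result vector
for (i in seq_along(cdr)) {
  value <- cdr[i]                  # compare the value, not the index i
  if (is.na(value)) {
    labels[i] <- NA_character_     # if() cannot test NA, so handle it first
  } else if (value == 0) {
    labels[i] <- "Normal"
  } else if (value == 0.5) {
    labels[i] <- "Very Mild Dementia"
  } else if (value == 1) {
    labels[i] <- "Mild Dementia"
  } else if (value == 2) {
    labels[i] <- "Moderate Dementia"
  } else if (value == 3) {
    labels[i] <- "Severe Dementia"
  } else {
    labels[i] <- NA_character_     # anything unexpected becomes a real NA
  }
}
labels
```

Using NA_character_ instead of the string "NA" keeps missing values recognisable to is.na().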
I tried to create an if/else statement to recode my variable into a dummy variable.
I know there is the ifelse() function and the fastDummies package, but I tried it this way without success.
Why does this not work? I want to learn and understand R better.
if(df$iscd115==1){
df$iscd1151 <- 1
} else {
df$iscd1151 <- 0
}
This should be a reasonable solution.
First we'll find out what the positions of your important columns are, and then we'll apply a function over the rows (MARGIN = 1) that checks whether our important column is 1, and then modifies the other column accordingly.
col1 <- which(names(df) == "iscd115")
col2 <- which(names(df) == "iscd1151")
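# apply()'s argument is MARGIN (R is case-sensitive); 1 means "by row"
mat <- apply(df, MARGIN = 1, function(x) {
if (x[col1] == 1) {
x[col2] <- 1
} else {
x[col2] <- 0  # assign with <-, not compare with ==
}
x
})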
Unfortunately, this transforms the original data frame into a transposed matrix. We can re-transpose the matrix back and turn it back into a data frame with the following.
new_df <- as.data.frame( t(mat))
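As for why the original if() did not work: if() looks at a single TRUE/FALSE, while df$iscd115 == 1 produces one value per row. A rough sketch of the vectorised alternative, using made-up data (only the column names come from the question):

```r
# Hypothetical data; only the column names are taken from the question.
df <- data.frame(iscd115 = c(1, 3, 1, 2, NA))

# The comparison returns one logical per row, not a single TRUE/FALSE.
length(df$iscd115 == 1)

# Compare the whole column at once, then turn TRUE/FALSE into 1/0.
# Missing values stay missing.
df$iscd1151 <- as.integer(df$iscd115 == 1)
df
```

This avoids both the loop and the round trip through a transposed matrix.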
I've got this code in R:
j <- 1
k <- nrow(group_IDs)
while (j <= k)
{
d_clust <- Mclust(Customers_Attibutes_s[which (Customers_Attibutes_s$Group_ID == group_IDs$Group_ID[j]),3:7], G=2:7)
temp <- cbind(Customers_Attibutes[which (Customers_Attibutes$Group_ID == group_IDs$Group_ID[j]),], as.data.frame (predict.Mclust(d_clust, Customers_Attibutes[which(Customers_Attibutes$Group_ID == group_IDs$Group_ID[j]), 3:7]))[1])
temp_ <- rbind(temp,temp_)
j <- j+1
}
j <= k in the while statement is returning this error:
missing value where TRUE/FALSE needed.
group_IDs is not null and it actually contains the value 8 in this case.
It seems to get into the loop and crash at the second round.
You can get around the indexing issues using for, e.g.:
for (ID in group_IDs) {}
This, of course, assumes that group_IDs is a vector of values.
Note: Your code shows the following inside the loop group_IDs$Group_ID[j] which implies something other than a vector; perhaps you meant group_IDs[j]?
Since group_IDs is a vector, try length(group_IDs) instead of nrow(group_IDs). A vector doesn't have rows, so the equivalent is length.
Here's what I suspect is happening:
> group_IDs <- 8
> nrow(group_IDs)
NULL
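To illustrate the difference, here is a short sketch with a made-up vector of IDs (the values and the name visited are assumptions for the example):

```r
group_IDs <- c(8, 12, 15)       # hypothetical plain vector of IDs

is.null(nrow(group_IDs))        # TRUE: a plain vector has no dim attribute
length(group_IDs)               # number of elements
NROW(group_IDs)                 # treats a vector as a one-column matrix

# seq_along() gives safe loop indices whatever the length is.
visited <- c()
for (j in seq_along(group_IDs)) {
  visited <- c(visited, group_IDs[j])
}
visited
```

If group_IDs is really a data frame with a Group_ID column, nrow() is fine and the indexing in the loop body is what needs checking.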
I've been trying to create a very simple function. Essentially I want every element in t$c changed according to the if/else statement in my code, with the others staying the same. So here's my code:
set.seed(20)
x1=rnorm(100)
x2=rnorm(100)
x3=rnorm(100)
t=data.frame(a=x1,b=x1+x2,c=x1+x2+x3)
fun1=function(multi1,multi2)
{
v=t$c
s=c()
for (i in v)
{
if (i<0)
{
s[i]=i*multi1
}
else if(i>0)
{
s[i]=i*multi2
}
}
return(s)
}
fun1(multi1=0.5,multi2=2)
But it gave me just a few numbers. I suspect I made some stupid mistake, but I couldn't figure it out.
tl;dr This operation can be vectorized. You can use the following method, assuming you want to leave values that are 0 or NA alone.
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
If you want to include them on one side (e.g. on the positive side), it's even simpler.
with(t, c * ifelse(c < 0, 0.5, 2))
As far as your loop goes, you've got a few issues there.
First, you were indexing s by decimal values, which would likely cause errors in the calculations. This is also the reason why your result vector was so short. When you indexed in the loop, the indices were moved to integer values and since some of them were repeated, s ended up being very short.
The actual unique index length went something like this -
length(unique(as.integer(t$c)))
# [1] 9
And as a result you got, as a simple example,
s[c(1, 2, 1, 1)] <- something
Since 1 is repeated, only indices 1 and 2 were changed. This is what was happening in your loop. Further illustrated as
x <- 1:5
x[1.2]
# [1] 1
x[1.99]
# [1] 1
Next, notice below that we have allocated the vector s. We can do that because we know the length of the resulting vector will be the same as v. This is the recommended, more efficient way rather than building the vector in the loop.
Moving on, I changed for(i in v) to for(i in seq_along(v)) to correct this. Now we are indexing with a sequence for i. Then we also need to index v in the same manner. Finally, we can assign s[i] <- if(... instead of assigning to the same index inside the if() statement.
Also note that you haven't accounted for 0 in v. I added a final else where we just leave those values alone; change that as you see necessary. (An NA in v would still make if() stop with an error, so it would need its own is.na() check.) Furthermore, instead of going to the global environment to get t$c, we can pass it as an argument and make this function more general (credit to #ShawnMehan for that suggestion). Here's the revised version:
fun1 <- function(vec, multi1, multi2) {
s <- vector("numeric", length(vec))
for (i in seq_along(vec)) {
s[i] <- if (vec[i] < 0) {
vec[i] * multi1
} else if(vec[i] > 0) {
vec[i] * multi2
} else {
vec[i]
}
}
return(s)
}
So now we have a length 100 result
x <- fun1(t$c, 0.5, 2)
str(x)
# num [1:100] 2.657 -0.949 7.423 -0.749 5.664 ...
I wrote this long explanation because I figure you are learning how to write a loop. In R though, we can vectorize this entire operation and put it into one line of code. The following line gives the same result as fun1(t$c, 0.5, 2).
with(t, c * ifelse(c < 0, 0.5, ifelse(c > 0, 2, 1)))
Thanks to #Frank for catching my calculation oversight.
Hopefully this all makes sense. Sometimes I don't do well with explanations and technical jargon. If there are any questions, please comment.
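If you want to convince yourself that the loop and the one-liner agree, a quick check along these lines should do. The test vector v is made up, and fun1 is repeated here only so the snippet runs on its own.

```r
# Loop version from above, repeated so this snippet is self-contained.
fun1 <- function(vec, multi1, multi2) {
  s <- vector("numeric", length(vec))
  for (i in seq_along(vec)) {
    s[i] <- if (vec[i] < 0) {
      vec[i] * multi1
    } else if (vec[i] > 0) {
      vec[i] * multi2
    } else {
      vec[i]
    }
  }
  s
}

v <- c(-2, -0.5, 0, 1.5, 3)   # hypothetical test values, no NAs

loop_result <- fun1(v, 0.5, 2)
vec_result  <- v * ifelse(v < 0, 0.5, ifelse(v > 0, 2, 1))

all.equal(loop_result, vec_result)   # TRUE when both approaches agree
```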
I am trying to create an indicator variable, Z, in R, i.e. if I have some event A, I want Z to give a result of 1 if A is true and 0 if A is false.
I have tried this:
Z=0
if(A==(d>=5 && d<=10))
{
Z=1
}
else
{
Z=0
}
But this doesn't work. I was also thinking I could try to write a separate function called indicator:
indicator = function()
Any suggestions would be really helpful, thank you.
You could easily write something like this
indicator<-function(condition) ifelse(condition,1,0)
ifelse can be used on vectors, but it works perfectly fine on single logical values (TRUE or FALSE).
Booleans FALSE / TRUE can be coerced to be 0 or 1 and then be multiplied:
Indicator<-function(x, min=0, max=Inf)
{
as.numeric(x >= min) * as.numeric(x <= max)
}
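A usage sketch for the event from the question (d between 5 and 10). The values of d are made up, and the function is repeated so the snippet runs on its own.

```r
# Same function as above, repeated so this snippet is self-contained.
Indicator <- function(x, min = 0, max = Inf) {
  as.numeric(x >= min) * as.numeric(x <= max)
}

d <- c(3, 5, 7.5, 10, 12)             # hypothetical values
Z <- Indicator(d, min = 5, max = 10)  # 1 where 5 <= d <= 10, else 0
Z
```

The product is 1 only when both comparisons are TRUE, so it behaves like a logical AND on the two bounds.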
You can use
a <- data.frame(a = -5:5, b = 1:11)
indicator <- function(data) I(data > 0) + 1 - 1
indicator(a)
a can be a vector, a data frame, and so on.
And you can change the logical condition inside I() to whatever you are interested in.
There is no need to define A, just test the condition.
Also, remember that && and & have different uses in R (see R - boolean operators && and || for more details), so maybe that is part of your problem.
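# else must follow the closing brace on the same line,
# otherwise R reports a syntax error at top level
if (d>=5 & d<=10) {
Z <- 1
} else {
Z <- 0
}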
Or, as suggested in the other answer, use ifelse:
z <- ifelse((d>=5 & d<=10), 1, 0)
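To make the & versus && distinction concrete, here is a small sketch with made-up values (d_vec, d_one and the other names are assumptions for the example):

```r
d_vec <- c(3, 5, 7.5, 10, 12)        # hypothetical vector of values
d_one <- 7                           # hypothetical single value

# & works element by element, so it suits vectors and ifelse().
in_range <- d_vec >= 5 & d_vec <= 10
z_vec <- ifelse(in_range, 1, 0)

# && expects single TRUE/FALSE values, so it suits if().
z_one <- if (d_one >= 5 && d_one <= 10) 1 else 0

z_vec
z_one
```

A reasonable rule of thumb: use && inside if() for single values, and & when building a vector of results.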