I've got this code in R:
j <- 1
k <- nrow(group_IDs)
while (j <= k)
{
d_clust <- Mclust(Customers_Attibutes_s[which (Customers_Attibutes_s$Group_ID == group_IDs$Group_ID[j]),3:7], G=2:7)
temp <- cbind(Customers_Attibutes[which (Customers_Attibutes$Group_ID == group_IDs$Group_ID[j]),], as.data.frame (predict.Mclust(d_clust, Customers_Attibutes[which(Customers_Attibutes$Group_ID == group_IDs$Group_ID[j]), 3:7]))[1])
temp_ <- rbind(temp,temp_)
j <- j+1
}
The condition j <= k in the while statement is returning this error:
missing value where TRUE/FALSE needed.
group_IDs is not NULL, and in this case it actually contains the value 8.
It seems to enter the loop and crash on the second iteration.
You can get around the indexing issues using for, e.g.:
for (ID in group_IDs) {}
This, of course, assumes that group_IDs is a vector of values.
Note: Your code shows the following inside the loop group_IDs$Group_ID[j] which implies something other than a vector; perhaps you meant group_IDs[j]?
Since group_IDs is a vector, try length(group_IDs) instead of nrow(group_IDs). A vector doesn't have rows, so the equivalent is length.
Here's what I suspect is happening:
> group_IDs <- 8
> nrow(group_IDs)
NULL
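Below is a minimal sketch of the loop rewritten along those lines. It assumes group_IDs is a plain vector of IDs. The data frame customers, its value column and the centering step are hypothetical stand-ins for Customers_Attibutes and the Mclust call, which would need the mclust package. The sketch also collects each group's result in a list and binds them once at the end. That removes the need to create temp_ before the first rbind(temp, temp_).

```r
# Hypothetical data standing in for Customers_Attibutes
customers <- data.frame(
  Group_ID = c(1, 1, 2, 2, 2),
  value    = c(10, 20, 5, 15, 25)
)

# A plain vector of IDs, like group_IDs in the question
group_IDs <- unique(customers$Group_ID)

# nrow() of a vector is NULL, and comparing with NULL gives a
# zero-length logical, which while() cannot use as a condition
print(nrow(group_IDs))        # NULL
print(1 <= nrow(group_IDs))   # logical(0)

results <- vector("list", length(group_IDs))
for (j in seq_along(group_IDs)) {
  rows_j <- customers[customers$Group_ID == group_IDs[j], ]
  # Placeholder for the per-group model (Mclust in the question):
  # here each group's values are simply centered on the group mean
  rows_j$centered <- rows_j$value - mean(rows_j$value)
  results[[j]] <- rows_j
}

# Bind all per-group results once, after the loop
combined <- do.call(rbind, results)
print(combined)
```

seq_along() also avoids the 1:0 problem if group_IDs ever turns out to be empty.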
I have a data frame column containing NAs, and I want to know how I can use apply (or lapply, sapply, ...) on that column.
I've tried apply and lapply, but both return an error.
The function I want to apply to the column is:
a.b <- function(x, y = 165){
if (x < y)
return('Good')
else if (x > y)
return('Bad')
}
The column of the data frame is:
data$col <- c(180, 170, NA, NA, 185, 185)
When I use apply I get:
apply(data$col, 2, a.b)
Error in apply(data$col, 2, a.b) :
dim(X) must have a positive length
I have tried dim(data$col), and it returns NULL; I think this is because of the NAs.
I also tried lapply and got:
lapply(data$col, a.b)
Error in if (x < y) return("Good") else if (x > y) return("Bad") :
missing value where TRUE/FALSE needed
This is for a beginner R course I'm taking, so I apologize if I've made any mistakes. Thanks for taking the time to read this and for trying to help.
apply is used on a matrix (or array), not a vector. Try:
a.b <- function(x, y = 165){
  if (is.na(x)) {
    return("NA")
  } else if (x < y) {
    return('Good')
  } else if (x > y) {
    return('Bad')
  }
}
data$col <- sapply(data$col, a.b)
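For reference, here is a sketch of what that call produces on the column from the question. It assumes data is a data frame holding just that column. One caveat: if a value equals 165 exactly, none of the branches match and a.b() returns NULL. In that case sapply() silently returns a list. vapply() with a character(1) template turns that case into an error instead. The final else returning "Equal" is a hypothetical addition, not part of the answer above.

```r
data <- data.frame(col = c(180, 170, NA, NA, 185, 185))

a.b <- function(x, y = 165) {
  if (is.na(x)) {
    return("NA")
  } else if (x < y) {
    return("Good")
  } else if (x > y) {
    return("Bad")
  } else {
    return("Equal")  # hypothetical label for x == y
  }
}

# vapply() checks that every call returns exactly one character string
data$col <- vapply(data$col, a.b, character(1))
print(data$col)
```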
You should be able to solve this with mapply by specifying the values to pass into your parameters:
mapply(a.b, x = data[,'col'], y = 165)
Note that you may need to modify your a.b() function to handle the NAs.
There are a few issues going on here:
apply is meant to run on something with dimensions to act over, which you choose with the MARGIN argument. A column, which is what you're passing to apply, has no dimension. See below:
> dim(mtcars)
[1] 32 11
> dim(mtcars$cyl)
NULL
apply and lapply are meant to run over many elements: all columns (or rows, if you use that margin with apply), or every element of a list. If you only want to replace one column, you don't need apply at all. Do something like data$my_col <- my_func(data$my_col) to replace my_col with the result of passing it to my_func.
Comparison operators do not return TRUE or FALSE when one side is NA: 7 < NA returns NA. Your if statement expects a TRUE or FALSE value but gets NA, hence the error in your second attempt. If you want to handle NA values, check for them in your function with is.na().
Your function should be vectorized; see Circle 3 of The R Inferno. Currently, it only returns length-1 vectors of "Good" or "Bad". My hunch is that what you want is similar to the following (although not exactly the same when x == y):
a.b <- function(x, y = 165){
ifelse(x < y, "Good", "Bad")
}
I believe the above information should get you where you want to be.
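A minimal vectorized sketch along those lines, assuming the same data$col as in the question. Nesting a second ifelse() separates x > y from x == y. NA inputs simply propagate to NA outputs, so no explicit is.na() check is needed. Returning NA for ties is an assumption; substitute whatever label fits.

```r
a.b <- function(x, y = 165) {
  # Comparisons with NA give NA, and ifelse() passes NA through
  ifelse(x < y, "Good", ifelse(x > y, "Bad", NA))
}

data <- data.frame(col = c(180, 170, NA, NA, 185, 185))
data$result <- a.b(data$col)   # one call on the whole column, no apply needed
print(data)
```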
I have a vector like this:
x <- c(0.9,0.9,0,0,0.9,0,0.8)
I want to eliminate all the zeros and create a new vector from it, so I have created this if statement:
if (x[i] == 0) {
y <- x[-(i)]}
But I get the following error:
Error in if (x[i] == 0) { : argument is of length zero
Does anyone have a solution?
Thanks in advance!
We don't need a for loop with if/else. This can be done simply with vectorization:
y <- x[x != 0]
Create a logical vector with the expression x != 0, use it to subset the original vector (see ?Extract for indexing with square brackets), and assign the resulting vector to a variable named y.
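Here is a short worked example on the vector from the question. Filter() is a base R alternative that does the same thing with a predicate function. The last lines reproduce the question's error under one assumed cause, i being 0. In that case x[i] is a zero-length vector, so if() has no value to test.

```r
x <- c(0.9, 0.9, 0, 0, 0.9, 0, 0.8)

# Logical vector: TRUE where the element is non-zero
keep <- x != 0

y <- x[keep]
print(y)      # 0.9 0.9 0.9 0.8

# Equivalent with a predicate function
y2 <- Filter(function(v) v != 0, x)

# Reproducing the question's error (assumed cause: i is 0)
i <- 0
msg <- tryCatch(if (x[i] == 0) "zero", error = conditionMessage)
print(msg)    # "argument is of length zero"
```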
I have a very large dataset (~1 GB) and would like to extract summary data with the following condition:
for loop:
if(a[i] == 999) then extract b[i+1]
else next
so that I can then run table() on the extracted b values to find their distribution/composition, assuming column b is of class character and column a is of class integer.
My R code:
summary123 <- data.frame()
j = 1
k = 1
for(i in 1:nrow(df1)){
if(df1$a[i] == 999 & i != nrow(df1)){
j = i + 1
summary123[k,1] <- df1$b[j]
k = k + 1
}
else{
next
}
}
However, it is taking a long time; I would like a faster R equivalent.
Use lead from dplyr:
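library(dplyr)
output <- lead(df1$b, 1)[df1$a == 999]
lead() shifts b up by one row, so each selected element is the b value from the row after a 999. If the last row of df1 has a == 999, there is no next row and lead() leaves an NA as the last element of output, so drop it:
if (isTRUE(tail(df1$a, 1) == 999)) output <- output[-length(output)]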
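If you would rather stay in base R, a minimal sketch with which() reproduces the loop's logic directly. It finds the rows where a == 999 and drops the last row, which has no next row (this matches the i != nrow(df1) check). It then takes b from the following rows. The small df1 below is hypothetical example data. Since it uses only vector operations, it should scale much better than growing a data frame one row at a time.

```r
# Hypothetical example data with the same column types as the question
df1 <- data.frame(
  a = c(999L, 1L, 999L, 2L, 999L),
  b = c("x", "y", "z", "q", "w"),
  stringsAsFactors = FALSE
)

idx <- which(df1$a == 999)        # rows where a is 999 (NAs in a are skipped)
idx <- idx[idx < nrow(df1)]       # the last row has no following row
next_b <- df1$b[idx + 1]          # b from the row after each match

print(next_b)                     # "y" "q"
print(table(next_b))
```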
So I have a data frame in R where one column is a factor with a few levels, and I want to create a dummy variable for each level, but when I write a loop to do this I get an error.
For example, if the column contains the levels a, b and c and I want to code a 0/1 dummy variable for each one, the code I have to create one of them is:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
if (data[,1] == "a") {
h[i] = 1
} else {
h[i] = 0
}
}
cbind(data, h)
This gives me the error message "the condition has length > 1 and only the first element will be used". I have seen in other places on this site that I should try to write my own function to solve problems like this and avoid for loops, and I don't really understand a) how to solve this by writing a function (at least immediately), or b) the benefit of doing this as a function rather than with loops.
In the end I used ifelse to create each vector and then cbind to add it to the data frame, but an explanation would really be appreciated.
Change if (data[,1] == "a") { to if (data[i,1] == "a") {
Aakash is correct in pointing out the problem in your loop. Your test is
if (data[,1] == "a")
Since your test doesn't depend on i, it will be the same for every iteration. You could fix your loop like this:
h = rep(0, nrow(data))
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  } else {
    h[i] = 0
  }
}
We could even simplify: since h is initialized to 0, there is no need to set it to 0 in the else case, so we can drop it:
for (i in 1:nrow(data)) {
  if (data[i, 1] == "a") {
    h[i] = 1
  }
}
A more substantial improvement would be to introduce vectorization. This will speed up your code and is usually easier to write once you get the hang of it. if can only check a single condition, but ifelse is vectorized: it takes a vector of tests, a vector of "if true" results and a vector of "if false" results, and combines them:
h = ifelse(data[, 1] == "a", 1, 0)
With this, there is no need to initialize h before the statement, and we could add it directly to a data frame:
data$h = ifelse(data[, 1] == "a", 1, 0)
In this case, your test case and results are so simple, that we can do even better.
data[, 1] == "a" ## run this and look at the output
The above code is just a logical vector of TRUE and FALSE values. If we run as.numeric() on it, TRUE values are coerced to 1s and FALSE values to 0s, so we can just do:
data$h = as.numeric(data[, 1] == "a")
which will be even more efficient than ifelse.
This operation is so simple that there is no benefit in writing a function to do it.
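The question asks for one dummy per level. The sketch below applies the same as.numeric() idea to every level in a loop. It assumes the first column is a factor; the column name f and the is_ prefix are hypothetical. An alternative is model.matrix() from the stats package, which builds the full indicator matrix in one call.

```r
# Hypothetical data: one factor column with levels a, b and c
data <- data.frame(f = factor(c("a", "b", "a", "c")))

# One 0/1 column per level, named is_a, is_b, is_c
for (lev in levels(data$f)) {
  data[[paste0("is_", lev)]] <- as.numeric(data$f == lev)
}
print(data)

# Alternative: "- 1" drops the intercept so every level gets its own column
dummies <- model.matrix(~ f - 1, data = data)
print(dummies)
```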
I have a problem creating an empty vector in R and saving the results of another vector into it. This is my code:
k<-vector(mode = "numeric", length = 0)
for (j in length(Pe)){
if ((Pe[j])>0) {
k[j]<-Pe[j]
}
}
The length of the vector Pe is 1000. I only need to save the values greater than zero in the vector k, but when I print k the console shows:
numeric(0)
Is this the correct way to initialize an empty vector (k) in R?
Thanks
In fact, it can be much easier. Type
k <- c()
instead. But I think this won't get you what you want.
What happens when element j is not > 0? R will fill k[j] with NA. I think you want k to be a shorter vector containing only the elements of Pe that are > 0, rather than a vector of the same length with NAs.
If so, you don't even need a loop. Try
k <- Pe[Pe > 0]
This gives you a vector containing only the elements of Pe that are > 0, with no NAs.
Excuse my bad English; I hope this helps.
As MaxPD pointed out,
for (j in length(Pe)) print(j)
only prints the length of Pe: length(Pe) is a single number, so the loop runs just once. You should use one of:
for (j in seq_len(length(Pe))) print(j)
## or
for (j in seq_along(Pe)) print(j)
## or
for (j in 1:length(Pe)) print(j)
But in your case I wouldn't even use a loop:
k<-vector(mode = "numeric", length = 0)
k[Pe > 0] <- Pe[Pe > 0]
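should do the trick if Pe is a vector. Note that this keeps each positive value at the same index it had in Pe, so k contains NA wherever Pe is not positive (up to the last positive element). Use k <- Pe[Pe > 0] if you only want the positive values themselves.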
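The sketch below contrasts the options discussed above, using a hypothetical short Pe. The loop version uses seq_along(), which unlike 1:length(Pe) gives no iterations for an empty vector, and appends each positive value with c(). The indexed assignment keeps positions, so it leaves NA where Pe is not positive. Pe[Pe > 0] returns only the positive values.

```r
Pe <- c(-1, 2, -3, 4)   # hypothetical stand-in for the 1000-element vector

# Loop version: grow k only when the value is positive
k_loop <- numeric(0)
for (j in seq_along(Pe)) {
  if (Pe[j] > 0) {
    k_loop <- c(k_loop, Pe[j])
  }
}

# Vectorized: only the positive values
k_compact <- Pe[Pe > 0]

# Positional: same indices as Pe, NA elsewhere
k_positional <- numeric(0)
k_positional[Pe > 0] <- Pe[Pe > 0]

print(k_loop)         # 2 4
print(k_compact)      # 2 4
print(k_positional)   # NA  2 NA  4

# Why seq_along() is safer than 1:length() for empty input
print(seq_along(numeric(0)))    # integer(0)
print(1:length(numeric(0)))     # 1 0
```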