I have a vector like this:
x <- c(0.9,0.9,0,0,0.9,0,0.8)
I want to eliminate all the zeros and create a new vector from it, so I have created this if statement:
if (x[i] == 0) {
y <- x[-(i)]}
But I get the following error:
Error in if (x[i] == 0) { : argument is of length zero
Does anyone have a solution?
Thanks in advance!
We don't need a for loop with if/else. It can be done simply with vectorization:
y <- x[x != 0]
The expression x != 0 creates a logical vector. Use it to subset the original vector with square brackets (see ?Extract), and assign the result to a new variable y.
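As a minimal sketch of what the one-liner does, the snippet below builds the intermediate logical vector explicitly. The names keep and y2 are made up for illustration, and which() is shown only as an equivalent alternative.

```r
x <- c(0.9, 0.9, 0, 0, 0.9, 0, 0.8)

# Step 1: the element-wise comparison gives one TRUE/FALSE per element
keep <- x != 0

# Step 2: logical subsetting keeps only the positions that are TRUE
y <- x[keep]

# Equivalent form: which() converts the logical vector to integer positions
y2 <- x[which(x != 0)]
```

Both forms leave x untouched and return a new, shorter vector.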
I am trying to produce a vector (PA) from conditions on two other vectors. The other vectors are both 1978 elements long.
The vectors are taken from loading in an excel file and using 'x <- S1L1$PercentageCoverage' and 'y <- S1L1$FoldCount'.
I believe the code should check whether each element in x is equal to or greater than 1, and whether each element in y is equal to or greater than 70. If both checks pass, it should add the value of the ith element of x to PA. If either fails, it should add 0.
PresenceCollector <- function(x, y){
PA <- c()
for (i in 1:1978){
if ((x[i] >= 1) && (y[i] >= 70)){
PA <- c(PA, x[i])
} else {
PA < - c(PA, 0)
}
}
}
The PA vector remains 'null' even after running the code, and it returns a warning message saying 'There were 50 or more warnings (use warnings() to see the first 50)'. This returns
1: In PA < -c(PA, 0) :
longer object length is not a multiple of shorter object length
2: In PA < -c(PA, 0) :
longer object length is not a multiple of shorter object length
3: In PA < -c(PA, 0) :
longer object length is not a multiple of shorter object length
and so on until it prints 50
Any help on how to fix this? Thank you!
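Here is a very short solution:
PresenceCollector <- function(x, y){
  x * (x >= 1 & y >= 70)
}
Pay attention to the parentheses. The expression inside them produces a logical vector; when it is multiplied by x, TRUE and FALSE are coerced to 1 and 0, so every element that fails the check becomes 0.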
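The warnings quoted in the question name the expression PA < -c(PA, 0), which suggests that `PA < - c(PA, 0)` was parsed as a comparison rather than an assignment. The function also never returns PA. The sketch below, using made-up toy inputs and the hypothetical names presence_loop and presence_vec, shows a corrected loop next to the vectorized form so the two can be compared.

```r
# Toy inputs, made up for illustration
x <- c(0.5, 2.0, 3.5, 1.0)
y <- c(80, 60, 75, 70)

# Corrected loop: preallocate zeros, overwrite where both checks pass
presence_loop <- function(x, y) {
  PA <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] >= 1 && y[i] >= 70) {
      PA[i] <- x[i]
    }
  }
  PA  # return the result explicitly; a bare for loop evaluates to NULL
}

# Vectorized form: the logical mask is coerced to 1/0 when multiplied
presence_vec <- function(x, y) x * (x >= 1 & y >= 70)

loop_result <- presence_loop(x, y)
vec_result  <- presence_vec(x, y)
```

Using seq_along(x) instead of a hard-coded 1:1978 keeps the loop correct if the input length changes.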
"The vectors are taken from loading in an excel file and using 'x <- S1L1$PercentageCoverage' and 'y <- S1L1$FoldCount'."
It looks like S1L1 is a data frame with PercentageCoverage and FoldCount columns, and you want to create a new column PA that equals PercentageCoverage when PercentageCoverage >= 1 and FoldCount >= 70, and 0 otherwise. Is that correct?
S1L1$PA <- ifelse(S1L1$PercentageCoverage>=1 & S1L1$FoldCount>=70, S1L1$PercentageCoverage, 0)
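As a hedged illustration of the ifelse approach, here is a toy data frame standing in for S1L1 (the values are invented, not taken from the original Excel file). The last row shows one thing to watch for: when the condition evaluates to NA, ifelse returns NA rather than 0.

```r
# Invented stand-in for the real S1L1 data
S1L1 <- data.frame(
  PercentageCoverage = c(0.5, 2.0, 3.5, 4.0),
  FoldCount          = c(80, 60, 75, NA)
)

# Same logic as the answer, written with with() to avoid repeating S1L1$
S1L1$PA <- with(
  S1L1,
  ifelse(PercentageCoverage >= 1 & FoldCount >= 70, PercentageCoverage, 0)
)
```

If missing values should count as a failed check, the condition could be wrapped so that NA is treated as FALSE, but that is a choice the original question does not specify.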
I have a data frame column that contains NA values, and I want to know how to apply a function to it with apply (or lapply, sapply, ...).
I've tried apply and lapply, but both return an error.
The function I want to apply to the column is:
a.b <- function(x, y = 165){
if (x < y)
return('Good')
else if (x > y)
return('Bad')
}
the column of the dataframe is:
data$col = 180 170 NA NA 185 185
When I use apply I get:
apply(data$col, 2, a.b)
Error in apply(data$col, 2, a.b) :
dim(X) must have a positive length
I have tried dim(data$col), which returns NULL; I think that is because of the NAs.
I also tried lapply and got:
lapply(data$col, a.b)
Error in if (x < y) return("Good") else if (x > y) return("Bad") :
missing value where TRUE/FALSE needed
This is for a course of R for beginners that I am doing so I am sorry if I made some mistakes. Thanks for taking your time to read it and trying to help.
apply is used on a matrix, not a vector. Try:
a.b <- function(x, y = 165){
  if (is.na(x)){
    return("NA")
  } else if (x < y){
    return('Good')
  } else if (x > y){
    return('Bad')
  }
}
data$col <- sapply(data$col, a.b)
You should be able to solve this with mapply by specifying the values to pass to each parameter:
mapply(a.b, x = data[,'col'], y = 165)
Note that you may need to modify your a.b() function to handle the NAs.
There are a few issues going on here:
apply is meant to run on something with dimensions to act over, which is what the MARGIN argument selects. A single column, which is what you're passing to apply, has no dimensions. See below:
> dim(mtcars)
[1] 32 11
> dim(mtcars$cyl)
NULL
apply and lapply are meant to run over all columns (or rows if you're using that margin for apply). If you want to just replace one column, you should not use apply. Do something like data$my_col <- my_func(data$my_col) if you want to replace my_col with the result of passing it to my_func
NA values do not return TRUE or FALSE when using an operator on them. Note that 7 < NA will return NA. Your if statement is looking for a TRUE or FALSE value but getting an NA value, hence the error in your second attempt. If you want to handle NA values, you may need to incorporate that into your function with is.na.
Your function should be vectorized. See circle 3 of the R-Inferno. Currently, it will just return length 1 vectors of "Good" or "Bad". My hunch is what you want is similar to the following (although not exactly same if x == y)
a.b <- function(x, y = 165){
ifelse(x < y, "Good", "Bad")
}
I believe the above information should get you where you want to be.
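To tie the points above together, here is a minimal sketch of a vectorized version that also covers the x == y case. The function name classify and the "Equal" label are assumptions made for illustration; the original question does not say what should happen when x equals y.

```r
# Comparisons with NA yield NA, which is why a plain if() fails on them
na_comparison <- 7 < NA

# Vectorized classifier: nested ifelse() handles every element at once
# and passes NA through unchanged
classify <- function(x, y = 165) {
  ifelse(x < y, "Good", ifelse(x > y, "Bad", "Equal"))
}

# Column values from the question, plus two extra cases for illustration
col <- c(180, 170, NA, NA, 185, 185, 165, 150)
labels <- classify(col)
```

Because the function is vectorized, it can be assigned directly, for example data$col_label <- classify(data$col), with no apply-family call needed.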
How do you change values in a row depending on the position of a specific character?
I want to replace, by row, all NA values that come BEFORE S with 0. After this specific character S, the NAs in the row have to be kept.
S is the marker of the end of data by row.
Before S: NA should be values (in fact zero values !!).
After S: NA stays NA, no values at all.
An example of data frame is available here dataframe.txt
I've tried this loop
for (i in 1:length(df)) {
x <- pos = 's' ; y <- pos = i if (y < x) { if (y == "NA"){ replace(y,0) } }
}
Maybe with the which function ...
Thanks for your ideas on that !!
Alex,
This code will replace all NAs before "S" with 0 in your vector:
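initial_row <- c(1,2,4,NA,4,NA,2,"S",NA,NA,NA)
result_row <- initial_row
# Positions from the start of the row up to the first "S"
before_s <- seq_len(which(result_row == "S")[1])
# Replace NAs only within that range
result_row[before_s][is.na(result_row[before_s])] <- 0
Explanation: First we copy the initial row into the result row that we will work on. Then we select the NAs in the result row that lie between position 1 and the position of the first "S", and replace those values with zero. Restricting the row to that range before testing for NA matters: a logical index that is shorter than the vector is recycled, so it could otherwise hit positions after "S". Because the vector contains "S", it is a character vector, so the zeros are stored as "0".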
Important assumptions:
The vector is at least length 2.
The vector contains an "S"
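If the second assumption may not hold, one option is to wrap the replacement in a small helper that leaves a row unchanged when no marker is found. This is only a sketch: the function name replace_na_before_marker and the toy matrix are made up, and it assumes each row is a character vector like the one above.

```r
# Replace NA with "0" up to the first marker; return the row as-is
# when the marker is absent
replace_na_before_marker <- function(row, marker = "S") {
  pos <- which(row == marker)[1]   # NA when the marker does not occur
  if (is.na(pos)) return(row)
  idx <- seq_len(pos)
  row[idx][is.na(row[idx])] <- "0"
  row
}

one_row   <- replace_na_before_marker(c("1", NA, "S", NA))
no_marker <- replace_na_before_marker(c("1", NA, NA))

# Row-wise use on a toy character matrix; apply() returns the results
# as columns, so t() restores the original orientation
m <- rbind(c("1", NA, "S", NA),
           c(NA, "S", NA, NA))
cleaned <- t(apply(m, 1, replace_na_before_marker))
```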
Loop version
If you insist on using a loop to do this (it will run slower), you can do this:
for(i in seq_along(result_row)){
  # Check for NA first: comparing NA with "S" inside if() raises an error
  if(is.na(result_row[i])){
    result_row[i] <- 0
  } else if(result_row[i] == "S"){
    break
  }
}
Edit: If your vector contains the string "NA" instead of NA (which R recognizes as a missing value), the code needs to be modified as follows:
before_s <- seq_len(which(result_row == "S")[1])
result_row[before_s][result_row[before_s] == "NA"] <- 0
or
for(i in 1:length(result_row)){
if(result_row[i] == "S"){
break
}
if(result_row[i] == "NA"){
result_row[i] <- 0
}
}
I've got this code in R:
j <- 1
k <- nrow(group_IDs)
while (j <= k)
{
d_clust <- Mclust(Customers_Attibutes_s[which (Customers_Attibutes_s$Group_ID == group_IDs$Group_ID[j]),3:7], G=2:7)
temp <- cbind(Customers_Attibutes[which (Customers_Attibutes$Group_ID == group_IDs$Group_ID[j]),], as.data.frame (predict.Mclust(d_clust, Customers_Attibutes[which(Customers_Attibutes$Group_ID == group_IDs$Group_ID[j]), 3:7]))[1])
temp_ <- rbind(temp,temp_)
j <- j+1
}
j <= k in the while statement is returning this error:
missing value where TRUE/FALSE needed.
group_IDs is not null and it actually contains the value 8 in this case.
It seems to get into the loop and crash at the second round.
You can get around the indexing issues using for, e.g.:
for (ID in group_IDs) {}
This, of course, assumes that group_IDs is a vector of values.
Note: Your code shows the following inside the loop group_IDs$Group_ID[j] which implies something other than a vector; perhaps you meant group_IDs[j]?
Since group_IDs is a vector, try length(group_IDs) instead of nrow. A vector doesn't have rows, so the equivalent is length.
Here's what I suspect is happening:
> group_IDs <- 8
> nrow(group_IDs)
NULL
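The difference can be checked with a short sketch. It assumes, as the answers above do, that group_IDs might be a plain vector; the data frame group_df is a made-up contrast case.

```r
group_IDs <- 8                       # a plain numeric vector of length 1

rows_vec   <- nrow(group_IDs)        # NULL: vectors have no dim attribute
len_vec    <- length(group_IDs)      # 1
nrow_safe  <- NROW(group_IDs)        # 1: NROW() treats a vector as one column

# Comparing against NULL gives a zero-length logical, which while() cannot use
cmp <- 1 <= rows_vec

# A data frame, by contrast, does report rows
group_df <- data.frame(Group_ID = c(3, 8))
rows_df  <- nrow(group_df)

# seq_along() avoids the manual counter entirely
visited <- c()
for (j in seq_along(group_IDs)) {
  visited <- c(visited, group_IDs[j])
}
```

Running str(group_IDs) on the real object would show which of the two cases applies.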
I have a data.frame df with > 110 000 rows. It looks like this:
traking_id A1_CTRL A2_CTRL A3_CTRL A4_CTRL A5_CTRL A1_DEX A2_DEX A3_DEX A4_DEX A5_DEX
1 ENSMUST00000000001 1.35358e+01 1.03390e+01 1.03016e+01 1.12654e+01 1.22707e+01 1.40684e+01 9.15279e+00 1.17276e+01 1.14550e+01 1.46256e+01
2 ENSMUST00000000003 5.01868e-06 5.59107e-06 1.60922e-01 2.45402e-01 2.18614e-01 2.24124e-01 2.88035e-01 7.18876e-06 1.74746e-06 0.00000e+00
...
I'm interested in performing shapiro.test twice for each row: once for the values in columns 2:6, and once for the values in columns 7:11.
I want to obtain two lists of the objects that shapiro.test returns so that I can extract the p.value element from each. I want to do this with apply, but my code
shapiro.test_CTRL <- apply(data.matrix(df[,2:6]), 1, shapiro.test)
returns an error
Error in FUN(newX[, i], ...) : all 'x' values are identical
However, when I use pearson.test everything works fine:
pearson.test_CTRL <- apply(data.matrix(df[,2:6]), 1, pearson.test)
Calculating shapiro.test just for one row also works fine:
shapiro.test(data.matrix(x[1,2:6]))
I would like to know why using apply with shapiro.test the way I did resulted in an error, and how to do it correctly.
If you look at the source for shapiro.test, it has these lines:
...
x <- sort(x[complete.cases(x)])
n <- length(x)
if (is.na(n) || n < 3L || n > 5000L)
stop("sample size must be between 3 and 5000")
rng <- x[n] - x[1L]
if (rng == 0)
stop("all 'x' values are identical")
...
This error is triggered when the values in a row are all the same. The same error can be reproduced with this code:
mtcars[2,] <- 1
apply(mtcars[,2:5], 1, shapiro.test)
You can avoid this error by testing for that condition and returning something else:
f <- function(x) {
if (diff(range(x)) == 0) list() else shapiro.test(x)
}
apply(mtcars[,2:5], 1, f)
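Since the question was ultimately after the p-values, a variation on f can return the p-value directly and NA for rows that cannot be tested. This is a sketch under assumptions: the name safe_shapiro_p and the toy matrix are invented, and returning NA_real_ for constant or too-short rows is one possible convention, not something the original answer prescribes.

```r
# Return the Shapiro-Wilk p-value, or NA when the test cannot be run
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]                      # shapiro.test drops NAs as well
  if (length(x) < 3 || diff(range(x)) == 0) {
    return(NA_real_)                     # too few values, or all identical
  }
  shapiro.test(x)$p.value
}

# Toy data: the second row is constant and would make shapiro.test stop
set.seed(1)
m <- rbind(rnorm(5), rep(1, 5), rnorm(5))

# One p-value per row, as a plain numeric vector
p_values <- apply(m, 1, safe_shapiro_p)
```

With the data from the question, the same function could be applied to data.matrix(df[, 2:6]) and data.matrix(df[, 7:11]) separately.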