Difference between the following two codes - r

I have a data frame named newdata. It has two columns, BONUS and GENDER.
When I write the following code in R:
> newdata <- within(newdata,{
PROMOTION=ifelse(BONUS>=1500,1,0)})
This works even though I haven't used a loop, but the following code doesn't work without a loop. Why?
> add <- with(newdata,
if(GENDER==F)sum(PROMOTION))
Warning message:
In if (GENDER == F) sum(PROMOTION) :
the condition has length > 1 and only the first element will be used
My question is: why were all the elements used in the first piece of code?

ifelse is vectorized, but if is not. For example:
> x <- rbinom(20,1,.5)
> ifelse(x,TRUE,FALSE)
[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
[13] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
> if(x) {TRUE} else {FALSE}
[1] TRUE
Warning message:
In if (x) { :
the condition has length > 1 and only the first element will be used
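To make the difference concrete, here is a minimal sketch of how the sum from the question could be written without a loop, using logical subsetting instead of if. The toy values in newdata are invented for illustration, and it assumes GENDER holds the strings "F" and "M" (a bare F in R is shorthand for FALSE, which is probably not what was intended). Note also that from R 4.2.0 onwards a condition of length greater than one in if is an error rather than a warning.

```r
# Hypothetical toy data; only the column names come from the question.
newdata <- data.frame(
  BONUS  = c(1000, 1500, 2000, 500),
  GENDER = c("F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# ifelse() is vectorized: it tests every element of BONUS.
newdata <- within(newdata, {
  PROMOTION <- ifelse(BONUS >= 1500, 1, 0)
})

# Subset with a logical vector instead of calling if() on a whole column.
add <- with(newdata, sum(PROMOTION[GENDER == "F"]))
print(add)
```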

Related

How do I write an R function that gives as output the name of a gene from column index instead of value and emit error?

This is my first time posting so apologies in advance if not completely clear. I have a challenge task (!!not coursework!!) with a large data set of gene ids (column 1) and expression levels (columns 40-47). I am writing a function that returns an output if the standard deviation is larger than the mean. So far I have been able to print the results, but I want to print the names of the genes if there are no FALSE outputs for that row.
I am also getting a warning message and can't figure out why.
Please help! z is the data.frame that I will pass in when I call the function.
> getHighlyVariableGenes <- function(z) {
for (i in z[40:47]) {
if (output <- sapply(z[,40:74], sd) > rowMeans(z[,40:74])){
return(output)
} else {
return("")
}
}
}
> getHighlyVariableGenes(RNA_data)
Which gives me:
[947] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[958] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
[969] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[980] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[991] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[ reached getOption("max.print") -- omitted 62652 entries ]
Warning messages:
1: In sapply(z[, 40:74], sd) > rowMeans(z[, 40:74]) :
longer object length is not a multiple of shorter object length
2: In if (output <- sapply(z[, 40:74], sd) > rowMeans(z[, 40:74])) { :
the condition has length > 1 and only the first element will be used
I have also tried:
> getHighlyVariableGenes <- function(z) {
for (i in z[40:47]) {
if (output <- sapply(z[,40:74], sd) > rowMeans(z[,40:74])){
return(z['gene_id'])
} else {
return("")
}
}
}
Which seemingly prints every value in the first column and isn't omitting the FALSE outputs:
995 ENSG00000066735
996 ENSG00000066739
997 ENSG00000066777
998 ENSG00000066813
999 ENSG00000066827
1000 ENSG00000066855
[ reached getOption("max.print") -- omitted 62652 rows ]
Any suggestion is greatly appreciated!
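A possible vectorized approach, offered only as a sketch: the first warning arises because sapply(z[, 40:74], sd) returns one standard deviation per column, while rowMeans() returns one mean per row, so the two vectors have different lengths. Computing both statistics per row avoids that, and the resulting logical vector can index the id column directly. The toy data frame, the cols argument and the id_col argument below are assumptions for illustration; the real column range would have to be supplied by the caller.

```r
# Hypothetical toy data standing in for RNA_data: an id column plus
# three expression columns.
toy <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  s1 = c(1, 10, 0),
  s2 = c(100, 11, 0),
  s3 = c(1, 12, 9),
  stringsAsFactors = FALSE
)

getHighlyVariableGenes <- function(z, cols, id_col = "gene_id") {
  expr <- as.matrix(z[, cols])
  row_sd   <- apply(expr, 1, sd)   # one standard deviation per gene (row)
  row_mean <- rowMeans(expr)       # one mean per gene (row)
  keep <- !is.na(row_sd) & row_sd > row_mean
  z[[id_col]][keep]                # names of the genes that pass the test
}

hv <- getHighlyVariableGenes(toy, cols = 2:4)
print(hv)
```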

Is there a built-in function "none"?

I hope the question is not too foolish.
Is there a built-in R function that returns TRUE when all the cases are FALSE?
Similar to any() or all() but when, in the case of a logical vector of 2, TRUE TRUE returns FALSE, TRUE FALSE returns FALSE and FALSE FALSE returns TRUE.
I would call it none().
We can use ! with any
!any(c(FALSE, FALSE))
Negate(any) ?
> none <- Negate(any)
> none(c(TRUE,TRUE))
[1] FALSE
> none(c(TRUE,FALSE))
[1] FALSE
> none(c(FALSE,FALSE))
[1] TRUE
Or all:
all(!vec)
Or using sum:
sum(vec)==0
where vec is your vector.
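One thing worth checking with any of these variants is how they treat NA and empty vectors. The sketch below wraps !any() in a small helper; the name none and its na.rm argument are made up here, not part of base R.

```r
# Hypothetical helper: TRUE when no element of x is TRUE.
none <- function(x, na.rm = FALSE) !any(x, na.rm = na.rm)

r1 <- none(c(FALSE, FALSE))              # no TRUE present
r2 <- none(c(TRUE, FALSE))               # at least one TRUE
r3 <- none(c(FALSE, NA))                 # NA: the answer is unknown
r4 <- none(c(FALSE, NA), na.rm = TRUE)   # NA ignored
r5 <- none(logical(0))                   # empty input: any() is FALSE

print(c(r1, r2, r3, r4, r5))
```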

Determine if a vector is ordered

I would like to determine if a vector is either always increasing or always decreasing in R.
Ideally, if I had these three vectors:
asc=c(1,2,3,4,5)
des=c(5,4,3,2,1)
non=c(1,3,5,4,2)
I would hope that the first two would return TRUE, and the last would return FALSE.
I tried a few approaches. First, I tried:
> is.ordered(asc)
[1] FALSE
> is.ordered(des)
[1] FALSE
> is.ordered(non)
[1] FALSE
And I also tried:
> order(non)
[1] 1 5 2 4 3
And hoped that I could simply compare this vector with 1,2,3,4,5 and 5,4,3,2,1, but even that returns a string of logicals, rather than a single true or false:
> order(non)==c(1,2,3,4,5)
[1] TRUE FALSE FALSE TRUE FALSE
Maybe is.unsorted is the function you're looking for:
> is.unsorted(asc)
[1] FALSE
> is.unsorted(rev(des)) # here you need 'rev'
[1] FALSE
> is.unsorted(non)
[1] TRUE
From the Description of is.unsorted you can find:
Test if an object is not sorted (in increasing order), without the cost of sorting it.
Here's one way using ?is.unsorted:
is.sorted <- function(x, ...) {
!is.unsorted(x, ...) | !is.unsorted(rev(x), ...)
}
Have a look at the additional arguments to is.unsorted, which can be passed here as well.
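As a rough illustration of passing those extra arguments through, the sketch below applies the same idea to the three vectors from the question and then uses strictly = TRUE, an existing argument of is.unsorted, to reject ties. It uses || rather than | because both operands are single logical values.

```r
is.sorted <- function(x, ...) {
  # Sorted if x is non-decreasing, or if its reverse is.
  !is.unsorted(x, ...) || !is.unsorted(rev(x), ...)
}

asc <- c(1, 2, 3, 4, 5)
des <- c(5, 4, 3, 2, 1)
non <- c(1, 3, 5, 4, 2)

res <- c(asc = is.sorted(asc), des = is.sorted(des), non = is.sorted(non))
print(res)

# Ties count as sorted by default, but not with strictly = TRUE.
ties_ok     <- is.sorted(c(1, 2, 2, 3))
ties_strict <- is.sorted(c(1, 2, 2, 3), strictly = TRUE)
```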
Here is one way, without is.unsorted(), to check whether a vector is sorted. This function returns TRUE if all elements of the given vector are in ascending order and FALSE otherwise:
is.sorted <- function(x) {
if(all(sort(x, decreasing = FALSE) == x)) {
return(TRUE)
} else {
return(FALSE)
}
}

extracting data from a weird looking object in R script

In my R script...
I have an object myObject which is something that looks like this:
> myObject
m convInfo data call dataClasses control
FALSE FALSE FALSE FALSE FALSE FALSE
It is what is returned from an is.na(obj) where obj is an nls fit.
I'm trying to test if that first item is FALSE rather than TRUE. How can I extract that out? I tried myObject$m but that didn't work.
You have a named (logical) vector.
> v <- 1:5
> names(v) <- LETTERS[1:5]
> is.na(v)
A B C D E
FALSE FALSE FALSE FALSE FALSE
> myObj <- .Last.value
You address it like any other atomic vector:
> myObj[1]
A
FALSE
> myObj[1] == FALSE
A
TRUE
The object returned by nls() is a list. The behaviour of is.na() on a list is somewhat peculiar in the sense of what is and is not NA. From ?is.na:
Value:
The default method for ‘is.na’ applied to an atomic vector returns
a logical vector of the same length as its argument ‘x’,
containing ‘TRUE’ for those elements marked ‘NA’ or, for numeric
or complex vectors, ‘NaN’ (!) and ‘FALSE’ otherwise. ‘dim’,
‘dimnames’ and ‘names’ attributes are preserved.
The default method also works for lists and pairlists: the result
for an element is false unless that element is a length-one atomic
vector and the single element of that vector is regarded as ‘NA’
or ‘NaN’.
So t (myObject in the question) is a named logical vector whose TRUE and FALSE values are determined as per the quoted text above. Therefore all of
t[1]
t["m"]
head(t, 1)
extract the first element of t. If you want to test for FALSE then I might try:
!isTRUE(t[1])
E.g.
> set.seed(1)
> logi <- sample(c(TRUE,FALSE), 5, replace = TRUE)
> logi
[1] TRUE TRUE FALSE FALSE TRUE
> !isTRUE(logi[1])
[1] FALSE
The reason the $ version won't work is that $ is documented to apply only to non-atomic vectors. logi (or your t) is an atomic vector, in that it contains elements of the same type.
> is.atomic(logi)
[1] TRUE
> names(logi) <- letters[1:5]
> logi$a
Error in logi$a : $ operator is invalid for atomic vectors
> logi["a"]
a
TRUE
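Putting the pieces together, a small sketch of testing the first element for FALSE. The vector below is a hypothetical stand-in for the is.na(obj) result in the question, and isFALSE() is assumed to be available, which requires R 3.5.0 or later.

```r
# Hypothetical stand-in for the named logical vector from the question.
myObject <- c(m = FALSE, convInfo = FALSE, data = FALSE)

first_named <- myObject["m"]     # single bracket keeps the name
first_bare  <- myObject[["m"]]   # double bracket drops the name

# isFALSE() is TRUE only for a single, non-NA FALSE value.
first_is_false <- isFALSE(first_bare)
print(first_is_false)
```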

R is there a way to find Inf/-Inf values?

I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:
> RF <- randomForest(prePrior1, postPrior1[,6]
+ ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE, :
NA/NaN/Inf in foreign function call (arg 1)
So I try to find any NA's using :
> df2 <- prePrior1[is.na(prePrior1)]
> df2
character(0)
> df2 <- postPrior1[is.na(postPrior1[,6])]
> df2
numeric(0)
which leads me to believe that it's Inf's that are the problem as there don't seem to be any NA's.
Any suggestions for how to root out Inf's?
You're probably looking for is.finite, though I'm not 100% certain that the problem is Infs in your input data.
Be sure to read the help for is.finite carefully about which combinations of missing, infinite, etc. it picks out. Specifically, this:
> is.finite(c(1,NA,-Inf,NaN))
[1] TRUE FALSE FALSE FALSE
> is.infinite(c(1,NA,-Inf,NaN))
[1] FALSE FALSE TRUE FALSE
One of these things is not like the others. Not surprisingly, there's an is.nan function as well.
randomForest's 'NA/NaN/Inf in foreign function call' message is often misleading, and really irritating:
you will get it if any of the variables passed is of type character;
actual NaNs and Infs almost never occur in clean data.
My quick-and-dirty trick to narrow things down: do a binary search on your variable list, and use token parameters like ntree=2 to get an instant pass/fail on each subset of variables:
RF <- randomForest(prePrior1[m:n],ntree=2,...)
In analogy to is.na, you can use is.infinite to find occurrences of infinites.
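One difference from is.na is that is.infinite has no data frame method, so on a data frame it has to be applied column by column. The sketch below does that and also lists the non-numeric columns, since a character column is another suspect mentioned above. The data frame df and its contents are invented for illustration.

```r
# Hypothetical data: one column with Inf, one with NA, one character column.
df <- data.frame(
  foo = c(1, Inf, 3),
  bar = c(NA, 2, 3),
  baz = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

is_num <- sapply(df, is.numeric)

# Which numeric columns contain at least one Inf or -Inf?
has_inf  <- sapply(df[is_num], function(col) any(is.infinite(col)))
inf_cols <- names(has_inf)[has_inf]

# Which columns are not numeric at all?
non_numeric <- names(df)[!is_num]

# Row/column positions of the infinite values among the numeric columns.
inf_pos <- which(sapply(df[is_num], is.infinite), arr.ind = TRUE)
print(inf_pos)
```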
Take a look at with, e.g.:
> with(df, df == Inf)
foo bar baz abc ...
[1,] FALSE FALSE TRUE FALSE ...
[2,] FALSE TRUE FALSE FALSE ...
...
joran's answer is informative and is what you want. For more details about is.na() and is.infinite(), you should check out https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/is.na-methods.html
Besides, once you have the logical vector that says whether each element of the original vector is NA/Inf, you can use the which() function to get the indices, like this:
> v1 <- c(1, Inf, 2, NaN, Inf, 3, NaN, Inf)
> is.infinite(v1)
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
> which(is.infinite(v1))
[1] 2 5 8
> is.na(v1)
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
> which(is.na(v1))
[1] 4 7
The documentation for which() is here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/which.html
