How to get vector's TRUE names - r

I have a vector:
table(FilterGenes)
FilterGenes
FALSE TRUE
74 5
I'd like to see only TRUE names.

With respect to showing true values from the vector you could simply do:
ran.vec <- rep(c(F,T,F), 10)
ran.vec[ran.vec==TRUE]
I reckon that you want to filter your data frame by the column holding the TRUE values. This could look like this:
data(mtcars)
mtcars$someTF <- mtcars$am == 1
mtcars[mtcars$someTF == TRUE,]
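The question asks for the *names* of the TRUE elements rather than their values. Assuming FilterGenes is a named logical vector (the question only shows its table, so this is an assumption, and the gene names below are invented), the names can be selected by indexing names() with the vector itself, or with which():

```r
# Hypothetical named logical vector standing in for FilterGenes
FilterGenes <- c(geneA = FALSE, geneB = TRUE, geneC = FALSE, geneD = TRUE)

# Logical indexing keeps only the names at TRUE positions
true_names <- names(FilterGenes)[FilterGenes]
print(true_names)

# which() returns the positions of the TRUE values, named by element
true_idx <- which(FilterGenes)
print(true_idx)
```

One difference worth knowing: which() skips NA values, whereas names(FilterGenes)[FilterGenes] returns NA for them.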

Related

Indexing tables of logical vectors with zero counts in R

I have the following:
> v1 <- c(T, F, T, T, F)
> table(v1)
v1
FALSE TRUE
2 3
To index the 'True' column, I do this:
> table(v1)[2]
TRUE
3
However, if a logical vector contains only FALSE values, the table will have only one column, and the previous strategy no longer works to retrieve the TRUE column:
> v2 <- c(F, F, F, F, F)
> table(v2)[2]
<NA>
NA
How can one consistently index the TRUE column regardless of whether its count is zero? One solution is to do this:
> table(factor(v2, levels= c("FALSE", "TRUE")))[2]
TRUE
0
But this feels like cheating because it treats TRUE and FALSE as characters that become levels of a factor. For non-logical vectors, this behaviour is understandable, because there is no way of knowing what levels exist. (1) Is there a way to force table() to take into consideration the fact that logical vectors only take on two values and always present two columns for them? (2) Am I overthinking this and the last command is an acceptable and robust practice?
Convert to a factor with the levels specified so that it always has two levels. Without a TRUE value, table() has no way to create a count for TRUE, because that information is not present in the data. With the factor levels fixed, it reports a TRUE count of 0:
table(factor(v2, levels = c(FALSE, TRUE)))[2]
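Indexing by position ([2]) depends on the order of the levels. A slightly more defensive sketch indexes by the level name "TRUE" instead; count_true is a hypothetical helper name, not part of base R:

```r
v1 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
v2 <- c(FALSE, FALSE, FALSE, FALSE, FALSE)

# Hypothetical helper: fixing the levels guarantees that both FALSE and TRUE
# appear in the table, even when one of them has a count of zero
count_true <- function(v) {
  tab <- table(factor(v, levels = c(FALSE, TRUE)))
  # Index by name rather than by position; as.integer() drops the table class
  as.integer(tab["TRUE"])
}

count_true(v1)
count_true(v2)
```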
It is not clear why the TRUE values of a logical vector need to be counted with table() and then extracted by their TRUE/FALSE names. This can be done more easily with sum(), because TRUE counts as 1 and FALSE as 0; negating with ! swaps them:
> sum(v1)
[1] 3
> sum(!v1)
[1] 2
> sum(v2)
[1] 0
> sum(!v2)
[1] 5
Because the logical case has such specific requirements, I would write a dedicated function:
logitable <- function(x)
{
  x <- as.logical(x)
  kNA <- sum(is.na(x))
  kT <- sum(x, na.rm = TRUE)
  kF <- length(x) - kT - kNA
  return(structure(
    c(kT, kF, kNA),
    names = c("TRUE", "FALSE", "NA")
  ))
}
Please note that the returned object is not of class "table"; let me know if it is important to you that such an object is returned.
Test with:
logitable(c(T,F,T,F,T))
logitable(c(T,T,T,T,T))
logitable(c(F,F,F,F,F))
logitable(c(T,F,T,F,NA))
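If an object of class "table" is needed, one option (a sketch, not the function above) is to reuse table() with fixed levels and useNA = "always", which adds an NA cell even when there are no missing values. logitable2 is a hypothetical name; note that its cells come out in the order FALSE, TRUE, NA rather than TRUE, FALSE, NA:

```r
# Hypothetical variant of logitable that returns a genuine "table" object
logitable2 <- function(x) {
  x <- as.logical(x)
  # Fixed levels make FALSE and TRUE always present; useNA adds an NA count
  table(factor(x, levels = c(FALSE, TRUE)), useNA = "always")
}

res <- logitable2(c(TRUE, FALSE, TRUE, FALSE, NA))
print(res)
class(res)
```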

Structure numbers in a vector R

I am trying to subset rows in R depending on the values in column y, as shown below.
I have the data set "data", which looks like this:
data <- data.frame(y = c(0,0,2000,1500,20,77,88),
a = "bla", b = "bla")
And I would like to end up with this:
I have this R code:
data <- arrange(subset(data, y != 0 & y < 1000 & y !=77 & [...]), desc(y))
print(head(data, n =100))
Which works.
However, I would like to collect the values to exclude in a list, such as:
[0, 1000, 77]
And somehow loop through this, with the lowest possible running time instead of hardcoding them directly in the formula. Any ideas?
The list should only contain the "!=" conditions:
[0, 77]
and the "<" condition should remain in the formula or go in another list.
I'm going to answer your original question because it's more interesting. I hope you won't mind.
Imagine you had values and operators to apply to your data:
my.operators <- c("!=","<","!=")
my.values <- c(0,1000,77)
You can use Map from base R to apply a function to two vectors. Here I'll use get so we can obtain the actual operator given by the character string.
Map(function(x,y)get(y)(data$y,x),my.values,my.operators)
[[1]]
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[[2]]
[1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE
[[3]]
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE
As you can see, we get a list with one logical vector for each value/operator pair.
To better understand what's going on here, consider only the first element of each vector:
get("!=")(data$y,0)
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Now we can combine logical vectors with Reduce and &. For example, applying != with every value in my.values:
Reduce(`&`,lapply(my.values,function(x) data$y!=x))
[1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE
And finally subset the data:
data[Reduce("&",Map(function(x,y)get(y)(data$y,x),my.values,my.operators)),]
y a b
5 20 bla bla
7 88 bla bla
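For the narrower requirement in the question (only the != values in a list, with the < 1000 condition kept in the formula), a vectorized %in% test avoids looping altogether. This is a base-R sketch: exclude is an illustrative name, and order() stands in for dplyr's arrange(desc(y)):

```r
data <- data.frame(y = c(0, 0, 2000, 1500, 20, 77, 88),
                   a = "bla", b = "bla")

# Values that should be excluded with "!="
exclude <- c(0, 77)

# %in% checks every row against all excluded values in one vectorized call
keep <- !(data$y %in% exclude) & data$y < 1000
result <- data[keep, ]

# Sort by y in decreasing order (same idea as arrange(desc(y)))
result <- result[order(result$y, decreasing = TRUE), ]
print(result)
```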

Test if a column matches given values

I'm trying to test a column of my dataset against a set of given values that change dynamically. The values come from a previous calculation and change all the time, so I cannot simply use the ifelse command.
I tried it with a for loop, since it needs to be flexible, but it did not work. An example of my problem is below:
require(dplyr)
data <- data.frame(step=c(1,1,1,1,3,3,3,3,4,4,5,6,7,7,7,7,4,4,4,4,6,5,7,7,3,4,3,1))
data <- mutate(data, col2 = 0)
data <- mutate(data, col3 = 0)
data_check <- data.frame(step=c(3,4))
for(j in 1:length(data_check)){
  for(i in 1:nrow(data)){
    if(data$step[i] == data_check[j]){
      data <- mutate(data, Occurrence = 1)
    } else {
      data <- mutate(data, Occurrence = 0)
    }
  }
}
The goal is to get an additional column 'Occurrence' in the dataset, which tells if any of the given values occur or not.
I'm not sure what you're trying to do, but if you want to test whether each entry in data$step is present in data_check, then something like this:
data_check <- list(3,4) # so you can use the %in% operator
data$step %in% data_check
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
[25] TRUE TRUE TRUE FALSE
data$Occurrence <- data$step %in% data_check
EDIT: As Eumenedies said, you will want to apply as.numeric() to that, to get 1/0 instead of TRUE/FALSE.
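Putting the %in% test and the as.numeric() suggestion together, a complete version could look like the sketch below. It keeps data_check as the question's one-column data frame and compares against its step column, which is one way to avoid converting it to a list:

```r
data <- data.frame(step = c(1,1,1,1,3,3,3,3,4,4,5,6,7,7,7,7,
                            4,4,4,4,6,5,7,7,3,4,3,1))
data_check <- data.frame(step = c(3, 4))

# %in% is vectorized over data$step, so no loop is needed;
# as.numeric() turns TRUE/FALSE into 1/0
data$Occurrence <- as.numeric(data$step %in% data_check$step)
head(data, 10)
```

Because data_check$step is recomputed from whatever the previous calculation produces, the same line works no matter which values it contains.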

Filtering with logical + NA values in one column

I have the following data frame:
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
unique(df$Logical)
length(df$Logical == TRUE)
I'm trying to figure out how many TRUE values I have in my df$Logical column. But it seems I'm missing something: length(df$Logical == TRUE) returns the number of records in the column.
What am I doing wrong in this particular case? The desired result is 2 TRUE values in the df$Logical column. Many thanks in advance.
We need to specify the string in lower case, because the values are 'true'/'false' and not TRUE/FALSE. Also, sum should be used instead of length: sum gets the number of TRUE elements.
sum(df$Logical == "true")
#[1] NA
Because the column contains an NA element, use na.rm = TRUE:
sum(df$Logical=='true', na.rm = TRUE)
#[1] 2
length() of a logical (or any other) vector is just its number of elements, which here equals the number of rows of the dataset:
length(df$Logical == "true")
#[1] 6
because it returns a logical vector of length 6.
df$Logical == "true"
#[1] TRUE NA FALSE TRUE FALSE FALSE
To get the counts of both true and false, we can use table
table(df$Logical)
First of all, "true" and "false" as you put them into your data frame are not Booleans but plain strings.
Moreover, length(df$Logical == TRUE) will always return 6 in this example, i.e. the number of elements in the column. This is because df$Logical == TRUE returns a logical vector with one element per row. In your case it will return
FALSE NA FALSE FALSE FALSE FALSE
because the comparison is never TRUE (and the NA stays NA). However, the length of this vector is still 6, as returned by length().
To overcome the problem, you might define your data frame like this:
df <- data.frame("Logical"=c(TRUE,NA,FALSE,TRUE,NA,FALSE),
"Numeric"=c(1,2,3,4,5,6))
And then you can sum up the number of TRUE
sum(df$Logical == TRUE, na.rm = T)
[1] 2
na.rm = T is important here, because otherwise sum() returns NA if one or more elements are NA.
Alternatively, you can keep working with strings to indicate true or false (with empty strings as missing values).
Then you could write
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
sum(df$Logical == "true", na.rm = T)
[1] 2
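A third option is to keep the original string data and convert it with as.logical(), which maps "true" to TRUE and "false" to FALSE, and turns anything it does not recognise (such as the empty string) into NA. A short sketch:

```r
df <- data.frame("Logical" = c("true", NA, "false", "true", "", "false"),
                 "Numeric" = c(1, 2, 3, 4, 5, 6))

# as.logical() understands "true"/"false"; "" and NA both become NA
df$Logical <- as.logical(df$Logical)
print(df$Logical)

# Summing a logical vector counts its TRUE values; na.rm drops the NAs
n_true <- sum(df$Logical, na.rm = TRUE)
print(n_true)
```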

Remove quotes from Factor variables in R

I have over 500 factor columns in my data frame, many of which contain only "True"/"False". Is there any way to remove the quotes for just these columns in one shot?
Example code --
sample=as.list(dataframe[1,])
for(i in 1:length(sample)){
  if(sample[i]=="false") sample[i]=false
}
The above code doesn't seem to work. Any leads appreciated!
If you give a better example (with some columns to convert, some columns not to convert), I'm happy to test. From your description, I think this will work:
# assigning to data[] keeps the data.frame class (plain lapply() returns a list)
data[] <- lapply(data, FUN = function(x) {
  if (is.factor(x) && all(toupper(levels(x)) %in% c("TRUE", "FALSE"))) {
    return(as.logical(x))
  }
  return(x)
})
It tests whether the column is a factor whose levels can be coerced to TRUE and FALSE; if so, it converts the column to logical, otherwise it returns the column unchanged.
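With 500 columns it can help to see which ones qualify before converting anything. A variant sketch, on a made-up data frame (the column names are hypothetical), first builds a logical selector with vapply() and then converts only the selected columns:

```r
# Hypothetical example: one convertible factor, one other factor, one numeric.
# stringsAsFactors = TRUE is needed in R >= 4.0, where it defaults to FALSE.
data <- data.frame(flag  = c("True", "False", "True"),
                   group = c("a", "b", "a"),
                   value = c(1.5, 2.5, 3.5),
                   stringsAsFactors = TRUE)

# One entry per column: TRUE if the column is a factor whose levels are
# all some capitalisation of TRUE/FALSE
tf_cols <- vapply(data, function(x) {
  is.factor(x) && all(toupper(levels(x)) %in% c("TRUE", "FALSE"))
}, logical(1))
print(names(data)[tf_cols])

# Convert only the selected columns; data stays a data.frame
data[tf_cols] <- lapply(data[tf_cols], as.logical)
str(data)
```

as.logical() only recognises a few spellings (for example "True" but not "tRUE"), so unusual capitalisations would still turn into NA.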
This solves your problem:
> as.logical(c("true", "false", "True", "TRUE", "False"))
[1] TRUE FALSE TRUE TRUE FALSE
I was surprised too.
EDIT: I just noticed your code and I figured you could use a complete example.
Your data is in a data.frame (which is basically a list of columns). This is similar to a spreadsheet if you will.
Doing dataframe[1,] extracts the first row of your dataset. I guess what you want is rather the first column, dataframe[,1]. This column is a vector, which is easy to operate on, so there is no need to put it in a list.
So you would do:
as.logical(dataframe[,1])
But that would only return the data you want, not modify the dataframe! So you want to assign this result to the first column:
dataframe[,1] <- as.logical(dataframe[,1])
There you go: the first column no longer contains strings but logicals, as long as the values use one of the spellings as.logical() recognises ("T", "TRUE", "true", "True" and the matching false forms).
If by any chance you actually meant to work on the row, this is unusual and likely means that you should transpose your data.frame, i.e. swap rows and columns. This is done with t() (note that t() returns a matrix).
I think this is what you want, assuming that the columns you are talking about have two levels, "FALSE" and "TRUE".
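# stringsAsFactors = TRUE keeps these columns as factors in R >= 4.0 (levels() is NULL for character columns)
df = data.frame(a=c("\"true\"","\"false\""), b=c("\"FALSE\"","\"TRUE\""), c=c("TRUE","FALSE"), stringsAsFactors = TRUE)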
df
# a b c
# 1 "true" "FALSE" TRUE
# 2 "false" "TRUE" FALSE
ftlev = c("\"FALSE\"", "\"TRUE\"")
df2 = lapply(df, FUN = function(x) {
  if (identical(ftlev,toupper(levels(x)))) {
    x = gsub('"','',x)
  }
  return(x)
})
as.data.frame(df2)
Output:
a b c
1 true FALSE TRUE
2 false TRUE FALSE
The as.logical() function has been proposed in other answers/comments but it does not produce the expected output:
df2 = lapply(df, FUN = function(x) {
  if (identical(ftlev,toupper(levels(x)))) {
    x = as.logical(x)
  }
  return(x)
})
as.data.frame(df2)
Output:
a b c
1 NA NA TRUE
2 NA NA FALSE
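The two ideas can be combined: strip the literal quote characters first and then call as.logical(), which yields genuine logical columns instead of strings. This sketch reuses the answer's example data (with stringsAsFactors = TRUE for R 4.0 and later) and also treats unquoted TRUE/FALSE factor columns such as c the same way, which is a choice of this sketch rather than of the answer:

```r
# The answer's example data; stringsAsFactors = TRUE so columns are factors in R >= 4.0
df <- data.frame(a = c("\"true\"", "\"false\""),
                 b = c("\"FALSE\"", "\"TRUE\""),
                 c = c("TRUE", "FALSE"),
                 stringsAsFactors = TRUE)

df[] <- lapply(df, FUN = function(x) {
  if (!is.factor(x)) return(x)
  # Compare the levels with any literal quote characters removed
  lev <- toupper(gsub('"', '', levels(x)))
  if (all(lev %in% c("TRUE", "FALSE"))) {
    # Strip the quotes from the values, upper-case them, then convert
    x <- as.logical(toupper(gsub('"', '', x)))
  }
  x
})

str(df)
```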
