I have the following:
> v1 <- c(T, F, T, T, F)
> table(v1)
v1
FALSE TRUE
2 3
To index the TRUE column, I do this:
> table(v1)[2]
TRUE
3
However, if a logical vector contains only FALSE values, the table has only one column and the previous strategy no longer retrieves the TRUE column:
> v2 <- c(F, F, F, F, F)
> table(v2)[2]
<NA>
NA
How can one consistently index the TRUE column regardless of whether its count is zero? One solution is to do this:
> table(factor(v2, levels= c("FALSE", "TRUE")))[2]
TRUE
0
But this feels like cheating because it treats TRUE and FALSE as characters that become levels of a factor. For non-logical vectors, this behaviour is understandable, because there is no way of knowing what levels exist. (1) Is there a way to force table() to take into consideration the fact that logical vectors only take on two values and always present two columns for them? (2) Am I overthinking this and the last command is an acceptable and robust practice?
Convert to a factor with the levels specified so that it always has two levels. Without a TRUE value in the data, table() has no way to create a count for TRUE, because that information is not present. With explicit factor levels, the TRUE count is reported as 0:
table(factor(v2, levels = c(FALSE, TRUE)))[2]
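As a small sketch of the same idea (the variable name tab is only illustrative), the count can also be extracted by name rather than by position, so the result does not depend on the column order:

```r
v2 <- c(FALSE, FALSE, FALSE, FALSE, FALSE)

# Fix both levels so the TRUE column exists even when its count is zero
tab <- table(factor(v2, levels = c(FALSE, TRUE)))

n_true  <- tab[["TRUE"]]   # count of TRUE values, extracted by name
n_false <- tab[["FALSE"]]  # count of FALSE values, extracted by name
print(c(n_true, n_false))
```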
It is not clear why the TRUE values of a logical vector need to be counted with table() and then extracted by the TRUE/FALSE names. It is easier to use sum(), because TRUE counts as 1 and FALSE as 0; negating with ! reverses this:
> sum(v1)
[1] 3
> sum(!v1)
[1] 2
> sum(v2)
[1] 0
> sum(!v2)
[1] 5
Because the case of logical is so specific for the requirements, I would write a specific function:
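logitable <- function(x)
{
  x <- as.logical(x)
  # count the missing values separately
  kNA <- sum(is.na(x))
  # TRUE counts as 1, so the sum is the number of TRUE values
  kT <- sum(x, na.rm=TRUE)
  # everything that is neither TRUE nor NA must be FALSE
  kF <- length(x) - kT - kNA
  return (structure(
    c(kT, kF, kNA),
    names = c("TRUE", "FALSE", "NA")
  ))
}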
Please note that the returned object is not of class "table"; let me know if it is important to you to return such an object.
Test with:
logitable(c(T,F,T,F,T))
logitable(c(T,T,T,T,T))
logitable(c(F,F,F,F,F))
logitable(c(T,F,T,F,NA))
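If an object of class "table" is wanted, one possible variant is sketched below. The name logitable_tbl is hypothetical and not part of the answer above; it assumes that fixing the factor levels and asking table() to always report NA gives the desired three columns.

```r
# Hypothetical variant that returns an object of class "table"
logitable_tbl <- function(x) {
  x <- as.logical(x)
  # fixed levels guarantee TRUE and FALSE columns; useNA adds the NA column
  table(factor(x, levels = c(TRUE, FALSE)), useNA = "always")
}

res <- logitable_tbl(c(TRUE, FALSE, TRUE, FALSE, NA))
print(res)
```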
(Note: following the suggestions in the comments, I have changed the original title "Comparing the content of two vectors in R?" to "Subsetting a logical vector with a logical vector in R")
I am trying to understand the following R code snippet (by the way, the question originated while I was trying to understand this example.)
I have a vector a defined as:
a = c(FALSE, FALSE)
Then I can define b:
b <- a
I check b's content and everything looks OK:
b
#> [1] FALSE FALSE
Question
Now, what is the following code doing? Is it checking if b is equal to "not" a?
b[!a]
#> [1] FALSE FALSE
But if I try b[a] the result is different:
b[a]
#> logical(0)
I also tried a different example:
a = c(FALSE, TRUE)
b <- a
b
#> [1] FALSE TRUE
Now I try the same operations as above, but I get a different result:
b[!a]
#> [1] FALSE
b[a]
#> [1] TRUE
Created on 2021-03-23 by the reprex package (v0.3.0)
[] is used for subsetting a vector. You can subset a vector using an integer index or logical values.
When you use a logical vector to subset a vector, an element is selected if the corresponding logical value is TRUE. In your example you are subsetting a logical vector with a logical vector, which might be confusing. Let's take another example:
a <- c(10, 20)
b <- c(TRUE, FALSE)
a[b]
#[1] 10
Since 1st value is TRUE and second is FALSE, the first value is selected.
Now if we invert the values, 20 would be selected because !b returns FALSE TRUE.
a[!b]
#[1] 20
Now apply the same logic to your example:
a = c(FALSE, FALSE)
b <- a
!a returns TRUE TRUE, hence both values are selected when you do b[!a], and none of the values is selected when you do b[a].
b[!a] displays those values of b that sit at the TRUE positions of !a.
!a is actually T, T, so it displays the first and second values of b, which are F and F.
For a clearer illustration, consider this:
a <- 1:4
b <- c(T, T, F, T)
Now a[!b] is the same as a[c(F, F, T, F)], i.e. only the third element of a.
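A minimal sketch of the same point: logical subsetting keeps the positions where the index is TRUE, which is equivalent to integer subsetting with which(). The names sel_logical, sel_integer and sel_none are only illustrative.

```r
a <- 1:4
b <- c(TRUE, TRUE, FALSE, TRUE)

pos <- which(!b)            # positions where !b is TRUE
sel_logical <- a[!b]        # logical subsetting
sel_integer <- a[pos]       # equivalent integer subsetting

# An index that is FALSE everywhere selects nothing (a zero-length vector)
sel_none <- a[c(FALSE, FALSE, FALSE, FALSE)]

print(identical(sel_logical, sel_integer))
print(length(sel_none))
```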
I want to build a vector of counts. For each value in a column of one data frame (DF1$A), the count should be the number of rows in another data frame (DF2) where one column equals 1 and another column is greater than or equal to that value. I want to store the result as a new column in DF1.
I have something like this:
DF1 <- data.frame(A = c(0.12, 0.29, 0.36, 0.55))
DF2 <- data.frame(
  x = c(0,0,1,0,1,0,1,0,0,1),
  y = c(0.11, 0.55, 0.23,0.33,0.59,0.66,0.88,0.11,0.05,0.90))
I want to make a vector DF1$B:
DF1$B <- sum(DF2$y >= DF1$A & DF2$x == 1)
The problem is that I get a vector with a single value, whereas I want every value of the vector to differ according to a condition that is itself a vector.
Also, I am getting this warning message:
longer object length is not a multiple of shorter object length.
Ones and zeros can serve as logical values, so indexing a numeric vector with them in logical form keeps only the elements corresponding to the TRUE (or 1) positions.
as.logical(x)
# FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
y[as.logical(x)]
# 0.23 0.59 0.88 0.90
sum(y[as.logical(x)])
# 2.6
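If the goal is one count per element of DF1$A (one reading of the question), a sketch along these lines may help. It assumes DF1 and DF2 are data frames with columns named A, x and y, reconstructed from the question, and it evaluates the condition once per threshold instead of comparing two vectors of different lengths.

```r
DF1 <- data.frame(A = c(0.12, 0.29, 0.36, 0.55))
DF2 <- data.frame(
  x = c(0, 0, 1, 0, 1, 0, 1, 0, 0, 1),
  y = c(0.11, 0.55, 0.23, 0.33, 0.59, 0.66, 0.88, 0.11, 0.05, 0.90)
)

# For each threshold in DF1$A, count the rows of DF2 with x == 1 and y >= threshold
DF1$B <- vapply(
  DF1$A,
  function(threshold) sum(DF2$y >= threshold & DF2$x == 1),
  integer(1)
)
print(DF1)
```

Each call compares the length-10 columns against a single number, so no recycling warning is raised.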
Just read the warning message and try it in the console and you will see yourself ...
c(1:4) >= c(1:10)
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In c(1:4) >= c(1:10) :
longer object length is not a multiple of shorter object length
You have to be careful about what you are comparing: length matters.
This should be ok ...
c(1:4) >= c(1:4)
This should be ok as well
c(1:4) >= c(1:8)
or
c(c(1:4),c(1:4)) >= c(1:4)
Sometimes you want to compare one row with many rows, expecting the rows to have the same length. This is why you see that warning.
Length matters, and in your case the length of the longer object is not a multiple of the length of the shorter one.
BTW
The length-related functions are length() for vectors and lists, and nrow(), ncol() and dim() for table-like objects.
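To make the recycling explicit, here is a small sketch (the variable names are only illustrative) that repeats the shorter vector by hand with rep_len() and shows it gives the same result as the automatic recycling that triggers the warning:

```r
short <- 1:4
long  <- 1:10

# What R does implicitly: repeat the shorter vector up to the longer length
recycled   <- rep_len(short, length(long))   # 1 2 3 4 1 2 3 4 1 2
res_manual <- recycled >= long
res_auto   <- suppressWarnings(short >= long)

# The warning appears because 10 is not a multiple of 4
is_multiple <- length(long) %% length(short) == 0

print(identical(res_manual, res_auto))
print(is_multiple)
```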
I have the following data frame:
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
unique(df$Logical)
length(df$Logical == TRUE)
I'm trying to figure out how many TRUE values I have in my df$Logical column. But it seems I'm missing something, because length(df$Logical == TRUE) returns the number of records in the column.
What am I doing wrong in this particular case? The desired result is 2 for the TRUE values in the df$Logical column. Many thanks in advance.
We need to specify the string in lower case, as the values are 'true'/'false' and not exactly TRUE/FALSE. Also, instead of length, sum should be used; sum gives the number of TRUE elements. Because this column contains an NA, the plain sum is NA:
sum(df$Logical == "true")
#[1] NA
Because the column contains NA elements, use na.rm = TRUE
sum(df$Logical=='true', na.rm = TRUE)
#[1] 2
The length of a logical or any other vector would be the same as the original length/number of rows of the dataset.
length(df$Logical == "true")
#[1] 6
because it returns a logical vector of length 6.
df$Logical == "true"
#[1] TRUE NA FALSE TRUE FALSE FALSE
To get the counts of both true and false, we can use table
table(df$Logical)
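As an alternative sketch, %in% never returns NA, so it counts the matches without needing na.rm; table() can also report the missing values when asked via useNA. The names n_true and counts are only illustrative.

```r
df <- data.frame(
  Logical = c("true", NA, "false", "true", "", "false"),
  Numeric = c(1, 2, 3, 4, 5, 6)
)

# %in% treats NA as "no match", so the sum is never NA
n_true <- sum(df$Logical %in% "true")

# Include NA as its own category in the counts
counts <- table(df$Logical, useNA = "ifany")

print(n_true)
print(counts)
```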
First of all, "true" and "false" as you put them into your data frame are not Booleans but plain strings.
Moreover, length(df$Logical == TRUE) will always return 6 in this example, i.e. the number of elements in the column. This is because df$Logical == TRUE returns a sequence of TRUE or FALSE. In your case it will return
FALSE NA FALSE FALSE FALSE FALSE
because the boolean expression is never true. However, the length of this will be 6 as returned by length().
To overcome the problem you might define your data frame like this
df <- data.frame("Logical"=c(TRUE,NA,FALSE,TRUE,NA,FALSE),
"Numeric"=c(1,2,3,4,5,6))
And then you can sum up the number of TRUE
sum(df$Logical == TRUE, na.rm = T)
[1] 2
na.rm = T is important here because otherwise the sum will return NA if one or more elements are NA.
Alternatively, you can work with strings to indicate true or false (and empty strings as NA).
Then you could write
df <- data.frame("Logical"=c("true",NA,"false","true","","false"),
"Numeric"=c(1,2,3,4,5,6))
sum(df$Logical == "true", na.rm = T)
[1] 2
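A third option, sketched here under the assumption that the column holds only the strings shown in the question, is to convert the strings with as.logical(), which maps unrecognised values such as the empty string to NA:

```r
raw <- c("true", NA, "false", "true", "", "false")

# "true"/"false" become TRUE/FALSE; "" and NA both become NA
flags <- as.logical(raw)

n_true <- sum(flags, na.rm = TRUE)
print(flags)
print(n_true)
```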
In R, for the sake of example, I have a list composed of equal-length numeric vectors of form similar to:
list <- list(c(1,2,3),c(1,3,2),c(2,1,3))
[[1]]
[1] 1 2 3
[[2]]
[1] 1 3 2
[[3]]
[1] 2 1 3
...
Every element of the list is unique. I want to get the index number of the element x <- c(2,1,3), or any other particular numeric vector within the list.
I've attempted using match(x,list), which gives a vector full of NA, and which(list==c(1,2,3)), which gives me a "(list) object cannot be coerced to type 'double'" error. Coercing the list to different types didn't seem to make a difference for the which function. I also attempted various grep* functions, but these don't return exact numeric vector matches. Using find(c(1,2,3),list) or even some fancy sapply which %in% type functions didn't give me what I was looking for. I feel like I have a type problem. Any suggestions?
--Update--
Summary of Solutions
Thanks for your replies. The method in the comment for this question is clean and works well (via akrun).
> which(paste(list)==deparse(x))
[1] 25
The next method didn't work correctly
> which(duplicated(c(x, list(y), fromLast = TRUE)))
[1] 49
> y
[1] 1 2 3
This sounds good, but in the next block you can see the problem
> y<-c(1,3,2)
> which(duplicated(c(list, list(y), fromLast = TRUE)))
[1] 49
More fundamentally, there are only 48 elements in the list I was using.
The last method works well (via BondedDust), and I would guess it is more efficient using an apply function:
> which( sapply(list, identical, y ))
[1] 25
match works fine if you pass it the right data.
L <- list(c(1,2,3),c(1,3,2),c(2,1,3))
match(list(c(2,1,3)), L)
#[1] 3
Beware that this works by coercing lists to character, so fringe cases will fail (hat-tip to @nicola):
match(list(1:3),L)
#[1] NA
even though:
1:3 == c(1,2,3)
#[1] TRUE TRUE TRUE
Although arguably:
identical(1:3,c(1,2,3))
#[1] FALSE
identical(1:3,c(1L,2L,3L))
#[1] TRUE
You can use duplicated(). If we add the matching vector to the end of the original list and set fromLast = TRUE, we will find the duplicate(s). Then we can use which() to get the index.
which(duplicated(c(list, list(c(2, 1, 3))), fromLast = TRUE))
# [1] 3
Or you could add it as the first element and subtract 1 from the result.
which(duplicated(c(list(c(2, 1, 3)), list))) - 1L
# [1] 3
Note that the type always matters with this type of comparison. When comparing integers and numerics, you will need to convert doubles to integers for this to run without issue. For example, 1:3 is not the same type as c(1, 2, 3).
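A short sketch of the type issue: the integer vector 1:3 fails to match until it is converted to double, the type stored in the list. The names L, target, idx_raw and idx_converted are only illustrative.

```r
L <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3))
target <- 1:3

print(typeof(target))   # integer
print(typeof(L[[1]]))   # double

idx_raw       <- match(list(target), L)              # types differ, no match
idx_converted <- match(list(as.numeric(target)), L)  # same type, matches

print(idx_raw)
print(idx_converted)
```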
> L <- list(c(1,2,3),c(1,3,2),c(2,1,3))
> sapply(L, identical, c(2,1,3))
[1] FALSE FALSE TRUE
> which( sapply(L, identical, c(2,1,3)) )
[1] 3
This would be slightly less restrictive in its test:
> which( sapply(L, function(x,y){all(x==y)}, c(1:3)) )
[1] 1
Try:
vapply(list,function(z) all(z==x),TRUE)
#[1] FALSE FALSE TRUE
Wrapping the above line in which() gives you the index of the matching list element.
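One caveat with all(z==x) is that vectors of different lengths are silently recycled. A sketch of a small helper that guards against this is shown below; find_vector is a hypothetical name and not part of any answer above.

```r
# Hypothetical helper: index of list elements equal to target, element by element
find_vector <- function(lst, target) {
  hits <- vapply(lst, function(el) {
    # check the length first so that == never recycles
    length(el) == length(target) && all(el == target)
  }, logical(1))
  which(hits)
}

L <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3))
print(find_vector(L, c(2, 1, 3)))
print(find_vector(L, 1:3))     # == ignores the integer/double difference
print(find_vector(L, c(1, 2))) # shorter vector matches nothing
```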
I have over 500 factor columns in my dataframe many of which are only "True"/"False". Is there any way to remove quotes for just these columns in one shot?
Example code --
sample=as.list(dataframe[1,])
for(i in 1:length(sample)){
if(sample[i]=="false") sample[i]=false
}
The above code doesn't seem to work. Any leads appreciated!
If you give a better example (with some columns to convert, some columns not to convert), I'm happy to test. From your description, I think this will work:
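# assigning into data[] keeps data as a data frame instead of a plain list
data[] = lapply(data, FUN = function(x) {
  # && short-circuits, so levels() is only inspected for factor columns
  if (is.factor(x) && all(toupper(levels(x)) %in% c("TRUE", "FALSE"))) {
    return(as.logical(x))
  }
  return(x)
})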
It tests if the column is a factor and if its levels can be coerced to TRUE and FALSE, converts it to logical if yes, returns the column unchanged if no.
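Here is a small self-contained sketch of that approach on made-up data (the columns flag and name are hypothetical). It converts through as.character() first, an extra step not in the answer above, to make the factor-to-logical conversion explicit.

```r
data <- data.frame(
  flag = c("True", "False", "True"),
  name = c("a", "b", "c"),
  stringsAsFactors = TRUE
)

data[] <- lapply(data, function(x) {
  # convert only factor columns whose levels all read as TRUE/FALSE
  if (is.factor(x) && all(toupper(levels(x)) %in% c("TRUE", "FALSE"))) {
    return(as.logical(as.character(x)))
  }
  x
})

print(sapply(data, class))
```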
This solves your problem:
> as.logical(c("true", "false", "True", "TRUE", "False"))
[1] TRUE FALSE TRUE TRUE FALSE
I was surprised too.
EDIT: I just noticed your code and I figured you could use a complete example.
Your data is in a data.frame (which is basically a list of columns). This is similar to a spreadsheet if you will.
Doing dataframe[1,] extracts the first row of your dataset. I guess what you want is rather to get the first column with dataframe[,1]. This column is a vector, which is good to operate on; there is no need to put it in a list.
So you would do:
as.logical(dataframe[,1])
But that would only return the data you want, not modify the dataframe! So you want to assign this result to the first column:
dataframe[,1] <- as.logical(dataframe[,1])
There you go, the first column no longer contains strings but logicals, no matter what the capitalization was.
If by any chance you actually meant to work on the row, this is unusual and likely means that you should transpose your data.frame, i.e. swap rows and columns. This is done with t.
I think this is what you want assuming that the columns you are talking about have two levels - "FALSE" and "TRUE".
df = data.frame(a=c("\"true\"","\"false\""), b=c("\"FALSE\"","\"TRUE\""), c=c("TRUE","FALSE"), stringsAsFactors=TRUE)
df
# a b c
# 1 "true" "FALSE" TRUE
# 2 "false" "TRUE" FALSE
ftlev = c("\"FALSE\"", "\"TRUE\"")
df2 = lapply(df, FUN = function(x) {
  if (identical(ftlev,toupper(levels(x)))) {
    x = gsub('"','',x)
  }
  return(x)
})
as.data.frame(df2)
Output:
a b c
1 true FALSE TRUE
2 false TRUE FALSE
The as.logical() function has been proposed in other answers/comments but it does not produce the expected output:
df2 = lapply(df, FUN = function(x) {
  if (identical(ftlev,toupper(levels(x)))) {
    x = as.logical(x)
  }
  return(x)
})
as.data.frame(df2)
Output:
a b c
1 NA NA TRUE
2 NA NA FALSE
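If actual logical columns are wanted, the two steps can be combined: strip the embedded quotes first, then convert. This is only a sketch on a reduced version of the example data, and it assumes every column is meant to be converted.

```r
df <- data.frame(
  a = c("\"true\"", "\"false\""),
  c = c("TRUE", "FALSE"),
  stringsAsFactors = TRUE
)

df[] <- lapply(df, function(x) {
  # remove literal double quotes, then let as.logical() parse the strings
  as.logical(gsub('"', '', as.character(x), fixed = TRUE))
})

print(df)
```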