I'm trying to test whether the values in a column of my dataset match a set of given values. These values come from a previous calculation and change all the time, so hard-coding them in an ifelse command is not an option.
I tried a for-loop, since it needs to be flexible, but it did not work. An example of my problem is below:
require(dplyr)
data <- data.frame(step=c(1,1,1,1,3,3,3,3,4,4,5,6,7,7,7,7,4,4,4,4,6,5,7,7,3,4,3,1))
data <- mutate(data, col2 = 0)
data <- mutate(data, col3 = 0)
data_check <- data.frame(step=c(3,4))
for(j in 1:length(data_check)){
for(i in 1:nrow(data)){
if(data$step[i] == data_check[j]){
data <- mutate(data, Occurrence = 1)
} else {
data <- mutate(data, Occurrence = 0)
}
}
}
The goal is to get an additional column 'Occurrence' in the dataset, which tells if any of the given values occur or not.
I can't understand what you're trying to do, but if you're trying to test if each entry in data$step is present in data_check or not, then something like:
data_check <- list(3,4) # so you can use the %in% operator
data$Occurrence <- data$step %in% data_check
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
[25] TRUE TRUE TRUE FALSE
EDIT: and as Eumenedies said, you want to apply as.numeric() to that.
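Putting both steps together, a minimal sketch might look like the following. It assumes the check values stay in the data_check data frame from the question, so they are read from data_check$step rather than from a list:

```r
data <- data.frame(step = c(1,1,1,1,3,3,3,3,4,4,5,6,7,7,7,7,4,4,4,4,6,5,7,7,3,4,3,1))
data_check <- data.frame(step = c(3, 4))

# %in% is vectorised, so no loop is needed;
# as.numeric() turns TRUE/FALSE into 1/0
data$Occurrence <- as.numeric(data$step %in% data_check$step)
head(data, 6)
```

Because the check values are looked up at run time, nothing has to change when data_check is recalculated.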
I am trying to subset rows in R depending on the values in column y, as shown in the following:
I have the data set "data" which is like this:
data <- data.frame(y = c(0,0,2000,1500,20,77,88),
a = "bla", b = "bla")
And would end up with this:
I have this R code:
data <- arrange(subset(data, y != 0 & y < 1000 & y !=77 & [...]), desc(y))
print(head(data, n =100))
Which works.
However I would like to collect the values to exclude in a list as:
[0, 1000, 77]
And somehow loop through this, with the lowest possible running time instead of hardcoding them directly in the formula. Any ideas?
The list, should only contain "!=" operations:
[0, 77]
and the "<" should remain in the formula or in another list.
I'm going to answer your original question because it's more interesting. I hope you won't mind.
Imagine you had values and operators to apply to your data:
my.operators <- c("!=","<","!=")
my.values <- c(0,1000,77)
You can use Map from base R to apply a function to two vectors. Here I'll use get so we can obtain the actual operator given by the character string.
Map(function(x,y)get(y)(data$y,x),my.values,my.operators)
[[1]]
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[[2]]
[1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE
[[3]]
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE
As you can see, we get a list of logical vectors for each value, operator pair.
To better understand what's going on here, consider only the first value of each vector:
get("!=")(data$y,0)
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Now we can use Reduce to combine those logical vectors with &:
Reduce(`&`,Map(function(x,y)get(y)(data$y,x),my.values,my.operators))
[1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE
And finally subset the data:
data[Reduce("&",Map(function(x,y)get(y)(data$y,x),my.values,my.operators)),]
y a b
5 20 bla bla
7 88 bla bla
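For the edited version of the question, where the list should only hold the values to exclude, a vectorised sketch with %in% may be enough. The names exclude, upper, keep and result are made up for this illustration, and base order() stands in for dplyr's arrange(desc(y)):

```r
data <- data.frame(y = c(0, 0, 2000, 1500, 20, 77, 88), a = "bla", b = "bla")

exclude <- c(0, 77)   # values to drop (the "!=" conditions)
upper   <- 1000       # threshold kept as a separate "<" condition

# keep rows whose y is not in the exclusion list and is below the threshold
keep   <- !(data$y %in% exclude) & data$y < upper
result <- data[keep, ]

# sort in descending order of y
result <- result[order(-result$y), ]
result
```

Since %in% checks all excluded values in one pass, no explicit loop over the list is needed.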
EDITED:
In R, I'm trying to generate a data frame of logicals that tells me, for every value that is TRUE, whether the same row in the previous column is also TRUE. The columns represent time points, and I want to know, for any row that is TRUE, whether it is the first instance of that row being TRUE. Note: I only need it to look back one time point (column). If it was TRUE three columns ago, but not in the previous one, it still counts as a new instance.
example data frame:
T1<- c(TRUE, TRUE, FALSE)
T2<- c(FALSE, TRUE, FALSE)
T3<- c(TRUE, FALSE, TRUE)
df<- data.frame(cbind(T1,T2,T3))
df
looks like:
T1 T2 T3
1 TRUE FALSE TRUE
2 TRUE TRUE FALSE
3 FALSE FALSE TRUE
Since I'm asking about the previous column, I need to add a null column at the beginning:
df_w_null<-cbind("null_col"= logical(nrow(df)), df)
df_w_null
looks like:
null_col T1 T2 T3
1 FALSE TRUE FALSE TRUE
2 FALSE TRUE TRUE FALSE
3 FALSE FALSE FALSE TRUE
For each row, where TRUE, is it the first instance of TRUE? (Is the previous column TRUE? If yes, it's not a new instance, so print FALSE.)
for (i in 2:ncol(df_w_null)){
status[i]<- as.data.frame(apply((!df_w_null[,i, drop=FALSE] == df_w_null[,i-1, drop=FALSE]), 1, isTRUE))
status<- data.frame(status)
return(status)
}
looks like:
status[,2:ncol(df_w_null)]
1 TRUE TRUE TRUE
2 TRUE FALSE TRUE
3 FALSE FALSE TRUE
#expected result:
1 TRUE FALSE TRUE
2 TRUE FALSE FALSE
3 FALSE FALSE TRUE
There are a lot of little steps going on here. First, the data.frame gets split up into pairs of columns, then those pairs of columns are checked to see whether they meet the requirement of FALSE then TRUE, and then the resulting logical vectors are reassembled into a final data.frame.
as.data.frame(do.call(cbind, lapply(setNames(lapply(2:ncol(df_w_null), function(x) data.frame(df_w_null[x-1], df_w_null[x])), names(df_w_null)[-1]),
function(x) ifelse(x[,1] == F & x[,2] == T, T, F))))
T1 T2 T3
1 TRUE FALSE TRUE
2 TRUE FALSE FALSE
3 FALSE FALSE TRUE
Here's a data frame with all values in the first column FALSE
df1 <- cbind(FALSE, df)
You would like a TRUE value whenever the column i is not TRUE (we're not interested in the last column, so !df1[, -ncol(df1)]) AND the column i + 1 is TRUE (we're not interested in the first column, so df1[, -1]). We have
> (!df1[, -ncol(df1)]) & (df1[, -1])
T1 T2 T3
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] FALSE FALSE TRUE
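To get the result back as a data frame with the original column names, one possible sketch works on a logical matrix instead. The names m, prev and first_instance are only illustrative:

```r
T1 <- c(TRUE, TRUE, FALSE)
T2 <- c(FALSE, TRUE, FALSE)
T3 <- c(TRUE, FALSE, TRUE)
df <- data.frame(T1, T2, T3)

m <- as.matrix(df)

# value at the previous time point: a FALSE column, then all but the last column
prev <- cbind(FALSE, m[, -ncol(m), drop = FALSE])

# TRUE only where the current value is TRUE and the previous one was not
first_instance <- as.data.frame(m & !prev)
first_instance
```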
1) I need to intersect two vectors and return a vector of the same size containing the intersected values.
intersect() does not return a vector with the same size.
2) Also why does this return c(TRUE TRUE TRUE) and not c(FALSE TRUE TRUE) ?
set1 = c(TRUE,FALSE,TRUE)
set2 = c(FALSE,FALSE,TRUE)
testset = set1 %in% set2
> print(testset)
[1] TRUE TRUE TRUE
I got as result TRUE TRUE TRUE and I need FALSE FALSE TRUE.
To do the intersection, you need to use the & operator, as follows:
testset = set1 & set2
This will give you the following result: FALSE FALSE TRUE
Hope it helps.
A %in% B checks for every element in A whether that element is in B. The result always has the same length as A. Try e.g.
1:3 %in% 1:9
1:9 %in% 1:3
I think what you want is this:
set1 == set2
[1] FALSE TRUE TRUE
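To see the difference between the three operators side by side, here is a small sketch using the vectors from the question:

```r
set1 <- c(TRUE, FALSE, TRUE)
set2 <- c(FALSE, FALSE, TRUE)

# membership: is each element of set1 found anywhere in set2?
set1 %in% set2   # TRUE TRUE TRUE (set2 contains both TRUE and FALSE)

# element-wise AND: TRUE only where both vectors are TRUE
set1 & set2      # FALSE FALSE TRUE

# element-wise equality: TRUE where the two values agree
set1 == set2     # FALSE TRUE TRUE
```

Which of the last two is "the intersection" depends on whether two FALSE values should count as a match.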
Looking for a better way: How can I make R check the values of a flexible subset of multiple columns element-wise (let's say Var2 and Var3 here) and write the result of the check to a new logical column?
Is there a shorter, more elegant way than using row-wise apply() here?
df <- read.csv(
text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)
criticalColumns <- c("Var2", "Var3")
df$criticalColumnsAreEmpty <-
apply(df[, criticalColumns], 1, function(curRow) {
return(all(curRow == ""))
})
I could also do this explicitly, but then it is not flexible:
df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""
Desired output:
   Var1 Var2 Var3 criticalColumnsAreEmpty
1                                    TRUE
2               a                   FALSE
3          a                        FALSE
4     a    a    a                   FALSE
5     a         a                   FALSE
6          a                        FALSE
7                                    TRUE
8               a                   FALSE
9          a                        FALSE
10              a                   FALSE
We can use rowSums on the logical matrix
df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
Another option (for big datasets, to avoid the memory cost of converting to a matrix) is to loop over the columns, check whether the elements are blank, and use Reduce with &
Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
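As a quick sanity check, the sketch below rebuilds the example data directly (instead of via read.csv, which is an assumption made to keep it short) and confirms that both approaches agree. viaRowSums and viaReduce are illustrative names:

```r
df <- data.frame(
  Var1 = c("", "", "", "a", "a", "", "", "", "", ""),
  Var2 = c("", "", "a", "a", "", "a", "", "", "a", ""),
  Var3 = c("", "a", "", "a", "a", "", "", "a", "", "a"),
  stringsAsFactors = FALSE
)
criticalColumns <- c("Var2", "Var3")

# approach 1: count non-empty cells per row on the logical matrix
viaRowSums <- rowSums(df[criticalColumns] != "") == 0

# approach 2: column-wise check, combined with & (no matrix conversion)
viaReduce <- Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(x)))

# unname() drops the row names that rowSums() carries along
identical(unname(viaRowSums), viaReduce)
```

Changing criticalColumns is all that is needed to check a different subset of columns.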
I have a dataset in R, which contains the results of a rapid diagnostic test. The test has a visible line if it is working properly (control line) and a visible line for each of the two parasite species it detects, if they are present in the patient sample.
The dataset contains a logical column for each test line, as follows:
(database is called RDTbase)
Control Pf Pv
1. TRUE TRUE FALSE
2. TRUE FALSE TRUE
3. FALSE FALSE FALSE
4. TRUE TRUE TRUE
5. TRUE FALSE FALSE
I would like to add a new column which contains a single result for each rapid test. The results are designated according to the different logical conditions met by the three lines. For the example above the new column would look like this:
Control Pf Pv Result
1. TRUE TRUE FALSE Pf
2. TRUE FALSE TRUE Pv
3. FALSE FALSE FALSE Invalid
4. TRUE TRUE TRUE Mixed
5. TRUE FALSE FALSE Negative
I am able to create the new column, but it takes a lot of coding and I think there has to be a much simpler (and shorter) way to do this.
Here is my current (long) method:
R.Pf <- RDTbase[which(Control == "TRUE" & Pf == "TRUE" & Pv == "FALSE"),]
R.Pv <- RDTbase[which(Control == "TRUE" & Pf == "FALSE" & Pv == "TRUE"),]
R.inv <- RDTbase[which(Control == "FALSE"),]
R.mix <- RDTbase[which(Control == "TRUE" & Pf == "TRUE" & Pv == "TRUE"),]
R.neg <- RDTbase[which(Control == "TRUE" & Pf == "FALSE" & Pv == "FALSE"),]
R.Pf$Result <- c("Pf")
R.Pv$Result <- c("Pv")
R.inv$Result <- c("Invalid")
R.mix$Result <- c("Mixed")
R.neg$Result <- c("Negative")
RDTbase2 <- rbind(R.Pf, R.Pv, R.inv, R.mix, R.neg)
Any ideas on how to simplify and shorten this code would be greatly appreciated, as I have to do this kind of thing to my databases a lot.
Many thanks,
Amy
I would simply create another column of the data frame and assign to different subsets of it conditionally. You can also slim down the data frame indexing code.
RDTbase$Result = NA
RDTbase <- within(RDTbase, Result[Control=="TRUE" & Pf=="TRUE" & Pv=="FALSE"] <- "Pf")
RDTbase <- within(RDTbase, Result[Control=="FALSE"] <- "Invalid")
etc.
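Spelled out for all five outcomes, that approach could look roughly like this. The sketch assumes the three columns are stored as real logicals, and it rebuilds the five example rows from the question:

```r
RDTbase <- data.frame(
  Control = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  Pf      = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  Pv      = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

RDTbase$Result <- NA_character_
RDTbase <- within(RDTbase, {
  Result[!Control]            <- "Invalid"   # no control line: test failed
  Result[Control & !Pf & !Pv] <- "Negative"
  Result[Control &  Pf & !Pv] <- "Pf"
  Result[Control & !Pf &  Pv] <- "Pv"
  Result[Control &  Pf &  Pv] <- "Mixed"
})
RDTbase
```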
"within" just saves a little typing.
First of all, it would be better to use logical vectors instead of character ones; then you could write Control instead of Control == "TRUE" and !Control instead of Control == "FALSE", and your code would be shorter.
For your problem I would use several nested ifelse calls:
RDTbase$Result <- with(RDTbase, ifelse(
  Control == "TRUE",
  ifelse(
    Pf == "TRUE",
    ifelse(Pv == "TRUE","Mixed","Pf"), # when Control is TRUE, Pf is TRUE
    ifelse(Pv == "TRUE","Pv","Negative") # when Control is TRUE, Pf is FALSE
  ),
  "Invalid" # when Control is FALSE
))
But I like magic tricks, so you could also do the following:
num_code <- with(RDTbase,
  as.numeric(as.logical(Control))
  + 2*as.numeric(as.logical(Pf))
  + 4*as.numeric(as.logical(Pv))
) # values are 0,1,2,...,7
# then
RDTbase$Result <- c(
  "Invalid" , # 0 = F,F,F # Control, Pf, Pv
  "Negative", # 1 = T,F,F
  "Invalid" , # 2 = F,T,F
  "Pf" , # 3 = T,T,F
  "Invalid" , # 4 = F,F,T
  "Pv" , # 5 = T,F,T
  "Invalid" , # 6 = F,T,T
  "Mixed" # 7 = T,T,T
)[num_code+1]
It's a nice trick when you need to decode several logical columns into a character column.
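As a worked example of the same trick, here is a sketch on the five example rows from the question, assuming the columns are stored as logicals so that the as.logical() conversion can be dropped. The name labels is only illustrative:

```r
RDTbase <- data.frame(
  Control = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  Pf      = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  Pv      = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# encode each row as a number 0..7: Control counts 1, Pf counts 2, Pv counts 4
num_code <- with(RDTbase, Control + 2 * Pf + 4 * Pv)

# lookup table indexed by num_code + 1 (R vectors are 1-based)
labels <- c("Invalid", "Negative", "Invalid", "Pf",
            "Invalid", "Pv", "Invalid", "Mixed")
RDTbase$Result <- labels[num_code + 1]
RDTbase
```

For instance, the first row (TRUE, TRUE, FALSE) gives 1 + 2 + 0 = 3, and labels[4] is "Pf".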
Using transform makes this compact and elegant:
transform(a, Result =
ifelse(Control,
ifelse(Pf,
ifelse(Pv, "Mixed", "Pf"),
ifelse(Pv, "Pv", "Negative")),
"Invalid"))
Yields
Control Pf Pv Result
1 TRUE TRUE FALSE Pf
2 TRUE FALSE TRUE Pv
3 FALSE FALSE FALSE Invalid
4 TRUE TRUE TRUE Mixed
5 TRUE FALSE FALSE Negative
Alternatively, building on Marek's version we can use logical vectors to calculate the index slightly more compactly:
a$Result = apply(a, 1,
  function(x) {
    # Control counts 4, Pf 2, Pv 1, so rows without a control line map to 1..4
    c(rep("Invalid", 4), "Negative", "Pv", "Pf", "Mixed")[1 + sum(c(4, 2, 1)[x])]
  })