Matching multiple column criteria in an R dataframe from integer vectors

I have a large dataframe, populated with 1's and 0's.
I have two integer vectors, "a" and "b", which refer to specific columns in the dataframe. No column index in "a" exists in "b", and vice versa (i.e. they do not intersect).
What I'm trying to do is generate a new column containing a flag when:
ANY of the columns in "a" are 1 (on a given row) and
ALL of the columns in "b" are 0 (on the same row)
I'm trying to do this by:
processed.tbl$flag <- ifelse(processed.tbl[, a] == 1 & processed.tbl[, b] ==0,
1, 0)
but I get a "non-conformable arrays" error, presumably because it's trying to combine two table subsets with different numbers of columns. How do I do this correctly (in base R ideally)?
Thanks.

Okay, I think I've found a way to do this, but please do add a slicker way if there is one!
processed.tbl$flag <- 0  # start with every row unflagged
processed.tbl$flag[
  apply(processed.tbl[, b, drop = FALSE], MARGIN = 1, function(x) all(x == 0)) &
  apply(processed.tbl[, a, drop = FALSE], MARGIN = 1, function(y) any(y == 1))
] <- 1
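A slicker base R version replaces the two apply() calls with rowSums() on logical comparisons, which work on the whole subset at once. This is a sketch on small hypothetical data: processed.tbl, a and b stand in for the question's objects, and drop = FALSE keeps each subset a data frame even when a or b has length 1.

```r
# Hypothetical example data: 0/1 columns; `a` and `b` are column indices
processed.tbl <- data.frame(c1 = c(1, 0, 0, 1),
                            c2 = c(0, 0, 1, 1),
                            c3 = c(0, 1, 0, 0),
                            c4 = c(0, 0, 0, 1))
a <- 1:2  # ANY of these columns must be 1
b <- 3:4  # ALL of these columns must be 0

# number of 1s among the `a` columns, per row
any_a <- rowSums(processed.tbl[, a, drop = FALSE] == 1) > 0
# number of 0s among the `b` columns must equal the number of `b` columns
all_b <- rowSums(processed.tbl[, b, drop = FALSE] == 0) == length(b)

processed.tbl$flag <- as.integer(any_a & all_b)
processed.tbl$flag  # 1 0 1 0
```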

Related

How to get rows where a condition on a range of columns is met?

I hope I am formatting my question correctly, so here is an example.
Let's say I have a dataframe that looks like this:
df <- data.frame(name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
number = c(0,1,2,3,1,0,2,1),
number2different = c(0,1,1,1,2,1,1,0),
number3differentname = c(0,1,0,1,2,0,1,2))
If I wanted to select, or create a subset of, the rows from the name column where all the other columns have a value of 0 (for example sai, whose other values all equal 0), which command would work best?
I tried searching for how to select rows based on columns, but the conditions are always on a single column. Is there a way to select rows based on conditions that must be met across a range of columns?
Thanks a lot for your help, I really appreciate it.
You could use the tidyverse (dplyr) filter() with multiple conditions combined using "&":
library(dplyr)
df %>% filter(number == 0 & number2different == 0 & number3differentname == 0)
# Sample Data
df <- data.frame(name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
number = c(0,1,2,3,1,0,2,1),
number2different = c(0,1,1,1,2,1,1,0),
number3differentname = c(0,1,0,1,2,0,1,2))
# Values of the 'name' column for rows where all other columns equal zero
df$name[rowSums(df[names(df) != "name"]) == 0]
# Rows where the other columns sum to zero (equivalent to "all zero" here because the values are non-negative)
df[rowSums(df[names(df) != "name"]) == 0,]
# Rows where every other column falls within the range 1 to 2
df[rowSums(sapply(df[names(df) != "name"], function(x) x >= 1 & x <= 2)) == ncol(df[names(df) != "name"]),]
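The rowSums() patterns above generalise to any per-column condition. Below is a sketch with a hypothetical helper, rows_where_all(), that applies a vectorised predicate to the chosen columns and keeps rows where it holds in every one of them; unlike the sum-to-zero test, it does not rely on the values being non-negative.

```r
df <- data.frame(name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
                 number = c(0,1,2,3,1,0,2,1),
                 number2different = c(0,1,1,1,2,1,1,0),
                 number3differentname = c(0,1,0,1,2,0,1,2))

# Hypothetical helper: keep rows where `pred` is TRUE in every column of `cols`.
# `pred` must be vectorised (e.g. function(x) x == 0) so it returns a logical matrix.
rows_where_all <- function(data, cols, pred) {
  hits <- pred(as.matrix(data[cols]))
  data[rowSums(hits) == length(cols), , drop = FALSE]
}

value_cols <- setdiff(names(df), "name")
rows_where_all(df, value_cols, function(x) x == 0)$name             # "sai"
rows_where_all(df, value_cols, function(x) x >= 1 & x <= 2)$name    # "ram" "kumar" "Don"
```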

R loop to list all the values of a variable

As an exercise, I have to create a vector of all the different values of a variable (a$dep).
This vector can be created with the code: unique(a$dep)
I need to create this vector using a for loop.
I wrote a loop that doesn't give the right result, but I don't understand where the problem is:
v<-vector()
for (i in seq_along(a$dep)){
v<-ifelse(a$dep[i] %in% v, v,c(v,a$dep[i]))
}
Thank you very much for your help !
Based on the description, if we only need the unique values, an if condition is sufficient: loop over the sequence of the 'dep' column and, if the element is not (!) %in% 'v', append it to 'v', updating 'v' by assignment (<-).
v <- vector()
for(i in seq_along(a$dep)) {if(!a$dep[i] %in% v) v <- c(v, a$dep[i])}
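The original loop fails because ifelse returns a result with the same length as its test argument. Here the test (a$dep[i] %in% v) always has length 1, so on every iteration ifelse returns only the first element of 'v' or of c(v, a$dep[i]), and 'v' is overwritten with a single value instead of growing.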
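A quick way to see the truncation is to call ifelse() with a length-1 test and longer yes/no vectors; only the first element of the chosen branch survives:

```r
v <- c("a", "b")
# test has length 1, so the result has length 1 as well
res <- ifelse(FALSE, v, c(v, "c"))
res          # "a"
length(res)  # 1
```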
One option with ifelse is to initialise 'v' with the same length as the 'dep' column, then use ifelse to check whether each 'dep' element is already %in% 'v' (TRUE/FALSE, length 1); if it is, return a blank ("", length 1), otherwise return the element of 'dep' (a$dep[i], length 1).
v <- character(nrow(a))
for(i in seq_along(a$dep)) v[i] <- ifelse(a$dep[i] %in% v, "", a$dep[i])
Then remove the blank elements:
v[v != ""]
#[1] "a" "b" "c" "e"
ifelse is useful as a vectorised function; calling it on one element at a time inside a loop is not optimal here.
data
a <- data.frame(dep = c('a', 'b', 'a', 'c', 'e', 'a'))

Find duplicate in data frame and change identified value

I am stuck on what is probably a silly, easy-to-solve issue.
I have a trigger that codes 1 while a computer key is pressed and 0 when the key is released. I need to identify each trigger start and stop (i.e., the first and last 1 of each run) and replace the 1s in between with 0. The recorded data are time (continuous, t below) and value (electrodermal activity, value). To process the data more quickly, I need to preprocess it, that is, identify the 1s corresponding to the beginning and the end of each window of interest.
Here is an example of the code:
t <- seq(0.1,10,0.1)
value <- rnorm(length(t), mean=1, sd=2)
trig <- c(rep(0,20),rep(c(rep(1,10), rep(0,10)),4))
id <- 1:length(t)
The expected output is:
trig_result <- c(rep(0,20), rep(c(1, rep(0,8),1,rep(0,10)),4)); length(trig_result)
Using duplicated() only identifies the first 1 and the last one, not the intermediate values. I have seen similar posts, but none of them solves the identification issue.
I looked into dplyr functions, but I cannot figure out how to replace the 1s with 0s to finish the preprocessing.
Your help will be greatly appreciated.
Sincerely yours,
Here's a base R solution with rle and cumsum:
result <- rep(0,length(trig))
result[head(cumsum(rle(trig)$lengths)+c(1,0),-1)] <- 1
all.equal(result,trig_result)
#[1] TRUE
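Note that this solution assumes the data begin and end with 0. Because the number of runs is then odd, recycling c(1,0) also triggers a "longer object length is not a multiple of shorter object length" warning, which is harmless here.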
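If the data might start or end with a 1, one option is to compute the first and last index of every run explicitly from rle(). This is a variant sketch, not the answer's exact code; mark_run_edges is a hypothetical helper name, and a run of length 1 is marked once.

```r
# Hypothetical helper: put a 1 at the first and last position of every run of `b`
mark_run_edges <- function(x, b = 1) {
  r <- rle(x)
  ends <- cumsum(r$lengths)        # last index of each run
  starts <- ends - r$lengths + 1   # first index of each run
  is_b <- r$values == b            # which runs consist of `b`
  out <- numeric(length(x))
  out[c(starts[is_b], ends[is_b])] <- 1
  out
}

trig <- c(rep(0,20), rep(c(rep(1,10), rep(0,10)), 4))
trig_result <- c(rep(0,20), rep(c(1, rep(0,8), 1, rep(0,10)), 4))
identical(mark_run_edges(trig), trig_result)  # TRUE
mark_run_edges(c(1, 1, 0, 1))                 # 1 1 0 1
```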
Here is another base R solution, using logical vectors.
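borders <- function(x, b = 1){
  n <- length(x)
  # d1: TRUE where a run of b starts (first element, or the value changes to b)
  d1 <- c(x[1] == b, diff(x) != 0 & x[-1] == b)
  # d2: TRUE where a run of b ends (last element, or the next value differs)
  d2 <- c(rev(diff(rev(x)) != 0 & rev(x[-n]) == b), x[n] == b)
  # use | rather than + so that a run of length 1 is marked 1, not 2
  as.integer(d1 | d2)
}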
trig <- c(rep(0,20),rep(c(rep(1,10), rep(0,10)),4))
tr <- borders(trig)
The result is not identical() to the expected output because its class is different (integer rather than numeric), but the values are all.equal().
trig_result <- c(rep(0,20), rep(c(1, rep(0,8),1,rep(0,10)),4))
identical(trig_result, tr) # FALSE
all.equal(trig_result, tr) # TRUE
class(trig_result)
#[1] "numeric"
class(tr)
#[1] "integer"
One option is to create a grouping index with rle or rleid (from data.table)
library(data.table)
out <- ave(trig, rleid(trig), FUN = function(x)
x == 1 & (!duplicated(x) | !duplicated(x, fromLast = TRUE)))
identical(trig_result, out)
#[1] TRUE
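Without data.table, a run index equivalent to rleid() can be built in base R by starting a new group wherever the value changes, i.e. cumsum(c(TRUE, diff(trig) != 0)). A minimal sketch reusing the same ave() call:

```r
trig <- c(rep(0,20), rep(c(rep(1,10), rep(0,10)), 4))
trig_result <- c(rep(0,20), rep(c(1, rep(0,8), 1, rep(0,10)), 4))

# Base R stand-in for rleid(): the group id increases by 1 whenever the value changes
grp <- cumsum(c(TRUE, diff(trig) != 0))

# within each run, keep a 1 only at its first or last position
out <- ave(trig, grp, FUN = function(x)
  x == 1 & (!duplicated(x) | !duplicated(x, fromLast = TRUE)))
identical(trig_result, out)  # TRUE
```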
You'd like to find the starts and ends of runs of 1s, and remove all 1s that aren't the start or end of a run.
The start of a run of ones is where the value of the current row is a 1, and the value of the previous row is a 0. You can access the value of previous row using the lag function.
The end of a run of 1s is where the current row is a 1, and the next row is a zero. You can access the value of the next row using the lead function.
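library(tidyverse)
result = tibble(Trig = trig) %>%
  # default = TRUE treats the rows before the first and after the last
  # observation as 0, so runs touching either end are still detected
  mutate(StartOfRun = Trig == 1 & lag(Trig == 0, default = TRUE),
         EndOfRun = Trig == 1 & lead(Trig == 0, default = TRUE),
         Result = ifelse(StartOfRun | EndOfRun, 1, 0)) %>%
  pull(Result)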
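The same start/end logic can be written in base R by shifting the vector by hand instead of calling lag() and lead(). In this sketch, padding with 0 treats positions outside the series as 0, so runs touching either end are still detected:

```r
trig <- c(rep(0,20), rep(c(rep(1,10), rep(0,10)), 4))
trig_result <- c(rep(0,20), rep(c(1, rep(0,8), 1, rep(0,10)), 4))

prev <- c(0, head(trig, -1))  # previous row's value (0 before the first row)
nxt  <- c(tail(trig, -1), 0)  # next row's value (0 after the last row)

# a 1 is kept if it starts a run (previous is 0) or ends one (next is 0)
result <- as.numeric(trig == 1 & (prev == 0 | nxt == 0))
identical(result, trig_result)  # TRUE
```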

Check if a column in a dataframe is of the same value

This is a follow-up question to this one. What I would like to check is whether any column in a data frame contains the same value (numeric or string) in all rows. For example,
sample <- data.frame(col1=c(1, 1, 1), col2=c("a", "a", "a"), col3=c(12, 15, 22))
The purpose is to inspect each column in a data frame to see which columns do not have identical entries in all rows. How can I do this, given that there are both numbers and strings?
My expected output would be a vector containing the column number which has non-identical entries.
We can use apply column-wise (MARGIN = 2), count the unique values in each column, and select the columns whose number of unique values is not equal to 1.
which(apply(sample, 2, function(x) length(unique(x))) != 1)
#col3
# 3
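The same can be done with sapply (or lapply); these iterate over the columns directly and keep each column's type, whereas apply first converts the data frame to a character matrix: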
which(sapply(sample, function(x) length(unique(x))) != 1)
#col3
# 3
A dplyr version could be
library(dplyr)
# funs() is deprecated; across() and where() need dplyr >= 1.0.0
sample %>%
  summarise(across(everything(), n_distinct)) %>%
  select(where(~ .x != 1))
# col3
#1 3
We can use Filter
names(Filter(function(x) length(unique(x)) != 1, sample))
#[1] "col3"
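Since the question asks for column numbers rather than names, one base R sketch uses vapply(), which iterates over the columns (keeping each column's type) and guarantees one logical result per column; unname(which(...)) then gives the bare positions. The variable names are illustrative.

```r
sample <- data.frame(col1 = c(1, 1, 1), col2 = c("a", "a", "a"), col3 = c(12, 15, 22))

# TRUE for columns that contain more than one distinct value
is_non_constant <- vapply(sample, function(x) length(unique(x)) != 1, logical(1))
non_constant_cols <- unname(which(is_non_constant))
non_constant_cols  # 3
```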

How to get matrix column number based on a predetermined value, by row

I am new to coding and am using R 3.4.0 on Windows 10. I have a matrix with 64 columns and 17000 rows. In each row, the columns contain the numbers 1 to 64, with no duplicates or missing values. I want to search each row for the value 1 across all columns and return the name or number of the column that contains it.
Here is what I have tried:
LargeVector1 <- c(1)
which(apply(matrix, 1, function(x) any(x == LargeVector1)))
This returns the row number instead of the column number
I also tried this to try and return the col names:
colnames(matrix)[apply(matrix, 1, function(x) any(x == LargeVector1))]
This is returning all NA's. Any help would be greatly appreciated.
res <- apply(matrix, 1, function(x) {
  # name of the column holding the 1, or NA if the row has no 1
  if (1 %in% x) colnames(matrix)[which(x == 1)] else NA
})
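This gives a vector with one element per row: the name of the column containing the 1, or NA for rows with no 1 in them. It assumes the matrix has column names; return which(x == 1) instead to get column numbers.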
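For a large matrix (17000 rows), a vectorised alternative to apply() is which(..., arr.ind = TRUE), which returns the row and column of every 1 at once. Below is a minimal sketch on a small hypothetical matrix m; its third row deliberately has no 1 to show the NA case, and if a row had several 1s the last matching column would be kept.

```r
# Hypothetical stand-in for the question's matrix
m <- rbind(c(3, 1, 2),
           c(1, 2, 3),
           c(2, 3, 4))  # no 1 in this row
colnames(m) <- c("A", "B", "C")

pos <- which(m == 1, arr.ind = TRUE)  # matrix with columns "row" and "col"
col_idx <- rep(NA_integer_, nrow(m))  # NA for rows without a 1
col_idx[pos[, "row"]] <- pos[, "col"]

col_idx               # 2 1 NA
colnames(m)[col_idx]  # "B" "A" NA
```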
