Combining logic in subsetting

If I am sub-setting using logical statements, is there a way of combining without using logical operators? i.e. is there a more effective way of doing the following:
train$TOD[train$Hour == 23 | train$Hour == 0 | train$Hour == 1 | train$Hour == 2]

A reproducible example would help, but I think this is what you are looking for (note that it returns whole rows of train rather than only the TOD column):
train[train$Hour %in% c(0, 1, 2, 23), ]
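
To check that %in% selects the same elements as the chained == comparisons, here is a minimal sketch. The train data frame below is a made-up stand-in (its Hour and TOD values are invented for illustration, not taken from the question).

```r
# Toy stand-in for the asker's data; all values are invented.
train <- data.frame(Hour = c(23, 5, 0, 12, 1, 2),
                    TOD  = c("night", "morning", "night", "noon", "night", "night"))

# Chained comparisons, as written in the question
chained <- train$Hour == 23 | train$Hour == 0 | train$Hour == 1 | train$Hour == 2

# The same test written with %in%
compact <- train$Hour %in% c(0, 1, 2, 23)

# The two logical vectors agree for this data (Hour contains no NAs here)
same_selection <- identical(chained, compact)

night_tod  <- train$TOD[compact]   # only the TOD values, as in the question
night_rows <- train[compact, ]     # whole rows, as in the answer
```

If Hour could contain NA, the two forms differ: == propagates NA, while %in% returns FALSE for a missing value.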

Related

ifelse command in R not populating column in data frame?

I would appreciate any help with this error I am getting in my code for a research project I am working on in R:
I am trying to create a column (named non_political) in a data frame (named privacy, imported from an sav file) representing survey data where:
1 signifies that the respondent was non-political (answered in a non-political way to some questions) and
0 signifies the opposite.
So far, I have written:
privacy$non_political<-NA
privacy$non_political<-ifelse(((privacy$q19 == 3) | (privacy$q19 == 4) | (privacy$q20 == 2) | (privacy$q21==2)), 1, 0)
but
head(privacy$non_political)
returns NA's along with 1's, which means that the 0 option is never executed in the ifelse command.
What could I be doing wrong here?
Thank you!

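Comparing with == returns NA whenever the value being compared is NA, so a row where none of the conditions is TRUE and at least one of them is NA ends up as NA instead of 0. You can use %in%, which returns FALSE when the value being compared is NA.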
privacy$non_political<- with(privacy,
ifelse(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2, 1, 0))
You can do this without ifelse as well.
privacy$non_political<- with(privacy,
as.integer(q19 %in% c(3, 4) | q20 %in% 2 | q21 %in% 2))
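
A small sketch of the difference, using invented answers for three respondents (the vectors q19 and q20 below are illustrative only, not the asker's data):

```r
# Invented answers for three respondents; NA marks a skipped question.
q19 <- c(3, 1, NA)
q20 <- c(1, 1, 1)

# == propagates the NA in q19: the third condition is NA | NA | FALSE, which is NA
with_eq <- ifelse(q19 == 3 | q19 == 4 | q20 == 2, 1, 0)

# %in% treats the NA as "no match", so the third condition is FALSE
with_in <- as.integer(q19 %in% c(3, 4) | q20 %in% 2)

with_eq   # 1 0 NA
with_in   # 1 0 0
```

Whether a respondent who skipped a question should be coded 0 or left as NA is a research decision; %in% only makes the 0 coding convenient.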

How to pull out columns in r based on various criteria

I have a huge data.set in R (1mil+ rows) and 51 columns. One of my columns is "StateFIPS" the other is "CountyFIPS" and another is "event type". The rest I do not care about.
Is there an easy way to take that dataframe, pull out all the rows that have "StateFIPS"=3 AND "CountyFIPS"=4 AND "event type"=Tornado, and put all those rows into a new dataframe?
Thanks!

We can use subset
df2 <- subset(df1, StateFIPS == 3 & CountyFIPS == 4 & `event type` == "Tornado")

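It is quite easy. This should do it (supposing your data.frame is named "data_set"). The conditions are joined with & because all three must hold, and the column name containing a space is accessed with [[ ]]:
new_data <- data_set[(data_set$CountyFIPS == 4) &
                     (data_set[["event type"]] == 'Tornado') &
                     (data_set$StateFIPS == 3), ]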

Sure, you can use the which() function; see https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/which
You can then use any logical conditions and combine them with & (and) and | (or).
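
As a rough sketch of the which() approach, assuming a miniature data frame with the three column names from the question (the rows themselves are invented):

```r
# Invented miniature of the asker's data.
# check.names = FALSE keeps the space in the "event type" column name.
df1 <- data.frame(StateFIPS    = c(3, 3, 5, 3),
                  CountyFIPS   = c(4, 4, 4, 9),
                  `event type` = c("Tornado", "Hail", "Tornado", "Tornado"),
                  check.names  = FALSE)

# which() turns the combined logical vector into row positions
keep <- which(df1$StateFIPS == 3 &
              df1$CountyFIPS == 4 &
              df1[["event type"]] == "Tornado")

# Only the first row satisfies all three conditions
df2 <- df1[keep, ]
```

One practical difference from indexing with the logical vector directly: which() drops positions where the condition is NA, so rows with missing values are left out rather than returned as all-NA rows.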

Is there a way to use the "which"-function on several matrix rows without using & several times?

I am working with relatively large matrices (2187x100) and need to quickly select specific rows. The first 7 columns are values with which I can identify the needed rows, as in this example:
design_matrix = matrix(c(rep(1:3, each = 729),
rep(rep(1:3, each = 243), 3),
rep(rep(1:3, each = 81), 9),
rep(rep(1:3, each = 27), 27),
rep(rep(1:3, each = 9), 81),
rep(rep(1:3, each = 3), 243),
rep(1:3, 729)),
nrow = 2187)
I know I can use the function which() to find rows of a matrix with specific values. I am also aware I can use & to use multiple criteria. Since I need to check several entries multiple times, I am trying to find a way to do something like this:
which(design_matrix[,1:6] == c(1,1,1,1,1,1))
to get the rows with the respective values, in this case c(1,2,3). Instead I get the TRUE values of each element-wise comparison. Is there a way to do this without having to use multiple & like in
which((design_matrix[,1] == 1) & (design_matrix[,2] == 1) & (design_matrix[,3] == 1) &
(design_matrix[,4] == 1) & (design_matrix[,5] == 1) & (design_matrix[,6] == 1))
which does what I want but would need to be rewritten whenever I need different values?
I'm using the which() function to subset my data as in design_matrix[which(design_matrix[,1]==1),], so if there is an easy way of doing this using subset that would also answer my question.

You can use sweep with margin 2 and the function != and rowSums with which to get the rows which have c(1,1,1,1,1,1).
which(rowSums(sweep(design_matrix[,1:6], 2, c(1,1,1,1,1,1), "!="))==0)
#[1] 1 2 3

You could define a key to make lookup trivial and less computationally expensive if you have to do this a lot:
key <- apply(design_matrix, 1, paste0, collapse = "")
design_matrix[key == "2133132", ]
#> [1] 2 1 3 3 1 3 2

Another solution is to use apply():
which(apply(design_matrix[,1:6],
1,
function(row)
all(row == c(1,1,1,1,1,1))
)
)
# [1] 1 2 3
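
Since the question asks for something that does not need rewriting for each new set of values, the ideas above could be wrapped in a small function. This is only a sketch: match_rows is a hypothetical helper name, and the 27-row matrix is a reduced stand-in for design_matrix.

```r
# Hypothetical helper: positions of the rows of m whose columns `cols` equal `target`.
match_rows <- function(m, cols, target) {
  stopifnot(length(cols) == length(target))
  sub <- m[, cols, drop = FALSE]
  # Lay the target out row by row so each column is compared with its own value
  # (a bare `sub == target` would recycle the vector down the columns instead).
  target_mat <- matrix(target, nrow = nrow(sub), ncol = length(cols), byrow = TRUE)
  which(rowSums(sub != target_mat) == 0)
}

# Small 3-level full factorial (27 rows, 3 columns) standing in for design_matrix
small <- as.matrix(expand.grid(1:3, 1:3, 1:3))

# Rows where column 1 is 1 and column 2 is 1
hits <- match_rows(small, 1:2, c(1, 1))
hits   # rows 1, 10 and 19
```

The recycling comment also explains why the attempt in the question did not work: comparing a matrix with a vector runs down the columns, not across the rows.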

How to recode in a tidy manner with better looking code

I am a medical researcher. I have a very large administrative database where the diagnoses are included in columns with headers dx1 - dx15 (dx = diagnosis). These columns contain numbers/letter codes which are in character form in R. I have written code to run through these dx columns, but would like to rewrite the code in the form of an array. I can do that easily in SAS, but am finding it difficult to do the same in R.
I am attaching the code that I use here:
a <- as.character(c("4578","4551")) # here I identify initially the codes for the diagnosis that I am interested in.
Then I create a new variable cm_cancer in my dataframe df and use this code to identify patients with cancer. The new variable df$cm_cancer will be either 0 or 1 depending upon the diagnosis.
The code works but, as you can see, is not tidy or elegant at all.
df$cm_cancer <- with(df, ifelse((dx3 %in% a | dx4 %in% a | dx5 %in% a |
dx6 %in% a | dx7 %in% a | dx8 %in% a | dx9 %in% a |
dx10 %in% a | dx11 %in% a | dx12 %in% a | dx13 %in% a |
dx14 %in% a | dx15 %in% a), 1, 0))
With SAS, I can do the same with this elegant piece of code:
data df2;
set df;
cancer = 0;
array dgn[15] dx1 - dx15;
do i = 1 to 15;
if dgn[i] in ("4578","4551") then
cancer = 1;
end;
drop i;
run;
I refuse to believe that SAS has better answers for this than R; I just accept that I am still a novice in the use of R.
Any help welcome; believe me, I have tried to google to find arrays in R, loops in R; anything that would help me to rewrite this code better.
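
One possible way to express the SAS array loop in base R is to apply %in% to every dx column at once and then ask whether any column matched in each row. This is a sketch under assumptions: the data frame below is invented, and it checks dx1 to dx15 as the SAS code does (the ifelse version above starts at dx3).

```r
# Codes of interest, as in the question
codes <- c("4578", "4551")

# Names of the diagnosis columns: "dx1", "dx2", ..., "dx15"
dx_cols <- paste0("dx", 1:15)

# Invented 4-patient data frame; "0000" is a placeholder for any other code
df <- as.data.frame(matrix("0000", nrow = 4, ncol = 15,
                           dimnames = list(NULL, dx_cols)),
                    stringsAsFactors = FALSE)
df$dx2[2]  <- "4578"
df$dx15[4] <- "4551"

# One logical column per dx column: TRUE where the code is in `codes`
hit <- sapply(df[dx_cols], function(col) col %in% codes)

# A patient is flagged if at least one dx column matched
df$cancer <- as.integer(rowSums(hit) > 0)

df$cancer   # 0 1 0 1
```

Changing the set of columns or codes then only means editing dx_cols or codes. Note that sapply returns a plain vector rather than a matrix when df has a single row, so that edge case would need separate handling.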

Logical Operators not subsetting as expected

I am trying to create a subset of the rows that have a value of 1 for variable A, and a value of 1 for at least one of the following variables: B, C, or D.
Subset1 <- subset(Data,
Data$A==1 &
Data$B ==1 ||
Data$C ==1 |
Data$D == 1,
select= A)
Subset1
The problem is that the code above returns some rows that have A=0 and I am not sure why.
To troubleshoot:
I know that && and || are the long forms of "and" and "or", and that & and | are the vectorized versions.
I have run this code several times using &&, ||,& and | in different places. Nothing returns what I am looking for exactly.
When I shorten the code, it works fine and I subset only the rows that I would expect:
Subset1 <- subset(Data,
Data$A==1 &
Data$B==0,
select= A)
Subset1
Unfortunately, this doesn't suffice since I also need to capture rows whose C or D value = 1.
Can anyone explain why my first code block is not subsetting what I am expecting it to?

You can use parentheses to be explicit about what your & applies to. Otherwise (as @Patrick Trentin clarified) your logical operators are combined according to operator precedence: & binds more tightly than |, and operators at the same level of precedence are evaluated from left to right.
Example:
> FALSE & TRUE | TRUE #equivalent to (FALSE & TRUE) | TRUE
[1] TRUE
> FALSE & (TRUE | TRUE)
[1] FALSE
So in your case you can try something like the following (assuming you want rows where A == 1 that also meet at least one of the other conditions):
Data$A==1 & (Data$B==1 | Data$C==1 | Data$D==1)
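
To make the effect of precedence concrete, here is a small sketch with an invented data frame in which row 3 has A = 0 but C = 1:

```r
# Invented data: row 3 has A = 0 but C = 1.
Data <- data.frame(A = c(1, 1, 0, 1),
                   B = c(1, 0, 0, 0),
                   C = c(0, 0, 1, 0),
                   D = c(0, 1, 0, 0))

# Without parentheses this is parsed as (A == 1 & B == 1) | C == 1 | D == 1
loose <- with(Data, A == 1 & B == 1 | C == 1 | D == 1)

# With parentheses, A == 1 must hold in every selected row
strict <- with(Data, A == 1 & (B == 1 | C == 1 | D == 1))

Data[loose, ]    # rows 1, 2 and 3: row 3 slips in although A is 0
Data[strict, ]   # rows 1 and 2 only
```

The sketch uses the vectorized & and | throughout; && and || are meant for single TRUE/FALSE values, such as the condition of an if statement, and are not suitable for building a row filter.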

Since you didn't provide the data you're working with, I've replicated some here.
set.seed(20)
Data = data.frame(A = sample(0:1, 10, replace=TRUE),
B = sample(0:1, 10, replace=TRUE),
C = sample(0:1, 10, replace=TRUE),
D = sample(0:1, 10, replace=TRUE))
If you use parentheses to group the OR conditions, so that they are evaluated together before being combined with A == 1, you can achieve what you're looking for.
Subset1 <- subset(Data,
Data$A==1 &
(Data$B == 1 |
Data$C == 1 |
Data$D ==1),
select=A)
Subset1
A
1 1
2 1
4 1
5 1
