Error in eval_tidy(expr = expr, data = data.subset) - r

I have a list of genes (HFE, ATP7B, ITPA). I am using the code below to count the samples in a Seurat object that have expression above 0 for each gene, and finally I want the number of samples that do not express any of the three genes. Two conditions apply throughout: the batch is "first" and the cell type is Endo.
subset(x= seurat_main, subset = batch == "first" & HFE >0 & CellType == "Endo")
subset(x= seurat_main, subset = batch == "first" & ATP7B >0 & CellType == "Endo")
subset(x= seurat_main, subset = batch == "first" & ITPA >0 & CellType == "Endo")
The code above worked. I want to reuse it and change the condition so that none of the three genes has expression above 0. I used the code below:
subset(x = seurat_main, subset = batch == "first" & !(HFE > 0 | ATP7B > 0 | ITPA > 0) & CellType == "Endo")
It gave me this error message:
Error in eval_tidy(expr = expr, data = data.subset) :
object 'HFE' not found
However, the gene is there and the spelling is correct, because the first code above worked and this did not. Could you help me?
Thank you!
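No answer is recorded here for this question. One possible workaround, assuming the error comes from how subset() on a Seurat object parses the more complex expression rather than from the data itself, is to build the logical mask outside subset() and select cells by name. The sketch below uses a toy data frame as a stand-in for what FetchData() would return; the values and the names cells_df, none_expressed, and keep are hypothetical.

```r
# Toy stand-in for
# FetchData(seurat_main, vars = c("batch", "CellType", "HFE", "ATP7B", "ITPA"));
# rows are cells, row names are cell barcodes (all values are made up).
cells_df <- data.frame(
  batch    = c("first", "first", "second", "first"),
  CellType = c("Endo", "Endo", "Endo", "Epi"),
  HFE      = c(0, 1.2, 0, 0),
  ATP7B    = c(0, 0, 0, 0),
  ITPA     = c(0, 0, 0.5, 0),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# TRUE where none of the three genes is expressed above 0
none_expressed <- rowSums(cells_df[, c("HFE", "ATP7B", "ITPA")] > 0) == 0

# Barcodes of cells that meet all three conditions
keep <- rownames(cells_df)[
  cells_df$batch == "first" & cells_df$CellType == "Endo" & none_expressed
]
keep

# With a real object the barcodes could then be passed on, for example:
# subset(seurat_main, cells = keep)   # assumes the Seurat package is loaded
```

Computing the mask on a plain data frame also makes it easy to inspect intermediate results, such as table(none_expressed), before subsetting the object.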

Related

Replace logical values conditionally in R

I am sure this question has been asked before and has an easy solution, but I can't seem to find it.
I am trying to conditionally replace the logical value of a variable based on the value of other variables in the data. Specifically, I am trying to determine eligibility based on survey responses.
I have created my eligibility variable in dataframe screen:
screen$eligible <- ifelse (
(screen$age > 17 & screen$age < 23)
& (screen$alcohol > 3 | screen$marijuana > 3)
& (screen$country == 0 | screen$ageus < 12)
& (screen$county_1 == 17 | screen$county_1 == 27 | screen$county_1 == 31)
& (screen$residence_1 == 47),
TRUE,
FALSE)
And now, based on study changes, I would like to further limit eligibility. I tried the code below, and it works in part, but it appears that I am introducing NAs to my eligibility variable and missing out on folks who should be eligible.
screen$eligible <- ifelse( screen$eligible ==TRUE, ifelse(
(screen$gender_1 == 1 & screen$age > 18)
|(screen$gender_8 == 1 & screen$age > 20),
FALSE, TRUE), FALSE)
I ultimately want TRUE or FALSE values.
Two questions
Is there a clearer or more concise way to update the code to update my eligibility requirements?
Any ideas as to why I might be introducing NAs?
Continuing from what @zephryl wrote, an even more readable version is:
screen$eligible <- with(screen,
(age > 17 & age < 23)
& (alcohol > 3 | marijuana > 3)
& (country == 0 | ageus < 12)
& county_1 %in% c(17, 27, 31)
& (residence_1 == 47))
To detect which columns contain NAs:
sapply(screen, anyNA)
1. Is there a clearer or more concise way to update the code to update my eligibility requirements?
If you ever find yourself writing x = ifelse(condition, TRUE, FALSE), as you are here -- that's equivalent to just writing x = condition. Also, your three county_1 == x statements can be replaced with one county_1 %in% c(x, y, z). So your first code block could be written as,
screen$eligible <- (screen$age > 17 & screen$age < 23) &
(screen$alcohol > 3 | screen$marijuana > 3) &
(screen$country == 0 | screen$ageus < 12) &
screen$county_1 %in% c(17, 27, 31) &
(screen$residence_1 == 47)
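Likewise, your second code block could be simplified as follows (note the negation: in your nested ifelse, matching either clause makes someone ineligible):
screen$eligible <- screen$eligible &
!((screen$gender_1 == 1 & screen$age > 18) |
(screen$gender_8 == 1 & screen$age > 20))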
2. Any ideas as to why I might be introducing NAs?
It's hard to say without seeing your data, but the NAs probably indicate that one or more of your constituent variables (gender_1, gender_8, age) is NA for some cases.
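A small sketch of how a single missing answer can turn into an NA in the eligibility column, and one way to force strictly TRUE/FALSE results. The three-row data frame is hypothetical, and treating an unknown exclusion condition as "not excluded" is an assumption that may or may not suit the study.

```r
# Hypothetical three-row screen; the second respondent skipped gender_1
screen <- data.frame(
  eligible = c(TRUE, TRUE, FALSE),
  gender_1 = c(1, NA, 1),
  age      = c(19, 20, 22)
)

# The exclusion condition is NA wherever gender_1 is NA
excluded <- screen$gender_1 == 1 & screen$age > 18
excluded                       # TRUE NA TRUE

# The NA propagates through the update
with_na <- screen$eligible & !excluded
with_na                        # FALSE NA FALSE

# x %in% TRUE is FALSE for NA, so the result is strictly TRUE/FALSE
updated <- screen$eligible & !(excluded %in% TRUE)
updated                        # FALSE TRUE FALSE
```

If missing answers should instead make someone ineligible, replace !(excluded %in% TRUE) with (excluded %in% FALSE).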

Count number of rows meeting multiple conditions in dataframe

I'm working on a database of patients with multiple conditions that I scored as yes/no or as numbers. I first counted the number of patients (rows) who meet at least one of 5 criteria; see this code (working):
nrow( df_1[df_1$tenderness_CS != 'no' | df_1$intoxication != 'no' |
df_1$focal_neuro_deficits != 'no' | df_1$EMV <= 13 | df_1$distr_injury != 'no',] )
But now I want to count how many patients meet 2, 3 and 4 of the criteria above. It doesn't matter which of the 5 criteria are met, only how many. I really don't know how to do that.
Any help? Thanks!
You can do
n_conditions <- (df_1$tenderness_CS != 'no') +
(df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') +
(df_1$EMV <= 13) +
(df_1$distr_injury != 'no')
which will give you a vector of the number of conditions each patient met.
You can then do
table(n_conditions)
to show the times each number of conditions was met, and
df_1[n_conditions == 3,]
to subset the data frame to only those patients who met 3 conditions, and so on.
Instead of using +, we can make use of rowSums. The advantage is that it can also take care of NA elements through the na.rm argument: if a particular column has NA in a row, adding with + would give NA for that row.
nm1 <- c("tenderness_CS", "intoxication",
"focal_neuro_deficits", "distr_injury")
n_conditions <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)
Now, we get the frequency of counts with table
table(n_conditions)
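To illustrate the difference between the two approaches when a value is missing, here is a minimal sketch on a made-up three-patient data frame that uses only three of the five criteria (the values are hypothetical):

```r
df_1 <- data.frame(
  tenderness_CS = c("yes", "no", NA),
  intoxication  = c("yes", "no", "yes"),
  EMV           = c(12, 15, 14)
)

# Adding logical vectors: a single NA makes that patient's count NA
by_plus <- (df_1$tenderness_CS != "no") +
  (df_1$intoxication != "no") +
  (df_1$EMV <= 13)
by_plus       # values: 3 0 NA

# rowSums with na.rm = TRUE counts only the conditions known to be met
nm1 <- c("tenderness_CS", "intoxication")
by_rowsums <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)
by_rowsums    # values: 3 0 1
```

Whether a missing score should count as "not met" is a judgment about the data; with na.rm = TRUE that is the implied choice.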
The logicals TRUE and FALSE can be treated like numerics 1 and 0.
So for example TRUE+TRUE is equal to 2.
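So you can write the following (each comparison needs its own parentheses, because + binds more tightly than !=, and %in% binds more tightly than +):
nrow( df_1[((df_1$tenderness_CS != 'no') + (df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') + (df_1$EMV <= 13) +
(df_1$distr_injury != 'no')) %in% c(2,3,4),])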
because this will first sum the results of each condition (1 when the condition is TRUE and 0 when it is FALSE) and then test whether the sum is in the vector c(2,3,4).

Looping over a data set, using ifelse to check the value of a column in order to set a new column (factor)

I'm new to R, I've had this issue for a week now and have attempted many searches for a solution and just can't figure this out.
I am using the House Regression data set from Kaggle and attempting to do some feature engineering. See link below for more information about the data set
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
In short, there are two columns: Condition1 and Condition2. Each is a factor with the same 9 levels. Four of the nine levels specify that the house is near a railroad. Instead of using 18 factor levels to specify whether the house is near a railroad, I'm attempting to create a new column: ByRR. This column is binary: 1 if the value in either of these two columns is in RR_List, 0 if it is not.
I've tried a series of different methods to accomplish this task. The latest runs lapply over the data set. I get the following warning messages and all of the values are set to 0, although I know that some values should be 1:
RR_List = c("RRNn","RRAn","RRNe", "RRAe")
data <-data[c("Condition1","Condition2")]
data$ByRR = factor(x="No",levels= c("Yes","No"), labels=c(1,0))
lapply(data,function(x) {
x$ByRR <- ifelse(data$Condition1 %in% RR_List ||
data$Condition2 %in% RR_List, 1, 0)
})
Warning messages:
1: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in%
Coercing LHS to a list
2: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in% :
Coercing LHS to a list
3: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in% :
Coercing LHS to a list
Any help with this is much appreciated!
There is no need for lapply; use the vectorized | instead of ||:
data$ByRR <- +(data$Condition1 %in% RR_List | data$Condition2 %in% RR_List)
This creates a new column ByRR in data that is 1 if either Condition1 or Condition2 has a value from RR_List and 0 otherwise. The unary + converts the logical values (TRUE, FALSE) to the integers (1, 0) respectively.
If you need values to be "Yes"/"No" instead of 1/0 use
data$ByRR <- c("No", "Yes")[(data$Condition1 %in% RR_List |
data$Condition2 %in% RR_List) + 1]
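A short sketch of both variants on toy data (the three rows are made up). The key point is that | works element by element, whereas || is meant for single values: older versions of R used only the first element of each side, and, as far as I know, recent versions raise an error when given longer vectors.

```r
RR_List <- c("RRNn", "RRAn", "RRNe", "RRAe")
data <- data.frame(
  Condition1 = c("Norm", "RRAn", "Feedr"),
  Condition2 = c("Norm", "Norm", "RRNe")
)

# | compares element by element, giving one TRUE/FALSE per row
near_rr <- data$Condition1 %in% RR_List | data$Condition2 %in% RR_List

data$ByRR <- +near_rr                            # 0 1 1
data$ByRR_label <- c("No", "Yes")[near_rr + 1]   # "No" "Yes" "Yes"
data
```

Indexing c("No", "Yes") with near_rr + 1 works because FALSE + 1 is 1 and TRUE + 1 is 2.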

Is there a way to combine if statement with filter function?

I need to filter out some rows if a condition is true.
Tried this:
if(DiffTask$TaskCue == "square1"){DiffTasks<-subset(data,-subject==10)}
Warning message:
In if (DiffTask$TaskCue == "square1") { :
the condition has length > 1 and only the first element will be used
You can connect any number of logical conditions, e.g.
DiffTask <- data.frame(TaskCue = rep(c("square1", "square2"), 5),
subject = rep(10:6, 2))
subset(DiffTask, !(TaskCue == "square1" & subject == 10))

Subsetting data in R returns 0 columns, 619 rows

Currently I am working with a dataset in R that I imported from an SPSS file converted to a csv. The data includes multiple factors such as gender, ethnicity, and test group, along with a set of weights I want to sum. I want to sum these weights based on multiple conditions (i.e. female + white + group1) so I tried subsetting the data.
small.set<-subset(df, df[,"gender"]==1 & df[,"ethnicity"] ==1 &
df[,"group"==1])
However, I get the following error:
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,
: 'data' must be of a vector type, was 'NULL'
I found that when trying to select group 1 in any case, R returned strange results:
df["group"==1]
> data frame with 0 columns and 619 rows
The structure of "group" is as follows:
str(df[,"group"])
>Factor w/ 3 levels "1", "2", "3": 1 3 1 1 2...
Does anyone know what is causing this to happen?
Why not avoid subset and index the data frame directly:
small.set<-df[df$gender == 1 & df$ethnicity == 1 & df$group == 1,]
Another good way is using data.table package:
library(data.table)
df<-data.table(df)
small.set<-df[gender == 1 & ethnicity == 1 & group == 1]
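As for why the original attempt returned 0 columns: in df[,"group"==1] the comparison is made on the string "group" itself, not on the column. A minimal sketch with a made-up three-row data frame:

```r
df <- data.frame(
  gender    = c(1, 2, 1),
  ethnicity = c(1, 1, 2),
  group     = factor(c(1, 3, 1))
)

# The string "group" is compared with 1 (coerced to "1"), which is FALSE
is_match <- "group" == 1
is_match

# df[FALSE] selects no columns, hence "0 columns" with all rows kept
n_cols <- ncol(df["group" == 1])
n_cols                # 0

# Comparing the column itself gives one logical value per row
small.set <- df[df$gender == 1 & df$ethnicity == 1 & df$group == 1, ]
nrow(small.set)       # 1
```

The comparison df$group == 1 works even though group is a factor, because the factor's labels are compared with "1".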
