I have a list of genes (HFE, ATP7B, ITPA), and I am using the code below to count the samples in a Seurat object that have expression above 0 for each gene, and finally the samples that do not express any of the three genes, under two conditions: the batch is "first" and the cell type is "Endo".
subset(x= seurat_main, subset = batch == "first" & HFE >0 & CellType == "Endo")
subset(x= seurat_main, subset = batch == "first" & ATP7B >0 & CellType == "Endo")
subset(x= seurat_main, subset = batch == "first" & ITPA >0 & CellType == "Endo")
The code above worked. I want to reuse it but change the condition so that none of the three genes has expression above 0. I used the code below:
subset(x = seurat_main, subset = batch == "first" & !(HFE > 0 | ATP7B > 0 | ITPA > 0) & CellType == "Endo")
It gave me this error message:
Error in eval_tidy(expr = expr, data = data.subset) :
object 'HFE' not found
However, the gene is there and the spelling is correct, because the first code above worked and this one didn't. Could you help me?
Thank you!
I am sure this question has been asked before and has an easy solution, but I can't seem to find it.
I am trying to conditionally replace the logical value of a variable based on the value of other variables in the data. Specifically, I am trying to determine eligibility based on survey responses.
I have created my eligibility variable in dataframe screen:
screen$eligible <- ifelse (
(screen$age > 17 & screen$age < 23)
& (screen$alcohol > 3 | screen$marijuana > 3)
& (screen$country == 0 | screen$ageus < 12)
& (screen$county_1 == 17 | screen$county_1 == 27 | screen$county_1 == 31)
& (screen$residence_1 == 47),
TRUE,
FALSE)
And now, based on study changes, I would like to further limit eligibility. I tried the code below, and it works in part, but it appears that I am introducing NAs to my eligibility variable and missing out on folks who should be eligible.
screen$eligible <- ifelse( screen$eligible ==TRUE, ifelse(
(screen$gender_1 == 1 & screen$age > 18)
|(screen$gender_8 == 1 & screen$age > 20),
FALSE, TRUE), FALSE)
I ultimately want TRUE or FALSE values.
Two questions
Is there a clearer or more concise way to update the code to update my eligibility requirements?
Any ideas as to why I might be introducing NAs?
Continuing from what @zephryl wrote, an even more readable version is:
screen$eligible <- with(screen,
(age > 17 & age < 23)
& (alcohol > 3 | marijuana > 3)
& (country == 0 | ageus < 12)
& county_1 %in% c(17, 27, 31)
& (residence_1 == 47))
To detect which columns contain NAs:
sapply(screen, anyNA)
1. Is there a clearer or more concise way to update the code to update my eligibility requirements?
If you ever find yourself writing x = ifelse(condition, TRUE, FALSE), as you are here -- that's equivalent to just writing x = condition. Also, your three county_1 == x statements can be replaced with one county_1 %in% c(x, y, z). So your first code block could be written as,
screen$eligible <- (screen$age > 17 & screen$age < 23) &
(screen$alcohol > 3 | screen$marijuana > 3) &
(screen$country == 0 | screen$ageus < 12) &
screen$county_1 %in% c(17, 27, 31) &
(screen$residence_1 == 47)
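Likewise, your second code block could be simplified as follows (note the negation, since your nested ifelse returns FALSE when the gender/age condition holds):
screen$eligible <- screen$eligible &
!((screen$gender_1 == 1 & screen$age > 18) |
(screen$gender_8 == 1 & screen$age > 20))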
2. Any ideas as to why I might be introducing NAs?
It's hard to say without seeing your data, but the NAs probably indicate that one or more of your constituent variables (gender_1, gender_8, age) is NA for some cases.
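A minimal sketch of how an NA in one input propagates through the logical expression, using made-up toy values (the data below are hypothetical, not the asker's). Whether an unknown value should count as excluded is a study decision; here it is assumed that only a definite TRUE excludes someone.

```r
# Toy data (hypothetical values); the second respondent has a missing age.
screen <- data.frame(
  age      = c(19, NA, 21),
  gender_1 = c(1, 1, 0),
  eligible = c(TRUE, TRUE, TRUE)
)

# A comparison with NA gives NA, and TRUE & NA stays NA.
excluded <- screen$gender_1 == 1 & screen$age > 18

# Assumption: treat "unknown" as not excluded by keeping only definite TRUEs.
# x %in% TRUE maps NA to FALSE, so the result is strictly TRUE/FALSE.
excluded_strict <- excluded %in% TRUE

updated <- screen$eligible & !excluded_strict
print(excluded)
print(updated)
```

If missing values should instead make a respondent ineligible, `!is.na(excluded) & !excluded` could replace `!excluded_strict`.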
I have a question. I'm working on a database with patients and multiple conditions that I scored as yes/no or as numbers. I first counted the number of patients (rows) who meet at least one of 5 criteria, see this code (working):
nrow( df_1[df_1$tenderness_CS != 'no' | df_1$intoxication != 'no' |
df_1$focal_neuro_deficits != 'no' | df_1$EMV <= 13 | df_1$distr_injury != 'no',] )
But now I want to count how many patients meet 2, 3 and 4 of the criteria above. It doesn't matter which of the 5 criteria are met, just whether 2 or 3 are met. I really don't know how to do that.
Any help? Thanks!
You can do
n_conditions <- (df_1$tenderness_CS != 'no') +
(df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') +
(df_1$EMV <= 13) +
(df_1$distr_injury != 'no')
which will give you a vector of the number of conditions each patient met.
You can then do
table(n_conditions)
to show the times each number of conditions was met, and
df_1[n_conditions == 3,]
to subset the data frame to get only those patients who met 3 conditions, etc.
Instead of using +, we can make use of rowSums. The advantage is that it can also take care of NA elements through its na.rm argument; with +, an NA in any column of a row makes the sum for that row NA.
nm1 <- c("tenderness_CS", "intoxication",
"focal_neuro_deficits", "distr_injury")
n_conditions <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)
Now, we get the frequency of counts with table
table(n_conditions)
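To illustrate the difference, here is a small sketch on a made-up data frame with only three of the five criteria (the values are hypothetical). It compares the `+` approach with `rowSums(..., na.rm = TRUE)` when one answer is missing.

```r
# Toy data (hypothetical); the third patient has a missing tenderness_CS.
df_1 <- data.frame(
  tenderness_CS = c("yes", "no", NA),
  intoxication  = c("yes", "no", "yes"),
  EMV           = c(12, 15, 14),
  stringsAsFactors = FALSE
)

# Summing logicals with +: an NA in any criterion makes the whole sum NA.
n_plus <- (df_1$tenderness_CS != "no") +
  (df_1$intoxication != "no") +
  (df_1$EMV <= 13)

# rowSums with na.rm = TRUE counts only the criteria that are known to be met.
nm1 <- c("tenderness_CS", "intoxication")
n_rowsums <- rowSums(cbind(df_1[nm1] != "no", df_1$EMV <= 13), na.rm = TRUE)

print(n_plus)
print(unname(n_rowsums))

# Number of patients meeting 2, 3 or 4 criteria.
n_2_to_4 <- sum(n_rowsums %in% 2:4)
```

Note that `na.rm = TRUE` silently treats a missing answer as "criterion not met", which may or may not be appropriate for the data at hand.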
The logicals TRUE and FALSE can be treated like numerics 1 and 0.
So for example TRUE+TRUE is equal to 2.
So you can write:
nrow( df_1[((df_1$tenderness_CS != 'no') + (df_1$intoxication != 'no') +
(df_1$focal_neuro_deficits != 'no') + (df_1$EMV <= 13) +
(df_1$distr_injury != 'no')) %in% c(2,3,4),])
because this will first sum the results of each condition (1 when the condition is TRUE and 0 when it is FALSE) and then test whether the sum is in the vector c(2,3,4). The parentheses around each condition and around the whole sum are required, because + and %in% bind more tightly than != and <=.
I'm new to R, I've had this issue for a week now and have attempted many searches for a solution and just can't figure this out.
I am using the House Regression data set from Kaggle and attempting to do some feature engineering. See link below for more information about the data set
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
In short, there are two columns: Condition1 and Condition2. Each is a factor with the same 9 levels. Four of the nine levels specify that the house is near a railroad. Instead of using 18 factor levels to specify whether the house is near a railroad, I'm attempting to create a new column: ByRR. This column is binary: 1 if the value in either of these two columns is in RR_List, 0 if it is not.
I've tried a series of different methods to accomplish this task. The latest is running an lapply function over the data set. I get the following warning messages and all of the values are set to 0, although I know that some values should be 1:
RR_List = c("RRNn","RRAn","RRNe", "RRAe")
data <-data[c("Condition1","Condition2")]
data$ByRR = factor(x="No",levels= c("Yes","No"), labels=c(1,0))
lapply(data,function(x) {
x$ByRR <- ifelse(data$Condition1 %in% RR_List ||
data$Condition2 %in% RR_List, 1, 0)
})
Warning messages:
1: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in%
Coercing LHS to a list
2: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in% :
Coercing LHS to a list
3: In x$ByRR <- ifelse(data$Condition1 %in% RR_List || data$Condition2 %in% :
Coercing LHS to a list
Any help with this is much appreciated!
No need to use lapply, try using |
data$ByRR <- +(data$Condition1 %in% RR_List | data$Condition2 %in% RR_List)
This will create a new column ByRR in data that is 1 if either Condition1 or Condition2 has a value from RR_List and 0 otherwise. The unary + converts the logical values (TRUE, FALSE) to the integer values (1, 0) respectively.
If you need values to be "Yes"/"No" instead of 1/0 use
data$ByRR <- c("No", "Yes")[(data$Condition1 %in% RR_List |
data$Condition2 %in% RR_List) + 1]
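A small sketch of why `|` is the right operator here, on made-up rows (the toy values are hypothetical; only the column names and RR_List come from the question). `|` compares element by element, whereas `||` is meant for single TRUE/FALSE values and, depending on the R version, either looks only at the first element or raises an error when given longer vectors.

```r
RR_List <- c("RRNn", "RRAn", "RRNe", "RRAe")

# Toy rows (hypothetical values).
data <- data.frame(
  Condition1 = c("Norm", "RRAn", "Feedr"),
  Condition2 = c("Norm", "Norm", "RRNe"),
  stringsAsFactors = FALSE
)

# Element-wise OR: one TRUE/FALSE per row.
near_rr <- data$Condition1 %in% RR_List | data$Condition2 %in% RR_List

# as.integer() is an explicit alternative to the unary + trick.
data$ByRR <- as.integer(near_rr)
print(data)
```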
I need to filter out some rows if a condition is true.
Tried this:
if(DiffTask$TaskCue == "square1"){DiffTasks<-subset(data,-subject==10)}
Warning message:
In if (DiffTask$TaskCue == "square1") { : the condition has length >
1 and only the first element will be used
You can connect any number of logical conditions, e.g.
DiffTask <- data.frame(TaskCue = rep(c("square1", "square2"), 5),
subject = rep(10:6, 2))
subset(DiffTask, !(TaskCue == "square1" & subject == 10))
Currently I am working with a dataset in R that I imported from an SPSS file converted to a csv. The data includes multiple factors such as gender, ethnicity, and test group, along with a set of weights I want to sum. I want to sum these weights based on multiple conditions (i.e. female + white + group1), so I tried subsetting the data.
small.set<-subset(df, df[,"gender"]==1 & df[,"ethnicity"] ==1 &
df[,"group"==1])
However, I get the following error:
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,
: 'data' must be of a vector type, was 'NULL'
I found that when trying to select group 1 in any case, R returned strange results:
df["group"==1]
> data frame with 0 columns and 619 rows
The structure of "group" is as follows:
str(df[,"group"])
>Factor w/ 3 levels "1", "2", "3": 1 3 1 1 2...
Does anyone know what is causing this to happen?
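Why don't you avoid using subset and index the data frame directly? Note that the last condition must be df$group == 1, not df[,"group"==1], and that the vectorized & must be used throughout:
small.set<-df[df$gender == 1 & df$ethnicity == 1 & df$group == 1,]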
Another good way is using data.table package:
library(data.table)
df<-data.table(df)
small.set<-df[gender == 1 & ethnicity == 1 & group == 1]
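As for what causes the original error, the following sketch on made-up data (hypothetical values and a hypothetical `weight` column name) shows the likely mechanism: in `df[,"group"==1]` the comparison `"group" == 1` is evaluated first and yields a single FALSE, so no column is selected.

```r
# Toy data (hypothetical); group is a factor, as in the question.
df <- data.frame(
  gender    = c(1, 1, 2),
  ethnicity = c(1, 2, 1),
  group     = factor(c(1, 1, 3)),
  weight    = c(0.5, 1.5, 2)
)

# The string "group" is compared with 1 (coerced to "1"), giving FALSE.
misplaced <- "group" == 1

# Indexing with FALSE selects zero columns, hence the strange result.
empty <- df["group" == 1]
print(dim(empty))

# With the comparison outside the brackets, each row is tested.
small.set <- df[df$gender == 1 & df$ethnicity == 1 & df$group == 1, ]
total_weight <- sum(small.set$weight)
print(total_weight)
```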