Ifelse in R or apply

I am attempting to make elements in the first column of my df null (no entry at all) if it is equal to the element in the same row in the second column. This is a very simple thing, but I haven't been able to find the answer in the message boards.
Below are two of my attempts:
ifelse(y2014[y2014[,1]==y2014[,2]],y2014[,1]=="",y2014[,1]==y2014[,1])
y2014$new=ifelse(y2014[,1]==y2014[,2],0,y2014[,1])
Both give the following error: "level sets of factors are different". I checked the number of levels in each column and they're equal, though several cells are blank in column 2. Would an apply function work better for what I'm trying to accomplish?
Really appreciate your help for a newbie.

Two things: factors generally need to be converted to character before comparing, and you want to assign NA rather than 0 to the value.
Something like this might be better:
y2014$new <- y2014[,1]
y2014$new[as.character(y2014$new) == as.character(y2014[,2])] <- NA
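To see the approach above on a small case, here is a minimal sketch. The data frame below is a hypothetical stand-in for y2014 (the column names a and b and their values are assumptions, not the asker's data); the two factor columns deliberately have different level sets, which is what triggers the error when they are compared directly.

```r
# Hypothetical stand-in for y2014: two factor columns with different levels
y2014 <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("x", "", "z"))
)

# Copy column 1, then blank out entries that equal column 2 in the same row.
# Comparing as character avoids "level sets of factors are different".
y2014$new <- y2014[, 1]
y2014$new[as.character(y2014$new) == as.character(y2014[, 2])] <- NA

is.na(y2014$new)
# TRUE FALSE TRUE
```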

Related

Addressing columns based on only parts of the name in order to simplify lines

This is my first question here and I am not very experienced, but I hope it is easy to answer, since I only want to know whether what I describe in the title is possible.
I have multiple data frames from online capacity tests that participants took.
For every item I have response, score, and duration variables, among others.
Now I want to delete rows where all response variables are NA. I can't simply use a command that deletes rows in which everything is NA, but there are also too many columns to do it by hand. I also want to keep the data frame together while doing it so that the complete rows are really dropped, so just extracting all response variables doesn't sound like a good option.
However, apart from a 3-digit number specific to each item, the response variable names are basically the same.
So instead of writing a very long, impractical line that lists all response variables and drops the row if they are all NA, is there a way to use only part of a variable name, for example its ending, so that R checks the condition for all variables ending that way?
Simplified example: instead of
newdf <- olddf[!(olddf$item123response != NA & olddf$item131response != NA & etc),]
Can I just do something like newdf <- olddf[!(olddf$xxxresponse != NA),] ?
I tried to google an answer but I didn't know how to frame my question effectively.
Thanks in advance!
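Try this. Note that complete.cases() keeps only rows with no NA in any of the selected columns, so a row is dropped as soon as one response column is NA: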
newdf <- olddf[complete.cases(olddf[, grep('response', names(olddf))]), ]
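If the goal is to drop a row only when every response column is NA, as the question describes, one option is to count the NAs per row and compare that count with the number of response columns. A minimal sketch, assuming the response columns all end in "response"; the example data frame and the helper names resp_cols and all_na are made up for illustration.

```r
# Hypothetical data: two response columns plus one unrelated score column
olddf <- data.frame(
  id = 1:4,
  item123response = c(1, NA, NA, 2),
  item131response = c(NA, NA, 3, 4),
  item123score = c(0, 1, 1, 0)
)

# Columns whose names end in "response"
resp_cols <- grep("response$", names(olddf))

# TRUE for rows in which every response column is NA
all_na <- rowSums(is.na(olddf[, resp_cols, drop = FALSE])) == length(resp_cols)

# Keep every row that has at least one non-NA response
newdf <- olddf[!all_na, ]
newdf$id
# 1 3 4
```

Only row 2 is removed here, whereas complete.cases() on the same columns would keep row 4 alone.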

Is there a way to select only partial matching and not the exact ones

I have two character vectors. The first one contains some references of natural disasters: "avalanche","flash flood","thunderstorm winds"...etc.
The second one has similar but slightly different data: "avalanche", "flood", "heat", "winds", etc. I am trying to find not only the exact matches but also the partial matches of the first one within the second one, so I thought I could do it separately.
The exact case is quite direct: match(dt_event,ref_event). Now I also need "thunderstorm winds" and "flash flood" to be treated as "winds" and "flood" from ref_event, and thus get their indices as well instead of NAs. If there is a way to do both exact and partial matching in one command, that would be better. Thanks in advance.
dt_event <- c("avalanche","flash flood","thunderstorm winds")
ref_event <- c("avalanche","flood","winds")
match(dt_event,ref_event)
[1]  1 NA NA
Assuming that the two vectors are in the same order and you want to compare element-wise:
dt_event[unlist(Map(grepl, ref_event, dt_event))]
or in a more general case:
dt_event[sapply(ref_event, function(x) which(grepl(x, dt_event)))]
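The question asks for indices into ref_event, the way match() returns them, rather than the matching elements of dt_event. A minimal sketch of one way to get that, assuming a reference entry counts as a match whenever it appears as a literal substring of the event and that the first hit wins; partial_match is a hypothetical helper, not a base R function.

```r
dt_event <- c("avalanche", "flash flood", "thunderstorm winds")
ref_event <- c("avalanche", "flood", "winds")

# For each element of x, return the index of the first entry of `table`
# that occurs inside it as a literal substring, or NA if none does.
partial_match <- function(x, table) {
  vapply(x, function(d) {
    hits <- which(vapply(table, grepl, logical(1), x = d, fixed = TRUE,
                         USE.NAMES = FALSE))
    if (length(hits) > 0) hits[1] else NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

partial_match(dt_event, ref_event)
# 1 2 3
```

Exact matches are covered as well, since a string is a substring of itself; an event with no counterpart, such as "heat wave", yields NA.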

R: NA Issue with checking if all elements in column are the same

There are several threads asking how to check if all elements in a vector are the same. That is not my issue.
I have been using the all function in R with no issues. I wanted to assess if all elements of a column mydataframe$colA are the same as in mydataframe$colB:
if(all(mydataframe$colA == mydataframe$colB) == FALSE) {...}
However today I started to see NA as being returned as result of the all function, instead of a boolean. I have tried other ways to find if all elements are the same or not. For example:
table(mydataframe$colA == mydataframe$colB) gives me only TRUE
So indeed all my values from column A are the same as the ones in column B.
What can be wrong here? I stress that my script has been working fine, and even today I ran these same lines 8 times before with data from different samples with no problem. All data and all samples are supposed to be in exactly the same format.
Thanks in advance!
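One likely explanation, assuming the new sample contains at least one NA in colA or colB: comparing with NA yields NA, all() returns NA when every non-missing comparison is TRUE but some are NA, and table() drops NA by default, which is why it shows only TRUE. A minimal sketch with made-up vectors in place of the data frame columns:

```r
# Hypothetical columns; the third comparison involves an NA
colA <- c(1, 2, NA)
colB <- c(1, 2, 3)

cmp <- colA == colB
cmp
# TRUE TRUE NA

all(cmp)                 # NA: the missing comparison could be TRUE or FALSE
table(cmp)               # counts only TRUE, because NA is dropped by default
all(cmp, na.rm = TRUE)   # TRUE: ignores the NA comparison
anyNA(cmp)               # TRUE: confirms that an NA is present
```

Whether na.rm = TRUE is appropriate depends on whether a missing value should count as a mismatch in your data.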

How to "select" multiple rows that have a certain variable value?

I also want to be able to count/sum the number of these rows.
I can't seem to find this answer anywhere and it's so simple.
In Python it's something like:
df[df[value1]=="<value2>"].count()
How is this done in R?
Given your comment about NA, you should be able to combine @kdopen's response with is.na() to exclude the NA values in df$VALUE:
sum(df$VALUE[!is.na(df$VALUE)]==24)
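For the "select" part of the question, a minimal sketch on a made-up data frame (the column names VALUE and id and the value 24 are assumptions carried over from the line above). Wrapping the condition in which() drops the NA comparisons, so no all-NA rows appear in the result.

```r
# Hypothetical data with one missing VALUE
df <- data.frame(id = 1:4, VALUE = c(24, 10, NA, 24))

# Select the rows where VALUE equals 24; which() ignores NA comparisons
rows <- df[which(df$VALUE == 24), ]
rows$id
# 1 4

# Count those rows; na.rm = TRUE skips the NA comparison
n <- sum(df$VALUE == 24, na.rm = TRUE)
n
# 2
```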

R: creating factor using data from multiple columns

I want to create a column that codes for whether patients have had a comorbid diagnosis of depression or not. Problem is, the diagnosis can be recorded in one of 4 columns:
ComorbidDiagnosis;
OtherDiagnosis;
DischargeDiagnosis;
OtherDischargeDiagnosis.
I've been using
levels(dataframe$ynDepression)[levels(dataframe$ComorbidDiagnosis)=="Depression"]<-"Yes"
for all 4 columns but I don't know how to code those who don't have a diagnosis in any of the columns. I tried:
levels(dataframe$ynDepression)[levels(dataframe$DischOtherDiagnosis &
dataframe$OtherDiagnosis &
dataframe$ComorbidDiagnosis &
dataframe$DischComorbidDiagnosis)==""]<-"No"
I also tried using && instead but it didn't work. Am I missing something?
Thanks in advance!
Edit: I tried uploading an image of some example data but I don't have enough reputations to upload images yet. I'll try to put an example here but might not work:
Patient ID   PrimaryDiagnosis   OtherDiagnosis   ComorbidDiagnosis
             AN                 Depression
             AN
             AN                 Depression       PTSD
             AN                                  Depression
What's inside the [] must be (transformable to) a boolean for the subset to work. For example:
x<-1:5
x[x>3]
#4 5
x>3
# F F F T T
works because the condition is a boolean vector. Sometimes the boolean is implicit, as in dataframe[,"var"], which means dataframe[,colnames(dataframe)=="var"], but R must be able to turn it into a boolean somehow.
EDIT: As pointed out by beginneR, you can also subset with something like df[,c(1,3)], which is numeric but works the same way as df[,"var"]. I like to see that kind of subset as an implicit boolean, since it amounts to a yes/no choice, but you may well disagree and only consider that it lets R select columns and rows.
In your case, the conditions you use are invalid (dataframe$OtherDiagnosis, for example, is not a boolean).
You would need something like rowSums(df[,c("var1","var2","var3")]=="")==3, which is a valid condition.
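Applied to the question, the same rowSums() idea can build the Yes/No column directly. A minimal sketch, assuming the four column names listed in the question and that a diagnosis is recorded as the exact string "Depression"; the example rows and the helper names diag_cols and has_dep are made up for illustration.

```r
# Hypothetical data using the four columns named in the question
dataframe <- data.frame(
  ComorbidDiagnosis       = c("Depression", "", "PTSD", ""),
  OtherDiagnosis          = c("", "", "Depression", ""),
  DischargeDiagnosis      = c("", "", "", ""),
  OtherDischargeDiagnosis = c("", "Anxiety", "", "Depression"),
  stringsAsFactors = TRUE
)

diag_cols <- c("ComorbidDiagnosis", "OtherDiagnosis",
               "DischargeDiagnosis", "OtherDischargeDiagnosis")

# as.matrix() turns the factor columns into one character matrix,
# so the comparison yields a logical matrix with one row per patient
has_dep <- rowSums(as.matrix(dataframe[, diag_cols]) == "Depression") > 0

# "Yes" if any of the four columns records depression, otherwise "No"
dataframe$ynDepression <- factor(ifelse(has_dep, "Yes", "No"),
                                 levels = c("No", "Yes"))
as.character(dataframe$ynDepression)
# "Yes" "No" "Yes" "Yes"
```

This covers the "No" case without touching factor levels: every patient gets a value from a single logical condition.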
