if-condition with previously unknown set of variables - r

I did not find an answer to this - in my opinion quite basic - question. So in case I missed out on an already existing solution, I am sorry for that and would appreciate a link to the thread.
I am facing the following problem:
I want to create an if-condition that checks whether or not an observation fulfills certain criteria. However, the set of variables I want to test is not known in advance, as they are created in the process and might change depending on the data fed into the model.
I now have hard-coded the variable names, like below:
data$selectvar <- ifelse(data$crit1 == 1 | data$crit2 == 1 | data$crit3 == 1, 1, 0)
In the example above, there could be cases where, e.g., I only have crit1 and crit3 in the data set data, but not crit2. The if-condition would throw an error in these cases.
The way I have named the variables is that they all have the same prefix, so maybe there is a way to work with grepl or similar, but I don't know how.

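There are many ways to do it, but if you want to use ifelse, you can select the columns by their prefix with grep and then check each row for at least one 1:
data$selectvar <- ifelse(rowSums(data[, grep("^crit", colnames(data)), drop = FALSE] == 1) > 0, 1, 0)
Note that ifelse(data[, grep("crit", colnames(data))] == 1, 1, 0) on its own returns a matrix with one column per criterion, so it still has to be collapsed per row, here with rowSums.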
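To see the prefix-based selection at work, here is a minimal sketch on made-up data in which crit2 is missing. The toy values and the helper names crit_cols and hits are assumptions for illustration only. NA values are treated as "criterion not met", which differs slightly from how | handles NA.

```r
# Toy data: crit2 is deliberately absent (hypothetical example values)
data <- data.frame(id = 1:4,
                   crit1 = c(1, 0, 0, 0),
                   crit3 = c(0, 0, 1, 0))

# Pick up whatever columns currently start with "crit"
crit_cols <- grep("^crit", colnames(data), value = TRUE)

# Logical matrix, one column per criterion; drop = FALSE keeps the
# selection two-dimensional even if only one criterion column exists
hits <- data[, crit_cols, drop = FALSE] == 1

# A row is selected if at least one criterion equals 1
# (na.rm = TRUE counts NA as "not met")
data$selectvar <- ifelse(rowSums(hits, na.rm = TRUE) > 0, 1, 0)
data$selectvar
```

Because the columns are looked up at run time, the same lines keep working when a criterion column is added or dropped.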

Related

How to combine two actions into one object

I recently just started with R a few weeks ago at the Uni. We were given a problem which we had to solve. However in this problem, I find that there are two answers that fit the question:
Verify that you created lo_heval correctly (incl. missing values). Store your verification in the object proof2.
So I find this is correct:
proof2 <- soep[1:100, c("heval", "lo_heval")]
But I think that this answer is also correct:
proof2 <- table(soep$heval, soep$lo_heval, useNA = "always")
Instead of having to decide for one answer, how do I combine them both into the object? I tried to use &, but I get an error. I may be using it wrong.
Prof. if you're seeing this, please don't fail me. I just can't decide between them.
Thanks in advance!
R lists can hold any arbitrary objects in them, so you could use
proof2 <- list(
soep[1:100, c("heval", "lo_heval")],
table(soep$heval, soep$lo_heval, useNA = "always")
)
However, to my mind 100 rows of two columns isn't proof: it's an exercise to look through those and verify things are right. (And what about the rows past 100? It's a decent spot check, but if there are more rows in the data it is strong evidence rather than proof.) The table approach, on the other hand, seems succinct and effective.
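If both pieces are kept, naming the list elements makes them easier to retrieve later. The sketch below uses a small made-up stand-in for soep, and the element names rows and table are hypothetical choices, not part of the exercise.

```r
# Hypothetical stand-in for the soep data from the exercise
soep <- data.frame(heval = c(1, 2, NA, 5),
                   lo_heval = c(0, 0, NA, 1))

proof2 <- list(
  # spot check of the first rows (at most 100)
  rows  = head(soep[, c("heval", "lo_heval")], 100),
  # cross-tabulation that also counts missing values
  table = table(soep$heval, soep$lo_heval, useNA = "always")
)

names(proof2)   # element names of the list
proof2$table    # retrieve one element by name
```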

Addressing columns based on only parts of the name in order to simplify lines

My first question here and I am not very experienced, however I hope this question is easy enough to answer since I only want to know if what I describe in the title is possible.
I have multiple dataframes taken from online capacity tests participants did.
For all items I have response, score, and duration variables, among others.
Now I want to delete rows where all response variables are NA. I can't just use a command that deletes rows where everything is NA, but there are also too many columns to do it by hand. I also want to keep the dataframe together while doing it, so that the complete rows are really dropped, so just extracting all response variables doesn't sound like a good option.
However, apart from a 3-digit number based on the specific item, the response variable names are basically the same.
So instead of writing a very long, impractical line that mentions all response variables and drops the row if they all contain NA, is there a way to use only part of a variable's name, for example the end, so that R checks the condition for all variables ending that way?
Simplified, e.g., instead of
newdf <- olddf[!(olddf$item123response != NA & olddf$item131response != NA & etc),]
Can I just do something like newdf <- olddf[!(olddf$xxxresponse != NA),] ?
I tried to google an answer but I didn't know how to frame my question effectively.
Thanks in advance!
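Try this. It selects every column whose name ends in "response" and keeps the rows that have at least one non-missing value in those columns:
newdf <- olddf[rowSums(!is.na(olddf[, grep("response$", names(olddf)), drop = FALSE])) > 0, ]
Note that complete.cases(olddf[, grep("response", names(olddf))]) would instead drop every row that has an NA in any response column, which is stricter than what you asked for.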
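A small worked example may make the "all response columns are NA" condition easier to check. The data frame below is made up, with item numbers borrowed from the question. The helper names resp_cols and all_na are assumptions for illustration.

```r
# Hypothetical data: values are made up for illustration
olddf <- data.frame(
  id              = 1:4,
  item123response = c("a", NA, NA, "b"),
  item123score    = c(1, NA, 0, 1),
  item131response = c("c", NA, "d", NA),
  stringsAsFactors = FALSE
)

# Columns whose names end in "response"
resp_cols <- grep("response$", names(olddf), value = TRUE)

# TRUE for rows in which every response column is NA
all_na <- rowSums(!is.na(olddf[, resp_cols, drop = FALSE])) == 0

# Drop only those rows; all other columns stay attached
newdf <- olddf[!all_na, ]
newdf$id
```

Only the row with id 2 is removed here, since rows 3 and 4 each still have one response recorded.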

R: creating factor using data from multiple columns

I want to create a column that codes for whether patients have had a comorbid diagnosis of depression or not. Problem is, the diagnosis can be recorded in one of 4 columns:
ComorbidDiagnosis;
OtherDiagnosis;
DischargeDiagnosis;
OtherDischargeDiagnosis.
I've been using
levels(dataframe$ynDepression)[levels(dataframe$ComorbidDiagnosis)=="Depression"]<-"Yes"
for all 4 columns but I don't know how to code those who don't have a diagnosis in any of the columns. I tried:
levels(dataframe$ynDepression)[levels(dataframe$DischOtherDiagnosis &
dataframe$OtherDiagnosis &
dataframe$ComorbidDiagnosis &
dataframe$DischComorbidDiagnosis)==""]<-"No"
I also tried using && instead but it didn't work. Am I missing something?
Thanks in advance!
Edit: I tried uploading an image of some example data but I don't have enough reputations to upload images yet. I'll try to put an example here but might not work:
Patient ID PrimaryDiagnosis OtherDiagnosis ComorbidDiagnosis
_________AN__________Depression
_________AN
_________AN__________Depression______PTSD
_________AN_________________________Depression
What's inside the [] must be (transformable to) a boolean for the subset to work. For example:
x<-1:5
x[x>3]
# 4 5
x>3
# FALSE FALSE FALSE TRUE TRUE
works because the condition is a boolean vector. Sometimes the boolean nature can be implicit, as in dataframe[,"var"], which means dataframe[,colnames(dataframe)=="var"], but R must be able to make it a boolean somehow.
EDIT: As pointed out by beginneR, you can also subset with something like df[,c(1,3)], which is numeric but works the same way as df[,"var"]. I like to see that kind of subset as implicit booleans, as it enables a yes/no choice, but you may very well disagree and only consider that they enable R to select columns and rows.
In your case, the conditions you use are invalid (dataframe$OtherDiagnosis, for example, is a column of diagnoses, not a logical condition).
You would need something like rowSums(df[,c("var1","var2","var3")]=="")==3, which is a valid condition.
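Applied to the depression question, the rowSums idea could look like the sketch below. The example data are made up, the column names follow the list in the question, and is_dep and diag_cols are hypothetical helper names. It assumes empty cells are stored as "".

```r
# Hypothetical example data; column names follow the question
df <- data.frame(
  ComorbidDiagnosis       = c("", "", "PTSD", "Depression"),
  OtherDiagnosis          = c("Depression", "", "Depression", ""),
  DischargeDiagnosis      = c("", "", "", ""),
  OtherDischargeDiagnosis = c("", "Anxiety", "", ""),
  stringsAsFactors = FALSE
)

diag_cols <- c("ComorbidDiagnosis", "OtherDiagnosis",
               "DischargeDiagnosis", "OtherDischargeDiagnosis")

# Logical matrix: TRUE where a cell holds "Depression"
is_dep <- df[, diag_cols] == "Depression"

# "Yes" if any of the four columns records depression, otherwise "No"
df$ynDepression <- factor(
  ifelse(rowSums(is_dep, na.rm = TRUE) > 0, "Yes", "No"),
  levels = c("No", "Yes")
)
df$ynDepression
```

This codes the "No" group directly, so there is no need to modify factor levels column by column.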

Put a variable into an object in R

Sorry about the title. I'm actually having a hard time figuring out how to even phrase the question, which is why I can't just google it.
I want to get information from a data frame in R using a variable as the column title.
test = data.frame(season=c('winter','summer'), temp=c('cold','hot'))
what.season = 'winter'
test$what.season
The third line obviously doesn't work, but what I am trying to pass it is the value of what.season so that it reads test$winter and returns 'cold'
Edit for future readers: I'm tired and I phrased it wrong, but the correct answer got at what I was trying to do.
Here is how I would do it
test[test$season == "winter", ]$temp
The $ operator at the end selects the column of interest, while the logical operator == selects the row of interest.
You can also use subset function
> subset(test, season==what.season, select=temp)
temp
1 cold
You can use the %in% operator
test$temp[test$season%in%what.season]
test$season%in%what.season will give a logical output after searching all rows (of the column test$season) for the values of what.season (winter). You can then use the logical output to filter out values from the column test$temp.
The shortest way (that I know of) would be test[test$season==what.season, 'temp'].
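For readers who arrive here because of the title: if the column name itself is stored in a variable, $ will not work, because it takes the name literally, but [[ ]] and [ , ] evaluate the variable first. A minimal sketch, in which what.column is a hypothetical variable that is not part of the question:

```r
test <- data.frame(season = c("winter", "summer"),
                   temp   = c("cold", "hot"),
                   stringsAsFactors = FALSE)
what.season <- "winter"
what.column <- "temp"   # hypothetical: column name held in a variable

# [[ ]] looks up the column named by the variable's value
test[[what.column]]

# Row filter by value combined with column selection by variable
test[test$season == what.season, what.column]
```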

using value of a function & nested function in R

I wrote a function in R - called "filtre": it takes a dataframe, and for each line it says whether it should go in say bin 1 or 2. At the end, we have two data frames that sum up to the original input, and corresponding respectively to all lines thrown in either bin 1 or 2. These two sets of bin 1 and 2 are referred to as filtre1 and filtre2. For convenience the values of filtre1 and filtre2 are calculated but not returned, because it is an intermediary thing in a bigger process (plus they are quite big data frame). I have the following issue:
(i) When I later on want to use filtre1 (or filtre2), they simply don't show up... like if their value was stuck within the function, and would not be recognised elsewhere - which would oblige me to copy the whole function every time I feel like using it - quite painful and heavy.
I suspect this is a rather simple thing, but I did search on the web and did not find the answer really (I was not sure of best key words). Sorry for any inconvenience.
Thxs / g.
It's pretty hard to know the best way to achieve what you want, as you do not provide a proper example, but I'll give it a try. If your variables filtre1 and filtre2 are defined inside your function and you do not return them, they will of course not show up in your environment. But you could just return the classification and create filtre1 and filtre2 afterwards:
# example data
df <- data.frame(id = 1:20, x = sample(1:20, 20, replace = TRUE))
filtre <- function(df) {
  # example function, this could of course be done by bins <- df$x < 10
  bins <- numeric(nrow(df))
  for (i in seq_len(nrow(df))) {
    # test the i-th row only, not the whole column
    if (df$x[i] < 10)
      bins[i] <- 1
  }
  return(bins)
}
bins <- filtre(df)
filtre1 <- df[bins == 1, ]  # rows assigned to bin 1
filtre2 <- df[bins == 0, ]  # all remaining rows
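Another option, if both data frames are needed later, is to return them together in a list. This is only a sketch under the same example rule (x below 10 goes to bin 1). The list element names and the small fixed data set are assumptions for illustration.

```r
filtre <- function(df) {
  in_bin1 <- df$x < 10            # example rule; replace with the real one
  # Return both subsets at once; objects created inside a function are
  # only visible outside if they are part of its return value
  list(filtre1 = df[in_bin1, ],
       filtre2 = df[!in_bin1, ])
}

df <- data.frame(id = 1:6, x = c(3, 12, 9, 10, 1, 20))
res <- filtre(df)

res$filtre1$id   # ids of the rows in bin 1
res$filtre2$id   # ids of the rows in bin 2
```

The function then only has to be called once, and res can be passed on to the later steps of the process.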
