Sorry about the title. I'm actually having a hard time figuring out how to even phrase the question, which is why I can't just google it.
I want to get information from a data frame in R using a variable as the column title.
test = data.frame(season=c('winter','summer'), temp=c('cold','hot'))
what.season = 'winter'
test$what.season
The third line obviously doesn't work, but what I am trying to pass it is the value of what.season so that it reads test$winter and returns 'cold'
Edit for future readers: I'm tired and I phrased it wrong, but the correct answer got at what I was trying to do.
Here is how I would do it
test[test$season == "winter", ]$temp
The $ operator at the end selects the column of interest, while the comparison operator == selects the row of interest.
You can also use the subset function:
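To see why this works, it can help to look at the two steps separately. A minimal sketch using the test data frame from the question (the intermediate names is.winter and result are only for illustration):

```r
test <- data.frame(season = c("winter", "summer"), temp = c("cold", "hot"))
what.season <- "winter"

# Step 1: the comparison yields one TRUE/FALSE per row
is.winter <- test$season == what.season
is.winter

# Step 2: keep the matching rows, then pull out the temp column
result <- test[is.winter, ]$temp
as.character(result)   # as.character() in case temp is stored as a factor
```

Because what.season is evaluated before the comparison, changing its value to 'summer' is enough to get 'hot' back instead.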
> subset(test, season==what.season, select=temp)
temp
1 cold
You can use the %in% operator:
test$temp[test$season%in%what.season]
test$season%in%what.season will give a logical output after searching all rows (of the column test$season) for the values of what.season (winter). You can then use the logical output to filter out values from the column test$temp.
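One situation where %in% is handy is when the variable holds more than one value. The sketch below assumes a third row ('autumn', 'mild') and a vector named wanted, neither of which is in the original question:

```r
# Extended toy data; the autumn row is an assumption for illustration
test <- data.frame(season = c("winter", "summer", "autumn"),
                   temp = c("cold", "hot", "mild"))
wanted <- c("winter", "autumn")   # hypothetical: more than one season

keep <- test$season %in% wanted   # one TRUE/FALSE per row of test
temps <- as.character(test$temp[keep])
temps
```

With == and a vector longer than one, R recycles the shorter vector element by element, which is usually not what you want here; %in% checks each row against the whole set.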
The shortest way (that I know of) would be test[test$season==what.season, 'temp'].
So I kind of already know the possible solution but I don't know how to exactly go about it so please give me a bit of grace here.
I have a dataset of YouTube trends. I want to read the values from two columns (likes and dislikes) and, based on their contents, make an entry in a new column. If the likes are higher than the dislikes, the video should be labeled 'positive', and if it has more dislikes it should be 'negative'.
I'm primarily not sure how to go about this since most of the previous asks are based off of one column rather than two. I know some mentioned using cut, but would it still work the same?
all help is appreciated, thanks.
You can use a simple ifelse :
df$new_col <- ifelse(df$likes > df$dislikes, 'positive', 'negative')
This can also be written without ifelse as :
df$new_col <- c('negative', 'positive')[as.integer(df$likes > df$dislikes) + 1]
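A quick way to convince yourself that the two forms agree is to run both on toy data. The values below are made up; only the column names likes and dislikes come from the question:

```r
# Toy data; the numbers are invented for illustration
df <- data.frame(likes = c(10, 3, 5), dislikes = c(2, 8, 5))

a <- ifelse(df$likes > df$dislikes, "positive", "negative")
b <- c("negative", "positive")[as.integer(df$likes > df$dislikes) + 1]

identical(a, b)   # both approaches give the same labels
a
```

Note that with either form a tie (likes equal to dislikes) ends up as 'negative', since the test is a strict >. If ties need their own label, the condition would have to be extended.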
You can use Vectorize to create a vectorized version of a function. If your function takes two arguments, vfunc <- Vectorize(func) will allow you to call df$newcol <- vfunc(df$likes, df$dislikes), which returns the result for each row in a vector that is assigned to a new column.
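As a rough sketch of the Vectorize approach, assuming func is a function you wrote for a single pair of values (the body below is hypothetical, not from the question):

```r
# Hypothetical scalar function: handles one pair of values at a time
func <- function(likes, dislikes) {
  if (likes > dislikes) "positive" else "negative"
}

vfunc <- Vectorize(func)   # wraps func so it is applied element by element

df <- data.frame(likes = c(10, 3), dislikes = c(2, 8))
df$newcol <- vfunc(df$likes, df$dislikes)
df$newcol
```

For a simple comparison like this, ifelse is shorter; Vectorize is mainly useful when the per-row logic is already written as a scalar function.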
My first question here and I am not very experienced, however I hope this question is easy enough to answer since I only want to know if what I describe in the title is possible.
I have multiple dataframes taken from online capacity tests participants did.
For all items I have response, score, and duration variables, among others.
Now I want to delete rows where all response variables are NA. So I can't just use a command that deletes rows where everything is NA, but there are also too many columns to do it by hand. I also want to keep the data frame together while doing it, in order to really drop the complete rows, so just extracting all response variables doesn't sound like a good option.
However, apart from a 3-digit number based on the specific item, the response variable names are basically the same.
So instead of writing a very long, impractical line that mentions every response variable and drops the row if they all contain NA, is there a way to use only part of a variable name, for example the end, so that R checks the condition for all variables ending that way?
simplified e.g: instead of
newdf <- olddf[!(olddf$item123response != NA & olddf$item131response != NA & etc),]
Can I just do something like newdf <- olddf[!(olddf$xxxresponse != NA),] ?
I tried to google an answer but I didn't know how to frame my question effectively.
Thanks in advance!
Try This
newdf <- olddf[complete.cases(olddf[, grep('response', names(olddf))]), ]
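One caveat: complete.cases keeps only the rows that have no NA in any of the selected columns, so it also drops rows where just some responses are missing. If the goal is to drop only the rows where all response columns are NA, a sketch along these lines may be closer (the toy data and the names resp and n.answered are assumptions):

```r
# Toy data; column names are made up to mimic the question
olddf <- data.frame(
  item123response = c(1, NA, NA),
  item131response = c(NA, NA, 2),
  item123score    = c(5, 6, 7)
)

# Positions of the columns whose names end in "response"
resp <- grep("response$", names(olddf))

# Number of non-missing responses in each row
n.answered <- rowSums(!is.na(olddf[, resp, drop = FALSE]))

# Keep rows with at least one non-missing response
newdf <- olddf[n.answered > 0, ]
newdf
```

Here only the second row is removed, because it is the only one with NA in every response column; the score column is ignored by the check but stays in the result.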
I am new to 'R' and 'Stackoverflow' so forgive me for the incredibly basic question. I'm trying to find the 'index' of the first female in my dataset.
Code Snapshot
My overall dataset is called 'bike', so first I thought it would be a good idea to assign a new vector of just the genders...
bike$genders
Then I tried using the function:
match(1, genders)
match(F, genders)
Neither of which worked! I know this is and should be relatively simple but I'm just starting out so I really appreciate your help.
Probably the most direct method would be to use
match("F", bike[,"genders"]), which will return the index of the first match.
If you want the row numbers, this prints the matching rows, and the row names shown on the left are their indices:
bike[bike$genders=="F",]
and if you only want the row numbers stored in a vector:
rnam<-row.names(bike[bike$genders=="F",])
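For comparison, here is a small sketch on made-up data that contrasts match (first hit only) with which (every hit). The original attempts probably failed for two reasons: bike$genders on its own only prints the column without creating a genders object, and an unquoted F is R's shorthand for FALSE rather than the string "F".

```r
# Toy stand-in for the bike data; the values are made up
bike <- data.frame(genders = c("M", "M", "F", "M", "F"))

first.f <- match("F", bike$genders)     # index of the first "F"
all.f   <- which(bike$genders == "F")   # indices of every "F"

first.f
all.f
```

Unlike row.names, which returns character strings, both of these return integer positions.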
I have been trying to figure out how to use the apply functions. I know plyr is out there; I will learn that later. But I need help. I can get output by actually typing the object name in, but I am trying to loop a list of names through it. The code is as follows:
list<-noquote(c("T","AAVL"))
lapply(list,function(i) xts(l.df$i[,-1:-5],order.by=as.POSIXct(rownames(l.df$i))))
If I just do xts(l.df$T[,-1:-5],order.by=as.POSIXct(rownames(l.df$T)))
I get the xts file that I need. Could someone please help me loop the names without quotes into the lapply(), so that I could have this work for numerous elements in my list? Thank you!
There are a number of ways to subset a list in R. See https://ramnathv.github.io/pycon2014-r/learn/subsetting.html or http://adv-r.had.co.nz/Subsetting.html for more detailed discussion.
However, in your case the issue is that the dollar operator $ takes a literal name rather than the value of a variable. So myList[["item"]] and myList$item are equivalent, but myList[[i]] and myList$i are not. In the example you gave, you're trying to find the member of l.df called "i", not the one referenced by the variable i. The noquote class you used purely affects printing of a character vector; it has no effect on subsetting.
The version of your code that works doesn't work as you explain in your comment. It works because you're now subsetting the column whose name is stored in i not the one called "i".
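The difference between $ and [[ can be sketched without xts. The list below is a made-up stand-in for l.df, and tickers, by.dollar, and by.bracket are illustrative names:

```r
# Toy stand-in for l.df: a named list of data frames (contents are made up)
l.df <- list(
  T    = data.frame(a = 1:2, b = 3:4),
  AAVL = data.frame(a = 5:6, b = 7:8)
)
tickers <- c("T", "AAVL")   # plain character vector; noquote() is not needed

# $ looks for an element literally named "i", so every result is NULL
by.dollar <- lapply(tickers, function(i) l.df$i)

# [[ evaluates i, so each element is looked up by the name stored in i
by.bracket <- lapply(tickers, function(i) l.df[[i]])

is.null(by.dollar[[1]])
by.bracket[[2]]
```

Applied to the original code, this would mean replacing l.df$i with l.df[[i]] in both places inside the function.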
I want to add 106 new columns to a data frame, each as long as the df of course and filled with zeros (0). How would I loop over i in this case:
geo <- unique(df$geo)
geo
[1] "AL" "AT1" "AT2" "AT3" "BE1" "BE2" "BE3"
for(i in geo) {
df$i <- v(0,length(df)
}
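A possible way to write that loop, as a sketch on toy data: df$i would create a column literally named "i", so [[i]] is used instead, and nrow(df) replaces length(df), which counts columns rather than rows. It assumes v( was meant to be rep(.

```r
# Toy data; only the geo column follows the question
df <- data.frame(geo = c("AL", "AT1", "AT2", "AL"), stringsAsFactors = FALSE)
geo <- unique(df$geo)

for (i in geo) {
  # [[i]] uses the value of i as the column name
  df[[i]] <- rep(0, nrow(df))
}

# Equivalent without a loop:
# df[geo] <- 0

names(df)
```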
I want to create a column that codes for whether patients have had a comorbid diagnosis of depression or not. Problem is, the diagnosis can be recorded in one of 4 columns:
ComorbidDiagnosis;
OtherDiagnosis;
DischargeDiagnosis;
OtherDischargeDiagnosis.
I've been using
levels(dataframe$ynDepression)[levels(dataframe$ComorbidDiagnosis)=="Depression"]<-"Yes"
for all 4 columns but I don't know how to code those who don't have a diagnosis in any of the columns. I tried:
levels(dataframe$ynDepression)[levels(dataframe$DischOtherDiagnosis &
dataframe$OtherDiagnosis &
dataframe$ComorbidDiagnosis &
dataframe$DischComorbidDiagnosis)==""]<-"No"
I also tried using && instead but it didn't work. Am I missing something?
Thanks in advance!
Edit: I tried uploading an image of some example data but I don't have enough reputation to upload images yet. I'll try to put an example here, but it might not work:
Patient ID   PrimaryDiagnosis   OtherDiagnosis   ComorbidDiagnosis
             AN                 Depression
             AN
             AN                 Depression       PTSD
             AN                                  Depression
What's inside the [] must be (transformable to) a boolean for the subset to work. For example:
x<-1:5
x[x>3]
#4 5
x>3
# F F F T T
works because the condition is a logical (boolean) vector. Sometimes the boolean nature is implicit, as in dataframe[,"var"], which means dataframe[,colnames(dataframe)=="var"], but R must be able to make it a boolean somehow.
EDIT : As pointed out by beginneR, you can also subset with something like df[,c(1,3)], which is numeric but works the same way as df[,"var"]. I like to see that kind of subset as implicit booleans as it enables a yes/no choice but you may very well not agree and only consider that they enable R to select columns and rows.
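The three styles of column selection mentioned here can be compared directly. A small sketch on a made-up data frame (the column names var, other, and third are placeholders):

```r
df <- data.frame(var = 1:3, other = 4:6, third = 7:9)

by.logical <- df[, colnames(df) == "var"]   # logical vector: TRUE FALSE FALSE
by.name    <- df[, "var"]                   # column name
by.index   <- df[, 1]                       # column position

identical(by.logical, by.name)
identical(by.name, by.index)
```

All three return the same column, which is the sense in which a name or a position can be read as an implicit yes/no choice per column.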
In your case, the conditions you use are invalid (dataframe$OtherDiagnosis, for example, is not a logical vector).
You would need something like rowSums(df[,c("var1","var2","var3")]=="")==3, which is a valid condition.
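Applied to the depression question, the same rowSums idea could look roughly like this. The toy data, the use of empty strings for "no diagnosis", and the names dx.cols and has.dep are assumptions; the sketch tests for "Depression" directly instead of counting empty cells:

```r
# Toy data; empty strings mark "no diagnosis recorded"
dataframe <- data.frame(
  OtherDiagnosis          = c("Depression", "", "Depression", ""),
  ComorbidDiagnosis       = c("", "", "PTSD", "Depression"),
  DischargeDiagnosis      = c("", "", "", ""),
  OtherDischargeDiagnosis = c("", "", "", ""),
  stringsAsFactors = FALSE
)
dx.cols <- c("ComorbidDiagnosis", "OtherDiagnosis",
             "DischargeDiagnosis", "OtherDischargeDiagnosis")

# TRUE for a row if any of the four columns equals "Depression"
has.dep <- rowSums(dataframe[, dx.cols] == "Depression") > 0

dataframe$ynDepression <- ifelse(has.dep, "Yes", "No")
dataframe$ynDepression
```

If the real columns contain NA rather than empty strings, rowSums would need na.rm = TRUE to avoid NA results for those rows.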