Assign a value based on the numbers in two separate columns in R

I think I already know the possible solution, but I don't know exactly how to go about it, so please bear with me.
I have a dataset of YouTube trends. I want to read the values from two columns (likes and dislikes) and, based on their contents, make an entry in a new column. If a video has more likes than dislikes it should be labelled 'positive', and if it has more dislikes it should be 'negative'.
I'm mainly unsure how to go about this because most previous questions are based on one column rather than two. I know some mentioned using cut, but would it still work the same way?
All help is appreciated, thanks.

You can use a simple ifelse :
df$new_col <- ifelse(df$likes > df$dislikes, 'positive', 'negative')
This can also be written without ifelse as :
df$new_col <- c('negative', 'positive')[as.integer(df$likes > df$dislikes) + 1]
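As a rough sketch of how this behaves on a small, made-up data frame (the values below are invented for illustration): note that ifelse puts ties (likes equal to dislikes) in the 'negative' group. The 'neutral' label for ties is an assumption, not something the question asked for. cut can also be made to work by applying it to the difference of the two columns.

```r
# Made-up example data; the real dataset is assumed to have these two columns.
df <- data.frame(likes = c(10, 3, 5), dislikes = c(2, 8, 5))

# Two-way label: ties fall into 'negative' because the test is strictly '>'.
df$new_col <- ifelse(df$likes > df$dislikes, "positive", "negative")

# Optional three-way label; 'neutral' is a hypothetical label for ties.
df$with_ties <- ifelse(df$likes > df$dislikes, "positive",
                       ifelse(df$likes < df$dislikes, "negative", "neutral"))

# cut() on the difference: (-Inf, 0] is 'negative', (0, Inf] is 'positive'.
df$by_cut <- cut(df$likes - df$dislikes,
                 breaks = c(-Inf, 0, Inf),
                 labels = c("negative", "positive"))

print(df)
```

The cut version returns a factor rather than a character vector, which may or may not be what you want.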

You can use Vectorize to create a vectorized version of a function. If func takes two arguments, vfunc <- Vectorize(func) lets you call df$newcol <- vfunc(df$likes, df$dislikes); the result for each row is returned in a vector that is assigned to the new column.
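A minimal sketch of the Vectorize approach, assuming a scalar helper (here called label_video, a hypothetical name) that compares one pair of values at a time:

```r
# Made-up example data.
df <- data.frame(likes = c(10, 3), dislikes = c(2, 8))

# Hypothetical scalar function: works on a single pair of numbers only.
label_video <- function(likes, dislikes) {
  if (likes > dislikes) "positive" else "negative"
}

# Vectorize() wraps it (via mapply) so it can be applied element by element.
vlabel <- Vectorize(label_video)
df$newcol <- vlabel(df$likes, df$dislikes)

print(df)
```

For a simple comparison like this, ifelse is likely faster, since Vectorize still calls the function once per row.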

Related

Addressing columns based on only parts of the name in order to simplify lines

My first question here and I am not very experienced, however I hope this question is easy enough to answer since I only want to know if what I describe in the title is possible.
I have multiple dataframes taken from online capacity tests participants did.
For all items I have response, score, and duration variables, among others.
Now I want to delete rows where all response variables are NA. I can't just use a command that deletes rows where everything is NA, and there are too many columns to do it by hand. I also want to keep the dataframe together while doing it so that the complete rows are really dropped, so just extracting all response variables doesn't sound like a good option.
However, apart from a 3-digit number based on the specific item, the response variable names are basically the same.
So instead of writing a very long, impractical line that mentions every response variable and drops the row if they all contain NA, is there a way to use only the end of a variable's name, so that R checks the condition for all variables ending that way?
Simplified example: instead of
newdf <- olddf[!(olddf$item123response != NA & olddf$item131response != NA & etc),]
Can I just do something like newdf <- olddf[!(olddf$xxxresponse != NA),] ?
I tried to google an answer but I didn't know how to frame my question effectively.
Thanks in advance!
Try this:
newdf <- olddf[complete.cases(olddf[, grep('response', names(olddf))]), ]
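Note that complete.cases() drops a row as soon as any of the selected columns is NA. If rows should only be dropped when all response columns are NA, as the question describes, a sketch along these lines may be closer (the example data and the "response$" pattern, which matches names ending in "response", are assumptions):

```r
# Made-up example data with the naming scheme described in the question.
olddf <- data.frame(
  id              = 1:3,
  item123response = c(1, NA, NA),
  item131response = c(NA, NA, 2),
  item123score    = c(5, 6, 7)
)

# Columns whose names end in "response".
resp_cols <- grep("response$", names(olddf))

# TRUE for rows where every response column is NA.
all_na <- rowSums(!is.na(olddf[, resp_cols, drop = FALSE])) == 0

# Keep the full rows (all columns) that have at least one response.
newdf <- olddf[!all_na, ]
print(newdf)
```

Here only the second row is removed; with complete.cases() all three rows would be dropped, because each has at least one NA response.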

In R, dataframe[-NULL] returns an empty dataframe

I'm creating some routines in R to ease model creation and to distinguish several groups based on several parameters (ex: original watches VS fakes ones using watches common attributes).
During the process, I keep track of the potentially excluded lines in a vector (empty at first), and I get rid of them at the end using:
model$var <- raw_data[-line_excluded,]
The problem is that if line_excluded is c() (i.e. no line excluded), model$var is an empty dataframe, whereas in that case I want all the lines of the dataframe.
The only solution I have thought of is to use
if (!is.null(line_excluded)){
model$var <- raw_data[-line_excluded,]}
But that's not really pretty, and I have several tracking variables as line_excluded which need that.
Thanks for the help
You can do it another way using setdiff(), which can deal with an empty line_excluded, i.e.,
model$var <- raw_data[setdiff(seq(nrow(raw_data)),line_excluded),]
You can also try:
model$var <- raw_data[!(1:nrow(raw_data) %in% line_excluded),]
This is similar to what @ThomasIsCoding suggested: you look for the row numbers that are not in your line_excluded.
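A small sketch, on made-up data, of why the negative index fails for an empty vector and how the two suggestions behave. The helper name drop_rows is hypothetical, and seq_len() is used instead of seq(nrow(...)) so that a zero-row data frame is also handled:

```r
# Made-up example data.
raw_data <- data.frame(x = 1:5, y = letters[1:5])

# Hypothetical helper wrapping the setdiff() idea.
drop_rows <- function(data, excluded) {
  data[setdiff(seq_len(nrow(data)), excluded), , drop = FALSE]
}

# With nothing excluded, the negative index selects zero rows...
line_excluded <- c()
n_negative_index <- nrow(raw_data[-line_excluded, ])

# ...while setdiff() and %in% both keep every row.
n_setdiff <- nrow(drop_rows(raw_data, line_excluded))
n_in <- nrow(raw_data[!(seq_len(nrow(raw_data)) %in% line_excluded), ])

# With some rows excluded, the result is the same as with a negative index.
kept <- drop_rows(raw_data, c(2, 4))
print(kept)
```

Wrapping the logic in one helper avoids repeating the is.null() check for each tracking variable.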

Selecting rows of a dataframe fulfilling a specific condition in R

First of all, I have to say that this is my first post. Although I looked for the answer using the search box, I may have passed over the right topic without realizing it, so sorry in advance if that is the case.
Having said that, my problem is the following:
I have a data table composed of several columns.
I have to select the rows that fulfil one specific condition, e.g. which(DT_$var>value, arr.ind = T) or which(DT_$var>value && DT_$var2>value2, arr.ind = T)
I have to keep them in a new data frame.
My approach was the following one but it is not working, probably because I did not understand the loops correctly:
while (i in nrow(DT)) {
if(DT$var[i]>value){
DT_aux[i]=DT[i]
i<-i+1
}
}
Error in if (DT$value[i] > 45) { : argument is of length zero
I hope that you can help me
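There is a very good chance that you want to use dplyr and its filter function. Note that the conditions must be combined with the element-wise & (or separated by a comma), not &&, which only compares single values. It would work like this:
library(dplyr)
DT %>% filter(var > value & var2 > value2)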
You don't need to use DT$var and DT$var2 here; dplyr knows what you mean when you refer to variables.
You can, of course, do the same with base R, but this kind of work is exactly what dplyr was made for, so sticking with base R, in this case, is just masochism.
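For comparison, a minimal base R sketch of the same selection without an explicit loop. The data and thresholds are made up, and DT is assumed to be a plain data frame:

```r
# Made-up example data and thresholds.
DT <- data.frame(var = c(10, 50, 60), var2 = c(1, 2, 9))
value  <- 45
value2 <- 5

# Logical indexing: '&' compares element by element, one result per row.
DT_aux <- DT[DT$var > value & DT$var2 > value2, ]

# subset() expresses the same condition without repeating 'DT$'.
DT_aux2 <- subset(DT, var > value & var2 > value2)

print(DT_aux)
```

If the columns can contain NA, logical indexing returns NA rows for them, whereas subset() and which() drop them.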

Put a variable into an object in R

Sorry about the title. I'm actually having a hard time figuring out how to even phrase the question, which is why I can't just google it.
I want to get information from a data frame in R using a variable as the column title.
test = data.frame(season=c('winter','summer'), temp=c('cold','hot'))
what.season = 'winter'
test$what.season
The third line obviously doesn't work, but what I am trying to pass it is the value of what.season so that it reads test$winter and returns 'cold'
Edit for future readers: I'm tired and I phrased it wrong, but the correct answer got at what I was trying to do.
Here is how I would do it
test[test$season == "winter", ]$temp
The $ operator at the end selects the column of interest, while the logical operator == selects the row of interest.
You can also use the subset function:
> subset(test, season==what.season, select=temp)
temp
1 cold
You can use the %in% operator:
test$temp[test$season%in%what.season]
test$season%in%what.season will give a logical output after searching all rows (of the column test$season) for the values of what.season (winter). You can then use the logical output to filter out values from the column test$temp.
The shortest way (that I know of) would be test[test$season==what.season, 'temp'].
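As a side note on the question as literally phrased (using a variable as the column name): $ takes its argument literally, whereas [[ ]] evaluates it. A minimal sketch, where col_name is a hypothetical variable introduced for illustration:

```r
test <- data.frame(season = c("winter", "summer"), temp = c("cold", "hot"))
what.season <- "winter"

# Hypothetical variable holding the name of the column to read.
col_name <- "temp"

# test$col_name would look for a column literally called 'col_name';
# [[ ]] uses the value stored in the variable instead.
whole_column <- test[[col_name]]

# Both the row condition and the column can come from variables.
result <- test[test$season == what.season, col_name]
print(result)
```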

Cumulative sum for n rows

I have been trying to produce a command in R that allows me to produce a new vector where each row is the sum of 25 rows from a previous vector.
I've tried making a function to do this, this allows me to produce a result for one data point.
I'll show where I have got to. I realise this is probably a fairly basic question, but it is one I have been struggling with. Any help would be greatly appreciated:
example<-c(1;200)
fun.1<-function(x)
{sum(x[1:25])}
checklist<-sapply(check,FUN=fun.1)
This then supplies me with a vector of length 200 where all values are NA.
Can anybody help at all?
Your example is a bit noisy: c(1;200) is not valid R (you probably want 1:200 there, or, if you would like to have a list of vectors, something built with rep), and there is no check variable (it should have been example).
Here's the code I think you probably need (as far as I was able to understand it):
x <- rep(list(1:200), 5)
f <- function(y) {y[1:20]}
sapply(x, f)
Next time please be more specific, and try out the code you post as an example before submitting a question.
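Since the question leaves open whether the 25-row sums should be consecutive blocks or a sliding window, here is a hedged sketch of both readings in base R, using 1:200 as the example vector (what the question most likely intended):

```r
example <- 1:200
n <- 25

# Reading 1: non-overlapping blocks (rows 1-25, 26-50, ...), giving 8 sums.
block_id   <- ceiling(seq_along(example) / n)
block_sums <- unname(tapply(example, block_id, sum))

# Reading 2: sliding window (rows 1-25, 2-26, ...), giving 176 sums.
# Each window sum is a difference of two cumulative sums.
cs <- cumsum(example)
rolling_sums <- cs[n:length(cs)] - c(0, cs[seq_len(length(cs) - n)])

print(block_sums)
print(head(rolling_sums))
```

The original sapply call returned NA because sapply passes one element at a time to the function, so x[1:25] indexes past the end of a length-one vector.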
