Addressing columns based on only parts of the name in order to simplify lines - r

This is my first question here and I am not very experienced, but I hope it is easy enough to answer, since I only want to know whether what I describe in the title is possible.
I have multiple dataframes taken from online capacity tests that participants did.
For all items I have response, score, and duration variables, among others.
Now I want to delete rows where all response variables are NA. So I can't just use a command that deletes rows where everything is NA, but there are also too many columns to do it by hand. I also want to keep the dataframe together while doing it, in order to really drop the complete rows, so just extracting all response variables doesn't sound like a good option.
However, apart from a 3-digit number identifying the specific item, the response variable names are basically the same.
So instead of writing a very long, impractical line that mentions every response variable and drops the row if they all contain NA, is there a way to use only part of a variable name, for example the end of the name, so that R checks the condition for all variables ending that way?
Simplified example: instead of
newdf <- olddf[!(is.na(olddf$item123response) & is.na(olddf$item131response) & etc),]
can I just do something like newdf <- olddf[!(is.na(olddf$xxxresponse)),] ?
I tried to google an answer but I didn't know how to frame my question effectively.
Thanks in advance!

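Try this. grep() picks the columns whose names contain 'response', and complete.cases() keeps only the rows that have no NA in any of those columns (so a row is dropped as soon as one response is NA, not only when all of them are):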
newdf <- olddf[complete.cases(olddf[, grep('response', names(olddf))]), ]
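If the goal is to drop only the rows where every response column is NA, as the question describes, a minimal sketch could look like this. The toy data frame and the `response$` pattern are assumptions for illustration; adjust the pattern to the real column names.

```r
# Toy data; the column names are made up for illustration
olddf <- data.frame(
  item123response = c(1, NA, NA, 4),
  item123score    = c(1, 0, NA, 1),
  item131response = c(2, 5, NA, NA)
)

# Logical vector: TRUE for columns whose names end in "response"
resp_cols <- grepl("response$", names(olddf))

# TRUE for rows where every response column is NA
all_na <- rowSums(is.na(olddf[, resp_cols, drop = FALSE])) == sum(resp_cols)

# Keep the complete rows (all columns) that have at least one response
newdf <- olddf[!all_na, ]
```

Because the row filter is applied to olddf itself, the score and duration columns stay attached to the rows that are kept.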

Related

Assign a Value based on the numbers in a separate columns in R

So I kind of already know the possible solution but I don't know how to exactly go about it so please give me a bit of grace here.
I have a dataset of YouTube trends. I want to read the values from two columns (likes and dislikes) and, based on their contents, make an entry in a new column. If the likes are higher than the dislikes, I want the video to be labelled 'positive', and if it has more dislikes it should be 'negative'.
I'm primarily not sure how to go about this, since most of the previous questions are based on one column rather than two. I know some mentioned using cut, but would it still work the same?
All help is appreciated, thanks.
You can use a simple ifelse :
df$new_col <- ifelse(df$likes > df$dislikes, 'positive', 'negative')
This can also be written without ifelse as :
df$new_col <- c('negative', 'positive')[as.integer(df$likes > df$dislikes) + 1]
You can use Vectorize to create a vectorized version of a function. If your function func takes two arguments, vfunc <- Vectorize(func) lets you call df$newcol <- vfunc(df$likes, df$dislikes), which returns the result for each row as a vector that is assigned to the new column.
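As a rough sketch of the Vectorize approach: the function `label_video`, the toy data, and the 'neutral' label for ties are all assumptions for illustration (the question does not say what should happen when likes equal dislikes; the ifelse version labels ties 'negative').

```r
# Toy data standing in for the YouTube trends dataset
df <- data.frame(likes = c(10, 3, 5), dislikes = c(2, 8, 5))

# Hypothetical scalar function: handles one pair of values at a time
label_video <- function(likes, dislikes) {
  if (likes > dislikes) {
    "positive"
  } else if (likes < dislikes) {
    "negative"
  } else {
    "neutral"  # assumption: ties get their own label
  }
}

# Vectorize() wraps the function so it can be applied to whole columns
vlabel <- Vectorize(label_video)
df$new_col <- vlabel(df$likes, df$dislikes)
```

For a condition this simple, ifelse is usually the shorter option; Vectorize is mainly useful when the per-row logic is already written as a scalar function.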

In R, dataframe[-NULL] returns an empty dataframe

I'm creating some routines in R to ease model creation and to distinguish several groups based on several parameters (e.g. original watches vs. fake ones, using common watch attributes).
During the process, I keep track of the potentially excluded lines in a vector (empty at first), and I get rid of them at the end using:
model$var <- raw_data[-line_excluded,]
The problem is that if line_excluded is c() (i.e. no line excluded), model$var is an empty dataframe, whereas in that case I want all the lines of the dataframe.
The only solution I have thought of is the use of
if (!is.null(line_excluded)) {
  model$var <- raw_data[-line_excluded,]
}
But that's not really pretty, and I have several tracking variables like line_excluded that need it.
Thanks for the help
You can do it another way using setdiff(), which can deal with an empty line_excluded, i.e.,
model$var <- raw_data[setdiff(seq_len(nrow(raw_data)), line_excluded),]
You can also try:
model$var <- raw_data[!(seq_len(nrow(raw_data)) %in% line_excluded),]
This is similar to what @ThomasIsCoding suggested: you look for the row numbers that are not in your line_excluded.
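A small sketch of why the negative index fails and how the two alternatives behave. The toy data frame is an assumption for illustration.

```r
# Toy data standing in for raw_data
raw_data <- data.frame(id = 1:4, value = c("a", "b", "c", "d"))

# Nothing excluded: c() is NULL, and negating it still gives a zero-length index
line_excluded <- c()
empty <- raw_data[-line_excluded, ]   # zero-length index selects zero rows

# setdiff() keeps every row number when nothing is excluded
keep_all <- raw_data[setdiff(seq_len(nrow(raw_data)), line_excluded), ]

# With actual exclusions, the %in% version drops exactly those rows
line_excluded <- c(2, 4)
keep_some <- raw_data[!(seq_len(nrow(raw_data)) %in% line_excluded), ]
```

Both alternatives select the rows to keep rather than the rows to remove, which is why an empty exclusion vector no longer empties the result.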

How do I reference previous/next rows when iterating through a data frame in R?

I have a dataset that looks like this (I'm simplifying slightly here):
Column 1 has a user id
Column 2 has a url title
Column 3 has an actual url
The data is already ordered by user and time. So it's User 1 and all the URLs they visited in ascending order of time, then User 2 and the URLs they visited in ascending order of time, and so on.
What I'm trying to do is loop through the dataset and look for "triplets" where the first row's URL doesn't contain my keyword (something like google or facebook or nytimes or whatever), the second row's URL does contain my keyword, and the third row's URL doesn't contain my keyword. Basically, I'm checking which websites users visited before and after any specific website.
I've figured out I can look for the keyword using:
if(length(grep("facebook",url)) > 0)
But I haven't been able to figure out how to loop through the code and achieve what I'm trying to do.
If you could break your response into two parts, I would really appreciate it:
Part 1: Is there any way to loop through a dataframe and have access to all the columns? I was able to work on a single column with this code:
new_data <- data.frame(url = character(0))
for (url in data$url) {
  if (length(grep("keyword", url)) > 0) {
    new_data <- rbind(new_data, data.frame(url = url))
  }
}
This approach is limited, though, because I can only reference a single column in my dataframe. What's the better solution here? I tried:
for (row in data) and then referencing columns by row[column_number] and row['column_name'], to no avail.
I also tried for (i in 1:nrow(data)) and then referencing columns using data[i,column_number], and that didn't work either (that should have worked, right?). I figured if this method worked I could use i-1 and i+1 to access other rows! I know this isn't the traditional way of doing things in R, but if you could still offer an explanation of how to do it this way I would really appreciate it.
Part 2: How do I accomplish my actual goal, as stated earlier? I'd like to learn to do it the "R way"; I imagine it's going to involve plyr or lapply, but I haven't managed to figure out how to use those functions even after extensive reading, let alone use them and include references to previous/next rows.
Thanks in advance for your help, any guidance is appreciated!
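Use i-1 and i+1 as row indices inside the loop (this assumes df has at least three rows):
penu <- nrow(df) - 1
# TRUE where the url contains the keyword
df$ContainsKeyword <- grepl("keyword", df$url)
df$TripletFound <- NA
# Row i is the middle of a triplet when it matches and its neighbours do not
for (i in 2:penu) {
  df$TripletFound[i] <- !df$ContainsKeyword[i-1] & df$ContainsKeyword[i] & !df$ContainsKeyword[i+1]
}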
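For the "R way" asked about in Part 2, one possible sketch of the pattern the question describes (no keyword, keyword, no keyword) builds shifted copies of the columns instead of looping. The toy data, the column names `user` and `url`, and the extra check that all three rows belong to the same user are assumptions for illustration.

```r
# Toy data, already ordered by user and time (column names are assumptions)
df <- data.frame(
  user = c(1, 1, 1, 1, 2, 2, 2),
  url  = c("a.com", "facebook.com/x", "b.com", "c.com",
           "facebook.com/y", "d.com", "e.com"),
  stringsAsFactors = FALSE
)

n   <- nrow(df)
hit <- grepl("facebook", df$url, fixed = TRUE)

# Shifted copies: value from the previous / next row, NA at the edges
prev_hit  <- c(NA, hit[-n])
next_hit  <- c(hit[-1], NA)
prev_user <- c(NA, df$user[-n])
next_user <- c(df$user[-1], NA)

# Middle row matches, neighbours do not, and all three rows share a user
df$TripletFound <- hit & !prev_hit & !next_hit &
  df$user == prev_user & df$user == next_user
df$TripletFound[is.na(df$TripletFound)] <- FALSE

# Sites visited just before and after each matching row
mid <- which(df$TripletFound)
triplets <- data.frame(before = df$url[mid - 1],
                       after  = df$url[mid + 1],
                       stringsAsFactors = FALSE)
```

In this toy data the second user's first visit is a keyword hit, but it is not counted because the preceding row belongs to a different user.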

How to subset (without filtering) multiple columns from a data frame in R

I'm sorry this may have been done to death, but all the answers I've found veer all over the map into extreme exotica. I can subset using [[]] (I've learned from stackoverflow that I'm not supposed to use subset() and similar for my scripts, since they're intended for interactive use) for a single column, but I can't figure out how to make the leap to more than one column. These two work, of course:
outcomeA <- outcome[['Hospital.Name']]
outcomeB <- outcome[['TX']]
But I've tried a dozen permutations to get both of those columns, like so:
outcomeC <- outcome[[c('Hospital.Name', 'TX')]] (gives "subscript out of bound")
outcomeC <- outcome[c('Hospital.Name', 'TX')] (gives "undefined columns selected")
etc, but they all fail. Can someone please put me out of my misery and help me select more than one column?
Thanks - Ed
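Did you try this with a comma and single brackets?
outcomeC <- outcome[,c('Hospital.Name', 'TX')]
Also, you can only select columns that exist in your data. The "undefined columns selected" error means at least one of the requested names is not a column of the data frame, so check them against:
names(outcome)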
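A minimal sketch of checking the requested names before selecting. The toy data frame is an assumption; in particular, it assumes for illustration that 'TX' is a value in a State column rather than a column name, which may or may not match the real data.

```r
# Toy stand-in for the outcome data; the real columns may differ
outcome <- data.frame(
  Hospital.Name = c("A", "B"),
  State         = c("TX", "CA"),
  Rate          = c(1.2, 3.4),
  stringsAsFactors = FALSE
)

wanted <- c("Hospital.Name", "TX")

# Requested names that are not columns of the data frame
not_found <- setdiff(wanted, names(outcome))

# Select only the requested columns that exist;
# drop = FALSE keeps a data frame even if a single column remains
outcomeC <- outcome[, intersect(wanted, names(outcome)), drop = FALSE]
```

If not_found is non-empty, the original selection would have raised "undefined columns selected", with or without the comma.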

Ifelse in R or apply

I am attempting to make elements in the first column of my df null (no entry at all) if it is equal to the element in the same row in the second column. This is a very simple thing, but I haven't been able to find the answer in the message boards.
Below are two of my attempts:
ifelse(y2014[y2014[,1]==y2014[,2]],y2014[,1]=="",y2014[,1]==y2014[,1])
y2014$new=ifelse(y2014[,1]==y2014[,2],0,y2014[,1])
Both give the following error: "level sets of factors are different". I checked the number of levels in each and they're equal, though several cells are blank in column 2. Would an apply function work better for what I'm trying to accomplish?
Really appreciate your help for a newbie.
Two things: factors generally need to be converted to character before comparing them, and you want to assign NA rather than 0 to the value.
Something like this might be better:
y2014$new <- y2014[,1]
y2014$new[as.character(y2014$new) == as.character(y2014[,2])] <- NA
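A small reproduction of the error and the fix, as a sketch. The toy columns `a` and `b` are assumptions standing in for the first two columns of y2014.

```r
# Toy data: two factor columns with different level sets
y2014 <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("x", "", "q"))
)

# Comparing the factors directly fails because their levels differ;
# tryCatch() captures the error message instead of stopping
err <- tryCatch(y2014$a == y2014$b, error = function(e) conditionMessage(e))

# Comparing as character works; entries equal to column b become NA
y2014$new <- y2014$a
y2014$new[as.character(y2014$a) == as.character(y2014$b)] <- NA
```

Having the same number of levels is not enough for a direct comparison: the two factors must have the same set of levels.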
