I need to create a new variable and assign values to the row based on another categorical variable. The data table looks like this
Specifically, I want to create a variable called channel_num. If the value of channelGrouping is "Direct", "Display" or "Paid Search", I want to assign 0 to that row; if it is "Organic Search" or "Social", I want to assign 1.
Assuming your datatable is named df:
df$channel_num <- ifelse(df$channelGrouping %in% c("Direct","Display","Paid Search"), 0, ifelse(df$channelGrouping %in% c("Organic Search","Social"), 1, NA))
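As the number of categories grows, nested ifelse() calls get hard to read. One alternative is a named lookup vector, sketched below. The name channel_codes and the toy rows (including the unmapped "Referral" value) are assumptions for illustration, not part of the question.

```r
# Toy data; "Referral" is an invented value that has no code assigned.
df <- data.frame(
  channelGrouping = c("Direct", "Social", "Paid Search", "Referral", "Organic Search"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are channel labels, values are the numeric codes.
channel_codes <- c("Direct" = 0, "Display" = 0, "Paid Search" = 0,
                   "Organic Search" = 1, "Social" = 1)

# Indexing by name yields NA for labels that are missing from the lookup.
# as.character() guards against the column being a factor.
df$channel_num <- unname(channel_codes[as.character(df$channelGrouping)])
df$channel_num
```

Unmapped values come back as NA, which matches the final NA branch of the ifelse() version.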
If I have a list with completely unpredictable and seemingly nonsensical nesting like so:
weird_nested_structure <- (list(
Record = "First Record",
Pets = list(Rabbit = "True",
Gerbil = "True"),
Record = "Second Record",
Pets = list(Pets2 = list(Rabbit = "True")),
Record = "Third Record",
Rabbit = list(Rabbit = "True"),
Record = "Fourth Record",
Pets = list(Dog = "True")))
How can I extract only the records that have an element named "Rabbit"?
This would return records 1, 2 and 3, but not 4 because the only elements in record 4 are named "Pets" and "Dog"
To be clear, I want to filter this list down to only the records that contain a name/variable called "Rabbit", regardless of which level of nesting this variable happens to be in. So the ideal solution will return a list of records 1, 2 and 3 but not 4 from the above nested list.
Is this possible in R?
One way in base R would be to write a recursive function that checks, at every level of nesting, whether any element is named "Rabbit":
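recursive_fun <- function(x) {
  # a matching name at the current level
  if (any(names(x) == "Rabbit"))
    return(TRUE)
  # otherwise check each element in turn; flattening with unlist() would
  # prefix nested names (e.g. "Pets.Rabbit"), so an exact match would fail
  if (is.list(x))
    any(vapply(x, recursive_fun, logical(1)))
  else
    return(FALSE)
}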
and then use sapply to pass it to each list and subset the ones which return TRUE
weird_nested_structure[sapply(weird_nested_structure, recursive_fun)]
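A self-contained sketch of this filtering approach follows. It assumes each record is its own sub-list, which the wording of the question implies but the posted structure does not show. The helper name has_name and its target parameter are hypothetical.

```r
# Assumption: one sub-list per record.
records <- list(
  list(Record = "First Record",  Pets = list(Rabbit = "True", Gerbil = "True")),
  list(Record = "Second Record", Pets = list(Pets2 = list(Rabbit = "True"))),
  list(Record = "Third Record",  Rabbit = list(Rabbit = "True")),
  list(Record = "Fourth Record", Pets = list(Dog = "True"))
)

# Hypothetical helper: TRUE if `target` is an element name at any depth of `x`.
has_name <- function(x, target) {
  if (target %in% names(x)) return(TRUE)
  if (!is.list(x)) return(FALSE)
  any(vapply(x, has_name, logical(1), target = target))
}

# One TRUE/FALSE per record, then subset the list with it.
keep <- vapply(records, has_name, logical(1), target = "Rabbit")
kept <- records[keep]
vapply(kept, function(r) r$Record, character(1))
```

Passing the name as an argument lets the same helper filter on other names, such as "Dog".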
I am trying to create a subset of a data frame that was read from a CSV file. The value to filter on is a character string. Here is the code I wrote:
project_subset = subset (x = fed_stimulus, subset = 'Project Status' == "Completed 50% or more", select = 'Project Name')
The code does not return any error but also does not create a subset. Please help
The reason no subset is being created is that with the line
'Project Status' == "Completed 50% or more"
you are just comparing two strings which are not equal. This will always be FALSE, and subset looks for TRUE cases on which to filter.
What you need to do instead is unquote your column name, or pass it through as a data reference.
# unquoted variable name (back-ticks are required here because the name contains a space)
project_subset = subset (x = fed_stimulus, subset = `Project Status` == "Completed 50% or more", select = 'Project Name')
or
# quoted variable name but used as a column reference from your original data
project_subset = subset (x = fed_stimulus, subset = fed_stimulus[ ,"Project Status"] == "Completed 50% or more", select = 'Project Name')
When the column name has a space you must surround it in back-ticks `:
project_subset = subset (x = fed_stimulus, subset = `Project Status` == "Completed 50% or more", select = 'Project Name')
However, as the documentation states, subset() is meant as a convenience function for use interactively, so if you intend to use this in a script it is better to use [ like this:
project_subset = fed_stimulus[fed_stimulus$`Project Status` == "Completed 50% or more", "Project Name"]
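The difference between the quoted string and the column reference can be seen on a small stand-in for the data. The rows below are invented. The use of which() and drop = FALSE is an optional addition rather than part of the answers above: which() skips rows where the status is NA, and drop = FALSE keeps the result a data frame.

```r
# Toy stand-in for the CSV data; values are invented for illustration.
fed_stimulus <- data.frame(
  `Project Name`   = c("Bridge repair", "Road resurfacing", "School retrofit"),
  `Project Status` = c("Completed 50% or more", "Not started", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Quoted: one string compared with another, so the result is a single FALSE.
quoted_result <- 'Project Status' == "Completed 50% or more"

# Back-ticked: evaluated as the column, giving one TRUE/FALSE/NA per row.
status_ok <- fed_stimulus$`Project Status` == "Completed 50% or more"

# which() drops the NA comparison; drop = FALSE returns a one-column data frame.
project_subset <- fed_stimulus[which(status_ok), "Project Name", drop = FALSE]
project_subset
```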
I have a dataframe consisting of twitter data (ID number, follower_count, clean_text). I am interested in dividing my dataframe into two subsets: one where keywords are present, and one where keywords are not present.
For example, I have the keywords stored as a value:
KeyWords <- c("abandon*", "abuse*", "agitat*" ,"attack*", "bad", "brutal*",
"care", "caring", "cheat*", "compassion*", "cruel*", "damag*",
"damn*", "destroy*", "devil*", "devot*", "disgust*", "envy*",
"evil*", "faith*","fault*", "fight*", "forbid*", "good", "goodness",
"greed*", "gross*", "hate", "heaven*", "hell", "hero*", "honest*",
"honor*", "hurt*","ideal*", "immoral*", "kill*", "liar*","loyal*",
"murder*", "offend*", "pain", "peace*","protest", "punish*","rebel*",
"respect", "revenge*", "ruin*", "safe*", "save", "secur*", "shame*",
"sin", "sinister", "sins", "slut*", "spite*", "steal*", "victim*",
"vile", "virtue*", "war", "warring", "wars", "whore*", "wicked*",
"wrong*", "benefit*", "harm*", "suffer*","value*") %>% paste0(collapse="|")
And I have made a subset (Data2) of my original dataframe (Data1) where Data2 consists of only the observations in Data1 where one or more of the keywords are present in the clean_text column. Like so:
Data2 <- Data1[with(Data1, grepl(paste0("\\b(?:",paste(KeyWords, collapse="|"),")\\b"), clean_text)),]
Now I want to make Data3, containing only the observations in Data1 where none of the keywords are present in the clean_text column. Is there a way to do the inverse of my keyword subsetting above? Or can I subtract Data2 from Data1 to get my new subset, Data3?
The "inverse" operator in R is ! - this will flip TRUE to FALSE and vice versa. So, with your example, what you're looking for is
Data3 <- Data1[!with(Data1, grepl(paste0("\\b(?:",paste(KeyWords, collapse="|"),")\\b"), clean_text)),]
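To avoid repeating the long grepl() call, the logical mask can be computed once and reused for both subsets. This is a minimal sketch on invented rows with a shortened keyword list. The variable name has_keyword and the perl = TRUE argument are additions for illustration.

```r
# Toy data; column names follow the question, the rows are invented.
Data1 <- data.frame(
  ID = 1:4,
  clean_text = c("they attack at dawn", "lovely weather today",
                 "a good day", "nothing to report"),
  stringsAsFactors = FALSE
)

# Shortened keyword list, wrapped in word boundaries as in the question.
pattern <- paste0("\\b(?:", paste(c("attack", "good", "war"), collapse = "|"), ")\\b")

# Compute the mask once, then use it and its negation.
has_keyword <- grepl(pattern, Data1$clean_text, perl = TRUE)
Data2 <- Data1[has_keyword, ]    # rows with at least one keyword
Data3 <- Data1[!has_keyword, ]   # rows with none
```

Because both subsets come from the same mask, every row of Data1 lands in exactly one of them.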
I have a data frame which I manipulate and whose final form sometimes changes shape depending on whether or not certain values exist.
Once the data frame is processed and in its final form, I wish to reorder the columns into a particular order before writing to .csv.
However, because some of the columns won't always exist, I want to know if it is possible to check which columns exist and of the ones that do, I want them to follow a particular format and the ones that don't to be created and populated with zeroes.
I have one solution which I think is very clunky and could probably be improved significantly: in this example, I am checking to see if the column taken_offline exists within my dataset. If so, I want the columns to be reordered in a certain way with this column included and if not, I want taken_offline to be created and populated with zeroes whilst still being reordered in the same way.
Ideally, I want to be able to say "here is the order that the columns should be presented in. If the column doesn't exist, I want to create it and populate it with zeroes".
I appreciate that a good way might be to take a list of column names from my data frame (users) and to then check the column names against the desired column order (listed below). However, I am unsure how to implement this idea.
How can I do it?
The output columns should be in this order:
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
My code (checking for the existence of taken_offline):
if("taken_offline" %in% colnames(users_final)){
users_final <- users_final[, c(
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
)]
print("Taken offline occurrences.")
} else {
users_final$taken_offline <- 0
users_final <- users_final[, c(
"date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined"
)]
print("No taken offline occurrences.")
}
Simpler version of the other answers. Thanks to R's vector recycling, this can be done in two quick lines.
Give a name to your vector of columns, say all_cols. Then, calling your data dd
# add missing columns and set them equal to 0
dd[setdiff(all_cols, names(dd))] = 0
# put columns in desired order
dd = dd[all_cols]
Worked example:
all_cols = c("date",
"storeName",
"firstName",
"lastName",
"conversation-request",
"conversation-accepted",
"acceptance_rate",
"conversation-missed",
"taken_offline",
"conversation-already-accepted",
"total_missed",
"conversation-declined")
dd = data.frame("date" = "yesterday",
"storeName" = "Kwik-E-Mart",
"firstName" = "Apu")
dd[setdiff(all_cols, names(dd))] = 0
dd = dd[all_cols]
dd
#        date   storeName firstName lastName conversation-request conversation-accepted acceptance_rate
# 1 yesterday Kwik-E-Mart       Apu        0                    0                     0               0
#   conversation-missed taken_offline conversation-already-accepted total_missed conversation-declined
# 1                   0             0                             0            0                     0
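If this is needed in more than one place, the two lines can be wrapped in a small function. The sketch below is one possible shape, not part of the answer: the name order_with_defaults and the fill parameter are hypothetical. It also warns about a side effect of dd[all_cols], which silently drops any column that is not listed in all_cols.

```r
# Hypothetical helper: add missing columns filled with `fill`, then reorder.
order_with_defaults <- function(dd, all_cols, fill = 0) {
  missing_cols <- setdiff(all_cols, names(dd))
  extra_cols <- setdiff(names(dd), all_cols)
  if (length(extra_cols) > 0) {
    warning("Dropping columns not in all_cols: ",
            paste(extra_cols, collapse = ", "))
  }
  if (length(missing_cols) > 0) {
    dd[missing_cols] <- fill  # the single value is recycled down every new column
  }
  dd[all_cols]
}

dd <- data.frame(storeName = "Kwik-E-Mart", date = "yesterday")
out <- order_with_defaults(dd, c("date", "storeName", "taken_offline"))
names(out)
```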
If you have a character vector, say varname, giving the names and order of the columns you want in the data frame s, then you could use:
var_not_present <- varname[which(!(varname %in% names(s)))]
h <- data.frame(matrix(0, ncol = length(var_not_present), nrow = dim(s)[1]))
colnames(h) <- var_not_present
s_updated <- cbind(s,h)
s_updated <- s_updated[varname]
I'm trying to delete a row from a data frame in which each row has a name. I cannot use indexes to delete the rows, only their names. I have this data frame:
DF<- data.frame('2014' = c(30,20,4, 50), '2015' = c(25,40,6, 65), row.names = c("mobile login", "computer login","errors", "total login"))
I've tried
DF["mobile login",] <- NULL
and
DF <- DF[-"mobile login",]
and more combinations with no results.
What can I do? Thanks
PS: The last row is the sum of the first two (there are others in the real DF; this is only an example), and once they are added, I no longer need them, only the result, the "total login" value.
Use %in% along with an appropriate subset of your data frame. To remove the rows named errors and mobile login you can use the following code:
row.names.remove <- c("errors", "mobile login")
DF[!(row.names(DF) %in% row.names.remove), ]
#                X2014 X2015
# computer login    20    40
# total login       50    65
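The attempt with -"mobile login" fails because unary minus is not defined for character strings; negative indexing only works with numeric positions. An equivalent way to index by name is to build the vector of row names to keep with setdiff(), as in this sketch. The names rows_to_drop and DF_kept are illustrative.

```r
DF <- data.frame("2014" = c(30, 20, 4, 50), "2015" = c(25, 40, 6, 65),
                 row.names = c("mobile login", "computer login", "errors", "total login"))

rows_to_drop <- c("errors", "mobile login")

# setdiff() returns the row names not listed, in their original order.
DF_kept <- DF[setdiff(row.names(DF), rows_to_drop), ]
row.names(DF_kept)
```

For a data frame with a single column, adding drop = FALSE inside the brackets keeps the result a data frame instead of a bare vector.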