New column / mutate based on existing column

I want to add a new column to a data frame df based on a condition on an existing column, e.g.:
df$TScore = as.factor(0)
df$TScore =
if_else(df$test_score >= '8.0', 'high',
if_else(!is.na(df$test_score), 'low', 'NA'))
The problem is that TScore is only sometimes what I expect: some rows with a score of 8 or greater are labelled 'high', but others are not.
Is there an error in the above code? There are lots of NAs in this data.
I am also struggling with how to write this using dplyr. So far, I have written this:
df$TScore = df %>%
filter(test_score >= 8) %>%
mutate(TScore = 'high')
But, as we would expect, the dimensions do not match. The following error is given:
Error in `$<-.data.frame`(`*tmp*`, appScore, value = list(cluster3 = c(1L, : replacement has 126 rows, data has 236
Any advice would be greatly appreciated.

We don't need the filter step; instead, we can use ifelse or case_when:
library(dplyr)
df <- df %>%
mutate(TScore = case_when(test_score >= 8 ~'high', TRUE ~ "low"))
If we want to avoid the explicit assignment (<-), we can use the compound assignment operator (%<>%) from magrittr. This version also keeps missing scores as NA:
library(magrittr)
df %<>%
mutate(TScore = case_when(is.na(test_score) ~ NA_character_,
test_score >= 8 & !is.na(test_score) ~'high',
TRUE ~ "low"))
The error occurred because a filtered data.frame (with fewer rows) was assigned to a new column of the original dataset.
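As for why the original if_else() attempt gave wrong labels for some rows, one likely cause (assuming test_score is numeric) is the comparison against the string '8.0': R coerces the numbers to character, so the comparison becomes alphabetical. Separately, the string 'NA' is not the same as a missing value. A minimal sketch with made-up scores:

```r
library(dplyr)

# Hypothetical scores for illustration; NA marks a missing score.
scores <- data.frame(test_score = c(7.5, 8, 9.5, 10, NA))

# Comparing against the string '8.0' coerces the numbers to character,
# so "10" and "8" both sort before "8.0".
string_cmp <- scores$test_score >= '8.0'

# Comparing against the number 8 behaves as intended.
numeric_cmp <- scores$test_score >= 8

scores <- scores %>%
  mutate(TScore = case_when(
    is.na(test_score) ~ NA_character_,  # keep missing scores missing
    test_score >= 8   ~ "high",
    TRUE              ~ "low"
  ))
```

Here string_cmp is FALSE for both 8 and 10, while numeric_cmp is TRUE for both, which would match the "correct for some cases only" symptom.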

Related

R how to add another column to a dataset based on 2 other columns

I have a data set of messages exchanged in an organization. I want to create another column with case_when: if sender_department == receiver_department, assign "intra"; if sender_department != receiver_department, assign "inter".
I'm doing this to find the proportion of inter- and intra-departmental messages over the period.
I've used the code below
intra_inter_msg <- DF %>%
mutate(inter_intra = case_when(sender_department == receiver_department, ~"intra", ,
sender_department != receiver_department, ~"inter"))
and I got this error
Error in mutate():
! Problem while computing inter_intra = case_when(...).
Caused by error in case_when():
! Case 1 (sender_department == receiver_department) must be a two-sided formula, not a logical

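The commas before ~ are the problem: each case in case_when must be a single two-sided formula of the form condition ~ value. With the commas removed it works. I made a little example DF to test it: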
require(dplyr)
DF = data.frame (sender_department = c("econ","math","history"),receiver_department = c("econ","history","math"))
DF
intra_inter_msg <- DF %>%
mutate(inter_intra = case_when(sender_department == receiver_department ~"intra",
sender_department != receiver_department ~"inter"))
intra_inter_msg
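To get from the new column to the proportions the question asks about, one option is count() followed by a share calculation. This is only a sketch on the same toy DF; the names msg_share and proportion are made up for illustration:

```r
library(dplyr)

DF <- data.frame(
  sender_department   = c("econ", "math", "history"),
  receiver_department = c("econ", "history", "math")
)

msg_share <- DF %>%
  mutate(inter_intra = case_when(
    sender_department == receiver_department ~ "intra",
    sender_department != receiver_department ~ "inter"
  )) %>%
  count(inter_intra) %>%            # one row per category with its count n
  mutate(proportion = n / sum(n))   # share of all messages
```

With this toy data, two of the three messages are "inter" and one is "intra".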

How can I add a column in R whose values reference a column in a different data frame?

So I have an R script that ranks college football teams. It outputs a rating and I want to take that rating from a different data frame and add it as a new column to a different data frame containing info from the upcoming week of games. Here's what I'm currently trying to do:
random_numbers <- rnorm(130, mean = mean_value, sd = sd_value)
sample_1 <- as.vector(sample(random_numbers, 1, replace = TRUE))
upcoming_games_df <- upcoming_games_df %>%
mutate(home_rating = case_when(home_team %in% Ratings$team ~ Ratings$Rating[Ratings$team == home_team]),
TRUE ~ sample_1)
sample_2 <- as.vector(sample(random_numbers, 1, replace = TRUE))
upcoming_games_df <- upcoming_games_df %>%
mutate(away_rating = case_when(away_team %in% PrevWeek_VoA$team ~ Ratings$Rating[Ratings$team == away_team]),
TRUE ~ sample_2)
I originally had the sample(random_numbers) inside of the mutate() function but I got error "must be a vector, not a formula object." So I moved it outside the mutate() function and added the as.vector(), but it still gave me the same error. I also got a warning about "longer object length is not a multiple of shorter object length". I don't know what to do now. The code above is the last thing I tried before coming here for help.

In the code shown, the closing parenthesis of case_when comes before TRUE ~ sample_1, so that formula is passed to mutate() instead of case_when(); that appears to be the source of the "must be a vector, not a formula object" error.
case_when also requires all of its inputs to have the same length, or length 1. sample_1 and sample_2 have length 1, so they get recycled (as.vector is not needed, because sample() on the output of rnorm already returns a vector).
In addition, == is an elementwise comparison. It only works as intended when both sides have the same length or one of them has length 1 (i.e. it gets recycled). Thus Ratings$team == home_team is the cause of the "longer object length" warning.
Instead of case_when, this may be done with a join (assuming the 'team' column in 'Ratings' is not duplicated):
library(dplyr)
upcoming_games_df2 <- upcoming_games_df %>%
left_join(Ratings, by = c("home_team" = "team")) %>%
mutate(home_rating = coalesce(Rating, sample_1), Rating = NULL) %>% # drop Rating so the second join does not create Rating.x / Rating.y
left_join(PrevWeek_VoA, by = c("away_team" = "team")) %>%
mutate(away_rating = coalesce(Rating, sample_2))
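A self-contained sketch of the join-then-coalesce pattern, using made-up data: the teams, ratings and the fixed fallback value below are hypothetical stand-ins for the real Ratings table and the randomly sampled default.

```r
library(dplyr)

# Hypothetical data: team "C" has no rating.
Ratings <- data.frame(team = c("A", "B"), Rating = c(10, 20))
upcoming_games_df <- data.frame(home_team = c("A", "C"),
                                away_team = c("B", "A"))
fallback <- 15  # stand-in for the sampled default rating

rated <- upcoming_games_df %>%
  # Look up the home team's rating; unmatched teams get NA.
  left_join(Ratings, by = c("home_team" = "team")) %>%
  mutate(home_rating = coalesce(Rating, fallback), Rating = NULL) %>%
  # Same lookup for the away team.
  left_join(Ratings, by = c("away_team" = "team")) %>%
  mutate(away_rating = coalesce(Rating, fallback), Rating = NULL)
```

coalesce() fills only the NA ratings left by the join, so team "C" gets the fallback and every rated team keeps its own value.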

Mutate to modify values and replace

Hi there, I am trying to mutate values (e.g. changing kilograms to tonnes) and replace them in the original dataset, but the changes don't seem to persist in the original dataset.
Here is a sample dataset for reference.
Country  Type       Quantity
A        Kilograms  23132
B        Kilograms  34235
C        Tonnes     700
library(dplyr)
df %>%
filter(Type == "Kilograms") %>%
group_by(Quantity) %>%
mutate(Quantity = Quantity /1000)
But I am not sure what to do for the next step. I tried the replace function but it didn't work.
Also, I plan to add a line at the end that changes all kilograms to tonnes, something like this:
df$Unit[df$Type == 'Kilograms'] <- 'Tonnes'

You can also use case_when(), which is dplyr's equivalent of SQL's CASE WHEN. Basically, it lets you vectorize multiple if_else() statements. Below, the first condition is the "if" branch and TRUE ~ is the "else" branch:
library(dplyr)
df <- data.frame(Country = c('A', 'B', 'C'),
Type = c("Kilograms", "Kilograms", "Tonnes"),
Quantity = c(23132, 34235, 700))
df <- df %>%
mutate(Quantity = case_when(Type == 'Kilograms' ~ Quantity/1000,
TRUE ~ Quantity),
# relabel only the converted rows; other units keep their label
Type = case_when(Type == 'Kilograms' ~ 'Tonnes',
TRUE ~ Type)
)

Use the ifelse function to change the value based on a condition. It also works well within the tidyverse.
Don't forget to reassign the result to the original variable, since the pipe operator does not change the input data.
library(dplyr)
df = df %>% mutate(Quantity = ifelse(Type=="Kilograms",Quantity/1000,Quantity),
Type = ifelse(Type=='Kilograms','Tonnes',Type))

ifelse in a mutate function in r

I am trying to add a column with a condition using the mutate function in R, but I keep getting an error. The code is straight from the teacher's lecture, yet an error occurs. The LineItem column is of class factor; I am not sure if that makes a difference.
Please advise on what I am missing.
Thank you,
Avi
df <- read.csv('ities_short.csv')
colSums(is.na(df))
sl <- str_length(df$LineItem)
avg <- mean(str_length(df$LineItem))
df <- df %>% mutate(LineItem_LongName = ifelse(sl > avg), 1, 0)
Error in ifelse(sl > avg) : argument "yes" is missing, with no default

You have placed a ')' in the wrong place: ifelse is closed right after the condition, so 1 and 0 are passed to mutate instead. The general syntax for ifelse is:
ifelse(cond, value_if_true, value_if_false)
library(dplyr)   # mutate, %>%
library(stringr) # str_length
df <- read.csv('ities_short.csv')
colSums(is.na(df))
sl <- str_length(df$LineItem)
avg <- mean(str_length(df$LineItem))
df <- df %>% mutate(LineItem_LongName = ifelse(sl > avg, 1, 0))

@Nirbhay Singh's answer is correct. However, it is generally better to use dplyr::if_else, because it is stricter about types (both branches must have compatible types) and it lets you handle NA values explicitly through its missing argument:
df <- df %>% mutate(LineItem_LongName = if_else(sl > avg, 1, 0))
See the documentation for dplyr::if_else.
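A small sketch of the difference, on a made-up vector rather than the LineItem data:

```r
library(dplyr)

x <- c(1, 5, NA)

# Base ifelse() silently coerces mixed branch types to a common type.
base_result <- ifelse(x > 2, "big", 0)   # character: "0", "big", NA

# dplyr::if_else() needs compatible branch types and can label NAs explicitly.
strict_result <- if_else(x > 2, "big", "small", missing = "unknown")

# Mixing a string and a number is an error in if_else().
mixed_fails <- inherits(try(if_else(x > 2, "big", 0), silent = TRUE),
                        "try-error")
```

The error in the last call is the point: a type mismatch between branches is reported immediately instead of being coerced away.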

Don't create separate objects and then use them inside the data frame; keep the intermediate values in the data frame itself. You can remove the columns you don't need later. Moreover, you can do this without ifelse, because a logical vector converts directly to 0/1.
library(dplyr)
library(stringr)
df %>%
mutate(temp = str_length(LineItem),
LineItem_LongName = as.integer(temp > mean(temp)))
Or in base R :
df$temp <- nchar(as.character(df$LineItem)) # as.character because nchar() errors on a factor
transform(df, LineItem_LongName = +(temp > mean(temp)))

Issues with replacing a subset of a data.frame using the R Package dplyr

I am trying to replace some filtered values of a data set. So far, I wrote these lines of code:
df %>%
filter(group1 == uniq[i]) %>%
mutate(values = ifelse(sum(values) < 1, 2, NA)),
where uniq is just a list containing variable names I want to focus on (and group1 and values are column names). This is actually working. However, it only outputs the altered filtered rows and does not replace anything in the data set df. Does anyone have an idea, where my mistake is? Thank you so much! The following code is to reproduce the example:
group1 <- c("A","A","A","B","B","C")
values <- c(0.6,0.3,0.1,0.2,0.8,0.9)
df = data.frame(group1, values)
uniq <- unique(unlist(df$group1))
for (i in 1:length(uniq)){
df <- df %>%
filter(group1 == uniq[i]) %>%
mutate(values = ifelse(sum(values) < 1, 2, NA))
}
What I would like to get is that it leaves all values except the last one since it is one unique group (group1 == C) and 0.9 < 1. So I'd like to get the exact same data frame here except that 0.9 is replaced with NA. Moreover, would it be possible to just use if instead of ifelse?

dplyr verbs return a new data frame and never modify their input. The result is only kept if you store it with an assignment operator (<-).
Compare
require(dplyr)
data(mtcars)
mtcars %>% filter(cyl == 4)
with
mtcars4 <- mtcars %>% filter(cyl == 4)
mtcars4
The data are the same, but in the second example the filtered data is stored in a new object, mtcars4.
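For the original goal (changing values only in groups whose sum is below 1 while keeping every other row), a grouped mutate avoids both the filter and the loop. This is a sketch under assumptions: the values below are made up so that only group "C" sums to less than 1, and a plain if is used, which works here because the condition is a single value per group.

```r
library(dplyr)

# Hypothetical values: only group "C" sums to less than 1.
df <- data.frame(
  group1 = c("A", "A", "A", "B", "B", "C"),
  values = c(0.5, 0.5, 0.5, 0.25, 1, 0.75)
)

df <- df %>%
  group_by(group1) %>%
  # sum(values) is evaluated once per group; other groups keep their values.
  mutate(values = if (sum(values) < 1) NA_real_ else values) %>%
  ungroup()
```

All six rows are kept, and only the single row of group "C" becomes NA.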
