Adding a column with an if else statement - r

I'm trying to cut numbers into categories to create a new column. Basically, I'm trying to create a letter grade ("A", "B", "C", "D", "F") from scores.
I have reproduced a similar data frame to the one I'm having trouble with in the following code.
df <- tibble(score = rnorm(20:100, n = 150))
The code I wrote to add the grade column looks like this:
df_with_grade <- df %>%
mutate(Grade = if (score >= 90) {
"A"
} else if (score >= 80){
"B"
} else if (score >= 70){
"C"
} else if (score >= 60){
"D"
} else {
"F"
}
)
The code executes with a warning:
Warning messages:
1: In if (score >= 90) { :
the condition has length > 1 and only the first element will be used
2: In if (score >= 80) { :
the condition has length > 1 and only the first element will be used
3: In if (score >= 70) { :
the condition has length > 1 and only the first element will be used
4: In if (score >= 60) { :
the condition has length > 1 and only the first element will be used
As a result, every score is assigned an "F".

How about
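cut(df$score, breaks = c(0, 6:10) * 10, labels = rev(LETTERS[c(1:4, 6)]), right = FALSE, include.lowest = TRUE)  # right = FALSE so that a score of exactly 90 gets "A", as in the question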
? rev(LETTERS[c(1:4,6)]) might be too clever and doesn't save that many characters over c("F","D","C","B","A") ...
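One thing worth checking with cut is what happens to scores that sit exactly on a break. The sketch below uses a small hand-picked vector (the scores and variable names are made up for illustration) to compare the default right-closed intervals with right = FALSE, which matches the >= comparisons in the question.

```r
# Hypothetical scores chosen to sit on or next to the breaks
scores <- c(59, 60, 89, 90, 100)
breaks <- c(0, 60, 70, 80, 90, 100)
grades <- c("F", "D", "C", "B", "A")

# Default: intervals are (lower, upper], so 60 falls in (0, 60] and 90 in (80, 90]
default_cut <- as.character(cut(scores, breaks = breaks, labels = grades))

# right = FALSE: intervals are [lower, upper); include.lowest = TRUE keeps 100 in the top bin
left_closed <- as.character(cut(scores, breaks = breaks, labels = grades,
                                right = FALSE, include.lowest = TRUE))

default_cut  # "F" "F" "B" "B" "A"
left_closed  # "F" "D" "B" "A" "A"
```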

As suggested in the comments, you can use case_when:
df_with_grade <- df %>%
mutate(Grade = case_when(score >= 90 ~ "A",
score >= 80 ~ "B",
score >= 70 ~ "C",
score >= 60 ~ "D",
TRUE ~ "F"))

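A single ifelse call only handles a binary condition, so five grades would need nested calls. cut does it in one call, like below:
df$Grade = cut(df$score,
breaks = c(0, 60, 70, 80, 90, 100),
labels = c("F", "D", "C", "B", "A"),
right = FALSE,          # intervals are [lower, upper), so 90 is an "A"
include.lowest = TRUE)  # keeps a score of exactly 100 in the top bin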

Just to show you can use ifelse.
df_with_grade <- df %>%
mutate(Grade =
ifelse(score>= 90, "A",
ifelse(score>=80, "B",
ifelse(score>=70, "C",
ifelse(score>=60, "D",
"F"))))
)
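If the nesting gets hard to read, base R's findInterval is another option. This is only a sketch with made-up scores: findInterval returns how many cutoffs each score has reached (0 to 4 here), and that count plus one indexes into the grade labels.

```r
# Hypothetical scores for illustration
scores  <- c(95, 90, 85, 70, 61, 12)
cutoffs <- c(60, 70, 80, 90)            # lower bounds for D, C, B, A
grades  <- c("F", "D", "C", "B", "A")

# findInterval counts the cutoffs that are <= each score
grade <- grades[findInterval(scores, cutoffs) + 1]
grade  # "A" "A" "B" "C" "D" "F"
```

This gives the same answer as the nested ifelse for these scores, and adding a grade only means extending the two vectors.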

Related

For Loop with If/Else to create a new column in a df

I'm a student trying to learn R, and I have spent hours trying to figure this out without success. Maybe I'm going about it the wrong way, or I'm missing something basic.
I have data with student numbers and module results. The results are in numeric form, and I want to change each result to a grade (A, B, C, etc.). I have managed to create a loop that prints the grade, but I can't figure out how to put it in the data frame.
The dataset I have is quite big, so I have created some dummy data for the example below. The code runs and doesn't give me any errors, but it doesn't replace the number with the letter grade:
Result <- c(50,67,89,77,65,66,70,73,69,80)
for (i in Result){
if (i < 16.67) {
print ("G+")
i <- "G+"
} else if (i < 26.67) {
print ("F+")
i <- "F+"
} else if (i < 36.67) {
print ("E+")
i <- "E+"
} else if (i < 40) {
print ("D-")
i <- "D+"
}else if (i < 43.33) {
print ("D")
i <- "D"
}else if (i < 46.67) {
print ("D+")
i <- "D+"
}else if (i < 50) {
print ("C-")
i <- "C-"
}else if (i < 53.33) {
print ("D")
i <- "D"
}else if (i < 56.67) {
print ("D+")
i <- "D+"
}else if (i < 60) {
print ("B-")
i <- "B-"
}else if (i < 63.33) {
print ("B")
i <- "B"
}else if (i < 66.67) {
print ("B+")
i <- "B+"
}else if (i < 70) {
print ("A-")
i <- "A-"
}else if (i < 73.33) {
print ("A")
i <- "A"
}else if (i < 100) {
print ("A+")
i <- "A+"
}
}
# result: [1] "D"
[1] "A-"
[1] "A+"
[1] "A+"
[1] "B+"
[1] "B+"
[1] "A"
[1] "A"
[1] "A-"
[1] "A+"
Any advice would be greatly appreciated.
many thanks,
El.
Put your example data in a data.frame:
df <- data.frame( result = c(50,67,89,77,65,66,70,73,69,80) )
Then use cut() to get the grades in a new column of that data.frame:
df$grade <- cut(df$result,
breaks = c(0, 16.67, 26.67, 36.67, 40, 43.33, 46.67, 50, 53.33, 56.67, 60, 63.33, 66.67, 70, 73.33, 100),
labels = c("G+", "F+", "E+", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"))
Print the result to check:
df
result grade
1 50 C-
2 67 A-
3 89 A+
4 77 A+
5 65 B+
6 66 B+
7 70 A-
8 73 A
9 69 A-
10 80 A+
Notice that (1) it's better to save the results into a data.frame than to simply print them, and (2) many things can be done better/quicker in R if you don't loop; instead use R's vectorized functions (like cut!).
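One detail to be aware of: cut uses right-closed intervals by default, so a result of exactly 50 lands in (46.67, 50] and gets "C-", whereas the loop in the question uses < and would put 50 in the next band up. If you want the loop's behaviour, a sketch along these lines should do it (grade_breaks, grade_labels and grade_left are names I made up; include.lowest = TRUE is my assumption so that a result of exactly 100 still gets a grade):

```r
df <- data.frame(result = c(50, 67, 89, 77, 65, 66, 70, 73, 69, 80))

grade_breaks <- c(0, 16.67, 26.67, 36.67, 40, 43.33, 46.67, 50, 53.33,
                  56.67, 60, 63.33, 66.67, 70, 73.33, 100)
grade_labels <- c("G+", "F+", "E+", "D-", "D", "D+", "C-", "C", "C+",
                  "B-", "B", "B+", "A-", "A", "A+")

# right = FALSE makes each band [lower, upper), matching the `<` tests in the loop
df$grade_left <- cut(df$result, breaks = grade_breaks, labels = grade_labels,
                     right = FALSE, include.lowest = TRUE)

as.character(df$grade_left)
# "C" "A-" "A+" "A+" "B+" "B+" "A" "A" "A-" "A+"
```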

Why does my function return character(0) even though there is no error in the code?

I am seeking help on this function I have created. The purpose of this function: First, I want to extract a column from a data frame and arrange it in descending order. Then, I rank each element by "H", "M" and "L". I want to rank them such as the first 33% of the items should have the tag "H" and the last 33% of the items are tagged as "L". The rest should be tagged as "M".
This is the code:
ranking_prod <- function(data, column) {
data <- arrange(data, desc(column))
size <- length(data$column)
first_third <- data$column[round(size / 3)]
last_third <- data$column[round(size - (size / 3))]
case_when(data$column > first_third ~ "H",
data$column < last_third ~ "L",
TRUE ~ "M")
}
However, when I apply this function to the following data frame:
> one <- c("a", "b", "c", "d", "e")
> two <- c(1, 2, 2, 1, 5)
> three <- c(2, 2, 2, 4, 5)
> dataframe <- data.frame(one, two, three)
It returns:
> ranking_prod(dataframe, two)
character(0)
Where is the error in the code? Why is it displaying character(0) as the results?
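The problem is data$column: $ does not evaluate its argument, so it looks for a column literally named "column", finds none, and returns NULL. Everything computed from that NULL has length zero, hence character(0). We can use [[ with the column name as a string instead of $. Since we convert the argument to a symbol with ensym, the input can be either unquoted or quoted.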
ranking_prod <- function(data, column) {
column <- rlang::ensym(column)
colstr <- rlang::as_string(column)
data <- dplyr::arrange(data, dplyr::desc(!!column))
size <- length(data[[colstr]])
first_third <- data[[colstr]][round(size / 3)]
last_third <- data[[colstr]][round(size - (size / 3))]
dplyr::case_when(data[[colstr]] > first_third ~ "H",
data[[colstr]] < last_third ~ "L",
TRUE ~ "M")
}
Testing:
ranking_prod(dataframe, two)
#[1] "H" "M" "M" "L" "L"
ranking_prod(dataframe, 'two')
#[1] "H" "M" "M" "L" "L"
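To see the difference between $ and [[ in isolation, here is a small base R sketch using the data frame from the question (the variable column below just stands in for the function argument):

```r
dataframe <- data.frame(one   = c("a", "b", "c", "d", "e"),
                        two   = c(1, 2, 2, 1, 5),
                        three = c(2, 2, 2, 4, 5))

column <- "two"

# `$` takes the name literally: there is no column called "column"
dataframe$column           # NULL
length(dataframe$column)   # 0

# `[[` evaluates its argument, so it looks up the column named "two"
dataframe[[column]]        # 1 2 2 1 5
```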

Error creating a vector from a self-made function

Trying to make my problem reproducible, I have the following vector:
trialvector <- as.vector(c("K", "K", "m", "m", "K"))
I also have this function, which tries to convert the vector into one where "K" becomes a numeric 3 and "m" becomes a numeric 6. I want to assign the result to a variable called multiplier:
Expcalc <- function(vector) {
multiplier <<- vector(mode = "numeric", length = length(vector))
for (i in seq_along(vector)) {
if (vector[i] == "K") {
multiplier[i] <- 3
} else if (vector[i] == "M" | i == "m") {
multiplier[i] <- 6
} else {
multiplier[i] <- 0
}
}
}
Instead of getting the output I want (a vector of 6 and/or 3 depending on which character was in trialvector), I get a vector full of zeros and these warnings:
Warning messages:
1: In Expcalc(trialvector) : NAs introduced by coercion
2: In Expcalc(trialvector) : NAs introduced by coercion
What am I doing wrong?
trialvector <- as.vector(c("K", "K", "m", "m", "K"))
trialvector
[1] "K" "K" "m" "m" "K"
Expcalc <- function(vector) {
multiplier <- as.vector(x = c(), mode = "numeric")
for (i in vector) {
if (i == "K") {
multiplier <- append(multiplier, 3)
} else if (i == "M" | i == "m") {
multiplier <- append(multiplier, 6)
} else {
multiplier <- append(multiplier, 0)
}
}
return(multiplier)
}
multiplier <- Expcalc(trialvector)
multiplier
[1] 3 3 6 6 3
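I changed the for loop to iterate over the elements directly, appended each new value to a vector created inside the function, and returned that vector. In your version, i == "m" compares the loop index rather than vector[i], and multiplier[i] <- 3 modifies a local copy, so the global multiplier created with <<- stays all zeros. The output now matches what you are looking for.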
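If you would rather avoid the loop altogether, a named lookup vector is one possible approach. This is just a sketch: lookup is a name I made up, the extra "x" element is added only to show the fallback, and mapping unknown characters to 0 follows the else branch of the original function.

```r
trialvector <- c("K", "K", "m", "m", "K", "x")  # "x" added to show the fallback

# Named vector: names are the input characters, values are the multipliers
lookup <- c(K = 3, M = 6, m = 6)

# Index by name; characters missing from the lookup come back as NA
multiplier <- unname(lookup[trialvector])
multiplier[is.na(multiplier)] <- 0

multiplier  # 3 3 6 6 3 0
```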

How to print outputs from a loop on one line?

I have to make a grade calculator in R that converts numerical grades into letter grades. Here is the code I came up with:
numGrades<-(c(66,02,99,59,82))
for(i in 1:length(numGrades)) {
if (numGrades[i]>=90){
print("A")
} else if (numGrades[i]>=80){
print("B")
} else if (numGrades[i]>=70){
print("C")
} else if (numGrades[i]>=60){
print("D")
} else {
print("F")}
}
I can't find a way to integrate the cat or print(c()) functions so that it prints on one line rather than getting:
[1] "D"
[1] "F"
[1] "A"
[1] "F"
[1] "B"
If anyone has any ideas it would be greatly appreciated!
I would simply use paste to join all elements of a 'graded' list. Hope this helps.
numGrades = graded = (c(66,02,99,59,82))
for(i in 1:length(numGrades)) {
if (numGrades[i]>=90){
graded[i] = "A"
} else if (numGrades[i]>=80){
graded[i] = "B"
} else if (numGrades[i]>=70){
graded[i] = "C"
} else if (numGrades[i]>=60){
graded[i] = "D"
} else {
graded[i] = "F"}
}
print(paste(graded))
This gives:
> print(paste(graded))
[1] "D" "F" "A" "F" "B"
Why is cat not working? Replacing print with cat keeps the output on one line:
numGrades<-(c(66,02,99,59,82))
for(i in 1:length(numGrades)) {
if (numGrades[i]>=90){
cat("A ")
} else if (numGrades[i]>=80){
cat("B ")
} else if (numGrades[i]>=70){
cat("C ")
} else if (numGrades[i]>=60){
cat("D ")
} else {
cat("F ")}
}
With many tasks in R it's better to use vectorised functions rather than loops. Here are two ways of doing what you want, one using base R and the other dplyr::case_when. Note that cut returns a factor, but you can always convert it with as.character.
numGrades <- c(66,02,99,59,82)
letGrades <- cut(
numGrades,
breaks = c(-Inf, 6:9, Inf) * 10,
labels = LETTERS[c(6, 4:1)],
right = FALSE
)
letGrades
library(dplyr)
letGrades <- case_when(
numGrades >= 90 ~ "A",
numGrades >= 80 ~ "B",
numGrades >= 70 ~ "C",
numGrades >= 60 ~ "D",
TRUE ~ "F"
)
letGrades
Just for the record, there's no need to use a for loop, you can use a nested ifelse
> graded2 <- ifelse(numGrades>=90, "A",
ifelse(numGrades >= 80 & numGrades < 90, "B",
ifelse(numGrades >= 70 & numGrades < 80, "C",
ifelse(numGrades >= 60 & numGrades < 70, "D", "F"))))
> graded2
[1] "D" "F" "A" "F" "B"
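Once the grades are in a vector, getting them onto one line without the [1] prefix and quotes is a matter of paste with collapse, or cat. A minimal sketch, assuming the same numGrades as in the question (one_line is a name I made up):

```r
numGrades <- c(66, 2, 99, 59, 82)

# Vectorised grading with cut; as.character drops the factor wrapper
letGrades <- as.character(cut(numGrades,
                              breaks = c(-Inf, 60, 70, 80, 90, Inf),
                              labels = c("F", "D", "C", "B", "A"),
                              right = FALSE))

# collapse joins all elements into a single string
one_line <- paste(letGrades, collapse = " ")
cat(one_line, "\n")  # prints: D F A F B
```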

Set a level of a factor to be the last

I know that the function relevel sets a specified level to be the first. I would like to know if there is a built-in function that sets a specified level to be the last. If not, what is an efficient way to write such a function?
The package forcats has a function that does this neatly.
f <- gl(2, 1, labels = c("b", "a"))
forcats::fct_relevel(f, "b", after = Inf)
#> [1] b a
#> Levels: a b
There is not a built-in function. You could do it like this:
lastlevel = function(f, last) {
if (!is.factor(f)) stop("f must be a factor")
orig_levels = levels(f)
if (! last %in% orig_levels) stop("last must be a level of f")
new_levels = c(setdiff(orig_levels, last), last)
factor(f, levels = new_levels)
}
x = factor(c("a", "b", "c"))
> lastlevel(x, "a")
[1] a b c
Levels: b c a
> lastlevel(x, "b")
[1] a b c
Levels: a c b
> lastlevel(x, "c")
[1] a b c
Levels: a b c
> lastlevel(x, "d")
Error in lastlevel(x, "d") : last must be a level of f
I feel a little silly because I just wrote that out, when I could have made a tiny modification to stats:::relevel.factor. A solution adapted from relevel would look like this:
lastlevel = function (f, last, ...) {
if (!is.factor(f)) stop("f must be a factor")
lev <- levels(f)
if (length(last) != 1L)
stop("'last' must be of length one")
if (is.character(last))
last <- match(last, lev)
if (is.na(last))
stop("'last' must be an existing level")
nlev <- length(lev)
if (last < 1 || last > nlev)
stop(gettextf("last = %d must be in 1L:%d", last, nlev),
domain = NA)
factor(f, levels = lev[c(seq_along(lev)[-last], last)])  # all other levels first, then 'last'
}
It checks a few more inputs and also accepts a numeric (e.g., last = 2 would move the second level to the last).
