Replacing multiple values in a column of a data frame based on condition - r

Here is a data set of weights.
I have to replace the values in the weight column based on the following conditions:
if weight < 7, replace the value with a
if weight >= 7 and < 8, replace the value with b
if weight >= 8, replace the value with c

One option is ifelse
df1$New <- with(df1, ifelse(weight <7, "a", ifelse(weight >=7 & weight < 8, "b", "c")))
df1$New
#[1] "a" "c" "c" "a" "c" "b" "c"
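Or we can use cut, which scales better when there are many groups. Setting right = FALSE makes the intervals closed on the left, e.g. [7, 8), so a weight of exactly 7 gets "b" and a weight of exactly 8 gets "c", as the conditions require:
with(df1, as.character(cut(weight, breaks = c(-Inf, 7, 8, Inf),
                           labels = c('a', 'b', 'c'), right = FALSE)))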
#[1] "a" "c" "c" "a" "c" "b" "c"
data
set.seed(24)
df1 <- data.frame(weight = rnorm(7, 7, 3))
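Another base R option for many groups is findInterval. The sketch below is not from the answer above; it uses a small hand-picked vector (w is just an illustrative name) instead of df1, so that the boundary values 7 and 8 are visible.

```r
# Hypothetical weights chosen to include the boundary values 7 and 8
w <- c(6.9, 7, 7.5, 8, 10)

# findInterval() returns 0 for w < 7, 1 for 7 <= w < 8 and 2 for w >= 8
# (intervals are closed on the left by default)
idx <- findInterval(w, c(7, 8))

# Shift by one to index into the labels
new <- c("a", "b", "c")[idx + 1]
new
```

Adding another group only requires one more break point and one more label.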

You could use dplyr as well, either to replace the existing column (by reusing its name) or to add a new one (by giving it a new name). The code below replaces the existing "weight" column.
library(dplyr)
yourdata %>%
  mutate(weight = ifelse(weight < 7, "a", ifelse(weight >= 7 & weight < 8, "b", "c")))
In future, can you provide a reproducible example of your data that's not a JPEG? See: How to make a great R reproducible example?

Related

How to order vectors with priority layout?

Let's consider the following vector of strings:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see, some strings in this vector start with the same prefix, e.g. "B" and "B_big".
What I want to end up with is a vector ordered so that all strings with the same prefix are next to each other, while the prefixes keep their order of first appearance ("B" first, "C" second, and so on).
In simple words, I want to end up with this vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
How I built this vector by hand: reading from the left, I see "B", so I find all other strings that start with "B" and put them to the right of it. Next comes "C", so I look through the remaining strings and put all those starting with "C", e.g. "C_small", to the right, and so on.
I'm not sure how to do it. I'm almost sure the gsub function can be used to get this result, but I'm not sure how to combine it with the searching and reordering. Could you please give me a hand?
Here's one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
  if (letter %in% substr(x, 1, 1)) {
    xnew <- c(xnew, x[substr(x, 1, 1) == letter])
  }
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"
Use the "prefix" as factor levels and then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
If you are open to non-base alternatives, data.table::chgroup may be used. Per its documentation, it "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":
x[data.table::chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
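The answers above take the first character as the prefix. If the prefixes could be longer than one character, a variation is to take everything before the first underscore instead. This is only a sketch under that assumption about the naming scheme; prefix and xnew are illustrative names.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# Take everything before the first underscore as the prefix
# (an assumption about how the names are structured)
prefix <- sub("_.*$", "", x)

# match() gives each prefix the position of its first appearance;
# order() leaves ties in their original order
xnew <- x[order(match(prefix, unique(prefix)))]
xnew
```

Because order() leaves ties in place, strings that share a prefix stay in the order they had in x.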
I suggest splitting the two parts of the text into separate dimensions. Then, define a clear rank order for the descriptive part of the name using a named character vector. From there you can reorder the input vector on the fly. Bundled as a function:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
  # separate the two parts
  prefix <- sub("_.*$", "", x)
  # inputs without an underscore have no suffix; label them "none"
  suffix <- ifelse(grepl("_", x, fixed = TRUE), sub("^.*_", "", x), "none")
  # map each suffix to a rank ordering
  suffix_order <- c(
    "small" = -1,
    "none" = 0,
    "big" = 1,
    "huge" = 2,
    "tremendous" = 3
  )
  # return input vector,
  # ordered by the prefix and the mapping of suffix to rank
  x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result (note that the prefixes are sorted alphabetically here, not by first appearance)
[1] "A" "A_big" "A_huge" "B" "B_big" "B_tremendous" "C_small" "C"
[9] "D"

Non duplicate remove subsetting [duplicate]

a <- c("A", "B", "C", "A", "A", "B")
b <- c("A", "C", "A")
I want to subset a with respect to b so that the following result is obtained:
("B" "A" "B")
Traditional subsetting removes all the "A"s and "C"s from a, duplicates included. I don't want that. For example, b has two "A"s and one "C", so subsetting a with respect to b should remove only two "A"s and one "C" from a. All the remaining elements of a should stay, even if they are "A" or "C".
I just want to know if there is a way of doing this in R.
An easy option is to use vsetdiff from package vecsets, i.e.,
vecsets::vsetdiff(a,b)
such that
> vecsets::vsetdiff(a,b)
[1] "B" "A" "B"
Using tibble and dplyr, you can do:
library(tibble)
library(dplyr)
enframe(a) %>%
  transmute(name = value) %>%
  group_by(name) %>%
  mutate(ID = 1:n()) %>%
  left_join(enframe(table(b)), by = c("name" = "name")) %>%
  filter(ID > value | is.na(value)) %>%
  pull(name)
[1] "B" "A" "B"
Here is a way to do this (note that the result is grouped by value, so the original order of a is not preserved):
#Count occurrences of `a`
a_count <- table(a)
#Count occurrences of `b`
b_count <- table(b)
#Subtract the count present in b from a
a_count[names(b_count)] <- a_count[names(b_count)] - b_count
#Create a new vector of remaining values
rep(names(a_count), a_count)
#[1] "A" "B" "B"
Or:
a <- c("A", "B", "C", "A", "A", "B")
b <- c("A", "C", "A")
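greedy_delete <- function(x, rmv) {
  for (i in rmv) {
    # position of the first remaining occurrence of i, or NA if none is left
    idx <- match(i, x)
    # skip values that are not (or no longer) in x instead of indexing with NA
    if (!is.na(idx)) x <- x[-idx]
  }
  x
}
greedy_delete(a, b)
#[1] "B" "A" "B"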
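A loop-free base R variation on the same idea: number every element by its occurrence within its own value, then drop the elements of a whose (value, occurrence) pair also exists in b. This is only a sketch; occurrence, key_a, key_b and res are illustrative names, and pasting the keys together assumes the values themselves contain no spaces.

```r
a <- c("A", "B", "C", "A", "A", "B")
b <- c("A", "C", "A")

# Number each element by its occurrence within its own value:
# the first "A" gets 1, the second "A" gets 2, and so on
occurrence <- function(v) ave(seq_along(v), v, FUN = seq_along)

# Build keys such as "A 1", "A 2" (assumes the values contain no spaces)
key_a <- paste(a, occurrence(a))
key_b <- paste(b, occurrence(b))

# Keep the elements of a whose key has no counterpart in b
res <- a[!key_a %in% key_b]
res
```

This removes the earliest occurrences first and keeps the remaining elements in their original order.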

collapse / compress vector of repeated elements to max k-repeated

Is there a more efficient way than the rle-based function below to compress/collapse a vector of, let's say, strings so that each run is repeated at most k times? Example input and desired outputs are given below.
Input
foov <- rep(c("a", "b", "a"), c(5, 3, 2))
For k = 2, desired output would be:
"a" "a" "b" "b" "a" "a"
And for k = 3, desired output would be:
"a" "a" "a" "b" "b" "b" "a" "a"
At the moment I am using rle as follows to achieve this:
collapseRLE <- function(v, k) {
  vrle <- rle(v)
  vrle$lengths[vrle$lengths > k] <- k
  ret <- rep(vrle$values, vrle$lengths)
  return(invisible(ret))
}
foov <- rep(c("a", "b", "a"), c(5, 3, 2))
print(collapseRLE(foov, 2))
We can use rleid from data.table. Grouping by the run id from rleid(vec), we keep at most the first k elements of each run (seq_len(pmin(k, .N))) and extract the resulting column as a vector ($V1).
library(data.table)
f1 <- function(k, vec) data.table(vec)[, vec[seq_len(pmin(k, .N))], rleid(vec)]$V1
f1(2, foov)
#[1] "a" "a" "b" "b" "a" "a"
f1(3, foov)
#[1] "a" "a" "a" "b" "b" "b" "a" "a"
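For comparison, a base R sketch that avoids rebuilding the vector with rep: number the positions within each run using sequence() and keep those that are at most k. I have not benchmarked it against the versions above; collapse_k is just an illustrative name.

```r
foov <- rep(c("a", "b", "a"), c(5, 3, 2))

# sequence() numbers the positions within each run: 1..5, 1..3, 1..2
pos_in_run <- sequence(rle(foov)$lengths)

# Keep only the elements whose position within their run is at most k
collapse_k <- function(v, k) v[sequence(rle(v)$lengths) <= k]

collapse_k(foov, 2)
collapse_k(foov, 3)
```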

Count number of elements meeting criteria in columns with NA values

I've got a matrix with "A", "B" and NA values, and I would like to count the number of "A" or "B" or NA values in every column.
sum(mydata[ , i] == "A")
and
sum(mydata[ , i] == "B")
worked fine for columns without NA. For columns that contain NA I can count the number of NAs with sum(is.na(mydata[ , i])). In these columns sum(mydata[ , i] == "A") returns NA as a result instead of a number.
How can I count the number of "A" values in columns which contain NA values?
Thanks for your help!
Example:
> mydata
   V1  V2  V3  V4
V2 "A" "A" "A" "A"
V3 "A" "A" "A" "A"
V4 "B" "B" NA  NA
V5 "A" "A" "A" "A"
V6 "B" "A" "A" "A"
V7 "B" "A" "A" "A"
V8 "A" "A" "A" "A"
sum(mydata[ , 2] == "A")
# [1] 6
sum(mydata[ , 3] == "A")
# [1] NA
sum(is.na(mydata[ , 3]))
# [1] 1
The function sum (like many other math functions in R) takes an argument na.rm. If you set na.rm=TRUE, R removes all NA values before doing the calculation.
Try:
sum(mydata[,3]=="A", na.rm=TRUE)
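To see why the NA shows up in the first place, here is a small sketch using the values of the third column from the question (col3, n_narm and n_in are illustrative names). It also shows %in% as an alternative that never returns NA.

```r
# Values of the third column of the example matrix
col3 <- c("A", "A", NA, "A", "A", "A", "A")

# Comparing NA with "A" gives NA, not FALSE
col3 == "A"

# ... and a single NA makes the whole sum NA
sum(col3 == "A")

# Dropping the NAs before summing gives the count
n_narm <- sum(col3 == "A", na.rm = TRUE)

# %in% treats NA as "no match", so no na.rm is needed
n_in <- sum(col3 %in% "A")
```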
Not sure if this is what you are after. I'm new to R too, so check that this works.
The call below counts the non-NA values in each column; the difference between the number of rows and these counts gives the number of NA items.
colSums(!is.na(mydata))
To expand on the answer from @Andrie,
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
rep(c("B", "A", "A", "A"), 2), rep("A", 4)), ncol = 4, byrow = TRUE)
myFun <- function(x) {
  data.frame(n.A = sum(x == "A", na.rm = TRUE),
             n.B = sum(x == "B", na.rm = TRUE),
             n.NA = sum(is.na(x)))
}
apply(mydata, 2, myFun)
Another possibility is to convert the column into a factor and then use the function summary. Example:
vec<-c("A","B","A",NA)
summary(as.factor(vec))
A quick way to do this is to compute summary stats for the variable:
summary(mydata$my_variable)
or
table(mydata$my_variable, useNA = "ifany")
For a factor or numeric column, summary reports the number of missing values; table reports NAs only when useNA is set.
Hope this helps
You can use table to count all your values at once.
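A sketch of what that could look like for the example matrix, counting "A", "B" and NA in every column in one go. The helper name count_values is made up for illustration; the matrix is rebuilt the same way as in the answer above.

```r
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Count "A", "B" and NA in one vector; fixing the factor levels keeps
# the result the same length even when a value is absent from a column
count_values <- function(col) {
  tab <- table(factor(col, levels = c("A", "B")), useNA = "always")
  setNames(as.integer(tab), c("A", "B", "NA"))
}

# One column of counts per column of mydata
counts <- apply(mydata, 2, count_values)
counts
```

Without useNA, table silently drops the NA values, which is easy to miss.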

R beginner standard regarding grouping levels used in R

So one of the problems I am stuck on is that:
I have some variable X which takes values {1,2,3,4}. Thus
X:
1
2
2
4
2
3
What I want to do, is turn the 1's and 2's into A, and the 3's and 4's into B.
Are there any suggestions or hints on how I would go about doing this?
I was initially thinking of using the subset command, but this seems to just extract them from the dataset.
One possible option is to use recodeVar from the doBy package
library(doBy)
x <- c(1, 2, 2, 4, 2, 3)
src = list(c(1, 2), c(3, 4))
tgt = list("A", "B")
recodeVar(x, src, tgt)
which yields
> recodeVar(x, src, tgt)
[1] "A" "A" "A" "B" "A" "B"
Or you can use the car package:
library(car)
recode(x, "1:2='A'; 3:4='B'")
X <- c(1,2,2,4,2,3)
Y <- ifelse(X %in% 1:2, "A", "B")
## or
Y <- cut(X,breaks=c(0,2.5,5),labels=c("A","B"))
The latter approach creates a factor rather than a character vector; you can use as.character to turn it back into a character vector if you want.
Another alternative:
LETTERS[ceiling((1:4)/2)]
[1] "A" "A" "B" "B"
LETTERS[ceiling(X/2)]
[1] "A" "A" "A" "B" "A" "B"
If it's a data frame, you can use the dplyr package:
library(dplyr)
dataframe <- dataframe %>%
  mutate(newvar = case_when(var %in% c(1, 2) ~ "A",
                            var %in% c(3, 4) ~ "B"))
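One more base R possibility, in case the mapping is irregular and you want it spelled out in one place: a named lookup vector. This is just a sketch; lookup is an illustrative name.

```r
X <- c(1, 2, 2, 4, 2, 3)

# Named lookup: names are the old values (as strings), elements the new labels
lookup <- c("1" = "A", "2" = "A", "3" = "B", "4" = "B")

# Index by name, then drop the names from the result
Y <- unname(lookup[as.character(X)])
Y
```

Values of X that are missing from lookup come back as NA, which makes unexpected codes easy to spot.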

Resources