R beginner standard regarding grouping levels used in R

One of the problems I am stuck on is this:
I have a variable X which takes values in {1, 2, 3, 4}. For example:
X:
1
2
2
4
2
3
What I want to do is turn the 1's and 2's into A, and the 3's and 4's into B.
Are there any suggestions or hints for how I would go about doing this?
I was initially thinking of using the subset command, but this seems to just extract them from the dataset.

One possible option is to use recodeVar from the doBy package
library(doBy)
x <- c(1, 2, 2, 4, 2, 3)
src <- list(c(1, 2), c(3, 4))
tgt <- list("A", "B")
recodeVar(x, src, tgt)
which yields
[1] "A" "A" "A" "B" "A" "B"
Or you can use the car package:
library(car)
recode(x, "1:2='A'; 3:4='B'")

X <- c(1,2,2,4,2,3)
Y <- ifelse(X %in% 1:2, "A", "B")
## or
Y <- cut(X,breaks=c(0,2.5,5),labels=c("A","B"))
The latter approach creates a factor rather than a character vector; you can use as.character to turn it back into a character vector if you want.
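As a further base R sketch (not taken from the answers above): if a factor is what you want, assigning a named list to levels() merges the original levels directly. The names f and Y are illustrative only.

```r
X <- c(1, 2, 2, 4, 2, 3)

# Start from a factor whose levels are "1", "2", "3", "4"
f <- factor(X, levels = 1:4)

# Assigning a named list to levels() merges old levels into new ones
levels(f) <- list(A = c("1", "2"), B = c("3", "4"))

# Convert back to a character vector if a factor is not wanted
Y <- as.character(f)
Y
# [1] "A" "A" "A" "B" "A" "B"
```

Any value not listed in the named list would become NA, which makes unexpected codes easy to spot.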

Another alternative:
LETTERS[ceiling((1:4)/2)]
[1] "A" "A" "B" "B"
LETTERS[ceiling(X/2)]
[1] "A" "A" "A" "B" "A" "B"

If it's a data frame, you can use case_when from the dplyr package:
library(dplyr)
dataframe %>%
  mutate(newvar = case_when(var %in% c(1, 2) ~ "A",
                            var %in% c(3, 4) ~ "B")) -> dataframe

Related

How to order vectors with priority layout?

Let's consider the following vector of strings:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
As you can see, certain strings in this vector start the same way, e.g. "B" and "B_big".
What I want to end up with is a vector ordered so that all strings with the same start are next to each other, while the order of the letters stays the same ("B" should come first, "C" second, and so on). Let me give an example to clarify.
In simple words, I want to end up with the vector:
"B", "B_big", "B_tremendous", "C_small", "C", "A", "A_huge", "A_big", "D"
How I arrived at this vector: reading from the left, I see "B", so I look for all other strings that start the same way and put them to the right of "B". Next is "C", so I look through the remaining strings and put all those starting with "C", e.g. "C_small", to the right of it, and so on.
I'm not sure how to do it. I'm almost sure that the gsub function can be used to get this result, but I'm not sure how to combine it with this searching and reordering. Could you please give me a hand?
Here's one option:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
xorder <- unique(substr(x, 1, 1))
xnew <- c()
for (letter in xorder) {
  if (letter %in% substr(x, 1, 1)) {
    xnew <- c(xnew, x[substr(x, 1, 1) == letter])
  }
}
xnew
[1] "B" "B_big" "B_tremendous" "C_small" "C"
[6] "A" "A_huge" "A_big" "D"
Use the "prefix" as factor levels and then order:
sx = substr(x, 1, 1)
x[order(factor(sx, levels = unique(sx)))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
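The same idea can be sketched without a factor, assuming (as above) that the first character is the grouping prefix: match() gives each element the position at which its prefix first appears, and order() keeps ties in their original order. The names first_seen and x_grouped are illustrative.

```r
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")

# Prefix of each string (assumed here to be the first character)
sx <- substr(x, 1, 1)

# Position of each prefix in order of first appearance: B = 1, C = 2, A = 3, D = 4
first_seen <- match(sx, unique(sx))

# order() leaves ties in their original order, so groups keep their internal order
x_grouped <- x[order(first_seen)]
x_grouped
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
```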
If you are open to non-base alternatives, data.table::chgroup may be used, which "groups together duplicated values but retains the group order (according the first appearance order of each group), efficiently":
x[data.table::chgroup(substr(x, 1, 1))]
# [1] "B" "B_big" "B_tremendous" "C_small" "C" "A" "A_huge" "A_big" "D"
I suggest splitting the two parts of the text into separate dimensions. Then define a clear rank order for the descriptive part of the name using a named numeric vector. From there you can reorder the input vector on the fly. Note that this sorts the prefixes alphabetically rather than by first appearance. Bundled as a function:
x <- c("B", "C_small", "A", "B_big", "C", "A_huge", "D", "A_big", "B_tremendous")
sorter <- function(x) {
  # separate the two parts
  prefix <- sub("_.*$", "", x)
  # inputs without an underscore have no suffix; label them "none"
  suffix <- ifelse(grepl("_", x, fixed = TRUE), sub("^.*_", "", x), "none")
  # map each suffix to a rank ordering
  suffix_order <- c(
    "small" = -1,
    "none" = 0,
    "big" = 1,
    "huge" = 2,
    "tremendous" = 3
  )
  # return input vector,
  # ordered by the prefix and the mapping of suffix to rank
  x[order(prefix, suffix_order[suffix])]
}
sorter(x)
Result
[1] "A" "A_big" "A_huge" "B" "B_big"
[6] "B_tremendous" "C_small" "C" "D"

Trouble evaluating combinations from combn using purrr

I am trying to use combn to divide a group of n = 20 different units into 3 groups of unequal size -- 4, 6 and 10. Then I am trying to validate for values that must be together within a group -- if one element from the pair exists in the group then the other should also be in the group. If one is not in the group then neither should be in the group. In this fashion, I'd like to evaluate the groups in order to find all possible valid solutions where the rules are true.
x <- letters[1:20]
same_group <- list(
c("a", "c"),
c("d", "f"),
c("b", "k", "r")
)
library(purrr)
combinations_list <- combn(x, 4, simplify = FALSE)
validate_combinations <- function(x) all(c("a", "c") %in% x) || !any(c("a", "c") %in% x)
valid_combinations <- keep(combinations_list, validate_combinations)
In this way I'd like to combine -> reduce each group until I have a list of all valid combinations. I'm not sure how to combine combinations_list, validate_combinations, and the same_group to check all same_group "rules" against the combinations in the table. The furthest I can get is to check against one combination c("a", "c"), which when run against keep(combinations_list, validate_combinations) is indeed giving me the output I want.
I think once I can do this, I can then use the unpicked values in another combn function for the group of 6 and the group of 10.
We can change the function to accept the group as an argument:
validate_combinations <- function(x, group) all(group %in% x) | !any(group %in% x)
Then, for each group, subset combinations_list to the combinations that satisfy validate_combinations:
lapply(same_group, function(x) combinations_list[
  sapply(combinations_list, function(y) validate_combinations(y, x))])
#[[1]]
#[[1]][[1]]
#[1] "a" "b" "c" "d"
#[[1]][[2]]
#[1] "a" "b" "c" "e"
#[[1]][[3]]
#[1] "a" "b" "c" "f"
#[[1]][[4]]
#[1] "a" "b" "c" "g"
#[[1]][[5]]
#[1] "a" "b" "c" "h"
#[[1]][[6]]
#[1] "a" "b" "c" "i"
#[[1]][[7]]
#[1] "a" "b" "c" "j"
#[[1]][[8]]
#[1] "a" "b" "c" "k"
#......
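The output above filters the combinations once per rule. If the goal is the set of combinations that satisfy every rule at the same time, one possible base R sketch is to wrap the per-group check in a helper and filter once. The helper name satisfies_all is hypothetical, and Filter() is used here in place of purrr::keep().

```r
x <- letters[1:20]
same_group <- list(
  c("a", "c"),
  c("d", "f"),
  c("b", "k", "r")
)
combinations_list <- combn(x, 4, simplify = FALSE)

# TRUE when the group is entirely inside or entirely outside the combination
validate_combinations <- function(combo, group) {
  all(group %in% combo) || !any(group %in% combo)
}

# Hypothetical helper: a combination is valid only if every rule holds
satisfies_all <- function(combo) {
  all(vapply(same_group,
             function(group) validate_combinations(combo, group),
             logical(1)))
}

valid_combinations <- Filter(satisfies_all, combinations_list)
length(valid_combinations)
```

The letters left over from each valid group of 4 could then be passed to combn again for the group of 6, as the question suggests.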

collapse / compress vector of repeated elements to max k-repeated

Is there a more efficient way than the function below, which is based on rle, to compress/collapse a vector of, let's say, strings so that each run is repeated at most k times? Example input and desired outputs are given below.
Input
foov <- rep(c("a", "b", "a"), c(5, 3, 2))
For k = 2, desired output would be:
"a" "a" "b" "b" "a" "a"
And for k = 3, desired output would be:
"a" "a" "a" "b" "b" "b" "a" "a"
At the moment I am using rle as follows to achieve this:
collapseRLE <- function(v, k) {
  vrle <- rle(v)
  vrle$lengths[vrle$lengths > k] <- k
  ret <- rep(vrle$values, vrle$lengths)
  return(invisible(ret))
}
foov <- rep(c("a", "b", "a"), c(5, 3, 2))
print(collapseRLE(foov, 2))
We can use rleid from data.table. Grouping by the rleid of the vector, we take the first min(k, .N) elements of each run and extract the resulting column ($V1) as a vector:
library(data.table)
f1 <- function(k, vec) data.table(vec)[, vec[seq_len(pmin(k, .N))], rleid(vec)]$V1
f1(2, foov)
#[1] "a" "a" "b" "b" "a" "a"
f1(3, foov)
#[1] "a" "a" "a" "b" "b" "b" "a" "a"
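A base R variant, offered only as a sketch and not benchmarked here: sequence() applied to the run lengths numbers each element within its run, so keeping positions up to k truncates every run by plain subsetting. The function name collapse_runs is illustrative.

```r
collapse_runs <- function(v, k) {
  runs <- rle(v)
  # Position of each element within its own run: 1, 2, ..., run length
  position_in_run <- sequence(runs$lengths)
  # Keep only the first k elements of every run
  v[position_in_run <= k]
}

foov <- rep(c("a", "b", "a"), c(5, 3, 2))
collapse_runs(foov, 2)
# [1] "a" "a" "b" "b" "a" "a"
collapse_runs(foov, 3)
# [1] "a" "a" "a" "b" "b" "b" "a" "a"
```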

Replacing multiple values in a column of a data frame based on condition

Here is a data set of weights.
I have to replace the values in the weight column based on the following conditions:
if weight < 7, replace the value with a
if weight >= 7 and < 8, replace the value with b
if weight >= 8, replace the value with c
One option is ifelse:
df1$New <- with(df1, ifelse(weight < 7, "a", ifelse(weight >= 7 & weight < 8, "b", "c")))
df1$New
#[1] "a" "c" "c" "a" "c" "b" "c"
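or we can use cut if there are many groups (right = FALSE closes the intervals on the left, so a weight of exactly 7 maps to 'b', as the conditions require):
with(df1, as.character(cut(weight, breaks = c(-Inf, 7, 8, Inf),
                           labels = c('a', 'b', 'c'), right = FALSE)))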
#[1] "a" "c" "c" "a" "c" "b" "c"
data
set.seed(24)
df1 <- data.frame(weight = rnorm(7, 7, 3))
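Another base R option, sketched here with made-up weights rather than the data above, is findInterval. Its intervals are closed on the left by default, which matches the conditions in the question (exactly 7 maps to "b", exactly 8 to "c"). The names labels, bin and new are illustrative.

```r
# Illustrative weights chosen to include the boundary values 7 and 8
weight <- c(6.5, 7, 7.9, 8, 10.2)
labels <- c("a", "b", "c")

# findInterval returns 0 below 7, 1 for [7, 8), and 2 for 8 and above
bin <- findInterval(weight, c(7, 8))

# Shift by one to index into the labels
new <- labels[bin + 1]
new
# [1] "a" "b" "b" "c" "c"
```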
You could use dplyr as well, either to replace the existing column (by giving the result the same name) or to add a new one (by giving it a new name). The code below replaces the existing "weight" column.
library(dplyr)
yourdata %>%
  mutate(weight = ifelse(weight < 7, "a", ifelse(weight >= 7 & weight < 8, "b", "c")))
In future, please provide a reproducible example of your data that's not a JPEG. See: How to make a great R reproducible example?

Count number of elements meeting criteria in columns with NA values

I've got a matrix with "A", "B" and NA values, and I would like to count the number of "A" or "B" or NA values in every column.
sum(mydata[ , i] == "A")
and
sum(mydata[ , i] == "B")
worked fine for columns without NA. For columns that contain NA, I can count the number of NAs with sum(is.na(mydata[ , i])). In these columns, sum(mydata[ , i] == "A") returns NA instead of a number.
How can I count the number of "A" values in columns which contain NA values?
Thanks for your help!
Example:
> mydata
   V1  V2  V3  V4
V2 "A" "A" "A" "A"
V3 "A" "A" "A" "A"
V4 "B" "B" NA  NA
V5 "A" "A" "A" "A"
V6 "B" "A" "A" "A"
V7 "B" "A" "A" "A"
V8 "A" "A" "A" "A"
sum(mydata[ , 2] == "A")
# [1] 6
sum(mydata[ , 3] == "A")
# [1] NA
sum(is.na(mydata[ , 3]))
# [1] 1
The function sum (like many other math functions in R) takes an argument na.rm. If you set na.rm=TRUE, R removes all NA values before doing the calculation.
Try:
sum(mydata[,3]=="A", na.rm=TRUE)
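To get the counts for every column at once rather than one column at a time, the same na.rm idea carries over to colSums. This is a sketch using a matrix rebuilt to match the example in the question, without its row and column names; the names n_A, n_B and n_NA are illustrative.

```r
# Rebuild the example matrix row by row (dimnames omitted)
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Comparisons with NA give NA; na.rm = TRUE drops them before summing
n_A <- colSums(mydata == "A", na.rm = TRUE)
n_B <- colSums(mydata == "B", na.rm = TRUE)
n_NA <- colSums(is.na(mydata))

n_A
# [1] 4 6 6 6
n_B
# [1] 3 1 0 0
n_NA
# [1] 0 0 1 1
```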
Not sure if this is what you are after. I'm new to R too, so check that this works.
The difference between the number of rows and the per-column counts of non-NA values computed below gives the number of NA items in each column.
colSums(!is.na(mydata))
To expand on the answer from @Andrie,
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)), ncol = 4, byrow = TRUE)
myFun <- function(x) {
  data.frame(n.A = sum(x == "A", na.rm = TRUE),
             n.B = sum(x == "B", na.rm = TRUE),
             n.NA = sum(is.na(x)))
}
apply(mydata, 2, myFun)
Another possibility is to convert the column to a factor and then use the function summary. Example:
vec<-c("A","B","A",NA)
summary(as.factor(vec))
A quick way to do this is to compute summary stats for the variable:
summary(mydata$my_variable) or table(mydata$my_variable, useNA = "ifany")
For a factor or numeric variable, summary reports the number of missing values; table omits NA unless useNA is set.
Hope this helps
You can use table to count all your values at once.
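A minimal sketch of that idea: by default table drops NA, so the useNA argument is needed to count missing values alongside "A" and "B". The per-column version below assumes the only non-missing values are "A" and "B"; the names counts and by_column are illustrative.

```r
vec <- c("A", "B", "A", NA)

# useNA = "ifany" adds an NA entry when missing values are present
counts <- table(vec, useNA = "ifany")
counts

# Per-column counts for a matrix like the one in the question
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Fixing the factor levels keeps every column's result the same length
by_column <- apply(mydata, 2, function(col) {
  as.vector(table(factor(col, levels = c("A", "B")), useNA = "always"))
})
rownames(by_column) <- c("A", "B", "NA")
by_column
```

Each column of by_column then holds the counts of "A", "B" and NA for the corresponding column of mydata.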

Resources