Count number of elements meeting criteria in columns with NA values - r

I've got a matrix with "A", "B" and NA values, and I would like to count the number of "A" or "B" or NA values in every column.
sum(mydata[ , i] == "A")
and
sum(mydata[ , i] == "B")
worked fine for columns without NA. For columns that contain NA I can count the number of NAs with sum(is.na(mydata[ , i])). In these columns, sum(mydata[ , i] == "A") returns NA instead of a number.
How can I count the number of "A" values in columns that contain NA values?
Thanks for your help!
Example:
> mydata
V1 V2 V3 V4
V2 "A" "A" "A" "A"
V3 "A" "A" "A" "A"
V4 "B" "B" NA NA
V5 "A" "A" "A" "A"
V6 "B" "A" "A" "A"
V7 "B" "A" "A" "A"
V8 "A" "A" "A" "A"
sum(mydata[ , 2] == "A")
# [1] 6
sum(mydata[ , 3] == "A")
# [1] NA
sum(is.na(mydata[ , 3]))
# [1] 1

The function sum (like many other math functions in R) takes an argument na.rm. If you set na.rm=TRUE, R removes all NA values before doing the calculation.
Try:
sum(mydata[,3]=="A", na.rm=TRUE)
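The NA result comes from the comparison itself: NA == "A" is NA, and sum propagates it unless na.rm = TRUE. Here is a minimal sketch, assuming the example matrix from the question (rebuilt so the snippet runs on its own). colSums takes the same na.rm argument, so it can count every column in one call.

```r
# Rebuild the example matrix from the question (7 rows, 4 columns).
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Comparing NA with a string yields NA, so the plain sum is NA.
plain_count <- sum(mydata[, 3] == "A")

# Dropping the NA comparisons first gives the count of "A" values.
a_in_col3 <- sum(mydata[, 3] == "A", na.rm = TRUE)

# colSums applies the same idea to every column at once.
a_per_column <- colSums(mydata == "A", na.rm = TRUE)
```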

Not sure if this is what you are after. I'm new to R too, so check that this works.
The call below counts the non-NA items in each column; the difference between the number of rows and that count is the number of NA items.
colSums(!is.na(mydata))
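A short sketch of that subtraction, again assuming the example matrix from the question. Counting is.na directly gives the same numbers without the extra step.

```r
# Example matrix from the question, rebuilt so the snippet is self-contained.
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# Non-missing entries per column.
non_missing <- colSums(!is.na(mydata))

# NA count as the difference from the number of rows.
na_by_difference <- nrow(mydata) - non_missing

# The same count, obtained directly.
na_direct <- colSums(is.na(mydata))
```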

To expand on the answer from @Andrie,
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
rep(c("B", "A", "A", "A"), 2), rep("A", 4)), ncol = 4, byrow = TRUE)
myFun <- function(x) {
data.frame(n.A = sum(x == "A", na.rm = TRUE), n.B = sum(x == "B",
na.rm = TRUE), n.NA = sum(is.na(x)))
}
apply(mydata, 2, myFun)

Another possibility is to convert the column into a factor and then use the function summary. Example:
vec<-c("A","B","A",NA)
summary(as.factor(vec))

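A quick way to do this is to get summary statistics for the variable:
summary(mydata$my_variable) or table(mydata$my_variable, useNA = "ifany")
For a factor or numeric column, summary reports the number of missing values; table drops NA by default, so it needs useNA to include them.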
Hope this helps

You can use table to count all your values at once.
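A hedged sketch of the table approach, assuming the example matrix from the question. By default table leaves NA out, so useNA is set explicitly. Fixing the factor levels to "A" and "B" is an assumption that makes every column return the same three counts.

```r
# Example matrix from the question, rebuilt so the snippet is self-contained.
mydata <- matrix(c(rep("A", 8), rep("B", 2), rep(NA, 2), rep("A", 4),
                   rep(c("B", "A", "A", "A"), 2), rep("A", 4)),
                 ncol = 4, byrow = TRUE)

# One column: useNA = "ifany" adds an NA entry when NA values are present.
col3_counts <- table(mydata[, 3], useNA = "ifany")

# All columns: fixed levels keep every result the same length, so apply()
# returns a 3 x 4 matrix whose rows are the counts of "A", "B" and NA.
counts <- apply(mydata, 2, function(col) {
  table(factor(col, levels = c("A", "B")), useNA = "always")
})
```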

Get the unique values of two vectors keeping the order of both original

I am trying to get a vector of the unique elements of two vectors that respects the order of both of the original vectors.
The vectors are both sampled from a longer "hidden" vector that only contains unique entries (i.e. no repeats are allowed), which ensures both v1 and v2 have a compatible order (i.e. v1 <- c("Z", "A", ...) and v2 <- c("A", "Z", ...) cannot occur).
The order is arbitrary, so I cannot use any simple order() or sort().
An example below:
v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
Result desired:
c("Z", "A", "T", "F", "Q", "D")
Further explanation: v1 establishes the relationship
"Z" < "A" < "F" < "D"
and v2 states
"A" < "T" < "F" < "Q" < "D"
so the sequence that satisfies v1 and v2 is
"Z" < "A" < "T" < "F" < "Q" < "D"
I understand this case is fully determined (the two vectors do completely define the order of all elements), but there would be cases when this is not enough. In that case, any permutation that respects the two sets of ordering would be a satisfactory solution.
Any tips will be appreciated.
You can take the unique elements of c(v1, v2), then reorder them using match against v2 and v1 in turn, and repeat this until nothing changes.
x <- unique(c(v1, v2))
repeat {
y <- x
i <- match(v2, x)
x[sort(i)] <- x[i]
i <- match(v1, x)
x[sort(i)] <- x[i]
if(identical(x, y)) break;
}
x
#[1] "Z" "A" "T" "F" "Q" "D"
Alternatively, you can take the elements common to v1 and v2 as anchor points and then join the in-between subsets of v1 and v2 to them:
i <- v2[na.omit(match(v1, v2))]
j <- c(0, match(i, v2))
i <- c(0, match(i, v1))
unique(c(unlist(lapply(seq_along(i)[-1], function(k) {
c(v1[head((i[k-1]:i[k]), -1)], v2[head((j[k-1]:j[k])[-1], -1)])
})), v1, v2))
#[1] "Z" "A" "T" "F" "Q" "D"
For this example, the following code works. One first has to define auxiliary vectors w1 and w2, depending on which vector starts with the first common element, and another vector w to which the missing elements are appended in order.
A for loop would be clearer and would avoid this cumbersome code, but at first glance this is faster and shorter.
w <- w1 <- unlist(ifelse(intersect(v1,v2)[1] == v1[1], list(v2), list(v1)))
w2 <- unlist(ifelse(intersect(v1,v2)[1] == v1[1], list(v1), list(v2)))
unique(lapply(setdiff(w2,w1), function(elmt) w <<- append(w, elmt, after = match(w2[match(elmt,w2)-1],w)))[[length(setdiff(w2,w1))]])
[1] "Z" "A" "T" "F" "Q" "D"
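For comparison, here is a loop-based sketch of the same merge. It is not taken from any of the answers above, and merge_ordered is a hypothetical helper name. It relies on the assumptions stated in the question: entries are unique and the two vectors have a compatible order. It walks v1 and, whenever it meets a shared element, first emits the v2 elements that precede it.

```r
# Hypothetical helper: merge two vectors whose orderings are compatible.
merge_ordered <- function(v1, v2) {
  out <- character(0)
  j <- 1  # next position in v2 that has not been emitted yet
  for (el in v1) {
    pos <- match(el, v2)
    if (!is.na(pos)) {
      # Shared element: emit the pending v2 elements that come before it.
      if (pos > j) out <- c(out, v2[j:(pos - 1)])
      j <- pos + 1
    }
    out <- c(out, el)
  }
  # Append whatever is left at the tail of v2.
  if (j <= length(v2)) out <- c(out, v2[j:length(v2)])
  out
}

v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
result <- merge_ordered(v1, v2)
```

When the order is not fully determined, this places the unshared v1 elements before the pending v2 elements, which is one of the permitted permutations.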

Trouble evaluating combinations from combn using purrr

I am trying to use combn to divide a group of n = 20 different units into 3 groups of unequal size -- 4, 6 and 10. Then I am trying to validate for values that must be together within a group -- if one element from the pair exists in the group then the other should also be in the group. If one is not in the group then neither should be in the group. In this fashion, I'd like to evaluate the groups in order to find all possible valid solutions where the rules are true.
library(purrr)  # provides keep()
x <- letters[1:20]
same_group <- list(
c("a", "c"),
c("d", "f"),
c("b", "k", "r")
)
combinations_list <- combn(x, 4, simplify = FALSE)
validate_combinations <- function(x) all(c("a", "c") %in% x) | !any(c("a", "c") %in% x)
valid_combinations <- keep(combinations_list, validate_combinations)
In this way I'd like to combine -> reduce each group until I have a list of all valid combinations. I'm not sure how to combine combinations_list, validate_combinations, and the same_group to check all same_group "rules" against the combinations in the table. The furthest I can get is to check against one combination c("a", "c"), which when run against keep(combinations_list, validate_combinations) is indeed giving me the output I want.
I think once I can do this, I can then use the unpicked values in another combn function for the group of 6 and the group of 10.
We can change the function to accept the group as an argument
validate_combinations <- function(x, group) all(group %in% x) | !any(group %in% x)
then, for each group, subset combinations_list to the combinations that satisfy validate_combinations
lapply(same_group, function(x) combinations_list[
sapply(combinations_list, function(y) validate_combinations(y, x))])
#[[1]]
#[[1]][[1]]
#[1] "a" "b" "c" "d"
#[[1]][[2]]
#[1] "a" "b" "c" "e"
#[[1]][[3]]
#[1] "a" "b" "c" "f"
#[[1]][[4]]
#[1] "a" "b" "c" "g"
#[[1]][[5]]
#[1] "a" "b" "c" "h"
#[[1]][[6]]
#[1] "a" "b" "c" "i"
#[[1]][[7]]
#[1] "a" "b" "c" "j"
#[[1]][[8]]
#[1] "a" "b" "c" "k"
#......
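The result above is one list per rule. To keep only the combinations that satisfy every rule in same_group at once, the checks can be combined with all. This is a sketch in base R (Filter instead of purrr's keep), and satisfies_all is a hypothetical helper name.

```r
x <- letters[1:20]
same_group <- list(
  c("a", "c"),
  c("d", "f"),
  c("b", "k", "r")
)
combinations_list <- combn(x, 4, simplify = FALSE)

# A group is respected when it is entirely inside or entirely outside x.
validate_combinations <- function(x, group) {
  all(group %in% x) || !any(group %in% x)
}

# Hypothetical helper: TRUE only if a combination respects every group.
satisfies_all <- function(combo) {
  all(vapply(same_group,
             function(g) validate_combinations(combo, g),
             logical(1)))
}

valid_combinations <- Filter(satisfies_all, combinations_list)
```

The same predicate works with keep(combinations_list, satisfies_all) if purrr is loaded. The letters left out of each valid combination can then be fed to the next combn call for the group of 6.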

collapse / compress vector of repeated elements to max k-repeated

Is there a more efficient way than the function below, based on rle, to compress/collapse a vector of, let's say, strings so that no element is repeated more than k times in a row? Example input and desired outputs are given below.
Input
foov <- rep(c("a", "b", "a"), c(5, 3, 2))
For k = 2, desired output would be:
"a" "a" "b" "b" "a" "a"
And for k = 3, desired output would be:
"a" "a" "a" "b" "b" "b" "a" "a"
At the moment I am using rle as follows to achieve this:
collapseRLE <- function(v, k) {
vrle <- rle(v)
vrle$lengths[vrle$lengths > k] <- k
ret <- rep(vrle$values, vrle$lengths)
return(invisible(ret))
}
foov <- rep(c("a", "b", "a"), c(5, 3, 2))
print(collapseRLE(foov, 2))
We can use rleid from data.table. Grouping by rleid(vec) gives one group per run; within each group we keep at most the first k elements (seq_len(pmin(k, .N))) and extract the resulting column as a vector ($V1).
library(data.table)
f1 <- function(k, vec) data.table(vec)[, vec[seq_len(pmin(k, .N))], rleid(vec)]$V1
f1(2, foov)
#[1] "a" "a" "b" "b" "a" "a"
f1(3, foov)
#[1] "a" "a" "a" "b" "b" "b" "a" "a"
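A base R variant of the same idea, offered as a sketch rather than a benchmarked improvement: sequence(rle(v)$lengths) numbers the positions within each run, so keeping the positions numbered at most k trims every run to length k. The name collapse_runs is hypothetical.

```r
# Hypothetical helper: keep at most k consecutive repeats of each element.
collapse_runs <- function(v, k) {
  # Position of every element within its own run: 1, 2, 3, ..., 1, 2, ...
  pos_in_run <- sequence(rle(v)$lengths)
  v[pos_in_run <= k]
}

foov <- rep(c("a", "b", "a"), c(5, 3, 2))
k2 <- collapse_runs(foov, 2)
k3 <- collapse_runs(foov, 3)
```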

Replacing multiple values in a column of a data frame based on condition

Here is a data set of weights.
I have to replace the values in the weight column based on the following conditions:
if weight < 7, replace the value with a
if weight >= 7 and < 8, replace the value with b
if weight >= 8, replace the value with c
One option is ifelse
df1$New <- with(df1, ifelse(weight <7, "a", ifelse(weight >=7 & weight < 8, "b", "c")))
df1$New
#[1] "a" "c" "c" "a" "c" "b" "c"
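or we can use cut if there are many groups (right = FALSE closes the intervals on the left, so a weight of exactly 7 falls in 'b', as the conditions require)
with(df1, as.character(cut(weight, breaks = c(-Inf, 7, 8, Inf),
labels = c('a', 'b', 'c'), right = FALSE)))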
#[1] "a" "c" "c" "a" "c" "b" "c"
data
set.seed(24)
df1 <- data.frame(weight = rnorm(7, 7, 3))
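As a further base R option for the same thresholds, findInterval returns the index of the interval each value falls into, and that index can select the label. The weights below are made-up values chosen to hit the boundaries; they are not the asker's data.

```r
# Made-up weights, including the boundary values 7 and 8.
weight <- c(5, 7, 7.5, 8, 10)
labels <- c("a", "b", "c")

# findInterval gives 0 for weight < 7, 1 for 7 <= weight < 8, 2 for weight >= 8.
bucket <- findInterval(weight, c(7, 8))

# Shift by one to index into the labels.
new_weight <- labels[bucket + 1]
```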
You could use dplyr as well to either replace the same column (give it the same name) or add (by giving it a new name). The code below replaces the existing "weight" column.
library(dplyr)
yourdata %>%
mutate(weight = ifelse(weight < 7, "a", ifelse(weight >= 7 & weight < 8, "b", "c")))
In future, please provide a reproducible example of your data rather than a JPEG; see "How to make a great R reproducible example?"

R beginner standard regarding grouping levels used in R

So one of the problems I am stuck on is that:
I have some variable X which takes values {1,2,3,4}. Thus
X:
1
2
2
4
2
3
What I want to do, is turn the 1's and 2's into A, and the 3's and 4's into B.
Are there any suggestions or hints for how I would go about doing this?
I was initially thinking of using the subset command, but this seems to just extract them from the dataset.
One possible option is to use recodeVar from the doBy package
library(doBy)
x <- c(1, 2, 2, 4, 2, 3)
src = list(c(1, 2), c(3, 4))
tgt = list("A", "B")
recodeVar(x, src, tgt)
which yields
[1] "A" "A" "A" "B" "A" "B"
Or you can use the car package:
library(car)
recode(x, "1:2='A'; 3:4='B'")
X <- c(1,2,2,4,2,3)
Y <- ifelse(X %in% 1:2, "A", "B")
## or
Y <- cut(X,breaks=c(0,2.5,5),labels=c("A","B"))
The latter approach creates a factor rather than a character vector; you can use as.character to turn it back into a character vector if you want.
Another alternative:
LETTERS[ceiling((1:4)/2)]
[1] "A" "A" "B" "B"
LETTERS[ceiling(X/2)]
[1] "A" "A" "A" "B" "A" "B"
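When the mapping is not arithmetic, a named lookup vector is another base R option. This is a sketch; the name lookup is only illustrative.

```r
X <- c(1, 2, 2, 4, 2, 3)

# Names are the old values (as strings), elements are the new labels.
lookup <- c("1" = "A", "2" = "A", "3" = "B", "4" = "B")

# Index by name, then drop the names from the result.
Y <- unname(lookup[as.character(X)])
```

Values of X that have no entry in lookup come back as NA, which makes unmapped codes easy to spot.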
If it's a data frame, you can use case_when from the dplyr package:
library(dplyr)
dataframe %>%
mutate(newvar = case_when(var %in% c(1, 2) ~ "A",
var %in% c(3, 4) ~ "B")) -> dataframe
