Count number of matching paired values in array in R

Rookie question. I'm looking for a simple way in R to count the number of matching pairs of values in an array, such as:
c("A","A","A") # 3 matched pairs
c("A","B","A") # 1 matched pair
c("A","B") # 0 matched pairs
etc
Thank you

It seems like you want to find all possible pairs of identical elements, where their order does not matter. Then:
matchPairs <- function(x) sum(choose(table(x), 2))
matchPairs(c("A", "A", "A"))
# [1] 3
matchPairs(c("A", "B", "A"))
# [1] 1
matchPairs(c("A", "B"))
# [1] 0
matchPairs(c("A", "A", "A", "B"))
# [1] 3
matchPairs(c("A", "A", "A", "B", "B"))
# [1] 4
matchPairs(c("A", "A", "A", "B", "B", "A"))
# [1] 7
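To see why this works, it may help to look at the intermediate values: table() counts how often each value occurs, and choose(n, 2) is the number of unordered pairs that can be formed from n identical values. The sketch below also compares the result against a brute-force count over all index pairs; matchPairsBrute is a hypothetical helper added only for this check and is not part of the answer above.

```r
matchPairs <- function(x) sum(choose(table(x), 2))

# Hypothetical brute-force check: enumerate every unordered pair of
# positions and count the pairs whose values are equal.
matchPairsBrute <- function(x) {
  if (length(x) < 2) return(0)
  idx <- combn(length(x), 2)          # 2 x choose(n, 2) matrix of index pairs
  sum(x[idx[1, ]] == x[idx[2, ]])
}

x <- c("A", "A", "A", "B", "B", "A")
counts <- table(x)                    # A occurs 4 times, B occurs 2 times
pairs_per_value <- choose(counts, 2)  # choose(4, 2) = 6, choose(2, 2) = 1

fast  <- matchPairs(x)
brute <- matchPairsBrute(x)
```

The brute-force version grows quadratically with the length of the input, so it is only useful as a sanity check on small vectors.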

Related

How do I compare the order of two lists in R?

Hi I'm trying to determine the change in ordering of two lists in R.
e.g. Comparing rankings of tennis players from two different months.
Feb <- c("A", "B", "C", "D")
Mar <- c("D", "B", "C", "A")
orderChange(Feb, Mar)
I would like to get a result which shows the difference in ordering/ranking.
(-3, 0, 0, 3)
I've tried which() but that only tells me whether an element is present and doesn't compare the ordering.
which(Mar %in% Feb)
[1] 1 2 3 4
You can use seq_along and subtract match.
Feb <- c("A", "B", "C", "D")
Mar <- c("D", "B", "C", "A")
Apr <- c("C", "B", "D", "A")
seq_along(Feb) - match(Feb, Mar)
#[1] -3 0 0 3
seq_along(Feb) - match(Feb, Apr)
#[1] -3 0 2 1
You can wrap this in a function if needed:
orderChange <- function(x, y) seq_along(x) - match(x, y)
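As a small usage sketch, the result can be labelled with the player names so each change is easier to read. The second call illustrates one thing to watch for, assuming a player can drop out of the later ranking: match() returns NA for a value it cannot find, so the change for that player is NA. The vectors in the second call are made up for illustration.

```r
orderChange <- function(x, y) seq_along(x) - match(x, y)

Feb <- c("A", "B", "C", "D")
Mar <- c("D", "B", "C", "A")

change <- orderChange(Feb, Mar)
names(change) <- Feb   # label each change with the player it belongs to
change

# Illustrative data: "E" is missing from the second ranking, so match()
# returns NA for it and the resulting change is NA as well.
dropped <- orderChange(c("A", "B", "E"), c("B", "A"))
dropped
```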

Tidy way of arranging data frame rows according to target sorting orders

Back in 2015, I asked a similar question on this, but I would like to find a tidy way of doing this.
This is the best that I could come up with so far. It works, but having to change column types just for sorting seems "wrong" somehow. However, so does resorting to dplyr::*_join(), and match() comes with its own catches (plus it's hard to use in tidy contexts).
So is there a good/recommended way of doing this in the tidyverse?
Define function
library(magrittr)
arrange_by_target <- function(
  x,
  targets
) {
  x %>%
    # Transform arrange-by columns to factors so we can leverage the order of
    # the levels:
    dplyr::mutate_at(
      names(targets),
      function(.x, .targets = targets) {
        .col <- deparse(substitute(.x))
        factor(.x, levels = .targets[[.col]])
      }
    ) %>%
    # Actual arranging:
    dplyr::arrange_at(
      names(targets)
    ) %>%
    # Clean up by recasting factor columns to their original type:
    dplyr::mutate_at(
      .vars = names(targets),
      function(.x, .targets = targets) {
        .col <- deparse(substitute(.x))
        vctrs::vec_cast(.x, to = class(.targets[[.col]]))
      }
    )
}
Test
x <- tibble::tribble(
  ~group, ~name, ~value,
  "A", "B", 1,
  "A", "C", 2,
  "A", "A", 3,
  "B", "B", 4,
  "B", "A", 5
)
x %>%
  arrange_by_target(
    targets = list(
      group = c("B", "A"),
      name = c("A", "B", "C")
    )
  )
#> # A tibble: 5 x 3
#> group name value
#> <chr> <chr> <dbl>
#> 1 B A 5
#> 2 B B 4
#> 3 A A 3
#> 4 A B 1
#> 5 A C 2
x %>%
  arrange_by_target(
    targets = list(
      group = c("B", "A"),
      name = c("A", "B", "C") %>% rev()
    )
  )
#> # A tibble: 5 x 3
#> group name value
#> <chr> <chr> <dbl>
#> 1 B B 4
#> 2 B A 5
#> 3 A C 2
#> 4 A B 1
#> 5 A A 3
Created on 2019-11-06 by the reprex package (v0.3.0)
The easiest way to accomplish this is to convert your character columns to factors whose levels are in the desired order, like so:
library(dplyr)
x %>%
  mutate(
    group = factor(group, c("A", "B")),
    name = factor(name, c("C", "B", "A"))
  ) %>%
  arrange(group, name)
Another option that I frequently use is to utilize joins. For example:
x <- tibble::tribble(
  ~group, ~name, ~value,
  "A", "B", 1,
  "A", "C", 2,
  "A", "A", 3,
  "B", "B", 4,
  "B", "A", 5,
  "A", "A", 6,
  "B", "C", 7,
  "A", "B", 8,
  "B", "B", 9
)
custom_sort <- tibble::tribble(
  ~group, ~name,
  "A", "C",
  "A", "B",
  "A", "A",
  "B", "B",
  "B", "A"
)
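# Start from custom_sort so that its row order drives the result:
# left_join() keeps the row order of its first argument.
# Combinations missing from custom_sort (here B/C) are dropped.
custom_sort %>% left_join(x, by = c("group", "name"))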
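If the column types should stay untouched, one further possibility is to sort on the positions returned by match() directly inside arrange(). This is only a sketch under the assumption that the targets are kept in a named list as in the question; the object names targets and sorted are illustrative.

```r
library(dplyr)

x <- tibble::tribble(
  ~group, ~name, ~value,
  "A", "B", 1,
  "A", "C", 2,
  "A", "A", 3,
  "B", "B", 4,
  "B", "A", 5
)

# Illustrative target orders, mirroring the first test in the question
targets <- list(
  group = c("B", "A"),
  name = c("A", "B", "C")
)

# match() turns each value into its position in the target vector;
# arrange() then sorts on those positions and leaves the columns as they are.
sorted <- x %>%
  arrange(
    match(group, targets$group),
    match(name, targets$name)
  )
sorted
```

Values that do not appear in a target vector get NA from match(), and arrange() places NA last, so such rows end up at the bottom instead of being dropped.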

R: select top products in groups

I need to select the 3 top-selling products in each category, but if a category does not have 3 products, I should add more products from the best available category ("a" being the best category, "c" the worst).
The products change every day, so I would like to do this automatically. Previously I chose the top 3 products per category and did not bother if fewer were available, but unfortunately the conditions have changed. For that I used the following code:
Selected <- items %>% group_by(Cat) %>% dplyr::filter(row_number() <= 3) %>% ungroup()
Sample data:
items <- data.frame(Cat = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
Desired results:
"a", "a", "a", "b", "b", "c", "c", "c", "c"
Sample data - 2:
items <- data.frame(Cat = c("a", "a", "a", "a", "b", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
Desired results - 2:
"a", "a", "a", "a", "b", "c", "c", "c", "c"
Here is a possible answer. I'm not entirely sure if I'm getting what you are after - if not let me know.
items <- data.frame(Cat = c("a", "a", "a",
"b", "b",
"c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
First we number the rows within each category and order the data from best to worst category.
library(dplyr)
Selected <- items %>%
  group_by(Cat) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  arrange(Cat)
Then we keep the top 3 in each category and fill up with the remaining rows, from best to worst category:
Selected %>%
  filter(id <= 3) %>%                          # Select top 3 in each group
  bind_rows(Selected %>% filter(id > 3)) %>%   # Append the rows that weren't selected
  mutate(id = row_number()) %>%
  filter(id <= 3 * length(unique(Cat)))        # Extract the right number
This produces
# A tibble: 9 x 3
Cat ranking id
<fctr> <int> <int>
1 a 1 1
2 a 2 2
3 a 3 3
4 b 4 4
5 b 5 5
6 c 6 6
7 c 7 7
8 c 8 8
9 c 9 9
The second data example yields
# A tibble: 9 x 3
Cat ranking id
<fctr> <int> <int>
1 a 1 1
2 a 2 2
3 a 3 3
4 b 5 4
5 c 6 5
6 c 7 6
7 c 8 7
8 a 4 8
9 c 9 9
which seems to be what you were after.
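Since the products change every day, the two steps could be wrapped in a function. The following is a sketch only: select_top and its argument n are hypothetical names, and it assumes that categories sort alphabetically from best to worst and that a lower ranking is better, as in the sample data.

```r
library(dplyr)

# Hypothetical helper: keep up to n rows per category, then fill the
# remaining slots with leftover rows from the best categories first.
select_top <- function(items, n = 3) {
  total <- n * n_distinct(items$Cat)   # number of rows wanted overall
  items %>%
    arrange(Cat, ranking) %>%
    group_by(Cat) %>%
    mutate(id = row_number()) %>%      # position within the category
    ungroup() %>%
    arrange(id > n, Cat, ranking) %>%  # top-n rows first, leftovers after
    head(total) %>%
    select(-id)
}

items2 <- data.frame(
  Cat = c("a", "a", "a", "a", "b", rep("c", 10)),
  ranking = 1:15
)
selected2 <- select_top(items2)
selected2
```

Sorting on the logical expression id > n puts the rows within the top n of their category ahead of the leftovers, because FALSE sorts before TRUE.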

Count frequency of elements matching other elements of another column in R

Say I have
Name<- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C")
Cate<- c("a", "a", "b", "b", "c", "a", "a", "a", "c", "b", "b", "c")
I want to reproduce the following:
Nam fra frb frc
A 2 2 1
B 3 0 1
C 0 2 1
Where fra, frb and frc are the frequency values of a, b and c values respectively in Cate for each category (A,B,C) of Nam.
I am looking for faster code than what I am using (subsetting Nam by category and then calculating the frequencies).
We can use dcast from data.table, which is efficient and quick:
library(data.table)
dcast(data.table(Name, Cate), Name ~paste0("fr", Cate))
# Name fra frb frc
#1: A 2 2 1
#2: B 3 0 1
#3: C 0 2 1
A simple base R option would be
table(Name, Cate)
data
Name <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C")
Cate <- c("a", "a", "b", "b", "c", "a", "a", "a", "c", "b", "b", "c")
You can also use the xtabs() function:
xtabs(~Name + Cate)
For completeness' sake, here's a Hadleyverse solution:
library(dplyr)
library(tidyr)
data.frame(Name, Cate) %>%
  count(Name, Cate) %>%
  spread(key = Cate, value = n, fill = 0)
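If the result should be a plain data frame with the fra/frb/frc column names from the question, the base R table can be converted as sketched below. The object name counts is illustrative.

```r
Name <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C")
Cate <- c("a", "a", "b", "b", "c", "a", "a", "a", "c", "b", "b", "c")

# Cross-tabulate with Name in rows and Cate in columns, then convert
# the table to a data frame that keeps the wide layout.
counts <- as.data.frame.matrix(table(Name, Cate))

# Prefix the category columns and move the row names into a column.
names(counts) <- paste0("fr", names(counts))
counts <- cbind(Nam = rownames(counts), counts)
rownames(counts) <- NULL
counts
```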

Mean by factor level for last three rows

I have a dataframe, which looks like this (but has more factor levels and values)
ID <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
Value <- rep(1:5, times = 3)
test <- cbind.data.frame(ID, Value)
I would like to calculate the mean of the first 3 and last 3 values (rows) of each factor level.
For the first 3 values I used ddply:
library(plyr)
mean_start <- ddply(test, .(ID), summarise, mean_start = mean(Value[1:3]))
This works great. But how can I refer to the last 3 rows, given that each factor level has a different number of rows?
Using head and tail:
library(plyr)
(means <- ddply(test, .(ID), summarise, mean_start = mean(head(Value, 3)), mean_end = mean(tail(Value, 3))))
# ID mean_start mean_end
# 1 A 2.000000 4
# 2 B 2.000000 3
# 3 C 2.666667 4
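The same idea also works without plyr. Below is a minimal base R sketch using tapply(); the object names are illustrative. One difference from Value[1:3] is worth noting: for a group with fewer than 3 rows, Value[1:3] pads with NA and the mean becomes NA, while head() and tail() simply return the rows that exist.

```r
ID <- c("A", "A", "A", "A", "A", "B", "B", "B", "B",
        "C", "C", "C", "C", "C", "C")
Value <- rep(1:5, times = 3)
test <- data.frame(ID, Value)

# Apply the summary per ID; head()/tail() pick the first/last 3 rows
# of each group regardless of how many rows the group has.
mean_start <- tapply(test$Value, test$ID, function(v) mean(head(v, 3)))
mean_end   <- tapply(test$Value, test$ID, function(v) mean(tail(v, 3)))

means <- data.frame(
  ID = names(mean_start),
  mean_start = as.vector(mean_start),
  mean_end = as.vector(mean_end)
)
means
```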
