Rookie question. I'm looking for a simple way in R to count the number of matching pairs of values in a vector, such as
c("A","A","A") # 3 matched pairs
c("A","B","A") # 1 matched pair
c("A","B") # 0 matched pair
etc
Thank you
It seems like you want to count all possible pairs of identical elements, where their order does not matter. Then:
matchPairs <- function(x) sum(choose(table(x), 2))
matchPairs(c("A", "A", "A"))
# [1] 3
matchPairs(c("A", "B", "A"))
# [1] 1
matchPairs(c("A", "B"))
# [1] 0
matchPairs(c("A", "A", "A", "B"))
# [1] 3
matchPairs(c("A", "A", "A", "B", "B"))
# [1] 4
matchPairs(c("A", "A", "A", "B", "B", "A"))
# [1] 7
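To see why this works, take the last example: table(x) counts how many times each value occurs, choose(k, 2) turns each count into the number of unordered pairs it can form, and sum() adds them up:
x <- c("A", "A", "A", "B", "B", "A")
table(x)
# x
# A B
# 4 2
choose(4, 2) + choose(2, 2)  # 6 pairs of "A" plus 1 pair of "B"
# [1] 7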
Hi, I'm trying to determine the change in ordering of two lists in R, e.g. comparing rankings of tennis players from two different months.
Feb <- c("A", "B", "C", "D")
Mar <- c("D", "B", "C", "A")
orderChange(Feb, Mar)
I would like to get a result which shows the difference in ordering/ranking.
(-3, 0, 0, 3)
I've tried which() but that only tells me whether an element is present and doesn't compare the ordering.
which(Mar %in% Feb)
[1] 1 2 3 4
You can use seq_along() and subtract match(): match(Feb, Mar) gives each February player's position in the March ranking, and subtracting that from seq_along(Feb) (their February position) gives how many places each player has moved (negative means they dropped in the ranking).
Feb <- c("A", "B", "C", "D")
Mar <- c("D", "B", "C", "A")
Apr <- c("C", "B", "D", "A")
seq_along(Feb) - match(Feb, Mar)
#[1] -3 0 0 3
seq_along(Feb) - match(Feb, Apr)
#[1] -3 0 2 1
and you can pack this into a function if needed:
orderChange <- function(x, y) seq_along(x) - match(x, y)
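For example, with the vectors above:
orderChange(Feb, Mar)
#[1] -3 0 0 3
orderChange(Feb, Apr)
#[1] -3 0 2 1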
Back in 2015, I asked a similar question about this, but now I would like to find a tidyverse way of doing it.
This is the best I could come up with so far. It works, but having to change column types just for sorting seems "wrong" somehow. However, so does resorting to dplyr::*_join(), and match() comes with its own catches (plus it's hard to use in tidy contexts).
So is there a good/recommended way of doing this in the tidyverse?
Define function
library(magrittr)
arrange_by_target <- function(
  x,
  targets
) {
  x %>%
    # Transform arrange-by columns to factors so we can leverage the order of
    # the levels:
    dplyr::mutate_at(
      names(targets),
      function(.x, .targets = targets) {
        .col <- deparse(substitute(.x))
        factor(.x, levels = .targets[[.col]])
      }
    ) %>%
    # Actual arranging:
    dplyr::arrange_at(
      names(targets)
    ) %>%
    # Clean up by recasting factor columns to their original type:
    dplyr::mutate_at(
      .vars = names(targets),
      function(.x, .targets = targets) {
        .col <- deparse(substitute(.x))
        vctrs::vec_cast(.x, to = class(.targets[[.col]]))
      }
    )
}
Test
x <- tibble::tribble(
  ~group, ~name, ~value,
  "A",    "B",   1,
  "A",    "C",   2,
  "A",    "A",   3,
  "B",    "B",   4,
  "B",    "A",   5
)
x %>%
  arrange_by_target(
    targets = list(
      group = c("B", "A"),
      name = c("A", "B", "C")
    )
  )
#> # A tibble: 5 x 3
#> group name value
#> <chr> <chr> <dbl>
#> 1 B A 5
#> 2 B B 4
#> 3 A A 3
#> 4 A B 1
#> 5 A C 2
x %>%
  arrange_by_target(
    targets = list(
      group = c("B", "A"),
      name = c("A", "B", "C") %>% rev()
    )
  )
#> # A tibble: 5 x 3
#> group name value
#> <chr> <chr> <dbl>
#> 1 B B 4
#> 2 B A 5
#> 3 A C 2
#> 4 A B 1
#> 5 A A 3
Created on 2019-11-06 by the reprex package (v0.3.0)
The easiest way to accomplish this is to convert your character columns to factors, like so:
library(dplyr)

x %>%
  mutate(
    group = factor(group, c("A", "B")),
    name = factor(name, c("C", "B", "A"))
  ) %>%
  arrange(group, name)
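If you want to keep the factor trick but drive it from a named list of targets like the one in your function, a sketch along these lines should work (assuming dplyr >= 1.0, where across() and cur_column() are available; the last step simply recasts back to character). It reproduces the first result shown above:
library(dplyr)

targets <- list(
  group = c("B", "A"),
  name  = c("A", "B", "C")
)

x %>%
  # turn each arrange-by column into a factor with the requested level order
  mutate(across(all_of(names(targets)),
                ~ factor(.x, levels = targets[[cur_column()]]))) %>%
  arrange(across(all_of(names(targets)))) %>%
  # recast back to character afterwards
  mutate(across(all_of(names(targets)), as.character))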
Another option that I frequently use is to utilize joins. For example:
x <- tibble::tribble(
  ~group, ~name, ~value,
  "A",    "B",   1,
  "A",    "C",   2,
  "A",    "A",   3,
  "B",    "B",   4,
  "B",    "A",   5,
  "A",    "A",   6,
  "B",    "C",   7,
  "A",    "B",   8,
  "B",    "B",   9
)
custom_sort <- tibble::tribble(
  ~group, ~name,
  "A",    "C",
  "A",    "B",
  "A",    "A",
  "B",    "B",
  "B",    "A"
)
library(dplyr)
x %>% right_join(custom_sort)
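A closely related option is to start from custom_sort, since left_join() keeps the row order of its first argument. Note that, like the right_join() above, this drops any group/name combinations in x that do not appear in custom_sort:
custom_sort %>% left_join(x, by = c("group", "name"))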
I need to select the top 3 selling products in each category, but if a category does not have 3 products, I should add more products from the best available category ("a" being the best category, "c" the worst).
The products change every day, so I would like to do this automatically. Previously I just picked the top 3 products per category, and if 3 were not available I did not bother, but unfortunately the requirements have changed. For that I used code as follows:
Selected <- items %>% group_by(Cat) %>% dplyr::filter(row_number() <= 3) %>% ungroup()
Sample data:
items <- data.frame(Cat = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
Desired results:
"a", "a", "a", "b", "b", "c", "c", "c", "c"
Sample data - 2:
items <- data.frame(Cat = c("a", "a", "a", "a", "b", "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
ranking = 1:15)
Desired results - 2:
"a", "a", "a", "a", "b", "c", "c", "c", "c"
Here is a possible answer. I'm not entirely sure I'm getting what you are after; if not, let me know.
items <- data.frame(Cat = c("a", "a", "a",
                            "b", "b",
                            "c", "c", "c", "c", "c", "c", "c", "c", "c", "c"),
                    ranking = 1:15)
First we order the data from best to worst category and add a row number within each category.
library(dplyr)

Selected <- items %>%
  group_by(Cat) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  arrange(Cat)
Then we can apply the filter and fill up with the remaining rows, from best to worst category:
Selected %>%
  filter(id <= 3) %>%                         # Select top 3 in each group
  bind_rows(Selected %>% filter(id > 3)) %>%  # Merge with the ones that weren't selected
  mutate(id = row_number()) %>%
  filter(id <= 3 * length(unique(Cat)))       # Extract the right number of rows
This produces
# A tibble: 9 x 3
Cat ranking id
<fctr> <int> <int>
1 a 1 1
2 a 2 2
3 a 3 3
4 b 4 4
5 b 5 5
6 c 6 6
7 c 7 7
8 c 8 8
9 c 9 9
The second data example yields
# A tibble: 9 x 3
Cat ranking id
<fctr> <int> <int>
1 a 1 1
2 a 2 2
3 a 3 3
4 b 5 4
5 c 6 5
6 c 7 6
7 c 8 7
8 a 4 8
9 c 9 9
which seems to be what you were after.
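The same idea can be condensed into a single pipeline: sort the leftover rows behind the per-category top 3 and keep the first 3 * n_groups rows. A sketch (assuming dplyr is loaded) that gives the same rows as the outputs above:
library(dplyr)

n_keep <- 3 * n_distinct(items$Cat)

items %>%
  group_by(Cat) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  arrange(id > 3, Cat, ranking) %>%  # top-3 rows per category first, leftovers after
  slice(seq_len(n_keep))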
Say I have
Name<- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C")
Cate<- c("a", "a", "b", "b", "c", "a", "a", "a", "c", "b", "b", "c")
I want to produce the following:
Name fra frb frc
A      2   2   1
B      3   0   1
C      0   2   1
where fra, frb and frc are the frequencies of the values a, b and c, respectively, in Cate for each category (A, B, C) of Name.
I am looking for faster code than what I am using now (subsetting Name by each category and then calculating the frequencies).
We can do a dcast from data.table, which is very efficient and quick:
library(data.table)
dcast(data.table(Name, Cate), Name ~ paste0("fr", Cate))
# Name fra frb frc
#1: A 2 2 1
#2: B 3 0 1
#3: C 0 2 1
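Since there is no separate value column, dcast falls back to counting rows; passing fun.aggregate and value.var explicitly does the same thing and should silence the messages dcast prints while guessing them:
library(data.table)
dcast(data.table(Name, Cate), Name ~ paste0("fr", Cate),
      fun.aggregate = length, value.var = "Cate")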
A simple base R option (with Name as rows, to match the desired layout) would be
table(Name, Cate)
data
Name <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C")
Cate <- c("a", "a", "b", "b", "c", "a", "a", "a", "c", "b", "b", "c")
You can also use the xtabs() function:
xtabs(~Name + Cate)
For completeness' sake, here's a Hadleyverse solution:
library(dplyr)
library(tidyr)
data.frame(Name, Cate) %>%
  count(Name, Cate) %>%
  spread(key = Cate, value = n, fill = 0)
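In current tidyr, spread() has been superseded by pivot_wider(), which can also add the fr prefix directly. A sketch assuming tidyr >= 1.1 (for pivot_wider with a scalar values_fill):
library(dplyr)
library(tidyr)

data.frame(Name, Cate) %>%
  count(Name, Cate) %>%
  pivot_wider(names_from = Cate, values_from = n,
              values_fill = 0, names_prefix = "fr")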
I have a dataframe, which looks like this (but has more factor levels and values)
ID <- c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
Value <- rep(1:5)
test <- cbind.data.frame(ID, Value)
I would like to calculate the mean of the first 3 and last 3 values (rows) of each factor level.
For the first 3 values I used ddply:
library(plyr)
mean_start <- ddply(test, .(ID), summarise, mean_start = mean(Value[1:3]))
This works great. But how can I refer to the last 3 rows, given that each factor level has a different number of rows?
Using head() and tail():
library(plyr)
(means <- ddply(test, .(ID), summarise,
                mean_start = mean(head(Value, 3)),
                mean_end = mean(tail(Value, 3))))
# ID mean_start mean_end
# 1 A 2.000000 4
# 2 B 2.000000 3
# 3 C 2.666667 4
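If you are open to dplyr instead of plyr, the same head()/tail() idea carries over directly (sketched in a fresh session to avoid plyr/dplyr masking):
library(dplyr)

test %>%
  group_by(ID) %>%
  summarise(mean_start = mean(head(Value, 3)),
            mean_end = mean(tail(Value, 3)))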