How to adjust result to equal mode values in R - r

The code below computes the mode of each row across the columns Method1, Method2, Method3 and Method4. However, alternatives 10 and 12 end up with the same mode value, 2. I would like the Mode column to behave like a rank, with no repeated values. The alternative with Mode = 1 is clearly the best, but I cannot tell which is the second best, because two alternatives have Mode = 2. Do you have suggestions on what approach I can take?
database<-structure(list(Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L), Method2 = c(1L,
8L, 6L, 7L, 10L, 9L, 4L, 2L, 3L, 5L), Method3 = c(1L,
10L, 7L, 8L, 9L, 6L, 4L, 2L, 3L, 5L), Method4 = c(1L,
9L, 6L, 7L, 10L, 8L, 5L, 3L, 4L, 2L)), class = "data.frame", row.names = c(NA,
10L))
library(dplyr)
# Return the most frequent value, or NA when no value occurs more than once
ModeFunc <- function(Vec) {
  tmp <- sort(table(Vec), decreasing = TRUE)
  Nms <- names(tmp)
  if (max(tmp) > 1) {
    as.numeric(Nms[1])
  } else NA
}
output <- database |> rowwise() |>
  mutate(Mode = ModeFunc(c_across(Method1:Method4))) |>
  data.frame()
> output
Alternatives Method1 Method2 Method3 Method4 Mode
1 3 1 1 1 1 1
2 4 10 8 10 9 10
3 5 7 6 7 6 6
4 6 8 7 8 7 7
5 7 9 10 9 10 9
6 8 6 9 6 8 6
7 9 5 4 4 5 4
8 10 3 2 2 3 2
9 11 4 3 3 4 3
10 12 2 5 5 2 2
CHECK
output$Rank <- (nrow(output) + 1) - rank(-output$Mode, ties.method = "last")
output|>
arrange(Mode)
Alternatives Method1 Method2 Method3 Method4 Mode Rank
1 3 1 1 1 1 1 1
2 10 3 2 2 3 2 2
3 12 2 5 5 2 2 3
4 11 4 3 3 4 3 4
5 9 5 4 4 5 4 5
6 5 7 6 7 6 6 6
7 8 6 9 6 8 6 7
8 6 8 7 8 7 7 8
9 7 9 10 9 10 9 9
10 4 10 8 10 9 10 10

Based on OP's comment above, here's a solution that picks the row with the lowest value of Alternatives in case of ties. You can generalise to any other tie break with an appropriate modification of the second mutate.
output |>
arrange(Mode) |> # Sort by mode
group_by(Mode) |> # Assign initial ranks
mutate(Rank=cur_group_id()) |>
arrange(Rank, Alternatives) |> # Sort and assign tie break
mutate(TieBreak=row_number()) |>
ungroup()
# A tibble: 10 × 8
Alternatives Method1 Method2 Method3 Method4 Mode Rank TieBreak
<dbl> <int> <int> <int> <int> <dbl> <int> <int>
1 3 1 1 1 1 1 1 1
2 10 3 2 2 3 2 2 1
3 12 2 5 5 2 2 2 2
4 11 4 3 3 4 3 3 1
5 9 5 4 4 5 4 4 1
6 5 7 6 7 6 6 5 1
7 8 6 9 6 8 6 5 2
8 6 8 7 8 7 7 6 1
9 7 9 10 9 10 9 7 1
10 4 10 8 10 9 10 8 1
Note that cur_group_id() requires dplyr v1.0.0 or later and that row_number() takes account of groups when a data frame is grouped.
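If a single column with no repeated ranks is wanted, rather than a Rank plus a TieBreak, the two sort keys can be folded into one ordering. The following is a minimal base R sketch, assuming the same tie-break rule (lowest Alternatives wins); the column name StrictRank is made up for illustration.

```r
# Same alternatives and Mode values as in the question
output <- data.frame(
  Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  Mode = c(1, 10, 6, 7, 9, 6, 4, 2, 3, 2)
)

# order() sorts by Mode first and by Alternatives within equal modes;
# writing 1..n back through that ordering gives a rank without ties
ord <- order(output$Mode, output$Alternatives)
output$StrictRank <- integer(nrow(output))
output$StrictRank[ord] <- seq_len(nrow(output))

output[ord, ]
```

With this data, alternative 10 gets rank 2 and alternative 12 gets rank 3. When ties should simply be broken by row order, rank(output$Mode, ties.method = "first") is a shorter alternative.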

Related

How to get the maximum of a rolling sequence in R

I have a large database in R containing the closing prices of different stocks, and I want to get the running maximum of each, that is, the maximum of the current row and all previous rows for that stock, like this:
Data max
1 1
2 2
1 2
3 3
5 5
6 6
3 6
2 6
1 6
4 6
5 6
7 7
3 7
I have tried using rollmax, but, since it requires a width, at some point it just stops working. Thanks in advance.
We could use cummax
df1$max <- cummax(df1$Data)
-output
> df1
Data max
1 1 1
2 2 2
3 1 2
4 3 3
5 5 5
6 6 6
7 3 6
8 2 6
9 1 6
10 4 6
11 5 6
12 7 7
13 3 7
data
df1 <- structure(list(Data = c(1L, 2L, 1L, 3L, 5L, 6L, 3L, 2L, 1L, 4L,
5L, 7L, 3L)), row.names = c(NA, -13L), class = "data.frame")
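Since the question mentions several stocks, the running maximum probably has to restart for each one. A minimal sketch, assuming a long-format data frame with hypothetical columns Stock and Close whose rows are already in date order:

```r
# Hypothetical data: one row per stock per day, sorted by date within each stock
prices <- data.frame(
  Stock = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Close = c(1, 3, 2, 5, 10, 8, 12, 9)
)

# ave() applies cummax() within each stock and returns a vector
# aligned with the original rows
prices$max <- ave(prices$Close, prices$Stock, FUN = cummax)
prices
```

Unlike rollmax, cummax needs no window width, so it does not run out of rows at the start of each series.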

Create a new column with some inherent conditions

I would like to create a column called Weight, which would be as follows:
Weight = Mode + (constant * mean).
Mode is a column of my database.
Constant is alternative/total alternatives; in this case it is 0.1, because 1/10.
Mean is the average of the values corresponding to a specific alternative. For example, the mean for alternative 10 is (3 + 2 + 2 + 3)/4 = 2.5.
As an example for alternative 10, the weight would be 2 + (0.1 × 2.5) = 2.25.
database<-structure(list(Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11,
12), Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L), Method2 = c(1L,
8L, 6L, 7L, 10L, 9L, 4L, 2L, 3L, 5L), Method3 = c(1L, 10L, 7L,
8L, 9L, 6L, 4L, 2L, 3L, 5L), Method4 = c(1L, 9L, 6L, 7L, 10L,
8L, 5L, 3L, 4L, 2L), Mode = c(1, 10, 6, 7, 9, 6, 4, 2, 3, 2)), class = "data.frame", row.names = c(NA,-10L))
Alternatives Method1 Method2 Method3 Method4 Mode
1 3 1 1 1 1 1
2 4 10 8 10 9 10
3 5 7 6 7 6 6
4 6 8 7 8 7 7
5 7 9 10 9 10 9
6 8 6 9 6 8 6
7 9 5 4 4 5 4
8 10 3 2 2 3 2
9 11 4 3 3 4 3
10 12 2 5 5 2 2
If your constant equals 0.1, then try
database |> mutate(Weight = Mode +
.1 * ((Method1 + Method2 + Method3 + Method4) / 4))
Output
Alternatives Method1 Method2 Method3 Method4 Mode Weight
1 3 1 1 1 1 1 1.100
2 4 10 8 10 9 10 10.925
3 5 7 6 7 6 6 6.650
4 6 8 7 8 7 7 7.750
5 7 9 10 9 10 9 9.950
6 8 6 9 6 8 6 6.725
7 9 5 4 4 5 4 4.450
8 10 3 2 2 3 2 2.250
9 11 4 3 3 4 3 3.350
10 12 2 5 5 2 2 2.350
If the constant is the probability 1/(number of alternatives), then change it to
database |> mutate(Weight = Mode +
1/ nrow(database) * ((Method1 + Method2 + Method3 + Method4) / 4)) |>
mutate(rank = rank(Weight))
This can also be accomplished using the base package.
database$Weight = .1 * (database$Method1 + database$Method2 + database$Method3 + database$Method4)/4 + database$Mode
database
# Alternatives Method1 Method2 Method3 Method4 Mode Weight
#1 3 1 1 1 1 1 1.100
#2 4 10 8 10 9 10 10.925
#3 5 7 6 7 6 6 6.650
#4 6 8 7 8 7 7 7.750
#5 7 9 10 9 10 9 9.950
#6 8 6 9 6 8 6 6.725
#7 9 5 4 4 5 4 4.450
#8 10 3 2 2 3 2 2.250
#9 11 4 3 3 4 3 3.350
#10 12 2 5 5 2 2 2.350
One can also use the rank function on the resulting "Weight" column (stored here in a separate Rank column so the weights are kept).
database$Rank = rank(database$Weight)
Additionally, the with function can be used for convenience:
database$Weight = with(database, .1*(Method1 + Method2 + Method3 + Method4)/4 + Mode)
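When there are many method columns, typing each one gets tedious. A minimal sketch using rowMeans(), assuming every column whose name starts with "Method" should enter the mean and that the constant is 1 / (number of alternatives); the helper names method_cols and k are made up for illustration.

```r
database <- data.frame(
  Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L),
  Method2 = c(1L, 8L, 6L, 7L, 10L, 9L, 4L, 2L, 3L, 5L),
  Method3 = c(1L, 10L, 7L, 8L, 9L, 6L, 4L, 2L, 3L, 5L),
  Method4 = c(1L, 9L, 6L, 7L, 10L, 8L, 5L, 3L, 4L, 2L),
  Mode = c(1, 10, 6, 7, 9, 6, 4, 2, 3, 2)
)

# Pick the method columns by name instead of listing them one by one
method_cols <- grep("^Method", names(database), value = TRUE)

# Constant = 1 / total number of alternatives (0.1 for 10 rows)
k <- 1 / nrow(database)

database$Weight <- database$Mode + k * rowMeans(database[method_cols])
database$Rank <- rank(database$Weight)
database
```

For alternative 10 this gives 2 + 0.1 × 2.5 = 2.25, as in the worked example in the question, and the two alternatives that share Mode = 2 now receive different ranks.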

Insert conditions when I have the same values for the mode or when I don't have the mode value

The code below computes the mode of the values obtained by Methods 1, 2, 3 and 4. For some alternatives the mode is correct (for example, alternatives 3 and 4), but for others it is not: alternative 5 has two values of 7 and two values of 6, yet the mode is reported as 6. Furthermore, alternatives 11 and 12 have no mode at all, because all four methods give different values. For these cases, that is, when two values are tied for the mode or when there is no mode, I would like the value obtained by Method 1 to be used as the mode. The correct output is shown below.
Executable code below:
database<-structure(list(Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L), Method2 = c(1L,
8L, 6L, 7L, 10L, 9L, 4L, 2L, 5L, 3L), Method3 = c(1L,
10L, 7L, 8L, 9L, 6L, 4L, 2L, 3L, 5L), Method4 = c(1L,
9L, 6L, 7L, 10L, 8L, 5L, 3L, 2L, 4L)), class = "data.frame", row.names = c(NA,
10L))
library(dplyr)
# Return the most frequent value, or NA when no value occurs more than once
ModeFunc <- function(Vec) {
  tmp <- sort(table(Vec), decreasing = TRUE)
  Nms <- names(tmp)
  if (max(tmp) > 1) {
    as.numeric(Nms[1])
  } else NA
}
output <- database |> rowwise() |>
  mutate(Mode = ModeFunc(c_across(Method1:Method4))) |>
  data.frame()
> output
Alternatives Method1 Method2 Method3 Method4 Mode
1 3 1 1 1 1 1
2 4 10 8 10 9 10
3 5 7 6 7 6 6
4 6 8 7 8 7 7
5 7 9 10 9 10 9
6 8 6 9 6 8 6
7 9 5 4 4 5 4
8 10 3 2 2 3 2
9 11 4 5 3 2 NA
10 12 2 3 5 4 NA
The correct output would then be:
Alternatives Method1 Method2 Method3 Method4 Mode
3 1 1 1 1 1
4 10 8 10 9 10
5 7 6 7 6 7
6 8 7 8 7 8
7 9 10 9 10 9
8 6 9 6 8 6
9 5 4 4 5 5
10 3 2 2 3 3
11 4 5 3 2 4
12 2 3 5 4 2
You could use a conventional mode() function (note that defining it under this name masks base R's mode() in your session),
mode <- function(x) {
ux <- unique(x)
tb <- tabulate(match(x, ux))
ux[tb == max(tb)]
}
and update values using ifelse in mapply.
mds <- apply(database[-1], 1, mode) |> setNames(database$Alternatives)
mapply(\(x, y) ifelse(length(x) > 1, y, x), mds, database$Method1)
# 3 4 5 6 7 8 9 10 11 12
# 1 10 7 8 9 6 5 3 4 2
So, altogether it could look like this:
database |>
cbind(Mode=mapply(\(x, y) ifelse(length(x) > 1, y, x),
apply(database[-1], 1, mode),
database$Method1))
# Alternatives Method1 Method2 Method3 Method4 Mode
# 1 3 1 1 1 1 1
# 2 4 10 8 10 9 10
# 3 5 7 6 7 6 7
# 4 6 8 7 8 7 8
# 5 7 9 10 9 10 9
# 6 8 6 9 6 8 6
# 7 9 5 4 4 5 5
# 8 10 3 2 2 3 3
# 9 11 4 5 3 2 4
# 10 12 2 3 5 4 2
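The same rule can also be written as one helper that returns the mode when it is unique and a fallback value otherwise, which avoids masking base R's mode(). This is a sketch under that reading of the question; the name mode_or_fallback is made up for illustration.

```r
database <- data.frame(
  Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L),
  Method2 = c(1L, 8L, 6L, 7L, 10L, 9L, 4L, 2L, 5L, 3L),
  Method3 = c(1L, 10L, 7L, 8L, 9L, 6L, 4L, 2L, 3L, 5L),
  Method4 = c(1L, 9L, 6L, 7L, 10L, 8L, 5L, 3L, 2L, 4L)
)

# Return the single most frequent value of x; if several values share the
# top count (a tie, or all values distinct), return `fallback` instead
mode_or_fallback <- function(x, fallback) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  top <- ux[counts == max(counts)]
  if (length(top) == 1) top else fallback
}

# Apply the helper row by row, using Method1 as the fallback
database$Mode <- mapply(
  function(m1, m2, m3, m4) mode_or_fallback(c(m1, m2, m3, m4), fallback = m1),
  database$Method1, database$Method2, database$Method3, database$Method4
)
database
```

A row whose values are all different is treated as a tie between every value, so it falls back to Method1 just like a two-against-two tie.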

Select first and last occurrences of a value in a given month

I have a dataset that records, for each ID, the changes of group in a given month.
In the example, in July, ID 5 changed from group 2 to group 1, then from group 1 to 2, and so on.
I need to get only the first and the last changes made for each ID/month.
ID groupTO groupFROM MONTH
5 2 1 6
5 1 2 7
5 2 1 7
5 3 2 7
5 1 3 7
5 2 1 8
5 1 2 8
5 2 1 8
6 1 2 6
6 3 1 6
6 2 1 7
6 3 2 8
6 1 3 8
In this case, I need the results to be:
ID groupTO groupFROM MONTH
5 2 1 6
5 1 2 7
5 1 3 7
5 2 1 8
5 2 1 8
6 1 2 6
6 3 1 6
6 2 1 7
6 3 2 8
6 1 3 8
If I remove the duplicates (ID/MONTH), I can get the first occurrence, but how do I get the last one?
Here's an easy way to do it with dplyr:
library(dplyr)
# Create data
dt <-
data.frame(Id = c(rep(5, 8), rep(6, 5)),
groupTO = c(2, 1, 2, 3, 1, 2, 1, 2, 1, 3, 2, 3, 1),
groupFROM = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 1, 1, 2, 3),
MONTH = c(6, 7, 7, 7, 7, 8, 8, 8, 6, 6, 7, 8, 8))
dt %>%
# Group by ID and month
group_by(Id, MONTH) %>%
# Get first and last row
slice(c(1, n())) %>%
# To remove cases where first is same as last
distinct()
# # A tibble: 9 x 4
# # Groups: Id, MONTH [6]
# Id groupTO groupFROM MONTH
# <dbl> <dbl> <dbl> <dbl>
# 5 2 1 6
# 5 1 2 7
# 5 1 3 7
# 5 2 1 8
# 6 1 2 6
# 6 3 1 6
# 6 2 1 7
# 6 3 2 8
# 6 1 3 8
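One caveat: distinct() compares row values, so for ID 5 in month 8, where the first and last changes are identical (2, 1), only one row survives, giving 9 rows rather than the 10 shown in the question's expected output. A minimal base R sketch that removes duplicates by position within the group instead, so a one-row group is kept once and identical first and last rows are both kept (the helper names idx, pos and size are made up for illustration):

```r
dt <- data.frame(
  Id        = c(rep(5, 8), rep(6, 5)),
  groupTO   = c(2, 1, 2, 3, 1, 2, 1, 2, 1, 3, 2, 3, 1),
  groupFROM = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 1, 1, 2, 3),
  MONTH     = c(6, 7, 7, 7, 7, 8, 8, 8, 6, 6, 7, 8, 8)
)

idx <- seq_len(nrow(dt))
# Position of each row inside its Id/MONTH group, and the size of that group
pos  <- ave(idx, dt$Id, dt$MONTH, FUN = seq_along)
size <- ave(idx, dt$Id, dt$MONTH, FUN = length)

# Keep rows that are first or last in their group; a group with a single
# row satisfies both conditions but is selected only once
first_last <- dt[pos == 1 | pos == size, ]
first_last
```

In the dplyr pipeline, the analogous change would be slice(unique(c(1, n()))) with the distinct() step dropped.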
Using data.table
library(data.table)
unique(setDT(df1)[, .SD[c(1, .N)], .(ID, MONTH)])
# ID MONTH groupTO groupFROM
#1: 5 6 2 1
#2: 5 7 1 2
#3: 5 7 1 3
#4: 5 8 2 1
#5: 6 6 1 2
#6: 6 6 3 1
#7: 6 7 2 1
#8: 6 8 3 2
#9: 6 8 1 3
data
df1 <- structure(list(ID = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 6L), groupTO = c(2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L,
3L, 2L, 3L, 1L), groupFROM = c(1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 3L), MONTH = c(6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
6L, 6L, 7L, 8L, 8L)), class = "data.frame", row.names = c(NA,
-13L))
Here is a base R solution using split
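# df is assumed to hold the example data with columns Id, groupTO, groupFROM, MONTH
dfout <- do.call(rbind, c(make.row.names = FALSE,
  lapply(split(df, df[c("Id", "MONTH")], lex.order = TRUE),
    function(v) if (nrow(v) == 1) v[1, ] else v[c(1, nrow(v)), ])))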
such that
> dfout
Id groupTO groupFROM MONTH
1 5 2 1 6
2 5 1 2 7
3 5 1 3 7
4 5 2 1 8
5 5 2 1 8
6 6 1 2 6
7 6 3 1 6
8 6 2 1 7
9 6 3 2 8
10 6 1 3 8
A base R way using ave where we select 1st and last row for each ID and MONTH and select the unique rows in the dataframe.
unique(subset(df, ave(groupTO == 1, ID, MONTH, FUN = function(x)
seq_along(x) %in% c(1, length(x)))))
# ID groupTO groupFROM MONTH
#1 5 2 1 6
#2 5 1 2 7
#5 5 1 3 7
#6 5 2 1 8
#9 6 1 2 6
#10 6 3 1 6
#11 6 2 1 7
#12 6 3 2 8
#13 6 1 3 8

Create a new variable with the max or min of another variable -- by group [duplicate]

R Community: I am trying to create a new variable based on the value of an existing variable, not on a row-wise basis but rather on a group-wise basis. I'm trying to create max.var and min.var below based on old.var without collapsing or aggregating the rows, that is, preserving all the id rows:
id old.var min.var max.var
1 1 1 3
1 2 1 3
1 3 1 3
2 5 5 11
2 7 5 11
2 9 5 11
2 11 5 11
3 3 3 4
3 4 3 4
df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  old.var = c(1L, 2L, 3L, 5L, 7L, 9L, 11L, 3L, 4L),
  min.var = c(1L, 1L, 1L, 5L, 5L, 5L, 5L, 3L, 3L),
  max.var = c(3L, 3L, 3L, 11L, 11L, 11L, 11L, 4L, 4L)),
  class = "data.frame", row.names = c(NA, -9L))
I've tried using the aggregate and by functions, but they of course summarize the data. I haven't had much luck trying an Excel-like MATCH/INDEX approach either. Thanks in advance for your assistance!
You can use dplyr:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(min.var = min(old.var), max.var = max(old.var))
#Source: local data frame [9 x 4]
#Groups: id [3]
# id old.var min.var max.var
# (int) (int) (int) (int)
#1 1 1 1 3
#2 1 2 1 3
#3 1 3 1 3
#4 2 5 5 11
#5 2 7 5 11
#6 2 9 5 11
#7 2 11 5 11
#8 3 3 3 4
#9 3 4 3 4
Using ave as docendo discimus pointed out in the question's comments:
df$min.var <- ave(df$old.var, df$id, FUN = min)
df$max.var <- ave(df$old.var, df$id, FUN = max)
Output:
id old.var min.var max.var
1 1 1 1 3
2 1 2 1 3
3 1 3 1 3
4 2 5 5 11
5 2 7 5 11
6 2 9 5 11
7 2 11 5 11
8 3 3 3 4
9 3 4 3 4
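To make the difference from aggregate concrete, the sketch below runs both on the question's data: aggregate() returns one row per id, whereas ave() returns a vector as long as the input, so it can be assigned straight back as a column. The object name collapsed is made up for illustration.

```r
df <- data.frame(
  id      = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  old.var = c(1L, 2L, 3L, 5L, 7L, 9L, 11L, 3L, 4L)
)

# aggregate() summarises: the result has one row per id
collapsed <- aggregate(old.var ~ id, data = df, FUN = min)
nrow(collapsed)

# ave() broadcasts each group's summary back to every row of that group
df$min.var <- ave(df$old.var, df$id, FUN = min)
df$max.var <- ave(df$old.var, df$id, FUN = max)
nrow(df)
```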
We can use data.table
library(data.table)
setDT(df1)[, c('min.var', 'max.var') := list(min(old.var), max(old.var)) , by = id]
df1
# id old.var min.var max.var
#1: 1 1 1 3
#2: 1 2 1 3
#3: 1 3 1 3
#4: 2 5 5 11
#5: 2 7 5 11
#6: 2 9 5 11
#7: 2 11 5 11
#8: 3 3 3 4
#9: 3 4 3 4
