Create a new column with some inherent conditions - r

I would like to create a column called Weight, defined as follows:
Weight = Mode + (constant * mean).
Mode is a column of my database.
Constant is alternative/total alternatives; in this case it will be 0.1, because 1/10.
Mean is the average of the values corresponding to a specific alternative. For example, the mean for alternative 10 is (3 + 2 + 2 + 3)/4 = 2.5.
Taking alternative 10 as an example, the weight would be 2 + (0.1 * 2.5) = 2.25.
database<-structure(list(Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11,
12), Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L), Method2 = c(1L,
8L, 6L, 7L, 10L, 9L, 4L, 2L, 3L, 5L), Method3 = c(1L, 10L, 7L,
8L, 9L, 6L, 4L, 2L, 3L, 5L), Method4 = c(1L, 9L, 6L, 7L, 10L,
8L, 5L, 3L, 4L, 2L), Mode = c(1, 10, 6, 7, 9, 6, 4, 2, 3, 2)), class = "data.frame", row.names = c(NA,-10L))
Alternatives Method1 Method2 Method3 Method4 Mode
1 3 1 1 1 1 1
2 4 10 8 10 9 10
3 5 7 6 7 6 6
4 6 8 7 8 7 7
5 7 9 10 9 10 9
6 8 6 9 6 8 6
7 9 5 4 4 5 4
8 10 3 2 2 3 2
9 11 4 3 3 4 3
10 12 2 5 5 2 2

If your constant equals 0.1, then try
library(dplyr)
database |> mutate(Weight = Mode +
.1 * ((Method1 + Method2 + Method3 + Method4) / 4))
Output
Alternatives Method1 Method2 Method3 Method4 Mode Weight
1 3 1 1 1 1 1 1.100
2 4 10 8 10 9 10 10.925
3 5 7 6 7 6 6 6.650
4 6 8 7 8 7 7 7.750
5 7 9 10 9 10 9 9.950
6 8 6 9 6 8 6 6.725
7 9 5 4 4 5 4 4.450
8 10 3 2 2 3 2 2.250
9 11 4 3 3 4 3 3.350
10 12 2 5 5 2 2 2.350
If it is the probability (1 divided by the number of rows), then change it to
database |> mutate(Weight = Mode +
1/ nrow(database) * ((Method1 + Method2 + Method3 + Method4) / 4)) |>
mutate(rank = rank(Weight))

This can also be accomplished using the base package.
database$Weight = .1 * (database$Method1 + database$Method2 + database$Method3 + database$Method4)/4 + database$Mode
database
# Alternatives Method1 Method2 Method3 Method4 Mode Weight
#1 3 1 1 1 1 1 1.100
#2 4 10 8 10 9 10 10.925
#3 5 7 6 7 6 6 6.650
#4 6 8 7 8 7 7 7.750
#5 7 9 10 9 10 9 9.950
#6 8 6 9 6 8 6 6.725
#7 9 5 4 4 5 4 4.450
#8 10 3 2 2 3 2 2.250
#9 11 4 3 3 4 3 3.350
#10 12 2 5 5 2 2 2.350
One can also use the rank function on the resulting "Weight" column (column names are case-sensitive, so the result is stored in a new Rank column here).
database$Rank = rank(database$Weight)
Additionally, the with function can be used for convenience:
database$Weight = with(database, .1*(Method1 + Method2 + Method3 + Method4)/4 + Mode)
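As a hedged sketch of the same calculation, the Method columns can be averaged with rowMeans and the constant derived from the number of rows instead of being typed in. The names method_cols and constant are illustrative and do not come from the answers above; the data frame is rebuilt here only so the snippet runs on its own.

```r
# Same data as in the question, rebuilt so the example is self-contained
database <- data.frame(
  Alternatives = 3:12,
  Method1 = c(1, 10, 7, 8, 9, 6, 5, 3, 4, 2),
  Method2 = c(1, 8, 6, 7, 10, 9, 4, 2, 3, 5),
  Method3 = c(1, 10, 7, 8, 9, 6, 4, 2, 3, 5),
  Method4 = c(1, 9, 6, 7, 10, 8, 5, 3, 4, 2),
  Mode    = c(1, 10, 6, 7, 9, 6, 4, 2, 3, 2)
)

# Pick up every column whose name starts with "Method" (illustrative helper name)
method_cols <- grep("^Method", names(database), value = TRUE)

# Assumption: the constant is 1 / number of alternatives, i.e. 1 / 10 here
constant <- 1 / nrow(database)

# Weight = Mode + constant * row-wise mean of the Method columns
database$Weight <- database$Mode + constant * rowMeans(database[method_cols])

database[database$Alternatives == 10, c("Alternatives", "Mode", "Weight")]
```

For alternative 10 this reproduces the hand calculation from the question, 2 + 0.1 * 2.5 = 2.25. Selecting the columns by name pattern means the code would not need to change if a Method5 column were added.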

Related

How to adjust result to equal mode values in R

The code below generates the mode from the columns Method1, Method2, Method3 and Method4. Notice, however, that alternatives 10 and 12 have the same mode value, 2. I would like the Mode column to hold distinct values, as if it were a rank. The alternative with Mode = 1 is the best, but I have no way of knowing the second-best alternative, because the value 2 appears twice in the Mode column. Do you have suggestions on what approach I can take?
database<-structure(list(Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L), Method2 = c(1L,
8L, 6L, 7L, 10L, 9L, 4L, 2L, 3L, 5L), Method3 = c(1L,
10L, 7L, 8L, 9L, 6L, 4L, 2L, 3L, 5L), Method4 = c(1L,
9L, 6L, 7L, 10L, 8L, 5L, 3L, 4L, 2L)), class = "data.frame", row.names = c(NA,
10L))
ModeFunc <- function(Vec) {
tmp <- sort(table(Vec),decreasing = TRUE)
Nms <- names(tmp)
if(max(tmp) > 1) {
as.numeric(Nms[1])
} else NA}
library(dplyr)
output <- database |> rowwise() |>
mutate(Mode = ModeFunc(c_across(Method1:Method4))) |>
data.frame()
> output
Alternatives Method1 Method2 Method3 Method4 Mode
1 3 1 1 1 1 1
2 4 10 8 10 9 10
3 5 7 6 7 6 6
4 6 8 7 8 7 7
5 7 9 10 9 10 9
6 8 6 9 6 8 6
7 9 5 4 4 5 4
8 10 3 2 2 3 2
9 11 4 3 3 4 3
10 12 2 5 5 2 2
One option is to rank Mode directly and break ties by order of appearance:
output$Rank <- (nrow(output) + 1) - rank(-output$Mode, ties.method = "last")
output|>
arrange(Mode)
Alternatives Method1 Method2 Method3 Method4 Mode Rank
1 3 1 1 1 1 1 1
2 10 3 2 2 3 2 2
3 12 2 5 5 2 2 3
4 11 4 3 3 4 3 4
5 9 5 4 4 5 4 5
6 5 7 6 7 6 6 6
7 8 6 9 6 8 6 7
8 6 8 7 8 7 7 8
9 7 9 10 9 10 9 9
10 4 10 8 10 9 10 10
Based on the OP's comment above, here's a solution that picks the row with the lowest value of Alternatives in case of ties. You can generalise to any other tie-break with an appropriate modification of the second mutate.
output |>
arrange(Mode) |> # Sort by mode
group_by(Mode) |> # Assign initial ranks
mutate(Rank=cur_group_id()) |>
arrange(Rank, Alternatives) |> # Sort and assign tie break
mutate(TieBreak=row_number()) |>
ungroup()
# A tibble: 10 × 8
Alternatives Method1 Method2 Method3 Method4 Mode Rank TieBreak
<dbl> <int> <int> <int> <int> <dbl> <int> <int>
1 3 1 1 1 1 1 1 1
2 10 3 2 2 3 2 2 1
3 12 2 5 5 2 2 2 2
4 11 4 3 3 4 3 3 1
5 9 5 4 4 5 4 4 1
6 5 7 6 7 6 6 5 1
7 8 6 9 6 8 6 5 2
8 6 8 7 8 7 7 6 1
9 7 9 10 9 10 9 7 1
10 4 10 8 10 9 10 8 1
Note that cur_group_id() requires dplyr v1.0.0 or later and that row_number() takes account of groups when a data frame is grouped.
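For comparison, here is a minimal base R sketch of the same tie-break idea: sort by Mode, then by Alternatives, and number the rows in that order. It uses only the two relevant columns of the example data, and the variable name ord is illustrative.

```r
# Only the columns needed for ranking, taken from the example output above
output <- data.frame(
  Alternatives = 3:12,
  Mode = c(1, 10, 6, 7, 9, 6, 4, 2, 3, 2)
)

# order() sorts by Mode first and uses Alternatives to break ties
ord <- order(output$Mode, output$Alternatives)

# Give position k in the sorted order the rank k
output$Rank <- integer(nrow(output))
output$Rank[ord] <- seq_along(ord)

output[ord, ]
```

Because the rows are already sorted by Alternatives, this gives the same ranks as rank(output$Mode, ties.method = "first"). A different tie-break only needs a different second argument to order().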

Insert conditions when I have the same values for the mode or when I don't have the mode value

The code below generates the mode from the values obtained by Methods 1, 2, 3 and 4. In some cases the mode is correct, for example for alternatives 3 and 4. In others it is not: alternative 5 has two values of 7 and two values of 6, yet the mode shown is 6. Furthermore, alternatives 11 and 12 have no mode at all, because every method gives a different value. For these cases, that is, when two values are tied for the mode or when there is no mode, I would like the value obtained by Method 1 to be used as the mode. The correct output is shown further below.
Executable code below:
database<-structure(list(Alternatives = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
Method1 = c(1L, 10L, 7L, 8L, 9L, 6L, 5L, 3L, 4L, 2L), Method2 = c(1L,
8L, 6L, 7L, 10L, 9L, 4L, 2L, 5L, 3L), Method3 = c(1L,
10L, 7L, 8L, 9L, 6L, 4L, 2L, 3L, 5L), Method4 = c(1L,
9L, 6L, 7L, 10L, 8L, 5L, 3L, 2L, 4L)), class = "data.frame", row.names = c(NA,
10L))
ModeFunc <- function(Vec) {
tmp <- sort(table(Vec),decreasing = TRUE)
Nms <- names(tmp)
if(max(tmp) > 1) {
as.numeric(Nms[1])
} else NA}
library(dplyr)
output <- database |> rowwise() |>
mutate(Mode = ModeFunc(c_across(Method1:Method4))) |>
data.frame()
> output
Alternatives Method1 Method2 Method3 Method4 Mode
1 3 1 1 1 1 1
2 4 10 8 10 9 10
3 5 7 6 7 6 6
4 6 8 7 8 7 7
5 7 9 10 9 10 9
6 8 6 9 6 8 6
7 9 5 4 4 5 4
8 10 3 2 2 3 2
9 11 4 5 3 2 NA
10 12 2 3 5 4 NA
The correct output would then be:
Alternatives Method1 Method2 Method3 Method4 Mode
3 1 1 1 1 1
4 10 8 10 9 10
5 7 6 7 6 7
6 8 7 8 7 8
7 9 10 9 10 9
8 6 9 6 8 6
9 5 4 4 5 5
10 3 2 2 3 3
11 4 5 3 2 4
12 2 3 5 4 2
You could use a conventional mode() function that returns every most-frequent value (note that defining it under this name masks base::mode),
mode <- function(x) {
ux <- unique(x)
tb <- tabulate(match(x, ux))
ux[tb == max(tb)]
}
and update values using ifelse in mapply.
mds <- apply(database[-1], 1, mode) |> setNames(database$Alternatives)
mapply(\(x, y) ifelse(length(x) > 1, y, x), mds, database$Method1)
# 3 4 5 6 7 8 9 10 11 12
# 1 10 7 8 9 6 5 3 4 2
So, altogether it could look like this:
database |>
cbind(Mode=mapply(\(x, y) ifelse(length(x) > 1, y, x),
apply(database[-1], 1, mode),
database$Method1))
# Alternatives Method1 Method2 Method3 Method4 Mode
# 1 3 1 1 1 1 1
# 2 4 10 8 10 9 10
# 3 5 7 6 7 6 7
# 4 6 8 7 8 7 8
# 5 7 9 10 9 10 9
# 6 8 6 9 6 8 6
# 7 9 5 4 4 5 5
# 8 10 3 2 2 3 3
# 9 11 4 5 3 2 4
# 10 12 2 3 5 4 2
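The same rule can also be written as one helper that returns the mode when it is unique and a fallback value otherwise. This is only a sketch: the function name mode_or_fallback and the variable methods are made up for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
# Data from the question, rebuilt so the example is self-contained
database <- data.frame(
  Alternatives = 3:12,
  Method1 = c(1, 10, 7, 8, 9, 6, 5, 3, 4, 2),
  Method2 = c(1, 8, 6, 7, 10, 9, 4, 2, 5, 3),
  Method3 = c(1, 10, 7, 8, 9, 6, 4, 2, 3, 5),
  Method4 = c(1, 9, 6, 7, 10, 8, 5, 3, 2, 4)
)

# Hypothetical helper: unique mode of x, or `fallback` when the top count is
# shared by several values (this also covers the "all values differ" case)
mode_or_fallback <- function(x, fallback) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) == 1) as.numeric(top) else as.numeric(fallback)
}

methods <- database[c("Method1", "Method2", "Method3", "Method4")]

# Apply the helper row by row, using Method1 as the fallback
database$Mode <- vapply(
  seq_len(nrow(database)),
  function(i) mode_or_fallback(unlist(methods[i, ]), database$Method1[i]),
  numeric(1)
)

database
```

Using a separate name avoids masking base::mode, and vapply guarantees a plain numeric vector even when every row falls back to Method1.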

Select first and last occurrences of a value in a given month

I have a dataset that records the group changes for a given ID in a given month.
In the example, in July, ID 5 changed from group 2 to group 1, then from group 1 to 2, and so on.
I need to get only the first and the last changes made for each ID/month.
ID groupTO groupFROM MONTH
5 2 1 6
5 1 2 7
5 2 1 7
5 3 2 7
5 1 3 7
5 2 1 8
5 1 2 8
5 2 1 8
6 1 2 6
6 3 1 6
6 2 1 7
6 3 2 8
6 1 3 8
In this case, I need the results to be:
ID groupTO groupFROM MONTH
5 2 1 6
5 1 2 7
5 1 3 7
5 2 1 8
5 2 1 8
6 1 2 6
6 3 1 6
6 2 1 7
6 3 2 8
6 1 3 8
If I remove the duplicates (ID/MONTH), I can get the first occurrence, but how do I get the last one?
Here's an easy way to do it with dplyr:
library(dplyr)
# Create data
dt <-
data.frame(Id = c(rep(5, 8), rep(6, 5)),
groupTO = c(2, 1, 2, 3, 1, 2, 1, 2, 1, 3, 2, 3, 1),
groupFROM = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 1, 1, 2, 3),
MONTH = c(6, 7, 7, 7, 7, 8, 8, 8, 6, 6, 7, 8, 8))
dt %>%
# Group by ID and month
group_by(Id, MONTH) %>%
# Get first and last row
slice(c(1, n())) %>%
# To remove cases where first is same as last
distinct()
# # A tibble: 9 x 4
# # Groups: Id, MONTH [6]
# Id groupTO groupFROM MONTH
# <dbl> <dbl> <dbl> <dbl>
# 5 2 1 6
# 5 1 2 7
# 5 1 3 7
# 5 2 1 8
# 6 1 2 6
# 6 3 1 6
# 6 2 1 7
# 6 3 2 8
# 6 1 3 8
Using data.table
library(data.table)
unique(setDT(df1)[, .SD[c(1, .N)], .(ID, MONTH)])
# ID MONTH groupTO groupFROM
#1: 5 6 2 1
#2: 5 7 1 2
#3: 5 7 1 3
#4: 5 8 2 1
#5: 6 6 1 2
#6: 6 6 3 1
#7: 6 7 2 1
#8: 6 8 3 2
#9: 6 8 1 3
data
df1 <- structure(list(ID = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 6L), groupTO = c(2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L,
3L, 2L, 3L, 1L), groupFROM = c(1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 3L), MONTH = c(6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
6L, 6L, 7L, 8L, 8L)), class = "data.frame", row.names = c(NA,
-13L))
Here is a base R solution using split (df is assumed to hold the same data as dt above, with an Id column)
dfout <- do.call(rbind,c(make.row.names = FALSE,
lapply(split(df,df[c("Id","MONTH")],lex.order = TRUE),
function(v) if (nrow(v)==1) v[1,] else v[c(1,nrow(v)),])))
such that
> dfout
Id groupTO groupFROM MONTH
1 5 2 1 6
2 5 1 2 7
3 5 1 3 7
4 5 2 1 8
5 5 2 1 8
6 6 1 2 6
7 6 3 1 6
8 6 2 1 7
9 6 3 2 8
10 6 1 3 8
A base R way using ave, where we select the first and last row for each ID and MONTH and keep the unique rows of the data frame (df here has an ID column, like df1 above).
unique(subset(df, ave(groupTO == 1, ID, MONTH, FUN = function(x)
seq_along(x) %in% c(1, length(x)))))
# ID groupTO groupFROM MONTH
#1 5 2 1 6
#2 5 1 2 7
#5 5 1 3 7
#6 5 2 1 8
#9 6 1 2 6
#10 6 3 1 6
#11 6 2 1 7
#12 6 3 2 8
#13 6 1 3 8
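Note that the answers that call distinct() or unique() return 9 rows, while the expected output in the question has 10: the row 5/2/1/8 appears twice there because the first and last changes of ID 5 in month 8 happen to be identical. If that repeated row should be kept, one possible base R sketch is to compare each row's position within its group to the group size. The names pos and size are illustrative.

```r
# Same data as dt in the dplyr answer
dt <- data.frame(
  Id        = c(rep(5, 8), rep(6, 5)),
  groupTO   = c(2, 1, 2, 3, 1, 2, 1, 2, 1, 3, 2, 3, 1),
  groupFROM = c(1, 2, 1, 2, 3, 1, 2, 1, 2, 1, 1, 2, 3),
  MONTH     = c(6, 7, 7, 7, 7, 8, 8, 8, 6, 6, 7, 8, 8)
)

# Position of each row within its Id/MONTH group, and the size of that group
pos  <- ave(seq_along(dt$Id), dt$Id, dt$MONTH, FUN = seq_along)
size <- ave(seq_along(dt$Id), dt$Id, dt$MONTH, FUN = length)

# Keep the first and last row of every group; a one-row group is kept once
result <- dt[pos == 1 | pos == size, ]
result
```

Selecting by position rather than by value means that two distinct rows with identical contents are both retained, while single-row groups are still not duplicated.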

Select i-th element if a condition occurs with for loop

I have a dataframe (df) like this:
Rif dd A A A A A B B B B B C C C C C
a1 10 5 8 10 2 6 9 6 5 7 9 1 5 6 4 5
b1 20 12 7 1 5 9 10 5 3 8 7 3 6 1 9 8
c1 100 11 6 8 1 14 1 11 9 3 6 10 8 13 8 4
d1 70 4 3 7 8 11 19 2 6 7 1 20 18 7 10 7
I have a vector
rif <- c(0, 15, 50, 90, 110)
I would like to add a column V1 to the df such that, if dd(i) >= rif(i-1) & dd(i) < rif(i), V1 takes the value from the corresponding A column:
Rif dd A A A A A B B B B B C C C C C V1
a1 10 5 8 10 2 6 9 6 5 7 9 1 5 6 4 5 8
b1 20 12 7 1 5 9 10 5 3 8 7 3 6 1 9 8 1
c1 100 11 6 8 1 14 1 11 9 3 6 10 8 13 8 4 14
d1 70 4 3 7 8 11 19 2 6 7 1 20 18 7 10 7 8
The same should be done for V2 and V3 with respect to Bs and Cs columns.
ref <- c(0, 15, 50, 90, 110)
for (i in 2:length(ref)) {
for (j in 1:nrow(df)) {
if (df$dd >= ref[i-1] && df$dd< ref[i]) {
df[,"V1"] <- df[j,i]
}
}
}
I get the following error:
Error in if (..) :
missing value where TRUE/FALSE needed
Probably the if statement is not used correctly.
Could you help me?
I think you just need to better specify the rows and columns:
df <- data.frame(
c("a1","b1","c1","d1")
, c(10,20,100,70), c(5,12,11,4), c(8,7,6,3), c(10,1,8,7), c(2,5,1,8), c(6,9,14,11)
, c(9,10,1,19), c(6,5,11,2), c(5,3,9,6), c(7,8,3,7), c(9,7,6,1)
, c(1,3,10,20), c(5,6,8,18), c(6,1,13,7), c(4,9,8,10), c(5,8,4,7)
)
colnames(df) <- c("Rif", "dd", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C")
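ref <- c(0, 15, 50, 90, 110)
# Create the result columns up front so element-wise assignment is well defined
df$V1 <- df$V2 <- df$V3 <- NA_real_
for (i in 2:length(ref)) {
for (j in 1:nrow(df)) {
# Row j falls in the interval [ref[i-1], ref[i])
if (df$dd[j] >= ref[i-1] && df$dd[j] < ref[i]) {
# Columns are addressed by position: A block, then B (+5), then C (+10)
df$V1[j] <- df[j,i+2]
df$V2[j] <- df[j,i+2+5]
df$V3[j] <- df[j,i+2+10]
}
}
}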
which gives:
Rif dd A A A A A B B B B B C C C C C V1 V2 V3
1 a1 10 5 8 10 2 6 9 6 5 7 9 1 5 6 4 5 8 6 5
2 b1 20 12 7 1 5 9 10 5 3 8 7 3 6 1 9 8 1 3 1
3 c1 100 11 6 8 1 14 1 11 9 3 6 10 8 13 8 4 14 6 4
4 d1 70 4 3 7 8 11 19 2 6 7 1 20 18 7 10 7 8 7 10
Another option in base R:
lters <- c(A="A", B="B", C="C")
firstcol <- lapply(lters, function(x) match(x, colnames(DF)))
idx <- findInterval(DF$dd, rif)
for (l in lters)
DF[, paste0("V_", l)] <- as.integer(DF[cbind(seq_len(nrow(DF)), idx + firstcol[[l]])])
DF
output:
Rif dd A A.1 A.2 A.3 A.4 B B.1 B.2 B.3 B.4 C C.1 C.2 C.3 C.4 V_A V_B V_C
1 a1 10 5 8 10 2 6 9 6 5 7 9 1 5 6 4 5 8 6 5
2 b1 20 12 7 1 5 9 10 5 3 8 7 3 6 1 9 8 1 3 1
3 c1 100 11 6 8 1 14 1 11 9 3 6 10 8 13 8 4 14 6 4
4 d1 70 4 3 7 8 11 19 2 6 7 1 20 18 7 10 7 8 7 10
data:
DF <- structure(list(Rif = c("a1", "b1", "c1", "d1"), dd = c(10L, 20L,
100L, 70L), A = c(5L, 12L, 11L, 4L), A = c(8L, 7L, 6L, 3L), A = c(10L,
1L, 8L, 7L), A = c(2L, 5L, 1L, 8L), A = c(6L, 9L, 14L, 11L),
B = c(9L, 10L, 1L, 19L), B = c(6L, 5L, 11L, 2L), B = c(5L,
3L, 9L, 6L), B = c(7L, 8L, 3L, 7L), B = c(9L, 7L, 6L, 1L),
C = c(1L, 3L, 10L, 20L), C = c(5L, 6L, 8L, 18L), C = c(6L,
1L, 13L, 7L), C = c(4L, 9L, 8L, 10L), C = c(5L, 8L, 4L, 7L
)), class = "data.frame", row.names = c(NA, -4L))
rif <- c(0, 15, 50, 90, 110)
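The core of the findInterval approach can be seen in isolation on the A columns alone. This is a reduced sketch, not the answer's code: the A values are placed in a plain numeric matrix (an assumption made to sidestep the duplicated column names), and the + 1 offset reflects that the first interval maps to the second A column in the expected output.

```r
rif <- c(0, 15, 50, 90, 110)
dd  <- c(10, 20, 100, 70)

# The five A columns from the example, one row per Rif (a1, b1, c1, d1)
A <- matrix(
  c( 5,  8, 10, 2,  6,
    12,  7,  1, 5,  9,
    11,  6,  8, 1, 14,
     4,  3,  7, 8, 11),
  nrow = 4, byrow = TRUE
)

# Index of the interval [rif[k], rif[k + 1]) that each dd value falls into
idx <- findInterval(dd, rif)

# Matrix indexing with a two-column (row, column) matrix picks one cell per row
V1 <- A[cbind(seq_along(dd), idx + 1)]
V1
```

findInterval is vectorised, so no explicit loop over rows or intervals is needed, and the two-column index matrix selects exactly one element from each row.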
Another way is to reorganize the data by separating the lookup values into another table and performing an update join using data.table:
library(data.table)
setDT(DF)
out <- DF[, .(rn=.I, Rif, dd)]
#reorganizing data
lc <- grepl("A|B|C", names(DF))
lutbl <- data.table(COL=names(DF)[lc], transpose(DF[, ..lc]))
lutbl <- melt(lutbl, measure.vars=patterns("V"), variable.name="rn")[,
c("rn", "rif") := .(as.integer(gsub("V", "", rn)), rep(rif, sum(lc)*nrow(DF)/length(rif)))]
#lookup and update
for (l in lters)
out[, paste0("NEW", l) := lutbl[COL==l][out, on=c("rn", "rif"="dd"), roll=-Inf, value]]
out:
rn Rif dd NEWA NEWB NEWC
1: 1 a1 10 8 6 5
2: 2 b1 20 1 3 1
3: 3 c1 100 14 6 4
4: 4 d1 70 8 7 10

How to keep sequence of numbers until certain row number reached

I have been trying to assign sequence numbers to rows. I would also like the sequence to repeat until a certain row number is reached, for example restarting the pattern after every 44th row.
Here is what I mean
test_table <- data.frame(col=rep(0:10,each=11), row=c(rev(0:10)))
and assigning cumulative numbers in this way
library(dplyr)
test_table <- test_table %>%
mutate(No=(row_number() - 1) %/% 11)
test_table
col row No
1 0 10 0
2 0 9 0
3 0 8 0
4 0 7 0
5 0 6 0
6 0 5 0
7 0 4 0
8 0 3 0
9 0 2 0
10 0 1 0
11 0 0 0
12 1 10 1
13 1 9 1
14 1 8 1
15 1 7 1
16 1 6 1
17 1 5 1
18 1 4 1
19 1 3 1
20 1 2 1
21 1 1 1
22 1 0 1
23 2 10 2
24 2 9 2
25 2 8 2
26 2 7 2
27 2 6 2
28 2 5 2
29 2 4 2
30 2 3 2
31 2 2 2
32 2 1 2
33 2 0 2
34 3 10 3
35 3 9 3
36 3 8 3
37 3 7 3
38 3 6 3
39 3 5 3
40 3 4 3
41 3 3 3
42 3 2 3
43 3 1 3
44 3 0 3
45 4 10 4
46 4 9 4
47 4 8 4
48 4 7 4
49 4 6 4
50 4 5 4
51 4 4 4
52 4 3 4
53 4 2 4
54 4 1 4
55 4 0 4
56 5 10 5
57 5 9 5
58 5 8 5
59 5 7 5
60 5 6 5
61 5 5 5
62 5 4 5
63 5 3 5
64 5 2 5
65 5 1 5
66 5 0 5
67 6 10 6
68 6 9 6
69 6 8 6
70 6 7 6
71 6 6 6
72 6 5 6
73 6 4 6
74 6 3 6
75 6 2 6
76 6 1 6
77 6 0 6
78 7 10 7
79 7 9 7
80 7 8 7
81 7 7 7
82 7 6 7
83 7 5 7
84 7 4 7
85 7 3 7
86 7 2 7
87 7 1 7
88 7 0 7
89 8 10 8
90 8 9 8
91 8 8 8
92 8 7 8
93 8 6 8
94 8 5 8
95 8 4 8
96 8 3 8
97 8 2 8
98 8 1 8
99 8 0 8
100 9 10 9
101 9 9 9
102 9 8 9
103 9 7 9
104 9 6 9
105 9 5 9
106 9 4 9
107 9 3 9
108 9 2 9
109 9 1 9
110 9 0 9
111 10 10 10
112 10 9 10
113 10 8 10
114 10 7 10
115 10 6 10
116 10 5 10
117 10 4 10
118 10 3 10
119 10 2 10
120 10 1 10
121 10 0 10
OK, good! But I would like to keep the sequence, for example 0 and 1, until the 44th row is reached. After that, start a new sequence from 2 and continue to the 88th row, like this.
So the expected output will be
test_table
col row No
1 0 10 0
2 0 9 0
3 0 8 0
4 0 7 0
5 0 6 0
6 0 5 0
7 0 4 0
8 0 3 0
9 0 2 0
10 0 1 0
11 0 0 0
12 1 10 1
13 1 9 1
14 1 8 1
15 1 7 1
16 1 6 1
17 1 5 1
18 1 4 1
19 1 3 1
20 1 2 1
21 1 1 1
22 1 0 1
23 2 10 0
24 2 9 0
25 2 8 0
26 2 7 0
27 2 6 0
28 2 5 0
29 2 4 0
30 2 3 0
31 2 2 0
32 2 1 0
33 2 0 0
34 3 10 1
35 3 9 1
36 3 8 1
37 3 7 1
38 3 6 1
39 3 5 1
40 3 4 1
41 3 3 1
42 3 2 1
43 3 1 1
44 3 0 1
45 4 10 2
46 4 9 2
47 4 8 2
48 4 7 2
49 4 6 2
50 4 5 2
51 4 4 2
52 4 3 2
53 4 2 2
54 4 1 2
55 4 0 2
56 5 10 3
57 5 9 3
58 5 8 3
59 5 7 3
60 5 6 3
61 5 5 3
62 5 4 3
63 5 3 3
64 5 2 3
65 5 1 3
66 5 0 3
67 6 10 2
68 6 9 2
69 6 8 2
70 6 7 2
71 6 6 2
72 6 5 2
73 6 4 2
74 6 3 2
75 6 2 2
76 6 1 2
77 6 0 2
78 7 10 3
79 7 9 3
80 7 8 3
81 7 7 3
82 7 6 3
83 7 5 3
84 7 4 3
85 7 3 3
86 7 2 3
87 7 1 3
88 7 0 3
89 8 10 4
90 8 9 4
91 8 8 4
92 8 7 4
93 8 6 4
94 8 5 4
95 8 4 4
96 8 3 4
97 8 2 4
98 8 1 4
99 8 0 4
100 9 10 5
101 9 9 5
102 9 8 5
103 9 7 5
104 9 6 5
105 9 5 5
106 9 4 5
107 9 3 5
108 9 2 5
109 9 1 5
110 9 0 5
111 10 10 4
112 10 9 4
113 10 8 4
114 10 7 4
115 10 6 4
116 10 5 4
117 10 4 4
118 10 3 4
119 10 2 4
120 10 1 4
121 10 0 4
How can we do that ?
Thanks in advance!
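This would do it in a more general way
N = 11L # length of each sequence in the first column (assumed value; N is not defined in the original snippet)
num.seq = 11L # total number of sequences in the first column
num.rows = N * num.seq # total number of rows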
seq.length.3 = 44 # length of the pattern in the 3rd column
# number of patterns in the 3rd column
num.seq.3 = ( num.rows - 1 ) %/% seq.length.3 +1
# starting number in the sequence of the 3rd column
nseq=0
# vector for the 3rd column (could be done right in data frame def.)
No = (rep(rep( nseq:(nseq+1), each = N, times= 2), times=num.seq.3) +
rep(0:(num.seq.3 -1)*2, each= seq.length.3)) [1:num.rows]
test_table <- data.frame(col=rep(0:10,each=11),
row=c(rev(0:10)),
No=No)
An alternative way:
library(dplyr)
dt2 <- test_table%>%
mutate(No = (row_number() - 1) %/% 11)
dt2$No <- dt2$No %% 2 + (rep(0:num.seq.3, each =44, times=num.seq.3 )*2)[1:num.rows]
The arithmetic, which is totally dependent on row numbers, seems right this way.
test_table%>%
mutate(No=((row_number() - 1) %/% 11) %% 2) %>% # alternating 11 rows of 0's and 1's
mutate(No = No + ((row_number() - 1) %/% 44) * 2) # add 2 after every 44 rows
Here is the result, as intended.
structure(list(col = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L), row = c(10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L,
10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L,
6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L,
2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L,
5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L,
1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L,
8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 10L, 9L, 8L, 7L, 6L, 5L,
4L, 3L, 2L, 1L, 0L), No = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4)), class = "data.frame", .Names = c("col", "row",
"No"), row.names = c(NA, -121L))
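The row-number arithmetic used above does not depend on dplyr. A minimal base R sketch, with the illustrative parameter names n, block and cycle standing in for 121, 11 and 44:

```r
n     <- 121  # total number of rows
block <- 11   # rows that share one value
cycle <- 44   # rows after which the pair of values moves up by 2

i <- seq_len(n) - 1  # zero-based row index

# (i %/% block) %% 2 alternates 0, 1 in runs of `block` rows;
# 2 * (i %/% cycle) adds 2 for every completed cycle of `cycle` rows
No <- (i %/% block) %% 2 + 2 * (i %/% cycle)

# First value of each block of 11 rows
No[seq(1, n, by = block)]
```

The first value of each block reads 0, 1, 0, 1, 2, 3, 2, 3, 4, 5, 4, which matches the expected output in the question, including the final partial cycle.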
This would deliver what I understand to be the requested vector (except I think your sequencing "skipped a beat"):
c( rep( c(1,2,1,2), each=11) , rep(c(3,4,3,4), each=11), rep(c(5,6,5,6), each=11) )
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3
[48] 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5
[95] 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6
A more general way:
c( sapply( seq(1, 6, by=2), function(start) {
rep( rep(start:(start+1) , 2), each=11) }))
The outer c() drops the matrix structure that sapply returns by default.
