Related
I have a matrix like below that is a hyper graph matrix, I transformed it to the object network , but I dunno how can I transform this matrix in a hypergraph of an object of class network, can you help me? any idea?
mat<-as.matrix(data)
g<- as.network.matrix(mat)
g
E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14
EVELYN 1 1 1 1 1 1 0 1 1 0 0 0 0 0
LAURA 1 1 1 0 1 1 1 1 0 0 0 0 0 0
THERESA 0 1 1 1 1 1 1 1 1 0 0 0 0 0
BRENDA 1 0 1 1 1 1 1 1 0 0 0 0 0 0
CHARLOTTE 0 0 1 1 1 0 1 0 0 0 0 0 0 0
FRANCES 0 0 1 0 1 1 0 1 0 0 0 0 0 0
ELEANOR 0 0 0 0 1 1 1 1 0 0 0 0 0 0
PEARL 0 0 0 0 0 1 0 1 1 0 0 0 0 0
RUTH 0 0 0 0 1 0 1 1 1 0 0 0 0 0
VERNE 0 0 0 0 0 0 1 1 1 0 0 1 0 0
MYRA 0 0 0 0 0 0 0 1 1 1 0 1 0 0
KATHERINE 0 0 0 0 0 0 0 1 1 1 0 1 1 1
SYLVIA 0 0 0 0 0 0 1 1 1 1 0 1 1 1
NORA 0 0 0 0 0 1 1 0 1 1 1 1 1 1
HELEN 0 0 0 0 0 0 1 1 0 1 1 1 0 0
DOROTHY 0 0 0 0 0 0 0 1 1 0 0 0 0 0
OLIVIA 0 0 0 0 0 0 0 0 1 0 1 0 0 0
FLORA 0 0 0 0 0 0 0 0 1 0 1 0 0 0
I guess mat is a incidence matrix, and I am not sure if you are looking for something like below if you are using network package
as.matrix(network(t(mat)),matrix.type = "incidence")
Besides, the incidence matrix visualization via igraph can be achieved from the following:
g <- igraph::graph_from_incidence_matrix(mat)
then
plot(g)
gives
DATA
mat <- structure(c(1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L), .Dim = c(18L,
14L), .Dimnames = list(c("EVELYN", "LAURA", "THERESA", "BRENDA",
"CHARLOTTE", "FRANCES", "ELEANOR", "PEARL", "RUTH", "VERNE",
"MYRA", "KATHERINE", "SYLVIA", "NORA", "HELEN", "DOROTHY", "OLIVIA",
"FLORA"), c("E1", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9",
"E10", "E11", "E12", "E13", "E14")))
I have a set of variables in the dataset -- I want to simply calculate the running total (and the running mean) for all these variables, based on all prior years.
To illustrate. This is how my data looks like, including the total run variable that I want to generate.
country year X1 X2 X3 X4 X5 running_total
Bahamas 1990 0 0 0 0 1 NA
Bahamas 1991 0 0 1 1 0 1
Bahamas 1992 1 1 0 0 1 3
Bahamas 1993 0 0 0 0 0 6
Bahamas 1994 1 1 0 1 1 6
Bahamas 1995 0 0 1 0 0 10
Bahamas 1996 0 1 0 1 0 11
Bahamas 1997 1 0 1 0 1 13
Bahamas 1998 0 1 0 1 0 16
Bahamas 1999 1 0 1 0 1 18
Bahamas 2000 0 1 0 1 0 21
Bahamas 2001 1 0 1 0 1 23
Bahamas 2002 0 1 0 1 0 26
Bahamas 2003 1 0 0 0 1 28
Bahamas 2004 0 0 0 1 0 30
Bahamas 2005 1 1 0 0 0 31
Bahamas 2006 0 0 1 1 1 33
Bahamas 2007 1 0 0 0 0 36
Bahamas 2008 0 0 1 1 1 37
Bahamas 2009 1 1 0 0 0 40
Bahamas 2010 0 0 1 1 1 42
Bahamas 2011 1 1 0 0 0 45
Bolivia 1990 0 0 0 0 0 NA
Bolivia 1991 0 0 1 1 0 0
Bolivia 1992 0 0 0 0 0 2
Bolivia 1993 0 0 1 0 0 2
Bolivia 1994 0 0 0 0 0 3
Bolivia 1995 0 0 0 0 0 3
Bolivia 1996 0 0 0 0 0 3
Bolivia 1997 0 0 0 0 0 3
Bolivia 1998 0 0 0 0 0 3
Bolivia 1999 0 0 0 0 0 3
Bolivia 2000 0 1 0 1 0 3
Bolivia 2001 0 0 0 0 0 5
Bolivia 2002 0 0 0 0 0 5
Bolivia 2003 0 0 0 0 0 5
Bolivia 2004 0 0 0 0 0 5
Bolivia 2005 0 0 0 0 0 5
Bolivia 2006 0 0 0 0 0 5
Bolivia 2007 0 0 0 0 0 5
Bolivia 2008 0 0 0 0 1 5
Bolivia 2009 0 0 0 0 0 6
Bolivia 2010 0 0 0 0 1 6
Bolivia 2011 0 0 0 0 0 7
Starting year 1990 ==NA. For example, running total for 1991 is based on 1990. Running total for 1992 is based on 1990-1991. running total for 1993 is based on 1990-1992- running total for 1994 is based on 1990-1993. And so on...until 2011. Then it starts the same procedur for new country B.
I tried the following code below but it doesn't work the way I want. Surely, I need to specify it better, but how?
DF$csum <- ave(DF$X1, DF$X2,DF$X3,DF$X4,DF$X5,FUN=cumsum)
In addition, I would like to generate running mean based on the same logic.
Any help here would be much appreciated!
structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Bahamas", "Bolivia"), class = "factor"),
year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L,
1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L,
1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L,
2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L), X1 = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), X2 = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), X3 = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), X4 = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), X5 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 1L, 0L), running_total = c(NA, 1L, 3L,
6L, 6L, 10L, 11L, 13L, 16L, 18L, 21L, 23L, 26L, 28L, 30L,
31L, 33L, 36L, 37L, 40L, 42L, 45L, NA, 0L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
7L)), .Names = c("country", "year", "X1", "X2", "X3", "X4", "X5", "running_total"), class = "data.frame", row.names = c(NA,
-44L))
library(data.table)
setDT(df)
df[, xt := X1+X2+X3+X4+X5]
df[, rt2 := shift(cumsum(xt)), by = country]
Actually it can be solved with an one-liner:
df[, rt3 := {xt=X1+X2+X3+X4+X5; shift(cumsum(xt))}, by = country]
# Or as Ryan points out:
df[, rt2 := shift(cumsum(Reduce(`+`, .SD))) , by = country , .SDcols = grep('^X.*', names(df), value = T)]
All resulting in:
country year X1 X2 X3 X4 X5 running_total xt rt2
1: Bahamas 1990 0 0 0 0 1 NA 1 NA
2: Bahamas 1991 0 0 1 1 0 1 2 1
3: Bahamas 1992 1 1 0 0 1 3 3 3
4: Bahamas 1993 0 0 0 0 0 6 0 6
5: Bahamas 1994 1 1 0 1 1 6 4 6
6: Bahamas 1995 0 0 1 0 0 10 1 10
7: Bahamas 1996 0 1 0 1 0 11 2 11
8: Bahamas 1997 1 0 1 0 1 13 3 13
9: Bahamas 1998 0 1 0 1 0 16 2 16
10: Bahamas 1999 1 0 1 0 1 18 3 18
11: Bahamas 2000 0 1 0 1 0 21 2 21
12: Bahamas 2001 1 0 1 0 1 23 3 23
13: Bahamas 2002 0 1 0 1 0 26 2 26
14: Bahamas 2003 1 0 0 0 1 28 2 28
15: Bahamas 2004 0 0 0 1 0 30 1 30
16: Bahamas 2005 1 1 0 0 0 31 2 31
17: Bahamas 2006 0 0 1 1 1 33 3 33
18: Bahamas 2007 1 0 0 0 0 36 1 36
19: Bahamas 2008 0 0 1 1 1 37 3 37
20: Bahamas 2009 1 1 0 0 0 40 2 40
21: Bahamas 2010 0 0 1 1 1 42 3 42
22: Bahamas 2011 1 1 0 0 0 45 2 45
23: Bolivia 1990 0 0 0 0 0 NA 0 NA
24: Bolivia 1991 0 0 1 1 0 0 2 0
25: Bolivia 1992 0 0 0 0 0 2 0 2
26: Bolivia 1993 0 0 1 0 0 2 1 2
27: Bolivia 1994 0 0 0 0 0 3 0 3
28: Bolivia 1995 0 0 0 0 0 3 0 3
29: Bolivia 1996 0 0 0 0 0 3 0 3
30: Bolivia 1997 0 0 0 0 0 3 0 3
31: Bolivia 1998 0 0 0 0 0 3 0 3
32: Bolivia 1999 0 0 0 0 0 3 0 3
33: Bolivia 2000 0 1 0 1 0 3 2 3
34: Bolivia 2001 0 0 0 0 0 5 0 5
35: Bolivia 2002 0 0 0 0 0 5 0 5
36: Bolivia 2003 0 0 0 0 0 5 0 5
37: Bolivia 2004 0 0 0 0 0 5 0 5
38: Bolivia 2005 0 0 0 0 0 5 0 5
39: Bolivia 2006 0 0 0 0 0 5 0 5
40: Bolivia 2007 0 0 0 0 0 5 0 5
41: Bolivia 2008 0 0 0 0 1 5 1 5
42: Bolivia 2009 0 0 0 0 0 6 0 6
43: Bolivia 2010 0 0 0 0 1 6 1 6
44: Bolivia 2011 0 0 0 0 0 7 0 7
country year X1 X2 X3 X4 X5 running_total xt rt2
df = structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bahamas", "Bolivia"), class = "factor"), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L), X1 = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X2 = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X4 = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X5 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), running_total = c(NA, 1L, 3L, 6L, 6L, 10L, 11L, 13L, 16L, 18L, 21L, 23L, 26L, 28L, 30L, 31L, 33L, 36L, 37L, 40L, 42L, 45L, NA, 0L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 7L)), .Names = c("country", "year", "X1", "X2", "X3", "X4", "X5", "running_total"), class = "data.frame", row.names = c(NA, -44L))
df <- df %>% mutate(sums = X1 + X2 + X3 +X4 + X5) %>%
group_by(country) %>% mutate(sum_shift = shift(sums),
sum_shift = ifelse(is.na(sum_shift), 0, sum_shift),
running_total = cumsum(sum_shift))
head(df)
country year X1 X2 X3 X4 X5 running_total sums sum_shift
1: Bahamas 1990 0 0 0 0 1 0 1 0
2: Bahamas 1991 0 0 1 1 0 1 2 1
3: Bahamas 1992 1 1 0 0 1 3 3 2
4: Bahamas 1993 0 0 0 0 0 6 0 3
5: Bahamas 1994 1 1 0 1 1 6 4 0
6: Bahamas 1995 0 0 1 0 0 10 1 4
This is the dplyr solution but it is basically the same as the data table solution. We create a column where we sum across the rows. Then we group by the country and and sum across and create a cumulative sum. We have to set the nas to 0 for the cumulative sums to work.
A solution using dplyr and purrr. We can split the data frame by country, create the running_total column, and then combine the data frames. Notice that this solution does not need to specify individual column names, such as X1 and X2. dat2 is the final output.
library(dplyr)
library(purrr)
dat2 <- dat %>%
split(.$country) %>%
map_dfr(~mutate(.x,
running_total =
as.integer(lag(cumsum(rowSums(select(.x, starts_with("X"))))))))
To calculate the running mean, we can follow the same logic by adding the command to the mutate function. Notice that the cummean function is from the dplyr package.
dat2 <- dat %>%
split(.$country) %>%
map_dfr(~mutate(.x,
running_total =
as.integer(lag(cumsum(rowSums(select(.x, starts_with("X")))))),
running_mean =
lag(cummean(rowSums(select(.x, starts_with("X")))))))
DATA
dat <- structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bahamas", "Bolivia"), class = "factor"), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L), X1 = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X2 = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X4 = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X5 = c(1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), running_total = c(NA, 1L, 3L, 6L, 6L, 10L, 11L, 13L, 16L, 18L, 21L, 23L, 26L, 28L, 30L, 31L, 33L, 36L, 37L, 40L, 42L, 45L, NA, 0L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 7L)), .Names = c("country", "year", "X1", "X2", "X3", "X4", "X5", "running_total"), class = "data.frame", row.names = c(NA, -44L))
dat$running_total <- NULL
I have a dataset with over a million rows and 67 columns. I create a new column that records scores according to my code below.
I am stuck at a condition I need to take care of in:
DF$change[DF[,63] == "M"] <- DF$change[DF[,63] == "M"] - pos2
When DF$change[DF[,63] == "M"] - pos2 = 0 I need to see if this happend on the bidside or the askside. The way I can determine that is by seeing if the position of DF[,64] is even(askside) or odd(bidside). There's a caveat, I can't use the value from pos2 , that I have already calculated, to do this because of my function called mywhich(I can provide code if needed) used in the code.
So to determine even/odd I have to recalculate the position of DF[,64]. Once I know even or odd, DF$change should be either -1 or 1 depending on whether DF[,66] > DF[,64] or <.
Now, I've tried subsetting but I don't see how that can work because I have recalculate the positions. I tried not using mywhich for this part but I cant seem to get my head around it to make it work.
Any pointers/suggestions? What else I should I try? Should I write a separate function that handles this? Write another version of a which function? I am a little lost
This is what I have so far:
> DF$change <- apply(DF[, 1:62] == DF[,64], 1, mywhich)
> DF$change[DF[,63] == "C"] <- apply(DF[which(DF[,63] == "C") - 1, 1:62] == DF[DF[,63] == "C",64], 1, mywhich)*(-1)
> pos2 <- apply(DF[which(DF[,63] == "M") - 1, 1:62] == DF[DF[,63] == "M",66], 1, mywhich)
> DF$change[DF[,63] == "M"] <- DF$change[DF[,63] == "M"] - pos2
This is the output:
> head(DF, 20)
DateTime Seq BP1 BQ1 BO1 AP1 AQ1 AO1 BP2 BQ2 BO2 AP2 AQ2 AO2 BP3 BQ3 BO3 AP3 AQ3 AO3 BP4 BQ4 BO4 AP4 AQ4 AO4 BP5 BQ5 BO5 AP5 AQ5 AO5 BP6 BQ6 BO6 AP6 AQ6 AO6 BP7 BQ7 BO7 AP7 AQ7 AO7 BP8 BQ8 BO8 AP8 AQ8 AO8 BP9 BQ9 BO9 AP9
1 2015-11-30 09:15:00.368 92 80830 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2015-11-30 09:15:00.368 108 80830 1 1 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 2015-11-30 09:15:00.375 406 81100 1 1 83435 1 1 80830 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 2015-11-30 09:15:00.375 479 81100 1 1 82165 1 1 80830 1 1 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 2015-11-30 09:15:00.377 643 81100 1 1 82165 1 1 80830 1 1 83200 1 1 0 0 0 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 2015-11-30 09:15:00.378 722 81100 1 1 82165 1 1 80830 1 1 82650 1 1 0 0 0 83200 1 1 0 0 0 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 2015-11-30 09:15:00.380 811 81100 1 1 82165 1 1 80830 1 1 82650 1 1 0 0 0 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 2015-11-30 09:15:00.380 822 81100 1 1 82165 1 1 80835 1 1 82650 1 1 80830 1 1 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 2015-11-30 09:15:00.380 828 81100 1 1 82345 1 1 80835 1 1 82650 1 1 80830 1 1 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 2015-11-30 09:15:00.383 1046 81100 1 1 82345 1 1 80835 1 1 82650 1 1 80830 1 1 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 2015-11-30 09:15:00.384 1103 81100 1 1 82165 1 1 80835 1 1 82650 1 1 80830 1 1 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 2015-11-30 09:15:00.384 1171 81100 1 1 82345 1 1 80835 1 1 82650 1 1 80830 1 1 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 2015-11-30 09:15:00.384 1186 81100 1 1 82345 1 1 80835 1 1 82650 1 1 80830 1 1 82900 1 1 0 0 0 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
14 2015-11-30 09:15:00.384 1196 81100 1 1 82165 1 1 80835 1 1 82650 1 1 80830 1 1 82900 1 1 0 0 0 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
15 2015-11-30 09:15:00.385 1238 81100 1 1 82340 1 1 80835 1 1 82650 1 1 80830 1 1 82900 1 1 0 0 0 83200 1 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
16 2015-11-30 09:15:00.385 1249 81100 1 1 82340 1 1 80835 1 1 82650 1 1 80830 1 1 82900 1 1 0 0 0 83200 2 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
17 2015-11-30 09:15:00.385 1254 81200 1 1 82340 1 1 81100 1 1 82650 1 1 80835 1 1 82900 1 1 80830 1 1 83200 2 1 0 0 0 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
18 2015-11-30 09:15:00.387 1273 81200 1 1 82340 1 1 81100 1 1 82650 1 1 80835 1 1 82900 1 1 80830 1 1 83200 2 1 80035 1 1 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
19 2015-11-30 09:15:00.388 1333 81200 1 1 82165 1 1 81100 1 1 82650 1 1 80835 1 1 82900 1 1 80830 1 1 83200 2 1 80035 1 1 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
20 2015-11-30 09:15:00.388 1343 81200 1 1 82340 1 1 81100 1 1 82650 1 1 80835 1 1 82900 1 1 80830 1 1 83200 2 1 80035 1 1 83430 1 1 0 0 0 83435 1 1 0 0 0 83500 1 1 0 0 0 0 0 0 0 0 0 0
AQ9 AO9 BP10 BQ10 BO10 AP10 AQ10 AO10 C Price Qty OldPrice OldQty change
1 0 0 0 0 0 0 0 0 N 80830 1 NA NA 5
2 0 0 0 0 0 0 0 0 N 83435 1 NA NA 5
3 0 0 0 0 0 0 0 0 N 81100 1 NA NA 5
4 0 0 0 0 0 0 0 0 N 82165 1 NA NA 5
5 0 0 0 0 0 0 0 0 N 83200 1 NA NA 4
6 0 0 0 0 0 0 0 0 N 82650 1 NA NA 4
7 0 0 0 0 0 0 0 0 N 83430 1 NA NA 2
8 0 0 0 0 0 0 0 0 N 80835 1 NA NA 4
9 0 0 0 0 0 0 0 0 M 82345 1 82165 1 0
10 0 0 0 0 0 0 0 0 N 83500 1 NA NA 0
11 0 0 0 0 0 0 0 0 M 82165 1 82345 1 0
12 0 0 0 0 0 0 0 0 M 82345 1 82165 1 0
13 0 0 0 0 0 0 0 0 N 82900 1 NA NA 3
14 0 0 0 0 0 0 0 0 M 82165 1 82345 1 0
15 0 0 0 0 0 0 0 0 M 82340 1 82165 1 0
16 0 0 0 0 0 0 0 0 N 83200 1 NA NA 2
17 0 0 0 0 0 0 0 0 N 81200 1 NA NA 5
18 0 0 0 0 0 0 0 0 N 80035 1 NA NA 1
19 0 0 0 0 0 0 0 0 M 82165 1 82340 1 0
20 0 0 0 0 0 0 0 0 M 82340 1 82165 1 0
> dput(DF[1:20,])
structure(list(DateTime = structure(c(1448855100.369, 1448855100.369,
1448855100.375, 1448855100.376, 1448855100.378, 1448855100.379,
1448855100.38, 1448855100.38, 1448855100.38, 1448855100.383,
1448855100.384, 1448855100.385, 1448855100.385, 1448855100.385,
1448855100.386, 1448855100.386, 1448855100.386, 1448855100.387,
1448855100.389, 1448855100.389), class = c("POSIXct", "POSIXt"
), tzone = ""), Seq = c(92L, 108L, 406L, 479L, 643L, 722L, 811L,
822L, 828L, 1046L, 1103L, 1171L, 1186L, 1196L, 1238L, 1249L,
1254L, 1273L, 1333L, 1343L), BP1 = c(80830L, 80830L, 81100L,
81100L, 81100L, 81100L, 81100L, 81100L, 81100L, 81100L, 81100L,
81100L, 81100L, 81100L, 81100L, 81100L, 81200L, 81200L, 81200L,
81200L), BQ1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), BO1 = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), AP1 = c(0L, 83435L, 83435L, 82165L, 82165L, 82165L, 82165L,
82165L, 82345L, 82345L, 82165L, 82345L, 82345L, 82165L, 82340L,
82340L, 82340L, 82340L, 82165L, 82340L), AQ1 = c(0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), AO1 = c(0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), BP2 = c(0L, 0L, 80830L, 80830L,
80830L, 80830L, 80830L, 80835L, 80835L, 80835L, 80835L, 80835L,
80835L, 80835L, 80835L, 80835L, 81100L, 81100L, 81100L, 81100L
), BQ2 = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), BO2 = c(0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AP2 = c(0L,
0L, 0L, 83435L, 83200L, 82650L, 82650L, 82650L, 82650L, 82650L,
82650L, 82650L, 82650L, 82650L, 82650L, 82650L, 82650L, 82650L,
82650L, 82650L), AQ2 = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AO2 = c(0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), BP3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 80830L, 80830L,
80830L, 80830L, 80830L, 80830L, 80830L, 80830L, 80830L, 80835L,
80835L, 80835L, 80835L), BQ3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), BO3 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), AP3 = c(0L, 0L, 0L, 0L, 83435L, 83200L, 83200L,
83200L, 83200L, 83200L, 83200L, 83200L, 82900L, 82900L, 82900L,
82900L, 82900L, 82900L, 82900L, 82900L), AQ3 = c(0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), AO3 = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), BP4 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 80830L, 80830L, 80830L,
80830L), BQ4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L), BO4 = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L
), AP4 = c(0L, 0L, 0L, 0L, 0L, 83435L, 83430L, 83430L, 83430L,
83430L, 83430L, 83430L, 83200L, 83200L, 83200L, 83200L, 83200L,
83200L, 83200L, 83200L), AQ4 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), AO4 = c(0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), BP5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 80035L, 80035L, 80035L), BQ5 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L), BO5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), AP5 = c(0L, 0L, 0L,
0L, 0L, 0L, 83435L, 83435L, 83435L, 83435L, 83435L, 83435L, 83430L,
83430L, 83430L, 83430L, 83430L, 83430L, 83430L, 83430L), AQ5 = c(0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), AO5 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), BP6 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), BQ6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BO6 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
AP6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 83500L, 83500L,
83500L, 83435L, 83435L, 83435L, 83435L, 83435L, 83435L, 83435L,
83435L), AQ6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AO6 = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), BP7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BQ7 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BO7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AP7 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 83500L, 83500L,
83500L, 83500L, 83500L, 83500L, 83500L, 83500L), AQ7 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), AO7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), BP8 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BQ8 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BO8 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), AP8 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AQ8 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), AO8 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BP9 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BQ9 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BO9 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), AP9 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AQ9 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), AO9 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BP10 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BQ10 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BO10 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), AP10 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AQ10 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), AO10 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), C = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 4L, 3L, 3L, 4L,
4L, 4L, 3L, 3L), .Label = c("", "C", "M", "N"), class = "factor"),
Price = c(80830L, 83435L, 81100L, 82165L, 83200L, 82650L,
83430L, 80835L, 82345L, 83500L, 82165L, 82345L, 82900L, 82165L,
82340L, 83200L, 81200L, 80035L, 82165L, 82340L), Qty = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), OldPrice = c(NA, NA, NA, NA, NA, NA, NA,
NA, 82165L, NA, 82345L, 82165L, NA, 82345L, 82165L, NA, NA,
NA, 82340L, 82165L), OldQty = c(NA, NA, NA, NA, NA, NA, NA,
NA, 1L, NA, 1L, 1L, NA, 1L, 1L, NA, NA, NA, 1L, 1L), change = c(5,
5, 5, 5, 4, 4, 2, 4, 0, 0, 0, 0, 3, 0, 0, 2, 5, 1, 0, 0)), .Names = c("DateTime",
"Seq", "BP1", "BQ1", "BO1", "AP1", "AQ1", "AO1", "BP2", "BQ2",
"BO2", "AP2", "AQ2", "AO2", "BP3", "BQ3", "BO3", "AP3", "AQ3",
"AO3", "BP4", "BQ4", "BO4", "AP4", "AQ4", "AO4", "BP5", "BQ5",
"BO5", "AP5", "AQ5", "AO5", "BP6", "BQ6", "BO6", "AP6", "AQ6",
"AO6", "BP7", "BQ7", "BO7", "AP7", "AQ7", "AO7", "BP8", "BQ8",
"BO8", "AP8", "AQ8", "AO8", "BP9", "BQ9", "BO9", "AP9", "AQ9",
"AO9", "BP10", "BQ10", "BO10", "AP10", "AQ10", "AO10", "C", "Price",
"Qty", "OldPrice", "OldQty", "change"), row.names = c(NA, 20L
), class = "data.frame")
Scroll down to find dput(DF)
I finally came to the bottom of this, I figured I could create as many new column I wanted in order to calculate what I had to so that's what I did. I think this is the best way to do it but am open to suggestions if there are other ways or even faster.
Here is my code:
DF<- read.csv(file = file,header = FALSE,sep = "", col.names = c("DateTime","Seq","BP1","BQ1","BO1","AP1","AQ1","AO1","BP2","BQ2","BO2","AP2","AQ2","AO2","BP3","BQ3","BO3","AP3","AQ3","AO3","BP4","BQ4","BO4","AP4","AQ4","AO4","BP5","BQ5","BO5","AP5","AQ5","AO5","BP6","BQ6","BO6","AP6","AQ6","AO6","BP7","BQ7","BO7","AP7","AQ7","AO7","BP8","BQ8","BO8","AP8","AQ8","AO8","BP9","BQ9","BO9","AP9","AQ9","AO9","BP10","BQ10","BO10","AP10","AQ10","AO10","C","Price","Qty","OldPrice","OldQty"))
DF<- DF[which(DF$DateTime != 0),]
options(digits.secs = 3)
DF$DateTime= as.POSIXct(DF$DateTime/(10^9), origin="1970-01-01", tz = "GMT") #timestamp conversion
source('~/R/mywhich.R')
source('~/nwhich.R')
#matching with same line for all
DF$change <- apply(DF[, 1:62] == DF[,64], 1, mywhich)
#matching "C" with previous line
DF$change[DF[,63] == "C"] <- apply(DF[which(DF[,63] == "C") - 1, 1:62] == DF[DF[,63] == "C",64], 1, mywhich)*(-1)
#matching old price with previous line in "M"
pos2 <- apply(DF[which(DF[,63] == "M") - 1, 1:62] == DF[DF[,63] == "M",66], 1, mywhich)
#subracting the two position in "M"
DF$change[DF[,63] == "M"] <- DF$change[DF[,63] == "M"] - pos2
# arbitrary number to create side
DF$side <- 1000
DF$side[DF[,63] == "M" & DF[,68] == 0] <- apply(DF[which(DF[,63] == "M"), 1:62] == DF[DF[,63] == "M",64], 1, nwhich)%%2 #check this -- erroneous for modifications that have happend outside level 5 -- might have to add another column for this
#DF$side[DF[,63] == "M" & DF[,68] != 0] <- DF$change[DF[,63] == "M" & DF[,68] != 0]
#DF$side[DF[,63] == "N"] <- DF$change[DF[,63] == "N"]
#DF$side[DF[,63] == "C"] <- DF$change[DF[,63] == "C"]
DF$diff <- 0
#price difference
DF$diff[DF[,63] == "M" & DF[,68] == 0] <- DF$OldPrice[DF[,63] == "M" & DF[,68] == 0] - DF$Price[DF[,63] == "M" & DF[,68] == 0]
#askside -- price increase
DF$modify[DF[,69] == 0 & DF[,70] > 0] <- -1
#askside -- price decrease
DF$modify[DF[,69] == 0 & DF[,70] < 0] <- 1
#bidside -- price decrease
DF$modify[DF[,69] == 1 & DF[,70] < 0] <- -1
#bidside -- price increase
DF$modify[DF[,69] == 1 & DF[,70] > 0] <- 1
#copying change to modify
DF$modify[DF[,63] == "N"] <- DF$change[DF[,63] == "N"]
DF$modify[DF[,63] == "C"] <- DF$change[DF[,63] == "C"]
DF$modify[DF[,63] == "M" & DF[,68] != 0] <- DF$change[DF[,63] == "M" & DF[,68] != 0]
df = data.frame(Time=DF$DateTime, Modify = DF$modify)
finalxts <- as.xts(x = df$Modify, order.by = df$Time)
#finalxts <- aggregatets(finalxts, FUN = "sum", on = "minutes", k = 1, dropna = TRUE)
finalxts
Thank you for the help everyone.
This is a question about the fourthcorner algorithm in R. It's designed to measure the relationship between three different tables: an n x m table (table R) of m environmental variables (columns) at n sites (rows), an n x p table (table L) of p abundances (columns) at n sites (rows), and a p x s table (table Q) of s traits (columns) for p species (rows).
The fourthcorner function is in the package ade4.
All three of my dataframes are binary (0s and 1s denoting the presence or absence of a variable, a species at a site, or a trait, respectively). I've tried using "yes" and "no" instead of 0s and 1s without success.
Here are some example matrices in the format I'm using:
tabQ
Trait1 Trait2 Trait3 Trait4
Sp1 0 1 0 0
Sp2 0 1 0 0
Sp3 1 0 1 0
Sp4 1 0 1 0
Sp5 0 1 0 0
Sp6 0 1 0 0
Sp7 0 0 0 1
Sp8 0 0 0 1
tabR
EnV1 EnV2 EnV3 EnV4
Site1 1 1 1 1
Site2 1 1 0 1
Site3 0 1 0 1
Site4 1 1 1 1
Site5 1 1 0 1
Site6 0 1 0 0
Site7 0 1 0 1
Site8 0 1 0 1
Site9 1 1 1 1
Site10 1 1 0 1
Site11 1 1 1 1
Site12 0 1 0 0
Site13 1 1 0 1
Site14 1 1 0 1
Site15 0 1 0 1
Site16 1 1 0 1
Site17 0 1 0 1
Site18 1 1 1 1
Site19 1 1 0 1
Site20 1 1 0 1
tabL
Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7 Sp8
Site1 1 1 0 0 0 0 0 0
Site2 1 1 0 0 0 0 0 0
Site3 1 1 0 0 0 0 0 0
Site4 1 0 0 0 0 0 0 1
Site5 1 1 0 0 0 0 0 0
Site6 1 0 0 0 1 0 0 0
Site7 1 0 0 0 0 0 0 0
Site8 0 0 0 0 1 0 0 0
Site9 1 0 0 0 0 0 0 0
Site10 1 1 0 0 0 0 0 0
Site11 0 0 1 1 0 0 0 0
Site12 0 0 0 0 0 1 0 0
Site13 1 0 0 0 0 0 0 0
Site14 0 0 0 0 1 0 0 0
Site15 1 1 0 0 0 0 0 0
Site16 1 1 0 0 0 0 0 0
Site17 1 0 0 0 0 0 0 0
Site18 0 0 1 0 0 0 0 0
Site19 1 0 0 0 0 0 0 0
Site20 1 1 0 0 0 0 1 0
I read these dataframes into R from text files, and I specify that the first column is row names.
This is the error I get when I try to use the fourthcorner function on my matrices:
fourth1=fourthcorner(tabR,tabL,tabQ,nrepet=1)
Error in apply(sim, 2, function(x) length(na.omit(x))) :
dim(X) must have a positive length
I don't understand where the problem lies, is it a formatting issue? If so, should I reformat one of the matrices? Which one is causing the trouble? Or can I not use binary traits and environmental variables for this function? In other words, can I solve this problem by changing a piece of code, or is it impossible to use this function for this question?
As an additional tidbit of information, I did email the author of the function, but unfortunately I did not understand his response fully, possibly because my R skills still leave much to be desired. Here is his response if it is helpful:
Q could contain quantitative or qualitative traits. In R, qualitative traits should be coded as factors to obtain adapted statistics (i.e. chi2 or eta2). If you code qualitative variables as dummy variables, they would be considered as quantitative.
Thank you very much to any and all insight.
I noted that your example fails only nrepet is equal to one, so if you can use any other positive number you should be fine.
However, if you do need nrepet=1, you should contact with the author of ade4 and ask to him/her to fix the fourthcorner function code. I traced back the error and found that fourthcorner calls as.krandtest with sim = res$tabD[-1,] where res$tabD is a matrix with nrepet+1 rows. When nrepet=1 and you remove one row from a two-row matrix, R automatically converts the resulting one-row matrix into a vector, but as.krandtest function expects sim to be a matrix and thus raises the error.
Here is your input data just in case somebody else would like to answer your question:
tabR
structure(list(EnV1 = c(1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L), EnV2 = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), EnV3 = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), EnV4 = c(1L, 1L, 1L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("EnV1",
"EnV2", "EnV3", "EnV4"), row.names = c("Site1", "Site2", "Site3",
"Site4", "Site5", "Site6", "Site7", "Site8", "Site9", "Site10",
"Site11", "Site12", "Site13", "Site14", "Site15", "Site16", "Site17",
"Site18", "Site19", "Site20"), class = "data.frame")
tabL
structure(list(Sp1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L), Sp2 = c(1L, 1L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
1L), Sp3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), Sp4 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
Sp5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L), Sp6 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), Sp7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), Sp8 = c(0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L)), .Names = c("Sp1", "Sp2", "Sp3", "Sp4", "Sp5", "Sp6",
"Sp7", "Sp8"), row.names = c("Site1", "Site2", "Site3", "Site4",
"Site5", "Site6", "Site7", "Site8", "Site9", "Site10", "Site11",
"Site12", "Site13", "Site14", "Site15", "Site16", "Site17", "Site18",
"Site19", "Site20"), class = "data.frame")
tabQ
structure(list(Trait1 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), Trait2 = c(1L,
1L, 0L, 0L, 1L, 1L, 0L, 0L), Trait3 = c(0L, 0L, 1L, 1L, 0L, 0L,
0L, 0L), Trait4 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L)), .Names = c("Trait1",
"Trait2", "Trait3", "Trait4"), row.names = c("Sp1", "Sp2", "Sp3",
"Sp4", "Sp5", "Sp6", "Sp7", "Sp8"), class = "data.frame")
I have a table that looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
586 0 0 0 1 0 0 0 1 3 1 0 1 0 0 0 0 0 1 0 2 0 3 0 0 0 4 0 1 2 0
637 0 0 0 0 0 0 2 3 2 2 0 4 0 0 0 0 1 0 0 2 0 1 1 1 0 0 0 0 0 1
989 0 0 1 0 0 0 2 1 0 0 0 2 1 0 0 1 2 1 0 3 0 2 0 1 1 0 1 0 1 0
1081 0 0 0 1 0 0 1 0 1 1 0 0 2 0 0 0 0 0 0 3 0 5 0 0 2 1 0 1 1 1
2922 0 1 1 1 0 0 0 2 1 0 0 0 2 0 0 0 1 1 0 1 0 3 1 1 2 0 0 1 0 1
3032 0 1 0 0 0 0 0 3 0 0 1 0 2 1 0 1 0 1 1 0 0 3 1 1 1 1 0 0 1 1
Numbers 1 to 30 in the first row are my labels, and the columns are my items. I would like to find, for each item, the label with the most counts. E.g. 586 has 4 counts of 26, which is the highest number in that row, so for 586, I would like to assign 26.
I am able to get the maximum value for each row with max(table1[1,])), which gets me the maximum value for first row, but doesn't get me the label it corresponds to, but I don't know how to proceed. All help is appreciated!
dput:
structure(c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 1L, 0L, 0L, 1L, 3L, 1L,
0L, 2L, 3L, 3L, 2L, 0L, 1L, 1L, 0L, 1L, 2L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 4L, 2L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L,
2L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 1L, 2L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 1L, 2L, 2L, 3L, 3L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 3L, 1L, 2L, 5L, 3L, 3L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L,
0L, 1L, 1L, 0L, 0L, 1L, 2L, 2L, 1L, 4L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 2L, 0L, 1L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 1L, 1L), .Dim = c(6L, 30L), .Dimnames = structure(list(
c("586", "637", "989", "1081", "2922", "3032"), c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "29", "30")), .Names = c("",
"")))
max.col will give you vector of column numbers which correspond to maximum value for each row.
> max.col(df, tie='first')
[1] 26 12 20 22 22 8
You can use that vector to get column names for each row.
> colnames(df)[max.col(df, tie='first')]
[1] "26" "12" "20" "22" "22" "8"
Perhaps you are looking for which.max. Assuming your matrix is called "temp":
> apply(temp, 1, which.max)
586 637 989 1081 2922 3032
26 12 20 22 22 8
apply with MARGIN = 1 (the second argument) will apply a function by row.