I have pass traffic data showing the passes between members; here is the sample dataset.
It shows the interactions between members in consecutive rows. I want to count those interactions and obtain a new dataset showing how many interactions occurred between each pair of members; the direction doesn't matter.
For example:
between 26 and 11 = X
between 26 and 27 = Y
I just can't figure out which function to use or how to write the code for this calculation. Thanks.
You could use the rollapply function from the zoo package to find all interactions. The frequency of these interactions can then be calculated using table. (I assume your object is called dat.)
library(zoo)
table(as.data.frame(rollapply(dat[[1]], 2, sort)))
The result:
    V2
V1    4  8 10 11 13 17 19 25 26 27 53
  4   2 13 17  1  2  5  6  3  1  9  4
  8   0  2 14 11 10  4  5  0 13 13 11
  10  0  0  3  9  7  2  4  2  8 11  8
  11  0  0  0  1  6  5  4  4  5  4 25
  13  0  0  0  0  0  1  3  5  7  9  8
  17  0  0  0  0  0  0  1  1  1  5  5
  19  0  0  0  0  0  0  1  1  1  5  4
  25  0  0  0  0  0  0  0  0  5  8  5
  26  0  0  0  0  0  0  0  0  1  5  3
  27  0  0  0  0  0  0  0  0  0  0  1
  53  0  0  0  0  0  0  0  0  0  0  1
The lower triangular part of the matrix contains only zeros because each pair is sorted before counting, so the direction does not matter.
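If you would rather avoid the zoo dependency, the same unordered pair counts can be sketched in base R with pmin and pmax. The vector x below is hypothetical sample data standing in for dat[[1]], which is not shown in the question.

```r
# Hypothetical member sequence (stand-in for dat[[1]])
x <- c(26, 11, 26, 27, 27, 11, 26)

a <- head(x, -1)  # first member of each consecutive pair
b <- tail(x, -1)  # second member of each consecutive pair

# Put the smaller value first so that direction is ignored
pairs <- data.frame(V1 = pmin(a, b), V2 = pmax(a, b))
counts <- table(pairs)
counts
```

Because the smaller value always lands in V1, counts["11", "26"] covers both 26 -> 11 and 11 -> 26.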
If you are not interested in interactions between the same values, use the following command:
table(as.data.frame(rollapply(rle(dat[[1]])$values, 2, sort)))
    V2
V1    8 10 11 13 17 19 25 26 27 53
  4  13 17  1  2  5  6  3  1  9  4
  8   0 14 11 10  4  5  0 13 13 11
  10  0  0  9  7  2  4  2  8 11  8
  11  0  0  0  6  5  4  4  5  4 25
  13  0  0  0  0  1  3  5  7  9  8
  17  0  0  0  0  0  1  1  1  5  5
  19  0  0  0  0  0  0  1  1  5  4
  25  0  0  0  0  0  0  0  5  8  5
  26  0  0  0  0  0  0  0  0  5  3
  27  0  0  0  0  0  0  0  0  0  1
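To see why rle removes the same-value interactions, here is a small sketch on a hypothetical vector: rle collapses each run of identical consecutive values into a single entry, so no pair of equal neighbours remains.

```r
# Hypothetical sequence with repeated consecutive values
x <- c(4, 4, 8, 8, 8, 10, 4)

runs <- rle(x)
runs$values   # one entry per run: 4 8 10 4
runs$lengths  # how long each run was: 2 3 1 1
```

Note that only consecutive repeats are collapsed; the second 4 survives because it starts a new run.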
I have a matrix that looks like the following. For rows 1:23, I would like to calculate the weighted mean, where the data in rows 1:23 are the weights and row 24 is the data.
1 107 33 41 22 12 4 122 44 297 123 51 16 7 9 1 1 0
10 5 2 2 1 0 3 4 6 12 3 3 0 1 1 0 0 0
11 1 3 1 0 0 0 4 2 8 3 4 0 0 0 0 0 0
12 2 1 1 0 0 0 2 1 5 6 3 1 0 0 0 0 0
13 1 0 1 0 0 0 3 1 3 5 2 2 0 1 0 0 0
14 3 0 0 0 0 0 3 1 2 3 0 1 0 0 0 0 0
15 0 0 0 0 0 0 2 0 0 1 0 1 0 0 0 0 0
16 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0
17 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
18 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
2 80 27 37 5 6 4 97 48 242 125 44 27 7 8 8 0 2
20 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
22 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3 47 12 33 12 6 1 63 42 200 96 45 19 6 6 9 2 0
4 45 14 21 9 4 2 54 26 130 71 36 17 8 5 1 0 2
5 42 10 14 6 3 2 45 19 89 45 26 7 4 8 2 1 0
6 17 3 12 5 2 0 18 21 51 41 19 15 5 1 1 0 0
7 16 2 6 0 0 1 14 9 37 23 17 7 3 0 3 0 0
8 9 4 4 2 1 0 7 9 30 15 8 3 3 1 1 0 1
9 12 2 3 1 1 1 6 5 14 12 5 1 2 0 0 1 0
24 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
As an example using the top two rows, there would be an additional column at the end indicating the weighted mean.
1 107 33 41 22 12 4 122 44 297 123 51 16 7 9 1 1 0 6.391011
10 5 2 2 1 0 3 4 6 12 3 3 0 1 1 0 0 0 6.232558
I'm a little new to coding so I wasn't too sure how to do it - any advice would be appreciated!
You can do:
apply(df[-nrow(df), ], 1, function(row) weighted.mean(df[nrow(df), ], row))
I'm assuming your first column is some kind of index and not used for the weighted mean (and the data is stored in matr_dat):
apply(matr_dat[-nrow(matr_dat), -1], 1,
function(row) weighted.mean(matr_dat[nrow(matr_dat), -1], row))
Using apply with the margin set to 1, the function given as the third argument is applied to each row of the data; to calculate the weighted mean, you can use weighted.mean and set the weights to the values of the row.
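As a check, here is a minimal sketch using the two example rows from the question (index column dropped) and the values 0 to 16 from row 24. The names W and x are only illustrative. It also shows an equivalent matrix form, sum(w * x) / sum(w) computed for all rows at once.

```r
# Weights: rows "1" and "10" from the question, without the index column
W <- rbind(
  "1"  = c(107, 33, 41, 22, 12, 4, 122, 44, 297, 123, 51, 16, 7, 9, 1, 1, 0),
  "10" = c(5, 2, 2, 1, 0, 3, 4, 6, 12, 3, 3, 0, 1, 1, 0, 0, 0)
)
# Data: row 24 from the question, without the index column
x <- 0:16

# Row-wise weighted mean with apply, as in the answer
wm_apply <- apply(W, 1, function(w) weighted.mean(x, w))

# Same result via matrix algebra
wm_matrix <- as.vector(W %*% x / rowSums(W))

round(wm_apply, 6)
```

Both approaches reproduce the 6.391011 and 6.232558 given in the question. A row whose weights are all zero would give NaN in either form.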
I am trying to use a package whose example table is in a certain format. I am very new to R and don't know how to get my data into the same format so that I can use the package.
Their table looks like this:
     Recipient
Actor  1 10 11 12  2  3  4  5  6  7  8  9
   1   0  0  0  1  3  1  1  2  3  0  2  6
   10  1  0  0  1  0  0  0  0  0  0  0  0
   11 13  5  0  5  3  8  0  1  3  2  2  9
   12  0  0  2  0  1  1  1  3  1  1  3  0
   2   0  0  2  0  0  1  0  0  0  2  2  1
   3   9  9  0  5 16  0  2  8 21 45 13  6
   4  21 28 64 22 40 79  0 16 53 76 43 38
   5   2  0  0  0  0  0  1  0  3  0  0  1
   6  11 22  4 21 13  9  2  3  0  4 39  8
   7   5 32 11  9 16  1  0  4 33  0 17 22
   8   4  0  2  0  1 11  0  0  0  1  0  1
   9   0  0  3  1  0  0  1  0  0  0  0  0
Where mine at the moment is:
X0 X1 X2 X3 X4 X5
0 0 2 3 3 0 0
1 1 0 4 2 0 0
2 0 0 0 0 0 0
3 0 2 2 0 1 0
4 0 0 3 2 0 2
5 0 0 3 3 1 0
I would like to add the Recipient and Actor labels to mine, as well as change the row and column names to 1, ..., 6.
Also my data is listed under Data in my Workspace and it says:
'num' [1:6,1:6] 0 1 ...
Whereas the example data in the workspace is shown in Values as:
'table' num [1:12,1:12] 0 1 13 ...
Please let me know if you have any suggestions for getting my data into the same type and style as theirs. All help is greatly appreciated!
OK, so you have a matrix like so:
m <- matrix(c(1:9), 3)
rownames(m) <- 0:2
colnames(m) <- paste0("X", 0:2)
# X0 X1 X2
#0 1 4 7
#1 2 5 8
#2 3 6 9
First you need to remove the Xs and turn it into a table:
colnames(m) <- sub("X", "", colnames(m))
m <- as.table(m)
# 0 1 2
#0 1 4 7
#1 2 5 8
#2 3 6 9
Then you can set the dimension names:
names(dimnames(m)) <- c("Actor", "Recipient")
# Recipient
#Actor 0 1 2
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
However, usually you would create the contingency table from raw data using the table function, which would automatically return a table object. So, maybe you should fix the step creating your matrix?
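The question also asks for the row and column names to run from 1 rather than 0. A possible sketch, reusing the small 3 x 3 example matrix, sets both the new labels and the dimension names in a single dimnames assignment (the name tab is only illustrative):

```r
# Same toy matrix as above, with 0-based row names and X-prefixed columns
m <- matrix(1:9, 3)
rownames(m) <- as.character(0:2)
colnames(m) <- paste0("X", 0:2)

n <- nrow(m)
tab <- as.table(m)

# Relabel both dimensions 1..n and name them Actor / Recipient
dimnames(tab) <- list(
  Actor     = as.character(seq_len(n)),
  Recipient = as.character(seq_len(n))
)
tab
class(tab)  # "table"
```

This assumes the rows and columns are already in the intended order, since the labels are replaced by position.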
I'm not sure how to phrase this question. I have some data which I'm trying to get into a different format (maybe even an array) so that I can vectorize it. This isn't very concrete, so here's a simplified example:
I have a file like dt, say:
set.seed(1)
time = 1:10
size <- round(runif(10), digits = 1)
count <- round(runif(10)*20)
dt <- data.frame(time,size, count)
dt
time size count
1 1 0.3 4
2 2 0.4 4
3 3 0.6 14
4 4 0.9 8
5 5 0.2 15
6 6 0.9 10
7 7 0.9 14
8 8 0.7 20
9 9 0.6 8
10 10 0.1 16
and I want to end up with...
time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
1 1 0 0 4 0 0 0 0
2 2 0 0 0 4 0 0 0
3 3 0 0 0 0 14 0 0
4 4 0 0 0 0 0 0 8
5 5 0 15 0 0 0 0 0
6 6 0 0 0 0 0 0 10
7 7 0 0 0 0 0 0 14
8 8 0 0 0 0 0 20 0
9 9 0 0 0 0 8 0 0
10 10 16 0 0 0 0 0 0
which has introduced all the possible results for the size variable as new variables.
Then do a cumulative sum to get something like this, but really the previous step is the trickiest:
time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
1 1 0 0 4 0 0 0 0
2 2 0 0 4 4 0 0 0
3 3 0 0 4 4 14 0 0
4 4 0 0 4 4 14 0 8
5 5 0 15 4 4 14 0 8
6 6 0 15 4 4 14 0 18
7 7 0 15 4 4 14 0 32
8 8 0 15 4 4 14 20 32
9 9 0 15 4 4 22 20 32
10 10 16 15 4 4 22 20 32
We can use dcast to create the 'size' columns, and then loop over those columns with lapply(...) to do the cumsum.
library(reshape2)
dt1 <- dcast(dt, time~paste0('size_', size), value.var='count', fill=0)
dt1[-1] <- lapply(dt1[-1], cumsum)
dt1
# time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
#1 1 0 0 4 0 0 0 0
#2 2 0 0 4 4 0 0 0
#3 3 0 0 4 4 14 0 0
#4 4 0 0 4 4 14 0 8
#5 5 0 15 4 4 14 0 8
#6 6 0 15 4 4 14 0 18
#7 7 0 15 4 4 14 0 32
#8 8 0 15 4 4 14 20 32
#9 9 0 15 4 4 22 20 32
#10 10 16 15 4 4 22 20 32
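If reshape2 is not available, a base R sketch of the same two steps could use xtabs for the wide layout and apply for the column-wise cumulative sum. Here dt is typed in from the printed data in the question rather than regenerated from the seed, and the names wide and cum are only illustrative.

```r
# Data as printed in the question
dt <- data.frame(
  time  = 1:10,
  size  = c(0.3, 0.4, 0.6, 0.9, 0.2, 0.9, 0.9, 0.7, 0.6, 0.1),
  count = c(4, 4, 14, 8, 15, 10, 14, 20, 8, 16)
)

# Step 1: one column per distinct size, filled with count (0 elsewhere)
wide <- xtabs(count ~ time + size, data = dt)

# Step 2: cumulative sum down each size column
cum <- apply(wide, 2, cumsum)
colnames(cum) <- paste0("size_", colnames(cum))
cum
```

The result is a numeric matrix with time along the rows rather than a data frame with a time column, so wrap it in data.frame(time = dt$time, cum) if you need the latter.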
I've got the following code to create a classification table in R:
> table(class = class1, truth = valid[,1])
1 2 3 4 5 6 7 8 9 10 11 12
1 357 73 0 0 47 0 5 32 20 0 4 7
2 25 71 0 0 23 4 1 0 2 1 8 3
3 1 2 120 1 5 0 1 0 0 0 0 0
4 0 0 0 77 0 0 0 0 1 0 0 0
5 15 27 0 0 67 6 7 0 4 1 5 7
6 1 2 0 0 2 44 0 0 0 7 7 0
7 1 1 0 0 10 0 66 0 1 0 1 7
9 1 0 0 0 3 0 0 2 8 0 0 2
10 1 1 0 0 1 6 0 0 0 17 0 0
11 0 7 0 0 3 1 0 0 0 4 10 2
12 0 1 0 0 1 0 0 0 0 0 0 1
However, I need this table to be square (row 8 is missing in this example), i.e. the number of rows should equal the number of columns, and I need the rownames and colnames to be preserved. The missing row should be filled with zeros. Is there any way of doing this?
The problem most probably comes from a difference in levels.
Try copying the levels from valid to class1:
class1 <- factor(class1, levels=levels(valid[,1]))
table(class = class1, truth = valid[,1])
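A small self-contained sketch of the idea, on hypothetical vectors pred and truth: when both variables share the same set of levels, table keeps a row (or column) of zeros for any class that never occurs. Taking the union of the observed values is an assumption that also covers the case where valid[,1] is not a factor, in which case levels() would return NULL.

```r
# Hypothetical labels: class 3 is never predicted
truth <- c(1, 2, 3, 3, 2, 1)
pred  <- c(1, 2, 2, 2, 2, 1)

# Common set of levels taken from both vectors
lv <- sort(union(pred, truth))

tab <- table(
  class = factor(pred,  levels = lv),
  truth = factor(truth, levels = lv)
)
tab
```

The table is now 3 x 3, with an all-zero row for class 3.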
I have a data.frame with a factor identifying events
year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
And I would need a counter-type variable identifying a given window around the events. The result should look like this (for a window of, for example, 3 periods around the event):
year event window
1 0
2 0
3 0
4 0
5 0
6 0 -3
7 0 -2
8 0 -1
9 1 0
10 0 1
11 0 2
12 0 3
13 0
14 0 -3
15 0 -2
16 0 -1
17 1 0
18 0 1
19 0 2
20 0 3
Any guidance on how to implement this within a function would be appreciated. You can copy the data.frame by pasting the block above in place of "..." here:
dt <- read.table(text = "...", header = TRUE)
Assuming there is no overlap between windows, you can use one of my favourite base functions, filter (this is stats::filter, which is masked if dplyr is loaded):
DF <- read.table(text="year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0", header=TRUE)
DF$window <- head(filter(c(rep(0, 3), DF$event, rep(0, 3)),
filter=-3:3)[-(1:3)], -3)
DF$window[DF$window == 0 & DF$event==0] <- NA
# year event window
# 1 1 0 NA
# 2 2 0 NA
# 3 3 0 NA
# 4 4 0 NA
# 5 5 0 NA
# 6 6 0 -3
# 7 7 0 -2
# 8 8 0 -1
# 9 9 1 0
# 10 10 0 1
# 11 11 0 2
# 12 12 0 3
# 13 13 0 NA
# 14 14 0 -3
# 15 15 0 -2
# 16 16 0 -1
# 17 17 1 0
# 18 18 0 1
# 19 19 0 2
# 20 20 0 3
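Since the question asks for a function, here is one possible sketch that does not rely on filter and takes the window half-width as a parameter. The name event_window and its argument k are hypothetical. If two windows overlap, the later event overwrites the earlier one, which may or may not be what you want.

```r
# Returns -k..k around each event (event == 1) and NA elsewhere
event_window <- function(event, k = 3) {
  n <- length(event)
  out <- rep(NA_integer_, n)
  for (i in which(event == 1)) {
    pos <- (i - k):(i + k)         # positions covered by this window
    keep <- pos >= 1 & pos <= n    # drop positions outside the data
    out[pos[keep]] <- (-k:k)[keep]
  }
  out
}

# Same event pattern as in the question: events in years 9 and 17
event <- integer(20)
event[c(9, 17)] <- 1L
window <- event_window(event, k = 3)
window
```

With the data frame from the answer, the equivalent call would be DF$window <- event_window(DF$event, 3).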