How to get a square table? - r

I've got the following code to create a classification table in R:
> table(class = class1, truth = valid[,1])
     1   2   3   4   5   6   7   8   9  10  11  12
1  357  73   0   0  47   0   5  32  20   0   4   7
2   25  71   0   0  23   4   1   0   2   1   8   3
3    1   2 120   1   5   0   1   0   0   0   0   0
4    0   0   0  77   0   0   0   0   1   0   0   0
5   15  27   0   0  67   6   7   0   4   1   5   7
6    1   2   0   0   2  44   0   0   0   7   7   0
7    1   1   0   0  10   0  66   0   1   0   1   7
9    1   0   0   0   3   0   0   2   8   0   0   2
10   1   1   0   0   1   6   0   0   0  17   0   0
11   0   7   0   0   3   1   0   0   0   4  10   2
12   0   1   0   0   1   0   0   0   0   0   0   1
However, I need this table to be a square (line 8 is missing in this example), i.e. the number of rows should equal the number of columns, and I need the rownames and colnames to be preserved. The missing line should be filled with zeros. Any way of doing this?

The problem most probably comes from a difference in levels.
Try copying the levels from valid to class1:
class1 <- factor(class1, levels = levels(valid[,1]))
table(class = class1, truth = valid[,1])
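
To see why sharing the levels squares the table, here is a minimal sketch on made-up data. The vectors truth and pred are hypothetical stand-ins for valid[,1] and class1. It assumes the reference column is a factor; if valid[,1] is a plain numeric vector, levels() returns NULL and the levels would have to be supplied by hand (for example levels = 1:12).

```r
# Hypothetical data: class 3 occurs in the truth but is never predicted.
truth <- factor(c(1, 2, 3, 3, 2, 1), levels = 1:3)
pred  <- c(1, 2, 2, 2, 2, 1)

# Without shared levels the row for class 3 is dropped (2 rows, 3 columns).
tab_ragged <- table(class = pred, truth = truth)

# Copying the levels keeps the empty class as a row of zeros (3 x 3).
pred_f     <- factor(pred, levels = levels(truth))
tab_square <- table(class = pred_f, truth = truth)

print(tab_ragged)
print(tab_square)
```

table() counts every level of a factor, including levels that never occur, which is what produces the row of zeros.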

Related

Calculate weighted mean from matrix in R

I have a matrix that looks like the following. For rows 1:23, I would like to calculate the weighted mean, where the data in rows 1:23 are the weights and row 24 is the data.
1 107 33 41 22 12 4 122 44 297 123 51 16 7 9 1 1 0
10 5 2 2 1 0 3 4 6 12 3 3 0 1 1 0 0 0
11 1 3 1 0 0 0 4 2 8 3 4 0 0 0 0 0 0
12 2 1 1 0 0 0 2 1 5 6 3 1 0 0 0 0 0
13 1 0 1 0 0 0 3 1 3 5 2 2 0 1 0 0 0
14 3 0 0 0 0 0 3 1 2 3 0 1 0 0 0 0 0
15 0 0 0 0 0 0 2 0 0 1 0 1 0 0 0 0 0
16 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0
17 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
18 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
2 80 27 37 5 6 4 97 48 242 125 44 27 7 8 8 0 2
20 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
22 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3 47 12 33 12 6 1 63 42 200 96 45 19 6 6 9 2 0
4 45 14 21 9 4 2 54 26 130 71 36 17 8 5 1 0 2
5 42 10 14 6 3 2 45 19 89 45 26 7 4 8 2 1 0
6 17 3 12 5 2 0 18 21 51 41 19 15 5 1 1 0 0
7 16 2 6 0 0 1 14 9 37 23 17 7 3 0 3 0 0
8 9 4 4 2 1 0 7 9 30 15 8 3 3 1 1 0 1
9 12 2 3 1 1 1 6 5 14 12 5 1 2 0 0 1 0
24 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
As an example using the top two rows, there would be an additional column at the end containing the weighted mean.
1 107 33 41 22 12 4 122 44 297 123 51 16 7 9 1 1 0 6.391011
10 5 2 2 1 0 3 4 6 12 3 3 0 1 1 0 0 0 6.232558
I'm a little new to coding so I wasn't too sure how to do it - any advice would be appreciated!
You can do:
apply(df[-nrow(df), ], 1, function(row) weighted.mean(df[nrow(df), ], row))
I'm assuming your first column is some kind of index and is not used for the weighted mean (and that the data is stored in matr_dat):
apply(matr_dat[-nrow(matr_dat), -1], 1,
      function(row) weighted.mean(matr_dat[nrow(matr_dat), -1], row))
With the margin set to 1, apply calls the function given as its third argument on each row of the data. Inside that function, weighted.mean takes the values from the last row as the data and the current row as the weights.
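
As a check, here is a small self-contained sketch using the two example rows and the value row (row 24) from the question. It assumes, as above, that the first column is only a row label. The names values, weights, wm_apply and wm_matrix are illustrative, and the matrix-product version is an alternative that was not part of the original answer.

```r
# Rows "1" and "10" from the question plus the value row "24";
# the first column holds the row label.
matr_dat <- rbind(
  c(1, 107, 33, 41, 22, 12, 4, 122, 44, 297, 123, 51, 16, 7, 9, 1, 1, 0),
  c(10, 5, 2, 2, 1, 0, 3, 4, 6, 12, 3, 3, 0, 1, 1, 0, 0, 0),
  c(24, 0:16)
)

values  <- matr_dat[nrow(matr_dat), -1]                  # data: 0, 1, ..., 16
weights <- matr_dat[-nrow(matr_dat), -1, drop = FALSE]   # one weight vector per row

# Row-by-row version, as in the answer.
wm_apply <- apply(weights, 1, function(w) weighted.mean(values, w))

# Vectorised alternative: sum(w * x) / sum(w) for every row at once.
wm_matrix <- as.vector(weights %*% values) / rowSums(weights)

# Append the result as a final column.
print(cbind(matr_dat[-nrow(matr_dat), ], wm = wm_apply))
```

Both versions reproduce the 6.391011 and 6.232558 quoted in the question. A row whose weights are all zero would give NaN in either version.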

How to add multiple columns in R with different condition for each column?

Here is my data set. I would like to add 5 new columns to mydata with 5 different conditions.
mydata=data.frame(sub=rep(c(1:4),c(3,4,5,5)),t=c(1:3,1:4,1:5,1:5),
y.val=c(10,20,13,
5,7,8,0,
45,17,25,12,10,
40,0,0,5,8))
mydata
sub t y.val
1 1 1 10
2 1 2 20
3 1 3 13
4 2 1 5
5 2 2 7
6 2 3 8
7 2 4 0
8 3 1 45
9 3 2 17
10 3 3 25
11 3 4 12
12 3 5 10
13 4 1 40
14 4 2 0
15 4 3 0
16 4 4 5
17 4 5 8
I would like to add the following 5 (max of 't' column) columns as
mydata$It1=ifelse(mydata$t==1 & mydata$y.val>0,1,0)
mydata$It2=ifelse(mydata$t==2 & mydata$y.val>0,1,0)
mydata$It3=ifelse(mydata$t==3 & mydata$y.val>0,1,0)
mydata$It4=ifelse(mydata$t==4 & mydata$y.val>0,1,0)
mydata$It5=ifelse(mydata$t==5 & mydata$y.val>0,1,0)
Here is the expected outcome.
> mydata
sub t y.val It1 It2 It3 It4 It5
1 1 1 10 1 0 0 0 0
2 1 2 20 0 1 0 0 0
3 1 3 13 0 0 1 0 0
4 2 1 5 1 0 0 0 0
5 2 2 7 0 1 0 0 0
6 2 3 8 0 0 1 0 0
7 2 4 0 0 0 0 0 0
8 3 1 45 1 0 0 0 0
9 3 2 17 0 1 0 0 0
10 3 3 25 0 0 1 0 0
11 3 4 12 0 0 0 1 0
12 3 5 10 0 0 0 0 1
13 4 1 40 1 0 0 0 0
14 4 2 0 0 0 0 0 0
15 4 3 0 0 0 0 0 0
16 4 4 5 0 0 0 1 0
17 4 5 8 0 0 0 0 1
I appreciate your help if it can be written as a function using for loop or any other technique.
You could use sapply/lapply
n <- seq_len(5)
mydata[paste0("It", n)] <- +(sapply(n, function(x) mydata$t==x & mydata$y.val>0))
mydata
# sub t y.val It1 It2 It3 It4 It5
#1 1 1 10 1 0 0 0 0
#2 1 2 20 0 1 0 0 0
#3 1 3 13 0 0 1 0 0
#4 2 1 5 1 0 0 0 0
#5 2 2 7 0 1 0 0 0
#6 2 3 8 0 0 1 0 0
#7 2 4 0 0 0 0 0 0
#8 3 1 45 1 0 0 0 0
#9 3 2 17 0 1 0 0 0
#10 3 3 25 0 0 1 0 0
#11 3 4 12 0 0 0 1 0
#12 3 5 10 0 0 0 0 1
#13 4 1 40 1 0 0 0 0
#14 4 2 0 0 0 0 0 0
#15 4 3 0 0 0 0 0 0
#16 4 4 5 0 0 0 1 0
#17 4 5 8 0 0 0 0 1
mydata$t==x & mydata$y.val>0 returns a logical vector of TRUE/FALSE values, one per row. The unary + converts those logicals to 1/0 (try +c(FALSE, TRUE)). This avoids ifelse(condition, 1, 0).
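
Since the question asks for a function with a for loop, the same idea can be sketched that way. The function name add_indicators and its argument n are hypothetical; n defaults to the maximum of the t column, as the question suggests.

```r
mydata <- data.frame(sub = rep(1:4, c(3, 4, 5, 5)),
                     t = c(1:3, 1:4, 1:5, 1:5),
                     y.val = c(10, 20, 13, 5, 7, 8, 0,
                               45, 17, 25, 12, 10, 40, 0, 0, 5, 8))

# Hypothetical helper: adds It1..Itn, where Itk is 1 when t == k and y.val > 0.
add_indicators <- function(d, n = max(d$t)) {
  for (k in seq_len(n)) {
    d[[paste0("It", k)]] <- as.integer(d$t == k & d$y.val > 0)
  }
  d
}

res <- add_indicators(mydata)
print(res)
```

as.integer() plays the same role here as the unary + in the sapply version.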
Here's another approach based on multiplying a model matrix by the logical y.val > 0.
df <- cbind(mydata[1:3], model.matrix(~ factor(t) + 0, mydata)*(mydata$y.val>0))
Which gives:
sub t y.val factor.t.1 factor.t.2 factor.t.3 factor.t.4 factor.t.5
1 1 1 10 1 0 0 0 0
2 1 2 20 0 1 0 0 0
3 1 3 13 0 0 1 0 0
4 2 1 5 1 0 0 0 0
5 2 2 7 0 1 0 0 0
6 2 3 8 0 0 1 0 0
7 2 4 0 0 0 0 0 0
8 3 1 45 1 0 0 0 0
9 3 2 17 0 1 0 0 0
10 3 3 25 0 0 1 0 0
11 3 4 12 0 0 0 1 0
12 3 5 10 0 0 0 0 1
13 4 1 40 1 0 0 0 0
14 4 2 0 0 0 0 0 0
15 4 3 0 0 0 0 0 0
16 4 4 5 0 0 0 1 0
17 4 5 8 0 0 0 0 1
To clean up the names you can do:
names(df) <- sub("factor.t.", "It", names(df), fixed = TRUE)
You can use sapply to compare each t for equality against 1:5 and combine this with an & of y.val>0.
within(mydata, It <- +(sapply(1:5, `==`, t) & y.val>0))
# sub t y.val It.1 It.2 It.3 It.4 It.5
#1 1 1 10 1 0 0 0 0
#2 1 2 20 0 1 0 0 0
#3 1 3 13 0 0 1 0 0
#4 2 1 5 1 0 0 0 0
#5 2 2 7 0 1 0 0 0
#6 2 3 8 0 0 1 0 0
#7 2 4 0 0 0 0 0 0
#8 3 1 45 1 0 0 0 0
#9 3 2 17 0 1 0 0 0
#10 3 3 25 0 0 1 0 0
#11 3 4 12 0 0 0 1 0
#12 3 5 10 0 0 0 0 1
#13 4 1 40 1 0 0 0 0
#14 4 2 0 0 0 0 0 0
#15 4 3 0 0 0 0 0 0
#16 4 4 5 0 0 0 1 0
#17 4 5 8 0 0 0 0 1
Here's a tidyverse solution, using pivot_wider:
library(tidyverse)
mydata %>%
  mutate(new_col = paste0("It", t),
         y_test = as.integer(y.val > 0)) %>%
  pivot_wider(id_cols = c(sub, t, y.val),
              names_from = new_col,
              values_from = y_test,
              values_fill = list(y_test = 0))
sub t y.val It1 It2 It3 It4 It5
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 10 1 0 0 0 0
2 1 2 20 0 1 0 0 0
3 1 3 13 0 0 1 0 0
4 2 1 5 1 0 0 0 0
5 2 2 7 0 1 0 0 0
6 2 3 8 0 0 1 0 0
7 2 4 0 0 0 0 0 0
8 3 1 45 1 0 0 0 0
9 3 2 17 0 1 0 0 0
10 3 3 25 0 0 1 0 0
11 3 4 12 0 0 0 1 0
12 3 5 10 0 0 0 0 1
13 4 1 40 1 0 0 0 0
14 4 2 0 0 0 0 0 0
15 4 3 0 0 0 0 0 0
16 4 4 5 0 0 0 1 0
17 4 5 8 0 0 0 0 1
Explanation:
Make two columns, new_col (new column names with "It") and y_test (y.val > 0).
Pivot new_col values into column names.
Fill in the NA values with zeros.
One purrr and dplyr option could be:
map_dfc(.x = 1:5,
        ~ mydata %>%
          mutate(!!paste0("It", .x) := as.integer(t == .x & y.val > 0)) %>%
          select(starts_with("It"))) %>%
  bind_cols(mydata)
It1 It2 It3 It4 It5 sub t y.val
1 1 0 0 0 0 1 1 10
2 0 1 0 0 0 1 2 20
3 0 0 1 0 0 1 3 13
4 1 0 0 0 0 2 1 5
5 0 1 0 0 0 2 2 7
6 0 0 1 0 0 2 3 8
7 0 0 0 0 0 2 4 0
8 1 0 0 0 0 3 1 45
9 0 1 0 0 0 3 2 17
10 0 0 1 0 0 3 3 25
11 0 0 0 1 0 3 4 12
12 0 0 0 0 1 3 5 10
13 1 0 0 0 0 4 1 40
14 0 0 0 0 0 4 2 0
15 0 0 0 0 0 4 3 0
16 0 0 0 1 0 4 4 5
17 0 0 0 0 1 4 5 8
Or, if you want to generate the columns dynamically from the range of the t column:
map_dfc(.x = reduce(as.list(range(mydata$t)), `:`),
        ~ mydata %>%
          mutate(!!paste0("It", .x) := as.integer(t == .x & y.val > 0)) %>%
          select(starts_with("It"))) %>%
  bind_cols(mydata)

Recoding range of numerics into single numeric in R

I am trying to recode a data frame with four columns. Across all of the columns, I want to recode all the numeric values into these ordinal numeric values:
0 stays as is
1:3 <- 1
4:10 <- 2
11:22 <- 3
22:max <-4
This is the data frame:
> df
T4.1 T4.2 T4.3 T4.4
1 0 54 0 5
2 0 5 0 0
3 0 3 0 0
4 0 2 0 0
5 0 3 0 0
6 0 2 0 0
7 0 4 0 0
8 1 20 0 0
9 1 7 0 2
10 0 14 0 0
11 0 3 0 0
12 0 202 0 41
13 2 12 0 0
14 3 6 0 0
15 3 21 0 3
16 0 143 0 0
17 0 0 0 0
18 4 9 0 0
19 3 15 0 0
20 0 58 0 6
21 2 0 0 0
22 0 52 0 0
23 0 3 0 0
24 0 1 0 0
25 4 6 0 1
26 1 4 0 0
27 0 38 0 1
28 0 6 0 0
29 0 8 0 0
30 0 29 0 4
31 1 14 0 0
32 0 12 0 10
33 4 1 0 3
I'm trying to use the recode function, but I can't seem to figure out how to input a range of numeric values into it. I get the following errors with my attempts:
> recode(df, 11:22=3)
Error: unexpected '=' in "recode(df, 11:22="
> recode(df, c(11:22)=3)
Error: unexpected '=' in "recode(df, c(11:22)="
I would greatly appreciate any advice. Thanks for your time!
Edit: Thanks all for the help!!
You can use cut with range of values as:
df_res <- as.data.frame(sapply(df, function(x)cut(x,
breaks = c(-0.5, 0.5, 3.5, 10.5, 22.5, Inf),
labels = c(0, 1, 2, 3, 4)))
)
str(df_res)
#'data.frame': 33 obs. of 4 variables:
# $ T4.1: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 2 2 1 ...
# $ T4.2: Factor w/ 5 levels "0","1","2","3",..: 5 3 2 2 2 2 3 4 3 4 ...
# $ T4.3: Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
# $ T4.4: Factor w/ 4 levels "0","1","2","4": 3 1 1 1 1 1 1 1 2 1 ...
df_res
# T4.1 T4.2 T4.3 T4.4
# 1 0 4 0 2
# 2 0 2 0 0
# 3 0 1 0 0
# 4 0 1 0 0
# 5 0 1 0 0
# 6 0 1 0 0
# 7 0 2 0 0
# 8 1 3 0 0
# 9 1 2 0 1
# 10 0 3 0 0
# 11 0 1 0 0
# 12 0 4 0 4
# 13 1 3 0 0
# 14 1 2 0 0
# 15 1 3 0 1
# 16 0 4 0 0
# 17 0 0 0 0
# 18 2 2 0 0
# 19 1 3 0 0
# 20 0 4 0 2
# 21 1 0 0 0
# 22 0 4 0 0
# 23 0 1 0 0
# 24 0 1 0 0
# 25 2 2 0 1
# 26 1 2 0 0
# 27 0 4 0 1
# 28 0 2 0 0
# 29 0 2 0 0
# 30 0 4 0 2
# 31 1 3 0 0
# 32 0 3 0 2
# 33 2 1 0 1
I find named vectors are a nice pattern for re-coding variables, especially for irregular patterns. You could use one like this here:
decoder <- c(0, rep(1,3), rep(2,7), rep(3, 12))
names(decoder) <- 0:22
sapply(df, function(x) ifelse(x <= 22, decoder[as.character(x)], 4))
If the re-coding was more of a pattern, cut is a useful function.
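
Another interval-based option, not mentioned in the answers above, is base R's findInterval, which returns integer codes directly instead of factors. The sketch below uses a small made-up data frame and a hypothetical helper recode_counts. It assumes non-negative whole numbers and, like the cut answer, puts 22 in group 3 (the question lists 22 in both of the last two ranges).

```r
# Lower bounds of groups 1, 2, 3 and 4; anything below 1 (i.e. 0) maps to 0.
breaks <- c(1, 4, 11, 23)

# Hypothetical helper: 0 -> 0, 1:3 -> 1, 4:10 -> 2, 11:22 -> 3, 23 and up -> 4.
recode_counts <- function(x) findInterval(x, breaks)

# Small made-up sample in the shape of the question's data.
df <- data.frame(T4.1 = c(0, 1, 4, 3),
                 T4.2 = c(54, 5, 3, 22),
                 T4.4 = c(5, 0, 41, 10))

# Assigning into df_res[] keeps the data frame structure and column names.
df_res <- df
df_res[] <- lapply(df, recode_counts)
print(df_res)
```

findInterval(x, breaks) counts how many break points are less than or equal to each value, which is why the lower bounds of the groups are enough to define the recoding.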

Finding the count of Interactions between Members located in the Dataset

I have pass-traffic data showing the passes between Members; here's the sample dataset
It shows the interactions between Members in consecutive rows. I want to count those interactions and obtain a new dataset showing how many interactions occurred between each pair of Members; the direction doesn't matter.
For example:
between 26 and 11 = X
between 26 and 27 = Y
I just can't figure out which function to use or how to write the code for this calculation. Thanks
You could use the rollapply function from the zoo package to find all interactions. The frequency of these interactions can then be calculated using table. (I assume your object is called dat.)
library(zoo)
table(as.data.frame(rollapply(dat[[1]], 2, sort)))
The result:
V2
V1 4 8 10 11 13 17 19 25 26 27 53
4 2 13 17 1 2 5 6 3 1 9 4
8 0 2 14 11 10 4 5 0 13 13 11
10 0 0 3 9 7 2 4 2 8 11 8
11 0 0 0 1 6 5 4 4 5 4 25
13 0 0 0 0 0 1 3 5 7 9 8
17 0 0 0 0 0 0 1 1 1 5 5
19 0 0 0 0 0 0 1 1 1 5 4
25 0 0 0 0 0 0 0 0 5 8 5
26 0 0 0 0 0 0 0 0 1 5 3
27 0 0 0 0 0 0 0 0 0 0 1
53 0 0 0 0 0 0 0 0 0 0 1
The lower triangular part of the matrix contains only zeros because each pair is sorted before counting, so the direction does not matter.
If you are not interested in interactions between the same values, use the following command:
table(as.data.frame(rollapply(rle(dat[[1]])$values, 2, sort)))
V2
V1 8 10 11 13 17 19 25 26 27 53
4 13 17 1 2 5 6 3 1 9 4
8 0 14 11 10 4 5 0 13 13 11
10 0 0 9 7 2 4 2 8 11 8
11 0 0 0 6 5 4 4 5 4 25
13 0 0 0 0 1 3 5 7 9 8
17 0 0 0 0 0 1 1 1 5 5
19 0 0 0 0 0 0 1 1 5 4
25 0 0 0 0 0 0 0 5 8 5
26 0 0 0 0 0 0 0 0 5 3
27 0 0 0 0 0 0 0 0 0 1
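
If you prefer not to depend on zoo, the same counting can be sketched in base R. This is an alternative to the rollapply call above, not the answerer's code: the vector x is made up, and the function name count_pairs and its drop_self argument are hypothetical.

```r
# Made-up sequence of Members; consecutive entries form an interaction.
x <- c(26, 11, 26, 27, 27, 11, 26)

# Hypothetical helper: counts unordered pairs of consecutive values.
count_pairs <- function(x, drop_self = FALSE) {
  if (drop_self) x <- rle(x)$values   # collapse runs of the same Member
  a <- head(x, -1)                    # first member of each consecutive pair
  b <- tail(x, -1)                    # second member of each consecutive pair
  # Putting the smaller id first makes (26, 11) and (11, 26) the same pair.
  table(data.frame(V1 = pmin(a, b), V2 = pmax(a, b)))
}

counts         <- count_pairs(x)
counts_no_self <- count_pairs(x, drop_self = TRUE)
print(counts)
print(counts_no_self)
```

pmin and pmax do the job of sort for each pair, and rle removes self-interactions in the same way as in the second rollapply call.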

How to sum leading diagonal of table in R

I have a table created using the table() command in R:
y
x 0 1 2 3 4 5 6 7 8 9
0 23 0 0 0 0 1 0 0 0 0
1 0 23 1 0 1 0 1 2 0 2
2 1 1 28 0 0 0 1 0 2 2
3 0 1 0 24 0 1 0 0 0 1
4 1 1 0 0 34 0 3 0 0 0
5 0 0 0 0 0 33 0 0 0 0
6 0 0 0 0 0 2 32 0 0 0
7 0 1 0 1 0 0 0 36 0 1
8 1 1 1 1 0 0 0 1 20 1
9 1 3 0 1 0 1 0 1 0 24
This table shows the results of a classification, and I want to sum the leading diagonal of it (the diagonal with the large numbers - like 23, 23, 28 etc). Is there a sensible/easy way to do this in R?
How about sum(diag(tbl)), where tbl is your table?
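
A minimal sketch on made-up data, in case it helps. The vectors x and y are hypothetical, and the accuracy calculation is an extra step not asked for in the question. It assumes the table is square with rows and columns in the same order; if a class is missing from one side, give both variables the same factor levels first.

```r
# Made-up true classes (x) and predicted classes (y).
x <- c(0, 0, 1, 1, 2, 2)
y <- c(0, 1, 1, 1, 2, 0)
tbl <- table(x, y)

# diag() extracts the leading diagonal of the two-way table.
correct  <- sum(diag(tbl))
accuracy <- correct / sum(tbl)

print(tbl)
print(correct)
print(accuracy)
```

For a classification table the diagonal sum is the number of correctly classified cases, so dividing by sum(tbl) gives the overall accuracy.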
