Plotting count data in R

I have counted crashes at intersections and am wondering how to plot this data as a time series. The data were counted over the years 2008 to 2018; the data is found at this link. I am interested in the code and proper technique for plotting the data.
To get the data into table format, use the melt command from the reshape2 package:
> attitudeM <- melt(df)
> head(attitudeM)
variable value
1 F2008 0
2 F2008 1
3 F2008 1
4 F2008 2
5 F2008 0
6 F2008 1
> table(attitudeM)
variable 0 1 2 3 4 5 6 7
F2008 235 38 11 3 0 0 0 0
F2009 244 27 8 6 2 0 0 0
F2010 237 9 31 3 2 2 3 0
F2011 241 33 11 0 1 0 1 0
F2012 246 31 8 1 1 0 0 0
F2013 251 28 7 1 0 0 0 0
F2014 265 16 5 0 0 1 0 0
F2015 261 6 17 0 2 0 1 0
F2016 263 17 5 0 1 0 0 1
F2017 275 7 4 0 0 0 0 1
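Once the counts are tabulated, the yearly totals can be plotted directly. A minimal base-R sketch, using simulated data in place of the linked file (the column names F2008..F2017 are assumed from the table above; one row per intersection, one column per year):

```r
# Simulated stand-in for the linked data: 287 intersections, 10 years
set.seed(1)
df <- as.data.frame(matrix(rpois(2870, 0.2), ncol = 10))
names(df) <- paste0("F", 2008:2017)

# Total crashes per year: sum each year's column across all intersections
totals <- data.frame(year = 2008:2017, crashes = colSums(df))

# Simple time-series plot of the yearly totals
plot(totals$year, totals$crashes, type = "b",
     xlab = "Year", ylab = "Total crashes")
```

With the real data you would read the file in first (e.g. with read.csv) and keep only the yearly count columns before summing.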

Related

How can I calculate the percentage score from test results using tidyverse?

Rather than calculating each individual's score, I want to calculate the percentage of individuals who answered each question correctly. Below is the tibble containing the data: the columns are the candidates, a-r, and the rows are the questions. The data points are the answers given, and the right-hand column, named 'correct', shows the correct answer.
A tibble: 20 x 19
question a b c d e g h i j k l m n o p q r correct
<chr> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
1 001 3 3 3 0 4 0 1 4 4 0 2 3 2 0 3 0 3 1
2 002 2 4 2 3 4 NA 4 2 2 2 4 2 4 3 2 2 3 2
3 003 2 2 2 3 4 2 2 4 4 1 4 3 3 2 4 1 3 2
4 005 2 3 1 3 4 NA 2 4 4 2 4 1 4 2 4 2 2 2
5 006 3 1 2 3 3 NA 2 3 4 2 3 3 3 3 3 NA 3 3
6 008 3 3 3 3 3 1 1 3 3 1 3 3 3 3 3 1 3 3
7 010 4 5 4 3 4 4 4 4 4 3 4 4 5 4 4 3 4 4
8 011 3 3 5 3 3 3 3 3 5 4 5 4 4 3 3 2 5 5
9 013 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0
10 014 0 0 0 2 0 1 0 0 0 0 2 0 2 0 0 0 0 0
11 016 3 3 0 0 4 1 1 4 4 2 3 3 3 3 1 0 3 0
12 017 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
13 019 0 1 0 2 1 1 0 1 0 1 2 2 2 1 0 1 1 0
14 020 0 0 0 0 0 0 0 0 0 0 1 3 0 0 0 0 0 0
15 039 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0
16 041 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
17 045 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 047 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0
19 049 3 3 3 3 4 NA 2 4 x 2 4 3 5 3 1 1 3 3
20 050 0 3 3 0 1 NA 0 3 3 0 x 0 0 0 0 0 3 1
I would like to generate a column 'percentage' that gives the proportion of correct answers for each question. I suspect I need loops or row-wise operations, but I'm so far out of my depth with that that I can't figure out how to compare factors. I've tried mutate(), if_else(), group_by() and much more, but have not managed to get close to an answer.
Any help would be greatly appreciated.
If your data.frame is called data, you may try:
library(dplyr)
data %>%
  rowwise() %>%
  mutate(percentage = sum(c_across(a:r) == correct, na.rm = TRUE) /  # na.rm: NAs count as incorrect
           length(c_across(a:r)))
You can try this solution using a loop:
#Code
#First select the range of individuals a to r
index <- 2:18
#Create empty variables to save the results
df$Count <- NA
df$Prop <- NA
#Apply function
for (i in seq_len(nrow(df))) {
  x <- df[i, index]
  count <- length(which(x == df$correct[i]))
  percentage <- count / length(index)
  #Assign
  df$Count[i] <- count
  df$Prop[i] <- percentage
}
Output:
question a b c d e g h i j k l m n o p q r correct Count Prop
1 1 3 3 3 0 4 0 1 4 4 0 2 3 2 0 3 0 3 1 1 0.05882353
2 2 2 4 2 3 4 NA 4 2 2 2 4 2 4 3 2 2 3 2 8 0.47058824
3 3 2 2 2 3 4 2 2 4 4 1 4 3 3 2 4 1 3 2 6 0.35294118
4 5 2 3 1 3 4 NA 2 4 4 2 4 1 4 2 4 2 2 2 6 0.35294118
5 6 3 1 2 3 3 NA 2 3 4 2 3 3 3 3 3 NA 3 3 10 0.58823529
6 8 3 3 3 3 3 1 1 3 3 1 3 3 3 3 3 1 3 3 13 0.76470588
7 10 4 5 4 3 4 4 4 4 4 3 4 4 5 4 4 3 4 4 12 0.70588235
8 11 3 3 5 3 3 3 3 3 5 4 5 4 4 3 3 2 5 5 4 0.23529412
9 13 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 14 0.82352941
10 14 0 0 0 2 0 1 0 0 0 0 2 0 2 0 0 0 0 0 13 0.76470588
11 16 3 3 0 0 4 1 1 4 4 2 3 3 3 3 1 0 3 0 3 0.17647059
12 17 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 15 0.88235294
13 19 0 1 0 2 1 1 0 1 0 1 2 2 2 1 0 1 1 0 5 0.29411765
14 20 0 0 0 0 0 0 0 0 0 0 1 3 0 0 0 0 0 0 15 0.88235294
15 39 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 14 0.82352941
16 41 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 14 0.82352941
17 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 1.00000000
18 47 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0 15 0.88235294
19 49 3 3 3 3 4 NA 2 4 NA 2 4 3 5 3 1 1 3 3 7 0.41176471
20 50 0 3 3 0 1 NA 0 3 3 0 NA 0 0 0 0 0 3 1 1 0.05882353
Your answers contained some x values, so I replaced them with NA to make the loop work.
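A vectorized sketch without a loop or rowwise() is also possible, assuming the answer columns are stored as character. Note that rowMeans(..., na.rm = TRUE) divides by the number of non-missing answers, unlike a loop that always divides by the number of candidates. A toy example with three candidates:

```r
# Toy data in the same shape: 3 questions, candidates a-c, plus 'correct'
df <- data.frame(question = c("001", "002", "003"),
                 a = c("3", "2", "2"),
                 b = c("3", "4", "2"),
                 c = c("1", "2", "4"),
                 correct = c("1", "2", "2"),
                 stringsAsFactors = FALSE)

# Compare every candidate column to 'correct' at once; the comparison
# yields a logical matrix, and rowMeans gives the proportion per question
ans <- df[, 2:4]
df$Prop <- rowMeans(ans == df$correct, na.rm = TRUE)
```

On the real data the candidate columns would be positions 2:18, and factor columns should be converted to character before comparing.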

Data cleaning for plotting data frames

I am currently working with survey data in RStudio. I originally had two CSV files, but I merged them into one. Both files contained sample IDs. The first file also contains bivariate info, while the second contains a rating as a continuous variable.
Here is a sample of the data
ID O1 O2 O3 O4 O5 O6 O7 O8 S1 S2 S3 S4 S5 S6 S7 S8
22 0 1 0 1 0 1 0 1 4 6 2 6 4 3 6 2
23 0 1 0 0 1 1 0 1 5 6 10 4 5 7 7 6
24 0 1 1 0 1 0 0 1 7 4 7 8 7 6 3 9
25 0 0 1 1 0 0 1 1 3 5 5 7 4 6.9 6 5
26 0 1 0 0 1 1 0 1 2 2.5 7 5 4 5 4 3
27 0 1 1 1 0 1 0 0 6 3 4 6 5 6 5 6
28 0 1 1 1 0 0 0 1 7 4 2 8 2 1 4 5
29 0 0 1 0 1 1 1 0 2 5 1 2 4 3 2 2
30 0 1 0 1 1 1 0 0 8 2 6 7 1 7 5 4
31 0 0 0 1 0 1 1 1 7 4 3 2 4 5 7 2
32 0 0 1 0 0 1 1 1 4 7 5 3 1 6 2 3
33 0 1 1 0 1 1 0 0 7 4 5 8 8 5 6 7
For example, the 0 in O1 corresponds to the 4 in S1.
I want to make a loop that will sum all of the S values corresponding to O values of 0 and 1:
if the value in O1 is 0, add the value in S1 to "sum of 0"
if the value in O1 is 1, add the value in S1 to "sum of 1"
repeat for all columns to get a total value for 0 and for 1.
Any strategies or tips would be helpful going forward!
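Since each Ok column pairs with the Sk column in the same position, the two sums can be sketched without an explicit loop by treating the O and S blocks as matrices. The toy values below are illustrative, not the real survey data:

```r
# Two toy rows in the same shape as the sample data
df <- data.frame(ID = 22:23,
                 O1 = c(0, 0), O2 = c(1, 1),
                 S1 = c(4, 5), S2 = c(6, 6))

o <- as.matrix(df[, c("O1", "O2")])   # 0/1 indicator columns
s <- as.matrix(df[, c("S1", "S2")])   # matching rating columns

sum_of_0 <- sum(s[o == 0])   # ratings where the indicator is 0
sum_of_1 <- sum(s[o == 1])   # ratings where the indicator is 1
```

With the full data you would select O1:O8 and S1:S8, e.g. with df[, paste0("O", 1:8)]; the logical subscript s[o == 0] works because the two matrices have identical dimensions.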

How to create new columns in R every time a given value appears?

I have a question regarding creating new columns if a certain value appears in an existing row.
N <- 5
T <- 5
time <- rep(1:T, times = N)
id <- rep(1:N, each = T)
dummy <- c(0,0,1,1,0, 0,0,1,0,0, 0,1,0,1,0, 0,0,0,0,0, 1,0,0,1,0)
df <- data.frame(id, time, dummy)
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 1 4 1
5 1 5 0
6 2 1 0
7 2 2 0
8 2 3 1
9 2 4 0
10 2 5 0
11 3 1 0
12 3 2 1
13 3 3 0
14 3 4 1
15 3 5 0
16 4 1 0
17 4 2 0
18 4 3 0
19 4 4 0
20 4 5 0
21 5 1 1
22 5 2 0
23 5 3 0
24 5 4 1
25 5 5 0
In this case some cross-sections contain more than one 1. I am now trying to create a new dummy variable/column for each additional 1. After that, for each dummy, the rows of each cross-section should also be filled with 1s once the first 1 appears. I can fill the rows by using group_by(id) and the cummax function on each column, but how do I get the new variables without going through every cross-section manually? So I want to achieve the following:
id time dummy dummy2
1 1 1 0 0
2 1 2 0 0
3 1 3 1 0
4 1 4 1 1
5 1 5 1 1
6 2 1 0 0
7 2 2 0 0
8 2 3 1 0
9 2 4 1 0
10 2 5 1 0
11 3 1 0 0
12 3 2 1 0
13 3 3 1 0
14 3 4 1 1
15 3 5 1 1
16 4 1 0 0
17 4 2 0 0
18 4 3 0 0
19 4 4 0 0
20 4 5 0 0
21 5 1 1 0
22 5 2 1 0
23 5 3 1 0
24 5 4 1 1
25 5 5 1 1
Thanks! :)
You can use cummax, and you need cumsum to create dummy2:
library(dplyr)

df %>%
  group_by(id) %>%
  mutate(dummy1 = cummax(dummy),  # don't alter 'dummy' here; we need it in the next line
         dummy2 = cummax(cumsum(dummy) == 2)) %>%
  as.data.frame()  # needed only to display the entire result
# id time dummy dummy1 dummy2
#1 1 1 0 0 0
#2 1 2 0 0 0
#3 1 3 1 1 0
#4 1 4 1 1 1
#5 1 5 0 1 1
#6 2 1 0 0 0
#7 2 2 0 0 0
#8 2 3 1 1 0
#9 2 4 0 1 0
#10 2 5 0 1 0
#11 3 1 0 0 0
#12 3 2 1 1 0
#13 3 3 0 1 0
#14 3 4 1 1 1
#15 3 5 0 1 1
#16 4 1 0 0 0
#17 4 2 0 0 0
#18 4 3 0 0 0
#19 4 4 0 0 0
#20 4 5 0 0 0
#21 5 1 1 1 0
#22 5 2 0 1 0
#23 5 3 0 1 0
#24 5 4 1 1 1
#25 5 5 0 1 1
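If a cross-section can contain more than two 1s, the same idea extends without naming each column by hand. A base-R sketch on the same example data (it assumes at least one id has two or more 1s):

```r
# Rebuild the example data
N <- 5; T <- 5
df <- data.frame(id = rep(1:N, each = T),
                 time = rep(1:T, times = N),
                 dummy = c(0,0,1,1,0, 0,0,1,0,0, 0,1,0,1,0,
                           0,0,0,0,0, 1,0,0,1,0))

# Running count of 1s within each id
csum <- ave(df$dummy, df$id, FUN = cumsum)

# One extra column per additional 1: dummyK switches on at the K-th 1
# (csum is nondecreasing within id, so no extra cummax is needed)
for (k in 2:max(csum)) {
  df[[paste0("dummy", k)]] <- as.integer(csum >= k)
}

# Finally fill the original dummy forward within each id
df$dummy <- ave(df$dummy, df$id, FUN = cummax)
```

This reproduces the desired output above, and creates dummy3, dummy4, ... automatically if some id ever has three or more 1s.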

Quick way to count the number of positional matches of a given character between all row pairs

I have a matrix, and I want to count, for every pair of rows, the number of times each character appears in the same position in both rows.
An example of my current approach is below, but my matrix has 10,000 rows and it's taking too long.
# This code generates a data frame with one row per pair and columns that
# count the number of positional matches for each letter
my_letters <- c("A", "B", "C", "D")
size_vector <- 175
n_vectors <- 10
indexes_vectors <- seq_len(n_vectors)
mtx <- sapply(indexes_vectors,
              function(i) sample(my_letters, n_vectors, replace = TRUE))
rownames(mtx) <- indexes_vectors
df <- as.data.frame(t(combn(indexes_vectors, m = 2)))
colnames(df) <- c("index_1", "index_2")
for (l in my_letters) {
  cat(l, "\n")
  df[, l] <- apply(df[, 1:2], 1,
                   function(ids) {
                     sum(mtx[ids[1], ] == mtx[ids[2], ] &
                           mtx[ids[1], ] == l, na.rm = TRUE)
                   })
}
m1 <- t(sapply(1:nrow(df), function(i)
  table(factor(mtx[df[i, 1], ][mtx[df[i, 1], ] == mtx[df[i, 2], ]],
               levels = my_letters))))
cbind(df, m1)
> V1 V2 A B C D
1 1 2 0 0 1 1
2 1 3 1 0 1 1
3 1 4 1 0 2 1
4 1 5 0 0 1 0
5 1 6 2 0 2 0
6 1 7 0 0 1 0
7 1 8 1 0 1 1
8 1 9 0 0 1 0
9 1 10 1 0 1 1
10 2 3 0 0 1 1
11 2 4 1 1 1 2
12 2 5 0 0 0 1
13 2 6 1 0 2 1
14 2 7 1 0 0 1
15 2 8 1 0 0 0
16 2 9 2 0 0 0
17 2 10 1 0 1 0
18 3 4 0 0 0 0
19 3 5 0 2 1 0
20 3 6 1 1 2 1
21 3 7 0 1 0 0
22 3 8 1 1 0 0
23 3 9 0 1 2 0
24 3 10 0 0 1 0
25 4 5 1 1 0 1
26 4 6 2 1 1 0
27 4 7 1 0 1 1
28 4 8 0 1 0 0
29 4 9 1 0 0 0
30 4 10 2 0 0 0
31 5 6 0 2 0 0
32 5 7 0 1 3 1
33 5 8 0 1 2 0
34 5 9 1 0 2 0
35 5 10 0 0 2 0
36 6 7 0 0 0 0
37 6 8 1 1 0 0
38 6 9 0 0 1 0
39 6 10 3 0 1 0
40 7 8 0 1 1 0
41 7 9 1 0 1 0
42 7 10 0 0 1 0
43 8 9 1 1 1 1
44 8 10 0 0 1 0
45 9 10 0 0 0 0
I don't know if this will perform well, but it's one option:
library(data.table)
matchDT = setDT(melt(mtx))[,
  CJ(row1 = Var1, row2 = Var1)[row1 < row2], by = .(value, col = Var2)]
dcast(matchDT, row1 + row2 ~ value)
This excludes row combos with no matches. To get them back, maybe...
levs = seq_len(nrow(mtx))
dcast(matchDT, factor(row1, levels = levs) + factor(row2, levels = levs) ~ value, drop = FALSE)[as.integer(row1) < as.integer(row2)]
# Warning: Aggregate function missing, defaulting to 'length'
row1 row2 A B C D
1: 1 2 1 0 2 0
2: 1 3 1 0 1 1
3: 1 4 1 1 0 1
4: 1 5 0 1 1 0
5: 1 6 1 0 1 1
6: 1 7 0 0 1 0
7: 1 8 0 2 1 0
8: 1 9 1 2 2 1
9: 1 10 0 1 1 0
10: 2 3 2 0 0 0
11: 2 4 1 0 1 0
12: 2 5 0 1 1 0
13: 2 6 1 0 1 1
14: 2 7 0 0 1 0
15: 2 8 2 0 1 0
16: 2 9 1 0 1 0
17: 2 10 1 0 1 0
18: 3 4 0 0 0 2
19: 3 5 0 0 0 0
20: 3 6 1 0 0 2
21: 3 7 1 1 1 0
22: 3 8 1 0 0 1
23: 3 9 1 1 0 0
24: 3 10 1 0 1 0
25: 4 5 0 2 1 0
26: 4 6 0 1 0 2
27: 4 7 0 0 0 0
28: 4 8 1 1 0 2
29: 4 9 0 2 0 0
30: 4 10 0 2 1 0
31: 5 6 0 1 1 0
32: 5 7 0 2 1 0
33: 5 8 0 1 0 1
34: 5 9 0 1 1 0
35: 5 10 0 2 1 1
36: 6 7 0 1 2 1
37: 6 8 0 0 0 1
38: 6 9 1 1 1 0
39: 6 10 0 1 0 0
40: 7 8 0 0 1 0
41: 7 9 0 0 1 0
42: 7 10 0 1 2 0
43: 8 9 1 2 1 0
44: 8 10 1 1 1 1
45: 9 10 0 2 1 0
row1 row2 A B C D
A possible solution with base R:
l1 <- lapply(split(df, 1:nrow(df)), as.integer)
l2 <- lapply(l1, function(x) {
  m <- mtx[x[1], ] == mtx[x[2], ]
  l <- lapply(my_letters, '==', mtx[x[1], ])
  sapply(l, function(i) sum(i & m))
})
cbind(df, setNames(do.call(rbind.data.frame, l2), my_letters))
which gives:
index_1 index_2 A B C D
1 1 2 0 0 0 0
2 1 3 0 0 2 1
3 1 4 0 0 0 1
4 1 5 0 1 2 0
5 1 6 0 0 3 1
6 1 7 0 1 1 3
7 1 8 0 1 2 2
8 1 9 0 0 2 1
9 1 10 0 0 2 0
10 2 3 0 1 0 1
11 2 4 0 1 0 2
12 2 5 0 1 0 0
13 2 6 0 0 0 2
14 2 7 0 1 0 1
15 2 8 1 0 0 0
16 2 9 0 1 0 2
17 2 10 2 1 0 3
18 3 4 0 0 1 0
19 3 5 0 0 1 1
20 3 6 0 0 1 1
21 3 7 0 1 1 2
22 3 8 0 0 0 1
23 3 9 1 0 0 0
24 3 10 0 0 0 1
25 4 5 0 2 1 0
26 4 6 0 0 1 1
27 4 7 1 1 0 1
28 4 8 1 1 1 1
29 4 9 0 1 1 2
30 4 10 0 1 0 2
31 5 6 0 1 2 0
32 5 7 0 1 1 0
33 5 8 0 2 1 0
34 5 9 0 1 2 0
35 5 10 0 2 1 0
36 6 7 1 0 1 1
37 6 8 0 0 3 1
38 6 9 0 1 2 0
39 6 10 0 0 1 1
40 7 8 0 1 0 2
41 7 9 0 1 0 1
42 7 10 0 0 0 1
43 8 9 0 0 2 1
44 8 10 1 1 1 0
45 9 10 0 0 2 1
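For 10,000 rows the pairwise apply() is the bottleneck. One scalable sketch: per letter, build a 0/1 indicator matrix and take tcrossprod(), which gives for every pair of rows the number of positions where both rows carry that letter. This reproduces the per-letter counts without looping over pairs (the simulated mtx below stands in for the real data):

```r
set.seed(42)
my_letters <- c("A", "B", "C", "D")
mtx <- matrix(sample(my_letters, 10 * 175, replace = TRUE),
              nrow = 10, ncol = 175)

# For each letter: indicator matrix, then row-pair co-occurrence counts.
# Entry (i, j) = number of positions where rows i and j both equal l.
pair_counts <- lapply(my_letters, function(l) tcrossprod((mtx == l) * 1L))
names(pair_counts) <- my_letters

# Assemble the upper-triangle pairs into the same data frame layout
pairs <- t(combn(nrow(mtx), 2))
res <- data.frame(index_1 = pairs[, 1], index_2 = pairs[, 2])
for (l in my_letters) res[[l]] <- pair_counts[[l]][pairs]
```

The cost is a handful of dense matrix products instead of choose(n, 2) vector comparisons; for very large n the n-by-n result matrices themselves become the memory limit, so you may need to process pairs in blocks.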

How to count number of particular values

My data looks like this:
ID CO MV
1 0 1
1 5 0
1 0 1
1 9 0
1 8 0
1 0 1
2 69 0
2 0 1
2 8 0
2 0 1
2 78 0
2 53 0
2 0 1
2 3 0
3 54 0
3 0 1
3 8 0
3 90 0
3 0 1
3 56 0
4 0 1
4 56 0
4 0 1
4 45 0
4 0 1
4 34 0
4 31 0
4 0 1
4 45 0
5 0 1
5 0 1
5 67 0
I want it to look like this:
ID CO MV CONUM
1 0 1 3
1 5 0 3
1 0 1 3
1 9 0 3
1 8 0 3
1 0 1 3
2 69 0 5
2 0 1 5
2 8 0 5
2 0 1 5
2 78 0 5
2 53 0 5
2 0 1 5
2 3 0 5
3 54 0 4
3 0 1 4
3 8 0 4
3 90 0 4
3 0 1 4
3 56 0 4
4 0 1 5
4 56 0 5
4 0 1 5
4 45 0 5
4 0 1 5
4 34 0 5
4 31 0 5
4 0 1 5
4 45 0 5
5 0 1 1
5 0 1 1
5 67 0 1
I want to create a column CONUM which is the total number of non-zero values in the CO column for each value in the ID column. For example, the CO column for ID 1 has 3 values other than zero, so the corresponding value in the CONUM column is 3. The MV column is 0 if the CO column has a value and 1 if the CO column is 0, so another way to create the CONUM column would be to count the number of zeros in MV per ID. It would be great if you could help me with the R code to accomplish this. Thanks.
Here is an option with data.table:
library(data.table)
setDT(df)[, CONUM := sum(CO != 0), ID][]
You can use ave in base R:
dat <- transform(dat, CONUM = ave(as.logical(CO), ID, FUN = sum))
and an option with dplyr:
# install.packages("dplyr")
library(dplyr)
dat <- dat %>%
  group_by(ID) %>%
  mutate(CONUM = sum(CO != 0))
