How to merge three tables by interleaving them in R?

I have the following data frame. I want to see how values evolve from RIK_T1 to RIK_T2 by looking at their frequencies, row percentages and column percentages. How can I show them all at once?
ID<-c('1','2','3','4','5','6','7','8','9','10')
RIK_T1<-c('20','15','20','20','97','20','20','20','15','15')
RIK_T2<-c('20','15','15','20','97','97','20','20','20','20')
df<-data.frame(ID,RIK_T1,RIK_T2)
df
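TAB <- table(df$RIK_T1, df$RIK_T2)            # cross-tabulate T1 (rows) by T2 (columns)
t1 <- addmargins(TAB)                          # TABLE-01: counts with row/column sums
TAB_row <- prop.table(TAB, 1)                  # proportions within each row
t2 <- round(addmargins(TAB_row), digits = 2)   # TABLE-01-1: row proportions
TAB_col <- prop.table(TAB, 2)                  # proportions within each column
t3 <- round(addmargins(TAB_col), digits = 2)   # TABLE-01-2: column proportions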
This gives three tables: the counts, the row proportions and the column proportions.
Counts (t1):
       15    20    97   Sum
15      1     2     0     3
20      1     4     1     6
97      0     0     1     1
Sum     2     6     2    10
Row proportions (t2):
       15    20    97   Sum
15   0.33  0.67  0.00  1.00
20   0.17  0.67  0.17  1.00
97   0.00  0.00  1.00  1.00
Sum  0.50  1.33  1.17  3.00
Column proportions (t3):
       15    20    97   Sum
15   0.50  0.33  0.00  0.83
20   0.50  0.67  0.50  1.67
97   0.00  0.00  0.50  0.50
Sum  1.00  1.00  1.00  3.00
Is it possible to merge them into one table like this?
             15          20          97         Sum
          R%/C%       R%/C%       R%/C%       R%/C%
15            1           2           0           3
      0.33/0.50   0.67/0.33   0.00/0.00   1.00/0.83
20            1           4           1           6
      0.17/0.50   0.67/0.67   0.17/0.50   1.00/1.67
97            0           0           1           1
      0.00/0.00   0.00/0.00   1.00/0.50   1.00/0.50
Sum           2           6           2          10
      0.50/1.00   1.33/1.00   1.17/1.00   3.00/3.00
Thanks in advance.
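A minimal base-R sketch of one way to get this layout: paste the rounded row and column proportions into "R%/C%" strings, then interleave each count row with its proportion row. It rebuilds t1, t2 and t3 as in the question; the names rc and out are illustrative, the result is a character matrix meant only for display, and the second "R%/C%" header line is not reproduced.

```r
# Data from the question
df <- data.frame(
  ID     = as.character(1:10),
  RIK_T1 = c('20','15','20','20','97','20','20','20','15','15'),
  RIK_T2 = c('20','15','15','20','97','97','20','20','20','20')
)

TAB <- table(df$RIK_T1, df$RIK_T2)
t1  <- addmargins(TAB)                           # counts with margins
t2  <- round(addmargins(prop.table(TAB, 1)), 2)  # row proportions
t3  <- round(addmargins(prop.table(TAB, 2)), 2)  # column proportions

# "R%/C%" string for every cell, with the same layout and dimnames as t1
rc <- matrix(paste(sprintf("%.2f", as.vector(t2)),
                   sprintf("%.2f", as.vector(t3)), sep = "/"),
             nrow = nrow(t1), dimnames = dimnames(t1))

# For each row label: the count row, followed by its R%/C% row
out <- do.call(rbind, lapply(rownames(t1), function(r) {
  rbind(as.character(t1[r, ]), rc[r, ])
}))
rownames(out) <- as.vector(rbind(rownames(t1), ""))  # "15", "", "20", "", ...
colnames(out) <- colnames(t1)
print(out, quote = FALSE, right = TRUE)
```

The proportions are rounded before pasting, so every cell matches the t2 and t3 printouts above.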

Related

Scrape Data into R

I'm currently trying to scrape the Player Standard Stats table into R but am having trouble getting the right table.
html_link <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard::1"
df <- html_link %>%
xml2::read_html() %>%
rvest::html_nodes("table") %>%
rvest::html_table(fill = T)
The page offers a "copy link to clipboard" option for the table, so I tried to scrape the data from that link, but I'm not getting the right table. Does anyone know how to do this automatically in R, without having to download the CSV file?
Thanks.
You can use the table's "embed link" instead:
library(rvest)  # also re-exports the %>% pipe used below
url <- "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fstats%2FPremier-League-Stats&div=div_stats_standard"
f <- url %>%
  xml2::read_html() %>%
  rvest::html_nodes('table') %>%
  rvest::html_table() %>%
  .[[1]]   # keep the first (only) table
> head(f)
1 Rk Player Nation Pos Squad Age Born
2 1 Patrick van Aanholt nl NED DF Crystal Palace 30-170 1990
3 2 Tammy Abraham eng ENG FW Chelsea 23-136 1997
4 3 Che Adams eng ENG FW Southampton 24-217 1996
5 4 Tosin Adarabioyo eng ENG DF Fulham 23-144 1997
6 5 Adrián es ESP GK Liverpool 34-043 1987
Playing Time Playing Time Playing Time Playing Time Performance
1 MP Starts Min 90s Gls
2 14 13 1,144 12.7 0
3 18 10 957 10.6 6
4 22 20 1,735 19.3 4
5 19 19 1,710 19.0 0
6 2 2 180 2.0 0
Performance Performance Performance Performance Performance
1 Ast G-PK PK PKatt CrdY
2 1 0 0 0 1
3 1 6 0 0 0
4 4 4 0 0 1
5 0 0 0 0 1
6 0 0 0 0 0
Performance Per 90 Minutes Per 90 Minutes Per 90 Minutes
1 CrdR Gls Ast G+A
2 0 0.00 0.08 0.08
3 0 0.56 0.09 0.66
4 0 0.21 0.21 0.41
5 0 0.00 0.00 0.00
6 0 0.00 0.00 0.00
Per 90 Minutes Per 90 Minutes Expected Expected Expected Expected
1 G-PK G+A-PK xG npxG xA npxG+xA
2 0.00 0.08 0.8 0.8 0.8 1.6
3 0.56 0.66 5.5 5.5 0.9 6.3
4 0.21 0.41 5.1 5.1 4.3 9.4
5 0.00 0.00 0.8 0.8 0.1 0.9
6 0.00 0.00 0.0 0.0 0.0 0.0
Per 90 Minutes Per 90 Minutes Per 90 Minutes Per 90 Minutes
1 xG xA xG+xA npxG
2 0.06 0.06 0.12 0.06
3 0.51 0.08 0.60 0.51
4 0.26 0.22 0.49 0.26
5 0.04 0.01 0.05 0.04
6 0.00 0.00 0.00 0.00
Per 90 Minutes
1 npxG+xA Matches
2 0.12 Matches
3 0.60 Matches
4 0.49 Matches
5 0.05 Matches
6 0.00 Matches

Cumulative sum based on factor in R

I have the following dataset. While Factor is 0, I need to accumulate Variable.1 as a running sum; when I reach a row where Factor != 0, that row should get the accumulated sum (including its own value), and the sum then restarts.
I've tried the loop below, but it didn't work at all:
for(i in dataset$Variable.1) {
ifelse(dataset$Factor == 0,
dataset$teste <- dataset$Variable.1 + i,
dataset$teste <- dataset$Variable.1)
i<- dataset$Variable.1
print(i)
}
Any ideas?
Below is an example of the dataset; I want to get the "Result" column.
In the real data, I also have a negative factor (-1).
Date Factor Variable.1 Result
1 03/02/2018 0 0.75 0.75
2 04/02/2018 0 0.75 1.50
3 05/02/2018 1 0.96 2.46
4 06/02/2018 1 0.76 0.76
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.70 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0.00 0.00
9 11/02/2018 0 0.00 0.00
10 12/02/2018 0 0.20 0.20
11 13/02/2018 0 0.13 0.33
12 14/02/2018 0 1.64 1.97
13 15/02/2018 0 0.03 2.00
14 16/02/2018 1 0.51 2.51
15 17/02/2018 1 0.00 0.00
16 18/02/2018 0 0.00 0.00
17 19/02/2018 0 0.83 0.83
18 20/02/2018 1 0.42 1.25
19 21/02/2018 1 0.17 0.17
20 22/02/2018 1 0.97 0.97
21 23/02/2018 0 0.92 0.92
22 24/02/2018 0 0.00 0.92
23 25/02/2018 0 0.00 0.92
24 26/02/2018 1 0.19 1.11
25 27/02/2018 1 0.87 0.87
26 28/02/2018 1 0.85 0.85
27 01/03/2018 1 1.95 1.95
28 02/03/2018 1 0.54 0.54
29 03/03/2018 1 0.00 0.00
30 04/03/2018 0 0.00 0.00
31 05/03/2018 0 1.17 1.17
32 06/03/2018 1 0.25 1.42
33 07/03/2018 1 1.45 1.45
Thanks in advance.
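If you want to stick with a for loop, you can try this code (assuming your data frame is called DF). Testing Factor != 0 rather than Factor == 1 also restarts the sum on the negative factor (-1):
DF$Result <- NA
prev <- 0                                # running total carried over from earlier rows
for (i in seq_len(nrow(DF))) {
  DF$Result[i] <- DF$Variable.1[i] + prev
  if (DF$Factor[i] != 0)
    prev <- 0                            # non-zero factor: restart the sum
  else
    prev <- DF$Result[i]                 # zero factor: keep accumulating
}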
Iteratively, try something like:
a=as.data.frame(cbind(Factor=c(0,0,1,1,0,1,1,
rep(0,3),1),Variable.1=c(0.75,0.75,0.96,0.71,1.35,0.7,
0.75,0.96,0.71,1.35,0.7)))
Result=0
aux=NULL
for (i in 1:nrow(a)){
if (a$Factor[i]==0){
Result=Result+a$Variable.1[i]
aux=c(aux,Result)
} else{
Result=Result+a$Variable.1[i]
aux=c(aux,Result)
Result=0
}
}
a$Results=aux
a
Factor Variable.1 Results
1 0 0.75 0.75
2 0 0.75 1.50
3 1 0.96 2.46
4 1 0.71 0.71
5 0 1.35 1.35
6 1 0.70 2.05
7 1 0.75 0.75
8 0 0.96 0.96
9 0 0.71 1.67
10 0 1.35 3.02
11 1 0.70 3.72
A possibility using dplyr (tidyverse) and data.table's rleid():
library(dplyr)
library(data.table)
df %>%
mutate(temp = ifelse(Factor == 1 & lag(Factor) == 1, NA, 1), #Marking the rows after the first 1 in "Factor" as NA
temp = ifelse(!is.na(temp), rleid(temp), NA)) %>% #Run length along non-NA values
group_by(temp) %>% #Grouping by run length
mutate(Result = ifelse(!is.na(temp), cumsum(Variable.1), Variable.1)) %>% #Cumulative sum of desired rows
ungroup() %>%
select(-temp) #Removing the redundant variable
Date Factor Variable.1 Result
<chr> <int> <dbl> <dbl>
1 03/02/2018 0 0.750 0.750
2 04/02/2018 0 0.750 1.50
3 05/02/2018 1 0.960 2.46
4 06/02/2018 1 0.760 0.760
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.700 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0. 0.
9 11/02/2018 0 0. 0.
10 12/02/2018 0 0.200 0.200
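For comparison, here is a vectorised base-R sketch (not one of the answers above): every row that follows a row with a non-zero Factor starts a new group, and ave() takes a running sum within each group. Because it tests != 0, the negative factor (-1) also restarts the sum. The helper name cum_reset is hypothetical; the sample vectors are the 11 rows used in the second answer.

```r
# Running sum that restarts after every row whose flag is non-zero
cum_reset <- function(value, flag) {
  # TRUE on each row that directly follows a non-zero flag (first row: FALSE)
  starts_new <- c(FALSE, head(flag != 0, -1))
  group <- cumsum(starts_new)          # group id increases at every restart
  ave(value, group, FUN = cumsum)      # cumulative sum within each group
}

Factor     <- c(0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1)
Variable.1 <- c(0.75, 0.75, 0.96, 0.71, 1.35, 0.70, 0.75, 0.96, 0.71, 1.35, 0.70)
cum_reset(Variable.1, Factor)
```

On a data frame this would be, e.g., DF$Result <- cum_reset(DF$Variable.1, DF$Factor).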

Object not found in for loop

I'm trying to figure out why this doesn't work:
data=read.csv("data_risk.csv")
pa1 = c(data$pa1)
pa2 = c(data$pa2)
pb1 = c(data$pb1)
pb2 = c(data$pb2)
a1 = c(data$a1)
a2= c(data$a2)
b1 = c(data$b1)
b2 = c(data$b2)
yy=c(data$choice)
crra=function(x,r){
u=x^(1-r)/(1-r)
return(u)
}
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
LL_all = c()
R<-seq(0,1,0.01)
for (r in R){
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
probA = eua/(eua+eub)
total = ifelse(yy==1, probA, 1-probA)
LL=log(prod(total))
LL_all=c(LL_all,LL)
}
Every time I run it, I get the error "object 'r' not found" (or "object 'R' not found"). It works fine without the for loop, but as soon as I add the loop it breaks down.
I'm trying to find the value of r that maximises someone's utility given two choices. A decision maker chooses option A over B with probability EUA/(EUA+EUB). Here r is the risk-aversion coefficient and x is the outcome of the lottery.
pa1 = probability of event a1 happening
pa2 = probability of event a2 happening
pb1 = probability of event b1 happening
pb2 = probability of event b2 happening
a1,a2,b1,b2 = outcomes of events
yy= indicator function that takes the value of 1 if lottery a is chosen and 0 otherwise
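Written out, the model described above is the following, with i indexing tasks and y_i = yy. Since log(prod(total)) equals the sum of the logs, the loop computes ℓ(r):

$$u(x; r) = \frac{x^{1-r}}{1-r}, \qquad EU_{A,i}(r) = p_{a1,i}\,u(a_{1,i}; r) + p_{a2,i}\,u(a_{2,i}; r), \qquad EU_{B,i}(r) = p_{b1,i}\,u(b_{1,i}; r) + p_{b2,i}\,u(b_{2,i}; r)$$

$$P_i(A) = \frac{EU_{A,i}(r)}{EU_{A,i}(r) + EU_{B,i}(r)}, \qquad \ell(r) = \sum_i \Big[\, y_i \log P_i(A) + (1 - y_i)\log\big(1 - P_i(A)\big) \Big]$$

Note that u(x; r) as written is undefined at r = 1 (CRRA utility is conventionally taken to be log x there), so crra() divides by zero for the last value of seq(0, 1, 0.01).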
dataset:
task pa1 a1 pa2 a2 pb1 b1 pb2 b2 choice
1 0.34 24 0.66 59 0.42 47 0.58 64 0
2 0.88 79 0.12 82 0.20 57 0.80 94 0
3 0.74 62 0.26 0 0.44 23 0.56 31 1
4 0.05 56 0.95 72 0.95 68 0.05 95 1
5 0.25 84 0.75 43 0.43 7 0.57 97 0
6 0.28 7 0.72 74 0.71 55 0.29 63 0
7 0.09 56 0.91 19 0.76 13 0.24 90 0
8 0.63 41 0.37 18 0.98 56 0.02 8 0
9 0.88 72 0.12 29 0.39 67 0.61 63 1
10 0.61 37 0.39 50 0.60 6 0.40 45 1
11 0.08 54 0.92 31 0.15 44 0.85 29 1
12 0.92 63 0.08 5 0.63 43 0.37 53 1
13 0.78 32 0.22 99 0.32 39 0.68 56 0
14 0.16 66 0.84 23 0.79 15 0.21 29 1
15 0.12 52 0.88 73 0.98 92 0.02 19 0
16 0.29 88 0.71 78 0.29 53 0.71 91 1
17 0.31 39 0.69 51 0.84 16 0.16 91 1
18 0.17 70 0.83 65 0.35 100 0.65 50 0
19 0.91 80 0.09 19 0.64 37 0.36 65 1
20 0.09 83 0.91 67 0.48 77 0.52 6 1
21 0.44 14 0.56 72 0.21 9 0.79 31 1
22 0.68 41 0.32 65 0.85 100 0.15 2 0
23 0.38 40 0.62 55 0.14 26 0.86 96 0
24 0.62 1 0.38 83 0.41 37 0.59 24 1
25 0.49 15 0.51 50 0.94 64 0.06 14 0
26 0.10 40 0.90 32 0.10 77 0.90 2 1
27 0.20 40 0.80 32 0.20 77 0.80 2 1
28 0.30 40 0.70 32 0.30 77 0.70 2 1
29 0.40 40 0.60 32 0.40 77 0.60 2 1
30 0.50 40 0.50 32 0.50 77 0.50 2 0
31 0.60 40 0.40 32 0.60 77 0.40 2 0
32 0.70 40 0.30 32 0.70 77 0.30 2 0
33 0.80 40 0.20 32 0.80 77 0.20 2 0
34 0.90 40 0.10 32 0.90 77 0.10 2 0
35 1.00 40 0.00 32 1.00 77 0.00 2 0
The problem is in this piece of code, right after your definition of the crra function:
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
You are using the variable r before it is defined, and these two lines duplicate the code inside the for loop anyway. If you comment them out, everything works. See the code below:
data=read.table(text = " task pa1 a1 pa2 a2 pb1 b1 pb2 b2 choice
1 0.34 24 0.66 59 0.42 47 0.58 64 0
2 0.88 79 0.12 82 0.20 57 0.80 94 0
3 0.74 62 0.26 0 0.44 23 0.56 31 1
4 0.05 56 0.95 72 0.95 68 0.05 95 1
5 0.25 84 0.75 43 0.43 7 0.57 97 0
6 0.28 7 0.72 74 0.71 55 0.29 63 0
7 0.09 56 0.91 19 0.76 13 0.24 90 0
8 0.63 41 0.37 18 0.98 56 0.02 8 0
9 0.88 72 0.12 29 0.39 67 0.61 63 1
10 0.61 37 0.39 50 0.60 6 0.40 45 1
11 0.08 54 0.92 31 0.15 44 0.85 29 1
12 0.92 63 0.08 5 0.63 43 0.37 53 1
13 0.78 32 0.22 99 0.32 39 0.68 56 0
14 0.16 66 0.84 23 0.79 15 0.21 29 1
15 0.12 52 0.88 73 0.98 92 0.02 19 0
16 0.29 88 0.71 78 0.29 53 0.71 91 1
17 0.31 39 0.69 51 0.84 16 0.16 91 1
18 0.17 70 0.83 65 0.35 100 0.65 50 0
19 0.91 80 0.09 19 0.64 37 0.36 65 1
20 0.09 83 0.91 67 0.48 77 0.52 6 1
21 0.44 14 0.56 72 0.21 9 0.79 31 1
22 0.68 41 0.32 65 0.85 100 0.15 2 0
23 0.38 40 0.62 55 0.14 26 0.86 96 0
24 0.62 1 0.38 83 0.41 37 0.59 24 1
25 0.49 15 0.51 50 0.94 64 0.06 14 0
26 0.10 40 0.90 32 0.10 77 0.90 2 1
27 0.20 40 0.80 32 0.20 77 0.80 2 1
28 0.30 40 0.70 32 0.30 77 0.70 2 1
29 0.40 40 0.60 32 0.40 77 0.60 2 1
30 0.50 40 0.50 32 0.50 77 0.50 2 0
31 0.60 40 0.40 32 0.60 77 0.40 2 0
32 0.70 40 0.30 32 0.70 77 0.30 2 0
33 0.80 40 0.20 32 0.80 77 0.20 2 0
34 0.90 40 0.10 32 0.90 77 0.10 2 0
35 1.00 40 0.00 32 1.00 77 0.00 2 0", header = TRUE)
pa1 = c(data$pa1)
pa2 = c(data$pa2)
pb1 = c(data$pb1)
pb2 = c(data$pb2)
a1 = c(data$a1)
a2= c(data$a2)
b1 = c(data$b1)
b2 = c(data$b2)
yy=c(data$choice)
crra=function(x,r){
u=x^(1-r)/(1-r)
return(u)
}
# eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
# eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
LL_all = c()
R<-seq(0,1,0.01)
for (r in R){
eua = c(pa1*crra(a1,r)+pa2*crra(a2,r))
eub = c(pb1*crra(b1,r)+pb2*crra(b2,r))
probA = eua/(eua+eub)
total = ifelse(yy==1, probA, 1-probA)
LL=log(prod(total))
LL_all=c(LL_all,LL)
}
head(LL_all)
Output:
[1] -18.93759 -18.97863 -19.02000 -19.06170 -19.10374 -19.14611
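To then pick the r that maximises the log-likelihood, a sketch under these assumptions: the grid stops at 0.99 because crra() divides by zero at r = 1; the log-likelihood is a sum of logs, which equals log(prod(total)) but avoids underflow when there are many tasks; and optimize() refines the grid result, assuming a single peak on the interval. To stay self-contained it uses only the first five tasks of the data above, so its numbers will differ from the full-data output; the function name loglik is illustrative.

```r
# CRRA utility, as in the question (undefined at r = 1)
crra <- function(x, r) x^(1 - r) / (1 - r)

# Log-likelihood of the observed choices for a given r
loglik <- function(r, d) {
  eua <- d$pa1 * crra(d$a1, r) + d$pa2 * crra(d$a2, r)
  eub <- d$pb1 * crra(d$b1, r) + d$pb2 * crra(d$b2, r)
  probA <- eua / (eua + eub)
  # sum of logs: same value as log(prod(...)), numerically safer
  sum(log(ifelse(d$choice == 1, probA, 1 - probA)))
}

# First five tasks from the question's data, for illustration only
d <- data.frame(
  pa1 = c(0.34, 0.88, 0.74, 0.05, 0.25), a1 = c(24, 79, 62, 56, 84),
  pa2 = c(0.66, 0.12, 0.26, 0.95, 0.75), a2 = c(59, 82, 0, 72, 43),
  pb1 = c(0.42, 0.20, 0.44, 0.95, 0.43), b1 = c(47, 57, 23, 68, 7),
  pb2 = c(0.58, 0.80, 0.56, 0.05, 0.57), b2 = c(64, 94, 31, 95, 97),
  choice = c(0, 0, 1, 1, 0)
)

R <- seq(0, 0.99, 0.01)                       # grid that stops short of r = 1
LL_all <- vapply(R, loglik, numeric(1), d = d)
R[which.max(LL_all)]                          # best r on the grid

opt <- optimize(loglik, interval = c(0, 0.99), d = d, maximum = TRUE)
opt$maximum                                   # refined estimate of r
```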

Finding the mean of a subset

I have made a subset from the dataframe 'Indometh' called 'indo':
indo
Subject time conc
1 1 0.25 1.50
13 2 0.50 1.63
24 3 0.50 1.49
25 3 0.75 1.16
34 4 0.25 1.85
35 4 0.50 1.39
36 4 0.75 1.02
46 5 0.50 1.04
57 6 0.50 1.44
58 6 0.75 1.03
I want to find the average concentration for this subset. I tried the following code, but to no avail:
mean(subset(indo, conc >1 & conc <2))
I know summary(indo) shows the mean concentration, but I wanted to know if there is another way to get it just for conc.
You can try subsetting via bracket notation:
mean(indo$conc[indo$conc > 1 & indo$conc < 2])
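The original attempt fails because subset() returns a data frame, and mean() on a data frame returns NA with the warning "argument is not numeric or logical", so the conc column has to be extracted first. A small sketch that rebuilds indo from the printout above (row names omitted) and shows a few equivalent ways:

```r
# indo, re-typed from the printout in the question
indo <- data.frame(
  Subject = c(1, 2, 3, 3, 4, 4, 4, 5, 6, 6),
  time    = c(0.25, 0.50, 0.50, 0.75, 0.25, 0.50, 0.75, 0.50, 0.50, 0.75),
  conc    = c(1.50, 1.63, 1.49, 1.16, 1.85, 1.39, 1.02, 1.04, 1.44, 1.03)
)

mean(indo$conc)                                # whole subset
mean(subset(indo, conc > 1 & conc < 2)$conc)   # extract the column after subset()
with(indo, mean(conc[conc > 1 & conc < 2]))    # same filter, without repeating indo$
```

Here every conc value lies between 1 and 2, so all three give the same mean.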

Unify boxplot factor group colours

I'm somewhat of an R and ggplot2 novice, and I'm struggling to plot this data as a box plot with Flux on the y-axis and Week on the x-axis, with the boxes grouped by species (and, within each species, by treatment):
Treatment Species Week Flux
1 L- Heisteria 1 0.19
2 L- Heisteria 1 0.03
3 L- Heisteria 1 NA
4 L- Heisteria 1 0.12
5 L- Simarouba 1 0.22
6 L- Simarouba 1 0.19
7 L- Simarouba 1 NA
8 L- Simarouba 1 -0.65
9 C Heisteria 1 -0.99
10 C Heisteria 1 0.10
11 C Heisteria 1 0.26
12 C Heisteria 1 NA
13 C Simarouba 1 -1.41
14 C Simarouba 1 0.17
15 C Simarouba 1 NA
16 C Simarouba 1 0.35
17 L+ Heisteria 1 0.71
18 L+ Heisteria 1 0.25
19 L+ Heisteria 1 0.08
20 L+ Heisteria 1 4.14
21 L+ Simarouba 1 -1.36
22 L+ Simarouba 1 0.06
23 L+ Simarouba 1 -0.65
24 L+ Simarouba 1 -0.25
25 L- Heisteria 2 0.31
26 L- Heisteria 2 0.15
27 L- Heisteria 2 -0.09
28 L- Heisteria 2 -0.08
29 L- Simarouba 2 0.04
30 L- Simarouba 2 0.06
31 L- Simarouba 2 0.05
32 L- Simarouba 2 -0.07
33 C Heisteria 2 0.20
34 C Heisteria 2 0.15
35 C Heisteria 2 -0.03
36 C Heisteria 2 0.18
37 C Simarouba 2 0.10
38 C Simarouba 2 0.08
39 C Simarouba 2 0.09
40 C Simarouba 2 0.05
41 L+ Heisteria 2 0.24
42 L+ Heisteria 2 0.09
43 L+ Heisteria 2 0.16
44 L+ Heisteria 2 0.11
45 L+ Simarouba 2 NA
46 L+ Simarouba 2 0.21
47 L+ Simarouba 2 -0.07
48 L+ Simarouba 2 1.51
49 L- Heisteria 3 0.15
50 L- Heisteria 3 0.07
51 L- Heisteria 3 NA
52 L- Heisteria 3 -1.02
53 L- Simarouba 3 -0.02
54 L- Simarouba 3 0.08
55 L- Simarouba 3 -0.06
56 L- Simarouba 3 -0.08
57 C Heisteria 3 0.23
58 C Heisteria 3 0.19
59 C Heisteria 3 0.09
60 C Heisteria 3 -0.10
61 C Simarouba 3 0.77
62 C Simarouba 3 0.07
63 C Simarouba 3 0.20
64 C Simarouba 3 0.62
65 L+ Heisteria 3 0.19
66 L+ Heisteria 3 -0.09
67 L+ Heisteria 3 NA
68 L+ Heisteria 3 0.06
69 L+ Simarouba 3 NA
70 L+ Simarouba 3 -0.17
71 L+ Simarouba 3 0.13
72 L+ Simarouba 3 0.64
73 L- Heisteria 4 0.13
74 L- Heisteria 4 0.54
75 L- Heisteria 4 0.18
76 L- Heisteria 4 3.59
77 L- Simarouba 4 0.00
78 L- Simarouba 4 0.10
79 L- Simarouba 4 0.20
80 L- Simarouba 4 NA
81 C Heisteria 4 -0.14
82 C Heisteria 4 -0.32
83 C Heisteria 4 0.21
84 C Heisteria 4 0.12
85 C Simarouba 4 0.10
86 C Simarouba 4 NA
87 C Simarouba 4 0.11
88 C Simarouba 4 0.42
89 L+ Heisteria 4 0.14
90 L+ Heisteria 4 0.05
91 L+ Heisteria 4 0.25
92 L+ Heisteria 4 0.74
93 L+ Simarouba 4 NA
94 L+ Simarouba 4 0.05
95 L+ Simarouba 4 -0.06
96 L+ Simarouba 4 -0.13
I can plot the data using this code:
ggplot(treeflux, aes(Week, Flux, fill=interaction(Week, Species, Treatment), dodge=Species, Treatment)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot()
It gives me a plot in the order I want, but with far too many colours and legend entries. I want the treatments within each species to be shades of a single colour, and the legend entries to read like "L- Heisteria".
How about this for a start? (The legend for alpha needs a little tweaking ...) This is much easier than setting up an entire custom palette of fill colours and getting the legend right ...
library(ggplot2)
theme_set(theme_bw()) ## my aesthetic preference; also makes it easier to
                      ## distinguish light vs dark colours
ggplot(treeflux, aes(factor(Week), Flux, fill = Species, alpha = Treatment)) +
  stat_boxplot(geom = "errorbar") +
  geom_boxplot()
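If you do want the treatments as shades of one colour per species, with legend entries like "L- Heisteria", one sketch is to combine Treatment and Species into a single factor and give scale_fill_manual() a hand-picked palette. The Group column, the hex colours and the light-to-dark order are illustrative choices, and the data here is only the Week 2 rows of the table above.

```r
library(ggplot2)

# Week 2 rows from the question's data (subset for illustration)
treeflux <- data.frame(
  Treatment = rep(c("L-", "C", "L+"), each = 8),
  Species   = rep(rep(c("Heisteria", "Simarouba"), each = 4), times = 3),
  Week      = 2,
  Flux      = c(0.31, 0.15, -0.09, -0.08,  0.04, 0.06,  0.05, -0.07,
                0.20, 0.15, -0.03,  0.18,  0.10, 0.08,  0.09,  0.05,
                0.24, 0.09,  0.16,  0.11,    NA, 0.21, -0.07,  1.51)
)

# One grouping factor whose labels read like "L- Heisteria";
# explicit levels fix the box and legend order (species first, then treatment)
lv <- paste(rep(c("L-", "C", "L+"), times = 2),
            rep(c("Heisteria", "Simarouba"), each = 3))
treeflux$Group <- factor(paste(treeflux$Treatment, treeflux$Species), levels = lv)

# Shades of one hue per species (hand-picked hex colours; any palette works)
pal <- setNames(c("#c6dbef", "#6baed6", "#08519c",   # Heisteria: light to dark blue
                  "#c7e9c0", "#74c476", "#006d2c"),  # Simarouba: light to dark green
                lv)

p <- ggplot(treeflux, aes(factor(Week), Flux, fill = Group)) +
  stat_boxplot(geom = "errorbar", na.rm = TRUE) +
  geom_boxplot(na.rm = TRUE) +
  scale_fill_manual(values = pal, name = NULL) +
  labs(x = "Week")
print(p)
```

Naming the palette by level keeps each colour attached to the right group even if the level order changes.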
