I'm currently trying to scrape the Player Standard Stats table into R but am having trouble getting the right table.
html_link <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard::1"
df <- html_link %>%
  xml2::read_html() %>%
  rvest::html_nodes("table") %>%
  rvest::html_table(fill = TRUE)
The page offers a "copy link to clipboard" option for the table, so I tried to scrape the data from that link, but I'm not getting the right results. Does anyone know how to do this automatically in R, without having to download the CSV file?
Thanks.
You can use the "embed link" on the table...
url <- "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fstats%2FPremier-League-Stats&div=div_stats_standard"
library(magrittr)  # provides the %>% pipe
f <- url %>%
  xml2::read_html() %>%
  rvest::html_nodes("table") %>%
  rvest::html_table() %>%
  .[[1]]
> head(f)
1 Rk Player Nation Pos Squad Age Born
2 1 Patrick van Aanholt nl NED DF Crystal Palace 30-170 1990
3 2 Tammy Abraham eng ENG FW Chelsea 23-136 1997
4 3 Che Adams eng ENG FW Southampton 24-217 1996
5 4 Tosin Adarabioyo eng ENG DF Fulham 23-144 1997
6 5 Adrián es ESP GK Liverpool 34-043 1987
Playing Time Playing Time Playing Time Playing Time Performance
1 MP Starts Min 90s Gls
2 14 13 1,144 12.7 0
3 18 10 957 10.6 6
4 22 20 1,735 19.3 4
5 19 19 1,710 19.0 0
6 2 2 180 2.0 0
Performance Performance Performance Performance Performance
1 Ast G-PK PK PKatt CrdY
2 1 0 0 0 1
3 1 6 0 0 0
4 4 4 0 0 1
5 0 0 0 0 1
6 0 0 0 0 0
Performance Per 90 Minutes Per 90 Minutes Per 90 Minutes
1 CrdR Gls Ast G+A
2 0 0.00 0.08 0.08
3 0 0.56 0.09 0.66
4 0 0.21 0.21 0.41
5 0 0.00 0.00 0.00
6 0 0.00 0.00 0.00
Per 90 Minutes Per 90 Minutes Expected Expected Expected Expected
1 G-PK G+A-PK xG npxG xA npxG+xA
2 0.00 0.08 0.8 0.8 0.8 1.6
3 0.56 0.66 5.5 5.5 0.9 6.3
4 0.21 0.41 5.1 5.1 4.3 9.4
5 0.00 0.00 0.8 0.8 0.1 0.9
6 0.00 0.00 0.0 0.0 0.0 0.0
Per 90 Minutes Per 90 Minutes Per 90 Minutes Per 90 Minutes
1 xG xA xG+xA npxG
2 0.06 0.06 0.12 0.06
3 0.51 0.08 0.60 0.51
4 0.26 0.22 0.49 0.26
5 0.04 0.01 0.05 0.04
6 0.00 0.00 0.00 0.00
Per 90 Minutes
1 npxG+xA Matches
2 0.12 Matches
3 0.60 Matches
4 0.49 Matches
5 0.05 Matches
6 0.00 Matches
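In the output above, the first row of `f` holds the real column names, while the names assigned by `html_table()` are only the group labels ("Playing Time", "Performance", ...). A minimal sketch of promoting that row to the header, using a small hand-made stand-in for the scraped table. The toy values, the blank group labels on the leading columns and the `_` separator are assumptions for illustration, not something the page guarantees:

```r
# Toy stand-in for the scraped table: row 1 holds the real column names
f <- data.frame(
  V1 = c("Rk", "1", "2"),
  V2 = c("Player", "Patrick van Aanholt", "Tammy Abraham"),
  V3 = c("Min", "1,144", "957"),
  stringsAsFactors = FALSE
)
names(f) <- c("", "", "Playing Time")  # assumed group labels from the page header

# Combine the group label (if any) with the name stored in the first row
group <- names(f)
stat  <- vapply(f, function(col) col[1], character(1), USE.NAMES = FALSE)
names(f) <- ifelse(group == "", stat, paste(group, stat, sep = "_"))

# Drop the header row and renumber the remaining rows
f <- f[-1, , drop = FALSE]
rownames(f) <- NULL

# Numbers arrive as text with thousands separators, so convert them explicitly
f[["Playing Time_Min"]] <- as.numeric(gsub(",", "", f[["Playing Time_Min"]]))
```

Prefixing with the group label keeps names such as `Gls` unique, since the same name appears under both "Performance" and "Per 90 Minutes".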
Related
I have the following dataset. I need to accumulate the sum of the value
while the factor is 0, and then put the cumulative sum on the row where
I find factor != 0.
I've tried the loop below, but it didn't work at all.
for(i in dataset$Variable.1) {
ifelse(dataset$Factor == 0,
dataset$teste <- dataset$Variable.1 + i,
dataset$teste <- dataset$Variable.1)
i<- dataset$Variable.1
print(i)
}
Any ideas?
Below is an example of the dataset. I wish to get the "Result" column.
In the real one, I also have a negative factor (-1).
Date Factor Variable.1 Result
1 03/02/2018 0 0.75 0.75
2 04/02/2018 0 0.75 1.50
3 05/02/2018 1 0.96 2.46
4 06/02/2018 1 0.76 0.76
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.70 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0.00 0.00
9 11/02/2018 0 0.00 0.00
10 12/02/2018 0 0.20 0.20
11 13/02/2018 0 0.13 0.33
12 14/02/2018 0 1.64 1.97
13 15/02/2018 0 0.03 2.00
14 16/02/2018 1 0.51 2.51
15 17/02/2018 1 0.00 0.00
16 18/02/2018 0 0.00 0.00
17 19/02/2018 0 0.83 0.83
18 20/02/2018 1 0.42 1.25
19 21/02/2018 1 0.17 0.17
20 22/02/2018 1 0.97 0.97
21 23/02/2018 0 0.92 0.92
22 24/02/2018 0 0.00 0.92
23 25/02/2018 0 0.00 0.92
24 26/02/2018 1 0.19 1.11
25 27/02/2018 1 0.87 0.87
26 28/02/2018 1 0.85 0.85
27 01/03/2018 1 1.95 1.95
28 02/03/2018 1 0.54 0.54
29 03/03/2018 1 0.00 0.00
30 04/03/2018 0 0.00 0.00
31 05/03/2018 0 1.17 1.17
32 06/03/2018 1 0.25 1.42
33 07/03/2018 1 1.45 1.45
Thanks in advance.
If you want to stick with the for-loop, you can try this code :
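DF$Result <- NA
prev <- 0  # running total carried over from the previous row
for (i in seq_len(nrow(DF))) {
  DF$Result[i] <- DF$Variable.1[i] + prev
  if (DF$Factor[i] == 1) {
    prev <- 0             # restart the sum after a row with Factor == 1
  } else {
    prev <- DF$Result[i]  # keep accumulating
  }
}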
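The same rule (restart the sum on the row after any `Factor == 1`) can also be written without a loop. This is a sketch under that assumption, on a shortened copy of the question's data; `starts_new_run` and `run_id` are names made up for the example:

```r
# First seven rows of the question's data
DF <- data.frame(
  Factor     = c(0, 0, 1, 1, 0, 1, 1),
  Variable.1 = c(0.75, 0.75, 0.96, 0.76, 1.35, 0.70, 2.02)
)

# A new run starts on the row right after a row with Factor == 1
starts_new_run <- c(FALSE, head(DF$Factor == 1, -1))
run_id <- cumsum(starts_new_run)

# Cumulative sum of Variable.1 within each run
DF$Result <- ave(DF$Variable.1, run_id, FUN = cumsum)
DF
```

As in the loop, a factor of -1 does not reset the sum here; if it should, the comparison `DF$Factor == 1` would need to change to `DF$Factor != 0`.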
Iteratively, try something like:
a <- as.data.frame(cbind(
  Factor     = c(0, 0, 1, 1, 0, 1, 1, rep(0, 3), 1),
  Variable.1 = c(0.75, 0.75, 0.96, 0.71, 1.35, 0.7, 0.75, 0.96, 0.71, 1.35, 0.7)
))
Result <- 0
aux <- NULL
for (i in 1:nrow(a)) {
  if (a$Factor[i] == 0) {
    Result <- Result + a$Variable.1[i]
    aux <- c(aux, Result)
  } else {
    Result <- Result + a$Variable.1[i]
    aux <- c(aux, Result)
    Result <- 0
  }
}
a$Results <- aux
a
Factor Variable.1 Results
1 0 0.75 0.75
2 0 0.75 1.50
3 1 0.96 2.46
4 1 0.71 0.71
5 0 1.35 1.35
6 1 0.70 2.05
7 1 0.75 0.75
8 0 0.96 0.96
9 0 0.71 1.67
10 0 1.35 3.02
11 1 0.70 3.72
A possibility using tidyverse and data.table:
library(dplyr)       # mutate(), lag(), group_by(), select()
library(data.table)  # rleid()
df %>%
  mutate(
    temp = ifelse(Factor == 1 & lag(Factor) == 1, NA, 1),  # mark the rows after the first 1 in "Factor" as NA
    temp = ifelse(!is.na(temp), rleid(temp), NA)           # run-length id along the non-NA values
  ) %>%
  group_by(temp) %>%                                       # group by run-length id
  mutate(Result = ifelse(!is.na(temp), cumsum(Variable.1), Variable.1)) %>%  # cumulative sum of the desired rows
  ungroup() %>%
  select(-temp)                                            # remove the helper variable
Date Factor Variable.1 Result
<chr> <int> <dbl> <dbl>
1 03/02/2018 0 0.750 0.750
2 04/02/2018 0 0.750 1.50
3 05/02/2018 1 0.960 2.46
4 06/02/2018 1 0.760 0.760
5 07/02/2018 0 1.35 1.35
6 08/02/2018 1 0.700 2.05
7 09/02/2018 1 2.02 2.02
8 10/02/2018 0 0. 0.
9 11/02/2018 0 0. 0.
10 12/02/2018 0 0.200 0.200
I have made a subset from the dataframe 'Indometh' called 'indo':
indo
Subject time conc
1 1 0.25 1.50
13 2 0.50 1.63
24 3 0.50 1.49
25 3 0.75 1.16
34 4 0.25 1.85
35 4 0.50 1.39
36 4 0.75 1.02
46 5 0.50 1.04
57 6 0.50 1.44
58 6 0.75 1.03
I want to find the average concentration for the subset. I tried the following code, but to no avail:
mean(subset(indo, conc >1 & conc <2))
I know summary(indo) will show the mean of the concentration, but I wanted to know whether there is another way to do this just for conc.
You can try subsetting via bracket notation:
mean(indo$conc[indo$conc > 1 & indo$conc < 2])
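The original call does not work because `subset()` returns a whole data frame, and in current versions of R `mean()` on a data frame gives `NA` with a warning rather than a column mean. A small sketch of two equivalent ways to hand `mean()` the numeric column instead, with `indo` rebuilt from the `conc` values printed in the question (the other columns are left out for brevity):

```r
# conc values copied from the printed subset in the question
indo <- data.frame(
  conc = c(1.50, 1.63, 1.49, 1.16, 1.85, 1.39, 1.02, 1.04, 1.44, 1.03)
)

# Bracket notation: filter the vector, then average it
m1 <- mean(indo$conc[indo$conc > 1 & indo$conc < 2])

# subset() first, then pull the column out before calling mean()
m2 <- with(subset(indo, conc > 1 & conc < 2), mean(conc))

m1
m2
```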
I have the following data frame. I want to see how the values change from RIK_T1 to RIK_T2 by looking at their frequency, row % and column %. How can I show all three at once?
ID <- c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10')
RIK_T1 <- c('20', '15', '20', '20', '97', '20', '20', '20', '15', '15')
RIK_T2 <- c('20', '15', '15', '20', '97', '97', '20', '20', '20', '20')
df <- data.frame(ID, RIK_T1, RIK_T2)
df
TAB <- table(df$RIK_T1, df$RIK_T2)
t1 <- addmargins(TAB)                            # TABLE-01
TAB_row <- prop.table(TAB, 1)                    # row
t2 <- round(addmargins(TAB_row), digits = 2)     # TABLE-01-1
TAB_col <- prop.table(TAB, 2)                    # column
t3 <- round(addmargins(TAB_col), digits = 2)     # TABLE-01-2
I get the following three tables: counts, row % and column %.
15 20 97 Sum
15 1 2 0 3
20 1 4 1 6
97 0 0 1 1
Sum 2 6 2 10
15 20 97 Sum
15 0.33 0.67 0.00 1.00
20 0.17 0.67 0.17 1.00
97 0.00 0.00 1.00 1.00
Sum 0.50 1.33 1.17 3.00
15 20 97 Sum
15 0.50 0.33 0.00 0.83
20 0.50 0.67 0.50 1.67
97 0.00 0.00 0.50 0.50
Sum 1.00 1.00 1.00 3.00
Is it possible to merge them into one table like the following?
15 20 97 Sum
R%/C% R%/C% R%/C% R%/C%
15 1 2 0 3
0.33/0.50 0.67/0.33 0.00/0.00 1.00/0.83
20 1 4 1 6
0.17/0.50 0.67/0.67 0.17/0.50 1.00/1.67
97 0 0 1 1
0.00/0.00 0.00/0.00 1.00/0.50 1.00/0.50
Sum 2 6 2 10
0.50/1.00 1.33/1.00 1.17/1.00 3.00/3.00
Thanks in advance.
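One possible approach is to paste the three tables together cell by cell into a character matrix. This is only a sketch: it puts the count and the two percentages in a single cell, as `count (R%/C%)`, rather than on two stacked lines as in the layout above, and the names `counts`, `row_pct`, `col_pct` and `combined` are made up for the example.

```r
RIK_T1 <- c('20', '15', '20', '20', '97', '20', '20', '20', '15', '15')
RIK_T2 <- c('20', '15', '15', '20', '97', '97', '20', '20', '20', '20')

TAB     <- table(RIK_T1, RIK_T2)
counts  <- addmargins(TAB)
row_pct <- round(addmargins(prop.table(TAB, 1)), 2)
col_pct <- round(addmargins(prop.table(TAB, 2)), 2)

# All three tables share the same shape, so their cells line up one to one
combined <- matrix(
  sprintf("%d (%.2f/%.2f)",
          as.integer(counts), as.vector(row_pct), as.vector(col_pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)
print(combined, quote = FALSE)
```

The "Sum" margins of the percentage tables are simply sums of proportions, as in the original output, so they are carried over unchanged.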
Given a table of values, where A = state of system, B = length of state, and C = cumulative length of states:
A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46
How would I use R to calculate the number of transitions (where A gives the state) per constant interval length - where the intervals are consecutive and could be any arbitrary number (I chose a value of 2 in my image example)? For example, using the table values or the image included we count 2 transitions from 0-2, 3 transitions from greater than 2-4, 3 transitions from >4-6, etc.
This is straightforward in R. All you need is column C and ?cut. Consider:
d <- read.table(text="A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46", header=TRUE)
fi <- cut(d$C, breaks=seq(from=0, to=14, by=2))
table(fi)
# fi
# (0,2] (2,4] (4,6] (6,8] (8,10] (10,12] (12,14]
# 2 3 3 5 3 3 1
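Since the interval length is meant to be arbitrary, the same idea can be wrapped in a small helper that derives the breaks from the data. This is a sketch; `count_per_interval` and its arguments are hypothetical names, and it assumes all cumulative lengths are greater than 0:

```r
# Column C from the question
end_times <- c(1.16, 1.67, 2.84, 3.10, 3.69, 4.08, 4.85, 5.75, 6.53, 6.79,
               6.91, 7.42, 7.69, 8.20, 8.59, 9.10, 10.26, 11.36, 11.95, 12.46)

count_per_interval <- function(end_times, width) {
  # Round the upper limit up to the next multiple of the interval width
  upper <- ceiling(max(end_times) / width) * width
  table(cut(end_times, breaks = seq(0, upper, by = width)))
}

counts <- count_per_interval(end_times, width = 2)
counts
```

Intervals are open on the left and closed on the right, which is the default of `cut()`, so a value of exactly 2 is counted in `(0,2]`.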
I have the following data sets:
TRAIN dataset
Sr A B C XX
1 0.09 0.52 11.1 high
2 0.13 0.25 11.1 low
3 0.20 0.28 11.1 high
4 0.29 0.50 11.1 low
5 0.31 0.58 11.1 high
6 0.32 0.37 11.1 high
7 0.37 0.58 11.1 low
8 0.38 0.40 11.1 low
9 0.42 0.65 11.1 high
10 0.42 0.79 11.1 low
11 0.44 0.34 11.1 high
12 0.45 0.89 11.1 low
13 0.57 0.72 11.1 low
TEST dataset
Sr A B C XX
1 0.54 1.36 9.80 low
2 0.72 0.82 9.80 low
3 0.19 0.38 9.90 high
4 0.25 0.44 9.90 high
5 0.29 0.54 9.90 high
6 0.30 0.54 9.90 high
7 0.42 0.86 9.90 low
8 0.44 0.86 9.90 low
9 0.49 0.66 9.90 low
10 0.54 0.76 9.90 low
11 0.54 0.76 9.90 low
12 0.68 1.08 9.90 low
13 0.88 0.51 9.90 high
Sr : Serial Number
A-C : Parameters
XX : Output Binary Parameter
I am trying to use the KNN classifier to develop a predictor model with 5 nearest neighbors. Following is the code that I have written:
train_input <- as.matrix(train[,-ncol(train)])
train_output <- as.factor(train[,ncol(train)])
test_input <- as.matrix(test[,-ncol(test)])
prediction <- knn(train_input, test_input, train_output, k=5, prob=TRUE)
resultdf <- as.data.frame(cbind(test[,ncol(test)], prediction))
colnames(resultdf) <- c("Actual","Predicted")
RESULT dataset
A P
1 2 2
2 2 2
3 1 2
4 1 1
5 1 1
6 1 2
7 2 2
8 2 2
9 2 2
10 2 2
11 2 2
12 2 1
13 1 2
I have the following concerns:
What should I do to obtain probability values? Is this the probability of getting high or low, i.e. P(high) or P(low)?
The levels are set to 1 (high) and 2 (low), which is based on the order of first appearance. If low appeared before high in the train dataset, it would have a value 1. I feel this is not good practice. Is there any way I can avoid this?
If there were more classes (more than 2) in the classifier, how would I handle this in the classifier?
I am using the class and e1071 library.
Thanks.
Utility function built before the "text" argument to scan was introduced:
rd.txt <- function(txt, header = TRUE, ...) {
  tconn <- textConnection(txt)
  rd <- read.table(tconn, header = header, ...)
  close(tconn)
  rd
}
RESULT <- rd.txt(" A P
1 2 2
2 2 2
3 1 2
4 1 1
5 1 1
6 1 2
7 2 2
8 2 2
9 2 2
10 2 2
11 2 2
12 2 1
13 1 2
")
> prop.table(table(RESULT))
P
A 1 2
1 0.15385 0.23077
2 0.07692 0.53846
You can also set up prop.table to deliver row or column proportions (AKA probabilities).
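On the questions about probabilities and level order: with `prob = TRUE`, `class::knn()` attaches a `"prob"` attribute holding the proportion of votes for the winning class, so it is neither P(high) nor P(low) by itself. The level order can be fixed by passing `levels` to `factor()` explicitly. A sketch on the training data from the question, assuming only columns A and B are used as predictors (Sr is a row id and C is constant in the training set) and with three test rows picked for brevity:

```r
library(class)  # provides knn()

train <- data.frame(
  A  = c(0.09, 0.13, 0.20, 0.29, 0.31, 0.32, 0.37, 0.38, 0.42, 0.42, 0.44, 0.45, 0.57),
  B  = c(0.52, 0.25, 0.28, 0.50, 0.58, 0.37, 0.58, 0.40, 0.65, 0.79, 0.34, 0.89, 0.72),
  XX = c("high", "low", "high", "low", "high", "high", "low",
         "low", "high", "low", "high", "low", "low")
)
test <- data.frame(A = c(0.54, 0.19, 0.88), B = c(1.36, 0.38, 0.51))

# State the level order explicitly instead of relying on the default
cl <- factor(train$XX, levels = c("low", "high"))

pred <- knn(train[, c("A", "B")], test, cl, k = 5, prob = TRUE)

# Proportion of neighbour votes for whichever class won
win_prob <- attr(pred, "prob")

# Convert to P(high) for a two-class problem
p_high <- ifelse(pred == "high", win_prob, 1 - win_prob)

data.frame(Predicted = pred, WinProb = win_prob, P_high = p_high)
```

Keeping `pred` as a factor, rather than passing it through `cbind()`, preserves the labels instead of the integer codes 1 and 2. As far as I know, `knn()` accepts a class factor with any number of levels, so more than two classes need no change to the call, although the `1 - win_prob` step only makes sense for two classes.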