How can I pull player stats from a tabbed ESPN table? - r

I've been reading through a couple of the other useful guides on pulling player and match data from ESPN using R, but I have run into a problem with tabbed tables. As shown here on the player stats for a recent rugby game, the player statistics table is split into four tabs: 'Scoring', 'Attacking', 'Defending' and 'Discipline'.
Using the following code (with the help of two lovely packages, RCurl and htmltab), I can pull out the first tab ('Scoring') from that page ...
# install & attach RCurl
if (!base::require(package="RCurl")) utils::install.packages("RCurl")
library(RCurl)
# install & attach htmltab
if (!base::require(package="htmltab")) utils::install.packages("htmltab")
library(htmltab)
# download the page HTML (SSL peer verification is disabled here)
theurl <- RCurl::getURL("https://www.espn.co.uk/rugby/playerstats?gameId=294854&league=270557",.opts = list(ssl.verifypeer = FALSE))
# pull tables from url
team1 <- htmltab::htmltab(theurl,which=1)
team2 <- htmltab::htmltab(theurl,which=2)
league <- htmltab::htmltab(theurl,which=3)
... in the following format, which is exactly what I wanted ...
team1
rowID LEINS Tx TA CG PG PTS
2 J LarmourFB 0 0 0 0 0 0
3 H KeenanW 0 0 0 0 0 0
4 G RingroseC 0 0 0 0 0 0
5 R HenshawC 1 0 0 0 0 5
6 J LoweW 1 0 0 0 0 5
7 R ByrneFH 0 0 2 2 0 10
8 J Gibson-ParkSH 0 1 0 0 0 0
9 C HealyP 0 0 0 0 0 0
10 R KelleherH 0 0 0 0 0 0
11 A PorterP 0 0 0 0 0 0
... however I seem unable to pull out any tab other than 'Scoring'. I'm sure I'm missing something really obvious, so would appreciate someone pointing out where I'm going wrong!
Thanks in advance!

If you check the page's HTML source, you will see that the data for the other tabs is not there at the start. You can find a data-reactid attribute, which indicates that the data is only loaded once you click on the new tab. So you will need to find a way to perform that click on the second tab.
One option for you might be to use Selenium: https://www.rdocumentation.org/packages/RSelenium/versions/1.7.7
This would enable you to make the necessary button click.
A sample can be found here: https://www.r-bloggers.com/2014/12/scraping-with-selenium/

Related

Defining a workflow for importing the RNA-seq count data

I am getting started with R and have read up on some basics and syntax.
Now I am using miodin to define a project and a case-control study design.
library(miodin)
mp <- MiodinProject(name = "MyProject", author = "Myself", path ="." )
mshow(mp)
I have a file named "randseq" on my hard disk, which looks like this:
ID LineA_1 LineA_2 LineA_3 LineA_4 LineA_5 LineB_1 LineB_2 LineB_3 LineB_4 LineB_5 LineB_6 LineB_7 LineB_8 LineB_9
ENSG00000000003 23 1 0 0 0 1 0 0 0 0 0 3 3 0
ENSG00000000005 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000000419 0 0 0 0 0 0 0 0 4 0 0 0 0 0
Now I want to define a workflow that imports the RNA-seq count data from that file, which is in a folder named analysis_with_r, using the study design. I then want to execute the workflow and export the dataset to the project folder. Below is my code for it:
mw <- MiodinWorkflow(name = 'MyProject')
mw <- mw + downloadRepositoryData(
name = 'RNA downloader',
accession = 'randseq',
repository = '/Users/aarf/Desktop/analysis_with_r/randseq.txt',
path = 'data',
type = 'processed'
)
mw <- insert(mw,mp)
mshow(mw)
mw <- execute(mw)
saveDataFile(mp)
export(mp, 'dataset', 'randseq')
After running this code I get this error:
[INFO] Module terminated with the following error [ERROR] Unknown
repository/Users/aarf/Desktop/analysis_with_r/randseq.txt
[INFO] 1 modules were not executed [STATUS] Execution finished
Can anybody tell me what I am doing wrong here?

How to fix rows order with pheatmap?

I have generated a heatmap with pheatmap and, for various reasons, I want the rows to appear in a predefined order.
I saw in previous posts that the solution is to set the parameter cluster_rows to FALSE and to arrange the matrix rows in the order we want, like this in my case:
Otu0085 Otu0086 Otu0087 Otu0088 Otu0091
AB200 0 0 0 0 0
2 91 0 2 1 0
20CF360 0 1 0 1 0
19CF359 0 0 0 2 0
11VP12 0 0 0 0 155
11VP04 4 1 0 0 345
However, when I do:
pheatmap(shared,cluster_rows = F)
My rows are sorted alphabetically, like this:
10CF278a
11
11AA07
11CF278b
11VP03
11VP04
11VP05
11VP06
11VP08
11VP09
Any suggestions would be welcome.
Thanks in advance.
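The approach described in the question (arrange the rows first, then switch off row clustering) can be sketched as follows. This is a minimal illustration, not a diagnosis of the asker's data: the values are taken from the excerpt in the question, `desired_order` is a hypothetical vector naming the rows in the order wanted, and the alphabetical sort is simulated here on the assumption that the matrix was sorted somewhere upstream.

```r
# Hypothetical vector giving the row order we want in the heatmap.
desired_order <- c("AB200", "2", "20CF360", "19CF359", "11VP12", "11VP04")

# Small matrix built from the excerpt shown in the question.
counts <- matrix(
  c( 0, 0, 0, 0,   0,
    91, 0, 2, 1,   0,
     0, 1, 0, 1,   0,
     0, 0, 0, 2,   0,
     0, 0, 0, 0, 155,
     4, 1, 0, 0, 345),
  nrow = 6, byrow = TRUE,
  dimnames = list(desired_order,
                  c("Otu0085", "Otu0086", "Otu0087", "Otu0088", "Otu0091"))
)

# Simulate a matrix whose rows ended up in alphabetical order.
shared <- counts[sort(rownames(counts)), ]

# Reorder the rows explicitly by name before plotting.
shared_ordered <- shared[desired_order, , drop = FALSE]

# Plot only if pheatmap is installed; with cluster_rows = FALSE the rows
# are drawn in the order they have in the matrix.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(shared_ordered, cluster_rows = FALSE)
}
```

Indexing by row name makes the order explicit, so it is worth checking `rownames(shared)` just before the `pheatmap` call to confirm the matrix really is in the intended order.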

how to read a specific .Matrix file in R

I have a .Matrix file. I have been told it is similar to a .csv file, and when I open it in a web browser it looks like this:
%TransMat_H0004.E1.L1.S1.B1.T1
CLUSTER,,3,3,2,2,1,1,3,1,1,1,1,3,2,3,1,2,2,1,1,3,3,1,2,1,3,1,1,2,1,3,3,2,3,3,1,1,1,1,1,3,3,1,2,3,2,1,1,1,1,2,1,2,2,3,1,3,2,2,2,1,3,3,2,3,3,1,2,3,3,2,2,2,3,2,2,2,1,1,2,1,1,2,1,1,1,1,2,3,1,3,2,3,3,3,3,2,1,1,3,3,3,1,1,1,2,1,3,1,2,1,1,1,1,1,1,1,3,1,3,2,3,1,1,3,2,2,3,3,1,3,1,1,2,1,2,2,1,1,3,3,1,2,1,2,2,2,2,2,1,3,1,2,3,2,2,2,2,3,2,1,1,2,3,3,2,1,3,1,1,1,1,3,3,3,1,3,3,1,2,2,3,2,3,2,2,3,1,2,2,1,3,1,2,2,3,1,2,3,2,3,3,1,3,2,3,1,1,2,3,1,1,3,2,1,2,1,1,3,1,1,3,1,1,2,1,2,2,2,3,1,3,3,3,1,3,1,1,3,2,3,1,3,2,1,3,1,1,1,2,3,3,3,1,3,3,3,1,1,2,2,3,2,3,3,3,1,3,3,1,1,2,3,2,1,1,3,1,1,1,1,1,3,3,2,2,1,1,1,1,1,3,1,1,2,3,3,1,1,3,2,2,1,1,2,1,1,3,2,1,2,1,2,3,2,1,1,3,2,1,3,2,1,2,2,1,3,3,1,3,3,2,3,2,3,1,3,3,3,3,2,1,3,2,3,3,3,2,1,2,1,2,3,1,1,3,3,3,3,3,2,3,3,1,3,1,1,2,3,3,3,3,3,3,2,2,2,3,1,2,3,3,3,3,2,1,2,2,3,2,3,2,3,2,3,3,2,1,2,3,3,2,1,2,3,3,3,1,3,2,3,3,1,2,2,3,1,1,2,2,3,2,1,1,2,2,1,3,1,2,3,1,3,1,1,2,3,3,1,2,3,2,2,1,1,2,3,2,2,2,1,2,1,2,2,3,2,1,2,1,3,1,2,3,1,2,3,1,2,1,1,2,1,3,3,3,1,3,3,2,2,2,1,2,3,1,3,1,2,1,3,1,2,2,1,2,3,1,1,3,3,2,2,3,1,1,2,1,1,1,2,1,2,3,3,2,2,1,2,3,2,3,1,2,2,2,1,3,3,3,3,3,3,2,3,2,1,2,1,3,3,1,3,3,1,3,2,3,3,1,2,3,3,3,3,3,1,2,1,2,1,1,1,1,2,2,3,1,1,2,3,2,3,2,2,3,3,1,2,1,3,2,3,2,2,3,2,3,1,1,1,3,1,2,3,1,3,2,3,2,2,1,2,3,1,3,2,1,2,3,1,3,1,2,2,1,3,3,2,1,3,3,1,2,3,1,2,1,1,3,1,3,2,3,3,3,3,2,2,1,1,3,3,2,1,3,1,1,3,3,3,1,3,3,1,1,3,3,3,1,1,3,3,2,1,3,2,3,1,3,2,2,2,2,2,3,3,1,2,2,3,2,3,3,1,3,1,3,3,1,3,2,1,2,3,1,3,1,3,2,2,1,1,1,1,3,2,3,3,2,2,3,2,3,1,3,2,1,2,3,1,2,2,1,1,1,3,3,2,3,3,3,3,2,3,1,1,3,3,3,1,1,3,2,1,2,3,2,3,1,3,3,2,1,1,1,1,3,3,2,3,1,2,1,3,3,3,2,2,2,2,3,3,1,1,2,3,2,2,3,3,2,2,3,3,3,2,2,1,2,2,3,3,3,3,1,2,2,3,2,2,2,2,3,2,2,2,1,1,2,2,2,1,2,3,2,2,3,3,2,1,3,1,2,2,1,3,2,3,1,1,3,1,2,2,2,3,3,1,3,3,1,2,1,2,3,1,3,2,3,1,1,3,3,3,1,2,3,3,3,1,3,3,1,3,2,2,2,3,2,1,2,3,3,2,1,2,1,2,1,1,3,3,1,1,3,2,1,3,2,1,3,3,3,2,2,2,1,3,2,3,2,3,1,2,3,1,3,3,1,1,3,2,1,2,3,2,1,1,2,3,1,3,2,1,2,2,3,2,2,1,3,2,1,1,3,3,2,1,3,1,2,2,1,2,2,3,2,2,2,3,1,1,3,3,3,3,1,2,2,3,3,3,2,1,3,2,1,2,3,3,1,3,2,1,2,1,1,2,2,3,2,2,3,1,2,3,2,3,1,2,3,3,2,3,3,1
,1,2,1,1,1,3,1,3,1,3,3,2,3,1,2,2,1,2,3,3,2,3,2,3,2,1,1,3,2,3,2,3,1,1,3,1,3,2,1,3,2,2,2,3,1,1,2,3,1,1,1,2,3,3,3,1,2,3,3,3,3,2,3,1,3,1,3,2,3,2,3,3,1,1,2,3,1,1,3,3,2,3,3,1,2,3,1,2,3,3,2,3,3,2,1,2,3,3,2,3,1,2,2,3,1,2,1,3,2,3,1,2,2,3,3,2,2,3,1,3,3,3,3,2,3,2,2,1,3,1,2,1,1,1,3,2,3,1,1,1,1,3,3,2,3,1,1,2,1,3,1,2,3,3,2,2,1,1,3,2,2,3,1,2,3,3,3,2,1,2,2,3,1,3,3,2,1,2,2,3,3,2,2,3,2,1,1,3,1,3,3,1,3,2,3,3,3,1,1,1,3,1,2,2,3,2,3,2,3,1,1,2,1,2,1,3,3,1,3,3,2,2,1,3,1,2,2,3,2,2,2,3,3,2,1,1,1,1,3,1,1,2,1,2,2,3,3,2,3,3,3,2,1,1,3,2,2,2,3,1,3,3,3,2,2,3,1,3,3,3,1,3,3,3,2,3,1,2,1,1,3,1,2,3,2,1,3,3,2,1,3,2,3,2,3,1,2,2,3,3,2,3,3,3,1,2,3,3,3,3,3,1,1,2,3,1,2,1,1,1,1,2,1,1,2,3,1,3,3,2,2,3,2,2,1,3,2,2,3,1,1,1,1,1,3,1,3,1,1,3,2,2,3,3,3,1,2,2,3,3,2,3,2,3,3,2,1,2,3,3,1,3,1,2,1,1,2,2,2,2,2,2,1,3,1,3,2,3,2,2,2,2,2,3,2,2,1,3,1,1,1,2,1,2,1,2,1,3,1,3,3,1,3,1,3,3,1,3,2,3,3,3,3,1,3,3,2,3,2,3,3,3,1,1,2,2,3,3,3,2,2,3,3,1,3,1,2,1,2,2,1,1,3,3,1,1,3,1,1,1,2,2,3,2,2,2,3,3,1,2,1,2,2,2,3,2,2,1,2,1,1,1,3,3,3,2,1,3,3,3,2,2,3,1,2,1,3,1,3,3,1,3,2,3,2,2,1,1,1,3,3,2,3,1,3,2,2,2,2,2,3,1,3,2,3,1,3,1,3,1,2,3,2,2,3,3,3,3,3,1,1,2,3,3,2,3,1,3,3,1,3,3,2,2,1,3,3,3,3,2,1,3,2,2,2,3,3,1,1,3,3,3,1,3,1,1,2,3,1,3,3,3,2,1,3,1,2,1,3,2,2,3,1,3,1,2,3,3,3,2,2,3,1,2,1,1,1,2,3,1,2,3,2,3,3,2,1,1,2,3,3,1,2,3,1,1,1,3,1,2,3,1,2,3,2,2,3,2,3,2,3,1,2,3,3,1,3,3,2,2,1,1,2,3,2,2,3,3,2,1,1,1,3,3,3,2,2,1,3,2,2,1,3,2,3,3,1,1,3,2,3,3,2,3,1,3,3,1,3,3,2,3,3,2,3,1,3,3,3,3,3,1,1,3,2,2,3,3,3,3,1,1,1,1,3,2,3,3,1,3,2,2,1,1,1,1,3,2,2,3,2,2,3,3,2,3,1,1,1,3,3,3,3,2,3,1,3,3,1,1,3,3,1,3,3,3,1,3,2,1,1,3,3,2,3,3,3,2,2,1,3,3,3,1,2,2,2,2,1,2,2,1,2,3,2,1,2,2,3,3,3,3,3,2,2,3,2,2,3,2,1,3,1,1,2,2,3,1,2,3,2,1,3,1,1,2,1,2,2,3,1,2,2,3,3,1,3,2,1,3,3,2,1,3,3,3,1,3,2,3,3,2,3,2,2,3,2,1,3,3,3,3,2,1,3,3,3,1,3,3,1,3,1,3,3,3
tSNE-1,,8.13846968090103,12.8635212043927,10.3864480425066,7.17083119797853,-72.7452686458686,-49.7960088439495,45.63460621346,-50.3693843293848,-53.2415432674881,-54.6891175204711,-46.4635164735514,4.49644447816871,3.98243750756555,-9.99729157677144,-98.1041739031645,14.4129117311442,21.8090838800674,-46.5547640077783,-65.8379505581324,39.8907136841164,45.2453417297103,-43.4054353275594,5.58370171555427,-82.6419520577671,42.7647608862027,-91.125151907502,-37.9838559192307,62.9924569510685,-69.108888726706,62.7774653919852,60.3873481045592,62.825
I tried to read it by read.csv:
test=read.csv('TransMat_H0004.E1.L1.S1.B1.T1.Matrix',sep='' )
str(test)
'data.frame': 33141 obs. of 1 variable:
$ X.TransMat_H0004.E1.L1.S1.B1.T1: Factor w/ 33141 levels "A1BG,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"| truncated,..: 13453 31099 31100 1 2 3 4 5 6 7 ...
How should I read it in the right format, say, with the first field of each line (a list, I guess?) used as the row name?
Thanks in advance!
Sorry, I cannot provide a link to the data because it is unpublished, but I can tell you what the data look like:
%TransMat_H0004.E1.L1.S1.B1.T1
cluster,1,2,3,2,3….
tsne-1,-41,-80…..
tsne-2,-41,-80…..
tsne-3,-41,-80…..
(and the remaining rows all start with a gene name followed by numbers, such as)
genea, 0,2,1,0…
….
genez,0,2,1,0
My desired output is to remove the first 4 rows (cluster, tsne-1, tsne-2, tsne-3) and extract the gene transcript matrix, such as:
V1 V2 V3 V4 V5
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
I figured this out with:
read.csv("E2.Matrix", skip=1)
since the first row is an annotation line, according to the bioinformatics technician who prepared the .Matrix file.
Thanks, @Stephan!
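To make the `skip = 1` fix concrete, here is a minimal sketch that goes one step further and extracts the gene matrix. It assumes the simplified layout described in the question: one annotation line, then cluster, tsne-1, tsne-2 and tsne-3 rows, then one row per gene. The file contents and gene names below are made-up stand-ins, and a real file may need adjusting (for example, if its metadata rows have a different number of fields).

```r
# Write a small stand-in file with the layout described above (made-up values).
path <- tempfile(fileext = ".Matrix")
writeLines(c(
  "%TransMat_example",
  "cluster,1,2,3",
  "tsne-1,-41,-80,12",
  "tsne-2,5,7,-3",
  "tsne-3,0.5,1.5,2.5",
  "genea,0,2,1",
  "geneb,1,0,0",
  "genez,0,2,1"
), path)

# Skip the annotation line, treat every remaining line as data,
# and use the first field of each line as the row name.
raw <- read.csv(path, skip = 1, header = FALSE, row.names = 1)

# Drop the four metadata rows (cluster, tsne-1, tsne-2, tsne-3)
# and keep the gene-by-cell counts as a numeric matrix.
genes <- as.matrix(raw[-(1:4), ])
print(genes)
```

Using `header = FALSE` keeps the cluster row as data rather than as column names, so the four metadata rows can be dropped together by position.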

Multiple responses in SPSS

I have multiple-response questions with 5 categories (values). I want to identify the respondents who chose only one category.
For example:
respondents who chose category 1 and none of categories 2, 3, 4 or 5.
That is, I want only the respondents who ticked category A alone, and I need a count of them.
Please help.
The following solution is assuming the data has 5 dichotomous variables - one for each of the multiple response categories.
* creating some sample data to demonstrate on.
data list list/cat1 to cat5.
begin data
1 0 0 0 1
0 1 1 0 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 0 1
1 0 0 0 0
1 1 1 0 0
end data.
* now checking in which cases only category 1 was chosen.
compute NumCats=sum(cat1 to cat5).
if cat1=1 and NumCats=1 onlyCat1=1.
execute.
* if instead you wish to do the same check for each of the 5 categories,
use `do repeat` this way.
do repeat cat=cat1 to cat5/only=only1 to only5.
compute only=(cat=1 and NumCats=1).
end repeat.
execute.
But ditch the EXECUTE commands. They just cause a useless data pass in this case except for immediately updating the Data Editor (instead of updating on the next data pass).

using graph.adjacency() in R

I have a sample code in R as follows:
library(igraph)
rm(list=ls())
dat=read.csv(file.choose(),header=TRUE,row.names=1,check.names=T) # read .csv file
m=as.matrix(dat)
net=graph.adjacency(adjmatrix=m,mode="undirected",weighted=TRUE,diag=FALSE)
where I used csv file as input which contain following data:
23732 23778 23824 23871 58009 58098 58256
23732 0 8 0 1 0 10 0
23778 8 0 1 15 0 1 0
23824 0 1 0 0 0 0 0
23871 1 15 0 0 1 5 0
58009 0 0 0 1 0 7 0
58098 10 1 0 5 7 0 1
58256 0 0 0 0 0 1 0
After this I used following command to check weight values:
E(net)$weight
Expected output is somewhat like this:
> E(net)$weight
[1] 8 1 10 1 15 1 1 5 7 1
But I'm getting weird values (and every time different):
> E(net)$weight
[1] 2.121996e-314 2.121996e-313 1.697597e-313 1.291034e-57 1.273197e-312 5.092790e-313 2.121996e-314 2.121996e-314 6.320627e-316 2.121996e-314 1.273197e-312 2.121996e-313
[13] 8.026755e-316 9.734900e-72 1.273197e-312 8.027076e-316 6.320491e-316 8.190221e-316 5.092790e-313 1.968065e-62 6.358638e-316
I'm unable to find where I am going wrong.
Please help me get the expected result, and please also explain why the output is so strange and why it differs every time I run the code.
Thanks,
Nitin
Just a small working example below, much clearer than CSV input.
library('igraph');
adjm1<-matrix(sample(0:1,100,replace=TRUE,prob=c(0.9,0.1)),nc=10);
g1<-graph.adjacency(adjm1);
plot(g1)
P.s. ?graph.adjacency has a lot of good examples (remember to run library('igraph')).
Related threads
Creating co-occurrence matrix
Co-occurrence matrix using SAC?
The problem seems to be due to the data type of the matrix elements. graph.adjacency expects elements of type numeric. Not sure if it's a bug.
After you do,
m <- as.matrix(dat)
set its mode to numeric by:
mode(m) <- "numeric"
And then do:
net <- graph.adjacency(m, mode = "undirected", weighted = TRUE, diag = FALSE)
> E(net)$weight
[1] 8 1 10 1 15 1 1 5 7 1
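The type change this answer relies on can be checked in base R. The sketch below rebuilds the matrix from the question using inline CSV text (an assumption standing in for the asker's file) and shows that `as.matrix()` on all-integer columns yields an integer matrix, which `mode(m) <- "numeric"` converts to double while keeping the dimensions and names. Two further assumptions: `check.names = FALSE` is used so that row and column names match, and the optional igraph call uses `graph_from_adjacency_matrix`, the current name for `graph.adjacency`.

```r
# Inline stand-in for the CSV file from the question.
csv_text <- ",23732,23778,23824,23871,58009,58098,58256
23732,0,8,0,1,0,10,0
23778,8,0,1,15,0,1,0
23824,0,1,0,0,0,0,0
23871,1,15,0,0,1,5,0
58009,0,0,0,1,0,7,0
58098,10,1,0,5,7,0,1
58256,0,0,0,0,0,1,0"

dat <- read.csv(text = csv_text, header = TRUE, row.names = 1,
                check.names = FALSE)
m <- as.matrix(dat)

# All columns were read as integers, so the matrix is stored as integer.
type_before <- typeof(m)

# Convert the storage to double; dim and dimnames are preserved.
mode(m) <- "numeric"
type_after <- typeof(m)

cat("before:", type_before, " after:", type_after, "\n")

# Optional: build the graph only if igraph is installed.
if (requireNamespace("igraph", quietly = TRUE)) {
  net <- igraph::graph_from_adjacency_matrix(
    m, mode = "undirected", weighted = TRUE, diag = FALSE
  )
  print(igraph::E(net)$weight)
}
```

Whether a given igraph version still needs the conversion is not something this sketch establishes; it only shows what the suggested `mode<-` step does to the matrix.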

Resources