Defining a workflow for importing the RNA-seq count data - r

I am getting started with R and have read up on the basics and syntax.
Now I am using miodin to define a project and a case-control study design.
library(miodin)
mp <- MiodinProject(name = "MyProject", author = "Myself", path ="." )
mshow(mp)
I have a file named "randseq" on my computer's hard disk which looks like this:
ID LineA_1 LineA_2 LineA_3 LineA_4 LineA_5 LineB_1 LineB_2 LineB_3 LineB_4 LineB_5 LineB_6 LineB_7 LineB_8 LineB_9
ENSG00000000003 23 1 0 0 0 1 0 0 0 0 0 3 3 0
ENSG00000000005 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000000419 0 0 0 0 0 0 0 0 4 0 0 0 0 0
Now I want to define a workflow, using the study design, for importing the RNA-seq count data from that file, which is in a folder named analysis_with_r. The workflow should then be executed and the dataset exported to the project folder. Below is my code for it:
mw <- MiodinWorkflow(name = 'MyProject')
mw <- mw + downloadRepositoryData(
name = 'RNA downloader',
accession = 'randseq',
repository = '/Users/aarf/Desktop/analysis_with_r/randseq.txt',
path = 'data',
type = 'processed'
)
mw <- insert(mw,mp)
mshow(mw)
mw <- execute(mw)
saveDataFile(mp)
export(mp, 'dataset', 'randseq')
After running this code I get this error:
[INFO] Module terminated with the following error
[ERROR] Unknown repository/Users/aarf/Desktop/analysis_with_r/randseq.txt
[INFO] 1 modules were not executed
[STATUS] Execution finished
Can anybody tell me what I am doing wrong here?
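The "Unknown repository" message suggests that the file path passed as repository is being treated as the name of a repository rather than as a local file. Independently of miodin, it can help to confirm that the count file itself parses cleanly. The following is a minimal base-R sketch, assuming the file is whitespace- or tab-delimited with an ID column as shown above; the inline text is a shortened stand-in for randseq.txt, and the grouping by column-name prefix is an illustrative assumption about the case-control design.

```r
# Shortened stand-in for randseq.txt (assumed whitespace- or tab-delimited)
counts_text <- "ID LineA_1 LineA_2 LineB_1
ENSG00000000003 23 1 1
ENSG00000000005 0 0 0
ENSG00000000419 0 0 0"

# For the real file, replace text = counts_text with the path, e.g.
# read.table("/Users/aarf/Desktop/analysis_with_r/randseq.txt", header = TRUE, row.names = "ID")
counts <- read.table(text = counts_text, header = TRUE,
                     row.names = "ID", check.names = FALSE)

# Genes in rows, samples in columns
count_matrix <- as.matrix(counts)

# Assumption: the group label is the sample name without its trailing "_<number>"
group <- sub("_\\d+$", "", colnames(count_matrix))

print(dim(count_matrix))
print(table(group))
```

If this reads the real file without warnings, the data are fine and the remaining problem is only in how the workflow module is pointed at local data.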

Related

How to import and transform adjacency matrix to R edge list?

A sample of my data can be seen below. The data contains information about ties between organizations (over 2000 organizations, the csv file has 0s and 1s, and empty cells)
A2654 B0004 B0188 B1278 B1372 B1722 B2503
A2654 0 1 0 0 0 1 0
B0004 1 0 0 0 0 1 0
B0188 0 0 0 0 0 0 0
B1278 0 0 0 0 0 0 0
B1372 0 0 0 0 0 0 0
B1722 1 1 0 0 0 0 0
(1) The first problem is that I can't import this data (.csv) into R
I ran the following code: dt <- read_csv2("Org_ties.csv"). The problem here is that while in the csv file the first column header is left empty (as it should be), when reading it into R, read_csv2() generates a label for this column, "X1". I do this in order to run the next code: g=graph_from_adjacency_matrix(dtmtrx, mode="directed", weighted = T) to produce a graph. However, I get the error message below. I think it has to do with the fact that I can't read the file properly.
graph.adjacency.dense(adjmatrix, mode = mode, weighted = weighted, :
not a square matrix
In addition: Warning message:
In mde(x) : NAs introduced by coercion
(2) Another puzzling thing is that I cannot seem to transform the current data structure into an edge list. How can I do that? The edge list looks something like this
V1 V2 weight
A2654 B0004 1
A2654 B0188 0
A2654 B1278 0
A2654 B1372 0
A2654 B1722 1
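One way to approach both points, sketched in base R only: read the first column as row names so that the matrix is purely numeric and square, then flatten it into an edge list. The inline text is a small stand-in for Org_ties.csv, assumed to be semicolon-separated because read_csv2 was used; treating empty cells as 0 and dropping self-ties are also assumptions.

```r
# Small stand-in for Org_ties.csv (assumed semicolon-separated, empty first header cell)
csv_text <- ";A2654;B0004;B0188
A2654;0;1;0
B0004;1;0;0
B0188;0;;0"

# row.names = 1 turns the label column into row names instead of a data column
dt <- read.csv2(text = csv_text, row.names = 1, check.names = FALSE)
m <- as.matrix(dt)
m[is.na(m)] <- 0                      # assumption: empty cells mean "no tie"
stopifnot(nrow(m) == ncol(m))         # an adjacency matrix must be square

# Flatten to one row per (from, to) pair
edges <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(edges) <- c("V1", "V2", "weight")
edges <- edges[edges$V1 != edges$V2, ]            # assumption: self-ties are not wanted
edges <- edges[order(edges$V1, edges$V2), ]
rownames(edges) <- NULL
print(edges)
```

The numeric matrix m can also be passed to graph_from_adjacency_matrix() directly, since it no longer contains the label column that made it non-square.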

How can I pull player stats from a tabbed ESPN table?

I've been reading through a couple of the other useful guides on pulling player and match data from ESPN using R, however I have come across a problem with tabbed tables. As shown here on the player stats for a recent rugby game, the player statistics table is tabbed into 'Scoring', 'Attacking', 'Defending' and 'Discipline'.
Using the following code (with the help of two lovely packages, RCurl and htmltab), I can pull out the first tab ('Scoring') from that page ...
# install & attach RCurl
if (!base::require(package="RCurl")) utils::install.packages("RCurl")
library(RCurl)
# install & attach htmltab
if (!base::require(package="htmltab")) utils::install.packages("htmltab")
library(htmltab)
# assign URL
theurl <- RCurl::getURL("https://www.espn.co.uk/rugby/playerstats?gameId=294854&league=270557",.opts = list(ssl.verifypeer = FALSE))
# pull tables from url
team1 <- htmltab::htmltab(theurl,which=1)
team2 <- htmltab::htmltab(theurl,which=2)
league <- htmltab::htmltab(theurl,which=3)
... in the following format, which is exactly what I wanted ...
team1
rowID LEINS Tx TA CG PG PTS
2 J LarmourFB 0 0 0 0 0 0
3 H KeenanW 0 0 0 0 0 0
4 G RingroseC 0 0 0 0 0 0
5 R HenshawC 1 0 0 0 0 5
6 J LoweW 1 0 0 0 0 5
7 R ByrneFH 0 0 2 2 0 10
8 J Gibson-ParkSH 0 1 0 0 0 0
9 C HealyP 0 0 0 0 0 0
10 R KelleherH 0 0 0 0 0 0
11 A PorterP 0 0 0 0 0 0
... however I seem unable to pull out any tab other than 'Scoring'. I'm sure I'm missing something really obvious, so would appreciate someone pointing out where I'm going wrong!
Thanks in advance!
If you check the HTML source of the page, you will see that the data is not there at the start. You can find a data-reactid tag that indicates that the data is only loaded once you click on the new tab. So you will need to find a way to make that click on the second tab.
One option for you might be to use Selenium: https://www.rdocumentation.org/packages/RSelenium/versions/1.7.7
This would enable you to make the necessary button click.
A sample can be found here: https://www.r-bloggers.com/2014/12/scraping-with-selenium/

How to fix rows order with pheatmap?

I have generated a heatmap with pheatmap and, for some reason, I want the rows to appear in a predefined order.
I saw in previous posts that the solution is to set the parameter cluster_rows to FALSE and to arrange the matrix in the order we want, like this in my case:
Otu0085 Otu0086 Otu0087 Otu0088 Otu0091
AB200 0 0 0 0 0
2 91 0 2 1 0
20CF360 0 1 0 1 0
19CF359 0 0 0 2 0
11VP12 0 0 0 0 155
11VP04 4 1 0 0 345
However, when I do:
pheatmap(shared, cluster_rows = FALSE)
My rows are sorted alphabetically, like this:
10CF278a
11
11AA07
11CF278b
11VP03
11VP04
11VP05
11VP06
11VP08
11VP09
Any suggestions would be welcome.
Thanks in advance.
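With cluster_rows = FALSE the rows are drawn in the order they have in the matrix, so it may be worth checking that the object passed to pheatmap is really in the intended order. A minimal sketch, using a toy matrix built from the labels above (the values and the desired_order vector are for illustration only): index the matrix by a character vector of row names before plotting.

```r
# Toy matrix with the labels from the question; rows start out in alphabetical order
shared <- matrix(
  c( 4, 1, 0, 0, 345,
     0, 0, 0, 0, 155,
     0, 0, 0, 2,   0,
    91, 0, 2, 1,   0,
     0, 1, 0, 1,   0,
     0, 0, 0, 0,   0),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("11VP04", "11VP12", "19CF359", "2", "20CF360", "AB200"),
    c("Otu0085", "Otu0086", "Otu0087", "Otu0088", "Otu0091")
  )
)

# The predefined order, given as row names (character, so "2" is a name, not a position)
desired_order <- c("AB200", "2", "20CF360", "19CF359", "11VP12", "11VP04")
stopifnot(all(desired_order %in% rownames(shared)))

# Reorder the matrix itself; pheatmap keeps this order when row clustering is off
shared_ordered <- shared[desired_order, , drop = FALSE]

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(shared_ordered, cluster_rows = FALSE)
}
```

Printing rownames(shared) just before the pheatmap call shows whether the alphabetical order is already present in the data, for example after a merge or an import step.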

Error return by R predict function or underlying Rcpp

I have apparently successfully used a newer R package called milr (multiple-instance logistic regression). Admittedly, I do not make any claims regarding the goodness of the model. However, when I try to use the model to predict, I get the error
Error in logit(cbind(1, newdata), .) : not compatible with requested type
when I call predict as follows:
miltp <- predict(milt, SQFM.te, SQFM.teb, type="bag") and
miltp <- predict(milt, SQFM.te, SQFM.teb)
However I get a NULL return when I call it as:
miltp <- predict(milt, SQFM.te, SQFM.teb, type="response") and
miltp <- predict(milt, SQFM.te, SQFM.teb, type="class")
I have tried using factors, integers and numerics, and I am perplexed. My online search only yielded
Rcpp: Error: not compatible with requested type
which is not helpful for me, as R and C++ internals are over my head. All comments are appreciated. Some input info is given below; I have tried some conversions.
str(SQFM.te)
'data.frame': 100369 obs. of 5 variables:
$ arstmade: int 0 0 0 0 0 0 0 0 0 0 ...
$ perstop : int 0 0 0 0 0 0 0 0 0 0 ...
$ trhsloc : int 0 0 0 0 0 0 0 0 0 0 ...
$ acrept : int 0 0 0 0 0 0 0 0 0 0 ...
$ radio : int 1 1 1 1 1 1 1 1 1 1 ...
str(SQFM.teb)
int [1:100369] 3 3 3 3 3 3 3 3 3 3 ...
print(milt)
Coefficients:
intercept arstmade perstop trhsloc acrept radio
-1.69306 -0.09544 -7.95369 -0.53375 0.16506 -0.61778
Residual Deviance: Inf
BIC: Inf
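One thing that may be worth trying, purely as an assumption about the cause: "not compatible with requested type" is a generic Rcpp message raised when compiled code receives an object of a different type than it expects, and a data.frame of integer columns is a common trigger when a numeric matrix is expected. The sketch below only shows the conversion; the toy data frame te stands in for SQFM.te, and whether milr's predict method requires this is not confirmed here.

```r
# Toy stand-in for SQFM.te: a data.frame of integer columns, as in the str() output
te <- data.frame(
  arstmade = c(0L, 0L, 1L),
  perstop  = c(0L, 5L, 0L),
  radio    = c(1L, 1L, 0L)
)

# Convert to a plain matrix and force double storage
X <- as.matrix(te)
storage.mode(X) <- "double"
str(X)

# The call from the question would then become (not run here, milt is not available):
# miltp <- predict(milt, X, SQFM.teb, type = "bag")
```

If the error persists with a double matrix, comparing colnames(X) with the coefficient names printed by the model is a reasonable next check.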

UFF58 File reader using R Program

I have an input UFF file with n channels. I want to read the UFF file and split the values by individual channel, then store the result for each channel in a separate file. Each channel always starts with '-1', '58', etc., and ends with '-1'.
Example channel_01 from the input UFF file:
-1
58
filename
22-Mar-2016 10:16:53
164
MnBrgFr-AC225R/N;50.9683995923 mV/m/s2
0 0 0 0 channel_01 0 0 NONE 0 0
2 1048576 1 0.00000E+00 8.19669930804e-06 0.00000E+00
17 0 0 0 Time s
1 0 0 0 MnBrgFr-AC225R/N m/s2
0 0 0 0 NONE NONE
0 0 0 0 NONE NONE
392.665124452 392.659048025 392.658404832 392.661676933 392.665882251 392.671989083
392.67634175 392.673743248 392.672398388 392.669360175 392.665533757 392.66088639
392.660390546 392.660975268 392.663400693 392.662668621 392.661209156 392.65498538
392.649463269 392.649580214 392.649259786 392.658580248 392.664715147 392.667051694
-1
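A minimal base-R sketch of the splitting step, assuming every channel is delimited by a line containing only -1 at both ends and that no data line consists of the single value -1. The function name split_uff58, the shortened demo input, and the output file names are all hypothetical.

```r
# Split UFF text into dataset blocks delimited by "-1" lines and keep type 58 blocks
split_uff58 <- function(lines) {
  trimmed <- trimws(lines)
  marks <- which(trimmed == "-1")
  stopifnot(length(marks) %% 2 == 0)        # delimiters must come in open/close pairs
  starts <- marks[c(TRUE, FALSE)]           # 1st, 3rd, ... delimiter opens a block
  ends   <- marks[c(FALSE, TRUE)]           # 2nd, 4th, ... delimiter closes it
  blocks <- Map(function(s, e) lines[s:e], starts, ends)
  is58 <- vapply(blocks, function(b) trimws(b[2]) == "58", logical(1))
  blocks[is58]
}

# Shortened two-channel input; for a real file use readLines("<path to the UFF file>")
demo <- c(
  "-1", "58", "0 0 0 0 channel_01 0 0 NONE 0 0", "392.665124452 392.659048025", "-1",
  "-1", "58", "0 0 0 0 channel_02 0 0 NONE 0 0", "1.5 2.5", "-1"
)

blocks <- split_uff58(demo)

# Write each channel, delimiters included, to its own file
out_dir <- tempdir()
out_files <- file.path(out_dir, sprintf("channel_%02d.uff", seq_along(blocks)))
for (i in seq_along(blocks)) {
  writeLines(blocks[[i]], out_files[i])
}
print(out_files)
```

Extracting only the numeric values would additionally require skipping the header records of each block, whose count is not assumed here.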

Resources