I cannot load data from an R package installed from R-Forge

I am trying to figure out the 'factorAnalytics' package from R-Forge. Everything seems to install fine, but when I try to walk through an example, I cannot load the data.
install.packages("factorAnalytics", repos="http://R-Forge.R-project.org")
This works fine and everything is installed. I now try to walk through the example in the vignette:
# Load fundamental and return data
data(Stock.df)
# fit a fundamental factor model
exposure.vars <- c("BOOK2MARKET", "LOG.MARKETCAP")
fit <- fitFfm(data=stock, asset.var="TICKER", ret.var="RETURN",
date.var="DATE", exposure.vars=exposure.vars)
names(fit)
Which provides this:
Warning message:
In data(Stock.df) : data set ‘Stock.df’ not found
But if I run this:
data(package = "factorAnalytics")
I see that the dataset should be there...
Data sets in package ‘factorAnalytics’:
factors.M (CommonFactors) Factor set of several commonly used factors
factors.Q (CommonFactors) Factor set of several commonly used factors
managers Hypothetical Alternative Asset Manager and Benchmark Data
r.M (StockReturns) Stock Return Data
r.W (StockReturns) Stock Return Data
stock (Stock.df) Fundamental and return data for 447 NYSE stocks
tr.yields (TreasuryYields) Treasury yields at different maturities
So what am I missing??

I am an idiot. I did not load the package with library() after installing it. Sorry all.
I just ran this, and data(Stock.df) then worked:
library("factorAnalytics")
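For reference, a minimal sketch of the fix, assuming factorAnalytics is installed as in the question. Per the data(package = "factorAnalytics") listing, the data file Stock.df supplies an object named stock. The second option, passing package= to data() instead of attaching, is an alternative I believe works here; treat it as an assumption rather than something the vignette shows.

```r
# Sketch only: assumes factorAnalytics is installed (e.g. from R-Forge as above).
pkg_installed <- requireNamespace("factorAnalytics", quietly = TRUE)

if (pkg_installed) {
  # Option 1: attach the package first, then load the data file by name.
  library("factorAnalytics")
  data(Stock.df)  # per the listing above, this creates the object `stock`

  # Option 2: name the package explicitly, which does not need it attached.
  data("Stock.df", package = "factorAnalytics")

  str(stock)
} else {
  message("factorAnalytics is not installed; install it before running this.")
}
```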

You can also load the data file directly:
load("E:\\R-3.3.1\\library\\factorAnalytics\\data\\Stock.df.RData")
Replace the path with wherever "Stock.df.RData" is located on your system.
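Rather than hard-coding the path, you can ask R where the package keeps its data. This is a sketch that assumes the package is installed; whether individual .RData files are present depends on how the package was built.

```r
# Ask R where the package's data directory is instead of hard-coding a path.
# system.file() returns "" when the package or the directory is not found.
data_dir <- system.file("data", package = "factorAnalytics")

if (nzchar(data_dir)) {
  # Depending on the build, this may list individual .RData files or a single
  # lazy-load database (Rdata.rdb / Rdata.rds) that load() cannot read directly.
  print(list.files(data_dir))
} else {
  message("No data directory found; is factorAnalytics installed?")
}
```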

Related

ViSEAGO tutorial: visualising topGO object

Earlier, I posted a question and, after some help, was able to load my data and create a topGO object. I'm trying to visualise GO terms that are significantly associated with a list of differentially expressed genes from mouse RNA-seq data.
Now I'd like to raise a concern about ViSEAGO's tutorial. It starts by loading two files, 'selection.txt' and 'background.txt', whose origin is not clearly stated. After a lot of digging into topGO's documentation, I was able to find the data types for each file. Even after following these, though, I have a problem running the following code. Does anyone have any insights to share?
WORKING CODE:
mysampleGOdata <- new("topGOdata",
                      description = "my Simple session",
                      ontology = "BP",
                      allGenes = geneList_new,
                      nodeSize = 1,
                      annot = annFUN.org,
                      mapping = "org.Mm.eg.db",
                      ID = "SYMBOL")
resultFisher <- runTest(mysampleGOdata, algorithm = "classic", statistic = "fisher")
head(GenTable(mysampleGOdata,fisher=resultFisher),20)
myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
PROBLEMS:
> head(myNewBP,2)
       GO.ID                      Term Annotated Significant Expected  fisher
1 GO:0006006 glucose metabolic process       194          12     0.19 1.0e-19
2 GO:0019318  hexose metabolic process       223          12     0.22 5.7e-19
> ###################
> # merge results
> myBP_sResults<-ViSEAGO::merge_enrich_terms(
+ Input=list(
+ condition=c("mysampleGOdata","resultFisher")
+ )
+ )
Error in setnames(x, value) :
Can't assign 3 names to a 2 column data.table
> myNewBP<-GenTable(mysampleGOdata,fisher=resultFisher)
> ###################
> # display the merged table
> ViSEAGO::show_table(myNewBP)
Error in ViSEAGO::show_table(myNewBP) :
object must be enrich_GO_terms, GO_SS, or GO_clusters class objects
According to the tutorial, the printed table contains, for each enriched GO term, additional columns including the list of significant genes and the frequency (the ratio of the number of significant genes to the number of background genes) evaluated by comparison. I think I have that, but it's definitely not working.
Can someone see why? I'm not very clear on this.
Thanks!
I think you are trying to work around an error you made at the beginning. You get the error because you did not use the wrapper function from the ViSEAGO package (ViSEAGO::create_topGOdata, shown below) to build the topGO object. As you stated in your last question, you had initial problems formatting your data.
Here are some tips:
The "selection" file is a character vector with the names or IDs of your DEGs. I recommend using Entrez IDs.
The "background" file is a character vector with known genes. I recommend using Entrez IDs as well. You can easily generate this character vector with:
background <- keys(org.Hs.eg.db, keytype = "ENTREZID")
With these two files, you can easily proceed to the next steps of the package as described in the vignette.
# connect to EntrezGene
EntrezGene <- ViSEAGO::EntrezGene2GO()
# load GO annotations from EntrezGene
# id = "9606" is Homo sapiens
myGENE2GO <- ViSEAGO::annotate(id = "9606", EntrezGene)
# build the topGOdata object with the ViSEAGO wrapper
BP <- ViSEAGO::create_topGOdata(
    geneSel = selection,    # your DEG vector
    allGenes = background,  # your created background vector
    gene2GO = myGENE2GO,
    ont = "BP",
    nodeSize = 5
)
# run the enrichment test
classic <- topGO::runTest(
    BP,
    algorithm = "classic",
    statistic = "fisher"
)
# merge results
BP_sResults <- ViSEAGO::merge_enrich_terms(
    Input = list(
        condition = c("BP", "classic")
    )
)
You should get a merged list of your enriched GO terms with the corresponding statistical tests you prefer.
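Since the question is about mouse data, here is a hedged sketch of how the two input vectors could be built for mouse. It assumes the Bioconductor packages AnnotationDbi and org.Mm.eg.db are installed; deg_symbols is a hypothetical stand-in for your own list of DEG symbols. For mouse, the taxonomy id passed to ViSEAGO::annotate would be "10090" rather than "9606".

```r
# Sketch only: assumes AnnotationDbi and org.Mm.eg.db are installed.
# `deg_symbols` is a hypothetical stand-in for your own DEG gene symbols.
deg_symbols <- c("Gck", "Pck1", "Hk2")

if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  mouse_db <- org.Mm.eg.db::org.Mm.eg.db

  # Background: every Entrez ID known to the mouse annotation package.
  background <- AnnotationDbi::keys(mouse_db, keytype = "ENTREZID")

  # Selection: map DEG symbols to Entrez IDs and drop unmapped symbols.
  mapped <- AnnotationDbi::mapIds(mouse_db, keys = deg_symbols,
                                  column = "ENTREZID", keytype = "SYMBOL")
  selection <- unname(mapped[!is.na(mapped)])
}
```

Note that ViSEAGO::show_table expects the merged object (here BP_sResults), not the data frame returned by GenTable, which is why the second error in the question appears.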
I faced this problem recently, and it was very frustrating. In my case the whole issue seemed to be related to the package version I was using.
I used conda to install ViSEAGO. However, the R version in my conda environment was a bit old (3.6.1, to be specific), so conda installed version 1.0.0 of the package. Please note that the most recent version of ViSEAGO is 1.4.0.
I therefore created a conda environment with R version 4.0.3 and repeated the procedure to install ViSEAGO with conda. This time version 1.4.0 of ViSEAGO was installed, and everything went fine.
I've tried to backtrack the error and found only one thing: in the older ViSEAGO version, the function Custom2GO loaded tables with 4 columns; in the most recent version it accepts 5 columns (the new one being 'gene_symbol'). I think this mismatch might be part of the issue, as the source code of the function merge_enrich_terms seems to deal with the columns 'gene_id' and 'gene_symbol' at some point, but I'm not sure.
Hope you find my comment helpful!
Cheers,
Mauricio

ClusterLongData kml package export to csv

I am clustering time series in R using the kml package. I have read both the manual and the paper on how to use this package, but I'm not clear on how to export the results as a data frame in which each trajectory is assigned to a cluster, e.g.
trajectory (i), time1, time2, time3, clustername
I have read several answers here, including "Output from 'choice' in R's kml",
but if I do the same (run choice(myCld, typeGraph= "bmp")), R says:
~ Choice : menu ~ 'Arrow' : change partition 'Space' : select/unselect a partition ... etc. e : change the display (both)
~ 'Return' when its done ~
The only thing that is saved to my library is myCld.Rdata, and it runs for a very long time without any further results (my dataset: N trajectories, with time = 1:53). I want to get the csv files that the manual describes (objectName-Cx-y-Clusters.csv).
I am also not clear on WHERE I should press 'Return' or 'Arrow'. There is nothing to press in my RStudio workspace.
I am really a beginner with R, so any help would be appreciated. Thanks!
I am not sure that kml is compatible with RStudio. Older versions of RStudio did not handle instructions such as getGraphicsEvent, which choice uses.
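If the interactive choice() menu is not usable, one possible workaround is to skip it and pull the cluster labels out directly. The sketch below is an assumption-laden illustration: export_clusters is a hypothetical helper (not part of kml), and it assumes kml::getClusters(object, nbCluster) returns one label per trajectory, in the object's trajectory order, after kml::kml() has been run for that number of clusters.

```r
# Hypothetical helper, not part of kml: write one cluster label per trajectory.
# Assumes `cld_obj` is a ClusterLongData object on which kml::kml() has already
# been run for `n_clusters` clusters.
export_clusters <- function(cld_obj, n_clusters, file) {
  if (!requireNamespace("kml", quietly = TRUE)) {
    stop("The kml package is required.")
  }
  clusters <- kml::getClusters(cld_obj, n_clusters)
  out <- data.frame(cluster = as.character(clusters))
  write.csv(out, file, row.names = FALSE)
  invisible(out)
}

# Example call, assuming `myCld` from the question:
# kml::kml(myCld, nbClusters = 2:6)
# export_clusters(myCld, 3, "myCld-clusters.csv")
```

The resulting column can then be bound to the original trajectory data frame, provided the row order matches.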

Correct way to handle dataset dependencies in package development?

I'm attempting to build a package that depends on some data from another package. Writing R Extensions says to avoid the use of require in package functions. I may not use all the tables in the Lahman package, and am currently importing them this way...
team.batting <- function(year, league, playoffs = FALSE)
{
...
Batting <- Lahman::Batting
Teams <- Lahman::Teams
## calculations, subsets, etc.
...
}
Is this correct? If not, what is the correct way to call an exported data set in a package function? And is the end user required to have the package installed for this to work?
Also, I'm not really clear on what a development version is, as compared to an installed version. If anyone could shed some light, I'd appreciate it.
After some research, I've determined that the correct way to do this is to include the directive
import(Lahman)
in the NAMESPACE file of my package (or possibly importFrom(Lahman, table name) depending on how many tables are used). After doing this, the :: calls can be removed.
team.batting <- function(year, league, playoffs = FALSE)
{
...
bat <- Batting
tms <- Teams
## calculations, subsets, etc.
...
}
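One caveat, as far as I know: lazy-loaded data sets are not namespace exports, so a NAMESPACE import may not make Batting visible inside the function unless Lahman is also attached. A sketch that keeps the :: access from the question avoids relying on that. Here batting_for_year and the toy data are hypothetical names for illustration; in a real package, Lahman would be listed under Imports in DESCRIPTION so that it is installed along with your package.

```r
# Sketch: `batting` defaults to Lahman::Batting, which is evaluated only when
# the argument is not supplied, so Lahman is needed only in that case.
batting_for_year <- function(year, batting = Lahman::Batting) {
  batting[batting$yearID == year, , drop = FALSE]
}

# Toy stand-in data (hypothetical values) so the example runs without Lahman.
toy <- data.frame(yearID = c(2011, 2012, 2012), HR = c(10, 20, 30))
batting_for_year(2012, toy)
```

Passing the table as an argument with a default also makes the function easy to test with small data frames.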

How can I bin data in bigvis package R for a non-numeric data set?

I am trying to use bin() on my big data set. I am using the Lahman data set as an example: http://www.seanlahman.com/baseball-archive/statistics/
I am using the comma-delimited version and looking at the 'Batting' csv file. My data set will be much, much bigger, but if my program cannot handle this, it can't handle my bigger data set.
This is what I am trying to do currently:
> require(devtools)
> require(bigvis)
> bigData <- read.csv("GCdataViz/lahman2012-csv/Batting.csv")
> bigDataNum <- bigData[,sapply(bigData,is.numeric)]
> bin(bigDataNum)
Error: is.numeric(x) is not TRUE
I first got an error when I tried to use bin() because my data set wasn't all numeric. So I used sapply() with the is.numeric parameter, but still got the error that my data set wasn't numeric.
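The error persists because a data frame is not a numeric vector, even when all of its columns are numeric. The sketch below shows this with toy values standing in for the Batting columns. The bigvis part assumes the package is installed and that bin() and condense() accept a single numeric vector and a bin width, as in the package README.

```r
# A data frame is a list of columns, so is.numeric() is FALSE for it even when
# every column is numeric; a single column is a plain numeric vector.
# Toy values (hypothetical) standing in for columns of Batting.csv.
bigDataNum <- data.frame(AB = c(4, 120, 350, 512), HR = c(0, 3, 11, 27))

is.numeric(bigDataNum)     # FALSE
is.numeric(bigDataNum$AB)  # TRUE

# Assumption: bigvis is installed; bin one column at a time with a bin width.
if (requireNamespace("bigvis", quietly = TRUE)) {
  binned <- bigvis::condense(bigvis::bin(bigDataNum$AB, 50))
  print(binned)
}
```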
The bigvis library doesn't have much documentation. Should I smooth() after the condense method, or go ahead and autoplot()? Is there any way I can specify plot types such as bar graphs, line graphs, box plots, etc.?
EDIT: my error:

Are there known compatibility issues with R package mgcv? Are there general rules for compatibility?

I use R version 2.15.1 (2012-06-22) and mgcv version 1.7-22
I load the following set of packages in R:
library(sqldf)
library(timeDate)
library(forecast)
library(xts)
library(tseries)
library(MASS)
library(mgcv)
I cannot run even a simple model (I omit the code). Even the sample code taken from the help pages:
dat = gamSim(1,n=400,dist="normal",scale=2)
b = gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
gives an error:
Error in qr.qty(qrc, sm$S[[l]]) :
NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In smoothCon(split$smooth.spec[[i]], data, knots, absorb.cons, scale.penalty = scale.penalty, :
number of items to replace is not a multiple of replacement length
Note that everything works fine if I just load the package mgcv and then use the sample code right away. It also works if I load all the packages and then run the sample code. It only fails if I load all packages, then do some file reading, sqldf statements, ts operations and some models from package forecast, and only then apply GAM.
Apparently the variable definitions in the global environment interfere with the package.
Are there any known issues? Are there general rules that I have to obey if I load various packages? Can I write code that "disturbed" the package mgcv?
@Richard there are two GAM-related packages: gam and mgcv. Loading both at the same time usually causes a conflict.
Loading mgcv as the first package solved my problem ... strange but true.
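To narrow down this kind of problem, it can help to list which names are masked and to call the mgcv functions through their namespace. This is a diagnostic sketch, assuming mgcv is installed; it does not claim to identify the actual cause of the error above.

```r
# List names that are defined in more than one place on the search path,
# including objects in the global environment that shadow package functions.
masked <- conflicts(detail = FALSE)
print(masked)

# Calling through the namespace avoids picking up a same-named gam() from
# another attached package (assumes mgcv is installed).
if (requireNamespace("mgcv", quietly = TRUE)) {
  dat <- mgcv::gamSim(1, n = 400, dist = "normal", scale = 2)
  b <- mgcv::gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
  print(class(b))
}
```

If the model fits in a fresh session but not after the other steps, comparing the output of conflicts() in both sessions shows which names changed.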
