getting Error while installing DMwR package - r

hi i am getting this error message while installing DMwR package from RGUI-3.3.1.
Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
cannot open the connection
In addition: Warning messages:
1: In unzip(zipname, exdir = dest) : error 1 in extracting from zip file
2: In read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
cannot open compressed file 'bitops/DESCRIPTION', probable reason 'No such file or directory'

Approach 1:
The error being reported is inability to open a connection. In Windows that is often a firewall problem and is in the Windows R FAQ. The usual first attempt should be to run internet2.dll. From a console session you can use:
setInternet2(TRUE)
NEWS for R version 3.3.1 Patched (2016-09-13 r71247)
(Windows only) Function
setInternet2()
has no effect and will be removed in due
course. The choice between methods
"internal"
and
"wininet"
is now made by the
method
arguments of
url()
and
download.file()
and their defaults can be set
via
options. The out-of-the-box default remains
"wininet"
(as it has been since
R
3.2.2)
You are using version 3.3.1, this is why it is not working anymore.
Approach 2
The error is suggesting that the package requires another package bitops that is not available. That package is not in any of the dependencies but perhaps one of the dependencies requires it in turn(In this case, it is: ROCR).
Try installing:
install.packages("bitops",repos="https://cran.r-project.org/bin/windows/contrib/3.3/bitops_1.0-6.zip",dependencies=TRUE,type="source")
The package DMwR contains packages abind, zoo, xts, quantmod and ROCR as imports. So, additionally to installing 5 packages you must install DMwR package, Install these packages manually.
Install packages in following sequence:
install.packages('abind')
install.packages('zoo')
install.packages('xts')
install.packages('quantmod')
install.packages('ROCR')
install.packages("DMwR")
library("DMwR")
Approach 3:
chooseCRANmirror()
Select CRAN mirror from popup list. Then install packages:
install.packages("bitops")
install.packages("DMwR")

Package ‘DMwR’ was removed from the CRAN repository.
Formerly available versions can be obtained from the archive.
https://CRAN.R-project.org/package=DMwR
You can use the function as written in CRAN package. Copy the following code in a new RScript, run it and save it for future use if you want. Once you run this function, you should be able to use the way you have been trying to use it.
# ===================================================
# Creating a SMOTE training sample for classification problems
#
# If called with learner=NULL (the default) is does not
# learn any model, simply returning the SMOTEd data set
#
# NOTE: It does not handle NAs!
#
# Examples:
# ms <- SMOTE(Species ~ .,iris,'setosa',perc.under=400,perc.over=300,
# learner='svm',gamma=0.001,cost=100)
# newds <- SMOTE(Species ~ .,iris,'setosa',perc.under=300,k=3,perc.over=400)
#
# L. Torgo, Feb 2010
# ---------------------------------------------------
SMOTE <- function(form,data,
perc.over=200,k=5,
perc.under=200,
learner=NULL,...
)
# INPUTS:
# form a model formula
# data the original training set (with the unbalanced distribution)
# minCl the minority class label
# per.over/100 is the number of new cases (smoted cases) generated
# for each rare case. If perc.over < 100 a single case
# is generated uniquely for a randomly selected perc.over
# of the rare cases
# k is the number of neighbours to consider as the pool from where
# the new examples are generated
# perc.under/100 is the number of "normal" cases that are randomly
# selected for each smoted case
# learner the learning system to use.
# ... any learning parameters to pass to learner
{
# the column where the target variable is
tgt <- which(names(data) == as.character(form[[2]]))
minCl <- levels(data[,tgt])[which.min(table(data[,tgt]))]
# get the cases of the minority class
minExs <- which(data[,tgt] == minCl)
# generate synthetic cases from these minExs
if (tgt < ncol(data)) {
cols <- 1:ncol(data)
cols[c(tgt,ncol(data))] <- cols[c(ncol(data),tgt)]
data <- data[,cols]
}
newExs <- smote.exs(data[minExs,],ncol(data),perc.over,k)
if (tgt < ncol(data)) {
newExs <- newExs[,cols]
data <- data[,cols]
}
# get the undersample of the "majority class" examples
selMaj <- sample((1:NROW(data))[-minExs],
as.integer((perc.under/100)*nrow(newExs)),
replace=T)
# the final data set (the undersample+the rare cases+the smoted exs)
newdataset <- rbind(data[selMaj,],data[minExs,],newExs)
# learn a model if required
if (is.null(learner)) return(newdataset)
else do.call(learner,list(form,newdataset,...))
}
# ===================================================
# Obtain a set of smoted examples for a set of rare cases.
# L. Torgo, Feb 2010
# ---------------------------------------------------
smote.exs <- function(data,tgt,N,k)
# INPUTS:
# data are the rare cases (the minority "class" cases)
# tgt is the name of the target variable
# N is the percentage of over-sampling to carry out;
# and k is the number of nearest neighours to use for the generation
# OUTPUTS:
# The result of the function is a (N/100)*T set of generated
# examples with rare values on the target
{
nomatr <- c()
T <- matrix(nrow=dim(data)[1],ncol=dim(data)[2]-1)
for(col in seq.int(dim(T)[2]))
if (class(data[,col]) %in% c('factor','character')) {
T[,col] <- as.integer(data[,col])
nomatr <- c(nomatr,col)
} else T[,col] <- data[,col]
if (N < 100) { # only a percentage of the T cases will be SMOTEd
nT <- NROW(T)
idx <- sample(1:nT,as.integer((N/100)*nT))
T <- T[idx,]
N <- 100
}
p <- dim(T)[2]
nT <- dim(T)[1]
ranges <- apply(T,2,max)-apply(T,2,min)
nexs <- as.integer(N/100) # this is the number of artificial exs generated
# for each member of T
new <- matrix(nrow=nexs*nT,ncol=p) # the new cases
for(i in 1:nT) {
# the k NNs of case T[i,]
xd <- scale(T,T[i,],ranges)
for(a in nomatr) xd[,a] <- xd[,a]==0
dd <- drop(xd^2 %*% rep(1, ncol(xd)))
kNNs <- order(dd)[2:(k+1)]
for(n in 1:nexs) {
# select randomly one of the k NNs
neig <- sample(1:k,1)
ex <- vector(length=ncol(T))
# the attribute values of the generated case
difs <- T[kNNs[neig],]-T[i,]
new[(i-1)*nexs+n,] <- T[i,]+runif(1)*difs
for(a in nomatr)
new[(i-1)*nexs+n,a] <- c(T[kNNs[neig],a],T[i,a])[1+round(runif(1),0)]
}
}
newCases <- data.frame(new)
for(a in nomatr)
newCases[,a] <- factor(newCases[,a],levels=1:nlevels(data[,a]),labels=levels(data[,a]))
newCases[,tgt] <- factor(rep(data[1,tgt],nrow(newCases)),levels=levels(data[,tgt]))
colnames(newCases) <- colnames(data)
newCases
}

It has been removed from the CRAN library. There are instructions on how to retrieve it from the archive.
Either follow the link - https://packagemanager.rstudio.com/client/#/repos/2/packages/DMwR
OR copy-paste the three lines of code mentioned below:
install.packages("devtools")
devtools::install_version('DMwR', '0.4.1')
library("DMwR")
EDIT: this is the error I got while downloading the DMwR package in 2022, but looks like when the question was posted, the error happened because of another reason.

The reason is that the package 'DMwR' was built under R version 3.4.3 So the solution is actually explained in the marked answer in details. Hence, to be short:Just run the script below to get the problem solved! 
install.packages('abind')
install.packages('zoo')
install.packages('xts')
install.packages('quantmod')
install.packages('ROCR')
install.packages("DMwR")
library("DMwR")

Related

R code error "Assigned data `TL$x[TL$products == products$product]` must be compatible with existing data."

Am trying to calculate an indicator called "pesticide load index" using the package of the same name available on R. Everything goes well until I get to the
products$TL <- TL$x[TL$products==products$product]
products$FL <- FL$x[FL$products==products$product]
part. There i get the error msg :
Error:
! Assigned data TL$x[TL$products == products$product] must be compatible with existing data.
x Existing data has 6 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Run rlang::last_error() to see where the error occurred.
Here is the code used for that:
# Read-in data on Products (Fill out Table_R_substances.xlsx) ####
products <- read_excel("./Table_R_products.xlsx")
####### Health Load Computation #####
products$HL <- products$formula*(products$sum.risk.score/products$reference.sum.risk.scores)
###### Output: Load Indicator Values ######
substances$TL.products <- substances$concentration*substances$Environmental.Toxicity.Substance
substances$FL.products <- substances$concentration*substances$Fate.Load.substances
TL <- aggregate(substances$TL.products, by=list(products=substances$product), FUN=sum)
FL <- aggregate(substances$FL.products, by=list(products=substances$product),FUN=sum)
products$TL <- TL$x[TL$products==products$product]
products$FL <- FL$x[FL$products==products$product]
products$L <- products$HL + products$TL + products$FL
###### Output: Load Index Values #######
# caculate STI-Quotient
products$STI <- products$amount.applied/products$standard.doses
Any chance on how to fix this? Thanks in advance

How to apply rma() normalization to a unique CEL file?

I have implemented a R script that performs batch correction on a gene expression dataset. To do the batch correction, I first need to normalize the data in each CEL file through the Affy rma() function of Bioconductor.
If I run it on the GSE59867 dataset obtained from GEO, everything works.
I define a batch as the data collection date: I put all the CEL files having the same date into a specific folder, and then consider that date/folder as a specific batch.
On the GSE59867 dataset, a batch/folder contains only 1 CEL file. Nonetheless, the rma() function works on it perfectly.
But, instead, if I try to run my script on another dataset (GSE36809), I have some troubles: if I try to apply the rma() function to a batch/folder containing only 1 file, I get the following error:
Error in `colnames<-`(`*tmp*`, value = "GSM901376_c23583161.CEL.gz") :
attempt to set 'colnames' on an object with less than two dimensions
Here's my specific R code, to let you understand.
You first have to download the file GSM901376_c23583161.CEL.gz:
setwd(".")
options(stringsAsFactors = FALSE)
fileURL <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM901nnn/GSM901376/suppl/GSM901376%5Fc23583161%2ECEL%2Egz"
fileDownloadCommand <- paste("wget ", fileURL, " ", sep="")
system(fileDownloadCommand)
Library installation:
source("https://bioconductor.org/biocLite.R")
list.of.packages <- c("easypackages")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
listOfBiocPackages <- c("oligo", "affyio","BiocParallel")
bioCpackagesNotInstalled <- which( !listOfBiocPackages %in% rownames(installed.packages()) )
cat("package missing listOfBiocPackages[", bioCpackagesNotInstalled, "]: ", listOfBiocPackages[bioCpackagesNotInstalled], "\n", sep="")
if( length(bioCpackagesNotInstalled) ) {
biocLite(listOfBiocPackages[bioCpackagesNotInstalled])
}
library("easypackages")
libraries(list.of.packages)
libraries(listOfBiocPackages)
Application of rma()
thisFileDate <- "GSM901376_c23583161.CEL.gz"
thisDateRawData <- read.celfiles(thisDateCelFiles)
thisDateNormData <- rma(thisDateRawData)
After the call to rma(), I get the error.
How can I solve this problem?
I also tried to skip this normalization, by saving the thisDateRawData object directly. But then I have the problem that I cannot combine together this thisDateRawData (that is a ExpressionFeatureSet) with the outputs of rma() (that are ExpressionSet objects).
(EDIT: I extensively edited the question, and added a piece of R code you should be able to run on your pc.)
Hmm. This is a puzzling problem. the oligo::rma() function might be buggy for class GeneFeatureSet with single samples. I got it to work with a single sample by using lower-level functions, but it means I also had to create the expression set from scratch by specifying the slots:
# source("https://bioconductor.org/biocLite.R")
# biocLite("GEOquery")
# biocLite("pd.hg.u133.plus.2")
# biocLite("pd.hugene.1.0.st.v1")
library(GEOquery)
library(oligo)
# # Instead of using .gz files, I extracted the actual CELs.
# # This is just to illustrate how I read in the files; your usage will differ.
# projectDir <- "" # Path to .tar files here
# setwd(projectDir)
# untar("GSE36809_RAW.tar", exdir = "GSE36809")
# untar("GSE59867_RAW.tar", exdir = "GSE59867")
# setwd("GSE36809"); gse3_cels <- dir()
# sapply(paste(gse3_cels, sep = "/"), gunzip); setwd(projectDir)
# setwd("GSE59867"); gse5_cels <- dir()
# sapply(paste(gse5_cels, sep = "/"), gunzip); setwd(projectDir)
#
# Read in CEL
#
# setwd("GSE36809"); gse3_cels <- dir()
# gse3_efs <- read.celfiles(gse3_cels[1])
# # Assuming you've read in the CEL files as a GeneFeatureSet or
# # ExpressionFeatureSet object (i.e. gse3_efs in this example),
# # you can now fit the RMA and create an ExpressionSet object with it:
exprsData <- basicRMA(exprs(gse3_efs), pnVec = featureNames(gse3_efs))
gse3_expset <- new("ExpressionSet")
slot(gse3_expset, "assayData") <- assayDataNew(exprs = exprsData)
slot(gse3_expset, "phenoData") <- phenoData(gse3_efs)
slot(gse3_expset, "featureData") <- annotatedDataFrameFrom(attr(gse3_expset,
'assayData'), byrow = TRUE)
slot(gse3_expset, "protocolData") <- protocolData(gse3_efs)
slot(gse3_expset, "annotation") <- slot(gse3_efs, "annotation")
Hopefully the above approach will work in your code.

Valuation of a Vanilla Interest Rate Swap using the "RQuantLib" Package

I am modelling a Vanilla Interest Rate Swap using the "RQuantLib" Package. I am following the example given in the Cran Paper "RQuantLib". For the Fixed Leg of the Interest Rate Swap, the given R code in the example is;
bond <- list(faceAmount=100,
issueDate=as.Date("2004-11-30"),
maturityDate=as.Date("2008-11-30"),
redemption=100,
effectiveDate=as.Date("2004-11-30"))
dateparams <- list(settlementDays=1,
calendar="us", dayCounter = 'Thirty360', period=2,
businessDayConvention = 4, terminationDateConvention=4,
dateGeneration=1, endOfMonth=1)
coupon.rate <- c(0.02875)
params <- list(tradeDate=as.Date('2002-2-15'),
settleDate=as.Date('2002-2-19'),
dt=.25,
interpWhat="discount",
interpHow="loglinear")
setEvaluationDate(as.Date("2004-11-22"))
discountCurve.flat <- DiscountCurve(params, list(flat=0.05))
FixedRateBond(bond, coupon.rate, discountCurve.flat, dateparams)
#Same bond with a discount curve constructed from market quotes
tsQuotes <- list(d1w =0.0382,
d1m =0.0372,
fut1=96.2875,
fut2=96.7875,
fut3=96.9875,
fut4=96.6875,
fut5=96.4875,
fut6=96.3875,
fut7=96.2875,
fut8=96.0875,
s3y =0.0398,
s5y =0.0443,
s10y =0.05165,
s15y =0.055175)
discountCurve <- DiscountCurve(params, tsQuotes)
FixedRateBond(bond, coupon.rate, discountCurve, dateparams)
#example with default dateparams
FixedRateBond(bond, coupon.rate, discountCurve)
##exampe with defaul bond parameter and dateparams
bond <- list(issueDate=as.Date("2004-11-30"),
maturityDate=as.Date("2008-11-30"))
dateparams <- list(calendar="us",
dayCounter = "ActualActual",
period="Annual")
FixedRateBond(bond, coupon.rate, discountCurve, dateparams)
However, in this R Code the following transcripts are showing errors;
setEvaluationDate(as.Date("2004-11-22"))
discountCurve.flat <- DiscountCurve(params, list(flat=0.05))
and the errors are as below;
> setEvaluationDate(as.Date("2004-11-22"))
Error: could not find function "setEvaluationDate"
> discountCurve.flat <- DiscountCurve(params, list(flat=0.05))
Error: could not find function "DiscountCurve"
I have tried to investigate why the R Code is failing to compile but I failed. Can anyone assist please?
Maybe you didn't load the package :
R> library(RQuantLib)
R> setEvaluationDate(Sys.Date())
[1] TRUE
R>

R2WinBUGS error in R

I'm trying to duplicate some code and am running into troubles with WinBUGS. The code was written in 2010 and I think that back then, the package was installed with additional files which R is now looking for and can't find (hence the error), but I'm not sure.
R stops trying to run #bugs.directory (see code) and the error is:
Error in file(con, "rb") : cannot open the connection
In addition: Warning message:
In file(con, "rb") :
cannot open file 'C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS/System/Rsrc/Registry.odc': No such file or directory
Error in bugs.run(n.burnin, bugs.directory, WINE = WINE, useWINE = useWINE, :
WinBUGS executable does not exist in C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS
I have the results of the analysis so if there is another way of conducting a Bayesian analysis for the "rawdata" file (in the 14 day model with [-3,0] event window) or if someone would PLEASE shed some light on what's wrong with the code, I would be forever grateful.
The code is:
rm(list=ls(all=TRUE))
setwd("C:/Users/Hiwi/Dropbox/Oracle/Oracle CD files/analysis/chapter6_a")
library(foreign)
rawdata <- read.dta("nyt.dta",convert.factors = F)
library(MASS)
summary(glm.nb(rawdata$num_events_14 ~ rawdata$nyt_num))
# WinBUGS code
library("R2WinBUGS")
nb.model <- function(){
for (i in 1:n){ # loop for all observations
# stochastic component
dv[i]~dnegbin( p[i], r)
# link and linear predictor
p[i] <- r/(r+lambda[i])
log(lambda[i] ) <- b[1] + b[2] * iv[i]
}
#
# prior distributions
r <- exp(logr)
logr ~ dnorm(0.0, 0.01)
b[1]~dnorm(0,0.001) # prior (please note: second element is 1/variance)
b[2]~dnorm(0,0.001) # prior
}
write.model(nb.model, "negativebinomial.bug")
n <- dim(rawdata)[1] # number of observations
winbug.data <- list(dv = rawdata$num_events_14,
iv = rawdata$nyt_num,
n=n)
winbug.inits <- function(){list(logr = 0 ,b=c(2.46,-.37)
)} # Ausgangswerte aus der Uniformverteilung zwischen -1 und 1
bug.erg <- bugs(data=winbug.data,
inits=winbug.inits,
#inits=NULL,
parameters.to.save = c("b","r"),
model.file="negativebinomial.bug",
n.chains=3, n.iter=10000, n.burnin=5000,
n.thin=1,
codaPkg=T,
debug=F,
#bugs.directory="C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS/"
bugs.directory="C:/Users/Hiwi/Documents/R/Win-library/3.0/R2winBUGS"
)
tempdir()
setwd(tempdir())
file.rename("codaIndex.txt","simIndex.txt")
file.rename("coda1.txt","sim1.txt")
file.rename("coda2.txt","sim2.txt")
file.rename("coda3.txt","sim3.txt")
posterior <- rbind(read.coda("sim1.txt","simIndex.txt"),read.coda("sim2.txt","simIndex.txt"),read.coda("sim3.txt","simIndex.txt"))
post.df <- as.data.frame(posterior)
summary(post.df)
quantile(post.df[,2],probs=c(.025,.975))
quantile(post.df[,2],probs=c(.05,.95))
quantile(post.df[,2],probs=c(.10,.90))
tempdir()
Difficult to say for sure without sitting at your PC... Maybe it is something to do with R2WinBUGS looking in the wrong directory for WinBUGS.exe? You can point R2WinBUGS to the right place using the bugs.directory argument in the bugs function.
If not, try and install OpenBUGS and give R2OpenBUGS a go.

Get the most expressed genes from one .CEL file in R

In R the Limma package can give you a list of differentially expressed genes.
How can I simply get all the probesets with highest signal intensity in the respect of a threshold?
Can I get only the most expressed genes in an healty experiment, for example from one .CEL file? Or the most expressed genes from a set of .CEL files of the same group (all of the control group, or all of the sample group).
If you run the following script, it's all ok. You have many .CEL files and all work.
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("GEOquery","affy","limma","gcrma"))
gse_number <- "GSE13887"
getGEOSuppFiles( gse_number )
COMPRESSED_CELS_DIRECTORY <- gse_number
untar( paste( gse_number , paste( gse_number , "RAW.tar" , sep="_") , sep="/" ), exdir=COMPRESSED_CELS_DIRECTORY)
cels <- list.files( COMPRESSED_CELS_DIRECTORY , pattern = "[gz]")
sapply( paste( COMPRESSED_CELS_DIRECTORY , cels, sep="/") , gunzip )
celData <- ReadAffy( celfile.path = gse_number )
gcrma.ExpressionSet <- gcrma(celData)
But if you delete all .CEL files manually but you leave only one, execute the script from scratch, in order to have 1 sample in the celData object:
> celData
AffyBatch object
size of arrays=1164x1164 features (17 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=1
number of genes=54675
annotation=hgu133plus2
notes=
Then you'll get the error:
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
How can I get the most expressed genes from 1 .CEL sample file?
I've found a library that could be useful for my purpose: the panp package.
But, if you run the following script:
if(!require(panp)) { biocLite("panp") }
library(panp)
myGDS <- getGEO("GDS2697")
eset <- GDS2eSet(myGDS,do.log2=TRUE)
my_pa <- pa.calls(eset)
you'll get an error:
> my_pa <- pa.calls(eset)
Error in if (chip == "hgu133b") { : the argument has length zero
even if the platform of the GDS is that expected by the library.
If you run with the pa.call() with gcrma.ExpressionSet as parameter then all work:
my_pa <- pa.calls(gcrma.ExpressionSet)
Processing 28 chips: ############################
Processing complete.
In summary, If you run the script you'll get an error while executing:
my_pa <- pa.calls(eset)
and not while executing
my_pa <- pa.calls(gcrma.ExpressionSet)
Why if they are both ExpressionSet?
> is(gcrma.ExpressionSet)
[1] "ExpressionSet" "eSet" "VersionedBiobase" "Versioned"
> is(eset)
[1] "ExpressionSet" "eSet" "VersionedBiobase" "Versioned"
Your gcrma.ExpressionSet is an object of class "ExpressionSet"; working with ExpressionSet objects is described in the Biobase vignette
vignette("ExpressionSetIntroduction")
also available on the Biobase landing page. In particular the matrix of summarized expression values can be extracted with exprs(gcrma.ExpressionSet). So
> eset = gcrma.ExpressionSet ## easier to display
> which(exprs(eset) == max(exprs(eset)), arr.ind=TRUE)
row col
213477_x_at 22779 24
> sampleNames(eset)[24]
[1] "GSM349767.CEL"
Use justGCRMA() rather than ReadAffy as a faster and more memory efficient way to get to an ExpressionSet.
Consider asking questions about Biocondcutor packages on the Bioconductor support site where you'll get fast responses from knowledgeable members.

Resources