I am writing an R package in which only a small subset of the functions use functions from data.table. Following Wickham's advice, I added data.table to the Suggests: field of the DESCRIPTION file. I also added if(! requireNamespace("data.table", quietly=TRUE)) at the beginning of each function that uses data.table. Moreover, each time I use a data.table-specific function, I prefix it with data.table::.
However, I am still encountering problems. As the data.table FAQ only deals with the Depends: and Imports: fields of the DESCRIPTION file, does that mean that Suggests: isn't an option?
Here is a function causing problems:
depths.per.sample <- function(dat, min.reg.len=30, max.reg.len=500,
                              min.reg.dep=10, max.reg.dep=100,
                              min.reg.frac=0.25){
  if(! requireNamespace("data.table", quietly=TRUE))
    stop("Pkg needed for this function to work. Please install it.",
         call.=FALSE)
  stopifnot(data.table::is.data.table(dat))
  for(col in c("ind", "flowcell", "lane", "start", "end", "depth", "fraction"))
    stopifnot(col %in% colnames(dat))
  ## http://stackoverflow.com/a/8096882/597069
  depth=fraction=chrom=ind=flowcell=lane=NULL
  data.table::setkey(dat, NULL)
  data.table::setkeyv(x=dat, cols=c("ind", "flowcell", "lane"))
  depths.sample <- dat[end - start >= min.reg.len &
                         end - start <= max.reg.len,
                       list(depth.len=data.table::.N,
                            depth.min=min(data.table::.SD[,depth]),
                            depth.med=as.double(median(data.table::.SD[,depth])),
                            depth.mean=mean(data.table::.SD[,depth]),
                            depth.max=max(data.table::.SD[,depth]),
                            depth.q65=quantile(data.table::.SD[,depth], 0.65),
                            depth.q70=quantile(data.table::.SD[,depth], 0.70),
                            depth.q75=quantile(data.table::.SD[,depth], 0.75),
                            depth.q80=quantile(data.table::.SD[,depth], 0.80),
                            reg.ok=nrow(unique(data.table::.SD[depth >= min.reg.dep &
                                                                 depth <= max.reg.dep &
                                                                 fraction >= min.reg.frac,
                                                               list(chrom,start,end)]))),
                       by=list(ind,flowcell,lane)]
  return(depths.sample)
}
And here are the errors:
Error in x[j] : invalid subscript type 'list'
In addition: Warning messages:
1: In min(data.table::.SD[, depth]) :
no non-missing arguments to min; returning Inf
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In mean.default(data.table::.SD[, depth]) :
argument is not numeric or logical: returning NA
4: In max(data.table::.SD[, depth]) :
no non-missing arguments to max; returning -Inf
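A likely cause, offered as a hedged diagnosis rather than a confirmed one: data.table only applies its own `[` semantics when the calling namespace is "data.table aware", which normally comes from importing it. With data.table only in Suggests:, `dat[i, j, by]` appears to fall back to `[.data.frame`, which evaluates the `list(...)` as an ordinary column index (hence `invalid subscript type 'list'`). Separately, `data.table::.N` and `data.table::.SD` look like exported placeholder symbols rather than the per-group values; the warnings suggest they evaluate to `NULL`, so `min(NULL)` returns `Inf`, and so on. The data.table vignette on importing data.table describes defining `.datatable.aware = TRUE` in the package's R code for the Suggests: case. The sketch below assumes that approach, uses bare `.N`/`.SD`, adds `chrom` to the column check (it is used in `reg.ok`), and drops the two `setkey` calls, because `by =` does not need a key and `setkeyv()` reorders the caller's table by reference. The result rows may therefore come out in first-appearance order rather than key order.

```r
## R/depths.R: sketch for a package that lists data.table under Suggests:
## and does NOT import it in NAMESPACE (assumption).

## Marks this namespace as "data.table aware", so dat[i, j, by] uses
## data.table semantics instead of falling back to `[.data.frame`.
.datatable.aware <- TRUE

depths.per.sample <- function(dat, min.reg.len = 30, max.reg.len = 500,
                              min.reg.dep = 10, max.reg.dep = 100,
                              min.reg.frac = 0.25) {
  if (!requireNamespace("data.table", quietly = TRUE))
    stop("Package 'data.table' is needed for this function to work. ",
         "Please install it.", call. = FALSE)
  stopifnot(data.table::is.data.table(dat))
  stopifnot(all(c("ind", "flowcell", "lane", "chrom", "start", "end",
                  "depth", "fraction") %in% colnames(dat)))
  ## Local placeholders only silence R CMD check NOTEs; inside j, data.table
  ## supplies the real columns, .N and .SD, which shadow these NULLs.
  depth <- fraction <- chrom <- ind <- flowcell <- lane <- .N <- .SD <- NULL
  dat[end - start >= min.reg.len & end - start <= max.reg.len,
      list(depth.len  = .N,               # bare .N, not data.table::.N
           depth.min  = min(depth),       # the group's depth column directly
           depth.med  = as.double(median(depth)),
           depth.mean = mean(depth),
           depth.max  = max(depth),
           depth.q65  = quantile(depth, 0.65),
           depth.q70  = quantile(depth, 0.70),
           depth.q75  = quantile(depth, 0.75),
           depth.q80  = quantile(depth, 0.80),
           reg.ok     = nrow(unique(.SD[depth >= min.reg.dep &
                                          depth <= max.reg.dep &
                                          fraction >= min.reg.frac,
                                        list(chrom, start, end)]))),
      by = list(ind, flowcell, lane)]
}
```

The `.datatable.aware <- TRUE` line only has an effect inside a package namespace; at the top level (for example when testing the function in a script) data.table semantics already apply.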
I am building an R package for Bioconductor, and one line of the code seems to be problematic only when the function is called from another function in RStudio (when I run the lines one by one, it works, and when I use R in a terminal, it also works).
This is the problematic function:
get_cres_tiled_genome <- function(cres, assembly="hg19", chr="chr22", binsize=5e3){
  bins <- tileGenome(seqinfo(load_bs_genome(assembly))[chr], tilewidth=binsize,
                     cut.last.tile.in.chrom=TRUE)
  bins <- bins[width(bins) == binsize]
  bins <- keepStandardChromosomes(bins, pruning.mode="coarse")
  seqlevelsStyle(bins) <- "UCSC"
  keep.bins.idx <- 1:length(bins)
  cres$bin <- Rle(floor((end(cres)-1)/binsize)+1)
  cres <- cres[cres$bin %in% keep.bins.idx]
  binned.cres <- bins[unique(cres$bin)]
  return(binned.cres)
}
This is the error:
Error in h(simpleError(msg, call)) : error in evaluating the
argument 'i' in selecting a method for function '[': 'match' requires
vector arguments
5. h(simpleError(msg, call))
4. .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "'match' requires vector
arguments",
base::quote(match(x, table, nomatch = 0L)))
3. cres$bin %in% keep.bins.idx
2. cres[cres$bin %in% keep.bins.idx] at generate_candidates.R#116
get_cres_tiled_genome(cres = cres, assembly = assembly, chr = chr,
binsize = binsize)
This leads me to believe that the error comes from cres[cres$bin %in% keep.bins.idx]. Is there another way to write this so that the error does not occur? I don't quite understand where the problem comes from, given the situation described at the start of the question.
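A possible explanation, offered as an assumption rather than a confirmed diagnosis: `cres$bin` is an S4 `Rle`, and base R's `%in%` calls `base::match()`, which expects plain vectors (the traceback shows exactly `match(x, table, nomatch = 0L)` failing). Interactively, an S4-aware `%in%` attached with the Bioconductor packages may be found first, whereas inside a package namespace base's version is found unless another one is imported. One way around this is to store the bin as a plain numeric vector, so the result does not depend on which `%in%` is used. In the sketch, `bin_index()` is a hypothetical helper, `load_bs_genome()` is the question's own function, and the package is assumed to import the Bioconductor functions as the original does. The indexing `bins[unique(cres$bin)]` keeps the original's assumption that bin numbers equal positions in `bins` (a single chromosome).

```r
## Hypothetical helper: 1-based index of the fixed-width bin containing each
## region end. Returns a plain numeric vector, not an S4 Rle.
bin_index <- function(region_end, binsize) {
  floor((region_end - 1) / binsize) + 1
}

## Same logic as the original, assuming tileGenome(), seqinfo(),
## keepStandardChromosomes(), `seqlevelsStyle<-`, width() and end() are
## imported from the Bioconductor packages, and load_bs_genome() exists.
get_cres_tiled_genome <- function(cres, assembly = "hg19", chr = "chr22",
                                  binsize = 5e3) {
  bins <- tileGenome(seqinfo(load_bs_genome(assembly))[chr],
                     tilewidth = binsize, cut.last.tile.in.chrom = TRUE)
  bins <- bins[width(bins) == binsize]
  bins <- keepStandardChromosomes(bins, pruning.mode = "coarse")
  seqlevelsStyle(bins) <- "UCSC"
  keep.bins.idx <- seq_along(bins)

  ## Plain numeric metadata column: base::`%in%` / base::match() accept it.
  cres$bin <- bin_index(end(cres), binsize)
  cres <- cres[cres$bin %in% keep.bins.idx]
  bins[unique(cres$bin)]
}
```

If the `Rle` column is needed elsewhere, `as.vector(cres$bin) %in% keep.bins.idx` would be the smaller change.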
t.test(antibioticdata$Bacteria,
antibioticdata$Inhibition,
alternative = c("two.sided"),
paired = FALSE,
var.equal = FALSE)
Above is my R code for a t-test on a set of data on the antibiotic resistance of bacteria. It gives me this error:
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) :
Calling var(x) on a factor x is deprecated and will become an error.
Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
I'm not sure what I am doing wrong.
I just ran into the same error. It's probably because all the values in each group are the same.
So just add two more if/else checks. For me, I did:
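library("greenbrown")  # provides AllEqual()
## `data.table` here is the user's own data object (columns 1-4 = group 1,
## columns 5-9 = group 2), not the data.table package.
P.for.data.table <- apply(data.table, 1, function(x) {
  if (AllEqual(x[1:9])) {
    return(1)   # all nine values identical: no difference at all
  } else if (AllEqual(x[1:4]) && AllEqual(x[5:9])) {
    return(0)   # each group constant, but the groups differ
  } else {
    t.results <- t.test(as.numeric(x[1:4]), as.numeric(x[5:9]))
    return(t.results$p.value)
  }
})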
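The warnings in the question ("argument is not numeric or logical" and "Calling var(x) on a factor x is deprecated") suggest a different cause: `antibioticdata$Bacteria` is probably a factor, while `t.test(x, y)` expects two numeric vectors. If the goal is to compare `Inhibition` between two bacterial groups, the formula interface is the usual route. This is a sketch assuming `Bacteria` has exactly two levels; the data frame below is invented for illustration.

```r
## Hypothetical toy data standing in for `antibioticdata`: Bacteria is a
## grouping factor with two levels, Inhibition the numeric response.
antibioticdata <- data.frame(
  Bacteria   = factor(rep(c("strainA", "strainB"), each = 5)),
  Inhibition = c(12.1, 13.4, 11.8, 12.9, 13.0, 15.2, 16.1, 14.8, 15.9, 16.4)
)

## Compare Inhibition between the two Bacteria groups
## (Welch two-sample t-test, because var.equal = FALSE).
res <- t.test(Inhibition ~ Bacteria, data = antibioticdata,
              alternative = "two.sided", var.equal = FALSE)
res$p.value
```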
Here is my code:
library(tidyr)
messy <- data.frame(
name = c("Wilbur", "Petunia", "Gregory"),
a = c(67, 80, 64),
b = c(56, 90, 50)
)
I would like to use the gather function with a variable or a function result as the column specification. Borrowing from a related answer, I tried:
not_messy <-messy %>%
gather_('drug', 'heartrate', paste0('a',''):paste0('b',''))
But it generated error:
Error in paste0("a", ""):paste0("b", "") : NA/NaN argument
In addition: Warning messages:
1: In lapply(.x, .f, ...) : NAs introduced by coercion
2: In lapply(.x, .f, ...) : NAs introduced by coercion
What am I missing?
With recent versions of the tidyverse, the underscore (standard-evaluation) versions of the functions are discouraged; use the rlang tidy evaluation syntax instead. In this case you can use:
gather(messy, "drug", "heartrate", (!!as.name("a")):(!!as.name("b")))
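Why the original attempt fails: `paste0('a',''):paste0('b','')` is evaluated as ordinary R code, not as a column selection, so `:` tries to convert "a" and "b" to numbers, gets `NA`, and stops with "NA/NaN argument". Since tidyr 1.0.0, `gather()` is superseded by `pivot_longer()`. The sketch below resolves a column range held in character variables to column names with base R and passes them via `all_of()`; `first`, `last` and `cols` are illustrative names.

```r
library(tidyr)

messy <- data.frame(
  name = c("Wilbur", "Petunia", "Gregory"),
  a = c(67, 80, 64),
  b = c(56, 90, 50)
)

## Column range held in variables (stand-ins for any function results)
first <- paste0("a", "")
last  <- paste0("b", "")

## Resolve the range to column names with base R, then select them with
## all_of(), which takes a character vector of names.
cols <- names(messy)[match(first, names(messy)):match(last, names(messy))]

not_messy <- pivot_longer(messy, cols = all_of(cols),
                          names_to = "drug", values_to = "heartrate")
not_messy
```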
(This is a beginner question, but I didn't find an answer elsewhere. Relevant posts include this one, this one, and this one, but I'm not sure how to apply them to my case.)
When I use read.dta to import Stata-format data into R, I get a warning:
> lca <- read.dta("trial.dta")
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
paste0(labels, :
duplicated levels in factors are deprecated
Does it simply mean that the variables ("factors" in R) contain duplicate values? If so, why is this even a warning -- isn't this expected of most variables?
Try this:
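library(foreign)

## Read the raw numeric codes; the value labels are kept as attributes.
don <- read.dta("trial.dta", convert.dates = TRUE, convert.factors = FALSE)
val.labels  <- attr(don, "val.labels")   # label-set name per column ("" = none)
label.table <- attr(don, "label.table")  # named code vectors, names = label text

for (i in seq_len(ncol(don))) {
  valuelabel <- val.labels[i]
  if (valuelabel != "") {
    levels <- label.table[[valuelabel]]  # numeric codes
    labels <- names(levels)              # label text, possibly duplicated
    if (anyDuplicated(labels) > 0) {
      ## Recode every code whose label repeats an earlier one to the first
      ## code carrying that label, then keep one code per unique label.
      first.code <- levels[match(labels, labels)]
      idx <- match(don[, i], levels)
      don[!is.na(idx), i] <- first.code[idx[!is.na(idx)]]
      keep <- !duplicated(labels)
      levels <- levels[keep]
      labels <- labels[keep]
    }
    don[, i] <- factor(don[, i], levels = levels, labels = labels)
  }
}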
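On what the warning means, as a hedged explanation: in a Stata value-label table each numeric code has a text label, and the warning arises when two different codes share the same label text, so R would get two factor levels with the same name. It is not about duplicate values in the data. In R 3.4.0 and later, `factor()` accepts duplicated `labels` and maps all codes with the same label to one level, which could replace the recoding loop. The label table below is invented for illustration.

```r
## Hypothetical Stata value-label table: codes 1 and 3 both carry "yes".
lab <- c(yes = 1, no = 2, yes = 3)
codes <- c(1, 2, 3, 1, 3)   # raw codes, as read with convert.factors = FALSE

## Duplicated `labels` in factor() map several codes to a single level
## (supported since R 3.4.0).
f <- factor(codes, levels = lab, labels = names(lab))
levels(f)
as.integer(f)
```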
I have a problem with the R package scholar.
What works:
get_citation_history(SSalzberg)
What doesn't:
get_article_cite_history(SSalzberg, "any article")
Code:
article <- "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome"
SSalzberg <- "sUVeH-4AAAAJ"  # Google Scholar ID
get_article_cite_history(SSalzberg, article)
Error Message:
Error in min(years):max(years) : result would be too long a vector
In addition: Warning messages:
1: In min(years) : no non-missing arguments to min; returning Inf
2: In max(years) : no non-missing arguments to max; returning -Inf
I do not understand the error message in the context of that function. I also tried another paper by another author, without success. What am I missing here? Thanks.
You have to use an article ID, not the title of the article. Probably the easiest way to get it is to retrieve the full list of publications, which has a pubid column:
library(scholar)
SSalzberg <- "sUVeH-4AAAAJ"
all_pubs <- get_publications(SSalzberg)
## next step is cosmetic -- the equivalent of stringsAsFactors=FALSE
all_pubs <- as.data.frame(lapply(all_pubs,
function(x) if (is.factor(x)) as.character(x) else x))
w <-grep("Ultrafast",all_pubs$title) ## publication number 3
all_pubs$title[w]
## [1] Ultrafast and memory-efficient alignment of ...
all_pubs$pubid[w] ## "Tyk-4Ss8FVUC"
ch <- get_article_cite_history(SSalzberg,all_pubs$pubid[w])
plot(cites~year,ch,type="b")
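The warnings in the question (`no non-missing arguments to min`) are consistent with no citation years being found, which is what one would expect when the second argument is not a valid publication ID. A small guard can make that failure explicit. This is a sketch: `find_pubid()` is a hypothetical helper, and `pubs` is assumed to be the data frame returned by `get_publications()`, with `title` and `pubid` columns as used above.

```r
## Hypothetical helper: return the pubid of the single publication whose
## title matches `pattern`, and fail with a clear message otherwise.
find_pubid <- function(pubs, pattern) {
  hit <- grep(pattern, as.character(pubs$title), ignore.case = TRUE)
  if (length(hit) == 0)
    stop("No publication title matches '", pattern, "'", call. = FALSE)
  if (length(hit) > 1)
    stop(length(hit), " titles match '", pattern,
         "'; use a more specific pattern", call. = FALSE)
  as.character(pubs$pubid[hit])
}

## Usage (needs network access):
## pubs <- scholar::get_publications("sUVeH-4AAAAJ")
## ch <- scholar::get_article_cite_history("sUVeH-4AAAAJ",
##                                         find_pubid(pubs, "Ultrafast"))
```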