R package "scholar" / getting the citation history of an article - r

I have a problem with the R package scholar
What works:
get_citation_history(SSalzberg)
What doesn't:
get_article_cite_history(SSalzberg, "any article")
Code:
article <- "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome"
SSalzberg <- "sUVeH-4AAAAJ"  # Google Scholar ID
get_article_cite_history(SSalzberg, article)
Error Message:
Error in min(years):max(years) : result would be too long a vector
In addition: Warning messages:
1: In min(years) : no non-missing arguments to min; returning Inf
2: In max(years) : no non-missing arguments to max; returning -Inf
I do not understand the error message in the context of that function. I also tried another paper by another author, without success. I don't know what I am missing here. Thanks.

You have to use an article ID, not the title of the article. Probably the easiest way to get this is to retrieve the full list of pubs, which has a pubid column ...
library(scholar)
SSalzberg <- "sUVeH-4AAAAJ"
all_pubs <- get_publications(SSalzberg)
## next step is cosmetic -- the equivalent of stringsAsFactors=FALSE
all_pubs <- as.data.frame(lapply(all_pubs,
function(x) if (is.factor(x)) as.character(x) else x))
w <-grep("Ultrafast",all_pubs$title) ## publication number 3
all_pubs$title[w]
## [1] Ultrafast and memory-efficient alignment of ...
all_pubs$pubid[w] ## "Tyk-4Ss8FVUC"
ch <- get_article_cite_history(SSalzberg,all_pubs$pubid[w])
plot(cites~year,ch,type="b")
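The error message from the question can be reproduced without the package. This is a minimal base-R sketch, assuming (as the two warnings suggest, but not confirmed from the scholar source) that passing a title instead of a pubid leaves `years` as an empty vector:

```r
# Assumption: no citation rows were found, so `years` is empty.
years <- integer(0)

# min()/max() of an empty vector return Inf/-Inf (with a warning each).
lo <- suppressWarnings(min(years))
hi <- suppressWarnings(max(years))

# Building the sequence Inf:-Inf is what actually fails.
msg <- tryCatch({
  lo:hi
  "no error"
}, error = function(e) conditionMessage(e))

print(c(lo = lo, hi = hi))
print(msg)
```

So the "too long a vector" error is a symptom of an empty result rather than of a genuinely large one.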

Related

Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

I am trying to do some Bioconductor exercises on RStudio Cloud. The first two code chunks (#1, #2) run fine, but the last one (#3) gives the error message. Can anyone help?
#1 Extract the first 21 bases of the sequence as a DNAString object and print it
dna_seq <- subseq(unlist(zikaVirus), end = 21)
dna_seq
21-letter "DNAString" instance
seq: AGTTGTTGATCTGTGTGAGTC
#2 Transcribe dna_seq into an RNAString object and print it
rna_seq <- RNAString(dna_seq)
rna_seq
21-letter "RNAString" instance
seq: AGUUGUUGAUCUGUGUGAGUC
#3 Translate rna_seq into an AAString object and print it
aa_seq <- translate(rna_seq)
aa_seq
aa_seq <- translate(rna_seq)
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
aa_seq
Error: object 'aa_seq' not found
Thank you. I managed to solve the problem: I think there was a clash over the translate() function, because both the seqinr and Biostrings packages define it (I had loaded both). I had to unload seqinr, because the exercises I was doing were based on the Biostrings package.
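An alternative to unloading seqinr is to qualify the call with its namespace, so that it no longer matters which package was attached last. A small sketch, assuming Biostrings is installed; the sequence is the one printed in the question:

```r
# Only runs if Biostrings is available; nothing is attached with library().
if (requireNamespace("Biostrings", quietly = TRUE)) {
  rna_seq <- Biostrings::RNAString("AGUUGUUGAUCUGUGUGAGUC")

  # Explicit namespace: always Biostrings' translate(), never seqinr's.
  aa_seq <- Biostrings::translate(rna_seq)
  print(aa_seq)
}
```

Calling conflicts() in the session lists the names that are masked by more than one attached package, which is a quick way to spot this kind of clash.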

How to capture particular warning message and execute call

Lately when I run my code that uses coxph in the survival package
coxph(frml,data = data), I am now getting warning messages of the following type
1: In model.matrix.default(Terms, mf, contrasts = contrast.arg) :
partial argument match of 'contrasts' to 'contrasts.arg'
2: In seq.default(along = temp) :
partial argument match of 'along' to 'along.with'
I'm not exactly sure why these partial argument match warnings suddenly started popping up, but I don't think they affect me.
However, when I get the following warning message, I want coxph(frml,data = data) = NA
3: In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 2 ; beta may be infinite.
6: In coxph(frml, data = data) :
X matrix deemed to be singular; variable 1 3 4
Before the partial argument match warnings appeared, I used tryCatch with the following code, where the nested tryCatch returns NA on either a warning or an error:
coxphfit = tryCatch(tryCatch(coxph(frml,data = data), error=function(w) return(NA)), warning=function(w) return(NA))
However, now that I am getting the partial argument match warnings, I need to return NA only if there is an error or if I get one of the last two warning messages above (the Loglik convergence warning and the singular X matrix warning). Any idea how to capture these particular warning messages and return NA in those instances?
It's actually an interesting question. If you are looking for a quick and dirty way of capturing warnings, you could simply do:
withCallingHandlers({
warning("hello")
1 + 2
}, warning = function(w) {
w ->> w
}) -> res
In this example the object w created in the parent environment would be:
> w
<simpleWarning in withCallingHandlers({ warning("hello") 1 + 2}, warning = function(w) { w <<- w}): hello>
You could then interrogate it:
grepl(x = w$message, pattern = "hello")
# [1] TRUE
because
> w$message
# [1] "hello"
The object res would contain your desired result:
> res
[1] 3
It's not a super tidy way, but I reckon you could always reference the object w and check whether the warning message contains the phrase you are interested in.
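The same idea can be pushed a bit further so that only selected warnings turn the result into NA. The sketch below is one possible arrangement, not the survival package's own mechanism: run_or_na() and fatal_patterns are hypothetical names, and the two toy functions merely stand in for coxph(frml, data = data) by raising the messages quoted in the question.

```r
# Hypothetical helper: returns NA on any error, or on a warning whose message
# contains one of `fatal_patterns`; every other warning is muffled and the
# computation carries on.
run_or_na <- function(f, fatal_patterns) {
  tryCatch(
    withCallingHandlers(
      f(),
      warning = function(w) {
        msg <- conditionMessage(w)
        is_fatal <- any(vapply(fatal_patterns, grepl, logical(1),
                               x = msg, fixed = TRUE))
        # Harmless warning: swallow it and resume. A fatal one falls through
        # to the exiting `warning` handler of the outer tryCatch().
        if (!is_fatal) invokeRestart("muffleWarning")
      }
    ),
    warning = function(w) NA,
    error = function(e) NA
  )
}

fatal <- c("beta may be infinite", "X matrix deemed to be singular")

# Stand-ins for the model fit (assumed for illustration only).
noisy_ok <- function() {
  warning("partial argument match of 'along' to 'along.with'")
  42
}
noisy_bad <- function() {
  warning("Loglik converged before variable 2 ; beta may be infinite.")
  42
}

res_ok  <- run_or_na(noisy_ok, fatal)                   # value is kept
res_bad <- run_or_na(noisy_bad, fatal)                  # NA
res_err <- run_or_na(function() stop("boom"), fatal)    # NA
```

In real use the first argument would be something like function() coxph(frml, data = data).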

Error while creating a Timeseries plot in R: Error in plot.window(xlim, ylim, log, ...) : need finite 'ylim' values

Here's a sample of my single column data set:
Lines
141,523
146,785
143,667
65,560
88,524
148,422
I read this file as a .csv file, convert it into a ts object and then plot it:
##Read the actual number of lines CSV file
Aclines <- read.csv(file.choose(), header=T, stringsAsFactors = F)
Aclinests <- ts(Aclines[,1], start = c(2013), end = c(2015), frequency = 52)
plot(Aclinests, ylab = "Actual_Lines", xlab = "Time", col = "red")
I get the following error message:
Error in plot.window(xlim, ylim, log, ...) : need finite 'ylim' values
In addition: Warning messages:
1: In xy.coords(x, NULL, log = log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
I thought this might be because of the "," in the columns and tried to use sapply to take care of that as advised here:
need finite 'ylim' values-error
plot(sapply(Aclinests, function(x)gsub(",",".",x)))
But I got the following error:
Error in plot(sapply(Aclinests, function(x) gsub(",", ".", x))) :
error in evaluating the argument 'x' in selecting a method for function 'plot': Error in sapply(Aclinests, function(x) gsub(",", ".", x)) :
'names' attribute [105] must be the same length as the vector [1]
Here is the head of my original and ts data set if it might help:
> head(Aclines)
Lines
1 141,523
2 146,785
3 143,667
4 65,560
5 88,524
6 148,422
> head(Aclinests)
[1] "141,523" "146,785" "143,667" "65,560" "88,524" "148,422"
Also, if I read the .csv file as:
Aclines <- read.csv(file.choose(), header=T, stringsAsFactors = T)
Then I am able to plot the ts object, but head(Aclinests) gives the output below, which is not consistent with my original data:
> head(Aclinests)
[1] 14 27 17 84 88 36
Please advise on how I can plot this ts object.
The simplest way to avoid this, in my case, was to remove the commas in the Excel file containing the data. This can be done with simple Excel commands, and it worked for me.
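The commas can also be stripped in R after reading the file. This is a sketch under the assumption that they are thousands separators (so "141,523" means 141523) and that the values are quoted in the CSV; the inline text stands in for the file chosen with file.choose(). With stringsAsFactors = T the column became a factor, which is why head() showed the integer factor codes (14, 27, ...) instead of the data.

```r
# Stand-in for the real file: a single quoted column, as in the question.
csv_text <- 'Lines\n"141,523"\n"146,785"\n"143,667"\n"65,560"\n"88,524"\n"148,422"'
Aclines <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Drop the commas, then convert the character column to numeric.
Aclines$Lines <- as.numeric(gsub(",", "", Aclines$Lines, fixed = TRUE))

# Weekly series starting in 2013; `end` is omitted so no values are recycled.
Aclinests <- ts(Aclines$Lines, start = 2013, frequency = 52)
plot(Aclinests, ylab = "Actual_Lines", xlab = "Time", col = "red")
```

If the commas were decimal marks instead, read.csv(..., dec = ",") or replacing "," with "." before as.numeric() would be the analogous fix.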

Mention "data.table" in "Suggests" rather than "Imports" of custom package

I am writing an R package where only a small subset of the functions use functions from data.table. Following Wickham's advice, I added data.table in the Suggests: field of the DESCRIPTION file. I also added if(! requireNamespace("data.table", quietly=TRUE)) at the beginning of each of my functions that uses a function from data.table. Moreover, each time I use a data.table-specific function, I precede it with data.table::.
However, I am still encountering problems. As the FAQ of data.table only deals with the Depends: and Imports: fields of the DESCRIPTION file, does it mean that Suggests isn't an option?
Here is a function causing problems:
depths.per.sample <- function(dat, min.reg.len=30, max.reg.len=500,
min.reg.dep=10, max.reg.dep=100,
min.reg.frac=0.25){
if(! requireNamespace("data.table", quietly=TRUE))
stop("Pkg needed for this function to work. Please install it.",
call.=FALSE)
stopifnot(data.table::is.data.table(dat))
for(col in c("ind", "flowcell", "lane", "start", "end", "depth", "fraction"))
stopifnot(col %in% colnames(dat))
## http://stackoverflow.com/a/8096882/597069
depth=fraction=chrom=ind=flowcell=lane=NULL
data.table::setkey(dat, NULL)
data.table::setkeyv(x=dat, cols=c("ind", "flowcell", "lane"))
depths.sample <- dat[end - start >= min.reg.len &
end - start <= max.reg.len,
list(depth.len=data.table::.N,
depth.min=min(data.table::.SD[,depth]),
depth.med=as.double(median(data.table::.SD[,depth])),
depth.mean=mean(data.table::.SD[,depth]),
depth.max=max(data.table::.SD[,depth]),
depth.q65=quantile(data.table::.SD[,depth], 0.65),
depth.q70=quantile(data.table::.SD[,depth], 0.70),
depth.q75=quantile(data.table::.SD[,depth], 0.75),
depth.q80=quantile(data.table::.SD[,depth], 0.80),
reg.ok=nrow(unique(data.table::.SD[depth >= min.reg.dep &
depth <= max.reg.dep &
fraction >= min.reg.frac,
list(chrom,start,end)]))),
by=list(ind,flowcell,lane)]
return(depths.sample)
}
And here are the errors:
Error in x[j] : invalid subscript type 'list'
In addition: Warning messages:
1: In min(data.table::.SD[, depth]) :
no non-missing arguments to min; returning Inf
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In mean.default(data.table::.SD[, depth]) :
argument is not numeric or logical: returning NA
4: In max(data.table::.SD[, depth]) :
no non-missing arguments to max; returning -Inf
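Two things are likely at play here, offered as a hedged reading rather than a confirmed diagnosis. First, .N and .SD are only given meaning inside the j argument of [.data.table; the exported objects data.table::.N and data.table::.SD are placeholders set to NULL, which would explain the min()/max()/mean() warnings. Second, data.table only uses its own [ semantics for callers it considers "data.table aware"; to my knowledge a package that lists data.table in Suggests can opt in by defining .datatable.aware <- TRUE at top level in one of its R/ files. The sketch below uses a toy table and bare symbols in j:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # In a package, this line would sit at top level in a file under R/.
  .datatable.aware <- TRUE

  # Toy data (made up for illustration); the namespace is loaded, not attached.
  dat <- data.table::data.table(ind   = c("a", "a", "b"),
                                depth = c(10, 20, 30))

  # Bare .N and column names inside j; no data.table:: prefix on them.
  res <- dat[, list(depth.len = .N, depth.mean = mean(depth)), by = ind]
  print(res)
}
```

Functions that are called outside of [ ], such as setkeyv() or is.data.table(), can keep their data.table:: prefix.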

Converting a Document Term Matrix into a Matrix with lots of data causes overflow

Let's do some Text Mining
Here I stand with a document term matrix (from the tm Package)
dtm <- TermDocumentMatrix(
myCorpus,
control = list(
weight = weightTfIdf,
tolower=TRUE,
removeNumbers = TRUE,
minWordLength = 2,
removePunctuation = TRUE,
stopwords=stopwords("german")
))
When I do a
typeof(dtm)
I see that it is a "list" and the structure looks like
Docs
Terms 1 2 ...
lorem 0 0 ...
ipsum 0 0 ...
... .......
So I try a
wordMatrix = as.data.frame( t(as.matrix( dtm )) )
That works for 1000 Documents.
But when I try to use 40000 it doesn't anymore.
I get this error (translated from the German console output):
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
So I looked at as.matrix, and it turns out that the function somehow converts the object to a vector with as.vector and then to a matrix.
The conversion to a vector works, but the one from the vector to the matrix doesn't.
Do you have any suggestions what could be the problem?
Thanks, The Captain
Integer overflow tells you exactly what the problem is : with 40000 documents, you have too much data. It is in the conversion to a matrix that the problem begins btw, which can be seen if you look at the code of the underlying function :
class(dtm)
[1] "TermDocumentMatrix" "simple_triplet_matrix"
getAnywhere(as.matrix.simple_triplet_matrix)
A single object matching ‘as.matrix.simple_triplet_matrix’ was found
...
function (x, ...)
{
nr <- x$nrow
nc <- x$ncol
y <- matrix(vector(typeof(x$v), nr * nc), nr, nc)
...
}
This is the line referenced by the error message. What's going on, can be easily simulated by :
as.integer(40000 * 60000) # 40000 documents is 40000 rows in the resulting frame
[1] NA
Warning message:
NAs introduced by coercion
The function vector() takes the length as an argument, in this case nr*nc. If this is larger than approx. 2e9 ( .Machine$integer.max ), it will be replaced by NA. This NA is not valid as an argument for vector().
Bottom line: You're running into the limits of R. As for now, working in 64bit won't help you. You'll have to resort to different methods. One possibility would be to continue working with the list you have (dtm is a list), selecting the data you need using list manipulation and go from there.
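The overflow itself comes from multiplying two integers. A short base-R sketch with the dimensions used above (40000 and 60000 are the illustrative sizes from this answer, not measured values):

```r
nr <- 40000L
nc <- 60000L

# integer * integer stays integer, so the product overflows to NA.
overflowed <- suppressWarnings(nr * nc)

# Promoting one factor to double avoids the overflow.
as_double <- as.numeric(nr) * nc

print(.Machine$integer.max)
print(overflowed)
print(as_double)
```

Even without the overflow, a dense matrix of that size needs about 2.4e9 cells, roughly 19 GB when stored as doubles.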
PS : I made a dtm object by
require(tm)
data("crude")
dtm <- TermDocumentMatrix(crude,
control = list(weighting = weightTfIdf,
stopwords = TRUE))
Here is a very very simple solution I discovered recently
library(bigmemory)                      # provides as.big.matrix()
DTM <- t(TDM)                           # transpose the term-document matrix (optional; I prefer DTM over TDM)
M <- as.big.matrix(x = as.matrix(DTM))  # convert the DTM into a bigmemory object
M <- as.matrix(M)                       # convert the bigmemory object back to a regular matrix
M <- t(M)                               # transpose again to get the TDM
Please note that transposing the TDM to get a DTM is entirely optional; it's my personal preference to work with the matrices this way.
P.S.Could not answer the question 4 years back as I was just a fresh entry in my college
Based on Joris Meys' answer, I've found the solution. The vector() documentation says the following about the length argument:
...
For a long vector, i.e., length > .Machine$integer.max, it has to be of type "double"...
So we can make a tiny fix of the as.matrix():
as.big.matrix <- function(x) {
nr <- x$nrow
nc <- x$ncol
# nr and nc are integers. 1 is double. Double * integer -> double
y <- matrix(vector(typeof(x$v), 1 * nr * nc), nr, nc)
y[cbind(x$i, x$j)] <- x$v
dimnames(y) <- x$dimnames
y
}
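If the dense result is too large to hold in memory, another option is to keep the data sparse. The sketch below is an assumption-laden illustration rather than part of the answers above: it builds a sparse matrix with the Matrix package from the same fields (i, j, v, nrow, ncol, dimnames) that the function above reads, using a small hand-made list in place of a real TermDocumentMatrix.

```r
# Mock object with the same fields as a simple_triplet_matrix (made-up values).
stm <- list(
  i = c(1L, 3L, 2L),
  j = c(1L, 1L, 2L),
  v = c(0.5, 1.2, 2.0),
  nrow = 3L,
  ncol = 2L,
  dimnames = list(c("lorem", "ipsum", "dolor"), c("1", "2"))
)

if (requireNamespace("Matrix", quietly = TRUE)) {
  # Only the non-zero cells are stored, so nr * nc is never allocated.
  sparse <- Matrix::sparseMatrix(
    i = stm$i, j = stm$j, x = stm$v,
    dims = c(stm$nrow, stm$ncol),
    dimnames = stm$dimnames
  )
  print(sparse)
}
```

Whether this helps depends on whether the downstream code accepts sparse matrices.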

Resources