I'm needing to perform deep pagination using R and the solr package. SOLR 7.2.1 server, R 3.4.3
I can't figure out how to get the nextCursorMark from the resultant dataframe. I usually do this in Python but this is stumping me.
res <- solr_all(base = myBase, rows = 100, verbose=TRUE,
sort = "unique_id asc",
fq="*:*",
cursorMark="*"
)
I cannot get the nextCursorMark from the result. Any help would be appreciated.
I have noticed that if I add the nextCursorMark to pageDoc it will return the value if parsetype is set to json, but not dataframe. So I guess another part is - where is that value if you return a dataframe?
So I finally got a way to make this work. This is not optimal, the final solution is in the github issue referenced in the comment. But this works:
dat <-"http://yadda.com"
cM = "*"
done = FALSE
rowCount = 0
a <- data.frame()
while (!done)
{
Data <- solr_search(base = dat, rows = 100, verbose=FALSE,
sort = "unique_id asc",
fq="*:*",
parsetype="json",
cursorMark=cM,
pageDoc = "nextCursorMark"
)
if (cM == Data$nextCursorMark) {
done = TRUE
} else {
cM = Data$nextCursorMark
}
a <- append(x = a, Data$response$docs)
rowCount = rowCount + length(Data$response$docs)
print(rowCount)
}
Related
I'm trying to use batchtools/BatchJobs for parallel computing on two unix-based R servers. I'm completely new to this and hence followed a few articles and package details to do this. I have added some links below:
batchtools,
BatchJobs
So far I have not really understood how to use batchtools for multi-machines. On the other hand, with BatchJobs I have better progress.
I made an ssh connection from the terminal first and execute the following lines:
reg = makeRegistry("TestExp")
reg$cluster.functions = makeClusterFunctionsSSH(worker = makeSSHWorker(nodename="sla19438")) #By BatchJobs
#Test Function
piApprox = function(n) {
nums = matrix(runif(2 * n), ncol = 2)
d = sqrt(nums[, 1]^2 + nums[, 2]^2)
4 * mean(d <= 1)
}
set.seed(42)
piApprox(1000)
BatchJobs::batchMap(reg = reg, fun = piApprox, n = rep(1e7, 10))
getJobTable()
BatchJobs::submitJobs(reg = reg, resources = list(walltime = 3600, memory = 1024))
getStatus(reg = reg)
loadResult(reg = reg, id = 5)
mean(sapply(1:10, loadResult, reg = reg))
It works and gives me the results but I can't see any indication of the jobs being run on the other machine (sla19438) when I run "top" in the terminal.
Please help me understand what I'm doing wrong. Maybe there is some configuration needed but I don't see any material online which dumbs down the steps for a newbie like me.
Thanks
I'm looking for a straight forward way to change the formatting of numbers into K,M in shiny dataTables. Preferably with something like formatCurrency. I don't want to write k, m functions to convert number into string in order to do the formatting as it makes it difficult to sort rows by value.
There's no built-in way to do this, but it's not too bad to write your own format function in JavaScript that doesn't break row sorting.
See Column Rendering in the DT docs for how to do this: https://rstudio.github.io/DT/options.html
And this will also help:
https://datatables.net/reference/option/columns.render
Here's an example of a custom thousands formatter that rounds to 1 decimal place:
library(DT)
formatThousands <- JS(
"function(data) {",
"return (data / 1000).toFixed(1) + 'K'",
"}")
datatable(datasets::rock, rownames = FALSE, options = list(
columnDefs = list(list(
targets = 0:1, render = formatThousands
))
))
Alternatively, if you want a non-JavaScript method, you could use the colFormat function used with the reactable package. Unfortunately, there is no automatic millions option but it's pretty easy to replicate if you divide the original data and add the labels on with colFormat.
Product <- c('Apples','Oranges','Pears')
Revenue <- c(212384903, 23438872, 26443879)
df <- data.frame(Product,Revenue)
df$Revenue_millions <- dfeg$Revenue/1000000
reactable(df,
showSortable = TRUE,
columns = list(
Revenue_millions = colDef(format = colFormat(prefix = "£", separators = TRUE,digits=1,suffix = "m"))))
The data should now sort correctly
If you are using DataTables, to get the data as Unit format i.e
10000 -> 10K
we can use render function
"render": function ( data ) {
if(data > 999 && data < 1000000) {
return data/1000+' K'
}
else if(data > 1000000){
return data/1000000+' M'
}
else{
return data
}
}
}
I am working on the NSGA2 package on R (library mco).
My NSGA2 code takes forever to run, so I am wondering:
1) Is there a way to limit the precision of the solution values (say, maybe up to 3 decimal places) instead of infinite?
2) How do I set an equality constraint (the ones online all seemed to be about >= or <= than =)? Not sure if I'm doing it right.
My entire relevant code for reference, for easy tracing: https://docs.google.com/document/d/1xj7OPng11EzLTTtWLdRWMm8zJ9f7q1wsx2nIHdh3RM4/edit?usp=sharing
Relevant sample part of code reproduced here:
VTR = get.hist.quote(instrument = 'VTR',
start="2010-01-01", end = "2015-12-31",
quote = c("AdjClose"),provider = "yahoo",
compress = "d")
ObjFun1 <- function (xh){
f1 <- sum(HSVaR_P(merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP), xh, 0.05, 2))
tempt = merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP)
tempt2 = tempt[(nrow(tempt)-(2*N)):nrow(tempt),]
for (i in 1:nrow(tempt2))
{
for (j in 1:ncol(tempt2))
{
if (is.na(tempt2[i,j]))
{
tempt2[i,j] = 0
}
}
}
f2 <- ((-1)*abs(sum((xh*t(tempt2)))))
c(f1=f1,f2=f2)
}
Constr <- function(xh){
totwt <- (1-sum(-xh))
totwt2 <- (sum(xh)-1)
c(totwt,totwt2)
}
Solution1 <- nsga2(ObjFun1, n.projects, 2,
lower.bounds=rep(0,n.projects), upper.bounds=rep(1,n.projects),
popsize=n.solutions, constraints = Constr, cdim=1,
generations=generations)
The function HSVaR_P returns matrix(x,2*500,1).
Even when I set generations = 1, the code does not seem to run. Clearly there should be some error in the code, somewhere, but I am not entirely sure about the mechanics of the NSGA2 algorithm.
Thanks.
I have a bunch of values I want to read in to an if statement in R. Namely:
year_1 , year_2
And so on.
I would like to use a for loop or a vectorisation method to test each one but I am not familiar with this in R as opposed to C++.
So I'd like to achieve something like:
for(i in 1:15) {
if(year_[i] !=NULL) {
count = count + 1
}
}
Not sure whether I am not searching for the right thing or whether R just doesn't do this sort of thing easily. I have used paste and a for loop successfully in the past to automatically name new variables but this I haven't got a hold of.
Update
Ok your answer seems to be on the right track. I should have been a bit more specific and say that I am reading the data from an excel file of parameters and using the data to produce plots. Different data sets will have different years active. The core of this problem is telling R how many years are active, starting from param$year_1 to param$year_15. So I am trying to actually read param$year_1 and so on for example, and basically check whether it is empty or not which will allow me to know how many years this particular data set is working with. When I tried mget(paste0("param$year_", 1:5))
it said the value was not found.
Update 2
I am sure the difficulty with this comes down to my description. But here is exactly what I want to produce but automated to a few lines as I know I will want to do similar operations like this is in the future. What the actual data is is irrelevant. This non automated version produces exactly what I want.
Non Automated Code
if(is.na(param$year_1[1]) == TRUE || param$year_1[1] == '') {
print("empty")
}
if(is.na(param$year_2[1]) == TRUE || param$year_2[1] == '') {
print("empty")
}
if(is.na(param$year_3[1]) == TRUE || param$year_3[1] == '') {
print("empty")
}
if(is.na(param$year_4[1]) == TRUE || param$year_4[1] == '') {
print("empty")
}
so on and so on until the final
if(is.na(param$year_15[1]) == TRUE || param$year_15[1] == '') {
print("empty")
}
It is such a simple thing to do in C++ but I have to learn it in R for the future.
It sounds like you have a list-like structure that contains names year_1, year_2, ..., year_15 and you want to check how many of these are null or have a missing first element. You could use standard indexing to limit to the elements for those years, sapply to check which are null, and sum to add up those values:
which(sapply(paste0("year_", 1:15), function(x) {
is.null(param[[x]]) || param[[x]][1] == ''
}))
# year_2 year_5 year_9 year_10 year_11 year_12 year_14 year_15
# 2 5 9 10 11 12 14 15
Data:
param <- list(ID = 1:10, year_1 = 1:5, year_2 = NULL, year_3 = 1:7, year_4 = 1:2, year_5 = NULL, year_6 = 14, year_7 = 1:3, year_8 = 1:9, year_9 = NULL, year_10 = NULL, year_11 = NULL, year_12 = NULL, year_13 = 1:7, year_14 = NULL, year_15 = NULL)
Assume that I have a (at least subjectively) complex function like this:
library(rgithub)
pull <- function(i){
commits <- get.pull.request.commits(owner = owner, repo = repo, id = i, ctx = get.github.context(), per_page=100)
links <- digest_header_links(commits)
number_of_pages <- links[2,]$page
if (number_of_pages != 0)
try_default(for (n in 1:number_of_pages){
if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5)
Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`)- as.POSIXct(Sys.time()) %>% as.integer())
else
get.pull.request.commits(owner = owner, repo = repo, id = i, ctx = get.github.context(), per_page=100, page = n)
}, default = NULL)
else
return(commits)
}
list <- c(500, 501, 502)
pull_lists <- lapply(list, pull)
Let's say that I want to attain a deeper understanding of what actually happens inside this function. How can I add some type of logging that will help me trace what goes on inside of the function as it is being run?
You can use futile.logger
Than you can setup log threshold level using:
flog.threshold(INFO)
Functions, like flog.debug or flog.info are used to produce logging information
For further details see:
http://www.r-bloggers.com/better-logging-in-r-aka-futile-logger-1-3-0-released/
http://cran.r-project.org/web/packages/futile.logger/index.html