R Version: R version 3.5.1 (2018-07-02)
H2O cluster version: 3.20.0.2
The dataset used here is available on Kaggle (Home credit risk). Prior to using h2o automl, the necessary treatment of missing values and selection of relevant categorical variables has already been carried out. Can you assist me in figuring out what is the underlying cause for this error?
Thanks
Code:
h2o.init()
h2o.no_progress()
# y_train_processed_tbl is the target variable
# x_train_processed_tbl is the remaining data post dealing with Missing
# values
data_h2o <- as.h2o(bind_cols(y_train_processed_tbl, x_train_processed_tbl))
splits_h2o <- h2o.splitFrame(data_h2o, ratios = c(0.7, 0.15), seed = 1234)
train_h2o <- splits_h2o[[1]]
valid_h2o <- splits_h2o[[2]]
test_h2o <- splits_h2o[[3]]
y <- "TARGET"
x <- setdiff(names(train_h2o), y)
automl_models_h2o <- h2o.automl(x = x,y = y,
training_frame = train_h2o, validation_frame = valid_h2o,
leaderboard_frame = test_h2o,
max_runtime_secs = 90
)
automl_leader <- automl_models_h2o#leader
# Error in performance_h2o
performance_h2o <- h2o.performance(automl_leader, newdata = test_h2o)
ERROR: Unexpected HTTP Status code: 404 Not Found
water.exceptions.H2OKeyNotFoundArgumentException
[1] "water.exceptions.H2OKeyNotFoundArgumentException: Object 'dummy' not
found in function: predict for argument: model"
[2] " water.api.ModelMetricsHandler.score(ModelMetricsHandler.java:235)"
[3] " sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
[4] " sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)"
[5] " sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)"
[6] " java.lang.reflect.Method.invoke(Unknown Source)"
[7] " water.api.Handler.handle(Handler.java:63)"
[8] " water.api.RequestServer.serve(RequestServer.java:451)"
[9] " water.api.RequestServer.doGeneric(RequestServer.java:296)"
[10] " water.api.RequestServer.doPost(RequestServer.java:222)"
[11] " javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[12] " javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[13] " org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
[14] " org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"
[15] " org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
[16] " org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"
[17] " org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
[18] " org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
[19] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[20] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[21] " water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:197)"
[22] " org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[23] " org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[24] " org.eclipse.jetty.server.Server.handle(Server.java:370)"
[25] " org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
[26] " org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
[27] " org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"
[28] " org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"
[29] " org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"
[30] " org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"
[31] " org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
[32] " org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
[33] " org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
[34] " org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
[35] " java.lang.Thread.run(Unknown Source)"
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix =
page, :
ERROR MESSAGE:
Object 'dummy' not found in function: predict for argument: model
The issue here is that you only gave AutoML 90 seconds to run, so it did not have time to train even one model. In the next stable release of H2O, the error message will be gone and instead you will simply get a Leaderboard with no rows (we are fixing this so that it's handled more gracefully).
Rather than using max_runtime_secs = 90, you could increase that to something much larger (the default is 3600 secs, or 1 hour). Alternatively you can specify the number of models you want instead by setting max_models = 20, for example.
If you do use max_models, I'd recommend setting max_runtime_secs to something large (e.g. 999999999) so that you don't run out of time. The AutoML process will stop when it reaches the first of max_models or max_runtime_secs.
I posted a similar answer here.
My code was working fine, then I tweaked it and got the same error.
To fix it, instead of using automl_models_h2o#leader to save the leader for predictions/performance, save the leader using h2o.getModel().
Change your automl_leader initialization:
...
# get model name from list
automl_models_h2o#leaderboard
# change MODEL_NAME_HERE to a model name from your leaderboard list.
automl_leader <- h2o.getModel("MODEL_NAME_HERE")
performance_h2o <- h2o.performance(automl_leader, newdata = test_h2o)
...
Related
I have a matrix, saved as a file (no extension) looking like this:
Peter Westons NH 54 RTcoef level B matrix from L70 Covstats.
2.61949322E+00 2.27966995E+00 1.68120147E+00 9.88238464E-01 8.38279026E-01
7.41276375E-01
2.27966995E+00 2.31885465E+00 1.53558372E+00 4.87789344E-01 2.90254400E-01
2.56963125E-01
1.68120147E+00 1.53558372E+00 1.26129096E+00 8.18048022E-01 5.66120186E-01
3.23866166E-01
9.88238464E-01 4.87789344E-01 8.18048022E-01 1.38558423E+00 1.21272607E+00
7.20283781E-01
8.38279026E-01 2.90254400E-01 5.66120186E-01 1.21272607E+00 1.65314082E+00
1.35926028E+00
7.41276375E-01 2.56963125E-01 3.23866166E-01 7.20283781E-01 1.35926028E+00
1.74777330E+00
How do I go about reading this in as a fixed 6*6 matrix, skipping the first header? I don't see any options for the amount of columns in read.matrix, I tried with the scan() -> matrix() option but I can't read in the file as the skip parameter in scan() doesn't seem to work. I feel there must be a simple option to do this.
My original file is larger, and has 17 full rows of 5 elements and 1 row of 1 element in this structure, example of what needs to be in one row:
[1] " 2.61949322E+00 2.27966995E+00 1.68120147E+00 9.88238464E-01 8.38279026E-01"
[2] " 7.41276375E-01 5.23588785E-01 1.09559244E-01 -9.58430529E-02 -3.24544839E-02"
[3] " 1.96694874E-02 3.39249911E-02 1.54438478E-02 2.38380549E-03 9.59475077E-03"
[4] " 8.02748175E-03 1.63922615E-02 4.51778592E-04 -1.32080759E-02 -2.06313988E-02"
[5] " -1.56037533E-02 -3.35496588E-03 -4.22450803E-03 -3.17468525E-03 3.23012615E-03"
[6] " -8.68914773E-03 -5.94151619E-03 2.34059840E-04 -2.76737270E-03 -4.90334584E-03"
[7] " 1.53812087E-04 5.69891977E-03 5.33816835E-03 3.32982333E-03 -2.62856968E-03"
[8] " -5.15188677E-03 -4.47782553E-03 -5.49510247E-03 -3.71780229E-03 9.80192203E-04"
[9] " 4.18101180E-03 5.47513662E-03 4.14679058E-03 -2.81461574E-03 -4.67580613E-03"
[10] " 3.41841523E-04 4.07771227E-03 7.06154094E-03 6.61650765E-03 5.97925136E-03"
[11] " 3.92987162E-03 1.72895946E-03 -3.47249017E-03 9.90977857E-03 -2.36066909E-31"
[12] " -8.62803933E-32 -1.32472387E-31 -1.02360189E-32 -5.11800943E-33 -4.16409844E-33"
[13] " -5.11800943E-33 -2.52126889E-32 -2.52126889E-32 -4.16409844E-33 -4.16409844E-33"
[14] " -5.11800943E-33 -5.11800943E-33 -4.16409844E-33 -2.52126889E-32 -2.52126889E-32"
[15] " -2.52126889E-32 -1.58614773E-33 -1.58614773E-33 -2.55900472E-33 -1.26063444E-32"
[16] " -7.93073863E-34 -1.04102461E-33 -3.19875590E-34 -3.19875590E-34 -3.19875590E-34"
[17] " -2.60256152E-34 -1.30128076E-34 0.00000000E+00 1.78501287E-02 -1.14423068E-11"
[18] " 3.00625863E-02"
So the full matrix should be 86*86.
Thanks a bunch
Try this option :
Read the file with readLines removing the first line. ([-1]).
Split values on whitespace and create 1 X 6 matrix from every combination of two rows.
Combine them together in one matrix with do.call(rbind, ..).
rows <- readLines('filename')[-1]
result <- do.call(rbind,
tapply(rows, ceiling(seq_along(rows)/2), function(x)
strsplit(paste0(trimws(x), collapse = ' '), '\\s+')[[1]]))
I'm attempting to use the system function in R to run a program, which I expect to yield an error message in some cases. For this I want to write a tryCatch function.
system(command, intern = TRUE) only returns the actual values which were echo'd by the program I'm running, it does not return my error.
In R, how can I get the error message which was yielded by my system?
My code:
test <- tryCatch({
cmd <- paste0("../Scripts/Plink2/plink --file ../InputData/",prefix," --bmerge ",
"../InputData/fs --missing --out ../InputData/",prefix)
print(cmd)
system(cmd)
} , error = function(e) {
# error handler picks up where error was generated
print("EZEL")
print(paste("MY_ERROR: ",e))
}, finally = {
print("something")
})
[1] "../Scripts/Plink2/plink --file ../InputData/GS80Kdata --bmerge ../InputData/fs --missing --out ../InputData/GS80Kdata"
PLINK v1.90b3.37 64-bit (16 May 2016) https://www.cog-genomics.org/plink2
#....
#skipping some lines here to reduce size
#....
Of these, 1414410 are new, while 2462 are present in the base dataset.
Error: 1 variant with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
# Skipping some more lines here
[1] "something"
However when using intern=TRUE and assigning the system function to a variable won't catch the error in the vector and still prints it in the R console.
Edit: Here the output of the vector (using gsub to reduce the ridiculous size)
> gsub(pattern="\b\\d.*", replacement = "", x = tst)
[1] "PLINK v1.90b3.37 64-bit (16 May 2016) https://www.cog-genomics.org/plink2"
[2] "(C) 2005-2016 Shaun Purcell, Christopher Chang GNU General Public License v3"
[3] "Logging to ../InputData/GS80Kdata.log."
[4] "Options in effect:"
[5] " --bmerge ../InputData/fs"
[6] " --file ../InputData/GS80Kdata"
[7] " --missing"
[8] " --out ../InputData/GS80Kdata"
[9] ""
[10] "64381 MB RAM detected; reserving 32190 MB for main workspace."
[11] "Scanning .ped file... 0%\b"
[12] "2%\b\b"
[13] "%\b\b"
[14] "\b\b"
[15] "\b"
[16] ""
[17] "58%\b\b"
[18] "7%\b\b"
[19] "%\b\b"
[20] "\b\b"
[21] "\b"
[22] "Performing single-pass .bed write (42884 variants, 14978 people)."
[23] "0%\b"
[24] "../InputData/GS80Kdata-temporary.bim + ../InputData/GS80Kdata-temporary.fam"
[25] "written."
[26] "14978 people loaded from ../InputData/GS80Kdata-temporary.fam."
[27] "144 people to be merged from ../InputData/fs.fam."
[28] "Of these, 140 are new, while 4 are present in the base dataset."
[29] "42884 markers loaded from ../InputData/GS80Kdata-temporary.bim."
[30] "1416872 markers to be merged from ../InputData/fs.bim."
[31] "Of these, 1414410 are new, while 2462 are present in the base dataset."
attr(,"status")
[1] 3
>
I'm trying parsing financial tables from web page. I proceeded. But I am not able to arrange list, or data.frame
library(rvest)
link <- "http://www.marketwatch.com/investing/stock/garan/financials/balance-sheet/quarter"
read <- read_html(link)
prs <- html_nodes(read, ".financials")
irre <- html_text(prs)
re <- strsplit(irre, split = "\r\n")
re is something like this:
[27] "Assets"
[28] ""
[29] " "
[30] " "
[31] " All values TRY millions."
[32] " 31-Dec-201431-Mar-201530-Jun-201530-Sep-201531-Dec-2015"
[33] " 5-qtr trend"
[34] " "
[35] " "
[36] " "
[37] " "
[38] " Total Cash & Due from Banks"
[39] " 27.26B26.27B26.7B34.51B27.9B"
[40] " "
[41] " "
bla bla...
How Can I edit this list through data.frame that properly like this page
Try
library(XML)
theurl <- "http://www.marketwatch.com/investing/stock/garan/financials/balance-sheet/quarter"
re <- readHTMLTable(theurl)
The result is a list with two dataframes.
Using my trusty firebug and firepath plug-ins I'm trying to scrape some data.
require(XML)
url <- "http://www.hkjc.com/english/racing/display_sectionaltime.asp?racedate=25/05/2008&Raceno=2&All=0"
tree <- htmlTreeParse(url, useInternalNodes = T)
t <- xpathSApply(tree, "//html/body/table/tbody/tr/td[2]/table[2]/tbody/tr/td/font/table/tbody/tr/td[1]/font", xmlValue) # works
This works! t now contains "Meeting Date: 25/05/2008, Sha Tin\r\n\t\t\t\t\t\t"
If I try to capture the first sectional time of 29.4 thusly:
t <- xpathSApply(tree, "//html/body/table/tbody/tr/td[2]/table[2]/tbody/tr/td/font/a/table/tbody/tr[3]/td/table/tbody/tr/td/table[2]/tbody/tr[5]/td[1]", xmlValue) # doesn't work
t contains NULL.
Any ideas what I've done wrong? Many thanks.
First off, I can't find that first sectional time of 29.4. The one I see on the page you linked is 24.5 or I'm misunderstanding what you are looking for.
Here's a way of grabbing that one using rvest and SelectorGadget for Chrome:
library(rvest)
html <- read_html(url)
t <- html %>%
html_nodes(".bigborder table tr+ tr td:nth-child(2) font") %>%
html_text(trim = T)
> t
[1] "24.5"
This differs a bit from your approach but I hope it helps. Not sure how to properly scrape the meeting time that way, but this at least works:
mt <- html %>%
html_nodes("font > table font") %>%
html_text(trim = T)
> mt
[1] "Meeting Date: 25/05/2008, Sha Tin" "4 - 1200M - (060-040) - Turf - A Course - Good To Firm"
[3] "MONEY TALKS HANDICAP" "Race\tTime :"
[5] "(24.5)" "(48.1)"
[7] "(1.10.3)" "Sectional Time :"
[9] "24.5" "23.6"
[11] "22.2"
> mt[1]
[1] "Meeting Date: 25/05/2008, Sha Tin"
Looks like the comments just after the <a> may be throwing you off.
<a name="Race1">
<!-- test0 table start -->
<table class="bigborder" border="0" cellpadding="0" cellspacing="0" width="760">...
<!--0 table End -->
<!-- test1 table start -->
<br>
<br>
</a>
This seems to work:
t <- xpathSApply(tree, '//tr/td/font[text()="Sectional Time : "]/../following-sibling::td[1]/font', xmlValue)
You might want to try something a little less fragile then that long direct path.
Update
If you are after all of the times in the "1st Sec." column: 29.4, 28.7, etc...
t <- xpathSApply(
tree,
"//tr/td[starts-with(.,'1st Sec.')]/../following-sibling::*[position() mod 2 = 0]/td[1]",
xmlValue
)
Looks for the "1st Sec." column, then jump up to its row, grab every other row's 1st td value.
[1] "29.4 "
[2] "28.7 "
[3] "29.2 "
[4] "29.0 "
[5] "29.3 "
[6] "28.2 "
[7] "29.5 "
[8] "29.5 "
[9] "30.1 "
[10] "29.8 "
[11] "29.6 "
[12] "29.9 "
[13] "29.1 "
[14] "29.8 "
I've removed all the extra whitespace (\r\n\t\t...) for display purposes here.
If you wanted to make it a little more dynamic, you could grab the column value under "1st Sec." or any other column. Replace
/td[1]
with
td[count(//tr/td[starts-with(.,'1st Sec.')]/preceding-sibling::*)+1]
Using that, you could update the name of the column, and grab the corresponding values. For all "3rd Sec." times:
"//tr/td[starts-with(.,'3rd Sec.')]/../following-sibling::*[position() mod 2 = 0]/td[count(//tr/td[starts-with(.,'3rd Sec.')]/preceding-sibling::*)+1]"
[1] "23.3 "
[2] "23.7 "
[3] "23.3 "
[4] "23.8 "
[5] "23.7 "
[6] "24.5 "
[7] "24.1 "
[8] "24.0 "
[9] "24.1 "
[10] "24.1 "
[11] "23.9 "
[12] "23.9 "
[13] "24.3 "
[14] "24.0 "
I have an odd situation, and please pardon me for not providing a reproducible example for this question. I have more than 1000 lines of syntax written for Stata to carry out multiple analyses (I wrote it before I started using R). This syntax is used to perform analysis in a quarterly dataset every 3 months to create a report. Results of the analyses are saved in csv files, and read via R, and put into a Word document using ReporterS package.
Is there any way to invoke Stata via R, and specify/pipe the syntax to run it? (I understand the reverse situation can be done using rsource (user-written command) in Stata). I can still manually fire up Stata and run the syntax there. But is it possible to do it via R? So, a shiny app/web interface can be created to do this part, and the user doesn't need to do it manually?
As #thelatemail suggests, the easiest thing to do here is simply run Stata in batch mode from a system call.
Here's an example do file (called "example.do"):
log using out.log, replace
sysuse auto
regress mpg weight foreign
And here's the R code to run it and retrieve the output (assuming Stata is on your path and you replace Stata-64 with the appropriate binary file on your machine):
> system("Stata-64 /e do example.do"); readLines("out.log")
[1] "-----------------------------------------------------------------------------------------------------------------------"
[2] " name: <unnamed>"
[3] " log: FilePathHere"
[4] " log type: text"
[5] " opened on: 9 Jan 2015, 13:34:18"
[6] ""
[7] ". sysuse auto"
[8] "(1978 Automobile Data)"
[9] ""
[10] ". regress mpg weight foreign"
[11] ""
[12] " Source | SS df MS Number of obs = 74"
[13] "-------------+------------------------------ F( 2, 71) = 69.75"
[14] " Model | 1619.2877 2 809.643849 Prob > F = 0.0000"
[15] " Residual | 824.171761 71 11.608053 R-squared = 0.6627"
[16] "-------------+------------------------------ Adj R-squared = 0.6532"
[17] " Total | 2443.45946 73 33.4720474 Root MSE = 3.4071"
[18] ""
[19] "------------------------------------------------------------------------------"
[20] " mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]"
[21] "-------------+----------------------------------------------------------------"
[22] " weight | -.0065879 .0006371 -10.34 0.000 -.0078583 -.0053175"
[23] " foreign | -1.650029 1.075994 -1.53 0.130 -3.7955 .4954422"
[24] " _cons | 41.6797 2.165547 19.25 0.000 37.36172 45.99768"
[25] "------------------------------------------------------------------------------"
[26] ""
[27] ". "
[28] "end of do-file"
[29] ""
[30] ". exit, clear"
It may be easier to parse the output if you log using Stata Markup Control Language (SMCL), by replacing the first line of the do file with log using out.log, replace smcl. Then the output will be:
[1] "{smcl}"
[2] "{com}{sf}{ul off}{txt}{.-}"
[3] " name: {res}<unnamed>"
[4] " {txt}log: {res}FilePathHere"
[5] " {txt}log type: {res}smcl"
[6] " {txt}opened on: {res} 9 Jan 2015, 13:41:53"
[7] "{txt}"
[8] "{com}. sysuse auto"
[9] "{txt}(1978 Automobile Data)"
[10] ""
[11] "{com}. regress mpg weight foreign"
[12] ""
[13] " {txt}Source {c |} SS df MS Number of obs ={res} 74"
[14] "{txt}{hline 13}{char +}{hline 30} F( 2, 71) ={res} 69.75"
[15] " {txt} Model {char |} {res} 1619.2877 2 809.643849 {txt}Prob > F = {res} 0.0000"
[16] " {txt}Residual {char |} {res} 824.171761 71 11.608053 {txt}R-squared = {res} 0.6627"
[17] "{txt}{hline 13}{char +}{hline 30} Adj R-squared = {res} 0.6532"
[18] " {txt} Total {char |} {res} 2443.45946 73 33.4720474 {txt}Root MSE = {res} 3.4071"
[19] ""
[20] "{txt}{hline 13}{c TT}{hline 11}{hline 11}{hline 9}{hline 8}{hline 13}{hline 12}"
[21] "{col 1} mpg{col 14}{c |} Coef.{col 26} Std. Err.{col 38} t{col 46} P>|t|{col 54} [95% Con{col 67}f. Interval]"
[22] "{hline 13}{c +}{hline 11}{hline 11}{hline 9}{hline 8}{hline 13}{hline 12}"
[23] "{space 6}weight {c |}{col 14}{res}{space 2}-.0065879{col 26}{space 2} .0006371{col 37}{space 1} -10.34{col 46}{space 3}0.000{col 54}{space 4}-.0078583{col 67}{space 3}-.0053175"
[24] "{txt}{space 5}foreign {c |}{col 14}{res}{space 2}-1.650029{col 26}{space 2} 1.075994{col 37}{space 1} -1.53{col 46}{space 3}0.130{col 54}{space 4} -3.7955{col 67}{space 3} .4954422"
[25] "{txt}{space 7}_cons {c |}{col 14}{res}{space 2} 41.6797{col 26}{space 2} 2.165547{col 37}{space 1} 19.25{col 46}{space 3}0.000{col 54}{space 4} 37.36172{col 67}{space 3} 45.99768"
[26] "{txt}{hline 13}{c BT}{hline 11}{hline 11}{hline 9}{hline 8}{hline 13}{hline 12}"
[27] "{res}{txt}"
[28] "{com}. "
[29] "{txt}end of do-file"