Kubeflow Pipelines SDK: using dsl.ParallelFor to build a loop in a pipeline

from kfp import dsl

# make_classification_com, np_random_random_state, rng_uniform, get_all_datas and
# for_outter_func are components defined elsewhere in the project.
@dsl.pipeline(name='classfier')
def classifiertest():
    make_classification_com_res = make_classification_com()
    rng_res = np_random_random_state()
    uniform_res = rng_uniform(make_classification_com_res.output, rng_res.output)
    all_datas_res = get_all_datas(x_input=uniform_res.output, y_input=make_classification_com_res.output)
    forlist = list([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    with dsl.ParallelFor(forlist) as item_index:
        for_outter_func(item_index, ds_input=all_datas_res.output)
When I run this pipeline, the following error appears after I click the Start button for the run:
{"error":"Failed to create a new run.: InternalServerError: Failed to store run classfier-9xbrk to table: Error 1366: Incorrect string value: '\xE8\xBF\x99\xE4\xB8\x80...' for column 'WorkflowRuntimeManifest' at row 1","code":13,"message":"Failed to create a new run.: InternalServerError: Failed to store run classfier-9xbrk to table: Error 1366: Incorrect string value: '\xE8\xBF\x99\xE4\xB8\x80...' for column 'WorkflowRuntimeManifest' at row 1","details":[{"#type":"type.googleapis.com/api.Error","error_message":"Internal Server Error","error_details":"Failed to create a new run.: InternalServerError: Failed to store run classfier-9xbrk to table: Error 1366: Incorrect string value: '\xE8\xBF\x99\xE4\xB8\x80...' for column 'WorkflowRuntimeManifest' at row 1"}]}
When I delete these two lines, the pipeline is submitted and runs successfully:
with dsl.ParallelFor(forlist) as item_index:
    for_outter_func(item_index, ds_input=all_datas_res.output)

Delete the inline comments.
In kfp, the following style of comment is what causes the problem:
from kfp.components import create_component_from_func

# This comment is OK.
def inputdata_outputtable(input_arr, output_path):
    import numpy as np
    # This comment is also OK.
    _input_arr = np.array(input_arr)  # This comment causes an error.
    np.save(output_path, _input_arr)
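The bytes quoted in the MySQL error, '\xE8\xBF\x99\xE4\xB8\x80', are the UTF-8 encoding of the Chinese characters 这一, which suggests the rejected manifest contained a non-ASCII comment from the component source that the WorkflowRuntimeManifest column could not store. Below is a minimal standard-library sketch for finding such comments before building a component. The helper name `non_ascii_comments` is hypothetical, and reporting whether each comment is inline is an assumption based on the observation above that only inline comments break.

```python
import io
import textwrap
import tokenize


def non_ascii_comments(source):
    """Return (line_number, is_inline, comment) for each comment with non-ASCII text."""
    found = []
    readline = io.StringIO(textwrap.dedent(source)).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.COMMENT:
            continue
        if any(ord(ch) > 127 for ch in tok.string):
            # Inline = there is code before the "#" on the same physical line.
            is_inline = tok.line[:tok.start[1]].strip() != ""
            found.append((tok.start[0], is_inline, tok.string))
    return found


# The bytes quoted in the MySQL error decode to two Chinese characters.
print(bytes.fromhex("E8BF99E4B880").decode("utf-8"))  # 这一
```

For example, `non_ascii_comments(inspect.getsource(inputdata_outputtable))` could be checked before the function is passed to `create_component_from_func`.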

Related

SQLAlchemy and SQLite3: Error if database file does not exist

I would like SQLAlchemy to return an error if the underlying SQLite3 database file does not exist.
I've looked around, and tried:
#!/usr/bin/env python3
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SomeTable(Base):
    __tablename__ = 'some_table'  # hypothetical name; a declarative class needs one
    id = Column(Integer, primary_key=True)  # a mapped class needs a primary key

DB_SPECIFIER = 'sqlite+pysqlite:////tmp/non-exist.db?mode=rw'
engine = create_engine(DB_SPECIFIER, echo=False, future=True, connect_args={'uri': True})
session = Session(engine)
x = session.query(SomeTable)
I'd like the create_engine call to fail if /tmp/non-exist.db does not exist. I thought using this answer would work, but it did not.
It looks like this is covered in the docs, though it's fairly hidden:
https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#uri-connections
So you'd do:
DB_SPECIFIER = 'sqlite:///file:/tmp/non-exist.db?mode=ro&uri=true'
engine = create_engine(DB_SPECIFIER, echo=False, future=True)
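SQLAlchemy picks the query string apart: parameters that pysqlite's connect() accepts are passed to the connection, and the rest (such as mode=ro) stay in the URI that is handed to SQLite. Since the database is opened read-only, you can also add other SQLite URI options there, such as nolock=1, if that helps. Note that create_engine() itself does not open a connection, so the error for a missing file is raised on the first connection, e.g. engine.connect() or the first query a Session actually executes.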
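As far as I understand the SQLAlchemy docs linked above, with uri=true the pysqlite dialect ends up calling Python's sqlite3.connect() in URI mode, and SQLite's mode=ro and mode=rw never create a missing file (only mode=rwc does). Here is a minimal standard-library sketch of that behavior, independent of SQLAlchemy; the helper name `open_existing_db` is hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_existing_db(path, read_only=False):
    """Open an SQLite file, raising instead of creating it when it is missing."""
    mode = "ro" if read_only else "rw"  # neither mode creates the file; "rwc" would
    uri = "{}?mode={}".format(Path(path).resolve().as_uri(), mode)
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "non-exist.db"
    try:
        open_existing_db(missing)
    except sqlite3.OperationalError as exc:
        print("refused:", exc)  # e.g. "unable to open database file"
    print("file created?", missing.exists())
```

Through SQLAlchemy, the same failure should surface as an OperationalError on engine.connect() rather than on create_engine().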

I am trying to extract values and place them in a .db file to use later, but my code broke and I can no longer even load SQLite

Using:
- IronPython
- Autodesk
- Revit (pyRevit)
- Revit API
- SQLite3
My code is as follows:
import sqlite3

conn = None  # so the finally block works even if connect() fails
try:
    conn = sqlite3.connect('SQLite_Python.db')
    c = conn.cursor()
    print("connected")
    Insert_Volume = """INSERT INTO Column_Coordinates
                       (x, y, z)
                       VALUES
                       (1, 2, 3)"""
    count = c.execute(Insert_Volume)
    conn.commit()
    print("Volume values inserted", c.rowcount)
    c.close()
except sqlite3.Error as error:
    print("Failed to insert data into sqlite table", error)
finally:
    if conn:
        conn.close()
        print("The SQLite connection is closed")
This code used to work within pyRevit, but now it fails with the following error:
Exception : System.IO.IOException: Could not add reference to assembly IronPython.SQLite
Please advise; this is one of the early steps of a large project, so it is delaying my work quite a bit.
I look forward to your reply.
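The IOException appears to be raised while IronPython loads its SQLite assembly for the sqlite3 module, before any SQL runs, so it is most likely an environment issue rather than a bug in the insert itself. Separately, here is a sketch of the same insert that stays Python 2/3 compatible (IronPython is 2.7), binds the values with "?" placeholders, and creates the table if it is missing. The helper name `insert_coordinates` and the REAL column types are assumptions, since the original schema is not shown.

```python
import sqlite3


def insert_coordinates(db_path, x, y, z):
    """Insert one (x, y, z) row and return the number of rows inserted."""
    conn = None  # so the finally block is safe even if connect() fails
    try:
        conn = sqlite3.connect(db_path)
        c = conn.cursor()
        # Hypothetical schema: the real Column_Coordinates table may differ.
        c.execute(
            "CREATE TABLE IF NOT EXISTS Column_Coordinates (x REAL, y REAL, z REAL)"
        )
        # "?" placeholders let the driver bind the values instead of pasting them into SQL.
        c.execute(
            "INSERT INTO Column_Coordinates (x, y, z) VALUES (?, ?, ?)", (x, y, z)
        )
        conn.commit()
        return c.rowcount
    finally:
        if conn is not None:
            conn.close()
```

Called as `insert_coordinates('SQLite_Python.db', 1, 2, 3)`, it should return 1 once sqlite3 itself imports correctly.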

Python 3.7 - PhantomJS - driver.get(url) fails with 'Window handle/name is invalid or closed?'

Using two functions to scrape a website results in a driver.get error.
I've tried different variations of while and for loops to get this to work, and now I get this driver.get error. The initial function works on its own, but when I run both functions one after another, I get the error.
import requests, sys, webbrowser, bs4, time
import urllib.request
import pandas as pd
from selenium import webdriver
driver = webdriver.PhantomJS(executable_path = 'C:\\PhantomJS\\bin\\phantomjs.exe')
jobtit = 'some+job'
location = 'some+city'
urlpag = ('https://www.indeed.com/jobs?q=' + jobtit + '&l=' + location + '%2C+CA')
def initial_scrape():
    data = []
    try:
        driver.get(urlpag)
        results = driver.find_elements_by_tag_name('h2')
        print('Finding the results for the first page of the search.')
        for result in results: # loop 2
            job_name = result.text
            link = result.find_element_by_tag_name('a')
            job_link = link.get_attribute('href')
            data.append({'Job' : job_name, 'link' : job_link})
            print('Appending the first page results to the data table.')
            if result == len(results):
                return
    except Exception:
        print('An error has occurred when trying to run this script. Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
    driver.close()
    return data

def second_scrape():
    data = []
    try:
        #driver.get(urlpag)
        pages = driver.find_element_by_class_name('pagination')
        print('Variable nxt_pg is ' + str(nxt_pg))
        for page in pages:
            page_ = page.find_element_by_tag_name('a')
            page_link = page_.get_attribute('href')
            print('Taking a look at the different page links..')
            for page_link in range(1,pg_amount,1):
                driver.click(page_link)
                items = driver.find_elements_by_tag_name('h2')
                print('Going through each new page and getting the jobs for ya...')
                for item in items:
                    job_name = item.text
                    link = item.find_element_by_tag_name('a')
                    job_link = link.get_attribute('href')
                    data.append({'Job' : job_name, 'link' : job_link})
                    print('Appending the jobs to the data table....')
                if page_link == pg_amount:
                    print('Oh boy! pg_link == pg_amount...time to exit the loops')
                    return
    except Exception:
        print('An error has occurred when trying to run this script. Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
    driver.close()
    return data
Expected:
Initial Function
Get the website from urlpag.
Find elements by tag name and loop through them, appending each to a list.
When done with all elements, exit and return the list.
Second Function
While still on urlpag, find the element by class name and get the links for the next pages to scrape.
For each page to scrape, go to the page, scrape it, and append the elements to a different table.
Once we reach the pg_amount limit, exit and return the finalized list.
Actual:
Initial Function
Get the website from urlpag.
Find elements by tag name and loop through them, appending each to a list.
When done with all elements, exit and return the list.
Second Function
Finds the pagination class, prints the nxt_pg variable, and then throws the error below.
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\Scripts\Indeedscraper\indeedscrape.py", line 23, in initial_scrape
driver.get(urlpag)
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: {"errorMessage":"Currently Window handle/name is invalid (closed?)"
For anyone else having this error: I ended up switching to ChromeDriver and using that for web scraping instead. The PhantomJS driver appears to return this error sometimes.
I was having the same issue until I moved my driver.close() to after I was done interacting with Selenium objects. I ended up closing the driver at the end of my script, just to be safe.
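In Selenium, close() closes the current window and quit() ends the whole session. Once the first function has closed the window, the second function's calls on the same driver have nothing to talk to, which matches the NoSuchWindowException above. A sketch of the second answer's approach follows. It assumes the two scrape functions are changed to take the driver as a parameter and no longer call driver.close() themselves; `run_scrapers` is a hypothetical helper.

```python
def run_scrapers(driver, url, scrapers):
    """Run every scraper in one browser session and shut the session down exactly once."""
    results = []
    try:
        driver.get(url)  # open the search page once, before any scraper runs
        for scrape in scrapers:
            # Each scraper only reads from or navigates the driver; none of them closes it.
            results.extend(scrape(driver))
    finally:
        # Single shutdown point: runs after the last scraper, or if one of them raises.
        driver.quit()
    return results
```

With Selenium installed, this would be called as `run_scrapers(driver, urlpag, [initial_scrape, second_scrape])`, using the modified functions.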

Retrieving values from SQLite3 Database for Scotty

I'm trying to get information from a SQLite DB (HDBC.sqlite3) to feed to a web view using the Scotty framework. I'm currently trying to complete a "grab all" or rather select all the information from the table and then return it to display on my web page that is running via Scotty. I've encountered an error and I'm having some trouble figuring out how to fix it.
Here is my error:
Controllers/Home.hs:42:44:
Couldn't match expected type `Data.Text.Lazy.Internal.Text'
with actual type `IO [[(String, SqlValue)]]'
In the expression: getUsersDB
In the first argument of `mconcat', namely
`["<p>/users/all</p><p>", getUsersDB, "</p>"]'
In the second argument of `($)', namely
`mconcat ["<p>/users/all</p><p>", getUsersDB, "</p>"]'
Here is my code:
import Control.Monad
import Web.Scotty (ScottyM, ActionM, get, html, param, text)
import Data.Monoid (mconcat)
import Controllers.CreateDb (createUserDB)
import Database.HDBC
import Database.HDBC.Sqlite3
import Control.Monad.Trans ( MonadIO(liftIO) )
import Data.Convertible
getAllUsers :: ScottyM ()
getAllUsers = get "/users/all" $ do
  html $ mconcat ["<p>/users/all</p><p>", getUsersDB , "</p>"]

getUsersDB = do
  conn <- connectSqlite3 databaseFilePath
  stmt <- prepare conn "SELECT name FROM users VALUES"
  results <- fetchAllRowsAL stmt
  disconnect conn
  return (results)
`run` returns the number of rows modified, not the rows themselves:
https://hackage.haskell.org/package/HDBC-2.4.0.0/docs/Database-HDBC.html#v:run

incremental select using id from SQLite in Twisted

I am trying to select data from a table in SQLite one row ONLY at a time for each call to the function, and I want the row to increment on each call (self.count is initialized elsewhere, and 'line' is irrelevant here). I am using an adbapi connection pool in Twisted to connect to the DB. Here is the code I have tried:
def queryBTData4(self,line):
    self.count=self.count+1
    uuId=self.count
    query="SELECT co2_data, patient_Id FROM btdata4 WHERE uid=:uid",{"uid": uuId}
    d = self.dbpool.runQuery(query)
    return d
This code works if I just set uid=1 or any other number in the DB (I used autoincrement for uid when I created the DB), but when I try to assign a value to uid (i.e. self.count via uuId), it reports that the operator has to be string or unicode. (I have tried both, but it does not seem to help.) However, I know that the query statement above works just fine in a previous program where I use a cursor and the execute command, and I cannot see why it does not work here. I have tried all sorts of combinations and searched for a solution, but have not found anything that works yet. (I have also tried the statement with brackets and other forms.)
Thanks for any help or advice.
Here is the entire code:
from twisted.internet import protocol, reactor
from twisted.protocols import basic
from twisted.enterprise import adbapi
import sqlite3, time

class ServerProtocol(basic.LineReceiver):
    def __init__(self):
        self.conn = sqlite3.connect('biomed2.db',check_same_thread=False)
        self.dbpool = adbapi.ConnectionPool("sqlite3" , 'biomed2.db', check_same_thread=False)

    def connectionMade(self):
        self.sendLine("conn made")
        factory = protocol.ClientFactory()
        factory.protocol = ClientProtocol
        factory.originator = self
        reactor.connectTCP('localhost', 1234, factory)

    def lineReceived(self, line):
        self._received = line
        self.insertBTData4(self._received)
        self.sendLine("line recvd")

    def forwardLine(self, recipient):
        recipient.sendLine(self._received)

    def insertBTData4(self,data):
        print "data in insert is",data
        chx=data
        PID=2
        device_ID=5
        query="INSERT INTO btdata4(co2_data,patient_Id, sensor_Id) VALUES ('%s','%s','%s')" % (chx, PID, device_ID)
        dF = self.dbpool.runQuery(query)
        return dF

class ClientProtocol(basic.LineReceiver):
    def __init__(self):
        self.conn = sqlite3.connect('biomed2.db',check_same_thread=False)
        self.dbpool = adbapi.ConnectionPool("sqlite3" , 'biomed2.db', check_same_thread=False)
        self.count=0

    def connectionMade(self):
        print "server-client made connection with client"
        self.factory.originator.forwardLine(self)
        #self.transport.loseConnection()

    def lineReceived(self, line):
        d=self.queryBTData4(self)
        d.addCallbacks(self.sendData,self.printError )

    def queryBTData4(self,line):
        self.count=self.count+1
        query=("SELECT co2_data, patient_Id FROM btdata4 WHERE uid=:uid",{"uid": uuId})
        dF = self.dbpool.runQuery(query)
        return dF

    def sendData(self,line):
        data=str(line)
        self.sendLine(data)

    def printError(self,error):
        print "Got Error: %r" % error
        error.printTraceback()

def main():
    factory = protocol.ServerFactory()
    factory.protocol = ServerProtocol
    reactor.listenTCP(4321, factory)
    reactor.run()

if __name__ == '__main__':
    main()
The DB is created in another program, thus:
import sqlite3, time, string
conn = sqlite3.connect('biomed2.db')
c = conn.cursor()
c.execute('''CREATE TABLE btdata4
(uid INTEGER PRIMARY KEY, co2_data integer, patient_Id integer, sensor_Id integer)''')
The main program takes data into the server socket and inserts into DB. On the client socket side, data is removed from the DB one line at a time and sent to an external server. The program also has the ability to send data from the server side to the client side if required but I am not doing so here at the moment.
In queryBTData4(), every time the function is called, the count increments and I assign that value to uuId, which I then pass to the query. I have had this query statement working in a program where I do not use adbapi, but it does not seem to work here. I hope this is clear enough; if not, please let me know and I will try again.
EDIT:
I have modified the program to take one row from the DB at a time (see queryBTData4() below), but I have come across another problem.
def queryBTData4(self,line):
    self.count=self.count+1
    xuId= self.count
    #xuId=10
    return self.dbpool.runQuery("SELECT co2_data FROM btdata4 WHERE uid = ?",xuId)
    #return self.dbpool.runQuery("SELECT co2_data FROM btdata4 WHERE uid = 10")
When the count gets to 10 I get an error (which I will post below) which states that: "Incorrect number of bindings supplied. The current statement uses 1, and there are 2 supplied"
I have tried setting xuId to 10 (see the commented-out line xuId=10), but I still get the same error. However, if I switch the return statements (to the commented-out return), I do get the correct row with no error. I have tried converting xuId to unicode, but that makes no difference; I still get the same error. Basically, if I set uid to 10 or more directly in the return statement (the commented-out return), it works, but if I set uid to xuId (i.e. uid = ?, xuId) in the first return, it only works when xuId is below 10. As far as I can tell, the API documentation gives no clue as to why this occurs. (I have also disabled the insert into the DB to rule it out, and checked the SQLite3_ limit, which is 999.)
Here are the errors I am getting when using the first return statement.
Got Error: <twisted.python.failure.Failure <class 'sqlite3.ProgrammingError'>>
Traceback (most recent call last):
  File "c:\python26\lib\threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "c:\python26\lib\threading.py", line 532, in __bootstrap_inner
    self.run()
  File "c:\python26\lib\threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "c:\python26\lib\site-packages\twisted\python\threadpool.py", line 207, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "c:\python26\lib\site-packages\twisted\python\context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "c:\python26\lib\site-packages\twisted\python\context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "c:\python26\lib\site-packages\twisted\enterprise\adbapi.py", line 448, in _runInteraction
    result = interaction(trans, *args, **kw)
  File "c:\python26\lib\site-packages\twisted\enterprise\adbapi.py", line 462, in _runQuery
    trans.execute(*args, **kw)
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 2 supplied.
Thanks.
Consider the API documentation for runQuery. Next, consider the difference between these three function calls:
c = a, b
f(a, b)
f((a, b))
f(c)
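The traceback shows runQuery ending in trans.execute(*args, **kw), i.e. it forwards its positional arguments unchanged. The three calls above can be made concrete with a hypothetical stand-in `f` that just returns the arguments it received; `a` and `b` here are illustrative SQL and parameter values.

```python
def f(*args):
    """Hypothetical stand-in for runQuery: returns the positional args it would forward."""
    return args


a = "SELECT co2_data FROM btdata4 WHERE uid = ?"
b = (10,)
c = a, b  # c is ONE object: a 2-tuple (sql, params)

print(f(a, b))    # two arguments: like execute(sql, params)
print(f((a, b)))  # one argument: like execute((sql, params))
print(f(c))       # identical to f((a, b))
```

The first version of queryBTData4() built `query = sql, params` and called `runQuery(query)`, which is the `f(c)` case: execute() receives a tuple where it expects the SQL string.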
Finally, don't paraphrase error messages. Always quote them verbatim. Copy/paste whenever possible; make a note when you've manually transcribed them.
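Applied to the edited queryBTData4(): execute() expects its second argument to be a sequence of parameters. The symptoms in the question (works below 10, fails at 10 with "2 supplied") match what happens when xuId arrives as a string, because a string is a sequence of characters. That is an inference, since the post does not show the type. The sketch below reproduces it with the standard-library sqlite3 module and an in-memory table modeled on btdata4; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the question's btdata4 table, filled with made-up sample rows.
conn.execute("CREATE TABLE btdata4 (uid INTEGER PRIMARY KEY, co2_data INTEGER)")
conn.executemany(
    "INSERT INTO btdata4 (co2_data) VALUES (?)", [(v,) for v in range(100, 112)]
)

SQL = "SELECT co2_data FROM btdata4 WHERE uid = ?"


def query_row(uid):
    # The parameters must be a sequence: (uid,) is a one-element tuple,
    # i.e. exactly one value for the single "?" placeholder.
    return conn.execute(SQL, (uid,)).fetchall()


print(query_row(10))  # the row whose uid is 10

# A string is itself a sequence: "5" supplies one binding, "10" supplies two.
print(conn.execute(SQL, "5").fetchall())  # one binding, so it works
try:
    conn.execute(SQL, "10")
except sqlite3.ProgrammingError as exc:
    print("ProgrammingError:", exc)
```

In the Twisted code, the corresponding call would be `self.dbpool.runQuery("SELECT co2_data FROM btdata4 WHERE uid = ?", (xuId,))`. For the named-placeholder version, pass the SQL and the dictionary as two separate arguments, `self.dbpool.runQuery("SELECT co2_data, patient_Id FROM btdata4 WHERE uid=:uid", {"uid": uuId})`, instead of one tuple.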
