pyspark: namespace in jar file not found - jar

I'm trying to import classes in external jar with PySpark, I'm running the spark-shell with --jars and the path to the jar that contains the classes I want to use.
However, when I import a class inside my code, the namespace is not found:
from io.warp10.spark import WarpScriptFilterFunction
The error:
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "warp10-test.py", line 1, in <module>
from io.warp10.spark import WarpScriptFilterFunction
ImportError: No module named warp10.spark

You have to use a WarpScriptâ„¢ UDF if you want to run warpscript on Spark.
Here is an example:
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.types import StringType
from pyspark.sql.types import ArrayType
spark = SparkSession.builder.appName("WarpScript Spark Test").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
sqlContext.registerJavaFunction("foo", "io.warp10.spark.WarpScriptUDF3", ArrayType(StringType()))
print sqlContext.sql("SELECT foo('SNAPSHOT \"Easy!\"', 3.14, 'pi')").collect()
For more information, see: https://www.warp10.io/content/05_Ecosystem/04_Data_Science/06_Spark/02_WarpScript_PySpark

Related

How to execute parallel test cases in Unittest framework in python?

I've tried the following method in test suite file:
import unittest
import xmlrunner
from ui_tests.test_prelogin import PreLoginTests
login_tests = unittest.TestLoader().loadTestsFromTestCase(PreLoginTests)
test_suite = unittest.TestSuite([login_tests])
unittest.main(testRunner=xmlrunner.XMLTestRunner(output='test-reports')).run(test_suite)

Module 'rpy2.robjects.pandas2ri' has no attribute 'ri2py'

I'm trying to convert R-dataframe to Python Pandas DataFrame.
I use the following code:
from rpy2.robjects import pandas2ri
pandas2ri.activate()
r_dataframe = r_function(my_dataframe['Numbers'])
print(r_dataframe)
python_dataframe = pandas2ri.ri2py(r_dataframe)
The above code works well in Jupyter Notebook (Anaconda). But if I run this code through a my_program.py file through the terminal, I get an error:
:~$ python3 my_program.py
Traceback (most recent call last):
File "my_program.py", line 223, in <module>
python_dataframe = pandas2ri.ri2py(r_dataframe)
AttributeError: module 'rpy2.robjects.pandas2ri' has no attribute 'ri2py'
Line of code: print(r_dataframe) shows right result in the terminal.
If I try to use code print(dir(pandas2ri)) in Jupyter Notebook I get ('ri2py'):
['DataFrame', 'FactorVector', 'FloatSexpVector', 'INTSXP', 'ISOdatetime', 'IntSexpVector', 'IntVector', 'ListSexpVector', 'ListVector', 'OrderedDict', 'POSIXct', 'PandasDataFrame', 'PandasIndex', 'PandasSeries', 'SexpVector', 'StrSexpVector', 'StrVector', 'Vector', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'activate', 'as_vector', 'conversion', 'converter', 'datetime', 'deactivate', 'dt_O_type', 'dt_datetime64ns_type', 'get_timezone', 'numpy', 'numpy2ri', 'original_converter', 'os', 'pandas', 'py2ri', 'py2ri_categoryseries', 'py2ri_pandasdataframe', 'py2ri_pandasindex', 'py2ri_pandasseries', 'py2ro', 'pytz', 'recarray', 'ri2py', 'ri2py_dataframe', 'ri2py_floatvector', 'ri2py_intvector', 'ri2py_listvector', 'ri2py_vector', 'ri2ro', 'rinterface', 'ro', 'warnings']
And if I try to use the same code print(dir(pandas2ri)) in Terminal I get ('rpy2py'):
['DataFrame', 'FactorVector', 'FloatSexpVector', 'ISOdatetime', 'IntSexpVector', 'IntVector', 'ListSexpVector', 'OrderedDict', 'POSIXct', 'PandasDataFrame', 'PandasIndex', 'PandasSeries', 'Sexp', 'SexpVector', 'StrSexpVector', 'StrVector', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'activate', 'as_vector', 'conversion', 'converter', 'datetime', 'deactivate', 'default_timezone', 'dt_O_type', 'get_timezone', 'is_datetime64_any_dtype', 'numpy', 'numpy2ri', 'original_converter', 'pandas', 'py2rpy', 'py2rpy_categoryseries', 'py2rpy_pandasdataframe', 'py2rpy_pandasindex', 'py2rpy_pandasseries', 'pytz', 'ri2py_vector', 'rinterface', 'rpy2py', 'rpy2py_dataframe', 'rpy2py_floatvector', 'rpy2py_intvector', 'rpy2py_listvector', 'tzlocal', 'warnings']
It turns out the developers have changed the name of the functions.
Since no one bothered to write down the way to do it with newer versions of rpy2:
Conversion is done using a localconverter block which automatically converts from pandas dataframe to r dataframe and back.
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
pd_df = pd.DataFrame({'int_values': [1,2,3],
'str_values': ['abc', 'def', 'ghi']})
base = importr('base')
with localconverter(ro.default_converter + pandas2ri.converter):
df_summary = base.summary(pd_df)
You are likely using documentation/code written for a different version of rpy2 than what you have installed.
If using the latest release, consider checking the documentation for it:
https://rpy2.github.io/doc/v3.0.x/html/generated_rst/pandas.html
For anyone having issues with localconverter, here is an alternative way to convert a df of type rpy2.robjects.vectors.ListVector to a pandas dataframe. This solution drops columns names
pd.Dataframe(np.array(df).reshape((nrows, ncols)))

Running all python unittests from different modules as a suite having parameterized class constructor

I am working on a design pattern to make my python unittest as a POM, so far I have written my page classes in modules HomePageObject.py,FilterPageObject.py, my base class (for common stuff)TestBase in BaseTest.py, my testcase modules are TestCase1.py and TestCase2.py and one runner module runner.py.
In runner class i am using loader.getTestCaseNames to get all the tests from a testcase class of a module. In both the testcase modules the name of the test class is same 'Test' and also the method name is same 'testName'
Since the names are confilicting while importing it in runner, only one test is getting executed. I want python to scan all the modules that i specify for tests in them and run those even the name of classes are same.
I got to know that nose might be helpful in this, but not sure how i can implement it here. Any advice ?
BaseTest.py
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ChromeOptions
import unittest
class TestBase(unittest.TestCase):
driver = None
def __init__(self,testName,browser):
self.browser = browser
super(TestBase,self).__init__(testName)
def setUp(self):
if self.browser == "firefox":
TestBase.driver = webdriver.Firefox()
elif self.browser == "chrome":
options = ChromeOptions()
options.add_argument("--start-maximized")
TestBase.driver = webdriver.Chrome(chrome_options=options)
self.url = "https://www.airbnb.co.in/"
self.driver = TestBase.getdriver()
TestBase.driver.implicitly_wait(10)
def tearDown(self):
self.driver.quit()
#staticmethod
def getdriver():
return TestBase.driver
#staticmethod
def waitForElementVisibility(locator, expression, message):
try:
WebDriverWait(TestBase.driver, 20).\
until(EC.presence_of_element_located((locator, expression)),
message)
return True
except:
return False
TestCase1.py and TestCase2.py (same)
from airbnb.HomePageObject import HomePage
from airbnb.BaseTest import TestBase
class Test(TestBase):
def __init__(self,testName,browser):
super(Test,self).__init__(testName,browser)
def testName(self):
try:
self.driver.get(self.url)
h_page = HomePage()
f_page = h_page.seachPlace("Sicily,Italy")
f_page.selectExperience()
finally:
self.driver.quit()
runner.py
import unittest
from airbnb.TestCase1 import Test
from airbnb.TestCase2 import Test
loader = unittest.TestLoader()
test_names = loader.getTestCaseNames(Test)
suite = unittest.TestSuite()
for test in test_names:
suite.addTest(Test(test,"chrome"))
runner = unittest.TextTestRunner()
result = runner.run(suite)
Also even that one test case is getting passed, some error message is coming
Ran 1 test in 9.734s
OK
Traceback (most recent call last):
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\runfiles.py", line 275, in <module>
main()
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\runfiles.py", line 97, in main
return pydev_runfiles.main(configuration) # Note: still doesn't return a proper value.
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\_pydev_runfiles\pydev_runfiles.py", line 874, in main
PydevTestRunner(configuration).run_tests()
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\_pydev_runfiles\pydev_runfiles.py", line 773, in run_tests
all_tests = self.find_tests_from_modules(file_and_modules_and_module_name)
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\_pydev_runfiles\pydev_runfiles.py", line 629, in find_tests_from_modules
suite = loader.loadTestsFromModule(m)
File "C:\Python27\lib\unittest\loader.py", line 65, in loadTestsFromModule
tests.append(self.loadTestsFromTestCase(obj))
File "C:\Python27\lib\unittest\loader.py", line 56, in loadTestsFromTestCase
loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
TypeError: __init__() takes exactly 3 arguments (2 given)
I did this by searching for all the modules of test classes with a pattern and then used __import__(modulename) and called its Test class with desired parameters,
Here is my runner.py
import unittest
import glob
loader = unittest.TestLoader()
suite = unittest.TestSuite()
test_file_strings = glob.glob('Test*.py')
module_strings = [str[0:len(str)-3] for str in test_file_strings]
for module in module_strings:
mod = __import__(module)
test_names =loader.getTestCaseNames(mod.Test)
for test in test_names:
suite.addTest(mod.Test(test,"chrome"))
runner = unittest.TextTestRunner()
result = runner.run(suite)
This worked but still looking for some organized solutions.
(Not sure why second time its showing Ran 0 tests in 0.000s )
Finding files... done.
Importing test modules ... ..done.
----------------------------------------------------------------------
Ran 2 tests in 37.491s
OK
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK

from authorizenet.apicontrollers import createTransactionController :: ImportError: cannot import name requests

I am using authorize.net python lib for making payment gateway. I am following this link:
https://github.com/AuthorizeNet/sample-code-python
The code is like: PaymentTransactions/charge-credit-card.py
from authorizenet import apicontractsv1 <br>
from authorizenet.apicontrollers import createTransactionController <br>
def charge_credit_card(amount):<br>
"""
Charge a credit card
"""
When I run this file, code throwing this error:
Traceback (most recent call last): File
"C:/development/learning/VueJsLearning/Payment-system/sample-code-python/PaymentTransactions/charge-credit-card.py", line 10, in
from authorizenet.apicontrollers import createTransactionController File
"C:\Python27\lib\site-packages\authorizenet\apicontrollers.py", line
9, in
from authorizenet import apicontrollersbase File "C:\Python27\lib\site-packages\authorizenet\apicontrollersbase.py",
line 11, in
from pip._vendor import requests File "C:\Python27\lib\site-packages\pip_vendor\requests__init__.py", line
83, in
from pip._internal.compat import WINDOWS File "C:\Python27\lib\site-packages\pip_internal__init__.py", line 42, in
from pip._internal import cmdoptions File "C:\Python27\lib\site-packages\pip_internal\cmdoptions.py", line 16,
in
from pip._internal.index import ( File "C:\Python27\lib\site-packages\pip_internal\index.py", line 15, in
from pip._vendor import html5lib, requests, six ImportError: cannot import name requests

PyQt4: How to load compiled ui-files correctly?

I created a Qt resource file with all my ui files inside of it, I compiled that with pyrcc4 command-line in a python file, and then I loaded the ui files using loadUi. Here an example:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import os
import sys
from PyQt4.QtCore import Qt, QFile
from PyQt4.uic import loadUi
from PyQt4.QtGui import QDialog
from xarphus.gui import ui_rc
# I import the compiled qt resource file named ui_rc
BASE_PATH = os.path.dirname(os.path.abspath(__file__))
#UI_PATH = os.path.join(BASE_PATH, 'gui', 'create_user.ui')
UI_PATH = QFile(":/ui_file/create_user.ui")
# I want to load those compiled ui files,
# so I just create QFile.
class CreateUser_Window(QDialog):
def __init__(self, parent):
QDialog.__init__(self, parent)
# I open the created QFile
UI_PATH.open(QFile.ReadOnly)
# I read the QFile and load the ui file
self.ui_create_user = loadUi(UI_PATH, self)
# After then I close it
UI_PATH.close()
Well its works fine, but I have a problem. When I open the GUI-window once, everything works fine. After closing the window I try to open the same GUI-window again, I get ja long traceback.
Traceback (most recent call last): File "D:\Dan\Python\xarphus\xarphus\frm_mdi.py", line 359, in
create_update_form self.update_form = Update_Window(self) File
"D:\Dan\Python\xarphus\xarphus\frm_update.py", line 135, in init
self.ui_update = loadUi(UI_PATH, self) File
"C:\Python27\lib\site-packages\PyQt4\uic__init__.py", line 238, in
loadUi return DynamicUILoader(package).loadUi(uifile, baseinstance,
resource_suffix) File
"C:\Python27\lib\site-packages\PyQt4\uic\Loader\loader.py", line 71,
in loadUi return self.parse(filename, resource_suffix, basedir) File
"C:\Python27\lib\site-packages\PyQt4\uic\uiparser.py", line 984, in
parse document = parse(filename) File
"C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
tree.parse(source, parser) File
"C:\Python27\lib\xml\etree\ElementTree.py", line 657, in parse
self._root = parser.close() File
"C:\Python27\lib\xml\etree\ElementTree.py", line 1654, in close
self._raiseerror(v) File "C:\Python27\lib\xml\etree\ElementTree.py",
line 1506, in _raiseerror raise err xml.etree.ElementTree.ParseError:
no element found: line 1, column 0
Can everyone help me?
Maybe I have a solution, but I don't know if that is a perfectly pythonic.
Well, we know all python projects have an __ init __-file. We need it for initializing Python packages, right? Well I thought: Why not use this file? What did I do? I define in the __ init __ -file a function like so:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
from PyQt4.uic import loadUi
from PyQt4.QtCore import Qt, QFile
def ui_load_about(self):
uiFile = QFile(":/ui_file/about.ui")
uiFile.open(QFile.ReadOnly)
self.ui_about = loadUi(uiFile)
uiFile.close()
return self.ui_about
Now in my "About_Window"-class I do this:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import os
import sys
from PyQt4.QtCore import Qt, QFile
from PyQt4.uic import loadUi
from PyQt4.QtGui import QDialog
import __init__ as ui_file
class About_Window(QDialog):
def __init__(self, parent):
QDialog.__init__(self, parent)
self.ui_about = ui_file.ui_load_about(self)
You see I importe the package-file (__ init __-file) as ui_file and then I call the function and save the return of the function in the variable self.ui_about.
In my case I open the About_Window from a MainWindow(QMainWindow), and it looks like so:
def create_about_form(self):
self.ui_about = About_Window(self)
# Now when I try to show (show()-method) a window I get two windows
# The reason is: I open and load the ui files from compiled
# qt resorce file that was define in __init__-module.
# There is a function that opens the resource file, reads
# the ui file an closes and returns the ui file back
# That's the reason why I have commented out this method
#self.ui_about.show()
You see I commented out the show()-method. It works without this method. I only define the About_Window()-class. Well I know that isn't maybe the best solution, but it works. I can open the window again and again without traceback.
If you have a better solution or idea let me know :-)

Resources