Rserve: pyRserve not able to call basic R functions

I'm calling Rserve from Python and it works for basic operations, but not if I call basic functions such as min:
import pyRserve
conn = pyRserve.connect()
cars = [1, 2, 3]
conn.r.x = cars
print(conn.eval('x'))
print(conn.eval('min(x)'))
The result is:
[1, 2, 3]
Traceback (most recent call last):
File "test3.py", line 9, in <module>
print(conn.eval('min(x)'))
File "C:\Users\acastro\.windows-build-tools\python27\lib\site-packages\pyRserve\rconn.py", line 78, in decoCheckIfClosed
return func(self, *args, **kw)
File "C:\Users\acastro\.windows-build-tools\python27\lib\site-packages\pyRserve\rconn.py", line 191, in eval
raise REvalError(errorMsg)
pyRserve.rexceptions.REvalError: Error in min(x) : invalid 'type' (list) of argument
Do you know where the problem is?
Thanks

You should try min(unlist(x)). unlist flattens the list into a plain vector, which min can handle.
If the list is simple, you may also try as.data.frame(x).
For more complicated lists, Stack Overflow has many other answers.

Related

How to measure the length of a call stack?

Recently had a test question asking "how deep" the call stack goes for fact1 where n = 5. Here is the code:
int fact1(int n)
{
    if (n == 1)
    {
        return 1;
    }
    else
    {
        return n * fact1(n - 1);
    }
}
The answer on the test was 5, but I believe it is 4, since I don't believe the first call should be counted in the number of calls.
Actually, every function call ends up on the call stack.
Your example looks like C; in C, there is always a main function, and even main ends up on the call stack.
I don't think there is a portable way to examine the call stack in C, especially since the compiler is allowed to optimise away whatever it wants. For instance, it could optimise tail recursion, and then the call stack would be smaller than you'd expect.
In Python the call stack is easy to examine: just crash the function wherever you want by raising an exception (for instance with assert False). The program will then produce an error message containing the full "stack trace", including every function on the stack.
Here is an example of a stack trace in Python:
def fact1(n):
    assert n != 1
    return n * fact1(n - 1)

def main():
    f = fact1(3)
    print(f)

main()
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in main
File "<stdin>", line 3, in fact1
File "<stdin>", line 3, in fact1
File "<stdin>", line 2, in fact1
AssertionError
And another example just for fun:
def print_even(n):
    if n <= 1:
        print('yes' if n == 0 else 'no')
        assert False
    else:
        print_odd(n - 1)

def print_odd(n):
    if n <= 1:
        print('yes' if n == 1 else 'no')
        assert False
    else:
        print_even(n - 1)

def main():
    n = 5
    print('Is {} even?'.format(n))
    print_even(n)

main()
Output:
Is 5 even?
no
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in main
File "<stdin>", line 6, in print_even
File "<stdin>", line 6, in print_odd
File "<stdin>", line 6, in print_even
File "<stdin>", line 6, in print_odd
File "<stdin>", line 4, in print_even
AssertionError
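If you want to measure the depth without crashing the program, Python's inspect module can count the frames on the stack directly. This is an extra approach not shown in the answer above; the stack_depth helper is just for illustration:

```python
import inspect

def stack_depth():
    # Number of frames currently on the interpreter's call stack
    # (includes the stack_depth frame itself, which the subtraction
    # against the baseline below cancels out).
    return len(inspect.stack())

def fact1(n, depths):
    depths.append(stack_depth())  # record the depth at every call
    if n == 1:
        return 1
    return n * fact1(n - 1, depths)

depths = []
base = stack_depth()  # baseline depth just before the first call
fact1(5, depths)
print([d - base for d in depths])  # [1, 2, 3, 4, 5] -- five fact1 frames
```

The five entries show that the first call does land on the stack like every other one, which supports the test's answer of 5.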

Using writeOGR with rpy2

I am trying to use writeOGR from R, called from Python 3.8 via rpy2.
import rpy2.robjects as ro
.....
ro.r('ttops <- .....')
ro.r('writeOGR(obj=ttops, dsn="T:/Internal/segmentation", layer="test", driver="ESRI Shapefile")')
errors with:
R[write to console]: Error in writeOGR(obj = ttops, dsn = "T:/Internal/LiDAR/crown_segmentation", :
could not find function "writeOGR"
Traceback (most recent call last):
File "C:/Users/david/PycharmProjects/main.py", line 7, in <module>
main()
File "C:/Users/david/PycharmProjects/main.py", line 4, in main
R_Packages().process()
File "C:\Users\david\PycharmProjects\model_testing\r_methods.py", line 17, in process
ro.r('writeOGR(obj=ttops, dsn="T:/Internal/segmentation", layer="test", driver="ESRI Shapefile")')
File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\site-packages\rpy2\robjects\__init__.py", line 416, in __call__
res = self.eval(p)
File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\site-packages\rpy2\robjects\functions.py", line 197, in __call__
return (super(SignatureTranslatedFunction, self)
File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\site-packages\rpy2\robjects\functions.py", line 125, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 44, in _
cdata = function(*args, **kwargs)
File "C:\Users\david\AppData\Local\Programs\Python\Python38\lib\site-packages\rpy2\rinterface.py", line 624, in __call__
raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in writeOGR(obj = ttops, dsn = "T:/Internal/segmentation", :
could not find function "writeOGR"
Am I missing something or is this a limit of rpy2? If it is a limit, what is an alternative to write shapefiles of R data using Python?
The rgdal library, which I did not need to load explicitly in R, has to be loaded explicitly when calling R from Python:
ro.r('library(rgdal)')

Setting environment variable to numeric value leads to error in Python

Trying to set up environment variables with the following code:
import os

dicta = {}

def setv(evar, evalue):
    os.environ[evar] = evalue
    dicta.setdefault('UENV', {}).update({evar: evalue})

# Set environment variables
setv('API_USER', 'username')
setv('API_PASSWORD', 'secret')
setv('NUMBER', 1)
On the last statement, where the NUMBER variable is set to the numeric value 1, I get the following error:
Traceback (most recent call last):
File "./pyenv.py", line 19, in <module>
setv('NUMBER', 1)
File "./pyenv.py", line 13, in setv
os.environ[evar] = evalue
File "/home/python/3.6.3/1/el-6-x86_64/lib/python3.6/os.py", line 674, in __setitem__
value = self.encodevalue(value)
File "/home/python/3.6.3/1/el-6-x86_64/lib/python3.6/os.py", line 744, in encode
raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not int
I don't want to convert the value to str; I want to keep it as an int. Any thoughts on keeping NUMBER as the numeric value 1 without seeing this error message?
Environment variables are always string values. Converting them back into integers after you read them from the environment is the way to go.
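A minimal sketch of that round trip (the NUMBER name comes from the question above):

```python
import os

# Environment variables can only hold strings, so store the text form...
os.environ['NUMBER'] = str(1)

# ...and cast back to int at the point of use.
number = int(os.environ['NUMBER'])
print(number, type(number).__name__)  # 1 int
```

The value stays a plain string inside the environment, but your code works with it as a number everywhere it matters.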

How to convert dict to RDD in PySpark

I am learning the Word2Vec model to process my data.
I am using Spark 1.6.0.
I'll use the example from the official documentation to explain my problem:
from pyspark.mllib.feature import Word2Vec
sentence = "a b " * 100 + "a c " * 10
localDoc = [sentence, sentence]
doc = sc.parallelize(localDoc).map(lambda line: line.split(" "))
model = Word2Vec().setVectorSize(10).setSeed(42).fit(doc)
The vectors are as follows:
>>> model.getVectors()
{'a': [0.26699373, -0.26908076, 0.0579859, -0.080141746, 0.18208595, 0.4162335, 0.0258975, -0.2162928, 0.17868409, 0.07642203], 'b': [-0.29602322, -0.67824656, -0.9063686, -0.49016926, 0.14347662, -0.23329848, -0.44695938, -0.69160634, 0.7037, 0.28236762], 'c': [-0.08954003, 0.24668643, 0.16183868, 0.10982372, -0.099240996, -0.1358507, 0.09996107, 0.30981666, -0.2477713, -0.063234895]}
I use getVectors() to get a map of the word representations. How do I convert it into an RDD so I can pass it to the KMeans model?
EDIT:
I did what @user9590153 said.
>>> v = sc.parallelize(model.getVectors()).values()
# the above code is successful.
>>> v.collect()
The Spark-Shell shows another problem:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\spark-1.6.3-bin-hadoop2.6\python\pyspark\rdd.py", line 771, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "D:\spark-1.6.3-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py", line 813, in __call__
File "D:\spark-1.6.3-bin-hadoop2.6\python\pyspark\sql\utils.py", line 45, in deco
return f(*a, **kw)
File "D:\spark-1.6.3-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 8.0 failed 1 times, most recent failure: Lost task 3.0 in stage 8.0 (TID 29, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "D:\spark-1.6.3-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
File "D:\spark-1.6.3-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\worker.py", line 106, in process
File "D:\spark-1.6.3-bin-hadoop2.6\python\lib\pyspark.zip\pyspark\serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "D:\spark-1.6.3-bin-hadoop2.6\python\pyspark\rdd.py", line 1540, in <lambda>
return self.map(lambda x: x[1])
IndexError: string index out of range
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Just parallelize:
sc.parallelize(model.getVectors()).values()
Parallelized collections will help you here:
data = [1, 2, 3, 4, 5]           # data here is the collection
distData = sc.parallelize(data)  # converted into an RDD
As for your doubt about collect():
The action collect() is the simplest and most common operation that returns the entire content of an RDD to the driver program. Its typical application is unit testing, where the entire RDD is expected to fit in memory; that makes it easy to compare the RDD's result with the expected result.
collect() has the constraint that all the data must fit on one machine, because it is copied to the driver.
So you cannot perform collect() on an RDD that is too large for the driver's memory.
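The traceback in the edit is worth spelling out: iterating over a dict in Python yields only its keys, so sc.parallelize(model.getVectors()) builds an RDD of one-character strings, and .values() (which takes element[1] of each pair) then indexes past the end of a string like 'a'. A plain-Python sketch of the difference, no Spark required:

```python
vectors = {'a': [0.1, 0.2], 'b': [0.3, 0.4]}  # stand-in for model.getVectors()

print(list(vectors))          # ['a', 'b'] -- iteration yields only the keys
print(list(vectors.items()))  # key/vector pairs, which is what .values() expects

# 'a'[1] is what .values() effectively attempts on an RDD of bare keys:
try:
    'a'[1]
except IndexError as exc:
    print(exc)  # string index out of range
```

Under that reading, parallelizing the pairs instead, sc.parallelize(model.getVectors().items()).values(), should give an RDD of the vectors themselves.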

Tuple index out of range Tkinter

So I've got a program that should take a function as input and graph it on a Tkinter canvas.
def draw(self):
    self.canvas.delete(ALL)
    for n, i in enumerate(self.sav):
        self.function, colour = self.sav_func[n]
        i = self.p1(i)
        i = self.p2(i, self.function, colour)
        if i != [0]:
            try:
                self.canvas.create_line(i, fill = colour)
            except TclError as err:
                tkMessageBox.showerror(TclError, err)
                self.sav.remove(self.sav[len(self.sav)-1])
                self.sav_func.remove(self.sav_func[len(self.sav_func)-1])
This section is giving me the following error:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1410, in __call__
return self.func(*args)
File "D:/Google Drive/assign2_2-1.py", line 113, in add_func
self.redraw_all()
File "D:/Google Drive/assign2_2-1.py", line 132, in redraw_all
self.draw()
File "D:/Google Drive/assign2_2-1.py", line 145, in draw
self.canvas.create_line(i, fill = colour)
File "C:\Python27\lib\lib-tk\Tkinter.py", line 2201, in create_line
return self._create('line', args, kw)
File "C:\Python27\lib\lib-tk\Tkinter.py", line 2182, in _create
cnf = args[-1]
IndexError: tuple index out of range
From what I can gather it's something to do with the number of inputs not matching the number of outputs, but I'm still a little lost. Help would be great!
It looks like i doesn't have enough values. To create a line, create_line needs at least four coordinates: x1, y1, x2, y2.
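One way to make that failure explicit is to validate the coordinate list before handing it to create_line. A minimal sketch; the can_draw_line helper is hypothetical, not part of Tkinter:

```python
def can_draw_line(coords):
    # create_line needs at least two points, i.e. x1, y1, x2, y2,
    # and the flat list must contain only whole (x, y) pairs.
    return len(coords) >= 4 and len(coords) % 2 == 0

print(can_draw_line([10, 10]))          # False -- only one point, the failing case
print(can_draw_line([10, 10, 50, 50]))  # True  -- a valid two-point line
```

In draw() above you could check can_draw_line(i) before calling self.canvas.create_line(i, fill=colour), and skip or report bad point lists instead of letting Tkinter raise.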
