HMError when running HTKDemo - htk

I just installed HTS-2.2 and HTK-3.4.1 on 64-bit Ubuntu with the gcc 3.4 compiler. After that, commands such as HInit and HCopy seemed to work, so I wanted to see how the HTKDemo runs.
When I run the demo, HTK reports the following errors:
HMM Def Error: <Mean> symbol expected in GetMean at line 6/col 11/char 120 in proto/L
ERROR [+7050] HMError:
HMM Def Error: GetMean Failed at line 6/col 12/char 121 in proto/L
ERROR [+7050] HMError:
HMM Def Error: Regression Class Number expected at line 7/col 0/char 122 in proto/L
ERROR [+7050] HMError:
HMM Def Error: GetMixtures failed at line 7/col 1/char 123 in proto/L
ERROR [+7050] HMError:
HMM Def Error: Get Stream Information failed at line 7/col 2/char 124 in proto/L
ERROR [+7050] HMError:
HMM Def Error: GetStream failed at line 7/col 3/char 125 in proto/L
ERROR [+7050] HMError:
HMM Def Error: GetStateInfo failed at line 7/col 4/char 126 in proto/L
ERROR [+7050] HMError:
ERROR [+7032] LoadHMMSet: GetHMMDef failed
ERROR [+2128] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HInit
Source Directory Empty hmms/hmm.0
The same errors occur for the protos S, V, N, and C.
I really want to know the cause and how to fix it. Thanks!

I got the same +7050 error; in my case it was a misspelled HMM definition file. You probably have the same problem, since the error reports "<Mean> symbol expected".
Just check that your definition file has the right format, as follows:
~h (phoneme name)
<BEGINHMM>
<NUMSTATES> (NStates)
<STATE> 2 (state numbering starts at 2 and ends at NStates-1)
<MEAN> 13 (or whatever vector size you defined)
-4.717658e+000 ...
<VARIANCE> 13 (same dimension as <MEAN>)
4.735534e+001 ...
<STATE> 3 ....
...
<GCONST> 1.269744e+002
<TRANSP> 3 (an NStates x NStates matrix; the exact values are not critical)
0.0 1.0 0.0
0.0 0.9 0.1
0.0 0.0 0.0
<ENDHMM>
~h (next phoneme)
...
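If it helps, here is a minimal well-formed prototype for comparison (a sketch only, assuming a made-up 4-dimensional MFCC feature vector and 5 states; adjust <VECSIZE>, the parameter kind, and <NUMSTATES> to match your front end). A common cause of "<Mean> symbol expected" is that the count after <MEAN> or <VARIANCE> does not match the number of floats on the following line:

~o <VECSIZE> 4 <MFCC>
~h "proto"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 4
0.0 0.0 0.0 0.0
<VARIANCE> 4
1.0 1.0 1.0 1.0
<STATE> 3
<MEAN> 4
0.0 0.0 0.0 0.0
<VARIANCE> 4
1.0 1.0 1.0 1.0
<STATE> 4
<MEAN> 4
0.0 0.0 0.0 0.0
<VARIANCE> 4
1.0 1.0 1.0 1.0
<TRANSP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<ENDHMM>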

Related

Getting an error while installing the lme4 package in R on macOS Monterey

I am getting the following error when trying to install the lme4 package in R.
install.packages("lme4")
Installing package into ‘/usr/local/lib/R/4.2/site-library’
(as ‘lib’ is unspecified)
also installing the dependency ‘minqa’
trying URL 'https://cloud.r-project.org/src/contrib/minqa_1.2.4.tar.gz'
Content type 'application/x-gzip' length 53548 bytes (52 KB)
==================================================
downloaded 52 KB
trying URL 'https://cloud.r-project.org/src/contrib/lme4_1.1-29.tar.gz'
Content type 'application/x-gzip' length 3306026 bytes (3.2 MB)
==================================================
downloaded 3.2 MB
* installing *source* package ‘minqa’ ...
** package ‘minqa’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
/usr/local/opt/gcc/bin/gfortran -fno-optimize-sibling-calls -fPIC -g -O2 -c altmov.f -o altmov.o
altmov.f:42:72:
42 | 10 HCOL(K)=ZERO
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 10 at (1)
altmov.f:45:72:
45 | DO 20 K=1,NPT
| 1
Warning: Fortran 2018 deleted feature: Shared DO termination label 20 at (1)
altmov.f:46:72:
46 | 20 HCOL(K)=HCOL(K)+TEMP*ZMAT(K,J)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 20 at (1)
altmov.f:53:72:
53 | 30 GLAG(I)=BMAT(KNEW,I)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 30 at (1)
altmov.f:57:72:
57 | 40 TEMP=TEMP+XPT(K,J)*XOPT(J)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 40 at (1)
altmov.f:59:72:
59 | DO 50 I=1,N
| 1
Warning: Fortran 2018 deleted feature: Shared DO termination label 50 at (1)
altmov.f:60:72:
60 | 50 GLAG(I)=GLAG(I)+TEMP*XPT(K,I)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 50 at (1)
altmov.f:76:72:
76 | 60 DISTSQ=DISTSQ+TEMP*TEMP
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 60 at (1)
altmov.f:172:72:
172 | 90 XNEW(I)=DMAX1(SL(I),DMIN1(SU(I),TEMP))
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 90 at (1)
altmov.f:237:72:
237 | 140 GW=GW+GLAG(I)*W(I)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 140 at (1)
altmov.f:248:72:
248 | 150 TEMP=TEMP+XPT(K,J)*W(J)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 150 at (1)
altmov.f:249:72:
249 | 160 CURV=CURV+HCOL(K)*TEMP*TEMP
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 160 at (1)
altmov.f:255:72:
255 | 170 XALT(I)=DMAX1(SL(I),DMIN1(SU(I),TEMP))
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 170 at (1)
altmov.f:268:72:
268 | 180 W(N+I)=XALT(I)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 180 at (1)
altmov.f:275:72:
275 | 190 XALT(I)=W(N+I)
| 1
Warning: Fortran 2018 deleted feature: DO termination statement which is not END DO or CONTINUE with label 190 at (1)
clang (LLVM option parsing): Unknown command line argument '-x86-pad-for-align=false'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--x86-slh-loads=false'?
make: *** [altmov.o] Error 1
ERROR: compilation failed for package ‘minqa’
* removing ‘/usr/local/lib/R/4.2/site-library/minqa’
ERROR: dependency ‘minqa’ is not available for package ‘lme4’
* removing ‘/usr/local/lib/R/4.2/site-library/lme4’
The downloaded source packages are in
‘/private/var/folders/02/6dwz1gvd1qz4gml8v_wv6n980000gp/T/RtmpVREiYV/downloaded_packages’
Warning messages:
1: In install.packages("lme4") :
installation of package ‘minqa’ had non-zero exit status
2: In install.packages("lme4") :
installation of package ‘lme4’ had non-zero exit status
I recently upgraded my OS to macOS Monterey (12.3.1) on a 2020 MacBook (Intel chip).
I have tried reinstalling gcc and gfortran using brew, with no effect.
Can anyone help me with the issue?
Removing the Xcode command line tools and reinstalling them solved the issue.
$ xcode-select --print-path
/Library/Developer/CommandLineTools
$ sudo rm -r -f /Library/Developer/CommandLineTools
$ xcode-select --install
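If the problem persists after reinstalling, a quick sanity check (diagnostics only; the paths and versions shown will be whatever your system reports) is to confirm which compiler binaries are actually being picked up:
$ xcode-select --print-path
$ which clang gfortran
$ clang --version
$ gfortran --version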

While trying to convert a string to a date: "An error occurred while calling o140.showString; could not parse at index 0"?

I have a date column in the format 1/1/15 (month/day/year), without leading zeros and with 15 instead of 2015.
I tried
data = data.withColumn('date', to_date(unix_timestamp(data['date'], 'MM-dd-yyyy').cast("timestamp")))
data.orderBy('date').show()
It gives NULL in the date column (maybe because of the year format; I tried MM-dd-yy and M-d-yy too).
So I tried
data = data.withColumn('date', regexp_replace('date', '15', '2015'))
data = data.withColumn('date', regexp_replace('date', '/2015/', '-15-'))
data = data.withColumn('date', regexp_replace('date', '/', '-'))
Now I have the date as 1-1-2015, but when I try the code from above, it shows the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-39-470368900f3b> in <module>
----> 1 data.orderBy('date').show()
C:\Users\Admin\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
438 """
439 if isinstance(truncate, bool) and truncate:
--> 440 print(self._jdf.showString(n, 20, vertical))
441 else:
442 print(self._jdf.showString(n, int(truncate), vertical))
C:\Users\Admin\Anaconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
C:\Users\Admin\Anaconda3\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
126 def deco(*a, **kw):
127 try:
--> 128 return f(*a, **kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)
C:\Users\Admin\Anaconda3\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o140.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 34.0 failed 1 times, most recent failure: Lost task 1.0 in stage 34.0 (TID 54, DESKTOP-IQ36PJF, executor driver): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '5-17-2015' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:150)
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:141)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:86)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.parse(TimestampFormatter.scala:77)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:31)
at org.sparkproject.guava.collect.Ordering.leastOf(Ordering.java:628)
at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
at org.apache.spark.rdd.RDD.$anonfun$takeOrdered$2(RDD.scala:1492)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.time.format.DateTimeParseException: Text '5-17-2015' could not be parsed at index 0
at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2046)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1874)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:78)
... 24 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2194)
at org.apache.spark.rdd.RDD.$anonfun$reduce$1(RDD.scala:1094)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:1076)
at org.apache.spark.rdd.RDD.$anonfun$takeOrdered$1(RDD.scala:1498)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1486)
at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:183)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3627)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2697)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2697)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2904)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:337)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '5-17-2015' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:150)
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:141)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:86)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.parse(TimestampFormatter.scala:77)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:31)
at org.sparkproject.guava.collect.Ordering.leastOf(Ordering.java:628)
at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
at org.apache.spark.rdd.RDD.$anonfun$takeOrdered$2(RDD.scala:1492)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
... 1 more
Caused by: java.time.format.DateTimeParseException: Text '5-17-2015' could not be parsed at index 0
at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2046)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1874)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:78)
... 24 more
Any help regarding this would be appreciated. Thanks!
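For anyone hitting the same trace: the SparkUpgradeException above points at the Spark 3.0 parser change, which suggests two possible routes (a sketch, assuming the column still holds the original strings like '1/1/15'; data and spark are the DataFrame from the question and the active SparkSession):

from pyspark.sql import functions as F

# Route 1: match the real format -- single-digit month/day ('M', 'd')
# and a two-digit year ('yy'), so no regexp_replace step is needed.
data = data.withColumn('date', F.to_date('date', 'M/d/yy'))

# Route 2: restore the pre-3.0 parser, as the error message itself proposes.
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')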

Running embedded R code in Oracle raises an error

I am a newbie to Oracle R embedded execution.
I have the following code registered as a script:
BEGIN
  sys.rqScriptDrop('TSFORECAST');
  sys.rqScriptCreate('TSFORECAST',
    'function(dat){
      require(ORE)
      require(forecast)
      myts <- ts(dat, frequency=12)
      model <- auto.arima(myts)
      fmodel <- forecast(model)
      fm <- data.frame(fmodel$mean, fmodel$upper, fmodel$lower)
      names(fm) <- c("mean","l80","l95","u80","u95")
      return(fm)
    }'
  );
END;
When I execute the function for the first time with this code:
select *
from table(
  rqTableEval(
    cursor(select balance from tmp_30),
    cursor(select 1 as "ore.connect" from dual),
    'select 1 mean, 1 l80, 1 l95, 1 u80, 1 u95 from dual',
    'TSFORECAST'
  )
)
it generates the results I expected. But after that, it never produces any result; instead, it raises this error:
ORA-20000: RQuery error
Error in (function () :
unused arguments (width = 480, bg = "white", type = "raster")
ORA-06512: at "RQSYS.RQTABLEEVALIMPL", line 112
ORA-06512: at "RQSYS.RQTABLEEVALIMPL", line 109
20000. 00000 - "%s"
*Cause: The stored procedure 'raise_application_error'
was called which causes this error to be generated.
*Action: Correct the problem as described in the error message or contact
the application administrator or DBA for more information.
I have searched for this error but could not find anything helpful. Can anyone help me with it?

PipelinedRDD can't be converted to a DataFrame using toDF

I have a PySpark DataFrame containing rows of comma-separated data. I want to split each row, apply LabeledPoint to it, and then convert the result to a DataFrame.
Here is my code:
import os.path
from pyspark.mllib.regression import LabeledPoint
import numpy as np
file_name = os.path.join('databricks-datasets', 'cs190', 'data-001', 'millionsong.txt')
raw_data_df = sqlContext.read.load(file_name, 'text')
rdd = raw_data_df.rdd.map(lambda line: line.split(',')).map(lambda seq:LabeledPoints(seq[0],seq[1:])).toDF()
It gives the following error message after applying .toDF():
---------------------------------------------------------------------------
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 38.0 failed 1 times, most recent failure: Lost task 0.0 in stage 38.0 (TID 44, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
Py4JJavaError Traceback (most recent call last)
<ipython-input-65-dc4d86a8ee45> in <module>()
----> 1 rdd = raw_data_df.rdd.map(lambda line: line.split(',')).map(lambda seq:LabeledPoints(seq[0],seq[1:])).toDF()
2 print(type(rdd))
3 #print(rdd.take(5))
/databricks/spark/python/pyspark/sql/context.py in toDF(self, schema, sampleRatio)
62 [Row(name=u'Alice', age=1)]
63 """
---> 64 return sqlContext.createDataFrame(self, schema, sampleRatio)
65
66 RDD.toDF = toDF
/databricks/spark/python/pyspark/sql/context.py in createDataFrame(self, data, schema, samplingRatio)
421
422 if isinstance(data, RDD):
--> 423 rdd, schema = self._createFromRDD(data, schema, samplingRatio)
424 else:
425 rdd, schema = self._createFromLocal(data, schema)
/databricks/spark/python/pyspark/sql/context.py in _createFromRDD(self, rdd, schema, samplingRatio)
Answer found:
rdd = raw_data_df.map(lambda row: row['value'].split(',')).map(lambda seq:LabeledPoint(float(seq[0]),seq[1:])).toDF()
Here, I need to specifically reference each line of text using row['value'], even though there is only one feature in the row.
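A slightly expanded, self-contained version of that fix (a sketch using the names from the question; on Spark 2.0+ you go through .rdd explicitly, since DataFrame.map is only available in old releases):

import os.path
from pyspark.mllib.regression import LabeledPoint

file_name = os.path.join('databricks-datasets', 'cs190', 'data-001', 'millionsong.txt')
raw_data_df = sqlContext.read.load(file_name, 'text')

# Each Row has a single 'value' field holding the whole comma-separated line,
# so pull that field out before splitting; the label must also be a float,
# and the class is LabeledPoint (no trailing 's').
points_df = (raw_data_df.rdd
             .map(lambda row: row['value'].split(','))
             .map(lambda seq: LabeledPoint(float(seq[0]), seq[1:]))
             .toDF())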

%R magic no longer works in IPython or Jupyter following update to Canopy version 1.7.4.3348

Previously, the %R and %%R magics were working in IPython and Jupyter Python notebooks.
The R terminal version is:
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
After upgrading to version 1.7.4.3348 of Enthought Canopy, the notebooks and IPython no longer work. I have tried reinstalling, following "Installing RKernel" and http://irkernel.github.io/installation/, which worked before. I run the command to load the R extension:
%load_ext rpy2.ipython
I get the following error message:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-691c6d73b073> in <module>()
----> 1 get_ipython().magic(u'load_ext rpy2.ipython')
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2161 magic_name, _, magic_arg_s = arg_s.partition(' ')
2162 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2163 return self.run_line_magic(magic_name, magic_arg_s)
2164
2165 #-------------------------------------------------------------------------
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2082 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2083 with self.builtin_trap:
-> 2084 result = fn(*args,**kwargs)
2085 return result
2086
<decorator-gen-64> in load_ext(self, module_str)
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/IPython/core/magics/extension.pyc in load_ext(self, module_str)
64 if not module_str:
65 raise UsageError('Missing module name.')
---> 66 res = self.shell.extension_manager.load_extension(module_str)
67
68 if res == 'already loaded':
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/IPython/core/extensions.pyc in load_extension(self, module_str)
82 if module_str not in sys.modules:
83 with prepended_to_syspath(self.ipython_extension_dir):
---> 84 __import__(module_str)
85 mod = sys.modules[module_str]
86 if self._call_load_ipython_extension(mod):
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/rpy2/ipython/__init__.py in <module>()
----> 1 from .rmagic import load_ipython_extension
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/rpy2/ipython/rmagic.py in <module>()
57 template_converter = ro.conversion.converter
58 try:
---> 59 from rpy2.robjects import pandas2ri as baseconversion
60 template_converter = template_converter + baseconversion.converter
61 except ImportError:
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/rpy2/robjects/pandas2ri.py in <module>()
7 INTSXP)
8
----> 9 from pandas.core.frame import DataFrame as PandasDataFrame
10 from pandas.core.series import Series as PandasSeries
11 from pandas.core.index import Index as PandasIndex
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/__init__.py in <module>()
20
21 # numpy compat
---> 22 from pandas.compat.numpy_compat import *
23
24 try:
/Users/Llewelyn_home/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/compat/numpy_compat.py in <module>()
13
14 # numpy versioning
---> 15 _np_version = np.version.short_version
16 _np_version_under1p8 = LooseVersion(_np_version) < '1.8'
17 _np_version_under1p9 = LooseVersion(_np_version) < '1.9'
AttributeError: 'module' object has no attribute 'version'
Could it be related to Canopy's version of numpy being listed as 1.10.4-1 while the np.version result is 1.11.1 (based on the error message)? Any suggestions gratefully received. PS: R still works in the console, in the terminal, and in Jupyter with an R kernel...
The support crew at Enthought examined the version of numpy, then pandas. Reinstalling both did not solve the problem. The issue was unexpectedly resolved by running pip install theano --upgrade in the Canopy terminal. I am logging this as an unexplained issue with %R, but with a strong indication that it is about dependency versions.
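As a diagnostic for similar cases, it can help to check what the notebook kernel actually imports, and from where (a small sketch; a broken or shadowed numpy install would be consistent with the AttributeError above):

import numpy as np
print(np.__version__)   # the version the kernel really sees
print(np.__file__)      # where it was imported from (watch for shadowed copies)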
