Error: unexpected symbol in Rscript - no further information provided about the line or syntax generating the error

I have read the many posts related to R syntax errors, but they all point to using the error message to figure out where the error occurs. My situation is different in that the error is generic. See below:
Error: unexpected symbol in "RScript correlation_presalesfinal3.R"
RStudio executes it fine.
It is an incredibly simple script, and I am wondering if it has to do with how I am constructing my Postgres syntax. Does R require line break symbols between the statements (select, from, group by etc)?
That is the only thing I can think of. I am trying to compare a separate R-generated correlation to one generated by PostgreSQL directly. This particular piece is the call to PostgreSQL to calculate the correlation directly.
I appreciate your help!
Here is the code:
#Written by Laura for Standard Imp
#Install if necessary (definitely on the first run)
install.packages("RColorBrewer")
install.packages("gplots")
install.packages("RSclient")
install.packages("RPostgreSQL")
#libraries in use
library(RColorBrewer)
library(gplots)
library(RSclient)
library(RPostgreSQL)
# Establish connection to PostgreSQL using RPostgreSQL
drv <- dbDriver("PostgreSQL")
# Full version of connection setting
con <- dbConnect(drv, dbname="db",host="ip",port=5432,user="user",password="pwd")
# -----------------------------^--------^-------------------^---- -------^
myLHSRHSFinalTable <- dbGetQuery(con, "
  select l1.a_lhsdescription as LHS,
         l2.a_rhsdescription as RHS,
         l7.a_scenariodescription as Scenario,
         corr(l3.driver_metric, l4.driver_metric) as Amount
  from schema_name.table_name l3
  join schema_name.table_name l4 on l3.Time_ID = l4.Time_ID
  join schema_name.opera_00004_dim_lhs l1 on l3.LHS_ID = l1.member_id
  join schema_name.opera_00004_dim_rhs l2 on l4.RHS_ID = l2.member_id
  join schema_name.opera_00004_dim_scenario l7 on l3.scenario_id = l7.member_id
  join schema_name.opera_00004_dim_time l8 on l3.time_id = l8.member_id
  where l7.a_scenariodescription = 'Actual'
  group by l1.a_lhsdescription, l2.a_rhsdescription, l7.a_scenariodescription")
myLHSRHSFinalTable
write.csv(myLHSRHSFinalTable, file = "data_load_stats_final.csv")
# Close PostgreSQL connection
dbDisconnect(con)
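To the syntax question above: R does not require line-break symbols between SQL clauses. A quoted string may span multiple lines, or be assembled with paste(); either way a single character value is handed to the database. A minimal sketch (the table and column names here are invented):

```r
# A quoted string in R may contain literal newlines; the embedded
# line breaks are passed through to PostgreSQL unchanged.
sql_multiline <- "select l1.descr as LHS,
       corr(l3.x, l4.x) as Amount
from schema_name.t l3
group by l1.descr"

# Equivalently, assemble the statement from pieces with paste():
sql_pasted <- paste(
  "select l1.descr as LHS,",
  "corr(l3.x, l4.x) as Amount",
  "from schema_name.t l3",
  "group by l1.descr"
)

# Both forms are character vectors of length one.
length(sql_multiline)  # 1
```

So the SQL formatting is not what trips up Rscript here; as the answer below notes, the error comes from typing the Rscript command inside the R shell.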

Your description of the problem may not have enough detail for people to answer, but I ran into this same error message because I was executing Rscript from within the R shell. Neither the R documentation nor the help is always clear about where commands are meant to be executed.
If you're working from the 'terminal' you use Rscript to run an R script, whereas if you're working from within the 'R shell' you use source() to run an R script.
As I'm still a newbie, I'm sure this answer is too much of an oversimplification, but it works to solve the basic error I was getting.
My script file called output.R can be executed from the terminal command-line prompt ("$") on my Linux system with the command:
$ Rscript output.R
Or alternatively from within R, by first starting the R shell, then executing the command at the R prompt (">"):
$ R
> source("output.R")

Related

Spark DataFrame does not persist when running all commands in a databricks notebook

In databricks I am using spark (in R) to import data to a spark DataFrame
sc <- spark_connect(method="databricks")
tbl_change_db(sc, "database_name")
some_var <- spark_read_table(sc,"table_name")
In a subsequent command I reference the DataFrame using sparklyr operators
refined_data_set <- sdf_read_column(some_var, 'column_name') %>% unique()
When running each command one by one everything works as expected. However, if I try to run the whole notebook, the refined_data_set command fails with:
error in evaluating the argument 'x' in selecting a method for function 'unique': org.apache.spark.sql.AnalysisException: Table or view not found: table_name
If I run the refined_data_set cell on its own immediately after the error, it works. If I put the DataFrame declaration and the refining step in one cell and run the whole notebook, it also works. Why does the DataFrame not persist across all cells in the notebook when they are run as one?
EDIT:
The problem seems to stem from the use of spark_read_table a second time in a command between these two. The command references a completely different database:
scp <- spark_connect(method="databricks")
tbl_change_db(scp, "other_database_name")
other_var <- spark_read_table(scp,"other_table_name")
Removing this command fixes the issue, but why does it cause the problem? How can we use two DataFrames in the same notebook?

How can I unserialize a model object using PL/R in Greenplum/Postgres?

Error unserializing model object in Greenplum via PL/R
I store model objects in a Greenplum database (the open source version). I've successfully been able to serialize my model objects, insert them into a table in Greenplum, and unserialize them when needed, using R version 3.5 installed on my local machine. This is the R code that runs successfully:
Code:
fromtable = 'modelObjDevelopment'
mod.id = '7919'
model_obj <- dbGetQuery(conn,
  sprintf("SELECT val from standard.%s where model_id::int = '%s';",
          fromtable, mod.id))
iter_model <- postgresqlUnescapeBytea(model_obj)
lm_obj_back <- unserialize(iter_model)
summary(lm_obj_back)
Recently, I installed PL/R on Greenplum with all the necessary libraries that I generally use. I am attempting to recreate on Greenplum the code I run in local R (mentioned above). After much research I have been running the following transformed code, which keeps failing with the same error.
Code:
DROP FUNCTION IF EXISTS mdl_load(val bytea);
CREATE FUNCTION mdl_load(val bytea)
RETURNS text AS
$$
require("RPostgreSQL")
iter_model<-postgresqlUnescapeBytea(val)
model<-unserialize(iter_model)
return(length(val))
$$
LANGUAGE 'plr';
select length(val::bytea) as len, mdl_load(val) as t
from modelObjDevelopment
where model_id::int = 7919;
At this point I don't care what I return, I just want the unserialize function to work.
Error:
[22000] ERROR: R interpreter expression evaluation error Detail: Error in unserialize(iter_model) : unknown input format Where: In PL/R function mdl_load
I hope someone has had a similar issue and has a clue for me. It seems that the bytea object changes size after being passed into PL/R. I am new to this method and hope someone can help.
$$
require(RPostgreSQL)
## load the PostgresSQL driver
drv <- dbDriver("PostgreSQL")
## connect to the default db
con <- dbConnect(drv, dbname = 'XXX')
rows <- dbGetQuery(con, "SELECT encode(val::bytea, 'escape') from standard.modelObjDevelopment where model_id::int = 1234")
iter_model<-postgresqlUnescapeBytea(rows[[model_obj_column]])
model<-unserialize(iter_model)
$$
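Independent of Greenplum, the round trip the poster expects can be checked in plain R: serialize() produces a raw vector, and unserialize() restores the object only if that byte stream arrives intact. A minimal sketch using a built-in dataset:

```r
# Fit a small model and serialize it to a raw vector, as one would
# before escaping it into a bytea column.
mod <- lm(dist ~ speed, data = cars)
raw_bytes <- serialize(mod, connection = NULL)

# Any corruption of the byte stream (truncation, double escaping)
# breaks unserialize() with "unknown input format"; an intact stream
# restores the object exactly.
mod_back <- unserialize(raw_bytes)
stopifnot(identical(coef(mod), coef(mod_back)))
```

If this round trip works locally but fails inside PL/R, the bytes are being altered on the way in, which points at the escaping/decoding step rather than at unserialize() itself.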
We solved this problem together. For future readers arriving here: fetching and unserializing the model object inside the R code is the way to go.

send2cy doesn't work in RStudio ~ CyREST, Cytoscape

When I run the "send2cy" function in RStudio, I get an error.
# Basic setup
library(igraph)
library(RJSONIO)
library(httr)
dir <- "/currentdir/"
setwd(dir)
port.number = 1234
base.url = paste("http://localhost:", toString(port.number), "/v1", sep="")
print(base.url)
# Load list of edges as Data Frame
network.df <- read.table("./data/eco_EM+TCA.txt")
# Convert it into igraph object
network <- graph.data.frame(network.df,directed=T)
# Remove duplicate edges & loops
g.tca <- simplify(network, remove.multiple=T, remove.loops=T)
# Name it
g.tca$name = "Ecoli TCA Cycle"
# This function will be published as a part of utility package, but not ready yet.
source('./utility/cytoscape_util.R')
# Convert it into Cytoscape.js JSON
cygraph <- toCytoscape(g.tca)
send2cy(cygraph, 'default%20black', 'circular')
Error in file(con, "r") : cannot open the connection
Called from: file(con, "r")
But I don't get the error when I use the "send2cy" function from terminal R (running R from the terminal by just calling "R").
Any advice is welcome.
I tested your script with local copies of the network data and utility script, and with updated file paths. The script ran fine for me in R Studio.
Given the error message you are seeing "Error in file..." I suspect the issue is with your local files and file paths... somehow in an R Studio-specific way?
FYI: an updated, consolidated set of R scripts for Cytoscape is available here: https://github.com/cytoscape/cytoscape-automation/tree/master/for-scripters/R. I don't think anything has significantly changed, but perhaps trying in a new context will resolve the issue you are facing.

Error when running (working) R script from command prompt

I am trying to run an R script from the Windows command prompt (the reason is that later on I would like to run the script by using VBA).
After having set up the R environment variable (see the end of the post), the following lines of code saved in R_code.R work perfectly:
library('xlsx')
x <- cbind(rnorm(10),rnorm(10))
write.xlsx(x, 'C:/Temp/output.xlsx')
(in order to run the script and get the resulting xlsx output, I simply type the following command in the Windows command prompt: Rscript C:\Temp\R_code.R).
Now, given that this "toy example" is working as expected, I tried to move to my main goal, which is indeed very similar (to run a simple R script from the command line), but for some reason I cannot succeed.
Again I have to use a specific R package (copula, used to sample some correlated random variables) and export the R output into a csv file.
The following script (R_code2.R) works perfectly in R:
library('copula')
par_1 <- list(mean=0, sd=1)
par_2 <- list(mean=0, sd=1)
myCop.norm <- ellipCopula(family='normal', dim=2, dispstr='un', param=c(0.2))
myMvd <- mvdc(myCop.norm,margins=c('norm','norm'),paramMargins=list(par_1,par_2))
x <- rMvdc(10, myMvd)
write.table(x, 'C:/Temp/R_output.csv', row.names=FALSE, col.names=FALSE, sep=',')
Unfortunately, when I try to run the same script from the command prompt as before (Rscript C:\Temp\R_code2.R) I get the following error:
Error in FUN(c("norm", "norm")[[1L]], ...) :
  could not find function "existsFunction"
Calls: mvdc -> mvdcCheckM -> mvd.has.marF -> vapply -> FUN
Do you have any idea how to fix the problem?
Any help is highly appreciated, S.
Setting up the R environment variable (Windows)
For those of you that want to replicate the code, in order to set up the environment variable you have to:
Right click on Computer -> Properties -> Advanced System Settings -> Environment variables
Double click on 'PATH' and add ; followed by the path to your Rscript.exe folder. In my case it is ;C:\Program Files\R\R-3.1.1\bin\x64.
This is a tricky one that has bitten me before. According to the documentation (?Rscript),
Rscript omits the methods package as it takes about 60% of the startup time.
So your better solution IMHO is to add library(methods) near the top of your script.
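A minimal sketch of that fix: attach methods explicitly so that symbols such as existsFunction resolve even under Rscript (which, at least in the R versions of that era, skipped the package at startup):

```r
# Rscript historically omitted the methods package at startup
# (see ?Rscript), so attach it explicitly near the top of the script.
library(methods)

# existsFunction() lives in methods; in a bare Rscript session of
# that era, calls reaching it (as copula's mvdc() does) would fail
# with 'could not find function "existsFunction"'.
stopifnot(existsFunction("mean"))
stopifnot(!existsFunction("no_such_function_xyz"))
```

With library(methods) added before library('copula'), the script should behave the same under Rscript as it does inside the R shell.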
For those interested, I solved the problem by simply typing the following in the command prompt:
R CMD BATCH C:\Temp\R_code2.R
It is still not clear to me why the previous command does not work. Anyway, once again searching into the R documentation (see here) proves to be an excellent choice!

How to read data from Cassandra with R?

I am using R 2.14.1 and Cassandra 1.2.11. I have a separate program which has written data to a single Cassandra table, and I am failing to read it back from R.
The Cassandra schema is defined like this:
create table chosen_samples (id bigint , temperature double, primary key(id))
I have first tried the RCassandra package (http://www.rforge.net/RCassandra/)
> # install.packages("RCassandra")
> library(RCassandra)
> rc <- RC.connect(host ="192.168.33.10", port = 9160L)
> RC.use(rc, "poc1_samples")
> cs <- RC.read.table(rc, c.family="chosen_samples")
The connection seems to succeed, but the parsing of the table into a data frame fails:
> cs
Error in data.frame(..dfd. = c("#\"ffffff", "#(<cc><cc><cc><cc><cc><cd>", :
duplicate row.names:
I have also tried using JDBC connector, as described here: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
> # install.packages("RJDBC")
> library(RJDBC)
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", "/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar", "`")
But this one fails like this:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
even though the location of the Java driver is correct:
$ ls /Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
You have to download apache-cassandra-2.0.10-bin.tar.gz, cassandra-jdbc-1.2.5.jar and cassandra-all-1.1.0.jar.
There is no need to install Cassandra on your local machine; just put the cassandra-jdbc-1.2.5.jar and cassandra-all-1.1.0.jar files in the lib directory of the unzipped apache-cassandra-2.0.10-bin.tar.gz. Then you can use
library(RJDBC)
drv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
            list.files("D:/apache-cassandra-2.0.10/lib",
                       pattern = "jar$", full.names = TRUE))
That works on my Unix machine but not on my Windows machine.
Hope that helps.
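The list.files() idiom in the answer above can be demonstrated on a throwaway directory with made-up jar names, so it runs without downloading anything:

```r
# Create a scratch directory holding dummy files that mimic a
# Cassandra lib folder (the jar names here are just placeholders).
lib_dir <- file.path(tempdir(), "cassandra-lib")
dir.create(lib_dir, showWarnings = FALSE)
file.create(file.path(lib_dir, c("cassandra-jdbc-1.2.5.jar",
                                 "cassandra-all-1.1.0.jar",
                                 "README.txt")))

# pattern = "jar$" keeps only the jar files; full.names = TRUE yields
# the absolute paths that RJDBC's JDBC() classpath argument expects.
jars <- list.files(lib_dir, pattern = "jar$", full.names = TRUE)
basename(jars)
```

Handing JDBC() every jar in the lib directory this way is what pulls in the driver's transitive dependencies, which is exactly what the "class not found" error in the question was missing.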
This question is old now, but since it's one of the top hits for R and Cassandra, I thought I'd leave a simple solution here, as I found frustratingly little up-to-date support for what I thought would be a fairly common task.
Sparklyr makes this pretty easy to do from scratch now, as it exposes a java context so the Spark-Cassandra-Connector can be used directly. I've wrapped up the bindings in this simple package, crassy, but it's not necessary to use.
I mostly made it to demystify the config around how to make sparklyr load the connector, and as the syntax for selecting a subset of columns is a little unwieldy (assuming no Scala knowledge).
Column selection and partition filtering are supported. These were the only features I thought were necessary for general Cassandra use cases, given CQL can't be submitted directly to the cluster.
I've not found a solution for submitting more general CQL queries that doesn't involve writing custom Scala; however, there's an example of how this can work here.
Right, I found an (admittedly ugly) way, simply by calling Python from R, parsing the NAs manually and re-assigning the data-frame names in R, like this:
# install.packages("rPython")
# (don't forget to "pip install cql")
library(rPython)
python.exec("import sys")
# adding libraries from virtualenv
python.exec("sys.path.append('/Users/svend/dev/pyVe/playground/lib/python2.7/site-packages/')")
python.exec("import cql")
python.exec("connection=cql.connect('192.168.33.10', cql_version='3.0.0')")
python.exec("cursor = connection.cursor()")
python.exec("cursor.execute('use poc1_samples')")
python.exec("cursor.execute('select * from chosen_samples' )")
# coding python None into NA (rPython seem to just return nothing )
python.exec("rep = lambda x : '__NA__' if x is None else x")
python.exec( "def getData(): return [rep(num) for line in cursor for num in line ]" )
data <- python.call("getData")
df <- as.data.frame(matrix(unlist(data), ncol=15, byrow=T))
names(df) <- c("temperature", "maxTemp", "minTemp",
"dewpoint", "elevation", "gust", "latitude", "longitude",
"maxwindspeed", "precipitation", "seelevelpressure", "visibility", "windspeed")
# and decoding NA's
parsena <- function (x) if (x=="__NA__") NA else x
df <- as.data.frame(lapply(df, parsena))
Does anybody have a better idea?
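The __NA__ placeholder trick above can be exercised in plain R, without Cassandra or Python; a vectorised variant of parsena:

```r
# Values as they come back from rPython, with Python's None already
# replaced by the "__NA__" sentinel on the Python side.
raw_vals <- c("12.5", "__NA__", "7.1")

# Vectorised decoder: turn the sentinel back into a real NA.
parsena <- function(x) ifelse(x == "__NA__", NA, x)
decoded <- parsena(raw_vals)

sum(is.na(decoded))  # 1
```

ifelse() handles whole columns at once, so the lapply() over data-frame columns in the answer still works, just faster than an element-wise function.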
I had the same error message when executing Rscript with RJDBC connection via batch file (R 3.2.4, Teradata driver).
Also, when run in RStudio it worked fine on the second run but not the first.
What helped was to explicitly call:
library(rJava)
.jinit()
It's not enough to just download the driver; you also have to download the dependencies and put them into your Java classpath (macOS: /Library/Java/Extensions), as stated on the project main page.
Include the Cassandra JDBC dependencies in your classpath : download dependencies
As for the RCassandra package, right now it's still too primitive compared to RJDBC.