Error reading data into Spark using spraklyr::spark_read_csv - r

I'm running Spark in 'standalone' mode on a local machine in Docker containers. I have a master and two workers, each is running in its own Docker container. In each of the containers the path /opt/spark-data is mapped to the same local directory on the host.
I'm connecting to the Spark master from R using sparklyr, and I can do a few things, for example, loading data into Spark using sparklyr::copy_to.
However, I cannot get sparklyr::spark_read_csv to work. The data I'm trying to load is in the local directory that is mapped in the containers. When attaching to the running containers I can see that the file I'm trying to load does exist in each of the 3 containers, in the local (to the container) path /opt/spark-data.
This is an example for the code I'm using:
xx_csv <- spark_read_csv(
sc,
name = "xx1_csv",
path = "file:///opt/spark-data/data-csv"
)
data-csv is a directory containing a single CSV file. I've also tried specifying the full path, including the file name.
When I'm calling the above code, I'm getting an exception:
Error: org.apache.spark.sql.AnalysisException: Path does not exist: file:/opt/spark-data/data-csv;
I've also tried with different numbers of / in the path argument, but to no avail.
The documentation for spark_read_csv says that
path: The path to the file. Needs to be accessible from the
cluster. Supports the ‘"hdfs://"’, ‘"s3a://"’ and ‘"file://"’
protocols.
My naive expectation is that if, when attaching to the container, I can see the file in the container file system, it means that it is "accessible from the cluster", so I don't understand why I'm getting the error. All the directories and files in the path are owned by rood and have read permissions by all.
What am I missing?

try without "file://" and with \\ if your are Windows user.

Related

Grakn Error; trying to load schema for "phone calls" example

I am trying to run the example grakn migration "phone_calls" (using python and JSON files).
Before reaching there, I need to load the schema, but I am having trouble with getting the schema loaded, as shown here: https://dev.grakn.ai/docs/examples/phone-calls-schema
System:
-Mac OS 10.15
-grakn-core 1.8.3
-python 3.7.3
The grakn server is started. I checked and the 48555 TCP port is open, so I don't think there is any firewall issue. The schema file is in the same folder (phone_calls) as where the json data files is, for the next step. I am using a virtual environment. The error is below:
(project1_env) (base) tiffanytoor1#MacBook-Pro-2 onco % grakn server start
Storage is already running
Grakn Core Server is already running
(project1_env) (base) tiffanytoor1#MacBook-Pro-2 onco % grakn console --keyspace phone_calls --file phone_calls/schema.gql
Unable to create connection to Grakn instance at localhost:48555
Cause: io.grpc.StatusRuntimeException
UNKNOWN: Could not reach any contact point, make sure you've provided valid addresses (showing first 1, use getErrors() for more: Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=5f59fd46): com.datastax.oss.driver.api.core.connection.ConnectionInitException: [JanusGraph Session|control|connecting...] init query OPTIONS: error writing ). Please check server logs for the stack trace.
I would appreciate any help! Thanks!
Nevermind -- I found the solution, in case any one else runs into a similar problem. The server configuration file needs to be edited: point the data directory to your project data files (here: the phone_calls data files) & change the server IP address to your own.

What is the filepath that a "Read CSV" operator needs to read a file from RapidMiner Server?

I have a RM Server running on a VM (Ubuntu) on top of my Win10 machine.
I have a process to read a .csv file and write its contents on a MySQL database on a MySQL Server which also runs on the same VM.
The problem is that the read file operator does not seem to be able to find the file.
Scenario1.
When I try as location-name in the read csv operator ../data/myFile.csv
and run the process on Server I am getting Failed to execute initialization process: Error executing process /apps/myApp/process/task_read_csv_to_db: The file 'java.io.FileNotFoundException: /root/../data/myFile.csv (No such file or directory)' does not exist.
Scenario2.
When I try as location-name in the read csv operator /apps/myApp/data/myFile.csv
and run the process on Server I am getting Failed to execute initialization process: Error executing process /apps/myApp/process/task_read_csv_to_db: The file 'java.io.FileNotFoundException: /apps/myApp/data/myFile.csv (No such file or directory)' does not exist.
What is the right filepath that I should give to the Read CSV operator?
Just to update with the answer. After David's suggestion, I resulted in storing the .csv file outside of the /rapidminer-server-home/data/repository since every remote repository seems to be depicted with an integer instead of its original name, making the use of the actual full path of the file not usable.
I would say, the issue is that depending on the location of the JobAgent that is executing your process, the relative path might be varying.
Is /apps/myApp/data/myFile.csv the correct path to the file? If not, I would suggest to use the absolute path to the file. Hope this helps.
Best,
David

Load h5 keras model file in R

I'm building a R package for binary classification and I'm using opencpu to host it. Currently I've saved the h5 file as .RData file(serialized), which is then loaded in the environment using the .onLoad() function in R. This enables the R script to use the environment variable to load keras model using keras::unserialized_model().
I've tried directly using keras::load_model_hdf5() in the code, but after building and deploying on opencpu, when I try to hit the prediction API, I get error
ioerror: unable to open file (unable to open file: name = '/home/modelfile_26feb.h5', errno = 13, error message = 'permission denied', flags = 0, o_flags = 0)
I have changed permission for the file(777) and even the groups but still getting the error.
I even tried putting the file in inst/extdata folder so that it gets in the package but still same error.
Can anyone help on this, or suggest some alternative to load the h5 model directly?
Which OS does OpenCPU run on? Why does it try to write in /home/, this is very unusual? The best solution is to adapt your code to write in getwd() or tempdir(). Even better is to store data in a local database or redis server and let R read it from there, so you don't need disk access at all.
If you run on Ubuntu Server, reading from /home/ is not permitted by default. If you want to allow this, you need to add apparmor rules, see section 3.5 of the server manual.
Some relevant topics from the opencpu mailing list:
write in home dir: https://groups.google.com/d/msg/opencpu/5vRvgSKY-qE/4xMzZCGJBAAJ
keras in opencpu: https://groups.google.com/d/msg/opencpu/HhRzFVVFdaA/n5Nu1sxyFgAJ
write tmp folder: https://groups.google.com/d/msg/opencpu/Y1tYhaQUzwU/ubSEd_CDCgAJ

What is the working directory for a shiny app?

I have set up a Shiny server on Linux at this path
/srv/shiny-server/myApp
This app calls data, which are stored (though a symbolic link) in
/srv/shiny-server/myApp/data/input.txt
and opened with the relative path
"data/input.txt"
When calling this app :
my_ip_adress:3838/myApp
It breaks, not finding the data. I guess it is a path issue, but the idea of a R package is self-containment, so this relative path should be fine.
I have tried with the full path:
"/srv/shiny-server/myApp/data/input.txt"
but get
Warning: Error in fread: File '/srv/shiny-server/myApp/data/input.txt' does not exist. Include one or more spaces to consider the input a system command.
Am I missing something?

Installing an MSP using Powershell works on the local machine, fails remotely. Why?

I need some Powershell advice.
I need to install an application's MSP update file on multiple Win08r2 servers. If I run these commands locally, within the target machine's PS window, it does exactly what I want it to:
$command = 'msiexec.exe /p "c:\test\My Application Update 01.msp" REBOOTPROMPT=S /qb!'
invoke-wmimethod -path win32_process -name create -argumentlist $command
The file being executed is located on the target machine
If I remotely connect to the machine, and execute the two commands, it opens two x64 msiexec.exe process, and one msiexec.exe *32 process, and just sits there.
If I restart the server, it doesn't show that the update was installed, so I don't think it's a timing thing.
I've tried creating and remotely executing a PS1 file with the two lines, but that seems to do the same thing.
If anyone has advice on getting my MSP update installed remotely, I'd be all ears.
I think I've included all the information I have, but if something is missing, please ask questions, and I'll fill in any blanks.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
My process for this is:
Read a CSV for server name and Administrator password
Create a credential with the password
Create a new session using the machine name and credential
Create a temporary folder to hold my update MSP file
Call a PS1 file that downloads the update file to the target server
>>> Creates a new System.Net.WebClient object
>>> Uses that web client object to download from the source to the location on the target server
Call another PS1 file that applies the patch that was just downloaded –>> This is where I’m having issues.
>>> Set the variable shown above
>>> Execute the file specified in the variable
Close the session to the target server
Move to the next server in the CSV…
If I open a PS window and manually set the variable, then execute it (as shown above in the two lines of code), it works fine. If I create a PS1 file on the target server, containing the same two lines of code, then right click > ‘Run With PowerShell’ it works as expected / desired. If I remotely execute my code in PowerGUI, it returns a block of text that looks like this, then just sits there. RDP’d into the server, the installer never launches. My understanding of the “Return Value” value is that “0″ means the command was successful.
PSComputerName : xx.xx.xx.xx
RunspaceId : bf6f4a39-2338-4996-b75b-bjf5ef01ecaa
PSShowComputerName : True
__GENUS : 2
__CLASS : __PARAMETERS
__SUPERCLASS :
__DYNASTY : __PARAMETERS
__RELPATH :
__PROPERTY_COUNT : 2
__DERIVATION : {}
__SERVER :
__NAMESPACE :
__PATH :
ProcessId : 4808
ReturnValue : 0
I even added a line of code between the variable and the execution that creates a text file on the desktop, just to verify I was getting into my ‘executeFile’ file, and that text file does get created. It seems that it’s just not remotely executing my MSP.
Thank you in advance for your assistance!
Catt11.
Here's the strategy I used to embed an msp into a powershell script. It works perfectly for me.
$file = "z:\software\AcrobatUpdate.msp"
$silentArgs = "/passive"
$additionalInstallArgs = ""
Write-Debug "Running msiexec.exe /update $file $silentArgs"
$msiArgs = "/update `"$file`""
$msiArgs = "$msiArgs $silentArgs $additionalInstallArgs"
Start-Process -FilePath msiexec -ArgumentList $msiArgs -Wait
You probably don't need to use the variables if you don't want to, you could hardcode the values. I have this set up as a function to which I pass those arguments, but if this is more of a one-shot deal, it might be easier to hard-code the values.
Hope that helps!
using Start-Process for MSP package is not a good practice because some update package lockdown powershell libs and so you must use WMI call

Resources