Loading a .RData file into Data Science Experience (R)

I am trying to load a .RData file into my R Notebook in DSX. I have followed the instructions in this notebook (https://apsportal.ibm.com/exchange/public/entry/view/90a34943032a7fde0ced0530d976ca82) but am still unable to load my data. So far, I have been successful in the following steps:
1. I loaded my dataset into Object Storage.
2. I inserted my credentials using the Insert to code -> Insert Credentials button. This seemed to work as expected.
3. In the next cell, I chose the Insert to code -> Insert textConnection object option. This also seemed to work as expected.
The output of step 3 was as follows:
Your data file was loaded into a textConnection object and you can process the data with your package of choice.
data.1 <- getObjectStorageFileWithCredentials_xxxxxxxxxx("projectname", "file.RData")
After this, since my file is a .RData file, I typed the following command:
data <- load("file.RDA")
When I ran this cell, I got the following output:
Warning message in readChar(con, 5L, useBytes = TRUE):
“cannot open compressed file 'file.RDA', probable reason 'No such file or directory'”
Error in readChar(con, 5L, useBytes = TRUE): cannot open the connection
Traceback:
load("file.RDA")
readChar(con, 5L, useBytes = TRUE)
When I type in the following command to print the dataset:
data
I get the following output:
X.html..h1.Forbidden..h1..p.Access.was.denied.to.this.resource...p...html.
Please can someone help?
Thanks,
Venky

Here is a workaround. load can't read from the response object directly, because the only way to read objects from Object Storage is through the REST API.
I tried using rawConnection instead of textConnection, but that did not help either.
So, instead of passing the object read from Object Storage straight to load or readRDS, you can write it to the GPFS of the attached Spark service and then read it from there, just as you would read a local file.
Change these lines in the generated code:
# Return the response body as raw bytes instead of text:
rawdata <- content(httr::GET(url = access_url, add_headers("Content-Type" = "application/json", "X-Auth-Token" = x_subject_token)), as = "raw")
rawdata
Basically, instead of returning text, return a raw object, and then write it out as a binary file to local GPFS.
data.3 <- getObjectStorageFileWithCredentials_216c032f3f574763ae975c6a83a0d523("testObjectStorage", "sample.rdata")
writeBin(data.3,"sample.rdata")
Now read it back using readRDS or load.
load("sample.rdata")
To see the loaded data frame:
ls()
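For reference, here is a minimal end-to-end sketch of the workaround. The helper name fetch_object_as_raw is made up for illustration, and access_url and x_subject_token stand in for the values that the generated Insert to code cell defines; adapt it to your own generated function.
library(httr)
# Fetch the object from Object Storage as raw bytes rather than text,
# so binary formats such as .RData are not mangled.
fetch_object_as_raw <- function(access_url, x_subject_token) {
    resp <- GET(url = access_url,
                add_headers("Content-Type" = "application/json",
                            "X-Auth-Token" = x_subject_token))
    content(resp, as = "raw")
}
raw.bytes <- fetch_object_as_raw(access_url, x_subject_token)
writeBin(raw.bytes, "sample.rdata")   # write a binary copy to local GPFS
load("sample.rdata")                  # load() now behaves as with a local file
ls()                                  # list the restored objects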
I hope it helps.
Thanks,
Charles.

Related

Difficulty opening a package data file of unknown type

I am trying to load the state map from the maps package into an R object. I am hoping it is a SpatialPolygonsDataFrame or something I can turn into one after I have inspected it. However I am failing at the first step – getting it into an R object. I do not know the file type.
I first tried to assign the map() output to an R object directly:
st_m <- maps::map(database = "state")
This draws the map, but str(st_m) appears to do nothing, unless it is redrawing the same map.
Then I tried loading it as a dataset: st_m <- data("stateMapEnv", package="maps") but this just returns a string:
> str(stateMapEnv)
chr "R_MAP_DATA_DIR"
I opened the maps directory win-library/3.4/maps/mapdata/ and found what I think is the map file, “state.L”.
I tried reading it with scan and got an error message I do not understand:
scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
scan() expected 'a real', got '#'
I then opened the file with Notepad++. It appears to be a binary or compressed file.
So I thought it might be an R data file with an unusual extension. But my attempt to load it returned a “bad magic number” error:
st_m <- load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
bad restore file magic number (file may be corrupted) -- no data loaded
Observing that these responses have progressed from the unhelpful through the incomprehensible to the occult, I thought it best to seek assistance from the wizards of stackoverflow.
This should export the 'state' map (or any other maps dataset) as a data frame for you:
library(ggplot2)
state_dataset <- map_data("state")
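If you specifically want a spatial object rather than a plain data frame of coordinates, one option is to convert the maps object directly; this is a sketch assuming the sf package (and sp, for the last line) is installed:
library(maps)
library(sf)
# plot = FALSE returns the "map" object without drawing it; fill = TRUE is
# needed so the result describes polygons rather than polylines.
st_m <- maps::map("state", plot = FALSE, fill = TRUE)
states_sf <- sf::st_as_sf(st_m)         # one polygon feature per state
states_sp <- as(states_sf, "Spatial")   # SpatialPolygonsDataFrame, if you need sp classes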

send2cy doesn't work in RStudio ~ CyREST, Cytoscape

When I run the "send2cy" function in RStudio, I get an error.
# Basic setup
library(igraph)
library(RJSONIO)
library(httr)
dir <- "/currentdir/"
setwd(dir)
port.number = 1234
base.url = paste("http://localhost:", toString(port.number), "/v1", sep="")
print(base.url)
# Load list of edges as Data Frame
network.df <- read.table("./data/eco_EM+TCA.txt")
# Convert it into igraph object
network <- graph.data.frame(network.df,directed=T)
# Remove duplicate edges & loops
g.tca <- simplify(network, remove.multiple=T, remove.loops=T)
# Name it
g.tca$name = "Ecoli TCA Cycle"
# This function will be published as a part of utility package, but not ready yet.
source('./utility/cytoscape_util.R')
# Convert it into Cytoscape.js JSON
cygraph <- toCytoscape(g.tca)
send2cy(cygraph, 'default%20black', 'circular')
Error in file(con, "r") : cannot open the connection
Called from: file(con, "r")
But I don't get the error when I use the "send2cy" function from terminal R (running R from the terminal by just calling "R").
Any advice is welcome.
I tested your script with local copies of the network data and utility script, and with updated file paths. The script ran fine for me in RStudio.
Given the error message you are seeing ("Error in file...") I suspect the issue is with your local files and file paths... somehow in an RStudio-specific way?
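A couple of quick checks you could run in both the RStudio and terminal sessions to compare them (a sketch; the paths are the ones from your script):
getwd()                                     # is the working directory what you expect?
file.exists("./data/eco_EM+TCA.txt")        # can the network file be found from here?
file.exists("./utility/cytoscape_util.R")   # can the utility script be found from here?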
FYI: an updated, consolidated set of R scripts for Cytoscape is available here: https://github.com/cytoscape/cytoscape-automation/tree/master/for-scripters/R. I don't think anything has significantly changed, but perhaps trying in a new context will resolve the issue you are facing.

Error while trying to write a csv file

I'm trying to write a csv file in RStudio on macOS, but I'm getting this error:
Error: Failed to open '/Users/some.one/Documents/data'
and this is my code:
write_csv(dat, path_dat, na = "NA", append = FALSE, col_names = TRUE)
Can someone tell me what might cause such an error?
Edit:
dat is a data frame
path_dat is the path in the error: /Users/some.one/Documents/data
I would check that you have access to that directory; you can see which directory you are currently in via getwd().
If you don't have access, you can change the directory you're writing to via setwd(path_to_directory) and then run your write_csv call again.
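A sketch of the checks that usually pin this down (the file name dat.csv below is hypothetical; the point is that write_csv needs a full file path, and the directory it points into must exist and be writable):
library(readr)
dir.exists("/Users/some.one/Documents/data")                          # does the target directory exist?
unname(file.access("/Users/some.one/Documents/data", mode = 2)) == 0  # is it writable? (0 means yes)
# Point path_dat at a file inside the directory, not at the directory itself:
path_dat <- "/Users/some.one/Documents/data/dat.csv"
write_csv(dat, path_dat, na = "NA", append = FALSE, col_names = TRUE)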

Having saved a data frame to HDFS, I get an error when I try to unserialize it while reading it back in using rhdfs

I have written a data frame into HDFS using the rhdfs library, and when I try to read it back in I get errors.
The code to write the data frame is as follows:
df.file <- hdfs.file("/mydir/df.Rdata", "w")
hdfs.write(df, df.file)
hdfs.close(df.file)
And to read it back in I use:
df.file <- hdfs.file("/mydir/df.Rdata", "r")
m <- hdfs.read(df.file)
df <- unserialize(m)
hdfs.close(df.file)
But I get an error at the unserialize stage:
Error in unserialize(m) : read error
Does anyone have any idea what the cause of this error is and what I can do to prevent it? Any help would be much appreciated.
This happens when the object you unserialize is bigger than 65536 bytes.
If you look at the RStudio Environment pane, you will see that the object returned by hdfs.read is raw[1:65536], so you have missed part of the file.
You should read it in pieces, as in the code at this post:
http://chingchuan-chen.github.io/posts/2015/04/08/installations-of-rhdfs-rmr2-plyrmr-and-hbase
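A sketch of reading the file in pieces, assuming (as in the rhdfs documentation) that hdfs.read(con, n) reads up to n bytes and returns NULL at end of file:
library(rhdfs)
hdfs.init()
df.file <- hdfs.file("/mydir/df.Rdata", "r")
# Accumulate the raw bytes chunk by chunk instead of relying on a single
# 65536-byte read.
chunk.size <- 65536L
raw.all <- raw(0)
repeat {
  chunk <- hdfs.read(df.file, chunk.size)
  if (is.null(chunk) || length(chunk) == 0) break
  raw.all <- c(raw.all, chunk)
}
hdfs.close(df.file)
df <- unserialize(raw.all)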

Trouble providing data sets with package

I have two data sets full and raw that I placed in the data/ directory of my package. However, when I load my package, they are not available. I tried looking for them using the data function, but did not see them.
data(raw, package = "pkg")
Warning message:
In data(raw, package = "pkg") : data set 'raw' not found
Do I have to export them somehow?
I noticed that when I tried to open the file using load from another computer, it was read in as a string. Maybe I'm not writing the data frame properly? I used:
save(df.full, file = "full.RData")
save(df.raw, file = "raw.RData")
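For what it's worth, a sketch of the layout data() expects: data(raw, package = "pkg") looks for a file named raw.RData (or raw.rda) in the installed package's data/ directory, and by convention that file contains an object with the same name. The renaming below is an assumption about what you want data() to find, and the package has to be rebuilt and reinstalled afterwards.
# Save each data set under the name data() should find (run from the package root):
full <- df.full
raw <- df.raw
save(full, file = "data/full.RData")
save(raw, file = "data/raw.RData")
# After reinstalling the package:
data(raw, package = "pkg")
str(raw)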
