How to run R Code within Python interface? - r

I am trying to run below code where I want to read csv file and then write "sas7bdat". I have tried below code.
we already have prerequisite library installed on the system for R.
from rpy2 import robjects
robjects.r('''
library(haven)
data <- read_csv("filename.csv")
write_sas(data, "filename.sas7bdat")
''')
After running above code, there are no output get generated by this code and even I am not getting any error.
Expected output: trying to read .csv file and then that data i want to export in .sas7bdat format. (In Standard python 3.9.2 Editor)
python do not have such functionality/library hence I am trying this way to export data in .sas7bdat format.
Plz Suggest some change in above code or any other way in python through which I can create/export .sas7bdat format in python.
Thanks.

I had experience using R in Python Jupyter Notebooks, it is a bit complicated at beginning, but it did work. Here I just pasted my personal notes, hope these help:
# Major steps in installing "rpy2":
# Step 1: install R on Jupyter Notebook: conda install -c r r-essentials
# Step 2: install the "rpy2" Python package: pip install rpy2 (you may have to check the version)
# Step 3: create the environment variables: R_HOME, R_USER and R_LIBS_USER
# you can modify these environment variables in the system settings on your windows PC or use codes to set them every time)
# load the rpy2 module after installation
# Then you will be able to enable R cells within the Python Jupyter Notebook
# run this line in your Jupyter Notebook
%load_ext rpy2.ipython
My work was to do ggplot2 in Python, so I did:
# now use R to access this dataframe and plot it using ggplot2
# tell Jupyter Notebook that you are going to use R in this cell, and for the "test_data" generated using the Python
%%R -i test_data
library(ggplot2)
plot <- ggplot(test_data) +
geom_point(aes(x,y),size = 20)
plot
ggsave('test.png')

Please before you run the code make sure that haven and reader are installed in your R kernel.
from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage
string = """
write_sas <- function(file, col_names = TRUE, write_to){
data <- readr::read_csv(file, col_names = col_names)
haven::write_sas(data, path = write_to)
print(paste("Data is written to ", write_to))
}
"""
rwrap = SignatureTranslatedAnonymousPackage(string, "rwrap")
rwrap.write_sas( file = "https://robjhyndman.com/data/ausretail.csv",
col_names = False,
write_to = "~/Downloads/filename.sas7bdat")
You can use any of the R function arguments. same as I used col_names

Related

How to export sf object to GDB using RPyGeo in R (Windows)?

I have a bunch of sf objects I'd like to export to GDB from R. I'm running R 4.0.2 on Windows 10. In this case the sf objects are all vector point data. The main reasons to export to GDB are to keep longer field names (the shapefile truncation is very annoying), and because GDBs are more desirable storage locations for our workflows.
Yes, I know about the ArcGisBinding package. I've got it to work in a test script but it's pretty unstable - often crashing and requiring a restart of R. This is a problem, because the sf objects I'd like to export come after an already long Rmd that reads in, formats and cleans the data. So it's not a simple manner of re-running the script until arc.write doesn't break. I could break up the script, but then I'd still have to read in a bunch of shapefiles. One option I haven't yet explored is using reticulate to call a python script instead of trying to do everything in R, but we're trying to do our analysis all in one place, if possible.
I'm pretty sure I've managed to set up RPyGeo appropriately, first setting my python path using the reticulate package. I'm doing it this way because IT restrictions means I can't edit PATH variables on my machine.
#package calls
library(sf)
library(spData)
library(reticulate)
#set python version in reticulate
py_path <- "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
reticulate::use_python(python = py_path, required = TRUE)
#call RPyGeo
library(RPyGeo) # for potential point export
#output gdb
out.gdb <- "C:/LOCAL_PROJECTS/Output/Output.gdb"
#RPyGeo Parameters
# Note that, in order to use RPyGeo you need a working ArcMap or ArcGIS Pro installation on your computer.
# python path - note that this will change depending on which version of Arc one is using
# py_path <- "C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe"
arcpy <- rpygeo_build_env(workspace = out.gdb,
overwrite = TRUE,
extensions = c("Spatial","DataInteroperability"),
path = py_path)
I've tried a bunch of different tools to export an sf object, here using dummy data also used in the RPyGeo vignette
data(nz, package = "spData")
arcpy$Copy_management(in_data = nz,out_data = "nz_test")
arcpy$Copy_management(in_data = nz,out_data = file.path(out.gdb,"nz"))
arcpy$FeatureClassToGeodatabase_conversion(Input_Features = nz,Output_Geodatabase = out.gdb)
arcpy$FeatureClassToFeatureClass_conversion(in_features = nz,out_path = out.gdb,out_name = "nz")
arcpy$QuickExport_interop(Input = nz,Output = file.path(out.gdb,"nz"))
arcpy$CopyFeatures_management(in_features = nz,out_feature_class = file.path(out.gdb,"nz"))
arcpy$CopyFeatures_management(in_features = nz,out_feature_class = "nz")
Each time I get an error, for example:
Error in py_call_impl(callable, dots$args, dots$keywords) :
RuntimeError: Object: Error in executing tool
Detailed traceback:
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py", line 3232, in CopyFeatures
raise e
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py", line 3229, in CopyFeatures
retval = convertArcObjectToPythonObject(gp.CopyFeatures_management(*gp_fixargs((in_features, out_feature_class, config_keyword, spatial_grid_1, spatial_grid_2, spatial_grid_3), True)))
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\geoprocessing\_base.py", line 511, in <lambda>
return lambda *args: val(*gp_fixargs(args, True))
I'm not an expert in ArcPy by any means. Nor am I an expert in tracing errors inside packages. Am I making a simple syntax mistake? Is there something else that I'm missing? Any help would be much appreciated!

How do I read an .tar.xz file?

I downloaded the Gwern Branwen dataset here: https://www.gwern.net/DNM-archives
I'm trying to read the dataset in R and I'm having a lot of trouble. I tried to open one of the files in the dataset called "1776.tar.xz" and I think I "unzipped" it with untar() but I'm not getting anything past that.
untar("C:/User/user/Downloads/dnmarchives/1776.tar.xz",
files = NULL,
list = FALSE, exdir = ".",
compressed = "xz", extras = NULL, verbose = FALSE, restore_times = TRUE,
tar = Sys.getenv("TAR"))
Edit: Thanks for all of the comments so far! The code is in base R. I have multiple datasets that I downloaded from Gwern's website. I'm just trying to open one to explore.
Base R includes function untar. On my Ubuntu 19.10 running R 3.6.2, default installation, the following was enough.
fls <- list.files(pattern = "\\.xz")
untar(fls[1], verbose = TRUE)
Note.
In the question, "dataset" is singular but there were several datasets (plural) on that website. To download the files I used
args <- "--verbose rsync://78.46.86.149:873/dnmarchives/grams.tar.xz rsync://78.46.86.149:873/dnmarchives/grams-20150714-20160417.tar.xz ./"
cmd <- "rsync"
od <- getwd()
setwd('~/tmp')
system2(cmd, args)
Thanks everyone! Not sure what was wrong with r for a bit but I reinstalled. I ended up unzipping manually and loading up the files.
I find that base R's untar() is a bit unreliable and/or slow on Windows.
What worked very well for me (on all platforms) was
library(archive)
archive_extract("C:/User/user/Downloads/dnmarchives/1776.tar.xz",
dir="C:/User/user/Downloads/dnmarchives")
It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
And one can also use it directly read in a csv file within an archive without having to UNZIP it first using
read_csv(archive_read("C:/User/user/Downloads/dnmarchives/1776.tar.xz", file = 1), col_types = cols())
On Debian or Ubuntu, first install the package xz-utils
$ sudo apt-get install xz-utils
Extract a .tar.xz the same way you would extract any tar.__ file.
$ tar -xf file.tar.xz
Done.

Writing a partitioned parquet file with SparkR

I've got two scripts, one in R and a short second one in pyspark that uses the output. I'm trying to copy that functionality into the first script for simplicity.
The second script is very simple -- read a bunch of csv files and emit them as partitioned parquet:
spark.read.csv(path_to_csv, header = True) \
.repartition(partition_column).write \
.partitionBy(partition_column).mode('overwrite') \
.parquet(path_to_parquet)
This should be equally simple in R but I can't figure out how to match the partitionBy functionality in SparkR. I've got this so far:
library(SparkR); library(magrittr)
read.df(path_to_csv, 'csv', header = TRUE) %>%
repartition(col = .$partition_column) %>%
write.df(path_to_parquet, 'parquet', mode = 'overwrite')
This successfully writes one parquet file for each value of partition_column. The issue is the emitted files have the wrong directory structure; whereas Python produces something like
/path/to/parquet/
partition_column=key1/
file.parquet.gz
partition_column=key2/
file.parquet.gz
...
R produces only
/path/to/parquet/
file_for_key1.parquet.gz
file_for_key2.parquet.gz
...
Am I missing something? the partitionBy function in SparkR appears only to refer to the context of window functions and I don't see anything else in the manual that could be related. Perhaps there's a way to pass something in ... but I don't see any examples in the documentation or from a search online.
Partitioning of the output is not supported in Spark <= 2.x.
However, it will be supported in SparR >= 3.0.0 (SPARK-21291 - R partitionBy API), with the following syntax:
write.df(
df, path_to_csv, "parquet", mode = "overwrite",
partitionBy = "partition_column"
)
Since corresponding PR modifies only R files, you should be able to patch any SparkR 2.x distribution, if upgrading to development version is not an option:
git clone https://github.com/apache/spark.git
git checkout v2.4.3 # Or whatever branch you use
# https://github.com/apache/spark/commit/cb77a6689137916e64bc5692b0c942e86ca1a0ea
git cherry-pick cb77a6689137916e64bc5692b0c942e86ca1a0ea
R -e "devtools::install('R/pkg')"
In the client mode this should be required only on the driver node.
but these are not fatal, and shouldn't cause any serious issues.

No graph output when using googleVIS in jupyter

Using this works in R console:
plot(G)
but when typed into a cell in jupyter I get:
starting httpd help server ... done
and no graph.
So here is what I did.
Into Anaconda 2.7.11, I installed R essentials
conda install -c r r-essentials
started up jupyter
notebook jupyter
installed needed reqs, XML and googleVIS, by typing this into a cell
options(repos=structure(c(CRAN="https://cloud.r-project.org/")))
install.packages('googleVis')
install.packages('XML')
typed this code into a cell
suppressPackageStartupMessages(library(googleVis))
library(googleVis)
library(XML)
url <- "http://en.wikipedia.org/wiki/List_of_countries_by_credit_rating"
x <- readHTMLTable(readLines(url), which=3)
levels(x$Rating) <- substring(levels(x$Rating), 4,
nchar(levels(x$Rating)))
x$Ranking <- x$Rating
levels(x$Ranking) <- nlevels(x$Rating):1
x$Ranking <- as.character(x$Ranking)
x$Rating <- paste(x$Country, x$Rating, sep=": ")
G <- gvisGeoChart(x, "Country", "Ranking", hovervar="Rating",
options=list(gvis.editor="S&P",
projection="kavrayskiy-vii",
colorAxis="{colors:['#91BFDB', '#FC8D59']}"))
then
plot(G)
This code works fine when typed directly into an R console and makes a nice map. But something is causing jupyter to choke on starting a server. I guess since jupyter itself is web page running in a server there is some sort of problem with a web page starting a server?
I have the same issue. According to this demonstration, all you need in an R console is to additionally add this line after loading the library:
library(googleVis)
op <- options(gvis.plot.tag='chart')
The introduction of tag can also be found here in googleVis's documentation. After trying it in Jupyter all I got is a block of .json data and it cannot compile into an interactive graph.
Hope this help in anyway. I will probably try something like plotly, which does work on jupyter.

How to get R htmlWidgets to work on Windows 10 machine?

I am trying to work my way through an R htmlWidgets tutorial and I am occuring what seems to be a bug related to Windows 10.
The code below works on my Windows 7 machine but not my Windows 10 machine:
# libpath
.libPaths("C:/R/R-3.2.4revised/library")
library(htmlwidgets)
library(devtools)
# need to be something in the package
placeholder <- function(x, y) x+y
# generate package
package.skeleton(name = "mywidget", list = c("placeholder"),
environment = .GlobalEnv,
path = ".", force = FALSE,
code_files = character())
# package dir
path <- "C:/Users/kaspe/Desktop/R/practise/htmlWidgets/mywidget"
#devtools::create("mywidget") # create package using devtools
setwd(path) # navigate to package dir
htmlwidgets::scaffoldWidget("mywidget") # create widget scaffolding
devtools::install()
When I am running the command:
> htmlwidgets::scaffoldWidget("mywidget") # create widget scaffolding
It produces the following error:
Created boilerplate for widget constructor R/mywidget.R
Error in editor(file = file, title = title) :
argument "name" is missing, with no default
Same base R and R-studio on both machines.
Does anyone have a clue about what might be wrong here?
Best
Kasper
What about htmlwidgets::scaffoldWidget("mywidget", edit = FALSE)? I don't know anything about windows, but perhaps some analogue of the $EDITOR system variable hasn't been properly set.

Resources