I know there is a Python package that imports RData files, but I was wondering if that is the best option for me.
I have data frames in R that I want to use in Python.
Should I save them as JSON or CSV and then read them with pandas in Python, or should I just save them as RData and use the rpy2 package?
All I need is to turn these R data frames into Python data frames, so I can manipulate them and combine them with other results I calculated in Python.
You can use feather.
It's a data format for data frames (created by Wes McKinney and Hadley Wickham) that makes sharing data between R and Python (and some other languages) easy.
In R:
library(feather)
file_path <- "foo.feather"
write_feather(data_frame, file_path)   # write an R data frame to disk
data_frame <- read_feather(file_path)  # read one back
In Python:
import feather
file_path = 'foo.feather'
data_frame = feather.read_dataframe(file_path)  # read the frame written from R
feather.write_dataframe(data_frame, file_path)  # or write one for R to read
PS: there is a podcast on feather where the authors discuss its applications, pros/cons, and future plans.
Let's say you've just completed writing a series of custom functions in an RMarkdown book to analyze your dataset, covering reading, tidying, analysis, visualization, and export. You now want to deploy these functions sequentially on a folder full of CSV datasets. The functions can't be used standalone, because each requires variables/objects that are outputs of the previous function; essentially, they need to be run in a linear order.
Which of the two methods below is the more efficient way of combining these functions?
I imagine there are two approaches.
Should you create individual R script files for each function and source them into another R script, running each function as a standalone line of code, one after another? E.g.,
x <- read_csv(data_sets)
clean_output <- func1(x)
results_output <- func2(clean_output)
table_plots_output <- func3(results_output)
export_csv <- func4(table_plots_output)
OR
Should you write a sort of master function that wraps all the functions you've created previously, running all your processes (cleaning, analysis, visualization, and export of results) in a single line of code?
master_funct <- function(x) {
  clean_output <- func1(x)
  results_output <- func2(clean_output)
  table_plots_output <- func3(results_output)
  func4(table_plots_output)
}

x <- read_csv(data_sets)
export_csv <- master_funct(x)
I try to follow Tidyverse approaches, so if there is a Tidyverse approach to this task, that would be great.
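For what it's worth, here is a Tidyverse-flavoured sketch of the second approach, chaining the steps with pipes and mapping the combined function over the folder of files. This is only a sketch: func1() through func4() are the functions from the question, and the "data_sets" folder path is a placeholder.

library(tidyverse)

# Hypothetical wrapper: read one CSV, then run the whole pipeline on it
master_funct <- function(path) {
  read_csv(path) %>%
    func1() %>%
    func2() %>%
    func3() %>%
    func4()
}

# Apply the wrapper to every CSV in the folder, one file at a time
list.files("data_sets", pattern = "\\.csv$", full.names = TRUE) %>%
  purrr::walk(master_funct)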
As I'm dealing with a huge dataset, I had to split my data into different buckets, so I want to save some interim results in a CSV to recall later. However, my data file contains some columns with lists, which according to R cannot be exported (see snapshot). Do you know a simple way for an R newbie to make this work?
Thank you so much!
I guess the best way to solve your problem is switching to a more appropriate file format. I recommend using write_rds() from the readr package, which creates .rds files and preserves list-columns. A file created with readr::write_rds(your_data, 'your_file_path') can be read back in with readr::read_rds('your_file_path').
The base R equivalents are saveRDS() and readRDS(); the readr functions mentioned above are just wrappers around them with some convenience features.
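A minimal sketch, assuming your data frame is called df and "interim.rds" is a placeholder path:

library(readr)

write_rds(df, "interim.rds")         # list-columns are preserved
df_later <- read_rds("interim.rds")  # restore the interim result later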
Just right-click in the folder where you want to save your work and create a new CSV there, setting its separator to a comma. Input all the data in column form; you can later turn it into a matrix in your R program.
I have 18 years of TRMM daily rainfall data (6,573 .nc4 files). I need to combine all those .nc4 files into one and organize them as time series data for any specific location. How can I do this?
I have tried nccopy and cdo; the free NetCDF Extractor does not support these operations. I am a new researcher.
I would suggest doing this with Python and xarray. It is very easy to build up a script that does this:
import xarray
from datetime import datetime

# File names here are illustrative; substitute your actual .nc4 names
time_delta = datetime(2018, 1, 1) - datetime(2000, 1, 1)
list_of_file_names = [f"{day}.nc" for day in range(time_delta.days)]

# Merge all files into one dataset, then take the series at one point
# (variable/coordinate names are assumptions; check all_data.data_vars)
all_data = xarray.open_mfdataset(list_of_file_names)
series = all_data["precipitation"].sel(lat=23.8, lon=90.4, method="nearest")
An alternative would be using cdo; you can find an example here.
I have the following link:
[1] https://drive.google.com/open?id=0ByCmoyvCype7ODBMQjFTSlNtTzQ
This is a PDF file; the author of a paper provided the list of mutations in this format.
I need to annotate the mutations in this file, so I need a TXT, TSV, or VCF file that can be read by ANNOVAR.
Can you help me convert this using R or other software on Ubuntu?
In principle this is a job for tabulizer, but I couldn't get it to work in this instance; I suspect the single table spanning so many pages confused it.
You can read it into R as text with the pdftools package easily enough:
library(pdftools)
txt <- pdf_text("selection.pdf")
Now txt is a character vector, with each element holding the text of one page of the original document. You might be able to do something fancy with regular expressions to convert this into more meaningful data.
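For example, here is a rough sketch of that regex approach; treating each text line as a table row and splitting columns on runs of spaces are assumptions about this PDF's layout:

library(pdftools)
library(stringr)

txt <- pdf_text("selection.pdf")

# Split pages into lines, drop empty ones, then split each line into
# fields wherever two or more spaces appear (a guess at the separator)
lines <- str_trim(unlist(str_split(txt, "\n")))
lines <- lines[lines != ""]
rows  <- str_split(lines, "\\s{2,}")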
However, it makes more sense to ask the original author for their data in an appropriate format. Publishing a 561-page PDF of tabular data is just nuts.
I have an R dataset (an .Rdata file) that I need to convert to either SAS (.sas7bdat or .xpt) or SPSS (.sav or .por). How can I import this dataset into SAS or SPSS?
If you want to use this in SPSS, consider using the STATS_GETR extension command. It can read R workspace or data files and map appropriate elements directly to an SPSS dataset. This extension command is available from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) or, for Statistics 22, it can be installed via the Utilities menu.
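If you would rather do the conversion from the R side, the haven package can write both formats. A minimal sketch, assuming your .Rdata file contains a data frame named df (the "mydata" file names are placeholders):

library(haven)

load("mydata.Rdata")         # loads the data frame df into the session
write_sav(df, "mydata.sav")  # SPSS
write_xpt(df, "mydata.xpt")  # SAS transport file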