How to use a file modified by an R chunk in a Python one

I am working in R Markdown, primarily with R chunks, which I used to modify data frames. Now that they are ready, a colleague gave me Python code to process some of the data. But when switching from an R chunk to a Python one, the environment changes, and I do not know how to use the objects created in R directly:
reticulate::repl_python()
biodata_file = women_personal_data
NameError: name 'women_personal_data' is not defined
Ideally, I would rather not save the files to my computer to pass them from R to Python and then back to R (I figured that could be a solution), to avoid accumulating files that are not completely clean yet.
I tried this solution, but it does not seem to work with data frames.
Thanks !

biodata_file = r.women_personal_data
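The r. prefix tells Python to fetch the object from the R session: in a Python chunk, R objects are available as attributes of r, so the data frame is r.women_personal_data.
TIP: to go back to R, objects created in Python are available as py$<name>, so the variable above becomes py$biodata_file.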

Related

How to keep style format unchanged after writing data using openxlsx in R

I am using openxlsx to write my output data.
I used the following code to read my data with readxl:
df1=read_excel("C:/my_data.xlsx",skip=2);
Now I want to write the output while keeping the original Excel file, using any package that can do it. I used the following code, but it does not keep the original Excel file. Can this be done with an R package?
write.xlsx(df1, 'C:/mydata.xlsx',skip=2)
Given your code, you should have two different data files in C:/:
"my_data.xlsx" (the one that you loaded) and "mydata.xlsx" (the one that you created in R). R shouldn't overwrite your files if you give them different names.
If there's only one file, are you sure you didn't use the same name for both files? If so, then everything should work fine if you give the files different names (e.g. "my_file1.xlsx" and "my_file2.xlsx")!
Also, in general, it's a good idea to give data files informative names so that you don't accidentally delete or overwrite files that you need. For example, if the original Excel data is your raw data, consider naming it "data_raw.xlsx", make sure that you only ever read it, and whenever you make changes, save the result under a different name (e.g. "data_processed1.xlsx").
You can also save data in R's native .rds format using the saveRDS() function (or readr::write_rds()). This is especially helpful if you want to keep special attributes of variables, such as factor levels.
Hope this helps!

"filename.rdata" file Exploring and Converting to CSV

I'm not an R programmer (I only started learning it because of this problem); I use Python. For a forecasting task, I got a dataset, signalList.rdata, of a phenomenon called partial discharge.
I tried some commands to load, open, and view it, and barely got a glimpse:
my_data <- get(load('C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/signalList.Rdata'))
But since I lack deep knowledge of R, I wanted to convert it into a CSV file, or any format I can work with in Python,
or explore it and copy-paste the data manually.
So I'm asking for any solution, whether in R, Python, or any other tool, to get at what's in the .rdata file.
Have you managed to load the data successfully into your working environment?
If so, write.csv is the function you are looking for.
If not,
setwd("C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/")
# load() returns the names of the restored objects, so get() fetches the object itself
signalList <- get(load("signalList.Rdata"))
write.csv(signalList, "signalList.csv")
should do the trick.
If you would like to remove signalList from your workspace,
rm(signalList)
will accomplish this.
Note: changing your working directory isn't necessary; I just feel it makes the code easier to read in an answer. You can also give write.csv a different path as its second argument to save your CSV elsewhere.
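Since the asker works in Python, here is a minimal sketch of reading the CSV that write.csv() produces, using only Python's standard csv module. It assumes write.csv()'s default settings, where the first, unnamed column holds R's row names and missing values are written as the text NA. The function name read_r_csv and the key row_name are hypothetical choices for illustration.

```python
import csv
from typing import Dict, List


def read_r_csv(path: str) -> List[Dict[str, str]]:
    """Read a CSV written by R's write.csv() into a list of row dicts.

    With default settings, write.csv() stores row names in an unnamed first
    column (header cell ""), so that column is renamed to "row_name" here.
    All values come back as strings; R's missing values appear as "NA".
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # an empty file yields no header and no rows
        if header and header[0] == "":
            header[0] = "row_name"  # hypothetical name for R's row-name column
        return [dict(zip(header, row)) for row in reader]


# Example, assuming signalList.csv from the R answer is in the current directory:
# rows = read_r_csv("signalList.csv")
```

Converting the strings to numbers (and "NA" to None) is left to the caller, because which columns are numeric depends on the data.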

Command to share an R data frame in an easy way

I have a dataframe loaded successfully in R.
I would like to give the data in df to someone else so they can use it quickly and easily, without having to load the file into a data frame again.
Which command gives the whole data of df (not just its str())?
You can save the data to an .RData file using save() or save.image(), depending on your needs. The first saves specific objects, while the latter dumps the whole workspace to a file. This method has the advantage of working with almost any R object.
Another option, as @user1945827 mentioned, is dput(), which produces a text representation that can be parsed in another R session. This will not work for complex objects (such as S4 objects).

Using R to write a .mat file not giving the right output?

I had a .csv file that I wanted to read into Octave (I originally tried csvread). It was taking too long, so I tried to use R as a workaround, following: How to read large matrix from a csv efficiently in Octave
This is what I did in R:
forest_test=read.csv('forest_test.csv')
library(R.matlab)
writeMat("forest_test.mat", forest_test_data=forest_test)
and then I went back to Octave and did this:
forest_test = load('forest_test.mat')
This is not giving me a matrix, but a struct. What am I doing wrong?
To answer your exact question, you are using the load function wrong. You must not assign its output to a variable if you just want the variables in the file to be inserted into the workspace. From Octave's load help text:
If invoked with a single output argument, Octave returns data
instead of inserting variables in the symbol table. If the data
file contains only numbers (TAB- or space-delimited columns), a
matrix of values is returned. Otherwise, 'load' returns a
structure with members corresponding to the names of the variables
in the file.
With examples, following our case:
## inserts all variables in the file in the workspace
load ("forest_test.mat");
## each variable in the file becomes a field in the forest_test struct
forest_test = load ("forest_test.mat");
That said, the question you linked about Octave being slow with CSV files refers to Octave 3.2.4, which is quite an old version. Have you confirmed this is still the case in a recent version (the latest release was 3.8.2)?
There is a function designed to convert data frames to matrices:
?data.matrix
forest_test=data.matrix( read.csv('forest_test.csv') )
library(R.matlab)
writeMat("forest_test.mat", forest_test_data=forest_test)

Open large files with R

I want to process, in R, a file (1.9 GB) that contains 100,000,000 datasets.
Actually, I only need every 1000th dataset.
Each dataset contains 3 columns, separated by tabs.
I tried data <- read.delim("file.txt"), but R was not able to handle all the datasets at once.
Can I tell R directly to load only every 1000th dataset from the file?
After reading the file, I want to bin the data in column 2.
Is it possible to bin the numbers in column 2 directly?
Is it possible to read the file line by line, without loading the whole file into memory?
Thanks for your help.
Sven
You should pre-process the file using another tool before reading into R.
To write every 1000th line to a new file, you can use GNU sed (the first~step address used here is a GNU extension), like this:
sed -n '0~1000p' infile > outfile
Then read the new file into R:
datasets <- read.table("outfile", sep = "\t", header = FALSE)
You may want to look at the manual devoted to R Data Import/Export.
Naive approaches always load all the data. You don't want that. You may want another script that reads the file line by line (written in awk, perl, python, C, ...) and emits only every N-th line. You can then read that program's output directly in R via a pipe; see the help page ?connections (in particular pipe()).
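As a minimal sketch of such a script in Python, the code below reads the file one line at a time and keeps lines N, 2N, 3N, ..., the same lines the sed command above selects. The script name every_nth.py and the function names are hypothetical; the script only runs its main step when given exactly two arguments (N and the input path).

```python
import sys
from typing import Iterable, Iterator, TextIO


def every_nth_line(lines: Iterable[str], n: int = 1000) -> Iterator[str]:
    """Yield lines n, 2n, 3n, ... (1-based), like GNU `sed -n '0~Np'`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    for lineno, line in enumerate(lines, start=1):
        if lineno % n == 0:
            yield line


def filter_file(path: str, n: int, out: TextIO) -> None:
    # Iterating over a file object reads one line at a time,
    # so the whole file is never held in memory.
    with open(path) as f:
        for line in every_nth_line(f, n):
            out.write(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python every_nth.py 1000 file.txt > outfile
    filter_file(sys.argv[2], int(sys.argv[1]), sys.stdout)
```

Assuming python is on the PATH, the output could be read straight into R without an intermediate file, e.g. read.table(pipe("python every_nth.py 1000 file.txt"), sep = "\t").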
In general, very large memory setups require some understanding of R. Be patient, you will get this to work but once again, a naive approach requires lots of RAM and a 64-bit operating system.
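The question also asks whether column 2 can be binned without loading the whole file. A minimal Python sketch, assuming tab-separated lines, a numeric second column, no header line, and fixed-width bins, keeps only one count per bin while streaming through the lines. The function name bin_column is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def bin_column(lines: Iterable[str], width: float, column: int = 1,
               sep: str = "\t") -> Counter:
    """Count the values of one column into fixed-width bins, line by line.

    column is 0-based, so the default column=1 is the file's second column.
    Bin k covers the half-open interval [k * width, (k + 1) * width).
    Memory grows with the number of distinct bins, not the number of lines.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= column or not fields[column]:
            continue  # skip blank or short lines
        counts[math.floor(float(fields[column]) / width)] += 1
    return counts


# Example (hypothetical bin width of 0.5):
# with open("file.txt") as f:
#     counts = bin_column(f, width=0.5)
# sorted(counts.items()) then lists (bin index, count) pairs.
```

Passing an open file object works because iterating over it reads one line at a time; the resulting counts are small enough to hand back to R afterwards.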
Maybe the colbycol package could be useful to you.
