How can I run inputs from R through a MATLAB function?

I have some MATLAB code that I would like to run with 3 inputs that I have in R (delta_x, cutoff_1, cutoff_2).
First I would like to know how to save these inputs in a form that can be passed to the MATLAB code. delta_x is an R data frame with around 1000 rows of data in a single column; cutoff_1 and cutoff_2 are both just integers.
The code is much longer than what is shown below, so it would take far too long to convert it into R, and I am on a tight schedule and not very familiar with MATLAB.
Essentially, I would like to run my 3 inputs from R through the MATLAB code I have been provided with. Does anyone know how I can do this? I have read something about the matlabr package but have not been able to figure it out.
The MATLAB code looks like the following, for reference:
function [start_end] = plume_finder(delta_x, cutoff_1, cutoff_2)
[pks, pksloc] = findpeaks(delta_x);       % find the peaks
valley = find(islocalmin(delta_x) == 1);  % find the valleys
list_va = delta_x(valley);                % values at the valleys
% There are two possibilities: either the valley comes first or the peak
% comes first, so we use 'shift' to adjust for this.
if valley(1) > pksloc(1)
    shift = 0;
else
    shift = 1;
end
upperlim = min(length(pks), length(valley));  % find where to stop
% pksloc(:,2:end) indicates whether each peak point meets the requirements
% (column 2: whether the peak value is greater than the cut-off;
%  column 3: whether the peak value differs significantly from the valley
%  point on its left; column 4: whether the peak value differs
%  significantly from the valley point on its right
%  -- 1 stands for yes, 0 stands for no)
for i = 1:upperlim
    if pks(i) < cutoff_1
        pksloc(i, 2) = 0;
    else
        pksloc(i, 2) = 1;
    end
end
for i = 2:upperlim
    if (pks(i) - list_va(i - 1 + shift)) <= cutoff_2
        pksloc(i, 3) = 0;
    else
        pksloc(i, 3) = 1;
    end
end

In outline (a short R sketch follows after the notes below):
1. In MATLAB, write a script that can load ASCII .csv files, process the data, and save the results to new ASCII .csv files.
2. In R, save the variables to ASCII .csv files.
3. In R, invoke the MATLAB script in batch mode through system() and wait for the new processed files to be generated.
4. In R, load the processed data file and do whatever processing comes next.
P.S.1. CSV files are straightforward but lose precision; HDF5 or SQLite formats can be used instead.
P.S.2. The whole thing can also be turned around, so that an R script is invoked from within MATLAB through the file system and system calls. Or call both R and MATLAB from bash, Python, or something else.
P.S.3. It is also possible to pass data through a local network socket connection; I will leave out the details for now.
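For the R side, a minimal sketch of steps 2-4 could look like the following. It assumes MATLAB is on the system PATH and that a small wrapper script run_plume_finder.m (hypothetical) reads the two CSV files, calls plume_finder, and writes start_end.csv:

# Step 2: save the inputs from R as plain CSV files
write.table(delta_x, "delta_x.csv", row.names = FALSE, col.names = FALSE, sep = ",")
write.table(data.frame(cutoff_1 = cutoff_1, cutoff_2 = cutoff_2),
            "cutoffs.csv", row.names = FALSE, sep = ",")

# Step 3: invoke MATLAB in batch mode and wait for it to finish
#   (-batch needs R2019a or later; older releases can use
#    matlab -nodisplay -nosplash -r "run_plume_finder; exit")
system('matlab -batch "run_plume_finder"', wait = TRUE)

# Step 4: load the processed output back into R
start_end <- read.csv("start_end.csv", header = FALSE)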

Related

Trying to automate an R script that always runs against one dataset and conditionally against another

I am very new to R and trying to modify a script to help my end users.
Every week a group of files is produced, and my modified script reaches out to the network, makes the necessary changes, and puts the files back, all nice and tidy. However, every quarter there is a second set of files that needs the EXACT same transformation. My thought was to check whether the quarterly files exist on the network with a file.exists statement, run through the script for them, and then continue with the normal weekly run. With my limited experience I can only think of writing it this way ("lots of stuff" is a couple hundred lines), and I'm sure there's something I can do other than double the size of the program:
if (file.exists("quarterly.txt")) {
  # do lots of stuff
} else {
  # do lots of stuff
}
Both starja and lemonlin were correct: my solution was basically to turn my program into a function and create a small program that calls the function for each dataset. I also skipped the 'else' portion of my if statement, which works perfectly (for me).
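A minimal sketch of that approach, with a hypothetical process_files() helper and hypothetical file names:

process_files <- function(input_path) {
  # ... the couple hundred lines of transformation, parameterised by path ...
}

# Weekly run, always performed
process_files("weekly.txt")

# Quarterly run, only when the quarterly file exists
if (file.exists("quarterly.txt")) {
  process_files("quarterly.txt")
}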

R misreading of time from xlsx dataTable

I have a very annoying issue.
I have some oxygen measurements saved in an .xlsx table (created directly by the device software). Opened with Excel, this is part of my file.
In the first picture, notice that sometimes the software skips a second (11:13:00 then 11:13:02).
In the second picture, just notice the continuity of time from 11:19:01 to 11:19:09.
I read my Excel table into R with the readxl package using the code
oxy <- read_excel("./Metabolism/20180502 DAPH 20.xlsx" , 1)
And before any manipulation, when I check my table in R (RStudio), I see the following:
In the first case, R kept the time continuity by adding 11:13:01 and shifted the next rows.
Then, later, the reverse situation: the continuity of time was respected in Excel, but R skips a second and again shifts the next rows.
At the end there is the same number of rows. I guess it is a problem with the way R and Excel round the time, but these little errors prevent me from using the date to merge two tables, and the calculations afterwards are wrong.
Can I do something to tell R to read the data exactly the same way Excel saved them?
Thank you very much!
Index both tables with a sequential integer counter, each starting at the same point, and use that for merging like with like. If you want the Excel version to be 'definitive', convert the index back to time with a lookup based on your Excel version.
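A minimal sketch of that approach, assuming two data frames oxy (read from Excel) and other_tbl (a hypothetical second table), each with one row per measurement in the same order and a time column named time:

oxy$idx       <- seq_len(nrow(oxy))
other_tbl$idx <- seq_len(nrow(other_tbl))

# Merge on the integer counter instead of the unreliable time stamps
merged <- merge(oxy, other_tbl, by = "idx")

# Treat the Excel times as definitive: look them up again by index
merged$time <- oxy$time[match(merged$idx, oxy$idx)]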

R save as NetCDF file after simple calculation

I want to do something (apparently) simple, but I haven't yet found the right way to do it:
I read a netcdf file (wind speed from the ERA5 reanalysis) on a grid.
From this, I use the wind speed to calculate a wind capacity factor (using a given power curve).
I then want to write a new netcdf file, with exactly the same structure as the input file, but just replacing the input wind speed by the new variable (wind capacity factor).
Is there a simple/fast way to do this that avoids redefining all the dims and vars with ncvar_def and ncdim_def?
Thanks in advance for your replies!
Writing a netcdf file in R is not overly complicated, there is a nice example online here:
http://geog.uoregon.edu/GeogR/topics/netCDF-write-ncdf4.html
You could copy the dimensions from the input file.
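A minimal sketch with the ncdf4 package, where the input variable name "ws" and the capacity-factor function cf_curve() are assumptions:

library(ncdf4)

nc_in <- nc_open("input.nc")
ws    <- ncvar_get(nc_in, "ws")          # wind speed on the grid
dims  <- nc_in$var[["ws"]]$dim           # reuse the existing dimensions

wcf <- cf_curve(ws)                      # apply the power curve

var_out <- ncvar_def("wcf", units = "1", dim = dims, missval = 1e32)
nc_out  <- nc_create("output.nc", var_out)
ncvar_put(nc_out, var_out, wcf)

nc_close(nc_in)
nc_close(nc_out)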
However, if your wind power curve is a simple analytical expression, then you could perform this task in one line from the command line in bash/linux using the climate data operators (cdo).
For example, if you have two variables 10u and 10v in the file (I don't recall the reanalysis names exactly), then you could make a new variable WCF = sqrt(10u² + 10v²) in the following way:
cdo expr,'wcf=sqrt(10u**2+10v**2)' input.nc output.nc
See an example here:
https://code.mpimet.mpg.de/boards/53/topics/1622
So if your wind power function is an analytical expression, you can define it this way without using R at all or worrying about dimensions etc.; the new file will have a variable wcf added. You should then probably use NCO to alter the metadata (units etc.) to ensure they are appropriate.

High-scale signal processing in R

I have high-dimensional brain-signal data that I would like to explore using R.
Since I am a data scientist, I really do not work with Matlab, but with R and Python. Unfortunately, the team I am working with uses Matlab to record the signals. Therefore, I have several questions for those of you who are interested in data science.
The Matlab files, recorded data, are single objects with the following dimensions:
1000*32*6000
1000: denotes the sampling rate of the signal.
32: denotes the number of channels.
6000: denotes the time in seconds, so that is 1 hour and 40 minutes long.
The questions/challenges I am facing:
1. I converted the .mat files I have into CSV files so I can use them in R. However, CSV files are 2-dimensional, with the dimensions 1000*192000, and they are rather large, about 1.3 gigabytes. Is there a better way to convert .mat files into something compatible with R, and smaller in size? I have tried R.matlab with readMat, but it is not compatible with the 7th version of Matlab; so I tried to save as a V6 version, but then it says "Error: cannot allocate vector of size 5.7 Gb".
2. The time it takes to read the CSV file is rather long! It takes about 9 minutes to load the data, and that is using fread, since the base R function read.csv takes forever. Is there a better way to read files faster?
3. Once I read the data into R, it is 1000*192000, while it is actually 1000*32*6000. Is there a way to have a multidimensional object in R where accessing signals and time frames at a given time becomes easier, like dataset[1007, 2], which would be the time frame of the 1007th second and channel 2? The reason I want to access it this way is to compare time frames easily and plot them against each other.
Any answer to any question would be appreciated.
This is a good reference for reading large CSV files: https://rpubs.com/msundar/large_data_analysis. A key takeaway is to assign the datatype for each column that you are reading, rather than having the read function decide based on the content.
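As a sketch of both points, assuming the CSV has 1000 rows and 192000 columns, with the 32 channels varying fastest across the columns (channels 1-32 for second 1, then channels 1-32 for second 2, and so on):

library(data.table)

# Declaring one class for every column avoids type detection on 192000 columns
dt <- fread("signals.csv", colClasses = rep("numeric", 192000))

# Fold the 2-D matrix back into a 1000 x 32 x 6000 array
sig <- array(as.matrix(dt), dim = c(1000, 32, 6000))

# All 1000 samples of channel 2 during second 1007
frame <- sig[, 2, 1007]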

Appending text file for parallel simulation output

I am running an R simulation using a multi-core system. The simulation result I am monitoring is a vector of length 900. My plan was to append this vector (row-wise) to a text file (using write.table) once each simulation ends. My simulations run from 1 to 1000. While I was working on my laptop the results were fine, because the work is sequential. When I work on a cluster, the simulations are divided up, and there might be a conflict over who writes first. The reason for my claim is that I am even getting impossible values in the first column of my text file (this column is used to store the simulation index). If you need a sample code I can attach it.
There is no way to write to a text file from parallel threads that respects order. Each thread has its own buffer and no indication of when it is appropriate to write, because there is no cross communication. So what will happen is that they'll all try to write to the same file at the same time, even in the middle of another thread's write, which is why you're getting impossible values in your first column.
The solution is to write to a separate file for each thread, or to return the value as the output of the multithreaded apply loop, and then combine the results sequentially at the end.
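A minimal sketch of the second approach, where run_sim() is a hypothetical function returning the length-900 result vector for one simulation:

library(parallel)

cl <- makeCluster(detectCores() - 1)
clusterExport(cl, "run_sim")        # make the simulation function visible to workers

results <- parLapply(cl, 1:1000, function(i) {
  c(i, run_sim(i))                  # prepend the simulation index
})
stopCluster(cl)

# Combine sequentially and write the file once, in order
out <- do.call(rbind, results)
write.table(out, "results.txt", row.names = FALSE, col.names = FALSE)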

Resources