I have a very large Simulink input file (*.csv) that is too big to be handled in a single node...
I am wondering if it is possible to not read the whole file once at the beginning of the simulation, but instead to stream the data in real time as the simulation needs it.
My first thought was to implement a custom script in Java or C# (the sender) that reads the input CSV file line by line and streams the data to Simulink via TCP. Simulink would receive the data using a TCP receive block.
I have two questions:
Is my approach feasible?
Given the problem stated, what would be your solution?
I suspect it would be easier to run the simulation using sequential chunks of the data, saving the model state at the end of each chunk and starting the next run from the state saved at the end of the previous chunk. The doc describing how to do this is Save and Restore Simulation State as SimState.
You might try writing an S-function in C that opens your file and streams your data line by line. The easiest way to do this would be with the S-Function Builder block. You would nonetheless need to parse your file in C (which, for a CSV file, shouldn't be hard).
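For the sender side of the TCP idea in the question, a minimal sketch in Java might look like the following. The class name, the one-row-per-line framing, and the argument handling are all illustrative assumptions; Simulink's TCP/IP receive block would still need to be configured to parse each incoming row:

```java
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CsvTcpSender {
    // Stream each line of the CSV source to the destination, one row at a time.
    static void stream(BufferedReader csv, OutputStream out) throws IOException {
        String line;
        while ((line = csv.readLine()) != null) {
            out.write((line + "\n").getBytes(StandardCharsets.US_ASCII));
            out.flush(); // push each row immediately so the receiver sees it in real time
        }
    }

    public static void main(String[] args) throws Exception {
        String csvPath = args[0];             // path to the large CSV file
        int port = Integer.parseInt(args[1]); // port the Simulink receive block connects to
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept();
             BufferedReader reader = new BufferedReader(new FileReader(csvPath));
             OutputStream out = client.getOutputStream()) {
            stream(reader, out);
        }
    }
}
```

Flushing after every row keeps latency low at the cost of throughput; for very high sample rates you would batch several rows per flush.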
I am using the cloudyr aws.s3 package to pull down an RDS file from S3. I have my own custom R runtime in a Lambda. aws.s3 has a handy method called s3readRDS("s3://pathtoyourfile"). This works well, but it has a limitation: it saves the RDS file to disk, then must read it back in using readRDS. This is fine for smaller files, but for larger files it is a no-go, as we have limited disk storage.
Right now, I'm kind of stuck with these largish data files, and pulling them out into a database is just not feasible at the moment due to the target group, so I'm trying to minimize our cost and maximize throughput, and this is the last nail in that coffin.
According to the documentation:
"Some users may find the raw vector response format of get_object unfamiliar. The object will also carry attributes, including "content-type", which may be useful for deciding how to subsequently process the vector. Two common strategies are as follows. For text content types, running rawToChar may be the most useful first step to make the response human-readable. Alternatively, converting the raw vector into a connection using rawConnection may also be useful, as that can often then be passed to parsing functions just like a file connection would be."
Based on this (and an example using load()), the code below looks like it should work, but it does not:
foo <- readRDS(rawConnection(get_object("s3://whatever/foo.rds")))
Error in readRDS(rawConnection(get_object("s3://whatever/foo.rds", :
unknown input format
I can't seem to get the data stream into the right shape for readRDS or unserialize to make sense of it. I know the file is correct, as the save-to-disk/load-from-disk approach works fine. But I want to know how to turn "foo" into an unserialized object without the save/load.
My application needs to record video interviews with the ability to pause and resume, and have these multiple segments captured to the file.
I'm using DirectShow.NET to capture the camera stream to a preview window AND an AVI file, and it works, except that whenever I start recording a new segment, I overwrite the AVI file instead of appending to it. The relevant code is:
captureGraphBuilder.SetOutputFileName( ref mediaSubType, Filename, out muxFilter, out fileWriterFilter )
How can I create a capture graph so that the capture is appended to a file instead of overwriting it?
Most media files/formats, and AVI specifically, do not support appending. When you record, you populate the media file and then finalize it on completion. You typically don't have the option to "unfinalize" it and resume recording.
The overwriting symptom you are seeing is a side effect of the writing filter implementation. There is no append-vs-overwrite mode you can easily switch to.
Your options are basically the following (in order of least to most development effort):
Record a new media file each time, then run an external tool (like FFmpeg) capable of concatenating the segments into a new continuous file.
Implement a DirectShow filter inserted into the pipeline (in two instances, one for video and one for audio) that implements the pause/resume behavior. While paused, the filter discards new media data; once you resume, it passes data through again, modifying the time stamps to mimic a continuous stream. The capture graph stays in the running state through all segments and pauses.
Implement a custom multiplexer and/or writer filter that can read the existing file and append new media, so that the file is once again finalized on completion with the old and new segments continuous.
Item #3 above is technically possible to implement, but I don't think such an implementation exists at all: workarounds are always easier. #2 is more or less the intended way to address the task, but since you are doing C# development with DirectShow.NET, I anticipate that it is going to be difficult to tackle the challenge from this angle. #1 is relatively easy to do, and the only cost involved is the external tool.
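For option #1, FFmpeg's concat demuxer can join the recorded segments without re-encoding; the segment and output file names below are placeholders:

```shell
# list.txt enumerates the segment files recorded so far, one per line
printf "file 'segment1.avi'\nfile 'segment2.avi'\n" > list.txt
# the concat demuxer joins the segments; -c copy avoids re-encoding
ffmpeg -f concat -safe 0 -i list.txt -c copy interview.avi
```

Stream copy only works cleanly when all segments share the same codecs and parameters, which is the case when they come from the same capture graph.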
I am using the Lidgren networking library to create a real-time multiplayer game.
What I am trying to do is save all incoming packets (including all bytes) to a peer in a binary file. Later, when I need to debug some weird networking behavior, I can load this file and have it replay (or rebuild) the packets it saved, sequentially. This way, I can find out exactly how the weird behavior occurred.
My question is, how do I recreate a packet when I load it from the file?
It is a NetIncomingMessage that I need to recreate, I assume, and so far I have thought of either creating it anew or, if that fails, sending a NetOutgoingMessage to myself in the hope that it has the same effect.
The way I solved this was by creating a wrapper object (an interface over NetIncomingMessage) that contains a data byte array among other members, and then having one thread fill a list of these objects based on the saved arrival times, while another thread requests and removes (dequeues) them.
See https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem
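The wrapper-plus-two-threads approach above is a standard producer-consumer queue. A minimal sketch, in Java for illustration (the original project is C#), with RecordedMessage as a hypothetical stand-in for Lidgren's NetIncomingMessage:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical wrapper standing in for NetIncomingMessage: it keeps the raw
// payload plus the time the packet was originally received, as read from the log.
class RecordedMessage {
    final byte[] data;
    final long receivedAtMillis;
    RecordedMessage(byte[] data, long receivedAtMillis) {
        this.data = data;
        this.receivedAtMillis = receivedAtMillis;
    }
}

public class ReplayQueue {
    private final BlockingQueue<RecordedMessage> queue = new LinkedBlockingQueue<>();

    // Producer thread: reads saved packets from the log file and enqueues them.
    public void produce(RecordedMessage msg) throws InterruptedException {
        queue.put(msg);
    }

    // Consumer thread: dequeues the next packet, blocking until one is available,
    // so the game loop can re-process packets in their original order.
    public RecordedMessage consume() throws InterruptedException {
        return queue.take();
    }
}
```

A blocking queue handles the cross-thread hand-off safely; to reproduce original timing, the producer can additionally sleep until each packet's recorded arrival time.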
I'm a new DM user and I need to transfer data (pixel brightness values) between Digital Micrograph and R, for processing and modelling an image.
Specifically, I need to extract the bright pixels from an original image, send them to R for processing, and return the result to DM to display the new image.
I would like to know if this is possible and how to do it from a script in DM.
Many thanks. Regards.
There is very little direct connection between DM (scripting) and the outside world, so the best solution is quite likely the following (DM-centric) route:
A script is started in DM, which does:
all the UI needed
extract the intensities etc.
save all required data in a suitable format on disk at a specific path (raw data/text data/...)
call an external application (anything you can call from a command prompt, including .bat files) and wait until that command has finished
Have all your R code written so that it can be called from a command prompt, potentially with command-line parameters (e.g. a configuration file):
read data from the specific path
process as required (without a UI, so do it 'silently')
save results on disk at a specific path
close the application
At this point, the script in DM continues, reading in the results (and potentially doing some clean-up of files on disk).
So, in essence, the important thing is that your R-code can work as a "stand-alone" black-box executable fully controlled by command-line parameters.
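The "stand-alone black-box executable" contract can be sketched as follows; Java is used here purely for illustration (the original would be R invoked via Rscript), and the intensity-doubling step is a placeholder for the real processing:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class BlackBoxProcessor {
    // Placeholder "processing": double every intensity value, one value per line.
    static List<String> process(List<String> lines) {
        return lines.stream()
                .map(s -> Integer.toString(Integer.parseInt(s.trim()) * 2))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path in = Path.of(args[0]);   // path where DM saved the raw data
        Path out = Path.of(args[1]);  // path DM will read the results from
        Files.write(out, process(Files.readAllLines(in)));
        // no UI, no prompts: the process simply exits once the result file is written,
        // which is what lets DM's LaunchExternalProcess wait on it
    }
}
```

The essential points are that all configuration arrives via command-line parameters and that the process runs silently and terminates on its own.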
The command you will need to launch an external application can be found in the help-documentation under "Utility Functions" and is LaunchExternalProcess. It has been introduced with GMS 2.3.1.
You might also want to try using the commands ScrapCopy() and ScrapPasteNew() to copy an image (or image subarea) to the clipboard, but I am not sure exactly how the data is handled there.
This question may look like a duplicate, but I am not getting the answer I am looking for.
The problem is: in Unix, one of the 4GL binaries is fetching data from a table using a cursor and writing the data to a .txt file.
The table contains around 50 million records.
The binary takes a lot of time and does not complete; the .txt file also remains 0 bytes.
I want to know the possible reasons why the records are not being written to the .txt file.
Note: There is enough disk space available.
Also, for 30 million records, I get the data in the .txt file as expected.
The information you provide is insufficient to tell for sure why the file is not written.
In UNIX, a text file is just like any other file - a collection of bytes. No specific limit (or structure) is enforced on "row size" or "row count," although obviously some programs might have limits on the maximum supported line size and such (depending on their implementation).
When a program starts writing data to a file (i.e. once the internal buffer is flushed for the first time), the file will no longer be zero-sized, so clearly your binary is doing something else all that time (unless it wipes out the file as part of cleanup).
Try running your executable via strace to see the file I/O activity - that would give some clues as to what is going on.
Try closing the writer, if you are using one to write to the file. Closing achieves the dual purpose of releasing the resource and flushing the remaining contents of the buffer.
Computed output needs to be flushed if you are using any kind of buffered writer. I have encountered such situations a few times, and in almost all cases the issue was a failure to flush the output.
In Java specifically, best practice for writing data usually involves buffers. When the buffer limit is reached, the contents get written to the file, but data that has not yet filled the buffer does not get written. Such data is lost when the program closes without flushing the buffered writer.
So, in your case, if the processing time it takes is reasonable and the output is still not in the file, it may mean that the output has been computed and placed in RAM but could not be written to the file (i.e. to disk) because it was never flushed.
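As a concrete illustration of the flushing point, here is a minimal Java sketch in which nothing reaches the underlying target until the buffered writer is flushed or closed (the method and class names are illustrative):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class FlushDemo {
    // Write rows through a buffered writer. Until flush() or close() empties
    // the buffer, the underlying target may still contain nothing at all.
    static String writeRows(String[] rows) throws IOException {
        StringWriter target = new StringWriter(); // stands in for the file on disk
        try (BufferedWriter writer = new BufferedWriter(target)) {
            for (String row : rows) {
                writer.write(row);
                writer.write("\n");
            }
            // try-with-resources closes the writer here, which flushes the buffer;
            // if the program were killed before this point, the target could be empty.
        }
        return target.toString();
    }
}
```

If the 4GL runtime buffers similarly, a crash or hang before the final flush would leave the .txt file at 0 bytes even though records were processed.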
You can also consider the answers to this question.