Creating a NetIncomingMessage with Lidgren from a saved packet in a binary file - networking

I am using the Lidgren networking library to create a real-time multiplayer game.
What I am trying to do is save all incoming packets (including all bytes) sent to a peer in a binary file. Later, when I need to debug some weird networking behavior, I can load this file and have it rebuild all the saved packets sequentially. This way, I can find out exactly how the weird behavior occurred.
My question is: how do I recreate a packet when I load it from the file?
I assume it is a NetIncomingMessage that I need to recreate. So far I have thought of either creating it anew or, if that fails, sending a NetOutgoingMessage to self and hoping it has the same effect.

The way I solved this was by creating a wrapper object for the NetIncomingMessage, which contains a data byte array among other data members. One thread fills a list of these objects based on the saved incoming time, and another thread requests and removes (dequeues) them.
See https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem
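The producer/consumer replay described above can be sketched roughly as follows. This is a Python illustration, not Lidgren or C# code: RecordedMessage, replay, consume and run_replay are hypothetical names, and the recording is assumed to be already loaded as a list of (arrival time, bytes) pairs.

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class RecordedMessage:
    """Hypothetical wrapper: the saved arrival time plus the raw payload."""
    received_at: float  # seconds since the start of the recording
    data: bytes


def replay(messages, out_queue, speed=1.0):
    """Producer: enqueue each message once its recorded arrival time is due."""
    start = time.monotonic()
    for msg in messages:
        delay = msg.received_at / speed - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        out_queue.put(msg)
    out_queue.put(None)  # sentinel: the recording is finished


def consume(in_queue, handle):
    """Consumer: dequeue messages in order and hand them to the game code."""
    while True:
        msg = in_queue.get()
        if msg is None:
            break
        handle(msg)


def run_replay(messages, speed=1.0):
    """Run both threads and return the payloads in the order they were handled."""
    q = queue.Queue()
    handled = []
    producer = threading.Thread(target=replay, args=(messages, q, speed))
    consumer = threading.Thread(
        target=consume, args=(q, lambda m: handled.append(m.data))
    )
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return handled
```

The speed parameter is an added convenience for replaying a long capture faster than real time; with speed=1.0 the messages are released at their recorded offsets.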

Related

Read in an RDS file from memory using aws.s3

I am using the cloudyr aws.s3 package to pull down an RDS file from S3. I have my own custom R runtime in a Lambda. aws.s3 has a handy method called s3readRDS("s3://pathtoyourfile"). This works well, but it has a limitation: it saves the RDS file to disk and then must read it back in using readRDS. This is fine for smaller files, but for larger files it is a no-go, as we have limited disk storage.
Right now, I'm kind of stuck with these largish data files, and pulling them out into a database is just not feasible at the moment due to the target group, so I'm trying to minimize our cost and maximize throughput, and this is the last nail in that coffin.
According to the documentation:
"Some users may find the raw vector response format of \code{get_object} unfamiliar. The object will also carry attributes, including \dQuote{content-type}, which may be useful for deciding how to subsequently process the vector. Two common strategies are as follows. For text content types, running \code{\link[base]{charToRaw}} may be the most useful first step to make the response human-readable. Alternatively, converting the raw vector into a connection using \code{\link[base]{rawConnection}} may also be useful, as that can often then be passed to parsing functions just like a file connection would be. "
Based on this (and an example using load()), the code below looks like it should work, but it does not.
foo <- readRDS(rawConnection(get_object("s3://whatever/foo.rds")))
Error in readRDS(rawConnection(get_object("s3://whatever/foo.rds", :
unknown input format
I can't seem to get the data stream right for readRDS or unserialize to make sense of it. I know the file is correct, as saving to disk and loading from disk works fine. But I want to know how to turn "foo" into an unserialized object without the save/load step.

How to transfer (to connect) data between Digital Micrograph and R

I'm a new DM user and I need to transfer data (pixel brightness) between Digital Micrograph and R, for processing and modelling an image.
Specifically, I need to extract the bright pixels from an original image, send them to R for processing, and return the result to DM to display the new image.
I would like to know whether this is possible and how to do it from a script in DM.
Many thanks. Regards.
There is very little direct connection between DM (scripting) and the outside world, so the best solution is quite likely the following (DM-centric) route:
A script is started in DM, which does:
all the UI needed
extract the intensities etc.
save all required data in a suitable format on disc at a specific path (raw data, text data, ...)
call an external application (anything you can call from a command prompt, including .bat files) and wait until that command has finished
Have all your R code written in such a way that it can be called from a command prompt, potentially with command-line parameters (e.g. a configuration file):
read data from a specific path
process as required (without a UI, so do it 'silently')
save results on disc at a specific path
close application
At this point, the script in DM continues, reading in the results (and potentially doing some clean-up of files on disc.)
So, in essence, the important thing is that your R-code can work as a "stand-alone" black-box executable fully controlled by command-line parameters.
The command you will need to launch an external application can be found in the help documentation under "Utility Functions" and is LaunchExternalProcess. It was introduced with GMS 2.3.1.
You might also want to try using the commands ScrapCopy() and ScrapPasteNew() to copy an image (or image subarea) into the clipboard, but I am not sure exactly how the data is handled there.
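The save, launch, wait and read-back round trip has the same shape in any host language. Below is a minimal Python sketch of it, not DM script: the file names are made up, and a tiny Python worker stands in for the external R program (which would be whatever command line runs your R code).

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the external R script: reads numbers, doubles them, writes results.
WORKER_SOURCE = """
import sys
src, dst = sys.argv[1], sys.argv[2]
values = [float(line) for line in open(src) if line.strip()]
with open(dst, "w") as out:
    for v in values:
        out.write(f"{v * 2}\\n")
"""


def round_trip(values):
    """Write input to disk, run the worker to completion, read its output back."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src, dst = tmp / "input.txt", tmp / "output.txt"
        worker = tmp / "worker.py"
        worker.write_text(WORKER_SOURCE)
        # Step 1: save the data at an agreed path.
        src.write_text("\n".join(str(v) for v in values) + "\n")
        # Step 2: launch the external program and block until it exits.
        subprocess.run([sys.executable, str(worker), str(src), str(dst)], check=True)
        # Step 3: read the results; the temporary directory is cleaned up on exit.
        return [float(line) for line in dst.read_text().splitlines()]
```

The only contract between the two sides is the pair of file paths passed on the command line, which is what lets the external program act as a black box.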

Concurrent Read Access to Thread Object that Emulates Map

I am experiencing (very) slow page load times that increase proportionately to the number of active users on the system. I have a hunch that this is related to a custom defined thread object:
define stageStoreCache => thread {
parent map
public oncreate() => ..oncreate()
}
This stageStoreCache object simply mimics the behavior of a map whose data is available across the entire instance.
Many threads are reading it and very few threads are writing to it. Is this a poorly conceived solution to having a large map of data available across the instance? It's a fairly large map of maps that when exported to map->asstring can exceed 5MB. The objective is to prevent translating data stored as JSON in the database to Lasso types on the fly.
It seems that the large size of the stageStoreCache is not what causes problems. It seems to really be the number of concurrent users on the system.
Thanks for any insight you can offer.
You said that this holds a map of maps and is rather large. If those sub-maps are large, it is possible that the way you are accessing the data is causing the issue. Here's what I mean. Suppose you are doing something like this:
// Potential problem as it copies the sub-map each time
stageStoreCache->find('sub-map')->find('data')
stageStoreCache->find('sub-map')->find('other')
The problem comes in that each time stageStoreCache->find('sub-map') is called it actually has to copy all the map data it finds for "sub-map" out of the thread object and into the thread requesting that data. If those sub-maps are large, this takes time. A better approach would be to do this once and stash it in a local variable:
// Better Approach
local(cache) = stageStoreCache->find('sub-map')
#cache->find('data')
#cache->find('other')
This at least only has to copy the "sub-map" over once. Another approach that might be better (only testing could tell) would be to refactor your code so that each call to stageStoreCache drills down to the data you actually want, and have just that small amount of data copied over.
// Might even be better as it just copies the values you want
stageStoreCache->drill('sub-map', 'data')
stageStoreCache->drill('sub-map', 'other')
Ultimately, I would love for Lasso to improve thread objects so that they never blocked for reads. (I had thought this had been submitted as a feature request, but I'm not finding it on Rhinotrac.) Until that happens, if none of my suggestions help then you may need to investigate using something else to cache this data in such as memcached.
Testing is the only way to tell for sure. But I would go a long way to avoid having a thread object that contains some 5 MB of data.
Take this snippet from the Lasso guide into consideration:
"all parameter values given to a thread object method are copied, as well as any return value of a thread object method"
http://www.lassoguide.com/language/threading.html
Meaning that one of the key features that makes Lasso 9 so fast, the extensive use of reference data, is lost.
Each time you have a call for stageStoreCache all the data it contains will first be copied into the thread that asks for it. That is an awful lot of copying.
I have found that keeping settings and site-wide data in the smallest possible chunks is convenient and fast, as is setting each one up only when it is actually called for. This is unlike the old approach of a config file included on every call, which set up a bunch of variables, most of which might never be used on that particular call. Here's a Ke trick that I'm using instead. Consider this:
define mysetting1 => var(__mysetting1) || $__mysetting1 := 'Setting 1 value'
define mysetting2 => var(__mysetting2) || $__mysetting2 := 'Setting 2 value'
define mysetting3 => var(__mysetting3) || $__mysetting3 := 'Setting 3 value'
Have this in a file that is read at startup, either in a LassoApp that's initiated or a file in the startup folder.
These settings can then be called like this:
code blabla
mysetting2
more code blabla
mysetting1
mysetting2
The beauty is that, in this case, no processing is wasted on initiating mysetting3, since it's not called for, and that mysetting2 is called several times but still only initiated once.
This technique can be used for simple things like the above, but also to initiate complex types or methods, such as session management, calling post or get params, etc.
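The same initialise-on-first-use idea can be sketched in Python, purely as an analogy to the Lasso definitions above. Note one difference: functools.lru_cache caches for the whole process, whereas the Lasso var() in the original is scoped to the current thread. The init_log list is a hypothetical addition that only exists to show which settings actually ran.

```python
from functools import lru_cache

init_log = []  # hypothetical: records which settings were actually initialised


@lru_cache(maxsize=None)
def mysetting1():
    init_log.append("mysetting1")  # runs only on the first call
    return "Setting 1 value"


@lru_cache(maxsize=None)
def mysetting2():
    init_log.append("mysetting2")
    return "Setting 2 value"


@lru_cache(maxsize=None)
def mysetting3():
    init_log.append("mysetting3")
    return "Setting 3 value"
```

Calling mysetting2(), mysetting1() and mysetting2() again, in that order, initialises mysetting2 and mysetting1 once each and never touches mysetting3.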

Why does System V shared memory have separate get and attach functions?

Using System V shared memory IPC requires calls to the following two functions:
int shmget(key_t key, size_t size, int shmflg);
void *shmat(int shmid, const void *shmaddr, int shmflg);
Why are they designed to be separate, instead of having a single function that accepts these arguments, performs both functions and simply returns the address?
We can consider files as an analogy. Calling open on a string (the file path) gives us a file descriptor, and we use that to read from and write to the file. We call close on the file descriptor when we're done. This design seems natural: we don't have to open with a string to get a descriptor and then attach to the descriptor.
As an example of what I have in mind, take a look at the FreeBSD sendmail shared memory implementation.
This kind of separation (shm_open and mmap) also exists with POSIX shared memory, but the reason was that mmap existed before shm_open was implemented and could be reused, and mmap requires a descriptor (source: UNIX Network Programming Vol. 2, R. Stevens, chapter 13, page 326).
Shared memory is probably one of the fastest ways of allowing for IPC, as data need not be copied. The problem associated with it, though, is synchronizing access between multiple threads. You could do this using semaphores or record locks. We end up using the latter in Unix for shared memory: even though they are not as efficient, they are simple, the system cleans up well, and you don't need some of the bling that semaphores bring along.
Let's look into how these work to understand why they are implemented as such.
In comes the shmid_ds structure used by the Linux kernel (http://www.tldp.org/LDP/lpg/node68.html).
shm_nattch is the unsigned int counter of current attaches. shmget gets you a shm id and sets things like the ipc_perm, dates, pid, atime, ctime, and the requested segment size (shm_segsz).
Next, shmctl kicks in and does things for IPC using IPC_STAT, IPC_RMID and IPC_SET, such as setting permissions, getting or removing the shm id for a segment, or even locking or unlocking it.
Once the segment is ready, a process uses shmat to attach it to its address space, depending on the flags and address parameters. Once it attaches, the kernel increments shm_nattch. To detach, we call shmdt. Removal of the identifier and the associated data structure is not automated; some process has to do this by calling shmctl with IPC_RMID, subject to shm_perm.
As you can see this is all very similar to how one would use semaphores and the implementation makes sense.
One possible reason I could think of is this:
(From the manpage of shmget)
After a fork(2) the child inherits the attached shared memory segments.
After an execve(2) all attached shared memory segments are detached from the process.
Upon _exit(2) all attached shared memory segments are detached from the process.
Well, technically attaching and detaching is basic reference counting on the shared memory segment that is reserved during shmget.
The functions of allocating the shared memory segment (via shmget) and reference counting it (up or down, via shmat and shmdt respectively) are separate so that code can be reused during fork and exec.
If they were both packed into the same function, you would still need a separate function that just does the reference counting (to be invoked during fork/exec). So I think this design is simply meant to promote code reuse and avoid code duplication.

Techniques for infinitely long pipes

There are two really simple ways to let one program send a stream of data to another:
Unix pipe, TCP socket, or something like that. This requires constant attention from the consumer program, or the producer program will block. Even after increasing the buffers beyond their typically tiny defaults, it's still a huge problem.
Plain files - the producer program appends with O_APPEND, and the consumer just reads whatever new data has become available at its convenience. This doesn't require any synchronization (as long as disk space is available), but Unix files only support truncating at the end, not at the beginning, so it will fill up the disk until both programs quit.
Is there a simple way to have it both ways, with data stored on disk until it gets read, and then freed? Obviously programs could communicate via database server or something like that, and not have this problem, but I'm looking for something that integrates well with normal Unix piping.
A relatively simple hand-rolled solution.
You could have the producer create files and keep writing until a file reaches a certain size or number of records, whatever suits your application. The producer then closes the file and starts a new one using an agreed naming scheme.
The consumer reads new records from a file; when it reaches the agreed maximum size, it closes and unlinks the file and then opens the next one.
If your data can be split into blocks or transactions of some sort, you can use the file method for this with a serial number. The data producer would store the first megabyte of data in outfile.1, the next in outfile.2, etc. The consumer can read the files in order and delete them when read. Thus you get something like your second method, with cleanup along the way.
You should probably wrap all this in a library, so that from the application's point of view this is a pipe of some sort.
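A minimal Python sketch of such a wrapper is shown below. SegmentedQueue and its methods are hypothetical names, the outfile.N naming follows the example above, and records are assumed to be newline-free strings. One deliberate simplification: each segment is written under a temporary name and renamed once complete, so the consumer only ever sees finished segments instead of tailing a file that is still being written.

```python
import os
import tempfile


class SegmentedQueue:
    """Hypothetical helper: records go to numbered segment files in a directory."""

    def __init__(self, directory, records_per_segment=1000):
        self.directory = directory
        self.records_per_segment = records_per_segment

    def _path(self, index):
        return os.path.join(self.directory, f"outfile.{index}")

    def write_all(self, records):
        """Producer: write records, starting a new segment every N records."""
        index = 1
        for start in range(0, len(records), self.records_per_segment):
            chunk = records[start:start + self.records_per_segment]
            tmp_path = self._path(index) + ".tmp"
            with open(tmp_path, "w") as f:
                f.writelines(r + "\n" for r in chunk)
            os.rename(tmp_path, self._path(index))  # publish the finished segment
            index += 1

    def read_available(self, next_index=1):
        """Consumer: read finished segments in order, unlinking each when done."""
        records = []
        while os.path.exists(self._path(next_index)):
            with open(self._path(next_index)) as f:
                records.extend(line.rstrip("\n") for line in f)
            os.unlink(self._path(next_index))  # frees the disk space
            next_index += 1
        return records, next_index
```

The consumer keeps the returned next_index and passes it to the following read_available call, so disk usage stays bounded by the number of segments not yet consumed.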
You should read some documentation on socat. You can use it to bridge the gap between tcp sockets, fifo files, pipes, stdio and others.
If you're feeling lazy, there are some nice examples of useful commands.
I'm not aware of anything ready-made, but it shouldn't be too hard to write a small utility that takes a directory as an argument (or uses $TMPDIR) and uses select/poll to multiplex between reading from stdin, paging to a series of temporary files, and writing to stdout.
