Arduino/AVR: Is it safe to interrupt Serial/I2C communications - arduino

I want to do some interrupt-driven signal processing on an Atmega328, which might not have enough SRAM (2K) to store the data of an entire run. This means I'll have to write part of the buffer to external memory while still gathering data.
My question is whether it is safe to have serial writes or I2C communications (e.g. to an SD card) while still triggering interrupts. I think serial communications themselves are interrupt driven so this might become an issue. Is this true? How about I2C? If both are likely to cause problems, what would be the recommended way (if any) to flush a buffer while still gathering data?

It is a common scenario and really depends on how much processing time you require exactly for each task and how tight your timing restrictions are regarding the device communication.
Thinks you should review/consider:
You have to be able to write data faster to the external RAM than it is acquired -> How fast do you gather data?
See that you spent as little time as possible in your interrupt handlers
Have an eye on interrupt starvation. Priority can cause ISR A not to be executed if ISR B triggers more frequently that AND has higher priority. Adjust execution interval if possible.
Check what serial data has to be sent in sequence or in short succession to honor timing requirements. You may have to pause/delay other processing for a short time.

Related

Free RTOS context switching

I am a beginner in RTOS programming.I have a query regarding the same.
Query:I understand that context switching happens between various tasks as per priority assigned. I wanted to know how exactly does a higher priority task interrupts a low priority task technically? Does each task is assigned to hardware interrupt pin so that whenever micro-controller is interrupted on that pin by external hardware,the specific task is processed provided it is assigned higher priority when compared to the task that is presently being processed? But practically speaking if there are 128 tasks present in the program it might require 7 hardware pins reserved for interrupts. What is the logic I am missing?
I recommend to read the pretty good docs on https://www.freertos.org e.g. RTOS Fundamentals
I’m sure this will provide a good overview and related details.
Besides that you’ll find out that usually you don’t need external hardware pins to run a multitasking OS.
Free RTOS uses only sys_tick/os_tick hw-interrupt for context switching. This is high precision periodic interrupt configured on any underlying controller
for example on Cortex M:
https://www.keil.com/pack/doc/CMSIS/Core/html/group__SysTick__gr.html
In the interrupt handling of this, FreeRTOS schedular switches the tasks based on the Ready Queue Task list and its priorities.

How to evaluate the performance of DPDK in a complex circumstance

Let's think about a circumstance like that:
In a MMORPG Game, We send a packet to server, and the server will do a lot of computing about all the palyers who have some connection like attack or heal or somethingelse. After that, we may receive several packets.
Since we may receive several packets, the thing is a little hard. If we only read one, then we can just use timestamp to see the time cost. But now, we cannot do that. So, how to evaluate the performance between traditional TCP/IP stack and the DPDK process in a complex circumstance like that?
If we only read one, then we can just use timestamp to see the time cost. But now, we cannot do that.
Answer> you can always register callback handler in RX, to get it invoked per packet.
how to evaluate the performance between traditional TCP/IP stack and the DPDK process in a complex circumstance like that?
Answer> I assume you are having TLDK or mTCP or ANS as userspace stack, your best approach is to have callback at each read success.

Microcontroller software intrerrupts how do they work

I was reading about micro-controller interrupts, and I have a question does the save of the current state happen for software interrupts too or just for hardware interrupts.
Yes. The general concept behind any interrupt is that it can suspend the execution of the underlying program without that program realising it has been interrupted. This requires storing of the CPU and (some) register states and their restoration once the interrupt service routine has completed. The CPU typically has special hardware mechanism for doing this.
Software interrupts share the same mechanism and so the state is saved and restored.
Note, however, that the normal use for software interrupts is on more complex microprocessors where they allow a safe hardware mechanism for moving between privilege modes - e.g. your application and an operating system. On a low level microcontroller they are of little use, as if you already know you want to call the piece of code in the interrupt, you might as well just call it directly as a function.
Typically the state saved is minimal: just enough to restore the previous state when the only action of the ISR is RTI. Usually just the instruction pointer (and code segment) and flags. The whole processor state is not saved, otherwise it would place too much overhead on a time-critical interrupt. It is up the the ISR to save and restore anything more than the bare-bones mechanism of the processor interrupt.
Yes and no, yes as Jon has described, but software interrupts can be used as your interface to the operating system, and for that use case you likely will be setting up registers as parameters into the software interrupt. And expect registers when it returns to be the result.
So software interrupts, can be treated like hardware interrupts and you have to preserve the state, or they can be treated as or used as fancy function calls and you have parameters going in and
In short it's a maybe.
Without knowing the exact example I couldn't say. What gets saved or doesn't get saved really depends on which software is generating/receiving the interrupt.
At low levels there is a lot of concern about speed at which things happen. And hence work that may not need to be done is avoided. Therefore there is no point in "saving your state" if you don't intend to modify it. So it really depends on what exactly we are talking about.
This is where specifications come into play. Operating systems and processors all specify exactly what state they do save and what state they do not save and depending on your specific problem the answer may differ...
#dwelch's answer explains it for Linux. (which is closest to what you were asking for)
#Jon's answer explains it for processors in general.
#weather-vane's answer explains it for micro-controllers.
These are all specific examples. If you were working with FreeRTOS on a micro-controller you would have to probably make your own software interrupt protocol and define it yourself in which case you can choose to save state or leave the responsibility to the caller. In the end, it is just a choice that a programmer that wrote the system made and hence you must go find his notes and read them.
In the end, it is mostly a gentlemen agreement.

MPI_Isend /Irecv: Is it possible to access the sendbuffer on unused memory-locations in the meanwhile

I would like to speedup my MPI- Program with the use of asynchronous communication. But the used time remains the same. The workflow is as followed.
before:
1. MPI_send/ MPI_recv Halo (ca. 10 Seconds)
2. process the whole Array (ca. 12 Seconds)
after:
1. MPI_Isend/ MPI_Irecv Halo (ca. 0,1 Seconds)
2. process the Array (without Halo) (ca. 10 Seconds)
3. MPI_Wait (ca. 10 Seconds) (should be ca. 0 Seconds)
4. process the Halo only (ca. 2 Seconds)
Measurements showed that the communication and processing the Array-core nearly take the same time for common workloads. So asynchronism should nearly hide the communication time.
But it dosn't.
One fact - and I thinks this could be the problem - is that the sendbuffer is also the array the calculations are made on. Is it possible that MPI serializes the memory-access although communication ONLY accesses the Halo (with derived datatype) and the computation ONLY accesses the core (only reading) of the array???
Does anybody know if this is for sure the reason?
Is it maybe implementation-dependend (I'm using OpenMPI)?
Thanks in advance.
It isn't the case that MPI serializes the memory accesses in the user code (that's beyond the library's power to do, in general), and it is true that what exactly does happen is implementation specific.
But as a practical matter, MPI libraries don't do as much communication "in the background" as you might hope, and this is particularly true when using transports and networks like tcp + ethernet, where there's no meaningful way to hand off communication to another set of hardware.
You can only be sure that the MPI library is actually doing something when you're running MPI library code, eg in an MPI function call. Often, a call to any of a number of MPI calls will nudge an implementations "progress engine" that keeps track of in-flight messages and ushers them along. So for instance one thing you can quickly do is to make calls to MPI_Test() on the requests within the compute loop to make sure things start happening well before the MPI_Wait(). There is of course overhead to this, but this is something that's easy to try to measure.
Of course you could imagine the MPI library would use some other mechanism to run things behind the scenes. Both MPICH2 and OpenMPI have played with separate "progress threads" which execute separately from the user code and do this ushering along in the background; but getting that to work well, and without tying up a processor while you're trying to run your computation, is a genuinely difficult problem. OpenMPI's progress threads implementation has long been experimental, and in fact is temporarily out of the current (1.6.x) release, although work continues. I'm not sure about MPICH2's support.
If you are using infiniband, where the network hardware has a lot of intelligence to it, then prospects brighten a bit. If you are willing to leave memory pinned (for the openfabrics), and/or you can use a vendor-specific module (mxm for Mellanox, psm for Qlogic), then things can progress somewhat more rapidly. If you're using shared memory, than the knem kernel module can also help with intranode transport.
One other implementation-specific approach you can take, if memory isn't a big issue, is to try to use eager protocols for sending the data directly, or send more data per chunk so fewer nudges of the progress engine are needed. What eager protocols means here is that data is automatically sent at send time, rather than just initiating a set of handshakes which will eventually lead to the message being sent. The bad news is that this generally requires extra buffer memory for the library, but if that's not a problem and you know the number of incoming messages is bounded (eg, by the number of halo neighbours you have), this can help a great deal. How to do this for (eg) shared memory transport for openmpi is described on the OpenMPI page for tuning for shared memory, but similar parameters exist for other transports and often for other implementations. One nice tool that IntelMPI has is an "mpitune" tool that automatically runs through a number of such parameters for best performance.
The MPI specification states:
A nonblocking send call indicates that the system may start copying
data out of the send buffer. The sender should not modify any part of the
send buffer after a nonblocking send operation is called, until the
send completes.
So yes, you should copy your data to a dedicated send buffer first.

Limiting TCP sends with a "to-be-sent" queue and other design issues

This question is the result of two other questions I've asked in the last few days.
I'm creating a new question because I think it's related to the "next step" in my understanding of how to control the flow of my send/receive, something I didn't get a full answer to yet.
The other related questions are:
An IOCP documentation interpretation question - buffer ownership ambiguity
Non-blocking TCP buffer issues
In summary, I'm using Windows I/O Completion Ports.
I have several threads that process notifications from the completion port.
I believe the question is platform-independent and would have the same answer as if to do the same thing on a *nix, *BSD, Solaris system.
So, I need to have my own flow control system. Fine.
So I send send and send, a lot. How do I know when to start queueing the sends, as the receiver side is limited to X amount?
Let's take an example (closest thing to my question): FTP protocol.
I have two servers; One is on a 100Mb link and the other is on a 10Mb link.
I order the 100Mb one to send to the other one (the 10Mb linked one) a 1GB file. It finishes with an average transfer rate of 1.25MB/s.
How did the sender (the 100Mb linked one) knew when to hold the sending, so the slower one wouldn't be flooded? (In this case the "to-be-sent" queue is the actual file on the hard-disk).
Another way to ask this:
Can I get a "hold-your-sendings" notification from the remote side? Is it built-in in TCP or the so called "reliable network protocol" needs me to do so?
I could of course limit my sendings to a fixed number of bytes but that simply doesn't sound right to me.
Again, I have a loop with many sends to a remote server, and at some point, within that loop I'll have to determine if I should queue that send or I can pass it on to the transport layer (TCP).
How do I do that? What would you do? Of course that when I get a completion notification from IOCP that the send was done I'll issue other pending sends, that's clear.
Another design question related to this:
Since I am to use a custom buffers with a send queue, and these buffers are being freed to be reused (thus not using the "delete" keyword) when a "send-done" notification has been arrived, I'll have to use a mutual exlusion on that buffer pool.
Using a mutex slows things down, so I've been thinking; Why not have each thread have its own buffers pool, thus accessing it , at least when getting the required buffers for a send operation, will require no mutex, because it belongs to that thread only.
The buffers pool is located at the thread local storage (TLS) level.
No mutual pool implies no lock needed, implies faster operations BUT also implies more memory used by the app, because even if one thread already allocated 1000 buffers, the other one that is sending right now and need 1000 buffers to send something will need to allocated these to its own.
Another issue:
Say I have buffers A, B, C in the "to-be-sent" queue.
Then I get a completion notification that tells me that the receiver got 10 out of 15 bytes. Should I re-send from the relative offset of the buffer, or will TCP handle it for me, i.e complete the sending? And if I should, can I be assured that this buffer is the "next-to-be-sent" one in the queue or could it be buffer B for example?
This is a long question and I hope none got hurt (:
I'd loveeee to see someone takes the time to answer here. I promise I'll double-vote for him! (:
Thank you all!
Firstly: I'd ask this as separate questions. You're more likely to get answers that way.
I've spoken about most of this on my blog: http://www.lenholgate.com but then since you've already emailed me to say that you read my blog you know that...
The TCP flow control issue is such that since you are posting asynchronous writes and these each use resources until they complete (see here). During the time that the write is pending there are various resource usage issues to be aware of and the use of your data buffer is the least important of them; you'll also use up some non-paged pool which is a finite resource (though there is much more available in Vista and later than previous operating systems), you'll also be locking pages in memory for the duration of the write and there's a limit to the total number of pages that the OS can lock. Note that both the non-paged pool usage and page locking issues aren't something that's documented very well anywhere, but you'll start seeing writes fail with ENOBUFS once you hit them.
Due to these issues it's not wise to have an uncontrolled number of writes pending. If you are sending a large amount of data and you have a no application level flow control then you need to be aware that if you send data faster than it can be processed by the other end of the connection, or faster than the link speed, then you will begin to use up lots and lots of the above resources as your writes take longer to complete due to TCP flow control and windowing issues. You don't get these problems with blocking socket code as the write calls simply block when the TCP stack can't write any more due to flow control issues; with async writes the writes complete and are then pending. With blocking code the blocking deals with your flow control for you; with async writes you could continue to loop and more and more data which is all just waiting to be sent by the TCP stack...
Anyway, because of this, with async I/O on Windows you should ALWAYS have some form of explicit flow control. So, you either add application level flow control to your protocol, using an ACK, perhaps, so that you know when the data has reached the other side and only allow a certain amount to be outstanding at any one time OR if you cant add to the application level protocol, you can drive things by using your write completions. The trick is to allow a certain number of outstanding write completions per connection and to queue the data (or just don't generate it) once you have reached your limit. Then as each write completes you can generate a new write....
Your question about pooling the data buffers is, IMHO, premature optimisation on your part right now. Get to the point where your system is working properly and you have profiled your system and found that the contention on your buffer pool is the most important hot spot and THEN address it. I found that per thread buffer pools didn't work so well as the distribution of allocations and frees across threads tends not to be as balanced as you'd need to that to work. I've spoken about this more on my blog: http://www.lenholgate.com/blog/2010/05/performance-comparisons-for-recent-code-changes.html
Your question about partial write completions (you send 100 bytes and the completion comes back and says that you have only sent 95) isn't really a problem in practice IMHO. If you get to this position and have more than the one outstanding write then there's nothing you can do, the subsequent writes may well work and you'll have bytes missing from what you expected to send; BUT a) I've never seen this happen unless you have already hit the resource problems that I detail above and b) there's nothing you can do if you have already posted more writes on that connection so simply abort the connection - note that this is why I always profile my networking systems on the hardware that they will run on and I tend to place limits in MY code to prevent the OS resource limits ever being reached (bad drivers on pre Vista operating systems often blue screen the box if they can't get non paged pool so you can bring a box down if you don't pay careful attention to these details).
Separate questions next time, please.
Q1. Most APIs will give you "write is possible" event, after you last wrote and writing is available again (can happen immediately if you failed to fill major part of send buffer with the last send).
With completion port, it will arrive just as "new data" event. Think of new data as "read Ok", so there's also a "write ok" event. Names differ between the APIs.
Q2. If a kernel mode transition for mutex acquisition per chunk of data hurts you, I recommend rethinking what you are doing. It takes 3 microseconds at most, while your thread scheduler slice may be as big as 60 milliseconds on windows.
It may hurt in extreme cases. If you think you are programming extreme communications, please ask again, and I promise to tell you all about it.
To address your question about when it knew to slow down, you seem to lack an understanding of TCP congestion mechanisms. "Slow start" is what you're talking about, but it's not quite how you've worded it. Slow start is exactly that -- starts off slow, and gets faster, up to as fast as the other end is willing to go, wire line speed, whatever.
With respect to the rest of your question, Pavel's answer should suffice.

Resources