lwIP 1.4.1 stuck in endless while loop in tcp_fasttmr

Core: Cortex-M7
Microcontroller: stm32f765zi
IP Stack: lwIP 1.4.1
I have an STM32F7-based embedded system with a TCP server set up using lwIP, which accepts 3 connections at the application layer.
The server runs as expected for a few hours and randomly gets stuck in the while loop of the tcp_fasttmr function shown below:
void
tcp_fasttmr(void)
{
  struct tcp_pcb *pcb;
  ++tcp_timer_ctr;
tcp_fasttmr_start:
  pcb = tcp_active_pcbs;
  while(pcb != NULL) {   /* <-- execution never leaves this loop */
    if (pcb->last_timer != tcp_timer_ctr) {
      struct tcp_pcb *next;
      pcb->last_timer = tcp_timer_ctr;
      /* send delayed ACKs */
      if (pcb->flags & TF_ACK_DELAY) {
        LWIP_DEBUGF(TCP_DEBUG, ("tcp_fasttmr: delayed ACK\n"));
        tcp_ack_now(pcb);
        tcp_output(pcb);
On investigating, I found that pcb->last_timer and tcp_timer_ctr both had the same value of 58:
pcb->last_timer = tcp_timer_ctr = 58
It would be a great help if anybody could explain the reason for this behaviour of lwIP.
Further investigation also showed that two of the 3 client connections were pointing to the same pcb. I am not sure how this happens either, because the Wireshark traces show that the 3 different clients were communicating successfully until the code got stuck in the while loop.
I found on one of the forums that I could make the while loop finite by adding an arbitrary timeout counter. But since this issue occurs randomly after a few hours, I can't tell whether I have really solved it. This is why I would like to know the reason this is happening.

I found the following answer here: https://lists.gnu.org/archive/html/lwip-users/2012-12/msg00003.html
This is a common bug in lwIP ports that do not obey lwIP's threading
requirements. The reason for pcb->next pointing to pcb is most often
that more than one execution context calls lwIP functions at the same
time. In your case (no OS), I guess you are using lwIP from the main
loop and from an ISR (e.g. feeding packets into ethernet_input or
ip_input from interrupt level or calling lwIP timer functions from
interrupt level), which is not supported.
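To make the quoted advice concrete, here is a minimal sketch of one common way to keep lwIP in a single execution context on a bare-metal port: the Ethernet ISR only queues received frames, and the main loop is the only place that hands them to the stack. The names rx_ring, rx_ring_put_from_isr and rx_ring_get are hypothetical (they are not lwIP API), and the sketch assumes a single core with exactly one ISR producer and one main-loop consumer.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SIZE 8u  /* hypothetical depth; must be a power of two */

/* Single-producer / single-consumer queue of received frame pointers. */
struct rx_ring {
    void *volatile frame[RX_RING_SIZE];
    volatile unsigned head;  /* advanced only by the ISR */
    volatile unsigned tail;  /* advanced only by the main loop */
};

/* ISR side: store the frame and return. No lwIP function is called here.
 * Returns 1 on success, 0 if the ring is full (the frame must be dropped). */
int rx_ring_put_from_isr(struct rx_ring *r, void *frame)
{
    unsigned head = r->head;

    assert(frame != NULL);  /* NULL is reserved to mean "ring empty" */
    if (head - r->tail == RX_RING_SIZE) {
        return 0;
    }
    r->frame[head % RX_RING_SIZE] = frame;
    r->head = head + 1;     /* publish only after the slot is written */
    return 1;
}

/* Main-loop side: fetch the next queued frame, or NULL if none is pending.
 * The caller then passes the frame to the stack from the main loop only. */
void *rx_ring_get(struct rx_ring *r)
{
    unsigned tail = r->tail;
    void *frame;

    if (tail == r->head) {
        return NULL;
    }
    frame = r->frame[tail % RX_RING_SIZE];
    r->tail = tail + 1;
    return frame;
}
```

In the main loop you would call rx_ring_get() until it returns NULL, pass each frame to your port's input function, and call the lwIP timer functions from that same loop, so the timers and the input path can never interrupt each other.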

Related

How to use a GPIO pin, for serial flow control with Qt?

THE GOAL
In my Qt application, I need to control a GPIO pin, depending on data being sent over the serial bus. So, I need to set it to HIGH for as long as I transmit data, and to LOW, immediately after the transmission ends. Consider it as a serial communication flow control pin, which when set to 1 it enables transmission, and when set to 0 enables receive of data. The entire system is half-duplex and communicates in a master-slave fashion.
THE PROBLEM
I managed to come close to a solution by setting the pin HIGH immediately before any transmission, introducing a constant delay that depends on the baud rate (I used QThread::usleep()), and then setting it LOW again. However, when I visualized the pulse with an oscilloscope, I was getting random "stretchings" of it (the pin staying HIGH longer than it should).
ATTEMPTED SOLUTIONS
Well, it seems that some "magic" is taking place, which adds some extra delay, on top of the one I have manually defined. In order to get rid of that possibility, I used the bytesWritten() signal, so I can fire my setPinLow() slot when we finish writing the actual data to the port. So my code now looks like this:
classTTY::classTTY(/*someStuff*/) : port(/*some other stuff*/)
{
    s_port = new QSerialPort();
    // bytesWritten() is emitted each time a payload of data has been written to the device
    connect(s_port, SIGNAL(bytesWritten(qint64)), this, SLOT(setPinLow()));
    if (GPIOPin->open(QFile::ReadWrite | QFile::Truncate | QFile::Text | QFile::Unbuffered)) {
        qDebug() << "GPIO pin ready to switch.";
    } else {
        qDebug() << "Failed to access GPIO pin";
    }
}

bool classTTY::sendData(data, replyLength)
{
    // Raise the direction pin before handing the data to the port
    gpioPinEnable(true);
    if (s_port->isOpen()) {
        s_expectedReplyLength = replyLength;
        s_receivedData.clear();
        s_port->flush();
        s_port->write(data);
        return true;
    }
    return false;
}

// Slot connected to bytesWritten(): drop the direction pin
void classTTY::setPinLow()
{
    gpioPinEnable(false);
}

void classTTY::gpioPinEnable(bool enable)
{
    GPIOPin->write(enable ? "1" : "0");
}
After implementing it the pin started to give really short pulses, much more like "spikes", which implies (I think) that now it stays HIGH for as long as the Qt write() process lasts, and not while the actual propagation of the data lasts.
THE QUESTION(S)
1) What is that extra delay being added when I use the naive, QThread::usleep approach, that causes the stretch of the pulse?
2) Why the signal-slot approach is not working, since it is event-driven?
3) In general, how can I instruct the pin to go active ONLY during the transmission of data and then drop again to zero, so I can receive the slave's reply?
What is that extra delay being added when I use the naive, QThread::usleep approach, that causes the stretch of the pulse?
Linux is not a real-time operating system; a thread sleep suspends the thread for no less than the time specified. During the sleep, other threads and processes may run, and they may not yield the processor until well after your sleep period has elapsed, or may not yield at all and consume their entire OS-allocated time slice. Besides that, kernel driver interrupt handlers will always preempt a user-level process. Linux has a build option for real-time scheduling, but the guarantees remain less robust than those of a true RTOS, and latencies are typically worse.
Note also that not only can your thread be suspended for longer than the sleep period, but the transmission itself may take longer than the number of bits divided by the baud rate: the kernel driver can be preempted by other drivers, introducing inter-character gaps over which you have no control.
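For reference, the "number of bits divided by the baud rate" figure can be worked out as in the sketch below. The function name uart_tx_time_us is made up for illustration, and bits_per_frame is assumed to include the start, data, parity and stop bits (10 for 8N1). As explained above, the result is only a lower bound on how long the line is actually busy.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum time, in microseconds (rounded up), needed to shift nbytes out of
 * the UART. bits_per_frame counts start + data + parity + stop bits. */
uint64_t uart_tx_time_us(uint32_t nbytes, uint32_t bits_per_frame, uint32_t baud)
{
    uint64_t bits;

    assert(baud > 0);
    bits = (uint64_t)nbytes * bits_per_frame;
    /* Ceiling division so the estimate never undershoots the wire time. */
    return (bits * 1000000u + baud - 1) / baud;
}
```

For example, 8 bytes at 115200 baud with 8N1 framing gives uart_tx_time_us(8, 10, 115200) == 695, so any fixed usleep() shorter than that is guaranteed to cut the frame off, and any scheduling delay on top of it shows up as the "stretching" seen on the scope.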
Why the signal-slot approach is not working, since it is event-driven?
The documentation for QSerialPort::waitForBytesWritten() states:
This function blocks until at least one byte has been written to the serial port and the bytesWritten() signal has been emitted.
So it is clear that the semantics of this are "some data has been written" rather than "all data has been written". It will return whenever a byte is written; if you then call it again, it will likely return immediately if bytes are still being written (because QSerialPort is buffered and will write data independently of your application).
In general, how can I instruct the pin to go active ONLY during the transmission of data and then drop again to zero, so I can receive the slave's reply?
Unfortunately, Qt is not the answer; this behaviour needs to be implemented in the serial port kernel driver, or at least at a lower level than Qt. The Qt QSerialPort abstraction does not give you the level of control over, or insight into, what actually occurs "on the wire" that you need. It is kept somewhat at arm's length from the hardware, for good reason.
However, there is a simple solution: don't bother! It seems entirely unnecessary. It is a master-slave communication, and as such the data itself is the flow control. The slave does not talk until spoken to, and the master must expect and wait for a reply after it has spoken. Why does the slave need any permission to speak other than that implied by being spoken to?

LWIP: How exactly does the TCP_INTERVAL relate to the reception of ACK Messages?

I am trying to implement a data transfer from an embedded board to a PC. For this, I need to use low latency communication and I am bound to use Ethernet with TCP/IP.
Furthermore, I'm using the lwip stack.
First of all, I disabled Nagle's algorithm, because I have to send small packets of data (10 KB) and I want them to be sent as soon as possible, without waiting for intermediate ACKs.
The Wireshark Log shows me that this is working quite fine (the whole data is being sent to the PC in about 1msec).
After that, the PC takes about 200msec to send the last ACK (because the last Segment is not maximum size).
The problem now is that, on the embedded processor, it takes a very long time until lwIP tells my application that all of the data has been ACKed.
When I decrease the TCP_INTERVAL (to let's say 5), it speeds up greatly.
I am wondering why lwIP behaves like this. I would have thought that the periodic TCP tasks (which are called according to TCP_INTERVAL) have nothing to do with the handling of received frames (which is really a separate call in main).
I hope I have stated my problem understandably; if not, I would appreciate feedback so I can improve my question!
Thanks!
EDIT:
After more debugging, I found out that the process of sending data results in the following function calls:
1) My main calls tcp_write(...).
2) tcp_tmr() is called multiple times (through the LwIP_Periodic_Handle() function). This happens seven times.
3) During the eighth call, tcp_output() is called. During this call, all segments which were added during the last tcp_write() call are sent by calling tcp_output_segment().
So now it is clear that if I reduce the TCP_INTERVAL, of course the data gets sent sooner, because the tcp_tmr() function is called more quickly.
But my question is still: is this the normal behaviour? It seems a bit odd that lwIP waits such a long time before actually sending the data.
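Since your main calls tcp_write(...) directly, call tcp_output() immediately after tcp_write(): tcp_write() only queues the data, and it is tcp_output() that actually sends it.
Alternatively, call tcp_write() from within the tcp_recv callback; lwIP calls tcp_output() itself once the callback returns.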
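A minimal sketch of the first option follows. To keep it compilable without the lwIP sources, the types and the two stack functions at the top are simplified stand-ins written for this example; in a real build you would delete that section and include lwip/tcp.h instead. The wrapper name send_now is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins for this sketch only; NOT the real lwIP definitions. ----
 * In a real project, remove this section and #include "lwip/tcp.h". */
typedef signed char err_t;
#define ERR_OK 0
#define TCP_WRITE_FLAG_COPY 0x01
struct tcp_pcb { unsigned queued; unsigned flushed; };

static err_t tcp_write(struct tcp_pcb *pcb, const void *data,
                       unsigned short len, unsigned char apiflags)
{
    (void)data; (void)apiflags;
    pcb->queued += len;            /* data is only queued, not sent */
    return ERR_OK;
}

static err_t tcp_output(struct tcp_pcb *pcb)
{
    pcb->flushed += pcb->queued;   /* queued segments go out now */
    pcb->queued = 0;
    return ERR_OK;
}
/* ---- End of stand-ins ---- */

/* Queue the data and ask the stack to transmit it right away, instead of
 * waiting for the periodic TCP timer to flush the queue. */
err_t send_now(struct tcp_pcb *pcb, const void *data, unsigned short len)
{
    err_t err;

    assert(pcb != NULL && data != NULL);
    err = tcp_write(pcb, data, len, TCP_WRITE_FLAG_COPY);
    if (err != ERR_OK) {
        return err;                /* e.g. send buffer full: retry later */
    }
    return tcp_output(pcb);
}
```

Like every raw-API call, this has to run in the same context as the rest of your lwIP calls (here, the main loop).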

Arduino WiFi shield stop working at WiFi.status()

UPDATE:
I pinpointed where the problem is coming from. To avoid any complications, I'm using the ScanNetwork example, so I don't even have to put in an SSID. The code stops functioning on the board as soon as it hits WiFi.status().
I have a Serial.println before and after the call to WiFi.status(); the Serial.println after it is never executed, and of course, I'm still not connected.
I've downloaded a fresh copy of the code, and the situation remains the same. I've really run out of ideas....
I'm using the official Arduino WiFi shield, and I have the following code:
status = WiFi.begin([ssid],[pass]);
Serial.println(status);
Status is neither WL_CONNECTED nor WL_IDLE_STATUS, which are the two possible responses outlined in the official reference http://arduino.cc/en/Reference/WiFiBegin
Status is the number 4, and of course, I couldn't connect to WiFi.
What is this????
I've pressed the reset button a million times, is there a more powerful factory restore button?
I've figured it out.
Apparently, there's a jumper, that when you put it in, it'll kick the shield into a DFU-mode that enables reprogramming. And the shield wouldn't be present as a result.
According to WiFi.h, the return values of the begin() functions (all three of them, one for each security scheme) are ints. It is not stated outright for this function, but I believe that, just as with the status() function, the value returned is a wl_status_t. wl_status_t is an enum declared in wl_definitions.h as:
typedef enum {
        WL_NO_SHIELD = 255,
        WL_IDLE_STATUS = 0,
        WL_NO_SSID_AVAIL,
        WL_SCAN_COMPLETED,
        WL_CONNECTED,
        WL_CONNECT_FAILED,
        WL_CONNECTION_LOST,
        WL_DISCONNECTED
} wl_status_t;
So your 4 is WL_CONNECT_FAILED. Probably not surprising to you since, you know, your connection failed.
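If you would rather see the name than count enumerators by hand, a small lookup like the sketch below does the job. It is written as plain C so it stands alone; the enum is copied from the listing above, and wl_status_name is a helper name invented for this example, not part of the WiFi library.

```c
#include <stddef.h>

/* Copied from the wl_definitions.h listing above. */
typedef enum {
    WL_NO_SHIELD = 255,
    WL_IDLE_STATUS = 0,
    WL_NO_SSID_AVAIL,
    WL_SCAN_COMPLETED,
    WL_CONNECTED,
    WL_CONNECT_FAILED,
    WL_CONNECTION_LOST,
    WL_DISCONNECTED
} wl_status_t;

/* Hypothetical helper: map a numeric status to its enumerator name. */
const char *wl_status_name(int status)
{
    switch (status) {
    case WL_NO_SHIELD:       return "WL_NO_SHIELD";
    case WL_IDLE_STATUS:     return "WL_IDLE_STATUS";
    case WL_NO_SSID_AVAIL:   return "WL_NO_SSID_AVAIL";
    case WL_SCAN_COMPLETED:  return "WL_SCAN_COMPLETED";
    case WL_CONNECTED:       return "WL_CONNECTED";
    case WL_CONNECT_FAILED:  return "WL_CONNECT_FAILED";
    case WL_CONNECTION_LOST: return "WL_CONNECTION_LOST";
    case WL_DISCONNECTED:    return "WL_DISCONNECTED";
    default:                 return "UNKNOWN";
    }
}
```

In a sketch you could then print wl_status_name(status) next to the raw number.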
The hobbyist's debugger, AKA the reset button, will only do so much. Printing the status was a good start. Be sure you are using the right flavor of begin() for your security type; you seem to be using the one for WPA. Consider shutting off your router's security completely (if it is safe to do so in your area), or using a spare router, to test whether the shield can communicate at all. Also, this may sound obvious, but check for a misspelling of the SSID.

Windows XP embedded - RS485 problems

We've got a system running XP embedded, with COM2 being a hardware RS485 port.
In my code, I'm setting up the DCB with RTS_CONTROL_TOGGLE. I'd assume that would do what it says... turn off RTS in kernel mode once the write empty interrupt happens. That should be virtually instant.
Instead, we see on a scope that the PC is driving the bus anywhere from 1-8 milliseconds longer than the end of the message. The device that we're talking to is responding in about 1-5 milliseconds. So... communications corruptions galore. No, there's no way to change the target's response time.
We've now hooked up to the RS232 port and connected the scope to the TX and RTS lines, and we're seeing the same thing. The RTS line stays high 1-8 milliseconds after the message is sent.
We've also tried turning off the FIFO, or setting the FIFO depths to 1, with no effect.
Any ideas? I'm about to try manually controlling the RTS line from user mode with REALTIME priority during the "SendFile, clear RTS" cycle. I don't have many hopes that this will work either. This should not be done in user mode.
RTS_CONTROL_TOGGLE does not work (it has a variable 1-15 millisecond delay before turning RTS off after transmit) on our embedded XP platform. It's possible I could get that down if I altered the time quantum to 1 ms using timeBeginPeriod(1), etc., but I doubt it would be reliable or enough to matter. (The device sometimes responds in as little as 1 millisecond.)
The final solution is really ugly but it works on this hardware. I would not use it on anything where the hardware is not fixed in stone.
Basically:
1) set the FIFOs on the serial port's device manager page to off or 1 character deep
2) send your message + 2 extra bytes using this code:
int WriteFile485(HANDLE hPort, void* pvBuffer, DWORD iLength, DWORD* pdwWritten, LPOVERLAPPED lpOverlapped)
{
    // Remember the current priorities so they can be restored afterwards
    DWORD dwOldClass = GetPriorityClass(GetCurrentProcess());
    int iOldPriority = GetThreadPriority(GetCurrentThread());
    // Run at the highest priority for the duration of the write
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    // Drive RTS by hand around the write
    EscapeCommFunction(hPort, SETRTS);
    BOOL bRet = WriteFile(hPort, pvBuffer, iLength, pdwWritten, lpOverlapped);
    EscapeCommFunction(hPort, CLRRTS);
    SetPriorityClass(GetCurrentProcess(), dwOldClass);
    SetThreadPriority(GetCurrentThread(), iOldPriority);
    return bRet;
}
The WriteFile() returns when the last byte or two have been written to the serial port. They have NOT gone out the port yet, thus the need to send 2 extra bytes. One or both of them will get trashed when you do CLRRTS.
Like I said... it's ugly.
Any ideas?
You may find that there's source code for the serial port driver in the DDK, which would let you see how that option is supposed to be implemented: i.e. whether it's at interrupt-level, at DPC-level, or worse.
Other possibilities include rewriting the driver; using a 3rd-party RS485 driver if you can find one; or using 3rd-party RS485 hardware with its own driver (e.g. at least in the past 3rd parties used to make "intelligent serial port boards" with 32 ports, deep buffers, and its own microprocessor; I expect that RS485 is a problem that's been solved by someone).
8 milliseconds does seem like a disappointingly long time; I know that XP isn't a RTOS but I'd expect it to (usually) do better than that. Another thing to look at is whether there are other high-priority threads running which may be interfering. If you've been boosting the priorities of some threads in your own application, perhaps instead you should be reducing the priorities of other threads.
I'm about to try manually controlling the RTS line from user mode with REALTIME priority during the "SendFile, clear RTS" cycle.
Don't let that thread spin out of control: IME a thread like that can, if it's buggy, preempt every other user-mode thread forever.

Receiving image through winsocket

I have a proxy server running on my local machine, used to cache images while surfing. I set up my browser with a proxy to 127.0.0.1, receive the HTTP requests, take the data and send it back to the browser. It works fine for everything except large images. When I receive the image data, the browser only displays half the image (e.g. the top half of the Google logo). Here's my code:
char buffer[1024] = "";
string ret("");
while(true)
{
    valeurRetour = recv(socketClient_, buffer, sizeof(buffer), 0);
    if(valeurRetour <= 0) break;
    string t;
    t.assign(buffer,valeurRetour);
    ret += t;
    longueur += valeurRetour;
}
closesocket(socketClient_);
valeurRetour = send(socketServeur_, ret.c_str(),longueur, 0);
The socketClient_ socket is non-blocking. Any idea how to fix this problem?
You're not making fine enough distinctions among the possible return values of recv.
There are two levels here.
The first is, you're lumping 0 and -1 together. 0 means the remote peer closed its sending half of the connection, so your code does the right thing here, closing its socket down, too. -1 means something happened besides data being received. It could be a permanent error, a temporary error, or just a notification from the stack that something happened besides data being received. Your code lumps all such possibilities together, and on top of that treats them the same as when the remote peer closes the connection.
The second level is that not all reasons for getting -1 from recv are "errors" in the sense that the socket is no longer useful. I think if you start checking for -1 and then calling WSAGetLastError to find out why you got -1, you'll get WSAEWOULDBLOCK, which is normal since you have a non-blocking socket. It means the recv call cannot return data because it would have to block your program's execution thread to do so, and you told Winsock you wanted non-blocking calls.
A naive fix is to not break out of the loop on WSAEWOULDBLOCK but that just means you burn CPU time calling recv again and again until it returns data. That goes against the whole point of non-blocking sockets, which is that they let your program do other things while the network is busy. You're supposed to use functions like select, WSAAsyncSelect or WSAEventSelect to be notified when a call to the API function is likely to succeed again. Until then, you don't call it.
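As a rough illustration of the select-based approach, the sketch below waits for the socket to become readable before each recv call and treats "would block" as "not yet" rather than as end of data. It is written against POSIX sockets so it can be compiled off Windows; under Winsock the shape is the same, with WSAGetLastError() and WSAEWOULDBLOCK taking the place of errno and EWOULDBLOCK. The function name recv_until_closed is made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read from a (possibly non-blocking) socket until the peer closes its
 * sending side or the buffer is full. Returns the number of bytes read,
 * or -1 on a real error. */
long recv_until_closed(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    assert(buf != NULL);
    while (total < cap) {
        fd_set readable;
        ssize_t n;

        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        /* Sleep until data (or end of stream) is available: no busy loop. */
        if (select(fd + 1, &readable, NULL, NULL, NULL) < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }

        n = recv(fd, buf + total, cap - total, 0);
        if (n > 0) {
            total += (size_t)n;
        } else if (n == 0) {
            break;                       /* peer closed: all data received */
        } else if (errno == EWOULDBLOCK || errno == EAGAIN || errno == EINTR) {
            continue;                    /* not an error: wait and try again */
        } else {
            return -1;                   /* genuine failure */
        }
    }
    return (long)total;
}
```

A real proxy would pass a timeout to select and watch several sockets at once, but the key point is the three-way split on the return value of recv: data, orderly close, or an error code that has to be inspected before giving up.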
You might want to visit The Winsock Programmer's FAQ. (Disclaimer: I'm its maintainer.)
Have you analyzed the transaction at the HTTP level i.e. checked Headers?
Are you accounting for things like Chunked transfers?
I do not have a definite answer in part because of the lack of details given here.
