I'm looking for a simple clock synchronization protocol that would be easy to implement with small footprint and that would work also in the absence of internet connection, so that it could be used e.g. within closed laboratory networks. To be clear, I'm not looking for something that can be used just to order events (like vector clocks), but something that would enable processes on different nodes to synchronize their actions based on local clocks. As far as I understand, this would require a solution that can take clock drift into account. Presence of TCP/IP or similar relatively low-latency stream connections can be assumed.
Disclaimer: I'm not an NTP expert by any means. Just a hobbyist having fun on the weekend.
I realize you said you didn't want an NTP implementation, because of the perceived complexity and because an Internet NTP server may not be available in your environment.
However, a simplified NTP look-up may be easy to implement, and if you have a local NTP server you can achieve good synchronization.
Here's how:
Review RFC 5905
You'll see NTP v4 packets look something like:
LI (2 bits)
VN (3 bits) - Use '100' (4)
Mode (3 bits)
Stratum (8 bits)
Poll (8 bits)
Precision (8 bits)
Root Delay (32 bits)
Root Dispersion (32 bits)
Reference Id (32 bits)
Reference Timestamp (64 bits)
Origin Timestamp (64 bits)
Receive Timestamp (64 bits)
Transmit Timestamp (64 bits)
Extension Field 1 (variable)
Extension Field 2 (variable)
...
Key Identifier
Digest (128 bits)
The digest is not required, so forming a valid client request is very easy. Following the guidance in the RFC, use LI = '00', VN = '100' (decimal 4), Mode = '011' (decimal 3).
Using C# to illustrate:
byte[] ntpData = new byte[48];
Array.Clear(ntpData, 0, ntpData.Length);
ntpData[0] = 0x23; // LI = 00, VN = 100, Mode = 011
Open a socket to your target server and send it over.
int ntpPort = 123;
IPEndPoint target = new IPEndPoint(Dns.GetHostEntry(serverDnsName).AddressList[0], ntpPort);
Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
s.Connect(target);
s.Send(ntpData);
In the response, the current time is in the Transmit Timestamp field (bytes 40 through 47). Timestamps are 64-bit unsigned fixed-point numbers: the integer part is the first 32 bits and the fractional part is the last 32 bits. The value is the number of seconds since 0h UTC on Jan-1-1900.
s.Receive(ntpData);
s.Close();
ulong intPart = 0;
ulong fractPart = 0;
for (int i = 0; i < 4; i++)
intPart = (intPart << 8) | ntpData[40 + i];
for (int i = 4; i < 8; i++)
fractPart = (fractPart << 8) | ntpData[40 + i];
To update the clock with (roughly) second granularity, use: number of seconds since 0h Jan-1-1900 = intPart + (fractPart / 2^32). (I say roughly because network latency isn't accounted for, and the integer division below discards the fractional part, so we're rounding down to whole seconds.)
ulong seconds = intPart + (fractPart / 4294967296);
TimeSpan ts = TimeSpan.FromTicks((long)seconds * TimeSpan.TicksPerSecond);
DateTime now = new DateTime(1900, 1, 1);
now = DateTime.SpecifyKind(now, DateTimeKind.Utc);
now += ts;
"now" is now a DateTime with the current time, in UTC.
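For comparison, here is the same simplified look-up as a minimal Python sketch. The function names (build_request, parse_transmit_timestamp, query) are made up for this illustration, and like the C# above it ignores network latency and every other field of the reply. Unlike the C# version, it keeps the fractional part at microsecond resolution.

```python
import socket
import struct
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)  # NTP timestamps count from here


def build_request() -> bytes:
    """48-byte client request: LI = 0, VN = 4, Mode = 3, everything else zero."""
    return bytes([0x23]) + bytes(47)


def parse_transmit_timestamp(packet: bytes) -> datetime:
    """Decode the Transmit Timestamp (bytes 40..47, big-endian 32.32 fixed point)."""
    int_part, frac_part = struct.unpack("!II", packet[40:48])
    microseconds = round(frac_part * 1_000_000 / 2**32)
    return NTP_EPOCH + timedelta(seconds=int_part, microseconds=microseconds)


def query(server: str, timeout: float = 2.0) -> datetime:
    """Send one request to UDP port 123 and return the server's transmit time (UTC)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_request(), (server, 123))
        packet, _ = s.recvfrom(1024)
    return parse_transmit_timestamp(packet)
```

Calling query() with the name of your local NTP server should return a timezone-aware datetime; the parsing function can be exercised without any network at all by feeding it a hand-built 48-byte packet.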
While this might not answer your question, hopefully it makes NTP a little less opaque. =)
I was able to implement a pared-down version of the Precision Time Protocol very quickly and easily based solely on the Wikipedia article. If all you are interested in is synchronizing them with each other as opposed to synchronizing them with the outside world, you should be able to get millisecond accuracy with minimal effort.
The fundamental basics of the protocol involve the following:
A master clock broadcasts a sync message with the timestamp of when he sent the message (T1).
The clients record the time at which they received the sync message as T1'.
The clients send a delay request back to the master and record the time they sent the message as T2.
The master responds to the delay request with the time he received the message. This time is T2'.
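Each client computes its offset from the master as (T1' - T1 - T2' + T2)/2 and corrects its clock by that amount. The offset is client time minus master time, so it is subtracted from the client clock; the formula assumes the network delay is the same in both directions.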
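The arithmetic behind these steps can be sketched in a few lines of Python. This is only an illustration of the formula, not a PTP implementation; the function name is made up here, and a symmetric network delay is assumed.

```python
def ptp_offset_and_delay(t1, t1_prime, t2, t2_prime):
    """Return (offset, delay) from one sync / delay-request exchange.

    t1       master clock when the sync message was sent
    t1_prime client clock when the sync message arrived
    t2       client clock when the delay request was sent
    t2_prime master clock when the delay request arrived
    offset is client time minus master time.
    """
    master_to_client = t1_prime - t1   # equals delay + offset
    client_to_master = t2_prime - t2   # equals delay - offset
    offset = (master_to_client - client_to_master) / 2
    delay = (master_to_client + client_to_master) / 2
    return offset, delay


# Example: the client runs 5 units ahead and the one-way delay is 2 units.
offset, delay = ptp_offset_and_delay(t1=100, t1_prime=107, t2=110, t2_prime=107)
corrected_client_time = 110 - offset   # what the client clock should have read
```

In the example the exchange yields an offset of 5 and a delay of 2, so the client subtracts 5 from its clock.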
If you need better stability, you can implement a phase-locked loop, a linear regression, or something similar to better control your jitter and avoid large swings due to network lag. There are a number of more complicated features specified by the protocol, but whether you want to implement them depends on how close 'good enough' is.
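One possible reading of the linear-regression idea, as a rough sketch: keep a window of (local time, measured offset) pairs, fit a straight line through them, and use the slope as a drift estimate so that a single delayed packet does not swing the clock. The function names and the sample values are hypothetical.

```python
def fit_offset_trend(samples):
    """Least-squares line through (local_time, measured_offset) pairs.

    Returns (drift, offset_at_zero), where drift is the change in offset
    per unit of local time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    drift = cov / var_t
    return drift, mean_o - drift * mean_t


def predicted_offset(drift, offset_at_zero, local_time):
    """Offset expected at local_time according to the fitted line."""
    return offset_at_zero + drift * local_time


# Hypothetical measurements: the offset grows by 1 ms every 10 s.
drift, base = fit_offset_trend([(0, 1.000), (10, 1.001), (20, 1.002)])
```

Between exchanges the client can then correct by predicted_offset(...) instead of waiting for the next measurement.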
ntp is the right tool for the job. You do not need an internet connection, and for an extra $105 and a few hours of your life, you can even be GPS synchronized for an absolute time reference without an internet connection, though that appears to not be important to you.
Ignoring the slight additional complexity of GPS synchronization, you can get synchronized to a chosen system's clock using a few configuration file lines (four lines on each client, five lines on the server). The ntpd binary is 505 kB on my system. You can also use ntpdate, which can be run periodically to adjust the system clock (zero lines of configuration on the client, other than the call to the ntpdate application with the right arguments). That binary is 80 kB. There is also the SNTP protocol, which allows even smaller footprints for embedded applications (talking to a normal NTP server). There is also an alternate NTP implementation called chrony.
There is also a program called rdate (typically only on older systems, though source is available) which works kinda like ntpdate but much less precisely. You also need an RFC 868 server, often provided in inetd.
The only other alternative is the Precision Time Protocol already mentioned.
Might Precision Time Protocol fit the bill? It doesn't look real simple, but it seems to do more or less exactly what you are asking for. (There are some open source implementations referenced on the Wikipedia page.)
I think the problem is that this is an inherently tricky problem, so the solutions tend to be complex. NTP is trying to provide a correct absolute time, which definitely goes beyond what you need, but it does have the advantage of being well known and widely implemented.
Might using http://www.ietf.org/rfc/rfc5905.txt be appropriate?
Even if it is much more than what you need, you could certainly implement a "compatible" client that works with an NTP server (even if you run your own NTP server), but where the client implementation is purposely naive?
E.g., if you don't care about small time adjustments, don't implement them. If you don't care about bidirectional synchronisation, don't implement that, etcetera.
(Be warned: Most of the functionality present in that RFC is there for a reason - Accurate time synchronisation has many pitfalls - including the fact that many OS's do not like it if the time suddenly changes)
Not really a proper answer, but just a reminder to make sure that you understand exactly what the hardware clock sources are and any caveats about them - especially if you are planning to use some slightly exotic possibilities like the low-power CPU / RTOS combination you mention.
Even the x86 case has at least 2 or 3 clocks which could be in use, depending on the setup - all with different properties.
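As a small illustration at the OS level (this does not identify the underlying hardware source such as a particular timer chip), Python can report which clocks the runtime uses and what properties they claim. The helper name describe_clocks is made up for this sketch.

```python
import time


def describe_clocks(names=("time", "monotonic", "perf_counter")):
    """Collect the properties Python reports for each named clock."""
    report = {}
    for name in names:
        info = time.get_clock_info(name)
        report[name] = {
            "implementation": info.implementation,  # underlying OS call
            "resolution": info.resolution,          # in seconds
            "monotonic": info.monotonic,            # can it go backwards?
            "adjustable": info.adjustable,          # can it be stepped or slewed?
        }
    return report


if __name__ == "__main__":
    for name, props in describe_clocks().items():
        print(name, props)
```

The point is only that "the clock" is several clocks with different guarantees, so a synchronization scheme should be explicit about which one it reads and which one it adjusts.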
Related
I'm currently setting up an AZURE RTOS (ThreadX on STM32), with Ethernet, SPI and ADCs activated.
This STM32 has to pass-through configuration information from time to time, coming from my PC over the Ethernet-Port.
It has to pass these information via SPI to two other STM32, which makes the first STM32 the system-controller / system-interface. This will be a low-priority task, since the activation of the passed configuration will be started by sync-lines, running from the system-controller to the two other STMs.
While doing so, the system-controller has to read-in ADC values constantly and pass them via Ethernet / TCP to my computer.
I've used the ThreadX TCP server example, as given by STM, as a starting point.
From there I've managed to set up three servers on three ports, communicating successfully with a Python script on my PC (as a first test).
Now come the two great questions:
1)
Since my input signal may contain frequencies up to 2.5 MHz, I want to digitize this signal with the full 5 MSPS (Nyquist), which ADC3 is capable of.
The smallest internally available data type at full resolution is uint16_t, which makes the data rate work out to R = 16 bit * 5 MSPS = 80 MBit/s. (This is the worst case; I bet there is optimization possible, e.g. 8 bits of resolution, which halves the data rate, although that resolution might not be enough. Another option is 16 bits with an FFT on the uC afterwards, which would also be sufficient since I'm mostly interested in energy per frequency band, but initially I wanted to do this on my computer for best flexibility.)
Even if the Ethernet-IF is capable of doing 100MBit/s, the TCP layer of NetXDuo, I bet, is not.
(There is also USB OTG on this board available, but since networked devices are in my opinion more versatile, I prefer using Ethernet ... nevertheless, USB might be a backup solution)
From my measurements, a data stream transmitted to the uC via TCP from within Python, and mirrored back within a thread to my PC, allows for a relatively consistent 20 MBit/s.
... How do I push this speed to a better level?
(I think 20MBit/s is the back-and-forth data-rate, so one-way may be faster)
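For reference, the arithmetic behind these figures as a small Python check. It only compares raw payload rates and ignores TCP/IP header overhead; the function name is made up for this sketch.

```python
def required_mbit_per_s(sample_rate_sps, bits_per_sample):
    """Raw payload rate in MBit/s for a continuous sample stream."""
    return sample_rate_sps * bits_per_sample / 1e6


SAMPLE_RATE = 5_000_000   # 5 MSPS, from the ADC3 figure above
MEASURED_MBIT = 20        # throughput seen in the echo test

for bits in (16, 8):
    need = required_mbit_per_s(SAMPLE_RATE, bits)
    print(f"{bits} bit: {need:.0f} MBit/s, {need / MEASURED_MBIT:.1f}x the measured rate")
```

Even at 8 bits per sample the stream needs twice the measured echo throughput, so either the link has to get faster or the data has to be reduced on the uC.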
However. Second question:
2)
The ADC within the STM is capable of storing data via DMA to memory.
There are two callbacks available, one at half-full, one at full buffer state.
My problem is mostly about the way of reading out the DMA and/or triggering the conversion in the first place.
How do you do this the "right" way on an RTOS (such that you don't break the RT in RTOS)?
I see some options here, what are the pros/cons you can think of?
a) Let the ADC run freely, calling the call-backs at the respective fill-levels, triggering a TCP-transmission whenever one of the call-backs is reached
-> may lead to glitches due to insufficient speed of the TCP layer in my opinion.
b) Let the ADC conversion be triggered by a thread, which is preempted and will later TCP-transmit the data, as soon as the memory-buffer is full
-> may lead to inconsistency in the converted values, since you get burst-style conversions, with gaps in between, while the buffer is read
c) Let a thread trigger each conversion individually
-> A no-go I think, since threads are not triggered that often, to get a decent sample-frequency
d) Let a free-running ADC trigger callbacks, let a thread do the FFT, transmit within another thread the data via TCP
-> May work, but is less flexible, since the data gets crunched within the uC.
--> Are there other ways you can think of / what do you think about the ways I named here?
--> What do you think about question 1)?
Have a nice day!
I'm developing a web front end for a GNU Radio application developed by a colleague.
I have a TCP client connecting to the output of two TCP Sink blocks, and the data encoding is not as I expect it to be.
One TCP Sink is sending complex data and the other is sending float data.
I'm decoding the data at the client by reading each 4-byte chunk as a float32 value. The server and the client are both little-endian systems, but I also tried byte swapping (with the GNU Radio Endian Swap block and also manually at the client), and the data is still not right. Actually it's much worse then, confirming there is no byte order mismatch.
When I execute the flow graph in GNU Radio Companion with appropriate GUI elements, the plots look correct. The data values are shown, as expected, to be between 0 and 10.
However the values decoded at the client are generally around 0.00xxxxx, and the plot looks like noise rather than showing a simple tone as is seen in GNU Radio. If I manually scale the data by multiplying by 1000 it still looks like noise.
I'll describe the pre-D path in GNU Radio since it's shorter, but I see the same problem on the post-D path, where a WBFM Receive and a Rational Resampler are added, followed by a Throttle block and then a TCP Sink block sending float data.
File Source (Output Type: complex, vector length: 1) =>
Throttle (vector length: 1) =>
Low Pass Filter (FIR Type: Complex->Complex (Decimating)) =>
Throttle (vector length: 1) =>
TCP Sink (input type: complex, vector length: 1).
This seems to be the correct way to specify the stream parameters (and indeed Companion shows errors if I make changes which mismatch the stream items), but I can find no way to decode the data correctly on the other end of the stream.
"the historic RFC 1700 (also known as Internet standard STD 2) has defined the network order for protocols in the Internet protocol suite to be big-endian , hence the use of the term 'network byte order' for big-endian byte order."
see https://en.wikipedia.org/wiki/Endianness
Having mentioned that the network order for protocols is big-endian, this actually says nothing about the byte order of the network payload itself.
Also note: Sun Microsystems made big-endian native byte order computers (upon which much Internet protocol development was done).
I am surprised the previous answer has gone this long without a lesson on network byte order versus native byte order.
GNU Radio appears to assume native byte order from a UDP Source block.
Examining the datatype color codes in Help->Types of GNU Radio Companion, the orange colored 'float' connections are float32.
To verify a computer's native byte order, in Python, do:
from sys import byteorder
byteorder
The result will be 'little' or 'big'.
It is possible that, no matter what type of floats you are sending, the bytes are put on the network in little-endian order. I had a similar problem with a UDP connection, and I solved it by parsing the floats as little-endian on the client side.
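A minimal client-side decoding sketch in Python, under two assumptions that are not confirmed in this thread: the sink sends raw little-endian float32 values with no framing, and each complex item is an interleaved pair of float32 values (real part, then imaginary part). Because TCP delivers an arbitrary number of bytes per read, the sketch also hands back any incomplete item so it can be prepended to the next chunk; decoding from a misaligned position is one way to end up with noise-like values. All function names are made up here.

```python
import struct


def decode_stream(buffer: bytes, item_format: str, byteorder: str = "<"):
    """Decode as many whole items as possible; return (items, leftover_bytes)."""
    fmt = byteorder + item_format
    size = struct.calcsize(fmt)
    usable = len(buffer) - len(buffer) % size
    items = list(struct.iter_unpack(fmt, buffer[:usable]))
    return items, buffer[usable:]


def decode_floats(buffer: bytes, byteorder: str = "<"):
    """float32 stream -> (list of float, leftover bytes)."""
    items, rest = decode_stream(buffer, "f", byteorder)
    return [value for (value,) in items], rest


def decode_complex(buffer: bytes, byteorder: str = "<"):
    """Interleaved float32 pairs -> (list of complex, leftover bytes)."""
    items, rest = decode_stream(buffer, "ff", byteorder)
    return [complex(re, im) for re, im in items], rest
```

In a receive loop, keep the leftover bytes and concatenate them in front of the next chunk before decoding again. Passing byteorder=">" gives the big-endian interpretation for comparison.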
I am using the ReadStream interface to sample at 100 Hz, and I have been able to integrate the interface into the Oscilloscope application. I just have a doubt about the way I pass the buffer values on to the packet to be transmitted. Currently this is how I am doing it:
uint8_t i=0;
event void ReadStream.bufferDone( error_t result,uint16_t* buffer, uint16_t count )
{
if (reading < count )
i++;
local.readings[reading++] = buffer[i];
}
I have defined a buffer size of 50. I am not sure this is the way to do it, as I am noticing just one sample per packet even though I have set Nreadings=2.
Also, the sampling rate does not seem to be 100 samples/second when I check. I am not doing something right in the way I pass data to the packet to be transmitted.
I think I need to clarify a few things according to your questions and comments.
Reading a single sample from an accelerometer on micaZ motes works as follows:
Turn on the accelerometer.
Wait 17 milliseconds. According to the ADXL202E (the accelerometer) datasheet, startup time is 16.3 ms. This is because this particular hardware is capable of providing first reading not immediately after being powered on, but with some delay. If you decrease this delay, you will likely get a wrong reading, however, the behavior is undefined, so you may sometimes get a correct reading or the result may depend on environment conditions, such as ambient temperature. Changing this 17-ms delay to a lower value is certainly a bad idea.
Read values (in two axes) from the Analog to Digital Converter (ADC), which is an MCU component that converts the analog output voltage of the accelerometer to a digital value (an integer). The speed at which the ADC can sample is independent of the parameters of the accelerometer: it is another piece of hardware.
Turn off the accelerometer.
This is what happens when you call Read.read() in your code. You see that the maximum frequency at which you can sample is once every 17 ms, that is, about 58 samples per second. It may be even a bit lower because of some overhead from the MCU or inaccuracy of timers. This is true when you sample by calling Read.read() in a loop or at a fixed interval, because this call itself lasts no less than 17 ms (I mean the delay between the command and the event).
What you may want to do is:
Turn on the accelerometer.
Wait 17 ms.
Perform series of reads.
Turn off the accelerometer.
If you do so, you have one 17-ms delay for a set of samples instead of such delay for each sample. What is important, these steps have nothing to do with the interface you use for performing readings. You may call Read.read() multiple times in your application, however, it cannot be the same implementation of the read command that is already implemented for this accelerometer, because the existing implementation is responsible for turning on and off the accelerometer, and it waits 17 ms before reading each sample. For convenience, you may implement the ReadStream interface instead and call it once in your application.
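The difference between the two approaches is easy to put into numbers. The following Python sketch only models the timing described above; the function names are made up, and the buffer size of 50 and the 10 ms period are taken from the question as an example.

```python
SETTLE_MS = 17.0  # accelerometer start-up delay used in this answer


def max_rate_power_cycled(conversion_ms=0.0):
    """Samples per second when every single read pays the start-up delay."""
    return 1000.0 / (SETTLE_MS + conversion_ms)


def burst_duration_ms(n_samples, period_ms):
    """Time for n samples when the delay is paid once, then samples follow at a fixed period."""
    return SETTLE_MS + n_samples * period_ms


per_read_limit = max_rate_power_cycled()     # a little under 59 samples/s
one_buffer = burst_duration_ms(50, 10.0)     # 50 samples at 100 Hz
```

With the delay paid once, a 50-sample buffer at 100 Hz takes 517 ms; paying it per sample would need at least 50 * 17 = 850 ms for the same buffer.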
Moreover, you wrote that ReadStream used a microsecond timer and is independent from the 17-ms settling time of the ADC. That sentence is completely wrong. First of all, you cannot say that an interface uses or does not use a timer. The interface is just a set of commands and events without their definitions. A particular implementation of the interface may use timers. The Read and ReadStream interfaces may be implemented multiple times on different platforms by various hardware components, such as accelerometers, thermometers, hygrometers, magnetometers, and so on. Secondly, the 17-ms settling time refers to the accelerometer, not the ADC. And no matter which interface you use, Read or ReadStream, and which timers a driver uses, milli- or microsecond, the 17-ms delay is always required after powering on the accelerometer. As I mentioned, you probably want to make this delay once per multiple reads instead of once per a single read.
It seems that the TinyOS source code already contains an implementation of the accelerometer driver providing the ReadStream interface which allows you to sample continuously. Look at the AccelXStreamC and AccelYStreamC components (in tos/sensorboards/mts300/).
The ReadStream interface consists of two commands. postBuffer(val_t *buf, uint16_t count) is called to provide a buffer for samples. In the accelerometer driver, val_t is defined as uint16_t. You may post multiple buffers, one by one. This command does not yet start sampling and filling buffers. For that purpose, there is a read(uint32_t usPeriod) command, which directs the device to start filling buffers by sampling with the specified period (in microseconds). When a buffer is full, you get an event bufferDone(error_t result, val_t *buf, uint16_t count) and a component starts filling a next buffer, if any. If there are no buffers left, you get additionally an event readDone(error_t result, uint32_t usActualPeriod), which passes to your application a parameter usActualPeriod, which indicates an actual sampling period and may be different (especially, higher) from a period you requested when calling read due to some hardware constraints.
So the solution is to use the ReadStream interface provided by AccelXStreamC and AccelYStreamC (or maybe some higher-level components that use them) and pass the expected period in microseconds to the read command. If the actual period is higher than the one you requested, this means that sampling at the higher rate is impossible, either due to hardware constraints or because it was not implemented in the ADC driver. In the second case, you may try to fix the driver, although it requires good knowledge of low-level programming. The ADC driver source code for this platform is located in tos/chips/atm128/adc.
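As a quick aid for picking the argument to read and interpreting usActualPeriod, here is the unit conversion as a Python sketch (the helper names are made up; the actual calls are of course made in nesC on the mote).

```python
def period_us_for_rate(rate_hz):
    """Sampling period in microseconds to request for a given rate."""
    return round(1_000_000 / rate_hz)


def achieved_rate_hz(us_actual_period):
    """Sampling rate implied by the period reported in readDone."""
    return 1_000_000 / us_actual_period


def rate_was_met(requested_us, actual_us):
    """True if the driver sampled at least as fast as requested."""
    return actual_us <= requested_us
```

For the 100 Hz target in the question, period_us_for_rate(100) gives 10000, which is the value to pass as usPeriod.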
We've got a system running XP embedded, with COM2 being a hardware RS485 port.
In my code, I'm setting up the DCB with RTS_CONTROL_TOGGLE. I'd assume that would do what it says... turn off RTS in kernel mode once the write empty interrupt happens. That should be virtually instant.
Instead, we see on a scope that the PC is driving the bus anywhere from 1-8 milliseconds longer than the end of the message. The device that we're talking to is responding in about 1-5 milliseconds. So... communication corruption galore. No, there's no way to change the target's response time.
We've now hooked up to the RS232 port and connected the scope to the TX and RTS lines, and we're seeing the same thing. The RTS line stays high 1-8 milliseconds after the message is sent.
We've also tried turning off the FIFO, or setting the FIFO depths to 1, with no effect.
Any ideas? I'm about to try manually controlling the RTS line from user mode with REALTIME priority during the "SendFile, clear RTS" cycle. I don't have many hopes that this will work either. This should not be done in user mode.
RTS_CONTROL_TOGGLE does not work (has a variable 1-15 millisecond delay before turning it off after transmit) on our embedded XP platform. It's possible I could get that down if I altered the time quantum to 1 ms using timeBeginPeriod(1), etc., but I doubt it would be reliable or enough to matter. (The device sometimes responds in about 1 millisecond.)
The final solution is really ugly but it works on this hardware. I would not use it on anything where the hardware is not fixed in stone.
Basically:
1) set the FIFOs on the serial port's device manager page to off or 1 character deep
2) send your message + 2 extra bytes using this code:
int WriteFile485(HANDLE hPort, void* pvBuffer, DWORD iLength, DWORD* pdwWritten, LPOVERLAPPED lpOverlapped)
{
    // Remember the current scheduling settings so they can be restored afterwards.
    int iOldClass = GetPriorityClass(GetCurrentProcess());
    int iOldPriority = GetThreadPriority(GetCurrentThread());
    // Raise the priority to reduce the chance of being preempted between the write and CLRRTS.
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    EscapeCommFunction(hPort, SETRTS);   // assert RTS to enable the RS485 transmitter
    BOOL bRet = WriteFile(hPort, pvBuffer, iLength, pdwWritten, lpOverlapped);
    EscapeCommFunction(hPort, CLRRTS);   // release the bus as soon as WriteFile returns
    SetPriorityClass(GetCurrentProcess(), iOldClass);
    SetThreadPriority(GetCurrentThread(), iOldPriority);
    return bRet;
}
The WriteFile() returns when the last byte or two have been written to the serial port. They have NOT gone out the port yet, thus the need to send 2 extra bytes. One or both of them will get trashed when you do CLRRTS.
Like I said... it's ugly.
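To get a feel for how long the padding bytes occupy the wire, the character time can be computed from the line settings. This is a small Python sketch; the baud rates are hypothetical examples, since the port settings are not stated here, and the function name is made up.

```python
def char_time_ms(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Time on the wire for one character, including the start bit."""
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return 1000.0 * bits_per_char / baud


for baud in (9600, 115200):   # example rates only
    print(f"{baud} baud: {char_time_ms(baud):.3f} ms per character, "
          f"{2 * char_time_ms(baud):.3f} ms for the 2 padding bytes")
```

If the two padding characters together last longer than the target's fastest response time, the padding itself will collide with the reply, so the trick only works when the baud rate is high enough.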
Any ideas?
You may find that there's source code for the serial port driver in the DDK, which would let you see how that option is supposed to be implemented: i.e. whether it's at interrupt-level, at DPC-level, or worse.
Other possibilities include rewriting the driver; using a 3rd-party RS485 driver if you can find one; or using 3rd-party RS485 hardware with its own driver (e.g. at least in the past 3rd parties used to make "intelligent serial port boards" with 32 ports, deep buffers, and its own microprocessor; I expect that RS485 is a problem that's been solved by someone).
8 milliseconds does seem like a disappointingly long time; I know that XP isn't a RTOS but I'd expect it to (usually) do better than that. Another thing to look at is whether there are other high-priority threads running which may be interfering. If you've been boosting the priorities of some threads in your own application, perhaps instead you should be reducing the priorities of other threads.
I'm about to try manually controlling the RTS line from user mode with REALTIME priority during the "SendFile, clear RTS" cycle.
Don't let that thread spin out of control: IME a thread like that can if it's buggy preempt every other user-mode thread forever.
I know that in a lot of asynchronous communication, the packet begins with a start bit.
But a start bit is just a 1 or 0. How do you differentiate a start bit from the end bit from the last packet?
Ex.
If I choose my start bit to be 0 and my end bit to be 1.
and I receive 0 (data stream A) 1 0 (data stream B) 1,
what's there to stop me from assuming there is a data stream C which contains the same contents of "(data stream A) 1 0 (data stream B)" ?
Isn't it more convenient to have a start BYTE and then check the data stream for that combination of bits? That would reduce the possibility of confusion between the start and end bits.
Great question! Most asynchronous communication also specifies a stop bit, which is the complement of the start bit, ensuring each new symbol begins with a stop-to-start transition.
Example: let's transmit the characters ABC, which are ASCII 65, 66, and 67:
A = 65 = 0x41 = 0100 0001
B = 66 = 0x42 = 0100 0010
C = 67 = 0x43 = 0100 0011
Let's also assume (arbitrarily) that the start bit is 0 and the stop bit is 1, and the data is transmitted from MSB to LSB. The transmitter will be in the stop (1) state when no data is transmitted. So the receiver might see this:
Data:  ....1111 0010000011 111 0010000101 0010000111 11111....
       (quiet)  ^   A    $     ^   B    $ ^   C    $ (quiet)
With apologies for the ASCII graphics, the data consists of a series of stop (1) bits while the channel is idle. When the transmitter is ready to send a character, it sends a start (0) bit (marked with ^), followed by the character code, and ending with a stop (1) bit (marked with $). It continues to send stop bits until the next character is transmitted, beginning with another start bit.
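The same scheme as a small Python sketch, using the conventions assumed in this answer (start bit 0, stop bit 1, idle level 1, MSB first). The function names are made up for the illustration; a real UART does this in hardware.

```python
IDLE = "1"  # line level while nothing is being sent (same level as the stop bit)


def frame_char(byte: int) -> str:
    """One 10-bit frame: start bit, 8 data bits MSB first, stop bit."""
    return "0" + format(byte, "08b") + "1"


def encode(data: bytes, idle_bits: int = 3) -> str:
    """Frames separated (and surrounded) by idle_bits of idle line."""
    return IDLE * idle_bits + "".join(frame_char(b) + IDLE * idle_bits for b in data)


def decode(bits: str) -> bytes:
    """Recover the characters from a bit string that may contain idle periods."""
    out = bytearray()
    i = 0
    while i < len(bits):
        if bits[i] == "1":        # idle or stop level: keep waiting for a start bit
            i += 1
            continue
        frame = bits[i:i + 10]    # start bit found: the next 9 bits belong to this frame
        if len(frame) < 10:
            break                 # capture ended in the middle of a frame
        if frame[9] != "1":
            raise ValueError(f"framing error at bit {i}")
        out.append(int(frame[1:9], 2))
        i += 10
    return bytes(out)


stream = "1111" + "0010000011" + "111" + "0010000101" + "0010000111" + "11111"
```

Decoding the stream from the diagram gives back b"ABC". The receiver never searches the data for a marker; it simply counts 10 bit times from each start bit, which is why a 0 inside the data is not mistaken for a new start bit.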
The reason we use start bits instead of bytes is efficiency. The scheme above requires 10 bits (1 start + 8 data + 1 stop) to transmit 8 bits of data, resulting in an overhead of (10 - 8) / 8 = 1/4 = 25%. If we used start and stop bytes, we'd need to transmit 3 bytes for each byte of data, which would be an overhead of (3 - 1) / 1 = 2 = 200%. If the start, data, and stop bytes were each 8 bits, we'd have to transmit 24 bits instead of 10 for each character, so it would take almost 2 1/2 times as long to send the data!
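The overhead figures can be checked with a few lines of Python (the function name is made up; exact fractions are used to avoid rounding).

```python
from fractions import Fraction


def framing_overhead(total_bits, data_bits):
    """Extra bits sent per data bit, as an exact fraction."""
    return Fraction(total_bits - data_bits, data_bits)


start_stop_bits = framing_overhead(10, 8)    # 1 start + 8 data + 1 stop
start_stop_bytes = framing_overhead(24, 8)   # start byte + data byte + stop byte
slowdown = Fraction(24, 10)                  # time per character, bytes vs bits
```

These evaluate to 1/4, 2, and 12/5 (that is, 2.4) respectively.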
One can always define a start byte as an indication that a message is beginning (and the ASCII SOH, STX, and ETX codes were intended for such purposes). However, the standard hardware and protocols for connection to data-transmission equipment (RS232C and later) operate at a lower level, and it is generally neither possible nor desirable to alter that arrangement (especially via software).
High performance synchronous data transmission schemes, such as those used on local-area networks and wide-area transmission systems do use elaborate frame markers. The frame marker is a distinct pattern of bits that never occurs in the stream for message data. There is typically a special rewriting rule that essentially "escapes" any in-data occurrence of a similar bit pattern so that transmission equipment will not see it as a frame marker. These escaped patterns are reconstructed by the recipient so the sender and receiver never have to pay attention to this. These arrangements make specialized hardware even more important, such as in the typical Network Interface Card (these days, motherboard chip) on personal computers.
BACKGROUND ON ASYNCHRONOUS SERIAL COMMUNICATION
It is useful to think of asynchronous serial transmissions as asynchronous between character/data frames and synchronous within the span of the character frame (including the start bits and initial stop/fill).
With this scheme, there is a constant fill signal between the frames and it is usually at least one data-bit wide, although some arrangements require a 1.5-bit or two-bit stop/fill. The stop "bit" uses the same signal level and can be considered the minimum fill period before another start bit will arrive.
When a frame is arriving, it is necessary to synchronize with the predetermined number of bits it is expected to carry. The transition from the fill to an opposite level signal is accomplished by the start bit which is always opposite to the stop/fill level. The sampling of the bits can be timed to happen in the middle of subsequent bit-arrival periods.
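A simplified model of that mid-bit sampling in Python. It assumes the receiver reads the line level 16 times per bit period (an oversampling factor chosen for this sketch), an idle level of 1, a start bit of 0, and 8 data bits; the function name is made up.

```python
def sample_frame(levels, oversample=16, data_bits=8):
    """Recover one frame's data bits from an oversampled line.

    levels is a list of 0/1 line readings taken oversample times per bit.
    The falling edge of the start bit sets the timing reference, and every
    later bit is read once, in the middle of its bit period.
    """
    start = levels.index(0)                      # first reading of the start bit
    bits = []
    for n in range(1, data_bits + 1):
        centre = start + n * oversample + oversample // 2
        bits.append(levels[centre])
    stop_centre = start + (data_bits + 1) * oversample + oversample // 2
    if levels[stop_centre] != 1:
        raise ValueError("framing error: stop bit not at the idle level")
    return bits


# Build a test line: idle, then start bit, data bits for 'A', stop bit.
data = [0, 1, 0, 0, 0, 0, 0, 1]
line = [1] * 20
for bit in [0] + data + [1]:
    line += [bit] * 16
line += [1] * 8
```

Because the receiver re-synchronizes on every start bit, small differences between the sender's and receiver's bit clocks only have to stay below half a bit period over one frame.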
Technically, if frames were being sent at the maximum rate, it would not be necessary to send any stop/fill, proceeding to the start bit of the next frame immediately. However, counting on at least one bit worth of fill before the start-bit transition helps to keep the sender and receiver synchronized.
If you think of the asynchronous streams as being encoded from key depressions using a keyboard, you can see the importance of allowing arbitrary fill between character frames. Once it is known what frame to send next, it can be inserted immediately, with its start bit, at the agreed bit rate, after there has been at least one bit worth of preceding stop/fill.
It is also useful to notice that, in typical low-speed asynchronous transmissions, there are only two kinds of bits/levels, so the only way the presence of data as opposed to fill can be distinguished is by a marker scheme like this, where the start of the frame is uniquely detectable and the end of the frame is predetermined (unless there is a more sophisticated variable-length frame structure, which is generally not used in asynchronous serial communication). It is actually rather difficult for a receiver to discover the bit rate of a transmitter without some additional agreement, such as looking for a recognizable data sequence: from the distorted form in which that sequence arrives, one can estimate the bit rate at which it would arrive correctly.
Even though high-speed modems now transmit complex analog signals that aren't described in terms of two simple signal levels, the RS232C (and later-mode) digital communication between a computer UART and the data coupling on the modem is pretty much as described.
High-speed modems also have additional capabilities for synchronizing with a distant end-point, as you can tell by listening to the signal audio while a connection starts up. In addition, there are separate signal lines in the serial cable to the computer that are used for pacing between the computer and the modem so that the sending party does not transmit new data frames faster than the receiving party (either computer or modem) can accept them. But a frame, once started, is always sent at the agreed synchronous speed.
Wikipedia has a good description of asynchronous serial communication, which is what computer serial ports use.
There is a common over-simplification that suggests the stop bit determines the length of the data. That's not the case. The stop bit looks just like the level for another data bit. The stop bit, and the period until the next start bit, are recognized by knowing the bit rate at which in-frame data and start/stop bits are being transmitted and knowing how many bits a frame contains. Otherwise, there is no way to distinguish a stop bit from just another bit of that polarity as part of the data frame.
Here is the way start and stop bits usually work:
A start bit is sent, say 1. This indicates to the receiver that a specified number of bits of data will be transmitted, say 8.
8 bits of data are sent.
A stop bit is sent, say 0. This indicates that the 8 bits of data have been sent.
If more data is to be sent, each byte must be initiated with a start bit and terminated with a stop bit. The transmitter and receiver must agree on how many bits of data are sent for each start bit so the receiver will be able to distinguish the stop bit from the data. Sometimes the start bit is actually multiple bits or even a byte, but the idea is the same. The receiver recognizes the end of the data frame when it sees the stop bit after receiving the pre-specified number of data bits. Sometimes a parity bit is sent before the stop bit to provide a simple error-detecting mechanism.
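A short Python sketch of such a frame with an even parity bit, using the polarities of this answer's example (start bit 1, stop bit 0). The function names are made up, and even parity is just one common choice.

```python
def even_parity_bit(data_bits: str) -> str:
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return "1" if data_bits.count("1") % 2 else "0"


def build_frame(byte: int, start: str = "1", stop: str = "0") -> str:
    """start bit + 8 data bits + parity bit + stop bit."""
    data = format(byte, "08b")
    return start + data + even_parity_bit(data) + stop


def check_frame(frame: str, start: str = "1", stop: str = "0") -> int:
    """Validate framing and parity; return the data byte."""
    if len(frame) != 11 or frame[0] != start or frame[-1] != stop:
        raise ValueError("framing error")
    data, parity = frame[1:9], frame[9]
    if even_parity_bit(data) != parity:
        raise ValueError("parity error")
    return int(data, 2)
```

A single flipped bit changes the count of 1s from even to odd and is detected; two flipped bits cancel out, which is why parity is only a simple error check.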
It is all protocol dependent. You can say that after start symbol you will expect N symbols or you will read until you encounter the stop symbol.
Here a symbol could be any n-bit sequence (including a single bit or a byte).
Indeed, your example with bits applies equally to a protocol which uses bytes instead of bits.
Say you send 00000000 stream A 11111111 00000000 stream B 11111111. In this case you may still confuse it with stream C = stream A 11111111 00000000 stream B.
Usually a start bit is used because a voltage level change can trigger an event (See edge triggering in flip flops.) On the other hand a start symbol with multiple bits will be used to synchronize clocks of two systems in addition to triggering an event. An example of it would be a PAL signal.
The start and stop bits come from the days of the teletypes. They essentially were pulses that took up time to let the mechanical hardware settle. DOS text file lines end with CR LF, which literally caused the carriage to return to column 1 and the platen to advance one line. I think they are in that order because the CR takes longer to complete, and the LF can effectively happen in parallel.
Detecting it is a little bit harder. You sort of have to watch the bit stream go by.
Over time you ought to be able to detect it, as the data is normally ASCII with the start/stop bits around it. Normally this isn't an issue, because it is handled by the UART which runs the COM port.