One-Way-Loader's fields of application:
So, with bootloader transmissions, we always have to consider this "deadtime", and it all boils down to sort of a 'stop-and-go' transfer.
In a minimum protocol, the Computer sends small blocks of data to the Controller, allowing it to process, buffer and write to EEPROM or Flash. Having finished the previous write operation, the Controller immediately sends back a confirmation message, signalling that it is ready to accept further data. Then the Computer may send further data or commands. Such two-way protocols work well, but they definitely require a feedback channel. Now, how do we get rid of this annoying dependency?
Solution: Since Microcontrollers are quite deterministic, the deadtimes of an operating Bootloader can be calculated in advance, provided that some technical parameters are known to the Sender. The Sender could then create a "timed" serial transmission that does not depend on a backchannel for handshake, and the Receiver could finish previous operations shortly before the next data packet is about to arrive. Unfortunately, with modern RS232 implementations (multitasking OS, several layers of abstraction, driver latencies and inconsistencies) it has become impossible to send out serial data with exactly defined pauses in between. What we can do instead: "fill up" the pauses with a calculated number of serial characters that carry no data but take a defined amount of time to be sent, and transmit the whole stream continuously! The Receiver of such a self-timed transmission gets the chance to process previous data while those "dummy" characters just flow by. At the moment when the Receiver is ready to accept new data, it re-synchronises to some of the last dummy characters, shortly before the next block of valid data is about to arrive. No flow control and no feedback channel needed!
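The "fill up" idea comes down to simple arithmetic. The following is only a sketch, assuming 8-N-1 framing (10 bits per character) and a safety margin of a few extra characters; the function name filler_chars and the default margin of 4 are made up for illustration and are not taken from the OWL software.

```python
import math

BITS_PER_CHAR = 10  # 8-N-1: start bit + 8 data bits + stop bit


def filler_chars(deadtime_ms: int, baud: int, margin_chars: int = 4) -> int:
    """Number of dummy characters whose send time covers a given deadtime.

    margin_chars is an assumed safety margin, not a value from the OWL docs.
    """
    chars = math.ceil(deadtime_ms * baud / (1000 * BITS_PER_CHAR))
    return chars + margin_chars


if __name__ == "__main__":
    # A 60 ms pause at 9600 baud: 57.6 character times, rounded up, plus margin.
    print(filler_chars(60, 9600))  # 62
```

Because the Sender only counts characters, the result does not depend on any timing guarantees of the operating system or the serial driver.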
The OWL signal is a regular unidirectional RS232 transmission in mode 8-N-1 at the chosen baud rate.
The OWL signal may be sent smoothly from any RS232 interface.
Timeline of a rather short OWL Transmission for ATtiny25 target running at 10 MHz (4 blocks of EEPROM data, 8 blocks of Flash data, Transmission speed of 9600 baud)
General: The diagram above illustrates the timing characteristics of an "OWL Transmission". Different data types have been highlighted for clarity, but there are actually no interruptions in the serial data flow. At a glance we can see that the PREAMBLE runs can differ greatly in size, since they are adjusted to the individual processing times and deadtimes of the respective target controller and its technical set-up.
Block time: Every data block consists of a starting character and 16 bytes of payload data. At a given baud rate, the block transmission time is therefore constant. For example, at 9600 baud in 8-N-1 mode (10 bits per character), each block of 17 characters takes about 18 milliseconds (tB).
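The block time tB can be checked with a few lines of arithmetic. This is a minimal sketch; the function name block_time_ms is an assumption for illustration only.

```python
BITS_PER_CHAR = 10        # 8-N-1: start bit + 8 data bits + stop bit
CHARS_PER_BLOCK = 1 + 16  # starting character + 16 payload bytes


def block_time_ms(baud: int) -> float:
    """Transmission time of one data block in milliseconds."""
    return CHARS_PER_BLOCK * BITS_PER_CHAR * 1000 / baud


if __name__ == "__main__":
    print(round(block_time_ms(9600), 1))  # 17.7
```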
Introductory Preamble: To catch an OWL Transmission and initially synchronise serial data reception, the Receiver must wait for the very first Preamble. Actually, it does not necessarily have to "wait for" that signal. To ease manual coordination of when Sender and Receiver are activated, the OWL Preambles make it possible to step into an already running Preamble (see below). So there is the option to first start the OWL Transmission with a comparably long INTRO PREAMBLE, then activate the target device at leisure.
Block decryption time: Data decryption consumes computation time. At clock frequencies of some MHz, this decryption time (tD) is in the range of a few milliseconds only. Since each block must be decrypted, the decryption time dictates the minimum duration of all Preambles. (To be safe, there should always be some extra Preamble characters to compensate for runtime deviations and to allow the receiver to re-synchronise and autobaud.) Yet, block decryption is not the longest delay that occurs with AVR Bootloaders. Physical writes into EEPROM and Flash cost much more time.
Authentication sequence (S1): The first three blocks constitute a cryptographic protocol that safely authenticates the Sender towards the Receiver. As this step takes place only in SRAM and does not involve other computationally demanding tasks, minimum Preambles (tD) are sufficient in this sequence.
EEPROM sequence (S2): EEPROM memory can be overwritten directly, but this takes several milliseconds per byte, so that the write time for a block of 16 EEPROM locations adds up to a whopping 60 milliseconds (tEW). Fortunately this is only the case for EEPROM data that is actually written. The comparably long Preambles between EEPROM blocks are plain to see in the diagram.
Flash sequence (S3): A pretty long Preamble of about 180 ms (tFE) follows the first payload block of Flash data. This is because of the Flash Erase cycle that is necessary before Flash may be overwritten with new data (at least on ATtinys). Once Flash Erase is complete, the Flash session gets going. But you might have noticed that the Preambles between Flash data blocks have slightly different sizes. This is intentional and due to the organisation of Flash memory in the microcontroller. In this example, the target chip (ATtiny25) features a Flash pagesize of 32 bytes. Since the cipher dictates a uniform blocksize of 16 bytes, the Bootloader program must aggregate two successive blocks before the next Flash Page Write. Accordingly, the Sender has to insert the enlarged Preamble that accounts for Flash Write time (tFW) only after every 2nd block. (Note: Controllers with lots of Flash memory usually have larger pagesizes. On such devices, the Flash Write cycle will only appear with every 4th, 8th or 16th block, vastly optimising Flash Erase and Flash Write times.)
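The pattern of Flash Preambles described above can be sketched as a small schedule. Everything here is illustrative: the function name, the assumed values for tD (3 ms) and tFW (5 ms), and the assumption that the individual delays simply add up are not taken from the OWL software; only the 180 ms for tFE, the 16-byte blocksize and the 32-byte pagesize come from the example above. The sketch also assumes Flash data is padded to whole pages.

```python
BLOCK_SIZE = 16  # bytes per encrypted block


def flash_preambles_ms(n_blocks, page_size, t_d=3, t_fe=180, t_fw=5):
    """Minimum Preamble duration (ms) following each Flash data block.

    t_d  : assumed block decryption time
    t_fe : Flash Erase time, needed once after the first block
    t_fw : assumed Flash Page Write time, needed after each completed page
    """
    blocks_per_page = page_size // BLOCK_SIZE
    plan = []
    for i in range(1, n_blocks + 1):
        t = t_d                       # every block must be decrypted
        if i == 1:
            t += t_fe                 # erase before the first page write
        if i % blocks_per_page == 0:
            t += t_fw                 # page buffer is full: write it
        plan.append(t)
    return plan


if __name__ == "__main__":
    # ATtiny25 example: 8 blocks, pagesize 32 -> write after every 2nd block
    print(flash_preambles_ms(8, 32))   # [183, 8, 3, 8, 3, 8, 3, 8]
    # Larger pagesize of 128 bytes -> write only after every 8th block
    print(flash_preambles_ms(8, 128))  # [183, 3, 3, 3, 3, 3, 3, 8]
```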
Outroductory Preamble: In some hardware setups it may be necessary to keep the serial channel open for a while after the actual transmission has already finished. Any number of dummy characters could be appended to the serial data stream to extend transmission time accordingly.
Calculations: In order to generate such a customised Transmission, the Sender must know some individual properties of the receiving microcontroller platform, such as memory sizes and organisation, average number of processor cycles needed for decryption, absolute clock frequency and, of course, the correct cryptographic key for the intended Receiver. See the Software section for further details on how this meta-information is administered.
Sounds good... literally: OWL Transmission, 1 kB of payload, 9600 bits per second transcoded to an audio file
Note: This acoustic sample is just to illustrate the timing character of an OWL-Transmission. It is not exactly the same as the 'OWL-Audio' export format described below.
When the Receiver gets back on-line, the last characters of the preceding Preamble are just zooming by. Now the Receiver has the opportunity to repeatedly synchronise and calibrate to the actual baud rate of that serial transmission, enabling technically robust reception of further data.
Re-synchronisation and autobauding right before re-entry to data reception
Autobauding and Frame-Synchronisation:
Note: For the purpose of this documentation, all RS232 signals are depicted in the same unipolar logic that the microcontroller's UART would normally expect (e.g. coming from MAX232 or FT232). The logical "1" (stopbit or idle) is identified by a logical High (3.3 or 5 volts), a logical "0" corresponds to Low (0 volts). Just to mention, the OWL Firmware can also be configured for inverse signalling. In some cases, this could simplify the hardware interface even more.
Performance: The table below gives an overview of what is feasible with the Synchro-Autobaud procedure described. In a test set-up with ATtiny2313-20PU and MAX232 at 5 volts, many different controller clock frequencies and baud rates have been tried out. The test Transmission contained data samples of 2 x 64 bytes for the EEPROM and a simple LED-flashing programme for the Flash, the latter padded with approximately 1 KB of random data to challenge data integrity and to test some other aspects.
||Baud Min.||Baud Max.
These technical properties do not concern the logical format of an OWL Transmission. On the crypto layer, only the data blocks of 16 bytes are relevant. Their purpose is to transport payload data and provide sequence control.
All blocks are encrypted with RST. Through the special key-feedback mode, each block of data becomes cryptographically dependent on all previous blocks. This is the foundation of a reliable error detection mechanism that covers the transmitted data as a whole. Thus, the legitimate receiver can safely determine whether all data in a logical section ("RST Sequence") has been completely and correctly transmitted.
An OWL Transmission consists of 3 consecutive RST sequences. These transmit, always in the same order, Authentication, EEPROM and Flash data. In the OWL variant of RST, the key generator is never reset in the course of a Transmission. Therefore, the sequences S1, S2 and S3 become cryptographically dependent on each other and constitute a forgery-safe cryptogram that provides both a kind of sequence control and a highly reliable over-all error detection.
OWL Transmission's logical cycle (see also diagram):
Normal case: Sender has applied correct crypto key and Transmission came through without major disturbance. Receiver has managed to decrypt each single block and finish each sequence correctly. Finally it has detected the last VI of S3. At this very moment, it is quite clear that all previous data must have been complete and error-free. Whole session was successful and the OWL Firmware passes control to an existing or newly written Application firmware. Indirect feedback: Immediate start of the Application.
Exceptional cases: Transmission errors; signal interruptions, excessive pauses (Timeout); failed autobauding or synchronisation; accidental use of a wrong key; intentional use of a wrong key (addressing multiple Receivers on the same line); crude manipulation attempts... All of these will be recognized by the Receiver as an error condition. Depending on which sequence is currently in progress, the Receiver will take appropriate action. Indirect feedback: Blockade until hardware reset.
Remarks: It has to be pointed out that this elaborate format for a Bootloader transmission comes without control characters and commands. It needs neither header blocks nor other metadata that could give rise to a known-plaintext attack. Au contraire, additional entropy is injected with every new sequence. Despite this comparatively simple and static format, accessing EEPROM or Flash remains optional, since it is possible to write only EEPROM or only Flash contents by skipping the other memory section. The OWL Transmission can combine whole firmware updates, consisting of programming content for EEPROM and Flash, in one and the same Transmission or Transmission file.
In comparison to other candidates, RST provides a pretty good compromise between code efficiency and security in a microcontroller environment.
How RST works
For its randomised substitutions and transpositions, the algorithm makes use of primitives such as adding, bitshifting, inverting, and swapping, which are in fact the building blocks of many well-established block ciphers. However, RST does not depend on predefined keys, lookup tables or a sophisticated choice of delta constants. All arithmetic is controlled by dynamically generated pseudo-random vectors.
The respective pseudo random number generator, PRNG, is initially loaded with the secret key (seed) at the beginning of a crypto sequence.
The block algorithm of RST achieves medium to good avalanche with minimum round count. That is, one single "flipped" bit in ciphertext will affect about 25 to 50 percent of the resulting plaintext after decryption. After each block round, the state of the PRNG is modified by decrypted plaintext data, resulting in a massive and irrecoverable error propagation over all consecutive blocks. This is the basis for the system's fault detection and authentication mechanisms.
RST file format
For PC applications, a logical format has been developed that allows for authenticated and cryptographically secure file transfer. It can be proven that the resulting data stream meets all general requirements of a good cryptogram, such as data not being distinguishable from randomness, no identifiable headers etc. Only when decrypted correctly does the RST sequence unfold into these 3 sections:
The IV has yet a second function in RST. Its random pattern will, with utmost certainty, never ever repeat in any of the message blocks. Therefore, the IV may be used as a definite end-of-message marker. I suggest naming this closing block the 'VI' to underline that it is the functional inverse of the IV. All that the legitimate Receiver must do is: copy the very first decrypted RST block to a buffer, then compare every subsequently decrypted block with that VI. This enables the Receiver to make these qualified decisions:
On IP-based distribution channels, several measures on the transport layer are in charge to ensure an error-free transmission of binary data contents. Rather than trying to repair a corrupted file, nowadays the obvious alternative is to just retry another download or e-mail delivery.
The OWL Transmission consists of three contiguous RST sequences that are cryptographically dependent on each other, as the PRNG is never reset within the same Bootloader session. Intermediate VIs tell the Bootloader that the current logical sequence was finished successfully with no error and that the next sequence can start (= control information). Finally, the very last VI constitutes a cryptographic over-all checksum on the Transmission as a whole.
With OWL-RST, the PRNG is the software implementation of a classic LFSR of 128 bits with a standard polynomial for a Maximum Length Sequence, filtered by the so-called "Self-Shrinking" function. The latter provides a significant hardening of the LFSR sequence under cryptographic aspects and introduces a high degree of non-linearity in connection with an adequate key feedback mechanism. With OWL-RST, all arithmetic operations are optimised for 8-bit processor architecture. For further details see Firmware.
Since RST is no Feistel scheme, the pseudo-random vectors that control cryptographic transpositions and substitutions must be applied in reverse order to get the inverse of encryption or decryption, respectively. So there is a need to buffer the pseudo random sequence for a complete block round at least on one side, while the counterpart may use those random vectors directly out of the PRNG. Of course, with this unidirectional crypto application, the memory- and code-saving variant has been shifted to the microcontroller.
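Why one side has to buffer its random vectors can be shown with a toy cipher. Important: this is NOT the RST algorithm, just a made-up stand-in built from swaps and additions, with Python's random.Random standing in for the PRNG. The figure of 64 vectors per block is borrowed from the minimum round count mentioned in the Firmware section; all function names are hypothetical. The decrypting side consumes vectors straight from the generator, while the encrypting side must collect them first and apply the inverse steps in reverse order.

```python
import random

VECTORS_PER_BLOCK = 64  # borrowed from the minimum-rounds figure; toy value here


def decrypt_block(block: bytes, prng: random.Random) -> bytes:
    """Receiver side: use each vector directly as it leaves the PRNG."""
    data = bytearray(block)
    for _ in range(VECTORS_PER_BLOCK):
        v = prng.getrandbits(8)
        i, j = v & 0x0F, v >> 4
        data[i], data[j] = data[j], data[i]  # transposition
        data[i] = (data[i] - v) & 0xFF       # substitution
    return bytes(data)


def encrypt_block(block: bytes, prng: random.Random) -> bytes:
    """Sender side: buffer all vectors, then invert the steps back to front."""
    vectors = [prng.getrandbits(8) for _ in range(VECTORS_PER_BLOCK)]
    data = bytearray(block)
    for v in reversed(vectors):
        i, j = v & 0x0F, v >> 4
        data[i] = (data[i] + v) & 0xFF       # undo substitution
        data[i], data[j] = data[j], data[i]  # undo transposition
    return bytes(data)


if __name__ == "__main__":
    plain = bytes(range(16))
    cipher = encrypt_block(plain, random.Random(1234))    # same seed on both sides
    restored = decrypt_block(cipher, random.Random(1234))
    print(restored == plain)  # True
```

In this arrangement only the Sender (the PC) needs the extra buffer, which matches the idea of leaving the memory-saving variant to the microcontroller.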
The fact that no lookup tables or complicated transformations are involved allows for a memory-optimised and efficient implementation on microcontrollers. Even with the least number of rounds, the RST ciphertext shows good balancing and diffusion properties. Plaintext-to-key feedback in RST enables the use of secret IVs and causes massive error propagation, which is the foundation of its over-all error-detection and message-authentication abilities.
But there is one notable disadvantage with this concept: the PRNG must be kept running all the way. Cryptographically strong PRNGs consume significant computation time. This is the main reason why, on the PC, an experimental implementation of "RST128" has performed 3 to 5 times slower in a benchmark against a similar implementation of "AES".
Personal conclusions: In many crypto applications, data throughput is not the most important criterion. Regarding AVR Bootloaders, the bottleneck would always be physical writes of programming data and the serial transmission itself. The OWL-RST cipher provides reasonably safe encryption with good statistic properties and a very small memory footprint while essential features of error detection and authentication are "all-inclusive".
Criticism of this "in-house" solution is legitimate, further analysis desirable. I would like to point out that the most critical component of OWL-RST, the pseudo-random generator, has been chosen from well-understood technologies. Even from today's perspective, the 128-bit MLS-LFSR with Self-Shrinking constitutes a cryptographically strong PRNG.
Further, it is assumed that the downstream block cipher and the IV-VI mechanism do not give rise to any vulnerabilities but rather further complicate key extraction and sequence analysis. I am open to any qualified arguments and suggestions on that topic!
Features of the OWL Bootloader firmware:
Invoke: Usually, a Bootloader is to be invoked via Hardware Reset. Calling the Bootloader from a Reset event is the best method from a technical and legal perspective, since it provides clear separation between the spheres of Bootloader and Application Firmware. Hardware Reset may be triggered by the rising edge on the RESET pin of the Controller, but also by Power-On-, Brown-Out- or Watchdog-Event.
OWL FW 2020: The new OWL FW provides the option to "filter" Reset sources, that is, to bypass the Bootloader invoke completely for one or several reset events. See "Reset Sources Filter" for further details.
OWL FW 2023: The OWL FW now provides an additional option to circumvent the Reset sources' filtering in order to allow a controlled hand-over from an Application to the Bootloader while still using the regular Bootstart address. For more details, scroll down to paragraph "MCUSR-Null-Startup".
Initialisation: The Bootloader is the very first program to start after any hardware reset. It initialises the stackpointer and all registers, ports, and memory that it uses itself. It is not the business of a Bootloader firmware on a microcontroller to initialise all SRAM and I/O. This is clearly the responsibility of the Application's Reset routine.
Synchro-Autobauding: Bootloader waits for a level change on the assigned RX-portline until Timeout. First High-Low-transition is always discarded to allow the signal to stabilise. Subsequent level changes are measured by the Synchro-Autobaud method described before. If the incoming signal is actually an OWL Preamble of sufficient quality, synchronisation will succeed and the Bootloader has gained precise timing reference for software-based serial data reception to follow. Yet the Synchro-Autobaud cycle is repeated before each individual block of data; this makes OWL Transmissions pretty immune to a drifting of clock frequencies at both sides, transmitting and receiving end.
Block data reception: After successful Synchro-Autobauding, incoming serial characters are decoded by "Software-UAR(T)". The Bootloader discards further Preamble characters until the Blockstart indicator is received. The 16 bytes following this Blockstarter are the encrypted data of interest. They are buffered in SRAM for further processing, i.e. decryption.
Decrypt block data: The Bootloader has to ignore the beginning of the next Preamble, because now it must decrypt the current buffer contents. With the least number of rounds, the block cipher fetches 64 pseudo-random vectors from the PRNG for decryption. Decrypted data is then available in the same 16 locations of the SRAM buffer.
Immediately after decryption, the block data is XORed into the PRNG state registers (R0-R15). The entropy of an IV block or of payload data will therefore modify all subsequent decryption.
If the just decrypted block is the very first block of an RST sequence, it is copied to a second SRAM buffer as the "VI".
All blocks subsequently decrypted are then compared to that VI (see annotations on Cryptosystem). As long as the current block is not equal to the VI, the Bootloader assumes that the respective block is regular write data and processes it accordingly.
If the program finds that the current block is identical to the VI, it knows that all previous blocks have been processed error-free and the current sequence has finished with success. Then the Bootloader can proceed to the next sequence, or it has finished the whole session successfully.
However, if any error has occurred, the VI is never recognized. After the Transmission has ended and the Timeout has elapsed, the Bootloader will have to auto-erase Flash data that has already been touched, and otherwise fall into a blocking state, thus giving indirect feedback to the user on the failure of this Transmission (see 'General error handling').
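The per-block logic described above (decrypt, XOR feedback into the PRNG state, remember the first block as VI, compare all later blocks against it) can be sketched as follows. This is an illustration in Python rather than the firmware's assembly; the names receive_sequence, xor_feedback, decrypt and write_block are hypothetical, and the demo uses an identity function in place of the real cipher purely to show the control flow.

```python
BLOCK_SIZE = 16


def xor_feedback(state: bytearray, plaintext: bytes) -> None:
    """XOR the decrypted block into the 16 PRNG state bytes (key feedback)."""
    for k in range(BLOCK_SIZE):
        state[k] ^= plaintext[k]


def receive_sequence(cipher_blocks, state, decrypt, write_block) -> bool:
    """Process one sequence; True means the closing VI was recognised."""
    vi = None
    for cipher in cipher_blocks:
        plain = decrypt(cipher, state)  # hypothetical cipher callback
        xor_feedback(state, plain)      # affects all subsequent decryption
        if vi is None:
            vi = plain                  # very first block: keep it as the "VI"
        elif plain == vi:
            return True                 # sequence finished without error
        else:
            write_block(plain)          # regular write data
    return False                        # no VI seen: timeout and blockade follow


if __name__ == "__main__":
    iv = bytes(range(16))
    d1, d2 = bytes([0x11] * 16), bytes([0x22] * 16)
    written = []
    identity = lambda cipher, state: cipher  # stand-in, performs no decryption
    ok = receive_sequence([iv, d1, d2, iv], bytearray(16), identity, written.append)
    print(ok, len(written))  # True 2
```

A sequence consisting of the IV immediately followed by the VI returns True without a single call to write_block, which corresponds to the "no data" case of a sequence.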
Programming: The entire block decryption routine is about 50 machine operations including the PRNG. Block comparison plus XOR feedback was combined into a single loop structure comprising only 12 opcodes. (Try this in a high-level kindergarten ... )
Authentication sequence (S1): The Bootloader must be sure that an incoming Transmission was actually encrypted with the secret key of the Bootloader and no other key. Faulty Transmissions or Transmissions that were generated with a wrong key must be rejected before any write access to EEPROM or Flash memory can be allowed. A safe mechanism of Authentication towards the Bootloader is provided by sending an initial RST sequence of fixed length that does not contain payload data but, by way of the IV-VI mechanism, enables the Bootloader to safely check that the right key was used. From the match of IV=VI in this initial RST sequence the Bootloader knows with utmost certainty that the Sender has actually used the correct key for encryption, and it can progress to the next sequence, which is EEPROM sequence S2.
In all other cases, the Transmission must have been faulty or the Sender was using a mismatching key. Then the Bootloader will fall into a blocking state, which can only be overcome by hardware reset.
This consistent "blockade" behaviour allows operating multiple Bootloaders with different keys and different technical requirements on a common programming line ("one-way-bus"), making sure that per Reset cycle only the one Bootloader that has actually been addressed will follow that Transmission, while all the non-addressed Bootloaders on the same bus safely turn into the blockade state. That means all Bootloaders not addressed by the current crypto Transmission stay passive on the bus, and neither write access nor an uncontrolled start of an Application firmware will happen on the respective controllers.
EEPROM sequence (S2): Bootloader decrypts and copies the IV of the EEPROM sequence. If there are data blocks following the IV, data is written in portions of 16 bytes to the EEPROM memory using Atomic Write Mode. If the EEPROM sequence did not contain data (i.e. IV immediately followed by VI), no single EEPROM location is overwritten.
Note 1: On EEPROM, no erase cycle is needed and data can simply be overwritten. EEPROM contents in a higher address range are NOT erased; an EEPROM sequence simply does not touch these locations. This behaviour can be utilized to perform a kind of incremental EEPROM update.
Note 2: To explicitly "delete" entire EEPROM, the Sender has to transmit an EEPROM sequence that will overwrite all EEPROM locations.
Note 3: EEPROM writes do not require any precautions against address overflow. In a well-formed Transmission, of course, no more EEPROM data is sent than the Target controller can take. An address overflow in the EEPROM sequence could only happen due to a corrupted Transmission. In this case, the VI of the EEPROM sequence would never be detected and the Bootloader would stick to the EEPROM write mode and eventually overwrite part or all of the EEPROM with garbled data. However, this does not pose an operational risk to the Target device, since EEPROM memory has ten times more write durability than Flash and the EEPROM does not hold executable code. That is why the EEPROM sequence was placed before the Flash sequence. It serves as a 'crumple zone' before the more critical Flash sequence.
When Timeout occurs in EEPROM sequence, the Bootloader will fall into a blocking state, giving indirect feedback that something went wrong.
If the concluding VI of EEPROM sequence has been detected, the EEPROM sequence S2 is successfully finished and the Bootloader passes over to the final Flash sequence S3.
Flash sequence (S3): Bootloader decrypts and copies the IV of the Flash sequence. If the block directly following the IV is already the VI, i.e. no data blocks in between, the Bootloader knows that there is no Flash at all to be erased and overwritten. Flash will of course be left untouched and the Bootloader session was finished successfully.
If at least one data block was following the IV, the Bootloader must erase the Application Flash before any Flash pages are to be overwritten with that firmware data. As the Flash Erase cycle will take considerable time, the Sender has to calculate a matching Preamble. Flash Erase is performed top-to-bottom for safety reasons on ATtinys. For Flash writes, the Bootloader will buffer incoming Flash data to the Flash write buffer, then trigger Page Write to the current Flash page.
Based on 16-byte units, the Flash write routine is quite future-proof. It could work up to a pagesize of 4096 bytes... (the largest Flash Pages currently seen on ATmegas are 256 bytes.)
An explicit verification of write data seems completely dispensable with OWL. Many years of experience and feedback on 'TSB' have shown that faulty Flash writes never occurred once the data had made it to the write buffer and operating conditions were stable enough at least for the duration of the actual Flash write operation. Requirements aren't that hard and any errors on the physical/transport layer would be safely detected by the OWL crypto.
When the last VI from S3 has been detected, all data that was just written is supposed to be error-free, and the Bootloader session was successful as a whole.
However, if no VI has been detected, and the Bootloader timed-out, it will go into the blocking state, giving the indirect feedback that something has gone terribly wrong with this session ... Should a faulty Flash session have already touched Flash memory, the Bootloader will trigger an emergency erase of all Flash (on ATtiny) to remove executable code that may be corrupted.
Transmission successful, hand-over to Application firmware: After successful completion of the third sequence, the Bootloader will almost immediately pass control to the Application firmware, which may have been updated or left untouched.
General error handling: Failure could happen even before any data has been transmitted. In certain minimalistic hardware designs, a cold start (device just plugged in, power to be hard-switched) could cause an extended period of low or undefined voltage level on peripherals of the controller, while Brown-Out Detector has already released programme execution as from the controller's perspective, supply voltage has sufficiently stabilised. If a Bootloader starts listening on a portline for incoming signal, it may become confused and possibly "crash" as a result of those invalid/insufficient conditions. (As occasionally seen with TSB in certain USB devices.)
Start-up behaviour of the One-Way-Loader is quite favourable in this respect. A bit of "garble" on the line, found immediately after the reset event, is ignored and the Bootloader would simply time-out and hand-over to the Application. Starting the Application firmware has priority over catching any Transmission.
Only after the initial Authentication sequence was successfully verified, the Bootloader will assume that this is indeed a valid Transmission. Either everything was complete and error-free and within Timeout limits, then the Application will be started immediately after Transmission has ended; OR there had been one or more errors or Transmission timed-out, then the Bootloader will go into the blocking state and the user will get an indirect but clear feedback on the failure of this Transmission.
Timeout timing: Bootloader's Timeout in real-world-scenarios should allow for delays of a few 1/100 seconds up to some seconds. The AVR Controllers could work in a wide range of clock rates from 128 kHz up to 25 MHz roughly. To cover this with only one "Timeout byte", it is necessary to calculate an individual prescaler factor, considering actual clock frequency of the controller in order to provide a calibrated timeout subroutine with base unit of 1/100 second. Respective timing factors are then coded directly into the OWL Firmware for that certain Target device in the course of firmware-make. By this, a Timeout byte of "100" will always give a timeout of 1 second, a Timeout byte of only "1" will give 0.01 seconds and the value "255" will result in the largest Timeout of about 2.55 seconds.
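The relation between Timeout byte, base unit and clock frequency can be sketched like this. The idea of a software delay loop with a fixed cycle count per iteration is an assumption made for illustration (the value of 4 cycles is made up); the actual OWL Firmware may derive its timing factors differently.

```python
TICKS_PER_SECOND = 100  # base unit of the timeout routine: 1/100 second


def timeout_seconds(timeout_byte: int) -> float:
    """Timeout duration selected by a Timeout byte (1..255)."""
    return timeout_byte / TICKS_PER_SECOND


def loops_per_tick(f_cpu_hz: int, cycles_per_loop: int = 4) -> int:
    """Delay-loop iterations lasting 1/100 s at a given clock frequency.

    cycles_per_loop is a hypothetical figure, not taken from the firmware.
    """
    return f_cpu_hz // (TICKS_PER_SECOND * cycles_per_loop)


if __name__ == "__main__":
    print(timeout_seconds(100), timeout_seconds(1), timeout_seconds(255))  # 1.0 0.01 2.55
    print(loops_per_tick(128_000), loops_per_tick(10_000_000))             # 320 25000
```

Because the per-device factor is fixed when the firmware is built, the same Timeout byte gives the same duration on a 128 kHz and on a 20 MHz target.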
PRNG (key generator): PRNG is loaded with the secret key at Bootloader start-up. The internal state of the PRNG will then change with every single transposition-vector being requested and it is also modified by the plaintext of each decrypted block (see diagram on key feedback).
Here, the PRNG is the software implementation of a classic 128-bit LFSR with feedback taps on the bit positions 128, 127, 126 and 121 (Galois-XOR). The output bit sequence goes through the so-called Self-Shrinking Filter. In fact, this constitutes a cryptographically strong PRNG, actually a *CS*PRNG. Although the decimated SSG LFSR consumes about 3 times more shifting cycles than a plain LFSR, this is not too much of a disadvantage in the Bootloader Application, since other factors restrict data throughput to a greater extent.
Last but not least, assembler code of this SSG has been vastly optimised. Those LFSR-typical bitshifts are primarily carried out only on two working registers (instead of 16), and just with each 8th bit-shifting, the remaining 14 registers are directly byte-shifted, which saves plenty of clock cycles.
This variant of SSG-LFSR is indeed one of the most cycle- and code-efficient PRNG implementations available for 8-bit MCUs.
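For readers who want to experiment, here is a minimal Python model of a 128-bit Galois LFSR with taps 128, 127, 126 and 121, followed by a self-shrinking filter. It is a sketch under assumptions: the shift direction, the bit order within output bytes and the class and method names are chosen for illustration and need not match the OWL assembly code, and the key feedback is left out.

```python
MASK128 = (1 << 128) - 1
TAPS = 0xE1 << 120  # taps 128, 127, 126, 121 for a right-shifting Galois LFSR


class SelfShrinkingLFSR:
    def __init__(self, seed: int):
        if seed & MASK128 == 0:
            raise ValueError("an all-zero LFSR state would never change")
        self.state = seed & MASK128

    def _lfsr_bit(self) -> int:
        """Clock the Galois LFSR once and return the bit shifted out."""
        out = self.state & 1
        self.state >>= 1
        if out:
            self.state ^= TAPS
        return out

    def next_bit(self) -> int:
        """Self-shrinking: take LFSR bits in pairs (a, b); emit b only if a == 1."""
        while True:
            a = self._lfsr_bit()
            b = self._lfsr_bit()
            if a:
                return b

    def next_byte(self) -> int:
        value = 0
        for _ in range(8):
            value = (value << 1) | self.next_bit()
        return value


if __name__ == "__main__":
    gen = SelfShrinkingLFSR(0x0123456789ABCDEF0123456789ABCDEF)
    print([gen.next_byte() for _ in range(4)])
```

The pairs whose first bit is 0 are thrown away, which is where the extra shifting cycles of the self-shrinking variant come from.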
Port limits: The present OWL Firmware can make use of all I/O ports directly accessible by the assembly instructions cbi, sbi, sbic, sbis. Some rather exotic devices also feature a "PORTG" or "PORTH" whose I/O addresses are located in higher I/O memory. These can only be addressed by SRAM-I/O commands, whose opcodes have slower timing and consume more Flash. Therefore, it is not planned to support these extraordinary ports in the OWL Firmware.
Code flexibility: Compared to TSB, the Assembly source of the OWL Firmware is tighter and more clearly structured. It also depends less on conditional assembly, since the OWL Transmission's logical format is basically the same for all devices and timing specialties are covered by the tailored Preambles in the Transmission layer. Finally, the One-Way-Loader does not waste a whole Flash page for user data. All Flash, minus 512 bytes for the Bootloader, is available for Application firmware.
Portability: Currently the Assembler source for the OWL Firmware is able to adapt to over 120 ATtinys and ATmegas. Latest extensions cover devices over 64k (see next paragraph). Most of the supported devices have been running successfully with TSB in the past. There is great chance that the wheel does not have to be re-invented regarding some chip's specialties.
Finally! ATmega128x and ATmega256x: OWL Firmware/Software supports the big ones. It keeps up with linear addressing, as this is supposedly the safest and most reliable method of Flash programming in a one-way setup. So, it may take a several minutes to transmit full 128 or even 256 Kilobytes of data into the Controller and patience will be awarded...
But we could apply modified Flash-Erase policy on the ATmegas to make things more comfortable. On the ATtinys that do not support protected Bootloader Section, it was advisable to pre-erase whole Application Flash memory before new contents could be written, and even more important, to immediately erase Flash memory in case of an unrecoverable error in the Flash-related part of a Bootloader session. Otherwise defective code in Flash could have been executed and possibly destroy Bootloader code or lock-out the device from further Bootloader access. On ATmegas, the Bootloader won't be affected by corrupted Application code; no compulsive necessity for a "clean" AppFlash!
This led us to a variant of Flash programming that also provides additional flexibility: Only those Flash Pages that are actually meant to be overwritten will be erased, right before the respective Flash-Write occurs. This gives us the option to reserve higher address space for parts of an Application that will never or seldom change (e.g. libraries, tables). Application firmware updates will start from $00000 as usual. The Application may be updated more frequently by way of the Bootloader, but as long as the Flash programming sequence S3 does not reach the higher address range, any code that is residing up there will be left untouched.
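The page-wise erase policy can be illustrated with a little arithmetic. The following is only a minimal sketch in Python, not the Firmware's actual routine; the function names and the example sizes are assumptions made for illustration.

```python
def pages_to_erase(image_len, page_size):
    """Start addresses of the Flash pages touched by an image written from address 0.

    Illustrative helper (hypothetical name): only these pages would be
    erased right before they are rewritten.
    """
    if image_len <= 0:
        return []
    last_page = (image_len - 1) // page_size
    return [page * page_size for page in range(last_page + 1)]


def is_untouched(address, image_len, page_size):
    """True if code residing at 'address' survives an update of 'image_len' bytes."""
    first_free = len(pages_to_erase(image_len, page_size)) * page_size
    return address >= first_free


if __name__ == "__main__":
    # Example: a 600-byte update on a device with 256-byte pages.
    print([hex(a) for a in pages_to_erase(600, 256)])
    print(is_untouched(0x1000, 600, 256))
```

Anything placed at or above the first untouched page boundary stays in Flash across such an update.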
With comparably high clock frequencies, it is of course possible to speed-up the whole thing, especially since the bigger ATmegas usually have large pagesize (up to 256 bytes), so that the delay caused by physical Page-Erase and Page-Write operations is comparably small.
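To get a feeling for the durations involved, here is a rough estimate in Python. It is a sketch under stated assumptions: 10 bits per character follow from the 8-N-1 format, while the per-page delay of 10 ms is an assumed placeholder (consult the datasheet), and Preambles or protocol overhead are ignored.

```python
def estimate_transfer_seconds(image_bytes, baud, page_size, page_delay_s):
    """Rough lower bound for a one-way transfer of 'image_bytes' in 8-N-1 format.

    page_delay_s is an assumed time budget per Flash page (erase + write),
    which a self-timed transmission has to bridge with filler characters.
    """
    bits_per_char = 10  # 1 start bit + 8 data bits + 1 stop bit
    data_time = image_bytes * bits_per_char / baud
    pages = -(-image_bytes // page_size)  # ceiling division
    return data_time + pages * page_delay_s


if __name__ == "__main__":
    for kib in (128, 256):
        seconds = estimate_transfer_seconds(kib * 1024, baud=9600,
                                            page_size=256, page_delay_s=0.010)
        print(f"{kib} KiB at 9600 baud: roughly {seconds / 60:.1f} minutes")
```

With large pages the page delay contributes only a few seconds in this model; the serial data time dominates, so a higher baud rate is the main lever.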
Watchdog support: As of 02/2019 the OWL Firmware is compatible with Applications that make intensive use of the AVR Watchdog-Reset feature. WD-Resets are intended to restart an Application Firmware after expiration of the WD timer interval, even if such Application has been stuck somewhere in an endless loop and other escape strategies have failed. In a Bootloader setup, of course the Bootloader's reset routine would be started before an Application's reset routine could handle this condition. The Bootloader must therefore consider WD status, since otherwise it may be restarted over and over due to repeated WD-timer-events, resulting in a 'bricked' Application. Yer olde TinySafeBoot simply forwarded to the Application when detecting a WD reset condition. This proved to be a safe method, yet it rendered the Bootloader next to useless in WD scenarios with Resets permanently activated by the WDTON fuse. One-Way-Loader finally faces the problem: It tries to turn off the WD first of all. Just in case this doesn't work, OWL reconfigures the WD-Prescaler to its maximum (2 or 8 seconds, depending on the device) and integrates regular watchdog timer resets into its program flow (opcode wdr). With the enlarged Prescaler interval the WD can never interfere with an OWL session. Once the Bootloader has returned control to the Application firmware, it is the responsibility of the Application's Reset/Init routines to properly reconfigure the Watchdog. Therefore OWL will leave the MCUSR register unaltered, enabling the Application firmware to check Reset conditions and act accordingly. The general error condition (Blockade) is also secured against WD-Resets. This is to ensure continued compatibility of the new WD-friendly Firmware with hardware setups that operate multiple OWL-featured AVR controllers on the same serial programming line.
Reset Sources filter (since 08/2020): Now the Bootloader can act upon different MCU reset conditions, namely 'Watchdog', 'Brown Out', 'External' and 'Power-on'. Depending on which event has actually triggered the Controller's reset, OWL will either activate (do its Bootloader job and listen to RX until Timeout) or swiftly forward to start the Application (without notable delay).
This feature is supposed to improve compatibility in special setups and/or to Applications that rely heavily on a precise reset timing. It could also help with hardware environments that would occasionally issue reset pulses, to which the bootloader should never react.
Technically, the Firmware reads MCU Status Register and filters for 'reset source' flags WDRF|BORF|EXTRF|PORF by way of simple logical AND and branch instruction.
So, in Firmware Make Mode, the user has the option to define the respective bitmask by way of a new argument '--resets=' or '-rs='. For example: '-rs=1110' means 'Watchdog', 'BrownOut' and 'External' resets are allowed to start the Bootloader, while 'PowerOn' resets would immediately forward to the Application. This can make the Bootloader nearly "invisible" regarding certain Reset events. Alternative syntax uses letters W, B, E, P (in arbitrary order) to represent allowable reset sources. Both notations are printed in the Target Data listing for clarity. Default setting is, of course, NO Reset Source filter applied (equal to '-rs=1111', allowing all reset sources to activate the Bootloader), which is then 100% compatible with the reset behaviour of older OWL FW.
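The two notations and the AND-filter can be modelled in a few lines of Python. This is a sketch only: the function names are hypothetical, and the assumption that the bit string maps directly onto the flag order WDRF|BORF|EXTRF|PORF is derived from the '-rs=1110' example above, not from the Software's source.

```python
# Bit values of the reset flags in the AVR MCU status register.
PORF, EXTRF, BORF, WDRF = 0x01, 0x02, 0x04, 0x08
LETTERS = {"W": WDRF, "B": BORF, "E": EXTRF, "P": PORF}


def parse_reset_mask(arg):
    """Turn '1110' or letters like 'WBE' (any order) into a 4-bit mask."""
    text = arg.strip().upper()
    if not text:
        raise ValueError("empty reset source argument")
    if len(text) == 4 and set(text) <= {"0", "1"}:
        return int(text, 2)  # leftmost digit = Watchdog, rightmost = PowerOn
    mask = 0
    for letter in text:
        if letter not in LETTERS:
            raise ValueError(f"unknown reset source: {letter}")
        mask |= LETTERS[letter]
    return mask


def bootloader_activates(mcusr, mask):
    """Model of the filter: logical AND of the status flags and the mask."""
    return (mcusr & mask) != 0


if __name__ == "__main__":
    mask = parse_reset_mask("1110")
    print(bootloader_activates(WDRF, mask))  # Watchdog reset
    print(bootloader_activates(PORF, mask))  # Power-on reset
```

In this model, '-rs=1110' and '-rs=EBW' yield the same mask.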
Please be aware that this new RS feature does not configure anything regarding the actual Reset behaviour of the Controller. It just configures how the Bootloader reacts on certain Reset Sources. In particular, the new OWL Firmware does not modify MCUSR. It remains sole responsibility of an Application Firmware to evaluate reset conditions and clear certain reset flags, when appropriate. Refer to the datasheets and other technical advice on AVR reset mechanisms.
MCUSR-Null-Startup (new 04/2023): For a smooth hand-over from an Application firmware to the Bootloader, sort of a standardised procedure would be desirable. It should use the regular BOOTSTART address, which can be seen as a version-independent constant (since OWL does not provide a jumptable for self-evident reasons).
Solution: The condition of MCUSR = 0 is recognised before RS-bits are checked. The new OWL Firmware will instantly forward to the Bootloader (i.e. wait for signal until Timeout), if this exceptional condition applies. This enables a safe start of OWL from an Application which has cleared MCUSR and jumps to BOOTSTART. Since the new feature does not interfere with OWL's behaviour regarding "natural" Reset conditions, these can still be differentiated by the Reset Sources filter, as described above. However, the added opcodes will cost 4 more bytes. The feature fits into most ATmegas and ATtinys, but not all. Have a look at changelog.txt to see the list of currently unsupported devices. It has to be pointed out that, apart from that, all devices have the full functionality of current OWL Firmware.
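The order of the two checks can be summarised as a small decision function. This Python model is a sketch of the behaviour as described, not a translation of the Firmware's assembly; the function name and return strings are illustrative assumptions.

```python
def startup_action(mcusr, reset_mask=0x0F):
    """Decide what happens after a start at BOOTSTART (illustrative model).

    mcusr      -- value of the MCU status register at startup
    reset_mask -- allowed reset sources, bits WDRF|BORF|EXTRF|PORF
    """
    if mcusr == 0:
        # No reset flag set: an Application has cleared MCUSR and jumped here.
        return "bootloader"
    if mcusr & reset_mask:
        # A "natural" reset from an allowed source.
        return "bootloader"
    # Reset source filtered out: forward to the Application right away.
    return "application"


if __name__ == "__main__":
    print(startup_action(0x00, reset_mask=0x0E))  # hand-over from Application
    print(startup_action(0x01, reset_mask=0x0E))  # Power-on, filtered out
```

Because the null check comes first, the hand-over works regardless of how the Reset Sources filter has been configured.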
Compatibility Requirements, Precautions, Good Practice:
Philosophy
My hardware/software projects rely on administrative structures as simple as possible, maximum transparency and minimum dependencies. Call me crazy, but I am still convinced that technology should serve mankind and that digital enslavement is not our destiny. Those who use technology self-confidently and with competence, those who know the difference between mutual benefit and exploitation, and those who can forgo useless stuff, will keep their freedom, have more fun with technology and can effectively protect their private lives and business secrets.
Now the OWL Software has reached sort of maturity and features wide range of functionality. Therefore, cryptic command strings, illogical pell-mell case-sensitive syntax, weird dependencies, brainf***in' semantics are inevitable ...
No, no, just kiddin'...! The OWL Software introduces a human-friendly command-line parser and meaningful screen messages. For each commandline option, there are long and short notations available, and both of them are quite memorable. Besides, no case sensitive shit. My parser doesn't even care about the order of arguments (latin semantics). All that the user should remember is the names of the options and the additional parameters needed to perform the desired task. Context-based on-screen help will give advice regarding certain options. Whoever has done anything at the commandline will be able to use this OWL Software intuitively. I would even dare to say: Unlike certain 'dudes', this OWL commandline tool is ready-to-use without GUI frontend!
NOTE: These folders are used by default, when no other path was specified in file references. With newer versions of the OWL Software tool (2020+), the user is free to specify other locations in the context of many functions!
Bootloader Templates: ./templates
Make single Bootloader
When the Software is submitted with a valid AVR device name, it retrieves the corresponding OWL Firmware from the Templates folder. Based on this machine code, it generates a customised version of the firmware according to the port assignments and other specifications found in the commandline. Finally, the Hexfile is extended by some meta info and then saved as a Target file into the folder ./targets.
owl --device=tn2313 --rxport=d0 --txport=d1 --clock=4000 --targetname=Testloader
will produce a single new OWL in the targets folder for the ATtiny2313 that uses PORTD0 for RX-input and PORTD1 for TX-signalling and featuring 4 MHz of clock frequency with naming Testloader00.hex.
Make series of Bootloaders
Just add the option "number" to the respective commandline for Target Make Mode. The Software will then generate the specified number of Bootloaders for the same hardware configuration, with totally different crypto keys and systematic numbering in the filenames.
owl --device=tn2313 --rxport=d0 --txport=d1 --targetname=Testloader --number=10
This will produce 10 Targets named:
These Bootloaders all feature the same technical parameters but individual cryptographic keys. They may be installed onto 10 Target devices all over the world, armored by Lockbits and/or physical means. As long as the respective Target files are kept safe and secret at our site, no one else but us can produce valid Transmissions to the respective Target devices. Customers may only know the Target filename of their device, so they can obtain authorised firmware updates from the provider's download site or by email, by quoting that filename with its serial number. Serial numbering has no correlation whatsoever to the cryptographic key of a Bootloader, since all keys are normally derived from a random process.
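The principle of "systematic names, unrelated random keys" can be sketched in Python as follows. This is not how the OWL Software derives its keys (it uses its own random sources); Python's secrets module serves as a stand-in, and the two-digit suffix is an assumption based on the name Testloader00.hex mentioned above.

```python
import secrets


def make_series(targetname, number):
    """Map systematically numbered Target names to independent 128-bit keys.

    Hypothetical helper: the numbering carries no information about the key.
    """
    return {f"{targetname}{index:02d}": secrets.token_bytes(16)
            for index in range(number)}


if __name__ == "__main__":
    for name, key in make_series("Testloader", 3).items():
        print(name, key.hex())
```

Knowing a Target name therefore reveals nothing about the key stored in the corresponding Target file.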
Make Transmission
The Software expects a valid target name as a reference to the Bootloader for which to generate a valid OWL Transmission. Provided that the respective Target file is found, the program will know its crypto key and all meta info which is needed to calculate an encrypted Transmission with correct timing for this particular device.
Hexfile with write data intended for Flash and/or EEPROM memory of the Target device should be specified.
The serialport argument is only needed if the Transmission shall be sent-out "live", i.e. immediately after the command has been fired. With no serial port specified, the Transmission will go directly into a binary Transmission file (the "OWL Transmission" with extension ".owl") in the folder ./transmissions.
With the new Software, the Transmission is directly written into a file in the order of concatenated crypto sequences as follows:
The OWL Transmission can be forwarded directly to a serial port. Its image file (.owl) preserves all encrypted data and timing information that is needed to replay the same Transmission later on. This is the ideal format for encrypted distribution of Firmware Updates.
Single Transmission
With a valid serial port specified in Transmission Mode, the Software will send out the OWL data stream through the referenced interface, using the default baudrate as specified in the Target file.
owl --targetname=bootloader_m8 --flashfile=program.hex --serialport=COM2
If no serial port was specified, the respective .owl file, which is normally saved to the folder transmissions, can be sent later to a serial port, or sent via network to a different location. The automatic naming scheme combines an ISO timestamp and the original Target name.
Serial Transmission
It is possible to address multiple targets in Transmission Mode. If Targets were systematically named or numbered, we can specify their namespace by wildcards, i.e. "?" or "*". The Software will find all Target files matching the referenced pattern and automatically generate an individual Transmission for each one of them.
owl --targetname=Bootloader0? --flashfile=program.hex
This would capture all Bootloaders matching the search pattern, i.e. "Bootloader00" to "Bootloader09", and make custom Transmissions for each single Target including the Flash firmware update of "program.hex". Respective Transmission files are consequently saved with systematic naming, derived from the original Target names, into the folder ./transmissions for distribution.
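How such a pattern selects Targets can be tried out with Python's standard fnmatch module. This only demonstrates the usual meaning of "?" and "*"; whether the OWL Software's own matcher behaves identically in every detail is an assumption.

```python
from fnmatch import fnmatchcase


def match_targets(names, pattern):
    """Return all Target names matching a wildcard pattern ('?' or '*')."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


if __name__ == "__main__":
    targets = [f"Bootloader{index:02d}" for index in range(12)]
    # '?' stands for exactly one character, so this selects 00 to 09 only.
    print(match_targets(targets, "Bootloader0?"))
```

A pattern like "Bootloader*" would select the whole series instead.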
Audio-Export
The OWL Transmission, consisting of Preamble characters and encrypted data blocks, is already a quite balanced bitstream. It lends itself to trying out some crazily simple, DC-free or even "floating" transmission methods. Soon the idea came up to abuse the PC soundcard as an alternative serial data output!
To keep it short, the option --audioexport has been developed and refined exactly with this intention; transcoding of serial data to a valid PCM file (naming extension .wav), compatible to almost any multimedia-capable platform.
When this OWL Audio is played back over high level, low impedance outputs, the voltage swing is often sufficient to directly drive red/infrared LEDs or Optocouplers. Decoding of this differential signal is possible with very few electronic components. Since the differential audio encoding applies further layer of signal balancing by itself, it could be of interest for the transmission of non-balanced binary data. That's why --audioexport option is no longer restricted to .owl source files. Find more technical explanation below!
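The basic idea of transcoding serial data to PCM can be sketched with Python's standard wave module. This is a simplified illustration, not the actual --audioexport encoding: the 8-N-1 framing is standard, but putting the signal on the left channel and its inverse on the right, as well as the sample rate, samples per bit and amplitude, are assumptions chosen for the example.

```python
import io
import struct
import wave


def uart_bits(data):
    """Frame bytes as 8-N-1: start bit 0, eight data bits LSB first, stop bit 1."""
    bits = []
    for byte in data:
        bits.append(0)
        bits.extend((byte >> position) & 1 for position in range(8))
        bits.append(1)
    return bits


def bits_to_wav(bits, samples_per_bit=4, sample_rate=48000, amplitude=30000):
    """Render bits as 16-bit stereo PCM; the right channel mirrors the left (assumed)."""
    frames = bytearray()
    for bit in bits:
        level = amplitude if bit else -amplitude
        frames += struct.pack("<hh", level, -level) * samples_per_bit
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


if __name__ == "__main__":
    with open("demo.wav", "wb") as handle:
        handle.write(bits_to_wav(uart_bits(b"OWL")))
```

With these example values the effective bit rate is sample_rate / samples_per_bit, i.e. 12000 bits per second.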
Random keys
Assuming that we can make random keys of 128 bits, it is VERY unlikely that two identical keys would ever conflict in the same universe! This improbability drive allows us to generate unique device keys locally and use them in a worldwide dimension without the need to check these keys against some central database. This means more freedom and self-determination for users.
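How unlikely is "VERY unlikely"? A simple union bound gives an upper limit for the chance that any two of n truly random keys coincide: at most n(n-1)/2 divided by 2^128. The following Python lines merely evaluate that bound; the function name and the figure of one billion devices are arbitrary illustrations, and the result presumes keys of full 128-bit randomness.

```python
from fractions import Fraction


def collision_probability_bound(num_keys, key_bits=128):
    """Upper bound on the chance that any two of num_keys random keys are equal."""
    pairs = num_keys * (num_keys - 1) // 2
    return Fraction(pairs, 2 ** key_bits)


if __name__ == "__main__":
    bound = collision_probability_bound(10 ** 9)
    print(f"one billion keys: collision probability below {float(bound):.2e}")
```

The bound only holds if the random source is good, which is exactly the topic of the following paragraphs.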
Apart from that, random keys will solve most of the problems that we've had before in conjunction with password schemes à la TSB. For example, in a hardware setup with multiple Bootloaders hooked to one common programming line, no conflicts nor chicken-egg problems are to be expected anymore, since every OWL Bootloader will have its individual 128 bits address right from the beginning. Yet, on the access layer, those keys are conveniently linked to memorable Target names. For the authorised user, access to all his Bootloaders is absolutely transparent.
Downside: The generation of good random keys is not that trivial. Oh, we've had that topic before ...
However, when it comes to serial generation of OWL-Bootloaders, lots of good random bytes are requested within a small timeframe. It would be reasonable then to have some 'stock of entropy' at hand that won't get exhausted prematurely.
Therefore, the OWL Software creates and makes use of its own Random Pool file named randpool.bin, which is located in the executable's home directory. In the current version, the RP is fixed to 512 bytes of random data. It is refreshed from live-entropy (system timer, system random devices) at least once on every program invocation.
In view of a projected mass-generation of Bootloaders (say, more than 100s of keys in a rush), the file randpool.bin could additionally be refreshed by means of an external True Random Number Generator (such as the XR232USB). [For which a new Software tool is also planned.]
In fact, on a Unix/Linux desktop, there is no desperate need for separate random-pooling, as we got mighty kernel services to provide random data, in particular by way of the very convenient virtual device drivers /dev/random resp. /dev/urandom. IMHO a pretty good and independent solution with built-in entropy-estimation and quality-assurance regarding deliverable randomness. Therefore, all OWL SW since 2019 uses /dev/urandom as the primary random source on a Linux machine.
For lack of any such trustworthy API under 'Windows', the OWL SW still offers an option --randpool to start some separate entropy-collection, which is based mainly on CPU-load fluctuations on the running system and could be seen as an attempt to perform a minimised random-pooling similar to the Linux kernel. In fact, the refined method, as with new versions of OWL SW 2020xxxx, provably does a good job at it. Please refer to the source code in the owlrst-module for further details. Further testing is in progress and feedback strongly appreciated!
Backups
The Software does not make automatic copies. Backups of a complete OWL folder may be easily prepared by means of standard tools, i.e. simply copy the whole contents of the owl-folder including subfolders to an external medium. In particular, the Target files, normally located in the folder ./targets, should be saved on a regular basis.
Security
The Software does not overwrite Target files that already exist.
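As one possible "standard tool", a timestamped archive of the whole folder can be produced with a few lines of Python. This is just a convenience sketch with hypothetical names and paths; a plain file copy does the same job.

```python
import shutil
import time
from pathlib import Path


def backup_owl_folder(owl_dir, backup_dir):
    """Pack the complete OWL folder (incl. ./targets) into a timestamped zip file."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = Path(backup_dir) / f"owl-backup-{stamp}"
    # make_archive appends the '.zip' extension and returns the final path.
    return shutil.make_archive(str(base), "zip", root_dir=str(owl_dir))


if __name__ == "__main__":
    # Example paths only; backup_dir should lie outside of owl_dir.
    print(backup_owl_folder("./owl", "./owl-backups"))
```

Keep the backup location outside the owl folder, otherwise the archive would try to include itself.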
To prevent accidental erasure of Target files by other applications, crucial Target files may be protected by a read-only flag, if applicable. Of course this is no replacement for a Backup regime!
It is assumed that any person having physical access to the machine with the OWL Software and keys, is authorised to do so. Consequently, the OWL Software does not provide means of an additional access control. For example, a master password scheme has been tested and found quite cumbersome.
The mature user is always aware of security implications. For example, computers or network accounts that hold or process personal or confidential data must not be accessible by unauthorised persons, that's the baseline. Being or feeling exposed to threats of espionage or sabotage, we will positively implement some "secured environment".
Crypto-Testing
The OWL Software offers some testing functionality regarding the crypto layer.
Key generator (PRNG): owl --key=Hexstring
Virtual machines
Various constellations with VirtualBox 5.XX have been tested for fun, and it actually was fun, since there were no problems at all with the OWL Software, compiled for the respective Guest machine to run under different Host systems, as soon as the Guest had been granted access to a serial interface on the Host. Most combinations of WinXP/Win7/Ubuntu14/Debian8 ran smoothly. In general, we should not expect best performance in a VM, especially regarding screen output and interface connections! In all setups, the additional abstraction layer caused significant lag on the RS232 transactions. Data sent out was often "stuttering". However, not a single character got lost and ALL Transmissions could be decrypted with no problem. "Emergency operation" of the OWL Software in a VM under a different operating system - check!
Additional notes on VMs: REFRAIN FROM UPGRADING TO VIRTUALBOX 6.xx! Under recent Linux, I have experienced what many people have already reported on the web: Windows Guest systems repeatedly crash or freeze when trying to access serial ports of the host system. In fact, this has never been an issue with VirtualBox 5 or lower.
While a VM that has been assigned one or more serial ports is running, those ports may be BLOCKED on the Host system, and/or undefined behaviour could occur when software nevertheless tries to access those ports.
To-Do's and Bugfixes
Surely the new version of the Software tool will be improved, debugged and optimized even more. Do not hesitate to send me your ideas, reports and criticism! You might also have a look at the changelog.txt, which is part of the download package and explains many of the programming why's and how's. Anyway: Thanks for your ongoing feedback!
Simplest unidirectional RS232-Interface for the One-Way-Loader. Note: Inverted signal logic
Simple & Safe One-Way-Interface: TXD drives an ordinary 4-DIP Optocoupler.
Air gap TXD via LED.
Air gap TXD using FT232
Simple & safe audio interface for differentially encoded stereo transmission
("eXtended Tiny Encryption Algorithm")
(simple stream cipher)
Plain graphics data, unencrypted: Bitmap 200x200 px, 8 bits greyscale, 40 kB (which is 2500 blocks of 16 bytes!).
Encrypted in stupid ECB mode which uses the same keyset over and over. Patterns of plaintext remain visible. (This IS bad!)
Encrypted with continuously running key generator (RST stream cipher). Random result of smooth statistics.
Decrypted with plaintext-key-feedback (RST). Only one bit was flipped: All subsequent data and final checksum corrupted. Error or attack safely detected!
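The effect of plaintext feedback on error propagation can be reproduced with a toy cipher. To be clear, the following Python code is NOT the RST cipher; it is a made-up stand-in that derives its keystream from SHA-256 and feeds each plaintext block back into the generator state, purely to demonstrate the principle described in the captions.

```python
import hashlib

BLOCK = 16  # bytes per block, as in the picture example


def _xor(left, right):
    return bytes(a ^ b for a, b in zip(left, right))


def toy_crypt(key, data, decrypt=False):
    """Toy stream cipher with plaintext feedback (illustration only, not RST)."""
    state = hashlib.sha256(key).digest()
    output = bytearray()
    for offset in range(0, len(data), BLOCK):
        block = data[offset:offset + BLOCK]
        keystream = hashlib.sha256(key + state).digest()[:len(block)]
        result = _xor(block, keystream)
        plaintext = result if decrypt else block
        # Feedback: the next keystream depends on all plaintext seen so far.
        state = hashlib.sha256(state + plaintext).digest()
        output += result
    return bytes(output)


if __name__ == "__main__":
    key = bytes(range(16))
    plain = bytes(64)  # four identical blocks, worst case for ECB
    cipher = bytearray(toy_crypt(key, plain))
    cipher[16] ^= 0x01  # flip a single bit in the second block
    damaged = toy_crypt(key, bytes(cipher), decrypt=True)
    for index in range(0, 64, BLOCK):
        print(index // BLOCK, damaged[index:index + BLOCK] == plain[index:index + BLOCK])
```

Identical plaintext blocks yield different ciphertext blocks here, and after the flipped bit every following block decrypts to garbage, so a final checksum over the plaintext would fail.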
AVR controller that is supported by the OWL
Hardware-Software environment for AVR projects, tool for programming AVRs by way of ISP
Hello-World-program for the respective Controller, e.g. simple LED-flasher, in Intel Hex format (standard)
Not being totally clueless at the command line
Respect, but no fear of Fusebits
Workable RS232 interface (or virtual COM adapter for USB)
( COMx, /dev/ttySx, /dev/ttyUSBx )
OWL download for Linux/Windows
2. Install OWL Software on the PC
Actually, there is nothing to "install". Just unpack the download to the desired location in userspace. Open command prompt and change to the 'owl' directory.
3. Make tailored OWL Firmware
Sample setup: ATmega8 in a typical RS232 set-up (MAX232, FT232) and 8 MHz external crystal
The controller is being connected via MAX232 (or FT232) to the RS232 or USB of the host computer. Such appliances will most likely use the UART component of the controller with their regular firmware, thus being determined to PD0/PD1 for RXD/TXD.
Serial communications should have been tested in this setup prior to the Bootloader install. Also it is assumed, that there exists some LED on PB2 which will give us an optical feedback. The sample firmware ledblink_m8.hex will use that port. You may test it once without Bootloader to verify that it basically works.
Now we make an authorised OWL Bootloader for this Hardware. OWL commandline is:
owl --device=m8 --rxport=d0 --clock=8000 --baud=9600 --targetname=testowl_m8
That's all of the "ultra complicated" process of making a customised Bootloader with unique crypto key. You will find the new Firmware file under: ./targets/testowl_m8.hex
4. Installation of the OWL Firmware (Bootloader)
Now start your preferred ISP-programming software (like "avrdudess", "extreme burner", "TwinAVR") in order to transfer this freshly created Bootloader testowl_m8.hex into the Target chip. Flash should be fully erased before doing so.
And we have to set some Fuses different to the factory defaults. Most important prerequisite for an AVR Bootloader is to enable Flash memory writes (SPM enabled), which is implicit on ATmegas (while on the ATtinys, a certain Fusebit SELFPRGEN must be set-up). In fact, whenever SPM's allowed, it is always recommended to also have the Brown-Out-Detector (BODEN) activated and appropriate voltage level defined. This is to prevent Flash corruption from unsound coldstart conditions. Relevant Fuses for said target device using ATmega8 with external 8-MHz-crystal to start with:
BOOTSZ=10; BODEN=0; BODLEVEL=0; BOOTRST=0; CKSEL=1100; SUT=00
Byte-values: Ext: $FF Low: $8D High: $EC
Note: Fusebit-Calculator makes this a lot easier.
Now we have a workable OWL on this Target device.
5. Transmit your first Transmission
For convenience, our testing firmware ledblink_m8.hex is located right in the owl folder. In this example, the Target is being connected via COM2 (Linux: /dev/ttyS1) to the computer. Timing is not that critical with default Timeout of 1 second on the bootloader and about 1 second of Introductory Preamble on the Transmission. Therefore, you can FIRST fire up the commandline, THEN reset the Target controller:
Some info commands
Short reference on all commandline options:
Full reference on all commandline options:
Detailed reference on submitted options in the Help context (here: 'flasherase', 'timeout' and 'serialport'):
owl --help --flasherase --timeout --serialport
List all devices that are currently supported with firmware templates:
Watch technical data on a certain device:
Watch master data on authorised Bootloaders (example):