One-Way-Loader - compact crypto bootloader for AVRs


Let me present an exceptional AVR bootloader compatible with most of the TinyAVR and MegaAVR series. Unlike all the others, it makes consistent use of unidirectional data transmission. It integrates well into conventional RS232, RS485 or USB-RS232 setups, but also allows for minimalist and/or custom interfaces that use only one input line. Transmission is technically and cryptographically secured. Programming content for EEPROM and Flash may be bundled into one session and exported as a secured firmware distribution.



  • Bootloader covers most of the ATtiny and ATmega series; small footprint below 512 bytes.

  • Extremely robust unidirectional data transfer; lots of interface options from RS232, RS485 to One-Way-Wire, Optical link and Audio.

  • 128-bit block cipher provides for data security, fault detection and unique device identifiers.

  • Software generates a single customized bootloader or a whole series out-of-the-box; manages target devices and keys.

  • Encrypted firmware updates for EEPROM and Flash in one pass; distribution format for a single bootloader or a series of bootloaders.

  • Comprehensive and universal solution with good spirit!


Introduction

Definition: In the world of microcontrollers, a bootloader is a piece of firmware, residing on a controller, that enables regular firmware updates by way of a common standard interface (e.g. RS232), rather than platform-specific programming adapters like ISP or JTAG.

Security becomes a vital issue these days... A simple password-protected bootloader could restrict access to a legitimate group of users. Data content is quite safe within the controller, since read-out of data may be effectively blocked on AVRs by setting the respective Lockbits. In the outside world, firmware code may be available in an unencrypted form; vulnerable and prone to interception, compromise, manipulation or transmission errors. This may be less of a problem in local setups. On the other hand, in the context of embedded applications to be updated in a security-sensitive environment and/or being subject to in-field-updates, we may rather need some idiot-proof and encrypted method of firmware update.

A crypto bootloader provides comprehensive firmware protection. Crypto bootloaders will only accept encrypted transmissions and decrypt data just before writing them into the respective memory segments within the controller. In the best case, unencrypted firmware contents will not occur anywhere except in the secured environment of the developer/publisher and, of course, in one or more target devices. Most crypto bootloaders use symmetric ciphers, mainly for two reasons: symmetric crypto can be implemented even on resource-limited devices, and it features clearly defined access rights. Done right, the crypto bootloader is a key element of integrity assurance, anti-tampering and/or other legitimate interests regarding firmware updates and distribution.

That's how it should be. Naturally, bootloaders place elevated demands on code efficiency, independence, reliability and transparency. The usual "high-level" approach can help to pull garden-variety applications out of a hat, but this is not the best you can get. When it comes to making a really compact and reliable (bootloader) firmware for a microcontroller, direct Assembler programming is the silver bullet. More people should have followed the example of TinySafeBoot. Years ago, this slow-burning little project demonstrated the benefits of a straightforward, resource-saving bootloader for 8-bit AVRs that was not only compact and portable on the firmware side, but also provided a platform-independent software utility that could make customized bootloaders out-of-the-box with no dependencies on any specific toolchain.

The ongoing feedback on TSB agrees with me: Many AVR users, still having their own thoughts and ideas, appreciate independent and transparent solutions. A simple and strong crypto bootloader has been on my wish list, too... Having seen enough of all these "inspiring" examples of so-called security bootloaders, I finally started this little project here...



Idea and concept

The "One-Way-Loader" embodies the following ideas and concepts:
  1. One-way transmission with added value
    Usually, the bootloader on a microcontroller receives data from a computer and writes them into certain memory areas of its host. Transmission in the reverse direction, i.e. the bootloader reads data from memory and sends them back to a computer, is rarely seen in usual bootloader setups. In practice, such read-back features are not needed at all, not even in a developer's environment, since we will most likely use ISP/JTAG there. And, just a reminder, the thing is called bootloader, not bootsaver... To conclude: Pragmatists and end users will be just fine with a write-only bootloader.
    Data will then flow in just one direction, from the computer down to the controller. But why do most bootloaders still demand a full-fledged bidirectional serial interface? This is because their bulky protocols want to send back confirmation characters or checksums every now and then.
    The requirement for a feedback channel means hardware dependencies and restricts the versatility of a bootloader. Many standalone appliances do not provide RS232/RS485 connectivity, and it's hard enough to spare a pair of port lines even for a temporary RS232-TTL breakout on some ATtinys. Well, there ARE some tiny bootloaders available that offer "one-wire" interfacing... halving port requirements at one stroke. But this one wire then has to transmit data in both directions, and it demands a special adapter circuit (CI-V interface) on the computer's side. Unfortunately, the bidirectionality of this one line makes galvanic isolation considerably more complicated.
    Ah, if we could only do without that feedback channel... Such a simplified interface could be assigned to any single port line and fed directly with a unidirectional serial data stream. From One-Wire to One-Way! This imaginary "one-way transmission" would then be a continuous serial data signal, supposedly with improved technical robustness. Its minimized hardware requirements enable plenty of options for minimal and/or custom bootloader interfaces, far beyond conventional RS232. At the same time, the "one-way bootloader" could easily be kept compatible with an existing two-wire RS232/RS485 setup by using only the RXD line on its part.
    Further positive side effects of one-way serial operation are to be expected on the host computer, namely simplified serial interface handling. This will support very robust and cross-platform solutions for firmware-update tools and enable alternative programming channels that do not even depend on native RS232/RS485 or USB. Best of all: This is no longer fiction...
    Spoiler alert: Flow control may be dropped completely if the sender calculates (and maintains) an exact timing for data transmission. Error protection may be done by a sort of cryptographic overall checksum. Quod erat demonstrandum!

  2. Cryptography provides for data protection, data integrity, authentication and unique device identifiers
    Flummoxed readers may want to throw in: "As if it weren't enough with that one-way stuff, does he even want to deal with crypto? This can never work, no way..."
    Well, the correct term was "one way"... Let me assure you, it all works even better with than without crypto!
    For a firmware transmission, there is a zero-fault policy. The method of data transfer shall be quite robust, but even highly controlled technical conditions cannot safely exclude the occurrence of random errors. Yes, that's why error correcting codes have been invented, but these would inflate the data volume and cannot guarantee absolute freedom from errors either. Unfortunately, without any feedback channel, the one-directional bootloader is unable to re-request doubtful data packets, and it cannot have its data verified externally. In fact, all those concepts from the era of modem transmissions are definitely out of place in conjunction with that one-way approach!
    Now cryptography enters the stage. We take a reasonable block cipher, apply a specific method of key feedback, and build a Transmission format of encrypted data that is also armored by a strong cryptographic checksum (similar to a MAC). With the appropriate algorithms, the receiver can unilaterally determine the integrity of the Transmission as a whole. If that super-checksum is OK, the bootloader directly hands over to the new or existing application on the microcontroller. However, if there were errors, the bootloader must erase all faulty data on its own, and the user has to start a new Transmission. Anyway, in the context of firmware updates, the only 'Good Practice' is to start a completely new programming session if any doubts about data integrity have arisen.
    In addition to confidentiality and integrity protection, introducing cryptography has another benefit: As strong cryptographic keys are normally generated at random, we get unique device addresses (in the sense of a UUID) for each of our bootloader-equipped devices. This makes things easier for many bootloader applications, since we no longer have to bother with the weak-password problem, and collisions in a multi-bootloader setup cannot occur!

  3. Compact firmware for most ATtinys and ATmegas (and... who knows)
    Like the predecessor project, this bootloader should run on as many AVR chips as possible. Consequently we won't try any bricolage in "C". We use Assembler! It allows us to combine signal processing, sequence control and crypto functions in highly compact yet transparent code. So, my bootloader firmware doesn't need two whole kilobytes for its amazing functionality... It doesn't even occupy one kilobyte... it gets along with half a kilobyte!

  4. Software tool provides for bootloader generation, bootloader data transfer and Key management
    Concept of the PC software for the crypto bootloader is a plain command line tool. This means minimal dependencies and standardized input-output interfaces. The tool will easily integrate into various development environments, editors and scripting. Of course, the executables may be built for (at least) Linux and Windows systems.
    One main task of a bootloader software tool is to generate encrypted firmware Transmissions for "authorized targets" (i.e. devices for which the user owns the cryptographic keys). This undertaking should not be more complicated than with unencrypted bootloaders. Nobody would remember strong bootloader passwords these days. A convenient and transparent compromise is to access the bootloaders by their "target names". Software will then search a local database for that name to retrieve crypto key and meta data of the associated device.
    The software should also create custom bootloaders out-of-the-box. This has been a popular option with TSB, and in conjunction with the crypto bootloader it makes even more sense to do it automatically, since there is more code to modify. However, we still have the option to modify the source code by hand and make a workable bootloader target file from 'manual' assembly.
    Another aspect, never considered in those occupational therapies for nerds, is already a standard functionality of the software tool: the semi-automatic generation of a series of bootloaders with individual keys and serially numbered names, and, matching perfectly, the export of bulk firmware updates for such a series of bootloaders for distribution purposes. Just in case someone is crazy enough to operate and maintain more than two or three target devices...

  5. Cool naming, cool license
    In keeping with its unique feature of unidirectional secured Transmission, the bootloader was named "One-Way-Loader", alternative spelling "OneWayLoader" or "OWL".
    The One-Way-Loader is Open Source. This is a matter of course, since everything revolves around security and crypto.
    The One-Way-Loader is available under the MIT-License. This is for pragmatic reasons. My time is precious and I can imagine better things than to endlessly discuss licensing terms. The permissive and uncomplicated MIT-license is a security guarantee for all involved parties and allows for a free, voluntary and fair cooperation.

See the Technical details to get hardcore insight.


Just try-out: Quickstart



One-Way-Loader fields of application:
  • Ultimate replacement for TSB that already supports same devices
  • Provide a transparent cryptographic protection for local projects
  • In-field firmware updates using very simple and/or rugged interfaces
  • Secure update channel for EEPROM-only or Flash-only data
  • Bootloader uses existing RS232/RS422/RS485
  • Bootloader uses minimalistic or covert programming channels - for example "air-gap opto"
  • Firmware distribution over public channels (Web, FTP, eMail)
  • Restricted firmware update for very special applications







Technical details

A wise OWL
  1. Timing trick

  2. Autobauding

  3. Logical format

  4. Crypto layer

  5. Firmware

  6. Software

  7. Hardware options




OWL-Transmission (1) - Timing-Trick

Challenge: A bootloader program on a microcontroller cannot receive data and write them to its nonvolatile memory at the same time. For technical reasons, it has to buffer small amounts of data, process them, then prepare for the actual write operation. If there is a crypto layer in between, deciphering will cost additional processor time. But the main showstopper is the physical write access to EEPROM or Flash. In this critical phase, the bootloader firmware must wait for the write access to complete, or the processor simply halts for several milliseconds. This is why interrupt-optimized data reception could not achieve continuous data flow in a bootloader application, but would rather introduce additional complexity and stability risks. So, we have to deal with the fact that in the bootloader application there is always a sort of "deadtime" due to these processing and physical access tasks. This boils down to some kind of stop-and-go transfer.
In a minimum protocol, the Computer sends small blocks of data to the Controller, allowing it to process and write them to EEPROM or Flash in one go. Having finished these operations, the Controller sends back some protocol character, signalling that it is ready to accept further data. Then the Computer may send further data or commands. Such two-way protocols work quite well, but definitely require some sort of feedback channel. How do we get rid of this dependency?

Solution: The microcontroller is a quite deterministic system. If the Computer knows some technical parameters, it can calculate the Controller's deadtimes in advance to make a "timed" serial transmission. Unfortunately, in a modern RS232 implementation, abstraction layers and their latency effects make it nearly impossible to send out serial data with exactly defined "pauses". What we can do instead is keep on transmitting and fill up the necessary "pauses" with dummy characters. These are normal serial characters that provide precisely defined minimum intervals between actual blocks of data. This way, the bootloader is granted enough time to process each individual block of data. Getting back "on-line", the bootloader will see some of the last filling characters passing by, which enables it to instantly re-synchronize to the serial data stream and then catch the next block of valid data. No feedback crutches needed anymore!
The OWL signal is a one-way transmission of RS232 characters in the mode 8-N-1 at chosen baud rate.
The OWL signal may be sent from any standard-compliant RS232 interface.
  • It transmits unidirectionally from the Computer ("Sender") to the Microcontroller ("Receiver").
  • It sends encrypted data to ensure confidentiality and integrity.
  • It accounts for the receiver's deadtimes by filling up pauses with a calculated number of filling characters.
  • These "Preambles" constitute guaranteed minimum intervals of time between data blocks.
  • The preambles also allow the receiver to re-synchronize and autobaud when catching up.
  • This one-way transmission does not need a return channel for flow control and error detection.
  • This one-way transmission places minimum requirements on the RS232 hardware and software.
  • The continuously transmitted signal is particularly robust in technical terms.
Timing diagram: timeline of a short OWL Transmission for an ATtiny25 target running at 10 MHz
(4 blocks of EEPROM data, 8 blocks of Flash data, transmission speed of 9600 baud)

General: The diagram was derived from a real OWL Transmission, which is a continuous stream of serial data in the format described above. Different data types have been highlighted. At a glance we can see that the PREAMBLE runs have very different lengths, since they take into account the individual processing times and deadtimes of the target controller in the respective operating mode.

Block time: Every data block consists of 16 bytes plus one starting character. At given baudrate, the block transmission time is therefore a constant. For example, with 9600 baud, each block will take about 18 milliseconds (tB).
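The block time quoted above can be re-derived with a few lines of Python. This is only a sketch of the arithmetic; the function name is made up for illustration, and the only inputs are the figures from the text (17 characters per block, 8-N-1 framing, i.e. 10 bit cells per character).

```python
BITS_PER_CHAR = 10    # 8-N-1: one start bit, eight data bits, one stop bit
CHARS_PER_BLOCK = 17  # one blockstart character plus 16 data bytes

def block_time_ms(baud: int) -> float:
    """Transmission time tB of one data block in milliseconds."""
    return CHARS_PER_BLOCK * BITS_PER_CHAR * 1000 / baud

print(round(block_time_ms(9600), 1))  # about 17.7 ms, i.e. "about 18 milliseconds"
```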

Introductory Preamble: At the very start of a Transmission, the Receiver must wait for or step into an already running Preamble, and initially synchronize to that signal. The INTRO PREAMBLE looks comparatively short in this presentation, but could be extended as much as necessary to allow comfortable manual coordination when starting Sender and Receiver.

Block decryption time: Data decryption consumes computation time. At clock frequencies of some MHz, this "decryption time" (tD) is in the range of a few milliseconds. Each block must be decrypted, so the decryption time defines a minimum duration of those Preambles. (To be safe, there should always be some extra preamble characters to compensate for runtime deviations and to allow the receiver to re-synchronize and autobaud.) Just to mention, this is not necessarily the longest delay that occurs in a crypto bootloader. Some physical writes into EEPROM and Flash will consume much more time.

Authentication sequence (S1): The first three consecutive blocks authenticate the Sender to the Receiver. As this step takes place only in memory and is not computationally demanding, minimum preambles (tD) are sufficient in this sequence.

EEPROM sequence (S2): EEPROM memory can be overwritten directly, but this takes several milliseconds per byte, so that the write time for a block of 16 EEPROM locations sums up to a whopping 60 milliseconds (tEW). Fortunately this is only the case for EEPROM data that actually have to be written. Those larger Preambles between EEPROM blocks are clearly visible in the diagram.

Flash sequence (S3): A pretty long preamble of about 180 ms (tFE) follows the first payload block of Flash data. This is because of the Flash Erase cycle that is necessary before Flash can be overwritten with new data. Once Flash Erase is accomplished, the Flash session gets into gear. But you might have noticed that the Preambles between Flash data blocks are slightly different. This is due to the organization of Flash memory in the microcontroller. In this example, the target chip (ATtiny25) features Flash memory pages of 32 bytes each. Since the crypto transmission was standardized to blocks of 16 bytes, the bootloader program must collect two successive blocks before the next Flash Page can be written. Accordingly, the Sender inserts the enlarged preamble that also accounts for the Flash write time (tFW) only after every second block. (Note: Controllers with bigger Flash usually have larger page sizes, but always a multiple of 16. On such devices, the Flash Write cycle will appear on every 4th, 8th or 16th block, thus optimizing the time needed for Flash Erase and Flash Write operations.)
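The pattern of short and long guard intervals in the Flash sequence can be sketched as follows. This is an illustration, not the OWL software: the function name is hypothetical, tD = 3 ms and tFW = 5 ms are invented placeholder values, and the sketch assumes that the payload fills whole Flash pages. Only tFE of about 180 ms and the 32-byte page size of the ATtiny25 are taken from the text.

```python
BLOCK_SIZE = 16  # bytes per OWL data block

def flash_deadtimes(n_blocks, page_size, t_d, t_fe, t_fw):
    """Dead time that has to follow each Flash data block (same unit as inputs)."""
    if page_size % BLOCK_SIZE:
        raise ValueError("page size must be a multiple of 16 bytes")
    blocks_per_page = page_size // BLOCK_SIZE
    times = []
    for i in range(n_blocks):
        t = t_d                            # every block must be decrypted
        if i == 0:
            t += t_fe                      # first data block triggers Flash Erase
        if (i + 1) % blocks_per_page == 0:
            t += t_fw                      # a full page is collected: page write
        times.append(t)
    return times

# 8 Flash blocks on a device with 32-byte pages (values in ms, partly assumed)
print(flash_deadtimes(8, 32, t_d=3, t_fe=180, t_fw=5))
```

With 32-byte pages the enlarged interval shows up after every second block; with 64-byte pages it would move to every fourth block.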

On the calculations: In order to perform such customised timing, the Sender needs some background knowledge on the intended Receiver. In addition to the correct crypto key, the Sender must know platform specific constants like controller type, the average number of processor cycles needed for the decryption algorithm, and of course the absolute clock frequency on the target device. See section Software for further details.
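How a dead time might be turned into a preamble length can be sketched like this. The helper names, the safety margin of three characters and the cycle count used in the example are assumptions for illustration; the actual OWL software may calculate this differently.

```python
import math

BITS_PER_CHAR = 10  # 8-N-1 framing

def decrypt_time_s(cycles: int, f_cpu_hz: int) -> float:
    """Decryption time tD from an (assumed) cycle count and the target clock."""
    return cycles / f_cpu_hz

def preamble_chars(deadtime_s: float, baud: int, margin_chars: int = 3) -> int:
    """Number of filling characters that cover a dead time, plus a margin
    that lets the receiver re-synchronize and autobaud (margin is assumed)."""
    return math.ceil(deadtime_s * baud / BITS_PER_CHAR) + margin_chars

# EEPROM block write time of about 60 ms at 9600 baud
print(preamble_chars(0.060, 9600))
# hypothetical 30000 decryption cycles on a 10 MHz target
print(preamble_chars(decrypt_time_s(30000, 10_000_000), 9600))
```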
Sounds good... literally: OWL-Transmission, 1 kB of payload, 9600 bits per second transcoded to an audio file





OWL-Transmission (2) - Synchronization and Autobauding

The Sender inserts a number of preamble characters between data blocks to provide accurate guard intervals that allow the Receiver to process previous data in time.
When the Receiver gets back on-line, the last characters of the current Preamble are just passing by. The Receiver thus has the opportunity to re-synchronize and adapt to the actual baudrate of the serial transmission, enabling technically robust reception of the following data.

Re-synchronisation and autobauding right before re-entry to data reception
Determinations:
  • Preamble character ($CC, binary 11001100) with a special bit pattern allows for synchronization and autobauding.
  • Blockstart character ($55, binary 01010101) marks the end of the preamble and the beginning of the data block.

(Pre-)synchronization:
  • The Receiver may drop asynchronously into an ongoing transmission and find one of the following conditions:
    • Low level, i.e. "start bit" or "0-bitcell", is always waited out until the line goes High again.
    • High level, i.e. "Idle", "stop bit" or "1-bitcell", is waited out until the line goes Low.
    • The onset of that Low phase is the starting signal for the first measurement cycle.

Autobauding and Frame-Synchronization:
  • The preamble character features 2 continuous Low phases with a duration of 3 and 2 bitcells.
  • The Receiver measures the duration of two consecutive Low phases.
  • The second value is subtracted from the first value that has been measured:
    • A positive result ( 3 - 2 = 1 ) means that both Low phases were captured within one character frame. The receiver is obviously in sync and the resulting value can be taken directly as a timing reference.
    • A negative result ( 2 - 3 = -1 ) indicates that these Low phases were captured from separate characters. The receiver is not really in sync yet. It skips the next High-Low transition and then simply starts a new double measurement, which provides the correct result and leaves the receiver in sync.
  • This differential method provides an error-compensated measurement of one bitcell's duration.
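The measurement rules listed above can be tried out on a simulated signal. The following sketch samples an idealized preamble run at a fixed number of ticks per bit cell and lets a receiver drop in at an arbitrary position. It is a model of the described procedure, not the firmware: function names are invented, and real signals would add edge distortion that the subtraction partly cancels.

```python
PREAMBLE_FRAME = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]  # $CC as 8-N-1 frame, LSB first

def preamble_samples(n_chars, ticks_per_bit):
    """Idealized line levels of a preamble run, one list entry per timer tick."""
    return [level
            for _char in range(n_chars)
            for level in PREAMBLE_FRAME
            for _tick in range(ticks_per_bit)]

def measure_bitcell(samples, start):
    """Return the measured duration of one bit cell in ticks."""
    pos = start

    def wait_out(level):
        nonlocal pos
        ticks = 0
        while pos < len(samples) and samples[pos] == level:
            pos += 1
            ticks += 1
        if pos >= len(samples):
            raise TimeoutError("signal ended before the next edge")
        return ticks

    wait_out(0)                    # pre-sync: a Low phase we landed in is waited out
    wait_out(1)                    # then High is waited out until the line goes Low
    while True:
        first = wait_out(0)        # first Low phase
        wait_out(1)
        second = wait_out(0)       # second Low phase
        if first - second > 0:     # 3 - 2: both phases from the same character
            return first - second
        wait_out(1)                # 2 - 3: skip the next High-Low transition ...
        wait_out(0)
        wait_out(1)                # ... and start a new double measurement

samples = preamble_samples(6, ticks_per_bit=104)
print({measure_bitcell(samples, s) for s in range(0, 2080)})  # {104}
```

Whatever the entry point within the first two characters, the result is one bit cell of 104 ticks; entry points that first capture the 2-cell Low phase simply need one extra character.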

Data reception:
  • After the Synchro-Autobaud procedure, all subsequent characters are received by the software decoder with a good timing reference. The receiver...
    • Discards further preamble characters until the blockstarter arrives;
    • Decodes and buffers the 16 data bytes following the blockstarter;
    • Decrypts and processes the data in the buffer.
  • In the meantime, the Sender resumes sending Preamble characters.
  • As soon as the Receiver returns on-line, it finds the current preamble still running (if calculated generously) and instantly starts the next measurement cycle of Synchro-Autobauding. The circle closes.
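On the character level, the framing described in this section can be modelled in a few lines. This sketch ignores timing altogether (a real Receiver is off-line for part of each preamble) and the names are hypothetical; raising an error on an unexpected character inside a preamble is an assumption, since the text only says that preamble characters are discarded.

```python
from itertools import islice

PREAMBLE = 0xCC     # filling character
BLOCKSTART = 0x55   # marks the beginning of a data block
BLOCK_SIZE = 16

def frame_block(block: bytes, preamble_chars: int) -> bytes:
    """Sender side: preamble run, blockstart character, 16 data bytes."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("OWL blocks carry exactly 16 bytes")
    return bytes([PREAMBLE] * preamble_chars + [BLOCKSTART]) + block

def receive_blocks(stream: bytes):
    """Receiver side: discard preambles, yield the 16 bytes after each blockstart."""
    it = iter(stream)
    for ch in it:
        if ch == PREAMBLE:
            continue
        if ch != BLOCKSTART:
            raise ValueError(f"unexpected character 0x{ch:02X} in preamble")
        block = bytes(islice(it, BLOCK_SIZE))   # data bytes are taken verbatim
        if len(block) != BLOCK_SIZE:
            raise ValueError("stream ended inside a data block")
        yield block

blocks = [bytes(range(16)), bytes([0xCC] * 8 + [0x55] * 8)]
stream = frame_block(blocks[0], 4) + frame_block(blocks[1], 60)
print(list(receive_blocks(stream)) == blocks)  # True
```

The second test block deliberately consists of $CC and $55 bytes: once the blockstarter has been seen, the next 16 characters count as data regardless of their value.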

Goodies:

  • We get all the benefits of a software UART receiver with autobauding. Any existing port pin can be used as the input terminal. Any valid transmission speed within a wide range is workable for serial communications. In fact, serial communication is nearly independent of the exact clock frequency of the controller.
  • The procedure is quite flexible. It can wait for a signal, and sit out interruptions in the transfer, until the Timeout expires. It can also step right into an already running Preamble. (In fact, the latter is the normal case during OWL Transmissions. Faulty synchronization or ambiguities have never been observed with this method.)
  • Only the active-low phases are measured. A "stuttering" transmission (occasional buffer underrun) does not affect accuracy or reliability of the measurement.
  • The procedure is very robust. Differential measurement can compensate, to a certain extent, for slightly distorted signals with asymmetric rise and fall times on their edges.
  • Preambles, Blockstarters and Data are digitally balanced. Assuming continuous transmission without longer periods of "Idle" state, the digital sum value of the OWL signal stays exactly midway between Low and High level, which is a favourable property with regard to DC-free or differential interface circuits and generally improves robustness on imperfect real-world channels.
  • Synchro-Autobauding is repeated before each block of data, thus providing improved stability in long transmissions that might otherwise be affected by unstable or drifting clock frequencies (on both sides).
  • Wide baudrate window. The current programming makes good use of the 16-bit counters so that a wide range of valid baudrates is available. See Table.
  • Synchronization and Autobauding get by with 1 to 2.5 preamble characters. Preambles may be calculated with very little margin when using clock frequencies with small tolerance. (This allows for an even more time-optimized Transmission that runs significantly faster than any two-way bootloader session with an optimized minimum protocol at the same baudrate. This is due to the fact that, on the PC side, all the delays normally associated with serial port handling of data direction and receive buffer management can be dropped without replacement!)
Note: For the purpose of this documentation, all RS232 signals are depicted in the same unipolar logic that the microcontroller's UART would normally expect (e.g. coming from MAX232 or FT232). The logical "1" (stop bit or Idle) is represented by a logical High (3.3 or 5 volts), a logical "0" corresponds to Low (0 volts). Incidentally, the OWL firmware can also be configured for inverse signalling. In some cases, this could simplify the hardware interface even more.

Performance

The following table gives an overview of what is feasible with the Synchro-Autobaud procedure described. In a test setup with an ATtiny2313-20PU and a MAX232 at 5 volts, many different controller clock frequencies and baudrates have been tried out. The test Transmission contained data samples of 2 x 64 bytes for the EEPROM and a simple LED flasher for the Flash, the latter padded with approximately 1 KB of random data to challenge data integrity and test some other features as well.
Before every trial, the AVR was erased via ISP command to exclude false positive results.

Evaluation: If the LED started flashing immediately after the transmission was complete, the entire transmission must have passed through completely and without any errors. In all other cases, the attempt has failed. In a few situations, I read back the memory contents via ISP to check. This confirmed that the error handling of the current bootloader firmware works quite reliably.

Viable baudrates at different clock frequencies

MCU clock (kHz)    Baud min.    Baud max.
16                 < 30         100
128                30           450
500                50           900
1000               100          1800
2000               200          3600
3000               300          4800
3560               450          7200
4000               450          7200
4433               450          9600
6000               450          14400
8000               600          14400
10000              600          19200
12000              1200         28800
14745              1800         38400
16000              2400         38400
17734              2400         57600
24000              3600         76800
27256              4800         115200
30000              9600         115200

Remarks:
  1. This table is not applicable to TSB or similar bootloaders. OWL has a wider range of workable baudrates!
  2. The established rule of thumb for the valid baud range is (clock in kHz): Clock / 10 < Baud rate < Clock * 2
  3. If the interface provides very distorted logic levels, there may be problems with the highest baud rate. The next slower baud rate should work then.
  4. The lower limit (Baud Minimum) is determined by the fact that the program uses a 16-bit counter for pulse width measurement; if the baudrate is too slow, the counter overflows.
  5. The upper limit (Baud Maximum) is the result of increasing inaccuracy when the measured signal is quite fast in relation to the processor's clock.
  6. Clock frequencies over 24 MHz were fed by an external oscillator, as no special circuitry for overtone crystals was available in that setup. Also note that the ATtiny2313-20PU used is "basically" specified only up to 20 MHz...
  7. Only standard baud rates have been used in the test. With a USB-COM converter (like FT232), non-standard baudrates may be used as well. This should be no problem for the Autobauding either.
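The rule of thumb from remark 2 is easy to wrap into a quick plausibility check. The function names are made up, and the rule is only an approximation: some measured limits in the table lie slightly outside the window it predicts, so the table takes precedence.

```python
def baud_window(clock_khz: float):
    """Rule-of-thumb limits: clock/10 < baud < clock*2, with the clock in kHz."""
    return clock_khz / 10, clock_khz * 2

def baud_plausible(clock_khz: float, baud: float) -> bool:
    """True if the baud rate lies inside the rule-of-thumb window."""
    low, high = baud_window(clock_khz)
    return low < baud < high

print(baud_window(8000))           # (800.0, 16000)
print(baud_plausible(8000, 9600))  # True
print(baud_plausible(1000, 9600))  # False: too fast for a 1 MHz clock
```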




OWL-Transmission (3) - Logical format

Preambles implement tailor-made timing on the transport layer, thus eliminating the need for a feedback channel for flow control. Preambles also provide the Receiver with a reference signal for (repeated) Synchronization and Autobauding.

Those technical properties on a signal level do not really matter for the logical format of an OWL-Transmission. On the crypto layer, only the data blocks of 16 bytes are relevant. They provide encrypted data and sequence control.

All blocks are encrypted with RST. A so-called "RST sequence" consists of an initialization vector (IV), the actual message content (in one or more blocks), and a concluding block (VI, which is a replica of IV). Because of the chosen key-feedback mode, every block is cryptographically dependent on all blocks before it. By means of a simple IV-VI comparison, the legitimate Receiver can clearly determine whether all previous data blocks have been completely and correctly received and decrypted. These properties are thoroughly exploited by the OWL bootloader.

The OWL transmission consists of three such RST sequences. They contain, always in the same order, Authentication data, EEPROM data and Flash data. The key generator is not reset between sequences. Thus, not only do the blocks of one sequence depend on each other, but each sequence is also cryptographically dependent on the previous one. This allows for a basic sequence control that is absolutely satisfactory for a One-Way-Loader!
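The effect of key feedback and the IV-VI comparison can be demonstrated with a toy model. To be clear: this is NOT the RST cipher and not secure; a SHA-256 based keystream with plaintext feedback merely stands in for it, and all names are invented. It only shows that any damaged block, or a wrong key, makes the final VI fail to match the IV.

```python
import hashlib

BLOCK = 16

def _keystream(state: bytes) -> bytes:
    return hashlib.sha256(b"ks" + state).digest()[:BLOCK]

def _feedback(state: bytes, plain: bytes) -> bytes:
    # next key state depends on the current state AND the plaintext just processed
    return hashlib.sha256(b"fb" + state + plain).digest()

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_sequence(state, iv, data_blocks):
    """Encrypt IV, data blocks and VI (a replica of IV). Returns (cipher_blocks, state)."""
    out = []
    for plain in [iv, *data_blocks, iv]:
        out.append(_xor(plain, _keystream(state)))
        state = _feedback(state, plain)
    return out, state

def decrypt_sequence(state, cipher_blocks):
    """Returns (data_blocks, state, ok); ok is the result of the IV-VI comparison."""
    plains = []
    for cipher in cipher_blocks:
        plain = _xor(cipher, _keystream(state))
        plains.append(plain)
        state = _feedback(state, plain)
    ok = len(plains) >= 2 and plains[0] == plains[-1]
    return plains[1:-1], state, ok

key = b"k" * 16
cipher, _ = encrypt_sequence(key, bytes(range(16)), [b"A" * 16, b"B" * 16])
print(decrypt_sequence(key, cipher)[2])       # True
cipher[1] = bytes([cipher[1][0] ^ 1]) + cipher[1][1:]
print(decrypt_sequence(key, cipher)[2])       # False: one flipped bit spoils the VI
```

Because the returned key state is carried over, a following sequence decrypted with it would also fail after an error, which mirrors the dependency between sequences described in the text.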

The OWL transmission cycle (see also diagram):
  • S1: Authentication
    • IV = random data, modifies the original key for all successive rounds
    • No data block = lock-up (blockade)!
    • Exactly one data block = serves key feedback only
    • VI not recognized = lock-up without Timeout!
    • VI recognized = proceed to S2
  • S2: EEPROM data
    • IV = random data, modifies the key state after S1
    • No data blocks, direct VI = proceed to S3
    • Data blocks = bundle for EEPROM write
    • VI not recognized / Timeout = Error, freeze in S2 with NO Timeout.
    • VI recognized = proceed to S3

  • S3: Flash data
    • IV = random data, modifies the key state after S2
    • No data block, direct VI = Finished, immediately start Application firmware!
    • At least one data block = initiate Flash Erase, collect block data for Flash Page Writes
    • VI not recognized / Timeout = Error, Flash Erase!
    • VI recognized = Success, entire transmission error-free, immediate start of Application firmware!
Normal case: Sender has used the right crypto key and the Transmission came through without any disturbance. So the Receiver was able to decrypt each single block and finish each sequence correctly. Finally he detected the last VI in S3. At this moment, it is clear that all previous data must have been complete and error-free. Whole session was successful and the OWL firmware passes control to an existing or newly written application firmware. Indirect feedback: Immediate start of the application.

Exceptional cases: Transmission errors; signal interruptions, excessive pauses (Timeout); loss of synchronization; accidental use of a wrong key; intentional use of a wrong key (addressing multiple receivers on the same line); crude manipulation attempts... All of these will be recognized by the Receiver as an error condition. Depending on which sequence it is currently in, the Receiver will take appropriate action. Indirect feedback: Blockade until hardware reset.
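
The cycle above can be sketched as a small state machine. This is only an illustration in Python, not the assembler firmware: it works on blocks that are assumed to be already decrypted, it models a decryption error simply as "the VI never shows up", and the names Outcome and run_owl_session are hypothetical. Requiring exactly one data block in S1 is my reading of the list above. The start-up phase before authentication (plain Timeout, then start of the Application) is not modelled.

```python
from enum import Enum


class Outcome(Enum):
    START_APPLICATION = "start application"
    BLOCKED = "blocked until hardware reset"
    BLOCKED_FLASH_ERASED = "flash erased, blocked until hardware reset"


def run_owl_session(decrypted_blocks):
    """Walk through S1, S2, S3 over a list of already-decrypted 16-byte blocks."""
    blocks = iter(decrypted_blocks)
    flash_touched = False
    for stage in ("S1", "S2", "S3"):
        iv = next(blocks, None)          # first block of each sequence is its IV
        data_count = 0
        vi_seen = False
        if iv is not None:
            for block in blocks:
                if block == iv:          # replica of the IV closes the sequence
                    vi_seen = True
                    break
                data_count += 1          # anything else counts as a data block
                if stage == "S3":
                    flash_touched = True  # first Flash block triggers the erase
        # Assumption: S1 must carry exactly one data block (key feedback only).
        if not vi_seen or (stage == "S1" and data_count != 1):
            if flash_touched:
                return Outcome.BLOCKED_FLASH_ERASED
            return Outcome.BLOCKED
    return Outcome.START_APPLICATION
```

A session with an empty EEPROM sequence and an empty Flash sequence (IV directly followed by VI in S2 and S3) ends in START_APPLICATION without any write access, which matches the "authentication only" use of the loader.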

Remarks: It has to be pointed out that this elaborate format for bootloader transmission comes without control characters and commands. It uses neither header blocks nor other metadata that could give rise to a known-plaintext attack. Au contraire, additional entropy is injected with every new sequence. Although this is a very simple and static format, accessing EEPROM or Flash remains optional, since it is possible to write only EEPROM or only Flash contents without touching the other memory area. It is also possible to bundle complete firmware updates consisting of data for EEPROM and Flash in the same transmission or transmission file.



Cryptography

The One-Way-Loader makes use of a block cipher called "RST" ("Randomized Substitution-Transposition"). This was once a personal study on block encryption. RST defines not only a simple but strong block cipher, but also a specific method of key vectoring, key feedback and a preferred file format. This crypto was conceived with practical applications in mind, rather than academic fame or mathematical elegance.
On the PC platform there are a couple of "well-established" crypto alternatives available, with which the buzzword believers might be better off. Objectively, many of these algorithms provide better data throughput than my small modular cryptosystem, though not necessarily better cryptographic strength.
In the bootloader application, priorities are slightly different. It turned out that, in comparison with other candidates, this RST thing really looks like a pretty good compromise between code efficiency and security in a microcontroller environment.


How RST works


For its "randomized substitutions and transpositions", the algorithm makes use of very simple operations, such as addition, bitshifting, inverting, and swapping, which are the foundation of well-established block ciphers too. However, RST does not make use of fixed keys or lookup-tables. All arithmetic steps are being controlled by dynamically generated "vectors" coming from a pseudo random number generator.
This PRNG is to be loaded with the secret key as the starting number (seed) at the beginning of a crypto sequence. The PRNG is in fact the "key component" of the crypto system. It is essentially interchangeable and may be implemented according to different quality criteria and requirements on speed and code effectiveness (microcontroller applications).
The block round of RST achieves a medium to good avalanche effect even with the minimum number of rounds. That is, a single "flipped" bit in the ciphertext will affect between 10 and 50 percent of the resulting plaintext after decryption. However, after each block round, the inner state of the PRNG is modified by the decrypted plaintext of the previous block. This results in massive error propagation, especially when using a cryptographically strong PRNG with nonlinear characteristics. This avalanche is the basis for the system's good failure detection and authentication mechanisms.


Figure: RST crypto scheme (encryption, decryption, rolling key, key feedback from plaintext, error propagation, integrity check). Continuous key modification and plaintext-keystate feedback in RST encryption/decryption.

RST file format

For PC applications, a logical format has been developed that allows for an authenticated and cryptographically secure file transfer. The whole RST cryptogram consists of encrypted blocks that are not discernible from random data. In particular, it does not feature any header blocks or other non-randomized data. When decrypted correctly, the RST sequence comprises three sections:
  •   IV = block with initialization vector
  •   DATA = block or blocks with message (payload)
  •   VI = block with end signature (MAC, here: recurrence of IV)
The first block is the only block that is encrypted with the initial key. This first block consists of random numbers. As a result of the feedback onto the PRNG, the next message block is processed with a completely new and random session key. In this respect, the first block in an RST sequence has the function of a secret initialization vector or IV. Trivial attacks on the crypto are futile, since there is no "shortcut" for a cryptanalytic attack on encrypted random numbers!

The IV block has yet another function. Its random pattern will most likely never occur in any of the message blocks. We may use this block as a marker for the end of the message and as a sort of overall checksum for the sequence as a whole. For functional distinction, this repetition of the IV is called "VI" in RST nomenclature. What the legitimate Receiver must do is copy the first block it has decrypted in a session to a separate buffer, then compare all subsequently decrypted blocks with this VI block. This enables the Receiver to make the following decisions:
  • If the Receiver finds that the current decrypted block is different from VI, it is assumed to be an ordinary data block that may be processed (e.g. saved to disk). However, at this time, the Receiver can not know if this data is actually sound.
  • If the Receiver detects that the current block exactly matches the VI, it knows with utmost certainty that the Transmission is finished and all previously transmitted data blocks were authentic, complete and error-free.
  • In all other cases, the end of the file will be reached without ever recognizing the VI. This is the ERROR condition.
Instead of error localization and error correction, we use reliable error detection. Incorrectly decrypted files are useless anyway. Since it is the duty of network transport layers to achieve an error-free transmission of any data, it would in fact be a waste of time to deal with error-correcting codes on a file level. The best we can do in the face of a corrupted file is to repeat the respective download or e-mail transmission. For the legitimate recipient, a successful RST decryption ensures the basic properties of authenticity, completeness and message integrity. This is completely sufficient in many fields of application.
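
To make the IV-VI mechanism and the effect of plaintext feedback tangible, here is a minimal Python sketch. It deliberately does NOT use the RST block round or the RST key generator: as a stand-in, the 16-byte key state is run through SHA-256 to get a pad and a next state, and each block is simply XORed with that pad. All function names are hypothetical. Only the structure (IV, data, VI, plaintext fed back into the key state) follows the description above.

```python
import hashlib
import os

BLOCK = 16


def pad_and_next_state(state):
    """Stand-in key generator (assumption): SHA-256 of the 16-byte state."""
    digest = hashlib.sha256(state).digest()
    return digest[:BLOCK], digest[BLOCK:]


def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def encrypt_sequence(key, payload_blocks):
    iv = os.urandom(BLOCK)                    # secret random IV, repeated as VI
    state = key
    cipher_blocks = []
    for plain in [iv, *payload_blocks, iv]:
        pad, state = pad_and_next_state(state)
        cipher_blocks.append(xor_bytes(plain, pad))
        state = xor_bytes(state, plain)       # plaintext feedback into key state
    return cipher_blocks


def decrypt_sequence(key, cipher_blocks):
    """Return (payload_blocks, vi_recognized)."""
    state = key
    vi = None
    payload = []
    for cipher in cipher_blocks:
        pad, state = pad_and_next_state(state)
        plain = xor_bytes(cipher, pad)
        state = xor_bytes(state, plain)       # same feedback as on the sender side
        if vi is None:
            vi = plain                        # first block: remember as VI
        elif plain == vi:
            return payload, True              # VI recognized: sequence is sound
        else:
            payload.append(plain)             # ordinary data block (so far unverified)
    return payload, False                     # end reached without VI: ERROR
```

Flipping a single bit in any ciphertext block changes the decrypted plaintext, and with it the key state for all following blocks, so the final block no longer decrypts to the VI. The same happens with a wrong key.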

"OWL-RST"

The One-Way-Loader uses a modified 128-bit variant of RST. Block and key size are fixed to 128 bits (16 bytes).
The OWL Transmission consists of three contiguous RST sequences. These are cryptographically dependent on each other, as the PRNG is never reset within the same session. Intermediate VIs provide the bootloader with valuable control information (i.e. stepping forward to the next logical sequence), and the very last VI constitutes a strong overall checksum on the whole Transmission.
With OWL-RST, the recommended PRNG is the software implementation of a classic 128-bit LFSR with Galois feedback taps on bit positions 128, 127, 126 and 121 for a maximum-length sequence, extended by a so-called "Self-Shrinking" filter function. This significantly hardens the LFSR sequence in certain cryptographic respects and introduces a higher degree of non-linearity in connection with the key feedback. For further details see Firmware.
Since RST is not a Feistel cipher, pseudo-random vectors must be applied in reverse order for encryption and decryption respectively. That is to say, one side has to buffer the whole pseudo-random sequence for a complete block round, while the counterpart can still draw its random vectors directly from the PRNG. Naturally, in this unidirectional crypto application, the memory- and code-saving variant has been assigned to the microcontroller.


Block encryption (pseudo code):

load PRNG with initial key

# PRNG()             pseudo random number generator, consecutively clocked, similar to RND()
# RANDSET[0...63]    an array to keep 64 preloaded PRNG vectors to be read in reverse order
# BUFFER[0...15]     contains working data, initially loaded with plaintext block

state = 0
for i = 0 to 63              # preload: RANDSET[i] is the i-th vector, as used in decryption
    RANDSET[i] = PRNG(state)
    state = state + 1
next i

for r = 3 to 0           # outer rounds counting down 3,2,1,0
    for x = 15 to 0      # inner block cycle counting down 15 to 0
        y = RANDSET[r*16 + x]   # read vectors from RANDSET in reverse order 63 to 0
        ax = BUFFER[x]
        ay = BUFFER[y]
        invert ay
        ax = ax - ay            # substitution
        right-shift ax, rotate bit 0 to bit 7   # bitshift permutation
        BUFFER[y] = ax
        BUFFER[x] = ay          # byte-swap elements
    next x
next r

BUFFER[0...15] = encrypted block

Block decryption (pseudo code):

load PRNG with initial key

# BUFFER[0...15]         encrypted block
# PRNG()                 deliver 4-bit pseudorandom vectors, according to state

state = 0
for r = 0 to 3           # outer rounds
    for x = 0 to 15      # inner block cycle
        y = PRNG(state)
        state = state + 1
        ax = BUFFER[y]   # byte-swap elements
        ay = BUFFER[x]
        arithmetic left-shift ax, shift zero to bit 0, bit 7 into Carry
        ax = ax + ay + Carry
        invert ay
        BUFFER[x] = ax
        BUFFER[y] = ay
    next x
next r

BUFFER[0...15] = decrypted block
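
For those who want to play with the block round on a PC, the two pseudo code listings can be transcribed to Python roughly as follows. This is a sketch under assumptions: the 64 four-bit vectors are passed in as a plain list (vectors[i] being the i-th vector the key generator delivers), and in the usage note they come from Python's random module as a stand-in, not from the LFSR key generator. Function names are hypothetical.

```python
ROUNDS = 4
BLOCK = 16


def encrypt_block(plain, vectors):
    """Encrypt one 16-byte block; vectors are applied in reverse order (63..0)."""
    buf = bytearray(plain)
    for r in reversed(range(ROUNDS)):
        for x in reversed(range(BLOCK)):
            y = vectors[r * BLOCK + x]
            ax, ay = buf[x], buf[y]
            ay ^= 0xFF                           # invert ay
            ax = (ax - ay) & 0xFF                # substitution
            ax = (ax >> 1) | ((ax & 1) << 7)     # rotate right, bit 0 to bit 7
            buf[y] = ax
            buf[x] = ay                          # byte-swap elements
    return bytes(buf)


def decrypt_block(cipher, vectors):
    """Decrypt one 16-byte block; vectors are applied in forward order (0..63)."""
    buf = bytearray(cipher)
    for r in range(ROUNDS):
        for x in range(BLOCK):
            y = vectors[r * BLOCK + x]
            ax, ay = buf[y], buf[x]              # byte-swap elements
            carry = ax >> 7                      # bit 7 goes into Carry
            ax = (ax << 1) & 0xFF                # left shift, zero into bit 0
            ax = (ax + ay + carry) & 0xFF
            ay ^= 0xFF                           # invert ay
            buf[x] = ax
            buf[y] = ay
    return bytes(buf)
```

Usage, with a stand-in vector source: `rng = random.Random(1); vectors = [rng.randrange(16) for _ in range(64)]`, then `decrypt_block(encrypt_block(block, vectors), vectors)` should return the original block. Note that the order of the two buffer writes matters when x and y happen to be equal.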

Discussion

The advantages in the current implementation of "OWL-RST" are plain to see: The PRNG keeps on running. Each consecutive block is to be encrypted/decrypted from a fresh new keyset. The unpleasant results of block ciphers that operate in unsuitable key-feedback modes, would never occur. RST is a block cipher that benefits from the properties of a stream cipher without adopting its vulnerabilities.
The fact that there are no lookup tables or complicated transformations involved enables memory-optimized implementations on a microcontroller. Even with the least number of rounds, RST achieves great balancing, confusion and diffusion properties. Additional measures for whitening the ciphertext are therefore dispensable. The plaintext key feedback in RST serves merely the "secret IV" feature and guarantees massive error propagation, which is the basis for the overall error detection and message authentication.

One could say that this main advantage is also a disadvantage: The PRNG keeps on running all the way. A cryptographically strong PRNG eats up computation time. This is the main reason why, on the PC, a partially optimized implementation of "RST-128" was about 3 to 5 times slower, than a similar implementation of AES, and roughly the same applies to the respective AVR-implementations.

Personal conclusion:
In the bootloader environment, and maybe in other applications, data throughput of the crypto layer may not be the most important factor. In bootloader applications, the bottleneck is always physical write access and serial transmission. OWL-RST as a cipher provides reasonably safe encryption with a very small memory footprint and essential features of error detection and authentication "all-inclusive".
Criticism of this home-made solution is legitimate. I would like to point out that the most critical component, the pseudo-random generator, is not a development of my own, but has been chosen from well-researched technologies. Even from today's perspective, the classic 128-bit MLS-LFSR extended by a Self-Shrinking filter constitutes a cryptographically safe PRNG.
Further, I assume that the downstream block cipher and the IV-VI mechanism do not tear open new vulnerabilities, but make any attack aiming at the extraction of key portions and analysis of the current LFSR sequence even more complicated. I would also like to hear qualified arguments and suggestions!




Firmware

The OWL-Firmware is written in pure Assembler, using the core instruction set, which is supported by almost all 8-bit AVRs. There is no dependency on special hardware components, timers or interrupts. This bootloader is already available for about 100 different ATtinys and ATmegas. All variants are under 512 bytes. The source code is fully disclosed and comprehensively commented. Right here, I'll give some "high-level" annotations on the respective functional blocks.

Features:
  • Portable code for many 8-bit AVRs;
  • Can use any existing portline for data reception;
  • Well-thought signal processing enables rapid and repeated re-sync and auto-bauding;
  • Waits until Timeout, can step into an already running signal preamble;
  • Data reception possible in normal or inverted logic;
  • Optional output of a control signal in normal or inverted logic ("Dummy-TXD", RS485-TE);
  • 128 bits of random key as a worldwide device identifier;
  • Secured authentication, decryption and control;
  • Linear Write access to EEPROM and/or Flash (supported up to 64-K devices);
  • Clearly defined behavior in case of failure.

Memory footprint

... on ATtinys:
Figure: Memory footprint of the OWL Firmware on ATtiny

  • Bootloader occupies top 512 bytes below Flashend.

  • Invoke: Modified rjmp/jmp at $0000 jumps to BOOTSTART.

  • Bootloader will start the Application's reset routine after Timeout or after having done an Update.

  • INFO TAG is attached to any Flash-Upload, telling the Bootloader the desired Timeout (byte) and Reset-Jump of the Application (word).

  • Crypto-Key hard-coded in Bootloader-Firmware (last 16 bytes below Flashend).

... on ATmegas:
Figure: Memory footprint of the OWL Firmware on ATmega

  • Bootloader occupies BOOT FLASH SECTION of only 512 bytes below Flashend.

  • Invoke: BOOT RESET VECTOR (BOOTRST) will call the Bootloader at BOOTSTART with every hardware reset.

  • Having done its job, or after Timeout has elapsed, the Bootloader will jump to $0000 where the regular Application starts.

  • Crypto-Key and Timeout-Byte hard-coded into the Bootloader-Firmware (last 17 bytes below Flashend).

  • Bootloader-Section can be protected comprehensively by Lockbits (direct support for Bootloaders on the ATmegas).

Note: Addresses are bytes.

Invoke: Usually, a bootloader is to be invoked via hardware Reset. It is triggered by the rising edge on the RESET line of the Controller, by Power-On-Reset (just connecting the controller to the power supply), or Brown-Out-Reset. BTW: Calling the bootloader via hardware reset is the best method from a technical and legal perspective, since it provides clear separation between the spheres of Bootloader and Application Firmware.


Initialization: The bootloader will start as the first program after hardware reset. It initializes the stackpointer and all registers, ports, and memory that it uses by itself. The bootloader will NOT initialize any of the remaining SRAM or other I/O ports. This is not the business of a bootloader firmware on a microcontroller. Just to mention.
  • ATtinys: Bootloader searches the application flash from top to bottom for a so-called INFO TAG and, if this tag has been found, fetches the Timeout byte from there. It then loads the individual crypto key from another location (hard-coded at the flash top) into the working registers for the crypto PRNG (R0-R15). After that, the bootloader starts to listen on the assigned RX-portline. If any signal occurs before Timeout, the bootloader will try to synchronize itself with that signal (see below). If nothing happens and Timeout has elapsed, the bootloader loads the jump address of the Application Firmware's reset routine from the INFO TAG and then starts the Application from that address. For an "empty" flash (no application existing, all $FF bytes), the bootloader will use the longest possible Timeout and restart itself. That is to say, with no Application firmware yet existing on the controller, the bootloader remains permanently accessible with no need for a repeated reset.

  • ATmegas: Bootloader loads the individual crypto key into the working registers of the crypto PRNG and starts listening for an incoming signal. An INFO TAG is not needed on ATmegas, since the Timeout byte is hard-coded into the firmware and the Application always starts at $0000. If a signal arrives before Timeout, the bootloader will synchronize itself with it and expect further Transmission. If Timeout has elapsed, the bootloader jumps to address $0000, where the application would have been started anyway. Through an empty Flash, the processor would simply run into the bootloader again, so it remains permanently available on ATmegas, too.

Synchro-Autobauding:
Bootloader waits for a level change at the assigned RX-portline until Timeout. The first High-Low transition is discarded to allow the signal to stabilize. Subsequent level changes are evaluated by the Synchro-Autobaud method described before. If the incoming signal is actually an OWL-Preamble with sufficiently steep edges, initial synchronization will certainly succeed. After that, the bootloader has gained a precise timing reference for reception of the coming serial transmission. Yet the cycle of Synchro-Autobauding is repeated before each individual block of data, thus making the OWL-Transmission pretty immune to long- and medium-term fluctuations of the clock frequencies (on both sides).


Block data reception: After successful Synchro-Autobauding, the characters to follow are decoded via "Soft-UAR(T)", which is basically the same technology as in TSB, using a half bit cell as the timing reference. First, the bootloader will wait out the remaining Preamble characters until the Blockstart character is detected. The 16 bytes following the Blockstart character are the encrypted data of interest. They are read and buffered in SRAM for further processing.


Decrypt block data: The bootloader goes "offline" and decrypts the 16 bytes in the buffer. With the least number of rounds, the block cipher fetches 64 pseudo-random vectors from the PRNG for decryption. The decrypted block of data is then available in the same 16 locations of the SRAM buffer.
Any freshly decrypted block is used for XOR-feedback to the PRNG state (R0-R15). By this, the entropy of the payload data will also modify subsequent decryption and, most importantly, cause massive error propagation.
In addition, if this was the first block of a new RST sequence, it will be copied to a second SRAM buffer as VI.
All blocks subsequently decrypted are then compared to that VI buffer (see annotations on cryptosystem). As long as the current block is not equal to the VI, the bootloader assumes that the respective block is regular write data and processes it accordingly.
If the program finds that the current block is identical to the VI, it knows that the current sequence was accomplished error-free. In this case, the bootloader will pass over to the next sequence or finish altogether.
In case of an error, the VI will never be recognized. After the Transmission has ended and Timeout has elapsed, the bootloader will erase the Flash if it has already been touched, and fall into a blocking state. This gives indirect feedback on the failure of this Transmission (see General error handling).
Programming: The entire block decryption routine is about 50 machine operations including PRNG. Block comparison plus XOR feedback have been combined into one loop, which comprises only 12 (twelve!) opcodes. (Try this in your "high-level" language...)
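
The idea of combining block comparison and XOR feedback in a single pass can be shown in a few lines of Python. This is an illustration only, not a transcription of the assembler loop: the function name is hypothetical, and the assumption is that byte i of the decrypted block is XORed onto byte i of the 16-byte key state (R0-R15 in the firmware).

```python
def feedback_and_compare(state, block, vi):
    """XOR the decrypted block into the key state and compare it to the VI.

    state: bytearray of 16 bytes (modified in place)
    block: freshly decrypted 16-byte block
    vi:    16-byte copy of the first block of the current sequence
    Returns True if block is identical to vi.
    """
    differences = 0
    for i in range(16):
        state[i] ^= block[i]              # plaintext feedback into the key state
        differences |= block[i] ^ vi[i]   # collect any mismatch against the VI
    return differences == 0
```

Because the feedback is applied to every block, including the VI itself, the key state at the start of the next sequence depends on everything that was received before.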


Authentication sequence (S1): The bootloader must verify that an incoming Transmission was actually encrypted with the secret key of the bootloader and no other key. This can be achieved by sending a "dummy" Transmission of fixed length that will not contain any write data. It is only to provide the bootloader with an occasion to check by the IV-VI-mechanism if this sequence was a valid RST cryptogram. If yes, then the bootloader knows with utmost certainty, that the sender has actually used the correct key for encryption and it can progress to the next sequence, which is S2 (EEPROM data).
In all other cases, the Transmission must have been faulty or the sender has simply used a different key. The bootloader will then go into a blocking state that can only be left by hardware reset.
This consistent blockade behaviour makes it possible to operate many crypto bootloaders with different keys and different technical requirements on a common programming line ("one-way-bus"), making sure that per Reset cycle only the one bootloader that has actually been addressed will follow that Transmission, while all the non-addressed bootloaders on the same bus will safely turn into the blockade state. In particular, on those blocked devices no write access and no uncontrollable start of an Application firmware will happen.


EEPROM sequence (S2): Bootloader decrypts and copies the IV of the EEPROM sequence. If there are data blocks following the IV, this data is written block-wise to the EEPROM in atomic write mode. If the EEPROM sequence did not contain any data, i.e. the VI directly follows IV, nothing in EEPROM will be overwritten.
Note 1: There is no preliminary EEPROM erase before EEPROM writes. This means that older EEPROM contents at higher EEPROM locations are not destroyed if the current EEPROM access does not reach those higher addresses.
Note 2: To delete the entire EEPROM, the sender will send a sequence that simply contains enough data to reach all EEPROM locations and overwrite them with $FF.
Note 3: In the EEPROM mode special precautions against address overflow aren't necessary, since an address overflow can only happen due to a faulty or corrupted Transmission, and in this case the EEPROM write access would simply overwrite the EEPROM again and again. This is a calculated risk, since the EEPROM does not contain executable code and is, to some extent, the 'crumple zone' before any critical Flash access can happen.
If the EEPROM sequence times out, bootloader will go into the blocking state, giving indirect feedback that something has gone terribly wrong...
After VI has been detected, the bootloader goes on with Flash sequence S3.


Flash sequence (S3): Bootloader decrypts and copies the IV of the Flash sequence. If the block directly following the IV is already the VI, i.e. no Flash data blocks have been sent, then the bootloader knows that there is no Flash to be erased and overwritten. Flash will of course be left untouched and the bootloader session has finished successfully.
If at least one block of data follows the IV, the bootloader must erase the Application Flash before any Flash pages may be overwritten with new firmware data. The Flash Erase cycle will take a little more time, and the Sender has certainly calculated a matching Preamble. Flash Erase is performed top-to-bottom for safety reasons, in particular with regard to the ATtinys. For Flash writes, the bootloader will buffer incoming Flash contents in the Flash write buffer, then trigger a Page-write into the current Flash page.
Based on 16-byte units, the flash write routine is quite future-proof. It would still work up to a page size of 4096 bytes... (the largest pages currently seen on ATmegas are 256 bytes.)
An explicit verification of the written data is omitted with the OWL. Many years of experience and feedback on TSB have shown that faulty flash writes have never occurred, once the respective data has made it into the write buffer and operating conditions were stable enough for the duration of the flash write. In the case of OWL, we can be even more relaxed, since the entire Transmission path is protected by the crypto layer.
If the last VI from S3 was detected, then the Transmission was successful as a whole and all data that has been written must have been error-free, so the whole bootloader session was successful.
If no VI has been detected and the bootloader has timed out, it will go into the blocking state to give indirect feedback that something has gone wrong. If the Flash has already been touched in this session, the bootloader will trigger an emergency erase of the Flash, to remove all likely corrupted executable code from the Flash.


Transmission successful, hand-over to Application firmware:
After successful completion of the third sequence, the bootloader will almost immediately pass to the Application firmware that may have been updated or left untouched.
  • On ATtiny, bootloader must re-search for the INFO TAG that could have been rewritten and relocated in the course of an Application firmware update. Bootloader then jumps to the referenced address of the Reset routine and thus starts the Application firmware.
  • On ATmega, the bootloader has to restore access to the RWW memory area (= Application flash) first of all. Then it simply jumps to the address $0000 where the Application would have been started anyway.
So, with an Application that has a distinctive reset behaviour, preferably some LED flashing, beep code or display message, the user will get indirect but clearly noticeable feedback on the success of the Transmission.


General error handling: Errors could already occur before any data has been transmitted. In a rather minimalistic hardware design, a cold start (device just plugged in, power hard-switched) could cause longer periods of low or undefined voltage levels on the respective RX-port of the controller, until the supply voltages have stabilized. If the bootloader was already listening, it may be confused and possibly "hang" as a result of invalid signal artifacts. (Such unpleasant occurrences happened occasionally with TSB on certain USB devices.)
The OWL's starting behaviour has vastly improved in this respect. A bit of "garble" on the line immediately after hardware reset, is simply ignored. The bootloader will simply time-out and hand-over to the application. Priority is to start the main Application firmware.
Only after the initial Authentication sequence was successfully verified, the bootloader will assume that this is indeed a valid Transmission. Either everything was complete and error-free and within Timeout, then, as mentioned above, the Application would be started immediately after the end of Transmission; OR there had been one or more errors or Transmission timed-out, then the bootloader will go into the blocking state and the user will get an indirect but clear feedback on the failure of this Transmission.


Timeout timing: Bootloader's Timeout should allow for delays from roughly a tenth of a second up to several seconds, which is supposed to match all interface variants and practical needs. AVR Controllers can work in a wide range of clock rates from 128 kHz up to 30 MHz, which is hard to cover with only one byte for a counter-prescaler. The OWL firmware still uses a "Timeout byte", but this is now calibrated to a unit of exactly 1/100 of a second. That is to say, a Timeout value of "100" will always give a timeout of 1 second, regardless of the actual clock frequency of the actual controller. A Timeout value of "1" will give 0.01 seconds of timeout and the value "255" is about 2.5 seconds. This is achieved by coding the clock frequency intended for a device into the OWL-Firmware for that respective device, assuming that the user will indicate this parameter at the time of firmware-make.
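
The arithmetic behind the Timeout byte is simple enough to check on the PC. The following Python lines are just a worked example with hypothetical function names; how the firmware actually burns the cycles is not shown here.

```python
def timeout_seconds(timeout_byte):
    """Timeout byte counts units of 1/100 second."""
    if not 0 <= timeout_byte <= 255:
        raise ValueError("Timeout is a single byte (0..255)")
    return timeout_byte / 100


def timeout_cpu_cycles(timeout_byte, f_cpu_hz):
    """Number of CPU clock cycles that elapse during the Timeout."""
    if not 0 <= timeout_byte <= 255:
        raise ValueError("Timeout is a single byte (0..255)")
    return timeout_byte * f_cpu_hz // 100
```

For example, a Timeout byte of 100 on a device clocked at 8 MHz corresponds to 8,000,000 cycles, while a Timeout byte of 1 at 128 kHz corresponds to only 1,280 cycles. This spread is the reason why the clock frequency has to be known at the time of firmware-make.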


PRNG (key generator): The PRNG is loaded with the secret key only when the bootloader is first called. The internal state of the PRNG changes with every single transposition vector being requested, and it is also modified by the plaintext of each decrypted block (see diagram for key feedback).
Here, the PRNG is the software implementation of a classic 128-bit LFSR with feedback taps on the bit positions 128, 127, 126 and 121 (Galois-XOR). The output bit sequence goes through the so-called Self-Shrinking Filter. This constitutes a provably strong PRNG. However, the SSG LFSR will consume about 3 times more shifting cycles than a plain LFSR; but this is not too much of a disadvantage, since in the bootloader application there are other factors that limit data throughput.
Last but not least, the assembly code of this SSG has been significantly optimised (...live and learn...). Those LFSR-typical bitshifts are primarily carried out only on two working registers (instead of 16), and just with each 8th shifting, all the remaining 14 registers are directly byte-shifted (mov), which saves a lot of clock cycles.
In my opinion, the SSG LFSR is one of the most cycles- and code-efficient PRNG implementations available for 8-bit MCUs, and it is a shame that again I had to find it by myself, since nothing similar could be found on the net.
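
For experiments on the PC, a self-shrinking generator on top of a 128-bit Galois LFSR can be sketched in Python as below. Please take this as a sketch under assumptions, not as a bit-compatible model of the firmware: the shift direction, the mapping of the taps 128, 127, 126, 121 onto a right-shifting register, the byte order of the key and the way four output bits are packed into one vector are all choices made for this example, and the class name is hypothetical.

```python
# Assumed mapping: tap n corresponds to bit n-1 of a right-shifting Galois LFSR.
TAP_MASK = (1 << 127) | (1 << 126) | (1 << 125) | (1 << 120)


class SelfShrinkingLfsr:
    def __init__(self, key):
        if len(key) != 16:
            raise ValueError("key must be 16 bytes (128 bits)")
        self.state = int.from_bytes(key, "big")
        if self.state == 0:
            raise ValueError("all-zero state would lock up the LFSR")

    def _clock(self):
        """One Galois LFSR step; returns the bit shifted out."""
        bit = self.state & 1
        self.state >>= 1
        if bit:
            self.state ^= TAP_MASK
        return bit

    def next_bit(self):
        """Self-shrinking filter: take LFSR bits in pairs (a, b);
        output b if a is 1, otherwise discard the pair."""
        while True:
            a = self._clock()
            b = self._clock()
            if a:
                return b

    def next_vector(self):
        """Collect four filtered bits into one 4-bit vector (0..15)."""
        value = 0
        for _ in range(4):
            value = (value << 1) | self.next_bit()
        return value
```

Since on average only every second pair passes the filter, about four LFSR shifts are needed per output bit, which is in line with the "about 3 times more shifting cycles" mentioned above.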


Port limits: The present OWL firmware can use all I/O ports directly usable by means of the cbi, sbi, sbic, sbis assembly instructions. Some exotic devices also know a "PORTG" or "PORTH" whose I/O addresses are located above $3F. These ports are only accessible via SRAM commands, which have a slower timing and cost more memory, so these are not supported by the OWL firmware concept and probably never will be.


Code flexibility compared to TSB:
I am confident that the OWL firmware will be more maintenance-friendly than its predecessor project. The assembler source has become even more compact and clearly structured than TSB. It contains several preprocessor instructions for conditional assembly to adjust for device-specific properties, but not as many as with TSB. In fact, the OWL Transmission's logical format is the same for all Targets; there is basically no "extra sausage" served for any device. Just to add that the One-Way-Loader does not waste a whole Flash page for user data. All Flash, minus 512 bytes of bootloader code, is free for Application firmware.


Portability: Currently the assembler source for the OWL firmware already covers approximately 100 ATtinys and ATmegas with up to 64 kilobytes of Flash without any special adjustments. Several ATtinys and ATmegas have been successfully run with TSB in the past. Some chips required minor adjustments to the code, which fortunately did not have to be re-invented. So there is great chance that the OWL will run smoothly on the same devices from the start.
AVRs with more than 64 KiB (e.g. Mega128x and Mega256x) may indeed require some more effort. These chips have 128 resp. 256 kilobytes of Flash memory, which cannot be covered as a whole by the 16-bit Z-register. An additional I/O register "RAMPZ" is needed to access the higher memory banks.
A one-way loader for these devices seems possible, but when maintaining the safety philosophy of linear write access, this might be an annoying experience for impatient users. In an extreme case, an Application may indeed fill up the full Flash of 128 or even 256 kilobytes (minus 512 bytes), which will not take several seconds, but several minutes. I am very interested in the opinion of people who actually use these large ATmegas in everyday applications. So far, I have a basic test environment with a Mega1284 and will have a closer look at this sooner or later.
My competence and motivation ends with those  "XMegas"! The whole I/O concept and some architectural details are very different to ordinary ATmegas and i do not use any of them.

Fuses and Lockbits:

Generally, the Controller must meet certain requirements for the operation of a bootloader.

ATtinys:
  • Activate SELFPRGEN (otherwise the loader can not write any data to memory!).
  • Enable BODLEVEL and adjust to actual supply voltage. This is vital to prevent flash corruption in a real-world setup.
  • Lockbits Mode 3 prevents the sniffing or modification of memory contents via the ISP/JTAG.

ATmegas:
  • Enable BOOTRST to activate the bootloader invoke by hardware reset.
  • Enable BODEN + BODLEVEL prevents flash corruption with unstable operating voltage.
  • BOOTSZ = 10 or BOOTSZ = 11 to use a bootloader section of only 512 bytes (256 words).
  • BLB in Mode 2 or 3 protects the bootloader section from uncontrolled write access (immortalizes bootloader)
  • Lockbits in Mode 3 prevents the sniffing or modification of memory contents via ISP/JTAG.

Compatibility Requirements and Precautions:

  • No dependencies. Bootloader firmware must not be dependent on any other firmware components. Otherwise it could not reliably load an initial application firmware to a controller that is otherwise empty.
    It would be possible to call bootloader routines from an application, but one should resist the temptation to do so. Such interaction between Application and Bootloader can and likely will lead to technical problems, and may give rise to legal trouble if at least one software component is released under a rather restrictive license. Application firmware and Bootloader firmware on a microcontroller should not be interdependent but as clearly separated as possible.

  • Application firmware and bootloader firmware must not occupy more space together than is actually available...  Oh, well, wasn't aware of that... ;-) If it doesn't fit in, the Software will tell you.

  • General precautions for ATtinys: An application should do without flash write operations, preferably, since it could possibly damage bootloader code or leak key data from the bootloader!

  • General precautions for ATmegas: The whole bootloader section may be protected by Lockbits (BLB) against any unwanted access, making bootloaders installed on ATmegas quite safe with no further precautions necessary.





Software

The OWL-Software is a command line tool for the PC platform. The program is available in source code as well as executables for Windows (32 bit) and Linux (32/64-bit). The OWL-Software can do the following:
  • Generation of single bootloaders with custom ports and random keys
  • Generation of serial bootloaders with custom ports and random keys
  • Provide information about supported controller models and their hardware options
  • Provide information about self-generated bootloaders and their hardware options
  • Manage authenticated bootloaders, keys and meta data via memorable project names
  • Sending data for flash and/or EEPROM in one rush ("Transmission")
  • Export of encrypted Transmissions to a file container or audio file for distribution purposes
  • Import of encrypted Transmissions to be forwarded to a local target device
  • Test vectoring on crypto modules
  • Comparably comprehensive Help system

Philosophy

My hardware/software projects rely on simple administrative structures, maximum transparency and minimum dependencies. Call me crazy, but I am still convinced that technology should serve mankind, and that digital enslavement is not our destiny. Those who use technology consciously and self-confidently, those who know the difference between mutual benefit and exploitation, and those who can forgo useless stuff, might have more fun with technology in the future and can effectively protect their private and business secrets.

Now that the OWL-Software has reached a sort of maturity and features a wide range of functionality... This means: Cryptic command strings, illogical, pell-mell, case sensitive syntax, complex dependencies, brainf***in' semantics...

No, no, just kiddin'...! The OWL-Software features a human-friendly command-line parser and comparatively informative screen messages. For each commandline option, there are long and short notations available, and both of them are quite memorable. Besides, no case sensitive shit. My parser doesn't even care about the order of the options and arguments (latin semantics!). All that we should remember is the name of some options and the additional parameters to be specified for a desired function. A lot of things can be achieved by thinking, and the rest can be looked up in the on-screen help. Anyone who has ever done some jobs at the commandline should be able to use this OWL-Software intuitively. I would even say: Unlike certain 'dudes', the OWL commandline tool is ready-to-use without a GUI frontend!

Install

The OWL-Software is portable, that is, it will run from any location, as long as user has the necessary rights to access and execute. Of course, the executables would run from external media, for example USB thumbdrives. For the "Installation", simply unzip the downloaded package to the desired location. The following folder structure is being established then:
  • owl
    • templates
    • targets
    • transmissions
The whole OWL structure may be relocated with no problems, since the application by itself does not remember any absolute path.

Folder structure

There are three (3) data types that must be managed in an OWL environment: bootloader templates, bootloader firmware, and bootloader Transmissions. These are to be found in separate files and folders. Those accustomed to organizing their data mainly in the file system will easily keep track.

Bootloader templates: /templates
For each AVR model directly supported, pre-assembled machine code is available. It contains an OWL firmware with default ports (B0/B1) and default key ($0011... EEFF) for the respective device. The firmware Template is saved as an Intel(R) Hexfile. Whenever the OWL-Software is to create a custom bootloader, it will search the templates folder for a hexfile that matches the submitted device name.

Bootloader Firmware: /targets
Creation of a new bootloader (Target) follows the proven technique developed for TSB: the few modifications are made directly in machine code, so nothing depends on an assembler or compiler infrastructure. To make a new Target, i.e. a bootloader with customized ports, timing and crypto key, the program searches for the reference code in the templates folder, then modifies the respective I/O commands that refer to the default ports B0/B1, adapts the timing reference according to the intended clock frequency and makes a random crypto key for that new-born bootloader.
Finally, the program saves this modified OWL-Firmware to a hexfile under the desired target name (or autogenerated name) in the folder targets. This is a valid Intel Hex firmware file, intended to be read in by an AVR programming software and written to the target controller via ISP. After changing the Fusebits, the new OWL is ready for use.
The target file is the only location for crypto key and meta data of a bootloader!
So better not delete, if we actually want to use that Bootloader...
The meta information for each target is appended as comment lines behind the final hex record. Those are human edible and editable. An ISP software would simply ignore these lines, but the OWL-Software can parse them and thus get all the relevant meta info for the specified Target. By the way, renaming, copying, moving and modifying Target files is possible at any time with onboard means.
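To illustrate the idea of meta lines riding behind the final hex record, here is a minimal sketch that separates the Intel HEX records from whatever follows the end-of-file record. The actual layout of the OWL meta lines is not reproduced here; the sample meta line and the function name are purely hypothetical.

```python
def split_target_file(text: str):
    """Return (hex_records, trailing_lines) for an Intel HEX file with appended lines."""
    records, trailing = [], []
    seen_eof = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not seen_eof and line.startswith(":"):
            records.append(line)
            # Characters 7..8 hold the record type; type 01 marks end of file.
            if line[7:9] == "01":
                seen_eof = True
        elif seen_eof:
            # Everything behind the EOF record is treated as meta/comment text.
            trailing.append(line)
    return records, trailing


if __name__ == "__main__":
    # The meta line below is an invented placeholder, not the real OWL format.
    sample = ":0300300002337A1E\r\n:00000001FF\r\n; hypothetical meta line\r\n"
    recs, meta = split_target_file(sample)
    print(len(recs), "hex records, meta lines:", meta)
```

A programmer tool that stops reading at the EOF record never sees the trailing lines, which is what makes this kind of piggybacking harmless.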

Bootloader Transmissions: /transmissions
The Software can send the data stream for a bootloader transfer to the specified serial port, or redirect to a file in the transmissions folder by default. The OWL Transmission is an 8-bit stream of binary (encrypted) data, so are the OWL Transmission files with the extension .owl. By the storage in a file container, no timing information will get lost. An .owl-File contains the whole of an OWL session including authentication and data for EEPROM and/or Flash.
The OWL-Software can of course accept an .owl file and send it to a serial port. It will automatically use the correct baudrate, since it is appended as an ASCII tag to the end of that binary file. Other plaintext is not included.
It is also possible to send an .owl file from the command line or from other software to a serial port via redirection. This will also result in a flawless OWL Transmission!
Example DOS/Windows console:
mode COM1 9600,N,8,1
copy /b transmission0815.owl COM1
Same under Linux:
stty -F /dev/ttyS0 9600 cs8 -cstopb -parenb
cat transmission0815.owl > /dev/ttyS0

Compilation of a Transmission

The Software needs a valid target name as a reference to the bootloader for which it is to generate a Transmission. If that Target file exists, the program will know the crypto key and all meta info necessary to calculate an encrypted Transmission with correct timing for this particular device.
Also, of course, a hexfile with write data for the Flash and/or EEPROM should be specified.
A serial port must be specified only if the Transmission is intended to be sent out "live", i.e. immediately after the command has been issued. If no serial port is specified, the data stream goes into a Transmission file (extension ".owl") in the folder transmissions. Additional information and modifiers are possible (please consult the help texts).
The tool will compile the Transmission to a string array in memory, in the order of the RST sequences:
  1. Authentication sequence (S1): The first sequence consists only of an IV, a dummy block (random numbers) and the VI. The PRNG is loaded with the initial bootloader key (taken from the Target file). So the cryptogram for the authentication sequence S1, together with IV and VI, always has a length of 3 blocks, corresponding to 48 cryptobytes. The string for S1 is given an enlarged introductory preamble to ease manual coordination of bootloader Transmissions, as mentioned before. Between the three crypto blocks, minimum preambles and block starters are inserted.
  2. EEPROM sequence (S2): The program reads EEPROM data from the specified hexfile and fills it up to the next full block (padding with null bytes). The EEPROM data record is framed by the IV and VI blocks, then encrypted via RST based on the current keystate. In a second step, the program inserts the necessary preambles between crypto data blocks. These preambles must take into account the decryption time but also the EEPROM write time for 16 bytes per block. If no EEPROM write data has been loaded, the EEPROM sequence S2 consists only of the IV and VI blocks, separated by a minimum preamble. The complete string for S2 is appended to the string of S1.
  3. Flash sequence (S3): The program reads Flash data from the specified hexfile and fills it up with random bytes (and possibly the INFO TAG). The Flash data record is encrypted based on the current keystate, provided with IV and VI and some extended preambles for a Flash erase operation and Flash page writes. If necessary, an optional outro preamble is added. This sub-string is then appended to the combined string of S1 and S2, making the whole Transmission string of S1, S2 and S3.
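The block layout of the three sequences can be sketched as follows. This is only the plaintext framing before encryption: the RST cipher, the preambles and the block starters are left out, the VI is assumed to be a repetition of the IV (in line with the "IV = VI" check mentioned for decryption), and the Flash data is padded only to the next 16-byte block for brevity. All function names are invented for this illustration.

```python
import os

BLOCK = 16  # one 128-bit cipher block


def pad_to_block(data: bytes, fill=None) -> bytes:
    """Pad to a multiple of BLOCK with a fixed fill byte, or random bytes if fill is None."""
    missing = -len(data) % BLOCK
    if fill is None:
        return data + os.urandom(missing)
    return data + bytes([fill]) * missing


def frame(payload: bytes, iv: bytes) -> bytes:
    """Plaintext layout of one sequence: IV, payload blocks, VI (assumed equal to IV)."""
    assert len(iv) == BLOCK and len(payload) % BLOCK == 0
    return iv + payload + iv


def build_plaintext_sequences(eeprom: bytes, flash: bytes):
    s1 = frame(os.urandom(BLOCK), os.urandom(BLOCK))            # IV + dummy block + VI
    s2 = frame(pad_to_block(eeprom, 0x00), os.urandom(BLOCK))   # EEPROM, null-padded
    s3 = frame(pad_to_block(flash), os.urandom(BLOCK))          # Flash, random-padded
    return s1, s2, s3


if __name__ == "__main__":
    s1, s2, s3 = build_plaintext_sequences(b"\x01" * 20, b"\x02" * 100)
    print(len(s1), len(s2), len(s3))
```

With empty EEPROM data, S2 shrinks to just the IV and VI blocks, matching the case described in step 2.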


Single Transmission

If a valid serial port has been specified for Transmission mode, such as --serialport=COM2, the Software will send out the OWL data stream directly through this interface. Normally the default baudrate already specified in the Target file is applied to the Transmission, but if the commandline specifies a different baudrate (by --baud=xxxx), this one will be used. (A validity check for a modified baudrate is still missing.)

If no serial port has been specified, the data stream is forwarded to a so-called Transmission file, normally saved in the folder transmissions. The file gets a unique name that consists of a timestamp as the prefix plus the original Target name. With the option --transfile=path/filename, we can also specify a custom path and the first part (prefix) of the Transmission's file name. The file extension of OWL Transmissions is always .owl.

Serial Transmission

It is possible to address multiple targets in the Transmission mode. The Targets should be systematically numbered. Then we can simply specify their name-space by wildcard symbols, i.e. "?" or "*". The Software will look for all the targets matching this pattern and generate an individual Transmission for each one.

Example:
owl --targetname=Bootloader0? --flashfile=program.hex

This would capture all bootloaders matching the pattern, i.e. "Bootloader00" to "Bootloader09", if existing, and make a custom Transmission for each single Target, containing the referenced firmware code of "program.hex". These individual Transmission files are then saved to the folder transmissions with the extension ".owl".
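How such a pattern selects Target files can be illustrated with a few lines of Python. This is a sketch of the general wildcard idea, not the matching code of the OWL-Software; in particular, ignoring upper/lower case in target names is an assumption made here, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def match_targets(pattern: str, filenames):
    """Return target names (without .hex) that match a ?/* pattern, ignoring case."""
    hits = []
    for name in filenames:
        stem, dot, ext = name.rpartition(".")
        if dot and ext.lower() == "hex" and fnmatchcase(stem.lower(), pattern.lower()):
            hits.append(stem)
    return sorted(hits)


if __name__ == "__main__":
    files = ["Bootloader00.hex", "Bootloader01.hex", "Bootloader10.hex", "Testloader00.hex"]
    print(match_targets("Bootloader0?", files))
```

Here "?" stands for exactly one character, so "Bootloader10" is not captured by "Bootloader0?", while "Bootloader*" would capture all three Bootloader files.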

Bootloader generation

The Software needs a valid device name and retrieves the corresponding file from the templates folder. Then, based on this machine code, it generates a customized version of the firmware in the way described above. Meta data is added to this file, and it is stored as a so-called Target file in the folder targets.
Example:
owl --device=tn2313 --rxport=d0 --txport=d1 --targetname=Testloader

will produce a single new OWL in the targets folder, named:
Testloader00.hex

Serial bootloaders

There is a feature that allows bootloaders to be produced in series. To do this, we simply add the argument --number to the commandline for making a single bootloader. The Software will then make the specified number of bootloaders for the same hardware configuration, but with different crypto keys.
Example:
owl --device=tn2313 --rxport=d0 --txport=d1 --targetname=Testloader --number=10

This will produce 10 Targets with file names:
Testloader00.hex
Testloader01.hex
...
Testloader09.hex

These Target loaders all feature the same standardized technical parameters but individual cryptographic keys. They can be installed onto 10 Target devices, armored by Lockbits and/or physical means, maybe operated at diversified locations. As long as the respective Target files containing the associated crypto keys are kept secret at our site, no one else can update Application firmware on these devices by way of the bootloader. Customers may get to know the serial number of their respective device, so they can, for example, request firmware-updates for their devices from a website or by email. The serial number has no link to the actual cryptographic key.
Because of the "entropy problem" (discussed below), it is recommended to generate no more than 100 bootloaders in one pass.
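The naming scheme of such a series, with one independent 128-bit key per Target, can be sketched like this. Python's secrets module stands in for the random source here purely as an assumption; the OWL-Software draws from its own random pool, and the function name is made up.

```python
import secrets


def make_series(basename: str, count: int):
    """Return a list of (filename, key_hex) with numbered names and independent keys."""
    series = []
    for i in range(count):
        filename = f"{basename}{i:02d}.hex"
        key_hex = secrets.token_bytes(16).hex().upper()  # 16 bytes = 128 bits
        series.append((filename, key_hex))
    return series


if __name__ == "__main__":
    for filename, key_hex in make_series("Testloader", 3):
        # Keys are printed for demonstration only; real keys must stay secret.
        print(filename, key_hex)
```

The number in the file name works as a serial number only; it carries no information about the key.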

Audio-Export

As mentioned before, the balanced binary data stream of the OWL signal may be transmitted and regenerated easily despite DC-free or even floating channels. Some of you may have listened to the audio sample of an OWL Transmission and instantly got the impression that this raw data signal sounds like a modem transmission or the like. Indeed, it has similar properties, so the idea came up to try the PC soundcard as an alternative output channel. I finally implemented an option --audiofile=Filename.wav for the export of OWL Transmission(s) to WAV/RIFF file(s) for playback over ordinary Headphones or Line outputs. The interface circuit is ludicrously minimalistic. See below!

Random keys

Assuming that we can make reasonably random keys of 128 bits, it is virtually impossible that two identical random keys will ever occur and conflict in the same universe! This improbability drive makes it possible to generate unique device keys locally and to use them worldwide without the need to ever check these keys against some centralized database. This means more freedom and self-determination for users.
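To put a number on "virtually impossible": for n keys drawn uniformly at random from 2^128 possibilities, the chance that any two coincide is at most n(n-1)/2 divided by 2^128 (the usual birthday bound). The sketch below evaluates this bound; it assumes ideal, uniformly random keys, which is exactly the hard part in practice.

```python
from fractions import Fraction


def collision_bound(num_keys: int, key_bits: int = 128) -> Fraction:
    """Upper bound n(n-1)/2 / 2^bits on the chance that any two random keys coincide."""
    pairs = Fraction(num_keys * (num_keys - 1), 2)
    return pairs / (2 ** key_bits)


if __name__ == "__main__":
    for n in (100, 10**6, 10**12):
        bound = float(collision_bound(n))
        print(f"{n:>17,d} keys: collision probability below {bound:.3e}")
```

Even for a trillion devices the bound stays many orders of magnitude below anything worth worrying about, provided the keys really are random.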
By the way, random keys solve most of the problems that we have had in the past with password schemes à la TSB. In a hardware setup with multiple bootloaders using the same common programming line, there are no more conflicts or chicken-and-egg problems. Every involved OWL already has its individual 128-bit address right from the beginning, and those keys are linked to a memorable Target name. For the authorized user, access to all his bootloaders is absolutely transparent. This applies to single bootloaders and to series of bootloaders.
Downside: generation of good random keys is not that trivial. Oh, we've had that topic before...

Random-Pooling

Computers, as we all know, are unable to generate random numbers. The so-called random number generators built into modern CPUs are not trustworthy for a legion of reasons... For the generation of individual cryptographic keys, the usual remedies (timer, mouse movements and entropy from the filesystem) are sufficient, but only when used rarely.
But if the Software has to make serial bootloaders or Transmissions, it needs a bunch of good randomness in a short time. Here it is better to have a supply of entropy, a so-called Random Pool, at hand. Such a Random Pool is a file that will be refreshed/refilled with entropy on various occasions or generated from trusted sources like physical randomness (TRNG). If the Software needs much entropy in a short time, it can fall back on the Random Pool, which is not exhausted too quickly.
The OWL-Software creates its own random pool. To do this, it creates the file randpool.bin in its root directory. This file will carry at least 512 bytes of randomness. It is accessed in the manner of a ring buffer, i.e. the Software reads 512 bytes from the beginning of that file and copies them to a working array in memory. This block is used and modified in the course of certain Software activities. When leaving the program, the leading 512 bytes are deleted from the beginning of randpool.bin and the modified internal random pool is appended to the end of that file. This way, at least 512 bytes always remain in randpool.bin, which guarantees a good entropy reserve due to repeated refreshing by new random events.
The file randpool.bin may be larger than 512 bytes; it can grow up to 65536 bytes (64 KiB). This is intended for building up a stock of entropy by means of a physical or true random number generator (such as XR232USB). Now, when the OWL-Software accesses randpool.bin, the contents of that random file will move forward in 512-byte increments, while the modified and only slightly worn-out internal random block will be appended to the file's end, so that the Pool is never really "used up". It may take very long until a 64k file of real randomness degrades by any means in this application, and there is presumably no temporal correlation between generation and use of that random data that could be exploited by an attacker.
Yet, this large-random-pool strategy is viable ONLY when randpool.bin can be protected from unauthorized access; it has to be treated like a very large cryptographic key. If it can be read, modified or copied by an attacker, all the cryptographic security it offers is lost. If such a TRNG is permanently available, it makes more sense to refresh randpool.bin shortly before new randomness is actually requested.
As a compromise, the Software also offers a command line option --randpool that will open an interactive screen in which the user can harvest some entropy for the Random Pool by means of hectic mouse movements.
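The ring-buffer handling of the pool described above can be sketched in memory as follows. The way the working block gets modified by "random events" is not specified here, so the stir function below is a purely hypothetical stand-in (an XOR with a SHA-256 based keystream); file handling is left out to keep the example short.

```python
import hashlib

CHUNK = 512  # size of the working block taken from the head of the pool


def stir(block: bytes, event: bytes) -> bytes:
    """Hypothetical mixing step: XOR the block with a SHA-256 keystream of the event."""
    stream = b""
    counter = 0
    while len(stream) < len(block):
        stream += hashlib.sha256(event + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(block, stream))


def rotate_pool(pool: bytes, event: bytes):
    """Take the leading 512 bytes as working block, append the stirred block at the end."""
    if len(pool) < CHUNK:
        raise ValueError("pool must hold at least 512 bytes")
    working, rest = pool[:CHUNK], pool[CHUNK:]
    return working, rest + stir(working, event)


if __name__ == "__main__":
    pool = bytes(range(256)) * 6  # 1536 bytes of dummy content, NOT random
    working, new_pool = rotate_pool(pool, b"mouse moved")
    print(len(working), len(new_pool))
```

The pool keeps its size on every rotation, and with a larger file each 512-byte chunk is reused only after the whole file has been cycled through once.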

Back-ups

The OWL-Software does not automatically create copies from files. Backups of a complete OWL folder can be made by standard tools available on every serious operating system. In particular, the folder targets should be backed up regularly if we do have important/irreplaceable bootloaders and keys.

Security

The Software does not overwrite any Target files that already exist. If a Target name has already been assigned, a warning message is issued and the operation is cancelled.
To prevent accidental erasure of Target files by other applications, crucial Target files may be protected by a read-only flag, if applicable. Of course this is no replacement for a Backup regime!
The Software's "security concept" assumes that any person gaining access to the machine is simply authorized to do so. The OWL-Software does not provide additional access control, such as a master password. Such options have been considered but found to be cumbersome and not worth the effort.
In a critical environment, the user should be aware of possible security implications anyway. A system that processes personal or confidential data must not be accessible by unauthorized persons, that's the baseline. If we see ourselves exposed to real threats of espionage or sabotage, we will sooner or later implement some "secured environment". (Only a few suggestions: Do not rely only on virus scanners, desktop firewalls or script blockers. Most of them are 'snakeoil'. Chuck out proprietary crap from your system, learn about general operating system vulnerabilities and technical threats. Use fully encrypted hard drives, distrust governments and automatic updates!)

Data formats

  • Export: For cross-platform compatibility, all Intel-Hex files generated for Targets are standardized to 7-bit ASCII and feature CR-LF line feeds.
    The OWL format for Transmissions is a binary format that may contain all possible 8-bit characters. We are right to expect that these files may be easily stored and used across platforms with no unwanted alterations; particularly as it seems that no popular application is associated with the file extension ".owl" so far.
    The file randpool.bin is a binary file containing 8-bit random numbers that may be 512 to 65536 bytes in size. It is only maintained on the local system.
  • Import: Hex files with records for EEPROM or Flash can be read in by the program with line endings in the LF (Unix), CR (Mac) or CR-LF (DOS/Windows) format. Actually this is not my merit, but thanks to the thoughtful implementation of LINE INPUT under FreeBASIC. So the Software should be able to read the formats that various assemblers and compilers eject under the generic term "Intel-hex".
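For readers who want to check their own hex files, here is a minimal sketch of line-ending tolerant Intel HEX reading. It is written in Python rather than FreeBASIC and is not the OWL-Software's import routine; it only relies on the standard record layout (length, address, type, data, checksum), and the function names are chosen for this example.

```python
def parse_hex_record(line: str):
    """Parse one Intel HEX record; return (address, record_type, data) or raise ValueError."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    # Layout: length (1), address (2), type (1), data (length), checksum (1).
    if len(raw) < 5 or raw[0] != len(raw) - 5:
        raise ValueError("length field does not match record size")
    # All bytes including the checksum must add up to zero modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    address = (raw[1] << 8) | raw[2]
    return address, raw[3], raw[4:-1]


def read_hex_text(text: str):
    """Split on LF, CR or CR-LF alike and parse every non-empty line."""
    return [parse_hex_record(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for sep in ("\n", "\r", "\r\n"):
        text = sep.join([":0300300002337A1E", ":00000001FF"]) + sep
        print(read_hex_text(text))
```

str.splitlines treats all three line-ending conventions the same way, which mirrors the behaviour attributed to LINE INPUT above.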

Crypto-Testing

The OWL-Software offers a few testing functions regarding the crypto layer.
Key generator (PRNG):    owl --key=Hexstring
Specifying only a hex key without any further options triggers testing mode of the PRNG. The PRNG is then loaded with the specified key and continuously clocked.
This function demonstrates how strongly a minimally different key affects the generated PRNG sequence. It is also suitable for comparison with a reference implementation of the 128-bit SSG LFSR.
Screen output is in Hex. The first line represents the starting state (seed) of the PRNG. The following lines represent the raw sequence of the PRNG module, i.e. 4-bit vectors normally provided to the block cipher (but without key feedback!).
If you specify the default key (--key="00112233445566778899aabbccddeeff") or an empty argument (--key=""), the screen output of the first 256 vectors must be identical to the following sequence (as of SW version 201806xx and up):

00112233445566778899AABBCCDDEEFF

0035F2BFEBBC79D7B6FB6E536D14DCA2
8A41FABFDFA8A7CB278D9B93ED144009
4116BBDB07E70257590C1602B2F35DF4
4C932A9D825C6A464896D1173D8F910C
1A121048A968625C3513DA716419F961
9083A7F4853B5D7F2D08C286E12A8008
08620ECC967578F6AEA63B5FB2B2234F
0F5CBDE922983F8961C6BF9B65D75082


File encryption:    owl --key=Hexstring --encrypt=Filename
Encrypts the specified file with the specified crypto key by the default method (an RST sequence with IV, data, VI). The target file name is built by appending ".raw" to the original filename. This file extension simplifies the import of encrypted data into certain graphics, audio and analysis tools for further investigation.

File decryption:     owl --key=Hexstring --decrypt=Filename.raw
Decrypts the specified file with the specified crypto key. If decryption was successful (IV = VI), the file will be saved with a timestamp under the original name with its original file extension restored. This means that the original file can not be overwritten accidentally. (Note: In this reduced RST implementation, the decrypted file may be enlarged by up to 15 so-called padding bytes.) Should the decryption have failed, the demo function would also save this corrupted file to be available for analytical purposes.

Virtual machines

Various constellations with VirtualBox 5.XX have been tested for fun, and it actually was fun, since there were no problems at all with the OWL-Software, compiled for the respective Guest machine to run under different Host systems, as soon as the Guest had been granted access to a serial interface on the Host. Most combinations of WinXP/Win7/UBUNTU14/Debian8 ran smoothly. In general, we should not expect best performance in a VM, especially regarding screen output and interface connections! In all setups, the additional abstraction layer led to a noticeable lag on the RS232 transactions. Data sent out was periodically "stuttering," but fortunately no single characters got lost - all OWL Transmissions could be decrypted with no problem. "Emergency operation" of the OWL-Software in a VM under a different operating system - check!

To-Do's

Hardware-Software projects are known to be a highly dynamic combat situation. What may look so nicely organized, has developed in a more or less "organic" way. Surely this project will undergo several revisions the next time and develop from breathtaking to brilliant ;) The following items are already flagged for next project revisions:
  • nicer formatting of screen output
  • maybe adjust some of the commandline names and descriptions
  • further improve collection of randomness (random pooling)
  • more differentiated error handling
  • extensions for over 64k-devices
  • test some interface alternatives
  • tidy-up code
  • improve checking valid range for a chosen baudrate
  • web documentation, a pain in the ass...

BTW: Thanks for your kind feedback!





Hardware-Options

  • Target platform features some RS232 interface (RS422/RS485, USB-RS232): The device would normally communicate via RS232 to another device, such as a PC or terminal. Of course, the same interface could be used by a bootloader. For the One-Way-Loader this means:

    • OWL may use the existing RS232 interface, i.e. its RXD assigned portline for data reception. This enables firmware updates without opening the device and without special programming adapters to be connected. It just uses the existing RS232 cable. A pretty comfortable option from a user's perspective.
      Basically, the OWL will only need the RXD line for data reception, since it does not send back any data. Yet, in a two-wire setup, from the moment of hardware reset and bootloader activation the respective TXD portline would be left as a high-impedance input. This could lead to an uncontrolled logical state or oscillation in poorly designed circuits, so the interface driver would "send back" some undefined garble to the serial port. In such cases, it is recommended to actively set up the TXD assigned port as an output and issue a static High level on it, as the regular application would also do when there is nothing to send back. OWL can be configured for an additional "TX port" that is not actually transmitting anything but a static "Idle" level.

    • RS422/RS485 line drivers (e.g. SN75176) normally require sort of a control signal (transmit-enable, TE) to switch data direction on the differential bus. This signal is normally to be provided by the regular application firmware on the controller. The OWL should do the same, as it must make sure that for the time of bootloader session the interface chip will be constantly switched to receive data direction. OWL can provide such signal. Just assign the respective portline for "TX port" and configure for a noninverted (high) or inverted (low) output level (depending on the logic of interface circuit).

  • Target platform without RS232 hardware: These are the more interesting applications in my opinion. Standalone applications, that normally would have no RS232, can be featured with a strong crypto bootloader, finally. The unidirectional Transmission with its least hardware requirements offers some exciting possibilities for minimalist and/or rugged interfacing:
    • Direct electric connection: We can reserve any portline on the microcontroller for OWL data reception. Connection may be led out by appropriate terminals. An existing One-Wire adaptor (CI-V-interface) could directly connect to such minimalist interface with no problem.
      Even more simplified: We may take the TXD-TTL signal directly coming from a USB-COM-converter (FT232, PL2303) and connect that to the respective input port on the micro. Or we may take the TXD signal coming from a genuine RS232 (i.e. -12V / 12V) and limit its voltage swing by a resistor and zener diode to TTL levels (0V/5V). This will lead to an "inverted logic" but we may configure the bootloader for inverted RXD signal.
    • Plain optocouplers: Standard optocouplers, such as the PC817, enable serial data transmission with enhanced electrical safety and an easy option for signal inversion, if needed. The phototransistor connects to the respective RXD portline. A pullup resistor of a few kilohms will improve signal steepness. On the opposite side, the coupler's LED connects to the serial data signal. See circuit samples below.

    • Open-Air opto: This is basically the same as an optocoupler. On the microcontroller's side, the respective port line connects to a fast phototransistor with a pullup resistor and nothing more. It will pull the line to Low state when exposed to sufficient intensity of red or infrared light. The sender is a bright LED driven by the TXD signal of a serial port. This constitutes the simplest and safest (in terms of electrical safety) data transmission, because of the air-gap. This variant has been tested in some environments. Works surprisingly well when there is no direct sunlight to interfere! Could work even better with an invisible infrared sender and an infrared filter at the receiver, but not that cool... Air-gap opto offers the option of firmware updates via really cheap, minimalist but rugged programming interfaces.
Light saber wirelessly programming an OWL armored device

    • Capacitive/Magnetic couplers: The continuous OWL transmission is a balanced bitstream, therefore may be transmitted easily over DC-free capacitive or inductive transducers without the need for additional modulation. This has been proven to basically work on modest or resonant frequency band. (However, when galvanic isolation is the primary goal, any Optocoupler-based solution may be preferable.)

    • PC-Audio to OWL-RXD: The Software provides a feature to export an OWL Transmission to digital audio. Firmware updates could then be packed and distributed in the form of (lossless) audio files and sent to the target controller via toy devices such as laptops, smartphones and tablets that do not feature RS232 but likely some Headphones connector. There had been no doubt that one can regenerate serial data from an audio signal using sophisticated modulation/demodulation schemes. This is what vintage modems and remote data transmission are all about. Anybody can do complicated. The challenge is to keep it really, really simple but strong. Worth a little more thought and experimentation. Now it seems there is at least one circuit and encoding option that abuses the soundcard pretty well. The adapter I've conceived basically consists of one (1) Optocoupler and a couple of resistors. On the sender's side, the serial data signal is to be differentially encoded (not modulated!) and spread over both stereo channels. Playback requires no more than a high-level output terminal such as 'Headphones' or 'Line-out'. On the receiver's end, a special bipolar (but common) Optocoupler will differentiate both channels with reference to each other, and the signal pouring out of the phototransistor will be in-phase serial data, regardless of actual polarity, floating, commutation or slight imbalance of left and right channel. See circuit samples below.
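One possible reading of "differentially encoded and spread over both stereo channels" is sketched below: the serial bit level goes to the left channel and its inverse to the right channel, written as 16-bit stereo PCM. This is an assumption for illustration only, not the export code of the OWL-Software; 8N1 framing, the sample rate, the signal level and all function names are chosen freely here.

```python
import io
import struct
import wave


def uart_bits(data: bytes):
    """8N1 framing: start bit 0, eight data bits LSB first, stop bit 1."""
    for byte in data:
        yield 0
        for i in range(8):
            yield (byte >> i) & 1
        yield 1


def differential_frames(data: bytes, samples_per_bit: int = 5, level: int = 20000) -> bytes:
    """Left channel carries the bit level, right channel its inverse (16-bit stereo)."""
    out = bytearray()
    for bit in uart_bits(data):
        left = level if bit else -level
        out += struct.pack("<hh", left, -left) * samples_per_bit
    return bytes(out)


def write_wav(fileobj, data: bytes, baud: int = 9600, rate: int = 48000) -> None:
    """Write the differential signal as a stereo WAV; rate // baud samples per bit."""
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(differential_frames(data, rate // baud))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_wav(buf, b"OWL")
    print(len(buf.getvalue()), "bytes of WAV data")
```

Because the receiver only evaluates the difference between the two channels, a swapped or inverted output merely flips the polarity, which the optocoupler circuit described above is said to tolerate.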


Simple & Safe One-Way-Interface:
TXD drives an ordinary 4-DIP Optocoupler. This is the favourable variant if no full RS232 interface is provided otherwise.
[Figure: Simple unidirectional data transfer from classic RS232 to the microcontroller via Optocoupler]

Air-gap TXD via LED:
[Figure: Simple unidirectional data transfer from classic RS232 via LED and phototransistor on the controller's side]

Air-gap TXD, same principle, using FT232:
[Figure: Circuit suggestion to drive a transmit LED from an FT232 (USB-VCP) that results in the same normal logic on the receiving side as in the examples before]

Simple & Safe Audio-Interface for differential encoding of stereo channels:
[Figure: Simplest interface for unidirectional and failsafe data transfer from a high-level soundcard output to a microcontroller]


OWL rune draft jt (CC0)
More iconic OWL logo



Remarks

Crypto contest for the small bootloader

For the crypto layer, only a few candidates seemed viable given the tight restrictions on memory and computing power of the target platforms. Sample code for AES, XTEA, a simple stream cipher (XOR-style) and "RST" was available by the time the project became more defined, so some deeper testing and comparison had to be conducted. This is why RST made the grade:

Comparison by algorithm (AVR implementation, pros, cons):

AES (Rijndael)
  AVR implementation:
  • Optimized AES 128 bits for AVR-ATmegas is available ("Rijndael Furious")
  • Code: 1570 bytes (block cipher only!)
  • Clocks per block: 2700...3500
  • Reference: http://point-at-infinity.org/avraes/
  Pros:
  • Will perform AES-128 according to specs and thus could be "certified"
  • AES offers good and well understood statistical properties regarding diffusion/confusion
  • Comparatively fast on AVR controllers (Assembler)
  • Different variants of key feedback that work with regular AES could be implemented
  Cons:
  • Further code required for CRC and key feedback!
  • Large memory footprint (lookup tables, S-boxes)
  • Decryption slightly slower than encryption
  • Possibly oversized for bootloader applications
  • They want a licensing fee for commercial application...

XTEA ("eXtended Tiny Encryption Algorithm")
  AVR implementation:
  • Standard of 64 bit blocks and 128 bit keys
  • Code: 206 bytes (core functionality)
  • Clocks per block: ~ 12,600 (split in 2 x 8 bytes)
  • Reference: www.efton.sk (link outdated?)
  Pros:
  • No patents, royalty-free
  • Provably good statistics
  • AVR assembly version quite compact and well-designed
  Cons:
  • Notable effort needed for key feedback, error detection, etc.
  • Possibly weak with minimum rounds per block
  • Natively works on 64 bit blocks; requires an elaborate makeover to adapt to 128 bit blocks and a 128 bit system of key chaining and cryptographic checksum

PRNG-XOR (simple stream cipher)
  AVR implementation:
  • Simplest
  • Code: ~ 60 bytes for plain XOR-stream
  • Blockwise or bytewise XOR by a pseudo-random number generator (PRNG)
  • Clocks per block: < 1000
  Pros:
  • Classic stream cipher
  • Using a cryptographically strong PRNG may provide sufficient security for some applications
  • Most compact solution
  Cons:
  • Notable effort needed for key feedback and error detection
  • Vulnerable to known-plaintext attacks
  • Not sufficient for a serious crypto-bootloader!

RST ("Randomized Substitution-Transposition")
  AVR implementation:
  • "Block cipher controlled by a stream cipher"...
  • PRNG of 128 bits
  • Continuously clocked
  • Use of Init-Vectors, over-all error detection
  • Modular options for block cipher, PRNG and key feedback
  • Code: ~ 160 bytes
  • Clocks per block: < 10,000 with minimum round count
  Pros:
  • Good statistics
  • IV mechanism and error detection "all inclusive"
  • Unofficially tested well
  • Very compact implementation on 8-bit CPUs
  • Use of a cryptographically strong PRNG
  • Nice compromise between cryptographic strength, code efficiency and functionality
  • Fully disclosed and quite simple
  Cons:
  • Not "certified" at all
  • A strong PRNG costs lots of computing time
  • Avalanche effect within a block may be small (< 20%)
  • IV-VI mechanism will enlarge the ciphertext by 2 blocks
  • Storage of the key sequence necessary on one side
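
To put the clock counts from this comparison into perspective, here is a minimal sketch that estimates how long each candidate needs for 16 bytes and compares that with the time the same 16 bytes take to arrive over the serial line. The 8 MHz clock and 9600 baud are taken from the Quick Starter sample setup; 8N1 framing (10 bit times per byte) and all function names are assumptions for illustration, and time spent on flash page writes is ignored.

```python
# Rough timing check: can decryption keep up with the incoming serial data?
# Clock counts are the upper figures from the comparison above.
# Assumptions (not taken from the OWL sources): 8N1 framing, i.e. 10 bit
# times per byte, and no time spent on anything but decryption.

CLOCKS_PER_16_BYTES = {
    "AES": 3500,
    "XTEA": 12600,      # read here as the cost of two 8-byte blocks
    "PRNG-XOR": 1000,
    "RST": 10000,
}


def decrypt_ms(clocks, f_cpu_hz):
    """Milliseconds the CPU needs to process 16 bytes."""
    return 1000.0 * clocks / f_cpu_hz


def arrival_ms(baud, n_bytes=16, bits_per_byte=10):
    """Milliseconds that n_bytes occupy on the serial line."""
    return 1000.0 * n_bytes * bits_per_byte / baud


def cpu_load(clocks, f_cpu_hz, baud):
    """Fraction of the available time spent decrypting (must stay below 1)."""
    return decrypt_ms(clocks, f_cpu_hz) / arrival_ms(baud)


if __name__ == "__main__":
    for name, clocks in CLOCKS_PER_16_BYTES.items():
        load = cpu_load(clocks, f_cpu_hz=8_000_000, baud=9600)
        print(f"{name:9s} {decrypt_ms(clocks, 8_000_000):6.3f} ms  load {load:5.1%}")
```

Under these assumptions RST needs about 1.25 ms per block, while a block takes roughly 16.7 ms to arrive at 9600 baud, so even the slower candidates leave plenty of headroom at this speed. The picture changes at higher baud rates or lower clock frequencies.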


Block encryption with or without error propagation and rolling key scheme

(1) Plain graphics data, unencrypted: Bitmap 200x200 px, 8 bits greyscale, 40 kB (equals 2,500 blocks of 16 bytes!)
(2) Encrypted in the very stupid ECB mode that uses the same keyset over and over... Patterns of the plaintext data are still visible. (This IS bad!)
(3) Encrypted with a continuously clocked key generator. No statistical deviations anymore, sound statistics. Strong encryption.
(4) Decrypted in plaintext-key-feedback mode (RST) with one bit flipped: All subsequent data and the final checksum are corrupted. Error or attack safely detected!
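
The three behaviours described above can be reproduced with a toy model. The following is a minimal sketch and explicitly not the RST cipher: each 16-byte block is simply XORed with a key, and the key is either kept fixed ("ecb"), clocked forward after every block ("clocked"), or clocked forward with the decrypted plaintext mixed in ("feedback"). SHA-256 stands in for the key generator; all names are made up for illustration.

```python
import hashlib

BLOCK = 16  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def next_key(key, feedback=b""):
    """Clock the (stand-in) key generator, optionally mixing in feedback."""
    return hashlib.sha256(key + feedback).digest()[:BLOCK]


def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def _advance(key, plain_block, mode):
    if mode == "clocked":
        return next_key(key)
    if mode == "feedback":
        return next_key(key, plain_block)
    return key  # "ecb": same key for every block


def encrypt(plain, key, mode):
    out = []
    for p in split(plain):
        out.append(xor(p, key))
        key = _advance(key, p, mode)
    return b"".join(out)


def decrypt(cipher, key, mode):
    out = []
    for c in split(cipher):
        p = xor(c, key)
        out.append(p)
        key = _advance(key, p, mode)  # a wrong p derails every later key
    return b"".join(out)


if __name__ == "__main__":
    key = hashlib.sha256(b"demo key").digest()[:BLOCK]
    plain = bytes(BLOCK) * 4  # four identical plaintext blocks
    for mode in ("ecb", "clocked", "feedback"):
        blocks = split(encrypt(plain, key, mode))
        print(mode, "distinct ciphertext blocks:", len(set(blocks)))

    tampered = bytearray(encrypt(plain, key, "feedback"))
    tampered[BLOCK] ^= 0x01  # flip one bit in the second block
    damaged = split(decrypt(bytes(tampered), key, "feedback"))
    print("blocks still correct:", [b == bytes(BLOCK) for b in damaged])
```

In "ecb" mode the four identical plaintext blocks yield four identical ciphertext blocks, which is exactly the pattern leak visible in image (2). In "feedback" mode the flipped bit damages its own block and every block after it, so a checksum placed at the end of the stream can no longer match.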



Quick Starter

1. Prerequisites

  • Simple Hello-World test program for the respective controller, e.g. a simple LED blinker, in Intel-Hex format (standard)
  • Hardware and software environment for AVR projects (IDE, compiler/assembler, ISP programmer, boards)
  • AVR controller that is supported by OWL, on a board with ISP connector (or adapter). The examples can easily be adapted for other devices.
  • OWL download for your operating system
  • Some prior experience with the command line
  • Respect for Fuses, but no fear of them!
  • At least one existing RS232 port (or virtual COM adapter for USB), accessible as "COMx" or "/dev/ttySx"

2. Install OWL-Software on the PC

Actually, there is nothing to "install", nor any registration or Registry hassle. Just unpack the download file "owl.zip" to the desired location on your system. Then open a command prompt and change to the 'owl' root directory.
When you type owl without further options, the general help screen should scroll by. If this works, all the other functionality will most likely work as well. (Do not forget to prepend "./" under Linux.)

In these examples, we use the long form of the commandline options for clarity.
By the way, the leading "-" or "--" are not mandatory. The Software shows a listing of all short and long forms when you enter: owl --help

3. Make tailored OWL firmware

Sample setup: ATmega8 with classic RS232 interface (MAX232, FT232) at around 8 MHz clock frequency
The controller is connected via MAX232 (or FT232) to the RS232 or USB port of the host computer. Such setups will most likely use the UART component of the controller and are thus bound to PD0/PD1 for RXD/TXD.
Serial communication should have been tested successfully in this setup. It is also assumed that there is an LED on PB2 that can give us optical feedback. The sample firmware ledblink_m8.hex uses that port. You may test it once without the bootloader to make sure that it works.

Now we make an authorized OWL bootloader for this hardware. The OWL commandline is:

owl --device=m8 --rxport=d0 --clock=8000 --targetname=testowl_m8

That's all of the "ultra complicated" process of making a customized bootloader with a unique crypto key... disappointed? You will find the new firmware file under: /targets/testowl_m8.hex

4. Installation of the OWL-Firmware (Bootloader)

Now start your preferred ISP-programming Software (like "avrdudess", "extreme burner", "TwinAVR") to transfer the freshly created firmware testowl_m8.hex into the target chip. Flash should be fully erased before doing so.

And we have to set the Fuses. The most important setting in conjunction with bootloaders is to enable "Self-Programming" (fuse SELFPRGEN on devices that have one; SPIEN, listed below, is the serial programming enable and simply stays programmed so that ISP access keeps working). Self-programming is a prerequisite for any bootloader, since otherwise firmware could not write flash memory contents by itself. Additionally, the Brown-Out-Detector (BODEN) must be activated; this provides for sound cold-start behaviour and prevents Flash corruption.


Fuses: SPIEN=0; BOOTSZ=10; BODEN=0; BODLEVEL=0; CKSEL=1111; SUT=11
Byte-values:    Ext: $FF         High: $DD         Low: $7F

Or, as cryptic as can be, using "avrdude" at the command prompt:
avrdude -U efuse:w:0xff:m -U lfuse:w:0x7f:m -U hfuse:w:0xdd:m


Hint: There are Fusebit-Calculators available on the Web making things a lot easier.
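
Such a calculator is easy to sketch yourself. The snippet below packs named fuse bits into byte values and unpacks them again, assuming the ATmega8 bit layout as I read it from the datasheet (please verify against your own copy); the function names are made up for illustration. A fuse bit reads 0 when programmed and 1 when unprogrammed.

```python
# Fuse bit names of the ATmega8, listed from bit 7 down to bit 0.
# Layout assumed from the ATmega8 datasheet; check it for other devices.
LOW_FUSE = ["BODLEVEL", "BODEN", "SUT1", "SUT0",
            "CKSEL3", "CKSEL2", "CKSEL1", "CKSEL0"]
HIGH_FUSE = ["RSTDISBL", "WDTON", "SPIEN", "CKOPT",
             "EESAVE", "BOOTSZ1", "BOOTSZ0", "BOOTRST"]


def encode(layout, bits):
    """Pack named fuse bits into a byte; bits not mentioned stay 1."""
    value = 0
    for name in layout:
        value = (value << 1) | bits.get(name, 1)
    return value


def decode(layout, value):
    """Unpack a fuse byte into a dict of named bits."""
    return {name: (value >> (7 - i)) & 1 for i, name in enumerate(layout)}


if __name__ == "__main__":
    # Fuse list from above: SPIEN=0; BOOTSZ=10; BODEN=0; BODLEVEL=0;
    # CKSEL=1111; SUT=11
    high = encode(HIGH_FUSE, {"SPIEN": 0, "BOOTSZ1": 1, "BOOTSZ0": 0})
    low = encode(LOW_FUSE, {"BODLEVEL": 0, "BODEN": 0})
    print(f"high: 0x{high:02X}  low: 0x{low:02X}")
    print(decode(LOW_FUSE, 0x7F))
```

With this layout the fuse list above encodes to a high byte of 0xDD and a low byte of 0x3F, whereas a low byte of 0x7F decodes to BODEN=1, i.e. brown-out detector off. It is worth cross-checking the low byte against the datasheet or a Fusebit-Calculator before writing it.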

Now we have a workable OWL on this Target device.

5. Transmit test firmware

For convenience, our test firmware ledblink_m8.hex is located right in the owl folder. In this example, the Target is connected to the computer via COM2 (Linux: /dev/ttyS1). The commandline looks like this:

owl --targetname=testowl_m8 --flashfile=ledblink_m8.hex --serialport=COM2

Type it in, reset the controller device and fire off the command. Transmission starts shortly afterwards at the default speed of 9600 baud (unless specified otherwise) and should take only a few seconds. Does the LED at the controller start to flash? Congratulations!
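
For a feeling of how long "a few seconds" is, the raw line time can be estimated as follows. This is only a lower bound under stated assumptions: 8N1 framing (10 bit times per byte), payload padded to 16-byte blocks, and two extra blocks for the IV mechanism as mentioned in the Remarks; any preamble or page-write pauses OWL may use are not modelled, and the function name is made up for illustration.

```python
import math


def transfer_seconds(payload_bytes, baud=9600, bits_per_byte=10,
                     block=16, extra_blocks=2):
    """Lower-bound estimate of the time a firmware image spends on the wire."""
    blocks = math.ceil(payload_bytes / block) + extra_blocks
    return blocks * block * bits_per_byte / baud


if __name__ == "__main__":
    for size in (200, 2048, 7680):  # example image sizes in bytes
        print(f"{size:5d} bytes: {transfer_seconds(size):5.2f} s")
```

A small LED blinker of about 200 bytes comes out at roughly a quarter of a second, while an image filling the 7680 bytes left beside a 512-byte boot section on an 8 KB device takes about 8 seconds at 9600 baud.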

Note: For a different AVR or port, change the test firmware and target loader accordingly.

Some informational commands

Short reference on all commandline options:
owl --help

More detailed reference on selected options (here: 'flasherase', 'timeout' and 'serialport'):
owl --help --flasherase --timeout= --serialport=


List all devices that are currently supported with firmware templates:
owl --supported

Show technical data on a certain device:
owl --device=Devicename

Show master data on authorized bootloaders (example):
owl --targetname=testowl_m8





License

Lots of experience, passion, tears, sweat and beer have flowed into this little project... I provide the world community with this work of art under the following conditions:

Programs (firmware, Software) for the One-Way-Loader are subject to the MIT license. A respective note is included in all source code. All of the One-Way-Loader programs that I have released myself are deliberately Open Source. For special versions or contractual work, individual agreements may apply.

Documentation and images for the One-Way-Loader are available under Creative Commons - Universal (CC0). This includes circuit diagrams, drawings and accompanying documents that do not feature an explicit copyright notice and are not implicitly subject to another license. For example, the OWL logos and icons are my own creations or modifications of drafts that were already in the public domain.

Want to support my work? Suggestions, criticism, donations




Download



Links




First release: 06/2018 ~ Recent update: 07/2018