Recent update: 2023-09-18

One-Way-Loader - Compact Crypto Bootloader for AVRs


The One-Way-Loader is an exceptional AVR Bootloader compatible with most TinyAVR and MegaAVR devices. It makes consistent use of unidirectional data transmission, enabling minimalist and/or custom interfaces that use only one input line. It also integrates well into conventional RS232, RS485 or USB-RS232 setups. Transmissions are secured on a technical and cryptographic level. Programming contents for EEPROM and Flash may be exported to a single file container, suitable for encrypted firmware distribution.



  • Bootloader covers most of the ATtiny and ATmega series with a small memory footprint of only 512 bytes.

  • Extremely reliable unidirectional data transfer; lots of interface options such as RS232, RS485, One-Wire, Optical and Audio.

  • 128-bit block cipher provides for data security, fault detection and unique device addresses.

  • Advanced Software tool generates a single customised Bootloader or a whole series out-of-the-box; manages target devices and keys.

  • Encrypted firmware-updates for EEPROM and Flash in one pass; distribution format for single or series of Bootloaders.

  • Open-Source solution of good spirit and permissive license!



Introduction

Definition: In the world of microcontrollers, a 'Bootloader' is a piece of firmware, residing on the controller, that enables regular firmware updates by way of a common standard interface (e.g. RS232), rather than using platform-specific programming adapters like ISP or JTAG.

Security has become a vital issue these days. For example, a simple password-protected Bootloader could already restrict access to a legitimate group of users. Data content is quite safe inside the AVR chip, as read-out of data by unauthorised parties may be effectively blocked by setting up the respective Lockbits, possibly supported by physical means of protection. Unfortunately, when firmware update files are handled in unencrypted form, the firmware distribution channel is prone to interception, compromise, manipulation and transmission errors. This is less of a problem in a strictly private and local setup. However, in the context of embedded applications intended for in-field updating by normal users, one would definitely prefer a foolproof and encrypted method for firmware updates.

Crypto Bootloaders offer comprehensive firmware protection. A crypto Bootloader will only accept encrypted transmissions and decrypt data just before writing into the respective memory segments within the controller. Unencrypted firmware will not appear anywhere but in the secured environment of the developer/publisher and, of course, 'sealed' in target devices. Most crypto Bootloaders use symmetric ciphers, for two main reasons: symmetric crypto can be implemented even on resource-limited devices, and it features clearly defined access rights. Done right, the crypto Bootloader can provide integrity assurance, anti-tampering and/or serve other legitimate interests regarding firmware update and distribution.

Naturally, Bootloader firmware is a real challenge in terms of code efficiency, independence, reliability and also transparency. These requirements can hardly be met with any "high-level" approach to programming. When it comes to making a small-footprint and resource-saving Bootloader firmware, Assembler is the silver bullet. More people should have followed the example of TinySafeBoot. Years ago, it demonstrated the benefits of a straightforward, resource-saving Bootloader firmware for 8-bit AVRs that came with a lightweight software tool that could make customised Bootloaders out-of-the-box with no dependencies on any specific toolchain.

It's plain to see that more and more users want independent and transparent solutions. A simple and strong crypto Bootloader has been on my wish list for many years. Having seen enough of all those inspiring examples of so-called security Bootloaders, which are wrong in many ways, I finally started this little project.



Idea and Concept

The "One-Way-Loader" integrates the following ideas and concepts:
  1. One-way transmission with added value
    Usually, the bootloader on a microcontroller receives data that is sent from a computer and writes this data into certain memory areas of the controller. Transmissions that go in the reverse direction, i.e. the bootloader reads data from controller memory and sends it back to a computer, are rarely seen in usual bootloader scenarios. Actually, such read-back features are not really needed, not even in a developer's environment, since we will have ISP/JTAG there. Just to mention, the thing is called bootloader, not bootsaver... That is to say: Pragmatists and end users should be fine with a write-only Bootloader.
    So, data will flow in just one direction, from the computer down to the controller. But why do most bootloaders still demand full-fledged bidirectional serial interfaces? This is because their bulky protocols want to send back confirmation characters or checksums every now and then!
    The prerequisite of a feedback channel means hardware dependencies and restricts the versatility of a bootloader. Many standalone appliances do not even consider RS232/RS485 connectivity. With the narrow resources of the ATtiny controller family, it's hard enough to reserve some port lines for a temporary RS232-TTL breakout. Well, there have been a few bootloaders around which offered a sort of One-Wire interfacing, thereby reducing bootloader interface requirements to one single port line. However, the underlying bidirectional protocol demands that the precious I/O reserved for bootloader communications is actually capable of alternating transmission in both directions, which often excludes this port from other uses outside the scope of bootloader sessions.
    What if we didn't need ANY feedback channel? We finally come from One-Wire to One-Way! Such a one-way interface could be established on any single port line that is suitable for data reception, even temporarily. In a one-way bootloader session, this input line is simply fed with a unidirectional serial data stream. Hardware requirements are reduced to the minimum. In all this, such a "one-way bootloader" would stay 100 percent compatible with an existing two-wire RS232/RS485 setup, simply using only the unidirectional RXD line on its own.
    There are more benefits to expect from this all-unidirectional approach. Minimised technical requirements on the serial interface enable pretty simple and reliable updater applications that could be implemented on almost any computer or device that is capable of sending some sort of serial data stream. Alternative programming channels are imaginable that do not even depend on legacy RS232/RS485 subsystems. Best of all: This is no longer fiction.
    Spoiler alert: Flow control may be dropped completely if the sender can calculate (and maintain) a decent timing for the serial transmission. Error detection may be handled by a cryptographic checksum over the whole transmission. Quod erat demonstrandum!

  2. Cryptography provides for data protection, data integrity, authentication and unique device addresses
    Flummoxed readers may be tempted to throw in: "As if it weren't enough with that unidirectional transmission stuff, did he say 'crypto'? This can never work, no way..."
    Well, the correct term was "one way"... Let me assure you, it all works even better with than without crypto!
    For firmware transmissions, there is a zero-fault policy. First of all, the method of data transfer shall be designed to be as reliable as possible. But even under highly controlled technical conditions, the occurrence of random errors cannot be excluded in reality. Error correcting codes have been invented to address this issue, but these would massively inflate data volume while still being unable to guarantee an absolutely error-free transmission. Without any feedback channel, the unidirectional bootloader cannot re-request faulty data packets, and of course it cannot perform any external verification. Is one-way transmission a dead end? Not at all! In fact, all these conventional protocols are simply out of place with respect to a well elaborated one-way approach!
    Now cryptography gets on stage. We are going to get ourselves a reasonable block cipher, apply a specific method of key feedback, and build a transmission format of encrypted data that is protected by a strong cryptographic checksum (similar to a MAC). With an appropriate algorithm, the receiver can unilaterally determine the integrity of the transmission as a whole. If that 'super-checksum' is okay, then the bootloader can immediately hand over to the newly written Application program, since it is error-free with the utmost certainty that the checksum mechanism provides. However, if any error has occurred, the unidirectional bootloader must promptly erase all faulty data that has been written and somehow give indirect feedback that the transmission has failed. Then the user has the option to start a new transmission, preferably under improved technical conditions. This is generally considered good practice in the context of firmware updates whenever doubts about data integrity have arisen.
    In addition to privacy and data integrity, introducing crypto will have another benefit: As strong cryptographic keys are generated from randomness, we automatically get unique device addresses (in the sense of a UUID) for any of such devices equipped with that crypto Bootloader. This will enable individual addressing of Bootloaders and we will no longer have to bother with weak password schemes and collisions in multi-Bootloader-setups.

  3. Compact firmware for most ATtinys and ATmegas (and... who knows)
    Like the predecessor, the new bootloader should run on as many AVR chips as possible. It shall present minimum requirements on the target hardware. Consequently we won't fool around with paralympic exercises in "C". We use Assembler! It has the power to combine signal processing, sequence control and strong crypto to a highly compact yet transparent and portable code. So, THIS bootloader firmware doesn't need a fucking two kilobytes for its amazing functionality... It doesn't even occupy one kilobyte... it gets along with half a kilobyte!

  4. Software tool provides for bootloader generation, crypted data transfers and key management
    Concept of the PC software is a plain commandline tool. This means minimal dependencies and standardised input-output interfaces. A commandline tool will easily integrate into various development-, editor- and scripting environments. Executables may be built for (at least) Linux and Windows systems.
    One main task of a crypto-bootloader's software tool is to generate encrypted firmware transmissions for "authorised targets", i.e. devices for which the user owns the cryptographic keys. The undertaking shall not be more complicated than with unencrypted bootloaders. Nobody would remember strong bootloader passwords these days. A pragmatic and transparent compromise is to use memorable, user-defined clear-names being linked to keyfiles in a secured workstation. Software will search its local database for a referenced target name, then retrieve the associated crypto key and meta data in order to make the encrypted transmission to that respective target.
    The one-way bootloader's software tool should also create customised bootloader firmware out-of-the-box. This has been a very popular option with TSB, and in conjunction with the crypto bootloader it makes even more sense, since there are several more code locations that can be modified automatically. However, there is still the option to dive into the fully disclosed source code in order to make a workable bootloader by way of manual assembly.
    Another aspect, rarely considered in those occupational therapies for nerds, was planned from the beginning to be a standard feature of the new bootloader's software tool: Semi-automatic generation of a series of bootloaders with individual keys and systematic naming, and, matching perfectly, export of bulk firmware updates for such a series of bootloaders for distribution purposes. Just in case someone wants to maintain more than just two or three identical target devices...

  5. Cool naming, cool license
    According to its unique feature of unidirectional secured transmission, the bootloader was named: One-Way-Loader, alternative spelling OneWayLoader or OWL.
    The One-Way-Loader is Open Source, since everything revolves around security and crypto, which implies complete transparency.
    The One-Way-Loader is available under the MIT License. This is mainly for pragmatic reasons. Can't waste my precious time with endless discussions on licensing terms. The permissive and concise MIT License offers legal certainty for all parties involved and allows for a free, voluntary and fair cooperation. So, if you take advantage of using my inventions, please consider a donation or other form of support.



Just try-out: Quickstart


Have a look at the Technical Details to get hardcore insight.



One-Way-Loader's fields of application:
  • Ultimate replacement for 'TSB', already supporting same devices
  • Strong authorisation mechanism (128 bit key)
  • Transparent cryptographic protection for local projects
  • In-field firmware updates using simplified and/or rugged interfaces
  • Using an existing RS232/RS422/RS485 interface
  • Protected update channel for EEPROM and/or Flash data
  • Using minimalistic or covert programming channels, for example air gap opto or soundcard-audio
  • Secured firmware distribution over public channels
  • Protecting delicate embedded systems from third-party attacks







Technical Details

Wise OWL
  1. Timing Trick

  2. Autobauding

  3. Logical Format

  4. Cryptography

  5. Firmware

  6. Software

  7. Hardware Options




OWL Transmission (1) - Timing Trick

Challenge: A bootloader program on a microcontroller cannot receive data and write to its nonvolatile memory at the same time. Typically, it has to buffer small chunks of data and prepare this data for the actual write operation. If there is a strong crypto layer in the game, deciphering will cost significant processor time. But the main showstopper is the physical write into EEPROM or Flash. In this very critical phase, the bootloader firmware must wait for the erase and write cycle to complete before any new data can be received and processed. That's why interrupt-driven data reception is questionable in the context of Bootloaders: it still cannot guarantee a continuous flow of data, but likely introduces additional complexity and stability risks.
So, with bootloader transmissions, we always have to consider this "deadtime", and it all boils down to sort of a 'stop-and-go' transfer.

In a minimal protocol, the Computer sends small blocks of data to the Controller, allowing it to process, buffer and write to EEPROM or Flash. Having finished the previous write operation, the Controller would immediately send back some sort of confirmation message, signalling that it is ready to accept further data right now. Then the Computer may send further data or commands. Such two-way protocols work well, but definitely require a feedback channel. Now, how do we get rid of this annoying dependency?

Solution: Since Microcontrollers are quite deterministic, the deadtimes of an operative Bootloader can be calculated in advance, provided that some technical parameters are known to the Sender, who can then create a sort of "timed" serial transmission that does not depend on a backchannel for handshake. The Receiver finishes previous operations shortly before the next data packet is about to arrive. Unfortunately, in the reality of modern RS232 implementations (multitasking OS, several layers of abstraction, driver latencies and inconsistencies) it has become impossible to send out serial data with exactly defined pauses in between. What we can do instead: "fill up" the pausing periods with a calculated number of serial characters that carry no data but will of course take a defined amount of time to be sent. And just transmit the whole stuff continuously! The Receiver of such a self-timed transmission has the chance to process previous data while those "dummy" characters just flow by. At the moment the Receiver is ready to accept new data, it will re-synchronise to some of the last dummy characters, shortly before the next block of valid data is about to arrive. No flow control and no feedback channel needed!
The OWL signal is a regular unidirectional RS232 transmission in mode 8-N-1 at chosen baud rate.
The OWL signal may be sent smoothly from any RS232 interface.

  • Only transmits data in one direction, i.e. from the Computer ("Sender") to a Microcontroller ("Receiver").
  • Such a "Transmission" consists of encrypted payload data blocks to ensure confidentiality and integrity.
  • Considers the deadtime of the intended Receiver by filling up with a calculated number of stuffing characters.
  • Those "Preambles" constitute guaranteed minimum intervals between data blocks.
  • Preamble characters enable the Receiver to re-synchronise, autobaud and catch up.
  • This one-way Transmission does not need any return channel for flow control and error detection.
  • This one-way Transmission places minimum requirements on RS232 hardware and software.
  • The continuous and encrypted signal is digitally balanced, thus quite robust also in a technical way.
Figure: Timing diagram of a rather short OWL Transmission for an ATtiny25 target running at 10 MHz (4 blocks of EEPROM data, 8 blocks of Flash data, Transmission speed of 9600 baud)
General: The diagram above illustrates the timing characteristics of an "OWL Transmission". Different data types have been highlighted for clarity, but actually there are no interruptions of the serial data flow. At a glance we can see that those PREAMBLE runs can have very different sizes, since these are adjusted to the individual processing times and deadtimes of the respective target controller and its technical set-up.

Block time: Every data block consists of a starting character and 16 bytes of payload data. At given baudrate, the block transmission time is therefore a constant. For example, with 9600 baud, each block will take about 18 milliseconds (tB).
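The block time is simple arithmetic and can be checked in a few lines. This is only a sketch: the function name `block_time_ms` is made up for illustration, and it assumes the 8-N-1 framing stated above (10 bitcells per character).

```python
def block_time_ms(baud, payload_bytes=16):
    """Time on the wire for one block: Blockstart character plus payload."""
    bits_per_char = 10           # 1 start bit + 8 data bits + 1 stop bit (8-N-1)
    chars = 1 + payload_bytes    # starting character + 16 bytes of payload
    return chars * bits_per_char / baud * 1000.0

print(round(block_time_ms(9600), 1))  # 17.7
```

At 9600 baud this gives roughly 17.7 ms, which matches the "about 18 milliseconds" quoted for tB.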

Introductory Preamble: To catch an OWL Transmission and initially synchronise serial data reception, the Receiver must wait for the very first Preamble. Actually, it does not necessarily have to "wait for" that signal. With regard to manual coordination of Sender and Receiver activation, the OWL Preambles provide the capability to step into an already running Preamble (see below). So there is the option to first start the OWL Transmission with a comparatively long INTRO PREAMBLE, then activate the target device at leisure.

Block decryption time: Data decryption consumes computation time. At clock frequencies of some MHz, this decryption time (tD) is in the range of a few milliseconds only. Since each block must be decrypted, the decryption time dictates the minimum duration of all Preambles. (To be safe, there should always be some extra Preamble characters to compensate for runtime deviations and to allow the receiver to re-synchronise and autobaud.) Yet, block decryption is not the longest delay that occurs with AVR Bootloaders. Physical writes into EEPROM and Flash cost much more time.
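How a deadtime translates into a Preamble length can be sketched like this. The function name `preamble_chars` and the default margin of 3 extra characters are assumptions for illustration (the margin is derived from the 1 to 2.5 characters that synchronisation and autobauding need); the actual OWL software may calculate this differently.

```python
import math

def preamble_chars(deadtime_ms, baud, margin_chars=3):
    """Preamble characters needed to bridge a Receiver deadtime.

    One 8-N-1 character occupies 10 bitcells on the line. margin_chars is
    an assumed safety reserve for re-synchronisation and autobauding.
    """
    chars_for_deadtime = math.ceil(deadtime_ms * baud / 10_000)
    return chars_for_deadtime + margin_chars

# Hypothetical decryption time of 3 ms at 9600 baud:
print(preamble_chars(3, 9600))  # 6
```

The same helper applies to the longer deadtimes of EEPROM and Flash writes; only the deadtime argument changes.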

Authentication sequence (S1): The first three blocks constitute a cryptographic protocol that safely authenticates the Sender towards the Receiver. As this step takes place only in SRAM and does not involve other computationally demanding tasks, minimum Preambles (tD) are sufficient in this sequence.

EEPROM sequence (S2): EEPROM memory can be overwritten directly, but this takes several milliseconds per byte, so that the write time for a block of 16 EEPROM locations sums up to a whopping 60 milliseconds (tEW). Fortunately this is only the case for EEPROM data that is actually written. Those comparatively long Preambles between EEPROM blocks are plain to see in the diagram.

Flash sequence (S3): A pretty long Preamble of about 180 ms (tFE) follows the first payload block of Flash data. This is because of the Flash Erase cycle that is necessary before Flash may be overwritten with new data (at least on ATtinys). Once the Flash Erase is complete, the Flash session gets going. But you might have noticed that the Preambles between Flash data blocks have slightly different sizes. This is intentional and due to the organisation of Flash memory in the microcontroller. In this example, the target chip (ATtiny25) features a Flash page size of 32 bytes. Since the cipher determines a uniform block size of 16 bytes, the Bootloader program must aggregate two successive blocks before the next Flash Page Write. Accordingly, the Sender has to insert the enlarged Preamble that accounts for the Flash Write time (tFW) only after every 2nd block. (Note: Controllers with lots of Flash memory usually have larger page sizes. On such devices, the Flash Write cycle will only appear with every 4th, 8th or 16th block, which considerably shortens the overall Flash programming time.)
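The Preamble pattern of the Flash sequence can be sketched as a small planning function. This is my own illustration, not code from the OWL software: `flash_waits` is a hypothetical name, and it simply lists which deadtimes (tD, tFE, tFW) the Preamble after each Flash block has to cover, assuming the page size is a multiple of the 16-byte block size.

```python
def flash_waits(num_blocks, page_size, block_size=16):
    """For each Flash block, list the deadtimes its following Preamble covers."""
    if page_size % block_size != 0:
        raise ValueError("page size must be a multiple of the block size")
    blocks_per_page = page_size // block_size
    plan = []
    for i in range(1, num_blocks + 1):
        waits = ["tD"]                  # every block has to be decrypted
        if i == 1:
            waits.append("tFE")         # Flash Erase follows the first block
        if i % blocks_per_page == 0:
            waits.append("tFW")         # page buffer is full: Flash Page Write
        plan.append(waits)
    return plan

# ATtiny25 example from the diagram: 8 blocks, 32-byte pages
for number, waits in enumerate(flash_waits(8, 32), start=1):
    print(number, "+".join(waits))
```

With 32-byte pages the enlarged Preamble shows up after every 2nd block; with 128-byte pages it would only appear after every 8th block.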

Outroductory Preamble: In some hardware setups it may be necessary to keep the serial channel open for a while after the actual transmission has already finished. Any number of dummy characters could be appended to the serial data stream to extend transmission time accordingly.

Calculations: In order to generate such customised Transmission, the Sender must know some individual properties of the receiving microcontroller platform, such as memory sizes and organisation, average number of processor cycles needed for decryption, absolute clock frequency and, of course, the correct cryptographic key for the intended Receiver. See Software section for further details on how this meta-information is administered.
Sounds good... literally: OWL Transmission, 1 kB of payload, 9600 bits per second transcoded to an audio file

Note: This acoustic sample is just to illustrate the timing character of an OWL-Transmission. It is not exactly the same as the 'OWL-Audio' export format described below.





OWL Transmission (2) - Synchronisation and Autobauding

The Sender inserts a number of Preamble characters between data blocks to provide accurate guard intervals, enabling the Receiver to process previous data in time.

When the Receiver gets back on-line, the last characters of the preceding Preamble are just zooming by. Now the Receiver has the occasion to repeatedly synchronise and calibrate to the actual baudrate of that serial transmission, enabling a technically robust reception of further data.


Figure: Re-synchronisation and autobauding right before re-entry to data reception
Agreements:
  •   Preamble character ($CC, &b11001100) with special bit pattern allows for synchronisation and autobauding.
  •   Blockstart character ($55, &b01010101) marks the end of Preamble and the beginning of the Data Block.
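To see why these two characters were chosen, it helps to write out their line levels. The following sketch uses standard 8-N-1 UART framing (start bit, data bits LSB first, stop bit); the helper names `uart_frame` and `low_runs` are made up for illustration.

```python
def uart_frame(byte):
    """Line levels of one 8-N-1 character: start bit (0), 8 data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def low_runs(levels):
    """Lengths of the consecutive Low phases, in bitcells."""
    runs, count = [], 0
    for level in levels:
        if level == 0:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs

print(uart_frame(0xCC), low_runs(uart_frame(0xCC)))
# [0, 0, 0, 1, 1, 0, 0, 1, 1, 1] [3, 2]
print(uart_frame(0x55), low_runs(uart_frame(0x55)))
# [0, 1, 0, 1, 0, 1, 0, 1, 0, 1] [1, 1, 1, 1, 1]
```

The Preamble character yields exactly two Low phases of 3 and 2 bitcells per frame, and both characters spend 5 of their 10 bitcells Low and 5 High.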

(Pre-)Synchronisation:
  • When it waits for the beginning of a Transmission or comes back to an ongoing Transmission, the Receiver will see the momentary signal in one of these states:

    • Low level, i.e. "Startbit" or "0-bitcell", is always waited out until the line goes High again.
    • High level, i.e. "Idle", "Stopbit" or "1-bitcell", is waited out until the line goes Low.
    • The instant at which the line then goes Low is precisely detected and serves as the starting shot for the first measure cycle.

Autobauding and Frame-Synchronisation:
  • The Preamble character features two Low phases, lasting 3 and 2 bitcells.
  • The Receiver measures the duration of two consecutive Low phases.
  • The second value is subtracted from the first value that has been measured:
    • Positive result ( 3 - 2 = 1 ) means that both Low phases were captured within one character frame. The Receiver is already in-sync and the resulting value can be taken directly as a timing reference.
    • Negative result ( 2 - 3 = -1 ) indicates that these Low phases were captured from separate frames. This means that the Receiver is not yet in-sync. It will skip the next high-low transition, then simply start the next two-stage measurement, which yields a positive result and leaves the Receiver in the in-sync state.
  • The underlying differential method provides an error-compensating measurement of one bitcell's timing factor.
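The procedure can be played through in a small simulation. Mind that this is a sketch on an idealised, sampled signal and not the OWL firmware (which is written in Assembler): the names `make_preamble` and `autobaud` are hypothetical, and every bitcell is assumed to last exactly `ticks_per_bit` counter ticks.

```python
# Line levels of the Preamble character $CC in 8-N-1 framing:
# start bit (0), data bits LSB first (0,0,1,1,0,0,1,1), stop bit (1).
PREAMBLE_CELLS = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

def make_preamble(chars, ticks_per_bit):
    """Sampled line signal: every bitcell lasts ticks_per_bit counter ticks."""
    signal = []
    for _ in range(chars):
        for level in PREAMBLE_CELLS:
            signal.extend([level] * ticks_per_bit)
    return signal

def autobaud(signal, start):
    """Return (ticks per bitcell, position right after the 2-cell Low phase)."""
    pos = start

    def run(level):
        # Advance while the line stays at `level`; return the ticks counted.
        nonlocal pos
        begin = pos
        while pos < len(signal) and signal[pos] == level:
            pos += 1
        return pos - begin

    run(0)  # pre-sync: wait out a Low level that is already in progress
    run(1)  # wait out the High level until the next falling edge
    while True:
        first = run(0)    # duration of the first Low phase
        run(1)
        second = run(0)   # duration of the following Low phase
        if pos >= len(signal):
            raise ValueError("signal ended before autobauding finished")
        if first - second > 0:
            return first - second, pos   # 3 - 2 = 1 bitcell: in-sync
        run(1)  # negative result: skip the next high-low transition ...
        run(0)  # ... by waiting out the Low phase it starts
        run(1)  # then wait for the falling edge of the next start bit

signal = make_preamble(chars=6, ticks_per_bit=100)
print(autobaud(signal, start=700))  # (100, 1700)
print(autobaud(signal, start=150))  # (100, 2700)
```

Entering at tick 700 (High level after the 2-cell Low phase) is in-sync at once; entering at tick 150 (in the middle of a start bit) first produces the negative result and needs one extra measurement. In both cases the result is one bitcell of 100 ticks, and the Receiver ends up at the same position within a frame.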

Data reception:
  • After the Synchro-Autobaud procedure, all consecutive characters may be received by the software decoder with a fresh timing reference.
    Now the Receiver:
    • discards further Preamble characters until the Blockstart character is received;
    • decodes and buffers the 16 encrypted data bytes that follow the Blockstarter;
    • decrypts and processes this data.
  • In the meantime, while the Receiver is busy with decrypting and writing the decrypted data, the Sender transmits the precalculated number of Preamble characters.
  • As soon as the Receiver gets back on-line, it finds the current Preamble still running. The Receiver starts another autobaud cycle, then waits for the next Blockstarter. Full circle.
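On the level of whole characters, the framing described above boils down to a very small parser. The sketch below works on an already decoded byte stream and ignores all timing; `extract_blocks` is a hypothetical name, and raising an error on an unexpected character is my assumption, not documented Receiver behaviour.

```python
PREAMBLE = 0xCC     # Preamble character
BLOCKSTART = 0x55   # Blockstart character
BLOCK_SIZE = 16     # encrypted payload bytes per block

def extract_blocks(stream):
    """Split a decoded OWL byte stream into its 16-byte data blocks."""
    blocks = []
    i = 0
    while i < len(stream):
        if stream[i] == PREAMBLE:
            i += 1                              # discard Preamble characters
        elif stream[i] == BLOCKSTART:
            block = stream[i + 1:i + 1 + BLOCK_SIZE]
            if len(block) < BLOCK_SIZE:
                raise ValueError("truncated data block")
            blocks.append(bytes(block))
            i += 1 + BLOCK_SIZE                 # payload is taken as-is
        else:
            raise ValueError("unexpected character inside Preamble")
    return blocks

payload = bytes(range(16))
stream = bytes([PREAMBLE] * 5 + [BLOCKSTART]) + payload + bytes([PREAMBLE] * 3)
print(extract_blocks(stream) == [payload])  # True
```

Since the payload length is fixed, payload bytes that happen to equal $CC or $55 cause no confusion; they are only interpreted as control characters outside of a block.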

Goodies:

  • We get all the benefits from a Software UAR(T) with autobauding. This approach does not depend on on-chip UART component and almost every existing I/O port line can be used as a serial data input. Any baudrate in a wide range may be workable. Serial communications reliability is widely independent from the accuracy of processor clock.
  • The procedure is flexible. It can wait for an incoming signal and skip signal stoppage until Timeout. It can also step right into an already running Preamble with extremely low probability of faulty synchronisation.
  • Only active-low phases are measured. Possible "stuttering" of the serial data flow (caused by buffer underruns) would not affect accuracy or reliability of the measure cycle under decent technical conditions.
  • The procedure is quite robust. Differential measurement can even compensate for asymmetric rising/falling edges of highly distorted signals.
  • Preambles, Blockstarters and Data Blocks are digitally balanced. Assuming continuous transmission (i.e. no buffer underruns that would cause longer periods of the idle state), the digital sum value of the OWL signal balances exactly at the midpoint between Low and High logic level, which is desirable with regard to DC-free or differential transmission methods and generally improves robustness on real-world channels.
  • Synchro-Autobauding is repeated before each block of data. This provides utmost stability of the serial software receiver in longer transmissions (minutes), which could otherwise be affected by unstable or drifting clock frequencies.
  • Wide baudrate window. The current programming makes good use of the 16-bit counters so that a wide range of valid baudrates is available. See Table.
  • Synchronisation and Autobauding get by with 1 to 2.5 Preamble characters. Preambles may be calculated 'on the edge' when clock frequencies of narrow tolerance are in use.
Note: For the purpose of this documentation, all RS232 signals are depicted in the same unipolar logic that the microcontroller's UART would normally expect (e.g. coming from MAX232 or FT232). The logical "1" (stopbit or idle) is identified by a logical High (3.3 or 5 volts), a logical "0" corresponds to the Low (0 volts). Just to mention, the OWL Firmware can also be configured for inverse signalling. In some cases, this could simplify the hardware interface even more.

Performance

The table below gives an overview of what is feasible with the Synchro-Autobaud procedure described. In a test set-up with ATtiny2313-20PU and MAX232 at 5 volts, many different controller clock frequencies and baudrates have been tried out. The test Transmission contained data samples of 2 x 64 bytes for the EEPROM and a simple LED-flashing programme for the Flash, the latter having been padded with approximately 1 KB of random data to challenge data integrity and to test some other aspects.
Before every trial, the AVR chip was fully erased via ISP to exclude false positive results.

Evaluation: If the LED started flashing immediately after the transmission was complete, the entire transmission must have passed through completely and error-free. In all other cases, the attempt has apparently failed.

Viable baudrates at different clock frequencies

Clock (kHz)   Baud Min.   Baud Max.
         16        < 30         100
        128          30         450
        500          50         900
       1000         100        1800
       2000         200        3600
       3000         300        4800
       3560         450        7200
       4000         450        7200
       4433         450        9600
       6000         450       14400
       8000         600       14400
      10000         600       19200
      12000        1200       28800
      14745        1800       38400
      16000        2400       38400
      17734        2400       57600
      24000        3600       76800
      27256        4800      115200
      30000        9600      115200

Remarks:
  1. This table is not applicable to TSB or similar Bootloaders. OWL has a quite wide range of workable baudrates!
  2. Rule of thumb for a safe baudrate (Clock in kHz, Baudrate in baud): Clock/10 < Baudrate < Clock * 2
  3. With an interface delivering distorted logic levels, the maximum theoretical baudrate may not work, but the next lower baudrate should.
  4. The lower limit (Baud Minimum) is determined by the fact that the programme uses a 16-bit counter for pulse width measurement; with a too slow signal, the counter will simply overflow.
  5. The upper limit (Baud Maximum) is the result of increasing timing inaccuracy with faster signals; the measurement then no longer delivers readings precise enough to reconstruct the bitcell timing for such a high baudrate.
  6. Clock frequencies above 24 MHz were injected from an external oscillator, as no special circuit for an overtone crystal was in place (and actually, the respective ATtiny's specification only goes up to 20 MHz...)
  7. Only standard baudrates have been used in the test. With a USB-COM converter (like FT232), non-standard baudrates may be viable as well. The Autobauding method and Software-UART do not depend on standard baudrates.
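The rule of thumb from Remark 2 is easily put into a check. Just a sketch: `baud_is_safe` is a made-up helper name, with the clock given in kHz as in the table.

```python
def baud_is_safe(clock_khz, baud):
    """Rule of thumb: Clock/10 < Baudrate < Clock * 2 (clock in kHz)."""
    return clock_khz / 10 < baud < clock_khz * 2

print(baud_is_safe(10000, 9600))  # True
print(baud_is_safe(1000, 9600))   # False
```

Note that the rule is more conservative than the measured table in places. For example, 600 baud at 8000 kHz worked in the test but falls just below the rule-of-thumb window.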




OWL Transmission (3) - Logical Format

Preambles implement tailor-made timing on the transport layer, thus eliminate the need for a feedback channel with regards to flow-control. Preambles also provide the Receiver with a reference signal for repeated Synchronisation and Autobauding.

These technical properties do not concern the logical format of an OWL Transmission. On the crypto layer, only the data blocks of 16 bytes are relevant. Their purpose is to transport payload data and provide sequence control.

All blocks are encrypted with RST. Through the special key-feedback mode, each block of data becomes cryptographically dependent on all previous blocks. This is the foundation of a reliable error detection mechanism that covers the transmitted data as a whole. Thus, the legitimate receiver can safely determine whether all data in a logical section ("RST Sequence") has been completely and correctly transmitted.
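The principle of such chaining can be demonstrated with a toy model. Beware: this is NOT the RST cipher and not secure in any way. As a stand-in it uses SHA-256 from Python's standard library, and all names (`encrypt_chain`, `decrypt_chain`, the closing check block) are assumptions for illustration only. It merely shows how a key state that is fed back from every block lets the receiver detect any error at the very end.

```python
import hashlib

BLOCK = 16

def _h(data):
    return hashlib.sha256(data).digest()

def encrypt_chain(key, blocks):
    """Toy key-feedback chain: every 16-byte block modifies the key state."""
    state = _h(key)
    out = []
    for plain in blocks:
        pad = _h(state)[:BLOCK]
        out.append(bytes(p ^ k for p, k in zip(plain, pad)))
        state = _h(state + plain)            # feedback: next state depends on this block
    out.append(_h(state + b"check")[:BLOCK])  # closing check block over everything before
    return out

def decrypt_chain(key, received):
    """Return the plaintext blocks, or None if the closing check fails."""
    state = _h(key)
    plains = []
    for cipher in received[:-1]:
        pad = _h(state)[:BLOCK]
        plain = bytes(c ^ k for c, k in zip(cipher, pad))
        plains.append(plain)
        state = _h(state + plain)
    if received[-1] != _h(state + b"check")[:BLOCK]:
        return None
    return plains

data = [bytes([n]) * BLOCK for n in range(4)]
sent = encrypt_chain(b"secret key", data)
print(decrypt_chain(b"secret key", sent) == data)   # True

damaged = list(sent)
damaged[0] = bytes([damaged[0][0] ^ 1]) + damaged[0][1:]   # flip one bit in the first block
print(decrypt_chain(b"secret key", damaged))        # None
```

A single flipped bit in the first block changes every later key state, so the closing check no longer matches. The same happens when a wrong key is applied.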

An OWL Transmission consists of 3 consecutive RST sequences. These transmit, always in the same order, Authentication, EEPROM and Flash data. In the OWL variant of RST, the key generator is never reset in the course of a Transmission. Therefore, the sequences S1, S2 and S3 become cryptographically dependent on each other and constitute a forgery-safe cryptogram that provides both a sort of sequence control and a highly reliable overall error detection.


OWL Transmission's logical cycle (see also diagram):
  • S1: Authentication
    • IV = random data, modifies original key for successive rounds
    • No data = Blockade!
    • One data block = used for key feedback as an additional entropy boost
    • VI not recognized = Blockade without Timeout!
    • VI recognized = proceed to S2
  • S2: EEPROM data
    • IV = random data, modify key state after S1
    • No data blocks, direct VI = proceed to S3
    • Data blocks = bundle for EEPROM write
    • VI not recognized / Timeout = Error, freeze in Blockade state.
    • VI recognized = proceed to S3
  • S3: Flash data
    • IV = random data, modify key state after S2
    • No data block, direct VI = Finished, immediately start Application firmware!
    • At least one data block = initiate Flash Erase, collect block data for Flash Page Writes
    • VI not recognized / Timeout = Error, Flash Erase!
    • VI recognized = Success, all transmission was error-free, start Application firmware!
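
The logical cycle above can be sketched as a small state machine. This is only a minimal model in Python, not the firmware: it assumes the blocks have already been decrypted, it treats the end of the input as a Timeout, and the names `read_sequence` and `run_session` as well as the outcome strings are made up for illustration.

```python
def read_sequence(blocks):
    """Consume one RST sequence from an iterator of decrypted 16-byte blocks.

    Returns (data_blocks, complete). 'complete' is True only if a block equal
    to the first block (the IV) shows up again as the closing VI.
    """
    iv = next(blocks, None)
    data = []
    if iv is None:
        return data, False
    for block in blocks:
        if block == iv:          # VI recognized: sequence finished
            return data, True
        data.append(block)       # anything else counts as a data block
    return data, False           # input ended (Timeout) without a VI


def run_session(decrypted_blocks):
    """Walk through S1, S2, S3 and return (outcome, eeprom_blocks, flash_blocks)."""
    blocks = iter(decrypted_blocks)

    # S1: Authentication, exactly one data block between IV and VI
    s1, ok = read_sequence(blocks)
    if not ok or len(s1) != 1:
        return "blockade", [], []

    # S2: EEPROM data, may be empty
    eeprom, ok = read_sequence(blocks)
    if not ok:
        return "blockade", eeprom, []

    # S3: Flash data, may be empty; a failure after the first data block
    # means Flash has been touched and is erased again
    flash, ok = read_sequence(blocks)
    if not ok:
        return ("flash_erase_then_blockade" if flash else "blockade"), eeprom, []
    return "start_application", eeprom, flash
```

A session that skips EEPROM and writes one Flash block would be the block list `[iv1, d, iv1, iv2, iv2, iv3, f, iv3]`; dropping the final `iv3` yields the Flash Erase outcome.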

Normal case: The Sender has applied the correct crypto key and the Transmission came through without major disturbance. The Receiver has managed to decrypt every single block and to finish each sequence correctly. Finally it has detected the last VI of S3. At this very moment, it is clear that all previous data must have been complete and error-free. The whole session was successful and the OWL Firmware passes control to an existing or newly written Application firmware. Indirect feedback: Immediate start of the Application.

Exceptional cases: Transmission errors; signal interruptions, excessive pauses (Timeout); failed autobauding or synchronisation; accidental use of a wrong key; intentional use of a wrong key (addressing multiple Receivers on the same line); crude manipulation attempts... All of these will be recognized by the Receiver as an error condition. Depending on which sequence it is currently in, the Receiver will take appropriate action. Indirect feedback: Blockade until hardware reset.

Remarks: It has to be pointed out that this elaborate format for a Bootloader transmission comes without control characters and commands. It needs neither header blocks nor other metadata that could give rise to a known-plaintext attack. Au contraire, additional entropy is injected with every new sequence. Despite this comparatively simple and static format, accessing EEPROM or Flash remains optional, since it is possible to write only EEPROM or only Flash contents by skipping the other memory section. The OWL Transmission can incorporate whole firmware updates, consisting of programming content for EEPROM and Flash, into one and the same Transmission or Transmission file.


Cryptography

The One-Way-Loader makes use of a block cipher called RST ("Randomised Substitution-Transposition"). It began as a personal study project on block encryption, yet it was developed with practical application in mind. Therefore, RST does not only define the bare-metal block cipher; it also includes a method of key vectoring, feedback modes, and a suitable file format. It should be noted that RST has included unidirectional error detection features from the beginning. As it turned out, in comparison to other candidates RST provides a pretty good compromise between code efficiency and security in a microcontroller environment.


How RST works

For its randomised substitutions and transpositions, the algorithm makes use of primitives such as adding, bit-shifting, inverting, and swapping, which are in fact the building blocks of many well-established block ciphers. However, RST does not depend on predefined keys, lookup tables or a sophisticated choice of delta constants. All arithmetic is controlled by dynamically generated pseudo-random vectors.
The respective pseudo-random number generator, PRNG, is initially loaded with the secret key (seed) at the beginning of a crypto sequence.
The block algorithm of RST achieves medium to good avalanche with a minimum round count. That is, one single "flipped" bit in the ciphertext will affect about 25 to 50 percent of the resulting plaintext after decryption. After each block round, the state of the PRNG is modified by the decrypted plaintext data, resulting in a massive and irrecoverable error propagation over all consecutive blocks. This is the basis for the system's fault detection and authentication mechanisms.


Figure: Continuous key modification and plaintext-keystate feedback in RST encryption/decryption (encryption, decryption, rolling key, key feedback from plaintext, error propagation, integrity check).

RST file format

For PC applications, a logical format was developed that allows for an authenticated and cryptographically secure file transfer. It can be proven that the resulting data stream meets all general requirements on a good cryptogram, such as data not being discernible from randomness, no identifiable headers etc. Only when decrypted correctly does the RST sequence unfold into these 3 sections:
  • IV = block with initialisation vector
  • DATA = block or blocks with message (payload)
  • VI = block with end signature (MAC, here: recurrence of IV)
The first block is the only block that is encrypted with the original key. The first block consists of pure random numbers. Through the regular feedback onto the PRNG, all subsequent encryption/decryption is done on the basis of a random session key. In this respect, the first block in an RST sequence serves the function of a secret Initialisation Vector or IV.

The IV has yet a second function in RST. Its random pattern will, with utmost certainty, never repeat in any of the message blocks. Therefore, the IV may be used as a definite end-of-message marker. I suggest naming this closing block the 'VI' to underline that it is the functional inverse of the IV. All that the legitimate Receiver must do is copy the very first decrypted RST block to a buffer, then compare every subsequently decrypted block with that VI. This enables the Receiver to make these qualified decisions:
  • If the Receiver finds that the current decrypted block is different from the VI, it is treated like an ordinary data block. However, at this stage, the Receiver cannot know whether the data is actually sound.
  • If the Receiver finds that the current block exactly matches the VI, it knows with utmost certainty that the Transmission as a whole was authentic, complete and finished free of errors.
  • In all other cases, the end of the file or Transmission would be reached without ever recognising the VI. This is a general ERROR condition.
So we get reliable error detection instead of error localisation and error correction. In fact, this is not a streaming format with random access. It cannot recover from errors occurring within a transmission. RST was conceived for encrypted and authenticated file transfers with an all-or-nothing policy.
On IP-based distribution channels, several measures on the transport layer ensure an error-free transmission of binary data contents. Rather than trying to repair a corrupted file, the obvious alternative nowadays is to just retry the download or e-mail delivery.
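
To make the IV/VI mechanism and the plaintext-to-key feedback more tangible, here is a toy model in Python. It is explicitly NOT the RST cipher: the block transformation and the key generator are replaced by SHA-256 based stand-ins, and all function names are invented for this sketch. Only the framing (IV, data blocks, VI) and the feedback principle follow the description above.

```python
import hashlib
import os

BLOCK = 16  # block and key size in bytes


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _keystream(state):
    # Stand-in for the RST block transformation (assumption, not RST itself)
    return hashlib.sha256(b"block" + state).digest()[:BLOCK]


def _next_state(state, plaintext):
    # Stand-in for "PRNG keeps running, then plaintext is fed back into it"
    advanced = hashlib.sha256(b"step" + state).digest()[:BLOCK]
    return _xor(advanced, plaintext)


def encrypt_sequence(key, payload_blocks, iv=None):
    """Return ciphertext blocks for IV + payload + VI (VI repeats the IV)."""
    if iv is None:
        iv = os.urandom(BLOCK)
    state, out = key, []
    for plain in [iv, *payload_blocks, iv]:
        out.append(_xor(plain, _keystream(state)))
        state = _next_state(state, plain)
    return out


def decrypt_sequence(key, cipher_blocks):
    """Return the payload blocks, or None if the VI was never recognised."""
    state, iv, data = key, None, []
    for cipher in cipher_blocks:
        plain = _xor(cipher, _keystream(state))
        state = _next_state(state, plain)   # feedback from decrypted plaintext
        if iv is None:
            iv = plain                      # first block: remember as VI
        elif plain == iv:
            return data                     # VI recognised: all-or-nothing OK
        else:
            data.append(plain)
    return None                             # general ERROR condition
```

In this model, flipping a single ciphertext bit or using a wrong key changes every following key state, so the closing VI no longer matches and `decrypt_sequence` returns `None`.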

"OWL-RST"

One-Way-Loader uses a modified 128-bit variant of RST with block and key size unified to 128 bits (16 bytes).
The OWL Transmission consists of three contiguous RST sequences that are cryptographically dependent on each other, as the PRNG is never reset within the same Bootloader session. Intermediate VIs tell the Bootloader that the current logical sequence has finished successfully with no error and that the next sequence can start (= control information). Finally, the very last VI constitutes a cryptographic overall checksum on the Transmission as a whole.
With OWL-RST, the PRNG is the software implementation of a classic LFSR of 128 bits with a standard polynomial for a Maximum Length Sequence, filtered by the so-called "Self-Shrinking" function. The latter provides a significant hardening of the LFSR sequence under cryptographic aspects and introduces a high degree of non-linearity in connection with an adequate key feedback mechanism. With OWL-RST, all arithmetic operations are optimised for the 8-bit processor architecture. For further details see Firmware.
Since RST is not a Feistel scheme, the pseudo-random vectors that control cryptographic transpositions and substitutions must be applied in reverse order to get the inverse of encryption or decryption, respectively. So there is a need to buffer the pseudo-random sequence for a complete block round on at least one side, while the counterpart may use those random vectors directly out of the PRNG. Of course, in this unidirectional crypto application, the memory- and code-saving variant has been assigned to the microcontroller.

Open for discussion

The current implementation of "OWL-RST" has obvious advantages: The PRNG keeps on running and each block of data is encrypted/decrypted with a completely new key set. The unpleasant results of block ciphers using inadequate feedback modes can never occur with RST. In other words, RST benefits from the properties of a good stream cipher without adopting its vulnerabilities.
The fact that there are no lookup tables and complicated transformations involved allows for a memory-optimised and efficient implementation on microcontrollers. Even with the least number of rounds, the RST ciphertext presents good balancing and diffusion properties. Plaintext-to-key feedback in RST enables the use of secret IVs and causes massive error propagation, which is the foundation of its overall error detection and message authentication abilities.

But there is one notable disadvantage with this concept: The PRNG must be kept running all the way. Cryptographically strong PRNGs consume significant computation time. This is the main reason why, on the PC, an experimental implementation of "RST128" performed 3 to 5 times slower in a benchmark against a similar implementation of "AES".

Personal conclusions: In many crypto applications, data throughput is not the most important criterion. Regarding AVR Bootloaders, the bottleneck will always be the physical writes of programming data and the serial transmission itself. The OWL-RST cipher provides reasonably safe encryption with good statistical properties and a very small memory footprint, while essential features of error detection and authentication are "all-inclusive".
Criticism of this "in-house" solution is legitimate, and further analysis is desirable. I would like to point out that the most critical component of OWL-RST, the pseudo-random generator, has been chosen from well-understood technologies. Even from today's perspective, the 128-bit MLS-LFSR with Self-Shrinking constitutes a cryptographically strong PRNG.
Further, it is assumed that the downstream block cipher and the IV-VI mechanism do not give rise to any vulnerabilities but rather add complications to key extraction and sequence analysis. I am open to any qualified arguments and suggestions on that topic!




Firmware

The OWL Bootloader's firmware is written in native Assembly language. It uses only the core instruction set, which is supported by almost all 8-bit AVRs. It does not depend on special hardware components, timers or interrupts and is therefore quite portable with only a few modifications. Bootloader firmware is available for more than 120 different ATtinys and ATmegas. Source code is fully disclosed and comprehensively commented. Below, I will give some "high-level" explanation on its functionality.

Features of the OWL Bootloader firmware:
  • Highly flexible code for many 8-bit AVRs
  • Firmware footprint of 512 bytes only (except for few ATmegas)
  • Can use any existing portline for data reception
  • Elaborate signal processing allows for reliable synchronisation and autobauding
  • Wait until Timeout or step into an already running signal
  • Data reception possible in normal or inverted logic
  • Optional output of a control signal in normal or inverted logic ("Dummy-TXD", RS485-TE)
  • 128-bit random key which also serves as a unique device address
  • Secured authentication, decryption and control flow
  • Write access to EEPROM and/or Flash in a single session
  • Safeguarding address range of Flash writes
  • Clearly defined error-handling
  • Can differentiate between reset sources


Memory footprint on ATtinys:
  • Bootloader will occupy 512 bytes below Flash End.
  • Invoke: Modified rjmp/jmp at $0000 jumps to BOOTSTART.
  • Bootloader will start the Application's reset routine after Timeout or after having done an Update.
  • INFO TAG is attached to any Flash data stream, giving the Bootloader an indication of the desired Timeout (byte) and the Reset-Jump of the Application (word).
  • Crypto-Key is hard-coded into the Bootloader Firmware just below Flash top.

Memory footprint on ATmegas:
  • Bootloader occupies a BOOT FLASH SECTION of only 512 bytes below Flash End.
  • Invoke: BOOT RESET VECTOR (BOOTRST) will call the Bootloader at BOOTSTART with every hardware reset.
  • Having done its job, or after Timeout has elapsed, the Bootloader will jump to $0000 where the regular Application firmware starts.
  • Crypto-Key and Timeout byte are hard-coded into the Bootloader Firmware.
  • Bootloader Section may be protected comprehensively by Lockbits (direct support for Bootloaders on the ATmegas).

Note: Addresses are given in bytes.

Invoke: Usually, a Bootloader is invoked via Hardware Reset. Calling the Bootloader from a Reset event is the best method from a technical and legal perspective, since it provides a clear separation between the spheres of Bootloader and Application Firmware. Hardware Reset may be triggered by the rising edge on the RESET pin of the Controller, but also by a Power-On, Brown-Out or Watchdog event.
OWL FW 2020: The new OWL FW provides an option to "filter" Reset sources, that is, to bypass the Bootloader invoke completely for one or several reset events. See "Reset Sources Filter" for further details.
OWL FW 2023: The OWL FW now provides an additional option to bypass the Reset sources filter in order to allow a controlled hand-over from an Application to the Bootloader while still using the regular Bootstart address. For more details, scroll down to the paragraph "MCUSR-Null-Startup".


Initialisation: The Bootloader is the very first program to start after any hardware reset. It initialises the stack pointer and all registers, ports, and memory that it uses itself. It is not the business of a Bootloader firmware on a microcontroller to initialise all SRAM and I/O; this is clearly the responsibility of the Application's Reset routine.
  • ATtinys: Bootloader searches Application Flash from top to bottom for a so-called INFO TAG, and fetches Timeout byte from there. It then loads the individual crypto key from Flash top into the working registers (R0-R15). After that, the Bootloader starts to listen on the assigned RX-portline. If any signal occurs before Timeout, the Bootloader will try to synchronise with that signal. If nothing happens and Timeout has elapsed, the Bootloader loads jump address of the Application Firmware's reset routine from said INFO TAG to start the Application. For an "empty" Flash (no Application loaded, all bytes $FF), the Bootloader will use the longest Timeout possible and then restart itself. By this periodical restart, the Bootloader remains activated and permanently listens to incoming Transmission without the need to apply repeated hardware resets.

  • ATmegas: Bootloader loads the individual crypto key from Flash top and starts listening for incoming signal. An INFO TAG is not needed on ATmegas, since the Timeout byte has been hard-coded to the firmware with ATmega OWL, and Application could always be started at $0000 (i.e. from its unaltered interrupt-table). If a valid OWL Preamble occurs before Timeout, the Bootloader will synchronise and follow further Transmission. If Timeout has elapsed, the Bootloader generally jumps to address $0000 to start an Application, if present. In case of an empty Flash, the program counter will swiftly run through all Flash address space until it reaches Bootloader's entry address, periodically restarting the Bootloader, which is therefore kept permanently accessible on ATmegas with empty Flash, too.

Synchro-Autobauding:
Bootloader waits for a level change on the assigned RX-portline until Timeout. First High-Low-transition is always discarded to allow the signal to stabilise. Subsequent level changes are measured by the Synchro-Autobaud method described before. If the incoming signal is actually an OWL Preamble of sufficient quality, synchronisation will succeed and the Bootloader has gained precise timing reference for software-based serial data reception to follow. Yet the Synchro-Autobaud cycle is repeated before each individual block of data; this makes OWL Transmissions pretty immune to a drifting of clock frequencies at both sides, transmitting and receiving end.


Block data reception: After successful Synchro-Autobauding, incoming serial characters are decoded by the "Software-UAR(T)". The Bootloader will discard further Preamble characters until the Blockstart indicator is received. The 16 bytes following this Blockstarter are the encrypted data of interest. They are buffered in SRAM for further processing, i.e. decryption.


Decrypt block data: The Bootloader has to ignore the beginning of the next Preamble, because it now has to decrypt the current buffer contents. With the least number of rounds, the block cipher will fetch 64 pseudo-random vectors from the PRNG for decryption. Decrypted data is then available in the same 16 locations of the SRAM buffer.
Immediately after decryption, the block data is XORed into the PRNG state registers (R0-R15). The entropy of an IV block or of payload data will therefore modify all subsequent decryption.
If the block just decrypted is the very first block of an RST sequence, it is copied to a second SRAM buffer as the "VI".
All subsequently decrypted blocks are then compared to that VI (see annotations on Cryptosystem). As long as the current block is not equal to the VI, the Bootloader assumes that the respective block is regular write data and processes it accordingly.
If the program finds that the current block is identical to the VI, it knows that all previous blocks have been processed error-free and that the current sequence has finished successfully. The Bootloader can then proceed to the next sequence, or it has finished the whole session successfully.
However, if any error has occurred, the VI is never recognized. After the Transmission has ended and Timeout has elapsed, the Bootloader will have to auto-erase Flash data that has already been touched, and otherwise fall into a blocking state, thus giving indirect feedback to the user on the failure of this Transmission (see 'General error handling').
Programming: The entire block decryption routine is about 50 machine operations including the PRNG. Block comparison plus XOR feedback was combined into a single loop structure comprising only 12 opcodes. (Try this in a high-level kindergarten ... )
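
The combined loop for block comparison and XOR feedback can be pictured roughly as follows. This is a Python illustration of the idea only, not a transcription of the 12 opcodes; `state` stands for the 16 PRNG state bytes (R0-R15), and the function name is hypothetical.

```python
def feedback_and_compare(state, block, vi):
    """XOR a decrypted block into the 16-byte key state and, in the same
    pass, check whether the block equals the stored VI.

    state: bytearray of 16 bytes (modified in place)
    block, vi: 16-byte sequences
    """
    difference = 0
    for i in range(16):
        state[i] ^= block[i]               # plaintext-to-key feedback
        difference |= block[i] ^ vi[i]     # accumulate any mismatch with VI
    return difference == 0                 # True: current block is the VI
```

Accumulating the mismatch with OR instead of leaving the loop early keeps the feedback complete for every block, whether or not it turns out to be the VI.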


Authentication sequence (S1): The Bootloader must be sure that an incoming Transmission was actually encrypted with the secret key of the Bootloader and no other key. Faulty Transmissions, or Transmissions that were generated with a wrong key, must be rejected before any write access to EEPROM or Flash memory is allowed. A safe mechanism of Authentication towards the Bootloader is provided by sending an initial RST sequence of fixed length that does not contain payload data but, by way of the IV-VI mechanism, enables the Bootloader to safely check that the right key was used. From the match of IV=VI in this initial RST sequence the Bootloader knows with utmost certainty that the Sender has actually used the correct key for encryption, and it can progress to the next sequence, which is the EEPROM sequence S2.
In all other cases, the Transmission must have been faulty or the Sender was using a mismatching key. Then the Bootloader will fall into a blocking state, which can only be overcome by a hardware reset.
This consistent "blockade" behaviour allows operating multiple Bootloaders with different keys and different technical requirements on a common programming line ("one-way-bus"), making sure that per Reset cycle only the one Bootloader that has actually been addressed will follow that Transmission, while all the non-addressed Bootloaders on the same bus will safely turn into the blockade state. That means that all Bootloaders that have not been addressed by the current crypto Transmission stay passive on the bus, and no write access and no uncontrolled start of an Application firmware will happen on the respective controllers.


EEPROM sequence (S2): Bootloader decrypts and copies the IV of the EEPROM sequence. If there are data blocks following the IV, data is written in portions of 16 bytes to the EEPROM memory using Atomic Write Mode. If the EEPROM sequence did not contain data (i.e. IV immediately followed by VI), no single EEPROM location is overwritten.
Note 1: On EEPROM, no erase cycle is needed and data can simply be overwritten. EEPROM contents in the higher address range are NOT erased; an EEPROM sequence simply does not touch these locations. This behaviour can be utilized to perform a sort of incremental EEPROM update.
Note 2: To explicitly "delete" the entire EEPROM, the Sender has to transmit an EEPROM sequence that overwrites all EEPROM locations.
Note 3: EEPROM writes do not require any precautions against address overflow. In a well-formed Transmission, of course, no more EEPROM data is sent than the Target controller can take. An address overflow in the EEPROM sequence could only happen due to a corrupted Transmission. If this is the case, the VI of the EEPROM sequence would never be detected and the Bootloader will stick to the EEPROM write mode and eventually overwrite part or all of the EEPROM with garbled data. However, this does not pose an operational risk to the Target device, since EEPROM memory has ten times more write durability than Flash and the EEPROM does not hold executable code. That is why the EEPROM sequence was placed before the Flash sequence. It serves as a 'crumple zone' before the more critical Flash sequence.
When Timeout occurs in EEPROM sequence, the Bootloader will fall into a blocking state, giving indirect feedback that something went wrong.
If the concluding VI of EEPROM sequence has been detected, the EEPROM sequence S2 is successfully finished and the Bootloader passes over to the final Flash sequence S3.
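
The incremental character of EEPROM updates can be modelled in a few lines of Python. This is a sketch under assumptions: writes are taken to start at EEPROM address 0 and advance by 16 bytes per block, and the overflow check exists only to keep the model's array at a fixed size (the firmware, as noted above, has no such guard). The function name is hypothetical.

```python
def write_eeprom_sequence(eeprom, data_blocks):
    """Write 16-byte data blocks into a bytearray that models the EEPROM.

    Returns the first address that was NOT written.
    """
    address = 0  # assumption: an EEPROM sequence always starts at address 0
    for block in data_blocks:
        if address + 16 > len(eeprom):
            raise ValueError("more EEPROM data than the model can hold")
        eeprom[address:address + 16] = block   # plain overwrite, no erase
        address += 16
    return address
```

With a 64-byte model preloaded with `0x11`, writing a single block changes bytes 0 to 15 only; an empty sequence (IV directly followed by VI) leaves every location as it was.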


Flash sequence (S3): Bootloader decrypts and copies the IV of the Flash sequence. If the block directly following the IV is already the VI, i.e. no data blocks in between, the Bootloader knows that there is no Flash at all to be erased and overwritten. Flash will of course be left untouched and the Bootloader session was finished successfully.
If at least one data block was following the IV, the Bootloader must erase the Application Flash before any Flash pages are to be overwritten with that firmware data. As the Flash Erase cycle will take considerable time, the Sender has to calculate a matching Preamble. Flash Erase is performed top-to-bottom for safety reasons on ATtinys. For Flash writes, the Bootloader will buffer incoming Flash data to the Flash write buffer, then trigger Page Write to the current Flash page.
Based on 16-byte-units, the Flash write routine is quite future-proof. It could work up to a pagesize of 4096 bytes... (largest Flash Pages currently seen on ATmegas is 256 bytes.)
An explicit verification of write data seems completely dispensable with OWL. Many years of experience and feedback on 'TSB' have shown that faulty Flash writes never occurred once the data had made it to the write buffer and operating conditions were stable enough at least for the duration of the actual Flash write operation. The requirements aren't that hard to meet, and any errors on the physical/transport layer would be safely detected by the OWL crypto.
When the last VI from S3 has been detected, all data that was just written is known to be error-free, and the Bootloader session as a whole was successful.
However, if no VI has been detected and the Bootloader has timed out, it will go into the blocking state, giving the indirect feedback that something has gone terribly wrong with this session ... Should a faulty Flash session have already touched Flash memory, the Bootloader will trigger an emergency erase of all Flash (on ATtiny) to remove executable code that may be corrupted.
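
How 16-byte blocks are collected into Flash pages can be sketched like this. It is a simplified Python model of the ATtiny policy (erase all Application Flash, then write page by page); padding a partial last page with `0xFF` is an assumption of this sketch, and `program_flash` is a made-up name.

```python
ERASED = 0xFF  # value of an erased Flash byte


def program_flash(memory, page_size, data_blocks):
    """Model of the Flash sequence on a bytearray 'memory'.

    No data blocks: Flash stays untouched. Otherwise the whole Application
    Flash is erased first, then 16-byte blocks are buffered until a full
    page can be written.
    """
    if not data_blocks:
        return memory
    memory[:] = bytes([ERASED]) * len(memory)      # Flash Erase
    page = bytearray()                              # models the write buffer
    address = 0
    for block in data_blocks:
        page += block
        if len(page) == page_size:
            memory[address:address + page_size] = page   # Page Write
            address += page_size
            page = bytearray()
    if page:
        # assumption: an incomplete last page is padded with erased bytes
        page += bytes([ERASED]) * (page_size - len(page))
        memory[address:address + page_size] = page
    return memory
```

Because the routine only counts 16-byte units until the page is full, the same logic works for any page size that is a multiple of 16.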


Transmission successful, hand-over to Application firmware: After successful completion of the third sequence, the Bootloader will almost immediately pass control to the Application firmware, which may have been updated or left untouched.
  • On ATtiny, Bootloader must re-search for the INFO TAG that could have been rewritten and relocated in the course of an Application firmware update. Bootloader then jumps to the referenced address of the Reset routine and thus starts the Application firmware.
  • On ATmega, the Bootloader has to restore access to the RWW memory area (= Application Flash) first of all. Then it simply jumps to the address $0000 where the Application would have been started anyway.
So, with an application that has a distinctive reset behaviour, preferably some LED flashing, beep code or display message, the user will get indirect but clearly noticeable feedback on the success of the Transmission.


General error handling: Failure could happen even before any data has been transmitted. In certain minimalistic hardware designs, a cold start (device just plugged in, power to be hard-switched) could cause an extended period of low or undefined voltage level on peripherals of the controller, while Brown-Out Detector has already released programme execution as from the controller's perspective, supply voltage has sufficiently stabilised. If a Bootloader starts listening on a portline for incoming signal, it may become confused and possibly "crash" as a result of those invalid/insufficient conditions. (As occasionally seen with TSB in certain USB devices.)
Start-up behaviour of the One-Way-Loader is quite favourable in this respect. A bit of "garble" on the line, found immediately after the reset event, is ignored and the Bootloader would simply time-out and hand-over to the Application. Starting the Application firmware has priority over catching any Transmission.
Only after the initial Authentication sequence was successfully verified, the Bootloader will assume that this is indeed a valid Transmission. Either everything was complete and error-free and within Timeout limits, then the Application will be started immediately after Transmission has ended; OR there had been one or more errors or Transmission timed-out, then the Bootloader will go into the blocking state and the user will get an indirect but clear feedback on the failure of this Transmission.


Timeout timing: Bootloader's Timeout in real-world-scenarios should allow for delays of a few 1/100 seconds up to some seconds. The AVR Controllers could work in a wide range of clock rates from 128 kHz up to 25 MHz roughly. To cover this with only one "Timeout byte", it is necessary to calculate an individual prescaler factor, considering actual clock frequency of the controller in order to provide a calibrated timeout subroutine with base unit of 1/100 second. Respective timing factors are then coded directly into the OWL Firmware for that certain Target device in the course of firmware-make. By this, a Timeout byte of "100" will always give a timeout of 1 second, a Timeout byte of only "1" will give 0.01 seconds and the value "255" will result in the largest Timeout of about 2.55 seconds.
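
The arithmetic behind the calibrated Timeout can be illustrated as follows. The sketch assumes a delay loop that costs a fixed number of clock cycles per iteration; the value of 4 cycles and both function names are placeholders, not figures taken from the OWL Firmware.

```python
def timeout_seconds(timeout_byte):
    """Timeout in seconds for a Timeout byte with base unit 1/100 second."""
    if not 0 <= timeout_byte <= 255:
        raise ValueError("Timeout byte must fit into 8 bits")
    return timeout_byte / 100


def loops_per_tick(f_cpu_hz, cycles_per_loop=4):
    """Iterations of a delay loop needed for one 1/100 s tick.

    cycles_per_loop is a hypothetical cost of one loop iteration.
    """
    return f_cpu_hz // (100 * cycles_per_loop)
```

For example, under these assumptions a controller at 8 MHz needs 20000 loop iterations per tick, one at 128 kHz only 320, while the Timeout byte itself keeps the same meaning on both.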


PRNG (key generator): PRNG is loaded with the secret key at Bootloader start-up. The internal state of the PRNG will then change with every single transposition-vector being requested and it is also modified by the plaintext of each decrypted block (see diagram on key feedback).
Here, the PRNG is the software implementation of a classic 128-bit LFSR with feedback taps on the bit positions 128, 127, 126 and 121 (Galois-XOR). The output bit sequence goes through the so-called Self-Shrinking Filter. In fact, this constitutes a cryptographically strong PRNG, actually a *CS*PRNG. Although the decimated SSG LFSR will consume about 3 times more shifting cycles compared to a plain LFSR, this is not too much of a disadvantage in the Bootloader application, since other factors restrict data throughput to a greater extent.
Last but not least, assembler code of this SSG has been vastly optimised. Those LFSR-typical bitshifts are primarily carried out only on two working registers (instead of 16), and just with each 8th bit-shifting, the remaining 14 registers are directly byte-shifted, which saves plenty of clock cycles.
This variant of SSG-LFSR is indeed one of the most cycle- and code-efficient PRNG implementations available for 8-bit MCUs.
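
For readers who prefer code to prose, the principle of a Galois LFSR with Self-Shrinking output can be written down in Python. The tap positions 128, 127, 126 and 121 are taken from the text; the shift direction, the bit numbering and the way output bits are packed into bytes are assumptions of this sketch and may differ from the OWL Firmware. The class name is hypothetical.

```python
MASK128 = (1 << 128) - 1
# Taps 128, 127, 126, 121 mapped to bit indices 127, 126, 125, 120
TAPS = 0xE1 << 120


class SelfShrinkingLFSR:
    def __init__(self, seed):
        if not 0 < seed <= MASK128:
            raise ValueError("seed must be a non-zero 128-bit value")
        self.state = seed

    def _step(self):
        """One Galois LFSR shift; returns the bit that was shifted out."""
        bit = self.state & 1
        self.state >>= 1
        if bit:
            self.state ^= TAPS
        return bit

    def next_bit(self):
        """Self-shrinking rule: read bits in pairs (a, b), emit b only if a == 1."""
        while True:
            a = self._step()
            b = self._step()
            if a:
                return b

    def next_byte(self):
        value = 0
        for _ in range(8):
            value = (value << 1) | self.next_bit()
        return value
```

An all-zero state would never change, which is why the constructor rejects a zero seed in this model.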


Port limits: The present OWL Firmware can make use of all I/O ports directly accessible by the assembly instructions cbi, sbi, sbic, sbis. Some rather exotic devices also feature a "PORTG" or "PORTH" whose I/O addresses are located in higher I/O memory. These can only be addressed by SRAM-I/O commands, whose opcodes have slower timing and consume more Flash. Therefore, it is not planned to support these extraordinary ports in the OWL Firmware.


Code flexibility:
Compared to TSB, the Assembly source of the OWL Firmware is tighter and more clearly structured. It also depends less on conditional assembly, since the OWL Transmission's logical format is basically the same for all devices and timing specialties are covered by the tailored Preambles in the Transmission layer. Finally, the One-Way-Loader does not waste a whole Flash page for user data. All Flash, minus 512 bytes for the Bootloader, is available for Application firmware.


Portability: Currently the Assembler source for the OWL Firmware is able to adapt to over 120 ATtinys and ATmegas. Latest extensions cover devices over 64k (see next paragraph). Most of the supported devices have been running successfully with TSB in the past. There is great chance that the wheel does not have to be re-invented regarding some chip's specialties.


Finally! ATmega128x and ATmega256x: OWL Firmware/Software supports the big ones. It keeps up with linear addressing, as this is supposedly the safest and most reliable method of Flash programming in a one-way setup. So, it may take several minutes to transmit the full 128 or even 256 Kilobytes of data into the Controller, and patience will be rewarded...
But we could apply a modified Flash-Erase policy on the ATmegas to make things more comfortable. On the ATtinys, which do not support a protected Bootloader Section, it was advisable to pre-erase the whole Application Flash memory before new contents could be written and, even more important, to immediately erase Flash memory in case of an unrecoverable error in the Flash-related part of a Bootloader session. Otherwise defective code in Flash could have been executed and possibly destroy Bootloader code or lock out the device from further Bootloader access. On ATmegas, the Bootloader won't be affected by corrupted Application code; there is no compelling need for a "clean" AppFlash!
This led us to a variant of Flash programming that also provides additional flexibility: Only those Flash Pages that are actually meant to be overwritten will be erased, right before the respective Flash-Write occurs. This gives us an option to reserve higher address space for parts of an Application that will never or seldom change (e.g. libraries, tables). Application firmware updates will start from $00000 as usual. The Application may be updated more frequently by way of the Bootloader, but as long as the Flash programming sequence S3 does not reach the higher address range, any code that is residing up there will be left untouched.
With comparably high clock frequencies, it is of course possible to speed-up the whole thing, especially since the bigger ATmegas usually have large pagesize (up to 256 bytes), so that the delay caused by physical Page-Erase and Page-Write operations is comparably small.
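
The per-page erase policy on the ATmegas can be contrasted with a full pre-erase in a short Python model. As before, this is only a sketch: the Flash is a bytearray, writing starts at address 0, a partial last page is padded with `0xFF` by assumption, and `update_pages` is an invented name.

```python
def update_pages(memory, page_size, data_blocks):
    """Erase and rewrite only the Flash pages covered by the update.

    memory: bytearray modelling the Application Flash
    data_blocks: 16-byte blocks, programmed linearly from address 0
    """
    image = b"".join(data_blocks)
    for address in range(0, len(image), page_size):
        chunk = image[address:address + page_size]
        # assumption: the rest of a partially filled page reads as erased
        chunk += bytes([0xFF]) * (page_size - len(chunk))
        memory[address:address + page_size] = chunk   # Page-Erase + Page-Write
    return memory
```

With a page size of 256 bytes, an update of 48 bytes touches the first page only; whatever resides from address 256 upwards (for instance a library) stays exactly as it was.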


Watchdog support: As of 02/2019 the OWL Firmware is compatible with Applications that make intensive use of the AVR Watchdog-Reset feature. WD-Resets are intended to restart an Application Firmware after expiration of the WD timer interval, even if such Application has been stuck somewhere in an endless loop and other escape strategies have failed. In a Bootloader setup, of course the Bootloader's reset routine would be started before an Application's reset routine could handle this condition. The Bootloader must therefore consider WD status, since otherwise it may be restarted over and over due to repeated WD-timer-events, resulting in a 'bricked' Application. Yer olde TinySafeBoot simply forwarded to the Application when detecting a WD reset condition. This proved to be a safe method, yet it rendered the Bootloader next to useless in WD scenarios with Resets permanently activated by the WDTON fuse. One-Way-Loader finally faces the problem: It tries to turn off the WD first of all. Just in case this doesn't work, OWL reconfigures the WD-Prescaler to its maximum (2 or 8 seconds, depending on the device) and integrates regular watchdog timer resets in its program flow (opcode wdr). With the enlarged Prescaler interval the WD can never interfere with an OWL session. Once the Bootloader has returned control to the Application firmware, it is the responsibility of the Application's Reset/Init routines to properly reconfigure the Watchdog. Therefore OWL will leave the MCUSR register unaltered, enabling the Application firmware to check Reset conditions and act accordingly. Also the general error condition (Blockade) is secured against WD-Resets. This is to ensure continued compatibility of the new WD-friendly Firmware with hardware setups that operate multiple OWL-featured AVR controllers on the same serial programming line.


Reset Sources filter (since 08/2020): Now the Bootloader can act upon different MCU reset conditions, namely 'Watchdog', 'Brown Out', 'External' and 'Power-on'. Depending on which event has actually triggered the Controller's reset, OWL will either activate (do its Bootloader job and listen to RX until Timeout) or swiftly forward to start the Application (without notable delay).
This feature is supposed to improve compatibility in special setups and/or to Applications that rely heavily on a precise reset timing. It could also help with hardware environments that would occasionally issue reset pulses, to which the bootloader should never react.
Technically, the Firmware reads MCU Status Register and filters for 'reset source' flags WDRF|BORF|EXTRF|PORF by way of simple logical AND and branch instruction.
So, in Firmware Make Mode, the user has the option to define the respective bitmask by way of a new argument '--resets=' or '-rs='. For example: '-rs=1110' means 'Watchdog', 'BrownOut' and 'External' resets are allowed to start the Bootloader, while 'PowerOn' resets would immediately forward to the Application. This can make the Bootloader nearly "invisible" regarding certain Reset events. Alternative syntax uses letters W, B, E, P (in arbitrary order) to represent allowable reset sources. Both notations are printed in the Target Data listing for clarity. Default setting is, of course, NO Reset Source filter applied (equal to '-rs=1111'), allowing all reset sources to activate the Bootloader, which is then 100% compatible with the reset behaviour of older OWL FW.
Please be aware that this new RS feature does not configure anything regarding the actual Reset behaviour of the Controller. It just configures how the Bootloader reacts on certain Reset Sources. In particular, the new OWL Firmware does not modify MCUSR. It remains sole responsibility of an Application Firmware to evaluate reset conditions and clear certain reset flags, when appropriate. Refer to the datasheets and other technical advice on AVR reset mechanisms.
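To illustrate the two notations, here is a minimal sketch of how a '-rs=' value could be translated into a bitmask. The function name is hypothetical and the bit positions follow the usual AVR MCUSR layout (PORF = bit 0 up to WDRF = bit 3), which should be checked against the datasheet of the actual device. It is not the parser of the OWL Software.

```python
# Assumed MCUSR bit positions (typical AVR layout; verify in the datasheet).
PORF, EXTRF, BORF, WDRF = 0x01, 0x02, 0x04, 0x08
LETTERS = {"W": WDRF, "B": BORF, "E": EXTRF, "P": PORF}


def parse_reset_mask(arg: str) -> int:
    """Translate a '-rs=' value into a 4-bit mask (hypothetical helper).
    Accepts four binary digits in W,B,E,P order, or letters in any order."""
    text = arg.strip().upper()
    if len(text) == 4 and set(text) <= {"0", "1"}:
        return int(text, 2)
    if text and set(text) <= set(LETTERS):
        mask = 0
        for ch in text:
            mask |= LETTERS[ch]
        return mask
    raise ValueError(f"invalid reset source filter: {arg!r}")


# '1110' and 'EBW' describe the same filter: everything but Power-on.
print(parse_reset_mask("1110") == parse_reset_mask("EBW"))
```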


MCUSR-Null-Startup (new 04/2023): For a smooth hand-over from an Application firmware to the Bootloader, sort of a standardised procedure would be desirable. It should use the regular BOOTSTART address, which can be seen as a version-independent constant (since OWL does not provide a jumptable for self-evident reasons).
Solution: The condition of MCUSR = 0 is recognised before RS-bits are checked. The new OWL Firmware will instantly forward to the Bootloader (i.e. wait for signal until Timeout), if this exceptional condition applies. This enables a safe start of OWL from an Application which has cleared MCUSR and jumps to BOOTSTART. Since the new feature does not interfere with OWL's behaviour regarding "natural" Reset conditions, these can still be differentiated by the Reset Sources filter, as described above. However, the added opcodes will cost 4 bytes more. The feature fits into most ATmegas and ATtinys, but not all. Have a look at changelog.txt to see the list of currently unsupported devices. It has to be pointed out that, apart from that, all devices have the full functionality of the current OWL Firmware.
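The resulting start-up decision can be modelled in a few lines. This is a host-side model of the behaviour described here, not the firmware code; the function name and parameter names are made up for illustration.

```python
def bootloader_should_listen(mcusr: int, allowed_mask: int) -> bool:
    """Model of the start-up decision (illustration only).
    mcusr: value read from the MCU Status Register after start-up.
    allowed_mask: reset sources that may activate the Bootloader."""
    if mcusr == 0:
        # No reset flag set: treated as a deliberate jump to BOOTSTART
        # from an Application that has cleared MCUSR beforehand.
        return True
    # "Natural" reset: logical AND with the Reset Sources filter.
    return (mcusr & allowed_mask) != 0


# With filter 1110 (W, B, E allowed), a Power-on reset (bit 0) forwards
# to the Application, while MCUSR = 0 always activates the Bootloader.
print(bootloader_should_listen(0x01, 0b1110), bootloader_should_listen(0x00, 0b1110))
```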


Fuses and Lockbits:

Generally, the Controller must meet certain requirements for the operation of a Bootloader.

ATtinys:
  • Activate SELFPRGEN (otherwise the loader can not write any data to memory!).
  • Enable BODLEVEL and adjust to actual supply voltage. This is vital to prevent Flash corruption in a real-world setup.
  • Lockbits Mode 3 prevents the sniffing or modification of memory contents via the ISP/JTAG.

ATmegas:
  • Enable BOOTRST so that the Bootloader is invoked on hardware reset.
  • Enable BODEN + BODLEVEL to prevent Flash corruption under unstable operating voltage.
  • BOOTSZ = 10 or BOOTSZ = 11 to use a Bootloader section of only 512 bytes (256 words).
  • BLB in Mode 2 or 3 protects Bootloader section from uncontrolled write access (immortalises Bootloader)
  • Lockbits in Mode 3 prevents the sniffing or modification of memory contents via ISP/JTAG.

Compatibility Requirements, Precautions, Good Practice:

  • Avoid dependencies. Bootloader firmware shall not depend on any other firmware. Otherwise it could hardly load an initial Application firmware to an otherwise "empty" controller. Yes, it would be possible to call Bootloader routines through an Application, but we should resist the temptation to do so. Too deep linking between Application and Bootloader will likely lead to technical problems and invite legal trouble, if at least one Software component has been released under different, perhaps more restrictive, licensing. It appears safest to simply avoid any dependency between Application and Bootloader firmware and rather separate them as clearly as possible.

  • Application and Bootloader firmware must not occupy more space than is actually available...  Don't worry, if it doesn't fit in, the Software will tell you.

  • General precautions for ATtinys: An application should do without Flash write operations, preferably, since it could possibly damage Bootloader code or leak key data from the Bootloader!

  • General precautions for ATmegas: Whole Bootloader section may be protected by Fusebits against any unwanted access, making Bootloaders installed on ATmegas quite safe with no further precautions necessary.





Software

The OWL Software is a Command Line Utility for the PC platform and has recently been re-written in plain C99. All sources are fully disclosed. Executables for Windows (32 bit) and Linux (32/64 bit) are available. The current OWL SW can do the following:
  • Make single Bootloaders with custom ports and random keys
  • Make serial Bootloaders with custom ports and random keys
  • Provide information about supported controller models and their hardware options
  • Provide information about self-generated Bootloaders and their configuration
  • Manage authenticated Bootloaders, including keys and metadata, via custom project names
  • Send encrypted firmware updates for Flash and/or EEPROM in one rush ("Transmission")
  • Export such a Transmission to a file container or audio file container for distribution purposes
  • Import encrypted Transmissions to forward to a local target device
  • Test the crypto layer
  • Comprehensive help screens

Philosophy

My hardware/software projects rely on administrative structures as simple as possible, maximum transparency and minimum dependencies. Call me crazy, but I am still convinced that technology should serve mankind and that digital enslavement is not our destiny. Those who use technology confidently and competently, who know the difference between mutual benefit and exploitation, and who can forego useless stuff, will keep their freedom, have more fun with technology and can effectively protect their private lives and business secrets.

Now the OWL Software has reached sort of maturity and features wide range of functionality. Therefore, cryptic command strings, illogical pell-mell case-sensitive syntax, weird dependencies, brainf***in' semantics are inevitable ...

No, no, just kiddin'...! The OWL Software introduces a human-friendly command-line parser and meaningful screen messages. For each commandline option, there are long and short notations available, and both of them are quite memorable. Besides, no case sensitive shit. My parser doesn't even care about the order of arguments (latin semantics). All that the user should remember is the names of the options and the additional parameters needed to perform the desired task. Context-based on-screen-help will give advice regarding certain options. Whoever has done anything at the commandline should be able to use this OWL Software intuitively. I would even dare to say: Unlike certain 'dudes', this OWL commandline tool is ready-to-use without GUI frontend!

Install

The Software is portable, that is, it will run from any location in a supported platform, as long as user has the necessary rights to access and execute. Of course, the executables would also run from external media, for example USB thumbdrives. For an "Installation", simply unzip the downloaded package to the desired location. These folders are being established then:
  • owl
    • templates
    • targets
    • transmissions
The folder hierarchy may be relocated with no problems, since the OWL-Software by itself does not rely on absolute paths.
NOTE: These folders are used by default, when no other path was specified in file references. With newer versions of the OWL Software tool (2020+), the user is free to specify other locations in the context of many functions!

Folder structure

There are 3 data types to be managed in an OWL environment: Bootloader Templates, Bootloader Targets and Bootloader Transmissions. These are organized as files and folders.

Bootloader Templates: ./templates
For each supported AVR controller there is a template machine code of OWL Firmware available that features default ports (B0/B1), default-key ($0011...EEFF) and device-specific code adjustments. Templates are saved as Intel(R) Hexfiles, short "Hexfile". Whenever the OWL Software is requested to create a custom Bootloader, it will search in the folder ./templates for a Hexfile whose name matches the submitted device name. Creation of a new Bootloader is done by the proven method that was invented for the TSB project and further refined for OWL. That is, all modifications are done directly in machine code, making the process independent of any external assembler or compiler environment.

Bootloader Firmware: ./targets
After the Software has modified a code Template according to user's preferences and added unique crypto key, the Software will save this OWL Firmware to a Hexfile under the desired Target name (or an auto-generated name) into the folder ./targets. This is a valid Intel Hexfile which can be read by any AVR-ISP software for programming into the target controller. After setting the Fusebits, the new OWL is ready to go.
REMINDER: Target file is the ONLY location for crypto key and meta-data of a Bootloader!
So better not delete, if we actually want to use that Bootloader ...
However, these Target files contain some more info. Meta-data for each Target is appended as commentary lines below the final hex record. While regular ISP Software would simply ignore these additional lines, the OWL Software can parse them in order to get meta-info for the respective Target at one stroke. (At any time, the user has the option to rename, copy and move these files deliberately and to modify some of the parameters in the human-readable tags. Again, transparency!)

Bootloader Transmissions: ./transmissions
Software can send serial data directly to a serial port, and it will also save a binary identical copy of any such Transmission to a file, named with timestamp and extension .owl normally to the default folder ./transmissions.
The OWL Transmission file container is the format intended for distributional purposes, since it incorporates all timing information and encrypted data of a compatible Bootloader session.
For convenience, the OWL Software would accept an .owl input file and blindly forward to a specified serial port (i.e. without knowing the secret key of the respective Target). Software will know the correct baudrate as it is being tagged to the .owl file.
But it is almost as easy to just forward the .owl file from the commandline to a serial port that has been preconfigured for the respective mode and baudrate. This will also result in a flawless OWL Transmission.
Example DOS/windows console:
mode COM1 9600,N,8,1 && copy /b transmission0815.owl COM1
Same under Linux:
stty -F /dev/ttyS0 9600 raw cs8 -cstopb -parenb && cat transmission0815.owl > /dev/ttyS0

Make single Bootloader

When the Software is submitted with a valid AVR device name, it retrieves the corresponding OWL Firmware from Templates folder. Based on this machine code, it generates a customised version of the firmware according to the port assignments and other specifications found in the commandline. Finally, the Hexfile is extended by some meta info and then saved as a Target file into the folder ./targets.
Example:
owl --device=tn2313 --rxport=d0 --txport=d1 --clock=4000 --targetname=Testloader
will produce a single new OWL in the targets folder for the ATtiny2313 that uses PORTD0 for RX input and PORTD1 for TX signalling, is configured for a clock frequency of 4 MHz and is named Testloader00.hex.

Make series of Bootloaders

Just add the option "number" to the respective commandline for Target Make Mode. Software will then generate the specified number of Bootloaders for same hardware configuration, with totally different crypto keys and systematic numbering in the filenames.
Example:
owl --device=tn2313 --rxport=d0 --txport=d1 --targetname=Testloader --number=10

This will produce 10 Targets named:

Testloader00.hex
Testloader01.hex
...
Testloader09.hex

These Bootloaders all feature same technical parameters but individual cryptographic keys. They may be installed onto 10 Target devices all over the world, armored by Lockbits and/or physical means. As long as the respective Target files are kept safe and secret at our site, no one else but us can produce valid Transmissions to the respective Target devices. Customers may only know the Target filename of their device, so they can obtain authorised firmware updates from the provider's download site or by email, by indicating that serial number featuring filename. Serial numbering has no correlation whatsoever to the cryptographic key of a Bootloader, since all keys are normally derived from a random process.
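The independence of serial numbers and keys can be sketched like this. The example merely pairs systematic filenames with fresh random 128-bit keys from Python's secrets module; the OWL Software draws its keys from its own random module, and both the function name and the numbering width for larger series are assumptions.

```python
import secrets


def make_series(basename: str, number: int) -> dict[str, str]:
    """Map systematic Target filenames to independent random 128-bit keys
    (as hex strings). Illustration only, not the OWL key generator."""
    width = max(2, len(str(number - 1)))
    return {
        f"{basename}{i:0{width}d}.hex": secrets.token_hex(16).upper()
        for i in range(number)
    }


series = make_series("Testloader", 10)
# First and last filename of the series; the keys share no pattern with them.
print(list(series)[0], list(series)[-1])
```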

Make Transmission

Software expects valid target name as a reference to the Bootloader for which to generate a valid OWL Transmission. Provided that the respective Target file is found, the program will know its crypto key and all meta info which is needed to calculate an encrypted Transmission with correct timing for this particular device.
Hexfile with write data intended for Flash and/or EEPROM memory of the Target device should be specified.
The serialport argument is only needed if the Transmission shall be sent-out "live", i.e. immediately after the command has been fired. With no serial port specified, the Transmission will go directly into a binary Transmission file (the "OWL Transmission" with extension ".owl") in the folder ./transmissions.
With the new Software, the Transmission is directly written into a file in the order of concatenated crypto sequences as follows:
  1. Authentication sequence (S1): PRNG is loaded with the initial Bootloader key (from Target file). Enlarged Intro Preamble is generated. The first sequence consists only of an IV, a dummy block (random numbers) and the VI. So, the cryptogram for the authentication sequence S1 will always sum up to 3 blocks including IV and VI. The encrypted blocks are provided with minimum Preambles and Block Starters.
  2. EEPROM sequence (S2): The program reads in EEPROM data from specified Hexfile and fills it up to the next blocksize (padding up). EEPROM data is encrypted via RST, featured with leading IV and concluding VI based on the keystate that the previous sequence has left with. Preambles between crypto data blocks that consider the EEPROM block write duration are inserted between blocks. When there is no EEPROM data to be written, EEPROM sequence S2 only consists of an IV and VI block at the distance of minimum Preamble.
  3. Flash sequence (S3): The program reads Flash data from specified Hexfile. Padding is done to full Flash Pages (and, for ATtiny Targets, an INFO TAG is considered). Flash data is encrypted based on the current keystate. Some extended Preambles may be necessary for full Flash Erase operation and/or individual Flash Page writes. An optional Outro Preamble may be added. This sub-string is appended to the Transmission file, also the numeric ASCII-tag of baudrate.
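The "padding up" step in S2 and S3 can be sketched as follows. The block size of 16 bytes follows from the 128-bit cipher. The fill value 0xFF (the erased-memory state), the page size of 64 bytes and the helper name are assumptions for illustration; the actual padding of the OWL Software, including the INFO TAG for ATtiny Targets, may differ.

```python
BLOCK_SIZE = 16  # bytes in one 128-bit cipher block


def pad_up(data: bytes, unit: int, fill: int = 0xFF) -> bytes:
    """Pad data to the next multiple of unit. The fill value 0xFF
    is an assumption, not taken from the OWL sources."""
    remainder = len(data) % unit
    if remainder == 0:
        return data
    return data + bytes([fill]) * (unit - remainder)


eeprom = pad_up(bytes(range(20)), BLOCK_SIZE)  # 20 bytes -> 32 bytes, 2 blocks
flash = pad_up(bytes(300), 64)                 # 300 bytes -> 320 bytes, 5 pages
print(len(eeprom) // BLOCK_SIZE, len(flash) // 64)
```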

The OWL Transmission can be forwarded directly to a serial port. Its image file (.owl) preserves all encrypted data and timing information that is needed to replay the same Transmission later on. This is the ideal format for encrypted distribution of Firmware Updates.


Single Transmission

With a valid serial port specified in Transmission Mode, the Software will send out OWL data stream through referenced interface, using the default baudrate as specified in the Target file.
Example:
owl --targetname=bootloader_m8 --flashfile=program.hex --serialport=COM2
If no serial port was specified, the respective .owl file, which is normally saved to the folder ./transmissions, can be sent later to a serial port, or sent via network to a different location. The automatic naming scheme combines an ISO timestamp and the original Target name.

Serial Transmission

It is possible to address multiple targets in Transmission Mode. If Targets were systematically named or numbered, we can specify their namespace by wildcards, i.e. "?" or "*". Software will find all Target files matching the referenced pattern and automatically generate individual Transmission for each one of them.
Example:
owl --targetname=Bootloader0? --flashfile=program.hex
This would capture all Bootloaders matching the search pattern, i.e. "Bootloader00" to "Bootloader09", and make custom Transmissions for each single Target including the Flash firmware update of "program.hex". Respective Transmission files are consequently saved with systematic naming, derived from the original Target names, into the folder ./transmissions for distribution.
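How such a pattern selects Targets can be tried out with Python's fnmatch module, whose "?" and "*" behave in the usual shell-like way. This is only a sketch of the selection step with made-up Target names; case handling and any further pattern rules of the OWL Software are not modelled.

```python
from fnmatch import fnmatchcase


def match_targets(names: list[str], pattern: str) -> list[str]:
    """Return the Target names matching a '?'/'*' pattern, sorted.
    Matching is case-sensitive in this sketch."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


names = ["Bootloader00", "Bootloader01", "Bootloader09", "Bootloader10", "Testloader00"]
# "?" stands for exactly one character, so Bootloader10 is not captured.
print(match_targets(names, "Bootloader0?"))
```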

Audio-Export

The OWL Transmission, consisting of Preamble characters and encrypted data blocks, is already a quite balanced bitstream. This invites experiments with some crazily simple, DC-free or even "floating" transmission methods. Soon the idea came up to abuse the PC soundcard as an alternative serial data output!
To keep it short, the option --audioexport has been developed and refined exactly with this intention; transcoding of serial data to a valid PCM file (naming extension .wav), compatible to almost any multimedia-capable platform.
When this OWL Audio is played back over high level, low impedance outputs, the voltage swing is often sufficient to directly drive red/infrared LEDs or Optocouplers. Decoding of this differential signal is possible with very few electronic components. Since the differential audio encoding applies further layer of signal balancing by itself, it could be of interest for the transmission of non-balanced binary data. That's why --audioexport option is no longer restricted to .owl source files. Find more technical explanation below!
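To give an idea of what such a transcoding involves, here is a strongly simplified sketch. It frames bytes as 8N1 serial bits and renders them as two-channel 16-bit PCM at 44.1 kHz, with the right channel carrying the inverted left channel. The polarity, the amplitude and the plain level-per-bit scheme are assumptions; the actual OWL audio encoding is not reproduced here.

```python
import io
import struct
import wave

RATE = 44100       # samples per second
AMPLITUDE = 30000  # assumed peak level for 16-bit samples


def uart_bits(data: bytes):
    """Yield 8N1 line bits: start bit 0, eight data bits LSB first, stop bit 1."""
    for byte in data:
        yield 0
        for i in range(8):
            yield (byte >> i) & 1
        yield 1


def to_differential_wav(data: bytes, baud: int) -> bytes:
    """Render serial data as a 2-channel WAV image (assumed scheme:
    left channel = signal, right channel = inverted signal)."""
    frames = bytearray()
    written = 0
    for n, bit in enumerate(uart_bits(data), start=1):
        level = AMPLITUDE if bit else -AMPLITUDE
        # Track the ideal end position of each bit to avoid cumulative drift,
        # since 44100 / baud is usually not an integer.
        target = round(n * RATE / baud)
        frames += struct.pack("<hh", level, -level) * (target - written)
        written = target
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(RATE)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


print(len(to_differential_wav(b"U", 9600)), "bytes of WAV data")
```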

Random keys

Assuming that we can make random keys of 128 bits, it is VERY unlikely that two identical keys would ever conflict in the same universe! This improbability drive allows us to generate unique device keys locally and use them worldwide without the need to check these keys against some central database. This means more freedom and self-determination for users.
Apart from that, random keys will solve most of the problems that we've had before in conjunction with password schemes à la TSB. For example, in a hardware setup with multiple Bootloaders hooked to one common programming line, no conflicts nor chicken-egg problems are to be expected anymore, since every OWL Bootloader will have its individual 128 bits address right from the beginning. Yet, on the access layer, those keys are conveniently linked to memorable Target names. For the authorised user, access to all his Bootloaders is absolutely transparent.
Downside: The generation of good random keys is not that trivial. Oh, we've had that topic before ...
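How unlikely a key conflict is can be estimated with the well-known birthday approximation, p ≈ 1 - exp(-n(n-1)/2^(b+1)) for n keys of b bits. The sketch below assumes perfectly uniform random keys, which is exactly the part that is not trivial in practice.

```python
import math


def collision_probability(num_keys: int, bits: int = 128) -> float:
    """Birthday-bound approximation for at least one duplicate among
    num_keys uniformly random keys of the given bit length."""
    exponent = num_keys * (num_keys - 1) / 2 ** (bits + 1)
    # expm1 keeps precision for the extremely small values involved.
    return -math.expm1(-exponent)


# Even a billion Bootloaders leave the probability of any conflict
# on the order of 1e-21.
print(collision_probability(10**9))
```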

Random-Pooling

Computers, as we all know, are essentially unable to generate random numbers. The so-called random number generators built into modern CPUs are not trustworthy for legion of reasons. For the occasional generation of crypto keys and initialisation vectors, the usual suspects (timer, mouse movements and entropy from the filesystem) are doing quite well.
However, when it comes to serial generation of OWL-Bootloaders, there is lots of good random bytes being requested in a small timeframe. It would be reasonable then to have some 'stock of entropy' at hand that won't get exhausted prematurely.
Therefore, the OWL Software creates and makes use of its own Random Pool file named randpool.bin, which is located in the executable's home directory. In the current version, the RP is fixed to 512 bytes of random data. It is refreshed from live entropy (system timer, system random devices) at least once on every program invocation.
In view of a projected mass-generation of Bootloaders (say, more than 100s of keys in a rush), the file randpool.bin could additionally be refreshed by means of an external True Random Number Generator (such as the XR232USB). [For which a new Software tool is also planned.]
In fact, on a Unix/Linux desktop, there is no desperate need for separate random-pooling, as we got mighty kernel services to provide random data, in particular by way of the very convenient virtual device drivers /dev/random resp. /dev/urandom. IMHO a pretty good and independent solution with built-in entropy-estimation and quality-assurance regarding deliverable randomness. Therefore, all OWL SW since 2019 uses /dev/urandom as the primary random source on a Linux machine.
Lacking any such trustworthy API under 'Windows', the OWL SW still offers an option --randpool to start some separate entropy-collection, which is based mainly on CPU-load fluctuations on the running system and could be seen as an attempt to perform a minimised random-pooling similar to the Linux kernel. In fact, the refined method, as with new versions of OWL SW 2020xxxx, provably does a good job at it. Please refer to the source code in the owlrst-module for further details. Further testing is in progress and feedback strongly appreciated!


Backups

The Software does not make automatic copies. Backups of a complete OWL folder may be easily prepared by means of standard tools, i.e. simply copy the whole contents of the owl-folder including subfolders to an external media. In particular, the Target files, normally located in folder ./targets should be saved on a regular basis.

Security

The Software does not overwrite Target files that already exist.
To prevent accidental erasure of Target files by other applications, crucial Target files may be protected by a read-only flag, if applicable. Of course this is no replacement for a Backup regime!
It is assumed that any person having physical access to the machine with the OWL Software and keys, is authorised to do so. Consequently, the OWL Software does not provide means of an additional access control. For example, a master password scheme has been tested and found quite cumbersome.
The mature user is always aware of security implications. For example, computers or network accounts that hold or process personal or confidential data must not be accessible by unauthorised persons, that's the baseline. Being or feeling exposed to threats of espionage or sabotage, we will positively implement some "secured environment".

Data formats

  • Export:
    • For cross-platform compatibility, all Hexfiles generated for Targets are standardised to 7-bit ASCII with CRLF line feeds.
    • The OWL format for Transmissions is a binary filetype with file extension .owl containing raw serial data. Contents shall be treated like any other binary, i.e. alteration or transcoding is not allowed. However, since OWL Transmissions contain long runs of uniform preamble characters, they can be nicely compressed by plain ZIP or RAR.
    • The OWL Audio Export is a standard WAV container (RIFF header 44.1kHz/2-ch/16-bits LPCM), suitable for platform-independent playback of serial data stream from any decent soundcard, intended to feed minimalist audio-digital interfaces. The uncompressed WAV files usually become quite large, but there is option to convert to any lossless compressed format (e.g. FLAC), or simply squeeze them by ZIP/RAR for transport, which achieves impressive reduction ratio.
    • The Random Pool file (randpool.bin) is a binary file containing random bytes. As it is only maintained on the respective local system, its 'platform-interoperability' is of secondary importance.

  • Import: The Software can read Hexfiles with linefeeds in the LF (Unix), CR (Mac) or CRLF (DOS/Windows) format. It should be able to read Hexfile output from various assemblers and compilers, as long as it actually conforms to the Intel Hex standard.
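A tolerant Hexfile reader along these lines might look as follows. It is a minimal sketch, not the import routine of the OWL Software: it accepts LF, CR and CRLF line endings, verifies record checksums and stops at the end-of-file record, so that any lines following it are skipped. Segment wrap-around for record type 02 is not handled.

```python
def parse_intel_hex(text: str) -> dict[int, int]:
    """Parse Intel Hex text into {address: byte}. Handles record types
    00 (data), 01 (EOF), 02 and 04 (address extension); ignores others."""
    memory: dict[int, int] = {}
    base = 0
    for line in text.splitlines():  # splitlines accepts LF, CR and CRLF
        line = line.strip()
        if not line.startswith(":"):
            continue
        record = bytes.fromhex(line[1:])
        if len(record) < 5 or len(record) != record[0] + 5:
            raise ValueError(f"malformed record {line!r}")
        if sum(record) & 0xFF != 0:
            raise ValueError(f"checksum error in record {line!r}")
        addr = int.from_bytes(record[1:3], "big")
        rtype = record[3]
        payload = record[4:-1]
        if rtype == 0x00:
            for offset, value in enumerate(payload):
                memory[base + addr + offset] = value
        elif rtype == 0x01:
            break  # end of file: everything below is ignored
        elif rtype == 0x02:
            base = int.from_bytes(payload, "big") << 4
        elif rtype == 0x04:
            base = int.from_bytes(payload, "big") << 16
    return memory


sample = ":0400000001020304F2\r:00000001FF\rtrailing line after EOF\r"
print(parse_intel_hex(sample))
```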

Crypto-Testing

The OWL Software offers some testing functionality regarding the crypto layer.
Key generator (PRNG):    owl --key=Hexstring
Specifying a hex key without any further options triggers testing mode of the PRNG. The PRNG is then loaded with the specified key and continuously clocked.
This function demonstrates properties of the key generator only. Screen output is in Hex. The first line shows initial state (seed) of the PRNG. The following lines represent the raw sequence of the PRNG module, i.e. 4-bit vectors that would normally control the block cipher.
Specifying default key (--key="00112233445566778899aabbccddeeff") or submission of an "empty" key (--key="") will deliver the following sequence:

00112233445566778899AABBCCDDEEFF

0035F2BFEBBC79D7B6FB6E536D14DCA2
8A41FABFDFA8A7CB278D9B93ED144009
4116BBDB07E70257590C1602B2F35DF4
4C932A9D825C6A464896D1173D8F910C
1A121048A968625C3513DA716419F961
9083A7F4853B5D7F2D08C286E12A8008
08620ECC967578F6AEA63B5FB2B2234F
0F5CBDE922983F8961C6BF9B65D75082
...


File encryption:    owl --key=Hexstring --encrypt=Filename.xxx
Encrypts the specified file with specified crypto key. Cipher is RST128, format of the cryptogram is a regular RST sequence consisting of IV, data, VI. Encrypted file is saved to disk under the source file's original name with extension ".raw" appended to the original filename. This file extension would simplify import of crypted data into certain graphics, audio and analysis tools for further investigation.

File decryption:    owl --key=Hexstring --decrypt=Filename.xxx.raw
Decrypts the specified file with the specified crypto key. If decryption was successful (IV = VI), the file is saved to disk with a modified name, consisting of a timestamp, the source file's name and its restored original naming extension. Therefore, the decrypted file would never overwrite an original source file. A successfully decrypted file is binary identical to the original and has exactly the same length, due to the padding/depadding mechanism in place. However, if decryption has failed, this demo feature of RST128 will still save the corrupted file to make it available for analytical purposes.

True Random Key:    owl --key=R
When the non-hex letter of 'r' or 'R' is submitted as the single Key argument, a TRUE RANDOM Key of 128 bits is drawn from OWL-RST module and loaded into CSPRNG. The underlying mechanism is the same that normally provides fresh Keys and IVs in the context of Target Make and RST-encrypted Transmission Modes. Again, the lines that follow are deterministic raw output of the CSPRNG.

Virtual machines

Various constellations with VirtualBox 5.XX have been tested for fun, and it actually was fun, since there were no problems at all with the OWL Software, compiled for the respective Guest machine to run under different Host systems, as soon as the Guest had been granted access to a serial interface on the Host. Most combinations of WinXP/Win7/Ubuntu14/Debian8 ran smoothly. In general, we should not expect best performance in a VM, especially regarding screen output and interface connections! In all setups, the additional abstraction layer caused significant lag on the RS232 transactions. Data sent out was often "stuttering". However, no single character got lost and ALL Transmissions could be decrypted with no problem. "Emergency operation" of the OWL Software in a VM under a different operating system - check!
Additional notes on VMs: REFRAIN FROM UPGRADING TO VIRTUALBOX 6.xx! Under recent Linux, I have experienced what many people have already reported on the web: Windows Guest systems repeatedly crash or freeze when trying to access serial ports of the host system. In fact, this has never been an issue with VirtualBox 5 or lower.
While a VM with one or more assigned serial ports is running, those ports may be BLOCKED on the Host system, and undefined behaviour could occur if other software still tries to access them.

To-Do's and Bugfixes

Surely the new version of the Software tool will be improved, debugged and optimized even more. Do not hesitate to send me your ideas, reports and criticism! You might also have a look at the changelog.txt which is part of the download package and gives explanation on many programming why's and how's. Anyway: Thanks for your ongoing feedback!




Hardware options

  • Target platforms that feature RS232, RS422, RS485 or USB-RS232 connectivity for an AVR. Device would normally communicate to a PC or terminal. Same interface could be used by the Bootloader. For the One-Way-Loader this means:

    • OWL may use the portline normally associated with RXD in an existing RS232 hardware setup for its own data reception. This will enable firmware updates by the existing interface connection without physically opening device and without the need for special programming adaptor. A very convenient option from an end-user's perspective.
      Basically, the OWL will only need the RXD line for data reception, since it does not send back any data. Yet, in a two-wire-setup, from the moment of a hardware Reset, the TXD-associated portline would be left with high-impedance state due to the MCU's coldstart logic. Normally it is the duty of an Application firmware to initialise i/o ports that could otherwise lead to an undefined/unfavourable state of peripherals. In a Bootloader scenario dealing with such two-way serial interfacing, it is recommendable to also consider the TXD-associated portline. OWL can initialise any additional portline to active output state and apply some static logical level to it ("TX dummy port").

    • RS422/RS485 line drivers (SN75176 and alike) normally require a control signal (transmit-enable, TE) that switches the data direction on the bidirectional differential bus. Normally, this signal is provided on a dedicated portline of the controller by the Application firmware that handles the RS485 communication. A unidirectional Bootloader that uses the same RS485 interface has to make sure that the interface chip stays in receive direction for the whole Bootloader session. OWL can easily provide such a signal. Just assign the TE-associated portline in the OWL configuration in order to generate a static non-inverted (high) or inverted (low) output level (depending on the logic of the interface circuit) for the duration of an OWL session.

  • Target platform without RS232 hardware: These are the most interesting fields of application in my opinion. Finally, standalone appliances that normally wouldn't deal with any RS232 or USB can be equipped with a strong crypto Bootloader. A unidirectional transmission with minimised hardware requirements offers plenty of opportunities for simple and/or rugged interfacing:

    • Direct connection: We can reserve any existing portline on the microcontroller for OWL data reception. Connection to the outside may be provided by appropriate connectors. An existing One-Wire adaptor (CI-V interface) could connect directly to such a minimalist interface.
      Further options of direct electrical connection: We may take the TXD-TTL signal coming directly from a USB-COM converter (FT232, PL2303) and connect it to the respective input port on the MCU. The ultra-simple unidirectional variant: directly use TXD from genuine RS232 (i.e. -12V / 12V) and limit it to TTL levels with a series resistor and a Zener diode! This leads to "inverted logic" - but OWL can be configured for inverted RXD as well!

    • Optocouplers: Standard optocouplers, such as the PC817, enable unidirectional serial data transmission with electrical isolation and an easy option for signal inversion. The phototransistor connects to the respective RXD-portline. A pull-up resistor of a few kilohms will improve signal steepness. On the outside, the coupler's LED connects to the TXD line of the serial interface. See circuit samples.

    • Air gap opto: The open-air variant of an optocoupler. On the microcontroller's side, the receiver is a phototransistor that connects directly to the RX-assigned portline, probably with a pull-up resistor attached, and nothing more. When red or infrared light of sufficient intensity is detected, the port is pulled down to logical low, while otherwise idling in logical high state. The transmitter on the other side is a bright LED that can be driven directly by the TXD signal of a serial port or FT232 converter. This makes for a very educational data transmission with undoubted electrical isolation and quite interesting operational properties. This variant has been tested under various conditions. Thanks to the robust signal processing in the OWL firmware, reception of serial data works pretty well, provided that there is no direct sunlight or other intense infrared source nearby. Of course, this air gap opto could be optimised by using a more efficient infrared LED and an infrared-filtered phototransistor, but ain't that cool... Air gap opto, a cheap, minimalist and extremely rugged programming interface.



Light saber wirelessly programming an OWL armored device

  • Capacitive/Magnetic couplers: A continuous OWL Transmission is a perfectly balanced bitstream, which is suitable for transmission over DC-free capacitive or inductive transducers without a compelling need for an added modulation scheme. Proof of concept has been delivered for bandwidth-limited and/or resonant channels, preferably in the middle baud range. (However, when galvanic isolation is the primary goal, an opto-based solution should be favoured.)

  • PC-Audio to OWL-Receiver: The OWL Software can convert an OWL Transmission to digital audio, either directly in Transmission Mode, or retroactively in Transfer Mode. Audio Export provides an alternative channel for sending out serial data, for example from toy devices such as laptops, smartphones and tablets, which surely do not feature RS232, but likely some headphone connector. With OWL Audio, the serial data stream is resampled and spread to both stereo channels by means of a bit-synchronous phase modulation with a frame polarity keying compound. As a result, both channels are quite well balanced with regard to their digital sum value, and feature spectral properties comparable to a medium-speed PSK modem. The output amplifiers of the L and R stereo channels are used in a push-pull manner, so that the resulting voltage swing is doubled (in comparison to a single channel with reference to GND). This makes it possible to drive IR-LEDs directly from the headphone terminals of a normal soundcard. The matching counterpart is a so-called 'AC-optocoupler', such as the PC814 (= cheapo standard component!), which on its transmitting side features two antiparallel IR-LEDs, which are used here to restore the unsigned L-R differential the easiest way imaginable. In fact, the restored data on the phototransistor is 'automagically' in-phase, regardless of signal polarity at the soundcard terminals, modest bias, phase-shift or imbalance between left and right audio channel. Have a look at this disturbingly simple circuit and be amazed. Or amused. Whatever. It works!

    AND HERE'S THE OBLIGATORY "HEALTH AND SAFETY" WARNING: Since OWL Audio may require to push up the volume to the max, be careful when re-plugging your headphones ...!
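
The push-pull principle can be illustrated with a few numbers. The following Python sketch is not the OWL Audio modulation (which is not specified here). It only assumes two stereo channels driven in opposite phase, a hypothetical channel amplitude `AMPLITUDE` and a hypothetical LED forward voltage `V_FORWARD`, to show that an antiparallel LED pair responds to the unsigned L-R differential, whatever the polarity at the plug.

```python
# Illustrative only: amplitude, forward voltage and the symbol pattern are
# assumptions, not values taken from OWL Audio.
AMPLITUDE = 1.0   # hypothetical peak voltage of one headphone channel
V_FORWARD = 1.2   # hypothetical forward voltage of one IR LED


def push_pull(symbols, amplitude=AMPLITUDE):
    """Map symbols (+1, -1 or 0) to opposite-phase (left, right) voltages."""
    return [(s * amplitude, -s * amplitude) for s in symbols]


def led_lit(samples, v_forward=V_FORWARD):
    """Antiparallel LED pair: one of the two LEDs lights whenever the
    magnitude of the L-R differential exceeds the forward voltage."""
    return [abs(left - right) > v_forward for left, right in samples]


symbols = [+1, -1, 0, +1, 0, -1]
normal = push_pull(symbols)
swapped = [(right, left) for left, right in normal]   # channels exchanged

print(led_lit(normal))
print(led_lit(swapped))
print(led_lit([(AMPLITUDE, 0.0)]))   # single channel against GND
```

With these assumed values, both polarities yield the same on/off pattern, while a single channel referenced to GND would not reach the forward voltage at all.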


Simplest unidirectional RS232 interface for the One-Way-Loader. Note: inverted signal logic.

Simple & Safe One-Way-Interface: TXD drives an ordinary 4-DIP optocoupler. Simple unidirectional data transfer from classic RS232 to the microcontroller via optocoupler.

Air gap TXD via LED: Simple unidirectional data transfer from classic RS232 via LED and phototransistor on the controller's side.

Air gap TXD using FT232: Circuit suggestion to drive a transmit-LED from an FT232 (USB-VCP) that will result in the normal logic on the receiving side, as in the examples before.

Simple & safe audio interface for differentially encoded stereo transmission: Simplest interface for unidirectional and failsafe data transfer from a high-level soundcard output to a microcontroller.







Remarks

Crypto contest for the One-Way-Loader

For the crypto layer, only a few candidates seemed viable given the tight restrictions on memory and computing power of the intended target platforms. As the Bootloader project advanced, paired sample code for AES, XTEA, a simple XOR cipher and "RST", an in-house development, became available for both AVR and PC. Intensive testing, comparative statistics and pragmatic consideration led to the conclusion that RST made the grade:

Algorithm: AES (Rijndael)

  AVR implementation:
    • Optimised AES 128 bits for AVR-ATmegas is available ("Rijndael Furious")
    • Code: 1570 bytes (block cipher only!)
    • Clocks per block: 2700...3500
    • Reference: http://point-at-infinity.org/avraes/
  Pro's:
    • Will perform AES-128 according to specs and thus could be "certified"
    • AES offers good and well understood statistical properties regarding diffusion/confusion
    • Comparably fast on AVR-controllers (Assembler)
    • Different variants of key feedback that work with regular AES could be implemented.
  Con's:
    • Further code required for CRC and Key-Feedback!
    • Large memory footprint (lookup tables, S-boxes)
    • Only ATmegas
    • Decryption slightly slower than encryption
    • Overkill for a Bootloader application
    • Licensing fee for commercial application

Algorithm: XTEA ("eXtended Tiny Encryption Algorithm")

  AVR implementation:
    • Standard of 64 bit blocks and 128 bit keys
    • Code: 206 bytes (core functionality)
    • Clocks per block: ~ 12600 (split in 2 x 8 bytes)
    • Reference: www.efton.sk (link outdated?)
  Pro's:
    • No patents, royalty-free
    • Provably good statistics
    • AVR assembly version quite compact and well-designed
  Con's:
    • Notable efforts needed for key feedback, error-detection, etc.
    • Possibly weak with minimum rounds per block
    • Initially works on 64 bit blocks, requires elaborate makeover to adapt for 128 bit blocks and a 128 bit system of key chaining and cryptographic checksum

Algorithm: PRNG-XOR (simple stream cipher)

  AVR implementation:
    • Simplest
    • Code: ~60 bytes for plain XOR-stream
    • Blockwise or bytewise XOR by a pseudo-random-number-generator (PRNG)
    • Clocks per block: < 1000
  Pro's:
    • Classic stream cipher
    • Using a good stream cipher can provide sufficient security for some applications.
    • Most compact solution
  Con's:
    • Notable efforts needed for key feedback and error-detection.
    • Vulnerable to known-plaintext attacks
    • Not sufficient for serious crypto bootloader!

Algorithm: RST ("Randomised Substitution-Transposition")

  AVR implementation:
    • "Block cipher controlled by a stream cipher"...
    • PRNG of 128 bits
    • Continuously clocked
    • Use of Init-Vectors, over-all-error-detection
    • Modular options for block cipher, PRNG and key feedback
    • Code: ~ 160 bytes
    • Clocks per block: < 10000 with minimum round count
  Pro's:
    • Good statistics
    • IV mechanism and error detection "all inclusive"
    • Unofficially tested well
    • Very compact implementation on 8-Bit-CPUs
    • Use of cryptographically strong PRNG
    • Balance between cryptographic strength, code efficiency and functionality
    • Fully disclosed and quite simple
  Con's:
    • Not "certified" so far
    • Strong PRNG costs lots of computing time
    • Avalanche effect within block varies between approx. 25-50%
    • Storage of reverse key sequence needed for either encryption or decryption.
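
To put the clock figures into perspective, the following Python sketch compares the computing time per block with the time a block needs to arrive over the serial line. It is a rough estimate under stated assumptions, not a measurement: each quoted figure (upper value) is treated as the cost of one 16-byte block, the CPU runs at 8 MHz and the link at 9600 baud as in the Quick Starter sample setup, and the serial framing is assumed to take 10 bit times per byte.

```python
# Cycle figures are the upper values quoted in the comparison above.
# Assumptions (not from the OWL documentation): each figure is treated as the
# cost of one 16-byte block, and the serial link uses 10 bit times per byte
# (start bit + 8 data bits + stop bit).
CPU_HZ = 8_000_000           # 8 MHz clock, as in the Quick Starter sample setup
BAUD = 9600                  # default speed used in the Quick Starter
BLOCK_BYTES = 16             # 128-bit block
BITS_PER_BYTE_ON_WIRE = 10

CLOCKS_PER_BLOCK = {
    "AES": 3500,
    "XTEA": 12600,
    "PRNG-XOR": 1000,
    "RST": 10000,
}


def block_arrival_ms(baud=BAUD):
    """Time needed to receive one block over the serial line, in ms."""
    return BLOCK_BYTES * BITS_PER_BYTE_ON_WIRE / baud * 1000


def block_compute_ms(clocks, cpu_hz=CPU_HZ):
    """Time needed to process one block on the CPU, in ms."""
    return clocks / cpu_hz * 1000


arrival = block_arrival_ms()
for name, clocks in CLOCKS_PER_BLOCK.items():
    compute = block_compute_ms(clocks)
    print(f"{name:9s} {compute:6.3f} ms compute, "
          f"{100 * compute / arrival:5.1f} % of {arrival:.2f} ms arrival time")
```

Under these assumptions, every candidate finishes a block well within the time the next block takes to arrive at 9600 baud. At higher baud rates or lower clock frequencies, the margin shrinks accordingly.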


Block encryption with or without error propagation and rolling key scheme

(1) Plain graphics data, unencrypted: Bitmap 200x200 px, 8 bits greyscale, 40 kB (which is 2500 blocks of 16 bytes!).

(2) Encrypted in stupid ECB mode, which uses the same keyset over and over. Patterns of plaintext remain visible. (This IS bad!)

(3) Encrypted with continuously running key generator (RST stream cipher). Random result of smooth statistics. Strong encryption.

(4) Decrypted with plaintext-key-feedback (RST). Only one bit was flipped: All subsequent data and final checksum corrupted. Error or attack safely detected!



Quick Starter

1. Prerequisites


  • AVR controller that is supported by the OWL

  • Hardware-Software environment for AVR projects, tool for programming AVRs by way of ISP

  • Hello-World-program for the respective Controller, e.g. simple LED-flasher, in Intel Hex format (standard)

  • Not being totally clueless at the command line

  • Respect, but no fear of Fusebits

  • Workable RS232 interface (or virtual COM adapter for USB): COMx, /dev/ttySx, /dev/ttyUSBx

  • OWL download for Linux/Windows

2. Install OWL Software on the PC

Actually, there is nothing to "install". Just unpack the download to the desired location in userspace. Open a command prompt and change to the 'owl' directory.
Typing owl with no further arguments will bring up the general help screen. (Don't forget to prepend "./" under Linux.)

In these examples, we use the long form of the commandline options for clarity.
By the way, the prefix of "-" or "--" ain't mandatory. The Software will list all short and long forms when you enter: owl --help.

3. Make tailored OWL Firmware

Sample setup: ATmega8 in a typical RS232 set-up (MAX232, FT232) and 8 MHz external crystal
The controller is connected via MAX232 (or FT232) to the RS232 or USB port of the host computer. Such appliances will most likely use the UART component of the controller with their regular firmware, and are thus tied to PD0/PD1 for RXD/TXD.
Serial communications should have been tested in this setup prior to installing the Bootloader. It is also assumed that there is an LED on PB2, which will give us optical feedback. The sample firmware ledblink_m8.hex uses that port. You may test it once without Bootloader to verify that it basically works.

Now we make an authorised OWL Bootloader for this Hardware. OWL commandline is:

owl --device=m8 --rxport=d0 --clock=8000 --baud=9600 --targetname=testowl_m8

That's all of the "ultra complicated" process of making a customised Bootloader with unique crypto key. You will find the new Firmware file under:   ./targets/testowl_m8.hex
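
For orientation, the relation between the --clock and --baud values can be worked out with simple arithmetic. The following Python sketch assumes that --clock is given in kHz (8000 for the 8 MHz crystal of the sample setup). How the OWL firmware actually samples the line is not described here, and the function names are hypothetical.

```python
def cycles_per_bit(clock_khz: int, baud: int) -> float:
    """CPU clock cycles that elapse during one serial bit."""
    return clock_khz * 1000 / baud


def bit_time_us(baud: int) -> float:
    """Duration of one serial bit in microseconds."""
    return 1_000_000 / baud


print(round(cycles_per_bit(8000, 9600), 1), "CPU cycles per bit")
print(round(bit_time_us(9600), 1), "microseconds per bit")
```

At 8 MHz and 9600 baud, the controller has several hundred clock cycles available per received bit.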

4. Installation of the OWL Firmware (Bootloader)

Now start your preferred ISP-programming software (like "avrdudess", "extreme burner", "TwinAVR") in order to transfer this freshly created Bootloader testowl_m8.hex into the Target chip. Flash should be fully erased before doing so.

We also have to set some Fuses differently from the factory defaults. The most important prerequisite for an AVR Bootloader is to enable Flash memory writes (SPM enabled), which is implicit on ATmegas (while on the ATtinys, a certain Fusebit, SELFPRGEN, must be set up). In fact, whenever SPM is allowed, it is always recommended to also activate the Brown-Out-Detector (BODEN) and define an appropriate voltage level. This prevents Flash corruption under unsound coldstart conditions. Relevant Fuses for said target device, an ATmega8 with external 8 MHz crystal, to start with:


BOOTSZ=10; BODEN=0; BODLEVEL=0; BOOTRST=0; CKSEL=1100; SUT=00
Byte-values:     Ext: $FF    Low: $8D     High: $EC

Note: Fusebit-Calculator makes this a lot easier.

Now we have a workable OWL on this Target device.

5. Transmit your first Transmission

For convenience, our testing firmware ledblink_m8.hex is located right in the owl folder. In this example, the Target is being connected via COM2 (Linux: /dev/ttyS1) to the computer. Timing is not that critical with default Timeout of 1 second on the bootloader and about 1 second of Introductory Preamble on the Transmission. Therefore, you can FIRST fire up the commandline, THEN reset the Target controller:

owl --targetname=testowl_m8 --flashfile=ledblink_m8.hex --serialport=COM2

Type in the command, reset the controller device and fire up the command. The Transmission will be sent shortly at the default speed of 9600 baud (unless specified otherwise) and should take only a few seconds. LED at the controller starts to flash? Congratulations!

Note: For different AVR and I/O ports, change test firmware and target loader accordingly.
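
When several Targets have to be prepared, the two commandlines of this Quick Starter can be assembled by a small script. The following Python sketch only builds the argument lists and prints them. It uses nothing but the options shown above, the helper names and the "./owl" executable path are assumptions, and actually running the commands (for example with subprocess.run) is left to the reader.

```python
import shlex


def make_loader_cmd(device, rxport, clock_khz, baud, targetname, exe="./owl"):
    """Commandline that creates a tailored Bootloader (step 3)."""
    return [
        exe,
        f"--device={device}",
        f"--rxport={rxport}",
        f"--clock={clock_khz}",
        f"--baud={baud}",
        f"--targetname={targetname}",
    ]


def send_firmware_cmd(targetname, flashfile, serialport, exe="./owl"):
    """Commandline that sends a firmware file to a Target (step 5)."""
    return [
        exe,
        f"--targetname={targetname}",
        f"--flashfile={flashfile}",
        f"--serialport={serialport}",
    ]


make_cmd = make_loader_cmd("m8", "d0", 8000, 9600, "testowl_m8")
send_cmd = send_firmware_cmd("testowl_m8", "ledblink_m8.hex", "/dev/ttyS1")

print(shlex.join(make_cmd))
print(shlex.join(send_cmd))
```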

Some info commands

Short reference on all commandline options:
owl --help

Full reference on all commandline options:
owl --helpall

Detailed reference on submitted options in the Help context (here: 'flasherase', 'timeout' and 'serialport'):
owl --help --flasherase --timeout --serialport

List all devices that are currently supported with firmware templates:
owl --supported

Watch technical data on a certain device:
owl --device=Devicename

Watch master data on authorised Bootloaders (example):
owl --targetname=testowl_m8


One-Way-Loader currently supports the following AVR devices:

m1280 m1281 m1284 m1284P m128A m128 m128RFA1 m128RFR2 m162 m164A m164PA m164P m165A m165PA m165P m168A m168 m168PA m168P m169A m169PA m169P m16A m16 m16HVA m16HVB m16M1 m16U2 m16U4 m2560 m2561 m256RFR2 m324A m324PA m324P m3250A m3250 m3250PA m3250P m325A m325 m325PA m325P m328 m328PB m328P m3290A m3290 m3290PA m3290P m329A m329 m329PA m329P m32A m32C1 m32 m32HVB m32M1 m32U2 m32U4 m406 m48A m48 m48PA m48P m640 m644A m644 m644PA m644P m6450A m6450 m6450P m645A m645 m645P m6490A m6490 m6490P m649A m649 m649P m64A m64C1 m64 m64M1 m64RFR2 m8515 m8535 m88A m88 m88PA m88P m8A m8 m8HVA m8U2 tn13A tn13 tn1634 tn167 tn2313A tn2313 tn24A tn24 tn25 tn261A tn261 tn4313 tn441 tn44A tn44 tn45 tn461A tn461 tn48 tn80 tn840 tn841 tn84A tn84 tn85 tn861A tn861 tn87 tn88




License

Programmes (Firmware, Software) for the One-Way-Loader are subject to the MIT License, respective note being included in all sourcecode. One-Way-Loader programming that has been officially released by the Author, is deliberately Open Source. For special versions or contractual works, individual agreements may apply.

Documentation and images for the One-Way-Loader are available under Creative Commons - Universal (CC0). This applies to photographs, circuit diagrams, drawings and accompanying documents that do not feature an explicit copyright notice and are not implicitly covered by other license. For example, the OWL logos and icons are own creations of the Author, inspired by material that was already in the public domain.

Want to support my work? Suggestions, criticism, donations
Don't worry, I will continue anyway.




Download


Links


Initial release 06/2018