dinosaure's Blog

All the way down, my blog is re-up!

<2020-09-17>

My blog was down for a long time, something like 4 months, and this article explains why. As a short introduction, I started to re-implement Conduit (see this article about Tuyau). Because of this breaking change, libraries such as Cohttp or Git had to be updated to use the new version, which is required by my library Paf (which provides an HTTPS service on top of HTTP/AF).

On the other hand, I decided to deeply update Git to integrate some other updates such as Carton or the latest version of Decompress. I took the opportunity to fix some bugs and I finally came up with a new version of Git.

So the blog was redeployed with the new stack! It uses HTTPS everywhere and SSH to get articles from my repository. In the end, the update is substantial but it does not change much from the point of view of the user (before my update, we were already able to use HTTP with TLS and SSH) - and this is what we tried to provide.

But I think it paves the way for a better MirageOS ecosystem. Let's start with a deep explanation.

Tuyau / Conduit

For many people, Conduit is a mystery but the goal of its new version is clear: it wants to de-functorize your code. Indeed, in the MirageOS ecosystem, we mostly want to abstract everything. Let's take HTTP as an example; an implementation of an HTTP server needs:

  • a TCP/IP implementation
  • a possible TLS implementation

The problem is not the ability to abstract the TCP/IP implementation (mirage-stack gives us such an abstraction); it's mostly about functor hell. At first, we would probably provide something like:

module Make_HTTP (TCP : Mirage_stack.V4) (TLS : TLS) = struct

end

Now, imagine another protocol such as Git which needs an HTTP implementation. To keep the abstraction, we should provide something like:

module Make_GIT (Hash : HASH) (HTTP : HTTP) = struct

end

module Git = Make_GIT (SHA1) (Make_HTTP (TCP) (TLS))

Finally, think about irmin, which uses Git and expects some other implementations such as the format of values, an implementation of branches and an implementation of keys:

module Make_IRMIN
  (Hash : HASH)
  (Key : KEY)
  (Value : VALUE)
  (Git : GIT) = struct

end

module Irmin = Make_IRMIN
  (SHA1) (Key) (Value)
  (Make_GIT (SHA1) (Make_HTTP (TCP) (TLS)))

Now, if I tell you that TCP is itself the result of a functor… In the end, we have a functor-hell situation and we should not ask the user to write such code (which can lead to several errors - a type incompatibility when you use SHA256 for irmin with an implementation of Git which uses SHA1, for example).

This situation is already handled by Functoria, which applies functors for a MirageOS application according to a graph (and depending on your target).

However, we cannot ask people to use Functoria for every one of our projects. And, I think, this is where Conduit becomes useful. The idea is:

Instead of functorizing your implementation over a Flow, you probably want something at the top level (so, something available without functors) which is able to communicate with a peer.

And this is the goal of Conduit. It lets you use recv, send and close as we expect from an implementation of a flow. Then, dynamically and generally at your first entry-point, you inject such an implementation into Conduit.

For example, HTTP, Git and Irmin can expect only one value, a Conduit.resolvers, which represents the flow implementations. From this Conduit.resolvers, HTTP, Git and Irmin are able to make a new connection. Then, the user must fill this value with a TCP implementation or a TCP + TLS implementation if they want - or with something else.

Finally, Conduit.{recv,send,close} is your functor argument FLOW!
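To make the idea concrete, here is a toy model of it. It is not the Conduit API: every name below (flow, resolver, add, resolve, loopback, ping) is hypothetical, the convention that a lower priority number is tried first is my own choice for the sketch, and only the standard library (OCaml 4.10 or later) is used. The point is only that the protocol code receives a plain value instead of being a functor:

```ocaml
(* Toy model of de-functorization; NOT the Conduit API. All names are
   hypothetical and only the standard library is used. *)

(* A flow is a record of closures instead of a functor argument. *)
type flow = {
  send : string -> unit;
  recv : unit -> string option;
  close : unit -> unit;
}

(* A resolver tries to open a flow to a host; None means failure. *)
type resolver = {
  priority : int;
  name : string;
  connect : string -> flow option;
}

type resolvers = resolver list

let empty : resolvers = []

(* Keep the list sorted so that lower numbers are tried first
   (a convention chosen for this sketch). *)
let add ~priority name connect (rs : resolvers) : resolvers =
  List.sort
    (fun a b -> compare a.priority b.priority)
    ({ priority; name; connect } :: rs)

(* Try each resolver in priority order and return the first flow. *)
let resolve (rs : resolvers) host =
  List.find_map
    (fun r -> Option.map (fun flow -> (r.name, flow)) (r.connect host))
    rs

(* An in-memory loopback flow standing in for a real TCP or SSH flow. *)
let loopback () =
  let q = Queue.create () in
  { send = (fun s -> Queue.push s q);
    recv = (fun () -> Queue.take_opt q);
    close = (fun () -> Queue.clear q) }

(* Protocol code takes the resolvers as a value; it is not a functor. *)
let ping rs host =
  match resolve rs host with
  | None -> None
  | Some (name, flow) ->
    flow.send "ping";
    let answer = flow.recv () in
    flow.close ();
    Option.map (fun s -> (name, s)) answer
```

With such a shape, the entry-point decides which implementations exist (for example an "ssh" resolver with priority 10 and a "tcp" one with priority 20) and ping never needs to know about them.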

An example from this blog

As you may know, this blog is self-contained - I store articles and the unikernel in the same Git repository. If you look into unikernel.ml, you will see how I fill the Conduit.resolvers:

let start stack =
  let resolvers =
    let tcp_resolve ~port =
      DNS.resolv stack ?nameserver:None dns ~port in
    match ssh_cfg with
    | Some ssh_cfg ->
      let ssh_resolve domain_name =
        tcp_resolve ~port:22 domain_name >>= function
        | Some edn -> Lwt.return_some (edn, ssh_cfg)
        | None -> Lwt.return_none in
      Conduit_mirage.empty
      |> Conduit_mirage.add
          ~priority:10 ssh_protocol ssh_resolve
      |> Conduit_mirage.add
           TCP.protocol (tcp_resolve ~port:9418)
    | None ->
      Conduit_mirage.add
        TCP.protocol (tcp_resolve ~port:9418)
        Conduit_mirage.empty in
  Sync.pull ~resolvers store >>= fun () ->

In this code, I want to fill the Conduit.resolvers with at least one implementation, the TCP.protocol. If I'm able to get an SSH configuration (like the private RSA key), I inject an SSH implementation, SSH.protocol, and give it priority.

Nothing changes for Irmin or Git (they no longer need to be applied to a flow implementation) but when these implementations try to start a connection, they will start an SSH or (if it fails) a TCP connection. So, with Conduit, we de-functorized Irmin and Git!

The final result

The new version of Conduit is not a big deal for the end-user. Conduit is an underlying library used by others such as Cohttp or Git. So, from a certain perspective, nothing changes for many users.

However, when we go into details, the new version of Conduit comes with a huge feature: the ability to give your own configuration value. For a long time, Conduit initialised values such as the TLS configuration itself. It did that without any trust anchor and just accepted any TLS certificate. Now, the end-user is able to pass their own TLS configuration, and this is what several people requested for the next version of Conduit.

This detail does not really appear from the point of view of the Git implementer or the Irmin implementer, who only wants a common way to communicate with a peer. It's not very useful for people who use lwt_ssl which, by default, uses the host's trust anchors. But it seems very useful for ocaml-tls, which does not have a (file-system dependent) strategy to get trust anchors. And it is very useful for SSH, where the configuration depends specifically on the user (because it's about their own private RSA key).

New version of Git

This summer, I decided to rewrite ocaml-git! More seriously I wrote a big explanation about the new version of Git here. The idea is to take the opportunity to:

  1. Use the new version of Conduit
  2. Update to the new version of Decompress (1.0.0)
  3. Integrate carton as the library to handle PACK files
  4. Fix the negotiation engine
  5. Fix the support of js_of_ocaml
  6. Pave the way to implement shallow commits and a garbage-collector

Carton

Most of these goals are pretty old. I started to talk about carton in August 2019 (one year ago…) and mostly finalised the API 6 months ago. The real upgrade is about the internal organisation of ocaml-git, where I cleanly split the logic of the PACK file from the Git logic.

In fact, the PACK file does not care much about the format of Git objects; it's just a format to store 4 kinds of objects. However, the process to extract or generate a PACK file is a bit complex and the idea was to push all of this logic outside Git.
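To give an idea of what "a format to store objects" means, here is a small sketch of how the header of one PACK entry can be decoded, following Git's pack-format documentation (a 3-bit kind, then a size spread over 7-bit groups). It is not carton's API; kind, kind_of_int and decode_header are names made up for the example. Besides the 4 kinds of objects, the header can also announce the two delta representations:

```ocaml
(* Sketch of the entry header of a Git PACK file, following Git's
   pack-format documentation; this is not carton's API. *)

type kind = Commit | Tree | Blob | Tag | Ofs_delta | Ref_delta

let kind_of_int = function
  | 1 -> Some Commit
  | 2 -> Some Tree
  | 3 -> Some Blob
  | 4 -> Some Tag
  | 6 -> Some Ofs_delta
  | 7 -> Some Ref_delta
  | _ -> None

(* [decode_header s off] reads the variable-length header at [off] and
   returns the kind, the inflated size and the offset of the payload.
   It raises Invalid_argument if [s] is truncated. *)
let decode_header s off =
  let first = Char.code s.[off] in
  (* Bits 6-4 of the first byte hold the kind. *)
  let kind = kind_of_int ((first lsr 4) land 0x7) in
  (* The most significant bit of each byte says whether another byte
     follows; the other bits extend the size, least significant first. *)
  let rec go size shift off continue =
    if not continue then (size, off)
    else
      let byte = Char.code s.[off] in
      go
        (size lor ((byte land 0x7f) lsl shift))
        (shift + 7) (off + 1)
        (byte land 0x80 <> 0)
  in
  let size, off = go (first land 0x0f) 4 (off + 1) (first land 0x80 <> 0) in
  Option.map (fun kind -> (kind, size, off)) kind
```

Everything after the returned offset is the zlib-compressed payload (or the delta), which is exactly the part where Decompress and Duff come into play.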

This way, carton is a little library with only a few dependencies such as Duff (a re-implementation of libXdiff in OCaml) and, of course, Decompress. I took the opportunity to use the latest (faster) version of Decompress in this library - and mechanically improved the performance of ocaml-git!

This underground split allowed me to start playing with Caravan to inject a read-only KV-store into a unikernel. In fact, special work was done on what carton needs to extract an object. In the end, we just need mmap (extraction) and append (generation) syscalls to be able to use carton. This last improvement fixes a bad underground design in ocaml-git, where the Git.Store implementation required an FS implementation which was too close to POSIX - and unavailable for MirageOS.

Finally, an append-only underlying view of a MirageOS-compatible block device will now be enough for Git.Store!

The new version of Conduit and the new package Not-So-Smart

In my previous article about Tuyau / Conduit, I took Git as an example of the need to abstract over the protocol. So, of course, the article is still true and I finally did a real application of what I was thinking.

The new API of Conduit allowed me to integrate nicely the new feature requested by Hannes, the support of SSH. Of course, Hannes did not wait for me to use his PR. However, in the old version of ocaml-git, we duplicated the implementation of the protocol 3 times, once for each underlying protocol (TCP, SSH and HTTP). So, I was not very happy with that and the biggest bottleneck was the negotiation engine.

The good (or bad) news was that the old negotiation engine was buggy! So it was mostly about a full rewrite of the Smart protocol, and that's why I created the nss (Not-So-Smart) package. Colombe gave me good experience of how to properly implement a simple protocol with a monad and GADTs. So, I took that design again to incorporate it into ocaml-git and re-implement the negotiation engine - I mostly followed what Git does.

This rewrite highlighted what the fetch/push process really needs from a Git store and I synthesised the requirements as:

  1. the PACK file
  2. a function to get a commit and its parents
  3. a function to get local references
  4. a function to get the commit given by a reference (de-reference)

And that's all! In fact, we just need to walk over commits to get the common ancestor between the client and the server and we just need to process a PACK file (to save it in the store then).

So, nss requires:

type ('uid, 'ref, 'v, 'g, 's) access = {
  get     : 'uid -> ('uid, 'v, 'g) store -> 'v option Lwt.t;
  parents : 'uid -> ('uid, 'v, 'g) store -> 'v list Lwt.t;
  locals  : ('uid, 'v, 'g) store -> 'ref list Lwt.t;
  deref   : ('uid, 'v, 'g) store -> 'ref -> 'uid option Lwt.t;
}

'uid is specialised to the hash used by the Git repository. 'v depends on what the process needs. For fetching, we need a mutable integer used by the negotiation engine (to mark commits) and the date of the commit (to walk from the most recent to the oldest one). Of course, we have a type store which represents our Git store and even 'ref is abstract!

From it, you can surely plug in an ocaml-git store, but we can also directly use a simple Git repository and implement these actions with some execve of git! As a result, this part of ocaml-git is not tested with the OCaml implementation of the Git store but with git directly!
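To show how little such a record asks from a store, here is a simplified and synchronous variant of it (no Lwt, fewer type parameters) with an in-memory store made of association lists. It is only an illustration under these assumptions: find_common, assoc_access and the store type are hypothetical names, and the walk is a plain breadth-first search where the real engine walks commits by date.

```ocaml
(* Simplified, synchronous variant of the access record (no Lwt, fewer
   type parameters); the in-memory store is purely illustrative. *)

type ('uid, 'ref, 'store) access = {
  parents : 'uid -> 'store -> 'uid list;
  locals : 'store -> 'ref list;
  deref : 'store -> 'ref -> 'uid option;
}

(* Breadth-first walk from every local reference; returns the first
   commit for which [known_by_peer] holds, i.e. a common commit. *)
let find_common access store ~known_by_peer =
  let visited = Hashtbl.create 16 in
  let queue = Queue.create () in
  List.iter
    (fun r ->
      Option.iter (fun uid -> Queue.push uid queue) (access.deref store r))
    (access.locals store);
  let rec go () =
    match Queue.take_opt queue with
    | None -> None
    | Some uid when Hashtbl.mem visited uid -> go ()
    | Some uid when known_by_peer uid -> Some uid
    | Some uid ->
      Hashtbl.add visited uid ();
      List.iter (fun p -> Queue.push p queue) (access.parents uid store);
      go ()
  in
  go ()

(* A store as association lists: commit -> parents, reference -> commit. *)
type store = {
  commits : (string * string list) list;
  refs : (string * string) list;
}

let assoc_access = {
  parents =
    (fun uid s -> Option.value ~default:[] (List.assoc_opt uid s.commits));
  locals = (fun s -> List.map fst s.refs);
  deref = (fun s r -> List.assoc_opt r s.refs);
}
```

Replacing assoc_access by three functions which call git (rev-list, show-ref, rev-parse for example) would give the same walk over a real repository, which is the spirit of the tests described above.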

This way, we can ensure that we talk well with Git! Again, the idea is to cleanly split the underlying logic in ocaml-git. It does not change much for the end-user but the core (the Git store implementation) is less complex than before because it no longer contains the protocol logic.

This rewrite helped me to rework the negotiation engine and ensure that we use the same negotiation engine for TCP, SSH and HTTP. This way, I removed the duplication of this process - so this part is now easier to maintain.

Support of js_of_ocaml

Most of the libraries used by ocaml-git are pure OCaml, with no C stubs. However, one of them used C stubs: encore. The goal of this library comes from an old project: finale. The idea of such a project is to derive a decoder and an encoder from one unique description. This way, we can ensure the isomorphism between the encoder and the decoder, such as:

val desc : my_object Encore.t

let decoder = Encore.to_angstrom desc
let encoder = Encore.to_lavoisier desc

assert (Lavoisier.to_string encoder
  (Angstrom.parse_string decoder str) = str)

For Git, we must ensure that when we extract a Git object, we are able to store it again without alteration. Encore ensures that by construction.

However:

  1. The internal encoder of Encore was too complex
  2. It used a functor which expects the description, such as:

module Make (Meta : Encore.META) = struct
  val desc : my_object Meta.t
end

module A = Make (Encore.Angstrom)
module B = Make (Encore.Lavoisier)

assert (Lavoisier.to_string B.desc
  (Angstrom.parse_string A.desc str) = str)

The functor was not the best solution and I decided to use a GADT instead to describe a format. The documentation of Encore was upgraded, so if you want more details, you can look here.
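As an illustration of the GADT approach (and only that: this is not Encore's API, and the constructors Int, Const and Pair are invented for the example), a description can be a plain value which is interpreted once as a printer and once as a parser. Since both interpreters follow the same description, the round-trip holds by construction:

```ocaml
(* Toy illustration of one description driving both directions; this is
   not Encore's API and the constructors are hypothetical. *)

type _ t =
  | Int : int t                       (* a non-negative decimal integer *)
  | Const : string -> unit t          (* a fixed keyword or separator *)
  | Pair : 'a t * 'b t -> ('a * 'b) t

(* Encoder: interpret the description as a printer. *)
let rec encode : type a. a t -> Buffer.t -> a -> unit =
 fun desc buf v ->
  match desc with
  | Int -> Buffer.add_string buf (string_of_int v)
  | Const s -> Buffer.add_string buf s
  | Pair (a, b) ->
    let x, y = v in
    encode a buf x;
    encode b buf y

(* Decoder: interpret the same description as a parser which returns
   the value and the next position, or None on failure. Very long runs
   of digits make int_of_string raise; a real decoder would guard it. *)
let rec decode : type a. a t -> string -> int -> (a * int) option =
 fun desc s pos ->
  match desc with
  | Int ->
    let stop = ref pos in
    while !stop < String.length s && s.[!stop] >= '0' && s.[!stop] <= '9' do
      incr stop
    done;
    if !stop = pos then None
    else Some (int_of_string (String.sub s pos (!stop - pos)), !stop)
  | Const c ->
    let len = String.length c in
    if pos + len <= String.length s && String.sub s pos len = c
    then Some ((), pos + len)
    else None
  | Pair (a, b) -> (
    match decode a s pos with
    | None -> None
    | Some (x, pos) -> (
      match decode b s pos with
      | None -> None
      | Some (y, pos) -> Some ((x, y), pos)))

let to_string desc v =
  let buf = Buffer.create 16 in
  encode desc buf v;
  Buffer.contents buf

(* Succeeds only if the whole input is consumed. *)
let of_string desc s =
  match decode desc s 0 with
  | Some (v, pos) when pos = String.length s -> Some v
  | _ -> None
```

For example, Pair (Int, Pair (Const " ", Int)) describes two integers separated by a space, and the same value serves to_string and of_string without any functor application.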

Then, the internal encoder used to serialise an OCaml value was too complex and it relied on a trick with bigarray. It appeared to me that it was not so good, so I decided to simplify the encoder and I provided something much easier to maintain and use.

This way, I deleted the C stubs, and encore was the only dependency of ocaml-git which required C stubs. So, now, users are free to use ocaml-git/Irmin in a web browser, as CueKeeper does!

Next things about ocaml-git

So all this work does not change much for the end-user or Irmin. However, from what Hannes told me when he tried the new version with his unikernels:

  • We are faster (thanks to Decompress)
  • We use less memory

It's difficult to really explain why, and whether these points come from what I did - we can talk about the new GC strategy, Decompress, the new strategy given by carton to process a PACK file, etc. At this level, it's hard to really understand which layer made the difference (maybe all of them).

But the real upgrade is for me! I have been thinking about shallow and garbage collection in ocaml-git for a long time. But, for that, I needed a cleaner playground where I don't need to care about details such as the protocol, the PACK format and the intrinsic dependencies between all of this logic.

So it's mostly a way to pave my way to implement shallow (partial git clone) and a proper garbage collector between 2 different heaps (minor-heap which stores loose objects and major-heap which stores PACK files). So we will see if I can finish these tasks :p.

My Blog, Pasteur, my MirageOS ecosystem

A good way to test and see that everything works is to upgrade my blog and some other services such as my primary DNS server or pasteur. And, as you can see, IT WORKS!

More concretely, due to the renaming of Tuyau into Conduit, I had an incompatibility between my new version of Conduit and the old one, which Git still used at the time. So it was impossible for Tuyau and the old version of Conduit to coexist, since both wanted to use the same name: Conduit.

I decided to upgrade the whole stack, at every layer:

  • from the mirage-tcpip implementation
  • to my HTTP/AF server Paf
  • with ocaml-tls
  • including the way to synchronise an Irmin store
  • over SSH
  • including Cohttp

All of this work is done in one Git repository:

https://github.com/dinosaure/conduit-dev

It's an OPAM repository which includes slightly modified versions of all packages.

From that, I was able to COMPILE my unikernels and start to really use the letsencrypt unikernel with my primary DNS unikernel to load Let's Encrypt TLS certificates. I also took the opportunity to only use SSH and HTTPS (even inside my private network).

And finally, after some bugs, some weird behaviours, some API upgrades and a ban from Let's Encrypt because I tried hard to deploy my unikernels, pasteur is up:

https://paste.x25519.net/

Conclusion

It's a bit frustrating to see that all of these updates don't change a lot for the end-user; the patch is not huge in the end, but I think it was needed to deeply upgrade the stack. Several people started to complain about Conduit and I started to have some regrets about some decisions when looking at my stack.

I think it's about our responsibilities to lean the MirageOS ecosystem. Of course, we can say that we have something else to do which is more interesting than rewriting a pretty old project, but I don't want to have regrets about what I did in the MirageOS ecosystem. So, I still keep a global view of it in mind and I tried to do my best to simplify (a bit) the life of fellow unikernel developers (I hope).

Of course, I learned a lot too when I walked across all of these libraries. But I am starting to think that we now have our own Tower of Babel!

Finally, this article convinced me to write about and explain how to properly deploy a unikernel. I am starting to really understand all the points. So, next time will be about the deployment of Pasteur!

Release cycle about SMTP stack

<2020-03-31>

If you follow my work a bit, you should know about a huge piece of work started a few months (years?!) ago on the SMTP stack. As a MirageOS developer, I mostly want to use it to replace some usual services such as a DNS resolver, a blog or a primary DNS service.

But I really would like to replace an old but widely used service, the email service.

Mr. MIME at the beginning

One of my biggest projects is mrmime. It's a little library to parse and generate an email according to several RFCs. The most difficult part was to handle encoding and multipart.

This library wants to solve 2 simple problems:

  • How to read/introspect an email
  • How to generate an email with OCaml

How to read everything!

An email is easily understandable by a human as a rich document, but it can be hard for a computer to extract useful information from it. Indeed, parts of an email can be really complex, such as an RFC822 date or, more obviously, who should receive the email.

Mr. MIME wants to solve this first problem and it provides an angstrom parser to extract metadata and represent them as OCaml values. Then, the user is able to introspect them and implement something like a filter, an organizer, etc.

FWS and unstrctrd

The main problem with email is the folding whitespace. It permits the user to extend the value of a field over multiple lines, such as:

To: A Group(Some people)
   :Chris Jones <c@(Chris's host.)public.example>,
     joe@example.org,
 John <jdoe@one.test> (my dear friend); (the end of the group)

As long as the next line starts with a whitespace, it's part of the current value. At first, I tried to parse it with ocamllex but I failed when I got an "automaton too big" error. Then, I switched to angstrom but I was not really happy with the results.
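The rule itself is small. As a didactic sketch (it is not how unstrctrd is implemented, and unfold is just a name chosen here), unfolding as RFC 5322 describes it means dropping every CRLF which is immediately followed by a space or a tab:

```ocaml
(* Unfold a header value as RFC 5322 describes it: drop every CRLF that
   is immediately followed by a space or a tab. A didactic sketch, not
   how unstrctrd is implemented. *)
let unfold s =
  let len = String.length s in
  let buf = Buffer.create len in
  let is_wsp c = c = ' ' || c = '\t' in
  let rec go i =
    if i >= len then ()
    else if
      i + 2 < len && s.[i] = '\r' && s.[i + 1] = '\n' && is_wsp s.[i + 2]
    then go (i + 2) (* skip the CRLF, keep the whitespace itself *)
    else (
      Buffer.add_char buf s.[i];
      go (i + 1))
  in
  go 0;
  Buffer.contents buf
```

The hard part is everything around this rule (comments, quoted strings, the obsolete forms), which is why a dedicated library makes sense.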

Recently, with @let-def, we agreed that an ocamllex version is still possible. In the end, it should be faster than angstrom. So we made unstrctrd. The project is a bit more general than emails. In fact, some formats such as those used by Debian or HTTP/1.1 headers follow the same rule. unstrctrd wants to flatten this kind of value.

In detail, unstrctrd is a nice mix between ocamllex and angstrom.

With this project, we handle FWS as described by RFC5322 and the obsolete form described by RFC822. It does some post-processing (for example, it removes useless comments as described by CFWS) and provides a well-abstracted API to parse and construct an unstructured form.

To understand the tower of Babel, any value available in your email, such as the date, email addresses or the subject, should respect, at least, the unstructured form. Any of them will be processed, at least, by unstrctrd.

The goal is to hide such complexity in a lower layer. In fact, before this library, 2 of my libraries tried to solve this problem:

  • emile to parse email addresses
  • and of course, mrmime

To provide libraries that are as light as possible, we made unstrctrd and deleted the FWS handler from emile. As a result, mrmime depends on both to properly parse email addresses.

An email address and emile

As you know, we use email addresses widely but their format is really complex. We can put more than one domain in them, for example (like <@gmail.com:romain.calascibetta@x25519.net>), put a name (which must respect a special format), or use special characters such as + or spaces with a quoted-string. A domain can directly be an IPv4 or IPv6 address or an extensible domain locally specified by the SMTP server. You can of course use UTF-8 since RFC6532.

In other words, email addresses are hard to parse.

With unstrctrd, we simplified the library a bit and emile no longer handles FWS. The goal is to let the user process the input with unstrctrd first if the input comes from an email, and then parse the result with emile to extract the email address - and this is what mrmime does, of course.

And, of course, the usual user does not care about folding whitespace. The input usually comes from a form (so, without such a token), so emile wants to provide the easiest (and correct) way to parse an email address.

UTF-8, UTF-7, latin1 or YUSCII?

Another obvious problem with email is the encoding used. We sometimes talk about charset but let's keep encoding. Of course, we have several encodings such as ISO-8859. After a nice discussion with @dbuenzli over beers, the most interesting way to solve the problem of encoding is to arbitrarily choose one and keep it as long as we can.

So, like the uutf author, we chose UTF-8 of course!

However, we need to provide a way to normalize any encoding to UTF-8. The Unicode consortium provides some translation tables and I picked them to translate some encodings to UTF-8. A few projects were made toward this goal, and all of them are merged into another library: rosetta.

This library is used by mrmime to try to normalize any content to UTF-8. The user does not need to know all the details. The result is simply: any content provided by mrmime uses UTF-8!

Another OCaml project exists to handle such things: Camomile. But rosetta wants to be as easy and simple as possible.

Base64 & Quoted-Printable

mrmime still wants to be low-level. Even if it extracts contents, it does not handle the format of contents (this feature should be provided by another new project, conan - but we will talk about it in another article).

However, RFC2045 defines some standalone formats independently of the type of the content. The best known is base64, used to encode binary or large files into your email. It's when I discovered that email has its own base64 format that I decided to deeply update the package and provide a decoder for this special format.

On the other hand, RFC2045 describes another format: the quoted-printable format. At that time, it was not possible to safely send a UTF-8 email. We were still constrained to encode each byte of our email in 7 bits. To be able to pass 8-bit values, quoted-printable was designed to encode such bytes into a special form.

From that, we made the library pecu which is able to encode and decode such contents. This library was well tested with a fuzzer, as we usually do to check the isomorphism between encoder and decoder.
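To show what the "special form" looks like, here is a minimal quoted-printable codec in the spirit of RFC 2045. It is not pecu's API (qp_encode and qp_decode are names chosen for the sketch) and it deliberately ignores the 76-column limit and the rules about trailing whitespace and line breaks, which a real implementation has to handle:

```ocaml
(* Minimal quoted-printable codec in the spirit of RFC 2045. It ignores
   the 76-column limit and the trailing-whitespace rules, and it encodes
   CR and LF like any other byte. *)

(* Printable ASCII except '=' is kept; any other byte becomes =XX. *)
let qp_encode s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if c <> '=' && c >= ' ' && c <= '~' then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "=%02X" (Char.code c)))
    s;
  Buffer.contents buf

let hex = function
  | '0' .. '9' as c -> Some (Char.code c - Char.code '0')
  | 'A' .. 'F' as c -> Some (Char.code c - Char.code 'A' + 10)
  | 'a' .. 'f' as c -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None

(* Returns None on a malformed or truncated escape sequence. *)
let qp_decode s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then Some (Buffer.contents buf)
    else if s.[i] <> '=' then (
      Buffer.add_char buf s.[i];
      go (i + 1))
    else if i + 2 < len && s.[i + 1] = '\r' && s.[i + 2] = '\n' then
      go (i + 3) (* "=" followed by CRLF is a soft line break *)
    else if i + 2 >= len then None
    else
      match (hex s.[i + 1], hex s.[i + 2]) with
      | Some h, Some l ->
        Buffer.add_char buf (Char.chr ((h * 16) + l));
        go (i + 3)
      | _ -> None
  in
  go 0
```

The property a fuzzer would check on such a pair is simply that qp_decode (qp_encode s) = Some s for any input s.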

Some other formats exist and were created specially for emails, such as the flowed format, but they should be handled by other libraries.

How to generate everything

The major feature of mrmime is not really all of these libraries used to parse an email. Indeed, mrmime was able to introspect emails from the beginning (on that, you can look at an old conference talk about it). The notable update is the safe way to emit an email.

Indeed, a lot of work was done on the API to properly emit an email and respect, as much as we can, rules such as:

  • folding-whitespace
  • 80-columns rule
  • base64 and quoted-printable encoding
  • multipart

From that, I think we provide a nice interface to construct and emit an email. The generation of an email address, for example, is pretty close to what we expect:

let me = Local.[ w "romain"; w "calascibetta" ] @ Domain.(domain, [ a "x25519"; a "net" ]) ;;

Composition with parts is also nice:

let content_type_alternative =
  let open Content_type in
  (with_type `Multipart <.> with_subtype (`Iana "alternative")) default

let header =
  Header.empty
  |> Header.add Field_name.content_type (Field.Content, content_type_alternative)

let part0 = Mt.part (stream_of_string "Hello World!")
let part1 = Mt.part (stream_of_string "Salut le monde!")
let m0 = Mt.multipart ~header [ part0; part1; ] |> Mt.multipart_as_part

let m1 = Mt.part stream_of_file
let m = Mt.(make multi (multipart [ m0; m1; ]))

Then, mrmime handles the 80-column rule: when it reaches the limit, it tries to break the value with a FWS token where that is permitted, such as:

To: thomas@gazagnaire.org, anil@recoil.org, hannes@mehnert.org, gemma.t.gordon@gmail.com

becomes:

To: thomas@gazagnaire.org, anil@recoil.org, hannes@mehnert.org,
  gemma.t.gordon@gmail.com
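The breaking step can be sketched as a greedy algorithm. This is only an approximation under my own assumptions: it breaks at any space, starts continuation lines with a single space and uses a hypothetical fold function, whereas mrmime's real encoder knows where a break is actually permitted:

```ocaml
(* Greedy folding of a header line: break at spaces so that no line
   exceeds [limit] columns, and start continuation lines with a space.
   A sketch only; a word longer than [limit] is left as it is. *)
let fold ?(limit = 80) line =
  let words = String.split_on_char ' ' line in
  let buf = Buffer.create (String.length line + 16) in
  let column = ref 0 in
  List.iteri
    (fun i word ->
      let w = String.length word in
      if i = 0 then (
        Buffer.add_string buf word;
        column := w)
      else if !column + 1 + w > limit then begin
        (* CRLF followed by a whitespace is the FWS token. *)
        Buffer.add_string buf "\r\n ";
        Buffer.add_string buf word;
        column := 1 + w
      end
      else begin
        Buffer.add_char buf ' ';
        Buffer.add_string buf word;
        column := !column + 1 + w
      end)
    words;
  Buffer.contents buf
```

Since the break replaces a space by CRLF plus a space, unfolding the result gives back the original line.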

SMTP then!

Of course, even if some people are really interested in mrmime mostly to pave the way to yet another email client (in OCaml!), my goal is a bit offbeat. So I mostly focused on the implementation of an SMTP server.

The first notable library is colombe - a low-level implementation of the SMTP protocol.

How to describe a state machine?!

The real goal of colombe is to provide an API which lets the user describe a state machine to communicate with a peer. Because of this, colombe does not want to implement the sendmail command, an SMTP relay or an SMTP submission service.

It is the first stone to be able to easily create such programs/libraries.

So most people should not care about colombe - as they mostly want to send an email. However, a client such as sendmail and a server such as an SMTP submission service should use the same ground and avoid duplicating an implementation of how to talk SMTP with a peer.

Another point is the possibility to use colombe with MirageOS - and make a unikernel with it. For that, we started to use another kind of abstraction of I/O (such as LWT or ASYNC - or Unix) which uses fewer functors than we usually do with MirageOS.

But the really good point of colombe is the ability to describe the state machine with a monad which provides high-level recv and send operations:

let properly_quit_and_fail ctx err =
  let* _txts = send ctx QUIT () >>= fun () -> recv ctx PP_221 in
  fail err

let authentication ctx username password =
  let* code, txts = send ctx AUTH PLAIN >>= fun () -> recv ctx CODE in
  match code with
  | 504 -> properly_quit_and_fail ctx `Unsupported_mechanism
  | 538 -> properly_quit_and_fail ctx `Encryption_required
  | 534 -> properly_quit_and_fail ctx `Weak_mechanism
  | 334 ->
    let* () = match txts with
      | [] ->
        let payload = Base64.encode_exn (Fmt.strf "\000%s\000%s" username password) in
        send ctx PAYLOAD payload
      | x :: _ ->
        let x = Base64.decode_exn x in
        let payload = Base64.encode_exn (Fmt.strf "%s\000%s\000%s" x username password) in
        send ctx PAYLOAD payload in
    ( recv ctx CODE >>= function
        | (235, _txts) -> return `Authenticated
        | (501, _txts) -> properly_quit_and_fail ctx `Authentication_rejected
        | (535, _txts) -> properly_quit_and_fail ctx `Authentication_failed
        | (code, txts) -> fail (`Unexpected_response (code, txts)) )
  | code -> fail (`Unexpected_response (code, txts))

As you can see, we use monadic operators to simplify the reading of the code. send and recv take values described by the user with a GADT:

type 'x send =
  | QUIT : unit send
  | AUTH : auth send
  | PAYLOAD : string send

type 'x recv =
  | PP_220 : string list recv
  | PP_221 : string list recv
  | CODE : (int * string list) recv

Then, the user just needs to describe how to process such commands with a given ctx:

  • how to send an 'x send to the ctx
  • how to recv an 'x recv from the ctx

Of course, this is where colombe comes in. It already defines a few primitives to emit and parse such commands in the ctx.

At another layer (which needs syscalls), the composition of the ctx and a fiber (like authentication) returns a process t such as:

type ('a, 'err) t =
  | Read   of { buffer : bytes
              ; off : int
              ; len : int
              ; k : int -> ('a, 'err) t }
  | Write  of { buffer : string
              ; off : int
              ; len : int
              ; k : int -> ('a, 'err) t }
  | Return of 'a
  | Fail   of 'err

let run socket username password =
  let ctx = Context.create () in
  let fiber = authentication ctx username password in

  let rec go = function
    | Read { buffer; off; len; k; } ->
      let len = Unix.read socket buffer off len in
      go (k len)
    | Write { buffer; off; len; k; } ->
      let len = Unix.write_substring socket buffer off len in
      go (k len)
    | Return v -> Ok v
    | Fail err -> Error err in
  go fiber

And you have a fully implemented way, available on Unix, to communicate with an SMTP peer - and be authenticated.
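A nice consequence of this design is that the same process can be run without any syscall at all, which is handy for tests. The sketch below re-declares the process type and interprets it against an in-memory string; quit and run_in_memory are hypothetical helpers written for this example (they are not part of colombe), and the little fiber assumes that a write is never partial:

```ocaml
(* The process type, re-declared to keep the example self-contained. *)
type ('a, 'err) t =
  | Read of { buffer : bytes; off : int; len : int; k : int -> ('a, 'err) t }
  | Write of { buffer : string; off : int; len : int; k : int -> ('a, 'err) t }
  | Return of 'a
  | Fail of 'err

(* A tiny hand-written process: send QUIT, read the reply and check
   that it starts with the 221 code. It assumes a complete write. *)
let quit () =
  let reply = Bytes.create 64 in
  let cmd = "QUIT\r\n" in
  Write
    { buffer = cmd; off = 0; len = String.length cmd;
      k = (fun _written ->
        Read
          { buffer = reply; off = 0; len = Bytes.length reply;
            k = (fun n ->
              let line = Bytes.sub_string reply 0 n in
              if String.length line >= 3 && String.sub line 0 3 = "221"
              then Return ()
              else Fail (`Unexpected line)) }) }

(* Interpreter without syscalls: [input] plays the role of the peer and
   everything written by the process is collected and returned. *)
let run_in_memory input m =
  let out = Buffer.create 64 in
  let pos = ref 0 in
  let rec go = function
    | Read { buffer; off; len; k } ->
      let n = min len (String.length input - !pos) in
      Bytes.blit_string input !pos buffer off n;
      pos := !pos + n;
      go (k n)
    | Write { buffer; off; len; k } ->
      Buffer.add_substring out buffer off len;
      go (k len)
    | Return v -> Ok v
    | Fail err -> Error err
  in
  let res = go m in
  (res, Buffer.contents out)
```

Only the interpreter changes between Unix, Lwt, Async or a test; the description of the state machine stays the same.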

Implement sendmail, the client side

At least, from this core, it should be easy to implement the sendmail command. And of course, the distribution of colombe provides such libraries:

  • sendmail which is independent of lwt, async or unix
  • sendmail.tls which uses STARTTLS
  • sendmail-lwt a specialisation of sendmail with lwt

All of them want to provide the easiest way to send an email. Indeed, there are 2 ways to submit an email:

  • over a TLS flow available on *:465
  • over a simple TCP flow, but most servers require starting a TLS flow on the fly on *:587 with STARTTLS

facteur

From all of that, we developed a little proof-of-concept to see if colombe and sendmail correspond to what we expect: facteur.

This is a simple tool which sends an email like the sendmail command, but the complete stack is in OCaml! It's a merge of mrmime and sendmail to produce a well-formed email with file attachments.

It is still experimental software and it requires a bad dependency, libmagic, to recognise the MIME type of file attachments. However, I started to implement something else, conan, to do this job automatically and be MirageOS compatible.

Server side

Finally, I started to implement the server side. colombe handles both sides. It can parse responses and emit requests, and vice versa. From the same ground, we try to implement 2 servers in a single project: ptt.

It provides two libraries:

  • lipap which is an SMTP submission server
  • mti-gf which is an SMTP relay server

The final goal of them is to provide a full stack to create email addresses for a given domain. An example may be more interesting; we will take my x25519.net.

We will provide a first SMTP relay which will receive any incoming emails. It will be the server notified by my primary DNS server with the MX record.

$ dig +short MX x25519.net
0 163.172.65.89

The goal of it is to transmit incoming emails to their real destination. For example, you want to send an email to romain@x25519.net from your gmail.com address. Google will speak with this server. Internally, I associated romain@x25519.net with romain.calascibetta@gmail.com. Finally, mti-gf will retransmit your email to Google (to romain.calascibetta@gmail.com).

The second server lets us use our x25519.net email address to send emails. The goal is to properly configure your MUA to authenticate to our lipap server. Then, it is able to communicate with other SMTP servers such as Google and send your email to them (with your x25519.net address).

So from my experiments, everything should work and I started to deploy some other unikernels, mostly to automatically get Let's Encrypt certificates - and provide 163.172.65.89:587 and 163.172.65.89:465.

Other projects

Along the way, I surely developed some other tools (which need an update to the new interfaces or are really experimental) such as:

  • ocaml-dkim to verify DKIM fields from an email
  • received to generate a graph from Received: fields from an email or generate one of them

Conclusion

The stack is huge and it is not really finished. But I believe that I reached a point where all libraries compose nicely and let me provide something much more complex such as an SMTP server!

All of that is possible of course with the work from others peoples such as ocaml-tls or, more generally, MirageOS people.

I believe that this year will be the year where such service will be exist as a MirageOS unikernel! And may be do an anarchist revolution and do a self re-appropriation of the means of production.

Eq(af), timing attack!

<2020-03-14>

The MirageOS project cares about security issues. This goal has been strong since the beginning, where the idea of a small unikernel ensures (by intuition) a small attack surface. In this way, we want to follow, as closely as we can, improvements in security stacks such as TLS.

Of course, we are not a huge team and some of us don't have a strong knowledge of security. This topic is highly complex and it's easy to think that we are secure when, in fact, we are not. However, it gives us an opportunity to learn and improve what we can about this topic, and to keep trying to provide the best we can.

This article presents a little project which aims to solve a security issue, the timing attack. We will see what we did for this purpose, at the beginning and more recently, to be able to improve mirage-crypto.

A timing attack!

It's clearly not a usual attack for me and I do not (yet!) understand how this side-channel attack can be used against a complex black box such as an SMTP service. However, the first lesson about security is to accept that when you can imagine an attack (even if, with your technical means, it's not possible), someone else in the world is able to carry it out.

The point is not to figure out how this kind of attack is possible but to assume that this attack is possible.

A timing attack uses time to infer data such as a password. The idea is simple; take this simple equal function:

let equal s1 s2 =
  let res = ref true in
  let idx = ref 0 in
  if String.length s1 <> String.length s2
  then false
  else
    ( while !idx < String.length s1 && !res
      do res := s1.[!idx] = s2.[!idx] ; incr idx done ; !res )

If we want to compare the password given by the user with the password stored in our database (or their hashes), we will use this function to allow the user to enter a secured area.

However, we can see that the time spent by equal depends on given inputs s1 and s2.

The worst case.

Imagine that the atomic operation s1.[!idx] (or s2.[!idx]) takes 1 second (think of 1 CPU tick). So, for each iteration of our loop, we spend 2 seconds as long as !res is still true. That means that when we meet 2 different bytes, we leave the loop [1].

Now, imagine we have these values:

# equal "aabb" "aaaa" ;;

We can easily infer that this function will spend 6 seconds (2 seconds for the first characters, 2 seconds for the second characters, 2 seconds for the third characters, and we leave the loop). Now consider equal values such as:

# equal "toto" "toto" ;;

We will spend 8 seconds (and return true). The time needed to compute the equal function depends on the given inputs. So if we observe the time needed to authenticate on a login page, we can infer the values given to the equal function.
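The arithmetic above can be replayed with a small sketch. The function equal_with_cost is a hypothetical variant of equal which, besides the result, returns the simulated time under the assumption of 2 seconds per iteration.

```ocaml
(* Same loop as [equal], but it also counts the iterations performed. *)
let equal_with_cost s1 s2 =
  if String.length s1 <> String.length s2 then (false, 0)
  else begin
    let res = ref true and idx = ref 0 in
    while !idx < String.length s1 && !res do
      res := s1.[!idx] = s2.[!idx];
      incr idx
    done;
    (* 2 "seconds" per iteration: one access to s1 and one access to s2 *)
    (!res, 2 * !idx)
  end
```

With this model, equal_with_cost "aabb" "aaaa" gives (false, 6) and equal_with_cost "toto" "toto" gives (true, 8), which matches the two cases discussed above.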

Finally, from that fact, imagine that s1 is the password we give and s2 is the password stored in the database: just from the time, we can infer how close we are to the stored password.

A smart brute-force attack.

So now we can imagine a tool which tries all possibilities. It records the time spent for each random input. Then, when the time spent for an input t0 is lower than the time spent for an input t1, we can infer that our equal function went a bit further when it scanned t1 than when it scanned t0.

From this fact, we can infer that t1 is closer to the expected password than t0, and we can redo the same operation with t1 as the new base, thereby discarding all the remaining (and useless) possibilities based on t0.
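The strategy can be sketched against a simulated oracle. This is not the attack tool shipped with Eq(af): leaky_equal, alphabet and recover are hypothetical names, the "time" is replaced by the number of bytes scanned, and we assume that the attacker knows the length of the secret and that it only contains lowercase letters.

```ocaml
(* Leaky comparison that also reports how many bytes it scanned; the count
   stands in for the time an attacker would measure. *)
let leaky_equal secret guess =
  if String.length secret <> String.length guess then (false, 0)
  else begin
    let res = ref true and idx = ref 0 in
    while !idx < String.length secret && !res do
      res := secret.[!idx] = guess.[!idx];
      incr idx
    done;
    (!res, !idx)
  end

(* Assumption: the secret only uses these characters. *)
let alphabet = "abcdefghijklmnopqrstuvwxyz"

(* Recovers the secret position by position: at each position, keep the
   character for which the oracle scanned the furthest. *)
let recover ~oracle ~len =
  let guess = Bytes.make len alphabet.[0] in
  let found = ref false in
  let pos = ref 0 in
  while (not !found) && !pos < len do
    let best = ref alphabet.[0] and best_cost = ref (-1) in
    String.iter
      (fun c ->
        if not !found then begin
          Bytes.set guess !pos c;
          let ok, cost = oracle (Bytes.to_string guess) in
          if ok then found := true
          else if cost > !best_cost then begin
            best := c;
            best_cost := cost
          end
        end)
      alphabet;
    if not !found then Bytes.set guess !pos !best;
    incr pos
  done;
  if !found then Some (Bytes.to_string guess) else None
```

Under these assumptions, recover needs at most 26 calls per position instead of trying all 26^len candidates. With real timings, each measurement is noisy and would have to be repeated many times.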

Eq(af) gives you an example of this attack. Of course, we plugged a Unix.sleep into our operations to be able to see a real difference when we call our equal function. The result is pretty good:

$ dune exec attack/attack.exe
Random: [|253;164;168;66;47;219;88;152;128;242;216;123;|].
7c8ceadc51d33cadc97cce73fc7c86a1
7c8ceadc51d33cadc97cce73fc7c86a1

The first line is the seed given to the random generator, to be able to replicate the result. The second line is the expected hash, and the third line is the hash found only by observing time. This tool does not work all the time!

Too deep to be dangerous?

Of course, in the real world, our equal function is much, much faster than a few seconds. However, even if it can be hard to track how long equal takes, it's not impossible. I mean, it's a bit hard, but a way exists to know how long any function takes.

This way is the micro-benchmark! Of course, when you want to compare the speed of your functions, you measure time! So if we usually do that to check which function is faster than the other, we can apply the idea of the benchmark to two calls of the same equal function with different inputs.

Eq(af) does the test for you

So our distribution comes with a little benchmark tool which measures how long equal takes when we give it 2 different values and 2 equal values. Then, we extrapolate results.

We run this function 1 time, 2 times, 3 times, … up to 3000 times and we record the time for each run. From that, we are able to plot a curve. The expected results are:

  • If our curve is a line, that means that each run takes the same time whatever the inputs.
    Figure 1: A good curve (eqaf_01.png)
  • If our curve is not exactly a line, that means that the time spent depends on the inputs.
    Figure 2: A bad curve (eqaf_02.png)

So from this curve, we apply a linear regression to see whether it's a line or not. From that, we get R², the coefficient of determination of the regression, which tells us whether it's a true line (R² >= 0.99) or not.

As you can see, we have some noise (some points are not strictly aligned) because of the inherent volatility of time measurements. That is why we need a linear regression: to remove as much of this noise as possible. Another factor is the context in which you execute your function: the scheduler can decide to stop our equal function and do something else (while the clock of our equal function keeps running). In the end, it is still hard to track time.
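The measure-then-regress procedure can be sketched as follows. This is not the benchmark tool of Eq(af): time_runs, samples and r_squared are hypothetical helpers, Sys.time (processor time) is used as a coarse clock, and R² is computed as the squared correlation between x and y, which is what the coefficient of determination reduces to for a simple linear regression.

```ocaml
(* Runs [f] [n] times and returns the processor time spent, in seconds. *)
let time_runs f n =
  let t0 = Sys.time () in
  for _i = 1 to n do
    ignore (Sys.opaque_identity (f ()))
  done;
  Sys.time () -. t0

(* One sample (n, time) for n = 1 .. max runs. *)
let samples f ~max =
  List.init max (fun i -> (float_of_int (i + 1), time_runs f (i + 1)))

(* R² of the least-squares line through [points]. The result is nan when
   all x (or all y) are identical, since the variance is then zero. *)
let r_squared points =
  let n = float_of_int (List.length points) in
  let sum f = List.fold_left (fun acc p -> acc +. f p) 0. points in
  let sx = sum fst and sy = sum snd in
  let sxx = sum (fun (x, _) -> x *. x) in
  let syy = sum (fun (_, y) -> y *. y) in
  let sxy = sum (fun (x, y) -> x *. y) in
  let cov = (n *. sxy) -. (sx *. sy) in
  let var_x = (n *. sxx) -. (sx *. sx) in
  let var_y = (n *. syy) -. (sy *. sy) in
  cov *. cov /. (var_x *. var_y)
```

A call such as r_squared (samples (fun () -> equal "toto" "toto") ~max:3000) would then be compared against the 0.99 threshold, keeping in mind that Sys.time is probably too coarse for a serious measurement.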

How to fix it?

The easiest way to avoid this side-channel attack is to spend exactly the same time for any inputs. In other words, even if we encounter different bytes, we continue to scan the inputs.

let equal s1 s2 =
  if String.length s1 <> String.length s2 then false
  else
    ( let res = ref true in
      for idx = 0 to String.length s1 - 1
      do res := s1.[idx] = s2.[idx] && !res done ;
      !res )

Into the OCaml side!

However, in the OCaml world, the deal can be much more complex than what we can imagine. Even if it's easy to map OCaml code to assembly code, the compiler can emit some jumps and produce a non constant-time (or non branch-less) function in the end.

For example, in our example, merely using None and Some instead of false and true in our code implies a call to the garbage collector, and a jump in the emitted assembly. This case appears when we want to implement compare and keep somewhere (in an int option) the subtraction of the differing bytes during the loop.

In Eq(af), we did a deep inspection of the generated assembly to ensure that we don't have any jump and that every assembly instruction is executed regardless of the inputs. Only from that can we say that our function is constant-time [2].
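A common way to avoid a data-dependent boolean in the loop is to accumulate the differences with bitwise operations. The sketch below is not the code of Eq(af); equal_xor is a hypothetical name, and whether the compiler really emits branch-free code for it still has to be checked on the generated assembly, as explained above.

```ocaml
(* Accumulates the XOR of every pair of bytes: the accumulator stays at 0
   only if all bytes are equal. The loop never exits early and does not use
   [&&], so no decision inside the loop depends on the contents. *)
let equal_xor s1 s2 =
  if String.length s1 <> String.length s2 then false
  else begin
    let acc = ref 0 in
    for i = 0 to String.length s1 - 1 do
      acc := !acc lor (Char.code s1.[i] lxor Char.code s2.[i])
    done;
    !acc = 0
  end
```

The length check still returns early, so the length of the inputs remains observable.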

Conclusion

Eq(af) provides a quite simple function and anybody can re-implement it. But it comes with tools which check our assumption: the time spent cannot leak any information. Of course, as we showed, it can be hard to check that.

We have different tools to check/prove that. At this time, we only do a check, but we can go further by inspecting the emitted code with another tool to see whether it is surely branch-less.

In the end, Eq(af) is used by some people who are aware of security issues, and that's why we decided to use it by default in digestif, to protect all users from this side-channel attack.

Footnotes:

[1] In C, the case appears when we return false directly, which is a jump and leaves the loop as we do in OCaml.

[2] constant-time is not a good word since it can have another meaning when we talk about complexity and algorithms. And it's not really true, since the time still depends on the length of the given input.

Tuyau, the next conduit

<2020-02-27>

If you look into the MirageOS ecosystem, you have probably already seen conduit as a library used by many other projects such as cohttp. However, even if it is used by these projects, at this time, nobody can really explain the goal of Conduit.

Conduit wants to solve 2 problems:

At this stage, it's mostly a pain to use Conduit, for several reasons. One of them is the lack of documentation. Conduit still exists because people copy/paste pieces of code available in some projects.

However, nobody understands how Conduit resolves your URI and gives you a way to communicate with your peer, how to extend it, how to trace it and, finally, how to use it.

From that, one year ago (at the MirageOS retreat), we decided to make a new version of Conduit: Tuyau (the French word for a pipe). Of course, we don't want to repeat the errors of the past. This article describes Conduit and, in this way, Tuyau.

It can be a good opportunity to see some strange OCaml things!

Start a transmission

In many projects, we want to start a transmission with a peer: we would like to communicate with it. However, we don't want to handle by hand the details of starting this transmission. We can take an easy example with ocaml-git.

When we want to push/pull to another peer, we have 4 possibilities:

  • Use the Smart (Git) protocol directly over TCP/IP. This is what happens when you do: git clone git://host/repo
  • Use the Smart (Git) protocol over SSH. This is the usual case when you do: git clone git@host:repo
  • Use the Smart (Git) protocol over HTTP. This is when you do: git clone http://host/repo
  • The final case is over HTTP + TLS or, in other words, HTTPS: git clone https://host/repo

For all of these cases, we systematically use the same Smart protocol to communicate with a peer [1]. So we should abstract all of these cases behind something like a common interface.

Another aspect is from the point of view of the maintainer of Git:

  • we don't want to depend on all of these protocols
  • it's sane not to be aware of the underlying implementation

The first point is really important. Git is only about Git and we should not depend on a specific implementation of HTTP or a specific implementation of SSH. The current version of Git chose to use Curl to be able to communicate over the HTTP protocol. We should be able to abstract that away in OCaml and let the user choose which implementation of HTTP they want.

The best is to start a transmission and let the user, at another layer, feed something which aggregates implementations of protocols. In this way, we can let the user feed Tuyau with only an SSH implementation and thereby ensure that Git will start a transmission only with SSH.

The second point is not really valid when we consider some security aspects. As a maintainer, we would like to be able to enforce a transmission over TLS, for example. But we will see later how we can solve that in Tuyau.

Finally, we want something like:

val resolve : Tuyau.t -> uri -> Unix.socket

Where Unix.socket is already connected to our peer. Then, we can start to Unix.read and Unix.write on the given socket and speak the Smart protocol with our peer.

Tuyau.t globally represents our possibilities (our available protocols). The user should depend at least on that, but it does not imply a dependency on the implementations of the available protocols.

A transmission, a protocol or a flow

The first bad point of Conduit is that the terms it uses are not really defined. A transmission, a protocol or a flow are not very clear, and we cannot strictly define their purpose with Conduit.

Tuyau wants to be clear about these words and gives us a true definition of them. From now on, we use them as Tuyau defines them.

A Protocol

A communication protocol is a system of rules that allows entities to transmit information. In the case of Tuyau, this kind of information must not be arbitrary. The protocol should only solve communication problems such as routing.

When we talk about a protocol, it's only about a standard which is able to transmit a payload. Interpretation of the payload is not done by the protocol but by the user of this library.

For example, the Transmission Control Protocol (TCP) is a protocol according to Tuyau because it is able to transmit payload without interpreting it. A counter example is the Simple Mail Transfer Protocol (SMTP) which gives an interpretation of the payload (such as EHLO which is different to QUIT).

This difference is important to unlock the ability to compose protocols. Another protocol according to Tuyau is Transport Layer Security (TLS), which aims to provide privacy and data integrity. Tuyau is able to compose protocols together, like TCP ∘ TLS, to make a new protocol. From this composition, the user is able to implement Secure Simple Mail Transfer Protocol (SSMTP) or HyperText Transfer Protocol Secure (HTTPS): both use TCP and TLS.

A FLOW

To be able to do this composition, the protocol must respect (at least) an interface: the FLOW interface. It defines an abstract type t and functions like recv or send. These functions give us the payload. Rules to solve communication problems are already processed internally.

In other terms, from a given FLOW, the user should not handle routing, privacy or data integrity (or any other such problems). The user should only be able to process the payload.

Finally, the representation of a TCP protocol is a FLOW. The VCHAN protocol or the User Datagram Protocol (UDP) can be represented as a FLOW too. However, TLS is not a flow as is, but a layer on top of another protocol/FLOW. Composition with it should look like:

val with_tls : (module FLOW) -> (module FLOW)

From a given FLOW, we wrap it with TLS and return a new FLOW. Such a composition exists also for WireGuard or Noise layers. Tuyau wants to solve this composition by a strict OCaml interface of the FLOW.

About Conduit

These ideas already exist with Conduit_mirage.Flow and Conduit_mirage.with_tls. However, 2 problems appear:

  • extension of implementations
  • composition with user-defined FLOW

Currently, Conduit delimits implementations with the polymorphic variants Conduit.{client,server}. We should not blame it for that, since extensible variants only appeared in OCaml 4.02.

Abstract! Abstract everything!

As we said, the most important idea is to be able to:

  1. abstract the flow
  2. still be able to use it to receive and send payload

In our first example, we return a Unix.socket, which is obviously not good, especially if we want to make a unikernel (which usually cannot have anything from the Unix module). For this purpose, we already have an interface to easily abstract our implementations: mirage-flow.

We say that any protocol like TCP or VCHAN can be described with this interface, where we have the recv function and the send function. So, instead of returning a concrete type, we return an abstract type like:

module type FLOW = sig
  type t

  val recv : t -> bytes -> int
  val send : t -> string -> unit
end

type flow = Flow : 'flow * (module FLOW with type t = 'flow) -> flow
val resolve : Tuyau.t -> uri -> flow

let () =
  let Flow (flow, (module Flow)) =
    resolve tuyau "https://google.fr/" in
  Flow.send flow "Hello World!"

In our example, we use a GADT to keep the type equality between our value of type 'flow and the type t of our module Flow. We usually call it an existential type wrapper. It allows us to introduce a new type 'flow and associate it with an implementation Flow.

The idea behind it is that the type t can concretely be anything. It can be a Unix.socket if we want to make a unikernel for Unix, but it can be something else, like a Tcpip_stack_direct.t (the TCP/IP implementation usually used by MirageOS).

With the associated module, we are still able to read and write something as we can do with a Unix.socket.

And of course, we can forget about details. Note that the concrete value is already prepared to communicate with our peer. I mean, resolve does something more complex than just creating a new resource such as a Unix.socket: it connects the socket to our peer. That's why we talk about a resolution process.
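To make the existential wrapper concrete, here is a self-contained sketch which can be run without any network. The Memory module (an in-memory flow backed by a Buffer.t) and this toy resolve, which ignores its argument, are assumptions made for the illustration; they are not part of Tuyau.

```ocaml
module type FLOW = sig
  type t

  val recv : t -> bytes -> int
  val send : t -> string -> unit
end

type flow = Flow : 'flow * (module FLOW with type t = 'flow) -> flow

(* Hypothetical in-memory flow: what is sent can be received back. *)
module Memory = struct
  type t = Buffer.t

  let send t payload = Buffer.add_string t payload

  let recv t buf =
    let len = min (Bytes.length buf) (Buffer.length t) in
    Buffer.blit t 0 buf 0 len;
    (* keep the bytes which were not consumed *)
    let rest = Buffer.sub t len (Buffer.length t - len) in
    Buffer.clear t;
    Buffer.add_string t rest;
    len
end

(* Stand-in for the resolution: it always returns the in-memory flow. *)
let resolve (_uri : string) : flow =
  Flow (Buffer.create 16, (module Memory : FLOW with type t = Buffer.t))

(* The caller never learns that the flow is a Buffer.t. *)
let echo uri msg =
  match resolve uri with
  | Flow (flow, (module F)) ->
    F.send flow msg;
    let buf = Bytes.create 64 in
    let len = F.recv flow buf in
    Bytes.sub_string buf 0 len
```

The function echo only relies on the FLOW interface: replacing Memory by a real TCP implementation inside resolve would not change its code.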

Resolution

Tuyau cannot define the resolution by itself. Resolution is commonly a DNS resolution to get the IP from a domain name. However, in a unikernel, nothing ensures that we properly have a DNS resolver (such as the one configured in /etc/resolv.conf).

On the other side, a full definition of an endpoint cannot exist since it depends on the returned 'flow. For example, if we give you a TCP/IP Flow, the endpoint used to connect your 'flow should be an IP and a port. However, the endpoint can represent something else, like a serial port connected to our MirageOS or a virtual network kernel interface (TUN/TAP), etc. Finally, the definition of an endpoint is intrinsic to our implementation of the Flow.

Concretely, for a Unix.socket flow, we need a Unix.sockaddr. For a Tcpip_stack_direct.t flow, we need an Ipaddr.V4.t and an int as a port.

In the end, we agree that the most general (by convention) description of the endpoint is the domain name. Knowing that, we decided to let the user construct an endpoint from a concrete [`host] Domain_name.t (as Conduit decided to construct a Conduit.endp from a Uri.t).

How Conduit does that?!

Conduit does the same job: it wants to construct an endpoint (Conduit.endp) from a Uri.t. To choose which implementation to use, it looks at the scheme of the Uri.t.

From our perspective, this is not a good choice, since the scheme is not a real definition of the underlying protocol, as explained in RFC 7595:

A scheme name is not a "protocol."

However, even if Conduit.endp should be extensible like Conduit.{client,server} (because they are intrinsically tied to each other), they are still delimited by an exhaustive list of constructors:

type endp =
  [ `TCP of Ipaddr.t * int
  | `Unix_domain_socket of string
  | `Vchan_direct of int * string
  | `Vchan_domain_socket of string * string
  | `TLS of string * endp ]

type client = [ tcp_client | vchan_client | client tls_client ] 

Abstract, again!

Tuyau comes with a heterogeneous map to let the user define a resolve function which is able to return any (structurally different) endpoint. The user must create a type witness, a value of type 't Tuyau.key, which represents the type of the endpoint.

With that, the user can register a resolve function which returns the same type as the 't Tuyau.key. In other words, we are able to provide:

type resolvers
type 't key

val key : name:string -> 't key
val register
  :  key:'t key
  -> ([ `host ] Domain_name.t -> 't)
  -> resolvers
  -> resolvers

In this way, the user is able to implement the resolution process and can use a DNS resolver or a fixed resolution table (like a Hashtbl.t). Tuyau needs to know who can create a concrete endpoint from a [ `host ] Domain_name.t to pass it to a protocol implementation. That's why you need to register your resolve function into our resolvers.

Finally, Tuyau will execute all of your resolvers and create a list of heterogeneous endpoints. Then, from them, it is able to try to start a transmission to your peer.
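One classic way to build such type witnesses in OCaml relies on extensible variants and first-class modules. The sketch below is not the implementation of Tuyau: all names are hypothetical, a plain string replaces [ `host ] Domain_name.t to stay self-contained, and resolvers return an option.

```ocaml
(* Each key carries a fresh constructor of an extensible GADT. *)
module Tid = struct type _ t = .. end

module type TID = sig
  type t
  type _ Tid.t += Tid : t Tid.t
end

type 'a key = { name : string; tid : (module TID with type t = 'a) }
type (_, _) teq = Teq : ('a, 'a) teq

let key (type a) ~name : a key =
  let module M = struct
    type t = a
    type _ Tid.t += Tid : t Tid.t
  end in
  { name; tid = (module M : TID with type t = a) }

(* Two keys are equal only if they carry the very same constructor; in that
   case the type checker learns that their types are equal too. *)
let equal : type a b. a key -> b key -> (a, b) teq option =
 fun k1 k2 ->
  let module A = (val k1.tid : TID with type t = a) in
  let module B = (val k2.tid : TID with type t = b) in
  match A.Tid with B.Tid -> Some Teq | _ -> None

(* Heterogeneous list of resolvers: each one hides its endpoint type. *)
type resolver = Resolver : 'a key * (string -> 'a option) -> resolver
type resolvers = resolver list

let empty : resolvers = []
let register ~key f resolvers = Resolver (key, f) :: resolvers

(* Runs the resolvers registered for [key] until one gives an endpoint. *)
let rec resolve : type a. a key -> resolvers -> string -> a option =
 fun key resolvers host ->
  match resolvers with
  | [] -> None
  | Resolver (key', f) :: rest -> (
    match equal key' key with
    | Some Teq -> (
      match f host with
      | Some _ as endpoint -> endpoint
      | None -> resolve key rest host)
    | None -> resolve key rest host)
```

With a key declared as let sockaddr : (string * int) key = key ~name:"sockaddr", registering a function of another return type under this key is rejected by the type checker.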

Give me the priority

Of course, a resolver can be registered with a priority. With that, not only do we use your priority resolver first, but we also prioritize the initialization of your associated protocol.

The idea is to let the user prioritize a secure transmission over an insecure one even if both are available (like https and http).
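The effect of a priority can be sketched as a simple ordering of candidates. The candidate record and the order function are hypothetical, and the convention that a lower value is tried first is an assumption of this sketch.

```ocaml
(* Hypothetical description of a registered resolver. *)
type candidate = { priority : int; name : string }

(* Candidates with the lowest priority value come first; the stable sort
   keeps the registration order for candidates with the same priority. *)
let order candidates =
  List.stable_sort (fun a b -> compare a.priority b.priority) candidates
```

For example, order [ { priority = 10; name = "http" }; { priority = 0; name = "https" } ] puts the https candidate first.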

Tuyau by an example

Tuyau (and Conduit) wants to solve a difficult task which does not appear in usual cases. If you want to make a UNIX program, all of that is useless because we can directly use the UNIX environment.

However, for MirageOS where nothing exists (not even a DNS resolver), we need a way to start a transmission according to the context of the compilation. In fact, the TCP/IP implementation depends on the target, the configuration of your unikernel, what the user wants, etc.

We will go through a little example to fully understand the underlying Tuyau stack: what you should do as the maintainer of Git, as the developer of a protocol or simply as the user of Tuyau.

Register your protocol with Tuyau

To play with protocols, we must register our protocol to Tuyau. The registration is global to your program. Indeed, Tuyau is able to extract your implementation from anywhere - internally, we save it into a global Hashtbl.t.

Let's start by providing a UNIX TCP/IP protocol and registering it with Tuyau!

module TCP = struct
  type flow = Unix.file_descr
  type endpoint = Unix.sockaddr

  let make sockaddr =
    (* the third argument is the protocol; 0 selects the default one *)
    let socket = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
    Unix.connect socket sockaddr ; socket

  let recv socket buf off len =
    Unix.read socket buf off len

  let send socket buf =
    let len = String.length buf in
    let _ = Unix.write socket (Bytes.unsafe_of_string buf) 0 len
    in ()

  let close socket = Unix.close socket
end

We must provide these functions into our module and 2 types:

  • the flow type
  • the endpoint type

From that, Tuyau (a specialized version according to your backend) provides a way to register your protocol globally. We must create our type witness about our endpoint and associate it with your protocol:

let sockaddr : Unix.sockaddr Tuyau.key = Tuyau.key ~name:"sockaddr"
let tcp : Unix.file_descr Tuyau.protocol =
  Tuyau.register_protocol ~key:sockaddr (module TCP)

And that's enough! You should probably expose sockaddr and tcp; we will see where we can use them. The registration is done in our internal and global Hashtbl.t: any link with this piece of code makes your protocol available through Tuyau.

Register your resolver with Tuyau

In another project/library/executable/unikernel, you are able to define your resolution process. Of course, you must link with unix_tcp to be able to use Unix_tcp.sockaddr and register your resolver with this type witness, which is why you should expose it in your interface.

Let's use a usual resolver:

let resolve_http domain_name =
  match Unix.gethostbyname (Domain_name.to_string domain_name) with
  | { Unix.h_addr_list; _ } ->
    if Array.length h_addr_list > 0
    then Some (Unix.ADDR_INET (h_addr_list.(0), 80))
    else None
  | exception _ -> None

This resolver resolves a domain name to an HTTP endpoint [2]. Of course, you can use something else, like ocaml-dns, instead of Unix.gethostbyname to be compatible with MirageOS.

Then, we must fill Tuyau.resolvers with our resolve_http:

let resolvers = Tuyau.empty
let resolvers =
  Tuyau.register_resolver ~key:Unix_tcp.sockaddr resolve_http resolvers

You cannot mix up Unix_tcp.sockaddr and resolve_http: the type witness and the value returned by resolve_http must correspond. Otherwise, OCaml will complain with a type error, which is nice!

Come back to Git!

From the perspective of the maintainer of Git, all of the previous code is outside Git. As we said, we don't want to depend on an implementation of the TCP/IP protocol (or an SSH implementation). However, we should depend on Tuyau.

Finally, the Tuyau core library defines only a few things: the resolvers type and the 'a key type. In this way, in our library we can write something like:

let clone ~resolvers domain_name repository =
  let payload = Bytes.create 0x1000 in
  let Tuyau_unix.Flow (flow, (module Flow)) =
    Tuyau_unix.resolve ~resolvers domain_name in
  Flow.send flow (Fmt.strf "# git-upload-pack /%s.git" repository) ;
  Flow.recv flow payload ;
  ... 

Of course, we must choose a backend like LWT, ASYNC or UNIX to deal correctly with the scheduler for I/O operations. But for a MirageOS-compatible library, Tuyau_lwt should be enough.

And run all of that!

Back in our main.ml, where we filled our resolvers, we can now do:

let resolve_http domain_name =
  match Unix.gethostbyname (Domain_name.to_string domain_name) with
  | { Unix.h_addr_list; _ } ->
    if Array.length h_addr_list > 0
    then Some (Unix.ADDR_INET (h_addr_list.(0), 80))
    else None
  | exception _ -> None

let resolvers = Tuyau.empty
let resolvers =
  Tuyau.register_resolver ~key:Unix_tcp.sockaddr resolve_http resolvers

let () =
  clone ~resolvers
    (Domain_name.(host_exn <.> of_string_exn) "github.com")
    "decompress"

Finally, we defined our resolvers by hand, we used a specific implementation of the TCP/IP protocol (the UNIX one) and we magically/dynamically plugged all of that into our Git implementation through Tuyau.

Go further with composition!

Of course, we can go further and provide a TCP + TLS implementation:

let sockaddr_and_tls_config, tcp_with_tls =
  Tuyau_tls.with_tls ~key:sockaddr (module TCP)

The composition gives us 2 values:

  • the type-witness sockaddr_and_tls_config : Unix.sockaddr * Tls.Config.client. In fact, the creation of a TCP + TLS connection is a bit more complex than TCP: we need a Tls.Config.client, which verifies the certificate provided by the peer.
  • the type-witness tcp_with_tls : Unix.file_descr with_tls.

From that, we must provide another resolver which gives us the Tls.Config.client:

let resolve_https domain_name =
  match resolve_http domain_name with
  | Some sockaddr ->
    let tls_config =
      Tls.Config.client ~authenticator:X509.Authenticator.null () in
    Some (sockaddr, tls_config)
  | None -> None

let resolvers =
  Tuyau.register_resolver ~priority:0 ~key:sockaddr_and_tls_config
    resolve_https
    resolvers

With the priority, we can make sure the TCP + TLS transmission is tried first, before the TCP transmission, and thereby prefer the secure one.

Again, this code still lives outside the Git implementation. We are able to fill Tuyau with an SSH implementation and fill the resolvers with a specific SSH configuration (like a set of private keys, as with .ssh/config).

In our example, we use X509.Authenticator.null but we can restrict the authenticator to some internal certificates. Again, the way to resolve a domain name is the responsibility of the user.

Composition is not magic!

Composition with TLS or something else is not magic. It seems easy when we provide with_tls, but we had to write the way to compose TLS with another protocol, where we handled the handshake, etc.

The composition is, at the end, a functor which takes a FLOW:

module With_tls (Flow : FLOW) = struct
  type endpoint = Flow.endpoint * Tls.Config.client
  type flow = Flow.endpoint * Tls.Engine.state

  ...
end

We just hid it behind a nice function and played a bit with first-class modules.
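To show the shape of such a layer without the complexity of TLS, here is a toy functor over the simplified FLOW interface used earlier in this article. It is in no way cryptography and it is not how Tuyau_tls is written: With_xor, with_xor and the in-memory Memory flow are hypothetical names for this sketch.

```ocaml
module type FLOW = sig
  type t

  val recv : t -> bytes -> int
  val send : t -> string -> unit
end

(* Toy layer, NOT cryptography: every byte is XORed with a fixed key. *)
module With_xor (Flow : FLOW) : FLOW with type t = Flow.t = struct
  type t = Flow.t

  let key = 0x2a
  let mask c = Char.chr (Char.code c lxor key)

  (* Transform the payload, then delegate to the underlying flow. *)
  let send t payload = Flow.send t (String.map mask payload)

  (* Delegate to the underlying flow, then undo the transformation. *)
  let recv t buf =
    let len = Flow.recv t buf in
    for i = 0 to len - 1 do
      Bytes.set buf i (mask (Bytes.get buf i))
    done;
    len
end

(* The "nice function": the functor hidden behind first-class modules. *)
let with_xor (module Flow : FLOW) : (module FLOW) =
  (module With_xor (Flow) : FLOW)

(* Hypothetical in-memory flow used to exercise the layer. *)
module Memory : FLOW with type t = Buffer.t = struct
  type t = Buffer.t

  let send t payload = Buffer.add_string t payload

  let recv t buf =
    let len = min (Bytes.length buf) (Buffer.length t) in
    Buffer.blit t 0 buf 0 len;
    let rest = Buffer.sub t len (Buffer.length t - len) in
    Buffer.clear t;
    Buffer.add_string t rest;
    len
end

module Masked = With_xor (Memory)
```

A real layer such as TLS additionally needs its own state (the handshake, the session), which is why the composed flow type above is a pair and not simply the underlying flow.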

More possibilities on the user-side

Another request about Tuyau is to be predictable about the kind of flow used. Some maintainers want to enforce a secure flow such as SSH. In this case, of course, the maintainer should be aware of the implementation and link with it.

For this purpose, the resolve function is much more complex than before:

val resolve
  :  resolvers
  -> ?key:'edn key
  -> ?protocol:'flow protocol
  -> [ `host ] Domain_name.t -> flow

Optional arguments let the user enforce a specific endpoint [3] or a specific protocol (or both). That is why we advised exposing val tcp : Unix.file_descr Tuyau.protocol before. Imagine an SSH implementation where a val ssh : SSH.t Tuyau.protocol exists; the maintainer can write:

let clone ~resolvers domain_name repository =
  let payload = Bytes.create 0x1000 in
  let Tuyau_unix.Flow (flow, (module Flow)) =
    Tuyau_unix.resolve ~resolvers ~protocol:ssh domain_name in
  Flow.send flow (Fmt.strf "# git-upload-pack /%s.git" repository) ;
  Flow.recv flow payload ;
  ... 

In this way, we ensure that SSH is used when we communicate with our peer.

Conclusion

As we said, Tuyau and Conduit solve a complex problem: we should have an easy way to start a transmission and be able to extend protocol implementations without a static dependency at the library level.

Composition is made possible by a nice interface such as with_tls in Tuyau. But, of course, it's not magic: the maintainers of TLS/WireGuard/Noise should provide a way to compose such layers with a given FLOW.

Finally, it's hard to really understand the goal of Tuyau since, from the library alone, it's hard to get the global view over protocols, users and finally the ecosystem. This article tries to give some material about that.

Server-side

Tuyau provides something for the server side which differs a lot from what Conduit does, but we should explain that in another article.

Footnotes:

[1] It's not really true, since a transmission over HTTP must be stateless. Smart over SSH differs too, since it must expect an END-OF-LINE ('\n') at the end of each packet, whereas this character is optional over TCP/IP.

[2] By HTTP endpoint, we mean that we enforce port 80. Our UNIX TCP/IP flow is not an HTTP flow. However, an HTTP client must be connected to port 80 over the TCP/IP protocol.

[3] A type-witness key can be used and re-used with many protocols. We can imagine a TCP/IP protocol and a UDP/IP protocol which use the same sockaddr type-witness.

Functor, Application and magick!

<2020-02-17>

While trying to make an SMTP server in OCaml as a unikernel, I had to deal with Set.Make. Imagine a situation where you define your type elt = string in a module A and you want to apply Set.Make inside that module.

Interface

Then, you would like to write a proper interface which describes the result of your functor. It should be as easy as:

type elt = string

include Set.S with type elt = elt

But in this example, Set.S wants to (re)define elt. You are probably missing the destructive substitution of the type elt.

type elt = string

include Set.S with type elt := elt

Implementation

The implementation is trickier. Indeed, we probably want to do something like this:

include Set.Make(struct type t = string let compare = String.compare end)

And, fortunately for you, this snippet should work. However, it starts to be pretty incomprehensible when type elt is one of your own types (string, or String.t, exists outside the scope of your module). We can take this example:

include Set.Make(struct
  type t = { v : string }
  let compare { v= a; } { v= b; } = String.compare a b
end)

In the interface, besides the redefinition of the type elt, nothing should change. However, the compilation fails with:

$ ocamlc -c a.ml
Error: The implementation a.ml does not match the interface a.cmi:
       Type declarations do not match:
         type elt
       is not included in
         type elt = { v : string; }

Indeed, we should have a definition of elt outside the struct ... end:

type elt = { v : string }

include Set.Make(struct
  type t = elt
  let compare { v= a; } { v= b; } = String.compare a b
end)

However, now, OCaml complains about multiple definitions of the type elt. Maybe we can play more with the destructive substitution?

type elt = { v : string }

include
  (Set.Make(struct
     type t = elt
     let compare { v= a; } { v= b; } = String.compare a b
   end)
   : Set.S with type elt := elt)

Just a tip

And it works! So I leave this trick here to help some people.
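For a version that can be pasted into a single file, the interface and the implementation can be put together in one module. The module name A comes from the introduction; the names value is only there to exercise the result.

```ocaml
module A : sig
  type elt = { v : string }

  (* [:=] removes the [elt] of [Set.S] in favour of the one defined above *)
  include Set.S with type elt := elt
end = struct
  type elt = { v : string }

  include (
    Set.Make (struct
      type t = elt

      let compare { v = a } { v = b } = String.compare a b
    end) :
      Set.S with type elt := elt)
end

(* The set only keeps one copy of { v = "bar" }. *)
let names =
  A.(empty |> add { v = "bar" } |> add { v = "foo" } |> add { v = "bar" })
```

The type elt stays a visible record outside of A, so users can build and inspect elements directly.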

MirageOS compilation

<2020-02-08>

MirageOS is not a single piece of software but many libraries and tools which aim to provide a good user experience for developing a full operating system. To that end, they solve many problems with patterns and designs used by the core team. However, as I said in my previous article, documentation or materials about these details don't really exist.

So let's start with one about the compilation of an unikernel.

Abstraction, interface and functor

The biggest goal of MirageOS is to provide a set of interfaces. Going back to the OCaml world, we separate two things: the implementation (.ml) and the interface (.mli). An implementation can declare a lot of things, whereas an interface restricts access to some of the underlying functions/constants/variables.

The interface can abstract the definition of a type: inside (in the implementation), the underlying structure is well-known, while outside, values can only be constructed with the functions described in the .mli.

A simple module with its interface

(* implementation (.ml) *)
type t = string

let v x = String.lowercase_ascii x
let compare = String.compare

(* interface (.mli) *)
type t

val v : string -> t
val compare : t -> t -> int

In our example, the type t is a string. However, to make a t, we must use v, which applies String.lowercase_ascii. Then, we provide the compare function to be able to make a Set or a Map of t. With that, we can express a simple idea:

> a field-name is a string whose comparison is case-insensitive, such that Received and received are equivalent.

Then, anyone who wants to use this module must use v to create a field-name before using it with compare. Generally, we also provide a pp (Pretty-Printer) for debugging, and the pair to_string/of_string.

But the point is to be able, through the interface, to restrict what the user can do and to define what they can rely on when they use such a value.
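As a small sketch of what this abstraction buys us, the module above can be sealed with its interface and handed to Set.Make. The names Field_name, Fields and to_string are assumptions made for this illustration; only v and compare come from the snippet above.

```ocaml
(* Hypothetical module name; [to_string] is added only to observe values. *)
module Field_name : sig
  type t

  val v : string -> t
  val compare : t -> t -> int
  val to_string : t -> string
end = struct
  type t = string

  let v x = String.lowercase_ascii x
  let compare = String.compare
  let to_string x = x
end

(* [Field_name] provides [t] and [compare], which is all [Set.Make] asks for. *)
module Fields = Set.Make (Field_name)

(* Both spellings go through [v], so they end up as the same field-name. *)
let fields =
  Fields.(
    empty |> add (Field_name.v "Received") |> add (Field_name.v "received"))

let () = assert (Fields.cardinal fields = 1)
```

Outside of Field_name, a plain string is not a t: the only way to build one is v, so the lowercase invariant cannot be bypassed.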

Trust only on the given interface

MirageOS made the choice to trust only the interface. For us, a device, a protocol or a server can be well defined by an interface. This is the purpose of `mirage-types`, which provides such things.

The key question now is: since we use each artifact only through its interface, how do we compose them into one specific computation?

This is the purpose of MirageOS: a tool to compose implementations (.ml) according to the expected interfaces (.mli) and then produce an operating system (the specific computation).

A MirageOS project

Indeed, the global idea of a unikernel is: develop the main computation of your operating system and be able to abstract it over protocols, devices and, in the end, targets.

Let's start with the TCP/IP stack. Usually, on UNIX, we create a socket and use it to receive and send data. Then, the role of the operating system is to handle it with your ethernet/wlan card.

We can abstract the idea of the socket by this interface:

type t
type error

val recv : t -> bytes -> ([ `Eoi | `Data of int ], error) result
val send : t -> string -> (int, error) result

Then, we can rely on this interface (call it FLOW) to represent the way to send and receive data. Of course, at this stage, we don't know the details of the implementation - and this is what we want.

module Make (Flow : FLOW) = struct
  let start flow =
    Flow.send flow "Hello World!"
end

The abstraction is done. Now, we have our main computation, which can be used with anything that implements our socket.
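To make this concrete, here is a self-contained sketch which binds the interface to a module type FLOW and applies the functor to an in-memory implementation. Buffer_flow, its Closed error and App are assumptions made for the example; a real unikernel would receive a TCP/IP flow instead.

```ocaml
module type FLOW = sig
  type t
  type error

  val recv : t -> bytes -> ([ `Eoi | `Data of int ], error) result
  val send : t -> string -> (int, error) result
end

module Make (Flow : FLOW) = struct
  let start flow = Flow.send flow "Hello World!"
end

(* Hypothetical implementation: "sending" appends to a Buffer.t. *)
module Buffer_flow = struct
  type t = Buffer.t
  type error = Closed

  (* Nothing can ever be read from this flow. *)
  let recv (_ : t) (_ : bytes) : ([ `Eoi | `Data of int ], error) result =
    Ok `Eoi

  let send buf str =
    Buffer.add_string buf str;
    Ok (String.length str)
end

module App = Make (Buffer_flow)

let buf = Buffer.create 16
let result = App.start buf

let () =
  match result with
  | Ok n -> assert (n = String.length "Hello World!")
  | Error Buffer_flow.Closed -> assert false
```

Make never mentions Buffer.t: replacing Buffer_flow by another module which satisfies FLOW does not require any change to start.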

And it comes with another tool, `Functoria`, which orchestrates, depending on the target, which implementation will be used to apply the final functor. For UNIX, we apply the functor with `mirage-tcpip.stack-socket`, and for Solo5/Xen, we apply it with `mirage-tcpip.stack-direct`.

Functor everywhere

Functorizing the code seems to be a good idea because:

  • the cost at the runtime is minimal
  • abstraction is powerful (we can define new types, constraints, etc.)

An example

We can show what is really going on in MirageOS with a small example: an abstraction of the `Console`, to be able to write something. Imagine this unikernel:

module type CONSOLE = sig
  type t

  val endline : t -> string -> unit
end

module Make (Console : CONSOLE) = struct
  let start console =
    Console.endline console "Hello World!"
end

This unikernel expects an implementation of the Console. The idea behind the Console is to be able to write something on it. In MirageOS, the interface should provide something to represent the console (the type t) and what you can do with it (the function endline).

Then, usually, Functoria will generate a main.ml according to the chosen target and apply our functor with the right implementation. But let's talk about implementations.

Implementations

We probably should have 2 implementations:

  • a UNIX implementation which uses the write syscall
  • a standalone implementation which should work on any target (like Solo5) - and should depend only on the caml runtime

(* console_unix.ml *)
type t = Unix.file_descr

let endline fd str =
  let _ = Unix.write_substring fd str 0 (String.length str) in
  let _ = Unix.write_substring fd "\n" 0 1 in
  ()
;;

(* console_caml.ml *)
type t = out_channel

let endline oc str =
  output_string oc str ;
  output_string oc "\n"
;;

Orchestration

As I said, Functoria then takes over and generates a main.ml which will:

  • apply Unikernel.Make
  • call the start function with the representation of the Console

Concretely, this file appears when you run mirage configure, where you can specify the target. So, imagine we want to use the UNIX target (the default one); Functoria will generate:

include Unikernel.Make(Console_unix)

let () = start Unix.stdout
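The whole chain can be sketched in a single file, which may help to see what the generated main.ml amounts to. This is only an approximation: it uses the out_channel implementation instead of Console_unix to avoid the dependency on the unix library, and Console_buffer, Main and Main_buffer are hypothetical names added to observe the output.

```ocaml
module type CONSOLE = sig
  type t

  val endline : t -> string -> unit
end

(* What unikernel.ml contains. *)
module Unikernel = struct
  module Make (Console : CONSOLE) = struct
    let start console = Console.endline console "Hello World!"
  end
end

(* What console_caml.ml contains. *)
module Console_caml = struct
  type t = out_channel

  let endline oc str =
    output_string oc str;
    output_string oc "\n"
end

(* Hypothetical console which keeps everything in memory. *)
module Console_buffer = struct
  type t = Buffer.t

  let endline buf str =
    Buffer.add_string buf str;
    Buffer.add_char buf '\n'
end

(* Roughly what a generated main.ml does: apply the functor, call start. *)
module Main = Unikernel.Make (Console_caml)

let () = Main.start stdout

(* Same unikernel, other implementation: only the application changes. *)
module Main_buffer = Unikernel.Make (Console_buffer)

let captured =
  let buf = Buffer.create 16 in
  Main_buffer.start buf;
  Buffer.contents buf
```

Unikernel.Make is written once; choosing a target only changes the module given to it and the value passed to start.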

Compilation

The compilation can be separated into 2 steps: we compile the object files first, then we link according to the target:

$ ocamlopt -c unikernel.ml
$ ocamlopt -c console_unix.ml
$ ocamlopt -c main.ml
$ ocamlopt -o main unix.cmxa \
  console_unix.cmx unikernel.cmx main.cmx

We can see that the most target-specific command is the link step, where unix.cmxa appears. Of course, for another target like Solo5, we will use console_caml.ml. The link step is then a bit more complex: we produce a main.o (with the -output-obj option), and the mirage tool properly calls ld with a specific link script according to the target.

Results

Of course, this whole process is done by the mirage tool, but it's interesting to understand what is going on when we do the usual:

  • mirage configure
  • mirage build

Implementation according to the target

For some other targets - much more specialized ones - the implementation can directly use the syscall available on the target (like solo5_console_write) with external.

external solo5_console_write : string -> unit = "solo5_console_write"

type t = unit

let endline () str =
  solo5_console_write str ;
  solo5_console_write "\n"
;;

As you can see, we still continue to follow the interface CONSOLE even if the representation of t is unit (so, nothing).

The power of the abstraction

The goal of all of that is to be able to switch easily from one implementation to another - like switching from the socket given by the Unix module to our own implementation of the TCP/IP stack.

Finally, the end user can completely disregard the details of the underlying implementations and focus only on what they want - of course, they must trust what they use. And if they do the job correctly, then other users can go further by composition and swap the underlying implementations for something else without any update to the main computation.

An example of that is to make a website and plug in TLS support without any headache. It should only be a composition of the TCP/IP flow with TLS which exposes the same abstraction as before:

val with_tls
  :  (module FLOW with type t = 'flow)
  -> (module FLOW with type t = 'flow * Tls.t)
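A real with_tls needs a TLS stack, so here is a much smaller sketch with the same shape: a function which takes a flow as a first-class module and gives back a flow whose state is paired with something else. Everything here is hypothetical (with_counter, Buffer_flow, a FLOW reduced to send with a string error); it only counts the bytes sent where TLS would encrypt them.

```ocaml
(* Reduced, hypothetical FLOW: only [send], errors are plain strings. *)
module type FLOW = sig
  type t

  val send : t -> string -> (int, string) result
end

(* Same shape as with_tls: the new state is the old one plus a counter. *)
let with_counter (type flow) (module Flow : FLOW with type t = flow) :
    (module FLOW with type t = flow * int ref) =
  let module Counted = struct
    type t = flow * int ref

    let send (flow, sent) str =
      match Flow.send flow str with
      | Ok n ->
          sent := !sent + n;
          Ok n
      | Error _ as err -> err
  end in
  (module Counted : FLOW with type t = flow * int ref)

(* Hypothetical underlying flow which appends to a Buffer.t. *)
module Buffer_flow = struct
  type t = Buffer.t

  let send buf str =
    Buffer.add_string buf str;
    Ok (String.length str)
end

let total =
  let buf = Buffer.create 16 in
  let sent = ref 0 in
  let counted : (module FLOW with type t = Buffer.t * int ref) =
    with_counter (module Buffer_flow : FLOW with type t = Buffer.t)
  in
  let module Counted = (val counted) in
  ignore (Counted.send (buf, sent) "Hello ");
  ignore (Counted.send (buf, sent) "World!");
  !sent
```

The result still satisfies FLOW, so code written against FLOW accepts the wrapped flow exactly as it accepted the original one.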

Globally, each piece of your unikernel can be replaced by something else (faster, more secure, etc.). In the end, MirageOS is not monolithic software: it's a real framework to build your operating system.

Other posts