# smolver development log, part 9 - client certificates

## Intro

This is the ninth in a planned series of posts (well, tenth if you count the announcement) where I'll share my experience writing smolver, my Gemini server software, written in Swift.

You can find the previous iterations below.

## License

The code snippets in here are taken directly from smolver and one of its dependencies, SocketServer, which are free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License only.

smolver and SocketServer are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with smolver or SocketServer. If not, see <https://www.gnu.org/licenses/>.

## v1.1.0

### What's changed?

  • Updated dependency, BlueSocket, from 2.0.2 to 2.0.4
  • Updated dependency, BlueSSLService, from 2.0.1 to 2.0.2, which upgraded to OpenSSL 3.0
  • Forked and patched BlueSSLService to fully support client certificates
  • Rebased BlueSSLService changes on top of 2.0.2
  • Added configurable client certificate support, with supporting documentation
  • Fixed invalid example JSON in a couple locations
  • Added some clarification to documentation
  • Other documentation tweaks
  • Updated link to backlog items to point to future state items instead of already-implemented beta items
  • Renamed a struct from SingleDirectoryConfiguration to LocalConfiguration
  • Wrapped logging mechanism in a protocol for dependency injection purposes
  • Refactored the Request struct to be less dependent on the entirety of the global configuration file contents
  • Simplified a function signature
  • Almost pulled some hair out ;)
  • Ripped out some backlog items related to static site generation, put into separate dedicated repository
  • Reorganized and expanded backlog

### Changelog brain dump

I had delayed this feature for so long, despite it plus the next planned one (dynamic content) being able to enable some app ideas I have, because I suspected it would be a pain in the ass and just wasn't in the mood to deal with it. Had some extra time the week when I originally started this, so thought I'd get started. Turns out, I was right: this was a huge pain in the ass.

This post is more of a brain dump than the others in this series. If this seems disjointed, it is because I wrote this over the course of about 5 months, outlining along the way and then expanding at the end.

Update: 5 months later, I have just merged the implementation and am still outlining and polishing this post.

Update 2: weeks later, and damn this post is getting long.

### Enhancing SocketServer framework

SocketServer is the base socketing framework underlying smolver, as well as my MUD. As this is where the raw TCP socket is created and managed and has SSL applied to it, this layer required some code changes to retrieve the client certificate from the SSL connection. This ended up being a deep rabbit hole.

The first thing I needed to do, before even writing any code, was to research what metadata is available in an arbitrary client certificate. I then narrowed that down to 3 items, at least for this first iteration:

  • Not valid before date
  • Not valid after date
  • Fingerprint

Next up was to determine how to programmatically retrieve an attached client certificate from the SSL connection. This was a learning process for me, and consisted of:

  • Fighting with the SSL library that smolver uses, BlueSSLService
  • Reading OpenSSL (which BlueSSLService uses) documentation
  • Fighting OpenSSL APIs

After many hours of this, I decided to ask ~skyjake a question, and his response had me revisiting something I had tried the previous day (shoutout and thank you to ~skyjake!), only with a slightly different return value given to OpenSSL. It worked!

The next problem was that BlueSSLService refused to call its SSL verification callback that SocketServer needed to implement to retrieve the certificate. I patched that and then found another defect in BlueSSLService: requesting the client certificate from within that callback always returned nil. Begrudgingly, I patched that as well. I still am not thrilled about maintaining a fork of it, but the official repository appears to be unmaintained, including being unresponsive to questions, and this was easier than ripping it out and finding a different socketing and SSL library. More than likely, I'll end up doing that at some point to avoid the long term maintenance that experience tells me is going to come with such a thing.

Another problem: getting the certificate fingerprint. This one literally had me almost pulling my hair out (fortunately, none came out). I almost resorted to delegating to the openssl terminal command from within Swift code, and even ended up saving a local copy of a client cert to disk just to test out the CLI to manually retrieve the 3 data points I needed from it, which I then intended to automate. This would have required parsing openssl's human-readable string output, which would not have been ideal, but I was at my wits' end. For now, I gave up and called it a day.

The next day, I tried again with fresh debug logs, using the same code as the previous day, and it turns out the code was mostly right to begin with. I blame fatigued brain and eyes. The only further change I needed was to iterate farther into an UnsafeMutablePointer<UInt8> to retrieve the data I needed. This is a C data type bridged to Swift. I realize that sounds a bit hand-wavy, but the code is quite arcane. I am pretty sure I found the algorithm on an OpenSSL forum then translated it to Swift. The result:

```
let chars = [CUnsignedChar](repeating: 0, count: Int(EVP_MAX_MD_SIZE))
let md = UnsafeMutablePointer<UInt8>.allocate(capacity: chars.count)

X509_digest(peerCertificate, EVP_sha256(), md, nil)

let fingerprint = (0..<32)
    .map { pos in
        String(format: "%02x", arguments: [md[pos]])
    }
    .reduce("") { $0 + $1.uppercased() + ":" }
    .dropLast()
```

With that out of the way, I could now scrap the dreaded backup plan to delegate to the CLI, sticking to native Swift.
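
The formatting half of that snippet can be tried without OpenSSL. What follows is a minimal sketch, not SocketServer's code: formatFingerprint and demoFingerprint are made-up names, and the 32 bytes are a fake stand-in for the SHA-256 digest that X509_digest would write. It also frees the scratch buffer with defer, a step the snippet above doesn't show.

```swift
import Foundation

/// Hypothetical helper: renders digest bytes as uppercase, colon-separated hex.
func formatFingerprint<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    return bytes.map { String(format: "%02X", $0) }.joined(separator: ":")
}

/// Stand-in for the X509_digest step: fills a scratch buffer, formats it,
/// and frees the buffer on the way out.
func demoFingerprint() -> String {
    let digestLength = 32 // SHA-256 digests are 32 bytes
    let md = UnsafeMutablePointer<UInt8>.allocate(capacity: digestLength)
    defer { md.deallocate() }
    for i in 0..<digestLength { md[i] = UInt8(i) } // fake digest: 00, 01, 02, ...
    return formatFingerprint(UnsafeBufferPointer(start: UnsafePointer(md), count: digestLength))
}

print(demoFingerprint())
```

Using joined(separator:) puts the colons only between bytes, so there is no trailing one to drop afterwards.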

The next battle was figuring out how to programmatically get the 'not valid before date' and 'not valid after date' from the certificate. This was a battle both in understanding the OpenSSL C API documentation and in figuring out how to call it from Swift.

The return values for those dates come into Swift as UnsafePointer<ASN1_TIME>? types (the ? is intentional; it indicates an Optional type in Swift), which needed to be converted to the standard C tm struct (which itself needed an awkward dance to wrap it into an UnsafeMutablePointer) and then converted again into Swift's native Date type. Converting from tm to Date was not straightforward either. Once I finally figured it out via trial and error, I learned the hard way that tm's tm_year property is relative to 1900. In hindsight, the code for this looks trivial:

```
var tmpTime = tm()
let time = withUnsafeMutablePointer(to: &tmpTime) { $0 }

let notValidBeforeDate = X509_get0_notBefore(peerCertificate)
ASN1_TIME_to_tm(notValidBeforeDate, time)
let notValidBeforeDateComponents = DateComponents(tm: time.pointee)

let notValidAfterDate = X509_get0_notAfter(peerCertificate)
ASN1_TIME_to_tm(notValidAfterDate, time)
let notValidAfterDateComponents = DateComponents(tm: time.pointee)

guard let notValidBeforeDate = notValidBeforeDateComponents.date,
    let notValidAfterDate = notValidAfterDateComponents.date else {
        return nil
}
```

By this point I finally had a fledgling understanding of how C is bridged to Swift and vice versa, likely just enough to be dangerous. Despite having been writing Swift since it was first publicly announced in 2014, this was a new one for me, and was exciting.
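
The tm to Date step can be tried on its own, without OpenSSL. This is only a sketch under assumptions: DateComponents(utc:) is a hypothetical stand-in for the DateComponents(tm:) initializer used above (its real implementation isn't shown in this post), and gmtime_r stands in for ASN1_TIME_to_tm, since both fill a tm through a pointer. The tm is treated as UTC. The sketch passes &parsed straight to the C call rather than holding on to a pointer, because Swift only guarantees such a pointer for the duration of the call.

```swift
import Foundation

extension DateComponents {
    /// Hypothetical initializer; treats the tm fields as UTC.
    init(utc t: tm) {
        var calendar = Calendar(identifier: .gregorian)
        calendar.timeZone = TimeZone(secondsFromGMT: 0)!
        self.init(calendar: calendar,
                  timeZone: calendar.timeZone,
                  year: Int(t.tm_year) + 1900, // tm_year counts from 1900
                  month: Int(t.tm_mon) + 1,    // tm_mon is 0-based
                  day: Int(t.tm_mday),
                  hour: Int(t.tm_hour),
                  minute: Int(t.tm_min),
                  second: Int(t.tm_sec))
    }
}

/// gmtime_r is a stand-in for ASN1_TIME_to_tm: it fills `parsed` via a pointer.
func date(fromEpoch seconds: Int) -> Date? {
    var epoch = time_t(seconds)
    var parsed = tm()
    // The pointer made from &parsed is only valid for this one call.
    guard gmtime_r(&epoch, &parsed) != nil else { return nil }
    return DateComponents(utc: parsed).date
}

if let roundTripped = date(fromEpoch: 1_676_592_000) {
    print(roundTripped) // 2023-02-17 00:00:00 UTC
}
```

Attaching the calendar and time zone to the components is what makes the .date property return a value; without a calendar it is nil.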

With the hard part done, I was on to code cleanup, including encapsulating the X509 certificate code into its own dedicated struct: X509Certificate. I wanted no OpenSSL nor BlueSSLService code in smolver itself.

For reference, here's the final combined version of the meat of this struct:

```
private init?(peerCertificate: OpaquePointer) {
    var tmpTime = tm()
    let time = withUnsafeMutablePointer(to: &tmpTime) { $0 }

    let notValidBeforeDate = X509_get0_notBefore(peerCertificate)
    ASN1_TIME_to_tm(notValidBeforeDate, time)
    let notValidBeforeDateComponents = DateComponents(tm: time.pointee)

    let notValidAfterDate = X509_get0_notAfter(peerCertificate)
    ASN1_TIME_to_tm(notValidAfterDate, time)
    let notValidAfterDateComponents = DateComponents(tm: time.pointee)

    guard let notValidBeforeDate = notValidBeforeDateComponents.date,
        let notValidAfterDate = notValidAfterDateComponents.date else {
            return nil
    }

    self.notValidBeforeDate = notValidBeforeDate
    self.notValidAfterDate = notValidAfterDate

    let chars = [CUnsignedChar](repeating: 0, count: Int(EVP_MAX_MD_SIZE))
    let md = UnsafeMutablePointer<UInt8>.allocate(capacity: chars.count)

    X509_digest(peerCertificate, EVP_sha256(), md, nil)

    let fingerprint = (0..<32)
        .map { pos in
            String(format: "%02x", arguments: [md[pos]])
        }
        .reduce("") { $0 + $1.uppercased() + ":" }
        .dropLast()
    self.fingerprint = String(fingerprint)
    self.cPeerCertificate = peerCertificate
}
```

By this point I had patched BlueSSLService yet again to pass the socket associated with the SSL verification into the callback I mentioned fixing above. Of course BlueSSLService had no way of letting you associate the socket with the SSL verification callback that I could figure out. So what's another patch to the fork, at this point? This simplified the SocketServer layer so all it needs to do in the callback is create an X509Certificate instance off the C structure and associate the certificate with the attached socket for later retrieval and passing back up to smolver. This callback is asynchronous and wholly unrelated to my pre-existing socket connection processing logic, so the simplest way to achieve this was a dictionary where the keys are the socket and the values are the certificate for said socket.
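
The socket-to-certificate dictionary could look roughly like this. It's a sketch, not SocketServer's code: CertificateInfo and PeerCertificateStore are hypothetical names, keying by file descriptor is an assumption (the real keys are just "the socket"), and the lock is there because the callback and the connection handling may run on different threads.

```swift
import Foundation

/// Hypothetical stand-in for the three values X509Certificate exposes.
struct CertificateInfo: Equatable {
    let fingerprint: String
    let notValidBefore: Date
    let notValidAfter: Date
}

/// Hypothetical store: the verification callback writes to it, and the
/// connection handler reads from it later.
final class PeerCertificateStore {
    private var certificates: [Int32: CertificateInfo] = [:]
    private let lock = NSLock()

    /// Called from the SSL verification callback.
    func store(_ certificate: CertificateInfo, forSocket fd: Int32) {
        lock.lock()
        defer { lock.unlock() }
        certificates[fd] = certificate
    }

    /// Called while processing the connection. Removes the entry so a
    /// reused descriptor can't pick up a stale certificate.
    func take(forSocket fd: Int32) -> CertificateInfo? {
        lock.lock()
        defer { lock.unlock() }
        return certificates.removeValue(forKey: fd)
    }
}

let store = PeerCertificateStore()
let cert = CertificateInfo(fingerprint: "AB:CD",
                           notValidBefore: Date(timeIntervalSince1970: 0),
                           notValidAfter: Date(timeIntervalSince1970: 86_400))
store.store(cert, forSocket: 7)
print(store.take(forSocket: 7)?.fingerprint ?? "no certificate")
```

Removing the entry on read keeps the dictionary from growing as connections come and go.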

### Enhancing smolver itself

While the SocketServer layer was fun in that it was a new challenge for me, I was glad to be back to the higher level Gemini protocol layer. Unsurprisingly, this was also more complex than anticipated due to the number of possible scenarios. The configurability I implemented also contributed to the complexity here, but without that the server would have been too rigid. I needed flexibility to be able to build authenticated services, making configuration a hard requirement. This complexity justified extensive unit tests, mostly via Test Driven Development (TDD).

After implementing most of the tests, I took a break for a couple months (more than once, actually...) and came back to finish up. It was easy enough to pick up where I had left off after I studied the TODO list I left myself and the code I had written. Though before I did that refresher, I did not understand my own code. That's always fun.

After implementing the remaining test cases and fixing the failing ones, next up was code cleanup. Before I could get to that, I needed to study my temporary debug logs to try to understand a particular piece of the implementation a bit more. This was after having already studied it after the breaks -- yes, this bit of code was complicated.

While studying the logs, I noticed some local file paths looked wrong. They had the configured whitelist multiple times in the URLs being checked. This should have been a red flag, but I initially ignored it. I came back to it because it continued to bother me. After studying it, I realized the code didn't make any sense for the problem it was trying to solve: given that the requested resource requires authentication and only these whitelisted certs are allowed, is the client certificate on that list? I started removing code to see what would happen, and my tests still passed. I continued until there was almost nothing left of one function, and the tests still passed. Puzzled, I thought I must be doing something wrong. Then it hit me: it really is this simple. The loop wasn't needed; the code just needed to check for the existence of a matching client cert in the configured directory. In retrospect, this seems obvious, but this was clearly a symptom of a tired, burned out brain that wrote the code months beforehand. Good thing I put that TODO reminder to do some code cleanup, otherwise I probably would not have even noticed this.

I think another reason this was written the way it was is that I had written the non-whitelist code path (and unit tests) first and then wrote the whitelist unit tests, copy/pasted the non-whitelist code into the whitelist section of the code, then tweaked until the tests passed. I hadn't stopped to think if the directory walking was even necessary for this scenario. The unit tests let me iterate on this algorithm and get instant feedback -- a classic example of 'red/green/refactor', a TDD philosophy.

The resulting code boiled down to simply:

```
private func checkAuthorization(of x509Certificate: ClientCertificateProtocol, using whitelistedPath: URL) throws {
    let certPath = GlobalConfiguration.configDirectory + whitelistedPath.path + "/" + x509Certificate.fingerprint + ".pem"

    guard fileManager.fileExists(atPath: certPath),

        // A common reason this could fail is if a server admin puts a corrupted .pem file in place.
        //
        // It's necessary because, with whitelists, the intention is for the server admin
        // to add (either manually, or through a custom automated process) the certificate
        // (as stated explicitly in the documentation), which obviously has the possibility
        // of human error. This check will detect that and not let the whitelisted user in,
        // which should prompt the admin to investigate.
        type(of: x509Certificate).create(from: URL(fileURLWithPath: certPath)) != nil else {
            logger.debug("Client certificate not whitelisted for this resource, disallowing access",
                         from: .remote(hostname: clientHostname))
            throw Error.clientCertificate(.notAuthorized(reason: "This client certificate is not authorized for this resource."))
    }

    logger.debug("Client certificate whitelisted for this resource, allowing access",
                 from: .remote(hostname: clientHostname))
}
```

### Known issues

These issues are all related to the requirement in the Gemini spec wherein client certificates bound to one path are to be bound to paths below it; i.e., if a client certificate is allowed to access 'localhost/gemlog/', it should also be allowed to access 'localhost/gemlog/tech/', 'localhost/gemlog/any/subdirectory/like/so/', etc.

  • For smolver to behave like so, 'gemlog/', 'gemlog/tech/', and 'gemlog/any/subdirectory/like/so/' will each need to be configured with a .smol.json file and the directory key. Need to detect this scenario automatically so you won't have to duplicate your configuration(s).
  • When configured to allow anonymous authenticated access, some scenarios of multiple nested subdirectories -- with the topmost containing the .smol.json configuring the authentication -- are skipping parent directories when checking if the client certificate is already known. This is theoretically a security issue because it could allow a user to be able to see more directories than intended, but this is for anonymous public authentication anyway, not whitelists... so it's already accessible. Ergo, is it really a problem in practice? This is more likely to be noticeable in practice once CGI scripts are supported. Until such time that a non-theoretical use case runs into this problem, there are no plans to fix it; it should be easy enough to add test(s) and/or assertion(s) into the existing tests to patch this if and when that happens.
  • When configured to allow anonymous authenticated access, smolver is walking up the directory tree to look for an already-stored matching client certificate. It is supposed to stop once it checks '~/.smolver/', but it is continuing to go one more directory up, to '~/'. This is unintentional and should be fixed; however, when putting a matching client certificate in that final directory ('~/'), smolver does not see it (despite the logs stating that it is indeed looking there) and instead stores it in the directory serving the request (this part of the behavior is correct). I suspect this is related to the above item? Need to confirm.
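
The intended walk in that last item, from the directory serving the request up to the configured root and no further, can be sketched as below. It isn't smolver's implementation: directoriesToSearch is a hypothetical name, and the sketch assumes both URLs are absolute and already normalized (no '..' segments).

```swift
import Foundation

/// Hypothetical helper: directories to check, nearest first, from `start`
/// up to and including `root`. Returns [] when `start` is not under `root`.
func directoriesToSearch(from start: URL, upTo root: URL) -> [String] {
    let rootParts = root.pathComponents
    var parts = start.pathComponents
    guard !rootParts.isEmpty, parts.starts(with: rootParts) else { return [] }

    var result: [String] = []
    // Stop once the path is shorter than the root, so the root's parent
    // is never visited.
    while parts.count >= rootParts.count {
        result.append("/" + parts.dropFirst().joined(separator: "/"))
        parts.removeLast()
    }
    return result
}

let root = URL(fileURLWithPath: "/home/me/.smolver")
let requested = URL(fileURLWithPath: "/home/me/.smolver/gemlog/tech")
for directory in directoriesToSearch(from: requested, upTo: root) {
    print(directory)
}
```

With these example paths the loop yields the request directory, then 'gemlog', then the root itself, and stops there. Comparing component counts rather than strings is what keeps it from stepping one directory too far.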

### Post-release findings

Something in v1.1.0 has caused production RAM usage to go from ~8-9MB to ~21-22MB. While not a big jump in absolute terms, in relative terms this is roughly 2.5 times the previous usage (about a 150% increase) -- unacceptable for the smolnet. This data may be somewhat imprecise, because I got these numbers from periodic manual monitoring of the systemd status output, and I am not sure how often the data shown there is itself updated. This last point would be the first thing to research. If that updates relatively quickly, then I'll look into the code changes for v1.1.0. I suspect the culprit is something in SocketServer, but that is just a hunch. I'll profile the code for performance data before trying any code changes.

Update 3: two weeks of running in production and it has reverted to ~8-9MB of RAM. I'm beginning to suspect that the systemd status output may not necessarily show live data?
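
One way to cross-check the systemd number would be to have the process report its own resident memory. The sketch below is hypothetical and not part of smolver: it assumes a Linux host, where /proc/self/status contains a VmRSS line in kB, and both function names are made up.

```swift
import Foundation

/// Hypothetical helper: pulls the resident set size (in kB) out of the text
/// of a Linux /proc/<pid>/status file. Returns nil if there is no VmRSS line.
func residentKilobytes(inStatus text: String) -> Int? {
    for line in text.split(separator: "\n") where line.hasPrefix("VmRSS:") {
        // The line looks like "VmRSS:      8912 kB".
        let fields = line.split(whereSeparator: { $0 == " " || $0 == "\t" })
        if fields.count >= 2 { return Int(fields[1]) }
    }
    return nil
}

/// Reads the current process's own status file; nil wherever /proc is missing.
func currentResidentKilobytes() -> Int? {
    guard let text = try? String(contentsOfFile: "/proc/self/status", encoding: .utf8) else {
        return nil
    }
    return residentKilobytes(inStatus: text)
}

if let kilobytes = currentResidentKilobytes() {
    print("resident memory: \(kilobytes) kB")
} else {
    print("resident memory not available on this platform")
}
```

Logging this value every so often would give a second data series to compare against. The two numbers may not match exactly, since systemd accounts memory for the whole service cgroup rather than just this process's resident pages.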

### Code metrics

Overall, this feature required significant code changes.

Running cloc on smolver production code v1.0.0 returns:

```
cloc Sources
      12 text files.
      12 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.08 s (156.6 files/s, 21816.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                           12            215            311           1146
-------------------------------------------------------------------------------
SUM:                            12            215            311           1146
-------------------------------------------------------------------------------
```

Running cloc on smolver production code v1.1.0 returns:

```
cloc Sources
      13 text files.
      13 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.09 s (142.2 files/s, 22469.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                           13            267            365           1422
-------------------------------------------------------------------------------
SUM:                            13            267            365           1422
-------------------------------------------------------------------------------
```

These numbers do not include smolver's dependencies nor unit tests.

Running cloc on smolver test code v1.0.0 returns:

```
cloc Tests
       1 text file.
       1 unique file.
       0 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.03 s (29.7 files/s, 1692.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            1             11             15             31
-------------------------------------------------------------------------------
```

The tests in this count were auto-generated, unchanged, and of no value.

Running cloc on smolver test code v1.1.0 returns:

```
cloc Tests
       1 text file.
       1 unique file.
       0 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.08 s (12.2 files/s, 21305.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            1            247            122           1376
-------------------------------------------------------------------------------
```

This time, I'll also include line changes for one of the dependencies, SocketServer, since that required significant changes as well.

Running cloc on SocketServer production code v0.0.17 returns:

```
cloc Sources
       6 text files.
       6 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.05 s (120.1 files/s, 9825.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            3             60             52            325
XML                              1              0              0             24
C/C++ Header                     1              6             18              3
YAML                             1              0              0              3
-------------------------------------------------------------------------------
SUM:                             6             66             70            355
-------------------------------------------------------------------------------
```

Running cloc on SocketServer production code v0.0.18 returns:

```
cloc Sources
       7 text files.
       7 unique files.
       0 files ignored.

github.com/AlDanial/cloc v 1.86  T=0.07 s (107.6 files/s, 10656.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            4             87            116            436
XML                              1              0              0             24
C/C++ Header                     1              6             18              3
YAML                             1              0              0              3
-------------------------------------------------------------------------------
SUM:                             7             93            134            466
-------------------------------------------------------------------------------
```

These numbers do not include SocketServer's dependencies, including but not limited to BlueSSLService.

That is a net change of 276 lines to smolver production code, 1345 lines to smolver unit tests, and 111 lines to SocketServer, for a total increase of 1732 lines. Incidentally, this means this feature added more lines of code than even existed in the previous version.