I recently gave a guest lecture on operability at University College, London (UCL) for the MSc in Software Systems Engineering students. This post is a summary of part of that talk.
Summary: with little access to underlying systems, Serverless and IoT platforms should encourage us to make good application logging a fundamental practice to enhance operability.
Serverless is an emerging “new normal” for web-based applications and services, where the unit of deployment and billing is a single function a few lines of code in length. Serverless enables radical transparency in billing and radical ease of use for developers; when combined with marketplace (“app store”) offerings from cloud vendors like AWS, which allow teams to compose applications from pre-built functions rather than writing everything themselves, Serverless provides hugely accelerated development times and (often) reduced operational cost. With no access to the underlying servers (hence “Serverless”) and therefore little infrastructure to maintain, organisations can focus on delivering differentiating business value.
Likewise, the IoT (connected devices) landscape is rapidly maturing, with many comprehensive IoT platforms available with both a consumer focus (from AWS, Azure, etc.) and an industrial focus (Bosch, GE, Siemens). There are also some nice crossovers from the cloud sector, such as the Resin container platform for IoT (resin.io) with its comprehensive device management and software update capabilities (if you’re into IoT, check out Etcher from Resin.io with its very cool multi-device flashing).
All these platforms allow development teams to focus on core business logic, reducing the need to deal with the underlying hardware infrastructure. However, to get the best out of these evolving platforms, organisations are wise to invest in good logging approaches to improve operability and avoid their software becoming the “new legacy”.
We need good logging for Serverless and IoT systems partly because we have little access to the physical infrastructure. Good logging becomes the way in which we see how the software is behaving. Logging is the way in which we detect anomalies and race conditions, and logging is the way in which we trace transactions across machine boundaries. Using the analogy of parcel delivery tracking, we identify and log discrete states reached in the software (ArrivedAtDepot, InTransit, Delivered) and also trace a single “journey” through the code.
The days of “logorrhoea” (careless, overly verbose logging) should be behind us now. Instead, good logging uses a curated set of distinct software states — agreed through collaboration between development and operations teams — to define what gets logged.
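The curated-states idea can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the event names reuse the parcel-tracking states from above, and the `journey_id` field is an assumed convention for tracing one transaction across machine boundaries.

```python
import json
import logging
import uuid
from enum import Enum

# A curated, agreed set of event IDs (hypothetical names, mirroring
# the parcel-tracking analogy) rather than ad-hoc free-text messages.
class EventID(Enum):
    ARRIVED_AT_DEPOT = "ArrivedAtDepot"
    IN_TRANSIT = "InTransit"
    DELIVERED = "Delivered"

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("parcels")

def log_event(event_id: EventID, journey_id: str, **details) -> str:
    """Emit one structured log line: a known event ID plus a journey ID
    so the whole transaction can be searched in the log aggregator."""
    line = json.dumps({"event": event_id.value,
                       "journey_id": journey_id,
                       **details})
    logger.info(line)
    return line

journey = str(uuid.uuid4())  # one ID for the whole "journey"
log_event(EventID.ARRIVED_AT_DEPOT, journey, depot="LDN-04")
log_event(EventID.IN_TRANSIT, journey)
log_event(EventID.DELIVERED, journey, signed_by="recipient")
```

Because the event IDs are a closed, agreed set, operations teams can alert on exactly the states that matter instead of grepping free-form text.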
Exactly how the logging happens depends on the technology: with AWS Lambda (Amazon’s Serverless offering), we can use Kinesis Firehose to grab logs from Lambda functions and stream them to a central log aggregator for analysis and search. With IoT devices, we might run a local log shipper on the device (if we have enough local storage and CPU). Given that the IoT device might be only intermittently connected to the network and/or might have limited network bandwidth, we probably do some compression and aggregation before sending the logs to a remote log aggregator.
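For the intermittently connected IoT case, the batching and compression step might look something like this sketch. The `LogBuffer` class and its limits are assumptions for illustration; the shipping transport (HTTPS, MQTT, etc.) is deliberately left out.

```python
import gzip
import json
import time

class LogBuffer:
    """Minimal on-device batching sketch: accumulate log records
    locally, then hand back a gzip-compressed batch to be shipped
    upstream when the network is available."""

    def __init__(self, max_records: int = 100):
        self.max_records = max_records
        self.records = []

    def add(self, event: str, **details) -> None:
        self.records.append({"ts": time.time(), "event": event, **details})

    def is_full(self) -> bool:
        return len(self.records) >= self.max_records

    def flush(self) -> bytes:
        # Newline-delimited JSON, gzipped to suit constrained links;
        # the remote aggregator can gzip-decompress and parse each line.
        payload = "\n".join(json.dumps(r) for r in self.records)
        self.records = []
        return gzip.compress(payload.encode("utf-8"))

buf = LogBuffer(max_records=50)
buf.add("SensorRead", value=21.5)
buf.add("SensorRead", value=21.7)
compressed = buf.flush()  # send this blob when connectivity returns
```

Buffering to a size (or age) threshold and compressing the batch trades a little latency for far fewer, smaller transmissions, which suits devices with limited bandwidth or battery.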
The goal of good logging is rich operational awareness for development and operations teams; by aggregating and searching carefully curated log data, teams gain understanding of how their systems work “in the wild” and can respond rapidly to unexpected events — essential for effective operability for Serverless and IoT.
Learn more about operability in the Team Guide to Software Operability by Matthew Skelton and Rob Thatcher.
Free sample chapter available from operabilitybook.com