Types of Logs
Let's start with the question "what's a log?" I'll take the definition from here:
an append-only, totally-ordered sequence of records ordered by time.
I would extend this definition to:
an append-only, totally-ordered sequence of records ordered by time generated by engineering systems
Engineering systems, such as the systems supporting websites, games and jets, produce data that is required for many use cases across a company, including:
- Machine Learning
- Reporting
- Debugging
This definition differs from many other data sets used by websites in that it is append-only: records are added to the end but never updated or deleted. The words "log" and "logging" are given many definitions, and conversations can easily get muddled. My goal here is to tease out the differences between the types of logs so that we can communicate more clearly. By the end, I hope we stop calling anything simply a "log", since that word would apply to every category below.
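To make the definition concrete, here is a minimal in-memory sketch of an append-only, totally-ordered log. The `Record` and `AppendOnlyLog` names are hypothetical, and enforcing non-decreasing timestamps is an assumption about how "ordered by time" is maintained; real systems persist and replicate records, but the contract is the same: records are only added to the end, and each gets an offset that fixes its place in the order.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass(frozen=True)
class Record:
    offset: int       # position in the log; defines the total order
    timestamp: float  # when the record was appended (seconds since epoch)
    payload: Any      # the data carried by the record


class AppendOnlyLog:
    """Hypothetical in-memory log: append and read, never update or delete."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def append(self, timestamp: float, payload: Any) -> int:
        """Add a record to the end of the log and return its offset."""
        # Assumption: appends may not go back in time, so offset order == time order.
        if self._records and timestamp < self._records[-1].timestamp:
            raise ValueError("timestamps must be non-decreasing")
        offset = len(self._records)
        self._records.append(Record(offset, timestamp, payload))
        return offset

    def read_from(self, offset: int = 0) -> Iterator[Record]:
        """Replay records in order, starting at the given offset."""
        yield from self._records[offset:]


log = AppendOnlyLog()
first = log.append(1.0, "user signed up")
second = log.append(2.0, "user placed order")
replayed = [r.payload for r in log.read_from(1)]
```

Because `Record` is frozen and `read_from` yields from a copy of the list, readers can replay history but cannot rewrite it.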
I propose that there are three categories of logs:
Metrics
Metrics are a series of records that follow a common pattern:
- Name
- Timestamp
- Value
Standard examples include:
- API latency
- Count of failures
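The name/timestamp/value pattern maps directly onto a small record type. This sketch builds the two standard examples; the `Metric` class, the helper functions and the dotted naming scheme (`api.<endpoint>.latency_ms`, `api.failures.count`) are hypothetical choices for illustration, not any particular metrics library's API.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str         # what is being measured (hypothetical dotted naming scheme)
    timestamp: float  # when it was measured (seconds since epoch)
    value: float      # the measured value


def record_latency(endpoint: str, latency_ms: float) -> Metric:
    """API latency: one sample per request to an endpoint."""
    return Metric(name=f"api.{endpoint}.latency_ms", timestamp=time.time(), value=latency_ms)


def record_failures(count: int) -> Metric:
    """Count of failures observed in some interval."""
    return Metric(name="api.failures.count", timestamp=time.time(), value=float(count))
```

Each record is just three small fields, which is what makes metrics cheap to store and fast to aggregate.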
Development teams monitor metrics to track a system's uptime and to decide where changes need to be made.
Is the system working well? Look at the plot of errors over time.
Which endpoint is slow? Look at the latency of all endpoints and see which one has the highest value.
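Both questions reduce to simple aggregations over metric records. This is a sketch under assumptions: the `Metric` type and metric names are hypothetical, errors are bucketed per minute for plotting, and "highest value" is interpreted as the highest mean latency (a percentile such as p99 would be an equally reasonable choice).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str         # e.g. "api.failures.count" or "api.<endpoint>.latency_ms"
    timestamp: float  # seconds since epoch
    value: float


def errors_per_minute(samples: list[Metric]) -> dict[int, float]:
    """Sum the failure-count metric into one-minute buckets, ready to plot."""
    buckets: dict[int, float] = defaultdict(float)
    for m in samples:
        if m.name == "api.failures.count":
            buckets[int(m.timestamp // 60)] += m.value
    return dict(sorted(buckets.items()))


def slowest_endpoint(samples: list[Metric]) -> str:
    """Return the endpoint whose latency samples have the highest mean."""
    latencies: dict[str, list[float]] = defaultdict(list)
    for m in samples:
        if m.name.startswith("api.") and m.name.endswith(".latency_ms"):
            endpoint = m.name.removeprefix("api.").removesuffix(".latency_ms")
            latencies[endpoint].append(m.value)
    return max(latencies, key=lambda e: sum(latencies[e]) / len(latencies[e]))


# Illustrative samples, not real measurements.
samples = [
    Metric("api.failures.count", 0.0, 2.0),
    Metric("api.failures.count", 30.0, 1.0),
    Metric("api.failures.count", 65.0, 4.0),
    Metric("api.search.latency_ms", 0.0, 100.0),
    Metric("api.search.latency_ms", 1.0, 300.0),
    Metric("api.login.latency_ms", 2.0, 150.0),
]
error_plot = errors_per_minute(samples)
slowest = slowest_endpoint(samples)
```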
Metric logging is an essential part of understanding the real-time behavior of a system.
Errors
When a piece of software hits an exception and throws an error, the error message and stack trace are needed to fix it.
Software systems often have a variety of logging levels, such as DEBUG, INFO and ERROR. In practice, production systems generally emit none of these standard logs except ERROR. An ERROR typically means an exception has occurred and should be corrected as soon as possible.
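Python's standard `logging` module shows this production pattern: with the logger level set to ERROR, DEBUG and INFO calls are discarded, and `logger.exception` records the error together with its stack trace. The logger name `checkout_service` and the messages are hypothetical, and output goes to an in-memory buffer only so the example is self-contained.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("checkout_service")  # hypothetical service name
logger.addHandler(handler)
logger.propagate = False        # keep the example's output out of the root logger
logger.setLevel(logging.ERROR)  # production-style: only ERROR and above pass

logger.debug("cart contents: %s", ["sku-1"])  # dropped: below ERROR
logger.info("payment request sent")           # dropped: below ERROR

try:
    order_total = 100 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the current traceback.
    logger.exception("failed to compute order total")

output = buffer.getvalue()
```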
Sometimes metrics are calculated on top of error logs, but metrics should be standalone and independent of them: this improves speed and efficiency and lowers the cost of storing them. Why derive a metric by parsing thousands of error strings when you could simply have the value "12789" stored and ready to read?
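The contrast can be sketched in a few lines. Deriving an error count from logs means scanning and parsing every line on every query; maintaining the metric directly costs one integer increment per error. The log line format, the `count_errors_from_logs` helper and the `ErrorCounter` class are all hypothetical.

```python
from collections import Counter


def count_errors_from_logs(lines: list[str]) -> Counter:
    """Derive per-service error counts by parsing every error log line."""
    counts: Counter = Counter()
    for line in lines:
        # Assumed line format: "<time> <LEVEL> <service>: <message>"
        _, level, rest = line.split(" ", 2)
        if level == "ERROR":
            counts[rest.split(":", 1)[0]] += 1
    return counts


class ErrorCounter:
    """Maintain the metric directly: one increment per error, no parsing later."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_error(self, service: str) -> None:
        self.counts[service] += 1


lines = [
    "12:00:01 ERROR checkout: timeout talking to payments",
    "12:00:02 ERROR search: index unavailable",
    "12:00:05 ERROR checkout: timeout talking to payments",
]
derived = count_errors_from_logs(lines)

counter = ErrorCounter()
for service in ["checkout", "search", "checkout"]:
    counter.record_error(service)
```

Both approaches give the same answer, but only the first has to touch every error string to get it.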
Error logging is an essential part of correcting fatal errors in a system.
Events
An event is a record of what happened and when, along with any associated data required to understand the full extent of the event.
Examples include:
- New user created: user ID, timestamp, user type
- Recommendations are available for loading: in systems where product recommendations are produced in batch mode, this is an efficient way to tell a production system to load the new data.
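One way to represent the two examples is a generic event record serialized as JSON. The `Event` class, the event type strings and the payload fields (`user_id`, `user_type`, `batch_id`) are hypothetical; the point is that each record carries what happened, when, and the context needed to interpret it.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    event_type: str       # what happened
    data: dict[str, Any]  # the context needed to understand it
    timestamp: float = field(default_factory=time.time)  # when it happened
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        return json.dumps(asdict(self))


user_created = Event("user_created", {"user_id": "u-123", "user_type": "free"})
recs_ready = Event("recommendations_available", {"batch_id": "batch-42"})
```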
Event-driven architectures are commonplace and are typically powered by a queuing system such as Kafka, which allows many listeners to pick and choose the events they need to power their specific services.
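The "pick and choose" idea can be illustrated with a tiny in-memory publish/subscribe sketch. This is a stand-in for a broker such as Kafka, not Kafka's API; the `EventBus` class and event names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """In-memory stand-in for a message broker (not any real client library)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a listener for one type of event."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, data: dict[str, Any]) -> None:
        """Deliver an event to every listener that asked for this type."""
        for handler in self._subscribers[event_type]:
            handler(data)


bus = EventBus()
welcome_emails: list[str] = []
loaded_batches: list[str] = []

# Each service subscribes only to the events it needs.
bus.subscribe("user_created", lambda d: welcome_emails.append(d["user_id"]))
bus.subscribe("recommendations_available", lambda d: loaded_batches.append(d["batch_id"]))

bus.publish("user_created", {"user_id": "u-123"})
bus.publish("recommendations_available", {"batch_id": "batch-42"})
bus.publish("page_viewed", {"page": "/home"})  # no subscribers, so no one receives it
```

One important difference: this sketch delivers events only at publish time, whereas Kafka retains events in a log so consumers can read, and re-read, them from a stored offset.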
These systems are very powerful and well suited to many data architectures, such as:
- Real time data science model execution
- Reporting decoupled from production application databases
These events are almost always needed in perpetuity: data science teams use them to build new models, and reporting relies on them to understand systems and metrics.
In many new AI/LLM-powered architectures, event logging can also be used to track what actually happened: which input produced which output, using which LLM. This can greatly aid in understanding AI systems that make many thousands of decisions.
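A minimal sketch of such an event log, assuming one JSON object per line in an append-only file. The `LLMDecisionEvent` fields, the model identifiers and the file name are hypothetical; the idea is that each record ties an input to its output and to the model that produced it, so decisions can be replayed and audited later.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class LLMDecisionEvent:
    model: str   # which LLM produced the output (hypothetical identifier)
    prompt: str  # the input that was sent
    output: str  # what the model returned
    timestamp: float = field(default_factory=time.time)


def append_event(path: str, event: LLMDecisionEvent) -> None:
    """Append one event as a JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")


def events_for_model(path: str, model: str) -> list[dict]:
    """Replay the log to see which inputs led to which outputs for one model."""
    with open(path, encoding="utf-8") as f:
        return [e for e in map(json.loads, f) if e["model"] == model]


log_path = os.path.join(tempfile.mkdtemp(), "llm_events.jsonl")
append_event(log_path, LLMDecisionEvent("model-a", "Summarize order 17", "Order 17 has 2 items"))
append_event(log_path, LLMDecisionEvent("model-b", "Classify this ticket", "billing"))
model_a_events = events_for_model(log_path, "model-a")
```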
Conclusion
Append-only data sets are common across websites and are often called "logs" or "logging", when in reality development teams are generally talking about metrics, errors or events.
Each type of log has its own use cases and stakeholders, and each should be considered an essential building block of production systems. Being able to break down a requested record into the type of log it requires makes understanding and implementation much more straightforward, and the approach can be generalized.