There are two fundamental advances with the structured approach that can’t be emulated using text logs without (sometimes extreme levels of) additional effort.
Event Types
When you write two events with log4net like:
log.DebugFormat("Disk quota {0} exceeded by user {1}", 100, "DTI-Matt");
log.DebugFormat("Disk quota {0} exceeded by user {1}", 150, "nblumhardt");
These will produce similar text:
Disk quota 100 exceeded by user DTI-Matt
Disk quota 150 exceeded by user nblumhardt
But, as far as machine processing is concerned, they’re just two lines of different text.
You may wish to find all “disk quota exceeded” events, but the simplistic case of looking for events like 'Disk quota%'
will fall down as soon as another event occurs looking like:
Disk quota 100 set for user DTI-Matt
Text logging throws away the information we initially have about the source of the event, and it has to be reconstructed when reading the logs, usually with increasingly elaborate match expressions.
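To see the false-positive problem concretely, here is a small sketch (Python standing in for whatever query tool you use) of a prefix match behaving like 'Disk quota%', and the ever more elaborate regex you end up needing:

```python
import re

lines = [
    "Disk quota 100 exceeded by user DTI-Matt",
    "Disk quota 150 exceeded by user nblumhardt",
    "Disk quota 100 set for user DTI-Matt",  # unrelated event, same prefix
]

# A LIKE 'Disk quota%' query is equivalent to a prefix match,
# so it cannot tell "exceeded" apart from "set for":
prefix_hits = [l for l in lines if l.startswith("Disk quota")]
assert len(prefix_hits) == 3  # the "set for" line is a false positive

# Excluding it forces a hand-written, exact-wording pattern:
pattern = re.compile(r"^Disk quota (\d+) exceeded by user (\S+)$")
exceeded = [l for l in lines if pattern.match(l)]
assert len(exceeded) == 2
```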
By contrast, when you write the following two Serilog events:
log.Debug("Disk quota {Quota} exceeded by user {Username}", 100, "DTI-Matt");
log.Debug("Disk quota {Quota} exceeded by user {Username}", 150, "nblumhardt");
These produce similar text output to the log4net version, but behind the scenes, the "Disk quota {Quota} exceeded by user {Username}"
message template is carried by both events.
With an appropriate sink, you can later write queries where MessageTemplate="Disk quota {Quota} exceeded by user {Username}"
and get exactly the events where the disk quota was exceeded.
It’s not always convenient to store the entire message template with every log event, so some sinks hash the message template into a numeric EventType value (e.g. 0x1234abcd), or you can add an enricher to the logging pipeline to do this yourself.
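As a sketch of the hashing idea — the actual hash each sink or enricher uses will differ; CRC32 here is just an assumption for illustration:

```python
import zlib

def event_type(message_template: str) -> int:
    # One possible scheme: a 32-bit hash of the raw template text.
    return zlib.crc32(message_template.encode("utf-8"))

template = "Disk quota {Quota} exceeded by user {Username}"

# Both events share the template, so they share the event type,
# regardless of the Quota/Username values rendered into the text.
assert event_type(template) == event_type(template)
print(f"EventType = 0x{event_type(template):08x}")
```

The key property is that the type is derived from the template, not from the rendered message, so every "disk quota exceeded" event gets the same numeric tag.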
This is subtler than the next difference below, but massively powerful when dealing with large log volumes.
Structured Data
Again considering the two events about disk space usage, it may be easy enough using text logs to query for a particular user with like 'Disk quota%' and like '%DTI-Matt%'.
But production diagnostics aren’t always so straightforward. Imagine it’s necessary to find events where the disk quota exceeded was below 125 MB.
With Serilog, this is possible in most sinks using a variant of:
Quota < 125
Constructing this kind of query from a regular expression is possible, but it gets tiring fast and usually ends up being a measure of last resort.
Now add to this an event type:
Quota < 125 and EventType = 0x1234abcd
You start to see here how these capabilities combine in a straightforward way to make production debugging with logs feel like a first-class development activity.
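To illustrate how these filters combine, here is a sketch over an in-memory list of structured events; the event-type values and the event shapes are hypothetical stand-ins for what a sink would store:

```python
# Structured events keep their properties as first-class, typed values,
# so "Quota < 125" is an ordinary comparison, not a regex capture.
events = [
    {"EventType": 0x1234ABCD, "Quota": 100, "Username": "DTI-Matt"},
    {"EventType": 0x1234ABCD, "Quota": 150, "Username": "nblumhardt"},
    {"EventType": 0x99999999, "Quota": 100, "Username": "DTI-Matt"},  # quota *set*, not exceeded
]

# Equivalent of: Quota < 125 and EventType = 0x1234abcd
hits = [e for e in events
        if e["EventType"] == 0x1234ABCD and e["Quota"] < 125]
assert [e["Username"] for e in hits] == ["DTI-Matt"]
```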
One further benefit, perhaps not as easy to prevent up front, but once production debugging has been lifted out of the land of regex hackery, developers start to value logs a lot more and exercise more care and consideration when writing them. Better logs -> better quality applications -> more happiness all around.
When you are collecting logs for processing, be it for parsing into some database and/or searching through the processed logs later, using structured logging makes some of the processing easier and more efficient. The parser can take advantage of the known structure (e.g. JSON, XML, ASN.1, whatever) and use state machines for parsing, as opposed to regular expressions, which can be relatively expensive to compile and execute. Parsing of free-form text, such as that suggested by your coworker, tends to rely on regular expressions, and to rely on that text not changing. This can make parsing of free-form text rather fragile, i.e. parsing is tightly coupled to the exact text in the code.
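A minimal sketch of the contrast, assuming JSON as the structured format:

```python
import json
import re

structured = '{"level": "DEBUG", "user": "DTI-Matt", "error_type": "disk", "quota": 100}'
free_form = "Disk quota 100 exceeded by user DTI-Matt"

# Structured: one general-purpose parser; fields come out named and typed.
event = json.loads(structured)
assert event["user"] == "DTI-Matt" and event["quota"] == 100

# Free-form: a hand-written regex, tightly coupled to the exact wording.
m = re.match(r"Disk quota (\d+) exceeded by user (\S+)", free_form)
assert m is not None and m.group(2) == "DTI-Matt"

# Reword the message in the code and the regex silently stops matching:
assert re.match(r"Disk quota (\d+) exceeded by user (\S+)",
                "User DTI-Matt exceeded disk quota 100") is None
```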
Consider also the search/lookup case, e.g.:
SELECT text FROM logs WHERE text LIKE '%Disk quota%';
LIKE conditions require comparisons against every text row value; again, this is relatively computationally expensive, particularly when wildcards are used:
SELECT text FROM logs WHERE text LIKE 'Disk %';
With structured logging, your disk-error related log message might look like this in JSON:
{ "level": "DEBUG", "user": "username", "error_type": "disk", "text": "Disk quota ... exceeded by user ..." }
The fields of this kind of structure can map pretty easily to e.g. SQL table column names, which in turn means the lookup can be more specific and granular:
SELECT user, text FROM logs WHERE error_type = 'disk';
You can place indexes on the columns whose values you expect to search frequently, as long as you don’t use LIKE clauses for those column values. The more you can break down your log message into specific categories, the more targeted you can make your lookup. For example, in addition to the error_type field/column in the example above, you could even have "error_category": "disk", "error_type": "quota" or somesuch.
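As an illustrative sketch of that mapping, here it is end to end in SQLite (the table, index, and row values are made up for the example):

```python
import sqlite3

# In-memory table whose columns mirror the structured log fields.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE logs (level TEXT, user TEXT, error_type TEXT, text TEXT)")
db.execute("CREATE INDEX idx_error_type ON logs (error_type)")  # equality lookups can use this

rows = [
    ("DEBUG", "DTI-Matt",   "disk", "Disk quota 100 exceeded by user DTI-Matt"),
    ("DEBUG", "nblumhardt", "disk", "Disk quota 150 exceeded by user nblumhardt"),
    ("INFO",  "DTI-Matt",   "auth", "User DTI-Matt logged in"),
]
db.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)

# Granular, index-friendly lookup: no LIKE, no full-text scan.
hits = db.execute("SELECT user, text FROM logs WHERE error_type = 'disk'").fetchall()
assert sorted(u for u, _ in hits) == ["DTI-Matt", "nblumhardt"]
```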
The more structure you have in your log messages, the more your parsing/searching systems (such as fluentd, elasticsearch, kibana) can take advantage of that structure, and perform their tasks with greater speed and less CPU/memory.
Hope this helps!
You won’t find much benefit from structured logging when your app creates a few hundred log messages per day. You definitely will when you have a few hundred log messages per second coming from many different deployed apps.
Relatedly, the setup where log messages end up in the ELK Stack is also appropriate at the scale where logging to SQL becomes a bottleneck.
I have seen the setup of “basic logging and searching” with SQL select .. like and regexes pushed to its limits, where it falls apart: there are false positives, omissions, horrible filter code with known bugs that’s hard to maintain and that no one wants to touch, new log messages that don’t follow the filter’s assumptions, reluctance to touch logging statements in code lest they break reports, etc.
So several software packages are emerging to deal with this problem in a better way. There is Serilog; I hear that the NLog team is looking at it; and we wrote StructuredLogging.Json for NLog. I also see that the new ASP.NET Core logging abstractions “make it possible for logging providers to implement … structured logging”.
An example with StructuredLogging. You log to an NLog logger like this:
logger.ExtendedError("Order send failed", new { OrderId = 1234, RestaurantId = 4567 } );
This structured data goes to kibana. The value 1234 is stored in the OrderId field of the log entry. You can then search, using kibana query syntax, for e.g. all log entries where @LogType:nlog AND Level:Error AND OrderId:1234.
Message and OrderId are now just fields that can be searched for exact or inexact matches as you need, or aggregated for counts. This is powerful and flexible.
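A sketch of the same search and aggregation over in-memory entries, mimicking what kibana does against the indexed fields (the entry shape is an assumption for illustration):

```python
from collections import Counter

# Hypothetical log entries as they might be indexed after ExtendedError calls.
entries = [
    {"LogType": "nlog", "Level": "Error", "Message": "Order send failed",
     "OrderId": 1234, "RestaurantId": 4567},
    {"LogType": "nlog", "Level": "Error", "Message": "Order send failed",
     "OrderId": 1234, "RestaurantId": 4567},
    {"LogType": "nlog", "Level": "Error", "Message": "Order send failed",
     "OrderId": 9999, "RestaurantId": 4567},
]

# Equivalent of: @LogType:nlog AND Level:Error AND OrderId:1234
hits = [e for e in entries
        if e["LogType"] == "nlog" and e["Level"] == "Error" and e["OrderId"] == 1234]
assert len(hits) == 2

# Fields can also be aggregated for counts:
by_order = Counter(e["OrderId"] for e in entries)
assert by_order[1234] == 2 and by_order[9999] == 1
```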
From the StructuredLogging.Json best practices:
The message logged should be the same every time. It should be a constant string, not a string formatted to contain data values such as ids or quantities. Then it is easy to search for.
The message logged should be distinct, i.e. not the same as the message produced by an unrelated log statement. Then searching for it does not match unrelated things as well.