Happy Structured Logging in R

Introduction to Structured Logging in R (構造化ログ入門.R)

Atusy

Why logging?

To record and help identify (potential) issues, and to drive improvements.

  • unexpected behaviors, typically errors
  • performance issues
  • security issues
  • auditing and compliance

The logger package helps you.

Why not print?

  • print-debugging is not suitable for long-term maintenance.
  • print-logging lacks key features:
    • generating analyzable logs
    • thresholding by log level
    • controlling output destinations
    • distinguishing program output from logging (see the sketch below)
      • typically matters for CLI tools
      • less important in R

Avoid reinventing logging with print unless you do so intentionally.
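As a quick illustration of the last feature, here is a minimal sketch (assuming logger's default console appender, which writes to stderr): program output and log records land on separate streams, so they can be redirected independently.

# Program output goes to stdout, log records go to stderr
# (assumes the default console appender writes to stderr).
result <- sum(1:10)
logger::log_info("Computed result")  # log stream (stderr)
cat(result, "\n")                    # program output (stdout)

# e.g., `Rscript script.R > result.txt` keeps the logs visible
# on the terminal while only the result lands in the file.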

Generate analyzable logs

  • has traceable context
    • timestamp
    • log level
    • message
  • has a machine-readable format (e.g., JSON Lines)
{"time":"2024-12-03 22:38:17","level":"INFO","ns":"japanr2024","message":"Started session","session_id":"01JE6BWEWP0BAAFGPZTZ9GBNM6"}
{"time":"2024-12-03 22:38:17","level":"INFO","ns":"japanr2024","message":"Received request","session_id":"01JE6BWEWP0BAAFGPZTZ9GBNM6","trace_id":"01JE6BWEXF5F6HDD4YFZBP157F"}
{"time":"2024-12-03 22:38:17","level":"DEBUG","ns":"japanr2024","message":"Received request parameters","value":{"locations":"tokyo, london"},"session_id":"01JE6BWEWP0BAAFGPZTZ9GBNM6","trace_id":"01JE6BWEXF5F6HDD4YFZBP157F"}
{"time":"2024-12-03 22:38:17","level":"DEBUG","ns":"japanr2024","message":"Locations parsed","value":["tokyo","london"],"session_id":"01JE6BWEWP0BAAFGPZTZ9GBNM6","trace_id":"01JE6BWEXF5F6HDD4YFZBP157F"}
{"time":"2024-12-03 22:38:17","level":"DEBUG","ns":"japanr2024","message":"Processing location","value":"tokyo","span_id":"01JE6BWEXSW65DD1EC858XMZQ3","session_id":"01JE6BWEWP0BAAFGPZTZ9GBNM6","trace_id":"01JE6BWEXF5F6HDD4YFZBP157F"}
{"time":"2024-12-03 22:38:17","level":"DEBUG","ns":"japanr2024","message":"Plotting weather forecast","span_id":"01JE6BWEXSW65DD1EC858XMZQ3","session_id":"01JE6BWEWP0BAAFGPZTZ9GBNM6","trace_id":"01JE6BWEXF5F6HDD4YFZBP157F"}

R-equivalent of JSON Lines

Each line corresponds to a named list.

list(time = "2024-12-03 22:38:17", level = "INFO", ns = "japanr2024", message = "Started session", session_id = "01JE6BWEWP0BAAFGPZTZ9GBNM6")
list(time = "2024-12-03 22:38:17", level = "INFO", ns = "japanr2024", message = "Received request", session_id = "01JE6BWEWP0BAAFGPZTZ9GBNM6", trace_id = "01JE6BWEXF5F6HDD4YFZBP157F")
list(time = "2024-12-03 22:38:17", level = "DEBUG", ns = "japanr2024", message = "Received request parameters", value = {locations = "tokyo, london"}, session_id = "01JE6BWEWP0BAAFGPZTZ9GBNM6", trace_id = "01JE6BWEXF5F6HDD4YFZBP157F")
list(time = "2024-12-03 22:38:17", level = "DEBUG", ns = "japanr2024", message = "Locations parsed", value = ["tokyo", "london"], session_id = "01JE6BWEWP0BAAFGPZTZ9GBNM6", trace_id = "01JE6BWEXF5F6HDD4YFZBP157F")
list(time = "2024-12-03 22:38:17", level = "DEBUG", ns = "japanr2024", message = "Processing location", value = "tokyo", span_id = "01JE6BWEXSW65DD1EC858XMZQ3", session_id = "01JE6BWEWP0BAAFGPZTZ9GBNM6", trace_id = "01JE6BWEXF5F6HDD4YFZBP157F")
list(time = "2024-12-03 22:38:17", level = "DEBUG", ns = "japanr2024", message = "Plotting weather forecast", span_id = "01JE6BWEXSW65DD1EC858XMZQ3", session_id = "01JE6BWEWP0BAAFGPZTZ9GBNM6", trace_id = "01JE6BWEXF5F6HDD4YFZBP157F")

Threshold by log level

  • For production
    • FATAL: when an error stops the program
    • ERROR: when an error is caught and the program continues running
    • WARN: when an unexpected behavior occurs without an error
    • INFO: various information for debugging/monitoring purposes
  • Additionally for development
    • DEBUG: more detailed information for debugging purposes
logger::log_threshold("INFO") # DEBUG logs are ignored

logger::log_info(message = "Program started.")

params <- list(x = 1)
logger::log_debug(message = "Received parameters", value = params) # suppressed: below the INFO threshold

Control output destinations

  • console (stdout/stderr)
  • file
  • DB
  • SaaS (e.g., Datadog, New Relic, …)
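A minimal sketch with logger's built-in appenders (the file path is just an example; database and SaaS destinations usually need a custom appender function or an extension package):

logger::log_appender(logger::appender_stdout)                  # console (stdout)
logger::log_appender(logger::appender_file("logs/app.jsonl"))  # file
logger::log_appender(logger::appender_tee("logs/app.jsonl"))   # console and file at once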

More about analyzable logs

Previous sections introduced some key features of logging.

Let’s dive into the content of logs.

What is the preferred format?

Typically JSON Lines.

{"time":"2024-12-04 21:07:28","level":"INFO","message":"Program started."}
{"time":"2024-12-04 21:07:28","level":"WARN","message":"Some warning."}

Packages such as jsonlite, yyjsonr, or duckdb can then parse them into a data frame, as sketched below.

time                 level  message
2024-12-04 21:07:28  INFO   Program started.
2024-12-04 21:07:28  WARN   Some warning.
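For instance, a hedged sketch with yyjsonr (also used later in this talk; the file name is an assumption):

logs <- yyjsonr::read_ndjson_file("app.jsonl")  # one row per log record
dplyr::count(logs, level)                       # e.g., tally records by level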

What about human-readable format?

Human-readable logs require self-maintained parsers…

2024-12-04 21:07:28 [INFO] Program started
2024-12-04 21:07:28 [WARN] Some warning.

Can we use both?

YES!

  • human readable logs to console
  • machine readable logs to file

But you need advanced configurations.
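One possible configuration, as a sketch: logger's index argument attaches several layout/appender pairs to the same namespace, so a single log call fans out to both destinations (the file name is an example, and the console line keeps its JSON payload as a compromise).

# Index 1: readable line to the console, e.g. "INFO [time] {...}"
logger::log_formatter(logger::formatter_json, index = 1)
logger::log_layout(logger::layout_simple, index = 1)
logger::log_appender(logger::appender_console, index = 1)

# Index 2: pure JSON Lines to a file
logger::log_formatter(logger::formatter_json, index = 2)
logger::log_layout(logger::layout_json_parser(c("time", "level", "ns")), index = 2)
logger::log_appender(logger::appender_file("logs/app.jsonl"), index = 2)

logger::log_info(message = "Program started.")  # written to both destinations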

How to design log schema?

key         type    description                    example
time        string  timestamp                      2024-12-03 22:38:17
level       string  log level                      INFO
ns          string  name of package, project, …    japanr2024
message     string  static message                 Received request
value       any     value related to the message   list(input = "tokyo")
session_id  string  unique per access              01JE6BWEWP0BAAFGPZTZ9GBNM6
trace_id    string  unique per request             01JE6BWEXF5F6HDD4YFZBP157F
span_id     string  unique per task                01JE6BWEXSW65DD1EC858XMZQ3
context     list    (optional) additional context  list(...)
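With the JSON formatter shown later in this talk, extra named arguments to a log call become top-level keys, so a call that follows this schema might look like the sketch below (the identifier values are placeholders):

session_id <- "01JE6BWEWP0BAAFGPZTZ9GBNM6"  # placeholder
trace_id   <- "01JE6BWEXF5F6HDD4YFZBP157F"  # placeholder

logger::log_info(
  message = "Received request",
  value = list(input = "tokyo"),
  session_id = session_id,
  trace_id = trace_id
)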

Use static message and dynamic value

# BAD: dynamic message
x <- 1
logger::log_info(message = glue::glue("x is {x}"))
{"time":"2024-12-07 14:08:20","level":"INFO","message":"x is 1"}
# GOOD: static message and dynamic value are tidyverse-friendly
logger::log_info(message = "x is defined", value = x)
{"time":"2024-12-07 14:08:20","level":"INFO","message":"x is defined","value":1}

Cautions for dynamic value

  • do not include sensitive information
  • do not include overly large data (see the sketch below)
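One way to enforce both cautions is to sanitize the value before logging. This is a hypothetical helper: the dropped field names and the size limit are assumptions, not part of the original app.

# Hypothetical helper: drop assumed-sensitive fields, refuse huge payloads
sanitize_value <- function(x, drop = c("password", "token"), max_bytes = 10000) {
  if (is.list(x) && !is.null(names(x))) {
    x <- x[setdiff(names(x), drop)]
  }
  if (utils::object.size(x) > max_bytes) {
    return("<omitted: value too large>")
  }
  x
}

params <- list(locations = "tokyo, london", token = "secret")
logger::log_debug(message = "Received request parameters", value = sanitize_value(params))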

Add identifiers for traceability

  • session_id
    • R sessions that load the package
    • browser sessions in Shiny app
  • trace_id
    • function calls
    • user-interactions in Shiny app
  • span_id
    • various tasks within the same trace_id

These keywords typically appear in observability engineering.

Identifiers are often generated as ULIDs or UUIDv4 values.
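For example, UUIDv4 values can be generated with the uuid package (ULIDs, which additionally sort by creation time, are available from the CRAN ulid package); a minimal sketch:

session_id <- uuid::UUIDgenerate()  # one per session
trace_id   <- uuid::UUIDgenerate()  # one per request

logger::log_info(message = "Started session", session_id = session_id)
logger::log_info(message = "Received request", session_id = session_id, trace_id = trace_id)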

Add context for additional information

Optional data that helps you analyze the log.

For example, parent-child relationship between logs.

{
  ...,
  "span_id": "01JDNN0EK0715J048SHYW7XZ2V",
  "context": {
    "parent_span_id": "01JDNN0F76GR0398VJ4MMGBCY7",
  }
}
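In R, such a relationship can be attached by passing a nested list as the context argument (a sketch; the identifiers are placeholders and the nesting relies on the JSON formatter shown later):

parent_span_id <- "01JDNN0F76GR0398VJ4MMGBCY7"  # placeholder
span_id        <- "01JDNN0EK0715J048SHYW7XZ2V"  # placeholder

logger::log_debug(
  message = "Started sub-task",
  span_id = span_id,
  context = list(parent_span_id = parent_span_id)
)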

Recap of introduction to logging

  • Logging helps identify (potential) issues and drives improvements
  • Logs should be analyzable
    • Use structured logs in the JSON Lines format
    • Design a schema that improves observability

How to log with logger?

# Configure
logger::log_appender(logger::appender_stdout)
logger::log_formatter(logger::formatter_json)
logger::log_layout(logger::layout_json_parser(c("time", "level", "ns")))
# Log
logger::log_info(message = "Program started.")
{"time":"2024-12-07 14:08:20","level":"INFO","ns":"global","message":"Program started."}

logger in a package
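A sketch of what this typically looks like: when log calls are made from package code, logger tags them with the package's namespace (the ns field seen earlier), and the threshold can be tuned per namespace from outside. The function body here is an assumption; only the package name comes from the demo.

# Inside the package: plain calls are tagged with ns = "japanr2024"
run <- function() {
  logger::log_info(message = "Started session")
  # ...
}

# From outside the package: adjust the threshold for that namespace only
logger::log_threshold("DEBUG", namespace = "japanr2024")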

Demo of logging in a Shiny app

https://github.com/atusy/japanr-2024-logging

JAPANR2024_LOG_LEVEL=INFO R --no-save -e 'devtools::load_all(); run()'
JAPANR2024_LOG_LEVEL=DEBUG R --no-save -e 'devtools::load_all(); run()'
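The JAPANR2024_LOG_LEVEL variable suggests the app reads its threshold from the environment; a hedged sketch of how that could be wired up (the helper name and the INFO default are assumptions):

# Hypothetical helper, e.g. called when the app starts
set_log_level_from_env <- function() {
  level <- Sys.getenv("JAPANR2024_LOG_LEVEL", unset = "INFO")
  logger::log_threshold(level, namespace = "japanr2024")
}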

Examples of log analysis

logdata <- system.file("extdata/japanr2024.jsonl", package = "japanr2024") |>
  yyjsonr::read_ndjson_file() |>
  dplyr::mutate(time = clock::date_time_parse(time, "Asia/Tokyo")) |>
  dplyr::glimpse()
Output
#> Rows: 184
#> Columns: 8
#> $ time       <dttm> 2024-12-03 22:38:17, 2024-12-03 22:38:17, 2024-12-03 22:38…
#> $ level      <chr> "INFO", "INFO", "DEBUG", "DEBUG", "DEBUG", "DEBUG", "DEBUG"…
#> $ ns         <chr> "japanr2024", "japanr2024", "japanr2024", "japanr2024", "ja…
#> $ message    <chr> "Started session", "Received request", "Received request pa…
#> $ session_id <chr> "01JE6BWEWP0BAAFGPZTZ9GBNM6", "01JE6BWEWP0BAAFGPZTZ9GBNM6",…
#> $ trace_id   <chr> NA, "01JE6BWEXF5F6HDD4YFZBP157F", "01JE6BWEXF5F6HDD4YFZBP15…
#> $ value      <list> <NULL>, <NULL>, ["tokyo, london"], <"tokyo", "london">, "t…
#> $ span_id    <chr> NA, NA, NA, NA, "01JE6BWEXSW65DD1EC858XMZQ3", "01JE6BWEXSW6…

Error count

Code
logdata |>
  dplyr::summarize(
    time = min(time),
    errors = sum(level == "ERROR"),
    .by = "trace_id"
  ) |>
  ggplot2::ggplot() +
  ggplot2::aes(time, errors) +
  ggplot2::geom_point() +
  ggplot2::scale_x_datetime(name = "Requested at", date_breaks = "10 sec")

Oh no! Many requests suffer errors!

Response time

Code
logdata |>
  dplyr::reframe(
    start = min(time),
    duration = max(time) - start,
    locations = lengths(value[message == "Locations parsed"]),
    .by = "trace_id"
  ) |>
  ggplot2::ggplot() +
  ggplot2::aes(start, duration, color = locations) +
  ggplot2::geom_point() +
  ggplot2::scale_x_datetime(name = "Requested at", date_breaks = "10 sec") +
  ggplot2::scale_y_continuous(name = "Response time") +
  NULL

Hmm… users have to wait for several seconds…

Deep dive into errors and slow responses.

Code
logdata |>
  dplyr::filter(!is.na(span_id)) |>
  dplyr::summarize(
    start = min(time),
    end = max(time),
    location = unlist(value[message == "Processing location"]),
    error = "ERROR" %in% level,
    .by = c("trace_id", "span_id")
  ) |>
  dplyr::mutate(
    location = forcats::fct_relevel(location, rev(unique(location)))
  ) |>
  ggplot2::ggplot() +
  ggplot2::aes(
    x = start, xend = end,
    y = location, yend = location,
    color = error,
  ) +
  ggplot2::geom_point() +
  ggplot2::geom_segment() +
  ggplot2::facet_grid(rows = "trace_id", scales = "free_y") +
  ggplot2::theme(
    strip.text.y = ggplot2::element_blank(),
    axis.title.x = ggplot2::element_blank()
  ) +
  ggplot2::scale_x_datetime(name = "", date_breaks = "10 sec")

Insights from the analysis

  • Fast user inputs cause too many requests
    • Same locations are processed multiple times
  • Invalid/unsupported locations cause errors
    • These errors are safe to ignore

Ideas to solve the issues

  • Improve performance by
    • memoising slow functions
  • Suppress unimportant error logs by
    • skipping obviously unimportant input (an empty string or a single-character string)
    • handling errors with try-catch

Memoising slow functions

# Memoised versions of openmeteo functions
geocode <- memoise::memoise(openmeteo::geocode)
weather_forecast <- memoise::memoise(
  openmeteo::weather_forecast,
  # Invalidate cache daily
  hash = function(x) rlang::hash(list(x, Sys.Date()))
)

# Efficient forecast function
forecast <- function(x) {
  g <- geocode(x)
  weather_forecast(c(g$latitude, g$longitude), hourly = "temperature_2m")
}

Handling errors with try-catch

geocode <- function(x) {
  tryCatch(
    openmeteo::geocode(x),
    error = function(e) {
      if (e$message == "No matches found") {
        logger::log_debug("No matches found for geocode", value = x)
        return(NULL)
      }
      stop(e)
    }
  )
}

ENJOY HAPPY STRUCTURED LOGGING!