This document details the deployment of Loki, a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus, on a seven-node on-premises Kubernetes cluster. We’ll explore not only the Loki deployment itself but also its integration with Mimir (for long-term metrics storage) and Grafana (for visualization and querying). The cluster, running AlmaLinux on bare-metal servers with a combined storage capacity of approximately 300TB, leverages Rook Ceph for storage management. This deployment will serve as a central repository for logs gathered from multiple Promtail endpoints, primarily from Ceph clusters, providing a robust and scalable solution for long-term log storage and analysis. The configuration emphasizes high availability and resilience, leveraging the resources of the seven-node cluster to ensure optimal performance and minimize potential single points of failure. Specific configurations and considerations regarding resource allocation, storage management, and security are outlined in the following sections.
Understanding Loki’s Architecture
Before diving into the deployment, let’s understand Loki’s core components:
Promtail: This is Loki’s agent. It runs on your application servers and collects logs, sending them to the Loki stack. It supports various log formats and can be configured to scrape logs from files, systemd journals, and more.
Distributor: This component receives log streams from Promtail and distributes them to the Ingesters. It acts as a load balancer, ensuring even distribution across the Ingester cluster.
Ingester: These components receive and process the log streams. They perform tasks such as label indexing and chunk creation. The number of Ingesters should scale with the volume of logs ingested.
Querier: This component handles log queries. It receives queries from clients (like Grafana) and retrieves the relevant log data from the Ingesters and from the long-term storage backend (in this deployment, the Rook Ceph S3-compatible object store).
Compactor: This component merges and compresses log chunks, optimizing storage space over time.
Index: Loki uses an index to quickly search and filter logs based on labels. The values.yaml file configures the index’s properties (periodicity, storage, etc.).
Loki, Mimir, and Grafana: A Powerful Trio
While Loki handles log ingestion and querying, it’s often used in conjunction with Mimir and Grafana to create a complete monitoring solution:
Loki: Handles the ingestion, indexing, and querying of logs.
Mimir: Acts as the long-term storage backend for metrics, complementing Loki’s log storage. Mimir is a horizontally scalable time-series database optimized for storing large volumes of Prometheus metrics, crucial for retaining them for extended periods alongside the logs kept in the object store.
Grafana: Provides a user-friendly interface for querying, visualizing, and exploring logs stored in Loki. It allows users to create dashboards, set alerts, and perform complex log analysis.
This combined approach allows for effective observability: Loki handles real-time log ingestion and querying (with the Ceph object store providing cost-effective long-term retention), Mimir does the same for metrics, and Grafana empowers users to easily analyze and understand both.
1. Helm Repository Addition and Update:
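These are the standard commands for adding the Grafana chart repository:

```bash
# Add the Grafana Helm repository and refresh the local chart index
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```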
2. Custom values.yaml Configuration:
This values.yaml file customizes the Loki deployment. Key customizations are explained below. Commented-out lines represent optional configurations that can be uncommented and adjusted as needed.
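A minimal sketch of such a values.yaml, assuming the grafana/loki chart’s Distributed mode; the endpoint, bucket names, and credentials are placeholders to be replaced with your own values:

```yaml
global:
  clusterDomain: cluster.local

deploymentMode: Distributed

loki:
  auth_enabled: false            # enable in production
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_encoding: snappy       # good balance of compression and speed
  tracing:
    enabled: true
  limits_config:
    ingestion_rate_mb: 8         # example values; tune for your load
    ingestion_burst_size_mb: 16
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
      admin: loki-admin
    s3:
      endpoint: http://rook-ceph-rgw-objectstore.rook-ceph.svc  # your RGW service
      region: us-east-1
      accessKeyId: <ACCESS_KEY>
      secretAccessKey: <SECRET_KEY>
      s3ForcePathStyle: true
      insecure: true             # development/testing only

# One replica per node for high availability
ingester:
  replicas: 7
  maxUnavailable: 1
querier:
  replicas: 7
  maxUnavailable: 1
queryFrontend:
  replicas: 7
  maxUnavailable: 1
distributor:
  replicas: 7
  maxUnavailable: 1
compactor:
  replicas: 7
indexGateway:
  replicas: 7
  maxUnavailable: 1

# Optional: cache sizing, uncomment and adjust as needed
# chunksCache:
#   allocatedMemory: 4096
# resultsCache:
#   allocatedMemory: 1024
```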
Explanation of Key values.yaml Configurations:
global.clusterDomain: Specifies the cluster domain for internal DNS resolution.
loki.auth_enabled: Disables authentication for simplicity (acceptable for an initial setup, but authentication should be enabled in production).
loki.schemaConfig: Defines the schema version and storage settings for Loki’s data. object_store: s3 indicates that the logs are stored in an S3-compatible object store (provided by Rook Ceph in this case).
loki.ingester.chunk_encoding: Specifies the compression algorithm for log chunks. snappy offers a good balance between compression ratio and speed.
loki.tracing.enabled: Enables tracing, helpful for debugging and performance monitoring.
loki.limits_config: Sets various limits on ingestion rates, query sizes, and other resources to prevent overload and ensure stability. The values provided are examples and should be tuned based on your expected load.
loki.storage.type: Defines the storage type as S3-compatible.
loki.storage.s3: Configures the connection to the Rook Ceph S3-compatible object store. Ensure that the endpoint, region, secretAccessKey, and accessKeyId are correctly configured. insecure: true should only be used in development or testing environments; it is crucial to disable it in production.
deploymentMode: Distributed: Configures Loki to run in distributed mode, leveraging multiple components for high availability and scalability.
ingester, querier, queryFrontend, distributor, compactor, indexGateway: These sections specify the number of replicas for each Loki component. The values are set to 7 to match the number of nodes in the cluster, ensuring high availability. maxUnavailable limits the number of pods that can be unavailable during updates.
3. CephObjectStoreUser Creation:
This creates a user for Loki in the Rook Ceph object store.
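A sketch of the manifest, assuming a CephObjectStore named objectstore in the rook-ceph namespace (adjust both to match your cluster):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: loki
  namespace: rook-ceph
spec:
  store: objectstore            # the CephObjectStore backing the RGW
  displayName: "Loki object store user"
```

Apply it with:

```bash
kubectl apply -f loki-objectstore-user.yaml
```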
4. Retrieving S3 Access Keys:
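Rook places the generated credentials in a Secret named rook-ceph-object-user-&lt;store&gt;-&lt;user&gt;; with the names assumed above, they can be read like this:

```bash
# Access key
kubectl -n rook-ceph get secret rook-ceph-object-user-objectstore-loki \
  -o jsonpath='{.data.AccessKey}' | base64 -d; echo
# Secret key
kubectl -n rook-ceph get secret rook-ceph-object-user-objectstore-loki \
  -o jsonpath='{.data.SecretKey}' | base64 -d; echo
```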
5. Bucket Creation:
These commands create the necessary S3 buckets for storing Loki’s data. The use of --no-verify-ssl is strongly discouraged in production.
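A sketch using the AWS CLI against the RGW endpoint; the endpoint and bucket names are assumptions and must match those in values.yaml:

```bash
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<SECRET_KEY>
ENDPOINT=https://rook-ceph-rgw-objectstore.rook-ceph.svc

# --no-verify-ssl skips certificate validation: testing only
aws --endpoint-url "$ENDPOINT" --no-verify-ssl s3 mb s3://loki-chunks
aws --endpoint-url "$ENDPOINT" --no-verify-ssl s3 mb s3://loki-ruler
aws --endpoint-url "$ENDPOINT" --no-verify-ssl s3 mb s3://loki-admin
```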
6. Loki Installation:
This installs Loki using the customized values.yaml file.
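Assuming a dedicated loki namespace:

```bash
helm install loki grafana/loki --namespace loki --create-namespace -f values.yaml
```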
7. Grafana Configuration (Visualization):
After Loki is installed, you’ll need to configure Grafana to connect to it. This involves adding a Loki data source in Grafana, specifying the Loki endpoint (usually http://loki-query-frontend.<namespace>.svc.cluster.local). Refer to the Grafana documentation for detailed instructions on configuring Loki as a data source; the process is very straightforward.
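If you manage Grafana declaratively, the data source can also be provisioned from a file; a sketch, assuming Loki runs in the loki namespace on its default HTTP port:

```yaml
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-query-frontend.loki.svc.cluster.local:3100
```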
8. Promtail:
This is an example of Promtail running on an HAProxy server. In my case, the server was using rsyslog and the log format was not standard, so I needed to create a very simple scrape config with just one label, since each additional label (and label value) creates new streams in Loki’s index, considerably increasing resource usage.
Installation (RHEL/CentOS):
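One way to install it on RHEL/CentOS is to fetch the release binary directly; the version below is an example, so check the Loki releases page for the current one:

```bash
curl -LO https://github.com/grafana/loki/releases/download/v3.0.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
sudo mkdir -p /etc/promtail
```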
/etc/promtail/config.yml:
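A minimal sketch with a single static label to keep stream cardinality low; the Loki push URL and log path are assumptions for this HAProxy/rsyslog setup:

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://loki.example.com/loki/api/v1/push

scrape_configs:
  - job_name: haproxy
    static_configs:
      - targets:
          - localhost
        labels:
          job: haproxy                    # the one and only label
          __path__: /var/log/haproxy.log  # rsyslog output file
```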
Adjust the systemd unit to run as root:
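A sketch of the unit file (/etc/systemd/system/promtail.service); running as root is needed here because the rsyslog-owned log files are not readable by an unprivileged user:

```ini
[Unit]
Description=Promtail log shipper
After=network.target

[Service]
User=root
ExecStart=/usr/local/bin/promtail -config.file /etc/promtail/config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```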
Create the positions folder:
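This matches the positions path used in the config above:

```bash
sudo mkdir -p /var/lib/promtail
```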
Enable and start the service:
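The usual systemd sequence:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now promtail
```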
This expanded post provides a more comprehensive guide to deploying Loki on Kubernetes, emphasizing its integration with Grafana for a complete log management solution.