How to Monitor Cron Jobs in Grafana and Get Notified via Email

Story:

It all started when I decided to monitor my cron jobs and set up email alerts using Grafana and Prometheus. Everything seemed smooth until I encountered an unexpected hurdle—my alerts weren’t going through. After hours of troubleshooting, I realized my system couldn’t connect to the internet. Debugging network issues felt like unravelling a mystery, from checking DNS settings to fixing a broken gateway configuration. The moment I sent a successful test email was incredibly rewarding—it proved that persistence pays off. This journey taught me not just technical skills but also the value of problem-solving and patience.

Prerequisites:

  • Install Prometheus

  • Install Grafana

  • Install Node Exporter

Collect the log file from Cron job

Prepare Log Directory and Metrics Script

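  1. Create a log directory and set its permissions: This directory will store the logs for your cron jobs. Mode 777 lets any user write to it, which is convenient for a demo; on a shared machine, restrict it to the user that owns the crontab.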

     sudo mkdir -p /var/log/cron_jobs
     sudo chmod 777 /var/log/cron_jobs
    
  2. Create the Metrics Script: Create a script to monitor cron jobs

     sudo vi /usr/local/bin/cron_job_metrics.sh
    

    Add the following content:

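     #!/bin/bash

     log_dir="/var/log/cron_jobs"
     metric_file="/var/lib/node_exporter/textfile_collector/cron_jobs.prom"
     # Write to a temporary file first so Node Exporter never reads a half-written file
     tmp_file="${metric_file}.$$"

     # Start the temporary file with the metric metadata
     echo "# HELP cron_job_status 1 if the last run of the cron job succeeded, 0 otherwise." > "$tmp_file"
     echo "# TYPE cron_job_status gauge" >> "$tmp_file"

     # List of cron jobs to monitor (job name -> script path)
     declare -A cron_jobs=(
         ["job1"]="/path/to/your_script1.sh"
         ["job2"]="/path/to/your_script2.sh"
     )

     for job in "${!cron_jobs[@]}"; do
         log_file="$log_dir/${job}.log"

         if [ ! -f "$log_file" ]; then
             echo "cron_job_status{job=\"$job\"} 0" >> "$tmp_file" # Log file missing
         elif tail -n 1 "$log_file" | grep -q "error"; then
             echo "cron_job_status{job=\"$job\"} 0" >> "$tmp_file" # Last line of the log contains "error"
         else
             echo "cron_job_status{job=\"$job\"} 1" >> "$tmp_file" # Last run was successful
         fi
     done

     # Replace (or create) the metric file in a single step
     mv "$tmp_file" "$metric_file"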
    
  3. Make the script executable:

     sudo chmod +x /usr/local/bin/cron_job_metrics.sh
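
To see how the status rule behaves before wiring it into cron, the same check can be tried in isolation against throwaway logs. This is only a sketch: the job_status helper and the temporary directory are hypothetical and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): prints 1 if the
# last line of the given log file does not contain "error", otherwise 0.
job_status() {
    local log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo 0            # log file missing
    elif tail -n 1 "$log_file" | grep -q "error"; then
        echo 0            # last line reports an error
    else
        echo 1            # last run looks successful
    fi
}

# Try it against throwaway logs in a temporary directory.
tmp_dir="$(mktemp -d)"
printf 'This is from script1\n' > "$tmp_dir/job1.log"
printf 'this is from script 2\nerror\n' > "$tmp_dir/job2.log"

# job3 has no log file on purpose.
for job in job1 job2 job3; do
    echo "cron_job_status{job=\"$job\"} $(job_status "$tmp_dir/$job.log")"
done

rm -rf "$tmp_dir"
```

Here job1 should report 1, while job2 (last line is "error") and job3 (no log file) should report 0. Note that only the last line of each log is inspected, so an error earlier in the log goes unnoticed.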
    

So far we have written a script that turns the cron job logs into metrics in a .prom file for Node Exporter. Next, let’s tell Node Exporter to collect these custom metrics.

Custom Metric: Setting Up the Textfile Collector

Textfile collector:

The textfile collector is a feature of the Prometheus Node Exporter that allows you to add custom metrics by writing them to a file. Node Exporter reads every file ending in .prom from a configured directory and makes those metrics available to Prometheus. Read more about it here: https://www.robustperception.io/monitoring-directory-sizes-with-the-textfile-collector/#more-1230

  • Steps to configure the textfile collector in Node Exporter:

    1. Create a directory for custom metrics and change its ownership to the user that runs the Prometheus service (in my case, prometheus).

       sudo mkdir -p /var/lib/node_exporter/textfile_collector
       sudo chown -R prometheus:prometheus /var/lib/node_exporter/textfile_collector
      
    2. Edit the Node Exporter service file so that its start command includes the flag --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

    3. Create a file named cron_jobs.prom in that directory to hold our metrics. If you skip this step, don’t worry: our script will create it.

    4. After changing the service file, reload the systemd daemon, restart Node Exporter, and check that it is running.

       sudo systemctl daemon-reload
       sudo systemctl restart node_exporter
       sudo systemctl status node_exporter
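
As a rough illustration of step 2, the relevant part of the service file might look like the fragment below. The binary path /usr/local/bin/node_exporter and the User value are assumptions; keep whatever your existing unit file already uses and only add the flag.

```ini
[Service]
# Assumed values: adjust User and the binary path to match your install
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
    --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
```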
      

Configure Cron Jobs

For simplicity, I am creating two scripts that will run every minute.

script1.sh

echo  "This is from script1"
echo $(date)

script2.sh

echo "this is from script 2"
echo $(date)

echo "error"

Now that we have the scripts ready, let’s create the cron jobs.

  1. Set Up Cron Jobs with Logging:

    • Edit your crontab: crontab -e

    • When creating the cron jobs, make sure each entry uses the correct path to its script and redirects the script’s output to /var/log/cron_jobs/<job>.log, which is where cron_job_metrics.sh looks for it. Note that I am also adding cron_job_metrics.sh to the same crontab so that it reads the logs and writes the prom file every minute.

      If you need help writing cron expressions, visit https://crontab.guru/ or ask ChatGPT.

  2. Now let’s check the logs and the metrics in their respective files.

    • Logs: the job logs are being written to /var/log/cron_jobs/

    • Metrics: cat /var/lib/node_exporter/textfile_collector/cron_jobs.prom

    So we are receiving the metrics successfully.
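
For reference, the crontab entries described in step 1 might look like the sketch below. The script paths are the placeholders used in the metrics script, and the log file names are assumed to match the job names (job1, job2) that cron_job_metrics.sh expects. It also assumes the crontab's user can write to both /var/log/cron_jobs and the textfile collector directory.

```text
# m h dom mon dow  command
* * * * * /path/to/your_script1.sh >> /var/log/cron_jobs/job1.log 2>&1
* * * * * /path/to/your_script2.sh >> /var/log/cron_jobs/job2.log 2>&1
* * * * * /usr/local/bin/cron_job_metrics.sh
```

Appending with >> keeps the history of earlier runs. The metrics script only looks at the last line, so each new run still decides the current status.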

Verify Custom Metrics

To verify the custom metric, visit localhost:9100/metrics (9100 is Node Exporter’s default port) and search for the metric we created.
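
The same check can be done from a terminal. The sketch below assumes Node Exporter listens on its default port on the local machine; the function names are hypothetical.

```shell
#!/bin/bash
# Keep only the cron_job_status samples from Prometheus exposition text on stdin.
# Comment lines (# HELP, # TYPE) start with "#", so they are filtered out too.
filter_cron_metrics() {
    grep '^cron_job_status'
}

# Hypothetical wrapper: fetch the metrics page (default: local Node Exporter)
# and show only our custom metric.
show_cron_metrics() {
    curl -s "${1:-http://localhost:9100/metrics}" | filter_cron_metrics
}
```

Calling show_cron_metrics with no arguments queries http://localhost:9100/metrics; pass a different URL as the first argument if your exporter runs elsewhere. If it prints nothing, the exporter is probably not reading the textfile directory yet.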

We can also query the same metric in Prometheus by opening localhost:9090 in the browser.

job2 has the value 0 because we added an echo "error" statement to script2, which is captured as an error in the job2.log file. As per the cron_job_metrics.sh script, if the last line of a job’s log contains "error", the metric cron_job_status is 0 for that job.
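
Real scripts rarely print the word "error" on their own, so one way to fit them into this scheme is a small wrapper that records the exit code as the last log line. This is a sketch under that assumption; run_and_log is a hypothetical helper and not part of the setup above.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, append its output to a log file, and
# finish the log with "ok" or "error" depending on the command's exit code.
run_and_log() {
    local log_file="$1"
    shift
    if "$@" >> "$log_file" 2>&1; then
        echo "ok" >> "$log_file"
    else
        echo "error" >> "$log_file"
    fi
}
```

For example, run_and_log /var/log/cron_jobs/job1.log /path/to/your_script1.sh would leave "error" as the last line whenever the script exits with a non-zero status, which is exactly what cron_job_metrics.sh looks for.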

Create a dashboard

  • Create a dashboard by selecting Prometheus as the data source and cron_job_status as the metric.

Set Up Email Notifications for Alerts

To set up email alerts, follow the steps below.

  1. To send email alerts we are using Gmail’s SMTP server, so first create an app password for your Gmail account. This article explains how: https://itsupport.umd.edu/itsupport?id=kb_article_view&sysparm_article=KB0015112 Follow the steps in that article. I have created an app password named grafana-alert.

Setting Up Grafana Gmail SMTP Settings

  1. Now that we have an app password, let’s modify the [smtp] section of the Grafana configuration file. sudo vi /etc/grafana/grafana.ini

     [smtp]
     enabled = true
     host = smtp.gmail.com:587
     user = sachin.bmp@gmail.com
     # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
     password = <your gmail app password>
     skip_verify = true
     from_address = admin@grafana.localhost
     from_name = Grafana
    

    Note: Gmail shows the app password in groups separated by spaces; remove the spaces when you paste it here. After saving the file, restart Grafana (sudo systemctl restart grafana-server) so the new SMTP settings take effect.
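
If test emails do not arrive, it is worth confirming that the machine can reach Gmail's SMTP port at all, since that was the root cause in my case. The helper below is a minimal sketch: can_reach is a hypothetical name, and it relies on bash's /dev/tcp feature and the timeout command being available.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within the given number of seconds (default 5). Relies on bash's /dev/tcp.
can_reach() {
    local host="$1" port="$2" wait_secs="${3:-5}"
    timeout "$wait_secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

if can_reach smtp.gmail.com 587; then
    echo "smtp.gmail.com:587 is reachable"
else
    echo "cannot reach smtp.gmail.com:587, check DNS, gateway, and firewall"
fi
```

A failure here points at the network (DNS, gateway, or an outbound firewall rule) rather than at the Grafana configuration.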

Prometheus Alert Rule for Cron Jobs

  1. Update the Prometheus alert rules. Edit your alert rules file (for example, sudo vi /etc/prometheus/alert_rules.yml) and make sure it is listed under rule_files in /etc/prometheus/prometheus.yml. Add a rule that fires for every cron_job_status series whose value equals 0.

     groups:
       - name: cron-job-alerts
         rules:
           - alert: CronJobFailure
             expr: cron_job_status == 0
             for: 1m
             labels:
               severity: critical
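             annotations:
               summary: "Cron Job Failure Detected"
               # With the default honor_labels: false, Prometheus renames the scraped
               # "job" label to "exported_job"; "job" then holds the scrape job name.
               description: |
                 The cron job '{{ $labels.exported_job }}' has failed.
                 Status: {{ $value }}. Please investigate.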
    
  2. Reload the Prometheus configuration. After saving the updated rule file, restart Prometheus: sudo systemctl restart prometheus
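
Prometheus only loads rule files that are listed in its main configuration. As a sketch, assuming the rules were saved as /etc/prometheus/alert_rules.yml (adjust the path to wherever your file lives), the relevant excerpt of prometheus.yml would be:

```yaml
# /etc/prometheus/prometheus.yml (excerpt)
rule_files:
  - /etc/prometheus/alert_rules.yml   # assumed path of the rules file
```

Before restarting, the rule file can be validated with promtool check rules followed by the path to the file; promtool ships with Prometheus.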

Grafana Alert Rule for Cron Jobs

  1. Create a Panel for All Jobs

    • Go to Grafana and create a new dashboard.

    • Add a panel to visualize the metric for all cron jobs.

    • Query:

        cron_job_status
      
  2. Set Up Alert for All Jobs

    • Go to the Alert tab in the panel settings.

    • Click Create Alert Rule.

    • Set the condition to trigger when any job fails:

        WHEN max() OF query (A, 1m, now) IS BELOW 1
      
    • Grafana will apply this condition to all cron_job_status metrics.

  3. Save the panel: save the dashboard with the alert rule configured.

Testing the Setup

  1. Simulate Failures for Multiple Jobs

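    • Run the following commands to overwrite your cron_jobs.prom file and simulate failures for multiple jobs. Because cron_job_metrics.sh rewrites this file every minute, temporarily comment out its crontab entry while testing: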

        echo 'cron_job_status{job="job1"} 0' > /var/lib/node_exporter/textfile_collector/cron_jobs.prom
        echo 'cron_job_status{job="job2"} 0' >> /var/lib/node_exporter/textfile_collector/cron_jobs.prom
      
  2. Verify Alerts

    • Wait for Prometheus to scrape the metrics and Grafana to evaluate the alert rule.

    • Ensure alerts are triggered for both job1 and job2 in Prometheus and Grafana.

    • Check your email inbox for detailed notifications.
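
The simulation from step 1 can also be wrapped in a small helper that replaces the file in one step, so Node Exporter never scrapes a half-written file. This is only a sketch: simulate_failures is a hypothetical name, and the temporary file deliberately does not end in .prom so that the textfile collector ignores it.

```shell
#!/bin/bash
# Hypothetical helper: write a failing cron_job_status sample for each named
# job into the given .prom file, replacing the file with a single rename.
simulate_failures() {
    local metric_file="$1"
    shift
    local tmp_file="${metric_file}.$$"
    local job
    : > "$tmp_file"
    for job in "$@"; do
        echo "cron_job_status{job=\"$job\"} 0" >> "$tmp_file"
    done
    mv "$tmp_file" "$metric_file"
}
```

For example, simulate_failures /var/lib/node_exporter/textfile_collector/cron_jobs.prom job1 job2 marks both jobs as failed. Re-enabling the cron_job_metrics.sh crontab entry restores the real values on its next run.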