Story:

It all started when I decided to monitor my cron jobs and set up email alerts using Grafana and Prometheus. Everything seemed smooth until I encountered an unexpected hurdle—my alerts weren’t going through. After hours of troubleshooting, I realized my system couldn’t connect to the internet. Debugging network issues felt like unravelling a mystery, from checking DNS settings to fixing a broken gateway configuration. The moment I sent a successful test email was incredibly rewarding—it proved that persistence pays off. This journey taught me not just technical skills but also the value of problem-solving and patience.

Prerequisites:

Install Prometheus
Install Grafana
Install Node Exporter

Collect the log file from `Cron` job

Prepare Log Directory and Metrics Script

Create a log directory and set its permissions. This directory will store the logs for your cron jobs. (Mode 777 keeps the demo simple, but it lets every user write here; tighten it on a shared system.)
```
 sudo mkdir -p /var/log/cron_jobs
 sudo chmod 777 /var/log/cron_jobs
```

Create the Metrics Script: Create a script to monitor cron jobs

 sudo vi /usr/local/bin/cron_job_metrics.sh

Add the following content:

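 #!/bin/bash

 log_dir="/var/log/cron_jobs"
 metric_file="/var/lib/node_exporter/textfile_collector/cron_jobs.prom"
 # Write to a temporary file first so Node Exporter never reads a half-written file.
 # The name does not end in .prom, so the textfile collector ignores it.
 tmp_file="${metric_file}.$$"

 # Clear or create the temporary file
 > "$tmp_file"

 # Cron jobs to monitor: the key is the job name (and log file name), the value is the script path
 declare -A cron_jobs=(
     ["job1"]="/path/to/your_script1.sh"
     ["job2"]="/path/to/your_script2.sh"
 )

 for job in "${!cron_jobs[@]}"; do
     log_file="$log_dir/${job}.log"

     if [ ! -f "$log_file" ]; then
         echo "cron_job_status{job=\"$job\"} 0" >> "$tmp_file" # Log file missing
     elif tail -n 1 "$log_file" | grep -q "error"; then
         echo "cron_job_status{job=\"$job\"} 0" >> "$tmp_file" # Last run had an error
     else
         echo "cron_job_status{job=\"$job\"} 1" >> "$tmp_file" # Last run was successful
     fi
 done

 # Replace the metric file in one step (a rename within the same directory is atomic)
 mv "$tmp_file" "$metric_file"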

Make the script executable:

 sudo chmod +x /usr/local/bin/cron_job_metrics.sh
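
Before wiring the script into cron, it can help to exercise the status rule against a throwaway directory. The sketch below is only an illustration: `job_status_line` is a hypothetical helper (it is not part of the script above) that applies the same rule to a single job. A missing log, or a last line containing "error", gives 0; anything else gives 1. The sample log contents are made up.

```shell
# Hypothetical helper mirroring the rule used in cron_job_metrics.sh:
# prints one Prometheus sample line for the given job.
job_status_line() {
    jsl_log_file="$1/$2.log"
    jsl_status=1

    if [ ! -f "$jsl_log_file" ]; then
        jsl_status=0    # log file missing
    elif tail -n 1 "$jsl_log_file" | grep -q "error"; then
        jsl_status=0    # last line of the log mentions an error
    fi
    echo "cron_job_status{job=\"$2\"} $jsl_status"
}

# Try it against a temporary directory instead of /var/log/cron_jobs.
tmp_dir="$(mktemp -d)"
printf 'This is from script1\nsome date line\n' > "$tmp_dir/job1.log"
printf 'this is from script 2\nerror\n' > "$tmp_dir/job2.log"

job_status_line "$tmp_dir" job1
job_status_line "$tmp_dir" job2
job_status_line "$tmp_dir" job3    # no log file exists for job3
rm -r "$tmp_dir"
```

Only the last line of each log is inspected, so a job that prints "error" early and then something else would still be reported as successful.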

So far, we have prepared a script that turns the cron job logs into metrics in Node Exporter’s prom file. Now let’s tell Node Exporter to collect these custom metrics.

Custom Metric: Setting Up the Textfile Collector

Textfile collector:

The textfile collector is a feature of the Prometheus Node Exporter that allows you to add custom metrics by writing them to a file. Node Exporter reads these metrics from the file and makes them available to Prometheus. Read more about it here https://www.robustperception.io/monitoring-directory-sizes-with-the-textfile-collector/#more-1230
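
For reference, the files the textfile collector picks up are plain Prometheus text exposition format, and only files ending in `.prom` in the configured directory are read. The sketch below prints what a `cron_jobs.prom` for two jobs might look like. The `# HELP` and `# TYPE` lines are optional and are not written by the script in this article, and the values are examples only.

```shell
# Print an illustrative cron_jobs.prom; job names and values are examples.
sample_cron_metrics() {
    cat <<'EOF'
# HELP cron_job_status 1 if the last run of the cron job succeeded, 0 otherwise.
# TYPE cron_job_status gauge
cron_job_status{job="job1"} 1
cron_job_status{job="job2"} 0
EOF
}

sample_cron_metrics
```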

Steps to configure the textfile collector in Node Exporter:
1. Create a directory for the custom metrics and change its ownership to the user that runs the Prometheus service (in my case, prometheus).
```
 sudo mkdir -p /var/lib/node_exporter/textfile_collector
 sudo chown -R prometheus:prometheus /var/lib/node_exporter/textfile_collector
```
2. Edit the Node Exporter service file so that its start command includes the flag --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
3. Create a file cron_jobs.prom where our metrics will be added. If it does not exist yet, don’t worry: our script will create it.
4. After changing the service file, reload the systemd daemon, restart Node Exporter, and check that it is running.
```
 sudo systemctl daemon-reload
 sudo systemctl restart node_exporter
 sudo systemctl status node_exporter
```
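
How step 2 looks in practice depends on how Node Exporter was installed. As one possible approach, a systemd drop-in can override `ExecStart` without editing the original unit file. In the sketch below, the binary path `/usr/local/bin/node_exporter` is an assumption, and `write_textfile_dropin` is a hypothetical helper; it takes the target directory as an argument so it can be tried in a scratch directory first.

```shell
# Write a drop-in that adds the textfile collector flag to Node Exporter.
# ASSUMPTION: the binary lives at /usr/local/bin/node_exporter.
write_textfile_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/textfile.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the command inherited from the original unit.
ExecStart=
ExecStart=/usr/local/bin/node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
EOF
}

# Dry run in a scratch directory to inspect the generated file.
scratch="$(mktemp -d)"
write_textfile_dropin "$scratch"
cat "$scratch/textfile.conf"
rm -r "$scratch"
```

For a unit named `node_exporter.service`, the real target directory would be `/etc/systemd/system/node_exporter.service.d` (written as root). Any flags already present in the original `ExecStart` line need to be repeated in the override, since the drop-in replaces the whole command.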

Configure Cron Jobs

For simplicity, I am creating two scripts that will run every minute.

script1.sh

 echo "This is from script1"
 echo $(date)

script2.sh

 echo "this is from script 2"
 echo $(date)

 echo "error"

Now that we have the scripts ready, let’s create the cron jobs.

Set Up Cron Jobs with Logging:
- Edit your crontab: crontab -e
- When creating the cron jobs, make sure each entry uses the correct path to its script. Also note that I am adding cron_job_metrics.sh to the same crontab, so that every minute it reads the logs and writes the metrics into the prom file.
  
  If you need help writing cron expressions, visit https://crontab.guru/ or ask ChatGPT.
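
As a rough illustration of such a crontab, the sketch below prints three entries: the two example scripts, each appending its output to a log named after its key in `cron_job_metrics.sh` (`job1.log`, `job2.log`), and the metrics script itself. The `/path/to/...` locations are the same placeholders used in the metrics script, and `print_cron_entries` is a hypothetical helper name.

```shell
# Print example crontab entries; paths under /path/to are placeholders.
print_cron_entries() {
    cat <<'EOF'
* * * * * /path/to/your_script1.sh >> /var/log/cron_jobs/job1.log 2>&1
* * * * * /path/to/your_script2.sh >> /var/log/cron_jobs/job2.log 2>&1
* * * * * /usr/local/bin/cron_job_metrics.sh
EOF
}

print_cron_entries
```

To install them, the output could be appended to the current crontab with `( crontab -l 2>/dev/null; print_cron_entries ) | crontab -`. The user who owns the crontab needs write access to the textfile collector directory, otherwise the metrics script cannot update the prom file.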
So let’s check the logs and metrics in their respective files.
- Logs: the job logs are being written to /var/log/cron_jobs/
- Metrics: cat /var/lib/node_exporter/textfile_collector/cron_jobs.prom

So we are receiving the metrics successfully.

Verify Custom Metrics

To verify the custom metric, visit localhost:9100/metrics (9100 is Node Exporter’s default port) and search for the metric we created.

We can also query the same metric in Prometheus by opening localhost:9090 in the browser.

job2 has the value 0 because we added an echo "error" statement to script2, which ends up as the last line of job2.log. As per cron_job_metrics.sh, if the last line of the log contains an error, the metric cron_job_status is 0.
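
The Node Exporter check can also be done from a terminal. The sketch below defines a small, hypothetical filter, `cron_status_summary`, that reads metrics text on standard input and prints one `job value` pair per `cron_job_status` sample. It assumes the label layout produced by the metrics script in this article and is demonstrated on inline sample data.

```shell
# Reduce Prometheus exposition text on stdin to "<job> <value>" lines.
cron_status_summary() {
    sed -n 's/^cron_job_status{job="\([^"]*\)"} \([0-9.]*\)$/\1 \2/p'
}

# Example with inline sample data; unrelated lines are ignored.
printf '%s\n' \
    '# TYPE node_load1 gauge' \
    'node_load1 0.42' \
    'cron_job_status{job="job1"} 1' \
    'cron_job_status{job="job2"} 0' | cron_status_summary
```

Against a live exporter this would be `curl -s http://localhost:9100/metrics | cron_status_summary`, assuming Node Exporter listens on its default port.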

Create a dashboard

Create a dashboard by selecting Prometheus as the data source and cron_job_status as the metric.

Set up an Email notification for alerts

To set up email alerts, follow the steps below.

To send the alert emails we are using Gmail’s SMTP server, so first we need to create an app password for our Gmail account. A good article on creating a Gmail app password is https://itsupport.umd.edu/itsupport?id=kb_article_view&sysparm_article=KB0015112 Follow the steps mentioned in that article. I have created one named grafana-alert.

Setting up Grafana Gmail SMTP settings

Now that we have an app password, let’s modify the [smtp] section of the Grafana configuration file: sudo vi /etc/grafana/grafana.ini

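 [smtp]
 enabled = true
 host = smtp.gmail.com:587
 user = sachin.bmp@gmail.com
 # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
 password = <your gmail app password>
 # skip_verify = true turns off TLS certificate verification; false is the safer value if it works for you
 skip_verify = true
 from_address = admin@grafana.localhost
 from_name = Grafana

Note: Gmail shows the app password in groups separated by spaces; remove the spaces when you paste it here. After saving the file, restart Grafana (for example, sudo systemctl restart grafana-server on a systemd install) so the new settings take effect.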

Prometheus Alert Rule for Cron Jobs

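Update Prometheus alert rules: edit your alert rules file (for example, sudo vi /etc/prometheus/alert_rules.yml; it must be listed under rule_files in prometheus.yml) to include a rule that applies to all cron_job_status series whose value equals 0. One caveat: the job label written by our script collides with the job label Prometheus attaches to every scrape target. With the default honor_labels: false, Prometheus keeps its own job label and renames the scraped one to exported_job, so the annotation below reads the cron job name from exported_job. If you have set honor_labels: true on the scrape config, use {{ $labels.job }} instead.

 groups:
   - name: cron-job-alerts
     rules:
       - alert: CronJobFailure
         expr: cron_job_status == 0
         for: 1m
         labels:
           severity: critical
         annotations:
           summary: "Cron Job Failure Detected"
           description: |
             The cron job '{{ $labels.exported_job }}' has failed.
             Status: {{ $value }}. Please investigate.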

Reload Prometheus configuration: after saving the updated rule file, restart Prometheus so it picks up the new rule: sudo systemctl restart prometheus
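
It may be worth validating the rule file with `promtool`, which ships with Prometheus, so that a YAML or expression mistake is caught before the service is restarted. The wrapper below is a sketch: `check_rules` is a hypothetical helper name, and the path passed to it is whatever your rules file is called.

```shell
# Validate a Prometheus rule file, if promtool is installed.
check_rules() {
    rules_file="$1"
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found; skipping validation" >&2
        return 2
    fi
    promtool check rules "$rules_file"
}
```

A possible use is `check_rules /path/to/your_rules.yml && sudo systemctl restart prometheus`, so that the restart only happens when the file passes validation.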

Grafana Alert Rule for Cron Jobs

Create a Panel for All Jobs
- Go to Grafana and create a new dashboard.
- Add a panel to visualize the metric for all cron jobs.
- Query:
```
  cron_job_status
```
Set Up Alert for All Jobs
- Go to the Alert tab in the panel settings.
- Click Create Alert Rule.
- Set the condition to trigger when any job fails:
```
  WHEN max() OF query (A, 1m, now) IS BELOW 1
```
- Grafana will apply this condition to every cron_job_status series.

Save the Panel: save the dashboard with the alert rule configured.

Testing the Setup

Simulate Failures for Multiple Jobs

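Run the following commands to overwrite your cron_jobs.prom file with failing values for both jobs. Because cron_job_metrics.sh rewrites this file every minute, temporarily comment out its crontab entry first; otherwise the simulated values may be overwritten before the alert’s 1m "for" window has elapsed.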

  echo 'cron_job_status{job="job1"} 0' > /var/lib/node_exporter/textfile_collector/cron_jobs.prom
  echo 'cron_job_status{job="job2"} 0' >> /var/lib/node_exporter/textfile_collector/cron_jobs.prom

Verify Alerts
- Wait for Prometheus to scrape the metrics and Grafana to evaluate the alert rule.
- Ensure alerts are triggered for both job1 and job2 in Prometheus and Grafana.
- Check your email inbox for detailed notifications.

How to Monitor Cron Jobs in Grafana and Get Notified via Email

