FME Server basic status check Python script

We all love FME Server, right? One use case for us is to use FME Server as a "watcher" - we have automations that check whether other systems' processes have run the way they should.

But what happens when the watcher crashes? Who watches the watcher? This happened to us during a network issue. Our FME Server engine seemed to freeze, leaving all jobs stuck in the queue without raising any error. So it took a while before we found out. It felt awfully quiet in our shared inbox. A restart of the engine solved the issue.

I contacted our FME Support at Sweco and asked if there was any way to monitor the status of our FME Server. They pointed me towards the FME Server REST API.

I couldn't resist and got to work. I decided to keep it simple and check the most important parts.

The script checks:

  • If the server is responsive and ready to accept jobs
  • If there are too many jobs in the queue
  • If there is a job in the queue that has been waiting too long
  • If your most important automations threw errors or warnings during the last run

It's all parameterized, and you should be able to get it up and running on most systems in less than half an hour. There are some caveats, though: I don't know whether it will work with FME Cloud, and it is not developed to handle multiple queues, since we only use the default queue as of now.
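
If you do run several queues, one possible approach is to read the metrics per queue instead of taking the first number on the line. The sketch below is not part of the script. It assumes the /metrics output uses Prometheus-style lines with a `queue` label, which I have not verified against every FME Server version, so check your own output first. The helper names `queue_values` and `queues_over_limit` are made up for this example.

```python
import re

# Matches Prometheus-style sample lines such as: metric_name{label="value"} 12
# ASSUMPTION: your /metrics output follows this layout; verify before relying on it.
METRIC_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\d+(?:\.\d+)?)\s*$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def queue_values(metrics_text, metric_name, label="queue"):
    """Return {queue name: value} for every sample of metric_name."""
    result = {}
    for row in metrics_text.splitlines():
        match = METRIC_LINE.match(row)
        if not match or match.group("name") != metric_name:
            continue  # comment lines and other metrics are skipped
        labels = dict(LABEL.findall(match.group("labels") or ""))
        # ASSUMPTION: a sample without a queue label belongs to the default queue
        result[labels.get(label, "Default")] = float(match.group("value"))
    return result


def queues_over_limit(metrics_text, metric_name, limit):
    """Return only the queues whose value is above limit."""
    values = queue_values(metrics_text, metric_name)
    return {queue: value for queue, value in values.items() if value > limit}
```

With this, `queues_over_limit(response.text, "fme_queued_jobs_total", max_queue_size)` would give you one entry per queue that is over the limit, which you could then add to the error report.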

Important note: the LinkedIn code window couldn't handle hash signs (#), so I had to replace them with /COMMENT/ - you need to convert these back to # before running the script. I hope the code window at least preserves indents etc. You can also download a zip-file with the .py here: https://kartportal.se/staffanstorp/Python/FMEServer_status_check.zip
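
If you copy the code from this page rather than downloading the zip-file, the conversion can be done with a few lines of Python. This is only a small sketch; the function names and the file names in the usage note are made up for the example. It does a plain text replacement, so it would also change a /COMMENT/ inside a string literal (the script below has none).

```python
from pathlib import Path


def restore_comments(source, marker="/COMMENT/"):
    """Return source with every marker replaced by a hash sign."""
    return source.replace(marker, "#")


def convert_file(src_path, dst_path):
    """Read the pasted script, restore its comments and write a runnable copy."""
    text = Path(src_path).read_text(encoding="utf-8")
    Path(dst_path).write_text(restore_comments(text), encoding="utf-8")
```

For example, `convert_file("pasted_script.txt", "FMEServer_status_check.py")` would read the pasted text and write a copy with normal Python comments.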

Feel free to use and share.

import requests
import re
import json
import smtplib
from email.mime.text import MIMEText


"""
This script was originally created by Martin Ekstrand (https://se.linkedin.com/in/ekstrandmartin)


The script calls the FME Server REST API and checks:
* If the server is responsive and ready to accept jobs
* If there are too many jobs in the queue
* If there is a job in the queue that has been waiting too long
* If your most important automations threw errors or warnings during the last run


If any errors are found, it will send an email via SMTP - note, SMTP will probably not work forever. But it works for now.


It is meant to be scheduled via Task Scheduler on a server with Python 3 installed (ArcGIS Python works fine - you can run it on a server with ArcGIS Server installed)


You need to change some settings to match your environment. I've tried to add instructions below. Everything marked like {ENTERSOMETHINGHERE} is a parameter you need to set


The minimum required parameters are:
* fme_server_url
* token
* important_automations - unless you set run_status_important_automations to False
* All SMTP settings
"""




/COMMENT/System variables
fme_server_url = r"https://{YOURFMESERVERURLHERE}/fmerest/v3" /COMMENT/The URL to your FME Server
payload = {} /COMMENT/placeholder in case you need it
headers = {} /COMMENT/don't touch unless absolutely needed
token = "fmetoken token={YOURTOKENSTRINGHERE}" /COMMENT/Create a token in your fme server (set access automations and metrics and FME server will prompt you for additional permissions needed), then set token string
error_report = {} /COMMENT/don't touch


/COMMENT/Parameters for settings
max_queue_size = 5 /COMMENT/Maximum number of queued jobs before the script reports a problem
max_wait_time = 3600 /COMMENT/Maximum accepted waiting time in seconds for the job in the queue that has been waiting the longest
important_automations = ["{IDnr1}","{IDnr2}"] /COMMENT/List the IDs of your most important automations
/COMMENT/These parameters can be set to false to skip certain tests
run_health_check = True
run_metrics = True
run_status_important_automations = True




/COMMENT/SMTP settings - you need to provide these (default office365)
smtp_server = "smtp.office365.com"
smtp_port = 587
mail_user = '{EMAILUSERNAME}'
mail_password = "{EMAILPASSWORD}"
sent_from = "{MAILFROM}"
mail_to = ["{MAILTO1}", "{MAILTO2}"]


/COMMENT/E-mail message variables - don't touch unless necessary
msg_subject = f"FME server ({fme_server_url}) error report"
msg_start = """
<!DOCTYPE html>
<html>


<head>
    <title>
        FME server error report
    </title>
</head>


<body>
<pre>"""
msg_middle = ""
msg_end = """
</pre>
</body>


</html>"""




/COMMENT/script begins - nothing below this line should be changed:


/COMMENT/Checks if server is responsive and ready to process jobs
def HealthCheck():
    accept = r"application/json"
    url = fme_server_url + r"/healthcheck?ready=true&textResponse=false"
    headers['Accept'] = accept
    headers['Authorization'] = token
    response = requests.request("GET", url, headers=headers, data=payload)
    if "ok" in response.json().get("status"):
        print("FME server ready to process jobs")
    else:
        error_report["HealthCheck"] = response.text


/COMMENT/Checks number of jobs in queue and time waited for the job that has been in the queue longest
def Metrics():
    accept = r"text/plain"
    url = fme_server_url + r"/metrics"
    headers['Accept'] = accept
    headers['Authorization'] = token
    response = requests.request("GET", url, headers=headers, data=payload)
    for row in response.text.splitlines():
        if re.search(r"^fme_queued_jobs_total\s+\d+$",row):
            x = re.search(r"\d+",row)
            x = int(x.group())
            if x > max_queue_size:
                error_report["MetricsTotalQueuedJobs"] = f"Total number of jobs in queue: {x}"
            else:
                print(f"Number of jobs in queue does not exceed {max_queue_size}")
        /COMMENT/This is run on Default queue, might have to loop through every queue if multiple queues exist in your environment
        elif re.search(r"^fme_queued_jobs_time",row):
            x = re.search(r"\d+", row)
            x = int(x.group())
            if x > max_wait_time:
                /COMMENT/Separate key so this entry does not overwrite the queue size error above
                error_report["MetricsMaxWaitTime"] = f"One job has waited {x} seconds"
            else:
                print(f"No job has waited more than {max_wait_time} seconds")


/COMMENT/Checks errors and warnings of your most important automations
def StatusImportantAutomations():
    accept = r"application/json"
    headers['Accept'] = accept
    headers['Authorization'] = token
    x = 0
    for aid in important_automations:
        url = fme_server_url + r"/automations/workflows/" + aid + "/status"
        response = requests.request("GET", url, headers=headers, data=payload)
        errors = response.json().get("errors")
        warnings = response.json().get("warnings")
        if errors == 0 and warnings == 0:
            print(f"Automation {aid} has no errors or warnings")
        else:
            error_report[f"StatusImportantAutomations_{x}"]= {aid : {}}
            error_report[f"StatusImportantAutomations_{x}"][aid]["errors"] = errors
            error_report[f"StatusImportantAutomations_{x}"][aid]["warnings"] = warnings
            x += 1


/COMMENT/Sends email based on settings in the beginning of the script
def SendEmail():
    msg = MIMEText(msg_text, 'html')
    msg['Subject'] = msg_subject
    msg['From'] = sent_from
    server = smtplib.SMTP(smtp_server, smtp_port)
    server.ehlo()
    server.starttls()
    server.login(mail_user, mail_password)
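    /COMMENT/Set a single To header; assigning msg['To'] inside a loop would append a duplicate header for each recipient
    msg['To'] = ", ".join(mail_to)
    server.sendmail(sent_from, mail_to, msg.as_string())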
    server.quit()


if run_health_check:
    try:
        HealthCheck()
    except Exception as e:
        error_report["HealthCheck"] = str(e)
if run_metrics:
    try:
        Metrics()
    except Exception as e:
        error_report["Metrics"] = str(e)
if run_status_important_automations:
    try:
        StatusImportantAutomations()
    except Exception as e:
        error_report["StatusImportantAutomation"] = str(e)


if error_report != {}:
    for error in error_report:
        if "StatusImportantAutomations" in error:
            for automation in error_report[error]:
                msg_middle += f"<p>Automation {automation} report: {error_report[error][automation]}</p>"
        else:
            msg_middle += f"<p>{error}: {error_report[error]}</p>"
    msg_text = msg_start + msg_middle + msg_end
    print("Errors encountered, sending email")
    try:
        SendEmail()
        print("Email sent")
    except Exception as e:
        print(e)
        print("Couldn't send email")
print("Script complete")        
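
One possible hardening, not part of the script above: requests does not time out by default, so if the server stops answering, the call can block and the watcher itself goes quiet. The sketch below shows the same health check with an explicit timeout. The function name `health_status`, the `http_get` parameter and the 10 second default are assumptions made for this example; the URL and the "ok" check are taken from HealthCheck() above.

```python
def health_status(base_url, token, timeout=10, http_get=None):
    """Return None if the server reports ok, otherwise a short error text."""
    if http_get is None:
        # Imported lazily so the function can be exercised with a stub in tests
        import requests
        http_get = requests.get
    headers = {"Accept": "application/json", "Authorization": token}
    url = base_url + "/healthcheck?ready=true&textResponse=false"
    try:
        # timeout makes the call give up instead of waiting indefinitely
        response = http_get(url, headers=headers, timeout=timeout)
        status = response.json().get("status") or ""
    except Exception as exc:  # timeouts, connection errors, invalid JSON
        return f"Health check failed: {exc}"
    return None if "ok" in status else f"Unexpected status: {status!r}"
```

The same `timeout=` argument can be passed to the `requests.request(...)` calls in the script if you prefer to keep its structure as it is.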

Oh, so good! Thanks for sharing!

Nice! Watch the watcher is a concept we should talk more about 😆

More works of art, Martin! Very valuable! 👌🏼🧑🎨
