
Offboard long-running jobs with KEDA

Posted on October 10, 2022

Every developer at some point reaches the moment when their application has to do some long-running task that keeps the server busy. Most of the time, you want to split that code out and offboard it to a separate task runner. The most obvious solution (when you are in the cloud) is to create some Azure Functions / AWS Lambda / Google Cloud Functions. There’s nothing wrong with that, but I personally prefer something more provider-independent.

Preparations

Okay, so we want to put the load-heavy task into some kind of task runner that we’ll execute afterwards via Kubernetes.

Our plan could be:

  1. Separate the code that is currently doing the work inside the core application
  2. Create a Docker container that runs our code and exits afterwards (a Dockerfile sketch follows after this list)
  3. Make sure we provide all the necessary information (database credentials, configuration etc.) via environment variables
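
For step 2, the image doesn’t need to be anything fancy. Here is a minimal sketch, assuming a Node.js runner with a hypothetical index.mjs entrypoint (base image and file names are just placeholders, not part of the original setup):

# Minimal task-runner image - runs once and exits (placeholder file names)
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY index.mjs ./
# Database credentials, configuration etc. are injected as environment variables at runtime
CMD ["node", "index.mjs"]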

Finally: test whether the task runner does what it should. If it does, we’ve made a great first step. But now we have the challenge that we need to start it somehow. Let’s assume we want to run our task for a specific user - how can we do this and utilize our Kubernetes infrastructure? A good choice might be a simple “Job” - but we don’t know when and how to start them.
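
Such a Job manifest could look roughly like this (a sketch only; the image, the secret and the USER_ID variable are placeholders and not part of the setup below) - but something or someone still has to create it at the right moment:

apiVersion: batch/v1
kind: Job
metadata:
  name: usertask-user-1                       # hypothetical: one Job per user
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: task-runner
          image: my-registry/task-runner:latest   # placeholder image
          env:
            - name: USER_ID                   # hypothetical way to pass the user
              value: "1"
          envFrom:
            - secretRef:
                name: task-runner-secrets     # placeholder secret with credentials / configuration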

What is KEDA?

KEDA is an acronym for “Kubernetes Event-driven Autoscaling”. The original idea is that KEDA provides scaling mechanisms that the internal metrics of Kubernetes don’t allow. While I’m writing this article, KEDA already has 56 different scalers available that can be used to scale on all kinds of things - from SQL queries to Prometheus monitoring data. It allows you to scale Deployments, StatefulSets, CustomResources and… Jobs! This will come in handy for what we want to do.

An example with Redis

Most of you will know what Redis is, but for the others: Redis is an in-memory store with plenty of options, from caching to message broker to usage as a database and many additional use cases. For our purpose, I’d like to create a simple FIFO queue (first in, first out) where we queue some jobs to run on our users. To keep it simple, I store a small JSON object in the queue so that we can identify the user afterwards:

RPUSH usertask "{\"user\": 1}"

This will push the object {"user": 1} onto the right side (RPUSH) of a list called usertask. Let’s push some more, because we want to scale:

RPUSH usertask "{\"user\": 2}"
RPUSH usertask "{\"user\": 3}"
RPUSH usertask "{\"user\": 4}"
RPUSH usertask "{\"user\": 5}"
RPUSH usertask "{\"user\": 6}"
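
In the real application you would of course enqueue these tasks from code instead of the CLI. A minimal sketch with node-redis, assuming the same REDIS_* environment variables as the runner below and the list name usertask:

import { createClient } from 'redis';

// Connect using the same environment variables as the runner below
const client = createClient({
  url: `redis://${process.env.REDIS_HOST}:${process.env.REDIS_PORT || 6379}/0`,
});
await client.connect();

// RPUSH appends the task to the right side of the "usertask" list
await client.RPUSH('usertask', JSON.stringify({ user: 1 }));

await client.quit();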

Pick tasks with our runner

Now that we have some things in our queue, we also want to process them. To give an example, I’d like to share some Node.js code. Of course, this also works with any other language:

import { createClient } from 'redis';

// Build an optional "user:password@" part for the connection URL
const authentication = process.env.REDIS_USER && process.env.REDIS_USER.trim() !== '' ? `${process.env.REDIS_USER}:${process.env.REDIS_PASSWORD}@` : ``;

const client = createClient({
  url: `redis://${authentication}${process.env.REDIS_HOST}:${process.env.REDIS_PORT || 6379}/0`,
});

client.on('error', (err) => console.log('Redis Client Error', err));

console.log('Trying to get a single job...');

await client.connect();

// Check whether there is anything to do at all
const amount = await client.LLEN(process.env.REDIS_LIST);
if (amount === 0) {
  console.log('...nothing to do!');
  process.exit(0);
}

// Take the oldest task from the left side of the list
const job = await client.LPOP(process.env.REDIS_LIST);
console.log(`Executing job with parameters: ${job}`);

/**
 * Some execution code
 */

console.log(`Completed`);
process.exit(0);

This code will basically start, connect to Redis, pick the first job from the left with LPOP (remember, we pushed to the right, so we have to take our tasks from the other side) and then do something with it. So “in” and “out” are solved - but how do we start these jobs?

KEDA job scaling

This is where KEDA comes into play. It introduces a so-called “ScaledJob” resource, which can create a job for every item in our Redis queue, up to a maximum that we define:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: redis-runner-job
spec:
  jobTargetRef:
    parallelism: 1                            # [max number of desired pods](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
    completions: 1                            # [desired number of successfully finished pods](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
    activeDeadlineSeconds: 600                #  Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
    backoffLimit: 5                           # Specifies the number of retries before marking this job failed. Defaults to 6
    template:
      spec:
        restartPolicy: Never
        containers:                           # The image, that we want to execute here
        - name: redis-runner-dummy
          image: marcelcremer/redis-runner-dummy:latest
          imagePullPolicy: Always
          envFrom:
            - secretRef: { name: redis-runner }
  pollingInterval: 30                         # Optional. Default: 30 seconds
  successfulJobsHistoryLimit: 5               # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 5                   # Optional. Default: 100. How many failed jobs should be kept.
  # envSourceContainerName: {container-name}    # Optional. Default: .spec.JobTargetRef.template.spec.containers[0]
  maxReplicaCount: 25                        # Optional. Default: 100
  scalingStrategy:
    strategy: "custom"                        # Optional. Default: default. Which Scaling Strategy to use.
    customScalingQueueLengthDeduction: 1      # Optional. A parameter to optimize custom ScalingStrategy.
    customScalingRunningJobPercentage: "0.5"  # Optional. A parameter to optimize custom ScalingStrategy.
  triggers:
    - type: redis
      metadata:
        listName: {{ required "Please specify redis.list" .Values.redis.list | quote }} # The list that we observe, in our case "usertask"
        listLength: "1"                       # How many items need to exist, before KEDA starts to scale?
        enableTLS: "false"                    # optional
        databaseIndex: "0"                    # optional
        hostFromEnv: REDIS_HOST
        portFromEnv: REDIS_PORT
        passwordFromEnv: REDIS_PASSWORD

That little piece of code will do something awesome:

  1. We define how many pods should be triggered in parallel, how often they may fail until we give up etc.
  2. We define which image is our actual “runner” and how often it should be spawned at maximum (25 replicas)
  3. We define a trigger, in our case our Redis queue, that is observed to determine whether new pods need to be spawned or not
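
One thing the manifest assumes is a Secret called redis-runner, which provides the Redis connection details both to the container (via envFrom) and to the trigger (via hostFromEnv, portFromEnv and passwordFromEnv). A sketch with placeholder values could look like this:

apiVersion: v1
kind: Secret
metadata:
  name: redis-runner
type: Opaque
stringData:
  REDIS_HOST: "redis-master.default.svc.cluster.local"  # placeholder host
  REDIS_PORT: "6379"
  REDIS_PASSWORD: "changeme"                            # placeholder password
  REDIS_LIST: "usertask"                                # the list our runner reads from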

And that’s basically it! Deploy everything (e.g. in a Helm chart) and watch the magic happen. 🧙

When I first tried it, I could not believe how easy this is, and I really love KEDA so far. I hope you also got some ideas about what to do with it.

Follow me

I work on the SaaS platform MOBIKO, build up teams and sometimes give talks.