Apache Airflow is a tool to express and execute workflows as directed acyclic graphs (DAGs). It includes utilities to schedule tasks, monitor task progress and handle task dependencies.
[Overview of Apache Airflow](https://airflow.apache.org/)
Trademarks: This software listing is packaged by Bitnami. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.
This chart bootstraps an [Apache Airflow](https://github.com/bitnami/containers/tree/main/bitnami/airflow) deployment on a [Kubernetes](https://kubernetes.io) cluster using the [Helm](https://helm.sh) package manager.
Bitnami charts can be used with [Kubeapps](https://kubeapps.dev/) for deployment and management of Helm Charts in clusters.
## Prerequisites
- Kubernetes 1.19+
- Helm 3.2.0+
## Installing the Chart
To install the chart with the release name `my-release`:
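
A minimal sketch of those commands, assuming the Bitnami chart repository is registered under the hypothetical alias `my-repo`:

```console
helm repo add my-repo https://charts.bitnami.com/bitnami
helm install my-release my-repo/airflow
```
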
These commands deploy Airflow on the Kubernetes cluster in the default configuration. The [Parameters](#parameters) section lists the parameters that can be configured during installation.
## Parameters
| Name | Description | Value |
| ---- | ----------- | ----- |
| `dags.image.digest` | Init container load-dags image digest in the format `sha256:aa...`. If set, this parameter overrides the tag | `""` |
| `web.containerSecurityContext.runAsUser` | Set Airflow web containers' Security Context runAsUser | `1001` |
| `web.containerSecurityContext.runAsNonRoot` | Set Airflow web containers' Security Context runAsNonRoot | `true` |
| `web.lifecycleHooks` | Lifecycle hooks for the Airflow web container(s) to automate configuration before or after startup | `{}` |
| `web.hostAliases` | Deployment pod host aliases | `[]` |
| `web.podLabels` | Add extra labels to the Airflow web pods | `{}` |
| `web.podAnnotations` | Add extra annotations to the Airflow web pods | `{}` |
| `web.affinity` | Affinity for Airflow web pods assignment (evaluated as a template) | `{}` |
| `web.nodeAffinityPreset.key` | Node label key to match. Ignored if `web.affinity` is set. | `""` |
| `web.nodeAffinityPreset.type` | Node affinity preset type. Ignored if `web.affinity` is set. Allowed values: `soft` or `hard` | `""` |
| `web.nodeAffinityPreset.values` | Node label values to match. Ignored if `web.affinity` is set. | `[]` |
| `web.nodeSelector` | Node labels for Airflow web pods assignment | `{}` |
| `web.podAffinityPreset` | Pod affinity preset. Ignored if `web.affinity` is set. Allowed values: `soft` or `hard`. | `""` |
| `web.podAntiAffinityPreset` | Pod anti-affinity preset. Ignored if `web.affinity` is set. Allowed values: `soft` or `hard`. | `soft` |
| `web.tolerations` | Tolerations for Airflow web pods assignment | `[]` |
| `web.topologySpreadConstraints` | Topology Spread Constraints for pod assignment spread across your cluster among failure-domains. Evaluated as a template | `[]` |
| `web.priorityClassName` | Priority Class Name | `""` |
| `web.schedulerName` | Use an alternate scheduler, e.g. "stork". | `""` |
| `web.terminationGracePeriodSeconds` | Seconds Airflow web pod needs to terminate gracefully | `""` |
| `web.updateStrategy.type` | Airflow web deployment strategy type | `RollingUpdate` |
| `web.updateStrategy.rollingUpdate` | Airflow web deployment rolling update configuration parameters | `{}` |
| `web.sidecars` | Add additional sidecar containers to the Airflow web pods | `[]` |
| `web.initContainers` | Add additional init containers to the Airflow web pods | `[]` |
| `web.extraVolumeMounts` | Optionally specify extra list of additional volumeMounts for the Airflow web pods | `[]` |
| `web.extraVolumes` | Optionally specify extra list of additional volumes for the Airflow web pods | `[]` |
| `web.pdb.create` | Deploy a PodDisruptionBudget (PDB) object for the Airflow web pods | `false` |
| `web.pdb.minAvailable` | Minimum number/percentage of Airflow web replicas that must remain available | `1` |
| `web.pdb.maxUnavailable` | Maximum number/percentage of unavailable Airflow web replicas | `""` |
| `scheduler.image.digest` | Airflow Scheduler image digest in the format `sha256:aa...`. If set, this parameter overrides the tag | `""` |
| `scheduler.topologySpreadConstraints` | Topology Spread Constraints for pod assignment spread across your cluster among failure-domains. Evaluated as a template | `[]` |
| `scheduler.priorityClassName` | Priority Class Name | `""` |
| `scheduler.schedulerName` | Use an alternate scheduler, e.g. "stork". | `""` |
| `scheduler.terminationGracePeriodSeconds` | Seconds Airflow scheduler pod needs to terminate gracefully | `""` |
| `worker.topologySpreadConstraints` | Topology Spread Constraints for pod assignment spread across your cluster among failure-domains. Evaluated as a template | `[]` |
| `worker.priorityClassName` | Priority Class Name | `""` |
| `worker.schedulerName` | Use an alternate scheduler, e.g. "stork". | `""` |
| `worker.terminationGracePeriodSeconds` | Seconds Airflow worker pod needs to terminate gracefully | `""` |
| `ldap.uri` | Server URI, e.g. `ldap://ldap_server:389` | `ldap://ldap_server:389` |
| `ldap.basedn` | Base of the search, e.g. `ou=example,o=org` | `dc=example,dc=org` |
| `ldap.searchAttribute` | When doing an indirect bind to LDAP, the field that matches the username when searching for the account to bind to | `cn` |
| `ldap.binddn` | DN of the account used to search in the LDAP server. | `cn=admin,dc=example,dc=org` |
| `ldap.bindpw` | Bind Password | `""` |
| `ldap.userRegistration` | Set to True to enable user self registration | `True` |
| `ldap.userRegistrationRole` | Role name to assign when a user self-registers. This role must already exist. Mandatory when using `ldap.userRegistration` | `Public` |
| `ldap.rolesMapping` | Mapping from LDAP DN to a list of roles | `{ "cn=All,ou=Groups,dc=example,dc=org": ["User"], "cn=Admins,ou=Groups,dc=example,dc=org": ["Admin"], }` |
| `ldap.rolesSyncAtLogin` | Replace all of the user's roles at each login, or only at registration | `True` |
| `ldap.tls.enabled` | Enable TLS/SSL for LDAP. You must include the CA file. | `false` |
| `ldap.tls.allowSelfSigned` | Allow the use of self-signed certificates | `true` |
| `ldap.tls.certificatesSecret` | Name of the existing secret containing the certificate CA file that will be used by ldap client | `""` |
| `ldap.tls.certificatesMountPath` | Where LDAP certificates are mounted. | `/opt/bitnami/airflow/conf/certs` |
| `ldap.tls.CAFilename` | LDAP CA cert filename | `""` |
| `service.annotations` | Additional custom annotations for Airflow service | `{}` |
| `service.extraPorts` | Extra port to expose on Airflow service | `[]` |
| `ingress.enabled` | Enable ingress record generation for Airflow | `false` |
| `ingress.ingressClassName` | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) | `""` |
| `ingress.pathType` | Ingress path type | `ImplementationSpecific` |
| `ingress.apiVersion` | Force Ingress API version (automatically detected if not set) | `""` |
| `ingress.hostname` | Default host for the ingress record | `airflow.local` |
| `ingress.path` | Default path for the ingress record | `/` |
| `ingress.annotations` | Additional annotations for the Ingress resource. To enable certificate autogeneration, place here your cert-manager annotations. | `{}` |
| `ingress.tls` | Enable TLS configuration for the host defined at `ingress.hostname` parameter | `false` |
| `ingress.selfSigned` | Create a TLS secret for this ingress record using self-signed certificates generated by Helm | `false` |
| `ingress.extraHosts` | An array with additional hostname(s) to be covered with the ingress record | `[]` |
| `ingress.extraPaths` | An array with additional arbitrary paths that may need to be added to the ingress under the main host | `[]` |
| `ingress.extraTls` | TLS configuration for additional hostname(s) to be covered with this ingress record | `[]` |
| `postgresql.enabled` | Switch to enable or disable the PostgreSQL helm chart | `true` |
| `postgresql.auth.enablePostgresUser` | Assign a password to the "postgres" admin user. Otherwise, remote access will be blocked for this user | `false` |
| `postgresql.auth.username` | Name for a custom user to create | `bn_airflow` |
| `postgresql.auth.password` | Password for the custom user to create | `""` |
| `postgresql.auth.database` | Name for a custom database to create | `bitnami_airflow` |
| `postgresql.auth.existingSecret` | Name of existing secret to use for PostgreSQL credentials | `""` |
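
Parameters from the table can be passed with the `--set key=value[,key=value]` argument to `helm install`. As a sketch, assuming the chart exposes `auth.username` and `auth.password` parameters (they are not listed in this excerpt, so check them against the chart's `values.yaml`) and that the repository alias is the hypothetical `my-repo`:

```console
helm install my-release \
  --set auth.username=my-user \
  --set auth.password=my-password \
  my-repo/airflow
```
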
The above command sets the credentials to access the Airflow web UI.
> NOTE: Once this chart is deployed, it is not possible to change the application's access credentials, such as usernames or passwords, using Helm. To change these application credentials after deployment, delete any persistent volumes (PVs) used by the chart and re-deploy it, or use the application's built-in administrative tools if available.
Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,
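
A sketch of such an invocation, assuming the overrides are saved in a local file named `values.yaml` and the repository alias is `my-repo` (both hypothetical):

```console
helm install my-release -f values.yaml my-repo/airflow
```
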
> **Tip**: You can use the default [values.yaml](values.yaml)
## Configuration and installation details
### [Rolling VS Immutable tags](https://docs.bitnami.com/containers/how-to/understand-rolling-tags-containers/)
It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.
Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
### Generate a Fernet key
A Fernet key is required to encrypt the passwords stored in connections. The Fernet key must be a base64-encoded 32-byte key.
Learn how to generate one [here](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/fernet.html#generating-fernet-key)
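
As a rough sketch, a key of that shape (32 random bytes, URL-safe base64-encoded) can also be produced with the Python standard library alone. This assumes `python3` is available on your workstation; the variable name `FERNET_KEY` is purely illustrative.

```console
# Generate 32 random bytes and encode them as URL-safe base64 (44 characters)
FERNET_KEY=$(python3 -c 'import base64, os; print(base64.urlsafe_b64encode(os.urandom(32)).decode())')
echo "$FERNET_KEY"
```
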
### Generate a Secret key
The secret key is used to run your Flask app and should be as random as possible. When running more than one webserver instance, make sure all of them use the same `secret_key`; otherwise one of them will fail with the error "CSRF session token is missing".
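
One way to obtain a suitably random value, sketched here under the assumption that `python3` is available locally, is the standard-library `secrets` module. The variable name `SECRET_KEY` is illustrative only.

```console
# Produce a 16-byte random value rendered as 32 hexadecimal characters
SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_hex(16))')
echo "$SECRET_KEY"
```

Generate the value once and reuse it for every webserver replica, rather than generating a new one per pod.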
### Load DAG files
There are two ways to load your custom DAG files into the Airflow chart. They are compatible, so you can use both at the same time.
#### Option 1: Specify an existing config map
You can manually create a config map containing all your DAG files and pass its name when deploying the Airflow chart, using the `dags.existingConfigmap` option.
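
A minimal sketch of this flow, assuming the DAG files live in a local `./dags/` directory, the config map is named `my-dags`, and the chart repository alias is `my-repo` (all three names are hypothetical):

```console
kubectl create configmap my-dags --from-file=./dags/
helm install my-release --set dags.existingConfigmap=my-dags my-repo/airflow
```
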
#### Option 2: Get your DAG files from a git repository
You can store your DAG files in GitHub repositories and clone them into the Airflow pods with an init container. The repositories are periodically updated by a sidecar container. To do so, deploy Airflow with the following options:
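
A sketch of such options is shown here. The `git.dags.*` parameter names are assumptions that are not listed in this excerpt, so verify them against the chart's `values.yaml`. `USERNAME`, `REPOSITORY`, and `REPO-IDENTIFIER` are placeholders.

```console
git.dags.enabled=true
git.dags.repositories[0].repository=https://github.com/USERNAME/REPOSITORY
git.dags.repositories[0].name=REPO-IDENTIFIER
git.dags.repositories[0].branch=master
```
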
> NOTE: When git synchronization is enabled, an init container and a sidecar container are added to all pods running Airflow, so that the scheduler, worker, and web components can reach the DAGs when needed.
If you use a private repository from GitHub, a possible option to clone the files is using a [Personal Access Token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) and using it as part of the URL: `https://USERNAME:PERSONAL_ACCESS_TOKEN@github.com/USERNAME/REPOSITORY`
### Loading Plugins
You can load plugins into the chart by specifying a git repository containing the plugin files. The repository will be periodically updated using a sidecar container. In order to do that, you can deploy airflow with the following options:
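
A sketch of such options is shown here. The `git.plugins.*` parameter names are assumptions that are not listed in this excerpt, so verify them against the chart's `values.yaml`. `USERNAME` and `REPOSITORY` are placeholders, and `path=plugins` assumes the plugin files sit in a `plugins` directory of the repository.

```console
git.plugins.enabled=true
git.plugins.repositories[0].repository=https://github.com/USERNAME/REPOSITORY
git.plugins.repositories[0].branch=master
git.plugins.repositories[0].path=plugins
```
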
> NOTE: When git synchronization is enabled, an init container and a sidecar container are added to all pods running Airflow, so that the scheduler, worker, and web components can reach the plugins when needed.
This is useful if you plan on using [Bitnami's sealed secrets](https://github.com/bitnami-labs/sealed-secrets) to manage your passwords.
### Setting Pod's affinity
This chart allows you to set your custom affinity using the `affinity` parameter. Find more information about Pod's affinity in the [kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
As an alternative, you can use one of the preset configurations for pod affinity, pod anti-affinity, and node affinity available in the [bitnami/common](https://github.com/bitnami/charts/tree/main/bitnami/common#affinities) chart. To do so, set the `podAffinityPreset`, `podAntiAffinityPreset`, or `nodeAffinityPreset` parameters.
### Install extra python packages
This chart allows you to mount volumes using `extraVolumes` and `extraVolumeMounts` in all 3 airflow components (web, scheduler, worker). Mounting a requirements.txt using these options to `/bitnami/python/requirements.txt` will execute `pip install -r /bitnami/python/requirements.txt` on container start.
### Enabling network policies
This chart allows you to set network policies that restrict access to the deployed pods in the cluster. Basically, no pods other than the Scheduler's may access the Worker's pods, and no pods other than the Web's may access the Worker's pods. To do so, set `networkPolicies.enabled=true`.
### Executors
Airflow supports different executor runtimes, and this chart provides support for the following ones.
#### CeleryExecutor
The Celery executor is the default for this chart. With it, you can scale out the number of workers. You do not need to do anything to point the `executor` parameter to `CeleryExecutor`: just install the chart with the default parameters.
#### KubernetesExecutor
The Kubernetes executor was introduced in Apache Airflow 1.10.0. It creates a new pod for every task instance using the `pod_template.yaml` that you can find in [templates/config/configmap.yaml](https://github.com/bitnami/charts/blob/main/bitnami/airflow/templates/config/configmap.yaml). You can override this template using `worker.podTemplate`. To enable `KubernetesExecutor`, set the following parameters.
> NOTE: Redis® does not need to be deployed when using KubernetesExecutor, so you should disable it using `redis.enabled=false`.
```console
executor=KubernetesExecutor
redis.enabled=false
rbac.create=true
serviceAccount.create=true
```
#### CeleryKubernetesExecutor
The CeleryKubernetesExecutor was introduced in Airflow 2.0 and combines the Celery and Kubernetes executors. Tasks are executed using Celery by default, but tasks that require it can be executed in a Kubernetes pod using the 'kubernetes' queue.
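
A sketch of the parameters this likely requires, assumed by analogy with the `KubernetesExecutor` settings in this document. Redis® is left enabled on the assumption that the Celery workers still need a broker, and the `serviceAccount.create` spelling is an assumption to verify against the chart's `values.yaml`.

```console
executor=CeleryKubernetesExecutor
rbac.create=true
serviceAccount.create=true
```
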
#### LocalExecutor
Local executor runs tasks by spawning processes in the Scheduler pods. To enable `LocalExecutor` set the following parameters.
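
A minimal sketch, mirroring the `SequentialExecutor` settings in this document and assuming Redis® is likewise unnecessary because no Celery workers are involved:

```console
executor=LocalExecutor
redis.enabled=false
```
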
#### LocalKubernetesExecutor
The LocalKubernetesExecutor was introduced in Airflow 2.3 and combines the Local and Kubernetes executors. Tasks are executed in the scheduler by default, but tasks that require it can be executed in a Kubernetes pod using the 'kubernetes' queue.
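
To enable it, a plausible set of parameters is sketched here. They are assumed by analogy with the `KubernetesExecutor` and `LocalExecutor` settings, including the `serviceAccount.create` spelling, so verify them against the chart's `values.yaml`.

```console
executor=LocalKubernetesExecutor
redis.enabled=false
rbac.create=true
serviceAccount.create=true
```
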
#### SequentialExecutor
This executor runs only one task instance at a time in the Scheduler pods. For production use cases, please use other executors. To enable `SequentialExecutor`, set the following parameters.
```console
executor=SequentialExecutor
redis.enabled=false
```
### Scaling worker pods
Sometimes, with large workloads, a fixed number of worker pods can make tasks take a long time to execute. This chart provides two ways of scaling worker pods.
- If you are using `KubernetesExecutor`, the Scheduler scales pods automatically, with no additional configuration.
- If you are using `CeleryExecutor`, enable `worker.autoscaling` by setting the following parameters. Autoscaling uses a default configuration that you can change using `worker.autoscaling.replicas.*` and `worker.autoscaling.targets.*`.
```console
worker.autoscaling.enabled=true
worker.resources.requests.cpu=200m
worker.resources.requests.memory=250Mi
```
## Persistence
The Bitnami Airflow chart relies on the PostgreSQL chart persistence. This means that Airflow does not persist anything.
## Troubleshooting
Find more information about how to deal with common errors related to Bitnami's Helm charts in [this troubleshooting guide](https://docs.bitnami.com/general/how-to/troubleshoot-helm-chart-issues).
## Upgrading
### To 14.0.0
This major version updates the PostgreSQL subchart to its newest major version, 12.0.0. [Here](https://github.com/bitnami/charts/tree/master/bitnami/postgresql#to-1200) you can find more information about the changes introduced in that version.
### To any previous version
Refer to the [chart documentation for more information about how to upgrade from previous releases](https://docs.bitnami.com/kubernetes/infrastructure/apache-airflow/administration/upgrade/).