Running an Airflow DAG/Schedule on a Time Zone
Based on experience, this use case takes you through the steps of running an Airflow DAG/schedule on a time zone.
Irrespective of the amount of experience you have, it is not possible for us to be aware of and remember each and every facet of a programming environment. I have experienced it multiple times.
Some of you will definitely say I must be superficial for not having explored each and every facet of a language like Java while claiming to have sufficient experience with it. So be it. But extending help to others has helped me in turn, because I have been able to learn many new things that I never had a chance to encounter during my own project work.
Now, on to the topic of this article.
Recently, a team facing an issue with getting a schedule to execute correctly across daylight saving time was referred to me. While I have (proudly) mentioned how I managed a complex Airflow schedule of 4,000 DAGs, all built with a DAG generator application (obviously after suitable exploration), I had never had a chance to implement the time zone feature in Airflow.
By default, Airflow uses UTC.
I did what comes naturally to each one of us these days — I did a Google search. Some links:
- How To Use Timezones in Apache Airflow
- How To Consider Daylight Savings Time When Using Cron Schedule in Airflow
It is possible to update the Airflow config to make it work as per the required time zone, but the recommendation is to let Airflow work on UTC while configuring DAGs to recognize the required time zone.
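For completeness: the deployment-wide setting is the default_timezone option in the [core] section of airflow.cfg, and like any Airflow config option it can also be overridden through an environment variable. Here is a minimal sketch of that override; remember this changes the whole deployment, which is exactly what the recommendation above advises against.

import os

# Equivalent to setting default_timezone = Europe/Paris under [core] in airflow.cfg.
# It must be present in the environment of the scheduler and webserver processes
# before Airflow loads its configuration.
os.environ["AIRFLOW__CORE__DEFAULT_TIMEZONE"] = "Europe/Paris"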
To make an Airflow DAG aware of daylight saving time, we have to use the pendulum library, create a time zone object, and pass it to the start_date parameter of the DAG (via default_args) as below:
import pendulum
from airflow import DAG
from datetime import datetime, timedelta

# Build a time zone object so the start_date is timezone-aware
local_tz = pendulum.timezone("Europe/Paris")

default_args = {
    "start_date": datetime(2023, 1, 1, tzinfo=local_tz),
    "owner": "Airflow",
    # other parameters as required
}

dag = DAG(
    dag_id="timezone_dag",
    schedule_interval="10 5 * * *",  # 05:10 in the DAG's time zone
    default_args=default_args,
)
In this example, we are using 'Paris' as our time zone, which is one hour ahead of UTC in winter and two hours ahead when daylight saving time is active. Once we added this information, we were set.
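As a quick sanity check on those offsets, pendulum can tell you what Paris is doing on any given date. A small sketch, with arbitrary sample dates:

import pendulum

# A winter date: Paris is on CET, one hour ahead of UTC
winter = pendulum.datetime(2023, 1, 2, 5, 10, tz="Europe/Paris")
print(winter.offset_hours)  # -> 1 hour ahead of UTC

# A summer date: DST (CEST) is active, so two hours ahead of UTC
summer = pendulum.datetime(2023, 6, 17, 5, 10, tz="Europe/Paris")
print(summer.offset_hours)  # -> 2 hours ahead of UTC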
The problem we ran into was in demonstrating that the DAG works, particularly when daylight saving time is active, given that daylight saving had yet to kick in for the year.
Let me reiterate. Specifying the time zone parameter for a DAG was as easy as passing a hot knife through butter. Carrying this analogy further, the problem we faced was how to get the liquid butter on the bread using a knife :-)
Here is how we went about testing the DAG. Given that we were in the Jan/Feb period of the year, setting the time zone meant we could simply set the schedule for the DAG and wait for the clock to hit the right time, at which point the DAG went off. The only issue was how Airflow displays the information on screen. As Airflow is still running on the UTC clock, it executes the DAG at 10 minutes past five AM Paris time, as expected, but the UI displays the time as 10 minutes past four AM.
Let me repeat that. The DAG schedule means the DAG is set to execute at 10 past 5 am Paris time. Airflow, which runs on UTC, executes and displays the run at 10 past 4 am. Confusing? Yes, but this is expected behavior because Paris time is one hour ahead of UTC in winter.
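If you want to double-check that arithmetic, converting the scheduled wall-clock time back to UTC shows exactly what the UI is displaying. A small sketch:

import pendulum

# The DAG is scheduled at 05:10 Paris time; in winter Paris is UTC+1
paris_run = pendulum.datetime(2023, 1, 2, 5, 10, tz="Europe/Paris")

# The Airflow UI shows times in UTC by default, hence 04:10
print(paris_run.in_timezone("UTC"))  # 2023-01-02T04:10:00+00:00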
Once we demonstrated that the time zone setting was working as expected, we wanted to test that the DAG works when daylight saving time is active. To test daylight saving, we had to change the clocks. This was easy to do on a Linux VM using the date -s command. When we tried to change the clock on our company-issued laptops, we ran into trouble: we did not have permission to change the clock. We also tried adding a new clock and setting the value but were not successful.
So, we did the next logical thing. We spun up a Windows VM where we had control over the clock settings. We changed the time to Jun 17, 2023, and off we went. We set the start date on the DAG to Jun 17 and waited anxiously. When the DAG did not execute, I realized that we had to set the start date to at least one day prior to the clock. (Remember that Airflow executes a DAG only after the scheduled interval for that DAG has passed/completed.) So we set the start date to Jun 15, 2023. After this change, once again, we watched Airflow, but nothing happened; the DAG did not execute.
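As an aside, the "at least one day prior" rule in the parenthesis above can be reproduced with a few lines of code. Here is a sketch using the croniter library; it mimics the reasoning, not Airflow's actual scheduler code, and the dates are simply the ones from our test:

from datetime import datetime
from croniter import croniter

# What we set on the DAG, and the (simulated) current wall clock
start_date = datetime(2023, 6, 17, 0, 0)
now = datetime(2023, 6, 17, 10, 0)

# With schedule "10 5 * * *", the first run covers the interval between
# the first two cron ticks after start_date, and fires only once that
# interval has fully passed.
ticks = croniter("10 5 * * *", start_date)
interval_start = ticks.get_next(datetime)  # 2023-06-17 05:10
interval_end = ticks.get_next(datetime)    # 2023-06-18 05:10

# The interval end is still in the future, so the scheduler has nothing
# to run yet; moving start_date back by at least a day (we used Jun 15) fixes it.
print(interval_end <= now)  # False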
After giving it some thought, I realized that the time at which we wanted to execute the DAG in the production schedule was not important for the demonstration. So, I changed the DAG schedule to execute every five minutes. As it happens, we set the schedule to "5 * * * *", expecting the DAG to execute every five minutes. It does not. That expression means the DAG will execute at five minutes past every hour. So, I fired up crontab.guru, checked the expression, and set it correctly to "*/5 * * * *" (a quick side-by-side check of the two expressions appears after this paragraph). Once we reset the clocks on all VMs (three Linux and one Windows), we waited. Four VMs for a simple demonstration? One for editing the DAG, one for running the Airflow webserver, one for running the Airflow scheduler, and the Windows VM for connecting to the Airflow UI.
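Incidentally, if you want to verify a cron expression without waiting on a live scheduler, the croniter library makes the difference between the two expressions we tried obvious. The base time below is arbitrary:

from datetime import datetime
from croniter import croniter

base = datetime(2023, 6, 17, 10, 0)

# "5 * * * *" fires at minute 5 of every hour
hourly = croniter("5 * * * *", base)
print([hourly.get_next(datetime).strftime("%H:%M") for _ in range(3)])
# ['10:05', '11:05', '12:05']

# "*/5 * * * *" fires every five minutes
every_five = croniter("*/5 * * * *", base)
print([every_five.get_next(datetime).strftime("%H:%M") for _ in range(3)])
# ['10:05', '10:10', '10:15']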
We set the clocks on each of the VMs to 10 am, Jun 17, 2023, and started monitoring the Airflow UI. When the clock went past 10:05 and the DAG did not execute, we were left scratching our heads. We tabbed through each of the VMs. Then I noticed the clock displayed in the Airflow scheduler log. Because it took us time to set the clock on each VM, we had introduced a difference of one to three minutes between the VMs. So, if the Windows VM was at 10:03 am, at least one Linux VM was still at 10:00 am. Once we realized this, we went back to monitoring the UI. And on the dot, the DAG executed at 10:08 am (as per the clock displayed by the UI). After that, the DAG executed consistently at every five-minute interval.
And finally, we had a mechanism with which we could consistently demonstrate to anyone that the time zone setting specified on the DAG works.