How to build unit tests for Azure Data Factory
Leverage Azure DevOps and pytest for Data Factory unit testing
0. Introduction
TL;DR: Create an Azure DevOps git project using azure-pipelines.yml, create a build artifact, deploy ADFv2 and a SQLDB bacpac, and trigger pytest to run the unit tests
Unit testing is a software engineering practice that focuses on testing individual parts of code. In unit testing, the following best practices are applicable:
- Fast. Unit tests should take very little time to run. Milliseconds.
- Isolated. Unit tests are standalone and can be run in isolation
- Repeatable. Running a unit test should always return the same result
- Self-Checking. A test automatically detects whether it passed or failed
- Timely. Writing a unit test should not take a disproportionately long time
Creating a unit test in Azure Data Factory (ADFv2) can be a challenge since there is always a dependency on external data sources. Furthermore, ADFv2 needs to be deployed to Azure first and pipelines can only be tested as a whole. To (partly) overcome these challenges and adhere to the best practices above, an ADFv2 unit test project is created as follows:
- Set up an Azure DevOps CI/CD project to make tests repeatable
- Create a build artifact containing all scripts and deploy the ADFv2 ARM template, SQLDB bacpac and csv file in the release pipeline. By making the data sources part of the release pipeline, external dependencies are limited and the tests are more isolated
- Run two ADFv2 pipelines, using SQLDB and ADLSGen2, from pytest and propagate the test results to the test tab in Azure DevOps. This way, the test results are self-checking
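To make the self-checking step more concrete, the sketch below summarises a JUnit-style XML report. It assumes the tests are run with pytest's `--junitxml` option and that the resulting file is what gets published to the test tab; the original pipeline may wire this up differently. The function name `summarize_junit` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Sum the counters of all <testsuite> elements in a JUnit-style report.

    Assumption: the report has the shape pytest writes with --junitxml,
    i.e. a <testsuites> root (or a single <testsuite> root) whose suites
    carry tests/failures/errors/skipped attributes.
    """
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            # Missing attributes count as zero
            totals[key] += int(suite.get(key, 0))
    return totals


def all_passed(totals: dict) -> bool:
    """A run is green when at least one test ran and nothing failed or errored."""
    return totals["tests"] > 0 and totals["failures"] == 0 and totals["errors"] == 0
```

A report with any failure or error makes `all_passed` return False, which is the same pass/fail signal a reader would look for in the test tab.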
Typically, an ADFv2 project contains multiple pipelines. Unit tests on different pipelines can be run in parallel using pytest-xdist. This makes testing faster, although milliseconds will never be achieved: total test time is roughly equal to the run time of the longest-running ADFv2 pipeline. Using the existing code base with pytest, pyodbc and azure-storage-blob, new tests can be created in a timely manner. See also picture below.

In the remainder of this blog post, the project is explained in more detail. In the next chapter, the project will be deployed.
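Before moving on, the point about parallel tests can be illustrated with a small, self-contained sketch. It does not call ADFv2. `fake_pipeline_run` is a hypothetical stand-in that simply sleeps, and threads are used here only for illustration (pytest-xdist distributes tests over worker processes, for example with `pytest -n 2`).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_pipeline_run(duration_s: float) -> float:
    """Hypothetical stand-in for waiting on one ADFv2 pipeline run."""
    time.sleep(duration_s)
    return duration_s


def run_in_parallel(durations) -> float:
    """Wait on all fake pipeline runs concurrently; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        # list() forces all runs to finish before the timer stops
        list(pool.map(fake_pipeline_run, durations))
    return time.perf_counter() - start


if __name__ == "__main__":
    durations = [0.2, 0.5]
    elapsed = run_in_parallel(durations)
    print(f"sequential: ~{sum(durations):.1f}s, parallel: {elapsed:.2f}s")
```

The elapsed time tracks the slowest run rather than the sum of all runs, which is why adding more pipeline tests does not have to lengthen the test stage proportionally.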
1. Setup Azure DevOps CI/CD project
In this chapter, the project comes to life and the ADFv2 unit test project is created. The following steps are needed:
- 1.1 Prerequisites
- 1.2 Create Azure DevOps project
- 1.3 Create Service connection
- 1.4 Configure and build/release YAML pipeline
1.1 Prerequisites
The following resources are required in this tutorial:
- Azure Account
- Azure DevOps
- Azure CLI (recommended, also for troubleshooting)
Subsequently, go to the Azure portal and create a resource group in which all Azure resources will be deployed. This can also be done using the following Azure CLI command:
az group create -n <<your resource group>> -l <<your location>>
1.2 Create Azure DevOps project
Azure DevOps is the tool to continuously build, test, and deploy your code to any platform and cloud. Create a new project in Azure DevOps by following this tutorial. Once you have created the new project, click on the repository folder and choose to import the following repository:
See also the picture below.

1.3 Create Service connection
A Service connection is needed to access the resources in the resource group from Azure DevOps. Go to project settings, service connection and then select Azure Resource Manager, see also picture below.

Select Service Principal Authentication and limit scope to your resource group which you created earlier, see also picture below.

By default, the Service Principal (SPN) of the service connection has Contributor rights on the resource group. However, for this pipeline, the SPN needs Owner rights (or User Access Administrator rights in addition to Contributor) on the resource group, since the ADFv2 managed identity (MI) needs to be granted RBAC rights on the ADLSgen2 account. The application id can be found by clicking “Manage Service Principal” on your service connection in Azure DevOps. Use the following Azure CLI script to assign Owner rights to the SPN (this can also be done in the portal):
# get your subscription id
az account list
# create role assignment
az role assignment create --assignee "<<application id>>" --role "Owner" --scope "/subscriptions/<<your subscription Id>>/resourcegroups/<<resource group name>>"
Finally, verify in the Azure portal, or with the CLI command below, that the SPN was assigned the Owner role on your resource group.
az role assignment list --resource-group <<resource group name>>
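The check above can also be automated. The sketch below assumes the output of the command was saved as JSON and that each entry carries the `principalName`, `principalId`, `roleDefinitionName` and `scope` fields; field contents can differ between CLI versions, so treat this as an illustration. The helper names are hypothetical.

```python
import json


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Build the resource group scope string used in role assignments."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def has_role(assignments, principal: str, role: str, scope: str) -> bool:
    """Return True if `principal` holds `role` exactly at `scope`.

    `principal` is matched against principalName and principalId
    (assumed field names). Scopes are compared case-insensitively
    because Azure resource ids are not case-sensitive.
    """
    for entry in assignments:
        same_role = entry.get("roleDefinitionName") == role
        same_principal = principal in (entry.get("principalName"), entry.get("principalId"))
        same_scope = entry.get("scope", "").lower() == scope.lower()
        if same_role and same_principal and same_scope:
            return True
    return False


def load_assignments(json_text: str):
    """Parse the JSON text produced by the CLI into a list of dicts."""
    return json.loads(json_text)
```

With the JSON loaded, `has_role(assignments, "<<application id>>", "Owner", resource_group_scope(...))` returning False would indicate that the role assignment step still needs to be done.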
1.4 Configure and build/release YAML pipeline
Go to your Azure DevOps project, select Pipelines and then click “New pipeline”. In the wizard, select Azure Repos Git and the git repo you created earlier. In the Configure tab, choose “Existing Azure Pipelines YAML file” and then the azure-pipelines.yml file that can be found in the git repo, see also below.

Subsequently, the following variables need to be substituted with your own values:
variables:
  #
  # 1. Azure DevOps settings, change with your own
  AzureServiceConnectionId: '<<your service connection Id>>'
  SUBSCRIPTIONID: '<<your subscription Id>>'
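A forgotten placeholder is an easy mistake to make at this step. As a minimal sketch, the helper below scans YAML text for any remaining `<<...>>` markers; the function name is hypothetical and the check is purely textual, so it does not validate the values themselves.

```python
import re

# Matches the <<...>> placeholder convention used in this blog
PLACEHOLDER = re.compile(r"<<[^<>]*>>")


def find_unreplaced_placeholders(yaml_text: str):
    """Return (line number, placeholder) pairs for every <<...>> still present."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Running it over the contents of azure-pipelines.yml and getting an empty list back is a quick sanity check before saving the pipeline.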
Once the variables are substituted, the pipeline is created and run immediately, see below.

After the job has run, all resources are deployed and the tests are executed. In the next chapter, the results are verified.
2. Verify test results
In the first step of the Azure DevOps pipeline, ADFv2, SQLDB and ADLSgen2 are deployed. After deployment is done, it can be verified using Azure CLI whether all resources are deployed.
az resource list -g <<your resource group>>
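The output of this command can be checked by eye, or with a few lines of Python. The sketch below assumes the output was saved as JSON (for example by adding `-o json` and redirecting to a file) and that the deployment creates a data factory, a SQL server and a storage account; the exact set of resource types in the original ARM template is an assumption.

```python
import json

# Assumed resource types for ADFv2, SQLDB (logical server) and ADLSgen2
EXPECTED_TYPES = (
    "Microsoft.DataFactory/factories",
    "Microsoft.Sql/servers",
    "Microsoft.Storage/storageAccounts",
)


def missing_resource_types(resources, expected=EXPECTED_TYPES):
    """Return the expected resource types that do not occur in `resources`.

    `resources` is the parsed JSON list; each item is expected to have a
    "type" field. Types are compared case-insensitively.
    """
    found = {item.get("type", "").lower() for item in resources}
    return sorted(t for t in expected if t.lower() not in found)


def parse_resources(json_text: str):
    """Parse the saved CLI output into a list of dicts."""
    return json.loads(json_text)
```

An empty result means every expected resource type is present in the resource group.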
In the second step of the Azure DevOps pipeline, two ADFv2 pipelines are tested. It can be verified in the ADFv2 monitor tab whether both pipelines were executed. In the pytest results, it can be verified whether the tests were successful.

The following tests are executed for each pipeline:
adfv2_dataflows_sqldb_remove_nullvalues:
- Pipeline returned HTTP 200 after being triggered by REST
- Check that no timeout occurred in the pipeline
- Check whether table OrdersAggregated was created and does not contain NULL values in the comment columns
adfv2_dataflows_adlsgen2_delete_piicolumns:
- Pipeline returned HTTP 200 after being triggered by REST
- Check that no timeout occurred in the pipeline
- Check whether file AdultCensusIncomePIIremoved.parquet can be found in the curated file system of ADLSgen2
- Check whether the PII sensitive age column was removed from the parquet file
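The first two checks in each list follow the same pattern: trigger a run, then poll until it finishes or a timeout expires. The sketch below shows that pattern in pytest style without calling Azure. The URL follows the public Data Factory REST API (`createRun`, api-version 2018-06-01); `wait_for_run` and its parameters are hypothetical names, and the status lookup is injected as a function so the logic can be tested offline. The actual tests in the repository may be structured differently.

```python
import time

API_VERSION = "2018-06-01"
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def create_run_url(subscription_id, resource_group, factory, pipeline):
    """Build the REST URL that triggers a pipeline run (to be sent as a POST)."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelines/{pipeline}/createRun?api-version={API_VERSION}"
    )


def wait_for_run(get_status, timeout_s=600.0, poll_s=10.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the run reaches a terminal state.

    Raises TimeoutError when the run is still not finished after timeout_s.
    sleep and clock are injectable so tests do not have to wait for real.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"pipeline run still '{status}' after {timeout_s}s")
        sleep(poll_s)


def test_pipeline_finishes_without_timeout():
    # Fake status sequence instead of a real call to the pipeline run API
    statuses = iter(["Queued", "InProgress", "Succeeded"])
    result = wait_for_run(lambda: next(statuses), timeout_s=5, poll_s=0,
                          sleep=lambda seconds: None)
    assert result == "Succeeded"


def test_pii_column_removed():
    # Column names as they might be read from the parquet schema (made-up sample)
    columns = ["workclass", "education", "income"]
    assert "age" not in columns
```

In a real test, `get_status` would query the pipeline run with the run id returned by the trigger call, and the column list would be read from the parquet file in ADLSgen2.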
The tests can also be verified in the monitor tab of data factory, see below.

3. Conclusion
Unit testing is a software engineering practice that focuses on testing individual parts of code. Speed, isolation, repeatability, self-checking and timeliness are best practices for unit testing. In this blog, an ADFv2 unit test project is described that leverages the power of Azure DevOps, SQLDB bacpac and pytest to make tests more isolated, repeatable and self-checking. See also the picture below.
