How to Run a Python Script in Azure Data Factory

Are you wondering how to execute a Python script from an Azure Data Factory (ADF) pipeline? Then you have reached the right place. In this blog, I will take you through a step-by-step approach, with a practical demo, of calling a Python script from inside an Azure Data Factory pipeline. Azure Data Factory is one of the most popular services on the Azure cloud platform for migrating data from an on-premises data center to the Azure cloud. You may be looking to call a Python script through Azure Data Factory to perform some transformation or scripting work based on your business needs. Whatever your reason for doing so, this article will guide you in using a Python script and ADF together. Let’s dive into it. At a high level, the steps are:

1. Create the Azure Batch account.

2. Create the Azure Batch pool.

3. Upload the Python script to Azure Blob Storage.

4. Add the Custom activity to the Azure Data Factory pipeline and configure it to use the Azure Batch pool and run the Python script.

Run a Python Script from an Azure Data Factory Pipeline: Detailed Example

Prerequisites:

To execute this example, the prerequisites are as follows:

  1. You should have an active Azure subscription to create the Azure Data Factory account.
  2. For an existing ADF, you should have Contributor access to make changes to the ADF pipeline.
  3. An Azure Batch account is needed.
  4. An Azure Blob Storage account is needed.

Implementation:

  • First of all, we have to create the Azure Data Factory account. If you already have an Azure Data Factory account available, you can go to the next step directly. In case you don’t have one yet, you can follow the linked article to create your first Azure Data Factory account: How to create Azure data factory account Step by Step guide
  • Create the Azure Batch account and, inside it, the Azure Batch pool that will run the script (a minimal SDK sketch of the pool creation follows Figure 1 below).

Figure 1: Azure Pool in the Azure Batch account
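
If you prefer to script this step instead of using the portal, the pool can also be created with the azure-batch Python SDK. The snippet below is only a minimal sketch; the pool ID, VM size, image, and node-agent SKU are illustrative assumptions that you should adjust to your own subscription and quota.

```python
# Minimal sketch: create an Azure Batch pool with the azure-batch SDK.
# The pool id, VM size, image, and node agent SKU below are illustrative assumptions.
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

credentials = SharedKeyCredentials("<batch-account-name>", "<batch-account-key>")
batch_client = BatchServiceClient(
    credentials, batch_url="https://<batch-account>.<region>.batch.azure.com")

pool = batchmodels.PoolAddParameter(
    id="adf-python-pool",              # hypothetical pool id, reused later in the linked service
    vm_size="STANDARD_D2S_V3",
    target_dedicated_nodes=1,
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical",
            offer="0001-com-ubuntu-server-focal",
            sku="20_04-lts",
            version="latest",
        ),
        node_agent_sku_id="batch.node.ubuntu 20.04",
    ),
    # Optional start task: install pip (and any libraries your script needs) on every node.
    start_task=batchmodels.StartTask(
        command_line="/bin/bash -c 'apt-get update && apt-get install -y python3-pip'",
        user_identity=batchmodels.UserIdentity(
            auto_user=batchmodels.AutoUserSpecification(
                scope=batchmodels.AutoUserScope.pool,
                elevation_level=batchmodels.ElevationLevel.admin)),
        wait_for_success=True,
    ),
)
batch_client.pool.add(pool)
```
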
  • Create your Python script, or if you already have the Python script ready, just go to the Blob Storage account and upload it. In case you don’t have a Blob Storage account created yet, please create one as well (a minimal upload sketch follows Figure 2 below).

Figure 2: Python script in the Azure Storage account
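
If you would rather upload the script programmatically than through the portal, here is a minimal sketch using the azure-storage-blob package. The container name adf-scripts and the connection string placeholder are assumptions; use your own storage account details.

```python
# Minimal sketch: upload main.py to Azure Blob Storage with azure-storage-blob.
# The container name "adf-scripts" and the connection string are placeholders.
from azure.storage.blob import BlobServiceClient

connection_string = "<storage-account-connection-string>"
blob_service = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service.get_container_client("adf-scripts")

# Upload (or overwrite) the script that the Custom activity will execute.
with open("main.py", "rb") as script_file:
    container_client.upload_blob(name="main.py", data=script_file, overwrite=True)
```
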
  • Now it’s time to go to the Azure Data Factory account and create the pipeline. For our demo purpose, I am creating the pipeline with the name ‘Python-adf-demo’.

Figure 3: Create Azure Data Factory Pipeline

In the activity search tab, either type “custom” and select the Custom (Batch Service) activity, or go directly to Batch Service under the activities list and select the Custom activity. Drag this activity onto the pipeline canvas.

Figure 4: Custom Activity in the Azure Data Factory account

Now configure the Custom activity of the Azure Data Factory pipeline. You can provide a name for the activity. Under Azure Batch, select the Azure Batch linked service. However, as of now, we don’t have any Azure Batch linked service, so let’s create one first. Click on New as follows:

How to Create an Azure Batch Linked Service

  • Give a proper name to the Azure Batch linked service.
  • It will ask for the account key; go to the Azure Batch account -> Keys and copy the account key from there.

Figure 5: Get Keys and other details for Azure Batch account
  • Copy the Azure Batch URL as well and enter it in the ADF linked service.
  • Enter the pool name as well. To get the pool name, go to the Azure Batch account -> Pools and copy the pool ID. Now paste it into the pool name field of the Azure Batch linked service.
  • Enter the storage account linked service for the account where you are going to place the script file. If you already have a linked service for that specific Blob Storage account, choose it; otherwise, create one.
  • Finally, click on Test connection to check that everything looks OK. Once your test connection is successful, you have completed the Azure Batch linked service creation (an equivalent SDK sketch follows Figure 6 below).

Figure 6: Azure Batch Linked Service Creation
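
The same linked service can also be created outside the portal. The sketch below uses the azure-mgmt-datafactory management SDK and is only illustrative: the resource group, factory name, linked service names, and pool ID are assumptions, and the exact model signatures can vary slightly between SDK versions.

```python
# Illustrative sketch: create an Azure Batch linked service with azure-mgmt-datafactory.
# Resource group, factory name, linked service names, and pool id are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBatchLinkedService, LinkedServiceReference, LinkedServiceResource, SecureString)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

batch_linked_service = LinkedServiceResource(
    properties=AzureBatchLinkedService(
        account_name="<batch-account-name>",
        access_key=SecureString(value="<batch-account-key>"),
        batch_uri="https://<batch-account>.<region>.batch.azure.com",
        pool_name="adf-python-pool",  # the pool id copied from Azure Batch -> Pools
        # Reference to the blob storage linked service that holds main.py
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStorageLS"),
    )
)

adf_client.linked_services.create_or_update(
    "<resource-group>", "<data-factory-name>", "AzureBatchLS", batch_linked_service)
```
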

Configure Custom Activity in ADF

Now go back to the pipeline’s Custom activity. The Azure Batch linked service is now available, so just select it.

Figure 7: Configure Custom Activity in the Azure Data Factory-1

Go to the Settings tab. Under it, type the command you want to execute. Since I only want to run the Python script, I will type the command that executes it, which looks like this: python main.py

Here main.py is the name of my Python script. In case you also have some runtime arguments to pass to the Python script, you can provide them here (a trivial example script is sketched below).
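
For reference, here is a trivial example of what main.py could look like; the argument name is purely illustrative. In the Custom activity command you would then write something like python main.py --input-date 2021-01-01.

```python
# main.py -- a trivial, illustrative script executed by the ADF Custom activity.
import argparse
import sys


def main() -> int:
    parser = argparse.ArgumentParser(description="Demo script run from Azure Data Factory")
    parser.add_argument("--input-date", default=None, help="optional runtime argument")
    args = parser.parse_args()

    print(f"Hello from Azure Batch! input-date = {args.input_date}")
    # ... your transformation or scripting logic goes here ...
    return 0  # a non-zero exit code marks the Custom activity run as failed


if __name__ == "__main__":
    sys.exit(main())
```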

To ensure the Custom activity of Azure Data Factory picks up your script file, you have to provide the correct Azure Blob Storage folder path and the linked service associated with it.

Figure 8: Configure Custom Activity in the Azure Data Factory-2

Now our pipeline is ready to run. Just click on Debug to run and test the pipeline. If you have followed all the steps above, you should be able to successfully run the Python script in Azure Data Factory using this pipeline.

What are the use cases for running a Python script from Azure Data Factory?

Azure Data Factory is primarily meant for ETL work (Extract, Transform, and Load) and for lifting and shifting your workloads. However, sometimes you may want to execute other custom services or applications through it as well. For example, you may have a requirement like: once my data load is completed, trigger a custom application. In that case, you may like to use a Python script to start your custom application.

Or let’s say that once your data load is completed, you want to delete some server or database. For this, you may want to write a Python script and execute it (an illustrative sketch follows).
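
As an illustration of this second use case, the following sketch deletes a virtual machine with the azure-mgmt-compute SDK once the load has finished. All resource names are hypothetical, and the script would also need an identity with the right permissions on the subscription.

```python
# Illustrative sketch only: delete a temporary VM after the data load completes.
# The resource group and VM name are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute_client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

# begin_delete returns a poller; .result() blocks until the deletion finishes.
compute_client.virtual_machines.begin_delete("<resource-group>", "staging-load-vm").result()
print("Temporary server deleted after data load.")
```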

Hence, there can be several scenarios where you may want to execute a Python script in Azure Data Factory.

Is there any alternative to running a Python script in Azure Data Factory besides the Custom activity?

Yes, there are alternatives for running Python code in Azure Data Factory. However, it depends on your business needs and what exactly you are trying to accomplish with your Python script. For example, you can use the Python activity from the Azure Databricks activities list; with it you can also execute Python code (a trivial sketch of such a script follows).
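
For completeness, here is a trivial sketch of the kind of Python file a Databricks Python activity could point to; the DBFS paths and column name are assumptions, not values from this demo.

```python
# Illustrative PySpark script for a Databricks Python activity (paths are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a raw CSV file, aggregate it, and write the summary back as Parquet.
df = spark.read.csv("dbfs:/mnt/raw/input.csv", header=True)
df.groupBy("category").count().write.mode("overwrite").parquet("dbfs:/mnt/curated/summary")
```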

Can we run only Python scripts from Azure Data Factory?

No, you can also run scripts written in other languages from Azure Data Factory. The only thing you need to take care of is to create an Azure Batch pool that has all the runtimes and libraries installed to run the script. Also, change the command in the Custom activity according to the language of the script you want to execute.

Recommendations

Most Azure data engineers find it a little difficult to understand real-world scenarios from an Azure data engineer’s perspective and face challenges in designing a complete enterprise solution. Hence, I would recommend that you go through these links to get a better understanding of Azure Data Factory.

Azure Data Factory Insights

Azure Data Engineer Real World scenarios

Azure Databricks Spark Tutorial for beginner to advance level

Latest Azure DevOps Interview Questions and Answers

You can also check out and pin this great YouTube channel for learning Azure for free from industry experts:

IT Skills Upgrade – YouTube

Final Thoughts

With this, we have reached the last section of the article. In this article, we have learned how to run a Python script from Azure Data Factory using the Azure Batch service and the Custom activity. I hope you have found this article insightful and learned the new concept of the Custom activity in Azure Data Factory.

Please share your comments, suggestions, and feedback in the comment section below.

Deepak Goyal

Deepak Goyal is a certified Azure Cloud Solution Architect. He has around a decade and a half of experience in designing, developing, and managing enterprise cloud solutions. He is also a certified Big Data professional and a passionate cloud advocate.