Are you looking for a way to use the Filter activity in Azure Data Factory, or for a solution to a scenario where you receive an input array and want to drop a few values from it? Whether the array comes from a variable, a parameter, or the output of another activity such as Get Metadata, the Filter activity is the way to go. In this article, I will take you through a step-by-step process of creating a pipeline and using the Filter activity to remove unwanted values and keep only the meaningful ones. Let’s get into the details of the Azure Data Factory filter condition example.
- 1 What is the filter activity in the Azure data factory (ADF V2)?
- 2 How to use filter activity in the Azure data factory (ADF V2) pipeline with example?
- 3 How to check the output of the filter activity in the Azure data factory pipeline?
- 4 What is the difference between the if activity and filter activity in the Azure data factory?
- 5 Recommendations
- 6 Final Thoughts
What is the filter activity in the Azure data factory (ADF V2)?
The Filter activity filters an input array based on a condition. Both inputs are mandatory: the array to filter and the condition to apply. The activity evaluates the condition against each item of the array and produces an output array containing only the items for which the condition is true. This output can then be passed to another activity for further processing.
For example, you would use the Filter activity when you want to remove a certain file name from the list of file names that you want to copy or export.
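To make the idea concrete, here is a minimal Python sketch of what the Filter activity does conceptually. It is not ADF code: the helper `filter_activity` and the shortened file list are made up for illustration. In ADF itself, the same condition could be written as something like `@not(equals(item(), 'mkt.json'))`.

```python
from typing import Any, Callable, List


def filter_activity(items: List[Any], condition: Callable[[Any], bool]) -> List[Any]:
    """Hypothetical stand-in for the Filter activity.

    Evaluates the condition once per item and keeps the items
    for which it is true, preserving the input order.
    """
    return [item for item in items if condition(item)]


# Illustrative input array (assumed values, not from a real pipeline run).
file_names = ["employee.txt", "sales.txt", "mkt.json"]

# Remove one unwanted file name before a copy or export step.
to_copy = filter_activity(file_names, lambda name: name != "mkt.json")
print(to_copy)
```

The input array is left untouched; the activity only produces a new, filtered array.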
How to use filter activity in the Azure data factory (ADF V2) pipeline with example?
- Go to your Azure Data Factory account and create a demo pipeline. I am naming it filter-activity-demo. You can give it any name you like, or use one of your existing pipelines.
- Go to the Variables tab and create a variable named fileNames. You can use a different name if you want. Set the type of this variable to Array, because we want to pass this array as the input to our Filter activity.
- Set the value of this variable to an array of file names. For demo purposes, you can copy the list of file names given below and paste it into the variable value.
['employee.txt', 'sales.txt', 'mkt.json', 'emp.parquet', 'customer.txt', 'zen.yml']
- Go to the activity search box and type filter. The Filter activity appears in the results. Drag and drop it onto the ADF pipeline designer canvas.
- Under the Settings tab of the Filter activity, you will see two properties: Items and Condition. In the Items property, provide your input array, which in our case is the variable named ‘fileNames’. Click Add dynamic content; at the bottom you will see the variable. Click on it and the expression for the variable is added automatically.
- In the Condition property, provide the condition that decides which files are kept. Let’s say, for example, we want to keep only the file names of type ‘.txt’ and filter out all the rest.
Items: @variables('fileNames')
Condition: @contains(item(), '.txt')
- To write this condition, we use the contains string function, which checks whether a string contains ‘.txt’. If a file name contains ‘.txt’, it returns true; otherwise it returns false. Note that contains is case-sensitive, so a file name ending in ‘.TXT’ would not match.
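The designer steps above end up stored as a JSON pipeline definition. The sketch below builds a dict of roughly that shape in Python so it can be printed and inspected. The activity name `FilterTxtFiles` is a hypothetical choice, and the exact set of properties the designer emits may differ, so treat this as an approximation to compare against your pipeline's code view rather than a definitive definition.

```python
import json

# Approximate shape of the pipeline definition (assumption: compare with
# the JSON shown in the code view of your own pipeline).
pipeline = {
    "name": "filter-activity-demo",
    "properties": {
        "variables": {
            "fileNames": {
                "type": "Array",
                "defaultValue": [
                    "employee.txt", "sales.txt", "mkt.json",
                    "emp.parquet", "customer.txt", "zen.yml",
                ],
            }
        },
        "activities": [
            {
                "name": "FilterTxtFiles",  # hypothetical activity name
                "type": "Filter",
                "typeProperties": {
                    # Items: the input array to filter.
                    "items": {
                        "value": "@variables('fileNames')",
                        "type": "Expression",
                    },
                    # Condition: evaluated once per item via item().
                    "condition": {
                        "value": "@contains(item(), '.txt')",
                        "type": "Expression",
                    },
                },
            }
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```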
Our pipeline is now ready to run. Go to the debug tab and click Debug to execute the pipeline. If you have followed all the steps above, the pipeline will run and complete successfully.
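Before looking at the run, it can help to predict which names the condition should keep. The Python sketch below mimics `@contains(item(), '.txt')` with a case-sensitive substring check. It is an approximation of the ADF function, and the name `notes.txt.bak` is an invented example that shows the difference between a substring match and a suffix match.

```python
file_names = [
    "employee.txt", "sales.txt", "mkt.json",
    "emp.parquet", "customer.txt", "zen.yml",
]

# Mimics @contains(item(), '.txt'): a case-sensitive substring test.
kept = [name for name in file_names if ".txt" in name]
print(kept)

# Because the check is case-sensitive, '.TXT' matches nothing in this list.
kept_upper = [name for name in file_names if ".TXT" in name]
print(kept_upper)

# A substring test also accepts names that merely contain '.txt'.
tricky = "notes.txt.bak"  # hypothetical file name
print(".txt" in tricky, tricky.endswith(".txt"))
```

If only true `.txt` files should pass, a suffix check is the stricter choice. The ADF expression language also offers an `endswith` function for this.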
How to check the output of the filter activity in the Azure data factory pipeline?
Under the Output tab of the pipeline, check the output of the run we executed in the example above. You will see the Filter activity listed there. Click the input icon of the Filter activity to see the input array and the condition that were passed in. Then click the output icon to see the filtered result array. It contains all the file names with ‘.txt’ and none of the other file names.
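As a rough guide to what the output pane shows, the sketch below reconstructs the output for the demo array in Python. The key names `ItemsCount`, `FilteredItemsCount` and `Value` reflect my understanding of the Filter activity output and should be checked against a real debug run. A downstream activity would then typically read the array with an expression such as `@activity('FilterTxtFiles').output.Value`, where `FilterTxtFiles` is a hypothetical activity name.

```python
file_names = [
    "employee.txt", "sales.txt", "mkt.json",
    "emp.parquet", "customer.txt", "zen.yml",
]

# Items that satisfy the condition, in their original order.
value = [name for name in file_names if ".txt" in name]

# Assumed shape of the activity output; verify the key names in your own run.
filter_output = {
    "ItemsCount": len(file_names),     # size of the input array
    "FilteredItemsCount": len(value),  # number of items kept
    "Value": value,                    # the filtered array itself
}
print(filter_output)
```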
What is the difference between the if activity and filter activity in the Azure data factory?
|If Activity|Filter Activity|
|The If activity checks whether a certain condition is met. For example, you want to check whether a file has been created within a folder.|The Filter activity filters an input array based on the condition provided. For example, you have an array of ids and you want to keep only the ids greater than 1000.|
|The If condition evaluates to a boolean value, i.e. either true or false, which decides which set of activities runs next.|The output of the Filter activity is an array. The number of items in this output depends on the input array and the condition.|
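The contrast in the table can be sketched in a few lines of Python. The id values are invented for illustration, and the branch labels `ifTrue` and `ifFalse` are shorthand for the two sets of activities an If activity can run. In ADF, the per-item check would be written as something like `@greater(item(), 1000)`.

```python
ids = [250, 1000, 1001, 4096]  # illustrative values

# Filter-style: the condition is evaluated once per item,
# and the result is an array (possibly empty).
filtered_ids = [i for i in ids if i > 1000]
print(filtered_ids)

# If-style: a single condition is evaluated once,
# and its boolean result selects which branch runs.
has_large_ids = len(filtered_ids) > 0
branch = "ifTrue" if has_large_ids else "ifFalse"
print(has_large_ids, branch)
```

The two are often combined: a Filter activity narrows the array, and an If activity then decides whether there is anything left to process.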
For more details, see the official Microsoft documentation for the Azure Data Factory Filter activity.
Recommendations
Many Azure data engineers find it a little difficult to map real-world scenarios onto Azure Data Factory and face challenges in designing a complete enterprise solution. Hence I would recommend going through these links to get a better understanding of Azure Data Factory.
You can also check out and pin this great YouTube channel for learning Azure for free from industry experts.
Final Thoughts
With this, we have reached the last section of the article. So far, we have learned what the Filter activity in ADF is and how to use it in an Azure Data Factory pipeline. We also touched on real-world scenarios where you would use the Filter activity, for example together with the Get Metadata activity. I also explained the difference between the If activity and the Filter activity. I hope you are now conceptually clear on how to use the Filter activity to solve some of your business use cases.
Please share your comments and suggestions in the comment section below and I will try to answer all your queries as time permits.
Thank you for reading. See you in the next insightful article.
Deepak Goyal is a certified Azure Cloud Solution Architect. He has around a decade and a half of experience in designing, developing and managing enterprise cloud solutions. He is also a Big Data certified professional and a passionate cloud advocate.