Azure Data Factory Tutorial – Create Your First Pipeline – Lesson 3

In this lesson 3 of our Azure Data Factory Tutorial for beginners series I will take you through how to create your first ever pipeline in the ADF. I will also take you through step by step processes creating the various components needed to create the pipeline for example Linked Service, Dataset, integration runtime and triggers. We will also see how you can run your first pipeline in Azure data factory and monitor the output. Besides these I also share my own experience of creating the pipeline in the most efficient way. In case you haven’t gone through my first Lesson 1 of Azure Data Factory tutorial,  I would highly recommend going to lesson 1 to understand the Azure Data factory from scratch because we have covered quick concepts about ADF and how to create your first ADF account. Let’s dive into the tutorial now.

Use Case Scenario

Use case for our first pipeline would be as follows:

There is CSV file available in one folder location in Azure blob storage. You need to copy the this file to different folder in the Azure blob storage using the pipeline in the Azure Data Factory.

Assumptions or Prerequisite

  • You have already create the Azure Blob storage account and a sample CSV file is available inside it. (In case you don’t know how to create the storage account please follow this link : Link)

How to use Copy Data Tool in Azure Data Factory?

In this example we will create our first pipeline using the Copy Data tool. It is basically used to copy the data from source to destination. Copy data tool provides the wizard where it will ask you step by step details about the source and destination and help you to eventually create the pipeline without writing single line of code.

  • Log in to the Azure portal and go to the respective azure data factory studio. Once you reaches to the home page of the adf studio. There you will see the copy data tool icon. Click on it. It will take you to the copy data tool wizard.
Home Page Copy Tool
Figure 1: Home Page Copy Tool

The tool launches a guided multistep process to create a pipeline.

  • On the Properties page, it will ask the task schedule like how many times we want to run the pipeline. You can also schedule the pipeline to run at a specific interval or maybe you can use a tablet window. For now just  select run once only.  click next.

Copy Tool Wizard - Properties
Figure 2: Copy Tool Wizard – Properties
  • In the source tab we have to select the source type, as we are moving the data from blob storage location, Hence select the source type as blob storage and click on new connection to create the linked service.

Image 3

Copy Tool Wizard - Source
Figure 3: Copy Tool Wizard – Source
  • Create linked service wizard open up. Provide the details like linked service name, description. In the integration runtime keep the ‘autoresolveintegrationruntime’ as default. In the storage account name select the source account. Click on test connection. If everything goes fine it will be successful.

Image 4

Create Linked Service
Figure 4: Create Linked Service
  • You will again landed back to copy data tool. You have to just select the path of csv file by browsing the storage account.

 Copy Data Tool Wizard - Source Dataset
Figure 5: Copy Data Tool Wizard – Source Dataset
  • Select the file format in case you want. By default the separator is ‘,’ Hence for now we don’t need to make any changes.

Copy Data Tool Wizard - Dataset configuration
Figure 6: Copy Data Tool Wizard – Dataset configuration

  • In the Destination tab we have to select the destination type. As we are moving the data to blob storage location, Hence select the destination type as blob storage. As our source and destination both are blob storage. Hence no need to create the linked service again. We can choose the same linked service. Click on the browse to select the correct destination folder.
  • Figure 7: Copy Data Tool Wizard – Destination Dataset

    • Select the file format in case you want. By default the separator is ‘,’ Hence for now we don’t need to make any changes.
    • Now you reach to last tab i.e. the setting tab. You have been asked to provide the task name. It is the name of your pipeline. You can choose anything as per your wish. I am giving the name ‘Azurelib_Pipeline_1’. You can provide the description as well for the pipeline.

    image 8

    Copy Data Tool Wizard - Setting
    Figure 8: Copy Data Tool Wizard – Setting
    • In the review tab, you can review all the settings which we have done so far. You have the option to make any changes if you want.

    Figure 9: Copy Data Tool Wizard – Review and Publish

    Finally it will deploy the pipeline and execute the pipeline. You can go to the destination folder and check if the file has been arrived. You can also go to the monitor tab to monitor the execution.

    Microsoft Official Documentation for Azure data factory link

    Final Thoughts

    By this we have reached the last section of our Lesson 3 of Azure data factory tutorial for beginners. In this lesson we have seen how to create our first pipeline from using the copy data tool. I have also shown you how you can create the linked service and dataset.

     In the next lesson we will go deeper into the Azure Data factory and learn new concepts with some exciting practicals.

    Please share your feedback and your comments. In case you have any questions or query please drop them in the comment box below and I will try to answer them as early as possible.

    DeepakGoyal

    Deepak Goyal is certified Azure Cloud Solution Architect. He is having around decade and half experience in designing, developing and managing enterprise cloud solutions. He is also Big data certified professional and passionate cloud advocate.