CI/CD pipeline for data movement from Azure Repository to ADLS

Author: Akhil M Anil || DevOps Engineer


Azure Repos is a cloud-based version control system that helps organizations manage and collaborate on their code and other files. It is an essential tool for software development teams to keep track of changes and maintain a history of their codebase.

Azure Data Lake Storage (ADLS), on the other hand, is a highly scalable and secure data lake storage solution that lets organizations store, manage, and analyze large amounts of structured and unstructured data. It is cost-effective at any scale, which makes it a good fit for big data analytics and machine learning workloads.

In this blog, we will move data from Azure Repos to ADLS using a CI/CD pipeline. A build stage packages the data from the repository as an artifact, and a deploy stage uploads that artifact to an ADLS container, so every commit to the chosen branch lands in the data lake automatically.

Steps:

1. Create an ADLS storage account in Azure.
  • Open the Azure portal and select Storage accounts. Create a storage account for the data movement, with hierarchical namespace enabled so that it acts as ADLS Gen2.
  • Open Containers and create a new container, where the data from the Azure repository will be stored.
2. Create a new repository in Azure Repos to hold the data that the CI/CD pipeline will move to ADLS.
  • Open your Azure DevOps project, go to Project settings, select Repositories, and click Create.
  • Enter a name for the repository and click Create.
  • This creates a repository with a main branch. Navigate to Azure Repos and select the repository you just created.
  • Now add some data to the repository. Create a new folder ADLS with a subfolder Data, add some sample data to the subfolder, and commit.

3. Create CI/CD pipeline for ADLS data movement.
  • The pipeline will have two stages:
    • Build Stage: The pipeline creates an artifact that contains the data for the ADLS data movement.
    • Deploy Stage: The pipeline moves the data from the artifact to ADLS.
  • Build Stage:

    • In the build stage, we first check out the repository where our data resides.
    • The data is copied with the CopyFiles task and staged on the agent.
    • Finally, we publish the copied data as an artifact using the PublishBuildArtifacts task.
  • Deploy Stage:

    • In the deploy stage, we first download the artifact that was created in the build stage.
    • We then use the AzureCLI task to upload the artifact to an ADLS container.
    • The az storage blob upload-batch command uploads the files from a local directory to a blob container.
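
The two stages described above can be sketched roughly as follows. This is a minimal sketch, not the exact file from the linked repository: the service connection name, storage account name, container name, and artifact name are placeholders assumed for illustration, and the source folder assumes the ADLS/Data layout from step 2. It also assumes the service connection's identity has a data-plane role such as Storage Blob Data Contributor on the storage account, because the upload authenticates with --auth-mode login.

```yaml
# azure-pipelines.yml (sketch). All names marked "assumed" are placeholders.
pool:
  vmImage: ubuntu-latest

variables:
  storageAccount: mystorageaccount   # assumed storage account name
  containerName: data                # assumed container name
  artifactName: adls-data            # assumed artifact name

stages:
  - stage: Build
    jobs:
      - job: PublishData
        steps:
          # Check out the repository that holds the data.
          - checkout: self

          # Copy the data folder into the agent's staging directory.
          - task: CopyFiles@2
            inputs:
              SourceFolder: $(Build.SourcesDirectory)/ADLS/Data
              Contents: '**'
              TargetFolder: $(Build.ArtifactStagingDirectory)

          # Publish the staged files as a build artifact.
          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: $(Build.ArtifactStagingDirectory)
              ArtifactName: $(artifactName)

  - stage: Deploy
    dependsOn: Build
    jobs:
      - job: UploadToADLS
        steps:
          # The deploy job only needs the artifact, not the sources.
          - checkout: none

          # Download the artifact produced by the Build stage.
          - task: DownloadBuildArtifacts@0
            inputs:
              buildType: current
              downloadType: single
              artifactName: $(artifactName)
              downloadPath: $(System.ArtifactsDirectory)

          # Upload every file in the artifact to the container.
          - task: AzureCLI@2
            inputs:
              azureSubscription: my-arm-service-connection   # assumed name
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                az storage blob upload-batch \
                  --account-name "$(storageAccount)" \
                  --destination "$(containerName)" \
                  --source "$(System.ArtifactsDirectory)/$(artifactName)" \
                  --auth-mode login
```

Splitting the work into two stages keeps the upload independent of the repository checkout: the deploy stage only ever sees the published artifact, so a failed upload can be rerun without rebuilding.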

4. Run the pipeline and verify the data is moved to ADLS.
Full code can be found here: ADLS-CI/CD
  • Create a new pipeline from the azure-pipelines.yml file and make sure you have a service connection for Azure Resource Manager. Replace the service connection name in the file with your own.
  • Set the trigger to your preferred branch so that each new commit to that branch automatically runs the pipeline and moves the files to ADLS.
  • Let's run the pipeline.


  • The pipeline run was successful and the files were moved to ADLS.
  • Now let's check our ADLS container for the files.
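
The branch trigger mentioned in this step could look like the fragment below, placed at the top of azure-pipelines.yml. It assumes the data lives on the main branch created in step 2. The paths filter is an optional addition, not something the original pipeline is known to use, and it assumes the ADLS/Data folder layout from step 2.

```yaml
# Run the pipeline on every commit to main (assumed branch name).
trigger:
  branches:
    include:
      - main
  # Optional: only run when files under the data folder change.
  paths:
    include:
      - ADLS/Data
```

With the path filter in place, commits that touch only other parts of the repository do not start a data upload.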
