CI/CD pipeline for data movement from Azure Repository to ADLS
Author: Akhil M Anil || DevOps Engineer
Azure Repos is a cloud-based version control system that helps organizations manage and collaborate on their code and other files. It is an essential tool for software development teams to keep track of changes and maintain a history of their codebase.
Azure Data Lake Storage (ADLS), on the other hand, is a highly scalable and secure data lake storage solution that enables organizations to store, manage, and analyze large amounts of structured and unstructured data. It is cost-effective at any scale, which makes it a good fit for big data analytics and machine learning workloads.
In this blog, we will discuss how to move data from Azure Repos to ADLS. We will explore the different methods and tools available for data movement and the benefits of using ADLS as a data storage solution. We will also discuss the best practices for data migration and how to ensure data integrity during the process.
Steps:
- Open your Azure DevOps project, go to Project settings, select Repositories, and click Create.
- Enter a name for the repository and click Create.
- This creates a repository with a main branch. Navigate to Azure Repos and select the repository we just created.
- Now we add some data to the repository. Create a new folder named ADLS with a subfolder named Data, add some sample data to the subfolder, and commit.
- The pipeline will have two stages:
  - Build Stage: The pipeline creates an artifact that contains the data to be moved to ADLS.
  - Deploy Stage: The pipeline moves the data from the artifact to ADLS.
- Build Stage:
  - In the build stage, we first check out the repository where our data resides.
  - The data is copied to a staging directory on the agent using the CopyFiles task.
  - Finally, we publish the copied data as an artifact using the PublishBuildArtifacts task.
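The build stage described above can be sketched in azure-pipelines.yml roughly as follows. This is a minimal sketch, not the exact file used for this post: the artifact name adls-data, the job name, and the ubuntu-latest agent image are assumptions chosen for illustration, and the source path assumes the ADLS/Data folder created earlier.

```yaml
# Build stage sketch: check out the repo, stage the data, publish it as an artifact.
stages:
  - stage: Build
    jobs:
      - job: PublishData            # hypothetical job name
        pool:
          vmImage: ubuntu-latest    # assumption: Microsoft-hosted Linux agent
        steps:
          # Check out the repository that holds the data
          - checkout: self

          # Copy everything under ADLS/Data into the agent's staging directory
          - task: CopyFiles@2
            inputs:
              SourceFolder: '$(Build.SourcesDirectory)/ADLS/Data'
              Contents: '**'
              TargetFolder: '$(Build.ArtifactStagingDirectory)'

          # Publish the staged files as a build artifact
          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: '$(Build.ArtifactStagingDirectory)'
              ArtifactName: 'adls-data'   # hypothetical artifact name
              publishLocation: 'Container'
```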
- Deploy Stage:
  - In the deploy stage, we first download the artifact created in the build stage.
  - We then use the AzureCLI task to upload the artifact to an ADLS container.
  - We use the az storage blob upload-batch command, which uploads files from a local directory to a blob container.
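A deploy stage along these lines might look like the sketch below. Several names here are assumptions rather than values from the original pipeline: the artifact name adls-data, the pipeline variables storageAccountName and containerName, and the service connection placeholder. The --auth-mode login flag additionally assumes the service connection's identity holds a data-plane role such as Storage Blob Data Contributor on the storage account; drop it if you authenticate with an account key instead.

```yaml
# Deploy stage sketch: this list item goes under the top-level `stages:` key,
# after the build stage that publishes the artifact.
  - stage: Deploy
    dependsOn: Build
    jobs:
      - job: UploadToADLS           # hypothetical job name
        pool:
          vmImage: ubuntu-latest    # assumption: Microsoft-hosted Linux agent
        steps:
          # The repository itself is not needed here, only the artifact
          - checkout: none

          # Download the artifact published by the build stage
          - task: DownloadBuildArtifacts@1
            inputs:
              buildType: 'current'
              downloadType: 'single'
              artifactName: 'adls-data'   # hypothetical artifact name
              downloadPath: '$(System.ArtifactsDirectory)'

          # Upload the downloaded files to the ADLS container
          - task: AzureCLI@2
            inputs:
              azureSubscription: '<service-connection-name>'  # replace with your own
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                # storageAccountName and containerName are hypothetical pipeline variables
                az storage blob upload-batch \
                  --account-name "$(storageAccountName)" \
                  --destination "$(containerName)" \
                  --source "$(System.ArtifactsDirectory)/adls-data" \
                  --auth-mode login
```

The single-artifact download places the files in a subfolder named after the artifact, which is why the --source path ends in adls-data.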
- Create a new pipeline from the azure-pipelines.yml file. Make sure you have a service connection of type Azure Resource Manager, replace the service connection name in the file with your own, and run the pipeline.
- Set the trigger to your preferred branch so that each new commit to that branch automatically triggers the pipeline and moves the files to ADLS.
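As an illustration, a branch trigger at the top of azure-pipelines.yml could look like this. The branch name main matches the default branch created earlier; the paths filter is an optional addition, not something the original pipeline is known to use, and it limits runs to commits that touch the data folder.

```yaml
# Trigger sketch: run the pipeline on every commit to main.
trigger:
  branches:
    include:
      - main
  # Optional (assumption): only run when files under ADLS/Data change
  paths:
    include:
      - ADLS/Data
```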
- Let's run the pipeline.
- The pipeline run was successful and the files were moved to ADLS.
- Now let's check our ADLS container for the files.
LinkedIn: Akhil M Anil | LinkedIn