Treasure Workflow can be used with version control tools such as Git and with CI/CD systems. A recent enhancement lets you set up a continuous deployment pipeline using Azure Repos and Azure Pipelines in Azure DevOps Services.
Prerequisites
- Azure Repos
- Azure Pipelines
- Treasure Data User Account
Azure Repos repository
If you have not already done so, create a Git repository on Azure Repos for your workflow project. For more information on how to use Azure Repos, see the Azure Repos documentation.
I recommend the following directory structure for your Treasure Workflow repo.
```
my_project
├── README.md
├── config
│   ├── params.test.yml    <- Configuration file for the run-through test; mirrors params.yml except for td.database (example below)
│   └── params.yml         <- Configuration file for production
├── awesome_workflow.dig   <- Main workflow to be executed
├── ingest.dig             <- Data ingestion workflow
├── py_scripts             <- Python scripts directory
│   ├── __init__.py
│   ├── data.py            <- Script to upload data to Treasure Data
│   └── my_script.py       <- Main script to execute, e.g. data enrichment, ML training
├── queries                <- SQL directory
│   └── example.sql
├── run_test.sh            <- Shell script for the local run-through test
├── test.dig               <- Test workflow for the local run-through test
└── azure-pipelines.yml    <- Deploys this repo to Treasure Workflow through Azure Pipelines (created automatically when you create a new pipeline)
```
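To make the two parameter files concrete, here is a hypothetical pair; the database names are placeholders, and the only intended difference is `td.database`, so the run-through test writes to a scratch database instead of production.

```yaml
# config/params.yml (production) -- values are illustrative
td:
  database: my_project_prod

# config/params.test.yml (run-through test) -- identical except for td.database
td:
  database: my_project_test
```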
For more information on custom script development, see the blog post "py> operator development guide for Python users."
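As a rough sketch of how those scripts plug into a workflow, a task in awesome_workflow.dig could invoke my_script.py through the py> operator. The function name and docker image below are illustrative assumptions, not part of this project:

```yaml
# Hypothetical excerpt from awesome_workflow.dig
+enrich:
  # Runs the run() function defined in py_scripts/my_script.py
  py>: py_scripts.my_script.run
  docker:
    image: "digdag/digdag-python:3.9"
```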
Configure Azure Pipelines
To create a new Azure Pipeline for your project:
- Select Azure Repos Git in the Select section.
- Select Python Package in the Configure section.
- Set Variables on the right-hand side:
  - Enter td_apikey in the Name field.
  - Enter your TD API key in the Value field.
  - Select Keep this value secret.
  - Select Save variables.
- Update the Azure Pipelines configuration file for the project. In `variables`, you must update `tdWorkflowEndpoint` for your account region and `tdPrjName` for your workflow project:
```yaml
jobs:
- job: 'td_workflow_deployment'
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    matrix:
      Python311:
        python.version: '3.11'
  variables:
    tdWorkflowEndpoint: api-workflow.treasuredata.com
    # US: api-workflow.treasuredata.com
    # EU01: api-workflow.eu01.treasuredata.com
    # Tokyo: api-workflow.treasuredata.co.jp
    # AP02: api-workflow.ap02.treasuredata.com
    # Ref. https://docs.treasuredata.com/display/public/PD/Sites+and+Endpoints
    tdPrjName: azure_devops_wf # YOUR PROJECT NAME
  steps:
  - script: pip install tdworkflow mypy_extensions
    displayName: 'Install tdworkflow lib'
  - task: PythonScript@0
    inputs:
      scriptSource: inline
      script: |
        import shutil

        import tdworkflow

        endpoint = "$(tdWorkflowEndpoint)"
        apikey = "$(td_apikey)"
        project_name = "$(tdPrjName)"

        # Remove the Git metadata so it is not uploaded with the project
        shutil.rmtree('.git/')
        client = tdworkflow.client.Client(endpoint=endpoint, apikey=apikey)
        project = client.create_project(project_name, ".")
```
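Before relying on CI, you can exercise the same deployment call from your own machine. This is a minimal sketch, assuming tdworkflow is installed locally, your API key is exported as a TD_API_KEY environment variable (a name chosen here for illustration), and you run it from the project root:

```python
# Local sketch of the pipeline's deployment step.
import os

import tdworkflow

client = tdworkflow.client.Client(
    endpoint="api-workflow.treasuredata.com",  # swap in your region's endpoint
    apikey=os.environ["TD_API_KEY"],
)

# Package the current directory and upload it as a workflow project.
project = client.create_project("azure_devops_wf", ".")
print(project)
```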
For more information on Azure Pipelines configuration, see "Azure Pipelines: Create and target an environment."
Configure TD Credentials
For workflows to be pushed to Treasure Data, you must configure a TD Master API Key for the project. See Getting Your API Keys for more information.
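If you use the TD Toolbelt, the key can usually be registered as a workflow secret with a command along the lines of `td workflow secrets --project azure_devops_wf --set td.apikey` (the project name here matches the pipeline example above); the exact invocation can vary, so check the secrets documentation for your region.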
Start Deploying
Azure Pipelines pushes your workflow to Treasure Data every time you push a change to Azure Repos. You can change the deployment conditions to suit your needs, for example deploying only from a specific branch, as shown below. Once deployment completes, Treasure Workflow on Treasure Data is updated automatically.
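For example, to deploy only when changes land on your default branch, you can add a trigger block to the pipeline file. The branch name main is an assumption about your repository:

```yaml
# Hypothetical addition to azure-pipelines.yml: run the pipeline only on pushes to main.
trigger:
  branches:
    include:
      - main
```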