Continuous Deployment of Treasure Workflow with Azure DevOps

Treasure Workflow can be used with version control tools like Git and with CI/CD. You can now set up a continuous deployment pipeline for your workflow projects using Azure Repos and Azure Pipelines in Azure DevOps Services.

Prerequisites

  • Azure Repos
  • Azure Pipelines
  • Treasure Data User Account

Azure Repos repository

If you have not already done so, create a Git repository on Azure Repos for your workflow project. For more information on how to use Azure Repos, see the Azure Repos documentation.

I recommend having the following directory structure in your Treasure Workflow repo.

my_project
├── README.md
├── config
│   ├── params.test.yml     <- Configuration file for run through test. Mirrors params.yml except for `td.database`
│   └── params.yml          <- Configuration file for production
├── awesome_workflow.dig    <- Main workflow to be executed
├── ingest.dig              <- Data ingestion workflow
├── py_scripts              <- Python scripts directory
│   ├── __init__.py
│   ├── data.py             <- Script to upload data to Treasure Data
│   └── my_script.py        <- Main script to execute e.g. Data enrichment, ML training
├── queries                 <- SQL directory
│   └── example.sql
├── run_test.sh             <- Test shell script for local run through test
├── test.dig                <- Test workflow for local run through test
└── azure-pipeline.yml      <- Deploys this repo to Treasure Workflow through Azure Pipelines (created automatically when a new pipeline is created)

Directory Layout
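
The two files under config hold the workflow parameters. As a sketch of what they might contain (the td.database values are assumptions; use whatever your workflow actually references):

# config/params.yml -- production parameters
td:
  database: prod_db

# config/params.test.yml -- mirrors params.yml, but td.database points at a test database
td:
  database: test_db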

For more information on custom script development, see the blog post "py> operator development guide for Python users."
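To make the layout concrete, the main workflow might look roughly like the sketch below. The task names, the py> entry point, and the docker image are assumptions for illustration, not part of this article's project.

# awesome_workflow.dig -- minimal sketch
timezone: UTC

_export:
  !include : 'config/params.yml'       # loads td.database and other parameters

+ingest:
  call>: ingest.dig                    # run the data ingestion workflow first

+enrich:
  py>: py_scripts.my_script.main       # custom Python step (see the py> guide above)
  docker:
    image: "digdag/digdag-python:3.9"  # assumed image; use one supported by Treasure Workflow

+summarize:
  td>: queries/example.sql             # SQL step that runs against the configured database
  database: ${td.database}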

Configure Azure Pipeline

To create a new Azure Pipeline for your project:

  1. Select Azure Repos Git in the Select section.
  2. Select Python Package in the Configure section.


  3. Set Variables on the right-hand side.
  4. Enter td_apikey into the Name field.
  5. Enter your TD API key in the Value field.
  6. Select Keep this value secret.
  7. Select Save variables.


  8. Update the Azure Pipeline configuration file for the project. In the variables section, set tdWorkflowEndpoint for your account region and tdPrjName to your workflow project name:
jobs:
- job: 'td_workflow_deployment'
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    matrix:
      Python311:
        python.version: '3.11'

  variables:
    tdWorkflowEndpoint: api-workflow.treasuredata.com
    # US: api-workflow.treasuredata.com
    # EU01: api-workflow.eu01.treasuredata.com
    # Tokyo: api-workflow.treasuredata.co.jp
    # AP02: api-workflow.ap02.treasuredata.com
    # Ref. https://docs.treasuredata.com/display/public/PD/Sites+and+Endpoints
    tdPrjName: azure_devops_wf # YOUR PROJECT NAME


  steps:
  - script: pip install tdworkflow mypy_extensions
    displayName: 'Install tdworkflow lib'

  - task: PythonScript@0
    inputs:
      scriptSource: inline
      script: |
        import os
        import sys
        import shutil
        import tdworkflow
        endpoint = "$(tdWorkflowEndpoint)"
        apikey = "$(td_apikey)"
        project_name = "$(tdPrjName)"
        shutil.rmtree('.git/')  # Remove the unnecessary .git directory so it is not uploaded with the project
        client = tdworkflow.client.Client(endpoint=endpoint, apikey=apikey)
        project = client.create_project(project_name, ".")
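
Note that the matrix above declares python.version, but the steps never select that interpreter. If you want the agent to actually use it, a UsePythonVersion step could be added before the pip install; a minimal sketch:

  steps:
  - task: UsePythonVersion@0
    displayName: 'Use Python $(python.version)'
    inputs:
      versionSpec: '$(python.version)'  # interpreter version declared in the matrix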

For more information on Azure Pipeline configuration, see "Azure Pipelines: Create and target an environment."

Configure TD Credentials

For workflows to be pushed to Treasure Data, you must configure a TD Master API Key for the project. See Getting Your API Keys for more information.

Start Deploying

Azure Pipelines pushes your workflow to Treasure Data every time you push a change to Azure Repos. You can change the deployment condition to fit your needs, as shown in the sketch below. Once deployment completes, Treasure Workflow on Treasure Data is updated automatically.
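
For example, to deploy only when changes are pushed to a specific branch, a trigger filter could be added at the top of azure-pipeline.yml (a sketch; the branch name is an assumption):

# Deploy only on pushes to main; adjust the branch list to your needs
trigger:
  branches:
    include:
    - main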

