#TDTechTalk : 5 challenges in CDP

During the November 2022 TDTechTalk Meetup, five developers from various engineering teams presented challenges and solutions in their areas.

This post is a summary of the meetup. We are very happy to see the return of in-person and hybrid meetups and conferences here in Japan!

I usually work from home in the mountainous countryside in Kiso, Japan. This time, I took a three-hour train ride to Tokyo to attend the meetup there as one of the speakers.

The cover picture shows the catered meal I ate after my presentation. Offline events come with this kind of fun!


1) Embulk in TD, and in the future

embulk in td

(Follow the image link above to see the original presentation slides, in Japanese.)

Dai talked about Embulk, an open-source bulk data loader. Embulk was originally released as open source by Treasure Data, and many companies have since adopted it in their businesses.

Embulk is also the key component of Treasure Data's "Data Connector", which imports customer data into Treasure Data from a variety of data sources, such as AWS, several types of relational databases, file servers, and cloud services like Shopify, Zendesk, and more.

Dai is one of the core contributors to Embulk. Starting from how he addressed technical debt in Embulk and in Data Connector, he talked about his ambivalence about being a volunteer OSS maintainer and, at the same time, an employee of the for-profit company behind that OSS.

2) Journey to Improve Stability and Scalability of Plazma


Keisuke, from the Storage team, works on Plazma, the petabyte-scale storage system at Treasure Data. When data is ingested into Treasure Data by the Data Connector (e.g. Embulk), it is stored in Plazma.

Keisuke's story was the scariest at this meetup, because it was about a potential system meltdown that would have happened if no countermeasures had been implemented within 72 hours!

The retrospective in his presentation was very insightful.

3) Hive Distributed Profiling System in Treasure Data


Okumin is from the Query Engine team, which is responsible for running SQL queries over Plazma in a fast and cost-efficient way. At Treasure Data, customers run a wide variety of queries in large volumes, and some of those queries turn out to be slow.

To tackle slow queries running inside Hive, one of the query engines used at TD, Okumin first collected a huge number of JVM stack traces into Plazma, our own storage system. Then he created a profiler to analyze this massive amount of traces. He identified several bottlenecks visually, and contributed to Hive by fixing those issues.
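To make the idea a bit more concrete, here is a minimal sketch of how sampled stack traces can be aggregated to find hot spots. It is not Okumin's actual profiler: the frame names, the `SAMPLES` data, and the helper functions are all hypothetical, and I am assuming each sample is simply a list of frames ordered from root to leaf.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fold_stacks(samples: Iterable[List[str]]) -> Counter:
    """Count identical call stacks; frames are ordered root first."""
    folded = Counter()
    for frames in samples:
        if frames:  # skip empty samples
            folded[";".join(frames)] += 1
    return folded


def hottest_frames(samples: Iterable[List[str]], top: int = 3) -> List[Tuple[str, int]]:
    """Rank leaf frames (where the thread was executing) by sample count."""
    leaves = Counter(frames[-1] for frames in samples if frames)
    return leaves.most_common(top)


# Hypothetical samples; real traces would be read from storage.
SAMPLES = [
    ["Thread.run", "Operator.process", "Deserializer.read"],
    ["Thread.run", "Operator.process", "Deserializer.read"],
    ["Thread.run", "Operator.process", "Hasher.hash"],
    ["Thread.run", "Writer.flush"],
]

if __name__ == "__main__":
    # One line per distinct stack, followed by its sample count.
    for stack, count in fold_stacks(SAMPLES).most_common():
        print(f"{stack} {count}")
    print(hottest_frames(SAMPLES))
```

The "stack;stack;stack count" lines are the kind of input that flame graph tools commonly accept, which is one way such bottlenecks could be spotted visually.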

He praised Treasure Data as an exciting place for query engineers, since incredibly large volumes of real-world queries bring interesting challenges.

4) Treasure Data CDP in 30 Minutes: Magical Technology Expo by Applications Team

(In Japanese)

If you have known Treasure Data since its early days, you may still think of it as "kind of a cloud-based data warehouse (DWH) or data lake". Treasure Data does offer a platform for that purpose, but it offers much more than that nowadays.

Aoki, from the Applications team, introduced Treasure Data CDP (Customer Data Platform), which is built on top of Plazma, the Query Engine, and many other components. CDP is a flagship product in Treasure Data's portfolio today. It gives valuable insights about customers to many roles in an enterprise: marketing, sales, customer service, operations, digital engineering, and finance.

Aoki also demonstrated some "magical" technologies (ab)used in the early days of CDP. He and his team are continuously modernizing the implementation to make our CDP more usable and robust.

Aoki's talk refreshed the attendees' impressions of Treasure Data products in a very amusing fashion.

5) Empowering App Dev by Nicely-crafted High-Level AWS Components


I, Taz, from the Operational DB team, introduced an in-house application development platform that developers across Treasure Data and affiliated companies can use to build custom applications for customers.

This story showed a different area of Treasure Data compared to the other four stories, since the in-house app development platform is quite new and is still being incubated internally.



We hope you found these five presentations interesting and insightful. If you missed the meetup, check out the video recording on YouTube (in Japanese only). We hope to see you at other meetups and conferences in 2023!


Many thanks to Wovn Technologies for providing the venue for the meetup. At Treasure Data, we are empowered by Wovn's internationalization solution.


The photos below capture what I enjoyed during this "business" trip.
