On Data Migration. Q&A with Eric Hanson

Q1. What are the typical challenges when performing data migration? 

Performing data migration requires you to decide what to migrate, build tables in the target system with the right schema, do the initial movement of the data, set up change data capture (CDC), and monitor that CDC. Doing this with your own scripts or apps can take person-weeks of effort even for relatively small projects. We’re slashing the work associated with these challenges to a tiny fraction of what it has traditionally been.
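To make that scope concrete, here is a minimal sketch of what hand-rolling just one table’s migration can look like. It assumes DB-API 2.0 connections `src` and `dst` (for example, from pymysql or a SingleStore driver); the table, columns, and batch size are hypothetical, not part of any real product.

```python
# Sketch of a hand-rolled single-table migration. `src` and `dst` are
# assumed DB-API 2.0 connections; table and DDL are made up.

def migrate_orders(src, dst):
    # 1. Build the target table with the right schema (hand-translated DDL).
    dst.cursor().execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id BIGINT PRIMARY KEY,
            customer_id BIGINT,
            total DECIMAL(12, 2),
            updated_at DATETIME
        )
    """)

    # 2. Initial movement of the data, batched to bound memory use.
    rd, wr = src.cursor(), dst.cursor()
    rd.execute("SELECT id, customer_id, total, updated_at FROM orders")
    while True:
        rows = rd.fetchmany(10_000)
        if not rows:
            break
        wr.executemany("INSERT INTO orders VALUES (%s, %s, %s, %s)", rows)
        dst.commit()

    # 3. Setting up and monitoring CDC (log reading, ordering, retries,
    #    schema drift) is source-specific and is where the person-weeks
    #    usually go; it is deliberately elided here.
```

Multiply that by every table, every source dialect, and every failure mode, and the person-weeks add up quickly.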

Q2. To what extent can so-called “change data capture (CDC)” software pipeline design patterns help?

CDC software pipeline design patterns, and no-code tools like Flow, really help because they reduce the labor needed to set up, monitor, and maintain CDC transfers. That can enable you to build apps, say, to migrate to a new modern platform like SingleStore, or to enrich a legacy app with real-time analytics, in a fraction of the time. Or to do things you couldn’t do before because you couldn’t afford the developer time.

Also, one thing many people might miss is that you need CDC for migrations themselves, not just for ongoing CDC augmentations. That’s because it can take a few days to move a large database, and while you’re moving it, the source system typically has to remain live. So you need CDC to capture the changes made between when you start the migration and when you finish it.
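The pattern is easy to see in outline. A minimal sketch follows, assuming a source that exposes a monotonic log position (such as MySQL’s binlog offset or SQL Server’s LSN); all four helper functions are placeholders for illustration, not a real library’s API.

```python
# Why migration needs CDC: record the source's log position first,
# bulk-copy the snapshot (which may take days while the source stays
# live), then replay everything written since that position.
# current_log_position, bulk_copy_all_tables, read_changes_since, and
# apply_to_target are hypothetical helpers.

def migrate_with_cdc(src, dst):
    start_pos = current_log_position(src)  # e.g. binlog offset / LSN / SCN

    bulk_copy_all_tables(src, dst)         # long-running initial load

    # Catch up on writes made during the copy, then keep tailing the
    # log until you are ready to cut applications over to the target.
    for change in read_changes_since(src, start_pos):
        apply_to_target(dst, change)
```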

Q3. What are the benefits of setting up CDC pipelines?

Once you have a CDC pipeline set up, it keeps your target SingleStore database updated with fresh data. That enables you to scale your read workload against the target SingleStore database, running analytics up to 100x faster than the source system typically can. It’s an enabler for real-time analytics. And it ultimately enables you to move from “brown” to “green” technology, that is, from a legacy DBMS to the benefits of a modern one.

Q4. You just announced SingleStore Flow. What is it? And what is it useful for?

Flow is a no-code tool for data migrations and ongoing CDC from database sources, including SQL Server, Oracle, MySQL, and PostgreSQL. You can install it in minutes, then connect to your source, connect to your target, pick your tables, and go. Flow has two main components: Ingest and XL Ingest. Ingest handles schema transfer, small-table transfer, and ongoing CDC; XL Ingest moves large tables in manageably sized chunks. It’s easy and reliable: if the source, the target, or Flow itself restarts, it automatically reconnects and picks up where it left off. The labor savings are a huge benefit. You can move a database with dozens of tables in half an hour; bigger data sets take longer, but setup is just as easy.
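Flow’s internals aren’t spelled out in this interview, but the “picks up where it left off” behavior is a well-known pattern. As a rough illustration only, a chunked copy can persist a checkpoint in the target so a restart resumes at the last completed chunk; the table names, 100k chunk size, and MySQL-style REPLACE syntax here are assumptions for the sketch.

```python
# Restart-safe chunked transfer (an illustration, not Flow's code):
# persist the last completed chunk boundary so a restart resumes
# from the checkpoint instead of starting the table over.

CHUNK = 100_000  # arbitrary chunk size for the example

def copy_table_resumably(src, dst, table, key="id"):
    cur = dst.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS checkpoints "
                "(tbl VARCHAR(128) PRIMARY KEY, last_key BIGINT)")
    cur.execute("SELECT last_key FROM checkpoints WHERE tbl = %s", (table,))
    row = cur.fetchone()
    last_key = row[0] if row else 0        # resume point, 0 on first run

    rd = src.cursor()
    while True:
        rd.execute(f"SELECT * FROM {table} WHERE {key} > %s "
                   f"ORDER BY {key} LIMIT {CHUNK}", (last_key,))
        rows = rd.fetchall()
        if not rows:
            break
        placeholders = ", ".join(["%s"] * len(rows[0]))
        cur.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        last_key = rows[-1][0]             # assumes key is the first column
        cur.execute("REPLACE INTO checkpoints VALUES (%s, %s)",
                    (table, last_key))
        dst.commit()                       # chunk + checkpoint land together
```

Committing the chunk and its checkpoint in one transaction is what makes the resume safe: a crash can repeat at most the chunk that was in flight.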

Q5. Does this solution require coding? 

Nope. It’s no-code.

Q6. If SingleStore Flow is indeed a no-code solution, how do you check that the result of this “migration” satisfies your initial requirements?

You can easily connect to the target, browse the created schema, and query the data to see whether you’re getting what you want. And Flow has graphical dashboards and logs that show you what it’s doing and what it has done over the last couple of days.
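For readers who want a concrete starting point, a simple spot check might compare row counts and a coarse aggregate per table between source and target. The table list and the numeric `id` column are hypothetical, and this is a complement to, not a substitute for, Flow’s own dashboards.

```python
# Post-transfer sanity check (names hypothetical): compare row counts
# and a coarse aggregate per table between source and target.

TABLES = ["orders", "customers", "line_items"]  # example table list

def spot_check(src, dst):
    s, d = src.cursor(), dst.cursor()
    for table in TABLES:
        query = f"SELECT COUNT(*), COALESCE(SUM(id), 0) FROM {table}"
        s.execute(query)
        d.execute(query)
        src_stats, dst_stats = s.fetchone(), d.fetchone()
        status = "OK" if src_stats == dst_stats else "MISMATCH"
        print(f"{table}: source={src_stats} target={dst_stats} [{status}]")
```

Note that while CDC is still flowing from a live source, the two sides can legitimately differ by the changes in flight, so a small, shrinking gap is normal.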

Q7. Could you tell us how you are able to achieve accelerated migrations and efficient CDC pipelines?

We have a no-code, simplicity-first mindset. Flow is fast, but first and foremost it saves your personal time and the time of your devs and devops people. Flow gets its speed and resilience for large-table transfer from its parallel, partitioned approach: XL Ingest brings multiple chunks over on multiple threads. Interestingly, the limit on data transfer speed tends to be the speed of the source, not the target. SingleStore can soak up the data Flow feeds it faster than you can extract it from legacy sources.
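The parallel, partitioned idea itself is straightforward to sketch. The following is an illustration of the general technique, not XL Ingest’s actual implementation; the key-range partitioning scheme, worker count, and range width are all assumptions.

```python
# Parallel, partitioned table copy (illustrative): split a table into
# key ranges and copy the ranges on a pool of worker threads.
# Connection factories are passed in because DB-API connections are
# generally not safe to share across threads.

from concurrent.futures import ThreadPoolExecutor

def copy_range(make_src, make_dst, table, lo, hi):
    src, dst = make_src(), make_dst()      # one connection pair per task
    rd, wr = src.cursor(), dst.cursor()
    rd.execute(f"SELECT * FROM {table} WHERE id >= %s AND id < %s", (lo, hi))
    while True:
        rows = rd.fetchmany(10_000)
        if not rows:
            break
        placeholders = ", ".join(["%s"] * len(rows[0]))
        wr.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        dst.commit()

def parallel_copy(make_src, make_dst, table, max_id,
                  workers=8, span=1_000_000):
    # Partition [0, max_id] into fixed-width key ranges and fan them out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_range, make_src, make_dst,
                               table, lo, lo + span)
                   for lo in range(0, max_id + 1, span)]
        for f in futures:
            f.result()                     # surface any worker errors
```

With enough parallel readers, the bottleneck shifts to how fast the source can serve the range scans, which matches the observation above that the source, not the target, tends to set the ceiling.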

Q8. Do you have real-world use cases showcasing SingleStore Flow’s capabilities? Which ones? 

Absolutely. We’re really proud of our work with Origin Energy. Their infrastructure couldn’t handle rising analytical demand. They had SAP data silos hosted on Oracle databases that urgently needed to be integrated with other enterprise data; the silos left them with multiple versions of the truth. Oracle GoldenGate and Qlik Replicate couldn’t handle the volume, at least not economically. With Flow, they were able to create a cloud data lake and data warehouse from this data with ease.

Qx. Anything else you wish to add?

Our future plans for Flow are exciting. You’ll see us add more sources, including SingleStore itself and a universal JDBC source; offer Flow as a cloud service; and make Flow not just a no-code platform but an effortless data transfer system. It’ll combine a no-code UI/UX with AI agent technology for setup, monitoring, troubleshooting, and automated operations. You won’t believe how easy it can be to get data where you need it, so you can get the most out of it for your business.

……………………………………

Eric Hanson is a Director of Product Management at SingleStore, responsible for the query processing, storage, and extensibility feature areas. He joined the SingleStore product management team in 2016.

Resources

For more information about SingleStore Flow and to explore its capabilities, please review this recent webinar.

Sponsored by SingleStore.
