Few individuals have had as much impact on the market for real-time data streaming as Karthik Ramasamy, who is the co-creator of Apache Heron, a longtime contributor to the Apache Pulsar ecosystem, and the Head of Streaming at Databricks. That’s why we chose him as a Person to Watch for 2023.
Here is a recent conversation we had with Ramasamy:
Datanami: Every year, real-time data processing is predicted to go mainstream, but so far it hasn’t broken out of its niche status. Will 2023 be different, and if so, why?
Karthik Ramasamy: At Databricks, we predict 2023 is going to be yet another great year for real-time data processing. Streaming workloads on our platform have been growing at 140-150% YoY (as presented at Data + AI Summit 2022), and we are running more than 7 million of them. The launch of Delta Live Tables (DLT) makes streaming extremely simple, using a declarative language like SQL and automated operations. It is definitely going mainstream.
Datanami: What will be the biggest impediments to success with stream data processing in 2023? What are the biggest technical or business hurdles?
Ramasamy: One of the biggest challenges will be around new APIs and languages to learn. It’s difficult to enable existing data teams when they’re so familiar with the languages and tools they already know. Another challenge is the need to build the complex operational tooling required to deploy and maintain streaming data pipelines that run reliably in customers’ production environments. Finally, real-time and historical data often live in separate systems, and incompatible governance models can limit the ability to control access for the right users and groups.
Datanami: Databricks wants to be the one-stop shop for data analytics, machine learning, and stream processing. Why will it succeed?
Ramasamy: The lakehouse architecture is key to success because all the data is stored in a common format. Databricks provides tightly integrated solutions for different types of data processing with a well-known compute engine that is based on open source Apache Spark. In the context of data streaming, Databricks’ Lakehouse offers a single platform for streaming and batch data so data teams can eliminate silos and centralize their security and governance models.
Databricks enables data engineers, data scientists, and analysts to easily build streaming data workloads with the languages, tools, and APIs they already use. We simplify development and operations with out-of-the-box capabilities that automate many of the production aspects of building and maintaining real-time data pipelines.
Datanami: Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
Ramasamy: My favorite hobby is photography. I took a class while in grad school about how to compose what goes in a photo and how to get the correct settings. I mainly photograph natural scenery. I started with a Nikon SLR film camera, graduated to using slides, and then moved to a digital SLR. Now phone cameras are so advanced that I just carry my iPhone.
You can read the rest of our interviews with the 2023 Datanami People to Watch here.