What are Streaming and Stream Processing?

William Mclane
4 min read · Jan 19, 2021

Can we have a candid conversation about the technology industry? Most of us live and breathe technology every day. From the integration of technology into our everyday lives to the career paths we take, technology has become a foundation for everything we do.

But can we acknowledge something that we all know to be true yet seem unwilling to accept? In the technology industry we are constantly finding new ways to do the same thing and calling it something different. From a marketing perspective this is great, but from an architecture perspective it can be problematic.

Take, for example, the trend over the last few years toward streaming and stream processing. What is streaming? How is it different from stream processing? What problems do streaming and stream processing actually solve? There are many questions here, but the biggest one is this: are streaming and stream processing just a new way to do the same thing under a different name?

Adoption of open source software has increased awareness of technology use cases. Streaming is only one example. Solutions like Apache Kafka (“A distributed streaming platform”) and Apache Pulsar (“a cloud-native, distributed messaging and streaming platform”) have been built with the primary purpose of streaming in mind.

But what the heck is streaming?

Merriam-Webster defines streaming as 1: “the act, the process, or an instance of streaming data or of accessing data that is being streamed” and 2: “an act or instance of flowing”.

Well, that really cleared things up, didn’t it? In its simplest form, streaming is the process of providing a flow of data. Take one of the first things most people think of when they hear “streaming”: Netflix. Netflix revolutionized how we consume content by building a platform that gives us access to a wealth of content on demand. But how is Netflix different from traditional television? Sure, they offer access to content on demand, but so does traditional television. Sure, they transmit their data over a different medium, TCP/IP versus coax or satellite. Sure, they have a different interface for accessing their data compared to traditional television, but is Netflix really doing anything different from traditional television solutions like cable or satellite?

I am not talking about the business model of Netflix, because the business model has arguably changed the game with regard to access to content. Netflix, like traditional television, has a stream of data that people want to access, so it “streams” that data to users for consumption, which in reality is not any different from what traditional television provides.

The Netflix example is a simpler visualization of what I see happening around the concepts of “streaming” and “stream processing”. Apache Kafka hit the open source market as one of the leading solutions for streaming, but streaming data for consumption has been around for years. Financial services companies have been distributing and streaming market data since the early 1980s. Messaging solution providers like IBM, TIBCO, RedHat, Microsoft, and Amazon have offered data distribution and communication between applications for years. So how are solutions like Apache Kafka and Apache Pulsar any different?

The reality is that they are not: both Kafka and Pulsar are built on a foundation of distributing data between applications. Sure, they are purpose-built to solve specific challenges in data distribution. Sure, they have added features that enable unique ways to distribute and ultimately process data. All of that is true, but at their foundation they provide the same patterns for streaming that have been around for years.

Solutions like Kafka and Pulsar all provide the ability to take data in as input, process it, and redistribute it as output. Kafka lets applications consume and process data outside of the distribution platform, with features built into the platform that are optimized for data processing, whereas Pulsar allows data to be processed as part of the distribution platform itself.
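To make that contrast concrete, here is a minimal sketch of the same “consume, process, redistribute” pattern in both styles. The topic names, application id, and broker address are placeholders of my own choosing, and the examples assume the standard Kafka Streams and Pulsar Functions APIs rather than anything specific to a particular deployment.

```java
// Kafka-style: the processing logic runs in a separate client application,
// outside the brokers, using the Kafka Streams library.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");     // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");        // data in
        input.mapValues(value -> value.toUpperCase()).to("output-topic");     // process and redistribute

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

```java
// Pulsar-style: the same transformation expressed as a Pulsar Function,
// which the platform itself runs between an input topic and an output topic.
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class UppercaseFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        return input.toUpperCase();
    }
}
```

In both sketches the pattern is identical, data in, a transformation, data out; the main difference is where the processing logic lives, in a client application next to the platform or inside the platform itself.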

The key here is that the concept of streaming isn’t new, and neither is the concept of stream processing. We have new ways to send and receive data, and new approaches to processing that data, but streaming data has been around for a long time.

What has changed is the number of rivers that have to feed into our flows. Early on, the number of rivers was relatively small and the flows were more like creeks than rivers. Today we have mighty rivers of information feeding into massive river deltas that ultimately feed the oceans of information we need to access.

That is where new technologies like Kafka, Pulsar, and many others can provide significant benefit: additional entry points and processing power for servicing the massive amount of data flowing through our Enterprise Event Stream.


William Mclane

Messaging Evangelist with a background in Computer Science and Communications, working on messaging and streaming communication for over 20 years.