11 Unbeaten Capabilities of WSO2 Stream Processor

I hope all of you who are reading through this post well aware about WSO2 Stream Processor. If not, please refer one of my previous post and official site for more information. In this blog post, I will be discussing 11 most impressive and prominent capabilities of WSO2 Stream Processor and its uniqueness compared to other Stream Processor systems in the market.

WSO2 Stream Processor is a lightweight, open source, high performance, stream processing platform that understands streaming SQL queries in order to capture, analyze, process and act in real time. WSO2 Stream Processor is powered by Siddhi CEP engine which is one of the fastest and feature rich CEP engines in the market.

Support for Lambda Architecture

High Level View of Lambda Architecture by Dr. Srinath Perera

Lambda architecture is one of the common practices used in the big data world. It is an architecture style introduced to provide understanding and to solve concerns involved with the use cases that dealt with both stream processing and batch processing.

WSO2 Stream Processor highly aligned to the lambda architecture with its capabilities provided by WSO2 Siddhi CEP engine. Event tables and query API features in Siddhi allow users to achieve the lambda needs easily.

Event Tables / Data Store

Event tables allow Siddhi to work with stored data. Events which arrives in Siddhi engine can join with the stored data to make useful decisions. By defining a schema for tables Siddhi enables them to be processed by queries using their defined attributes with the streaming data. You can also interactively query the state of the stored events in the table.

Sample Query

In the below Siddhi streaming query, we are updating the entries which are in the table when a specific condition met based on the incoming events.

define table RoomOccupancyTable (roomNo int, people int);
define stream UpdateStream (roomNumber int, arrival int, exit int);

from UpdateStream
select *
update RoomOccupancyTable
set RoomOccupancyTable.people = RoomOccupancyTable.people +
arrival - exit
on RoomOccupancyTable.roomNo == roomNumber;

As similar to above, we could also join with an event table (which mirrors an external data source such as RDBMS) for the incoming events and make decisions.

define table RoomTypeTable (roomNo int, type string);
define stream TempStream (deviceID long, roomNo int, temp double);
from TempStream join RoomTypeTable
on RoomTypeTable.roomNo == TempStream.roomNo
select deviceID, RoomTypeTable.type as roomType, type, temp
having roomType == 'server-room'
insert into ServerRoomTempStream;

Please refer below references for more details on this.

Query API for RDBMS

This function performs SQL retrieval queries on WSO2 data sources which are backed by RDBMS; then, it allows you to query underneath persistence store for the incoming events which arrive in real-time, get the results and performs necessary joins/decision accordingly.

Sample Query

from TriggerStream#rdbms:query('SAMPLE_DB',
'select * from Transactions_Table', 'creditcardno string,
country string, transaction string, amount int')
select creditcardno, country, transaction, amount
insert into recordStream;

Other than above, you can also perform insertion, deletion and update through CUD processor support provided by Siddhi.

Support for Incremental Aggregation

Data summarization is one of the common features supported by a stream processing engine. It is divided into two segments; they are summarization over the short time and summarization over a long time.

Summarization over short time happens using the constructs such as windows and aggregators. You can refer below links for better understanding on that.

Incremental Aggregation Approach in WSO2 Stream Processor

But, the interesting fact about WSO2 Siddhi is the summarization over a long time duration. It has a very unique and powerful feature incremental aggregation to handle summarization over a long time duration. Incremental aggregation is achieved in WSO2 Stream Processor due to its deep alignment with the Lambda architecture. Incremental aggregation allows you to obtain aggregates in an incremental manner for a specified set of time periods. This not only allows you to calculate aggregations with varied time granularity, but also allows you to access them in an interactive manner for reports, dashboards, and for further processing. Its schema is defined via the aggregation definition.

Support for On-demand Stream Processing

Usually, Stream Processors are processing events in an asynchronous manner. It receives events from various event sources, processes those events based on the predefined queries which are configured and then alert/notify the output events to the respective parties. In this whole process, event source simply push events to the Stream Processor or Stream Processor pull events from event source and they forgot about the event source and continue to process them; simply there is no any request-response or synchronous behavior.

But, in the current context, there are use cases that require stream processing with synchronous nature. For example, adaptive authentication is one such use case which stream processing is required in a synchronous manner.

In WSO2 Stream Processor. there are two flavors of on-demand stream processing. They are;

  • Store Query API : It allows users to query the data which are in event tables or aggregation by passing simple SQL like query.
  • Synchronous HTTP Request-Response : A user can make a request, then the request is converted into an event and processed in synchronous manner and respond back to the user with the result. By this approach, a user can perform a simple filter or even a complex join and etc...

Please refer below blog post authored by Suhothayan on this. It will provide more details on this.

Non Occurrence Pattern Detection

Pattern detection is a most common feature supported by a Complex Event Processing engines. In most of these CEP engines, pattern detection is performed only based on the events that are arrives to the CEP engine; which means patterns detection with the absent data (or events not arrived) is cumbersome; but this is one of the common requirement in pattern detection use cases.

WSO2 Siddhi engine which powers the WSO2 Stream Processor provides extensive support to achieve this. It allows you to write pattern queries to identify a non-occurrence pattern for various use cases. Please refer to documentation link for more details. Above feature contributed to Siddhi by the Google Summer of Code 2017 participant Gobinath. Please refer his blog post on this.

Sample Query

Generate an alert if the temperature does not reduce to 12 degrees within 5 minutes of switching on the regulator.

define stream RegulatorStateChangeStream
(deviceID long, roomNo int, tempSet double, action string);
define stream TempStream (deviceID long, roomNo int, temp double);from e1=RegulatorStateChangeStream[action == 'start']
-> not TempStream[e1.roomNo == roomNo and temp < 12] for '5 min'
select e1.roomNo as roomNo
insert into AlertStream;

Support for Online Machine Learning

https://www.flickr.com/photos/cwkarl/15433742780/

WSO2 Stream Processor comes with inbuilt support for predictive analytics through machine learning. Usually, in machine learning, historical data is collected to build a predictive model in order to predict the future. However, models can be outdated with time if it is not updated with new data that arrives. WSO2 SP addresses this issue by supporting online machine learning algorithms that enable you to evolve your models while they are alive. A key characteristic of an online predictive algorithm is that the memory and time requirements of the predictive analytics process do not grow over time.

WSO2 Stream Processor provide online machine learning (streaming machine learning) support through classification, clustering, and regression. Other than that, online machine learning is also supported through Markov models for anomaly detection.

Other than above mentioned online machine learning approaches, it also supports conventional machine learning approach which uses pre-built and trained ML models. They are,

Support for Change Data Capture

CDC Approach with WSO2 Stream Processor

Change data capture (CDC) become one of the important need in the ETL space. CDC means is a process of capturing the changes that occur in data sources such as RDBMS databases or logs and derive/extract useful relevant data. WO2 Stream Processor provides CDC support from the recent past through a Siddhi extension.

As of now, respective Siddhi extension only tested with few of the RDBMS types (such as Mysql and Oracle). Using the CDC feature, we could capture the changes that occur in the data sources, send those events through a stream, perform event processing with the pre-defined streaming queries and make a decision which helps business operations and etc…

Intelligent Development Studio

View of WSO2 Stream Processor Development Studio

WSO2 Stream Processor comes with an intelligent development studio (editor) which supports for both source view and design view. It provides numerous support to build a streaming query (Siddhi App) in a very short time, test it by simulating events and debug through the queries that implemented. It provides below capabilities.

Introduction Video on WSO2 Stream Processor Development Studio
  • Syntax highlighting & auto completion
  • Deploy and test streaming queries
  • Debugger
  • Event simulation
  • Sample event generation
  • Data explorer for event tables & etc..

You could refer the video in the left to understand the capabilities of the Development Studio in detail.

Support for Citizen Integration

WSO2 Development Studio provides necessary features required for a developer to implement the streaming queries but we have found there is gap/requirement where non-technical users wanted to configure or change existing rules or build some rules using the templates which is already available. Then to cater this requirement WSO2 Stream Processor built the feature called Business Rules Manager for Citizen Integration.

Business Rules Management dashboard is a user-friendly interface that can be used to build business rules (Siddhi applications customized as per business requirements). WSO2 Stream Processor editor runtime comes with a tool which allows the developer to build such templates that the business can use to build his/her own rules.

Refer below posts about Business Rules Manager written by Senthuran who is one of the prominent contributor to WSO2 Stream processor.

Dashboard & Reporting Support

WSO2 Stream Processor comes with feature-rich dashboard and reporting capabilities. It not only allows users to create gadgets and dashboards but also provide necessary features that required for report generation.

Report Generation

The Dashboard Portal of WSO2 Stream Processor allows you to generate reports with graphical illustrations of processed data. It supports below options for report generation.

  • Supported to generate report as CSV and PDF
  • Jasper based report generation

Dashboard and Gadgets

Sample Dashboards and Gadgets in WSO2 Stream Processor

The dashboard which is in WSO2 Stream Processor is based on React JS. There are features to create dashboards and widgets through a wizard-based approach; it provides necessary support to design dashboards and widgets as per our need.It also comes with a fine-grained permission model for dashboards, widgets, and data. It also supports localization and inter widget communication for the better interactive dashboard.

Other than above, it also provides;

  • Shareable dashboards with widget state persistence
  • Dashboard export and import

Status Dashboard

Example View of the Status Dashboard of WSO2 Stream Processor

A built-in dashboard which helps to monitor WSO2 Stream Processor nodes. This involves monitoring whether all processes of the WSO2 SP setup are working in a healthy manner, monitoring the current status of an SP node, and viewing metrics relating to the history of a node or the cluster. Both JVM level metrics or Siddhi application level metrics can be viewed from the monitoring dashboard.

A user can configure monitoring for individual Stream Processor node or to a distributed cluster. Through this dashboard, users/admin can find details on below areas.

  • Memory usage (Both JVM and Siddhi App level)
  • CPU usage (JVM level)
  • System load average
  • Overall throughput (Node and Siddhi App level)
  • Latency (Siddhi App level)

Other than above, it also provides some metrics such as throughput, memory usage and etc.. at the granularity of Siddhi query level as well.

HA Deployment with 100K Throughout

HA Deployment of WSO2 Stream Processor

As mentioned before WSO2 Stream Processor is powered by WSO2 Siddhi CEP engine; it is one of the fastest open source CEP engines in the market. With all these, WSO2 Stream Processor able process around 100,000 events in a second (for simple queries) with just 2 nodes; while other CEP or Steam Processor engines are required more than 5 nodes to achieve that throughput.

The High Availability (HA) deployment is so simple where deployment can be done with a database; it does not require zookeeper, Kafka or any other distributed coordination services. We could also achieve the multi-datacenter requirement with WSO2 Stream Processor HA deployment. The HA deployment is also supported incremental state persistence and recovery as well.

Thanks for reading the post. Hope I have given enough resources to understand the WSO2 Stream Processor and its prominent capabilities. Please tryout WSO2 Stream Processor and let us know your feedback.

Thanks…

Senior Tech Lead, Speaker @ WSO2. Closely works on Stream Processing and Integration