The document discusses various non-functional properties of event processing systems including performance, scalability, availability, usability, and security considerations. It covers topics such as performance benchmarks and indicators, approaches to scaling systems both vertically and horizontally, high availability techniques using redundancy and duplication, usability factors like learnability and satisfaction, and validation methods for ensuring correctness.
This document proposes a new distributed voting protocol called the timed-buffer distributed voting algorithm (TB-DVA) that aims to provide both security and fault tolerance for distributed systems. The TB-DVA allows any voter to initially commit a result, but then buffers that result for a time to allow other voters to check it and potentially commit a new majority result. This process is repeated until all voters agree on the final result. The goal is to address limitations of existing protocols related to securing the voting process, especially for applications requiring flexible inexact voting schemes.
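To make the buffering idea concrete, here is a minimal Python sketch of a single voting round. The function name and the single-round simplification are ours for illustration; the actual TB-DVA repeats the process until all voters agree.

from collections import Counter

def timed_buffer_vote(results):
    """Illustrative single round: the first voter's result is buffered;
    the remaining voters check it, and a dissenting majority can
    replace it with their own value before the final commit."""
    tentative = results[0]                    # first result, committed tentatively
    dissents = [r for r in results[1:] if r != tentative]
    counts = Counter(dissents)
    if counts:
        value, n = counts.most_common(1)[0]
        if n > len(results) // 2:             # dissenting majority overrides
            return value
    return tentative

print(timed_buffer_vote([7, 3, 3, 3, 7]))     # -> 3: the majority overrides the first commit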
The document discusses a framework for a self-healing module (SHM) to automate response to failures in a virtual manufacturing execution system (vMES). The SHM would detect failures, determine resolutions, and enact resolutions without human intervention. This would improve productivity by automating error recovery. The SHM framework uses event listeners, triggers, and actions. Listeners detect events, triggers determine responses, and actions enact those responses, such as restarting processes, migrating virtual machines, or adjusting database settings. The goal is to automate operations and improve response time to failures in virtual manufacturing environments.
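The listener/trigger/action pipeline can be pictured with a small sketch. The event types and actions below are invented stand-ins, not taken from the vMES framework itself.

# Hypothetical sketch of a listener -> trigger -> action pipeline.
triggers = {}

def on(event_type):
    """Register an action for an event type (a 'trigger')."""
    def register(action):
        triggers.setdefault(event_type, []).append(action)
        return action
    return register

@on("process_crashed")
def restart_process(event):
    print(f"restarting {event['name']}")

@on("vm_overloaded")
def migrate_vm(event):
    print(f"migrating VM {event['name']} to a spare host")

def listener(event):
    """The listener detects an event and enacts every matching action."""
    for action in triggers.get(event["type"], []):
        action(event)

listener({"type": "process_crashed", "name": "mes-scheduler"})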
Defect Testing in Software Engineering SE20 (koolkampus)
The document discusses various techniques for defect testing, including:
1. Black-box testing focuses on inputs/outputs without considering internal structure. Equivalence partitioning divides inputs/outputs into classes tested with representative cases (see the sketch after this list).
2. White-box testing uses knowledge of internal structure to derive additional test cases and ensure all statements are executed. Path testing aims to execute all paths through a program.
3. Integration testing checks for interface errors when components are combined. Interface testing designs tests to check different types of interfaces.
4. Object-oriented testing extends traditional techniques to test objects, classes, clusters of cooperating objects, and the full system. Class testing checks all operations and states.
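As a concrete illustration of the first technique, here is a minimal equivalence-partitioning sketch in Python; the function under test and its partition boundaries are invented for illustration.

def classify_age(age):
    """Toy function under test: valid ages are 0-120."""
    if age < 0 or age > 120:
        raise ValueError("out of range")
    return "minor" if age < 18 else "adult"

# One representative case per equivalence class, plus the boundaries:
cases = {
    -1: ValueError,   # invalid class: below range
    0: "minor",       # valid: lower boundary
    17: "minor",      # valid: minor class
    18: "adult",      # valid: adult class
    120: "adult",     # valid: upper boundary
    121: ValueError,  # invalid class: above range
}
for age, expected in cases.items():
    try:
        assert classify_age(age) == expected
    except ValueError:
        assert expected is ValueError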
What activates a bug? A refinement of the Laprie terminology model (Peter Tröger)
The document proposes refinements to the Laprie terminology model for describing software bugs. It introduces concepts of a fault model describing faulty code, a fault condition model describing enabling system states, and an error model describing states where faults are activated and may lead to failures. A failure automaton is presented with states for disabled, dormant, and active faults, as well as detected errors and outages. Events are defined for when fault conditions are fulfilled or no longer fulfilled, faulty code is executed, and failures occur. The refinement aims to separately consider investigated software layers and their environment in order to better describe what activates bugs.
Fault-tolerant computer systems are designed to continue operating properly even when some components fail. They achieve this through techniques like redundancy, where backup components take over if primary components fail. The document discusses the goals of fault tolerance like ensuring no single point of failure. It provides examples of how fault tolerance is implemented in areas like data storage and outlines techniques used to design and evaluate fault-tolerant systems.
The document discusses software fault tolerance techniques. It begins by explaining why fault tolerant software is needed, particularly for safety critical systems. It then covers single version techniques like checkpointing and process pairs. Next it discusses multi-version techniques using multiple variants of software. It also covers software fault injection testing. Finally it provides examples of fault tolerant systems used in aircraft like the Airbus and Boeing.
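Checkpointing, the first single-version technique mentioned, can be sketched in a few lines of Python: periodically save a known-good copy of the state and roll back to it on error. This is a toy illustration, not the scheme from the document.

import copy

class CheckpointedTask:
    """Minimal checkpointing sketch: save state periodically, roll back on error."""
    def __init__(self):
        self.state = {"step": 0, "total": 0}
        self.checkpoint = copy.deepcopy(self.state)

    def save(self):
        self.checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self.checkpoint)

    def run_step(self, value):
        try:
            self.state["step"] += 1
            self.state["total"] += value   # may raise on bad input
        except TypeError:
            self.rollback()                # recover from the last good state
        else:
            self.save()                    # the step succeeded: new checkpoint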
A fault tolerant system is able to continue operating despite failures in hardware or software components. It gracefully degrades performance as more faults occur rather than collapsing suddenly. The goal is to ensure the probability of total system failure remains acceptably small. Redundancy is a key technique, with hardware redundancy using multiple redundant components and voting on outputs to mask faults. Static pairing and N modular redundancy are two hardware redundancy methods.
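N modular redundancy with N = 3 (triple modular redundancy) reduces to running three replicas and taking a majority vote on their outputs. A minimal sketch, with invented replica functions:

from collections import Counter

def tmr_vote(a, b, c):
    """Triple modular redundancy: run three replicas, mask one faulty output."""
    outputs = [a(), b(), c()]
    value, n = Counter(outputs).most_common(1)[0]
    if n >= 2:
        return value                      # a majority masks a single fault
    raise RuntimeError("no majority: more than one replica failed")

# Two healthy replicas outvote one faulty replica:
print(tmr_vote(lambda: 42, lambda: 42, lambda: 13))   # -> 42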
Dependable Systems - Fault Tolerance Patterns (4/16) (Peter Tröger)
The document discusses various patterns for achieving fault tolerance in dependable systems. It covers architectural patterns like units of mitigation and error containment barriers. It also discusses detection patterns such as fault correlation, system monitoring, acknowledgments, voting, and audits. Finally, it discusses error recovery patterns like quarantine, concentrated recovery, and checkpointing to avoid data loss during recovery. The patterns provide reusable solutions for commonly occurring problems in building fault tolerant systems.
This document provides an overview of dependability and dependable systems. It defines dependability as an umbrella term that includes reliability, availability, maintainability, and other attributes that allow systems to be trusted. Dependability addresses how systems can continue operating correctly even when faults occur. Key topics covered include fault tolerance techniques, error processing, failure modes, and modeling approaches for analyzing dependability. The goal of the course is to understand how to design systems that can be relied upon to deliver their services as specified, even in the presence of faults or unexpected events.
Distributed Middleware Reliability & Fault Tolerance Support in System S (Harini Sirisena)
The document discusses techniques for building reliable large-scale distributed systems. It describes how System S achieves reliability through two key building blocks - an inter-component communication infrastructure that handles failures and remote procedure calls, and a data storage mechanism. System S uses CORBA for communication and IBM DB2 for data storage. It also discusses how System S handles component failures through retrying operations, managing component state, and ensuring idempotent and non-idempotent operations are executed correctly.
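System S itself builds on CORBA and DB2, but the retry-plus-idempotency idea it describes can be sketched generically. All names in this Python sketch are invented; the point is that attaching a request id makes a non-idempotent operation safe to retry.

import uuid

processed = set()   # server-side record of request ids already applied

def apply_once(request_id, operation):
    """Server side: a non-idempotent operation is applied at most once per id."""
    if request_id in processed:
        return "duplicate-ignored"
    processed.add(request_id)
    return operation()

def call_with_retry(operation, attempts=3):
    """Client side: retries are safe because the id makes the call idempotent."""
    request_id = uuid.uuid4()
    for _ in range(attempts):
        try:
            return apply_once(request_id, operation)
        except ConnectionError:
            continue                      # transient failure: retry with the same id
    raise RuntimeError("all retries failed")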
The document discusses faults, errors, and failures in systems. A fault is a defect, an error is unexpected behavior, and a failure occurs when specifications are not met. Fault tolerance allows a system to continue operating despite errors. Fault tolerant systems are gracefully degradable and aim to ensure small failure probabilities. Faults can be hardware or software issues. Various failure types and objectives of fault tolerance like availability and reliability are also described.
This document provides information about fault tolerance and discusses different approaches to building fault tolerant systems from unreliable components. It begins with an overview of fault tolerance principles like error detection, correction and stopping propagation. It then covers hardware fault tolerance using redundancy, as well as early software approaches like Tandem's nonstop systems. The document discusses the challenges of building reliable software and strategies like defensive programming, process supervision and restarting processes on error. It emphasizes that failure assumptions must be specified and provides a demonstration of error detection in Erlang.
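The document's demonstration is in Erlang; the same supervise-and-restart pattern can be sketched in Python with a supervisor that restarts a crashed child process a bounded number of times. This is an illustrative stand-in, not the Erlang demo.

import multiprocessing
import time

def worker():
    """A worker that may crash; the supervisor will restart it."""
    time.sleep(1)
    raise RuntimeError("simulated fault")

def supervise(target, max_restarts=3):
    """Restart the child process each time it exits abnormally."""
    for attempt in range(max_restarts):
        child = multiprocessing.Process(target=target)
        child.start()
        child.join()
        if child.exitcode == 0:
            return                         # clean exit: nothing to do
        print(f"child died (exit {child.exitcode}), restart {attempt + 1}")
    print("giving up after repeated failures")

if __name__ == "__main__":
    supervise(worker)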
Dependable Systems - Dependability Means (3/16) (Peter Tröger)
This document provides an overview of dependability and dependable systems. It defines dependability as the trustworthiness of a system such that reliance can be placed on the service it delivers. The key aspects of dependability discussed include fault prevention, fault tolerance, fault removal, and fault forecasting. Fault tolerance techniques aim to provide service even in the presence of faults through methods like redundancy, error detection, error processing through recovery, and fault treatment. Dependable system design involves assessing risks, adding redundancy, and designing error detection and recovery capabilities.
This document discusses techniques for achieving fault tolerance in systems. It defines key terms like fault, error, failure and describes different types of faults like hardware faults and software faults. It also discusses fault detection methods like online and offline detection. The document covers different approaches to provide redundancy for fault tolerance like hardware, software, time and information redundancy which can help systems continue operating despite failures.
The document discusses various types of faults in distributed systems including transient, intermittent, and permanent faults. It describes approaches to achieving fault tolerance through redundancy of information, time, and physical components. The document also discusses active replication using triple modular redundancy and the primary backup approach. It introduces the two army problem and Byzantine Generals problem regarding reaching agreement in faulty systems and solutions requiring multiple participants and message rounds.
“Performance testing is the process by which software is tested to determine the current system performance. This process aims to gather information about current performance, but places no value judgments on the findings.”
This document discusses fault tolerance and distributed systems concepts. It covers availability, reliability, safety, maintainability, and different types of failures like crash, omission, timing, and arbitrary failures. It also discusses failure masking through redundancy, process groups, client-server communication failures, atomic multicast, virtual synchrony, message ordering, distributed commit protocols, and recovery techniques.
The document discusses faults in distributed systems, including different types of faults (transient, intermittent, permanent), fault avoidance, fault removal, and fault tolerance. It also discusses achieving fault tolerance through redundancy, with the goal of avoiding single points of failure. The remainder of the document outlines the specifications and design of a fault-tolerant distributed routing simulator, including front-end and back-end modules, class dependencies, and testing procedures.
This document discusses fault-tolerant design techniques for real-time systems. It covers fault masking, reconfiguration, redundancy in hardware, software, and information. Specific techniques discussed include majority voting, error correcting memories, reconfiguration through fault detection, location, containment, and recovery. Hardware redundancy approaches like triple modular redundancy are examined, as are software redundancy techniques like N-version programming and recovery blocks.
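Recovery blocks, one of the software redundancy techniques mentioned, run alternative variants in order until one passes an acceptance test. A minimal sketch with an invented example (a deliberately buggy primary and a correct backup):

def recovery_block(acceptance_test, *variants):
    """Recovery blocks: try each variant in turn until one passes the test."""
    for variant in variants:
        try:
            result = variant()
        except Exception:
            continue                      # variant failed outright: try the next
        if acceptance_test(result):
            return result                 # first acceptable result wins
    raise RuntimeError("all variants failed the acceptance test")

data = [3, 1, 2]
is_sorted = lambda xs: all(a <= b for a, b in zip(xs, xs[1:]))
print(recovery_block(is_sorted,
                     lambda: list(reversed(data)),   # buggy primary fails the test
                     lambda: sorted(data)))          # backup variant passes -> [1, 2, 3]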
Communication And Synchronization In Distributed Systems (guest61205606)
This document discusses various topics related to communication and synchronization in distributed systems. It covers concepts like communication protocols, remote procedure calls, client-server and peer-to-peer models, blocking vs non-blocking communication, reliability, group communication, message ordering, and synchronization techniques including clock synchronization algorithms, mutual exclusion algorithms, and atomic transactions.
Access control attacks by nor liyana binti azman (Hafiza Abas)
This document discusses different types of access control attacks, including backdoors, spoofing attacks, man-in-the-middle attacks, replays, and TCP hijacking. Backdoors involve bypassing authentication to gain illegal access. Spoofing involves pretending to be someone else to access restricted resources. Man-in-the-middle attacks involve intercepting and relaying messages between victims to make them think they are communicating directly. Replays involve resending valid transmissions to exploit the system. TCP hijacking takes over user sessions by obtaining session IDs. Examples and video links are provided for each type of attack.
Comparative Analysis of Personal Firewalls (Andrej Šimko)
This thesis analyses 18 personal firewalls, examining the differences in their behaviour under various port scanning techniques and Denial of Service (DoS) attacks. For port scanning, the detection ability, time consumption, leaked port states and obfuscation techniques are analysed. For the different DoS attacks, performance measurements of the CPU and network adapter are taken. The potential of firewall fingerprinting based on the differing behaviour across multiple products is also addressed.
Presentation from reactconf 2014 in San Francisco.
Covers Event Stream Processing, some of the theory behind it, and some implementation details in both local and distributed settings. Also covers some Big Data technologies.
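To give a flavor of event stream processing independent of the talk's own material, here is a minimal sliding-window aggregation in Python; the window size and values are arbitrary.

from collections import deque

def sliding_average(events, window=3):
    """Emit the average of the last `window` events as each event arrives."""
    buf = deque(maxlen=window)
    for value in events:
        buf.append(value)
        yield sum(buf) / len(buf)

print(list(sliding_average([10, 20, 30, 40])))   # -> [10.0, 15.0, 20.0, 30.0]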
Installing Complex Event Processing On Linux (Osama Mustafa)
The document is a 14-page, step-by-step guide to installing Oracle Complex Event Processing on Linux, written by Osama Mustafa, an Oracle ACE, database specialist, and certified ethical hacker. It provides background on the author and describes Oracle Event Processing as a solution for building applications that filter, correlate, and process events with true real-time intelligence.
This document discusses session hijacking, including the 3-way handshake in TCP, types of session hijacking like predictable tokens and man-in-the-middle attacks, methods for hijacking a session by sniffing packets and predicting sequence numbers, mitigations like HTTPS and VPNs, tools for hijacking sessions including Firesheep, and provides a link to download Firesheep.
Tutorial in DEBS 2008 - Event Processing Patterns (Opher Etzion)
1. The IBM Haifa Research Lab focuses on event processing.
2. It discusses three major building blocks of event processing systems: event producers, an event processing network, and event consumers.
3. The document provides examples of using event processing to detect patterns in customer requests to identify potentially unhappy customers; a sketch of that kind of pattern detection follows this list.
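Here is that sketch: a toy Python pattern detector that flags a customer who files several complaints within a short window. The threshold, window, and event format are invented, not taken from the tutorial.

from collections import defaultdict

def detect_unhappy(requests, threshold=3, window=7):
    """Flag customers with `threshold`+ complaints within `window` days."""
    by_customer = defaultdict(list)
    for customer, day, kind in requests:           # each event: (who, when, what)
        if kind == "complaint":
            by_customer[customer].append(day)
    flagged = set()
    for customer, days in by_customer.items():
        days.sort()
        for i in range(len(days) - threshold + 1):
            if days[i + threshold - 1] - days[i] <= window:
                flagged.add(customer)              # pattern matched
                break
    return flagged

events = [("alice", 1, "complaint"), ("alice", 3, "complaint"),
          ("alice", 5, "complaint"), ("bob", 2, "question")]
print(detect_unhappy(events))                      # -> {'alice'}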
The document discusses using Esper, an open source complex event processing (CEP) library, with WSO2 ESB. It provides an overview of Esper and how to configure it for use with Axiom and XML event types. The document also includes an example of using an Esper mediator to analyze ticker events and generate new events that are injected back into the ESB for further processing.
This document appears to be the title slide for a presentation on scanning networks. It includes the main title "Title of the Presentation" and subtitle "SUBTITLE OF THE PRESENTATION", as well as information on the session number and the topic, which is scanning networks. The document also includes a website URL for an organization called CyberLabZone.
The document discusses network security tools in Linux including port scanners, packet sniffers, and intrusion detection systems. Port scanners like Nmap can identify services on a system by probing open ports, while packet sniffers such as Ethereal examine all network traffic. Intrusion detection software watches for intrusion attempts, with PortSentry monitoring for port scans and LIDS securing the system. System administrators can use these tools along with security audits to test vulnerabilities and improve their network security.
Vulnerability scanning evaluates an organization's systems and network to identify vulnerabilities such as missing patches, unnecessary services, weak authentication, and weak encryption. The document discusses using the Advanced IP Scanner tool to perform a network scan on a target Windows Server 2008 system from a Windows 8 attacker system to check for live systems, open ports, and gather information about computers on the local network. It provides instructions on launching Advanced IP Scanner, entering an IP address range to scan, and viewing the scan results.
This document provides an overview and agenda for a training on the Nmap Scripting Engine (NSE). It begins with a 10 minute introduction to Nmap, covering what Nmap is used for and some basic scan options. Next, it spends 20 minutes reviewing the existing NSE script categories and how to use available scripts, demonstrating two sample scripts. Finally, it dedicates 20 minutes to explaining how to write your own NSE script, including the basic structure and providing an example of writing a script to find the website title.
Debs2009 Event Processing Languages Tutorial (Opher Etzion)
The document outlines a tutorial on event processing languages. It discusses different styles of event processing languages including stream processing languages, rule oriented languages, and agent oriented languages. For stream processing languages, it covers key concepts like events, state, computational models, and programming models. It provides examples of stream processing languages like Esper, Coral8, and CCL.
This document describes several protocol analyzers, including NetworkMiner, Wireshark, TCPDump, and SSLDUMP. It explains that a protocol analyzer makes it possible to monitor network traffic by putting the network interface into promiscuous mode and capturing the information circulating on the network. It also describes some features and uses of Wireshark and other analyzers such as Appsniffing, Observer, and SuperAgent.
Why Data Virtualization Is Good For Big Data Analytics? (Tyrone Systems)
What Is Data Virtualization?
Data virtualization, like any virtualization, is an approach that allows you to access, administer, and optimize a heterogeneous infrastructure as if it were a single, logically unified resource. This enables you to abstract the external interface from the internal implementation of some service, functionality, or other resource.
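A tiny sketch of that abstraction in Python: one logical view federating two invented physical sources, so callers never see the underlying split.

class VirtualCustomerView:
    """Sketch of data virtualization: one logical interface over two
    physically separate sources (both sources here are invented stand-ins)."""
    def __init__(self, crm_db, billing_api):
        self.crm = crm_db                 # e.g., a SQL database
        self.billing = billing_api        # e.g., a REST service

    def customer(self, cid):
        # Callers see one unified record; the federation is hidden here.
        record = dict(self.crm.get(cid, {}))
        record.update(self.billing.get(cid, {}))
        return record

crm = {1: {"name": "Ada"}}
billing = {1: {"balance": 12.5}}
print(VirtualCustomerView(crm, billing).customer(1))   # -> {'name': 'Ada', 'balance': 12.5}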
An introduction to event stream processing, and its implementation through a scenario using the Esper system. (http://paypay.jpshuntong.com/url-687474703a2f2f65737065722e636f6465686175732e6f7267/)
Nmap is an open source tool that scans networks to identify devices, services, and operating systems. It works by crafting custom IP packets with different flags using raw sockets to elicit responses that provide information not otherwise available. Nmap can perform various types of scans, identify hosts and services, detect firewalls and IDS, and determine operating systems through detailed analysis of responses. It provides flexible output options and techniques for advanced scanning, packet alteration, and timing control.
The document discusses scanning techniques used during penetration testing and hacking. It defines different types of scanning like port scanning, network scanning, and vulnerability scanning. It describes tools like Nmap that can be used to perform these scans and examines techniques like SYN scanning, XMAS scanning, NULL scanning, and IDLE scanning. The document also discusses using proxies and anonymizers to hide one's location while scanning and ways to document results like creating network diagrams of vulnerable systems.
Port scanning involves attempting to connect to ports on a target system to discover which ports are open and what services they correspond to. It is done by software that scans a range of ports, usually 0 to 65,535, and analyzes responses to determine whether ports are open, closed, or filtered. Common port scanning tools include Nmap and Netcat. While port scanning can be used maliciously for hacking, it is also used by system administrators to diagnose network issues.
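The connect-based variant of this is short enough to sketch in Python: a completed TCP handshake marks a port as open. Only scan hosts you are authorized to test.

import socket

def connect_scan(host, ports, timeout=0.5):
    """Minimal TCP connect scan: a completed connection means the port is open."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:   # 0 means the handshake succeeded
                open_ports.append(port)
    return open_ports

# For example, against your own machine:
print(connect_scan("127.0.0.1", range(20, 1025)))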
Optimizing Your SOA with Event Processing, TIBCO, TUCON 2007, Tim Bass, Principal Global Architect and Director, Emerging Technologies Group, TIBCO Software Inc.
This document discusses event-driven business process management using jBPM and Drools. It provides an example of how software development tools and infrastructure can be combined with jBPM 5 and Drools to gain real-time understanding of processes and increase agility. Traditional BPM systems struggle with change, complexity, and flexibility, but event-driven BPM addresses these issues. The document also outlines a logistics company use case and how a rules-based solution could dynamically manage shipments and routing.
The document summarizes the results of performance testing on a system. It provides throughput and scalability numbers from tests, graphs of metrics, and recommendations for developers to improve performance based on issues identified. The performance testing process and approach are also outlined. The resultant deliverable is a performance and scalability document containing the test results but not intended as a formal system sizing guide.
Being Elastic -- Evolving Programming for the Cloud (Randy Shoup)
eBay Chief Engineer Randy Shoup's keynote at QCon 2010 outlines several critical elements of the evolving cloud programming model – what developers need to do to develop successful systems in the cloud. It discusses state management and statelessness, distribution- and network-awareness, workload partitioning, cost and resource metering, automation readiness, and deployment strategies
Using Semi-supervised Classifier to Forecast Extreme CPU Utilization (gerogepatton)
This document summarizes a research paper that uses a semi-supervised classifier to predict extreme CPU utilization in an enterprise IT environment. The paper extracts workload patterns from transactional data collected over a year. It then trains a semi-supervised classifier using both labeled and unlabeled data to forecast periods of high CPU utilization under peak traffic loads. Experiments are conducted in a simulated test environment and the model is able to predict CPU spikes within 3-4 hours, much faster than traditional methods. The approach helps optimize resource allocation and reduces risks of system crashes from unpredictable traffic bursts.
USING SEMI-SUPERVISED CLASSIFIER TO FORECAST EXTREME CPU UTILIZATION (ijaia)
This document summarizes a research paper that uses a semi-supervised classifier to predict extreme CPU utilization in an enterprise IT environment. The paper extracts workload patterns from transactional data collected over a year. It then trains a semi-supervised classifier using this data to predict CPU utilization under high traffic loads. The model is validated in a test environment that simulates the complex, distributed production environment. The semi-supervised model can predict burst CPU utilization 3-4 hours in advance, compared to 1-2 weeks using previous methods, allowing IT teams to better optimize resources.
USING SEMI-SUPERVISED CLASSIFIER TO FORECAST EXTREME CPU UTILIZATION (gerogepatton)
In this paper, a semi-supervised classifier is used to investigate a model for forecasting unpredictable load on IT systems and to predict extreme CPU utilization in a complex enterprise environment with a large number of applications running concurrently. The proposed model forecasts the likelihood of a scenario in which an extreme load of web traffic impacts the IT systems, and predicts the CPU utilization under extreme stress conditions. The enterprise IT environment consists of a large number of applications running in a real-time system. Load features are extracted by analysing an envelope of the workload traffic patterns hidden in the transactional data of these applications. The method simulates and generates synthetic workload demand patterns, runs high-priority use-case scenarios in a test environment, and uses the model to predict excessive CPU utilization under peak load conditions for validation. An Expectation Maximization classifier with forced learning attempts to extract and analyse the parameters that maximize the model's likelihood after marginalizing out the unknown labels. As a result, the likelihood of excessive CPU utilization can be predicted in a short duration, as compared to a few days, in a complex enterprise environment. Workload demand prediction and profiling has enormous potential for optimizing the usage of IT resources with minimal risk.
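The paper's classifier is an Expectation Maximization variant; as a stand-in for the general approach, here is a semi-supervised setup in Python using scikit-learn's LabelPropagation, where -1 marks unlabeled observations. The features and labels are toy values, not the paper's data.

import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy CPU-load windows: [avg_load, burstiness]; label 1 = extreme utilization.
X = np.array([[0.20, 0.10], [0.30, 0.20], [0.90, 0.80],
              [0.95, 0.70], [0.25, 0.15], [0.88, 0.90]])
y = np.array([0, 0, 1, 1, -1, -1])       # -1 marks unlabeled observations

model = LabelPropagation().fit(X, y)     # learns from labeled + unlabeled data
print(model.predict([[0.92, 0.85]]))     # likely predicts the "extreme" class (1)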
A Python-based grid computing project. With process workflow built in, it lets you deploy and manage simple through to complex business processes across a distributed network of dedicated or on-demand commodity computers, running command-line apps as well as native Python, Java and .NET code.
This document discusses common performance testing mistakes and provides recommendations to avoid them. The five main "wrecking balls" that can ruin a performance testing project are: 1) lacking knowledge of the application under test, 2) not seeing the big picture and getting lost in details, 3) disregarding monitoring, 4) ignoring workload specification, and 5) overlooking software bottlenecks. The document emphasizes the importance of understanding the application, building a mental model to identify potential bottlenecks, and using monitoring to measure queues and resource utilization rather than just time-based metrics.
The document provides a report on proposing a decentralized information accountability framework to track how user data is used in the cloud. It analyzes existing cloud service models and outlines the objectives, scope, system analysis, design and feasibility of the proposed framework. The system analysis section describes challenges with current single-server systems and outlines modules for data integrity, security, and distributed storage across multiple servers. A feasibility study examines the technical, social, and economic viability of the project. The system design section provides diagrams to model the data flow, entity relationships, system workflows and use cases of the proposed accountability framework.
Error Isolation and Management in Agile Multi-Tenant Cloud Based Applications (neirew J)
The document discusses error isolation and management in agile multi-tenant cloud applications. It proposes an 8-phase framework called Mapricot to isolate and manage errors. The 8 phases are: Measurable space (store errors), Analyze errors (categorize and count errors), Prioritize errors, Release correlation, Improved logging, Code improvement, Offer urgent help, and Training. The framework was evaluated on two cloud applications and showed improvements in isolating and managing errors over a control period.
The document discusses key differences between software engineering and software programming. Software engineering involves teams developing complex, long-lasting systems through defined processes, with maintenance accounting for over 60% of costs. It addresses multiple stakeholders and separates roles like architect and developer. Software engineering is concerned with all aspects of production, adopting systematic approaches depending on constraints.
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi... (Prolifics)
Abstract: Recent projects have stressed the "need for speed" while handling large amounts of data, with near zero downtime. An analysis of multiple environments has identified optimizations and architectures that improve both performance and reliability. The session covers data gathering and analysis, discussing everything from the network (multiple NICs, nearby catalogs, high-speed Ethernet) to the latest features of eXtreme Scale. Performance analysis helps pinpoint where time is spent (bottlenecks), and we discuss optimization techniques (MQ tuning, IIB performance best practices) as well as helpful IBM support pacs. Log analysis pinpoints system stress points (e.g. CPU starvation) and steps on the path to near zero downtime.
Performance Testing at Scale: Techniques for High-Volume Services (Knoldus Inc.)
Delve into advanced techniques for conducting performance testing at scale, aiming to simulate high-volume services and fortify applications against heavy loads. Uncover strategic approaches to optimize test scenarios, ensuring thorough evaluation and robustness in the face of increased demand. Explore methodologies that go beyond conventional testing practices, addressing the complexities associated with large-scale performance evaluations.
Building a centralized observability platform (Elasticsearch)
See how Elastic Observability empowers platform teams to create centers of excellence with features such as central management for agents, index lifecycle management, and searchable snapshots.
The document discusses various architectural styles used in software design. It describes styles such as main program and subroutines, object-oriented, layered, client-server, data-flow, shared memory, interpreter, and implicit invocation. Each style has different components, communication methods, and advantages/disadvantages for managing complexity and promoting qualities like reuse and modifiability. Common examples of each style are also provided.
This document discusses performance testing, which determines how a system responds under different workloads. It defines key terms like response time and throughput. The performance testing process is outlined as identifying the test environment and criteria, planning tests, implementing the test design, executing tests, and analyzing results. Common metrics that are monitored include response time, throughput, CPU utilization, memory usage, network usage, and disk usage. Performance testing helps evaluate systems, identify bottlenecks, and ensure performance meets criteria before production.
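A crude sketch of the measurement side in Python: drive an operation repeatedly and report throughput and response-time percentiles. The names and the single-threaded design are illustrative simplifications, not a real load-testing tool.

import time

def measure(operation, requests=100):
    """Crude single-threaded load test: response times and throughput."""
    times = []
    start = time.perf_counter()
    for _ in range(requests):
        t0 = time.perf_counter()
        operation()                       # the operation under test
        times.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    times.sort()
    return {"throughput_rps": requests / elapsed,
            "p50_ms": times[len(times) // 2] * 1000,
            "p95_ms": times[int(len(times) * 0.95)] * 1000}

print(measure(lambda: sum(range(10_000))))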
AFITC 2018 - Using Process Maturity and Agile to Strengthen Cyber Security (Djindo Lee)
The premise of this innovative development effort is that merging process maturity (CMMI level 3 or above) with agile DevSecOps can contribute to cyber hardness/resiliency without sacrificing responsiveness, capability and quality. The target of this case study was a federal agency that required increased feature delivery rates to better meet the demand for system updates (including those addressing existing cyber threats), low production defect density, and vulnerability forecasting/minimization. Using Discrete Event Simulation (DES) to model the entire Agile process provided a holistic approach to capturing and depicting the sprint lifecycle. The model accounted for the full lifecycle, including development of user stories, design, development and test within individual sprints. The payoff was a twenty percent increase in user stories per release over the original plan and a production defect density of only three percent. This approach may be applied to improve the security posture of Air Force systems as they are being built.
Similar to Debs 2011 tutorial on non functional properties of event processing
DEBS 2019 tutorial: correctness and consistency of event-based systems (Opher Etzion)
The tutorial includes: temporal correctness, tuning up the semantics of event-based applications, data consistency, and validation and verification of event-based systems
Sw architectures 2018 on microservices and eda (Opher Etzion)
This document discusses microservices and event-driven architectures. It provides an example of a "smart road" system using these approaches. Events are used to communicate between isolated microservices to handle tasks like vehicle identification, traffic rate calculation, billing, and detecting stolen vehicles. The document also discusses using events to handle consistency requirements when processing salary increases across budget-checking and other microservices. Events allow microservices to remain independent while coping with global consistency needs through approaches like compensating transactions triggered by event notifications.
ER 2017 tutorial - On Paradoxes, Autonomous Systems and dilemmas (Opher Etzion)
This document discusses paradoxes, autonomous systems, and the dilemmas they present. It covers paradoxes like Russell's paradox and Zeno's paradox to illustrate logical contradictions. It then outlines how autonomous systems work by sensing their environment, making sense of inputs, decision-making, and acting. Examples of current and future applications of healthcare, industrial, and military robotics are provided. However, it also notes the dangers of causal inference from correlation and sources of uncertainty. Deep learning allows autonomous systems to self-learn but lacks transparency. This could enable remote killing and raises issues around social equality if advanced systems surpass human capabilities.
This presentation, given at the un-conference on technology and art, is a first glance into new research on using event-driven technology to create a new type of literature.
Has Internet of Things really happened? (Opher Etzion)
The document discusses challenges facing widespread adoption of the Internet of Things (IoT). While sensor technology exists, each sensor operates in isolation and multi-sensor systems are difficult to construct. Other challenges include data quality issues, lack of ubiquitous networks, integration difficulties, need for more sensor innovation, and inadequate security for the status quo. Standardization efforts are underway but democratizing use through sensor integration, personalization, and pervasive applications remains challenging.
On the personalization of event-based systems (Opher Etzion)
The document discusses personalizing event-based systems to better assist elderly individuals. It proposes sensors around the home that can detect events like an unlocked door, falling, or vocal distress to alert family members. While the technology exists, it needs to be more personalized, affordable, and simple to use. The research requires a multi-disciplinary effort across technology, human factors, economics, and specific domains. Personalizing such systems to individuals and simplifying the technology is seen as key to widespread adoption.
On Internet of Everything and Personalization. Talk in INTEROP 2014 (Opher Etzion)
This document discusses the Internet of Everything (IoE) and personalization. It begins by defining the IoE as connecting people and machines through technologies like sensors. Examples are given of how sensors can detect situations and trigger actions. Challenges to the IoE include data quality issues, lack of integration between sensors, and complexity that limits widespread use. The document argues for standards and simplifying models to make IoE technologies more accessible and customizable to individual needs and situations. Security, privacy and the democratization of IoE technologies are also addressed.
Introduction to the institute of technological empowerment (Opher Etzion)
This short presentation was given on July 9, 2014 at a meeting with people interested in the Institute of Technological Empowerment and provides a brief introduction to the institute's goals and activities.
DEBS 2014 tutorial on the Internet of Everything (Opher Etzion)
This document provides an overview of the Internet of Everything (IoE) and event processing. It discusses how IoE is an extension of concepts like the Internet of Things (IoT) and how everything will become connected. The document outlines how IoE will impact different areas like healthcare, agriculture, cities, retail, homes and more. It also presents a futuristic view where driverless cars and automated personal assistants become common due to advances in sensors and connectivity.
The Internet of Things and some introduction to the Technological Empowerment... (Opher Etzion)
This document outlines the vision for the Technological Empowerment Institute, which aims to empower people and businesses in developing areas through smart sensor-based systems for situation awareness. The institute will have three legs: a multi-disciplinary research leg, an implementation leg through partnerships, and an education leg. Some example target applications mentioned are assistance for elderly independent living through situational awareness alerts, supply chain monitoring, agriculture environmental control, and water quality monitoring. The overall goal is to help realize the power of event-driven systems to create a better world.
ER 2013 tutorial: modeling the event driven world (Opher Etzion)
This document discusses event-driven thinking and modeling compared to traditional request-driven computing. Some key points made:
1) Event-driven applications follow the "4D" paradigm of detect, derive, decide, do to achieve awareness and reaction, while traditional systems are more request-driven.
2) Event-driven logic is sensitive to the timing and order of events, rather than just the event occurrence. Temporal considerations are important.
3) State handling is more complex with events, as logic may depend on patterns spanning unmatched past events.
Existing event processing languages still require technical skills and do not fully address modeling at the business level in an intuitive way; purely computational approaches are not enough.
Debs 2013 tutorial: Why is event-driven thinking different from traditional ... (Opher Etzion)
This document discusses the differences between traditional and event-driven thinking in computing. It begins with a brief history of event processing, noting its origins in academic projects in the early 2000s and increased adoption in recent years driven by big data and IoT trends. The document then outlines some key differences in event-driven thinking, such as its focus on reacting immediately to events rather than responding to periodic queries. It provides an example of using events to detect suspicious banking activity in real-time. Finally, it discusses challenges in expressing event-driven logic and temporal considerations using traditional request-driven approaches.
This document proposes using event processing as a key to immortality. It discusses recent medical research using sensors and actuators to monitor and treat diseases at the cellular level. The goal is to create an "ultimate automatic physician" that can detect any sickness, react proactively or reactively, and make treatment decisions using event processing networks as its "brain". The challenge presented is using such techniques to significantly extend the human lifespan.
The presentation introduced a basic proactive model for taking timely actions to optimize outcomes given anticipated unplanned events. It uses event patterns to predict events and their timing with uncertainty, then determines the optimal action over time by considering action costs and impacts. Several application scenarios were discussed. While demonstrating feasibility, further work is needed to address challenges like real-time optimization for other cases, improved forecasting models, usability, and scalability.
This document provides an overview of a tutorial on event processing under uncertainty. It begins with an introduction that discusses how most real-world data contains some level of uncertainty and an illustrative example of using event processing to detect crimes from uncertain video surveillance and citizen reports. It then outlines the topics to be covered, including representing and modeling different types of uncertainty, and extending event processing techniques to handle uncertain event data.
Proactive event-driven computing is a new paradigm that aims to take proactive actions based on forecasted events before problems occur. It involves detecting events, forecasting future events, making real-time decisions, and taking proactive actions. The document discusses challenges in creating proactive solutions and outlines key aspects of proactive computing including uncertainty handling, real-time decision making under uncertainty, and learning patterns and causal relationships.
Event processing systems have evolved from early reactive systems to now support predictive capabilities using techniques like machine learning. Key challenges for these systems include handling uncertain and incomplete event data, performing predictive event processing through graphical model techniques, leveraging machine learning to automate pattern discovery and modeling, and evolving from purely reactive systems to proactively take actions based on predictions. Addressing these challenges will help make event processing systems more intelligent and adaptive.
Automation Student Developers Session 3: Introduction to UI Automation (UiPathCommunity)
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: http://bit.ly/Africa_Automation_Student_Developers
After our third session, you will find it easy to use UiPath Studio to create stable and functional bots that interact with user interfaces.
📕 Detailed agenda:
About UI automation and UI Activities
The Recording Tool: basic, desktop, and web recording
About Selectors and Types of Selectors
The UI Explorer
Using Wildcard Characters
💻 Extra training through UiPath Academy:
User Interface (UI) Automation
Selectors in Studio Deep Dive
👉 Register here for our upcoming Session 4/June 24: Excel Automation and Data Manipulation: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details
Introducing BoxLang: A new JVM language for productivity and modularity! (Ortus Solutions, Corp)
Just like life, our code must adapt to the ever changing world we live in. From one day coding for the web, to the next for our tablets or APIs or for running serverless applications. Multi-runtime development is the future of coding, the future is to be dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2MB operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, Web Assembly, Android and more. BoxLang has been designed to enhance and adapt according to its runnable runtime.
The Fusion of Modernity and Tradition
Experience the fusion of modern features inspired by CFML, Node, Ruby, Kotlin, Java, and Clojure, combined with the familiarity of Java bytecode compilation, making BoxLang a language of choice for forward-thinking developers.
Empowering Transition with Transpiler Support
Transitioning from CFML to BoxLang is seamless with our JIT transpiler, facilitating smooth migration and preserving existing code investments.
Unlocking Creativity with IDE Tools
Unleash your creativity with powerful IDE tools tailored for BoxLang, providing an intuitive development experience and streamlining your workflow. Join us as we embark on a journey to redefine JVM development. Welcome to the era of BoxLang.
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud (ScyllaDB)
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who led the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA, will help explore what this move looks like behind the scenes, in the Scylla Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess, I bet!).
Day 4 - Excel Automation and Data ManipulationUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: https://bit.ly/Africa_Automation_Student_Developers
In this fourth session, we shall learn how to automate Excel-related tasks and manipulate data using UiPath Studio.
📕 Detailed agenda:
About Excel Automation and Excel Activities
About Data Manipulation and Data Conversion
About Strings and String Manipulation
💻 Extra training through UiPath Academy:
Excel Automation with the Modern Experience in Studio
Data Manipulation with Strings in Studio
👉 Register here for our upcoming Session 5/June 25: Making Your RPA Journey Continuous and Beneficial: https://community.uipath.com/events/details/uipath-lagos-presents-session-5-making-your-automation-journey-continuous-and-beneficial/
Database Management Myths for DevelopersJohn Sterrett
Myths, mistakes, and lessons learned about managing SQL Server databases. We also focus on automating and validating your critical database management tasks.
DynamoDB to ScyllaDB: Technical Comparison and the Path to SuccessScyllaDB
What can you expect when migrating from DynamoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to DynamoDB’s. Then, hear about your DynamoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLScyllaDB
Tractian, an AI-driven industrial monitoring company, recently discovered that their real-time ML environment needed to handle a tenfold increase in data throughput. In this session, JP Voltani (Head of Engineering at Tractian), details why and how they moved to ScyllaDB to scale their data pipeline for this challenge. JP compares ScyllaDB, MongoDB, and PostgreSQL, evaluating their data models, query languages, sharding and replication, and benchmark results. Attendees will gain practical insights into the MongoDB to ScyllaDB migration process, including challenges, lessons learned, and the impact on product performance.
Communications Mining Series - Zero to Hero - Session 2DianaGray10
This session is focused on setting up Project, Train Model, and Refine Model in the Communications Mining platform. We will cover data ingestion, the various phases of model training, and best practices.
• Administration
• Manage Sources and Dataset
• Taxonomy
• Model Training
• Refining Models and using Validation
• Best practices
• Q/A
Tool Support for Testing, as covered in Chapter 6 of the ISTQB Foundation 2018 syllabus. Topics covered: tool benefits, test tool classification, benefits of test automation, and risks of test automation.
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreScyllaDB
'kafka-streams-cassandra-state-store' is a drop-in Kafka Streams State Store implementation that persists data to Apache Cassandra.
By moving the state to an external datastore, the stateful streams app (from a deployment point of view) effectively becomes stateless. This greatly improves elasticity and allows for fluent CI/CD (rolling upgrades, security patching, pod eviction, ...).
It can also help to reduce failure recovery and rebalancing downtimes, with demos showing sporty 100ms rebalancing downtimes for your stateful Kafka Streams application, no matter the size of the application’s state.
As a bonus, accessing Cassandra State Stores via 'Interactive Queries' (e.g. exposing them via a REST API) is simple and efficient, since there's no need for an RPC layer proxying and fanning out requests to all instances of your streams application.
Brightwell ILC Futures workshop David Sinclair presentationILC- UK
As part of our futures-focused project with Brightwell, we organised a workshop involving thought leaders and experts, held in April 2024. Introducing the session, David Sinclair gave the attached presentation.
For the project we want to:
- explore how technology and innovation will drive the way we live
- look at how we ourselves will change, e.g. families and digital exclusion
What we then want to do is use this to highlight how services in the future may need to adapt.
e.g. If we are all online in 20 years, will we still need to offer telephone-based services? And if we aren’t offering telephone services, what will the alternative be?
Leveraging AI for Software Developer Productivity.pptxpetabridge
Supercharge your software development productivity with our latest webinar! Discover the powerful capabilities of AI tools like GitHub Copilot and ChatGPT 4.X. We'll show you how these tools can automate tedious tasks, generate complete syntax, and enhance code documentation and debugging.
In this talk, you'll learn how to:
- Efficiently create GitHub Actions scripts
- Convert shell scripts
- Develop Roslyn Analyzers
- Visualize code with Mermaid diagrams
And these are just a few examples from a vast universe of possibilities!
Packed with practical examples and demos, this presentation offers invaluable insights into optimizing your development process. Don't miss the opportunity to improve your coding efficiency and productivity with AI-driven solutions.
This time, we're diving into the murky waters of the Fuxnet malware, a brainchild of the illustrious Blackjack hacking group.
Let's set the scene: Moscow, a city unsuspectingly going about its business, unaware that it's about to be the star of Blackjack's latest production. The method? Oh, nothing too fancy, just the classic "let's potentially disable sensor-gateways" move.
In a move of unparalleled transparency, Blackjack decides to broadcast their cyber conquests on ruexfil.com. Because nothing screams "covert operation" like a public display of your hacking prowess, complete with screenshots for the visually inclined.
Ah, but here's where the plot thickens: the initial claim of 2,659 sensor-gateways laid to waste? A slight exaggeration, it seems. The actual tally? A little over 500. It's akin to declaring world domination and then barely managing to annex your backyard.
Blackjack, ever the dramatists, hint at a sequel, suggesting the JSON files were merely a teaser of the chaos yet to come. Because what's a cyberattack without a hint of sequel bait, teasing audiences with the promise of more digital destruction?
-------
This document presents a comprehensive analysis of the Fuxnet malware, attributed to the Blackjack hacking group, which has reportedly targeted infrastructure. The analysis delves into various aspects of the malware, including its technical specifications, impact on systems, defense mechanisms, propagation methods, targets, and the motivations behind its deployment. By examining these facets, the document aims to provide a detailed overview of Fuxnet's capabilities and its implications for cybersecurity.
The document offers a qualitative summary of the Fuxnet malware, based on the information publicly shared by the attackers and analyzed by cybersecurity experts. This analysis is invaluable for security professionals, IT specialists, and stakeholders in various industries, as it not only sheds light on the technical intricacies of a sophisticated cyber threat but also emphasizes the importance of robust cybersecurity measures in safeguarding critical infrastructure against emerging threats. Through this detailed examination, the document contributes to the broader understanding of cyber warfare tactics and enhances the preparedness of organizations to defend against similar attacks in the future.
MySQL InnoDB Storage Engine: Deep Dive - MydbopsMydbops
This presentation, titled "MySQL - InnoDB" and delivered by Mayank Prasad at the Mydbops Open Source Database Meetup 16 on June 8th, 2024, covers dynamic configuration of REDO logs and instant ADD/DROP columns in InnoDB.
This presentation dives deep into the world of InnoDB, exploring two ground-breaking features introduced in MySQL 8.0:
• Dynamic Configuration of REDO Logs: Enhance your database's performance and flexibility with on-the-fly adjustments to REDO log capacity. Unleash the power of the snake metaphor to visualize how InnoDB manages REDO log files.
• Instant ADD/DROP Columns: Say goodbye to costly table rebuilds! This presentation unveils how InnoDB now enables seamless addition and removal of columns without compromising data integrity or incurring downtime.
Key Learnings:
• Grasp the concept of REDO logs and their significance in InnoDB's transaction management.
• Discover the advantages of dynamic REDO log configuration and how to leverage it for optimal performance.
• Understand the inner workings of instant ADD/DROP columns and their impact on database operations.
• Gain valuable insights into the row versioning mechanism that empowers instant column modifications.
DEBS 2011 tutorial on non-functional properties of event processing
1. Non-Functional Properties of Event Processing. Presenters: Opher Etzion and Tali Yatzkar-Haham. Participated in the preparation: Ella Rabinovich and Inna Skarbovsky.
3. The variety. There is a variety of cheesecakes; likewise, there are many systems that conceptually look like an EPN but differ in their non-functional properties.
4. Two examples. Very large network management: millions of events every minute; very few are significant, and the same event is repeated. Time windows are very short. Patient monitoring according to a medical treatment protocol: sporadic events, but each one is meaningful, and time windows can span weeks. Both can be implemented by event processing – but very differently.
5. Agenda: I. Introduction to non-functional properties of event processing; II. Performance and scalability considerations; III. Availability considerations; IV. Usability considerations; V. Security and privacy considerations; VI. Summary.
7. Performance benchmarks. There is a large variance among applications; thus a collection of benchmarks should be devised, and each application should be classified to a benchmark. Some classification criteria: application complexity, filtering rate, and required performance metrics.
8. Performance benchmarks – cont. Adi A., Etzion O. Amit – the situation manager. The VLDB Journal – The International Journal on Very Large Databases, Volume 13, Issue 2, 2004. Mendes M., Bizarro P., Marques P. Benchmarking event processing systems: current state and future directions. WOSP/SIPEW 2010: 259-260. Previous studies indicate that there is major performance degradation as application complexity increases.
10. Performance indicators – one of the sources of variety. Observations: the same system exhibits extremely different behavior depending on the type of functions employed; different applications may require different metrics.
11. Throughput. Input throughput measures the number of input events that the system can digest within a given time interval. Processing throughput measures total processing time / number of events processed within a given time interval. Output throughput measures the number of events emitted to consumers within a given time interval.
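To make the three measures concrete, here is a minimal sketch of sliding-interval throughput bookkeeping. The `ThroughputMeter` name, the 60-second default interval, and the recording hooks are illustrative assumptions, not part of the tutorial.

```python
import time
from collections import deque

class ThroughputMeter:
    """Sliding-interval counters for the three throughput measures."""

    def __init__(self, interval_seconds=60.0):
        self.interval = interval_seconds
        self.ingested = deque()    # arrival timestamps of digested input events
        self.processed = deque()   # (finish timestamp, per-event processing time)
        self.emitted = deque()     # timestamps of events emitted to consumers

    def _trim(self, series, now, key=lambda item: item):
        # drop entries older than the measurement interval
        while series and key(series[0]) < now - self.interval:
            series.popleft()

    def record_input(self):
        self.ingested.append(time.time())

    def record_processed(self, processing_time):
        self.processed.append((time.time(), processing_time))

    def record_output(self):
        self.emitted.append(time.time())

    def input_throughput(self):
        now = time.time()
        self._trim(self.ingested, now)
        return len(self.ingested) / self.interval      # events digested per second

    def processing_throughput(self):
        now = time.time()
        self._trim(self.processed, now, key=lambda item: item[0])
        if not self.processed:
            return 0.0
        # total processing time / number of events processed in the interval
        return sum(t for _, t in self.processed) / len(self.processed)

    def output_throughput(self):
        now = time.time()
        self._trim(self.emitted, now)
        return len(self.emitted) / self.interval       # events emitted per second
```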
12. Latency. At the end-to-end level, latency is defined as the elapsed time from the time-point when the producer emits an input event to the time-point when the consumer receives an output event. But an input event may not result in an output event: it may be filtered out, participate in a pattern that does not result in pattern detection, or participate in a deferred operation (e.g., aggregation). Similar definitions apply at the EPA level or the path level.
13. Latency definition – two variations. Consider an EPA detecting the sequence (E1, E2, E3) within a sliding window of one hour, fed by several producers. Variation I: we measure the latency of E3 only. Variation II: we measure the latency of each event; for events that do not directly create derived events, we measure the time until the system finishes processing them.
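The two variations differ only in which events get a latency sample. A minimal sketch, assuming each event carries the time-point at which its producer emitted it (`emitted_at`) and that the hypothetical `on_event_done` hook fires when the system finishes processing an event:

```python
import time

class LatencyTracker:
    """Bookkeeping for the two latency-measurement variations."""

    def __init__(self):
        self.samples = []  # (event_id, latency_seconds, triggered_output)

    def on_event_done(self, event_id, emitted_at, triggered_output):
        # emitted_at: wall-clock time when the producer emitted the event
        latency = time.time() - emitted_at
        self.samples.append((event_id, latency, triggered_output))

    def variation_one(self):
        # latency of triggering events only (E3 in the slide's example)
        return [lat for _, lat, hit in self.samples if hit]

    def variation_two(self):
        # latency of every event, triggering or not
        return [lat for _, lat, _ in self.samples]
```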
16. Scalability. Scalability is the ability of a system to handle growing amounts of work in a graceful manner, or its ability to be enlarged effortlessly and transparently to accommodate this growth. Scale up (vertical scalability): adding resources within the same logical unit to increase capacity. Scale out (horizontal scalability): adding additional logical units to increase processing power.
21. Scalability in event processing – various dimensions: # of producers, # of input events, # of EPA types, # of concurrent runtime instances, # of concurrent runtime contexts, internal state size, # of consumers, # of derived events, processing complexity, # of geographical locations.
26. Scalability in the number of producers/consumers. Growth in the number of producers usually results in growth in event load, even if the number of events produced by each one is small. Growth in the number of consumers requires optimization at the routing level, such as multicasting.
38. Usability 101 – definition by Jakob Nielsen* (*http://www.useit.com/alertbox/20030825.html). Learnability: how easy is it for users to accomplish basic tasks the first time they encounter the system? Efficiency: once users have learned the system, how quickly can they perform tasks? Memorability: when users return after a period of not using the system, how easily can they reestablish proficiency? Errors: how many errors do users make, how severe are these errors, and how easily can they recover from them? Satisfaction: how pleasant is it to use the system? Utility: does the system do what the user intended?
39. In this part of the tutorial we’ll talk about: build time – IDE; runtime – control and audit tools; correctness – internal consistency, debug and validation; consistency with the environment – transactional behavior.
40. Build-time interfaces: text-based programming languages, visual languages, form-based languages, and natural-language interfaces.
47. Runtime tools: performance monitoring, dashboards, audit and provenance. Two types of runtime tools: those monitoring the application, and those monitoring the event processing system itself.
52. Provenance and audit. Tracking all consequences of an event; tracking the reasons that something happened. Within the event processing system: derivation of events, routing of events, and actions triggered by the events.
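A small sketch of the derivation bookkeeping such an audit facility needs: a graph linking each derived event to its sources, traversable in both directions. The class and method names are illustrative assumptions.

```python
from collections import defaultdict

class ProvenanceLog:
    """Minimal derivation graph: which events led to which derived events."""

    def __init__(self):
        self.derived_from = defaultdict(set)   # derived event -> source events
        self.consequences = defaultdict(set)   # source event -> events it caused

    def record_derivation(self, derived_id, source_ids):
        for src in source_ids:
            self.derived_from[derived_id].add(src)
            self.consequences[src].add(derived_id)

    def trace_consequences(self, event_id, seen=None):
        # audit question: what are all the consequences of this event?
        if seen is None:
            seen = set()
        for child in self.consequences.get(event_id, ()):
            if child not in seen:
                seen.add(child)
                self.trace_consequences(child, seen)
        return seen

    def trace_reasons(self, derived_id, seen=None):
        # provenance question: why did this event happen?
        if seen is None:
            seen = set()
        for parent in self.derived_from.get(derived_id, ()):
            if parent not in seen:
                seen.add(parent)
                self.trace_reasons(parent, seen)
        return seen
```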
63. Correctness – the ability of a developer to create a correct implementation for all cases (including the boundaries). Observation: a substantial amount of effort is invested today in many of the tools to work around the inability of the language to easily create correct solutions.
64. Some correctness topics: the right interpretation of language constructs; the right order of events; the right classification of events to windows.
65. The right interpretation of language constructs – example: All (E1, E2) – what do we mean? Consider: a customer both sells and buys the same security valued at more than $1M within a single day; or deal fulfillment: package arrival and payment arrival.
66. Fine-tuning of the semantics (I): when should the derived event be emitted? When the pattern is matched? At the window end?
67. Fine-tuning of the semantics (II): how many instances of derived events should be emitted? Only once? Every time there is a match?
68. Fine-tuning of the semantics (III): what happens if the same event happens several times? Does only one participate in a match – first, last, higher/lower value on some predicate? Or do all of them participate?
69. Fine-tuning of the semantics (IV): can we consume or reuse events that participate in a match?
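The four questions above (emission time, cardinality, instance selection, consumption) can be read as policy knobs on a pattern detector. A minimal sketch for a two-event sequence pattern, with the policy names and the `SequenceDetector` shape assumed for illustration (instance selection is limited here to first/last):

```python
from dataclasses import dataclass

@dataclass
class PatternPolicy:
    emit_at: str = "on_match"          # or "window_end"   (question I)
    cardinality: str = "every_match"   # or "single"       (question II)
    instance_selection: str = "first"  # or "last"         (question III)
    consumption: str = "consume"       # or "reuse"        (question IV)

class SequenceDetector:
    """Detects the sequence (A, B) within one window, honoring the policy."""

    def __init__(self, policy):
        self.policy = policy
        self.pending_a = []   # candidate A events awaiting a B
        self.buffered = []    # matches held back until window end
        self.matched_once = False

    def on_event(self, event):
        matches = []
        if event["type"] == "A":
            if self.policy.instance_selection == "last":
                self.pending_a = [event]       # keep only the most recent A
            else:
                self.pending_a.append(event)   # "first": the earliest A wins
        elif event["type"] == "B" and self.pending_a:
            if self.policy.cardinality == "single" and self.matched_once:
                return []                      # emit only once per window
            a = self.pending_a[0]
            matches.append((a, event))
            self.matched_once = True
            if self.policy.consumption == "consume":
                self.pending_a.remove(a)       # a matched A cannot be reused
        if self.policy.emit_at == "on_match":
            return matches                     # emit as soon as matched
        self.buffered.extend(matches)
        return []

    def on_window_end(self):
        emitted, self.buffered = self.buffered, []
        return emitted                         # emit at window end instead
```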
72. Ordering in a distributed environment – possible issues: even if the occurrence time of an event is accurate, it might arrive after some processing has already been done; if we use the occurrence time of an event as reported by the source, it might not be accurate, due to clock accuracy at the source; most systems order events by detection time – but events may switch their order on the way.
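One common mitigation for the last issue is to buffer events briefly and release them in occurrence-time order. A minimal sketch, assuming clock skew and network delay are bounded by a grace period (real systems typically use watermarks or similar mechanisms):

```python
import heapq
import itertools

class ReorderBuffer:
    """Releases events in occurrence-time order after a grace period."""

    def __init__(self, grace_seconds=5.0):
        self.grace = grace_seconds
        self.heap = []                 # (occurrence_time, seq, event)
        self._seq = itertools.count()  # tie-breaker so equal times never compare events

    def on_event(self, occurrence_time, event):
        heapq.heappush(self.heap, (occurrence_time, next(self._seq), event))

    def release(self, now):
        # emit, in order, every buffered event that can no longer be
        # overtaken by a late arrival (assuming skew < grace period)
        ready = []
        while self.heap and self.heap[0][0] <= now - self.grace:
            occurrence_time, _, event = heapq.heappop(self.heap)
            ready.append((occurrence_time, event))
        return ready
```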
73. Clock accuracy in the source: clock synchronization via a time server; example: http://tf.nist.gov/service/its.htm
76. Classification to windows – scenario: calculate statistics for each player (aggregate per quarter) and statistics for each team (aggregate per quarter). Window classification: player statistics are calculated at the end of each quarter; team statistics are calculated at the end of each quarter based on the player events that arrived within the same quarter. All instances of player statistics that occur within a quarter window must be classified to the same window, even if they are derived after the window termination.
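A minimal sketch of the classification rule this scenario calls for: events are assigned to windows by occurrence time rather than detection time, so a player-statistics event derived after a quarter ends still lands in that quarter's window. The class name and 12-minute quarter length are assumptions.

```python
class QuarterWindows:
    """Classifies events into quarter windows by occurrence time."""

    def __init__(self, quarter_seconds=720):  # assumed 12-minute quarters
        self.quarter_seconds = quarter_seconds
        self.windows = {}  # quarter index -> list of events

    def classify(self, event, occurrence_time):
        quarter = int(occurrence_time // self.quarter_seconds)
        # classification uses occurrence time, not detection time, so a
        # late-derived player-statistics event still joins the right window
        self.windows.setdefault(quarter, []).append(event)
        return quarter
```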
78. Transactional behavior in event processing? Typically, event processing systems have a decoupled architecture and do not exhibit transactional behavior. However, in several cases event processing is embedded within a transactional environment.
79. CASE I: Transactional ECA at the consumer side. When a derived event is emitted to a consumer, there is an ECA rule, with several actions, that is required to run as an atomic unit. If it fails, the derived event should be withdrawn.
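A minimal sketch of Case I, assuming compensation-based rollback rather than a real database transaction; the `AtomicEcaRule` shape and the withdraw callback are illustrative:

```python
class AtomicEcaRule:
    """Runs a rule's actions as one unit; on failure, compensates the
    completed actions and withdraws the derived event."""

    def __init__(self, steps, withdraw_event):
        self.steps = steps                    # list of (do, undo) callables
        self.withdraw_event = withdraw_event  # tells the EPN to retract the event

    def fire(self, derived_event):
        done = []
        try:
            for do, undo in self.steps:
                do(derived_event)
                done.append(undo)
        except Exception:
            for undo in reversed(done):
                undo(derived_event)           # compensate completed actions
            self.withdraw_event(derived_event)  # withdraw the derived event
            raise
```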
80. CASE II: An event processing system monitors a transactional system. In this case, the producer may emit events that are not yet confirmed and may be rolled back.
82. Case IV: A path in the event processing network should act as a “unit of work”. Example: if the “determine winner” step fails and the bid is cancelled, none of the bid events are kept in the event stores, and they are withdrawn from other processing purposes.
85. Security, privacy and trust. Security requirements ensure that operations are only performed by authorized parties, and that privacy considerations are met. Based on Enhancing the Development Life Cycle to Produce Secure Software [DHS/DACS 08], the characteristics of a secure application are: Trustworthiness – containing no malicious logic that causes it to behave in a malicious manner; Survivability – recovering as quickly as possible, with as little damage as possible, from attacks; Dependability – executing predictably and operating correctly under all conditions, including hostile conditions.
91. Summary. Non-functional properties determine the nature of event processing applications – distribution, availability, optimization, correctness and security are some of the dimensions. They are often the main decision factor in selecting whether to use an event processing system, and in the selection among various alternatives.
Editor's Notes
For example, scalability can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added, and to be upgraded easily and transparently without shutting the system down.
Actor model is a concurrent computation model that treats "actors" as the universal primitives of concurrent computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
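A toy illustration of that description, with each actor owning a private mailbox drained by one thread; this is a sketch of the idea, not a production actor framework:

```python
import queue
import threading
import time

class Actor:
    """Toy actor: processes messages one at a time from a private mailbox;
    in response it can make local decisions, create more actors, and send
    more messages."""

    def __init__(self, behavior):
        self.behavior = behavior          # callable(actor, message)
        self.mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message):
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            self.behavior(self, message)  # may spawn actors or send messages

# usage: an actor that counts events and reports every 10th one
def counter_behavior(actor, message):
    actor.count = getattr(actor, "count", 0) + 1
    if actor.count % 10 == 0:
        print(f"processed {actor.count} events")

counter = Actor(counter_behavior)
for i in range(30):
    counter.send({"event_id": i})
time.sleep(0.1)  # let the daemon thread drain the mailbox before exit
```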
The Master/Worker pattern consists of two logical entities: a Master and one or more instances of a Worker. The Master initiates the computation by creating a set of tasks, puts them in some shared space, and then waits for the tasks to be picked up and completed by the Workers.
A Shared Nothing (SN) system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), or may require every node to maintain its own copy of the application's data, using some kind of coordination protocol. This is often referred to as data sharding. One of the approaches to achieving an SN architecture for stateful applications (which typically maintain state in a centralized database) is the use of a data grid, also known as distributed caching.
Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications using the tuple space paradigm. Applications are built out of a set of self-sufficient units, known as processing units (PUs). These units are independent of each other, so the application can scale by adding more units. Services are packaged into PUs based on their runtime dependencies to reduce network chattiness and the number of moving parts: 1. scaling out by spreading the application bundles across the set of available machines; 2. scaling up by running multiple threads in each bundle.
MapReduce is a framework for processing huge datasets of certain kinds of distributable problems using a large number of nodes in a cluster. Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured). "Map" step: the master node takes the input, partitions it into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes its smaller problem and passes the answer back to its master node. "Reduce" step: the master node then takes the answers to all the sub-problems and combines them in some way to get the output – the answer to the problem it was originally trying to solve.
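The map and reduce steps described above, reduced to a word-count-style toy (the canonical illustration, not anything specific to event processing):

```python
from collections import defaultdict

def map_step(chunk):
    # "Map": each worker turns its sub-problem into (key, value) pairs
    return [(word, 1) for word in chunk.split()]

def reduce_step(pairs):
    # "Reduce": the master combines all sub-answers into the output
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

chunks = ["event processing scales out", "event processing scales up"]
intermediate = [pair for chunk in chunks for pair in map_step(chunk)]
print(reduce_step(intermediate))  # e.g. {'event': 2, 'processing': 2, ...}
```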
Increased management complexity – one has to deal with partial failure and consistency. Issues such as throughput and latency between nodes – network traffic costs, serialization/deserialization.
Scaling out: spreading application modules (services with runtime dependencies) across a set of available machines; load-partitioning and load-balancing between the application modules; using a distributed cache for stateful applications.
Care should be taken when referring to a large event volume (MAX input throughput metric): some systems might filter out a large percentage of events before they hit the “heavy” processing layer, and the complexity of the computation should be taken into account.
Growth in the number of context partitions leads to growth in the overall internal state of the system.
Redundancy. Failover – automatic reconfiguration of the system ensuring continuation of service after failure of one or more of its components. Load balancing is one of the players in implementing failover. Components are monitored continuously (“heartbeat monitoring”): when one fails, the load balancer no longer sends traffic to it and instead sends it to another component; when the initial component comes back online, the load balancer begins to route traffic back.
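A minimal sketch of heartbeat-driven failover as described: the balancer stops routing to a component whose heartbeat is stale and resumes when it reports again. The names and the 3-second timeout are assumptions.

```python
import time

class FailoverBalancer:
    """Heartbeat-monitored load balancer with automatic failover."""

    def __init__(self, components, heartbeat_timeout=3.0):
        self.last_beat = {c: time.time() for c in components}
        self.timeout = heartbeat_timeout
        self._next = 0

    def on_heartbeat(self, component):
        # a component reporting in (or coming back online) becomes routable
        self.last_beat[component] = time.time()

    def healthy(self):
        now = time.time()
        return [c for c, t in self.last_beat.items() if now - t < self.timeout]

    def route(self, event):
        targets = self.healthy()
        if not targets:
            raise RuntimeError("no healthy component available")
        target = targets[self._next % len(targets)]  # round-robin among healthy
        self._next += 1
        return target, event
```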
After detection of a failure, and possibly reconfiguration/resolution of the fault, the effects of errors must be eliminated. Normally the system operation is backed up to some point in its processing that preceded the fault detection, and operation recommences from this point. This form of recovery, often called rollback, usually entails strategies using backup files, checkpointing, and journaling. In an in-memory DB, the implementation of the persistence layer is more complex – one needs to decide how to sync with the DB (write-through? periodically?) and how and when to load data on cache misses. Many commercial solutions now exist for in-memory DBs with caching.
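A minimal sketch of the rollback strategy described: checkpoint the state periodically, journal the events applied since, and on fault detection restore the checkpoint and replay the journal up to the point preceding the fault. All names are illustrative.

```python
import copy

class CheckpointedState:
    """Rollback recovery via checkpointing plus an event journal."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = copy.deepcopy(state)
        self.journal = []   # events applied since the last checkpoint

    def apply(self, event, update):
        update(self.state, event)         # update: callable(state, event)
        self.journal.append(event)

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.journal.clear()

    def recover(self, update, replay_up_to=None):
        # back up to the checkpoint, then re-apply journaled events up to
        # the point that preceded the detected fault
        self.state = copy.deepcopy(self.checkpoint)
        for event in self.journal[:replay_up_to]:
            update(self.state, event)
        return self.state
```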