US9459929B2 - Configurable dynamic load shedding method in distributed stream computing system - Google Patents
Configurable dynamic load shedding method in distributed stream computing system Download PDFInfo
- Publication number
- US9459929B2 US9459929B2 US13/962,971 US201313962971A US9459929B2 US 9459929 B2 US9459929 B2 US 9459929B2 US 201313962971 A US201313962971 A US 201313962971A US 9459929 B2 US9459929 B2 US 9459929B2
- Authority
- US
- United States
- Prior art keywords
- events
- drop ratio
- load shedding
- load
- applications
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Links
- 238000000034 method Methods 0.000 title claims abstract description 34
- 239000000872 buffer Substances 0.000 claims description 14
- 230000015654 memory Effects 0.000 abstract description 4
- FBOUIAKEJMZPQG-AWNIVKPZSA-N (1E)-1-(2,4-dichlorophenyl)-4,4-dimethyl-2-(1,2,4-triazol-1-yl)pent-1-en-3-ol Chemical compound C1=NC=NN1/C(C(O)C(C)(C)C)=C/C1=CC=C(Cl)C=C1Cl FBOUIAKEJMZPQG-AWNIVKPZSA-N 0.000 description 8
- 238000004364 calculation method Methods 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 241001522296 Erithacus rubecula Species 0.000 description 1
- 238000007792 addition Methods 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 230000003116 impacting effect Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Definitions
- the present invention relates generally to information systems used in distributed stream computing. Particularly, the present invention relates to overload management in distributed stream computing systems. Still more specifically, the present invention relates to load shredding methods in distributed stream computing systems.
- Stream computing is about producing a continuous stream of fresh computational results as new data or events are being input in real-time.
- Resource provisioning and allocation are particularly difficult due to the time-varying and sporadic nature of the occurrence of new data or events that induces unknown resource demands over time.
- the system Under overload condition in which the arriving rate of new data or event exceeds the capacity of the system, the system lacks the resources to process the new incoming data or events within a tolerable time period. Consequently the processing latency grows uncontrollably, compromising the freshness of the stream of computational results.
- Computing architectures and techniques have been developed to address the abovementioned problem.
- One such architecture is to divide computational resources into physical or logical units (nodes) for processing the input data or events, and distribute the incoming input data or events to the nodes according to a distribution scheme.
- the distribution scheme can be as simple as round robin or as complex as intelligent distribution based on constantly monitored load levels of the nodes.
- the advantage of such architecture is that computational processing can be distributed and performed in parallel, and physical/logical units of computational resources can be added or removed according to the actual runtime load levels, thus achieving scalability.
- Load shedding is a computing technique that discards some fraction of unprocessed input data or events in order to reduce the system load, in turn reduces the observable latency of the stream of computational results.
- One issue with load shedding is how to most efficiently discard unprocessed input data or events and yet ensuring the deviations from the perfect computing results are minimized.
- One of the load shedding strategies is to eliminate the incoming input data or events once the system resource capacity is reach; for example, when a buffer for holding input data or events to be processed is full.
- this strategy treats all input data or events in-discriminatively and does not account for the difference in importance of the input data or events. This leads to unpredictable or poor accuracy in the computational results.
- the rate of data or event elimination cannot be adjusted for varying input data or event arriving rate and observable processing latency during runtime.
- Another load shedding strategy is to continuously monitor the actual processing latency and/or resource (such as CPU and memories) utilization, compare with a pre-determined optimal processing latency and/or resource utilization rate, and discard randomly selected unprocessed input data or events based on the differences between the actual and optimal processing latencies and/or resource utilization rates.
- This strategy is described in the document: Kalyvianaki et al., Overload Management in Data Stream Processing Systems with Latency Guarantees , Sweden, 2012; the content of which is incorporated herein by reference in its entirety. This strategy, however, suffers the same problem of unpredictable or poor accuracy in the computing results.
- Some other load shedding strategies require the system to have active knowledge of the usage of the input data.
- the usage can be in the form of data queries of the input data specified by a user.
- the decisions of when and what to discard rely on the analysis of these queries in order to determine the different levels of importance of the input data. Runtime control of the discard decisions can be achieved by specially designed queries.
- the U.S. Patent Application Publication No. 2012/027,843 discloses a method of controlling load shedding for excluding data streams of a data process input into a data stream management system.
- Another example of one such load shedding strategy applies XML query processing on input data and makes discard decisions based on patterns of XML data structures.
- the details of this example is disclosed in the document: Wei et al., Utility - driven Load Shedding for XML Stream Processing , Worcester Polytechnic Institute, U.S.A., 2003; the content of which is incorporated herein by reference in its entirety.
- the downside of these load shedding strategies is that they are not flexible, and highly application and data specific.
- the load shedding method first observes the workload of each application and arriving rate of the incoming input data or events. If the system is under an overloading condition, calculate a input data or event drop ratio for each application such that the projected sum of all applications' workload will be at or below the system capacity when the unprocessed input data or events are dropped according to the drop ratio for each application.
- FIG. 1 shows, in a 2-dimensional space, the system capacity line of an exemplary distributed stream computing system having two applications, a current system load under an overloading condition, and three target projection points as aids in illustrating the presently claimed loading shedding method;
- FIG. 2 further shows the current system load being moved towards the target projection point in incremental steps
- FIG. 3 further shows a revised target projection point for system stability with guaranteed probability of buffer overflow control.
- load shedding methods and systems used in distributed stream computing systems and the likes are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
- the load shedding method approaches the problem of how much and which input data or events to drop by first defining an architecture of distributed stream computing system where a plurality of applications are deployed in one or more physical computing processing units with each including all necessary computing resources such as CPUs and memories, virtual partitions of computing processing units, or logical computing processing units (collectively referred to as “nodes”).
- Each node is running one or more instances of the applications.
- An application running in one or more nodes is denoted by App i .
- the application App i requires a certain amount of computing resources, denoted by C i , of the nodes to process an incoming input data or event.
- the arriving rate of the incoming input data or events to be processed by App i is the number of incoming input data or events which arrive in a unit of time, denoted by i .
- the required workload of the application App i for processing the input data or event in runtime is then ( 1 *C i ).
- the actual processing rate of the input data or events being processed by App i is the number of input data or events processed in a unit of time, denoted by x i .
- the computing capacity of a node is denoted by M j . Therefore, a foreseeable overloading condition can be defined as Sum i ( i *C i )>Sum j (M j ). In other words, when the sum of required workload of all applications exceeds the sum of all nodes' computing capacities, an overloading condition occurs.
- P(x 1 , x 2 , x 3 , . . . x N ) be a point in the multi-dimensional space and represents the current system load with all the applications running.
- P(x 1 , x 2 , x 3 , . . . x N ) is located on the system capacity line, the sum of all applications' actual workloads equals to the sum of all nodes' computing capacities.
- the load shedding module is to drop certain input data or events, and by doing so bring the system load to a target projection point on or below the system capacity line.
- FIG. 1 shows, in a 2-dimensional space, the system capacity line of an exemplary distributed stream computing system having two applications: App 1 and App 2 experiencing an overloading condition.
- a current system load, P 0 is located above the system capacity line; three target projection points of system load, P′ 1 , P′ 2 , and P′ 3 are identified.
- P′ 1 is achieved by dropping input data or events to be processed by App 1
- P′ 3 is achieved by dropping input data or events to be processed by App 2
- P′ 2 is achieved by dropping input data or events to be processed by App 1 and App 2 .
- P′ 2 is the optimal target projection point as the least number of input data or events will be dropped for each of App 1 and App 2 , hence least impacting the computational result accuracy of both applications.
- P′ 2 ( x 1 [2 ],x 2 [2]) ( x 1 [0 ] ⁇ C 1 *( C 1 *x 1 [0 ]+C 2 *x 2 [0] ⁇ Sum j ( M j ))/( C 1 ⁇ 2 +C 2 ⁇ 2)
- the applications' process rates, x 1 [0] and x 2 [0], at the current system load, P 0 are the input data or event arriving rates, 1 and 2 respectively.
- incoming input data or events are dropped incrementally according to an increasing load shedding percentage calculated for each application.
- the calculation takes into consideration the available buffer for each application to hold unprocessed incoming input data or events. Referring to FIG. 2 .
- the incremental drop ratio delta is: ( i ⁇ x′ i )/(n i * i ); where n i is a number proportional to the available buffer in App i for holding unprocessed incoming input data or events.
- the incremental drop ratio delta is then modified to be: ( i ⁇ x′ i )/(n i * i *s i ).
- the relative importance coefficient can be pre-configured, dynamically adjusted, and updated in runtime based on conditions of the applications and the distributed stream computing system. For example, to increase the computational result accuracy of an application, its corresponding relative importance coefficient value can be made larger.
- the arriving rates ( 1 , 2 , 3 , . . . N ) take the mean values with a standard deviation of r.
- the current system load, P becomes the center point of a shape in the multi-dimensional space having a volume proportional to r. Inside this shape are all the probable current system load values.
- the target projection point of system load, P′ must be set somewhere below the system capacity line to ensure system stability with guaranteed probability of buffer overflow control. For example, if P′ is set at a distance of 1 ⁇ r below the system capacity line, there is 68% confidence that the buffers will not overflow; 2 ⁇ r for 95% confidence; and 3 ⁇ r for 99.7% confidence.
- the current system load, P 0 is the center of a circle having a radius of r.
- the circle area contains all the probable current system load values.
- the target projection point of system load, P′ 2 is set at 3 ⁇ r below the system capacity line.
- a load shedding module implementing the method of the present claimed invention monitors the processing latencies of the nodes and if any one node is exhibiting an observed latency that is greater than a pre-defined user acceptable latency value, the load shedding module computes the target projection point of system load, a drop ratio, and an incremental drop ratio delta for each of the applications running in the distributed stream computing system.
- the target projection point can optionally be revised according to guaranteed probability of buffer overflow control requirements and a revised drop ratio and incremental drop ratio delta for each of the applications are determined.
- Each application drops its unprocessed input data or events by a load shedding percentage that is equal to its corresponding incremental drop ratio delta initially and increments by the same delta for each cycle until the observed average latency at each nodes is not greater than the pre-defined user acceptable latency value.
- the embodiments disclosed herein may be implemented using general purpose or specialized computing devices, computer processors, or electronic circuitries including but not limited to digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA), and other programmable logic devices configured or programmed according to the teachings of the present disclosure.
- DSP digital signal processors
- ASIC application specific integrated circuits
- FPGA field programmable gate arrays
- Computer instructions or software codes running in the general purpose or specialized computing devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.
- the present invention includes computer storage media having computer instructions or software codes stored therein which can be used to program computers or microprocessors to perform any of the processes of the present invention.
- the storage media can include, but are not limited to, floppy disks, optical discs, Blu-ray Disc, DVD, CD-ROMs, and magneto-optical disks, ROMs, RAMs, flash memory devices, or any type of media or devices suitable for storing instructions, codes, and/or data.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
Abstract
Description
System capacity line=C 1 *x 1 +C 2 *x 2 −Sum j(M j)=0
For P 0(x 1[0],x 2[0]),
P′ 2(x 1[2],x 2[2])=(x 1[0]−C 1*(C 1 *x 1[0]+C 2 *x 2[0]−Sumj(M j))/(C 1^2+C 2^2),
X 2[0]−C 2*(C 1 *x 1[0]+C 2 *x 2[0]−Sum j(M j))/(C 1^2+C 2^2)
x′ i =x i −C i*(Sumi(C i *x i)−Sumj(M j))/Sumi(C i^2).
p i=(x i −x′ i)/x i or ( i −x′ i)/ i for x i= i.
delta1=( 1 −x′ 1)/(n 1* 1) and delta2=( 2 −x′ 2)/(n 2* 2) respectively.
Sumi(x i *s i *C i)−Sumj(M j)=0, where s i is the relative importance coefficient of Appi.
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/962,971 US9459929B2 (en) | 2013-08-09 | 2013-08-09 | Configurable dynamic load shedding method in distributed stream computing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/962,971 US9459929B2 (en) | 2013-08-09 | 2013-08-09 | Configurable dynamic load shedding method in distributed stream computing system |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150046506A1 US20150046506A1 (en) | 2015-02-12 |
US9459929B2 true US9459929B2 (en) | 2016-10-04 |
Family
ID=52449556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/962,971 Expired - Fee Related US9459929B2 (en) | 2013-08-09 | 2013-08-09 | Configurable dynamic load shedding method in distributed stream computing system |
Country Status (1)
Country | Link |
---|---|
US (1) | US9459929B2 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102196492B1 (en) * | 2014-07-04 | 2020-12-30 | 삼성전자주식회사 | Method and apparatus for transmitting and receiving data in a communication system |
CN108052375B (en) * | 2017-12-29 | 2021-06-29 | 哈尔滨工业大学 | A host overload detection method |
US10649688B1 (en) | 2018-11-01 | 2020-05-12 | Intel Corporation | Precise longitudinal monitoring of memory operations |
US11729119B2 (en) * | 2021-11-18 | 2023-08-15 | Cisco Technology, Inc. | Dynamic queue management of network traffic |
US20230418676A1 (en) * | 2022-06-27 | 2023-12-28 | Uber Technologies, Inc. | Priority-based load shedding for computing systems |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6826620B1 (en) * | 1998-08-26 | 2004-11-30 | Paradyne Corporation | Network congestion control system and method |
US20050052992A1 (en) * | 2003-08-01 | 2005-03-10 | Cloonan Thomas J. | Method and system for dynamically managing cable data bandwidth based on channel congestion state and subscriber usage profile |
US20050223089A1 (en) * | 2004-04-05 | 2005-10-06 | Lee Rhodes | Network usage analysis system and method for detecting network congestion |
US6990529B2 (en) * | 2000-02-24 | 2006-01-24 | Zarlink Semiconductor V.N., Inc. | Unified algorithm for frame scheduling and buffer management in differentiated services networks |
US20120144064A1 (en) * | 2010-11-05 | 2012-06-07 | Cray Inc. | Progressive adaptive routing in a dragonfly processor interconnect network |
US20120278463A1 (en) | 2011-04-28 | 2012-11-01 | Seung-Woo Ryu | Method and apparatus for controlling load shedding in data stream management system |
US20130007223A1 (en) * | 2006-06-09 | 2013-01-03 | Qualcomm Incorporated | Enhanced block-request streaming system for handling low-latency streaming |
US20130031282A1 (en) | 2006-06-13 | 2013-01-31 | International Business Machines Corporation | Dynamic stabilization for a stream processing system |
US20130046901A1 (en) | 2011-08-18 | 2013-02-21 | International Business Machine Corporation | System and method for stream processing |
-
2013
- 2013-08-09 US US13/962,971 patent/US9459929B2/en not_active Expired - Fee Related
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6826620B1 (en) * | 1998-08-26 | 2004-11-30 | Paradyne Corporation | Network congestion control system and method |
US6990529B2 (en) * | 2000-02-24 | 2006-01-24 | Zarlink Semiconductor V.N., Inc. | Unified algorithm for frame scheduling and buffer management in differentiated services networks |
US20050052992A1 (en) * | 2003-08-01 | 2005-03-10 | Cloonan Thomas J. | Method and system for dynamically managing cable data bandwidth based on channel congestion state and subscriber usage profile |
US20050223089A1 (en) * | 2004-04-05 | 2005-10-06 | Lee Rhodes | Network usage analysis system and method for detecting network congestion |
US20130007223A1 (en) * | 2006-06-09 | 2013-01-03 | Qualcomm Incorporated | Enhanced block-request streaming system for handling low-latency streaming |
US20130031282A1 (en) | 2006-06-13 | 2013-01-31 | International Business Machines Corporation | Dynamic stabilization for a stream processing system |
US20120144064A1 (en) * | 2010-11-05 | 2012-06-07 | Cray Inc. | Progressive adaptive routing in a dragonfly processor interconnect network |
US20120278463A1 (en) | 2011-04-28 | 2012-11-01 | Seung-Woo Ryu | Method and apparatus for controlling load shedding in data stream management system |
US20130046901A1 (en) | 2011-08-18 | 2013-02-21 | International Business Machine Corporation | System and method for stream processing |
Non-Patent Citations (5)
Title |
---|
Kalyvianaki et al., Overload Management in Data Stream Processing Systems with Latency Guarantees, Dept. of Computing, Imperial College, London, UK. |
Marz, Storm Distributed and Fault-tolerant Realtime Computation, Twitter. |
Neumeyer et al., S4: Distributed Stream Computing Platform, Yahoo! Labs, Santa Clara, CA, U.S.A. |
Tatbul et al., Load Shedding in a Data Stream Manager, Proceedings of the 29th VLDB Conference, 2003, Berlin, Germany. |
Wei et al., Utility-driven Load Shedding for XML Stream Processing, Dept. of Computer Science, Worcester Polytechnic Institute, U.S.A. |
Also Published As
Publication number | Publication date |
---|---|
US20150046506A1 (en) | 2015-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109194584B (en) | Flow monitoring method and device, computer equipment and storage medium | |
US9459929B2 (en) | Configurable dynamic load shedding method in distributed stream computing system | |
CN107317864B (en) | A data equalization method and device for a storage device | |
CN111614746B (en) | Load balancing method and device of cloud host cluster and server | |
US8863140B2 (en) | Method for resource management allocating and freeing credits from and to a resource credit tree | |
US8479205B2 (en) | Schedule control program and schedule control method | |
US20080109814A1 (en) | Apparatus and method for balancing load in multi-core processor system | |
US10445344B2 (en) | Load balancing for large in-memory databases | |
CN109525500B (en) | Information processing method and information processing device capable of automatically adjusting threshold | |
US20150286492A1 (en) | Optimized resource allocation and management in a virtualized computing environment | |
US20110167427A1 (en) | Computing system, method and computer-readable medium preventing starvation | |
US20190253357A1 (en) | Load balancing based on packet processing loads | |
US9684366B2 (en) | Distributed power management system with plurality of power management controllers controlling zone and component power caps of respective zones by determining priority of other zones | |
CN110018781B (en) | Disk flow control method and device and electronic equipment | |
CN111078391A (en) | Service request processing method, device and equipment | |
CN107948084B (en) | Current limiting method and device | |
CN111245732A (en) | Flow control method, device and equipment | |
US20050155032A1 (en) | Dynamic load balancing | |
CN105677484A (en) | A multi-core CPU real-time data processing method with automatic load balancing | |
CN105740076B (en) | A kind of load-balancing method and device | |
CN112805684A (en) | Resource allocation using recovery borrowing | |
CN108200185B (en) | Method and device for realizing load balance | |
CN102622274A (en) | Computer device and interrupt task allocation method thereof | |
WO2018228323A1 (en) | Service level control method and system for on-line service system, and readable storage medium | |
WO2016065198A1 (en) | High performance hadoop with new generation instances |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HONG KONG APPLIED SCIENCE AND TECHNOLOGY RESEARCH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PARK, JI HYOUN;WU, KANGHENG;LEI, ZHI BIN;REEL/FRAME:030974/0430 Effective date: 20130722 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20241004 |