Quality of Service (QoS) and Distributed System Management

Description of the research project

Objectives

This project is concerned with distributed systems management in the context of electronic commerce applications, especially considering quality of service (QoS) issues and distributed query processing. The objectives are the following:

Background and rationale

In the context of electronic commerce, standardization of the equipment and software packages is not possible. Furthermore, the Internet is evolving as a collection of heterogeneous networks with different capabilities. Therefore the infrastructure has to be able to deal with subsystems of different capabilities. This is particularly true for the quality of service (QoS) of communication between the different end-systems involved in the application. It is expected that the Internet will, in the future, provide different classes of service quality, as opposed to the present uniform “best effort” service. This will probably be at a cost, and possibly involve the reservation of some appropriate system resources. It must also be noted that certain network access technologies, such as telephone lines with modems, impose restrictions on the available throughput. In certain situations, particularly for users accessing the system through mobile equipment, the available QoS may change over time, depending on the location of the user and other factors influencing the communication link.
It is difficult to provide a guaranteed quality by the network and by the host computers on which the distributed applications run, because (a) it is difficult to foresee the demand on the resources in the context where these resources are shared amount a very large number of users, and (b) because the requirements of each user may change over time. For example, if a video on demand client program is running in a window that gets minimized then this implies that this is of no interest to the user and hence the required has changed.  Another example occurs when a user requests access to information sources that were not foreseen in advance, thus increasing the bandwidth needs in the middle an on-going session.
One approach to accommodating this situation is to build applications that can adapt to different QoS availability. Some examples of such are described in [1], however, it would be interesting to identify certain more general negotiation schemes which could be used by applications to adapt to such low-quality situations. In our previous project on “Quality of service negotiation and adaptation”, we have developed solutions for applications involving access to remote multimedia databases [2, 3] and for certain forms of broadcasting applications [4]. However, the global system management aspects related to the provisioning of negotiated QoS have not yet been sufficiently. For example, there is a need to understand (i) scalable models for distributed systems management, (ii) the role played by the different components such as database systems, search engines or image servers, or (iii) the influence of QoS performance of the networks on parallel database systems.
A complementary approach to QoS management is to try to adapt the resource management by providing for different classes of users. This corresponds to the notion of "differentialed services" in the Internet and to the frequent practice in commerce to distinguish between different classes of clients. This is the approach that we will pursue within this project.
There are two other issues of general concern in this area. One is the need for systems that can handle a very large number of users. This implies the need for a scalable system design. So far, the question of system scalability has largely been studied (a) in relation with parallel processing computer architectures, where a large number of processing units should be easily integrated, and (b) in relation to database storage systems, where a large number of documents should be easily accessible, often to a large number of concurrent users. The latter case also applies to the electronic commerce application; however, in this project we intend to mainly concentrate on the scalability of the system management functions which are related to QoS negotiation, for instance finding the database server and network connection which can provide the desired QoS.
Another issue is the need for a high-level characterization of the QoS from the user's point of view, and the need to translate this view into the details of the system parameters of the various resources, such as CPU and network bandwidth usage. At the high level, QoS expectations can be expressed in the form of policies. For example, the developer of an application with a video component might identify a QoS such as "normal television" as opposed to specifying the number of CPU cycles and the data throughput required.  Polices allow a user or application developer to specify what their expectations of the QoS are, not how to achieve the expected QoS.   Violations of policies are violations of expected QoS.   Diagnosis algorithms determine the cause.  Adaptation actions include resource allocations, which are dynamically adjusted until the delivered QoS meets policy expectations.
Our previous work in application management [5, 6, 7] concluded most existing approaches to the management of applications is ad-hoc. We believe that the use of policies is the basis for answers to these questions. The importance of policies has been recognized by many researchers, but most attention has been focused on specific issues, such as policy definition and analysis [8], policy classification [9], and policy enforcement [10], which usually refers to a formal definition of how policies are presented as well as an architecture that shows how a policy system can be realized.

Significance to the shared goal of the major project

In the large-scale distributed environment provided for electronic commerce applications, a large number of servers store and manage information (e.g., catalogs) for buying and selling goods and services. In such an environment, it is mandatory to choose between different possibilities for accessing and delivering information over different network interconnections. Thus, the quality dimension of the provided service and information becomes of prime interest to avoid being overloaded with irrelevant information, or inadequate or costly services. Quality of service management should then be considered at the different levels of the application, from a user-perspective to a technical perspective inside the different components. Addressing the issue of QoS management is one of the general objectives of the major project to allow the development of the technological infrastructure to facilitate electronic commerce on the Internet.
This project is particularly related to the project “System and network architecture and application behavior modeling” (98-6-5), which performs detailed studies of performance issues. In this project we are more concerned with the management of performance issues and the scalability of different algorithms at a higher level of abstraction. Collaboration with Wong is assured by joint supervision of a Ph.D. student (M.  Salem).

Research progress

Milestone 1: Definition of several distributed algorithms for load sharing and management between brokers, users and servers:   We have defined an architecture that introduces a brokerage function between clients and servers. Brokers continuously monitor the performance of affiliated servers and assign them to clients according to a previously defined server selection policy. We have defined several server selection policies to optimize the capacity of the system based on server performance characteristics and users requirements. In order to make realistic predictions about the performance of the different servers, we have developed an approach to estimate the server performance as a function of the number of concurrent requests that may share the server at a given time (report in preparation). This estimation is based on a server performance model, which is obtained by monitoring the server and its behavior under various load conditions. We do also monitor the behavior of the clients through the estimation of their think time and the mean response time they are receiving at the different servers. This information is used to estimate the load that each client generates at a server. During the fall of 1999, we worked on an analysis study of the performance of our architecture and its different load-sharing algorithm by simulation. This study is expected to be completed by February 2000, and will be submitted for publication.

Milestone 2: Prototype implementation of one of the above distributed algorithm for load sharing:   We are working on an implementation of the above algorithms in a broker prototype, which will be used to validate the conclusion of our simulations. The selection of a server at the broker is based on performance information obtained by monitoring the servers. A Master student at the University of Ottawa developed a first version of the monitoring agents at the server side, which monitors the performance of an Apache server under the Windows-NT operating system. Work is ongoing to port this system to a Unix platform in order to integrate it into the electronic commerce prototype system developed in collaboration with the other project partners. A first implementation of the broker is expected to be completed by the end of March 2000.

Milestone 3: Definition of a QoS-based cost model for query processing: We have investigated distributed query processing, particularly cost-based query optimization, and we have proposed an approach in [23] which considers QoS (Quality of Service) both from the user's requirements perspective and from the network service availability. We have also proposed an adaptive cost model for distributed query processing in [24]. Our cost model is adaptive in the sense that first, it combines multiple optimization criteria, response time and money cost, into a simple cost model and second, it can give a more precise communication cost estimation according to the information captured by the QoS manager. This cost model is flexible because it can capture the user's willingness to pay for the query and the performance dynamics of the computer system. Accordingly, we can also consider two different optimization criteria: the user’s criteria considering the delivered response time versus the cost of the query (based on existing tariff structures), and the system’s criteria considering overall optimal resource utilization, the satisfaction of the user’s response time requirements and the net income from the usage charges. We also identified two network QoS parameters: end-to-end delay and available bandwidth,  and introduced methods for measuring them.

Milestone 4: Design and prototype implementation of query optimization strategies integrating QoS-based cost models and translating QoS requirements into query optimization criteria: Given the general approach described above, we plan to implement a prototype for QoS based distributed query processing. Based on this proposal, we have defined the data models used for the prototype implementation. Three data models are of interest: the user profile, the global catalog of distributed database schemas and the measured QoS information concerning the network and server load. The user profile is helpful for translating QoS requirements into query optimization criteria; it is also useful for guiding the optimizer to choose the correct cost model. The global catalog and the QoS information are mainly used by the cost models. We have identified the basic functional modules in the prototype and plan to have an implementation ready this spring.

Milestone  5: Design and prototype implementation of algorithm for adjusting CPU priorities in order to maintain several differentiated classes of service on a single server: We specifically examined applying an algorithm for service differentiation to Web servers (since it is the cornerstone of many Web  applications) and implemented a version of our design for the Apache server.   When a user request comes into the Apache server, it gets assigned its own process.  This process registers with the QoS module.  Its reference handle is put into a queue and the process is put to sleep. The QoS module has a scheduling algorithm that wakes up processes based on policies.  For example, assume that we have two classes of service: A and B. Assume that the "A" class is the premium class. One policy is that there can be at most "M" B class processes executing and at most "N" A class processes executing (M < N).  The disadvantage of this approach is that if there are few A class processes then the CPU is not being effectively used.  Another algorithm treats A and B classes equally until there are a certain number of complaints from A processes indicating that they are taking too long to process.  The number of non-sleeping B class processes is reduced.    We have developed and experimented with several scheduling algorithms. The QoS module was designed so that it is relatively easy to change the scheduling algorithm.

Research plan

In the context of large-scale applications such as electronic commerce running over the Internet, the handling of very large numbers of users is essential. We propose therefore to investigate the impact of the overall system architecture and the QoS strategies on scalability. Scalability depends on the distribution of the different system functions and data over the various physical system components and the distributed algorithms used for the processing of the user requests, such as electronic commerce transactions, and for the system management aspects, such as QoS management. The system architecture under consideration is heterogeneous and consists of large number client machines, specific servers such as database servers, video servers, names servers, transport system, and networks. In addition to the servers that contain the virtual catalogs and those that perform the commercial and financial transactions, we pay particular attention to the functions related to management of QoS issues, distributed query processing, and the adaptation of applications to low QoS availability. Some of these functions may be integrated in the electronic commerce applications (client and server sides), while others may be provided by separate agents, such as name servers or traders.
Within this context, we will develop algorithms for QoS management at the application level. Good scalability properties will be our major design goal. Quality of service management is a distributed functionality and can be decomposed into different steps: specification, mapping, negotiation, resource reservation, adaptation and monitoring.
The different components of the distributed system for EC have specific performance constraints that should be taken into account to provide the QoS level desired by the user. Each component can also execute specific tasks during the different steps of QoS management. We will extend our previous work in order to define the role of the different components in making QoS management decisions. We will propose a QoS architecture allowing the different components to process specific tasks of QoS management strategies and to contribute to system scalability.
The scalability of a system can be defined as its capacity to handle the addition of users and resources without suffering a noticeable loss of performance or increase in administrative complexity. Performance is related to the QoS delivered by the system to the user, while administrative complexity is related to the management of QoS and the interactions between the different components in the system.
We identify three subprojects that address these issues, looking at specific questions in the context of electronic commerce applications.

(A) Performance model for the electronic shopping application, including access control and quality of service negotiation.

We consider a collection of servers and a large number of geographically distributed users. We consider the problem of load sharing which is coordinated by brokers which use QoS information obtained from the servers and other brokers through dynamic monitoring. We also consider access control mechanism. We will study scalable distributed algorithms for such systems and do performance modelling of the global system, including the network.
During 1999-2000, we will define several distributed algorithms for load sharing and management operating between brokers, users and servers, and evaluate their performance properties. We will also build a prototype broker that will implement one of load sharing algorithms. This will be built using a performance model that characterizes the performance of a single electronic commerce server and will be used to predict the performance of a server under various access conditions. The system will dynamically monitor the performance status each system component and use this information to balance the load.
During the subsequent years, we plan to perform experiments with the prototype system in order to understand eventual bottlenecks. We will also prototype a few other promising algorithms and do a comparison through simulation and real-world experiments.

 (B) Distributed Query Processing Using QoS Information

Multimedia streams and high volumes of data are an important characteristic of actual distributed information systems such as electronic commerce applications. High performance database servers are then required to support these applications. Specific architectures have been proposed and among them, parallel shared-nothing database servers running over an ATM network seem to be very promising. Such architectures lead to a revision of the traditional assumptions for database query processing since network performance and server load are unpredictable.  Traditional query optimizers aim at achieving one of the two optimization goals: response time or total resource consumption. However, in most distributed multimedia applications, this assumption is not enough, because (i) user's concerns vary in different applications and (ii) the quality of communication is not constant.  For example, the user might care more about the service charge he can afford. Therefore, the optimizer should also take different optimization goals into account. In the case of an ATM cluster of servers, or in the case of geographically distributed servers communicating over the future Internet with differentiated services, optimization could also take into account the negotiated QoS parameters of the network, such as available throughput, loss rate, or communication delay.
In this part of the project we investigate how we could integrate QoS management strategies for parallel query optimization. The problem is to optimize execution time and/or resource consumption for executing a query over an ATM cluster in a shared-nothing parallel environment. We aim at proposing distributed query processing strategies in the presence of QoS information and examining how such strategies can be integrated into the IBM/DB2 Parallel Edition database system.
During 2000-2001, we will finalize our prototype implementation for QoS based query processing based on our proposed cost model. We also plan to do some experiments and simulations to justify its agility in the  face of changing network performance. We will demonstrate how our QoS-based query processing strategy can be implemented using existing database systems. This experimentation is expected to provide us with feedback about the feasibility of our approach and may lead to new ideas for improvements of the proposed algorithms. Such improvements, including the case of multiple copies of data, will be proposed during the last year of the project.

 (C) Service differentiation in the servers

In the context of electronic commerce, different classes of service are often considered, either depending on a classification of users in occasional buyers, regular clients and VIP clients, or depending on a service charge which the user is willing to pay. It is therefore appropriate to provide different classes of services, not only in terms of available options, but also in terms of QoS, especially server response time. In this subproject, we look at the issues related to providing differentiated classes of performance at the server level.
A policy can be used to state the expected waiting time with each class of customer.  We have begun work that dynamically adjusts the CPU allocations given to an application until the quality of service it delivers meets policy expectations.  If the delivered quality of service exceeds expectations by a large margin, the process is given a less generous CPU allocation (CPU allocations are done by adjusting the priority of processes).  During 1999-2000, we will extend this work by examining algorithms for determining how to best adjust CPU priorities and how to take into account other hardware resources. During the subsequent years, we will investigate the inclusion of differentiated classes of response time within the context of load sharing and distributed query processing, as discussed in the other subprojects above.

(D) Research under a proposed IOR project

In the context of a related NSERC IOR grant application, we propose to do related research on policies on QoS management and management information bases (MIBs) for QoS related information in a distributed systems context.
For policy-enforcement, we need to characterize the data that describes the run-time behavior of the managed system. This characterization is done by identifying entities and attributes that describe the run-time behavior of those entities. These are derived from the policies. For many types of policies, especially those representing QoS requirements, the entities and attributes are not yet well understood. To date, the focus has been on issues such as bandwidth, cell delay, cell delay variance, ensuring servers are operational, etc. [19,20]. Very little research has been done on the mapping of application layer policies to network and host resources. Hence, part of our research will conduct experiments to analyze the impact that an application has on host resource usage. Our investigation will include the following:
• Determine a set of measurement variables, including those that measure responsiveness, productivity, utilization and errors/failures. This would provide a possible set of data that characterize application behavior.
• Run a variety of experiments with varying system parameters, workload parameters, and measurable applications that can be used to determine the usefulness of the different measurement variables in determining whether an application-level policy is being satisfied. This work will become the basis for research in adaptive statistical techniques for early policy violation (i.e., early detection of the existence of a fault that causes a policy violation).

We will apply our previous experience with management of distributed applications to develop Management Information Bases (MIB) for QoS management, containing the set of QoS parameters describing the performance, availability or reliability of the different components of the distributed multimedia system including application components. In wide-area systems, integration, federation and inter-operation of QoS MIB should be provided. For that purpose, we propose to design and implement extensible QoS MIB managers offering basic services to store, access, share, transfer, produce or analyze QoS information. QoS MIB managers should be extensible in the sense that they should integrate mechanisms for integrating new QoS information and services. Services provided by QoS managers will be dedicated to the different components of the distributed multimedia system to support distributed QoS decision models.
We will also apply our previous experience to develop techniques to monitor the measurement variables found in the QoS MIBs. This implies the need for application processes with embedded instrumentation. In previous work, we developed a set of sensors (instrumentation code) that encapsulates management data and provides controlled access to that management data, and actuators that are used for control purposes. We will also consider the remote access to these MIB using appropriate communication protocols.
Based on our previous experience, we have come to realize that the development of monitoring and control services is difficult, time-consuming and ad-hoc.  One of the reasons is that design issues have not been separated from implementation issues.   Toolkits are needed that facilitate the development of the monitoring and control services.  An example of such a toolkit would be for instrumentation where, based on the developer’s choices, the instrumentation code is semi-automatically embedded into application code.  This means that the developer can focus on the instrumentation needed (design issue) and not the implementation details needed to get the instrumentation embedded into the application.

Milestones (not including subproject (D)

March 01 • Analysis of the performance of several distributed algorithms for load sharing, using experiments with prototype implementation and performance simulations.
 • Performance analysis of query optimization strategy based on experimentation with the prototype and simulation studies; improved optimization strategies.
 • Design and prototype implementation of algorithms for adjusting multiple resource allocations in order to maintain several differentiated classes of service on a single server.
March 02 • Improved distributed algorithms for load sharing, including management of differentiated service classes.
 • Extension of query optimization strategies for the case of duplication of data.

References

[1] J. Gecsei, “Adaptation in distributed multimedia systems,” IEEE Multimedia, 1997.
[2] G. v. Bochmann and A. Hafid, “Some principles for quality of service management”, Distributed Systems Engineering Journal, 4: 16-27, 1997.
[3] A. Hafid and G. v. Bochmann, “Quality of service adaptation in distributed multimedia applications”, ACM Multimedia Systems 1998, Vol. 6, no 5,  pp 299-315
[4] S. Fischer, A. Hafid, G. v. Bochmann, and H. d. Meer, “Cooperative quality of service management for multimedia applications”, in Proceedings of the 4th IEEE International Conference on Multimedia Computing and Systems, Ottawa, Canada, June 1997, pp. 303-310.
[5] M.A. Bauer, R.B. Bunt, A. El Rayess, P.J. Finnigan, T. Kunz, H.L. Lutfiyya, A.D. Marshall, P. Martin, G.M.  Oster, W. Powley, J. Rolia, D.J. Taylor, and M. Woodside, “Services supporting management of distributed applications and systems”, IBM Systems Journal, 36 (4), 1997.
[6] M. Katchabaw, S. Howard, H. Lutfiyya, A. Marshall, and M. Bauer, “Making distributed applications manageable through instrumentation”, In Press,  The Journal of Systems and Software, 1999.
[7] H. Lutfiyya, A. Marshall, H. Bauer, P. Martin, and W. Powley, “Configuration Maintenance for Distributed Application Management”, Journal of Network and Systems Management, In Press, 1999.
[8] E. Lupu, and M. Sloman, “Conflict Analysis for Management Policies”, Integrated Network Management V, May 1997, pages 430-444.
[9] R. Wies, “Using a Classification of Management Policies for Policy Specification and Policy Transformation”, Integrated Network Management IV. Elsevier Science Publishers, 1995.
[10] T. Koch. C. Krell, and B. Kramer, “Policy Definition Language for Automated Management of Distributed Systems”, In Proc. Second International IEEE Workshop on Systems Management, Toronto, June 1996.
[11] G.v. Bochmann, B. Kerhervé and M. V. Mohamed-Salem, Quality of service management issues in electronic commerce applications, presented at IBM Workshop on Electronic Commerce, Sept. 1998; to be published in a book.
[12] S. Brobst and B. Vecchione, DB2 UDB: Starbust Grows Bright, Database Programming and Design, February 1998.
[13] IBM DB2 Universal Database Administration Guide, Version 5.2, IBM Corp. 1998
[14] H. Ye, B. Kerhervé, G. Bochmann "Quality of Service Management and Distributed/Parallel Query Processing", Technical Report, June 1998
 [15] A. Hafid, G. v. Bochmann, and R. Dssouli, “Quality of service negotiation with present and future reservations: a detailed study,  to appear in Computer Networks and ISDN Systems.
[16] A. Hafid and G. v. Bochmann, “Quality of service adaptation in distributed multimedia applications, ACM Multimedia Systems Journal, Vol. 6, No. 5, 1998
[17] A. Hafid and G. v. Bochmann, “A general framework for quality of service managementÏ,  to appear in Multimedia Tools and Applications Journal.
[18] J.W. Wong, K.A. Lyons, R.J. Velthuys, G.v. Bochmann, E. Dubois, N.D. Georganas, G. Neufeld, M.T. Ozsu, J. Brinskelle, D.F. Evans, A. Hafid, N. Hutchinson, P. Iglinski, B. KerhervÈ, L. Lamont, D. Makaroff, and D. Szafron, “Enabling Technology for Distributed Multimedia Applications”, IBM Systems Journal, 36(4): 489-507, 1997.
[19] E.Madja, A. Hafid, R.Dssouli, G.v. Bochmann and J.Gecsei, Meta-data modelling for QoS management in the WWW, Int. Conf. on Multimedia Modeling, Lausanne, Switzerland, 1998
[20] H. Lutfiyya, A. Marshall, M. Bauer, and D. Stokes, “A Policy-Driven Approach to Availability and Performance Management in Distributed Systems”, Journal of Network and Systems Management, conditionally accepted, 1998.
[21] M. Katchabaw, H. Lutfiyya, and M. Bauer,  “Driving Resource Management with Application-Level Quality of Service Specifications”.  First International Conference on Information and Computation Economies (ICE98), October, 1998. Also to be published in Journal of Decision Support Systems.
[22] M. Katchabaw, H. Lutfiyya, and M. Bauer,  “A Model of Resource Management to Support End-to-End Application-Driven Management”.  First International Conference on Information and Computation Economies (ICE98), October, 1998. Also to be published in Journal of Decision Support Systems.
[23] H. Ye, B. Kerhervé, G. v. Bochmann, QoS-aware distributed query processing, DEXA Workshop on Query Processing in Multimedia Information Systems (QPMIDS),  10th International Workshop on Database & Expert Systems Applications, Florence, Italy, 1-3 September, 1999, Proceedings published by IEEE Computer Society, 1999.
[24] H. Ye, G.v. Bochmann, B. Kerhervé, An adaptive cost model for distributed query processing, UQAM Technical Report, November 1999.

Last updated: June 2000