Quality of Service (QoS) and Distributed System Management
Description of the research project
Objectives
This project is concerned with distributed systems management in the context
of electronic commerce applications, especially considering quality of
service (QoS) issues and distributed query processing. The objectives are
the following:
-
To develop policies using the principle of different classes of quality
for the management of distributed applications in servers and over the
Internet.
-
To develop scalable models for distributed system management, which are
suitable for supporting a very large number of electronic commerce buyers
and vendors, especially addressing QoS issues.
-
To develop query optimization techniques for parallel shared-nothing database
servers using QoS information provided by the underlying networks.
Background and rationale
In the context of electronic commerce, standardization of the equipment
and software packages is not possible. Furthermore, the Internet is evolving
as a collection of heterogeneous networks with different capabilities.
Therefore the infrastructure has to be able to deal with subsystems of
different capabilities. This is particularly true for the quality of service
(QoS) of communication between the different end-systems involved in the
application. It is expected that the Internet will, in the future, provide
different classes of service quality, as opposed to the present uniform
“best effort” service. This will probably be at a cost, and possibly involve
the reservation of some appropriate system resources. It must also be noted
that certain network access technologies, such as telephone lines with
modems, impose restrictions on the available throughput. In certain situations,
particularly for users accessing the system through mobile equipment, the
available QoS may change over time, depending on the location of the user
and other factors influencing the communication link.
It is difficult to provide a guaranteed quality by the network and
by the host computers on which the distributed applications run, because
(a) it is difficult to foresee the demand on the resources in the context
where these resources are shared amount a very large number of users, and
(b) because the requirements of each user may change over time. For example,
if a video on demand client program is running in a window that gets minimized
then this implies that this is of no interest to the user and hence the
required has changed. Another example occurs when a user requests
access to information sources that were not foreseen in advance, thus increasing
the bandwidth needs in the middle an on-going session.
One approach to accommodating this situation is to build applications
that can adapt to different QoS availability. Some examples of such are
described in [1], however, it would be interesting to identify certain
more general negotiation schemes which could be used by applications to
adapt to such low-quality situations. In our previous project on “Quality
of service negotiation and adaptation”, we have developed solutions for
applications involving access to remote multimedia databases [2, 3] and
for certain forms of broadcasting applications [4]. However, the global
system management aspects related to the provisioning of negotiated QoS
have not yet been sufficiently. For example, there is a need to understand
(i) scalable models for distributed systems management, (ii) the role played
by the different components such as database systems, search engines or
image servers, or (iii) the influence of QoS performance of the networks
on parallel database systems.
A complementary approach to QoS management is to try to adapt the resource
management by providing for different classes of users. This corresponds
to the notion of "differentialed services" in the Internet and to the frequent
practice in commerce to distinguish between different classes of clients.
This is the approach that we will pursue within this project.
There are two other issues of general concern in this area. One is
the need for systems that can handle a very large number of users. This
implies the need for a scalable system design. So far, the question of
system scalability has largely been studied (a) in relation with parallel
processing computer architectures, where a large number of processing units
should be easily integrated, and (b) in relation to database storage systems,
where a large number of documents should be easily accessible, often to
a large number of concurrent users. The latter case also applies to the
electronic commerce application; however, in this project we intend to
mainly concentrate on the scalability of the system management functions
which are related to QoS negotiation, for instance finding the database
server and network connection which can provide the desired QoS.
Another issue is the need for a high-level characterization of the
QoS from the user's point of view, and the need to translate this view
into the details of the system parameters of the various resources, such
as CPU and network bandwidth usage. At the high level, QoS expectations
can be expressed in the form of policies. For example, the developer of
an application with a video component might identify a QoS such as "normal
television" as opposed to specifying the number of CPU cycles and the data
throughput required. Polices allow a user or application developer
to specify what their expectations of the QoS are, not how to achieve the
expected QoS. Violations of policies are violations of expected
QoS. Diagnosis algorithms determine the cause. Adaptation
actions include resource allocations, which are dynamically adjusted until
the delivered QoS meets policy expectations.
Our previous work in application management [5, 6, 7] concluded most
existing approaches to the management of applications is ad-hoc. We believe
that the use of policies is the basis for answers to these questions. The
importance of policies has been recognized by many researchers, but most
attention has been focused on specific issues, such as policy definition
and analysis [8], policy classification [9], and policy enforcement [10],
which usually refers to a formal definition of how policies are presented
as well as an architecture that shows how a policy system can be realized.
Significance to the shared goal of the major project
In the large-scale distributed environment provided for electronic commerce
applications, a large number of servers store and manage information (e.g.,
catalogs) for buying and selling goods and services. In such an environment,
it is mandatory to choose between different possibilities for accessing
and delivering information over different network interconnections. Thus,
the quality dimension of the provided service and information becomes of
prime interest to avoid being overloaded with irrelevant information, or
inadequate or costly services. Quality of service management should then
be considered at the different levels of the application, from a user-perspective
to a technical perspective inside the different components. Addressing
the issue of QoS management is one of the general objectives of the major
project to allow the development of the technological infrastructure to
facilitate electronic commerce on the Internet.
This project is particularly related to the project “System and network
architecture and application behavior modeling” (98-6-5), which performs
detailed studies of performance issues. In this project we are more concerned
with the management of performance issues and the scalability of different
algorithms at a higher level of abstraction. Collaboration with Wong is
assured by joint supervision of a Ph.D. student (M. Salem).
Research progress
Milestone 1: Definition of several distributed algorithms for load
sharing and management between brokers, users and servers:
We have defined an architecture that introduces a brokerage function between
clients and servers. Brokers continuously monitor the performance of affiliated
servers and assign them to clients according to a previously defined server
selection policy. We have defined several server selection policies to
optimize the capacity of the system based on server performance characteristics
and users requirements. In order to make realistic predictions about the
performance of the different servers, we have developed an approach to
estimate the server performance as a function of the number of concurrent
requests that may share the server at a given time (report in preparation).
This estimation is based on a server performance model, which is obtained
by monitoring the server and its behavior under various load conditions.
We do also monitor the behavior of the clients through the estimation of
their think time and the mean response time they are receiving at the different
servers. This information is used to estimate the load that each client
generates at a server. During the fall of 1999, we worked on an analysis
study of the performance of our architecture and its different load-sharing
algorithm by simulation. This study is expected to be completed by February
2000, and will be submitted for publication.
Milestone 2: Prototype implementation of one of the above distributed
algorithm for load sharing: We are working on an implementation
of the above algorithms in a broker prototype, which will be used to validate
the conclusion of our simulations. The selection of a server at the broker
is based on performance information obtained by monitoring the servers.
A Master student at the University of Ottawa developed a first version
of the monitoring agents at the server side, which monitors the performance
of an Apache server under the Windows-NT operating system. Work is ongoing
to port this system to a Unix platform in order to integrate it into the
electronic commerce prototype system developed in collaboration with the
other project partners. A first implementation of the broker is expected
to be completed by the end of March 2000.
Milestone 3: Definition of a QoS-based cost model for query processing:
We have investigated distributed query processing, particularly cost-based
query optimization, and we have proposed an approach in [23] which considers
QoS (Quality of Service) both from the user's requirements perspective
and from the network service availability. We have also proposed an adaptive
cost model for distributed query processing in [24]. Our cost model is
adaptive in the sense that first, it combines multiple optimization criteria,
response time and money cost, into a simple cost model and second, it can
give a more precise communication cost estimation according to the information
captured by the QoS manager. This cost model is flexible because it can
capture the user's willingness to pay for the query and the performance
dynamics of the computer system. Accordingly, we can also consider two
different optimization criteria: the user’s criteria considering the delivered
response time versus the cost of the query (based on existing tariff structures),
and the system’s criteria considering overall optimal resource utilization,
the satisfaction of the user’s response time requirements and the net income
from the usage charges. We also identified two network QoS parameters:
end-to-end delay and available bandwidth, and introduced methods
for measuring them.
Milestone 4: Design and prototype implementation of query optimization
strategies integrating QoS-based cost models and translating QoS requirements
into query optimization criteria: Given the general approach described
above, we plan to implement a prototype for QoS based distributed query
processing. Based on this proposal, we have defined the data models used
for the prototype implementation. Three data models are of interest: the
user profile, the global catalog of distributed database schemas and the
measured QoS information concerning the network and server load. The user
profile is helpful for translating QoS requirements into query optimization
criteria; it is also useful for guiding the optimizer to choose the correct
cost model. The global catalog and the QoS information are mainly used
by the cost models. We have identified the basic functional modules in
the prototype and plan to have an implementation ready this spring.
Milestone 5: Design and prototype implementation of algorithm
for adjusting CPU priorities in order to maintain several differentiated
classes of service on a single server: We specifically examined applying
an algorithm for service differentiation to Web servers (since it is the
cornerstone of many Web applications) and implemented a version of
our design for the Apache server. When a user request comes
into the Apache server, it gets assigned its own process. This process
registers with the QoS module. Its reference handle is put into a
queue and the process is put to sleep. The QoS module has a scheduling
algorithm that wakes up processes based on policies. For example,
assume that we have two classes of service: A and B. Assume that the "A"
class is the premium class. One policy is that there can be at most "M"
B class processes executing and at most "N" A class processes executing
(M < N). The disadvantage of this approach is that if there are
few A class processes then the CPU is not being effectively used.
Another algorithm treats A and B classes equally until there are a certain
number of complaints from A processes indicating that they are taking too
long to process. The number of non-sleeping B class processes is
reduced. We have developed and experimented with several
scheduling algorithms. The QoS module was designed so that it is relatively
easy to change the scheduling algorithm.
Research plan
In the context of large-scale applications such as electronic commerce
running over the Internet, the handling of very large numbers of users
is essential. We propose therefore to investigate the impact of the overall
system architecture and the QoS strategies on scalability. Scalability
depends on the distribution of the different system functions and data
over the various physical system components and the distributed algorithms
used for the processing of the user requests, such as electronic commerce
transactions, and for the system management aspects, such as QoS management.
The system architecture under consideration is heterogeneous and consists
of large number client machines, specific servers such as database servers,
video servers, names servers, transport system, and networks. In addition
to the servers that contain the virtual catalogs and those that perform
the commercial and financial transactions, we pay particular attention
to the functions related to management of QoS issues, distributed query
processing, and the adaptation of applications to low QoS availability.
Some of these functions may be integrated in the electronic commerce applications
(client and server sides), while others may be provided by separate agents,
such as name servers or traders.
Within this context, we will develop algorithms for QoS management
at the application level. Good scalability properties will be our major
design goal. Quality of service management is a distributed functionality
and can be decomposed into different steps: specification, mapping, negotiation,
resource reservation, adaptation and monitoring.
The different components of the distributed system for EC have specific
performance constraints that should be taken into account to provide the
QoS level desired by the user. Each component can also execute specific
tasks during the different steps of QoS management. We will extend our
previous work in order to define the role of the different components in
making QoS management decisions. We will propose a QoS architecture allowing
the different components to process specific tasks of QoS management strategies
and to contribute to system scalability.
The scalability of a system can be defined as its capacity to handle
the addition of users and resources without suffering a noticeable loss
of performance or increase in administrative complexity. Performance is
related to the QoS delivered by the system to the user, while administrative
complexity is related to the management of QoS and the interactions between
the different components in the system.
We identify three subprojects that address these issues, looking at
specific questions in the context of electronic commerce applications.
(A) Performance model for the electronic shopping application, including
access control and quality of service negotiation.
We consider a collection of servers and a large number of geographically
distributed users. We consider the problem of load sharing which is coordinated
by brokers which use QoS information obtained from the servers and other
brokers through dynamic monitoring. We also consider access control mechanism.
We will study scalable distributed algorithms for such systems and do performance
modelling of the global system, including the network.
During 1999-2000, we will define several distributed algorithms for
load sharing and management operating between brokers, users and servers,
and evaluate their performance properties. We will also build a prototype
broker that will implement one of load sharing algorithms. This will be
built using a performance model that characterizes the performance of a
single electronic commerce server and will be used to predict the performance
of a server under various access conditions. The system will dynamically
monitor the performance status each system component and use this information
to balance the load.
During the subsequent years, we plan to perform experiments with the
prototype system in order to understand eventual bottlenecks. We will also
prototype a few other promising algorithms and do a comparison through
simulation and real-world experiments.
(B) Distributed Query Processing Using QoS Information
Multimedia streams and high volumes of data are an important characteristic
of actual distributed information systems such as electronic commerce applications.
High performance database servers are then required to support these applications.
Specific architectures have been proposed and among them, parallel shared-nothing
database servers running over an ATM network seem to be very promising.
Such architectures lead to a revision of the traditional assumptions for
database query processing since network performance and server load are
unpredictable. Traditional query optimizers aim at achieving one
of the two optimization goals: response time or total resource consumption.
However, in most distributed multimedia applications, this assumption is
not enough, because (i) user's concerns vary in different applications
and (ii) the quality of communication is not constant. For example,
the user might care more about the service charge he can afford. Therefore,
the optimizer should also take different optimization goals into account.
In the case of an ATM cluster of servers, or in the case of geographically
distributed servers communicating over the future Internet with differentiated
services, optimization could also take into account the negotiated QoS
parameters of the network, such as available throughput, loss rate, or
communication delay.
In this part of the project we investigate how we could integrate QoS
management strategies for parallel query optimization. The problem is to
optimize execution time and/or resource consumption for executing a query
over an ATM cluster in a shared-nothing parallel environment. We aim at
proposing distributed query processing strategies in the presence of QoS
information and examining how such strategies can be integrated into the
IBM/DB2 Parallel Edition database system.
During 2000-2001, we will finalize our prototype implementation for
QoS based query processing based on our proposed cost model. We also plan
to do some experiments and simulations to justify its agility in the
face of changing network performance. We will demonstrate how our QoS-based
query processing strategy can be implemented using existing database systems.
This experimentation is expected to provide us with feedback about the
feasibility of our approach and may lead to new ideas for improvements
of the proposed algorithms. Such improvements, including the case of multiple
copies of data, will be proposed during the last year of the project.
(C) Service differentiation in the servers
In the context of electronic commerce, different classes of service are
often considered, either depending on a classification of users in occasional
buyers, regular clients and VIP clients, or depending on a service charge
which the user is willing to pay. It is therefore appropriate to provide
different classes of services, not only in terms of available options,
but also in terms of QoS, especially server response time. In this subproject,
we look at the issues related to providing differentiated classes of performance
at the server level.
A policy can be used to state the expected waiting time with each class
of customer. We have begun work that dynamically adjusts the CPU
allocations given to an application until the quality of service it delivers
meets policy expectations. If the delivered quality of service exceeds
expectations by a large margin, the process is given a less generous CPU
allocation (CPU allocations are done by adjusting the priority of processes).
During 1999-2000, we will extend this work by examining algorithms for
determining how to best adjust CPU priorities and how to take into account
other hardware resources. During the subsequent years, we will investigate
the inclusion of differentiated classes of response time within the context
of load sharing and distributed query processing, as discussed in the other
subprojects above.
(D) Research under a proposed IOR project
In the context of a related NSERC IOR grant application, we propose to
do related research on policies on QoS management and management information
bases (MIBs) for QoS related information in a distributed systems context.
For policy-enforcement, we need to characterize the data that describes
the run-time behavior of the managed system. This characterization is done
by identifying entities and attributes that describe the run-time behavior
of those entities. These are derived from the policies. For many types
of policies, especially those representing QoS requirements, the entities
and attributes are not yet well understood. To date, the focus has been
on issues such as bandwidth, cell delay, cell delay variance, ensuring
servers are operational, etc. [19,20]. Very little research has been done
on the mapping of application layer policies to network and host resources.
Hence, part of our research will conduct experiments to analyze the impact
that an application has on host resource usage. Our investigation will
include the following:
• Determine a set of measurement variables, including those that measure
responsiveness, productivity, utilization and errors/failures. This would
provide a possible set of data that characterize application behavior.
• Run a variety of experiments with varying system parameters, workload
parameters, and measurable applications that can be used to determine the
usefulness of the different measurement variables in determining whether
an application-level policy is being satisfied. This work will become the
basis for research in adaptive statistical techniques for early policy
violation (i.e., early detection of the existence of a fault that causes
a policy violation).
We will apply our previous experience with management of distributed
applications to develop Management Information Bases (MIB) for QoS management,
containing the set of QoS parameters describing the performance, availability
or reliability of the different components of the distributed multimedia
system including application components. In wide-area systems, integration,
federation and inter-operation of QoS MIB should be provided. For that
purpose, we propose to design and implement extensible QoS MIB managers
offering basic services to store, access, share, transfer, produce or analyze
QoS information. QoS MIB managers should be extensible in the sense that
they should integrate mechanisms for integrating new QoS information and
services. Services provided by QoS managers will be dedicated to the different
components of the distributed multimedia system to support distributed
QoS decision models.
We will also apply our previous experience to develop techniques to
monitor the measurement variables found in the QoS MIBs. This implies the
need for application processes with embedded instrumentation. In previous
work, we developed a set of sensors (instrumentation code) that encapsulates
management data and provides controlled access to that management data,
and actuators that are used for control purposes. We will also consider
the remote access to these MIB using appropriate communication protocols.
Based on our previous experience, we have come to realize that the
development of monitoring and control services is difficult, time-consuming
and ad-hoc. One of the reasons is that design issues have not been
separated from implementation issues. Toolkits are needed that
facilitate the development of the monitoring and control services.
An example of such a toolkit would be for instrumentation where, based
on the developer’s choices, the instrumentation code is semi-automatically
embedded into application code. This means that the developer can
focus on the instrumentation needed (design issue) and not the implementation
details needed to get the instrumentation embedded into the application.
Milestones (not including subproject (D)
March 01 • Analysis of the performance of several distributed algorithms
for load sharing, using experiments with prototype implementation and performance
simulations.
• Performance analysis of query optimization strategy based on
experimentation with the prototype and simulation studies; improved optimization
strategies.
• Design and prototype implementation of algorithms for adjusting
multiple resource allocations in order to maintain several differentiated
classes of service on a single server.
March 02 • Improved distributed algorithms for load sharing, including
management of differentiated service classes.
• Extension of query optimization strategies for the case of
duplication of data.
References
[1] J. Gecsei, “Adaptation in distributed multimedia systems,” IEEE Multimedia,
1997.
[2] G. v. Bochmann and A. Hafid, “Some principles for quality of service
management”, Distributed Systems Engineering Journal, 4: 16-27, 1997.
[3] A. Hafid and G. v. Bochmann, “Quality of service adaptation in
distributed multimedia applications”, ACM Multimedia Systems 1998, Vol.
6, no 5, pp 299-315
[4] S. Fischer, A. Hafid, G. v. Bochmann, and H. d. Meer, “Cooperative
quality of service management for multimedia applications”, in Proceedings
of the 4th IEEE International Conference on Multimedia Computing and Systems,
Ottawa, Canada, June 1997, pp. 303-310.
[5] M.A. Bauer, R.B. Bunt, A. El Rayess, P.J. Finnigan, T. Kunz, H.L.
Lutfiyya, A.D. Marshall, P. Martin, G.M. Oster, W. Powley, J. Rolia,
D.J. Taylor, and M. Woodside, “Services supporting management of distributed
applications and systems”, IBM Systems Journal, 36 (4), 1997.
[6] M. Katchabaw, S. Howard, H. Lutfiyya, A. Marshall, and M. Bauer,
“Making distributed applications manageable through instrumentation”, In
Press, The Journal of Systems and Software, 1999.
[7] H. Lutfiyya, A. Marshall, H. Bauer, P. Martin, and W. Powley, “Configuration
Maintenance for Distributed Application Management”, Journal of Network
and Systems Management, In Press, 1999.
[8] E. Lupu, and M. Sloman, “Conflict Analysis for Management Policies”,
Integrated Network Management V, May 1997, pages 430-444.
[9] R. Wies, “Using a Classification of Management Policies for Policy
Specification and Policy Transformation”, Integrated Network Management
IV. Elsevier Science Publishers, 1995.
[10] T. Koch. C. Krell, and B. Kramer, “Policy Definition Language
for Automated Management of Distributed Systems”, In Proc. Second International
IEEE Workshop on Systems Management, Toronto, June 1996.
[11] G.v. Bochmann, B. Kerhervé and M. V. Mohamed-Salem, Quality
of service management issues in electronic commerce applications, presented
at IBM Workshop on Electronic Commerce, Sept. 1998; to be published in
a book.
[12] S. Brobst and B. Vecchione, DB2 UDB: Starbust Grows Bright, Database
Programming and Design, February 1998.
[13] IBM DB2 Universal Database Administration Guide, Version 5.2,
IBM Corp. 1998
[14] H. Ye, B. Kerhervé, G. Bochmann "Quality of Service Management
and Distributed/Parallel Query Processing", Technical Report, June 1998
[15] A. Hafid, G. v. Bochmann, and R. Dssouli, “Quality of service
negotiation with present and future reservations: a detailed study,
to appear in Computer Networks and ISDN Systems.
[16] A. Hafid and G. v. Bochmann, “Quality of service adaptation in
distributed multimedia applications, ACM Multimedia Systems Journal, Vol.
6, No. 5, 1998
[17] A. Hafid and G. v. Bochmann, “A general framework for quality
of service managementÏ, to appear in Multimedia Tools and Applications
Journal.
[18] J.W. Wong, K.A. Lyons, R.J. Velthuys, G.v. Bochmann, E. Dubois,
N.D. Georganas, G. Neufeld, M.T. Ozsu, J. Brinskelle, D.F. Evans, A. Hafid,
N. Hutchinson, P. Iglinski, B. KerhervÈ, L. Lamont, D. Makaroff,
and D. Szafron, “Enabling Technology for Distributed Multimedia Applications”,
IBM Systems Journal, 36(4): 489-507, 1997.
[19] E.Madja, A. Hafid, R.Dssouli, G.v. Bochmann and J.Gecsei, Meta-data
modelling for QoS management in the WWW, Int. Conf. on Multimedia Modeling,
Lausanne, Switzerland, 1998
[20] H. Lutfiyya, A. Marshall, M. Bauer, and D. Stokes, “A Policy-Driven
Approach to Availability and Performance Management in Distributed Systems”,
Journal of Network and Systems Management, conditionally accepted, 1998.
[21] M. Katchabaw, H. Lutfiyya, and M. Bauer, “Driving Resource
Management with Application-Level Quality of Service Specifications”.
First International Conference on Information and Computation Economies
(ICE98), October, 1998. Also to be published in Journal of Decision Support
Systems.
[22] M. Katchabaw, H. Lutfiyya, and M. Bauer, “A Model of Resource
Management to Support End-to-End Application-Driven Management”.
First International Conference on Information and Computation Economies
(ICE98), October, 1998. Also to be published in Journal of Decision Support
Systems.
[23] H. Ye, B. Kerhervé, G. v. Bochmann, QoS-aware distributed
query processing, DEXA Workshop on Query Processing in Multimedia Information
Systems (QPMIDS), 10th International Workshop on Database & Expert
Systems Applications, Florence, Italy, 1-3 September, 1999, Proceedings
published by IEEE Computer Society, 1999.
[24] H. Ye, G.v. Bochmann, B. Kerhervé, An adaptive cost model
for distributed query processing, UQAM Technical Report, November 1999.
Last updated: June 2000