Interest in Grid computing has increased significantly in the last year, from the computer science community, the platform vendors and, interestingly, from application scientists (physicists, biologists, etc.). Grid computing emphasises the same challenges in interoperability, security, fault tolerance, performance and data management as the distributed computing community, albeit grounded in specific application scenarios from the science and engineering communities. A key aspect of Grid computing is the need to couple computational and data resources to form Virtual Organisations (VOs), via resource scheduling and discovery approaches. A VO in this context is created by combining capability across a number of different administrative domains to run a single large problem. Hence a single application program may execute tasks on multiple resources, using the concept of a SuperScheduler. The SuperScheduler does not own any resources itself, but connects to a number of different local resource scheduling systems within different, geographically distributed administrative domains. Choosing suitable resources on which to schedule operations has generally involved a discovery process that searches registry services to locate resources of interest. The mechanisms used within this process have been quite limited in scope, offering a limited set of queries to interrogate a (generally) LDAP-based repository (via grid-info-search in Globus, for instance).
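The relationship between a SuperScheduler and the local schedulers it connects to can be illustrated with a minimal sketch. Everything named here (the `LocalScheduler` and `SuperScheduler` classes, the `query` and `submit` methods, the domain and resource names, and the first-match placement policy) is an assumption made for illustration and does not correspond to any Globus interface.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cpus: int
    free: bool = True


class LocalScheduler:
    """Hypothetical stand-in for the scheduler of one administrative domain."""

    def __init__(self, domain, resources):
        self.domain = domain
        self.resources = list(resources)

    def query(self, min_cpus):
        # A domain answers only for the resources it owns.
        return [r for r in self.resources if r.free and r.cpus >= min_cpus]

    def submit(self, resource, task):
        resource.free = False
        return f"{task}@{self.domain}/{resource.name}"


class SuperScheduler:
    """Owns no resources; it only holds references to local schedulers."""

    def __init__(self, local_schedulers):
        self.local_schedulers = list(local_schedulers)

    def discover(self, min_cpus):
        # Discovery step: ask every domain, collect (scheduler, resource) pairs.
        return [(s, r) for s in self.local_schedulers for r in s.query(min_cpus)]

    def run(self, tasks, min_cpus):
        placements = []
        for task in tasks:
            candidates = self.discover(min_cpus)
            if not candidates:
                raise RuntimeError(f"no resource available for {task}")
            # Assumed policy: take the first match; a real system would rank.
            scheduler, resource = candidates[0]
            placements.append(scheduler.submit(resource, task))
        return placements


domain_a = LocalScheduler("domain-a", [Resource("a1", 16)])
domain_b = LocalScheduler("domain-b", [Resource("b1", 8), Resource("b2", 32)])
placements = SuperScheduler([domain_a, domain_b]).run(["t1", "t2"], min_cpus=16)
print(placements)
```

In this sketch the two tasks of one application end up in two different domains, which is the situation a VO is meant to support; the limited query (a single `min_cpus` threshold) mirrors the restricted query sets noted above.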
An efficient discovery mechanism is important within a Grid environment, as it provides the infrastructure needed to support dynamic and fault-tolerant VOs. Efficiency matters because searching for suitable resources, or more recently services (whereby access to all computational and data resources is seen as a computational or data service), requires querying across distributed registries. The discovery process can also be seen as a way to establish dynamic relationships between Grid services, whether persistent or transient, since discovery is a first stage in establishing an association with another service. Hence, when initiating a discovery process, a service user should identify the type of association to be formed with a service provider. Examples of such associations are client/server (generally) or peer-to-peer. Viewed in this way, service discovery may be undertaken passively, similar to a service lookup in a registry, or actively, when the discovery mechanism is intended to form a particular type of association with another service based on parameters such as cost, performance, Quality of Service (QoS), or trust.

Groups
As the number of services on the Grid increases, so will the potential interactions between these services. This can lead to high traffic volumes on the network, and potentially act as a barrier to scalability. The group paradigm is significant in this context, as it limits interaction to a small community of services (in the first instance). Service groups can be formed based on a number of different criteria: geographical location of services, service type (such as mathematical services, graphics services or data analysis services), service ownership, service cost, or service priority. Members of a group can interact with each other more efficiently (via multicast messages, for instance), and can load balance requests sent to the group. The concept of forming a group or community also implies some level of trust between the members, which in turn calls for efficient mechanisms to share state with other members of the group. Shared-memory-based abstractions become necessary in this context, as group members may need to repeatedly share state, and be alerted as members enter or leave the group. An important consideration is the ability to decide which services should be allowed within a group, and how the group structure should evolve. Different service behaviours and roles can co-exist within a group, even though the services are targeted at a particular application domain. In a group of mathematical services, for example, one could have broker services, service users, community management services, and monitoring services. Some of these are required to manage and maintain the group itself, whilst others provide specialised capability.
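The group mechanics described above (forming groups by a chosen criterion, alerting members when membership changes, and load balancing requests across members) can be sketched as follows. The `ServiceGroup` class, the `form_groups` helper, the service names and the round-robin policy are all hypothetical choices for illustration; a real deployment would sit on top of a multicast or shared-state layer, which is not modelled here.

```python
class ServiceGroup:
    """Hypothetical group of services sharing one membership list."""

    def __init__(self, name):
        self.name = name
        self.members = []      # shared state visible to every member
        self.listeners = []    # callbacks alerted when membership changes
        self._turn = 0         # round-robin position for load balancing

    def subscribe(self, callback):
        self.listeners.append(callback)

    def join(self, member):
        self.members.append(member)
        self._notify("join", member)

    def leave(self, member):
        self.members.remove(member)
        self._notify("leave", member)

    def _notify(self, event, member):
        for callback in self.listeners:
            callback(self.name, event, member)

    def dispatch(self, request):
        # Assumed policy: hand requests to the current members in turn.
        if not self.members:
            raise RuntimeError(f"group {self.name} has no members")
        member = self.members[self._turn % len(self.members)]
        self._turn += 1
        return member, request


def form_groups(services, criterion):
    """Partition service descriptions by one attribute (type, owner, ...)."""
    groups = {}
    for service in services:
        key = service[criterion]
        if key not in groups:
            groups[key] = ServiceGroup(key)
        groups[key].join(service["name"])
    return groups


services = [
    {"name": "fft", "type": "mathematical", "owner": "site-a"},
    {"name": "render", "type": "graphics", "owner": "site-a"},
    {"name": "solver", "type": "mathematical", "owner": "site-b"},
]
groups = form_groups(services, "type")
maths = groups["mathematical"]

events = []
maths.subscribe(lambda group, event, member: events.append((group, event, member)))
maths.join("integrator")
handled = [maths.dispatch(f"req{i}") for i in range(4)]
print(events, handled)
```

Passing `"owner"` instead of `"type"` to `form_groups` regroups the same services by ownership, which shows that the grouping criterion is a policy choice rather than a property of the services themselves.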
The group idea also leads to the formation of small world networks: primarily interaction networks with a small average path length between members, and a large internal connectivity (i.e. a clustering coefficient that is independent of network size). These networks are likely to be of great significance in scientific computing, as they closely model data sharing between scientists within a given domain (such as genomics or astronomy), and differ from the sharing of music data in systems like Gnutella. Once a small world network has been established, various assumptions about other members in the group may be made, such as their ability to provide particular types of data (or their speciality), the trust that can be placed in them, and their ability to respond in a timely fashion. Identifying scenarios where such networks may be established, and subsequently providing suitable middleware to sustain them, will be important in Grid systems. One member within such a small world network (group) may be allocated the role of cluster head, thereby acting as a gateway to other small world networks. Hence a federation of such small world networks may be established that spans domain and organisational boundaries.

The Importance of Shared Semantics
Establishing a group or community of services also necessitates the description of common semantics. Such a description should allow the role of each member to be defined, along with the specialist services it offers, and provide a mechanism for communication between the members. Hence, two types of descriptions are necessary: (1) those that are independent of any particular application domain, and provide a means to define roles, and (2) those that are specific to particular application domains, and to the services that need to be provided within that domain. Significant progress has been made towards describing attributes of computational and data resources in a unified way, so that a query to a SuperScheduler can be understood by multiple resource managers. Little progress, however, has been made towards standardising descriptions of services within particular application domains, which is a time-consuming, consensus-building process within a scientific community. Both of these descriptions are necessary to enable multi-disciplinary science and engineering, and to enable a better understanding of the common requirements of all of these communities. An important concern is the ability of these different data models to interact, as there is unlikely to be a single model used by all within or across communities. Resolving differences in representation, and supporting automated negotiation between members to resolve semantic differences, therefore become important services that need to be supported within a group.
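One way to picture the resolution of representation differences is a translation table from a shared vocabulary to the attribute names and units of each resource manager. The sketch below is only an assumption about how such a service might look: the manager names, the attribute names (`cpu_count`, `memory_mb` and so on) and the `translate` function are all invented for illustration and do not come from any existing schema.

```python
# Hypothetical mapping: common attribute -> (local attribute name, unit scale).
MAPPINGS = {
    "manager_a": {"cpu_count": ("ncpus", 1), "memory_mb": ("mem_mb", 1)},
    "manager_b": {"cpu_count": ("processors", 1), "memory_mb": ("memory_gb", 1 / 1024)},
}


def translate(query, manager):
    """Rewrite a query from the common vocabulary into one manager's terms.

    Returns the translated query plus the attributes that have no known
    mapping; those are the candidates for negotiation between members.
    """
    mapping = MAPPINGS[manager]
    translated, unresolved = {}, []
    for attribute, value in query.items():
        if attribute in mapping:
            local_name, scale = mapping[attribute]
            translated[local_name] = value * scale
        else:
            unresolved.append(attribute)
    return translated, unresolved


query = {"cpu_count": 8, "memory_mb": 2048, "licence": "solver-x"}
for_a = translate(query, "manager_a")
for_b = translate(query, "manager_b")
print(for_a)
print(for_b)
```

The domain-independent attributes translate mechanically, while the domain-specific `licence` attribute is left unresolved for both managers. That split corresponds to the two types of description distinguished above: the first can be standardised once, whereas the second needs agreement within each scientific community.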
Although significant progress has been made in Grid computing, the next phase will require more efficient mechanisms to organise and coordinate participants (individuals or institutions) within groups. Mechanisms to support the formation of groups are also significant, i.e. identifying which members should belong to which group, and how their membership can be sustained. The emphasis so far has been on infrastructure installation (such as high-speed networks) and on providing common interfaces to resource management systems. To enable more effective interaction between the services that run on this infrastructure, standardisation of common data models becomes significant. The Grid should eventually provide the computational and data infrastructure necessary to enable groups of scientists to undertake multi-disciplinary work. It should also provide the necessary entry points for connecting resources with a range of different capabilities, from high-end computational clusters and parallel machines to sensor networks capable of data capture at source. An important aspect of Grid computing has been the significant interaction between the computer science and the application science communities, and for Grid computing to mature, this interaction needs to be strengthened further.

Acknowledgement
Thanks to José C. Cunha of the CITI Centre, New University of Lisbon (Portugal), for interesting discussions on Groups.
Omer F. Rana
and the Welsh E-Science/Grid Computing Centre, UK