ABSTRACT
We describe ongoing work on extending the gLite Logging and Bookkeeping (L&B) service to track additional types of jobs, with the vision of following jobs uniformly on the Grid even when they pass between different middleware domains. We give details on the simpler case of PBS jobs, which proves the capability of L&B to handle additional job types, and on the more complex and challenging work started on Condor jobs, where the impact of eventual success is larger.
A uniform job monitoring service in multiple job universes




