Fábio Olivé Leite
Conectiva S.A.
This talk will present some of the current Linux clustering technology, and
especially how a communication fault injection tool called ComFIRM can be used
to validate the communication protocols that clusters rely on. Fault injection
is an established technique among Fault Tolerance researchers. It allows
experimental validation of implementations of fault tolerance mechanisms,
checking that such implementations actually support the failure models they
were designed to support. Communication fault injection in ComFIRM is done by
creating a set of rules that are evaluated at every packet reception or
transmission. The rules are composed of a special bytecode that represents
packet selection and manipulation primitives, and as such can be used to inject
faults like delayed or dropped packets into any protocol implemented on Linux.
ComFIRM is a flexible and powerful communication fault injection tool, located
inside the Linux kernel, created by Fábio Olivé Leite. It is available as a set
of patches and some documentation at http://www.conectiva.com.br/~olive/ComFIRM.
It is stable and usable, even though a few things still have to be enhanced or
fixed.
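The idea of rules evaluated against every packet can be illustrated with a small user-space sketch. This is not ComFIRM's bytecode or API, which the abstract does not describe: the opcode names (MATCH_BYTE, DROP, DELAY), the Verdict type, and the helpers evaluate, inject and fake_ipv4 are all hypothetical, chosen only to show how selection primitives can gate a manipulation primitive. The one real-world detail it relies on is that byte 9 of an IPv4 header holds the protocol number (1 for ICMP, 6 for TCP, 17 for UDP).

```python
from dataclasses import dataclass

# Hypothetical opcodes for illustration; ComFIRM's real bytecode differs.
MATCH_BYTE = "MATCH_BYTE"  # args (offset, value): go on only if packet[offset] == value
DROP = "DROP"              # no args: discard the packet
DELAY = "DELAY"            # args (ms,): hold the packet back for ms milliseconds


@dataclass
class Verdict:
    action: str = "PASS"   # "PASS", "DROP" or "DELAY"
    delay_ms: int = 0


def evaluate(rule, packet: bytes) -> Verdict:
    """Run one rule (a list of instructions) against one packet."""
    for op, *args in rule:
        if op == MATCH_BYTE:
            offset, value = args
            if offset >= len(packet) or packet[offset] != value:
                return Verdict()          # selection failed: leave packet alone
        elif op == DROP:
            return Verdict("DROP")
        elif op == DELAY:
            return Verdict("DELAY", args[0])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return Verdict()                      # rule had no action instruction


def inject(rules, packet: bytes) -> Verdict:
    """Evaluate every rule in order; the first rule that acts decides."""
    for rule in rules:
        verdict = evaluate(rule, packet)
        if verdict.action != "PASS":
            return verdict
    return Verdict()


def fake_ipv4(protocol: int) -> bytes:
    """Build a zeroed 20-byte IPv4-sized header with only the protocol byte set."""
    return bytes(9) + bytes([protocol]) + bytes(10)


# Drop all UDP packets and delay all ICMP packets by 200 ms.
rules = [
    [(MATCH_BYTE, 9, 17), (DROP,)],
    [(MATCH_BYTE, 9, 1), (DELAY, 200)],
]

for proto in (17, 1, 6):
    print(proto, inject(rules, fake_ipv4(proto)))
```

In this sketch a rule is just a list of tuples interpreted in order, so adding a new fault type means adding one opcode branch. A real in-kernel injector would additionally have to hook the send and receive paths and actually queue or discard the packet, which is outside what this example attempts.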
Fábio Olivé Leite is a member of the Conectiva High Availability Development Team. He is currently finishing an MSc course in Fault Tolerance, and has a BSc degree in Computer Science and a Technician degree in Industrial Electronics. He has published a few works on Fault Tolerance and Distributed Computing, and enjoys working with reliable communication, clusters and other cool distributed stuff.