The explabox aims to support data scientists and machine learning (ML) engineers in explaining, testing and documenting AI/ML models, developed in-house or acquired externally. The explabox turns your ingestibles (AI/ML model and/or dataset) into digestibles (statistics, explanations or sensitivity insights)!
The explabox can be used to:
Explore: describe aspects of the model and data.
Examine: calculate quantitative metrics on how the model performs.
Expose: assess model sensitivity to random inputs (safety), test how well the model generalizes (robustness), and measure the effect of adjusting attributes in the inputs (e.g. swapping male pronouns for female pronouns; fairness), both for the dataset as a whole (global) and for individual instances (local).
Explain: use XAI methods for explaining the whole dataset (global), model behavior on the dataset (global), and specific predictions/decisions (local).
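A minimal usage sketch of this four-step workflow is shown below. The import names, arguments and method calls (import_data, import_model, Explabox, and explore/examine/expose/explain) are assumptions based on the workflow described above, not a verified listing of the explabox API; consult the package documentation for the exact interface.

```python
# Hypothetical sketch of the explore/examine/expose/explain workflow.
# Import names, arguments and methods are assumptions, not the verified explabox API.
from explabox import import_data, import_model, Explabox

# Ingestibles: a labelled text dataset and a trained model.
data = import_data('./reviews.csv', data_cols='text', label_cols='label')
model = import_model('classifier.onnx', label_map={0: 'negative', 1: 'positive'})

box = Explabox(data=data, model=model)

box.explore()  # descriptive statistics of the data
box.examine()  # quantitative performance metrics
box.expose()   # safety, robustness and fairness tests
box.explain()  # global and local explanations
```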
A number of experiments in the explabox can also be used to provide transparency and explanations to stakeholders, such as end-users or clients.
The explabox was developed at the National Police Lab AI (NPAI).
text_explainability provides a generic architecture from which well-known state-of-the-art explainability approaches for text can be composed. This modular architecture allows components to be swapped out and combined, to quickly develop new explainability approaches for (natural language) text, or to improve many existing approaches at once by improving a single module.
Several example methods are included, providing local explanations (explaining the prediction of a single instance, e.g. LIME and SHAP) or global explanations (explaining the dataset, or model behavior on the dataset, e.g. token frequency and MMD-Critic). By replacing the default modules (e.g. the local data generation, global data sampling or embedding method), these methods can be improved upon or new methods can be introduced.
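To make this modular composition concrete, the sketch below (plain NumPy/scikit-learn, deliberately not the text_explainability API) assembles a LIME-style local explainer from two replaceable modules: a local data generator that perturbs a single instance by masking tokens, and a linear surrogate model fit on the resulting neighbourhood. Here predict_proba stands for any callable mapping a text to a sequence of class probabilities; swapping either module yields a different explanation method.

```python
# Conceptual sketch of a modular LIME-style local explainer
# (illustrative only; not the text_explainability API).
import numpy as np
from sklearn.linear_model import Ridge

def perturb(tokens, n_samples=500, rng=None):
    """Local data generation: random binary masks over the tokens of one instance."""
    rng = rng or np.random.default_rng(0)
    masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
    texts = [' '.join(tok for tok, keep in zip(tokens, mask) if keep) for mask in masks]
    return masks, texts

def explain_instance(text, predict_proba, label=1):
    """Fit a linear surrogate on the perturbed neighbourhood; its coefficients
    serve as token importances for the given label."""
    tokens = text.split()
    masks, texts = perturb(tokens)
    y = np.array([predict_proba(t)[label] for t in texts])
    surrogate = Ridge(alpha=1.0).fit(masks, y)
    return sorted(zip(tokens, surrogate.coef_), key=lambda kv: -abs(kv[1]))
```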
The generic architecture of text_explainability is reused to add tests of safety (e.g. the ability to handle various input characters or varying input lengths), robustness (how well the model generalizes in production, e.g. stability when typos are added or random unrelated data is appended) and fairness (whether equal individuals are treated equally by the model, e.g. subgroup fairness on sex and nationality).
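As an illustration of such a robustness test, the standalone sketch below (plain Python, not the package's API; predict stands for any callable mapping a text to a label) injects random character-level typos and reports the fraction of instances whose predicted label is unchanged.

```python
# Standalone robustness sketch (illustrative only): inject random character-level
# typos and measure how often the model's predicted label survives the perturbation.
import random
import string

def add_typos(text, n_typos=2, rng=None):
    """Replace n_typos randomly chosen characters with random lowercase letters."""
    rng = rng or random.Random(0)
    chars = list(text)
    for _ in range(min(n_typos, len(chars))):
        i = rng.randrange(len(chars))
        chars[i] = rng.choice(string.ascii_lowercase)
    return ''.join(chars)

def label_stability(texts, predict, n_typos=2):
    """Fraction of instances whose predicted label is unchanged after adding typos."""
    unchanged = sum(predict(t) == predict(add_typos(t, n_typos)) for t in texts)
    return unchanged / len(texts)
```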