Evaluation Skill
Community
Enables systematic testing of agent performance using outcome-focused approaches that account for non-determinism. Covers rubric design, LLM-as-judge evaluation, human review processes, test set construction, and continuous evaluation pipelines for quality gates and regression detection.
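As a rough illustration of what outcome-focused testing under non-determinism can look like, the sketch below runs the same task several times, scores only the final outcome, and applies a pass-rate threshold as a quality gate. It is not taken from the skill itself: the names `make_flaky_agent`, `pass_rate`, and `quality_gate`, the stand-in agent, and the 0.8 threshold are all assumptions made for this example.

```python
import itertools
from typing import Callable, Dict, List


def make_flaky_agent(answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a non-deterministic agent: cycles through fixed answers."""
    cycle = itertools.cycle(answers)

    def agent(task: str) -> str:
        return next(cycle)

    return agent


def pass_rate(agent: Callable[[str], str],
              check: Callable[[str], bool],
              task: str,
              trials: int = 5) -> float:
    """Run the agent `trials` times on one task and return the fraction of passing outcomes."""
    passed = sum(1 for _ in range(trials) if check(agent(task)))
    return passed / trials


def quality_gate(rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return the ids of tasks whose pass rate is below the (assumed) threshold."""
    return sorted(task for task, rate in rates.items() if rate < threshold)


if __name__ == "__main__":
    agent = make_flaky_agent(["4", "4", "5", "4"])
    rates = {"add-2-2": pass_rate(agent, lambda out: out == "4", "What is 2 + 2?", trials=4)}
    print(rates, quality_gate(rates))
```

Scoring the outcome rather than the exact path lets the agent vary its steps between runs, and repeating each task turns a single flaky pass or fail into a rate that can be compared across versions.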
How to Use This Skill
- Click "View SKILL.md" to see the full skill definition
- Copy the contents of the SKILL.md file
- In Claude, go to Project Knowledge and paste the skill
- Start a new conversation and Claude will use the skill automatically
Related Skills
FFUF Web Fuzzing
Community
Expert guidance for ffuf web fuzzing during penetration testing, including authenticated fuzzing with raw requests, auto-calibration, and result analysis
Security & Testing, by jthack
security, pentesting, fuzzing
PyPICT Test Designer
Community
Design comprehensive test cases using PICT (Pairwise Independent Combinatorial Testing) for requirements or code, generating optimized test suites with pairwise coverage
Security & Testing, by omkamal
testing, pairwise, quality-assurance
QA Test Planner
Community
Generate comprehensive test plans and bug reports for quality assurance
Security & Testing, by James Rochabrun
qa, testing, bugs