[Feature]: Implement confidence scores for LLM Firewall

### Feature Description

We need to add a confidence score to the ninja-lm outputs so that low-confidence predictions are not acted on automatically, which should reduce false positives.

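Proposed thresholds:

- Confidence >= 98% that the input is a jailbreak: jailbreak
- Confidence >= 98% that the input is benign: benign
- Confidence between 92% and 98%: flagged for review
- Confidence less than 92%: false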
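A minimal sketch of how the thresholds above could map a model prediction to an outcome. Several details here are assumptions, not part of ninja-lm. The model is assumed to return a label (`"jailbreak"` or `"benign"`) plus a confidence in `[0, 1]`. The names `Decision` and `classify` are hypothetical. 98% is treated as inclusive for acceptance and 92% as inclusive for review. The issue does not define what "false" means downstream, so this sketch simply reports it as a separate outcome.

```python
from enum import Enum

# Hypothetical threshold constants matching the percentages in this issue.
ACCEPT_THRESHOLD = 0.98  # >= 98%: trust the model's label
REVIEW_THRESHOLD = 0.92  # 92% up to (but not including) 98%: flag for review


class Decision(Enum):
    JAILBREAK = "jailbreak"
    BENIGN = "benign"
    REVIEW = "review"
    # The issue says "false" for < 92%; how callers treat it is an assumption.
    FALSE = "false"


def classify(label: str, confidence: float) -> Decision:
    """Map a (label, confidence) pair to a decision using the proposed thresholds."""
    if label not in ("jailbreak", "benign"):
        raise ValueError(f"unexpected label: {label!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")

    if confidence >= ACCEPT_THRESHOLD:
        # High confidence: accept whichever label the model produced.
        return Decision.JAILBREAK if label == "jailbreak" else Decision.BENIGN
    if confidence >= REVIEW_THRESHOLD:
        # Middle band: do not decide automatically, send to a human.
        return Decision.REVIEW
    # Below 92%: the "false" outcome from the issue.
    return Decision.FALSE


if __name__ == "__main__":
    for label, conf in [("jailbreak", 0.99), ("benign", 0.95), ("jailbreak", 0.40)]:
        print(label, conf, classify(label, conf).value)
```

Keeping the thresholds as named constants makes it easy to tune them once real confidence distributions from the model are available.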