We get false positives for SSNs fairly often, causing our filter to encrypt messages that shouldn't be encrypted. I finally nailed down one cause (possibly our primary one) today when I copied and pasted the OCR'd text of a PDF file into Notepad.
The copier's OCR software had added an extra space to the document footer's ZIP+4.
"06107-nnnn" becomes "061 07-nnnn"
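To illustrate why that one extra space matters, here is a minimal Python sketch. The pattern is only a guess at what a 3-2-4 SSN rule might look like; I don't know how the product's predefined filter is actually implemented, and the digits 1234 are made up for the example.

```python
import re

# Hypothetical SSN rule (an assumption, not the vendor's real pattern):
# three digits, a space or hyphen, two digits, a space or hyphen, four digits.
SSN_PATTERN = re.compile(r"\b\d{3}[ -]\d{2}[ -]\d{4}\b")

def looks_like_ssn(text):
    """Return True if the text contains something shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None

original = "ZIP 06107-1234"    # ZIP+4 as printed in the footer (sample digits)
ocr_text = "ZIP 061 07-1234"   # same footer after the copier's OCR adds a space

print(looks_like_ssn(original))  # the intact ZIP+4 has no 3-2-4 grouping
print(looks_like_ssn(ocr_text))  # the stray space creates a 3-2-4 grouping
```

With a rule like this, the intact ZIP+4 is ignored, but the OCR'd version is indistinguishable from an SSN written with mixed separators.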
When I went back and reconciled past encrypted messages with the source PDFs that were attached to them, this case explained a LOT of our false positives. Multiple models of Xerox copiers are doing this, and our vendor has told us that there is no way to adjust the OCR settings to prevent it.
I'm requesting an option to exempt certain strings from the predefined Data Leakage Prevention filters.
For example, I would like to be able to exempt 061-07-nnnn, with any and all characters used as separators, from the SSN filter.
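Roughly, the behavior I have in mind could be sketched like this in Python. Everything here is hypothetical: the `CANDIDATE` pattern, the `EXEMPT_PREFIXES` set, and the `ssn_hits` function are names I made up to show the idea, not anything the product exposes today.

```python
import re

# Hypothetical candidate rule (an assumption): 3-2-4 digits, with exactly one
# non-digit character of any kind between the groups.
CANDIDATE = re.compile(r"(?<!\d)(\d{3})\D(\d{2})\D(\d{4})(?!\d)")

# Hypothetical exemption list: the first five digits of strings to ignore.
# "06107" covers 061-07-nnnn no matter which separators are used.
EXEMPT_PREFIXES = {"06107"}

def ssn_hits(text):
    """Return SSN-shaped strings in text, skipping exempted prefixes."""
    hits = []
    for match in CANDIDATE.finditer(text):
        prefix = match.group(1) + match.group(2)  # digits only, separators dropped
        if prefix in EXEMPT_PREFIXES:
            continue  # exempted: treat as our footer's ZIP+4, not an SSN
        hits.append(match.group(0))
    return hits

print(ssn_hits("Footer: 061 07-1234"))  # exempted
print(ssn_hits("SSN 123-45-6789"))      # still flagged
```

Comparing on the digits alone is what makes the exemption independent of whatever separator the OCR happens to produce.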