What if your law firm's confidential client data were feeding the next ChatGPT model right now? For many attorneys using AI tools, this isn't a hypothetical—it's happening by default.
Why This Matters Now
As attorneys, we have an ethical duty to maintain client confidences. Revealing them can result in disbarment for the lawyer and severe consequences for the client.
This is why we have to be very careful about how we use technology. No one wants their firm to be a source of the next Panama Papers!
The stakes were already high before large language models (LLMs) appeared on the scene in the last few years; now they're even higher.
The Data Hunger of AI Companies
One of the things we've learned about the companies building LLMs is that they have an insatiable appetite for text and data. OpenAI, Anthropic, and other AI labs have taken controversial steps to obtain text for LLM training:
- OpenAI is alleged to have scraped millions of articles from the New York Times to include in its training data.
- Anthropic purchased and scanned millions of copyrighted books for use in training.
- Meta trained its models on millions of copyrighted books from Library Genesis.
- AI companies have reportedly ignored robots.txt website directives when scraping the web (see the sketch after this list for how those directives are meant to work).
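For context, robots.txt is the plain-text file a website publishes to tell automated crawlers which pages they may fetch. The minimal Python sketch below uses hypothetical directives (GPTBot is OpenAI's published crawler user agent; the site and paths are made up for illustration) to show the check a well-behaved crawler performs before downloading a page; the reporting above concerns crawlers that allegedly skip it.

```python
# Minimal sketch: how a crawler that honors robots.txt checks whether it may fetch a page.
# The directives, site, and paths below are hypothetical.
from urllib import robotparser

robots_txt = [
    "User-agent: GPTBot",   # OpenAI's published crawler user agent
    "Disallow: /",          # tells that crawler to fetch nothing from this site
]

rp = robotparser.RobotFileParser()
rp.parse(robots_txt)

# A well-behaved crawler would run this check before downloading anything:
print(rp.can_fetch("GPTBot", "https://example.com/client-files/"))    # False: GPTBot is disallowed
print(rp.can_fetch("SomeOtherBot", "https://example.com/articles/"))  # True: no rule applies to it
```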
Given this reporting, it's more important than ever to prevent client confidences from reaching the internet. Once there, your sensitive data could be scraped, memorized by an LLM, and later disclosed to third parties using the same LLM.
But here's the more troubling question: Can we ever feel safe putting client confidences into AI "chat" features, or, more generally, into any application feature powered by LLMs?
Given how eagerly AI companies scraped copyrighted works for model training, who's to say they wouldn't show the same disregard for user privacy?
Further, to comply with other laws and regulations, most AI companies operate monitoring systems that could expose sensitive client information to human reviewers or third-party systems.
Platform-by-Platform Analysis
To answer this question, we have to start with each platform's Terms of Service. Note that many platforms treat individual users very differently from business customers when it comes to data training and privacy protections.
1. OpenAI / ChatGPT
Based on OpenAI's documentation, the company uses content provided by individual users to train its models. The policy states: "When you use our services for individuals such as ChatGPT, Sora, Operator, or Codex, we may use your content to train our models."
However, individuals can opt out of training in two ways:
- Through OpenAI's privacy portal
- By turning off the "Improve the model for everyone" setting in Data controls
Critical warning: Regardless of these settings, clicking thumbs up or thumbs down will share your entire conversation with OpenAI for model training.
OpenAI states that, by default, they do not train on any inputs or outputs from their products for business users, including ChatGPT Team, ChatGPT Enterprise, and the API.
Harmful content
OpenAI's usage policies state that they "report apparent child sexual abuse material (CSAM) to the National Center for Missing and Exploited Children."
2. Anthropic / Claude
In contrast with OpenAI, Anthropic's default setting is not to use customer inputs (or model outputs) to train its models.
User data becomes available for training only in two specific cases:
- When a user explicitly opts into training
- When a user submits feedback on a model response
Anthropic displays a prominent warning in the feedback form, noting that submission will make the entire conversation available for model training. The company also provides detailed information on its data retention practices.
3. Harvey / Counsel AI
Harvey's terms of service state that the company will not use customer content or data to train AI models.
The company may collect "Usage Data" to develop and improve its services, but this specifically excludes content, customer data, and customer confidential information.
On the other hand, any feedback provided to Harvey may be freely used and incorporated into the company's products and services.
4. vLex / Vincent
vLex's Terms and Conditions grant the company "an irrevocable, perpetual, transferable, sublicensable …, royalty-free, and worldwide right and license to use, … distribute and display" user-provided content for various purposes, including improving vLex's products and services.
Key concerns:
- vLex does not explicitly state whether user content is used for AI model training
- They provide no opt-out mechanisms for data use in service improvement
- The terms imply that both vLex staff and "third-foundational [sic]" (third party?) AI models may access user data
Of all the service provisions we've reviewed, vLex's are the most confusing ... and worrying. Caveat emptor.
5. CoCounsel / Thomson Reuters
Thomson Reuters's FAQ for CoCounsel explicitly states: "We do not use your user content and user prompts to train or improve CoCounsel Core or any third-party GenAI LLMs."
The company goes further with these protections:
- CoCounsel partners (including OpenAI and Google) are contractually prohibited from using customer data to train their models
- Thomson Reuters has established controls to turn off third parties' abuse monitoring solutions "where applicable"
Like Anthropic, CoCounsel does not train on customer data by default. Its privacy-related terms are clear and should address most lawyers' concerns.
One caveat: The phrase "where applicable" raises questions about whether they've disabled abuse monitoring for all vendors they use. Some of their partners or sub-processors may not offer this option, and hence user-provided content that triggers guardrail systems could be disclosed to third parties.
6. Microsoft
Microsoft is relevant because it is OpenAI's primary cloud partner and provides the APIs through which other applications (like CoCounsel) access OpenAI models.
Microsoft's abuse monitoring policies include:
- Content classification of text and images performed by classifier models
- Abuse monitoring of overall account behavior
- Review and decision, which can include "human eyes-on review"
- Notification and action, which can include notifying the customer and suspending access
Important exception: Customers using Microsoft for "highly sensitive or highly confidential data" can apply to limit abuse monitoring.
7. Casefleet
Our terms of service explicitly prohibit using customer data to train AI models.
Casefleet's customer data is encrypted at rest and in transit, and we use a virtual private cloud to prevent unauthorized access to user data.
We currently use AWS and Anthropic for all artificial intelligence functionality, with contractual provisions providing enhanced data protections and further guarantees required for HIPAA compliance.
Action Steps for Law Firms
The landscape of AI-powered legal tools presents both tremendous opportunities and significant risks for attorneys. While these technologies can enhance efficiency and provide valuable insights, they also create new pathways for inadvertent disclosure of client confidences.
The key takeaway: Not all AI platforms are created equal when it comes to protecting attorney-client privilege.
The safe choices: Services like Anthropic's Claude, Casefleet, Thomson Reuters's CoCounsel, and Harvey explicitly commit to not training models on user data by default.
The risky choices: Others, like OpenAI's ChatGPT, use individual user inputs for training unless the user specifically opts out. Most concerning are platforms like vLex, whose terms grant broad, irrevocable rights to user content with unclear protections.
Essential Steps Before Using Any AI Tool
Before incorporating any AI tool into your practice, take these essential steps:
- Read the fine print — Don't rely on marketing materials; review the actual terms of service and privacy policies
- Understand the default settings — Know whether your data is being used for training unless you opt out
- Verify contractual protections — Ensure the platform has agreements with underlying AI providers that protect your data
- Consider your risk tolerance — For highly sensitive matters, stick to platforms with the strongest privacy commitments
- Stay informed — Terms of service change, so regularly review the policies of tools you use
Conclusion
As the legal profession continues to embrace AI, we must remain vigilant guardians of our clients' confidences. The convenience of these tools should never come at the expense of our fundamental ethical obligations.
Choose platforms that respect the sanctity of attorney-client privilege. Always err on the side of caution when client confidentiality is at stake.
Remember: in our profession, trust is everything—and once lost, it's nearly impossible to regain.
Ready to experience AI-powered legal technology you can trust?
Casefleet combines powerful AI capabilities with the privacy protections your practice demands. Our platform helps law firms organize case facts, generate chronologies, and streamline legal research—all while maintaining the highest standards of client confidentiality.
Start your free 14-day trial today. No credit card required.