Open Data & Tools for Legal AI

The companion repository to LegalRealist AI: datasets, tools, and source code for legal research and practice. All code is available on GitHub.

Data
Court orders on AI in legal proceedings. Search by judge, state, or order type, with free links to the original sources (no Westlaw or Lexis required). Sources: RAILS (Duke Law), Ropes & Gray, LegalAIGovernance.
289 excluded Medicare providers, matched to their pre-exclusion CMS billing data and compared against 3.39 million peers. 13 of 15 features remain significant after Bonferroni correction. A predictive model (AUC 0.79) identifies providers with real enforcement histories who were never excluded.
FinCEN SARs + FCPA (coming soon)
Synthetic Suspicious Activity Reports combined with Stanford FCPA enforcement records for anomaly-detection and classification research.
Code
Parser differential attack PoC: Excel number formats that make LLMs read different financial figures from the ones humans see. Includes the SheetGuard detection tool and proof-of-concept files.
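To show the kind of check a detector for this attack could run, here is a minimal Python sketch. It assumes the number-format codes have already been extracted from the workbook (for example from the `formatCode` attributes in `xl/styles.xml`). It flags two patterns: an empty numeric section (as in `;;;`, which hides values on screen) and quoted or escaped literal digits that display a fixed number regardless of the stored value. The function names and both heuristics are hypothetical illustrations, not SheetGuard's actual implementation.

```python
import re

# Literal text in a format code: a "quoted string" or a backslash-escaped single character.
LITERAL = re.compile(r'"([^"]*)"|\\(.)')


def split_sections(fmt: str) -> list[str]:
    """Split a number-format code on ';', ignoring semicolons inside quotes or after a backslash."""
    sections, buf, in_quote, i = [], [], False, 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == '"':
            in_quote = not in_quote
            buf.append(ch)
        elif ch == "\\" and not in_quote and i + 1 < len(fmt):
            buf.append(fmt[i : i + 2])  # keep the escape and the escaped character together
            i += 1
        elif ch == ";" and not in_quote:
            sections.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    sections.append("".join(buf))
    return sections


def suspicious_reasons(fmt: str) -> list[str]:
    """Heuristic reasons a format code may show a human something other than the stored value."""
    reasons = []
    # The first three sections (positive; negative; zero) format numbers; a fourth formats text.
    for idx, section in enumerate(split_sections(fmt)[:3]):
        if section.strip() == "":
            reasons.append(f"section {idx + 1} is empty, so values routed to it render blank")
        literals = "".join(quoted or escaped for quoted, escaped in LITERAL.findall(section))
        if re.search(r"\d", literals):
            reasons.append(f"section {idx + 1} displays literal digits {literals!r}")
    return reasons
```

A format such as `"1,000,000"` shows a human that figure for any positive cell, while a parser reading the raw cell value sees the real number, so `suspicious_reasons('"1,000,000"')` reports the literal digits. Ordinary currency formats like `"$"#,##0;[Red]-"$"#,##0` pass cleanly. Literal digits can also be legitimate (a year in a label, say), so these hits are candidates for review rather than verdicts.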
Compare traditional and AI-enhanced eDiscovery workflows. AI handles document processing so attorneys can focus on judgment work, and their corrections feed back to improve accuracy. Adjust staffing, risk profiles, and AI efficiency.
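The simulator's internal model is not described on this page. The following is only a hypothetical back-of-envelope model of the same trade-off, written in Python. `ai_efficiency` is assumed to be the share of documents the AI disposes of without an attorney first pass, and `qc_rate` is the share of those documents attorneys still spot-check. A higher QC rate stands in for a more conservative risk profile. All names, parameters, and the 8-hour day are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Matter:
    documents: int            # documents in the review set
    docs_per_hour: float      # attorney first-pass review speed
    attorneys: int            # reviewers staffed on the matter
    hours_per_day: float = 8.0


def traditional_hours(m: Matter) -> float:
    # Every document gets an attorney first-pass review.
    return m.documents / m.docs_per_hour


def ai_enhanced_hours(m: Matter, ai_efficiency: float, qc_rate: float) -> float:
    # Attorneys review what the AI does not handle, plus a QC sample of what it does.
    if not (0.0 <= ai_efficiency <= 1.0 and 0.0 <= qc_rate <= 1.0):
        raise ValueError("ai_efficiency and qc_rate must be between 0 and 1")
    ai_handled = m.documents * ai_efficiency
    attorney_reviewed = (m.documents - ai_handled) + ai_handled * qc_rate
    return attorney_reviewed / m.docs_per_hour


def calendar_days(hours: float, m: Matter) -> float:
    # Spread total review hours across the staffed team.
    return hours / (m.attorneys * m.hours_per_day)
```

With 100,000 documents, 50 documents per hour, and 10 attorneys, the traditional workflow needs 2,000 hours (25 working days). At `ai_efficiency=0.7` and `qc_rate=0.1`, attorneys review 30,000 + 7,000 = 37,000 documents, or 740 hours (9.25 days). This toy model leaves out the feedback loop, in which corrections would raise `ai_efficiency` over successive batches.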
AI-maintained knowledge base powered by Claude Code. Drop in your documents and get a searchable, interlinked wiki that is automatically structured and cross-referenced.
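The knowledge base's own linking is done by Claude Code and isn't documented here. As a simple, deterministic illustration of what "cross-referenced" can mean, this minimal Python sketch builds a backlink index by scanning every page for other pages' titles. The page format (a title-to-text dictionary) and the function name are hypothetical.

```python
import re
from collections import defaultdict


def cross_reference(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page title to the set of other pages whose text mentions it."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for title in pages:
        # Whole-word, case-insensitive match on the title; assumes titles begin
        # and end with word characters so that \b boundaries apply.
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        for source, body in pages.items():
            if source != title and pattern.search(body):
                backlinks[title].add(source)
    return dict(backlinks)
```

Given a "Rule 702" page that mentions "the Daubert standard for expert witnesses," the index lists "Rule 702" as a backlink for both the "Daubert" and "Expert Witnesses" pages. Exact title matching misses synonyms and paraphrases, which is the gap an LLM-maintained wiki is meant to close.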