Published clinical case reports are a valuable yet underutilized source of evidence for drug repurposing. However, systematically identifying relevant reports remains a challenge due to the volume of literature and the diversity of candidate compounds. We present TheraMind, an AI system that leverages large language models (LLMs) to automate the identification and analysis of case reports supporting potential drug repurposing for non-small cell lung cancer (NSCLC). Our system screened 10,023 PubMed-indexed case reports across 18 candidate drugs using coordinated data extraction and standardized four-question prompts assessing diagnosis, drug administration, discontinuation, and clinical outcomes. We employed three evaluation strategies, rule-based classifiers, single-model validators, and a majority-vote ensemble integrating GPT-40-mini, Gemini-2.0-Flash, and LLaMA-3-8B. The ensemble approach achieved 92% recall and 99.7% specificity in detecting clinically relevant reports. Structured outputs included patient demographics, therapeutic responses, and case summaries. This LLM-driven framework offers a scalable approach to accelerate drug repurposing by mining real-world evidence from unstructured clinical literature.
DP4SQL: Differentially Private SQL with Flexible Privacy Policies
Andrew Cascio, KinChin Tong, Daniel Kifer, and 2 more authors
In CCS ’26, 2026
ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
Guanlin He, Yingtai Xiao, Levent Toksoz, and 3 more authors
The VLDB Journal, 2026
2025
Avoiding Floating-Point Side Channels in the Report Noisy Max with Gap Mechanism
Zeyu Ding, John Durrell, Daniel Kifer, and 5 more authors
Accurate cell segmentation underpins reliable radiomics and multi-omic analysis in digital pathology, yet foundation-scale models often output masks that require further refinement for clinical use. This study presents ClinSegAI, a post-processing tool designed to refine cell segmentation outputs from the BiomedParse foundation model to preserve radiomic feature integrity in hematoxylin and eosin (H&E) stained whole slide images. The proposed pipeline analyzes whole-slide images, applies preprocessing, and uses BiomedParse for initial nucleus segmentation, followed by a refinement algorithm that corrects segmentation boundaries and merges or splits regions as needed to maintain morphological fidelity. ClinSegAI was evaluated on lung and other cancer pathology slides against six alternative segmentation approaches, including conventional digital pathology software and state-of-the-art deep learning models. It achieved the highest average Dice Similarity Coefficient (DSC) - near 0.80 and substantially reduced segmentation errors, with the lowest 95th-percentile Hausdorff distance (HD95) and average symmetric surface distance (ASSD) among all methods. Crucially, the refined segmentations preserved radiomic feature distributions (shape, intensity, and texture metrics) closer to ground truth, assessed with pooled two-sample t-statistics, improving the fidelity of quantitative features for downstream analysis. These improvements enable more reliable integration of histopathology with other modalities, for example, correlating precise spatial segmentations with spatial transcriptomic data, improving prognostic models, characterizing immune infiltration in the tumor microenvironment, and enhancing treatment response prediction from tissue images. ClinSegAI demonstrates how targeted post-processing built on a foundational visual transformer can bolster segmentation accuracy and radiomics reliability in computational pathology.
2024
Reconstruction Attacks on Aggressive Relaxations of Differential Privacy
Prottay Protivash, John Durrell, Zeyu Ding, and 2 more authors
The Journal of Privacy and Confidentiality, Nov 2024
2023
Free gap estimates from the exponential mechanism, sparse vector, noisy max and related algorithms
Zeyu Ding, Yuxin Wang, Yingtai Xiao, and 3 more authors
Private selection algorithms, such as the exponential mechanism, noisy max and sparse vector, are used to select items (such as queries with large answers) from a set of candidates, while controlling privacy leakage in the underlying data. Such algorithms serve as building blocks for more complex differentially private algorithms. In this paper we show that these algorithms can release additional information related to the gaps between the selected items and the other candidates for free (i.e., at no additional privacy cost). This free gap information can improve the accuracy of certain follow-up counting queries by up to 66%. We obtain these results from a careful privacy analysis of these algorithms. Based on this analysis, we further propose novel hybrid algorithms that can dynamically save additional privacy budget.
2022
VLDBJ
Free Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms
Zeyu Ding, Yuxin Wang, Yingtai Xiao, and 3 more authors
@article{ding22freegapestimates,place={Country unknown/Code not available},title={Free Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms},url={https://par.nsf.gov/biblio/10337484},doi={10.1007/s00778-022-00728-2},journal={The VLDB Journal},author={Ding, Zeyu and Wang, Yuxin and Xiao, Yingtai and Wang, Guanhong and Zhang, Danfeng and Kifer, Daniel},year={2022}}
2021
CCS
DPGen: Automated Program Synthesis for Differential Privacy
Yuxin Wang, Zeyu Ding, Yingtai Xiao, and 2 more authors
In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, Jan 2021
Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then decides where to add noise, and how much of it to add. This can be a non-trivial process – if not done carefully, the algorithm might either violate differential privacy or have low utility.In this paper, we present DPGen, a program synthesizer that takes in non-private code (without any noise) and automatically synthesizes its differentially private version (with carefully calibrated noise). Under the hood, DPGen uses novel algorithms to automatically generate a sketch program with candidate locations for noise, and then optimize privacy proof and noise scales simultaneously on the sketch program. Moreover, DPGen can synthesize sophisticated mechanisms that adaptively process queries until a specified privacy budget is exhausted. When evaluated on standard benchmarks, DPGen is able to generate differentially private mechanisms that optimize simple utility functions within 120 seconds. It is also powerful enough to synthesize adaptive privacy mechanisms.
@inproceedings{dpgen,author={Wang, Yuxin and Ding, Zeyu and Xiao, Yingtai and Kifer, Daniel and Zhang, Danfeng},title={DPGen: Automated Program Synthesis for Differential Privacy},year={2021},isbn={9781450384544},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3460120.3484781},doi={10.1145/3460120.3484781},booktitle={Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security},pages={393–411},numpages={19},keywords={program synthesis, differential privacy},location={Virtual Event, Republic of Korea},series={CCS '21}}
arXiv
The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise
Zeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E., and 4 more authors
@misc{ding21Permute,doi={10.48550/ARXIV.2105.07260},url={https://arxiv.org/abs/2105.07260},author={Ding, Zeyu and Kifer, Daniel and E., Sayed M. Saghaian N. and Steinke, Thomas and Wang, Yuxin and Xiao, Yingtai and Zhang, Danfeng},keywords={Cryptography and Security (cs.CR), FOS: Computer and information sciences, FOS: Computer and information sciences},title={The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise},publisher={arXiv},year={2021},copyright={Creative Commons Attribution 4.0 International}}
2020
CCS
CheckDP: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples
Yuxin Wang, Zeyu Ding, Daniel Kifer, and 1 more author
In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, Jan 2020
We propose CheckDP, an automated and integrated approach for proving or disproving claims that a mechanism is differentially private. CheckDP can find counterexamples for mechanisms with subtle bugs for which prior counterexample generators have failed. Furthermore, it was able to automatically generate proofs for correct mechanisms for which no formal verification was reported before. CheckDP is built on static program analysis, allowing it to be more efficient and precise in catching infrequent events than sampling based counterexample generators (which run mechanisms hundreds of thousands of times to estimate their output distribution). Moreover, its sound approach also allows automatic verification of correct mechanisms. When evaluated on standard benchmarks and newer privacy mechanisms, CheckDP generates proofs (for correct mechanisms) and counterexamples (for incorrect mechanisms) within 70 seconds without any false positives or false negatives.
@inproceedings{checkdp,author={Wang, Yuxin and Ding, Zeyu and Kifer, Daniel and Zhang, Danfeng},title={CheckDP: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples},year={2020},isbn={9781450370899},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3372297.3417282},doi={10.1145/3372297.3417282},booktitle={Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security},pages={919–938},numpages={20},keywords={formal verification, counterexample detection, differential privacy},location={Virtual Event, USA},series={CCS '20}}
2019
PVLDB
Free Gap Information from the Differentially Private Sparse Vector and Noisy Max Mechanisms
Zeyu Ding, Yuxin Wang, Danfeng Zhang, and 1 more author
Noisy Max and Sparse Vector are selection algorithms for differential privacy and serve as building blocks for more complex algorithms. In this paper we show that both algorithms can release additional information for free (i.e., at no additional privacy cost). Noisy Max is used to return the approximate maximizer among a set of queries. We show that it can also release for free the noisy gap between the approximate maximizer and runner-up. This free information can improve the accuracy of certain subsequent counting queries by up to 50%. Sparse Vector is used to return a set of queries that are approximately larger than a fixed threshold. We show that it can adaptively control its privacy budget (use less budget for queries that are likely to be much larger than the threshold) in order to increase the amount of queries it can process. These results follow from a careful privacy analysis.
@article{ding19freegapinfo,author={Ding, Zeyu and Wang, Yuxin and Zhang, Danfeng and Kifer, Daniel},title={Free Gap Information from the Differentially Private Sparse Vector and Noisy Max Mechanisms},year={2019},issue_date={November 2019},publisher={VLDB Endowment},volume={13},number={3},issn={2150-8097},url={https://doi.org/10.14778/3368289.3368295},doi={10.14778/3368289.3368295},journal={Proc. VLDB Endow.},month=nov,pages={293–306},numpages={14}}
PLDI
Proving Differential Privacy with Shadow Execution
Yuxin Wang, Zeyu Ding, Guanhong Wang, and 2 more authors
In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, Phoenix, AZ, USA, Nov 2019
Recent work on formal verification of differential privacy shows a trend toward usability and expressiveness – generating a correctness proof of sophisticated algorithm while minimizing the annotation burden on programmers. Sometimes, combining those two requires substantial changes to program logics: one recent paper is able to verify Report Noisy Max automatically, but it involves a complex verification system using customized program logics and verifiers. In this paper, we propose a new proof technique, called shadow execution, and embed it into a language called ShadowDP. ShadowDP uses shadow execution to generate proofs of differential privacy with very few programmer annotations and without relying on customized logics and verifiers. In addition to verifying Report Noisy Max, we show that it can verify a new variant of Sparse Vector that reports the gap between some noisy query answers and the noisy threshold. Moreover, ShadowDP reduces the complexity of verification: for all of the algorithms we have evaluated, type checking and verification in total takes at most 3 seconds, while prior work takes minutes on the same algorithms.
@inproceedings{shadowdp,author={Wang, Yuxin and Ding, Zeyu and Wang, Guanhong and Kifer, Daniel and Zhang, Danfeng},title={Proving Differential Privacy with Shadow Execution},year={2019},isbn={9781450367127},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3314221.3314619},doi={10.1145/3314221.3314619},booktitle={Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation},pages={655–669},numpages={15},keywords={Differential privacy, dependent types},location={Phoenix, AZ, USA},series={PLDI 2019}}
2018
CCS
Detecting Violations of Differential Privacy
Zeyu Ding, Yuxin Wang, Guanhong Wang, and 2 more authors
In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, Canada, Nov 2018
The widespread acceptance of differential privacy has led to the publication of many sophisticated algorithms for protecting privacy. However, due to the subtle nature of this privacy definition, many such algorithms have bugs that make them violate their claimed privacy. In this paper, we consider the problem of producing counterexamples for such incorrect algorithms. The counterexamples are designed to be short and human-understandable so that the counterexample generator can be used in the development process – a developer could quickly explore variations of an algorithm and investigate where they break down. Our approach is statistical in nature. It runs a candidate algorithm many times and uses statistical tests to try to detect violations of differential privacy. An evaluation on a variety of incorrect published algorithms validates the usefulness of our approach: it correctly rejects incorrect algorithms and provides counterexamples for them within a few seconds.
@inproceedings{statdp,author={Ding, Zeyu and Wang, Yuxin and Wang, Guanhong and Zhang, Danfeng and Kifer, Daniel},title={Detecting Violations of Differential Privacy},year={2018},isbn={9781450356930},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3243734.3243818},doi={10.1145/3243734.3243818},booktitle={Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security},pages={475–489},numpages={15},keywords={differential privacy, statistical testing, counterexample detection},location={Toronto, Canada},series={CCS '18}}