Publications | Zeyu D. Ding

Publications in reverse chronological order. Generated by jekyll-scholar.

For the full list, please visit my Google Scholar page.

2026

Accurate and Scalable Matrix Mechanisms via Divide and Conquer

Guanlin He, Yingtai Xiao, Jiamu Bai, and 4 more authors

arXiv/2604.00868, 2026
TheraMind: a multi-LLM ensemble for accelerating drug repurposing in lung cancer via case report mining

Vrushket More, Lyra Lu, Zeyu Ding, and 3 more authors

npj Precision Oncology, Jan 2026

Published clinical case reports are a valuable yet underutilized source of evidence for drug repurposing. However, systematically identifying relevant reports remains a challenge due to the volume of literature and the diversity of candidate compounds. We present TheraMind, an AI system that leverages large language models (LLMs) to automate the identification and analysis of case reports supporting potential drug repurposing for non-small cell lung cancer (NSCLC). Our system screened 10,023 PubMed-indexed case reports across 18 candidate drugs using coordinated data extraction and standardized four-question prompts assessing diagnosis, drug administration, discontinuation, and clinical outcomes. We employed three evaluation strategies: rule-based classifiers, single-model validators, and a majority-vote ensemble integrating GPT-4o-mini, Gemini-2.0-Flash, and LLaMA-3-8B. The ensemble approach achieved 92% recall and 99.7% specificity in detecting clinically relevant reports. Structured outputs included patient demographics, therapeutic responses, and case summaries. This LLM-driven framework offers a scalable approach to accelerate drug repurposing by mining real-world evidence from unstructured clinical literature.
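The majority-vote step and the two reported metrics can be illustrated with a minimal sketch. It assumes each validator returns a boolean relevance verdict per report; the function names and the toy verdicts below are hypothetical and are not taken from the TheraMind system.

```python
from typing import Sequence, Tuple


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the validators vote True."""
    return 2 * sum(votes) > len(votes)


def recall_and_specificity(predicted: Sequence[bool],
                           actual: Sequence[bool]) -> Tuple[float, float]:
    """Compute recall = TP / (TP + FN) and specificity = TN / (TN + FP).

    Assumes `actual` contains at least one positive and one negative label.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical verdicts from three validators on four reports (illustrative only).
verdicts = [
    (True, True, False),
    (False, False, True),
    (True, True, True),
    (False, False, False),
]
ensemble = [majority_vote(v) for v in verdicts]
```

With three validators, a report is kept only when at least two agree that it is relevant, which is one plausible reason an ensemble can trade a little recall for high specificity.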
DP4SQL: Differentially Private SQL with Flexible Privacy Policies

Andrew Cascio, KinChin Tong, Daniel Kifer, and 2 more authors

In CCS ’26, 2026
ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

Guanlin He, Yingtai Xiao, Levent Toksoz, and 3 more authors

The VLDB Journal, 2026

2025

Avoiding Floating-Point Side Channels in the Report Noisy Max with Gap Mechanism

Zeyu Ding, John Durrell, Daniel Kifer, and 5 more authors

Journal of Privacy and Confidentiality, Dec 2025

ClinSegAI: A post-processing framework for superior histopathology segmentation accuracy, radiomics feature preservation, and quantitative analysis

Prem Bhajaj, Saiprakash Nalubolu, Bhargavram Gurram, and 7 more authors

Computers in Biology and Medicine, Nov 2025

Accurate cell segmentation underpins reliable radiomics and multi-omic analysis in digital pathology, yet foundation-scale models often output masks that require further refinement for clinical use. This study presents ClinSegAI, a post-processing tool designed to refine cell segmentation outputs from the BiomedParse foundation model to preserve radiomic feature integrity in hematoxylin and eosin (H&E) stained whole slide images. The proposed pipeline analyzes whole-slide images, applies preprocessing, and uses BiomedParse for initial nucleus segmentation, followed by a refinement algorithm that corrects segmentation boundaries and merges or splits regions as needed to maintain morphological fidelity. ClinSegAI was evaluated on lung and other cancer pathology slides against six alternative segmentation approaches, including conventional digital pathology software and state-of-the-art deep learning models. It achieved the highest average Dice Similarity Coefficient (DSC), near 0.80, and substantially reduced segmentation errors, with the lowest 95th-percentile Hausdorff distance (HD95) and average symmetric surface distance (ASSD) among all methods. Crucially, the refined segmentations preserved radiomic feature distributions (shape, intensity, and texture metrics) closer to ground truth, assessed with pooled two-sample t-statistics, improving the fidelity of quantitative features for downstream analysis. These improvements enable more reliable integration of histopathology with other modalities, for example, correlating precise spatial segmentations with spatial transcriptomic data, improving prognostic models, characterizing immune infiltration in the tumor microenvironment, and enhancing treatment response prediction from tissue images. ClinSegAI demonstrates how targeted post-processing built on a foundational visual transformer can bolster segmentation accuracy and radiomics reliability in computational pathology.
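For readers unfamiliar with the Dice Similarity Coefficient mentioned above, the following is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), on binary masks. The list-of-rows mask representation and the toy masks are assumptions made for illustration; this is not ClinSegAI's evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice Similarity Coefficient of two equally sized binary masks (0/1 entries)."""
    # Count pixels that are foreground in both masks.
    intersection = sum(
        p & t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    # Total foreground pixels across both masks.
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (illustrative only): 2 overlapping pixels, 3 foreground pixels each.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
score = dice_coefficient(predicted_mask, ground_truth)
```

Here the two masks share 2 of their 3 + 3 foreground pixels, so the score is 4/6.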

2024

Reconstruction Attacks on Aggressive Relaxations of Differential Privacy

Prottay Protivash, John Durrell, Zeyu Ding, and 2 more authors

The Journal of Privacy and Confidentiality, 2024

2023

Free gap estimates from the exponential mechanism, sparse vector, noisy max and related algorithms

Zeyu Ding, Yuxin Wang, Yingtai Xiao, and 3 more authors

The VLDB Journal, Jan 2023

Private selection algorithms, such as the exponential mechanism, noisy max and sparse vector, are used to select items (such as queries with large answers) from a set of candidates, while controlling privacy leakage in the underlying data. Such algorithms serve as building blocks for more complex differentially private algorithms. In this paper we show that these algorithms can release additional information related to the gaps between the selected items and the other candidates for free (i.e., at no additional privacy cost). This free gap information can improve the accuracy of certain follow-up counting queries by up to 66%. We obtain these results from a careful privacy analysis of these algorithms. Based on this analysis, we further propose novel hybrid algorithms that can dynamically save additional privacy budget.
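As background for the private selection algorithms named above, here is a minimal sketch of the textbook exponential mechanism, which samples candidate i with probability proportional to exp(ε · u_i / (2Δ)). The function name, the default sensitivity of 1, and the example utilities are assumptions for illustration. The sketch does not release the gap information studied in the paper and is not hardened against floating-point issues.

```python
import math
import random
from typing import Optional, Sequence


def exponential_mechanism(utilities: Sequence[float],
                          epsilon: float,
                          sensitivity: float = 1.0,
                          rng: Optional[random.Random] = None) -> int:
    """Return the index of a candidate sampled by the exponential mechanism."""
    rng = rng or random.Random()
    best = max(utilities)
    # Subtracting the maximum utility avoids overflow and leaves the
    # sampling distribution unchanged (it rescales every weight equally).
    weights = [
        math.exp(epsilon * (u - best) / (2.0 * sensitivity))
        for u in utilities
    ]
    return rng.choices(range(len(utilities)), weights=weights, k=1)[0]


# Illustrative call: with a very large epsilon the true maximizer is selected.
chosen = exponential_mechanism([1.0, 5.0, 3.0], epsilon=1000.0,
                               rng=random.Random(0))
```

Smaller values of ε flatten the weights, so lower-utility candidates are chosen more often and the selection reveals less about the data.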

2022

VLDBJ

Free Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms

Zeyu Ding, Yuxin Wang, Yingtai Xiao, and 3 more authors

The VLDB Journal, 2022

@article{ding22freegapestimates,
  title = {Free Gap Estimates from the Exponential Mechanism, Sparse Vector, Noisy Max and Related Algorithms},
  url = {https://par.nsf.gov/biblio/10337484},
  doi = {10.1007/s00778-022-00728-2},
  journal = {The VLDB Journal},
  author = {Ding, Zeyu and Wang, Yuxin and Xiao, Yingtai and Wang, Guanhong and Zhang, Danfeng and Kifer, Daniel},
  year = {2022}
}

2021

CCS
DPGen: Automated Program Synthesis for Differential Privacy

Yuxin Wang, Zeyu Ding, Yingtai Xiao, and 2 more authors

In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, 2021

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then decides where to add noise, and how much of it to add. This can be a non-trivial process – if not done carefully, the algorithm might either violate differential privacy or have low utility. In this paper, we present DPGen, a program synthesizer that takes in non-private code (without any noise) and automatically synthesizes its differentially private version (with carefully calibrated noise). Under the hood, DPGen uses novel algorithms to automatically generate a sketch program with candidate locations for noise, and then optimize privacy proof and noise scales simultaneously on the sketch program. Moreover, DPGen can synthesize sophisticated mechanisms that adaptively process queries until a specified privacy budget is exhausted. When evaluated on standard benchmarks, DPGen is able to generate differentially private mechanisms that optimize simple utility functions within 120 seconds. It is also powerful enough to synthesize adaptive privacy mechanisms.
@inproceedings{dpgen,
  author = {Wang, Yuxin and Ding, Zeyu and Xiao, Yingtai and Kifer, Daniel and Zhang, Danfeng},
  title = {DPGen: Automated Program Synthesis for Differential Privacy},
  year = {2021},
  isbn = {9781450384544},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3460120.3484781},
  doi = {10.1145/3460120.3484781},
  booktitle = {Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security},
  pages = {393–411},
  numpages = {19},
  keywords = {program synthesis, differential privacy},
  location = {Virtual Event, Republic of Korea},
  series = {CCS '21}
}

arXiv

The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise

Zeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E., and 4 more authors

2021

@misc{ding21Permute,
  doi = {10.48550/ARXIV.2105.07260},
  url = {https://arxiv.org/abs/2105.07260},
  author = {Ding, Zeyu and Kifer, Daniel and E., Sayed M. Saghaian N. and Steinke, Thomas and Wang, Yuxin and Xiao, Yingtai and Zhang, Danfeng},
  keywords = {Cryptography and Security (cs.CR), FOS: Computer and information sciences},
  title = {The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise},
  publisher = {arXiv},
  year = {2021},
  copyright = {Creative Commons Attribution 4.0 International}
}

2020

CCS
CheckDP: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples

Yuxin Wang, Zeyu Ding, Daniel Kifer, and 1 more author

In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, 2020

We propose CheckDP, an automated and integrated approach for proving or disproving claims that a mechanism is differentially private. CheckDP can find counterexamples for mechanisms with subtle bugs for which prior counterexample generators have failed. Furthermore, it was able to automatically generate proofs for correct mechanisms for which no formal verification was reported before. CheckDP is built on static program analysis, allowing it to be more efficient and precise in catching infrequent events than sampling-based counterexample generators (which run mechanisms hundreds of thousands of times to estimate their output distribution). Moreover, its sound approach also allows automatic verification of correct mechanisms. When evaluated on standard benchmarks and newer privacy mechanisms, CheckDP generates proofs (for correct mechanisms) and counterexamples (for incorrect mechanisms) within 70 seconds without any false positives or false negatives.
@inproceedings{checkdp,
  author = {Wang, Yuxin and Ding, Zeyu and Kifer, Daniel and Zhang, Danfeng},
  title = {CheckDP: An Automated and Integrated Approach for Proving Differential Privacy or Finding Precise Counterexamples},
  year = {2020},
  isbn = {9781450370899},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3372297.3417282},
  doi = {10.1145/3372297.3417282},
  booktitle = {Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security},
  pages = {919–938},
  numpages = {20},
  keywords = {formal verification, counterexample detection, differential privacy},
  location = {Virtual Event, USA},
  series = {CCS '20}
}

2019

PVLDB
Free Gap Information from the Differentially Private Sparse Vector and Noisy Max Mechanisms

Zeyu Ding, Yuxin Wang, Danfeng Zhang, and 1 more author

Proc. VLDB Endow., Nov 2019

Noisy Max and Sparse Vector are selection algorithms for differential privacy and serve as building blocks for more complex algorithms. In this paper we show that both algorithms can release additional information for free (i.e., at no additional privacy cost). Noisy Max is used to return the approximate maximizer among a set of queries. We show that it can also release for free the noisy gap between the approximate maximizer and runner-up. This free information can improve the accuracy of certain subsequent counting queries by up to 50%. Sparse Vector is used to return a set of queries that are approximately larger than a fixed threshold. We show that it can adaptively control its privacy budget (use less budget for queries that are likely to be much larger than the threshold) in order to increase the number of queries it can process. These results follow from a careful privacy analysis.
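The idea of releasing the noisy gap alongside the Noisy Max winner can be sketched as follows. This is a simplified illustration under stated assumptions: queries have sensitivity 1, each answer receives independent Laplace noise of scale 2/ε, and at least two queries are supplied. The function names are hypothetical, and the code is not a hardened implementation (naive floating-point noise sampling can leak information).

```python
import random
from typing import Optional, Sequence, Tuple


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_with_gap(answers: Sequence[float],
                       epsilon: float,
                       rng: Optional[random.Random] = None) -> Tuple[int, float]:
    """Return (index of the noisy maximizer, noisy gap to the runner-up)."""
    rng = rng or random.Random()
    scale = 2.0 / epsilon  # assumption: sensitivity-1 queries
    noisy = [a + laplace(scale, rng) for a in answers]
    # Rank query indices by their noisy answers, largest first.
    order = sorted(range(len(noisy)), key=noisy.__getitem__, reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, noisy[winner] - noisy[runner_up]


# Illustrative call: a huge epsilon makes the noise negligible.
index, gap = noisy_max_with_gap([10.0, 50.0, 20.0], epsilon=1e6,
                                rng=random.Random(0))
```

The gap is computed from the same noisy values used to pick the winner, so no extra noise draws are needed to produce it.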
@article{ding19freegapinfo,
  author = {Ding, Zeyu and Wang, Yuxin and Zhang, Danfeng and Kifer, Daniel},
  title = {Free Gap Information from the Differentially Private Sparse Vector and Noisy Max Mechanisms},
  year = {2019},
  issue_date = {November 2019},
  publisher = {VLDB Endowment},
  volume = {13},
  number = {3},
  issn = {2150-8097},
  url = {https://doi.org/10.14778/3368289.3368295},
  doi = {10.14778/3368289.3368295},
  journal = {Proc. VLDB Endow.},
  month = nov,
  pages = {293–306},
  numpages = {14}
}
PLDI
Proving Differential Privacy with Shadow Execution

Yuxin Wang, Zeyu Ding, Guanhong Wang, and 2 more authors

In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, Phoenix, AZ, USA, 2019

Recent work on formal verification of differential privacy shows a trend toward usability and expressiveness – generating a correctness proof of sophisticated algorithms while minimizing the annotation burden on programmers. Sometimes, combining those two requires substantial changes to program logics: one recent paper is able to verify Report Noisy Max automatically, but it involves a complex verification system using customized program logics and verifiers. In this paper, we propose a new proof technique, called shadow execution, and embed it into a language called ShadowDP. ShadowDP uses shadow execution to generate proofs of differential privacy with very few programmer annotations and without relying on customized logics and verifiers. In addition to verifying Report Noisy Max, we show that it can verify a new variant of Sparse Vector that reports the gap between some noisy query answers and the noisy threshold. Moreover, ShadowDP reduces the complexity of verification: for all of the algorithms we have evaluated, type checking and verification in total takes at most 3 seconds, while prior work takes minutes on the same algorithms.
@inproceedings{shadowdp,
  author = {Wang, Yuxin and Ding, Zeyu and Wang, Guanhong and Kifer, Daniel and Zhang, Danfeng},
  title = {Proving Differential Privacy with Shadow Execution},
  year = {2019},
  isbn = {9781450367127},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3314221.3314619},
  doi = {10.1145/3314221.3314619},
  booktitle = {Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation},
  pages = {655–669},
  numpages = {15},
  keywords = {Differential privacy, dependent types},
  location = {Phoenix, AZ, USA},
  series = {PLDI 2019}
}

2018

CCS
Detecting Violations of Differential Privacy

Zeyu Ding, Yuxin Wang, Guanhong Wang, and 2 more authors

In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, Canada, 2018

The widespread acceptance of differential privacy has led to the publication of many sophisticated algorithms for protecting privacy. However, due to the subtle nature of this privacy definition, many such algorithms have bugs that make them violate their claimed privacy. In this paper, we consider the problem of producing counterexamples for such incorrect algorithms. The counterexamples are designed to be short and human-understandable so that the counterexample generator can be used in the development process – a developer could quickly explore variations of an algorithm and investigate where they break down. Our approach is statistical in nature. It runs a candidate algorithm many times and uses statistical tests to try to detect violations of differential privacy. An evaluation on a variety of incorrect published algorithms validates the usefulness of our approach: it correctly rejects incorrect algorithms and provides counterexamples for them within a few seconds.
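The general idea of running a mechanism many times and statistically comparing event frequencies can be sketched as below. This is a deliberately simplified illustration, not the hypothesis test used in the paper: it uses a normal-approximation margin, a single hand-picked pair of adjacent inputs (counts 1 and 0), one fixed output event, and checks only one direction of the inequality. All names are hypothetical.

```python
import math
import random
from typing import Callable


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def make_laplace_mechanism(scale: float) -> Callable[[float, random.Random], float]:
    """Build a mechanism that adds Laplace noise of the given scale to a count."""
    def mechanism(count: float, rng: random.Random) -> float:
        return count + laplace(scale, rng)
    return mechanism


def count_event(mechanism, data, event, runs: int, rng: random.Random) -> int:
    """Count how often the mechanism's output falls in the event."""
    return sum(1 for _ in range(runs) if event(mechanism(data, rng)))


def suspect_violation(c1: int, c2: int, runs: int, epsilon: float,
                      z: float = 3.0) -> bool:
    """Flag when p1 exceeds e^epsilon * p2 by more than z standard errors."""
    p1, p2 = c1 / runs, c2 / runs
    bound = math.exp(epsilon)
    std_err = math.sqrt((p1 * (1 - p1) + bound ** 2 * p2 * (1 - p2)) / runs)
    return p1 - bound * p2 > z * std_err


def check(mechanism, claimed_epsilon: float, runs: int = 20000,
          seed: int = 0) -> bool:
    """Compare adjacent inputs 1 and 0 on the event 'output > 0.5'."""
    rng = random.Random(seed)
    event = lambda out: out > 0.5
    c1 = count_event(mechanism, 1.0, event, runs, rng)
    c2 = count_event(mechanism, 0.0, event, runs, rng)
    return suspect_violation(c1, c2, runs, claimed_epsilon)


correct_flagged = check(make_laplace_mechanism(1.0), claimed_epsilon=1.0)
buggy_flagged = check(make_laplace_mechanism(0.25), claimed_epsilon=1.0)
```

In this toy setup, the mechanism with noise scale 1 satisfies the claimed ε = 1 for a sensitivity-1 count, while the one with scale 0.25 adds too little noise for that claim, so only the latter should be flagged.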
@inproceedings{statdp,
  author = {Ding, Zeyu and Wang, Yuxin and Wang, Guanhong and Zhang, Danfeng and Kifer, Daniel},
  title = {Detecting Violations of Differential Privacy},
  year = {2018},
  isbn = {9781450356930},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3243734.3243818},
  doi = {10.1145/3243734.3243818},
  booktitle = {Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security},
  pages = {475–489},
  numpages = {15},
  keywords = {differential privacy, statistical testing, counterexample detection},
  location = {Toronto, Canada},
  series = {CCS '18}
}