Synthetic data generator tool

6/4/2023

Generates synthetic data and user interfaces for privacy-preserving data sharing and analysis. It is available as a free-to-use web application for private data release.

Overview

In many cases, the best way to share sensitive datasets is not to share the actual sensitive datasets, but user interfaces to derived datasets that are inherently anonymous. Our name for such an interface is a data showcase.

In this project, we provide an automated set of tools for generating the three elements of a synthetic data showcase: (1) synthetic data representing the overall structure and statistics of the input data, without describing actual identifiable individuals; (2) aggregate data reporting the number of individuals with different combinations of attributes, without disclosing exact counts; and (3) data dashboards enabling exploratory visual analysis of both datasets, without the need for custom data science or interface development.

To generate these elements, our tool provides two approaches to create anonymous datasets that are safe to release: (i) differential privacy and (ii) k-anonymity.

The paradigm of differential privacy (DP) offers "safety in noise": just enough calibrated noise is added to the data to control the maximum possible privacy loss, $\varepsilon$ (epsilon). When applied in the context of private data release, $\varepsilon$ bounds the ratio of probabilities of getting an arbitrary result to an arbitrary computation when using two synthetic datasets, one generated from the sensitive dataset itself and the other from a neighboring dataset missing a single arbitrary record.

For a detailed explanation of how SDS (the synthetic data showcase) uses differential privacy, please check our DP documentation.

Our approach to synthesizing data with differential privacy first protects attribute combination counts in the aggregate data using our DP Marginals algorithm and then uses the resulting DP aggregate counts to derive synthetic records that retain differential privacy under the post-processing property.

Any differentially private dataset should be evaluated for potential risks in situations where missing, fabricated, or inaccurate counts of attribute combinations could trigger inappropriate downstream decisions or actions. Use of our differential privacy synthesizer is recommended for repeated data releases where cumulative privacy loss must be quantified and controlled and where provable guarantees against all possible privacy attacks are desired.

Our DP synthesizer prioritises the release of accurate combination counts (with minimal noise) of actual combinations (with minimal fabrication).
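To make the idea of calibrated noise on attribute combination counts more concrete, here is a minimal Python sketch. It is not the tool's DP Marginals algorithm, whose details are not given here. It swaps in the textbook Laplace mechanism over a fixed, publicly known domain of attribute values. The function names, the `domain` argument, and the example columns are all hypothetical, and every record is assumed to carry exactly one value per column.

```python
import itertools
import math
import random


def all_combinations(domain, max_len):
    """Yield every combination of up to max_len (column, value) pairs in the domain."""
    columns = sorted(domain)
    for k in range(1, max_len + 1):
        for cols in itertools.combinations(columns, k):
            for values in itertools.product(*(domain[c] for c in cols)):
                yield tuple(zip(cols, values))


def exact_counts(records, domain, max_len):
    """Count how many records match each combination, including zero counts."""
    counts = {combo: 0 for combo in all_combinations(domain, max_len)}
    columns = sorted(domain)
    for record in records:
        for k in range(1, max_len + 1):
            for cols in itertools.combinations(columns, k):
                counts[tuple((c, record[c]) for c in cols)] += 1
    return counts


def l1_sensitivity(num_columns, max_len):
    """Adding or removing one record changes one count per column subset by 1."""
    return sum(math.comb(num_columns, k) for k in range(1, max_len + 1))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(records, domain, max_len, epsilon, rng):
    """Illustrative Laplace mechanism: noise scale = L1 sensitivity / epsilon."""
    scale = l1_sensitivity(len(domain), max_len) / epsilon
    exact = exact_counts(records, domain, max_len)
    # Noise is added to every cell of the fixed domain, observed or not,
    # so the set of released keys does not depend on the sensitive data.
    return {combo: count + laplace_noise(scale, rng) for combo, count in exact.items()}


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    domain = {"age": ["<40", "40+"], "smoker": ["no", "yes"]}
    records = [
        {"age": "<40", "smoker": "no"},
        {"age": "<40", "smoker": "yes"},
        {"age": "40+", "smoker": "no"},
    ]
    released = noisy_counts(records, domain, max_len=2, epsilon=1.0, rng=random.Random(0))
    for combo, value in sorted(released.items()):
        print(combo, round(value, 2))
```

Smaller values of `epsilon` give a larger noise scale and therefore less accurate counts. Any synthetic records derived only from the noisy counts would keep the same guarantee through the post-processing property. A practical system would also need to handle negative or fractional noisy counts and much larger domains, which this sketch ignores.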
