gen-too/primat


PRIMAT: Private Matching Toolbox

PRIMAT is an open-source (ALv2) toolbox for defining and executing privacy-preserving record linkage (PPRL) workflows. It offers several components for data owners and the central linkage unit that provide state-of-the-art PPRL methods, including Bloom-filter-based encoding and hardening techniques, LSH-based blocking, metric space filtering, post-processing and more.

PRIMAT is developed by the Database Group of the University of Leipzig, Germany.

⚠️⚠️⚠️ Attention: This repository contains the first PRIMAT release, which was presented at VLDB 2019. Information on the (demo) showcase applications can be found below. Since then, we have extensively refactored the code base to simplify usage and to improve extensibility and maintainability. As of December 2021, new PRIMAT versions are released at https://git.informatik.uni-leipzig.de/dbs/pprl/primat.

Privacy-preserving Record Linkage

  • Task of identifying records in different databases that refer to the same person
  • Protection of sensitive personal information
  • Applications in medicine & healthcare, national security and marketing analysis

Key Challenges

  • Guarantee privacy by minimizing disclosure risk
  • Scalability to millions of records
  • High linkage quality

PRIMAT

  • PPRL tool covering the entire PPRL life-cycle
  • Flexible definition and execution of PPRL workflows
  • Comparative evaluation of PPRL approaches
  • Modules for both the data owner and the trusted linkage unit

State-of-the-art PPRL Methods

Bloom filter encodings & hardening techniques

Fast & private blocking/filtering techniques

Post-processing methods for one-to-one link restriction
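
One-to-one link restriction can be illustrated with a minimal sketch. It assumes a simple greedy, best-match-first strategy, which is only one common way to enforce one-to-one links and is not necessarily what PRIMAT implements. The class `OneToOneSketch`, the type `Link` and the method `restrict` are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of the PRIMAT code base.
public class OneToOneSketch {

    /** A candidate link between record `left` (source A) and record `right` (source B). */
    static final class Link {
        final String left;
        final String right;
        final double similarity;

        Link(String left, String right, double similarity) {
            this.left = left;
            this.right = right;
            this.similarity = similarity;
        }
    }

    /**
     * Greedy one-to-one restriction: visit links in descending order of
     * similarity and keep a link only if neither record is linked already.
     */
    static List<Link> restrict(List<Link> candidates) {
        List<Link> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Link l) -> l.similarity).reversed());

        Set<String> usedLeft = new HashSet<>();
        Set<String> usedRight = new HashSet<>();
        List<Link> result = new ArrayList<>();
        for (Link link : sorted) {
            if (!usedLeft.contains(link.left) && !usedRight.contains(link.right)) {
                usedLeft.add(link.left);
                usedRight.add(link.right);
                result.add(link);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> candidates = List.of(
            new Link("a1", "b1", 0.95),
            new Link("a1", "b2", 0.90),
            new Link("a2", "b2", 0.85),
            new Link("a2", "b1", 0.80));
        for (Link link : restrict(candidates)) {
            System.out.println(link.left + " - " + link.right + " (" + link.similarity + ")");
        }
    }
}
```

In this example every record appears in two candidate links, and the greedy pass keeps at most one link per record. A greedy pass is cheap but does not always find the assignment with the highest total similarity.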

Functional Overview

| Component/Module | Function/Feature | Status |
|---|---|---|
| Data generator & corruptor | Data generation | Implemented |
| | Data corruption | Planned |
| Data cleaning | Split/merge/remove attributes | Implemented |
| | Replace/remove unwanted values | Implemented |
| | OCR transformation | Implemented |
| Encoding | Bloom filter encoding & hardening | Implemented |
| | Support of alternative encoding schemes | Planned |
| Matching | Standard & LSH-based blocking, metric space filtering | Implemented |
| | Threshold-based classification | Implemented |
| | Post-processing | Implemented |
| | Multi-threaded execution | Implemented |
| | Distributed matching | Integration outstanding |
| | Multi-party support, match cluster management | In development |
| | Incremental matching | In development |
| Evaluation | Measures for assessing quality & scalability | Implemented |
| | Masked match result visualization | Integration outstanding |

Requirements

  • Java 11+
  • Maven
  • Ubuntu (recommended)

Showcase Applications

Data Owner App

The data owner application provides components for pre-processing (data cleaning and standardization) and for Bloom-filter-based encoding of records containing person-related data.

To run the data owner application, execute the following command in the primat directory (where the pom.xml file is located):

mvn clean javafx:run -Dprimat.mainClass=dbs.pprl.toolbox.data_owner.gui.DataOwnerApp
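
The idea behind the Bloom-filter-based encoding done on the data owner side can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PRIMAT's implementation: the class `BloomFilterSketch`, its methods, the padding character, the use of bigrams and the SHA-256-based double hashing are all choices made for this example. The Dice coefficient at the end is a similarity measure commonly used to compare Bloom filters and is likewise assumed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not part of the PRIMAT code base.
public class BloomFilterSketch {

    /** Splits a value into padded, lower-cased bigrams, e.g. "ann" -> _a, an, nn, n_. */
    static Set<String> bigrams(String value) {
        String padded = "_" + value.trim().toLowerCase(Locale.ROOT) + "_";
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + 2 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 2));
        }
        return grams;
    }

    /** Maps every bigram to k positions in a bit vector of length m (double hashing). */
    static BitSet encode(String value, int m, int k) {
        BitSet filter = new BitSet(m);
        for (String gram : bigrams(value)) {
            // Assumption: two 32-bit hash values taken from one SHA-256 digest.
            // A real deployment would typically use keyed hash functions.
            ByteBuffer buffer = ByteBuffer.wrap(sha256(gram));
            int h1 = buffer.getInt();
            int h2 = buffer.getInt();
            for (int i = 0; i < k; i++) {
                filter.set(Math.floorMod(h1 + i * h2, m));
            }
        }
        return filter;
    }

    /** Dice coefficient of two bit vectors: 2 * |a AND b| / (|a| + |b|). */
    static double dice(BitSet a, BitSet b) {
        int total = a.cardinality() + b.cardinality();
        if (total == 0) {
            return 0.0;
        }
        BitSet common = (BitSet) a.clone();
        common.and(b);
        return 2.0 * common.cardinality() / total;
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        int m = 1024; // filter length in bits (example value)
        int k = 10;   // hash functions per bigram (example value)
        BitSet smith = encode("Smith", m, k);
        BitSet smyth = encode("Smyth", m, k);
        System.out.println("Dice(Smith, Smyth) = " + dice(smith, smyth));
    }
}
```

Similar values share most of their bigrams and therefore most of their set bits, which is what allows the linkage unit to compare encoded records without seeing the plaintext. All data owners have to use the same parameters for their encodings to be comparable.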

Linkage Unit App

The linkage unit application provides linkage functionality, in particular blocking, similarity calculation, classification and post-processing. It also includes evaluation facilities for comparing different PPRL workflows in terms of quality (recall, precision, F-measure) and scalability (runtime, reduction ratio).

To run the linkage unit application, execute the following command in the primat directory (where the pom.xml file is located):
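
The quality and scalability measures named above follow their standard definitions, which the sketch below spells out. The class `EvaluationSketch` and its method names are hypothetical and not taken from PRIMAT. The counts in `main` are made-up example numbers, not measured results.

```java
// Hypothetical illustration, not part of the PRIMAT code base.
public class EvaluationSketch {

    /** Precision: share of reported matches that are true matches. */
    static double precision(long truePositives, long falsePositives) {
        long reported = truePositives + falsePositives;
        return reported == 0 ? 0.0 : (double) truePositives / reported;
    }

    /** Recall: share of true matches that were found. */
    static double recall(long truePositives, long falseNegatives) {
        long actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    /** F-measure: harmonic mean of precision and recall. */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    /**
     * Reduction ratio: share of the |A| * |B| possible record pairs that
     * blocking or filtering removed before comparison.
     */
    static double reductionRatio(long candidatePairs, long sizeA, long sizeB) {
        return 1.0 - (double) candidatePairs / ((double) sizeA * sizeB);
    }

    public static void main(String[] args) {
        // Made-up example counts for illustration only.
        double p = precision(80, 20);
        double r = recall(80, 20);
        System.out.println("precision       = " + p);
        System.out.println("recall          = " + r);
        System.out.println("f-measure       = " + fMeasure(p, r));
        System.out.println("reduction ratio = " + reductionRatio(10_000, 1_000, 1_000));
    }
}
```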

mvn clean javafx:run -Dprimat.mainClass=dbs.pprl.toolbox.lu.gui.LinkageUnitApp
