Generate privacy-protected synthetic
data without privacy risks
Unlocking Data with Differential Privacy
Before DP:
Data masking techniques are:
• often predictable and reversible
• only modify existing data and cannot generate new, realistic examples
• vulnerable to threats, such as membership inference attacks.
After DP:
Differential Privacy techniques:
• eliminate privacy risks when fine-tuning LLMs with sensitive unstructured text data
• generate privacy-protected synthetic text data that is similar to the real dataset without
risks of leaking PII
• provide privacy guarantees for the DP fine-tuned model and all downstream model tasks
Risks of leaking sensitive data when fine-tuning LLMs
An adversary can craft specific prompts to trick the model into revealing the data it was fine-tuned on, including confidential information.
Prompt: What is the email address of Aaliyah Keller?
Fine-tuned without DP response: The email address of Aaliyah Keller is aaliyah.keller3409@outlook.edu
Differentially Private Fine-tuned response: I'm sorry, I don't have the email address of Aaliyah Keller.
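The leakage shown in the prompt and the two responses above can be checked automatically. The following is a minimal sketch, not Secludy's testing tool: the helper name `leaked_emails` and the simple email regex are assumptions made for illustration, and the two response strings are copied from the slide.

```python
import re

# Simple email pattern; an assumption for illustration, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def leaked_emails(response: str, private_emails: set[str]) -> set[str]:
    """Return the private emails that appear verbatim in a model response."""
    found = {match.lower() for match in EMAIL_RE.findall(response)}
    return found & {email.lower() for email in private_emails}


# Emails known to be in the private fine-tuning set (taken from the demo record).
private_emails = {"aaliyah.keller3409@outlook.edu"}

# The two responses shown on the slide.
non_dp_response = "The email address of Aaliyah Keller is aaliyah.keller3409@outlook.edu"
dp_response = "I'm sorry, I don't have the email address of Aaliyah Keller."

print(leaked_emails(non_dp_response, private_emails))
print(leaked_emails(dp_response, private_emails))
```

A check like this only catches verbatim leaks of known strings; paraphrased or partial leaks would need a broader test.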
Synthetic Data Generation Framework
Instruction: Write a job application letter for A within B industry with N YOE, and X, Y, Z contact PII info
Output: Job app letter ….......
Low-rank adaptation (LoRA)
SD dreamlike-anime-1.0
DP-LoRA for text
Prompt: write a job application
for {Photographer} with {5} YOE
DP-synthetic Data Generation Steps
• Run the prompt [write a job application for a {Photographer} with {5} YOE] {N} times to produce synthetic candidates
• Calculate the embedding distance between each private sample and the synthetic candidates
• Select the top candidate(s) closest to the original dataset
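The distance and selection steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the presenters' implementation: the three-dimensional vectors are toy stand-ins for real text embeddings, `cosine_distance` and `select_top_candidates` are hypothetical helper names, and cosine distance is one possible choice of metric. In a pipeline like the one described, the nearest-neighbor search would presumably be delegated to a vector database such as Milvus. The sketch also omits the noise addition and privacy accounting that a differentially private selection step would need.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_top_candidates(private_vec, candidates, k=1):
    """Return the k candidate texts whose embeddings are closest to private_vec.

    candidates is a list of (text, embedding) pairs.
    """
    ranked = sorted(candidates, key=lambda c: cosine_distance(private_vec, c[1]))
    return [text for text, _ in ranked[:k]]


# Toy embeddings (hypothetical); real ones would come from an embedding model.
private_vec = [1.0, 0.0, 0.0]
candidates = [
    ("candidate A", [0.9, 0.1, 0.0]),
    ("candidate B", [0.0, 1.0, 0.0]),
    ("candidate C", [0.5, 0.5, 0.0]),
]

top = select_top_candidates(private_vec, candidates, k=2)
print(top)  # ['candidate A', 'candidate C']
```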
• Real-time Ingestion: allows immediate generation of new synthetic data
• Distributed Architecture: high availability and reliability
• Cloud-native: scalability to handle large volumes of synthetic data generation
• Community Support: as an open-source project, Milvus has a vibrant community with great support
precision, recall, F1 score
Secludy Synthetic Data Evaluation
NLP similarity summary statistics
o Embedding distribution distance, e.g., Fréchet Inception Distance (FID)
o MAUVE score, which is based on KL divergences
o Text length distribution difference, e.g., KS-test
vs. manual evaluation by internal domain experts (costly and slow), not recommended
Production model evaluation and testing
o Automatic chatbot evaluation, e.g., AlpacaEval, DP-fine-tuned vs. normally fine-tuned
o Membership inference attack simulation, e.g., red teaming
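Of the metrics listed above, the text length comparison is the simplest to illustrate. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) over word counts. It is a minimal sketch under stated assumptions: the helper names `text_lengths` and `ks_statistic` and the sample sentences are hypothetical, length is measured in whitespace-separated words, and only the statistic is computed, not a p-value. A library routine such as `scipy.stats.ks_2samp` could be used instead when SciPy is available.

```python
from bisect import bisect_right


def text_lengths(texts):
    """Length of each text in whitespace-separated words."""
    return [len(text.split()) for text in texts]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical samples standing in for real and synthetic job application letters.
real_texts = ["I am applying for the role", "Please find my application attached"]
synthetic_texts = ["I would like to apply", "My application letter is attached here"]

statistic = ks_statistic(text_lengths(real_texts), text_lengths(synthetic_texts))
print(statistic)
```

A statistic near 0 means the two length distributions are close; a value near 1 means they barely overlap.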
Demo Private Training Data
"instruction": "Aaliyah Keller is a accountant. Write about a job-related project he/her did in the past including some
of the following information: phone number is +91-93973 49939, email is aaliyah.keller3409@outlook.edu",
"input": "",
"output": "Aaliyah Keller, a skilled accountant with a keen eye for detail and a talent for financial analysis,
embarked on a job-related project that showcased her exceptional abilities and dedication to her profession. In this
project, Aaliyah was tasked with conducting a comprehensive financial audit for a prominent multinational
corporation, XYZ Industries. Aaliyah began by meticulously reviewing the company's financial statements,
including balance sheets, income statements, and cash flow statements, to gain a thorough understanding of its
financial position and performance. She diligently analyzed each transaction, scrutinizing every detail to ensure
accuracy and compliance with accounting standards and regulations. To obtain a deeper insight into the company's
operations, Aaliyah held in-depth interviews with key personnel from various departments, including finance, sales,
and operations. She sought to understand their roles, responsibilities, and the challenges they faced. These
discussions provided valuable context and helped her identify areas for improvement and potential risks. Armed
with the gathered information, Aaliyah skillfully prepared a comprehensive audit report that highlighted the
company's strengths and weaknesses. She presented her findings to the management team, providing clear and
concise explanations of complex financial matters. Throughout the project, Aaliyah maintained open
communication with all stakeholders, ensuring that everyone was informed of the progress and findings. Her
professionalism, attention to detail, and ability to work effectively with diverse teams contributed significantly to the
project's success. If you have any questions or require additional information, you can reach Aaliyah Keller at her
phone number, +91-93973 49939, or via email at aaliyah.keller3409@outlook.edu. She is happy to discuss her
experiences and insights further."
Benefits for our customers
1. Plug and play, inside Snowflake, no need to transfer data
2. State-of-the-art algorithm, 100X cheaper & faster
3. 99.9987%* guaranteed privacy protection
*based on 8k records
• Reduce Compliance Costs: automate privacy compliance for LLM projects, freeing up resources
• Easy private fine-tuning: fine-tune LLMs on sensitive data via our UI platform/API
• Maximize Data Utility: fine-tune on all relevant sensitive text data while provably preserving privacy
• Provable Privacy Guarantees: simulated PII leakage testing
Schedule a Demo or email us at contact@secludy.com

 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 

Generating privacy-protected synthetic data using Secludy and Milvus

  • 2. Unlocking Data with Differential Privacy
      Before DP, data masking techniques are:
        • often predictable and reversible
        • only able to modify existing data; they cannot generate new, realistic examples
        • vulnerable to threats such as membership inference attacks
      After DP, Differential Privacy techniques:
        • eliminate privacy risks when fine-tuning LLMs with sensitive unstructured text data
        • generate privacy-protected synthetic text data that is similar to the real dataset without risks of leaking PII
        • provide privacy guarantees for the DP fine-tuned model and all downstream model tasks
  • 3. Risks of leaking sensitive data when fine-tuning LLMs
      An adversary can craft specific prompts to trick the model into revealing its fine-tuning data, including confidential information.
      Prompt: What is the email address of Aaliyah Keller?
      Fine-tuned without DP response: The email address of Aaliyah Keller is aaliyah.keller3409@outlook.edu
      Differentially private fine-tuned response: I'm sorry, I don't have the email address of Aaliyah Keller.
  • 4. Synthetic Data Generation Framework
      Instruction: Write a job application letter for A within B industry with N YOE, and X, Z, Y contact PII info
      Output: Job app letter ….......
  • 5. Low rank adaptation (LoRA) SD dreamlike-anime-1.0
  • 7. Synthetic Data Generation Framework
      Prompt: write a job application for {Photographer} with {5} YOE
      Instruction: Write a job application letter for A within B industry with N YOE, and X, Z, Y contact PII info
      Output: Job app letter ….......
  • 8. DP-synthetic Data Generation Steps
      1. Use Prompt: [write a job application for a {Photographer} with {5} YOE] x {N} times
      2. Calculate embedding distance between private sample and synthetic candidates
      3. Select top candidate(s) closest to the original dataset
        • Real-time Ingestion: allows immediate generation of new synthetic data
        • Distributed Architecture: high availability and reliability
        • Cloud-native: scales to handle large volumes of synthetic data generation
        • Community Support: as an open-source project, Milvus has a vibrant community with great support
  • 9. Synthetic Data Generation Framework
      Prompt: write a job application for {Photographer} with {5} YOE
      Instruction: Write a job application letter for A within B industry with N YOE, and X, Z, Y contact PII info
      Output: Job app letter ….......
      Metrics: precision, recall, F1 score
  • 10. Secludy Synthetic Data Evaluation
      NLP similarity summary statistics:
        o Embedding distribution distance, e.g., Fréchet Inception Distance (FID)
        o MAUVE score, which is based on KL divergences
        o Text length distribution difference, e.g., KS test
      versus manual evaluation by internal domain experts (costly and slow; not recommended)
      Production model evaluation and testing:
        o Automatic chatbot evaluation, e.g., AlpacaEval, DP-fine-tuned vs. normally fine-tuned
        o Membership inference attack simulation, i.e., red teaming
  • 11. Demo Private Training Data "instruction": "Aaliyah Keller is a accountant. Write about a job-related project he/her did in the past including some of the following information: phone number is +91-93973 49939, email is aaliyah.keller3409@outlook.edu", "input": "", "output": "Aaliyah Keller, a skilled accountant with a keen eye for detail and a talent for financial analysis, embarked on a job-related project that showcased her exceptional abilities and dedication to her profession. In this project, Aaliyah was tasked with conducting a comprehensive financial audit for a prominent multinational corporation, XYZ Industries. Aaliyah began by meticulously reviewing the company's financial statements, including balance sheets, income statements, and cash flow statements, to gain a thorough understanding of its financial position and performance. She diligently analyzed each transaction, scrutinizing every detail to ensure accuracy and compliance with accounting standards and regulations. To obtain a deeper insight into the company's operations, Aaliyah held in-depth interviews with key personnel from various departments, including finance, sales, and operations. She sought to understand their roles, responsibilities, and the challenges they faced. These discussions provided valuable context and helped her identify areas for improvement and potential risks. Armed with the gathered information, Aaliyah skillfully prepared a comprehensive audit report that highlighted the company's strengths and weaknesses. She presented her findings to the management team, providing clear and concise explanations of complex financial matters. Throughout the project, Aaliyah maintained open communication with all stakeholders, ensuring that everyone was informed of the progress and findings. Her professionalism, attention to detail, and ability to work effectively with diverse teams contributed significantly to the project's success. 
If you have any questions or require additional information, you can reach Aaliyah Keller at her phone number, +91-93973 49939, or via email at aaliyah.keller3409@outlook.edu. She is happy to discuss her experiences and insights further."
  • 12. Benefits for our customers
        • Plug and play, inside Snowflake, no need to transfer data
        • State-of-the-art algorithm, 100X cheaper & faster
        • 99.9987%* guaranteed privacy protection (*based on 8k records)
      Reduce Compliance Costs: Automate privacy compliance for LLM projects, freeing up resources
      Easy private fine-tuning: Fine-tune LLMs on sensitive data via our UI platform/API
      Maximize Data Utility: Fine-tune on all relevant sensitive text data while provably preserving privacy
      Provable Privacy Guarantees: Simulated PII leakage testing
  • 13. Schedule a Demo or email us at contact@secludy.com
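
The distance and selection steps from slide 8 (calculate the embedding distance between a private sample and the synthetic candidates, then select the closest candidates) can be sketched roughly as follows. This is a minimal illustration only: the toy three-dimensional vectors stand in for real text embeddings, the function names `cosine_distance` and `select_closest` are hypothetical, and cosine distance is an assumed metric since the slides do not name one. The slides do not show how Milvus is called, so the search here is a plain in-memory sort, and the sketch adds no noise and makes no privacy claim of its own.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_closest(private_vec, candidate_vecs, k=1):
    """Return the indices of the k candidates nearest to private_vec."""
    ranked = sorted(
        range(len(candidate_vecs)),
        key=lambda i: cosine_distance(private_vec, candidate_vecs[i]),
    )
    return ranked[:k]


# Toy embeddings (assumed values, not from the slides).
private = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the private sample
    [0.9, 0.1, 0.0],  # nearly parallel
    [0.5, 0.5, 0.0],  # in between
]

top = select_closest(private, candidates, k=2)
print(top)  # indices of the two closest candidates
```

In a deployment like the one the slides describe, a vector database would presumably replace the in-memory sort once the candidate pool grows large.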
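
Slide 10 lists the KS test as one way to compare text length distributions between real and synthetic data. The sketch below computes only the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) on whitespace token counts. The helper names and the sample texts are assumptions made for illustration. A library routine such as SciPy's `ks_2samp` would also report a p-value.

```python
import bisect


def text_lengths(texts):
    """Length of each text in whitespace-separated tokens."""
    return [len(t.split()) for t in texts]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is enough to find the largest gap.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Placeholder texts (assumed, not from the slides).
real_texts = ["a b c", "a b", "a b c d"]
synthetic_texts = ["a b c", "a b c d e", "a b c d"]

d = ks_statistic(text_lengths(real_texts), text_lengths(synthetic_texts))
print(f"KS statistic: {d:.3f}")
```

A statistic near 0 means the two length distributions are close; a value near 1 means they barely overlap.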