This October, in celebration of 2025 Cybersecurity Awareness Month and its theme “Stay Safe Online,” we’ll be sharing weekly resources — including blog posts, training videos, and infographics. Each release will spotlight key topics to help strengthen your internal cybersecurity campaigns.
AI tools are transforming the way professionals work — enhancing data analysis, content creation, and decision-making. While they boost productivity and innovation, responsible use is key. Privacy and confidentiality must remain top priorities.
Watch the video below to learn how to use AI tools effectively and ethically across your organization.
In a world where serious cybersecurity breaches occur weekly, it is reasonable to expect that organizations would be a little paranoid about their data. In specific industries such as finance and healthcare, a healthy dose of paranoia may be exactly what the CISO recommends. Combine this “baseline attitude” with the sheer pace of innovation and progress in the field of AI, and it is self-evident that organizational leaders and decision makers must err on the side of caution.
There is another side to the story. Focusing specifically on company data used for AI model training, I’ll make the case that some leaders drank the AI Kool-Aid® and developed “selective paranoia.” Fear not: the root cause of this condition is well understood. In the post-“ChatGPT moment” era, so many acronyms — LLMs, GenAI, AGI, ML, DL, the list goes on — have flooded the spotlight that anyone outside the AI world is condemned to a “soup of confusion.” AI experts only make matters worse, since they do not define these terms consistently (and who really qualifies as a true AI expert, anyway?). The result is a whirlwind of concern that customer-identifying, private, or confidential data have all been used to train an AI model that could leak the data by accident or be coaxed into leaking it by a skilled adversary.
But AI models have been trained for decades (the first commercially viable techniques were published in the late 1980s and early 1990s), and decision makers were hardly bothered until the last two years or so. What’s really going on?
My argument is simple: ultimately, the issue boils down to confusing terminology. To ensure clarity, let’s establish the terms we will be using.
Artificial Intelligence (AI)
Using the broadest definition possible: any computer system that contains an intelligent component, typically an algorithm that automates decisions based on, for example, calculation or search.
Machine Learning (ML)
The study of computer algorithms that improve automatically through experience and the use of data; such algorithms extract and recognize patterns, learn implicit rules, and make predictions.
Deep Learning
A subfield of ML covering algorithms that use deep (multi-layer) neural network architectures. Successive layers extract progressively higher-level features from the raw input. By stacking many layers of transformations, we can do representation learning (automatically find an appropriate way to represent the data).
Large Language Models (LLMs)
A deep neural network (transformer architecture) trained on large textual datasets. It treats text as a sequence of tokens and predicts (generates) the next token.
Generative AI (Gen AI)
Broadly speaking, ML systems capable of generating text, images, code, or other types of content, often in response to a prompt entered by a user. In practice, a large (very deep) transformer model trained on large amounts of data from a specific domain that, given a sequence of tokens (a prompt), generates new tokens from that domain.
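To make the last two definitions concrete, here is a minimal sketch of next-token prediction. It assumes the Hugging Face transformers library, PyTorch, and the publicly available GPT-2 checkpoint, chosen purely for illustration; it is not a model discussed in this post.

```python
# Minimal sketch: an LLM treats text as tokens and predicts the next one.
# Assumes the Hugging Face `transformers` library and the public GPT-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Cybersecurity awareness starts with"
inputs = tokenizer(prompt, return_tensors="pt")    # text -> token IDs

with torch.no_grad():
    logits = model(**inputs).logits                # a score for every token in the vocabulary

next_token_id = logits[0, -1].argmax().item()      # most likely next token
print(tokenizer.decode(next_token_id))             # e.g. a single word or subword continuation
```

Generation in the ChatGPT sense is simply this step repeated: each predicted token is appended to the prompt and the model is asked again.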
Visually, think of each term in our list as a subfield of the one defined above it. We excluded GenAI from the image below, but in practice, it can be regarded as a sub-category of LLMs.
Armed with these definitions, we can nail down the exact source of confusion. Since late 2022, AI outsiders, a group that includes many organizational leaders, have simply used the term “AI models” when they precisely mean “GenAI models.” As a result, they attribute any risks discovered in GenAI models to all models. They might hire a consultant (who might be equally confused) to draft a legal document prohibiting “sensitive” data from being used in any AI model training. We certainly shouldn’t blame a decision maker for making this mistake, but the only practical consequence in this case is wasted time on all sides, and maybe a few consultants who got paid.

In 2025, most effective cybersecurity products and services rely on ML systems or models as a core component. Most of these models are not generative, cannot be prompted in the same sense as ChatGPT, and have exactly zero risk of leaking data. When we train such a (non-GenAI) model, we need data that, in most cases, is combined and mixed from many sources, including multiple customers. In general, it is in our interest as model developers to remove anything that identifies a customer or a user from the data before training, to ensure the model is robust and generalizable (does not overfit to a specific case). But that does not mean we fully anonymize the data. Quite the contrary: we need to evaluate and understand how the model performs, which we accomplish by analyzing the data alongside the model. Quality data that is representative of the real world is critical for delivering a powerful system or model.
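As a simplified illustration of that workflow, here is a minimal sketch of training a non-generative classifier after stripping customer identifiers. The file name and column names (combined_telemetry.csv, customer_id, user_email, label) are hypothetical placeholders, not a real pipeline.

```python
# Minimal sketch: pool data from many sources, drop customer identifiers, train a classifier.
# File and column names are hypothetical placeholders; features are assumed to be numeric.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("combined_telemetry.csv")          # data combined from many customers

# Remove anything that identifies a customer or user before training,
# but keep the data itself so the model can be evaluated and stays representative.
features = df.drop(columns=["customer_id", "user_email", "label"])
labels = df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))   # evaluation needs the data, not the identifiers
```

A classifier like this has no prompt interface and no mechanism for reproducing the rows it was trained on, which is why the leakage concerns around GenAI models do not transfer to it directly.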
When a customer approaches us with a legal document that prohibits us from using the customer’s “identifiable” data in any AI model training (emphasis on any), we of course push back and arrive at a sensible formulation. But decision makers should be aware that this is a futile exercise: if the consultants had their way, it would only hurt the customer in the end, due to the reduced quality of the model.
What Can You Do?
It is always nice to conclude a slightly-too-long blog post with a warm and fuzzy feeling, so in the words of Idiocracy’s President Camacho, “here is a 5-point plan that’s going to fix everything:”
- Recognize that the “soup of confusion” is real. You may have been affected.
- Know your terms ... if this blog didn’t already clarify them.
- When evaluating a vendor, pursue a high-level understanding of their AI/ML capabilities.
- If the vendor is using generative models, dig deeper to understand the risks.
- If you enjoy the luxury of a large organization, consider hiring an AI expert in-house and cutting loose your “AI consultant.”
AI Tools Offer Powerful Capabilities — But Protecting Data Is Essential
As organizations adopt AI to improve productivity and innovation, it's crucial to recognize the privacy and confidentiality risks associated with it. From data leakage and cyberattacks to the tradeoffs of fine-tuning models with sensitive information, responsible AI use starts with awareness and best practices.
Explore the infographic below to learn how to use AI securely and responsibly while safeguarding your data.
