AI in Context: Cybersecurity and Privacy Implications

Listen to learn the different types of machine learning; AI governance and cyber risk; and risks and rewards of artificial intelligence.

SecurityMetrics Podcast | 75

AI in Context: Cybersecurity and Privacy Implications

We can more easily understand the impact of artificial intelligence on privacy and security if we start with an explanation of the types of AI models in use and where they exist in applications many of us already use.

Paul Starrett, CFE, EnCE of Privacy Labs and Starrett Law sits down with Host and Principal Security Analyst Jen Stone (MCIS, CISSP, CISA, QSA) to discuss:

  • Different types of machine learning
  • AI governance and cyber risk
  • Risks and rewards of artificial intelligence

Resources:

[Disclaimer] Before implementing any policies or procedures you hear about on this or any other episodes, make sure to talk to your legal department, IT department, and any other department assisting with your data security and compliance efforts.

Transcript of AI in Context: Cybersecurity and Privacy Implications

Hello, and welcome back to the SecurityMetrics podcast. My name is Jen Stone. I'm one of the principal security analysts here at SecurityMetrics. We have a ton of topics out there now for you to listen to on the podcast.


If this is your first one, welcome. Go check out the back catalog. If you are a returning listener, thank you so much. We really appreciate having you here.


This is the reason I get to keep talking to interesting people. And speaking of interesting people and interesting topics, last episode we talked about artificial intelligence and it's a a very compelling topic. People are very concerned and and interested and there's there's a lot to it right now. And I thought, man, we should have somebody else come back and talk more about it.


And so today, I have Paul Starrett with me today. Paul, please let people know about yourself.


Hello. And thank you, Jen, for having me on the podcast today. Yes. I do want to just very briefly give a shout out to that last podcast.


I listened to it. It's a fabulous listen. Thank you. It gives a perspective I have not heard yet.


Yeah. Willie was Willie was terrific, and I'm really appreciated his time coming on and talking to us.


Absolutely.


I've actually chatted with him, but we don't wanna digress down that at this point. But I do want I think people get a real, enjoyment out of that podcast. It's very, easy to listen to and very informative. With that said, I my name is Paul Starett. You got this pronunciation correct.


Thank you. I yes.


It's a it's sort of a long and circuitous history, but I'll I'll avoid the the, the irrelevant parts.


As I sit here, I'm an attorney. I'm also I consider myself a data scientist and more of a technologist to be to be honest.


I came up in the world. I started in law enforcement, found early on that was not for me. I went into the private sector and then eventually went into information security, programming. Worked for RSA security as a c programmer.


And then as as time went along, I went to law school at night to be a better, engineer, just to just I was more to learn rather than to practice it That's awesome.


Formally.


Yeah. So, anyway, fast forward, twenty thirteen is when I officially got, into the area of, data science and law. I've been in I've been immersed in it ever since then. So it's not new to me, especially LLMs. I've been learning about them since twenty nineteen.


But Now you used a term that people might not be familiar with.


Yes. Large language model.


Okay. Thank you.


And others might notice chat GPT. Mhmm. GPT stands for generative oh, I'm gonna forget here now and embarrass myself.


Is it predictive text? Generative predictive test?


Generative predictive, transformer.


Trans I okay. So I don't even know. We we all know kind we are I think we've all heard of chat g p t. And it's one of those things where, like, oh, g p t, it's just as its initials. We don't know. Yes.


I normally can roll that off the top of my head, but for some reason this morning, I've gotta consume more coffee here. But a trans an LLM is a transformer, basically.


Okay.


And so we don't we won't have to get to the hood on that too much unless you'd like to.


Well well, I I mean, anytime we use some sort of, abbreviation or or jargon or a term that that some people might have familiarity with and other people might not. We wanna make sure that everybody who's listening can come along the ride with us and not feel like, oh, I'm I'm at a disadvantage out the gate. So so that's why I stopped you. I did not mean to interrupt the flow of who you are though. Please continue with that. I think it's a very interesting, kind of history and chronology.


Yes. Thank you. There's only a few sentences left, actually. So ever since then, I have been a general counsel, chief risk officer, for a, a company that did it was international, public traded, that that dealt with artificial intelligence and data management for, the legal profession.


So I started my firm in twenty fifteen, privacy labs dot a I, if you wanna learn more about me and what I do. But more formally now, I'm also Starrett Law, where there's two firms. So if you wanna learn more about me, go to Starrett Law, s t a r r e t t l a w dot com. And you can learn more about me.


But other than that, I do have masters in data science.


I think it's about it.


I think that's probably enough Well, and to, And since we are talking about AI or artificial intelligence, I think that you should talk a little bit more about how long you have been involved in the the AI field because it's you have a substantial history there as well.


Yes. So, back in twenty actually, twenty eleven when I started with this company, AI was really starting to kind of, come to its its first real kind of, oh, I don't know what the word is. Kinda like the LLM is now, ChatGPT. It was kinda becoming the thing.


Right. And that the problem is that it was over being oversold because people didn't understand it. I've always said, don't use machine learning or artificial intelligence unless you have a reason. In any event, twenty eleven, I started getting into it.


And then in twenty thirteen, I, I was the first chair of the American Bar Association's big data committee.


And I I really took the plunge at that point. And I was also a member of a few other committees with the ABA that were AI related.


I've done a lot of speaking engagements.


And and so primarily in the, we get to this later, I believe, in the area of natural language processing.


Right.


But, yeah. And so that, has I got my master's degree during that time from data science, and various other things. But, yeah, it's been it's been a while.


Yeah.


So I know a lot of people are kinda putting the the lawyer AI governance hat now. And a lot of them are very good people and very qualified, but this is not new for me.


Yes. Well, I'm I'm glad that we have you, to talk to today, especially since the topic you and I were talking about, you know, what would be, the right way to approach this, and you were talking about how, artificial intelligence might apply to PCI DSS and other compliance topics. And I wanted to hear, I wanted to hear more about that. I guess starting with let's let's contextualize it a little bit. What is AI Mhmm. In this context?


Yes.


That's a broad question, but I think it's fair to say that there's two basic buckets.


The first one is in threat detection and in fraud detection are examples of this just to contextualize it, is, unsupervised machine learning. And what what you're basically saying is, I don't necessarily have a historical, set of patterns that I can learn from to tell me what is fraudulent or not or what is classified as, let's just say, low risk, medium risk, high risk. I don't have that. So what you do is you have to decide what is anomalous, what's an outlier, because typically that's where your your nefarious activity is.


Right.


Sometimes those are mistakes. But for sake of our argument, if someone's trying to breach a system, trying to break, let's see, a fraud, defense AI system, what you look for is these anomalies. That's where you look first. So what the machine learning does is says, well, all this all this information I have, what's normal? It can do that easily.


Right.


It has that data, and you look for the outliers. Okay? And you start to look there. The other type is called supervised. The first one's unsupervised because you don't have the prior information.


The second one's called supervised. And supervised is basically you do have prior information that tells you what a piece of data is. So I'll give you a quick example.


I have run a I've actually written a program that uses it's a public dataset of fraudulent transactions.


It's in a spreadsheet. Real simple.


Mhmm.


There's twenty one columns and two hundred eighty five thousand rows. Each row is has a has a field that is fraud or not. Okay? What the machine learning program does is it looks at the fraudulent transactions and the non fraudulent and says, is there a pattern here?


And I say, fraudulent transaction looks like this and a a a legitimate one looks like this. Supervised because I have the data and I learned from it. So that's those are the two basic buckets. There's a thing called reinforcement learning, which rather than make it this, classifying something's fraud on fraud, it makes a decision.


And LLMs kinda leverage that. So this is called reinforcement learning. It says, if this happens, do this action. It says make it makes a decision.


So that's that's kind of tucked in there as well. LLMs also use a thing called, self supervised learning, but we don't have to. That is a bucket into itself.


Okay.


So there's I think that's a good way to found it. Good foundation to build from here.


Sure. And and the the what you're talking about is, some people might be familiar with the the phrase indicators of compromise or IOCs.


And and these indicators of compromise are what often tell our our, security tools that something is wrong. There's a pattern there, either a pattern that that we're unfamiliar with, that could be an indicator of compromise, or a pattern that we know is an indicator of compromise. Would that be like the unsupervised versus the supervised approach?


Yes. That's a great way to to put that into a more of a digestible framework. That's exactly right.


Very often, one thing that, hackers or nefarious actors have is they change what they do. Yeah. So the the supervised machine learning train they call it training data. It's gonna change.


So with this unsupervised, with this ability to look for anomalies, you can have this sort of sliding context, especially seasonal activity. You know, the buying is is greater around the seasons. Right? In Christmas and in the festival festivities around that time.


So it can that that's what's nice about unsupervised learning is that you can find whatever's normal yesterday is is not normal today. And so, yes, but that is that is, that's a great way of putting that.


And so so finding these, potential fraud indicators, I think, is really important in in organizations that are dealing with transactions and and want to make sure that transactions are, supposed to be going through. One of the the times that I had this happen to me personally is, as a QSA, I travel a lot for my regular job Mhmm. Going around literally around the world and looking at systems and and talking to people. So sometimes I'll be in one country and and thinking about another one.


So I was I happened to be in Singapore, and I was Mhmm. I was looking at, my next in two weeks, I was going to be in New Zealand, and I wanted to get tickets for some special thing at a zoo there. So here I was using my American Express in, while in Singapore to buy tickets to this thing in New Zealand. And American Express said, uh-uh.


No. That just feels sketchy. And so, I had to, you know, call them and say that, yes, that that is an a pattern that you would have flagged as sketchy, but it is actually something I'm doing. And so, I I think a lot of us are familiar if we travel.


Sometimes if you go maybe just, you know, drive up to Canada and all of a sudden, why isn't my credit card working? Well, you haven't told the company that you're going to be there and so they're concerned that maybe this is out of pattern for you. And and it feels like, recognizing patterns like this is a is a great way to use AI.


It really is. And you you touch on something that now a lot of times these systems systems are based on what's called rule based. Are you elite based? It's not machine learning of the two variants we discussed. Mhmm. It's just basic if then else.


Now, ultimately, that is what AI will do and make a decision, but it's more based on statistical, metrics.


But with whatever whatever the case is, you're right. That's that's exactly the point. I do wanna make one more point to kinda set the stage here very briefly is that, you know, data they had used the term big data back in the day. Not a good term. I don't think.


But it talks about data that is, three v's, volume, large, voracious, meaning lots of different types, and velocity fast moving.


Okay.


The only way to manage your data is with machine learning. That's the only way to keep up and also to stay compliant. So machine learning has really become part of the backbone of modern enterprises, even down to the SMB, small and medium business.


Mhmm.


So that's why this is this has become such a a an issue. Also, the way sometimes these people defeat these models is to, change the columns. They change the information and see what slips under the radar. They have a copy of the model. Mhmm. They're able to test it and see how how can they it's like pen testing a model.


Basically fooling it. Right?


Yeah. Exactly. But that's that's maybe something we will touch on a little bit later here.


Okay. So when when AI started being, a big buzzword, you know Mhmm. Ten, fifteen years ago, There was a big argument about whether something was was artificial intelligence or machine learning. And and it it was one of the big debates of the day whether you should use one language, you know, term or or another. Do you have a an opinion on what is the difference between machine learning and and AI, or was that just a big something else people wanted to argue about?


I, you mean between the term artificial intelligence and machine learning? Yeah. No. That's a good question.


For all intents and purposes, artificial intelligence does not have to mean, machine learning. So the rule rule based, thing I mentioned Yeah. In a way that's artificial intelligence. Right?


Mhmm.


There are other things I escapes me, but I think for all intents and purposes, they're interchangeable.


Okay. Great.


I don't pretend to like the term artificial intelligence myself. Machine learning tells you exactly what you're talking about.


And And but there but there's different types of machine learning. Right?


Yes. Unsupervised, supervised, there's a whole bunch of things. Semi supervised, reinforcement learning, self supervised.


It well, there's neural networks, deep learning. Some people argue that is its own little category under machine learning.


Mhmm.


I think it's maybe so, but I don't and then within within that is generative AI, which is LLMs and chat g p t.


Okay. So so tell me more about, I think we we talked a little bit, before the call about the three buckets of machine learning.


Yep.


The the that included tabular data and an NLP, which was the natural language processing, and then there was a third one. What are those three buckets and how would they differ and how do we use them?


Yes. So generally speaking, in the real world, they do fall they tend to fall into three buckets.


The first, let me just get one out of the way. It's what's called computer vision, but also image processing.


It's sort of the the idea that you can, it's really the the the industry that uses it is is is the government, federal government Okay. For things like guiding drones, you know, image recognition Okay.


Right, or driverless cars. So that tends to be owned primarily by big companies. Okay. So for those of us that are not in that area, which we are, I think, I that's the first bucket. We'll just kinda move out of the way.


Okay.


The two of the remaining is tabular data.


Mhmm.


Tabular, literally, like I mentioned with that fraud transaction, it's just columns and rows. Okay. It's it's it's like tabs. You know, a a tabular based file or CSV.


Uh-huh.


It's just columns and rows. In each column, we have its own data type and number. It could be text. It could be just, like, you know, like, anything.


And the the third one is natural language processing, and each of those are very different in the way that they are approached from a machine learning perspective. Mhmm. So tabular data, what you use, they they both use, all the types I mentioned, unsupervised, supervised, and so on.


Okay.


So that is that's still a valid backdrop.


Right.


But NLP, the tabular data, which is where we find we we often find machine learning models that are used in payment card process before looking at transaction because the transaction is nothing more than something that fits into a into a row in a database.


Okay. Which means that you can use the tabular data Exactly. Machine learning for that.


That's that's right. The other variant of that is, these these things called logs. All of your all of your, I'm gonna probably go a little bit off off topic here, but it is kind of important. All of your software applications, your firewall, your email server create logs.


They generate data saying, this happened at this time by this person. And all those logs get sent to a central location to a a thing called an SEIM. Mhmm. It's a security incident event manager.


Yes.


That's tabular data too. So that also becomes a place where we look for nefarious activity and we can follow the breadcrumbs of a bad actor.


Okay. So I know that this is like a side quest talking about the logs in the sim, But I think it's really important, especially to this audience, because especially in PCI DSS and and especially in some of these other, standards of of compliance, logs are a really critical way to know if your security is functioning properly or to look for these indicators of compromise.


That's right. Because what happens is when someone when a nefarious actor enters your network, for example, they're gonna hop around between applications. Right?


Right.


They're gonna land on a machine or they're going to change permissions. Mhmm. All of these things go to this this log repository or or bucket. And you can see where the person goes. You could see how that unfolds.


Mhmm.


And that gives you a pattern.


Mhmm. Right? Absolutely.


It tells you it's normal. So if you look at what the normal activity for a server is, for example. Right? Who goes there? When?


Who goes there? What? Yeah. Exactly.


Where and how they come in and out is normal. Anything else is, like, wait a minute.


So, yes, that that's And I think your point is And it's a timely, conversation as well for that because PCI DSS four point o now says up until three point two point one, you are allowed to review your logs manually, which is honestly a little bit insane.


It really is.


Who who has the ability, even if you have a team of people, to look at your logs manually and find these indicators or compromise? I don't think it's really possible anymore without some machine learning backing that review.


I would I you're preaching to the choir there. There are programs that are fairly robust that are using AI, actually, called, like, applicant called Splunk, s p l u n k, Sumo Logic, s u m o Logic, among others. There's a program called Elastic.


Mhmm.


But that's again getting down to rabbit hole.


And you know but it's great because Splunk and Sumo Logic and, Elastic ElastiCache, those those things are are, for our technical listeners, they're very familiar with these. Like, yeah. Oh, I've been using Splunk for years. Right?


I've been using these are the these are the reasons that I can do my job or because of these tools. And then people who are on the business side who are listening might say, oh, that's why I'm paying for that? That's why because Splunk is pricey. Right?


Yes. But it's also the reason it's pricey is it has this robust set of, tools that give you this information that you really couldn't get in another way. You you need these tools in place using this machine learning, using this AI to give you this these indicators of compromise about the patterns that you're seeing in those logs. And that's why it's important.


Yes. And and this might be good a good chance to briefly just get into the topic of when you aren't talking about AI governance, which is just governing machine learning models.


So when you're building it and testing it and you're watching it out in the real world Right.


It's all risk based. And it's common this is common sense. But we just have to state it. The risk looks at the what is what are the, benefits and what's the cost?


What's it how is it, you know, helping you? So Splunk, I think the idea is that if the money I'm spending isn't at gain, or Sumo Logic may be different. I'm a personal fan of Sumo Logic because, I I just I go to their events. I have nothing wrong with Splunk.


They're wonderful. Don't get me wrong. But I as a company, I really prefer them. I don't work for them.


I'm not promulgating them.


Find them complaining and explaining and enjoying blending.


I'm here in Silicon Valley, and you you make these relationships. So Yes.


So I I lost track a little bit.


Were you saying that this is where where the tabular data is is still really, primarily used in those situations? Are we getting Yes. So so so that's why we're a lot of us are very familiar with the tabular data approaches.


Mhmm.


But now this new thing has come along.


Mhmm. The chat GPT, which is Mhmm. Your your natural language processing, your NLP model. Right?


Mhmm.


So so chat ChatBT use ChatGPT using the NLP. Why is this different? Why is it why is it all of a sudden everybody's going, hang on. We have concerns or we're super excited or we're both things at the same time.


Mhmm.


But what is this what is the difference with this NLP?


Well, okay. Let me just back up just a minute. So, yes, just to just to clarify the tabular data. Just real quickly, there are log standards.


Microsoft has one. So they fit into something that has the same columns. So that you're you're looking at apples and apples. So, yes, that is tabular data, clearly, and that's the value there.


Also, these these machine learning models that are looking at transactions as they're coming through real time.


But the LLM is different because natural lang natural language is different.


First of all, what's nice is we're all experts relatively speaking in speaking language. We know when something doesn't look right, when it's brought through when it's outputted at chat t b t. Mhmm. Right? So we can make our own judgment on that. Yeah. You know, language barriers notwithstanding.


The difference though is the permutations. So the way you build an LLM is, it's a big topic. But just to keep it very simple, you take a whole bunch of data. In the case of some of these commercially available LLMs like CHAT GPT, they gathered everything on the Internet.


Right? They they just scooped up all this information, and they take that and they apply they they grind it through these algorithms that are that that break it down into words and and sentences and topics, and they group those together. They take the documents and bring them together into these, you know, different, clusters, they call them, you know, or one way to talk about it. And what happens there is you get this machine learning model to supervise.


Sam, it's actually self supervised.


We won't have time to get into that.


Okay.


The problem is what's the training data? In the case of the two hundred eighty five thousand rows that I mentioned earlier Mhmm. It's fixed amount of data.


But what happens if you move two words, in in a sentence other way around? It totally throws off the meaning of that sentence. Right. So what are the permutations between the documents and the words and the sentences? It's basically, infinite.


Right.


And so you'd never really know what is the underlying data that's been used. And one of the biggest problems with machine learning is explainability.


In order for you to be compliant, this is getting to AI governance, AI law right now.


You have to understand how it works or you don't know what's happening. You can't tell whether it's behaving nefariously. So in the case of LLMs, for example, it spits out junk. It tells you, you know, can a monkey sing?


You know, it doesn't know that. It may eventually. But it it it only knows what it's been taught, and it only knows the only reason so far. Only unless and until we know how the human brain works, which we're decades away from, will these things really be typed.


So without getting too far off the topic or without getting too curious here, that the problem is what is the underlying data that we're working with? It changes all the time. When you ask something of chat GPT, it's going down, and it's looking at all all of its what it's got, and it's bringing you back an answer. If you put if you push you can also say, please regenerate that. What it'll do takes the same prompt, the same question or request, and you see back something similar but very but different.


Mhmm.


So we don't know what it's doing on the fly like that. There are tools you can use to tell what's happening, just to be clear. That you can tell, okay. Here's why it made this decision. So then you can you can correct for it. Oh, it leaked private information, or it it defamed somebody when it shouldn't have, or it gave out, trade secret information, or it hallucinated, or it told you how to build a bomb.


All these guardrails that they're trying to keep these these these these weaknesses, from taking place. So I I probably said enough.


Well, I I so I'm they have many questions based on this, what you've just said. And starting with this, it scoops up all this information to learn things. Okay.


Mhmm.


Well, what does it scoop up? Is it allowed to scoop up that information? Have people given permission for that information to be scooped up? Here's another another way of looking that.


And so that's one big concern is some maybe some privacy issues on what it's gathered. Number two is, what is its sample set? Is it is it looking at a limited belief system? Is it looking at a limited geographical area?


Is it looking at so people might in in, a small geographical area might have a very different view of how things work than people in another geographical area who who are remote and have not communicated with each other. How do we know, first of all, from a, let's start with the privacy issue, you know, because we know that Zoom almost got itself into a lot of trouble recently by saying it was going to train its models on everything that came through. And people said, what the heck are you talking about? This might be really sensitive information.


Right?


Mhmm.


So how how do we from a governance perspective, is there anything in place that that put some guardrails on what a learning model is allowed to collect and look at?


Yes. There by the way, there are whole, there are whole I'm gonna say industries. That's a little bit of a strong word. But there are, many start ups and many, companies that are well into their growth that deal with this.


It can tell you whether or not there is private data in your training set in your in in in the data you're building your model with. GDPR, the the, general data protection regulation actually speaks to this. They have, guidelines for how to use machine learning models, how what private data can be used. I think they use different terms.


New AI risk, frameworks and laws are are in the in the works or in the mail even. And so but there are ways through machine learning methods for telling you whether there's private data that's coming out of a model. Okay? There's a whole there are companies that do this and they do it very well.


There's a there's a little bit of a nuance there. There's a there's a commercial kind of, challenge in that the model is most accurate, okay, when you have the private data underneath it. It needs as much information as can get. It's called a privacy budget.


That was actually gonna be my second question is if you do limit the access to information, are you limiting the functionality and and really the the, the veracity of what's coming out of the machine?


That's right. And so let me just back up a minute. And and and we have to recognize how, how limitless the responses and the and the answers that you could get back from the model. It's so broad and deep and unknown.


You really have to do very careful engineering of the LEM. Very doable, by the way. There are tools for this. But the idea is that the more you accommodate, public policy, factors like privacy and sensitive information and what have you, the less accurate the model gets, generally speaking.


So there's this inverse relationship that the more I try to make the data, less likely to reveal private information, generally, the less accurate the model. And so there are even times where the model is no longer worth doing because the risk is so high, you it's no longer worth doing. So you just go back saying, we're not gonna use machine learning here. It's just if the cost is too high, the risk is too much.


But there are there are machine learning methods for saying, is there private data here?


And there's more to it than that. Very briefly, but it's an important fact important thing to say. If the model that I'm training and I'm using is in a very tightly controlled environment, what's the risk that private data is gonna, find itself where it shouldn't be? Very it's much lower than if it's out on, you know, web facing application Mhmm. Like open, the chat GPT. Right?


Right.


Or anyone can get to it. Mhmm. But if I have a server room and it's it's locked down, the only people who have access to it have we've done, background checks on. So you see that the risk is is isn't just in the the model and the privacy. It's also in these peripheral issues, cybersecurity and data protection and so forth.


So, again, understanding and this is something we come back to a lot in in compliance and security is what is the scope that you're looking at? What are you what information are you trying to protect? What systems are you trying to protect? And how are you doing it? And and it sounds like you can have, a machine learning model that is embedded within that set of systems where where you consider the security of that and the utility of of the, information you're getting out of it based on that scope as well.


Correct. And and just and we we have to remember that, LLMs specifically, are are often very, are when they're used in the real world, they are focused. So you can have an LLM that builds on top of CHAT g p t, for example, or LLAMA or DALL E or some of these other ones that are out there That that builds on top of it. You have your own data that you're now building your own, for all intents and purposes, your own LLM that is going to be, typed through the commercially available one.


For, like, say, for example, health care, it has its own terms, its own whatever. Right? Financial services has its own words, patterns, and so forth. And even within those, you have sub topics.


Right? So what what you're doing is you're looking at that level, generally speaking, with regard to looking what do I have private data that's that that's being leaked. There's a sense of information that's coming out.


And and what's nice is it's a much smaller, you know, field to tackle to accommodate because you're now in this much more narrow set of data that you're looking at.


Right. And and but it also brings up concepts of of if you're building on a on an existing model, what are the chances that you're going to either bring in protected health information, for example, from that model or that the protected health information that you are trying to work with with your within your model will in some way be released to, whatever you're building on top of.


So so finding ways to make sure that, you know, we understand in what direction is the data flowing, I think, is is critical to this. So so do you do you see people actually developing their own, systems of this, or you or do you really see people using something that maybe a a service provider has has built for them?


I think it depends on the use case. I think if it's just someone like a marketer who's saying, you know, give me a, you know, an article on, you know, how to market, a law firm. You know? It'll do that. It'll tell you up until twenty twenty one because it doesn't have any information beyond twenty twenty one. One. That's a risk, by the way.


Uh-huh. Yeah.


At least currently, as we speak here in August. Right. It is August. Yes. Twenty twenty three.


But I think when you start getting into these verticals, these regulations that are coming out, will will say, hey. Look. You know, you you'd better stay on top of this stuff. So I think I think it's really going to be at the level where you are. You're you're building your own LLM, your own retrieval mechanism on top of it. It's it's a very complex topic.


And, testing it, what you have to do is you kinda have to kind of what in AI governance, we do a thing called, adversary robustness testing Okay.


ART. And with that, what we do is we take the model. We're in our little, you know, secure environment. This is purely sandbox testing.


Right? We're not and we say, well, let's throw some things at it and see what it does and look at what comes back. And you can automate this sometimes. But this is the way you say, well, if I were to and it's very, very specific.


I can't I can't give you some there are ways of there are tools out there that help you do this for specific verticals. Okay? But you need subject matter experts, domain experts who can say that's wrong. Right?


So it's this process of training the model and and testing it and giving it variations of inputs to see what it spits back. And there are tools for this, by the way. This is this is a doable, responsible thing that you can actually approach. It's not this wild, wild, wild west.


But that's what you have to do is you kinda have to that's one thing that I my team works with is this very thing where we can test the model to see how robust it is, to see is it doing the wrong thing? Is it giving you hallucinations like a monkey that plays a piano?


So the the this tuning and testing of the model, what kind of risk is there related to maybe bias introduced by the the subject matter experts?


It's very, it's very concerning because bias can be inserted, anywhere along that path. The person who collects the data, may not understand that it may be inherently biased. The person who's training the model may not recognize that what's coming back is a bias.


And so there's there's various types of bias, but, yes, this is probably legally one of the biggest areas of risk is that you're you are inserting bias into things like criminal recidivism.


Will this person reoffend?


Well, you don't wanna give one person more, for example, in determining what their sentence should be.


Oh, okay.


Okay. Right? Who what's what should this person be sentenced to? Well, I don't wanna add a year to their sentence erroneously. That certain certain class of our society is being is being more impacted than the other one simply because of the bias. So that's horrific to to put someone in jail for more time than they should have or less time.


And there are other examples where, hiring and firing or, that probably wouldn't be the case, but the decision to to, interview a candidate.


Oh, okay. Yeah.


Right? So are you somehow taking, you know, gender bias or cultural or religious bias unknowingly? You're only you're looking at people very it could be entirely innocent in inadvertent, but that's the result that you're now you're choosing who to hire. You know? These are these are very fundamental questions, and that's why they they are so, so much a part of our legislative, you know, and legal, you know, environment.


So so you you do you see, legislation coming through that is trying to make sure that there is a a protection against that kind of bias in these models?


Yes. And it applies to all models. We should be clear that, this can happen to tabular data, in computer vision. This is these speak to the the the full horizontal area. Right? But it tends to come up more in national language processing. Yes.


There's a thing that the National Institutes of Standards and Technology Right.


NIST is a is a is very familiar to a lot of listeners. I and Yes. Please go ahead.


Yes. So they have a standard called it's a responsible AI. If you Google that, the the number that it has escapes me as we sit here, but they have one. It's out there now.


Okay.


To my knowledge, it here in August twenty twenty three, there are no regulations that are squarely devoted to this topic.


There's also, the IAPP, the information It's a it's a privacy thing.


Right? Professionals.


The International Association of Privacy Professionals.


I have people that are out there going, Jen, you should know this.


And, yes, I should know.


Yes. It's early morning here, and I Yes.


Did I went swimming last night.


It kept me up. Anyway, so, they have a they have an AI governance certificate now.


Oh.


And also, last, let's just say that, there there are many things coming out. China has, a law coming out, and then and I'm sure I'm missing something here.


But, yes, there are things out there. There's a third there's another one that I wanted to.


ISO has a standard that's come out that Willie, your privacy, actually is part of.


Okay.


So so yeah. So yeah. They are definitely things. They all speak basically the same thing, if I'm being honest.


They have to. Because the way you build a model is a science. And so the way you build it, you have to follow certain rules. And, you know, I I will leave you with one thing I think is, underrepresented.


There's a thing called the CRIS industry standard for data mining. CRISP, c r I s p hyphen d m. Look that up. That standard is when I took my, my master's program in data science at Northwestern, all my sign assignments had to track that sequential process to prove I had done it properly. You'll find that these frameworks and these laws will follow that same reasoning.


Oh, excellent.


So if you look at Chris d p m Chris d m, you'll see this embodied it's a great way to learn very quickly how to govern models.


Alright. Well, so we've talked a lot about risks. We've talked about, you know, potential concerns, potential biases, but I want to leave people with a kind of a positive note. Is there anything kind of exciting in AI? Is there something that that really gets you What drives you to continue, being part of this world?


Well, that's that's that's the the the, the the thing. It's kind of like driving a car. You know? What would your life be without the car or your phone? Probably not so much with your phone, but when you when you drive your car on the road, you run the risk. You might run out of gas somewhere in the middle of nowhere or you might get hit by another car.


You know? So it it's it's a net gain. It it is an enabler.


Machine learning is here to really move us to the next level and how to leverage it. We just have to understand this like everything else. It has risks that have to be accommodated, and you just apply a risk based analysis. You compare the cost and the benefits. It's really what it boils down to.


So I kinda feel that that's, I'm very optimistic or I wouldn't be in this. I'm here to help marshal it forward to make sure that it's between the guardrails as as as it were, and, helping people do that.


Well, thank you. I I appreciate that that view of it because so so often we can get wrapped up in, oh, it's a new thing and it's terrifying and which partially it kind of is, you know, concerning. But but knowing that also there are so many positive benefits out of it and and there are so many good ways that we could use it. Just being aware of the risks and and moving forward to, with a way that that mitigates them so that we can maximize the positive things. Really appreciate that that view that you've given us.


Yes. Thank you.


Alright. Thank you for talking to us today, and I hope to talk to you again, in the near future, Paul.


Alright. Thank you, Ted, for having me on. It was a pleasure. Alright. Bye bye.


Thanks for watching. To watch more episodes of SecurityMetrics podcast, click on the box on the left. If you prefer to listen to this podcast, it's available on all your favorite podcast platforms. See you on the slopes.

Get the Guide To PCI Compliance
Download
Get a Quote for Data Security
Request a Quote