This may not be something readers expect, especially from someone whose professional focus is cybersecurity, but the most impactful effort any IT organization can make isn’t improving cybersecurity; it’s improving data management.
The reason is simple: data is the lowest common denominator for every service that any company or government agency provides. Ideally, it is also at the root of how organizations make decisions, but that is a different conversation altogether.
What is the big deal anyway? Organizations, especially governments, have lots of data. We have business processes around how services are provided and, by extension, how data is created. We use this data and we make decisions based on it. Why is data management even a ‘thing’? Isn’t this something we are already doing as a result of providing services? Yes and no.
The question we should be asking is how well, for the purposes of the intended audience, government is using data to make decisions. In this context, ‘well’ can be described as (but is not limited to):
• How accurate is the data?
• Is the data complete so as to make the best decision possible?
• Is the way data is presented, understood, and used consistent in the org?
• Is access to data timely or does it take so long to get that it negatively impacts the service in question?
• Is the data valid? Not to be confused with accurate; a value can conform to the expected format and still be wrong.
If you accept the above criteria for what is considered good, how would your public agency respond to questions like these:
• Do you have a master list of all the data sources used by your entire organization, broken down by data type, format, usage, and so on? (A minimal sketch of what one entry in such a list might look like follows this list of questions.)
• Does this master list also denote where each data source gets its data? Is it another data source, or is it tied to a unique or shared process?
• How many data sources exist in your org that contain information with the data type of ‘user’?
• If you were to ask key staff how a decision was made, could they specify the data they used to make that decision and how that data was generated in the first place? For that matter, is that logic documented somewhere so that you know, regardless of who is making decisions, that it is being done consistently?
• For all the above questions, how long would it take your org to provide this information, if it could at all?
• And could you answer them confidently, pointing to evidence to support your responses?
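Purely as an illustration of what one entry in such a master list might capture, here is a minimal sketch in Python. The field names are hypothetical, not a standard; they simply mirror the questions above about type, format, usage, lineage, and ownership.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataSourceRecord:
    """One entry in a hypothetical master inventory of data sources."""
    name: str                       # e.g., "Monthly payroll extract"
    data_type: str                  # e.g., "user", "financial", "asset"
    format: str                     # e.g., "SQL table", "Excel", "flat file", "API"
    usage: str                      # the service or decision the data supports
    upstream_source: Optional[str]  # where this source gets its data (lineage), if known
    owner: str                      # who is accountable for the data's quality
    authoritative: bool             # system of record, or a shadow copy?

# An invented example entry for illustration only.
payroll_extract = DataSourceRecord(
    name="Monthly payroll extract",
    data_type="user",
    format="Excel",
    usage="Budget forecasting",
    upstream_source="ERP payroll module",
    owner="Finance",
    authoritative=False,  # a downstream copy, not the system of record
)
```

Even a simple register like this makes a question such as “how many sources contain ‘user’ data?” answerable with a filter rather than an email chain.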
In technology, we invest time and resources in ITAM (IT Asset Management) solutions so that we know what hardware and software are in our networks. From a cyber perspective, we can’t measure risk unless we know what’s in the environment. So why should it be different when it comes to data management? If orgs have a fundamental dependency on data to be successful, however that is defined, shouldn’t we treat data no differently than other assets? Arguably, we should be taking greater care when it comes to data assets.
The big move now in all areas of technology, including security, is toward AI (artificial intelligence) and ML-based (machine learning) technologies. The promise of not having to do any manual effort, planning, and so on is too good to pass up for many organizations, but this is where one of the problems exists. Vendors want to consume all of an organization’s data and apply it toward their technology. The obvious question, which we’ll set aside for now, is whether the technology works as advertised; there are other ramifications around data and information security. One that vendors will almost never tell you about, or account for in their approach to AI/ML, is whether the data they are consuming is any good. If you are lucky, the technology in question will compare data from multiple sources in an effort to determine its validity, but this is not always the case, let alone common. Many technology solutions out there simply assume that the data they are consuming is accurate. If organizations had a mature approach to data management, this concern would be addressed, or at least a framework would be in place to address it.
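As a hedged sketch of what “comparing data from multiple sources to determine validity” could look like in practice (the function and source names below are invented for the example), consider a simple quorum check before a value is trusted by a downstream AI/ML pipeline:

```python
from collections import Counter

def cross_check(field: str, readings: dict, quorum: int = 2) -> dict:
    """Accept a value only if at least `quorum` independent sources agree;
    otherwise flag it for review instead of silently feeding it to a model."""
    counts = Counter(readings.values())
    value, votes = counts.most_common(1)[0]
    if votes >= quorum:
        return {"field": field, "value": value, "status": "agreed"}
    return {"field": field, "value": None, "status": "needs_review", "seen": dict(readings)}

# Hypothetical example: three systems report a user's department.
print(cross_check("department", {
    "hr_system": "Public Works",
    "erp": "Public Works",
    "access_badge_db": "Parks",
}))
# -> {'field': 'department', 'value': 'Public Works', 'status': 'agreed'}
```

The mechanics matter less than the fact that the check exists at all; without it, the model simply inherits whichever source happens to be wrong.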
What about knowing what sources of data exist? Many cybersecurity vendors require an API from a third-party source of data to work properly. Often, this limits options for data consumption to the larger vendors out there. What about internal sources of data in formats like Excel or flat files? Additionally, all of this assumes that you are aware of every potential data source in the organization, including those in third-party cloud environments. What these data-driven cybersecurity solutions don’t typically consider is whether the sources being ingested are the ones actually used to make decisions, deliver services, and the like. Which data sources are considered authoritative, which are shadow systems, and from which of those are decisions actually being made? If you think that all budgetary decisions are made from ERP-generated reports rather than someone’s homegrown Excel spreadsheet, you may want to reconsider your assumptions.
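To make the authoritative-versus-shadow question concrete, here is a small, entirely hypothetical sketch that groups an inventory by the decision each source feeds and surfaces decisions informed only by shadow systems:

```python
from collections import defaultdict

# Invented inventory entries for illustration; in practice these would come
# from the master list discussed earlier.
inventory = [
    {"name": "ERP budget report",   "format": "SQL",   "authoritative": True,  "feeds": "budget"},
    {"name": "Finance_FY25.xlsx",   "format": "Excel", "authoritative": False, "feeds": "budget"},
    {"name": "Grants tracker.xlsx", "format": "Excel", "authoritative": False, "feeds": "grant awards"},
]

by_decision = defaultdict(list)
for source in inventory:
    by_decision[source["feeds"]].append(source)

for decision, sources in by_decision.items():
    if not any(s["authoritative"] for s in sources):
        names = ", ".join(s["name"] for s in sources)
        print(f"Decision '{decision}' relies only on shadow sources: {names}")
# -> Decision 'grant awards' relies only on shadow sources: Grants tracker.xlsx
```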
To take advantage of the next generation of cybersecurity tools, we need to better understand what data exists in our environments and how it is really being used. There are no shortcuts to understanding how decisions are made.