Views on Metadata Management
The concept of metadata management embodies the need to document, publish and control the behaviour of data, usually across multiple points within a system, or across several systems. In more sophisticated environments, robust and comprehensive toolsets are used to achieve this level of control, providing a common environment to design data structures, master the explanatory business and technical documentation behind them and publish this to a wider audience.
For many organisations just starting out, a more limited metadata management solution, which achieves basic control without this level of elaboration, may be better suited. This approach should be viewed as tactical, and as such should be reviewed as an organisation's use of metadata grows.
Broadly, the management of metadata can be considered by splitting the problem domain into:
- Active and passive metadata management
- Technical and business-focussed metadata management
Active and Passive Metadata Management
Metadata provides information about the structure and behaviour of data, as it is captured by, amended by and passed through a particular domain. This behavioural and structural information is used at many points to affect the way in which data is dealt with by software and humans, and to constrain the patterns of behaviour which it may adopt.
In the context of this document, we refer to metadata management not in the sense of individual points of control, but as the overall system of control, ensuring that the metadata being used is the same throughout a given organisation (and hence that consistent rules are being applied to the data).
Where there is a comprehensive toolset in place, the metadata it contains can be used to control the data at run time, in an active sense. The metadata can be interpreted dynamically, and the rules which it implies can be embedded in the logic of systems making use of the data. This level of control is known as active metadata management.
Conversely, passive metadata management provides the necessary facilities to record and manage data characteristics, but not to use these characteristics at run time (i.e. the central definitions will be applied at design time, and their continued validity ensured via an essentially decoupled mechanism.)
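The distinction can be illustrated with a minimal Python sketch. All names here (`CENTRAL_RULES`, `validate_active`, `PassiveValidator`, `check_drift`) are hypothetical and not drawn from any particular toolset, and the sketch assumes a single range rule per data element.

```python
# Hypothetical central metadata store: one range rule per data element.
CENTRAL_RULES = {"age": {"min": 0, "max": 120}}


def validate_active(element, value, rules=CENTRAL_RULES):
    """Active: look the rule up in the central store on every call."""
    rule = rules[element]
    return rule["min"] <= value <= rule["max"]


class PassiveValidator:
    """Passive: the rule is copied from the central store at design time."""

    def __init__(self, element, rules=CENTRAL_RULES):
        self.element = element
        self.rule = dict(rules[element])  # snapshot, not a live reference

    def validate(self, value):
        return self.rule["min"] <= value <= self.rule["max"]

    def check_drift(self, rules=CENTRAL_RULES):
        """Decoupled check: has the central definition changed since the copy?"""
        return self.rule != rules[self.element]
```

If the central rule is later changed, `validate_active` picks up the new limit on its next call, whereas a `PassiveValidator` built earlier keeps applying its snapshot until a separate process such as `check_drift` flags the mismatch.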
Technical and Business Metadata Management
An orthogonal dimension for the consideration of metadata management covers the distinction between metadata needed to support and enhance business understanding of the data (e.g. plain language authoritative definitions of data elements) and that required to control the data in a technical environment (e.g. referential integrity rules, data ranges, etc.).
Purpose of Metadata Management
Metadata management covers the discipline of controlling the nature of data (both behavioural and structural). This can be within a single system, or across an entire enterprise (generally speaking, the wider the scope across which metadata is managed effectively, the greater the benefits).
The purpose of imposing this control is to ensure that data is handled in a consistent manner from one point to another, and is well understood at each stage. This consistency typically brings a number of significant benefits, including:
- Reducing or eliminating the need for retrospective data quality control
- Enabling and enhancing the reuse of system and process components for dealing with data (since the likelihood of reusability, driven by the enforced consistency, is greater)
- Providing an immediate trusted source for new developments, where the nature of data can be described in an authoritative way, and understood
- (In some cases) active control over data movement, transformations, etc. at run time
Centralised Data Definitions
At the most basic level, a metadata management solution provides a centrally accessible repository of business-related metadata. This will comprise a set of plain language data definitions (loosely a 'data dictionary'), indicative descriptions of any restrictions which apply to the use or evolution of this data, and descriptions of the high-level events within the life-cycle of each data entity.
The key issue with this feature is to ensure that it is:
- Authoritative
- Recognised as authoritative, and therefore trusted
- Complete, at an agreed level of detail
- Well published - immediately accessible to those with a need to refer to it
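As a rough illustration, a central definition record might carry the three components described above (plain language definition, restrictions, life-cycle events), with a simple completeness check. This is only a sketch: the class and field names are hypothetical, and the "agreed level of detail" is assumed here to mean that an entry has both a definition and a named owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDefinition:
    """One entry in a hypothetical central data dictionary."""
    name: str
    definition: str               # plain language business definition
    restrictions: tuple = ()      # indicative restrictions on use or evolution
    lifecycle_events: tuple = ()  # high-level events in the entity's life-cycle
    owner: str = ""               # who vouches for the entry


class DataDictionary:
    def __init__(self):
        self._entries = {}

    def publish(self, entry):
        # One authoritative entry per name: reject silent duplicates.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already defined")
        self._entries[entry.name] = entry

    def lookup(self, name):
        return self._entries[name]

    def incomplete(self):
        """Names of entries that fall short of the agreed level of detail.

        Assumption: the agreed level is 'has a definition and an owner'.
        """
        return sorted(
            e.name
            for e in self._entries.values()
            if not e.definition.strip() or not e.owner.strip()
        )
```

Rejecting duplicate names is one simple way to keep a single authoritative entry per data element; a real repository would also need versioning and an approval process, which are omitted here.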
Centralised Data Control Rules
In addition to basic definitions, there is a need to record and publish the rules by which the data will be controlled as it passes through the relevant domain. These rules include, but are not limited to:
- How the data is captured, and by whom / what
- What locations the data is passed to and under what circumstances
- How the data is transformed as it passes from one location to another within the domain
- What formal mechanisms are put in place to control the access to and use of data
Note that the degree of detail provided in each of these areas can vary significantly, and will be affected by the need to provide active or passive metadata management. In any case, a level of detail should be agreed, and the recorded rules should be kept consistent with it and monitored against it.
Dynamic Control of Data Behaviour
Metadata management facilities can also be provided which enforce more active control over the use of data. Processes and software that make use of the data will interact with the metadata at run time, and react to the constraints that they find recorded. This has the following implied benefits:
- It reduces the development effort since it removes the need for individual pieces of software/process to consider each data control rule explicitly
- It enforces consistency between the rules applied in different technical contexts
- It enhances flexibility, since a universal rule change can be recorded in one place and applied dynamically across the entire domain
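The benefits listed above can be sketched with two consumers that read the same rule from a shared repository at run time. The repository class, the two consumer functions and the maximum-length rule are all hypothetical, chosen only to keep the example small.

```python
class MetadataRepository:
    """Hypothetical central store of control rules, keyed by data element."""

    def __init__(self):
        self._max_length = {}

    def set_max_length(self, element, limit):
        self._max_length[element] = limit

    def max_length(self, element):
        return self._max_length[element]


def accept_web_form(repo, element, value):
    # Consumer 1: reads the rule at run time instead of hard-coding it.
    return len(value) <= repo.max_length(element)


def truncate_for_export(repo, element, value):
    # Consumer 2: a different technical context, same central rule.
    return value[: repo.max_length(element)]
```

Neither consumer encodes the limit itself, so a single call to `set_max_length` changes the behaviour of both on their next invocation.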
Conceptual to Logical to Physical Data Mapping
As part of any meaningful metadata management facility, the ability to show the data behaviour at various levels must be accounted for.
A conceptual or canonical form of the data should be recorded, providing a high level, but universally accepted version of the truth. Further to this, logical models of each subject area domain should be recorded, each providing a more detailed specific picture of how data will behave within the domain context. Finally, physical models must be recorded, defining the physical implementation of data structures within (for example) a relational DBMS schema.
The ability to record and control data definitions at each of these levels and (significantly) to cross-refer between them, ensuring consistency and traceability, is central to the function of a metadata management solution.
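One way to picture the cross-referencing is as a pair of lookup tables that let any physical column be traced back through its logical attribute to a conceptual entity. The sketch below assumes this simple two-step mapping; every entity, attribute and column name in it is illustrative.

```python
# Hypothetical cross-reference tables; every name is illustrative.
CONCEPTUAL = {"Customer"}

# Logical attribute -> conceptual entity it elaborates.
LOGICAL_TO_CONCEPTUAL = {
    "Customer.family_name": "Customer",
    "Customer.date_of_birth": "Customer",
}

# Physical column -> logical attribute it implements.
PHYSICAL_TO_LOGICAL = {
    "CUST.SURNAME": "Customer.family_name",
    "CUST.DOB": "Customer.date_of_birth",
    "CUST.LEGACY_FLAG": "Customer.legacy_flag",  # no logical definition exists
}


def trace(column):
    """Return (physical, logical, conceptual), or None if the chain is broken."""
    logical = PHYSICAL_TO_LOGICAL.get(column)
    conceptual = LOGICAL_TO_CONCEPTUAL.get(logical)
    if conceptual not in CONCEPTUAL:
        return None
    return (column, logical, conceptual)


def untraceable():
    """Physical columns that cannot be traced back to a conceptual entity."""
    return sorted(c for c in PHYSICAL_TO_LOGICAL if trace(c) is None)
```

A check such as `untraceable` gives a basic consistency report: any column it lists exists in the physical model without an agreed definition at the levels above it.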