Selecting a Data Catalog
Many organizations now begin a data governance program by establishing people, culture, and data literacy before selecting tools to support the journey. Data catalogs have become an essential tool for managing and governing data assets effectively. A data catalog is a centralized repository that provides a comprehensive view of an organization's data assets, including their location, quality, lineage, and access controls. Selecting the right data catalog is critical, as it can help improve data literacy, governance, and decision-making. In this community insight article, we look at some of the most important features and functions our members look for when selecting a data catalog for their enterprise.
“People need to find the data,
People need to know the quality of the data,
People need to know the lineage of the data,
The tool needs to fit within the culture”
Director of Experience Analytics
A data catalog should be a one-stop shop for all data-related information. It should show what data is available, whether policies or classifications apply to it, and how it should be treated. The catalog should also provide information on the quality of the data, where the data lives, where it came from, and what access controls apply to it.
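As a rough illustration of the information listed above, a single catalog entry could be modeled as follows. This is a minimal sketch, not any vendor's schema: the `CatalogEntry` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical record for one data asset; all field names are illustrative."""
    name: str
    location: str                          # where the data lives
    source: str                            # where the data came from
    classification: str = "unclassified"   # e.g. "public", "internal", "pii"
    quality_score: Optional[float] = None  # 0.0 to 1.0; None if not yet profiled
    allowed_roles: set[str] = field(default_factory=set)  # access controls

    def missing_metadata(self) -> list[str]:
        """List the questions this entry cannot yet answer for a user."""
        gaps = []
        if self.classification == "unclassified":
            gaps.append("classification")
        if self.quality_score is None:
            gaps.append("quality_score")
        if not self.allowed_roles:
            gaps.append("allowed_roles")
        return gaps


# Sample entry with assumed values: classified, but not yet profiled or access-controlled.
entry = CatalogEntry(
    name="customer_orders",
    location="warehouse.sales.customer_orders",
    source="order_management_system",
    classification="internal",
)
print(entry.missing_metadata())
```

A check like `missing_metadata` is one way to see how far an asset is from being a true one-stop shop entry before it is published to users.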
Identifying all the use cases across the business is a common best practice when selecting a data catalog. This helps the organization understand how data is being used and what users need from the catalog. Once the use cases have been identified, the organization can prioritize a small number of top use cases, ensuring that the data catalog is designed to meet the most critical needs of the business.
Once use cases are selected, it is essential to consider whether the tool will empower users with data literacy. This means the data catalog needs to be designed with non-technical users in mind, and it should be intuitive and easy to use. That said, the tool must satisfy the security, data privacy, and compliance teams, not just business users. The data catalog should provide comprehensive security features that help maintain compliance with regulations such as GDPR, CCPA, or HIPAA. These features should include access controls, user authentication, and data encryption. Understanding the various personas who will access and use the catalog, and creating a comprehensive checklist of their requirements, will help satisfy all stakeholders.
Integration with existing systems is crucial when selecting a data catalog. These include systems for help desk, reporting, IT security, access management, and data quality. Integration ensures that data-related systems work together efficiently and that data is easily accessible and understandable. Understanding the existing architecture and the planned future state will help establish whether metadata can be easily integrated and accessed. Otherwise, organizations can find themselves in the difficult and costly position of building or purchasing bridges to integrate data properly after the data catalog has been purchased.
Discovery, quality, and confidence are key for users who access data through the catalog. With robust data lineage, analysts can see how the data is being leveraged and where it is used in critical reports and applications. This information is particularly important for ensuring that business rules and data quality rules are being applied correctly.
“Starting back at the key reports, and the report attribute, document everything and then find where all the impacts are with data lineage. You can start to see how the data is being leveraged, apply business rules and data quality rules, and see if there is deterioration. The business will know what is being impacted.” – Director Enterprise Data Governance
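The impact analysis described above can be sketched as a graph traversal. This is a minimal sketch under stated assumptions: the `LINEAGE` mapping, the asset names, and the `downstream_impacts` helper are hypothetical, and a real catalog would expose lineage through its own interface.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that consume it directly.
LINEAGE = {
    "crm.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["mart.customer_360", "mart.churn_features"],
    "mart.customer_360": ["report.executive_dashboard"],
    "mart.churn_features": ["report.churn_forecast"],
}


def downstream_impacts(asset, lineage):
    """Return every asset reachable downstream of `asset`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


# Which reports are affected if the source customer table deteriorates?
impacted = downstream_impacts("crm.customers", LINEAGE)
reports = [a for a in impacted if a.startswith("report.")]
print(reports)
```

Reversing the edges in the same structure would support the opposite walk, starting at a key report and tracing back to the sources that feed it.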
Organizations need to consider the reusability of data. Once data is onboarded, how easy is it to reuse? The data catalog should allow for easy metadata tagging, finding, and automation. Moreover, business terms should be defined consistently across the organization, even though there may be variations in terminology. The data catalog should show these variations and allow users to understand the differences in terminology and the associated data.
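One way to picture consistent business terms with visible variations is a glossary that maps each variant back to a preferred term. This is only an illustrative sketch: the `GLOSSARY` contents, the department names, and the `resolve_term` helper are hypothetical.

```python
# Hypothetical glossary: one preferred business term, its definition,
# and the variants used by different departments.
GLOSSARY = {
    "customer": {
        "definition": "A party that has purchased at least one product or service.",
        "variants": {"client": "Sales", "account holder": "Finance", "member": "Marketing"},
    },
}


def resolve_term(term, glossary):
    """Map a term or one of its variants to the preferred term, or None if unknown."""
    needle = term.strip().lower()
    for preferred, entry in glossary.items():
        if needle == preferred or needle in entry["variants"]:
            return preferred
    return None


print(resolve_term("Client", GLOSSARY))
print(resolve_term("vendor", GLOSSARY))
```

Keeping the owning department alongside each variant lets users see not only that two terms mean the same thing, but also where each wording is used.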
With the rapid rise of ChatGPT and other AI-assisted programs, many users are looking to have AI (Artificial Intelligence) and ML (Machine Learning) embedded into the data catalog. AI can assist with data discovery, quality profiling, process automation, and even collaboration by providing recommendations based on each user’s past experience and persona.
When selecting a data catalog, it is essential to consider features such as comprehensive data access, security, data privacy, ease of use for non-technical users, integration with existing systems, data lineage, and reusability. Ultimately, understanding the various personas and use cases of the data catalog will help satisfy all stakeholders, ensuring the successful adoption and integration of the tool into an organization’s data governance program. A data catalog should support the people-centric data governance approach and enable collaboration and communication throughout the enterprise.