Technical Platform Lead (Data Catalog)
Data Catalog Platform
A data catalog platform is a collection of services that helps organizations create and maintain a metadata repository for data assets (technical & business), BI reports, and visualizations (dashboards). The goal of a data catalog platform is to become the primary source for data discovery requirements. At the core, a data catalog platform provides the following capabilities:
A search service – An interface to quickly find and discover relevant information
Metadata curation – Features to add more relevancy to data and increase trust in data through governance
User collaboration – Features to interact with data asset owners, contribute knowledge base articles, recommend trusted data assets, and publish user queries
Metadata management – Features to administer the metadata, user roles, and admin settings
“Measure your platform’s success by the satisfaction of users.”
Why measure the success of your Data Catalog Platform?
Measuring success through a KPI-driven approach is essential to answering the following questions about your platform:
Did investment in technology work?
Was cost efficiency achieved in managing the platform?
Does the platform serve as a single pane for data-related questions?
Did the platform help users to make data-driven decisions?
Were improvement areas identified to get better?
Did employee acceptance of the platform improve with time?
The success of any data catalog platform depends on the value that the platform provides to its end-users. The end-users of a data catalog platform fall into the following user personas:
Data Analyst – Wants to get started with the data research quickly
Data Engineer – Wants to find the relevant joins or queries for the data assets swiftly
Data Scientist – Wants to find the trusted data assets for the ML model promptly
A data catalog platform is critical to unlocking the connection between the data and its meaning within the organization.
The success measures for a data catalog platform are:
‘Success Measures’ help organizations gauge the value their end-users get from the use of the platform over time. From the platform’s growth to maturity, there will be different ways to measure the platform’s success.
The data assets within a data catalog go through different stages during the platform’s maturity, so measuring the metrics at these critical stages becomes a cornerstone to define the success of the data catalog platform.
With this context, you can go deeper into the Success Measures mentioned in the infographic above.
Measure 1: Provides value to end-users
Is the platform delivering the promised value to the end-users?
If productivity improvement (hours saved) was one of the value propositions for the platform, then end-users should be able to discover data information faster than before. However, the promised value is not delivered if users still gravitate towards the old ways (searching team documents, messaging teammates, and re-writing queries) to get their data-related questions answered. The word ‘promised’ above is therefore critical: it makes you spell out how the value (benefit) will be provided, experienced, and attained.
Organizations can measure the following metrics to understand the value the platform provides:
Total number of users onboarded: This metric signifies the maturity of the platform. A high number illustrates users’ comfort in using the platform for data search and discovery requirements. Another way to measure this metric is the number of users onboarded across different user personas. For example, Data Scientists and Data Engineers are the most experienced data practitioners in an organization, so a higher number for these two user roles answers the question, “Did investment in technology work?”
Number of users added weekly/monthly: This metric tells the pace of platform adoption. A weekly increase in this number means that existing platform users are giving good feedback to users not yet onboarded and making a case to bring them to your platform. Fast and seamless user onboarding results in a high number for this metric and also answers the question, “Was cost efficiency achieved in managing the platform?”
Number of search queries per day: As discussed before, search is a core capability of a data catalog platform. This metric shows how often users come to the data catalog platform for their data discovery needs. The question answered here is, “Does the platform serve as a single pane for data-related questions?”
Number of data asset pages viewed: This metric measures whether users are becoming more active in using the platform. If the value remains the same or declines month on month, platform adoption is slowing down. An increase in the value of this metric will help answer the question, “Did employee acceptance of the platform improve with time?”
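The Measure 1 metrics above can be sketched as simple counts over an onboarding list and a usage log. This is a minimal illustration, not a prescribed implementation: the record layouts, the sample data, and the function names (users_by_persona, users_added_per_week, events_per_day) are all assumptions made for the example, and a real platform would read these records from its own audit or analytics store.

```python
from collections import Counter
from datetime import date

# Hypothetical onboarding records: (user_id, persona, onboarding date).
ONBOARDED = [
    ("u1", "Data Analyst", date(2024, 1, 2)),
    ("u2", "Data Engineer", date(2024, 1, 3)),
    ("u3", "Data Scientist", date(2024, 1, 9)),
    ("u4", "Data Analyst", date(2024, 1, 10)),
    ("u5", "Data Engineer", date(2024, 1, 11)),
]

# Hypothetical usage events: (user_id, event_type, event date).
EVENTS = [
    ("u1", "search", date(2024, 1, 9)),
    ("u2", "search", date(2024, 1, 9)),
    ("u2", "page_view", date(2024, 1, 9)),
    ("u3", "search", date(2024, 1, 10)),
    ("u1", "page_view", date(2024, 1, 10)),
    ("u4", "page_view", date(2024, 1, 10)),
]


def users_by_persona(onboarded):
    """Total users onboarded, broken down by user persona."""
    return Counter(persona for _, persona, _ in onboarded)


def users_added_per_week(onboarded):
    """Users added per week, keyed by (ISO year, ISO week number)."""
    return Counter(tuple(day.isocalendar())[:2] for _, _, day in onboarded)


def events_per_day(events, event_type):
    """Daily count of one event type, e.g. 'search' or 'page_view'."""
    return Counter(day for _, kind, day in events if kind == event_type)


if __name__ == "__main__":
    print(users_by_persona(ONBOARDED))
    print(users_added_per_week(ONBOARDED))
    print(events_per_day(EVENTS, "search"))
    print(events_per_day(EVENTS, "page_view"))
```

Tracking these counts as a time series, rather than as single totals, is what makes the month-on-month comparison described above possible.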
Measure 2: Provides an open ecosystem
Does the platform promote innovation with easy integration patterns?
An open ecosystem provides APIs for integration, supports open-source technologies to bring new features to market faster, and is easy to maintain. For example, if you want to ingest metadata from a cloud data warehouse (Redshift or Snowflake), an open and flexible platform will provide connectors to bring in that metadata.
The following metrics can identify whether the platform is easy to integrate with other applications and systems:
Number of new data sources onboarded: This metric indicates the platform’s flexibility to bring in metadata from various sources. As the platform matures, integration with disparate data sources becomes simpler. In addition, this metric answers the question, “Were improvement areas identified to get better?”
Number of applications connected: A higher number of integrated enterprise applications demonstrates the platform’s openness and ease of integration. This metric answers two crucial questions: “Does the platform serve as a single pane for data-related questions?” and “Was cost efficiency achieved in managing the platform?”
Cycle time to deploy a feature: The time required to deploy a feature should be minimal if the platform follows established deployment methodologies. The questions answered through this metric are “Were improvement areas identified to get better?” and “Did employee acceptance of the platform improve with time?”
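As a rough illustration of the cycle-time metric, the sketch below computes the number of days between the date a feature was ready and the date it was deployed, then takes the median. The article does not define where the cycle starts or ends, so the "ready" and "deployed" dates, the feature names, and the function names are assumptions chosen for the example.

```python
from datetime import date
from statistics import median

# Hypothetical records: (feature name, date ready for release, date deployed).
DEPLOYMENTS = [
    ("glossary-import", date(2024, 2, 1), date(2024, 2, 8)),
    ("snowflake-connector", date(2024, 2, 5), date(2024, 2, 9)),
    ("query-publishing", date(2024, 3, 1), date(2024, 3, 4)),
]


def cycle_times_in_days(deployments):
    """Days elapsed between 'ready' and 'deployed' for each feature."""
    return [(deployed - ready).days for _, ready, deployed in deployments]


def median_cycle_time(deployments):
    """Median cycle time in days; less sensitive to one slow release than the mean."""
    return median(cycle_times_in_days(deployments))


if __name__ == "__main__":
    print(cycle_times_in_days(DEPLOYMENTS))
    print(median_cycle_time(DEPLOYMENTS))
```

A median that shrinks from quarter to quarter would support the claim that improvement areas were identified and acted on.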
Measure 3: Adds value to the organization
Does the platform reduce the cost to find data information?
Daily active users: A higher number of active users shows that users have built trust in the platform and are actively using it as the primary source for their data discovery needs. One of the most critical questions for any organization is “Did investment in technology work?” A high daily active users count establishes the value of the platform across the organization.
Number of unique data asset pages viewed: The data catalog platform contains enriched metadata for technical datasets, knowledge base articles, published queries, etc. In addition, as the platform matures, new data assets get added and curated. Therefore, if users visit many unique data asset pages, the platform’s value proposition to serve as a single pane for data discovery needs is met (“Does the platform serve as a single pane for data-related questions?”).
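Both Measure 3 metrics can be derived from the same page-view log, as the sketch below shows. It assumes that a user counts as "active" on a day when they view at least one data asset page; a platform could equally count searches or other interactions. The log layout, sample data, and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical page-view log: (user_id, data asset page id, view date).
PAGE_VIEWS = [
    ("u1", "orders_table", date(2024, 1, 9)),
    ("u1", "orders_table", date(2024, 1, 9)),
    ("u2", "customers_table", date(2024, 1, 9)),
    ("u3", "orders_table", date(2024, 1, 10)),
    ("u1", "revenue_dashboard", date(2024, 1, 10)),
]


def daily_active_users(page_views):
    """Number of distinct users with at least one page view on each day."""
    users_per_day = defaultdict(set)
    for user_id, _, day in page_views:
        users_per_day[day].add(user_id)
    return {day: len(user_ids) for day, user_ids in users_per_day.items()}


def unique_asset_pages_viewed(page_views):
    """Number of distinct data asset pages viewed, ignoring repeat views."""
    return len({asset for _, asset, _ in page_views})


if __name__ == "__main__":
    print(daily_active_users(PAGE_VIEWS))
    print(unique_asset_pages_viewed(PAGE_VIEWS))
```

Comparing the unique-page count with the total view count (three versus five in this sample) shows whether traffic is spread across the catalog or concentrated on a few popular assets.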
Measure 4: Improves adoption with new features
Does the platform provide value-adding capabilities?
Total traffic routed to a new feature: Increased use shows users’ interest in exploring the platform’s capabilities. If high traffic is routed to the new feature and users keep returning to the new page (stickiness), the new experience has added value for its users.
Number of users onboarded due to a new feature: If the new feature offers a tremendous productivity improvement for a specific task, then the platform can gain new users. In addition, user training for new features results in better adoption and acceptance of the data catalog rollout.
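The two Measure 4 metrics can be approximated from a chronological event log, as in the sketch below. Attributing an onboarding to a feature is inherently fuzzy; here a user is treated as "onboarded due to" a feature when their first recorded event was on that feature, which is only one possible heuristic. The feature name lineage_view, the sample events, and the function names are hypothetical.

```python
# Hypothetical chronological usage log: (user_id, feature used).
FEATURE_EVENTS = [
    ("u1", "search"),
    ("u2", "search"),
    ("u1", "lineage_view"),
    ("u3", "lineage_view"),
    ("u3", "search"),
    ("u4", "lineage_view"),
    ("u2", "lineage_view"),
    ("u4", "lineage_view"),
]


def traffic_share(events, feature):
    """Fraction of all recorded events that went to the given feature."""
    if not events:
        return 0.0
    hits = sum(1 for _, used in events if used == feature)
    return hits / len(events)


def users_first_seen_on(events, feature):
    """Users whose first recorded event was on the given feature (assumed heuristic)."""
    first_feature = {}
    for user_id, used in events:
        # setdefault keeps only the earliest feature seen for each user.
        first_feature.setdefault(user_id, used)
    return sorted(user for user, used in first_feature.items() if used == feature)


if __name__ == "__main__":
    print(traffic_share(FEATURE_EVENTS, "lineage_view"))
    print(users_first_seen_on(FEATURE_EVENTS, "lineage_view"))
```

Repeat events from the same user on the new feature, such as the second lineage_view event from u4 in this sample, are a simple proxy for the stickiness mentioned above.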
Lessons learnt from applying these Success Measures
Any issue with the search functionality will result in a lousy user experience, thus decreasing the number of active users.
Not making your users aware of new capabilities results in poor adoption of the data catalog initiative.
Having thousands of data sources in your data catalog may not be helpful to users until you enrich the metadata attributes associated with the data assets.