Data Governance Analyst
September 25, 2021
You might already be aware that data governance is a critical prerequisite for successfully implementing data science or AI projects. If not, check out some of the other blogs on this site. What I’m here to tell you about is how your organizational structure, and specifically your data science team structure, impacts governance within your business.
The three models in which data science and governance teams are structured in most organizations are the Centralized Model, Decentralized Model, and the Hub and Spoke model. I will discuss the effects that these data science team structures have on data governance.
The centralized data science and governance team is often housed within IT or as a separate group. This makes them independent of other groups in the organization, allowing them to centralize best practices and knowledge. Since they are centralized, they are aware of the different governance initiatives for each department, and how they fit together. This is the most common model.
Centralized data science and governance teams have an advantage: they own the entire organization’s data governance standards. Data stewards sit within the unit and specialize in governing the data of one or more business units. Stewards can either be analysts taking on additional duties or hold a specialized position.
They are independent of other departments, so data governance standards remain centralized.
There is less redundancy and more efficiency in implementing data governance standards, especially for data ingestion, quality, and integrity.
Specialists can be developed since the data standards flow through them.
Control of data governance and data quality standards is easier.
The disadvantage of centralized teams is the workload they must carry. In addition to governance duties, centralized teams also perform regular data analysis, engineering, and other duties. Data stewards may not be familiar with the business units’ needs if they sit within a data science team.
If the unit is short-staffed or overwhelmed, governance becomes non-existent.
Even with efficient teams, business alignment with governance sometimes suffers, particularly at the end-user level.
The reporting needs of the business come first; governance of data often comes second.
IT is often perceived as being responsible for data governance, rather than governance teams.
The decentralized data science and governance teams live within their parent departments. Each department has a data science team with analysts, engineers, and data scientists. Since they are within the business unit, they have domain knowledge – which is important to setting high-quality data governance and quality standards.
These decentralized teams have several data governance advantages.
They work within their parent department and gain a hands-on awareness of their reporting needs, and the governance required.
They gain stronger domain knowledge and develop a strong relationship with data owners.
They have a quicker turnaround time for governance tasks since they are not taking on all the reporting and analysis responsibilities.
Data stewards are from the departments they belong to and have knowledge about the domain, reporting, and governance needs.
Decentralized data science teams also have several data governance disadvantages:
The teams are siloed. Governance knowledge transfer is difficult between two different data science teams. This can result in different teams being unaware of data relationships, lineage, and transformation logic.
Differing standards on data governance, data quality, or definitions of business objects.
Lack of support beyond a department level. Teams lack support from the executive level and may end up in direct competition for resources and time needed to implement governance.
Lack of collective ownership for shared data models, processes, and quality standards.
The main downfalls of decentralized data science teams are communication and standardization. While each unit may be knowledgeable about its own domain, that knowledge comes with the risk of conflicting data quality definitions and data governance standards.
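To make the risk of conflicting definitions concrete, here is a minimal Python sketch of how such conflicts might be surfaced. The team names, business terms, and definitions are all hypothetical and chosen only for illustration; a real organization would more likely pull these from a data catalog or business glossary.

```python
from collections import defaultdict

# Hypothetical definitions of the same business terms, as maintained
# separately by three decentralized teams (illustrative values only).
team_definitions = {
    "marketing": {
        "active_customer": "purchase in last 90 days",
        "revenue": "gross sales",
    },
    "finance": {
        "active_customer": "purchase in last 365 days",
        "revenue": "gross sales",
    },
    "support": {
        "active_customer": "purchase in last 90 days",
    },
}


def find_conflicting_definitions(definitions_by_team):
    """Return the terms that at least two teams define differently."""
    by_term = defaultdict(dict)
    for team, definitions in definitions_by_team.items():
        for term, definition in definitions.items():
            by_term[term][team] = definition
    # A term is in conflict when more than one distinct definition exists.
    return {
        term: teams
        for term, teams in by_term.items()
        if len(set(teams.values())) > 1
    }


conflicts = find_conflicting_definitions(team_definitions)
for term, teams in conflicts.items():
    print(f"Conflicting definitions for '{term}':")
    for team, definition in teams.items():
        print(f"  {team}: {definition}")
```

In this made-up example, only "active_customer" is flagged, because marketing and finance disagree on the time window, while "revenue" is defined identically wherever it appears. A check like this only helps if someone owns the shared list of terms, which is exactly the collective ownership that siloed teams tend to lack.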
Be on the lookout for Part 2 where I discuss the Hub & Spoke model along with my overall conclusion!