Data Trends: Data Catalogs Hit the Mainstream
Data catalogs are part of a category of tools and practices that have been around for decades – almost as long as databases themselves. Despite their longevity, rarely does the term “data catalog” show up in the same sentence as “strategic business value.” We believe a new generation of data catalog practices and tools – powered by machine learning and supported by analytics – promises to elevate the strategic value of this segment.
The term data catalog is often used synonymously with data governance, master data management or data stewardship. A data catalog is, at heart, a list of all of the data sources used by an organization, the tables of data within those sources and the columns of attributes that make up those tables. Along with these listings, data catalogs can contain additional data (metadata) such as data types, typical values, how the data should be used for analysis, and any derived tables that aggregate or combine multiple other pieces of data.
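As a rough illustration of the structure described above, a catalog can be modeled as sources that contain tables, which in turn contain columns with attached metadata. The sketch below is a minimal, hypothetical model and does not reflect the schema of any particular product: the class names, field names, sample sources and the `find_columns` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str
    typical_values: List[str] = field(default_factory=list)
    usage_note: str = ""  # guidance on how the column should be used in analysis


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    # Names of upstream tables this table aggregates or combines (hypothetical field).
    derived_from: List[str] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    tables: List[Table] = field(default_factory=list)


def find_columns(catalog: List[DataSource], keyword: str) -> List[str]:
    """Return 'source.table.column' paths whose column name contains keyword."""
    keyword = keyword.lower()
    return [
        f"{source.name}.{table.name}.{column.name}"
        for source in catalog
        for table in source.tables
        for column in table.columns
        if keyword in column.name.lower()
    ]


# Illustrative sample data only; these sources and tables are made up.
catalog = [
    DataSource(
        name="crm",
        tables=[
            Table(
                name="accounts",
                columns=[
                    Column("account_id", "string"),
                    Column("region", "string", typical_values=["EMEA", "AMER"]),
                ],
            )
        ],
    ),
    DataSource(
        name="warehouse",
        tables=[
            Table(
                name="revenue_by_account",
                columns=[
                    Column("account_id", "string"),
                    Column(
                        "net_revenue",
                        "decimal",
                        usage_note="Exclude refunds before reporting.",
                    ),
                ],
                derived_from=["crm.accounts"],
            )
        ],
    ),
]

print(find_columns(catalog, "account_id"))
```

Even a simple lookup like this shows the basic value of a catalog: an analyst can discover that the same attribute lives in more than one source before writing a query.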
In the past, data catalogs have been a necessary tool for regulatory compliance or a check on development teams. More recently, some data catalog software offerings have added intelligence on common queries performed on the data, dashboards that use it, and machine learning models that depend on it, gathered by inspecting code rather than relying solely on humans to enter it by hand. We believe several trends are converging to raise the importance of data catalogs.
1. Data Sources are Proliferating
Organizations are analyzing data from a wider variety of sources than ever. This is fueled by the increasing use of SaaS applications across business functions and accelerated by the ease of moving data from these apps into data warehouses or data lakes using SaaS ETL tools. This new paradigm allows companies to use best-of-breed tools for demand generation, customer marketing, sales enablement and customer support, and still assemble all of the touchpoints for a given customer into a unified picture of that customer's lifetime journey.
2. Machine Learning has Matured
Machine learning, from algorithms to engineering practices, has matured in a way that allows data cataloging tools to include ML capabilities that add useful context to, and recommendations within, an analyst’s workflow. Previous generations of tools often felt like completely different systems, needed specialized knowledge to use, or provided little more than quickly out-of-date documentation. As a result, data catalogs have often failed to gain widespread usage among the people who could benefit from them most. The newer generation of tools promises to meet users where they are already working, for example in the SQL query interface, and to be as easy (and informative) as Slacking a coworker.
3. Engineering and Data Talent are in Demand
Demand for talent and the widely covered impact of the “Great Resignation” reinforce the importance of data catalogs in facilitating knowledge transfer. Data employees tend to develop deep tribal knowledge, often in opaque but critical pieces of the data foundation. That knowledge includes nuances such as when an upstream process changed, the correct filters to apply to obtain financial numbers, and the bespoke code that sits behind management dashboards or production models. There are many reasons these employees (and their knowledge) are valued. In our experience, acquiring this knowledge is often a slow process for new employees, and knowledge transfer can be tedious for tenured folks. With the seemingly insatiable demand for new data and engineering talent and the well-documented volume of employee turnover across industries, your team may run the risk of losing unknown amounts of specialized knowledge for good. Data catalogs can help ease some of this pain by storing in code, and presenting in convenient interfaces, what has previously been locked away in the minds of a few. They promise to accelerate time to impact by helping new team members find the data they need faster and more accurately, and to reduce some of the frustrations of working with data that can themselves contribute to employee turnover.
Data catalogs are difficult to “get right”, but we believe they offer a straightforward promise: to apply data to solve difficult data problems. As you prepare your team to tackle challenges in the year ahead, we encourage you to consider their strategic potential.
The content herein reflects the views of Summit Partners and is intended for executives and entrepreneurs considering partnering with Summit Partners.