Monday, July 15, 2024

Data Quality in the Gutter? 3 Root Causes and How to Finally Fix Them


Data quality issues plague businesses. Let’s explore three main culprits and offer proactive solutions inspired by the oil and gas industry.

In March 2024, dbt Labs, a freemium platform and tools provider focused on helping organizations with their data transformation efforts, completed its State of Analytics Engineering survey of 456 data practitioners and leaders. Among the key data prep findings were these:

  • 57% said poor data quality was their top concern, up from 44% in 2022.
  • 57% said they would soon be managing data for AI training.
  • Almost 50% ranked low stakeholder data literacy as a major concern.
  • 44% identified ambiguous data ownership as a top concern.

Unsurprisingly, the data practitioners surveyed said they spend 55% of their time maintaining or organizing data sets, and nearly 40% said integrating data from various sources was their biggest challenge.

The good news is that close to 40% of companies intend to maintain their investment in data quality, platforms, and catalogs, while 10% to 37% plan to increase these investments over the coming year, depending on the investment category.


Of course, the big question is how to get the most impact from those investment dollars. If a company reports a major issue with ambiguous data ownership, for example, ownership disputes will consume much of the time that could otherwise go to activities that directly improve data quality.

Three root causes of data quality shortfalls in enterprises

  1. Lack of a culture of innovative data collection, management, and reuse. Until generative AI, and machine learning more generally, took the spotlight, leadership didn't prioritize data. The focus was on applications. Organizations have had to reach their data through applications that fragmented or trapped the data those applications generated, and many of these applications remain underused.
  2. Perpetuation of costly legacy data architecture that doesn't scale. Most companies don't realize that they spend more every year on what dataware company Cinchy calls an "integration tax": 50% or more of their IT budgets goes to integration because of architectural complexity. Standards-based semantic graph architectures make integration easier.
  3. Failure to see the problem through an integration tax lens. Overly complicated architectures continue to drive integration costs out of control. Until organizations learn to connect and contextualize the data layer with a flexible, tiered, unitary data model that supports reuse at scale, their integration tax will keep rising year after year.

Push proactive, organic data management efforts upstream to boost data quality

The closer data quality efforts are to the data source, the greater their potential impact. Being proactive about first-hand collection and focusing your data quality efforts upstream (closer to the source) gives you more control.

The further organizations are from the original context of the data collection and the mentality of the person collecting the data, the harder it is to harness and repurpose that data.
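To make the upstream idea concrete, here is a minimal Python sketch of validating records at the point of collection, while the collector's context is still available, instead of cleaning them downstream. The record fields (`customer_id`, `email`, `signup_date`, `collected_by`) and the validation rules are hypothetical, chosen only for illustration; they do not come from the article or from any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date
import re

# Hypothetical record schema and rules, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class IngestResult:
    accepted: list = field(default_factory=list)
    # Each rejected entry is a (record, problems) pair handed back to the collector.
    rejected: list = field(default_factory=list)


def validate_at_source(record: dict) -> list:
    """Return the problems found in one record; an empty list means it passes."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    email = record.get("email") or ""
    if not EMAIL_RE.match(email):
        problems.append(f"malformed email: {email!r}")
    signup = record.get("signup_date")
    if not isinstance(signup, date) or signup > date.today():
        problems.append("signup_date missing or in the future")
    # Keep collection context with the record so later users know where it came from.
    if not record.get("collected_by"):
        problems.append("missing collected_by")
    return problems


def ingest(records) -> IngestResult:
    """Check each record as it is collected, before it reaches any downstream store."""
    result = IngestResult()
    for rec in records:
        problems = validate_at_source(rec)
        if problems:
            result.rejected.append((rec, problems))
        else:
            result.accepted.append(rec)
    return result


batch = [
    {"customer_id": "C001", "email": "ana@example.com",
     "signup_date": date(2024, 3, 1), "collected_by": "web_form"},
    {"customer_id": "", "email": "not-an-email",
     "signup_date": date(2024, 3, 2), "collected_by": "web_form"},
]
res = ingest(batch)
print(len(res.accepted), len(res.rejected))  # 1 1
for rec, problems in res.rejected:
    print(problems)  # ['missing customer_id', "malformed email: 'not-an-email'"]
```

Returning rejected records, with reasons, to whoever collected them keeps each correction with the person who has the original context. A production pipeline might use a schema-validation library instead of hand-written checks, but the upstream principle is the same.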

Effective upstream management processes: An oil field example

Oil industry processes aren’t just about pumping crude out of the ground and converting it to fuel or petrochemicals at centralized refineries.

Beyond what happens at refineries, there are many upstream processes, both existing and emerging. Take direct lithium extraction (DLE) performed upstream at oil field sites. This process is on the cusp of commercialization, and it is an example of an emerging, innovative process that parallels what could be done in data quality and management.

Active oil fields produce brine, which either occurs naturally in geologic formations or results from water injection used to push oil toward wells, where it can be pumped out.

DLE takes brine (dirty, salty water), decontaminates it, and then extracts lithium.

DLE operations at salt lakes account for 10% of lithium production, which supplies applications such as electric vehicle and mobile phone batteries.

Hard rock lithium mining, by contrast, is responsible for around 60% of production, according to the Sustainable Minerals Institute of the University of Queensland (UQ).


Volt Lithium of Calgary, Canada, one of a number of DLE startups, claims 99% decontamination and 90% lithium extraction capability, according to UQ. Volt plans to start DLE operations at oil fields in Alberta in the third quarter of 2024. DLE's emerging role in oil field brine processing is timely, given that research firm BMI has predicted a lithium shortage by 2025.

Be a data producer: Own your data and your data lifecycle processes

Developing new, fit-for-purpose data sources means seeing things from a producer's point of view and having marketable data resources on hand to share, trade, or monetize. Data product advocates anticipate that every data consumer will also become a producer.

Final point: Data quality has been a nagging issue in enterprises for decades, and it’s only worsened. Companies are stuck in a data management rut, still doing things that don’t result in significant improvement.
