Discovering Data Nuances, Part 3: The Data Team

There’s not a single team leader, CEO or even seasonal intern within an online retail organization that won’t be improved by better data. But data can be overwhelming (hello, omnichannel, location-based, social with no persistent identifiers!), and getting a handle on it tends to be deprioritized when things look “good enough” or if the charts are going up and to the right. In this blog series, guest author Adam Paulisick will explore how clean and unified data can benefit different functional teams and roles, leading to more effective decision making that results in growth today. As more articles are added to the series, they will be linked here:

  1. Discovering Data Nuances, Part 1: The CEO or Founder

  2. Discovering Data Nuances, Part 2: The Marketing Team

Don’t get trapped by your commodity data stack

Gulrez Khan, a senior data scientist for PayPal and formerly Microsoft, recently published a tongue-in-cheek graphic that gets straight to the heart of some of the biggest struggles experienced by data teams at DTC brands. Not only is there so much to do, but before the data can even be useful in creating any models to enhance operations, e-commerce, etc., it has to be cleaned. And cleaning data manually takes for-ever-er. Especially if you’re trying to unify it around IRL customers. 

Enter the line of reasoning: Does it really have to be cleaned? Isn’t it close to being good enough? We’re too busy to clean it. We’ll clean it when we have more resources. It will work for now.

While data cleaning (and unifying customer data from various sources) could take more than half of a data scientist’s time, in reality it probably looks more like a to-do item that constantly moves further down the list in favor of higher priorities. There are immediate needs, after all: yet another dashboard for a board or investor meeting, website functionality, the next campaign launch or promo release, and the list goes on and on. So you continue to send raw data through your stack, because no one is really complaining about the accuracy of the intelligence they may or may not be relying on to make decisions. The reality is…this works until it doesn’t. And when it doesn’t, it can lead to MASSIVE overspending on, or underinvesting in, customers.


With clean and unified customer data, your data stack can be worth the money and effort you’ve put into it

Here’s the thing: As a DTC brand with a maturing data capability, you’ve probably invested at least 250k to collect and organize data in your stack and make it easier to access for important tasks like segmenting and messaging customers. You use Fivetran or another ETL tool to pipe the data from your online store, marketing platform, subscription service, in-store POS (etc.) to a data warehouse, data lake or data lakehouse. From there it goes to an analytics platform like Looker, and teams across your business have data-driven analytics to make decisions. But this stack you’ve spent time and money building isn’t worth its salt if the resulting intelligence isn’t trustworthy. Without data confidence, this asset will never deliver consistent value, let alone reach its potential.

Nowhere along the way is the data you’ve put so much effort into moving, aggregating and analyzing being cleaned or unified. Of course there are some businesses at this stage that can afford identity resolution bundled in a CDP from Segment or LiveRamp (more on this below), but it’s more than likely not in the budget for DTC brands trying to scale. And since you probably have a small-ish data team, spending 50% of someone’s time cleaning and resolving customer lists by hand just isn’t realistic.

According to Cory Ferreira, a product marketing leader for Snowflake, “Having clean, accurate data is essential to becoming a mature data-driven business. As the eco-system evolves, this moment should happen earlier and earlier in an org's journey to data utopia. Until then, the early movers doing this today will see a serious advantage.” 

In five years, we hope clean, unified data will be table stakes and no business will consider operating without taking this step. But until then, it’s a cheap and easy way to extract tremendous value out of your current investment. And it doesn’t have to be hard or take up someone’s valuable time. 


Identity resolution is too complicated and valuable to do by hand

Circling back to identity resolution, we describe this simply as a single view of customer activity. A single source of truth that links every customer data point from every platform around IRL customers. The first two articles of this series (linked above) talk about why this is critical for DTC brands. Suffice it to say: every department can be meaningfully more successful with clean and unified customer data. 

Of course, data has to be cleaned and verified before it can be linked. With the growing amount of customer data coming through your e-commerce store, doing this manually is a herculean task. There are hundreds of thousands of rows of data to look through, and you’d need to spot every piece of false data and every typo just to link records deterministically (connecting exact matches). And that still misses the customers who have used different contact information when making purchases.
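To make the limits of deterministic matching concrete, here is a minimal Python sketch. It is an illustration only, not how Orita or any particular vendor does it; the record layout, the sample addresses and the function names are all made up for this example.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Basic cleaning: trim whitespace and lowercase the address."""
    return email.strip().lower()


def link_deterministically(records):
    """Group record ids by exact match on the normalized email."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_email(record["email"])].append(record["id"])
    return dict(groups)


# Hypothetical rows, as if all four belonged to the same shopper.
records = [
    {"id": 1, "email": "Jane.Doe@example.com "},  # stray space, mixed case
    {"id": 2, "email": "jane.doe@example.com"},
    {"id": 3, "email": "jane.doe@exmaple.com"},   # typo in the domain
    {"id": 4, "email": "jdoe@work.example"},      # different contact info
]

linked = link_deterministically(records)
for email, ids in linked.items():
    print(email, ids)
```

Records 1 and 2 link up once the email is cleaned, but the typo (record 3) and the second address (record 4) each end up as a separate "customer". Exact matching alone leaves one shopper looking like three.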

To actually connect every piece of data to an IRL customer requires very sophisticated machine learning modeling plus endless validation rules. (Orita has more than 1,000,000 rules, and counting.) It goes generations beyond simple deterministic matching (matching on exact fields) to create a complete picture of a shopper’s history. This is something that would take an internal team more time than it is worth to build, since it would be out of date by the time it was ready. It also requires a specific skill set combined with extensive experience, so it’s no wonder the big players charge an arm and a leg. But we believe good data should be available to every brand, no matter the size.
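For a sense of what "beyond exact matching" means at the very simplest level, the sketch below scores how similar two strings are using Python’s standard-library difflib. This is a toy stand-in, not machine learning and not Orita’s approach; the function name and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def is_probable_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Return True when two cleaned strings are similar enough to review as one.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


# A typo in the domain is still caught as a likely match...
typo_pair = is_probable_match("jane.doe@example.com", "jane.doe@exmaple.com")

# ...but a genuinely different address is not, which is why real identity
# resolution needs many more signals (names, addresses, order history).
different_pair = is_probable_match("jane.doe@example.com", "jdoe@work.example")

print(typo_pair, different_pair)
```

Even this small step raises new questions (which threshold, which fields, how to avoid merging two different people), which hints at why the full problem takes so many rules and so much validation.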

Most of the time, just by removing duplicates, you can already pay for the cost of data cleaning with Orita (it’s less than you think). Today, your brand is likely overpaying every single platform that uses customer data and charges based on the number of records or sends. By removing duplicates, the savings will outweigh the cost. The benefits to the business (better decisions, better marketing results, better inventory forecasting) are the icing on the cake. 
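The arithmetic behind that claim is simple enough to sketch. The numbers below are hypothetical placeholders, not Orita pricing or real customer results; swap in your own record counts and what your platforms actually charge.

```python
def monthly_savings_cents(total_records: int,
                          unique_customers: int,
                          price_cents_per_record: int) -> int:
    """Estimate what duplicate records cost on one per-record-priced platform."""
    duplicates = total_records - unique_customers
    return duplicates * price_cents_per_record


# Hypothetical example: 100,000 stored records that resolve to 88,000 real
# customers, on a platform charging 1 cent per record per month.
savings = monthly_savings_cents(100_000, 88_000, 1)
print(f"Estimated savings: ${savings / 100:,.2f} per month")
```

Repeat the calculation for every tool in the stack that bills by records or sends, add the results together, and compare the total with the cost of cleaning.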

According to Manish Sinha, former corporate CTO of L’Oreal, “Clean data is something that can affect topline growth. Not only can it save money across the data stack and many other areas of the business, but those savings can be redirected to marketing or sales to increase revenue.”


Easier than you think

Clean, unified data doesn’t have to be something that requires additional work or another team member. Hiring and retaining talent is already difficult and your small team is likely tapped out. With the Orita API, data can be cleaned in the background and sent seamlessly to the tools that need it. And because we’re a modern company that’s not built on legacy systems or ideas, we work with whatever works for you.

Need to get some of your fellow team members onboard? Send around this page to help them learn more about getting to data confidence and why it’s a critical component for any data stack.


Adam Paulisick is an Adjunct Professor of Entrepreneurship at Carnegie Mellon University and an Advisor to Orita. Adam was previously the Chief Product Officer at the Boston Consulting Group and a Senior Vice President at The Nielsen Company specializing in advertising attribution, identity resolution, and clean room data matching.
