Whatever else a marketer might want to know about a customer, there comes a point where you need their name and address. In both consumer and business markets, confirming that they are a legitimate entity at a valid location significantly improves the performance of everything from campaign results through to customer service.
Data validation is therefore a central component of any data quality exercise. Simple as it might appear to be to prove a person or business is who they say they are, achieving that proof is actually quite complicated.
With a business, registration at Companies House is a good starting point, except that it excludes non-limited companies and does not yield all the trading addresses out of which a group might operate. With consumers, the Electoral Register now only provides evidence for under 60% of people who have registered to vote, thanks to the opt-out rate. (Marketers are not allowed to use the full register for validation, even though credit reference bureaux can.)
If checking against names is challenging, at least address validation is relatively straightforward. Royal Mail’s Postal Address File (PAF) provides a reference set of all deliverable addresses in the UK, built using information from the postal services provider’s own frontline staff as well as a range of other data sources.
“PAF is more robust than it was ten years ago,” says Ian Dawson, manager, data services (marketing) at insurance company More Than. “It wasn’t uncommon then to pick up errors – it is less so now. We give a score to customer-provided addresses in relation to their closeness to PAF. A large proportion are the same, others are 80% to 90%. We get some where the address information is quite poor.”
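The kind of closeness scoring Dawson describes can be sketched simply. Here is a minimal illustration in Python, assuming a small in-memory lookup in place of the licensed PAF data (the `REFERENCE` table, field names and 0-100 scale are invented for the example, not More Than's actual system):

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a PAF lookup: one reference address per postcode.
REFERENCE = {
    "B1 1AA": "1 High Street, Birmingham, B1 1AA",
}

def normalise(addr: str) -> str:
    # Uppercase, drop commas and collapse whitespace so formatting noise is ignored.
    return " ".join(addr.upper().replace(",", " ").split())

def paf_closeness(addr: str, postcode: str) -> float:
    """Score 0-100 for how closely a customer-supplied address matches the reference."""
    ref = REFERENCE.get(postcode)
    if ref is None:
        return 0.0
    return round(100 * SequenceMatcher(None, normalise(addr), normalise(ref)).ratio(), 1)
```

An exact match scores 100; the 80% to 90% band Dawson mentions would correspond to minor abbreviations or typos against the reference entry.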
“Some data you have to get right, like deceaseds, death of a relative and goneaways”
Rob Frost, Oxfam
Low-quality data is flagged for further investigation and enhancement. What the insurance provider wants to ensure is that it can communicate appropriately with a prospect or customer, without bombarding them with requests for more information too early in their lifecycle.
A prospect looking for an insurance quote online may only provide sufficient data to get a price, for example. The insurer also has to consider issues such as whether the customer is giving their current, future or an alternative address.
“The property we are to insure may not be where they currently live. It is not important to validate that address because it could be a second property,” says Dawson. An insurance policy is a legal, binding contract which requires a valid address for correspondence, but not necessarily for the asset being covered.
Fraud prevention is a big concern in the insurance industry, especially identity hijacking where fraudsters are looking to get a policy and then claim, often without having paid a premium. Data quality plays a key role, usually through logic checks and internal comparisons rather than by reference to third party data sets. Patterns are often spotted where the same person claims from multiple addresses, but provides the same mobile phone number each time, for example.
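A check for that pattern – the same mobile number attached to claims from several different addresses – might look like the following sketch; the record fields, sample data and threshold are assumptions for illustration, not the insurer's actual logic:

```python
from collections import defaultdict

def suspicious_phones(claims, threshold=3):
    """Flag phone numbers appearing on claims from `threshold` or more distinct addresses."""
    addresses_by_phone = defaultdict(set)
    for claim in claims:
        addresses_by_phone[claim["phone"]].add(claim["address"])
    return {phone for phone, addrs in addresses_by_phone.items() if len(addrs) >= threshold}

# Illustrative claim records.
claims = [
    {"claimant": "A Smith", "address": "1 Oak Rd", "phone": "07700 900001"},
    {"claimant": "B Jones", "address": "2 Elm St", "phone": "07700 900001"},
    {"claimant": "C Patel", "address": "3 Ash Ln", "phone": "07700 900001"},
    {"claimant": "D Brown", "address": "4 Fir Cl", "phone": "07700 900002"},
]
```

This is the internal-comparison style of check the article describes: no third-party data set is consulted, only the insurer's own claim records.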
For that reason, More Than has focused more on the use of suppression files to screen for “deceaseds” and “goneaways”. Dawson recently carried out a due diligence exercise to compare the available files. “We use one in each space for deceased and goneaways and like to understand how each is sourced. We made the decision to go to two separate sources. It comes down to a decision based on coverage, price and effectiveness,” he says.
Among marketers who work closely on data quality there is a sense of a gap in the market. Since 2002, when voters were given the right to opt-out from having their personal information used in marketing, the edited Electoral Register has lost its usefulness as a validation tool. Alternative databases have been built by a range of providers that seek to provide coverage of every consumer in the UK.
The challenge is knowing both whether they do contain all those individuals and whether they have got the information right, since there is nothing to check them against. Where a marketer is looking to screen data in case it might contain names of consumers who have died or moved, the source of that suppression data is often not clear, since its owners like to protect their proprietary sources.
For that reason, many data managers start working on data quality using internal measures. “The first thing you have to do is trust what you have on your database,” says Rob Frost, marketing database manager at Oxfam. “We have a lot of people who donate financially, so we assume their data is pretty right.”
A direct debit containing incorrect information will be returned by that person’s bank if something is wrong, providing an immediate data quality flag. As a charity, Oxfam also benefits from the level of positive engagement that its supporters have.
“People tend to phone if they don’t receive our newsletter or if they receive two at the same address. That level of interaction and the fact that people want to be involved with us allows us to talk to them. We make sure in every interaction that we check data where possible,” says Frost.
Oxfam is working on providing online access to supporters’ data so they can correct or change information if they move, for example. Using customer-supplied data is important because “the commercial data out there is hard to trust”, according to Frost.
One of the questions he has about suppression files is their frequency of updating, since this is often not stated. “With the Mailing and Telephone Preference Service files, they do tell you. Those are the only files I have seen do that exercise. There is not that transparency on others,” he says.
Validating an entire customer record is important if they are a committed donor as it allows the charity to maintain income streams. Other data variables also need to be checked to ensure ongoing donor acquisition performs well. Oxfam has recently been testing and modelling the variables it uses to see which are the most predictive. It has then looked across the data market to see which suppliers can provide that variable at the best price.
“If we find an important variable, like telephone number – that is one of our best channels to recruit, renew and upsell supporters – we are using up to four different agencies to buy that,” says Frost. Charities are measured by their administrative costs, so keeping the cost of data quality as low as possible is a constant pressure.
Data quality problems are usually identified at two points – when a business process is not working well or when data is migrated from an old system into a new database. Sometimes the second of these provides an opportunity to resolve the first.
Severn Trent Water has to look after a big infrastructure of water and sewage systems while at the same time ensuring it sends the right bill to customers. Two years ago, it started a migration of its back-office data out of more than 20 legacy systems into a new software solution. That involved more than 1 million customer and supplier records and more than 69 million asset records. (An asset is any item with a physical location, from a valve to a pumping station.)
It took the decision to address data quality during the migration. “We established a principle that we would not migrate any data unless it was absolutely mandatory for the new process being implemented,” says Mark Gwynne, a programme manager at Severn Trent Water.
That meant customer and supplier records needed to be validated, but only limited historical data would be transferred. The company used GB Group to check both customer and location records before putting them into the new system.
“We used GB to do deduplication [removing duplicate records] for us on all 4 million property records we hold. Those are addresses we deliver water to and move sewage from, as well as connection objects, which are our principal assets on a site, like a pump,” says Gwynne.
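Deduplication of property records along these lines typically collapses each address to a normalised match key and keeps one record per key. A minimal sketch, not GB Group's actual method, and with invented records:

```python
def normalise_key(address: str) -> str:
    # Uppercase, strip commas and collapse whitespace to form a match key.
    return " ".join(address.upper().replace(",", " ").split())

def dedupe(records):
    """Keep the first record seen for each normalised address key."""
    seen, unique = set(), []
    for rec in records:
        key = normalise_key(rec["address"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

properties = [
    {"address": "5 Mill Lane, Derby"},
    {"address": "5  mill lane,  Derby"},   # same property, different formatting
    {"address": "6 Mill Lane, Derby"},
]
```

In practice a reference set such as PAF would be used to standardise the addresses before keying, which is far more robust than string normalisation alone.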
Proving that these locations were accurate and valid has two benefits for the water provider. It ensures that it is billing the right person or business for the right services and also improves its customer service levels. As part of the data migration, all 1,400 field service engineers are being given mobile access to the information.
“What’s driving our data quality is the customer. The organisation doesn’t want to send an engineer to a job at the wrong location at the wrong time with the wrong equipment. It is about making sure we meet our service level agreements,” says Gwynne. To support this, GB has now added validation keys to 85% of Severn Trent Water’s records to support ongoing data quality.
Data quality is actually regulated in the water services industry by Ofwat and reported annually by companies in the sector. That ensures both visibility and responsibility at a level few marketers in other industries have to worry about.
As with water providers, however, it is the customer who ultimately determines whether a business has got it right. Where the opportunity exists to get data updates directly from the customer, they are the best source going. Where this is not possible, relying on external data validation sources can help fill the gap.
top trends 2010/11 predictions
Ian Dawson Manager, data services (marketing), More Than
“Before 2002, we used the electoral register to validate people at an address. If the register had a different person at that address, we would recognise them as a goneaway. Now we look to track that with commercial goneaway and suppression files. That is a step back to some extent. Previously, we were able to do positive validation of somebody by name at an address and, if we couldn’t, we flagged them as a positive goneaway. Now we are relying on third-party data providers and trusting their sources for goneaways.”
Rob Frost Marketing database manager, Oxfam
“A big part of my activity has been improving our data and showing Oxfam that data is an asset. Some data you have to get right, like deceaseds, death of a relative and goneaways. You have to trust the suppression providers. With deceased data, it is best to err on the side of caution and suppress an individual, so the source doesn’t matter. But we are missing a master set of data on consumers. Good data exists for proving an address, but being able to prove a customer is living at that address would be an asset.”
Monica Howell Data migration manager, Severn Trent Water
“We had a lot of procurement data and problems with contract data. We have carried out a lot of validation work to bring it up to standard, for example appending unique reference keys so we can take out duplicates. We have also been looking at the quality of data coming into the system to make sure it is right, such as on the metering side – when a new property is built, we might get third-party data about the meter that is not right. That is now being picked up. We also have address validation built in to our system.”
Tracy Thompson Delivery transition manager, e.on Energy
“You can’t rely on standalone rules. For example, take a site that may be on a domestic product, but has a business name, such as Blue Arrow. Based on a rule that just looks for business indicators in the name, such as ’limited’, it would not get picked up as a business without looking at the whole customer picture. You also need to be able to make changes, like a premises that used to be a pub and classed as a business being converted by a new customer who needs a domestic product.”
brand in the spotlight
Q&A – E.ON ENERGY AND CLEANING THE DATABASE
Data Strategy (DS): What is the scope of data quality at e.on?
Tracy Thompson, delivery transition manager, e.on Energy (TT): We have spent the last three years working on data quality and it has now reached a new iteration of the project. It is a cultural, behavioural, organisational issue that needs to be addressed. You can’t just do a data quality initiative and expect it to work.
DS: How did data quality get noticed at e.on?
TT: Departments were telling us they were experiencing problems due to data quality and we had to look into whether that was true. So we broke the issues down into four pots: customer name and address; data ownership; metering; contracts. We focused on customer data and ownership.
DS: What was the main data problem you were tackling?
TT: The issue we had was defining whether a customer was a corporate, SME or domestic customer. That was very difficult because we didn’t know. We had rules on how to define them, but lots of exceptions. It was like peeling an onion – we uncovered flags that had been around for years, but a lot of people didn’t understand them. We had hundreds of flags that could mean different things, but we decided the three-step sector flag was easy to understand. Call centre staff could get their heads around it and it was focused on something the business could get hold of.
DS: How did you start to validate customer data?
TT: We used mainly internal data, but we did an exercise with a third party on a sample of our data to see if they could validate and integrate it. We took a single refrigeration customer with 30 sites that an account manager had been to so we knew the data was accurate. But the customer didn’t have one identity and 30 business addresses, it had multiple identities. So it was very difficult to get a single view of the customer. Imagine if you pulled up a record for Tesco on a single site. You wouldn’t know it had 900 more because each had a different identity.
DS: How did you deal with inaccuracies?
TT: We pulled in data from Companies House and a number of commercial providers, as well as our own sources. We put it all into a tabular spreadsheet that people had to eyeball and make decisions about. If nine out of ten records under a particular flag were a business, we flagged all of them as a business. That took us down from 100,000 records missing a sector to 20,000. Then we developed some rules, like if the name included “limited”, it was probably a business. That got us down to 2,000 missing flags. At that stage, the programme went up to be a board-level initiative.
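The two steps Thompson describes – fill in a legacy flag's missing sectors where nine out of ten known records agree, then fall back to business keywords in the customer name – could be sketched like this (the field names and the keywords beyond “limited” are assumptions, not e.on's actual rules):

```python
from collections import Counter, defaultdict

BUSINESS_WORDS = ("LIMITED", "LTD", "PLC")  # "LTD" and "PLC" are assumed additions

def majority_sector(records, cutoff=0.9):
    """Per legacy flag: if `cutoff` of the known records agree on a sector,
    apply that sector to the flag's records with no sector set."""
    by_flag = defaultdict(list)
    for rec in records:
        by_flag[rec["flag"]].append(rec)
    for recs in by_flag.values():
        known = [r["sector"] for r in recs if r["sector"]]
        if not known:
            continue
        sector, count = Counter(known).most_common(1)[0]
        if count / len(known) >= cutoff:
            for r in recs:
                r["sector"] = r["sector"] or sector
    return records

def keyword_sector(records):
    """Fallback rule: a business word in the name implies a business customer."""
    for rec in records:
        if not rec["sector"] and any(w in rec["name"].upper() for w in BUSINESS_WORDS):
            rec["sector"] = "business"
    return records
```

As Thompson's sidebar quote warns, standalone rules like the keyword check misfire on names such as Blue Arrow, which is why the remaining records still needed eyeballing against the whole customer picture.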
top tips you need to know
- Customer-provided data is the most reliable source for information about changes to status or address. Interactions with customers should include a prompt to validate details where possible.
- The Data Protection Act requires data to be kept accurate and up to date, although there is no definition of how often data needs to be cleaned or to what standard.
- Data quality standards should be based around fitness for purpose, since the cost of achieving accuracy escalates the closer to 100% accuracy you get.
- Reference data sets are valuable for validating names, addresses and identities and are usually built from multiple sources that show persistence in key variables, such as length of tenure at an address.
- Deceased data is collected from a range of sources, including grants of probate on wills and funeral directors, but there is no access to the official register of deaths.
What is validation?
When the identity, existence, status or location of a customer is proved correct by checking it against an external reference data set. The most reliable is Royal Mail’s Postal Address File. Other data sets are built from various sources and vary in accuracy.
Why isn’t there a single reference database of consumers?
In 2002, voters were given the right to opt-out of having their name and address included in the version of the Electoral Register that is sold for marketing purposes. This edited version now only holds data on 54% of voters.