A New Age of American Data Sovereignty

Introduction
This paper advances a structural argument that runs against the prevailing consensus in enterprise technology. The consensus holds that artificial intelligence and automated data tools are systematically displacing human research from the B2B data supply chain — that what was once done by people with browsers and judgment will soon be done faster, cheaper, and at greater scale by machines. The consensus is wrong, and the market is beginning to price its incorrectness.
Between 2015 and 2025, thousands of B2B companies were sold data solutions built on a structural dependency that was never disclosed and is now unraveling. That dependency, this paper argues, was on access to LinkedIn — the ground source of truth for professional identity — and it underwrote the accuracy claims of an entire generation of data providers. As that access is revoked, the data degrades, the providers scramble, and the downstream buyers discover that the infrastructure they were told was proprietary was, in fact, rented. The decade of 2015–2025 will be remembered as the period in which the B2B data industry sold extraction as innovation and left its customers holding the structural risk.
The argument has six components. Together they form a thesis about duration — about what compounds and what decays in the data supply chain that underlies every modern sales and marketing operation.
The first component is architectural. The American B2B data industry has passed through four generations of collection methodology since the 1840s, each building on the infrastructure and errors of the one before. The current generation — built on automated extraction from LinkedIn — created extraordinary scale and extraordinary fragility simultaneously. Understanding these generations reveals that the disruption now underway is not unprecedented. It is a structural reversion to the verification principles that the industry abandoned in pursuit of scale. Section I traces this architecture.
The second component is mechanical. LinkedIn, the ground source of truth for professional identity, is systematically shutting down the unauthorized access on which an entire generation of data providers was built. In March 2025, LinkedIn removed the company pages of Apollo.io and Seamless.AI from its platform. The enforcement campaign has continued and will intensify, because Microsoft is protecting an asset worth an estimated $155–195 billion in standalone enterprise value(7) — an asset generating $17.8 billion in annual revenue from 1.2 billion self-maintained professional profiles. The downstream consequences are already measurable: silent data decay through every layer of the supply chain, cascading degradation through aggregation platforms, and the quiet impairment of every sales and marketing pipeline fed by these tools. Section II examines this disruption.
The third component is financial. The market has begun to distinguish between companies that treat data as an expendable commodity and those that treat it as durable infrastructure. ZoomInfo — the largest publicly traded B2B data provider, built primarily on LinkedIn extraction — has seen its stock decline approximately 92% from its all-time highs. Clay — the most celebrated company in the current generation — raised at a $3.1 billion valuation on $100 million in consumption-based revenue, pricing in growth assumptions that depend on the same upstream access now being revoked. Meanwhile, CoStar Group — which has spent $5 billion over 38 years building proprietary data through 1,500 human researchers(24) — has delivered 58 consecutive quarters of double-digit revenue growth(25). The divergence between these postures is the most consequential investment signal in B2B technology. Sections III and IV present this evidence.
The fourth component is economic. The prevailing assumption that 90% data accuracy at a fraction of the cost is a rational trade is demonstrably wrong when evaluated against business outcomes rather than procurement budgets. The relationship between data accuracy and revenue is nonlinear at the margin — the last 10% of accuracy is where deals happen or don't — and the organizational feedback loops that would reveal this are broken across functional silos. Section V quantifies this illusion.
The fifth component is structural. Human research persists not because it is sentimental or legacy, but because it is the only data collection methodology that is simultaneously compliant with LinkedIn's Terms of Service, capable of the judgment-based verification that automated tools cannot replicate, and able to produce proprietary data assets that compound over time rather than decay. The B2B imperative — high contract values, long sales cycles, small pools of qualified decision-makers — makes this structural advantage decisive. Section VI makes this case.
The sixth component is organizational. The companies that invest in proprietary data infrastructure share a common characteristic: they have at least one senior leader who understands the distinction between consuming data and building a data asset. The sophistication gap between organizations that see data as capital formation and those that see it as a procurement line item is the single largest determinant of who captures the value in a B2B data market projected to grow from $5 billion to $15 billion by 2033. Section VII identifies these buyers and explains why the expansion is a forecast about organizational learning, not technology adoption.
What emerges from these six components is a thesis about duration. The companies, the investors, and the operating teams that understand data as long-term infrastructure — that invest in verification through the cycle, that maintain research capacity when others cut it, that recognize compliance and accuracy as appreciating assets rather than depreciable costs — will compound their advantage every quarter. The companies that chase short-term savings on the data line item are making the most expensive cheap decision available to them.
The reader who finishes this paper should think differently about three things: what the current disruption in the data supply chain means for professional data integrity, where the financial risk actually sits in the B2B technology stack, and why the human research layer — far from being displaced — is entering the most structurally favorable environment in its history.

I. The Architecture of B2B Data
A. Four Generations of American Data Provision
The American B2B data industry has evolved through four distinct generations, each defined by its primary collection methodology and its structural relationship with the ground truth.
Generation I: The Manual Aggregators (1840s–1960s). Data was gathered through direct physical observation and manual verification. Pioneers like Dun & Bradstreet established the first standardized business records by deploying agents to verify company identities in person. The methodology was slow, expensive, and accurate. It also created the foundational expectation that business data should be verified before it is trusted — an expectation that the industry would spend the next sixty years eroding.
Generation II: The Relational and Tele-Verification Era (1970s–1990s). The advent of relational databases allowed for the synthesis of large datasets. Providers utilized large-scale call centers to qualify data, establishing a model for gathering custom details — budget authority, organizational hierarchy, decision-making dynamics — that automated systems could not capture. This generation introduced a critical principle: the most commercially valuable data is the data that requires human judgment to collect. Firmographics can be aggregated. Organizational intelligence must be researched.
Generation III: The Digital Extraction Era (2000s–2024). The focus shifted to volume and speed. Providers built massive databases by extracting self-reported data from third-party platforms, principally LinkedIn. This model relied on continuous, automated access to these external sources to maintain accuracy. It worked brilliantly — until it didn't. The generation's defining assumption was that platform access was a permanent condition of the internet. That assumption is now demonstrably false.
Generation IV: The Verified Source Era (2025–Present). As primary platforms restrict automated access, the industry is reverting to verification principles that Generation I understood and Generation III forgot. Verified research is once again the primary method for maintaining data integrity against the 40% annual decay rate typical of purely automated databases. What has changed is the infrastructure surrounding that research: the offshore workforce is larger, better educated, and more systematically managed; the compliance environment demands provenance that automated tools cannot provide; and the commercial value of accurate B2B data has never been higher relative to its cost.
The pattern across these four generations is not a straight line of technological progress. It is a cycle: expansion of scale, erosion of accuracy, reversion to verification. We are at the reversion point.
B. The Supply Chain Architecture
To understand why the current disruption matters, one must understand how B2B professional data has been sourced, enriched, and distributed for the past fifteen years.
The supply chain is deceptively simple. At the top sits LinkedIn, owned by Microsoft, with over one billion professional profiles(1) representing the single most comprehensive and current repository of professional identity data in existence. LinkedIn is the ground source of truth. Every job title, every company affiliation, every career transition, every professional connection is self-reported by the individual and updated in something approaching real time. No other source comes close.
Beneath LinkedIn sits a layer of data providers — ZoomInfo, Apollo, Seamless.AI, Lusha, Cognism, and dozens of others — whose data collection methodologies depend structurally on LinkedIn as the primary source of professional identity data, through access mechanisms LinkedIn has since moved to restrict. These providers then layer additional data (email addresses, phone numbers, technographic signals, intent data) on top of the LinkedIn foundation and sell access through subscription platforms.
Beneath the data providers sits a newer layer of workflow and enrichment platforms — Clay being the most prominent — that aggregate data from multiple providers through "waterfall enrichment." Clay does not own data. It pulls from 100+ third-party sources sequentially, attempting to fill in missing fields by cascading through provider after provider until something returns a result.
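Mechanically, waterfall enrichment is a sequential fallback loop: query one provider, and if it misses, fall through to the next. A minimal sketch, with the caveat that the provider names and `lookup` interface below are illustrative assumptions, not Clay's actual API:

```python
from typing import Callable, Optional

# Illustrative provider interface: given a contact identifier,
# return an enriched value (e.g., an email) or None on a miss.
# In a real waterfall these would be API calls to third-party providers.
Provider = Callable[[str], Optional[str]]

def waterfall_enrich(contact: str, providers: list[Provider]) -> Optional[str]:
    """Try each provider in order; return the first non-empty result."""
    for provider in providers:
        result = provider(contact)
        if result:           # first hit wins; later providers never run
            return result
    return None              # every source missed: the field stays empty

# Toy providers for demonstration only.
provider_a = lambda name: None                      # simulates a stale miss
provider_b = lambda name: f"{name}@example.com"     # simulates a hit

print(waterfall_enrich("jane.doe", [provider_a, provider_b]))
# → jane.doe@example.com
```

The design choice the sketch makes visible: coverage improves with each added source, but accuracy is capped by the freshest provider that happens to answer first. If every source in the list derives from the same upstream platform, adding more of them adds redundancy, not independence.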
Beneath the workflow layer sits the end user: the sales team, the marketing operation, the competitive intelligence function, the recruiting desk. They consume the data, build campaigns on it, make outreach decisions based on it, and measure their pipeline against it.
This architecture has a critical vulnerability that was invisible as long as the top of the chain — LinkedIn — remained accessible. The moment LinkedIn restricts access, the entire chain degrades. Not gradually. Structurally.

II. The Disruption
A. The Enforcement Inflection: LinkedIn Shuts the Door
On March 6, 2025, LinkedIn removed the company pages of Apollo.io and Seamless.AI from its platform.(8) No advance warning. No negotiation period. The pages simply disappeared.
This was not an isolated enforcement action. It was the culmination of years of escalating restrictions. LinkedIn had already fought and won legal battles against data scrapers (the hiQ Labs litigation(9)), implemented increasingly sophisticated bot-detection systems, and publicly stated its opposition to unauthorized data extraction. But the March 2025 action represented something qualitatively different: LinkedIn moved from defensive measures to offensive enforcement against two of the largest data providers in the ecosystem.
The implications extend far beyond Apollo and Seamless. LinkedIn is owned by Microsoft, which has invested billions in AI and has every strategic incentive to protect its data assets from extraction by competing platforms. The enforcement pattern suggests a systematic campaign. First, LinkedIn targeted the most visible offenders — companies that openly marketed Chrome extensions designed to extract data from LinkedIn profiles. Second, LinkedIn expanded enforcement to include companies using indirect scraping methods. Third, LinkedIn is simultaneously investing in its own premium data products — Sales Navigator, LinkedIn Recruiter(10) — positioning itself as the only authorized channel for professional data access.
The strategic logic is clear. Microsoft does not intend to allow third-party tools to extract for free what it can monetize through its own premium offerings. The scraping window that built a generation of data companies is closing, and it is not reopening.
LinkedIn's enforcement posture is rational and, from the perspective of its members and shareholders, entirely defensible. The platform hosts over one billion professional profiles — the most comprehensive repository of professional identity data ever assembled. That data was created by its members for their own career advancement, not for extraction by third-party tools. When LinkedIn restricts unauthorized scraping, it is protecting both its members' data and the value of its own premium products. The enforcement campaign is not punitive. It is a company protecting its core asset and directing commercial access through authorized channels. This is what well-managed platforms do.
The structural implication for the data industry, however, is permanent: the era of open extraction is over. LinkedIn's premium products will continue to evolve and serve buyers who want to work within LinkedIn's ecosystem. But for companies that need customized data programs — specific decision-makers, verified organizational structures, tailored targeting criteria — LinkedIn's own products are not designed to substitute for dedicated research. They are discovery and outreach tools, not research infrastructure. The gap between what LinkedIn's platform offers and what B2B revenue organizations need is filled either by data providers (whose access is being restricted) or by human research (whose access is not).
Estimating the value of LinkedIn's data moat. To understand why LinkedIn's enforcement posture is permanent and intensifying, it helps to estimate the economic value of what is being protected.
Morningstar assigns Microsoft a wide economic moat(6) — the highest durability rating in its framework — driven by switching costs, network effects, and cost advantages. Morningstar explicitly identifies LinkedIn as a key revenue growth driver alongside Azure, Office 365, and Dynamics 365. The "wide moat" designation means Morningstar's analysts believe the competitive advantage will persist for twenty years or more. LinkedIn is one of the assets underwriting that assessment.
The financial evidence supports the designation. Microsoft acquired LinkedIn in December 2016 for $26.2 billion(3), when the platform generated approximately $3 billion in annual revenue. By fiscal year 2025, LinkedIn's revenue had grown to approximately $17.8 billion(2) — nearly six times the revenue at acquisition — with 1.2 billion members and four consecutive years of double-digit member growth.(4) LinkedIn achieved a record $2 billion in Premium subscription revenue in the twelve months through January 2025.(5) LinkedIn's revenue grew 9% year-over-year in fiscal 2025 with growth across all business lines: Talent Solutions, Premium Subscriptions, Marketing Solutions, and Sales Solutions.
A structured estimate of LinkedIn's standalone value can be derived through three complementary approaches. First, an acquisition-multiple approach: Microsoft paid approximately 8.7x revenue in 2016. Applying the same multiple to $17.8 billion in current revenue implies a standalone enterprise value of approximately $155 billion — nearly six times the acquisition price. Second, a comparable-platform approach: professional networking and data platforms with subscription models, high switching costs, and network effects typically trade between 8x and 15x revenue. At the midpoint of that range, LinkedIn's implied value exceeds $195 billion. Third, a replacement-cost approach: LinkedIn's dataset comprises over 1.2 billion self-maintained professional profiles, each updated by the individual for their own career interests. There is no methodology — automated or manual — by which a competitor could replicate this dataset. The replacement cost is functionally infinite.
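The first two approaches reduce to simple multiple arithmetic, which can be sketched directly (all figures are the ones stated above; the 8x–15x range is the comparable-platform assumption, not a market quote):

```python
# Back-of-the-envelope LinkedIn standalone valuation, using figures from the text.
acq_price_b = 26.2      # 2016 acquisition price, $B
rev_at_acq_b = 3.0      # approximate revenue at acquisition, $B
current_rev_b = 17.8    # approximate FY2025 revenue, $B

# Approach 1: apply the 2016 acquisition multiple to current revenue.
acq_multiple = acq_price_b / rev_at_acq_b          # ~8.7x revenue
ev_acq_multiple = acq_multiple * current_rev_b     # ~$155B

# Approach 2: comparable-platform multiples (assumed 8x-15x range).
ev_low, ev_high = 8 * current_rev_b, 15 * current_rev_b

print(f"Acquisition-multiple EV: ${ev_acq_multiple:.0f}B")
print(f"Comparable-platform range: ${ev_low:.0f}B-${ev_high:.0f}B")
```

The third approach, replacement cost, has no comparable formula: the sketch stops where the text does, at a dataset no spend can reproduce.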
The reason this valuation matters for the data supply chain is straightforward: Microsoft is protecting an asset worth multiples of what it paid, generating $17.8 billion in annual revenue, and growing. No rational steward of a $150 billion-plus data asset permits unauthorized extraction when that extraction directly competes with the premium products monetizing the same data. The enforcement campaign is not a policy choice. It is a fiduciary obligation.
B. The Downstream Degradation
The immediate impact of LinkedIn's enforcement is not that data providers cease to exist. They continue to operate, they continue to sell subscriptions, and they continue to return results when queried. The impact is more insidious: the data degrades silently.
The data decay problem. Professional data is perishable. People change jobs, companies reorganize, titles evolve, reporting structures shift. The median tenure of private-sector workers is approximately 3.5 years per the U.S. Bureau of Labor Statistics(12), meaning that nearly a third of any B2B contact database changes annually — and the turnover among senior decision-makers, who are most commercially valuable and most mobile, runs considerably higher. This has always been true. What masked the problem was continuous refresh — data providers could rescrape LinkedIn periodically to update their records. Remove that refresh mechanism, and the database begins dying from the moment the scraping stops.
The pattern repeats predictably: data that appeared accurate on day one, because it was built on a foundation that humans had verified, exhibits measurable drift by quarter two, with error rates on customized deliverables materially higher by quarter three.
The aggregation problem. The degradation compounds through each layer of the supply chain. Data providers who lose access to LinkedIn cannot refresh their databases. Workflow platforms like Clay that aggregate from those providers cascade through increasingly stale inputs. As the sources feeding the waterfall degrade, so does every output the integrator delivers. The architecture is examined in detail in Section III.C.

III. The Market Evidence
A. ZoomInfo: The Canary in the Data Mine
The market has already begun to price this thesis, and the leading indicator is ZoomInfo. The largest publicly traded B2B data provider has seen its stock decline approximately 92% from its all-time highs(13). Revenue guidance has disappointed analysts, customer counts are declining,(14) and Wells Fargo initiated coverage with an "Underweight" rating(15) — singling ZoomInfo out negatively even among small and mid-cap software peers. These are the current numbers, and they will change. But the structural condition they reflect will not.
The fundamental challenge is permanent, not cyclical. ZoomInfo built its business on one of the largest B2B contact databases in existence. That database requires continuous refresh to maintain accuracy. As LinkedIn tightens enforcement, ZoomInfo's ability to refresh at scale is impaired — not eliminated, but impaired in ways that compound over time. This is not a quarter-to-quarter headwind that management can "execute through." It is a structural degradation of the upstream source on which the entire business model depends. Any company whose primary asset is a database derived from a platform it does not own or control faces this risk indefinitely.
The company is attempting to pivot. It is migrating customers to its "Copilot" platform, pushing upmarket to larger enterprise contracts, and investing in proprietary data collection methods. Per its Q4 2025 earnings call, the migration stands at approximately 20% of total ACV(16) with a continued two-year migration timeline, and the core business continues to face headwinds from customers who are either downgrading or churning to cheaper alternatives.
ZoomInfo's predicament illustrates a broader principle: business models built on unauthorized access to a platform's data are structurally fragile. The access can be revoked at any time, and when it is, the entire value proposition degrades. This is not a technology problem. It is a governance problem.
B. Clay and ZoomInfo: The Structural Rhyme
Clay is the most celebrated company in the current generation of B2B data tools, and deservedly so. The product is elegant. The growth is extraordinary — from negligible revenue to $100 million in approximately two years, reaching that milestone in December 2025.(18) The Series C closed at a $3.1 billion valuation in August 2025, led by Alphabet's CapitalG, with Sequoia and Meritech participating.(17) Its more than 10,000 customers include OpenAI, Anthropic, Canva, and Rippling.(19) The "GTM Engineer" role that Clay helped pioneer has become a genuine career category. None of this is accidental. Kareem Amin and his team built something real.
And yet the structural similarities between Clay's current trajectory and ZoomInfo's pre-IPO trajectory are eerie enough to warrant careful examination.
ZoomInfo IPO'd in June 2020, raising $934 million at an $8 billion valuation.(20) Revenue had grown 103% in 2019 to $293 million, driven by new customers and the DiscoverOrg acquisition. The adjusted operating margin was 51%. The growth narrative was irresistible: largest B2B contact database, expanding customer base, AI-powered intelligence, platform stickiness. Wall Street valued it at 20x revenue, a premium that assumed continued frictionless access to the data that made the entire platform work. By late 2025, the stock had declined roughly 92% from its all-time highs, revenue guidance was missing analyst expectations, customer counts were declining, and the structural thesis had inverted. The data that made ZoomInfo valuable was the same data it could no longer reliably refresh.
Clay's trajectory rhymes. The revenue growth is faster. The valuation multiple is higher — $3.1 billion on approximately $100 million in revenue is 31x, a premium that prices in continued hypergrowth and assumes the underlying data infrastructure remains intact. The product is genuinely differentiated. But the structural dependency is identical: Clay does not own data. It aggregates from 100+ third-party providers whose own databases are, in most cases, derivatives of LinkedIn. Clay's "waterfall enrichment" is architecturally brilliant and structurally fragile for the same reason — it cascades through providers whose upstream source is being systematically restricted.
The question is not whether Clay is a good product. It is. The question is what happens to a $3.1 billion valuation when the data providers feeding the waterfall begin to degrade. ZoomInfo's story provides the answer: the product continues to work, the interface remains polished, and the results slowly become less reliable in ways that are invisible to the end user until pipeline quality declines.
The temporal parallel is the sharpest signal. ZoomInfo's growth phase (2018–2021) occurred during a period of maximum LinkedIn permissiveness — scraping was widespread, enforcement was sporadic, and the refresh cycle was uninterrupted. Clay's growth phase (2023–2025) is occurring at the opposite end of the enforcement cycle — LinkedIn is actively restricting access, major providers have been de-platformed, and the refresh cycle is impaired. ZoomInfo built its castle during a period of calm. Clay is building during a storm. The castle may be more cleverly designed, but the foundation is the same sand.
There is an additional structural observation worth making about Clay's revenue model. The company's growth — $1 million to $100 million in approximately two years — is extraordinary by any standard. But the revenue is consumption-based, not subscription-based.(21) Clay charges through a credit system where customers pay per enrichment action. This distinction matters for durability analysis. Subscription revenue recurs predictably regardless of usage. Credit-based revenue is a function of consumption volume, which means it is more volatile than it appears in a growth phase. If customers begin experiencing stale results from the waterfall — and the upstream degradation makes this increasingly likely — they may reduce credit consumption without formally churning. The revenue decelerates without any visible customer loss.
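The deceleration-without-churn dynamic can be illustrated with a toy model. Every figure below is hypothetical, chosen only to show the structural point, not to model Clay's actual economics:

```python
# Toy model: subscription vs consumption revenue under quiet usage decline.
# All figures hypothetical; zero customers churn in either scenario.
customers = 100
subscription_fee = 1_000          # flat fee per customer per period
credits_per_customer = 1_000      # starting credit consumption per customer
price_per_credit = 1.0
usage_decline = 0.20              # customers use 20% fewer credits each period

for period in range(4):
    usage = credits_per_customer * (1 - usage_decline) ** period
    sub_rev = customers * subscription_fee                 # flat regardless of usage
    usage_rev = customers * usage * price_per_credit       # tracks consumption
    print(f"Period {period}: subscription ${sub_rev:,.0f}  "
          f"consumption ${usage_rev:,.0f}  (logo churn: 0)")
```

By period 3 the consumption line has lost nearly half its revenue while the customer count and the subscription comparison are untouched, which is why credit-based revenue can mask degradation that a churn metric would have surfaced.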
The secondary market activity is also worth noting. Clay has publicly facilitated secondary sales for employees(49) — liquidity events that allow insiders to convert equity to cash in advance of any IPO or acquisition. Secondary sales are common at high-growth private companies and are not inherently a negative signal. But at 31x revenue, they do raise a question about internal conviction. If insiders believed the $3.1 billion valuation was the floor, the rational action would be to hold. Secondary sales at this stage suggest that at least some insiders prefer liquidity now to the risk-adjusted value of waiting. This is a governance signal, not a condemnation — but it is the kind of signal that sophisticated PE investors and institutional buyers notice.
This is not a prediction that Clay will fail. The company has real revenue, real customers, and real product-market fit. But a $3.1 billion valuation on $100 million in revenue requires a theory about what the data looks like at $300 million, $500 million, and $1 billion — and that theory depends on the same upstream access assumptions that ZoomInfo's valuation once depended on, and that have since proven false.
C. Owning, Renting, and Integrating: A Taxonomy of Data Relationships
The B2B data industry uses language that obscures the most important structural distinction in the market. Companies describe themselves as data providers, data platforms, data enrichment tools, sales intelligence systems, GTM operating systems. The terminology changes with each funding cycle. The underlying economics do not. Every company in the data value chain occupies one of three structural positions — owner, renter, or integrator — and the durability of its business depends almost entirely on which position it holds.
Owners control the source data. They generate it, collect it, verify it, and maintain it through mechanisms they govern. LinkedIn owns its data because 1.2 billion professionals maintain their own profiles on the platform, creating a self-reinforcing cycle that no external party can replicate. CoStar owns its data because it has spent $5 billion over 38 years employing researchers to verify commercial properties firsthand. Bloomberg owns its data because its terminal network generates proprietary financial information that exists nowhere else. S&P and Moody's own their ratings. FICO owns its scoring models. Dun & Bradstreet owns its DUNS numbering system. In every case, the ownership is structural — it cannot be competed away by a faster product or a lower price, because the data itself is the product. Owners build moats that compound over decades.
Renters access someone else's data and resell it with additional layers — enrichment, formatting, workflow, intent signals — that make the underlying data more usable. ZoomInfo is a renter. Apollo is a renter. Seamless.AI is a renter. Lusha, Cognism, and dozens of smaller providers are renters. Rented data is, by definition, commodity data. If any competitor can purchase the same subscription and receive the same contacts, the same firmographics, the same enrichment fields, then no one who buys it has a competitive advantage. The data is shared, undifferentiated, and available to every participant in the market at the same price. This is the textbook definition of a commodity. And commodities do not compound. They compress.
The post-ZIRP environment has been devastating to data renters. During the zero-interest-rate era, companies tolerated commodity data spend because capital was free and growth forgave inefficiency. When capital became expensive, every line item faced scrutiny — and commodity line items faced the most. ZoomInfo's customer count did not decline because the product stopped working. It declined because procurement teams looked at a subscription delivering the same data their competitors received and asked the obvious question: what are we actually buying? The rental model works well as long as the owner permits access and the buyer does not demand differentiation. When the owner restricts access — as LinkedIn is now doing — the renter's asset depreciates. When the buyer demands differentiation — as post-ZIRP budget discipline requires — the renter has nothing proprietary to offer.
Automation accelerates this compression. Every advance in automated data collection, every new enrichment API, every workflow tool that aggregates commodity sources increases the supply of undifferentiated data and decreases its marginal value. Automation does not make commodity data more valuable. It makes it cheaper and more abundant, which is the same thing as making it worthless. The only data that holds value in an environment of accelerating automation is data that cannot be replicated by automated means: proprietary data, collected through methods that require human judgment, verified through contextual understanding that cannot be commoditized, and owned by the entity that commissioned it.
The "proprietary" distortion. The marketing language of the rental tier is worth examining because it reveals how the structural distinction between owning and renting is deliberately obscured. ZoomInfo, the largest renter, describes its data as gathered through "proprietary technology, machine learning, public sources, and a contributory network."(42) Read carefully, the claim is precise: the technology is proprietary. The data is not. The company's actual sources — per its own public disclosures — include web crawling across 28 million domains daily(43) (public data available to anyone), a "contributory network" of ZoomInfo Lite users who trade access to their email signatures and contact books for a free tier (an extraction exchange, not original research), third-party data resellers (the same commodity data everyone buys), and automated ML processing (proprietary tools operating on non-proprietary inputs). Wikipedia — which the company cannot edit without scrutiny — describes ZoomInfo flatly as "a registered data broker"(44) that "collects and sells personal data through various means of data and web scraping." Compare this to CoStar's "proprietary," which refers to the data itself — $5 billion invested over 38 years in physically verifying properties no one else has verified. Or LinkedIn's "proprietary," which refers to the network — 1.2 billion professionals maintaining their own profiles. ZoomInfo's "proprietary" refers to the algorithms that process what they collect from other people's proprietary assets. The word is doing different work in each case, and the marketing language is engineered so the buyer does not notice the difference.
Integrators do not even rent the data. They route traffic between renters and present the results. Clay is an integrator. Its waterfall enrichment sends API calls to ZoomInfo, Apollo, Lusha, Clearbit, and dozens of other providers — renters who are themselves dependent on LinkedIn's permission. The integrator captures margin on the orchestration, and the value proposition is real: the workflow is elegant, the interface saves time, and the multi-source approach improves coverage relative to any single provider. But the integrator is structurally the most exposed position in the value chain, because it is two levels removed from the source. When LinkedIn restricts the renters, the renters' data degrades. When the renters' data degrades, the integrator's waterfall cascades through stale inputs. The integrator has no mechanism to fix this, because it controls neither the source nor the intermediary layer.
The B2B data moat hierarchy. To place this dynamic in broader context, consider the ten most valuable proprietary data assets in B2B, ranked by implied or observable enterprise value. The methodology is straightforward: for public companies, enterprise value is derived from market capitalization; for private companies, implied valuations are estimated from revenue multiples, majority-owner net worth disclosures, or acquisition comparables.
LinkedIn (Microsoft) — Professional identity. ~$17.8B revenue. Implied standalone value: $155–195B. 1.2 billion self-maintained profiles. Moat source: network effects, self-reinforcing data refresh, no substitute dataset. Morningstar wide moat (as component of Microsoft).
Bloomberg Terminal (Bloomberg LP) — Financial markets. ~$12.5B estimated revenue. Implied value: $70–100B (Michael Bloomberg's 88% stake supports a Forbes net worth of $104.7B). 325,000+ terminal subscribers.(60) Moat source: proprietary data, workflow integration, switching costs. Morningstar does not rate (private).
S&P Global (SPGI) — Credit ratings, indices, market intelligence. ~$14.2B revenue. Market cap: ~$155B. Moat source: regulatory entrenchment (NRSRO designation), indices as de facto benchmarks, switching costs. Morningstar wide moat.
Moody's (MCO) — Credit ratings, analytics, risk assessment. ~$7B revenue. Market cap: ~$85B. Moat source: NRSRO regulatory designation, institutional entrenchment, brand. Morningstar wide moat.
FICO (FICO) — Consumer and commercial credit scoring. ~$1.7B revenue. Market cap: ~$50B. Moat source: de facto standard in lending decisions, regulatory embedding, switching costs. Morningstar wide moat.
MSCI (MSCI) — Indices, ESG ratings, portfolio analytics. ~$2.8B revenue. Market cap: ~$45B. Moat source: index licensing (trillions benchmarked to MSCI indices), switching costs. Morningstar wide moat.
Verisk Analytics (VRSK) — Insurance risk data and analytics. ~$3.1B revenue. Market cap: ~$31B. Moat source: 50+ years of proprietary insurance loss data, regulatory utility status, 95%+ retention. Morningstar wide moat.
CoStar Group (CSGP) — Commercial real estate. ~$3.1B revenue. Market cap: ~$30B. Moat source: 38 years and $5B+ invested in proprietary data collection, 1,500+ researchers, 95%+ recurring revenue. Morningstar narrow moat.(57)
Dun & Bradstreet (DNB) — Business credit and commercial data. ~$2.4B revenue. Taken private in 2024 at ~$6B. Moat source: 185-year operating history (founded 1841),(59) DUNS numbering system as de facto commercial identifier, government and trade credit integration. Not currently rated.
ZoomInfo (ZI) — B2B contact and sales intelligence. ~$1.2B revenue. Stock down approximately 92% from all-time high. Moat source: database scale — but critically, the database is derived from platforms the company does not own, principally LinkedIn. Wells Fargo "Underweight" rating. Morningstar does not assign a moat.
The pattern is unmistakable. The nine companies at the top of this list own their source data — through network effects, regulatory designation, accumulated investment, or institutional standard-setting. ZoomInfo sits at the bottom because its data asset is structurally derivative: built primarily through extraction from LinkedIn and other platforms the company does not control. When those platforms enforce their rights, the asset depreciates. For PE investors evaluating data businesses, this hierarchy provides a governance framework. The question is not "how large is the database?" It is "who owns the source?"
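The private-company estimates in the hierarchy above follow the stated methodology: revenue times an assumed multiple band. A minimal sketch — the 5.6x–8.0x band here is an assumption back-solved from the Bloomberg figures above, not a disclosed multiple:

```python
def implied_ev_range(revenue_b: float, low_mult: float,
                     high_mult: float) -> tuple[float, float]:
    """Implied enterprise-value band (in $B) from a revenue multiple."""
    return (revenue_b * low_mult, revenue_b * high_mult)

# Bloomberg LP: ~$12.5B estimated revenue; an assumed 5.6x-8.0x band
# reproduces the $70-100B implied value in the hierarchy above.
print(implied_ev_range(12.5, 5.6, 8.0))  # -> (70.0, 100.0)
```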
Where value accrues in data businesses. The LinkedIn and CoStar cases offer an instructive parallel. In both companies, the competitive advantage resides overwhelmingly in the data asset, not in the user interface. Independent reviewers have noted this clearly. Research.com's 2026 review of CoStar describes an "outdated and non-intuitive interface causing a steep learning curve"(39) with "limited customization" and "lack of advanced data visualization tools." Nakisa characterizes CoStar as "clunky and complex." Bertrand Duperrin, writing in March 2025, observed that LinkedIn's interface has seen "no major developments since 2017"(40) and described the user experience as "confusing and unpleasant." Product is Life concluded that LinkedIn "has been winning primarily because of lack of decent competition."
Yet CoStar has delivered 58 consecutive quarters of double-digit revenue growth. LinkedIn's revenue has grown from $3 billion to $17.8 billion since the acquisition, with engagement accelerating. The interface draws criticism. The revenue compounds. This is the structural signature of a data moat.
Andreessen Horowitz's research on data moats argues that defensibility "is not inherent to data itself"(41) but acknowledges that network effects — the specific type of moat LinkedIn possesses — are the strongest and rarest form of competitive advantage. LinkedIn's moat is a self-reinforcing incentive loop: professionals update their own profiles because doing so serves their career interests. The data refreshes itself. The interface is secondary; it is, to put it plainly, the part of the technology stack that gets offshored. The data collection and verification — the work that creates and protects the moat — is the part that cannot be commoditized, and it compounds in value the longer it is sustained.
The American data sovereignty pattern. This taxonomy maps directly onto the structure of American economic advantage in data. The owners are overwhelmingly American companies — LinkedIn, Bloomberg, S&P Global, Moody's, FICO, MSCI, Verisk, CoStar, Dun & Bradstreet. These institutions represent some of the most durable competitive moats in the history of American capitalism. American companies design the systems, own the intellectual property, and govern the standards. The labor that maintains these moats has always been global. CoStar employs researchers across multiple geographies. Bloomberg operates data collection worldwide. The pattern is not new: American companies own the asset, direct the strategy, and deploy execution globally at cost structures that make the service economically viable. The ownership and the value never leave. The execution follows the talent.
This is the same model that a B2B company adopts when it commissions a managed human research program staffed by researchers who access LinkedIn compliantly, verify contacts one at a time, and build a proprietary data asset that sits in the American company's CRM. The American company owns the output. The American company directs the methodology. The American company's balance sheet carries the appreciating asset. The companies that understand this distinction — between owning, renting, and integrating — make fundamentally different investment decisions. They do not ask "which data tool is cheapest?" They ask "who owns the source, and what is my structural relationship to it?"
D. The Case for Proprietary Data Programs
The ZoomInfo and Clay trajectories, examined together, point toward a conclusion that the B2B data industry has been reluctant to articulate: subscribing to a data provider is not the same as building a data asset.
When a company purchases a ZoomInfo subscription or a Clay license, it gains temporary access to a shared commodity. Every competitor in the same vertical can purchase the same subscription and receive the same contacts, the same firmographics, the same enrichment fields. The data is rented, not owned. It expires when the subscription lapses. It cannot be customized beyond the fields the provider has chosen to collect. And it degrades at the same rate for every customer simultaneously, because every customer is drawing from the same pool.
A proprietary data program — a managed human research engagement structured as a private service arrangement — produces something fundamentally different. The output is a data asset that belongs to the client. It reflects the client's specific ideal customer profile, its industry vertical, its competitive landscape, its organizational intelligence requirements. It accumulates over time. A researcher who has spent twelve months mapping the decision-making structure at a client's top 200 target accounts has built institutional knowledge that no subscription database contains. That knowledge compounds. The second year of a proprietary program is more valuable than the first, because the researcher understands the client's market, recognizes patterns, and can identify emerging opportunities before they surface in any automated system.
The distinction is between consumption and accumulation. A data subscription is consumed monthly and replaced. A proprietary data program accumulates monthly and appreciates. Over a three-to-five-year period, the company with the proprietary program has built a competitive intelligence infrastructure that cannot be purchased at any price — because it reflects research, judgment, and institutional knowledge that exists nowhere else.
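The consumption-versus-accumulation distinction can be made concrete with a toy model. All parameters here are illustrative assumptions (500 verified contacts per month, 15% annual staleness), not figures from this paper:

```python
def proprietary_asset_size(verified_per_month: int, months: int,
                           annual_staleness: float = 0.15) -> float:
    """Usable records in an accumulating proprietary database.

    Each month's cohort of verified contacts slowly goes stale as
    people change roles, but cohorts stack: the asset compounds.
    A subscription, by contrast, is a flat rented pool -- cancel it
    and the usable count drops to zero immediately.
    """
    monthly_retention = (1.0 - annual_staleness) ** (1.0 / 12.0)
    return sum(verified_per_month * monthly_retention ** age
               for age in range(months))

# Illustrative run: under these assumptions the year-3 asset is roughly
# 2.6x the year-1 asset, though the monthly research effort never changed.
print(round(proprietary_asset_size(500, 12)))
print(round(proprietary_asset_size(500, 36)))
```

The design choice worth noting is the staleness discount: even a model that penalizes older cohorts still shows the accumulating program pulling away from any flat rented pool, because retention compounds against a growing base.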
The practical implication for B2B companies is this: the investment in a managed human research program is not a cost. It is capital formation. The company is building an asset — a verified, customized, proprietary database of decision-makers, organizational structures, and competitive intelligence that no competitor possesses and no subscription can replicate. When a company cancels a ZoomInfo subscription, the data disappears. When a company invests in a proprietary research program, the data endures — and the longer the program runs, the more valuable it becomes.
The market has not yet internalized this distinction. Most procurement teams still evaluate data spend as an operating expense to be minimized. The companies that reclassify it as capital investment — that understand they are building an asset, not purchasing a commodity — will emerge from this period with infrastructure their competitors cannot replicate on any timeline. This is the most underappreciated structural advantage in B2B: the company that builds its own data does not need to worry about what happens when LinkedIn restricts access, when ZoomInfo's database degrades, or when Clay's waterfall runs dry. The asset is theirs.

IV. Duration: The Institutional Standard
A. CoStar: Thirty-Eight Years of Compounding
The commercial real estate sector illustrates what disciplined data investment looks like when executed with institutional patience. Andrew Florance, the founder of CoStar Group and a physics lab partner of Jeff Bezos at Princeton, built the industry's dominant data platform by doing precisely what Generation III data companies refused to do: investing in proprietary collection infrastructure rather than extracting from third-party sources.
CoStar's competitive advantage is rooted in a proprietary database cultivated over 38 years through direct data collection and verification — an investment exceeding $5 billion, sustained by a workforce of more than 1,500 researchers who physically and digitally verify property data, ensuring the information is not a derivative of other sources. This model keeps the asset on the balance sheet accurate through constant maintenance — the same principle that governs durable data moats in any vertical.
A significant divergence has emerged between market sentiment and the operational performance of this approach — and it is the kind of divergence that recurs in every market cycle.
The current snapshot: as of early 2026, CoStar shares reflect market skepticism about CRE industry headwinds and the investment intensity of Florance’s residential platform strategy. This is a familiar dynamic — the market struggles to value companies in the investment phase of a long-duration strategy, because the income statement shows spending today and the return arrives in years, not quarters.
The durable reality beneath the snapshot: CoStar has delivered 58 consecutive quarters of double-digit revenue growth. Over 95% of that revenue is recurring subscription revenue.(58) The stock price is a sentiment indicator. The revenue trajectory is a structural indicator. They are telling different stories, and one of them is wrong.
The contrast with ZoomInfo is instructive. Both companies experienced significant stock drawdowns. But ZoomInfo's drawdown reflects structural impairment of the underlying business model — loss of upstream access, customer churn, declining revenue guidance. CoStar's drawdown reflects market sentiment about CRE broadly and investment-phase optics, while the operational engine accelerates. One company's data moat is degrading. The other's is compounding. The market will eventually distinguish between the two. Duration investors already have.
B. The Builder and the Flippers
The divergence between CoStar and ZoomInfo is not merely financial. It is philosophical — and the philosophy explains the finance.
Andrew Florance started CoStar in 1987 with $10,000, working from his parents' basement after graduating from Princeton.(26) He did not raise venture capital. He did not optimize for a liquidity event. He built a data collection operation by hand — literally sending researchers to verify commercial properties one at a time — and he kept building it for thirty-eight years. When the Australian Financial Review profiled him in August 2025 as he completed CoStar's $1.9 billion acquisition of Domain(30), the headline captured something essential about the man: "I am willing to eat nails."(27) Florance started with nothing, became homeless at seven, supported himself by eleven, earned his way into Princeton, and has since built a $40 billion company by doing the same thing every single day for nearly four decades: investing in proprietary data collection when cheaper alternatives were available.
His philosophy is explicit. "We're not only thinking about this quarter or next quarter," Florance has said.(28) "We're thinking about what sort of investments we're making to digitize real estate over multiple years, and we are committed to it." When activist investors at Third Point demanded he cut spending on Homes.com(29) — a residential platform that has consumed billions in investment — Florance's response was to keep building, citing the same multi-decade logic that turned CoStar's commercial business into a monopoly. With Warren Buffett stepping back, Florance is now one of the longest-serving CEOs in the S&P 500.(32) He still sits on the Princeton Board of Trustees. His wealth is CoStar stock. His conviction is on the table every day. He is an owner in the deepest sense of the word.
At his 2019 VCU commencement address(31), Florance described the lesson he learned as a twelve-year-old at a music school in New York City: that getting closer to perfection is a joyous experience. He has applied that lesson to data collection for nearly four decades. The 1,500 researchers verifying commercial properties are the institutional expression of that philosophy.
The parallel trade: long Florance, short Schuck. The CoStar-ZoomInfo comparison is more instructive than any analyst report on either company because it has never been made. The two companies are covered by different analysts in different sectors — CoStar by real estate analysts who compare it to Zillow and Rightmove, ZoomInfo by software analysts who compare it to HubSpot and Salesforce. The analyst communities have never intersected. Neither has the investor base. Nobody has ever placed both companies in the same analytical frame and asked the structural question: what happens when one data company owns its source and the other rents it?
The answer is a parallel trade. Long CoStar, short ZoomInfo — not as a financial recommendation, but as a structural thesis about which approach to data compounds and which decays.
Henry Schuck started DiscoverOrg in 2007, acquired ZoomInfo in 2019, and took the combined company public in June 2020 at an $8 billion valuation(45) — one of the largest tech IPOs conducted during a global pandemic. The timing was exquisite: peak ZIRP, peak remote work, peak demand for digital sales tools. Wall Street valued the company at 20x revenue on the assumption that frictionless access to LinkedIn's data would persist indefinitely. The PE sponsors who backed ZoomInfo through successive rounds — including TA Associates and Carlyle — realized extraordinary returns. TA Associates' original $90 million investment was valued at $6 billion in paper gains within five years.(22) Carlyle owned 31% at IPO for a 13x paper gain after just two years.(23) Both firms sold significant holdings in the two years following the IPO, converting paper gains to realized returns while the structural thesis remained intact.
The stock has since declined approximately 92% from its all-time highs. The product did not change. The structural dependency was always there. The PE sponsors timed their exit before the dependency was priced. The public shareholders who bought at $60 or $40 or $30 are holding the structural risk that the private investors priced themselves out of.
This is not a criticism of Schuck's intelligence or execution. It is an observation about incentive structures and time horizons. Schuck built a genuinely impressive company. He reached $1 billion in ARR faster than almost any enterprise software company in history.(46) But the architecture was built for a window — a period of maximum LinkedIn permissiveness, maximum ZIRP capital availability, and maximum tolerance for commodity data spend. When the window closed, the architecture was exposed. The product worked. The lease expired.
Florance's architecture has no lease. He owns the data. He employs the researchers. He controls the methodology. When the market turns against him — as it has periodically for 38 years — he continues investing because the asset does not depend on anyone else's permission to exist.

C. The Litigation Doctrine: Protecting What You Built
Florance’s commitment to duration extends beyond investment strategy. He has built what is arguably the most aggressive intellectual property enforcement apparatus of any data moat company in the world — and his litigation doctrine provides the clearest preview of how LinkedIn and other primary-source data companies will protect their assets in the years ahead.
CoStar’s litigation history is not a series of isolated legal disputes. It is a systematic doctrine applied consistently across targets of different sizes, structures, and institutional backing — with complete resolution as the only acceptable outcome.
The doctrine has two defining characteristics. First, Florance does not pursue cease-and-desist letters, licensing negotiations, or settlements that allow the extraction to continue at a royalty rate. He pursues complete dissolution of the infringing operation. Second, he has applied this doctrine against institutionally backed competitors with the resources to fight — and won completely every time.
Understanding what that means in practice requires examining both cases in full.
The Xceligent Case: What Complete Enforcement Looks Like
By 2016, Xceligent had established itself as the most credible challenger CoStar had faced in the commercial real estate data market. The company was founded in 1998 by Doug Curry and had operated for nearly two decades, building data collection centers across the Philippines and India. It had institutional backing from DMGT — Daily Mail and General Trust — a publicly traded British media conglomerate with global legal resources, deep capital reserves, and every strategic incentive to sustain a prolonged legal fight. DMGT had invested approximately $150 million in Xceligent(36) as the vehicle to break CoStar’s dominance in commercial real estate data. That bet came with the capital and organizational resources to defend it.
This was not an underfunded startup that lacked the means to fight back. DMGT is one of Britain’s largest media companies. Its investment in Xceligent was a deliberate, well-capitalized attempt to build a competing platform using institutional resources. The capital was real. The strategic intent was clear. The legal resources were substantial.
Florance’s response was not a federal filing followed by years of discovery and a settlement negotiation. It was a coordinated multi-jurisdictional enforcement operation designed to be executed faster than the target could respond.
In December 2016, CoStar filed suit against Xceligent(33) for copyright infringement on what the court termed an “industrial scale.” Simultaneously — on the same day — private investigators entered Xceligent’s Philippines data operations at Avion BPO Corp. in Pasig City. The operation yielded 35 terabytes of data and the seizure of hundreds of computers.(34) In India, a permanent injunction was obtained against MaxVal Technologies, another Xceligent contractor, for accessing CoStar’s websites without authorization and uploading stolen content into Xceligent’s database. Xceligent’s offshore researchers had created more than 3,000 fake CoStar accounts, rotated IP addresses to evade detection, and used proxy servers to circumvent security measures. The operation crossed three sovereign jurisdictions simultaneously — United States, Philippines, India — each requiring separate legal coordination, each executed on the same timeline.
What followed was complete. Criminal indictments were obtained against directors of the Philippines vendor. The court entered a $500 million judgment against Xceligent — the largest judgment ever entered in a copyrighted-image lawsuit.(35) Within a year of CoStar filing suit, Xceligent filed for Chapter 7 bankruptcy and dissolved. DMGT wrote down its entire investment to zero(36) — a total loss on a strategic bet backed by one of Britain’s largest media companies. Florance spent more than double what he recovered in legal fees and investigation costs. When asked about the economics, he was direct: the purpose was deterrence, not recovery.
The message the enforcement sent was precise. It was not: we will make extraction expensive. It was: we will make extraction terminal.
The market understood. In the 34 months following the Xceligent filing, CoStar’s stock nearly tripled — from $183 to $572 per share. DMGT’s shares dropped 24% to a five-year low. Wall Street did not read the enforcement as a legal expense. It read it as a capital investment in the structural integrity of the moat. The market rewards asset protection.
DMGT discovered that institutional capital deployed in defense of a data extraction operation is not a match for a primary source owner who has built the legal architecture, the evidentiary record, and the operational willingness to pursue complete resolution regardless of the cost or complexity of doing so. That discovery is now in the permanent record. Every subsequent operator in the commercial real estate data space — and every institutional investor considering backing one — has had access to that record since the day Xceligent dissolved.
The CREXi Case: The Doctrine Applied Again
Xceligent might have been dismissed as an extreme response to an extreme provocation — a one-time escalation that demonstrated capability without establishing pattern. CREXi removed that interpretation entirely.
CREXi was a Los Angeles-based commercial real estate marketplace founded in 2015 by Michael DeGiorgio, a former executive at Ten-X. It had raised approximately $55 million across multiple funding rounds from institutional backers including Jackson Square Ventures, Mitsubishi Estate, Industry Ventures, and Prudence Holdings. By 2022, Forbes reported a $500 million valuation. CREXi’s founding investor estimated publicly that the company would soon reach a multi-billion-dollar valuation.
This was, again, not an unfunded operation. Jackson Square Ventures is a San Francisco-based institutional fund. Mitsubishi Estate is one of the largest real estate companies in the world. The capital behind CREXi was serious, institutional, and deployed with full knowledge that the company was entering CoStar’s territory. The investors funded the operation. The operation extracted CoStar’s content. The enforcement followed.
CoStar filed suit against CREXi in September 2020. Florance described the infringement publicly as larger in scope than Xceligent.
The federal record that emerged from litigation is unambiguous and detailed.
In June 2025, the U.S. District Court for the Central District of California issued a comprehensive opinion finding that CREXi had deliberately copied and cropped thousands of CoStar’s copyrighted photographs via an elaborate offshore scheme involving India-based agents(37) — the same geography the Xceligent enforcement had visited a decade earlier. The court found the evidence of deliberate misconduct overwhelming.
Internal CREXi documents produced in discovery told a story of institutional policy, not individual error. A company document explicitly instructed employees: take a screenshot of the photos, crop the watermark to ensure that the watermark logo is removed. CREXi’s Manager of Business Operations confirmed the instruction in writing. CREXi’s Chief Operating Officer and former Vice President of Revenue acknowledged in internal communications that CREXi’s business model revolved around taking listings from LoopNet. The company used offshore agents, proxy servers, and fraudulent accounts to conceal the activity from CoStar’s security protocols.
The court saw through CREXi’s attempts to blame broker customers for the infringement. It found that CREXi and its offshore teams had deliberately copied from CoStar’s LoopNet site, taken screenshots of CoStar’s images, cropped the CoStar watermark, and done so pursuant to established company policy — before, during, and after the litigation commenced.
Simultaneously, CoStar pursued the Indian BPOs directly — the same cross-border enforcement architecture deployed against Xceligent’s offshore contractors. Arcgate Teleservices in Udaipur was subjected to a permanent injunction and a decree from the Rajasthan High Court. Neptune Business Solutions in Chennai was sued in the Madras High Court, which ordered forensic analysis of the BPO’s devices. Yansh Technologies and 247 Web Support in Delhi had their content seized by court-appointed commissioners.
Trial on damages is pending. CoStar’s General Counsel stated publicly in June 2025: “CoStar is prepared to take CREXi to trial for its international scheme of mass infringement. CREXi can no longer hide from its well-documented policy to copy and crop images from CoStar, and the mountains of evidence that its employees did just that.”
The enforcement architecture that dissolved Xceligent is now aimed at CREXi. The target is different. The doctrine is identical.
The Zillow Action: Enforcement Enters New Territory
In July 2025, CoStar sued Zillow(38) — the dominant residential real estate platform — for copyright infringement involving tens of thousands of CoStar’s watermarked photographs displayed on Zillow’s sites and syndicated across Zillow’s partner network including Redfin and Realtor.com.
This was CoStar’s first legal action against a top residential competitor, signaling that the same enforcement posture that destroyed Xceligent and is dismantling CREXi will now be applied to the residential market Florance is entering through Homes.com. The enforcement doctrine is not confined to the commercial real estate territory Florance has dominated for 38 years. It travels with him into every market he enters.
What the Cases Establish Together
Xceligent and CREXi are not coincidences separated by a decade. They are the first and second application of a doctrine that Florance built deliberately and has demonstrated the willingness to apply at full cost regardless of the target’s size, backing, or legal resources.
The pattern is precise. In both cases, CoStar identified a competitor building on extracted primary source content. In both cases, the competitor had serious institutional backing — DMGT in the first instance, Jackson Square Ventures and Mitsubishi Estate in the second — and the resources to sustain prolonged litigation. In both cases, the institutional capital was not a defense. It was the funding mechanism for the infringement. The investors did not back scrappy startups that accidentally crossed a line. They backed operations that industrialized extraction as a business model. The capital made the infringement possible at scale.
In both cases, Florance pursued complete resolution rather than a negotiated outcome that would have allowed the extraction to continue. In both cases, the evidentiary record was overwhelming before enforcement commenced — because CoStar’s internal documentation practices, access control systems, and legal preparation had been building the case before the filing. And in both cases, the enforcement crossed international borders — Philippines, India, multiple Indian states — because the extraction operations were deliberately structured offshore to complicate enforcement. Florance pursued them anyway.
The doctrine does not scale with the size of the target or the quality of their legal defense. DMGT’s institutional capital did not protect Xceligent. CREXi’s $500 million valuation and venture backing did not prevent federal findings of systematic infringement. What determines the outcome is not what capital sits on the defending side. It is whether the plaintiff has built the evidentiary record, structured the legal architecture, and committed to complete resolution.
Florance has done all three. Twice. In the federal record. Permanently.
The implication for any operator currently building a commercial real estate data platform on extracted content — at any valuation, with any institutional backing, through any offshore structure — is not theoretical. It is documented. The enforcement range has been demonstrated across two targets, three jurisdictions, and nearly a decade of sustained litigation.
The primary source owner does not negotiate. The primary source owner enforces.
The Domain Acquisition: When Enforcement Does Not Require a Courthouse
In August 2025, CoStar completed the acquisition of Domain Holdings Australia for A$2.8 billion(30) — purchasing the country’s leading property marketplace, which reaches an average of 7 million Australians monthly. The acquisition placed CoStar’s infrastructure, data standards, and enforcement architecture directly into the Australian market.
Doug Curry, the former CEO of Xceligent — the company CoStar’s enforcement doctrine dissolved in 2017 — relocated to Australia following that dissolution and founded Arealytics, a commercial real estate data company operating from North Sydney with research centers in the Philippines and South Africa.
The litigation doctrine established what happens when a competitor extracts CoStar’s content within reach of U.S. federal courts and cooperative international jurisdictions. The Domain acquisition raises a different structural question: what does the competitive landscape look like when the primary source owner enters a market through acquisition rather than litigation — purchasing the dominant platform in the same market where a former enforcement target now operates?
The enforcement doctrine has two modes. The first is litigation — the mode applied against Xceligent and CREXi, executed through federal courts, cross-border investigations, and multi-jurisdictional enforcement. The second is acquisition — entering a market by purchasing the infrastructure that defines it. Both modes produce the same structural result: the primary source owner controls the territory.
CoStar’s presence in Australia is now permanent, well-capitalized, and backed by the same enforcement apparatus that has been documented across two major cases in the federal record. The competitive dynamics of the Australian commercial real estate data market have changed accordingly.
The Critical Distinction: Competitors Versus Customers
What makes Florance’s litigation doctrine instructive for the broader data industry is not its aggression but its precision. CoStar targets exclusively the companies attempting to build rival platforms using stolen content and the offshore operations that execute the theft on their behalf. It has never — in 38 years of operation and decades of aggressive IP enforcement — litigated against a customer in the real estate industry.
The distinction is deliberate and consistent. Real estate investors, brokers, and analysts are authorized subscribers. If they employ offshore research staff, virtual assistants, or BPO teams to interact with CoStar’s platform through their subscription — pulling data for their own deals, managing listings, conducting market analysis, supporting investment decisions — that activity is authorized use. The research staff are operating on behalf of the subscriber, within the subscriber’s account, for the subscriber’s business purposes. CoStar does not restrict how its customers deploy their own teams to access data they pay for. The platform was built to serve these users, and the subscription revenue they generate is the economic engine that funds the $5 billion in proprietary data collection.
The line Florance draws is between usage and extraction. A real estate investor whose offshore team pulls CoStar data to support a $50 million acquisition is using the platform as intended. A competitor whose offshore team pulls CoStar data to build a rival database is stealing the asset that took 38 years to create. CoStar has publicly stated that it “does not sue competitors who compete lawfully.” A federal judge dismissed antitrust claims alleging that CoStar’s subscriber contracts were anticompetitive. The enforcement is reserved entirely for companies that attempt to replicate CoStar’s proprietary asset through industrial-scale theft — and Florance has shown he will pursue them across continents, through multiple judicial systems, at costs exceeding what he recovers, because the deterrent value protects the asset that generates everything else.
D. The Generational Pattern: Capital as Accelerant
The B2B data industry’s recent history follows a recognizable pattern: institutional capital funds companies that build on extracted data, those companies scale rapidly during permissive periods, achieve extraordinary valuations, provide liquidity to founders and early investors, and leave downstream stakeholders — employees holding equity, customers depending on the data, public shareholders who bought the growth narrative — exposed to the structural risk that the founders priced themselves out of.
The pattern is visible in the B2B sales intelligence market. It is equally visible in the commercial real estate data market. The two have never been placed in the same analytical frame. They should be, because the capital dynamics are identical — and the enforcement consequences documented in the federal record apply structurally to both.
The B2B Sales Intelligence Pattern
ZoomInfo is the completed case study. TA Associates invested approximately $90 million in 2014 and watched the position grow to an implied value of approximately $6 billion by the June 2020 IPO.(22) Carlyle acquired a 31% stake and realized an approximately 13x return in roughly two years.(23) Both firms sold significant holdings in the two years following the IPO, converting paper gains to realized returns while the structural thesis remained intact. The public shareholders who bought at $60 or $40 or $30 are holding the structural risk that the private investors priced themselves out of. The stock has since declined approximately 92% from its all-time highs.(13) The product did not change. The structural dependency was always there. The PE sponsors timed their exit before the dependency was priced.
Apollo represents the pattern at an earlier stage. The company raised $250 million in venture funding,(47) built its platform on data sourced through LinkedIn-dependent collection methodologies, and had its company page removed by LinkedIn in March 2025.(8) Apollo subsequently reached $150 million in ARR by May 2025(48) — growth that arrived after, not before, the enforcement inflection. The company is privately held, so the downstream consequences are less visible. But the structural dependency is identical to ZoomInfo’s, and the enforcement environment is materially worse than the one ZoomInfo navigated during its growth phase.
Clay occupies a different position in the value chain — it is an integrator, not a renter — but the capital dynamic is the same. The company raised at a $3.1 billion valuation on $100 million in consumption-based revenue.(17) The secondary sales facilitating insider liquidity at 31x revenue(49) are a governance signal that sophisticated observers recognize: some insiders prefer cash today to the risk-adjusted value of waiting. Clay’s product is genuinely differentiated, and its growth is real. But a 31x revenue multiple requires a theory about what the data looks like at $300 million, $500 million, and $1 billion — and that theory depends on the same upstream access assumptions that ZoomInfo’s valuation once depended on.
In each case, the capital enabled the scale. The scale created the valuations. The valuations created the liquidity events. And the structural risk — the dependency on upstream access that can be revoked at any time — was transferred from the insiders who priced themselves out to the downstream stakeholders who bought the growth narrative.
The Commercial Real Estate Pattern
The Xceligent and CREXi cases, documented in Section C, reveal the same capital dynamics operating in a different market with a different enforcement posture — and a different outcome.
DMGT invested approximately $150 million in Xceligent.(36) The capital was deployed to build a competing commercial real estate data platform. That platform was built on extracted CoStar content, executed through offshore BPO operations in the Philippines and India. The institutional capital did not merely tolerate the extraction. It funded it. DMGT’s investment was the mechanism that made industrial-scale infringement operationally possible — the offshore data centers, the thousands of fake accounts, the proxy servers, the infrastructure required to extract 35 terabytes of proprietary content.(34) Without institutional backing, the extraction could not have occurred at that scale. The result: Xceligent dissolved. DMGT wrote down $150 million to zero.(36) The institutional capital that funded the operation became the institutional loss when enforcement arrived.
CREXi followed the identical pattern. Jackson Square Ventures, Mitsubishi Estate, Industry Ventures, and Prudence Holdings deployed approximately $55 million across multiple rounds. Forbes reported a $500 million valuation. The capital funded the same structural activity — offshore teams extracting CoStar’s proprietary content through systematic, policy-directed infringement.(37) The federal court documented the extraction in comprehensive detail. The institutional investors backed the operation. The operation extracted the content. The enforcement followed.
In the B2B sales intelligence market, the capital funded extraction from LinkedIn. In the commercial real estate market, the capital funded extraction from CoStar. The upstream source is different. The capital dynamic is identical. Institutional investors funded companies whose business models depended on access to proprietary data assets they did not own — and in both cases, the owners of those assets moved to protect them.
The Structural Parallel
The connection between these two markets has never been made because the analyst communities do not overlap. ZoomInfo is covered by software analysts who compare it to HubSpot and Salesforce. CoStar is covered by real estate analysts who compare it to Zillow and Rightmove. The investor bases are different. The coverage is siloed. Nobody has placed both in the same frame and asked the structural question: what happens when institutional capital funds extraction from a primary source owner who decides to enforce?
The CoStar cases answer the question with the federal record. Xceligent — dissolved. CREXi — facing trial. The institutional capital that funded both operations was not a defense. It was the accelerant that made the infringement large enough to warrant the enforcement.
The B2B sales intelligence market has not yet produced its Xceligent moment. LinkedIn’s enforcement has escalated — from the hiQ Labs litigation(9) to the March 2025 de-platforming of Apollo and Seamless(8) — but it has not yet reached the level of coordinated, multi-jurisdictional enforcement that Florance has demonstrated. The question is not whether it will. The question is when.
The enforcement timeline difference has a structural explanation. Florance’s targets had no commercial relationship with CoStar beyond the adversarial one. There were no shared customers, no ecosystem partnerships, no board-level relationships that enforcement would disrupt. The litigation cost was purely financial. LinkedIn operates in a different commercial environment. The B2B data ecosystem is interconnected — the companies extracting data, the companies funding them, and the platform being extracted from share enterprise customers, partnership agreements, and overlapping institutional relationships. This entanglement does not change the structural logic of enforcement. It changes the pace. Florance could move at the speed of conviction. LinkedIn moves at the speed of institutional complexity.
The capital deployed into B2B data providers over the past decade follows the same structural logic as the capital deployed into Xceligent and CREXi. In every case, institutional investors funded the rapid scaling of a business model that depends on access to a proprietary data asset the company does not own. In every case, the growth narrative was compelling, the product was real, and the structural risk was underpriced. The difference between the two markets is not the capital dynamic. It is the enforcement timeline. CoStar enforced early, aggressively, and completely. LinkedIn is on the same trajectory — but navigating a more complex institutional landscape on the way.
The commercial real estate cases are the leading indicator. They show what happens at the end of the enforcement escalation. The B2B sales intelligence market is earlier on the same curve. The capital at risk is larger. The downstream stakeholders are more numerous. And the structural dependency is identical.
What the Pattern Reveals
The pattern is not that these are bad companies or bad founders. They are talented operators who built real products with real customers. The pattern is that the incentive structure of venture-backed data companies rewards speed, scale, and exit velocity — and penalizes the kind of patient, capital-intensive investment in proprietary data that Florance has practiced for 38 years. The VC model needs a return within 7–10 years. Florance’s model has been compounding for 38. The time horizons are structurally incompatible.
The result is a data landscape — across both B2B sales intelligence and commercial real estate — where the most celebrated companies are the most structurally fragile, and the most durable companies are the least discussed. CoStar has been building for nearly four decades and most B2B sales professionals have never heard of it. ZoomInfo is a household name in revenue operations and its structural thesis is inverting in real time. The market conflates visibility with durability. It shouldn’t.
The capital distortion is this: institutional investors funded the extraction era because extraction produced the growth curves that justified the valuations that produced the returns. The capital was optimized for speed. The data required patience. When the owners of the source data enforce — as Florance has, as LinkedIn increasingly will — the capital that funded the extraction does not protect the companies it built. It becomes the write-down.
DMGT learned this at $150 million. The public shareholders of ZoomInfo are learning it through a 92% decline from the all-time high. The lesson is the same. The scale is different. The federal record is permanent.
E. The Predictive Framework: LinkedIn and Beyond
CoStar’s litigation doctrine is the leading indicator for how every primary-source data moat company will behave as the value of proprietary data increases. The structural logic is identical across the hierarchy: a company that has invested billions building a proprietary data asset — whether through 38 years of human researchers (CoStar), through a self-reinforcing network of 1.2 billion professionals (LinkedIn), or through decades of accumulated regulatory designation (S&P, Moody’s) — will protect that asset with the full force of the legal system. The investment demands the protection. The protection justifies the investment.
LinkedIn’s enforcement trajectory already mirrors CoStar’s early stages: escalating from passive security measures to active de-platforming of companies that extract its data, pursuing and winning litigation (the hiQ Labs case resulted in a permanent injunction and destruction of all scraped data(9)), and investing in its own premium products(10) to channel data access through authorized commercial relationships. The pattern will intensify for the same reason CoStar’s did — because the asset is worth more every year, which makes the cost of tolerating extraction higher every year.
Microsoft is protecting an asset worth an estimated $155–$195 billion in standalone enterprise value(7) — an asset generating $17.8 billion in annual revenue(2) from 1.2 billion self-maintained professional profiles.(4) No rational steward of a $150 billion-plus data asset permits unauthorized extraction when that extraction directly competes with the premium products monetizing the same data. The enforcement campaign is not a policy choice. It is a fiduciary obligation.
The CoStar record provides the template for what full enforcement looks like: simultaneous multi-jurisdictional action, cross-border seizures of offshore extraction operations, criminal referrals, pursuit of complete resolution rather than negotiated settlements, and willingness to spend more on enforcement than the direct recovery justifies — because the deterrent value protects the asset that generates everything else.
LinkedIn has not yet deployed enforcement at this level. But every structural prerequisite is in place. The asset is larger than CoStar’s. The extraction is more widespread. The premium products being cannibalized by unauthorized access are more profitable. The legal precedents — hiQ Labs, the Apollo and Seamless de-platforming — establish the trajectory. The question is not whether LinkedIn will follow the Florance doctrine. It is whether the enforcement arrives before or after the next generation of data companies has raised its next round of institutional capital on the assumption that the access will persist.
For the B2B data industry, the implication is structural. The era when offshore teams could freely extract from primary-source platforms to build competing databases is ending — not through gradual market forces, but through deliberate, well-funded enforcement campaigns by the owners of those platforms. Florance showed the industry what rigorous IP enforcement looks like. LinkedIn is following the same playbook. Any company whose data infrastructure depends on extraction from a primary source that has decided to protect itself is building on ground that is being systematically reclaimed.
And once the doctrine is in the federal record, it is available as precedent forever. Every future enforcement action by every primary-source data company can point to the CoStar cases — Xceligent dissolved, CREXi facing trial, $500 million in judgments, criminal indictments across borders, institutional investors absorbing total losses — as evidence that the legal architecture exists, that it works, and that the cost of extraction has been permanently repriced.
The primary source owner does not negotiate. The primary source owner enforces. And the enforcement compounds.

V. The Economics of Accuracy
A. The 90/10 Illusion
The most common objection to investing in proprietary data programs is the simplest: why pay for 99% accuracy when 90% accuracy costs a fraction of the price?
The math appears obvious. An automated data subscription at one-third the cost of a managed human research program seems to deliver 90% of the value. On a procurement spreadsheet, that is 90% of the value at a fraction of the cost. No procurement officer, no CFO conducting a line-item review, no marketing director defending a budget would choose otherwise. The 90/10 trade is the rational decision within the frame of the data line item.
The frame is wrong.
The 10% accuracy gap between commodity data and verified data does not show up on the data invoice. It shows up everywhere else — in wasted sales cycles, in misrouted outreach, in pipeline that converts at 5% instead of 7%, in deals that stall because the SDR pitched the wrong stakeholder, in renewal conversations that never happen because the account was mapped incorrectly from the start. The cost of the accuracy gap is real, material, and almost always larger than the savings on the data line. But it is distributed across different functions, different budget lines, different quarters — and therefore invisible to the person who made the procurement decision.
Consider the arithmetic. A B2B company with ten SDRs, fully loaded at $80,000 each, spends $800,000 annually on outbound sales labor. If 10% of that labor is wasted on bad contacts — wrong titles, outdated emails, misidentified decision-makers — the company is losing $80,000 per year in unproductive effort. That alone exceeds the incremental cost of a managed research program. But the waste does not stop at the SDR level. The pipeline built on 90% accurate data contains phantom opportunities — contacts who are no longer in role, companies where the org chart has shifted, decision-makers who have been misidentified. These phantom opportunities progress through the pipeline, consuming sales engineering time, executive attention, and forecasting credibility before they eventually die. A 2% reduction in pipeline conversion rate caused by data quality — from 7% to 5% on a $20 million pipeline — is $400,000 in lost closed revenue. The company saved $45,000 on the data line and lost $400,000 in pipeline quality.
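The arithmetic above can be put in one place. A minimal back-of-the-envelope sketch, using only the illustrative figures from this section (ten SDRs at $80,000 fully loaded, a $20 million pipeline, a $45,000 data-line saving), not industry benchmarks:

```python
# Back-of-the-envelope cost model for the 90/10 trade.
# All inputs are the illustrative figures used in the text, not benchmarks.

def wasted_sdr_labor(num_sdrs: int, loaded_cost: float, bad_contact_rate: float) -> float:
    """Annual SDR labor spent working bad contacts."""
    return num_sdrs * loaded_cost * bad_contact_rate

def lost_pipeline_revenue(pipeline: float, baseline_conv: float, degraded_conv: float) -> float:
    """Closed revenue lost when data quality drags conversion down."""
    return pipeline * (baseline_conv - degraded_conv)

if __name__ == "__main__":
    waste = wasted_sdr_labor(num_sdrs=10, loaded_cost=80_000, bad_contact_rate=0.10)
    lost = lost_pipeline_revenue(pipeline=20_000_000, baseline_conv=0.07, degraded_conv=0.05)
    data_line_savings = 45_000  # the visible saving on the data invoice

    print(f"Wasted SDR labor:      ${waste:,.0f}")   # $80,000
    print(f"Lost closed revenue:   ${lost:,.0f}")    # $400,000
    print(f"Net cost of the trade: ${waste + lost - data_line_savings:,.0f}")
```

The point of the model is not the specific numbers but the asymmetry: the saving appears on one line, the losses appear on two other functions' budgets.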
The relationship between data accuracy and business outcomes is not linear. It is nonlinear at the margin. The first 80% of accuracy gets you into the building — correct company, correct industry, approximately correct org structure. The last 20% gets you into the room — the right decision-maker, the current title, the valid direct dial, the accurate reporting structure. And the last 10% — the difference between 90% and 99%+ — is where the deal happens or doesn't. A pitch to the VP of Operations who left six months ago does not produce 90% of the result of a pitch to the current VP of Operations. It produces 0% of the result, plus the opportunity cost of every hour spent preparing for and executing a conversation with someone who cannot buy.
This is the nonlinearity that makes the 90/10 trade so destructive. The procurement team evaluates data spend as if accuracy has a linear relationship to value — 90% accuracy delivers 90% of the result. In practice, 90% accuracy delivers perhaps 70% of the result, because the 10% that is wrong is not randomly distributed across the pipeline. It concentrates in the most commercially valuable contacts — the senior decision-makers who change roles most frequently, the executives at growing companies where org structures evolve fastest, the high-value targets where getting it right matters most. The data that decays fastest is the data that matters most.
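The concentration effect can be made concrete with a value-weighted accuracy calculation. The segment split below is hypothetical, chosen only to illustrate how the same nominal accuracy delivers less value when errors cluster in the high-value contacts:

```python
# Illustration of the nonlinearity: nominal accuracy vs. value-weighted accuracy.
# The segment shares and accuracy rates below are hypothetical illustrations.

def value_weighted_accuracy(segments):
    """segments: list of (share_of_total_pipeline_value, accuracy_within_segment)."""
    return sum(value_share * acc for value_share, acc in segments)

# If errors were spread uniformly, 90% accuracy would reach 90% of value.
uniform = value_weighted_accuracy([(0.5, 0.90), (0.5, 0.90)])

# But senior, fast-moving contacts decay fastest: here the half of the
# pipeline value they carry is only 70% accurate, the rest 95%.
concentrated = value_weighted_accuracy([(0.5, 0.70), (0.5, 0.95)])

print(f"uniform errors:      {uniform:.0%} of value reachable")       # 90%
print(f"concentrated errors: {concentrated:.0%} of value reachable")  # ~82%
```

Under this (hypothetical) split, a database that looks "90% accurate" in aggregate reaches only about 82% of the pipeline's value, which is the shape of the gap the text describes.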
The broken feedback loop explains why companies continue making this trade despite its cost. The CMO or marketing director who purchases the data subscription optimizes for cost per lead. The downstream damage — lower conversion rates, longer sales cycles, wasted SDR effort, pipeline quality degradation — is absorbed by sales, not marketing. The cost center and the impact zone sit in different functions. Marketing buys the data. Sales suffers the consequences. The CFO sees the savings on the data line in Q1 and the pipeline miss in Q3 and does not connect them because the causal chain is too long, too diffuse, and too distributed across organizational silos.
The companies that break through this illusion share a common characteristic: someone with cross-functional visibility — a CRO who sees both the data spend and the pipeline quality, a CEO who reads both the procurement report and the sales forecast, a PE operating partner who has watched the pattern repeat across portfolio companies — connects the data quality decision to the revenue outcome. Once the connection is made, the 90/10 trade is revealed as what it is: the most expensive cheap decision on the P&L. The company is not saving $45,000. It is spending $45,000 to lose $400,000.
The reclassification is straightforward. Data accuracy is not a procurement decision. It is a capital allocation decision. The question is not "how much does the data cost?" It is "how much does the data produce?" When evaluated on production rather than cost, the managed research program does not look like a premium over the commodity subscription. It looks like arbitrage — a dollar of pipeline quality purchased for seventy-five cents. The buyer who can see the full cost equation acts accordingly. The buyer who can only see the data line item continues making the rational trade that is quietly destroying their pipeline.
The 90/10 trade also has a temporal dimension that makes it especially dangerous. In the first quarter, automated data looks adequate because it was recently refreshed. The 90% accuracy figure — to the extent it was ever accurate — reflects the database on the day of purchase, not six months later. Professional data decays at approximately 40% per year. By the second quarter, the 90% has become 80%. By the third quarter, 70%. The managed research program, by contrast, is continuously maintained — the researcher updates records as contacts change roles, verifies against current sources, and flags organizational changes in real time. The accuracy gap between the two approaches widens every month.
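The decay schedule described above can be sketched directly. This is a simplified linear model consistent with the roughly ten accuracy points lost per quarter that the text describes; the 95% maintained level is an illustrative figure for a managed program:

```python
# Accuracy decay sketch: a one-time snapshot purchase vs. a continuously
# maintained research program. Linear decay per the ~40%/year figure in the text.

QUARTERLY_DECAY_POINTS = 10.0  # ~40 accuracy points lost per year

def snapshot_accuracy(initial: float, quarters: int) -> float:
    """Accuracy of an unmaintained database n quarters after purchase."""
    return max(initial - QUARTERLY_DECAY_POINTS * quarters, 0.0)

def maintained_accuracy(verified_level: float, quarters: int) -> float:
    """Continuously re-verified data holds near its verified level."""
    return verified_level  # ongoing maintenance offsets decay

for q in range(4):
    print(f"Q{q + 1}: snapshot {snapshot_accuracy(90.0, q):.0f}% "
          f"vs maintained {maintained_accuracy(95.0, q):.0f}%")
```

Running the loop reproduces the schedule in the text: 90% at purchase, 80% a quarter later, 70% the quarter after that, while the maintained program holds steady, so the gap widens every quarter.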
The deepest problem with the 90/10 frame is that it treats data as a consumption item rather than an investment. A ZoomInfo subscription is consumed monthly and replaced. When the subscription lapses, the data disappears. Nothing was built. Nothing compounds. A managed research program accumulates. The researcher who has spent twelve months mapping decision-making structures at a client's top 200 target accounts has built institutional knowledge that no subscription database contains. That knowledge compounds — the second year of the program is more valuable than the first, not because the researcher works harder, but because the accumulated understanding of the client's market makes every subsequent research task faster, more accurate, and more strategically valuable. The company is not paying for data. It is building a proprietary asset that no competitor can purchase at any price.
Within the narrow frame of the data budget, the 90/10 trade is rational. Within the wider frame that includes pipeline quality, sales productivity, conversion rates, and competitive position, it is the single most expensive line item on the P&L — not because of what it costs, but because of what it destroys.
B. The Economics of the Data Layer
The B2B data enrichment market is estimated at approximately $5 billion in 2025 and is projected to reach $15 billion by 2033, growing at a compound annual growth rate of 15%.(50) North America accounts for roughly 60% of global revenue. The major data providers — ZoomInfo, Apollo, Cognism, 6sense, Lusha, Seamless.AI — each generate tens or hundreds of millions in annual revenue. ZoomInfo alone reported approximately $1.25 billion in revenue in 2025. These are not marginal line items. They are the foundation of every B2B sales and marketing technology stack.
Beneath and alongside this automated layer sits the human research workforce. India's IT-BPM sector, valued at approximately $45 billion as of 2023 and projected to exceed $60 billion by 2025, employs millions directly and indirectly. India accounts for approximately 57% of the global IT-BPM outsourcing industry and nearly 38% of the global BPM market.(51) Within this ecosystem, a substantial and growing segment is dedicated specifically to B2B data research: lead generation, competitive intelligence, KYC/AML investigations, provider directory maintenance, and custom data enrichment programs.
The value chain economics are straightforward when properly understood. A typical B2B company may spend $15,000 to $100,000+ annually on data provider subscriptions. These tools provide volume: large databases of contacts, company records, and enrichment fields. That volume is efficient for commodity data where 80% accuracy is sufficient. A managed human research program for customized B2B data, structured as a private service arrangement rather than a software subscription, provides what automated tools cannot: verified decision-maker identification, current organizational mapping, judgment-based targeting, and real-time LinkedIn compliance. Accuracy rates in well-managed programs consistently exceed 95% for verified, customized deliverables. The cost of managed research is typically a fraction of the data provider spend it supplements, yet it protects and generates revenue that is orders of magnitude larger than either line item.
The ROI on verified human research is not marginal. Organizations that have measured the full cost of data quality — across the pipeline, not just the data line item — consistently find it among the highest-return investments available to a B2B revenue team.
C. The Performance Divergence: 2022–2025
The period from 2022 to 2025 has produced a meaningful divergence in sales and marketing performance across business segments. This divergence is not random. It correlates directly with data investment discipline, and it is likely to accelerate.
Enterprise companies ($1B+ revenue, 1,000+ employees) have generally maintained or increased their investment in data quality infrastructure, including human research. These organizations understand that their sales cycles — 6 to 18 months, involving 7+ stakeholders per deal — demand precision targeting. A single misidentified decision-maker can stall a deal for months. Enterprise companies that maintained data investment through the post-ZIRP tightening have seen more stable pipeline performance because their targeting infrastructure remained intact.
Middle-market companies ($50M–$1B revenue, 100–999 employees) represent the contested zone. Some have maintained disciplined data investment and seen continued pipeline performance. Others, under CFO-driven cost pressure, cut human research layers and shifted entirely to automated tools between 2022 and 2024. The performance gap between these two groups is widening. Middle-market companies with contract values of $50,000 to $500,000 per deal cannot afford the error rates that cheaper automated tools introduce.
SMB companies (under $50M revenue, fewer than 100 employees) face the tightest resource constraints and are most susceptible to the penny-wise dynamic. Many have cut any human data layer entirely, relying exclusively on freemium or low-cost automated tools. For some SMBs with lower contract values and simpler sales motions, this may be adequate. But SMBs attempting to sell into enterprise or middle-market accounts find themselves at a structural disadvantage when their data quality cannot match the sophistication of their target buyer.
Data quality advantages compound. A company with accurate targeting data identifies the right accounts, reaches the right stakeholders, delivers relevant messaging, and builds pipeline faster. That pipeline converts at higher rates because it was built on genuine fit, not spray-and-pray volume. The resulting revenue funds continued investment in data quality, creating a flywheel. The companies that invested in data quality through the 2022–2025 tightening will emerge with structurally stronger pipelines. The companies that cut data investment will find themselves running harder to achieve the same results — or worse.

VI. The Structural Case for Human Research
A. Why Human Research Persists
The case for human research is not sentimental. It is structural.
Compliance. LinkedIn's User Agreement contains explicit prohibitions.(53) The "Don'ts" section forbids users from developing, supporting, or using "software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services." These provisions are not decorative. LinkedIn has demonstrated willingness to enforce them — the hiQ Labs litigation ran for five years before concluding in December 2022 with a $500,000 consent judgment against hiQ,(52) a permanent injunction requiring destruction of all scraped data, source code, and derived algorithms, and the effective dissolution of hiQ's business. Apollo and Seamless AI had their LinkedIn company pages removed in March 2025. This is not theoretical risk.
What does this mean for companies that rely on data from these providers? It means the provenance of professional data in your CRM is a question worth asking. If your data vendor populates its database through automated scraping of LinkedIn profiles — and the major providers do — then the data you are purchasing was obtained through methods that LinkedIn's User Agreement explicitly prohibits and that LinkedIn is actively enforcing against. Whether that creates direct liability for the downstream purchaser is a question for counsel. That it creates vendor risk, reputational risk, and data continuity risk is not a question at all.
Human researchers operate differently. They access LinkedIn the same way any professional does — through a standard browser session, logged into a personal account, viewing publicly available profiles one at a time. This is the activity LinkedIn is designed for and monetizes through premium subscriptions. There is no scraping. No automation. No browser extension bulk-extracting data. No fake accounts. No circumvention of access controls. When LinkedIn's enforcement intensifies — and the trajectory is clearly toward more enforcement, not less — this distinction becomes the difference between a data program that continues to function and one that loses its upstream source.
Accuracy on customized deliverables. Automated tools perform adequately on standardized, high-volume enrichment tasks — appending basic firmographic data, finding email addresses for known contacts, filling in company size and industry codes. These are commodity functions where speed and volume matter more than precision. The gap emerges on customized data programs: identifying the correct decision-maker at a specific company, verifying reporting structures, confirming that a contact is still in role, assessing organizational dynamics that inform outreach strategy. These tasks require judgment, contextual understanding, and the ability to triangulate across multiple sources. Human researchers do this naturally. Automated tools produce outputs that look similar but carry materially higher error rates.
The verification layer. A growing number of companies are arriving at a hybrid model: they use automated tools for initial volume enrichment, then run the output through human verification before it enters production workflows. Entire businesses have been built around this premise, with the human verification layer positioned explicitly on top of automated enrichment platforms. This hybrid model validates the thesis. If automated tools produced reliable outputs, the verification layer would not exist. Its existence, and its growth, is market confirmation that the human research function has not been displaced by AI. It has been repositioned.
The India workforce advantage. India's BPO ecosystem provides access to millions of educated, English-speaking professionals at cost structures that make managed human research economically viable at scale. The demographic dividend — 65% of India's population under 35(56), expanding middle class, clear career progression paths — ensures sustained workforce availability. Variance compression in India's BPO sector means that operational predictability actually improves over time, the opposite of the trend in automated tool reliability.
B. The B2B Imperative
The data supply chain disruption does not affect all businesses equally. For B2C companies running high-volume, low-value transactions, imprecise data is a tolerable cost of doing business. The margin of error is wide because no single record carries material value.
B2B is a fundamentally different environment. Contract values are high. Sales cycles are long. The number of qualified decision-makers at any given target company is small. A single enterprise deal can represent substantial recurring revenue over years. In this context, data accuracy is not a nice-to-have. It is the literal foundation of the entire revenue engine.
Every B2B pipeline begins with data: who to target, who holds budget authority, what their organization looks like, whether the timing is right. If that foundation is degraded — wrong contacts, outdated titles, misidentified decision-makers — the entire pipeline built on top of it is compromised. The damage compounds silently through every stage of the funnel.
This is what makes cutting the human research layer penny wise and pound foolish in B2B. The savings on the data line item are visible and immediate. The destruction of pipeline quality, conversion rates, and ultimately shareholder value is deferred and diffuse — but materially larger than whatever was saved.
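The compounding is easy to state and easy to underestimate, so a toy model helps. In the sketch below, every number (the funnel stages, conversion rates, deal size, and the 10% per-stage penalty attributed to degraded data) is an illustrative assumption, not a measurement; the point is only that a small per-stage penalty multiplies into a large pipeline loss.

```python
# Toy model: how a small data-quality penalty compounds through a B2B funnel.
# All rates and dollar figures below are illustrative assumptions.

def pipeline_value(stage_rates, penalty, leads=10_000, deal_value=50_000):
    """Closed revenue after applying a per-stage conversion penalty."""
    volume = float(leads)
    for rate in stage_rates:
        volume *= rate * (1 - penalty)  # each stage converts slightly worse
    return volume * deal_value

# Assumed conversion rates: targeting, outreach, meeting, proposal, close.
base = [0.40, 0.25, 0.50, 0.60, 0.30]

clean = pipeline_value(base, penalty=0.00)
dirty = pipeline_value(base, penalty=0.10)  # 10% worse at every stage

print(f"clean-data pipeline: ${clean:,.0f}")
print(f"degraded pipeline:   ${dirty:,.0f}")
print(f"pipeline value lost: ${clean - dirty:,.0f}")
```

With these assumed numbers, a uniform 10% per-stage penalty removes roughly 41% of closed revenue (1 − 0.9⁵): a loss measured in the millions set against a data line-item saving measured in the tens of thousands.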
B2B companies understand this dynamic intuitively. Accurate professional data is not one input among many. It is the core infrastructure on which targeting, outreach, pipeline generation, and revenue execution all depend.
Reactivation cost. Companies that cut human research teams and later discover they need to rebuild them face not only the direct cost of reengagement but the institutional knowledge loss. The researcher who knew the client's industry vertical, understood their ideal customer profile, and had refined the methodology over months is gone. Rebuilding that capability takes time that degraded data does not afford.
The savings are real, immediate, and visible. The costs are deferred, distributed, and invisible — until they are not.
C. Implications Across Data-Dependent Operations
The disruption extends across every operational function that depends on accurate, current professional data — but it is worth stating plainly that the largest and most common application is sales and marketing.
Sales and marketing operations. This is where the data supply chain disruption is felt most immediately and most broadly. Sales and marketing departments are typically the buyers of automated data providers — they hold the ZoomInfo contracts, the Apollo licenses, the Clay subscriptions. These tools are purchased at the departmental level because they solve an immediate, visible problem: pipeline volume. The limitation is that departmental buyers optimize for volume and cost, not for accuracy and durability. By contrast, managed human research programs are more commonly commissioned by C-suite executives, chief revenue officers, and chief technology officers — buyers with longer planning horizons, broader visibility into how data quality affects the entire revenue engine, and the institutional authority to invest in infrastructure rather than tools. Sales and marketing teams buy subscriptions. Executive leadership builds data programs.
Healthcare operations. Provider directories, referral networks, credentialing databases, and payer contact information all require continuous verification against current professional status. Automated tools that cannot reliably access LinkedIn data will produce provider directories with outdated affiliations and incorrect contact information — creating compliance risk and operational inefficiency.
Financial services. KYC and AML functions depend on accurate identification of beneficial owners, politically exposed persons, and corporate officers. Data that is stale or unverified creates regulatory exposure. Human research remains the standard for enhanced due diligence precisely because automated tools cannot guarantee the accuracy that regulators require.
Private equity. Portfolio company operations, management team assessment, competitive intelligence, and add-on acquisition research all depend on current, accurate professional data. Investment decisions made on stale data carry material financial risk.
Competitive intelligence. Tracking competitor hiring patterns, organizational changes, and strategic positioning requires data that reflects current reality, not last quarter's scrape. Human researchers can detect signals — a cluster of hires in a new geography, a reorganization of reporting lines, the departure of key personnel — that automated tools miss because they lack the contextual framework to interpret what they find.
In each of these domains, the cost of bad data is not a minor inconvenience. It is a material operational risk. And the risk is increasing as the data supply chain degrades.

VII. The Thesis
A. Data Literacy and the Sophistication Gap
The thesis advanced in this paper is not difficult to understand. It is difficult to act on — and the difficulty is not intellectual. It is organizational.
The companies that invest in proprietary data infrastructure share a common characteristic: they have at least one senior leader who understands the distinction between consuming data and building a data asset. That leader may sit in any of several functions — but where they sit determines how the company allocates its data spend, and the variance is enormous.
Chief Revenue Officers are the most naturally aligned buyers for this thesis. CROs see the full revenue engine: targeting, pipeline generation, conversion, and closed revenue. They experience directly the consequences of degraded data and they have the organizational authority to invest in infrastructure that compounds. When a CRO understands data provenance, the company builds a data program. The investment is evaluated not against the cost of a ZoomInfo subscription but against the pipeline quality and closed revenue it protects.
Chief Technology Officers and Chief Information Officers bring a different but complementary lens. CTOs who have managed data infrastructure understand upstream dependency, vendor risk, and the difference between a tool and a platform. They recognize the architectural fragility of waterfall enrichment built on third-party APIs whose upstream sources are degrading. When a CTO evaluates a data vendor, the question is not "what does it cost?" but "what does it depend on, and what happens when that dependency breaks?"
CEOs and owners — particularly in the middle market — represent the widest variance. Some have the institutional sophistication to see data as capital formation. These tend to be operators with backgrounds in private equity, finance, or consulting — executives who have seen how compounding advantages work across multi-year horizons. Others treat data as a procurement line item to be minimized. The difference is not intelligence. It is literacy — specifically, literacy about how data flows through a revenue organization and what happens at each stage when accuracy degrades.
Chief Marketing Officers are, structurally, the most common buyers of automated data tools — and the least likely to commission proprietary data programs. This is not a criticism of the function. It reflects how CMOs are typically evaluated: on pipeline volume, campaign efficiency, and cost per lead. These metrics incentivize speed and scale, which automated tools deliver. They do not incentivize accuracy and durability, which human research delivers. The downstream consequences of degraded data — lower conversion rates, longer sales cycles, wasted SDR effort — are absorbed by sales, not marketing. The cost center and the impact zone are in different functions, which means the feedback loop is broken.
Private equity operating partners represent a distinct and increasingly important category. PE firms that manage portfolio companies across multiple verticals see the data quality pattern repeated: the portfolio company that maintained data investment outperforms the one that cut it. Operating partners who have observed this pattern across three, five, ten portfolio companies develop an institutional view on data infrastructure that transcends any single company's budget cycle.
The literacy gap is the market opportunity. Its urgency is compounded by a measurable collapse in buyer trust across B2B. A survey of 625 U.S.-based B2B technology decision-makers by Rob Roy Consulting found that 73% believe most vendors fall short on honesty.(54) Research fielded across 750 marketing leaders by DemandScience found that 87% of organizations report their marketing investments yield unreliable or inflated signals(55) — clicks, downloads, and behavioral scores that do not convert to revenue. The commodity marketing playbook — volume outreach, generic content, feature-comparison landing pages — was built for a buyer who no longer exists. The contemporary B2B buyer is the most skeptical, most self-directed, and most resistant to undifferentiated vendor claims in the history of enterprise sales. The companies that reach this buyer will do so through demonstrated intellectual authority and verified competence — not through louder marketing of the same commodity claims that produced the trust deficit in the first place.
The B2B data enrichment market is projected to grow from approximately $5 billion in 2025 to $15 billion by 2033 — a tripling that cannot be explained by population growth, inflation, or incremental tool adoption alone. The growth is driven by literacy. As more companies experience the consequences of degraded automated data — and as more senior leaders develop the sophistication to connect data quality to revenue outcomes — the market will expand not because companies spend more on the same tools, but because they invest differently. The shift from consumption to accumulation, from subscriptions to proprietary programs, from departmental line items to strategic infrastructure — this is what $15 billion looks like. It is not more ZoomInfo licenses. It is more companies understanding that data is capital.
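As a sanity check on the cited projection, the implied compound annual growth rate from $5 billion in 2025 to $15 billion in 2033 can be computed directly. Assuming the eight-year window between those endpoints, it works out to roughly 14.7%, in line with the approximately 15% CAGR the underlying market reports assume.(50)

```python
# Implied CAGR of the $5B (2025) -> $15B (2033) market projection.
def implied_cagr(start, end, years):
    """Constant annual growth rate taking `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

rate = implied_cagr(5e9, 15e9, 2033 - 2025)
print(f"implied CAGR: {rate:.1%}")  # roughly 14.7%
```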
The expansion from $5 billion to $15 billion is not a forecast about technology adoption. It is a forecast about organizational learning.
B. The Ground Source of Truth
LinkedIn's position as the ground source of truth for professional data is not a temporary market condition. It is a structural feature of how professional identity works in the modern economy.
Professionals update their own LinkedIn profiles because doing so serves their career interests. No other platform has achieved this self-reinforcing dynamic at scale. Corporate directories are maintained by HR departments and lag reality by months. Email signatures change when IT departments get around to it. Company websites list leadership teams that may be months out of date. Only LinkedIn reflects professional identity in something approaching real time, because the professionals themselves maintain it.
This means that any data methodology — automated or manual — that does not ultimately source from LinkedIn is working with a derivative. And derivatives, by definition, decay relative to the source.
Human research accesses the source directly, compliantly, and with the judgment to interpret what the source reveals. Automated tools access derivatives of the source through intermediaries whose access is being systematically revoked.
The advantage is permanent because the source's incentive structure is permanent. As long as professionals update their own profiles — and they will, because it serves their interests — LinkedIn remains the ground truth, and human research remains the most reliable method of accessing it.

Conclusion
This paper has advanced a structural argument in seven parts. The argument is summarized here.
The architecture is fragile. The B2B data supply chain was built on an assumption of permanent, frictionless access to LinkedIn's professional database. Four generations of data provision — from Dun & Bradstreet's agents in the 1840s through the digital extraction era of the 2000s — reveal a cycle of scale, degradation, and reversion to verification. We are at the reversion point.
The disruption is permanent. LinkedIn, the ground source of truth for professional identity, is systematically revoking the unauthorized access on which an entire generation of data providers was built. Microsoft is protecting an asset worth an estimated $155–195 billion in standalone enterprise value. This is not a policy decision that will reverse. It is a fiduciary obligation that will intensify.
The market is pricing it. ZoomInfo's stock has declined approximately 92% from its all-time highs — not because the product failed, but because the lease expired. Clay's $3.1 billion valuation at 31x revenue prices in growth assumptions that depend on the same upstream access now being revoked. CoStar, which has spent $5 billion over 38 years building proprietary data through 1,500 human researchers, has delivered 58 consecutive quarters of double-digit revenue growth. The market is beginning to distinguish between companies that own their data and companies that rent it. The distinction determines durability.
The builder's model compounds. The flipper's model decays. Andrew Florance started with $10,000 in 1987 and has built a $40 billion company by investing in proprietary data collection every day for 38 years. The venture-backed data companies of the ZIRP era — ZoomInfo, Apollo, Clay — built on extracted data, scaled during permissive windows, created liquidity events for insiders, and left downstream stakeholders holding the structural risk. The incentive structure of venture capital rewards speed and exit velocity. Florance's model rewards patience and duration. The time horizons are incompatible, and only one produces assets that compound.
The 90/10 trade is an illusion. The prevailing assumption that 90% data accuracy at a fraction of the cost is rational is demonstrably wrong when evaluated against business outcomes rather than procurement budgets. The relationship between data accuracy and revenue is nonlinear — the last 10% is where deals happen or don't. A company that saves $45,000 on the data line and loses $400,000 in pipeline quality has not optimized. It has made the most expensive cheap decision on the P&L. The broken feedback loop between marketing (which buys the data) and sales (which absorbs the cost of bad data) is why this illusion persists — and why the companies that break through it pull ahead.
Human research is structural, not sentimental. It persists because it is the only data collection methodology that is simultaneously compliant with LinkedIn's Terms of Service, capable of judgment-based verification that automated tools cannot replicate, and able to produce proprietary data assets that compound rather than decay. The B2B imperative — high contract values, long sales cycles, small pools of qualified decision-makers — makes this structural advantage decisive.
The thesis is about duration. Data ownership compounds. Data rental decays. The companies, the investors, and the operating teams that understand data as long-term infrastructure — that invest in verification through the cycle, that maintain research capacity when others cut it, that recognize compliance and accuracy as appreciating assets rather than depreciable costs — will compound their advantage every quarter. The companies that chase short-term savings on the data line item will lose ground at an accelerating rate against competitors whose data infrastructure improves while theirs deteriorates.
The $5 billion B2B data enrichment market is projected to reach $15 billion by 2033. That expansion is not a forecast about technology adoption. It is a forecast about organizational learning — about how many companies will develop the literacy to see data as capital formation rather than procurement expense. The companies that see it first will build assets their competitors cannot replicate on any timeline. The companies that see it last will wonder why their pipeline stopped working.
The question facing every B2B company is not which data tool to buy. It is what their structural relationship to the source data looks like — and whether they are building an asset or consuming a commodity. Governance decides which side you are on.
Over the decade from 2015 to 2025, a generation of B2B companies was sold substandard data solutions built on undisclosed structural dependencies. The providers called it innovation. The market is now calling it what it was: extraction dressed as technology, renting dressed as ownership, commodity dressed as proprietary. The companies that recognize this — that see their data as sovereign intellectual property, and that invest in compliant providers who support the building of durable, verified data assets — will own what their competitors rent. That ownership is the foundation of American data sovereignty in B2B: the principle that the company that commissions the research, directs the methodology, and maintains the asset owns something no subscription can replicate and no competitor can purchase.
The companies that understand duration will act accordingly.
Sources
(1) Microsoft Corporation. "Fiscal Year 2025 Annual Report." Microsoft Investor Relations, 2025. LinkedIn member count and profile data.
(2) Microsoft Corporation. "Q4 and Full Year Fiscal Year 2025 Earnings Results." Microsoft Investor Relations, July 2025. LinkedIn revenue $17.8 billion; 9% year-over-year growth across all business lines.
(3) Microsoft Corporation. "Microsoft to Acquire LinkedIn." Press Release, June 13, 2016. Acquisition price $26.2 billion; transaction completed December 8, 2016.
(4) Microsoft Corporation. "Q4 Fiscal Year 2025 Earnings Call Transcript." Microsoft Investor Relations, July 2025. LinkedIn 1.2 billion members; four consecutive years of double-digit member growth.
(5) Microsoft Corporation. "Q2 Fiscal Year 2025 Earnings Call." January 2025. LinkedIn record $2 billion in Premium subscription revenue in the trailing twelve months.
(6) Morningstar. "Microsoft Corporation (MSFT) — Equity Research Report." 2025. Wide economic moat designation; LinkedIn identified as key revenue growth driver.
(7) Implied standalone value derived from three methodologies: (a) acquisition-multiple approach applying Microsoft's 2016 acquisition multiple of 8.7x revenue to LinkedIn's fiscal 2025 revenue of $17.8 billion (implied: ~$155B); (b) comparable-platform approach using 8x–15x revenue multiples for professional networking/data platforms with subscription models and network effects (implied range: ~$142B–$267B, midpoint ~$195B+); (c) replacement-cost approach acknowledging 1.2 billion self-maintained profiles represent a functionally irreplicable dataset.
(8) Lunden, Ingrid. "LinkedIn removes Apollo.io and Seamless.AI pages in enforcement action." TechCrunch, March 7, 2025.
(9) hiQ Labs, Inc. v. LinkedIn Corporation. N.D. Cal. Litigation filed 2017. Concluded with stipulated consent judgment and permanent injunction, December 2022. Permanent injunction requiring destruction of all scraped data, source code, and derived algorithms.
(10) LinkedIn Corporation. LinkedIn Sales Navigator and LinkedIn Recruiter product pages. LinkedIn.com, 2025.
(11) Data decay rates for professional B2B databases are widely documented. See: Dun & Bradstreet, "B2B Data Quality Report" (multiple editions); ZoomInfo published estimate of 30% annual contact obsolescence; Gartner data quality research. The 40% annual figure reflects the high-mobility segment of professional data (director-level and above) where decision-maker turnover exceeds average tenure benchmarks.
(12) Bureau of Labor Statistics. "Employee Tenure in 2024." USDL-24-1971, September 26, 2024. Median private-sector employee tenure: 3.5 years in January 2024. Workers ages 25–34: median 2.7 years. URL: https://www.bls.gov/news.release/tenure.nr0.htm
(13) ZoomInfo Technologies (ZI). Stock price history. Yahoo Finance. All-time high approximately $65 in November 2021; trading below $4 as of February 2026.
(14) ZoomInfo Technologies. "Q4 and Full Year 2025 Earnings Release." ZoomInfo Investor Relations, February 2026. Revenue guidance; declining customer counts.
(15) Wells Fargo Securities. "ZoomInfo Technologies (ZI) — Initiation of Coverage: Underweight." Wells Fargo Equity Research, 2024.
(16) ZoomInfo Technologies. "Q4 2025 Earnings Call Transcript." February 2026. CFO Graham O'Brien: "Over 20% of our total ACV is coming from Copilot after it more than doubled in 2025."
(17) Clay. Series C funding announcement, August 2025. $3.1 billion valuation; investors: Alphabet's CapitalG, Sequoia Capital, Meritech Capital. Reported by TechCrunch, The Information, and Bloomberg, August 2025.
(18) Clay. Official company announcement and investor communications, December 2025. Corroborated by Sacra Research, "Clay Revenue and Growth Analysis," January 2026.
(19) Clay. Company website and press materials, 2025. Customer list: clay.com.
(20) ZoomInfo Technologies. "Form S-1 Registration Statement." SEC Filing, June 2020. Raised $934 million at $21 per share; initial valuation approximately $8 billion. Revenue 2019: $293.3 million; 2018: $144.3 million (103% growth). Adjusted operating margin: 51%.
(21) Clay. Pricing documentation and credit-based billing model. clay.com/pricing, 2025.
(22) TA Associates. Investment in DiscoverOrg / ZoomInfo. PE Hub, June 2020; ZoomInfo Form S-1, SEC, June 2020. TA Associates' initial investment approximately $90 million in 2014; implied value at IPO approximately $6 billion based on disclosed ownership stake.
(23) The Carlyle Group. ZoomInfo investment and IPO stake. ZoomInfo Form S-1, SEC, June 2020; Beara Mergers capital markets analysis; Carlyle Group investor communications. Carlyle's 31% stake at IPO; approximately 13x return on original investment over approximately two years.
(24) CoStar Group. "Annual Report 2024" and "Investor Presentation Q4 2024." CoStar Group Investor Relations. $5 billion cumulative investment in data collection infrastructure; 1,500+ researcher workforce.
(25) CoStar Group. "Q4 2025 Earnings Release and Investor Presentation." CoStar Group Investor Relations, February 2026. 58 consecutive quarters of double-digit revenue growth; 95%+ recurring subscription revenue.
(26) Various biographical sources on Andrew Florance: CoStar Group corporate history; Australian Financial Review, "I am willing to eat nails: the story of a billionaire's climb," August 2025; Princeton University alumni records.
(27) Stewart, Cameron. "I am willing to eat nails: the story of a billionaire's climb." Australian Financial Review, August 2025. Profile of Andrew Florance following CoStar's acquisition of Domain.
(28) Florance, Andrew. CoStar Group Investor Day remarks. Reported in multiple analyst transcripts and financial press coverage. Also: CoStar Group Q4 earnings call transcript.
(29) Third Point LLC. Letter to CoStar Group Board of Directors. Public filing / press coverage, 2024–2025. Activist campaign regarding Homes.com investment intensity.
(30) CoStar Group. "CoStar Group to Acquire Domain Holdings Australia." Press Release, 2024. ASX announcement; transaction value A$2.8 billion (approximately USD $1.9 billion).
(31) Florance, Andrew. "VCU Commencement Address." Virginia Commonwealth University, May 2019. Video recording and transcript available via VCU commencement archives.
(32) S&P 500 CEO tenure analysis. See: Spencer Stuart, "S&P 500 CEO Tenure Analysis," 2025. Florance has served as CoStar Group CEO since founding in 1987, placing him among the longest-tenured chief executives of any S&P 500 constituent following Warren Buffett's planned succession.
(33) CoStar Group, Inc. v. Xceligent, Inc. Case No. 2:16-cv-01477 (E.D. Mo., filed December 2016). See also: CoStar Group press release, December 2016; court records.
(34) Court records: CoStar Group v. Xceligent. Philippines seizure operations conducted December 2016–January 2017; 35 terabytes of data recovered; hundreds of computers seized. Avion BPO Corp., Pasig City, Philippines.
(35) CoStar Group, Inc. v. Xceligent, Inc. Final judgment, 2018. $500 million judgment; described by CoStar as the largest-ever settlement in a copyrighted image lawsuit at time of entry. See also: CoStar Group press release, 2018.
(36) Daily Mail and General Trust PLC. Annual Report and financial statements, 2018. Full write-down of Xceligent investment (approximately $150 million).
(37) CoStar Group, Inc. v. CREXi, Inc. Case No. 2:20-cv-08819 (C.D. Cal.). Federal district court ruling, June 2025. Findings: "elaborate offshore scheme involving Indian-based agents"; internal "copy and crop" policy; internal communications evidencing deliberate instruction. See also: CoStar Group press release, June 2025.
(38) CoStar Group, Inc. v. Zillow, Inc. Filed July 2025, federal court. Allegation: copyright infringement involving tens of thousands of CoStar watermarked photographs displayed on Zillow, Redfin, and Realtor.com. See: CoStar Group press release, July 2025.
(39) Research.com. "CoStar Software Review 2026." Research.com, January 2026. Describes "outdated and non-intuitive interface causing a steep learning curve," "limited customization," and "lack of advanced data visualization tools."
(40) Duperrin, Bertrand. "LinkedIn UX in 2025: No Major Developments Since 2017." Duperrin.com, March 2025.
(41) Casado, Martin; Lauten, Peter. "The Empty Promise of Data Moats." Andreessen Horowitz (a16z), May 2019. Available: a16z.com. Authors argue defensibility "is not inherent to data itself" but acknowledge network-effect data moats as the strongest and rarest form of competitive advantage.
(42) ZoomInfo Technologies. "How ZoomInfo Builds Its Database." ZoomInfo.com, and ZoomInfo Technologies Form S-1, SEC Filing, June 2020. Description of data collection methodology including "proprietary technology," "contributory network," web crawling, and third-party data sources.
(43) ZoomInfo Technologies. Form S-1, SEC Filing, June 2020. "We crawl more than 28 million company websites and online sources daily."
(44) Wikipedia. "ZoomInfo." Wikipedia.org. Article describes ZoomInfo as "a registered data broker" that "collects and sells personal data through various means of data and web scraping." Cited as illustrative of public characterization versus proprietary marketing language.
(45) ZoomInfo Technologies. Form S-1, SEC Filing, June 2020. DiscoverOrg founding 2007; acquisition of ZoomInfo Inc. 2019; IPO June 4, 2020 at $21 per share, valuation approximately $8 billion.
(46) Multiple sources document ZoomInfo reaching $1 billion in ARR. See: ZoomInfo investor materials; financial press coverage. The milestone was achieved in approximately 15 years from founding of DiscoverOrg in 2007.
(47) Apollo.io. Crunchbase funding data; press coverage of funding rounds. Apollo raised $250 million in Series D (December 2021) and additional rounds. Total venture raised approximately $250 million+ per public records.
(48) Apollo.io. Official company announcement, May 2025. $150 million ARR milestone. Corroborated by Sacra Research, "Apollo Revenue Analysis," June 2025.
(49) The Information; Bloomberg. Reporting on Clay secondary share sales, 2024–2025. Clay publicly facilitated secondary transactions allowing employees to convert equity to cash in advance of any liquidity event.
(50) Multiple market research reports corroborate the $5 billion to $15 billion projection with a 15% CAGR through 2033. See: Grand View Research, "B2B Data Enrichment Market Size Report," 2024; MarketsandMarkets, "Data Enrichment Market — Global Forecast to 2033," 2024; Mordor Intelligence, "B2B Data Enrichment Market Analysis," 2025.
(51) NASSCOM. "Technology Sector in India: Strategic Review 2024." National Association of Software and Service Companies, 2024. India IT-BPM sector revenue; global market share; workforce statistics.
(52) hiQ Labs, Inc. v. LinkedIn Corporation. N.D. Cal. No. 17-cv-03301. Stipulation and consent judgment, December 2022. $500,000 judgment; permanent injunction; destruction of all scraped data, source code, and derived algorithms. hiQ Labs effectively dissolved following judgment.
(53) LinkedIn Corporation. "LinkedIn User Agreement." LinkedIn.com. Section 8.2 "Don'ts": prohibits development, support, or use of "software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services."
(54) Rob Roy Consulting; Cambia Information Group. "B2B Technology Buyer Trust Survey." Fielded to 625 U.S.-based B2B technology decision-makers at companies with at least 100 employees and $50,000 annual IT budget. Reported in Sword and the Script, "Skepticism is the New Normal: Survey Finds B2B Tech Has Trust Issues," February 2022. Statistic: 73% of respondents said they believe "most vendors fall short" of the honesty mark.
(55) DemandScience. "2026 State of Performance Marketing Report: Exposing the Marketing Data Mirage." December 17, 2025. Survey of 750 marketing leaders. Statistic: 87% of organizations report their marketing investments yield unreliable or inflated intent signals.
(56) United Nations Population Fund (UNFPA); World Bank. India demographic data, 2024. Approximately 65% of India's population is under age 35.
(57) Morningstar. "CoStar Group (CSGP) — Equity Research Report." 2025. Narrow moat designation based on proprietary data collection depth and switching costs. Note: Morningstar distinguishes CoStar's narrow moat (limited geographic expansion risk) from a wide moat, despite CoStar's operational characteristics resembling wide-moat businesses in its core CRE segment.
(58) CoStar Group. "Q4 2025 Earnings Release." CoStar Group Investor Relations, February 2026. 95%+ recurring subscription revenue reported across multiple consecutive periods.
(59) Dun & Bradstreet. "Our History." DNB.com. Company founded 1841 by Lewis Tappan as The Mercantile Agency; DUNS (Data Universal Numbering System) numbering system established 1963 and since adopted as a de facto commercial identifier across government procurement, trade credit, and financial reporting.
(60) Bloomberg LP. Company statistics and terminal subscriber data. Various press sources. Bloomberg Terminal: approximately 325,000–340,000 subscribers. Bloomberg LP ownership: Michael Bloomberg holds approximately 88% stake; Forbes Billionaires list estimates his net worth at $104.7 billion as of 2025, substantially derived from Bloomberg LP equity.
About Assivo
Assivo is an execution partner specializing in structured delivery across operations, finance, and data workflows. We build managed teams that run seamlessly and consistently — so leaders can focus on growth, not supervision.