The Lakehouse is a shared knowledge base that enrichment reads from when filling in your products. It holds structured product data, brand technologies, and the source documents those were extracted from. Enrichment consults the Lakehouse before scraping the open web, so data your organization has already contributed — or that is shared across organizations — can complete a product without another round trip to a brand site. The Lakehouse is a source, not a destination for your catalog: you don’t add sellable products to it directly. You contribute to it by uploading documents, and it feeds enrichment.
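The lookup order described above can be sketched as a simple fallback chain. This is an illustrative sketch only: `resolve_attribute`, the `Lookup` type, and the two lookup callables are hypothetical names, not part of any documented MerchantOps API.

```python
from typing import Callable, Optional

# Hypothetical lookup signature: (product_id, attribute) -> value,
# or None when the source has nothing for that attribute.
Lookup = Callable[[str, str], Optional[str]]


def resolve_attribute(
    product_id: str,
    attribute: str,
    lakehouse_lookup: Lookup,
    web_lookup: Lookup,
) -> tuple[Optional[str], str]:
    """Return (value, source), consulting the Lakehouse before the open web."""
    value = lakehouse_lookup(product_id, attribute)
    if value is not None:
        # The Lakehouse already has the value, so the web is never touched.
        return value, "lakehouse"

    value = web_lookup(product_id, attribute)
    if value is not None:
        return value, "web"

    # Neither source could supply the attribute.
    return None, "none"
```

The point of the ordering is that the web lookup is only called when the Lakehouse returns nothing, which is what saves the extra round trip to a brand site.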

What the Lakehouse holds

Lakehouse products: extracted product data
Structured product records pulled from uploaded documents — names, brands, and attributes. Enrichment matches these against your catalog products to fill in missing values.

Technologies: brand technologies
Named brand technologies and features (materials, cushioning systems, and the like), with the source document they came from. These enrich the products that reference them.

Documents: the source files
The uploaded files — spec sheets, catalogs, and similar — that products and technologies are extracted from. Each document keeps a link back to what was derived from it.
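To make "fill in missing values" concrete, here is a minimal sketch of how a Lakehouse product record might complete a catalog product. The `LakehouseProduct` shape, the `document_id` back-link field, and matching on a normalized brand and name are all assumptions made for illustration; the actual matching logic is not described here.

```python
from dataclasses import dataclass


@dataclass
class LakehouseProduct:
    """Hypothetical shape of an extracted product record."""
    name: str
    brand: str
    attributes: dict[str, str]
    document_id: str  # assumed back-link to the source document


def match_key(brand: str, name: str) -> tuple[str, str]:
    # Assumed matching rule: case- and whitespace-insensitive brand + name.
    return (brand.strip().lower(), name.strip().lower())


def fill_missing(catalog_product: dict, records: list[LakehouseProduct]) -> dict:
    """Return a copy of the catalog product with empty attributes filled in."""
    index = {match_key(r.brand, r.name): r for r in records}
    record = index.get(match_key(catalog_product["brand"], catalog_product["name"]))

    enriched = dict(catalog_product)  # never mutate the caller's product
    if record is None:
        return enriched

    for key, value in record.attributes.items():
        # Only fill gaps; values already on the catalog product are kept.
        if enriched.get(key) in (None, ""):
            enriched[key] = value
    return enriched
```

In this sketch, existing catalog values always win, so the Lakehouse only ever supplies what is missing.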

Documents and extraction

You populate the Lakehouse by uploading documents. Extraction then reads each file and proposes the products and technologies it found.
1. Upload the file

Upload one or more documents and tag them with the brand and category they describe. Each upload is tracked as a Job.

2. Extraction runs in the background

MerchantOps reads each document and extracts candidate products and technologies, each with a confidence score. Extraction runs as a tracked Job.

3. You review the column mapping

For spreadsheet documents, MerchantOps proposes how each column maps to a known field and shows you a preview. You confirm, adjust, or reject the mapping before the extracted data is accepted. This is a human-in-the-loop review step, distinct from the automatic column mapping used when uploading products.

4. Extracted data enters the Lakehouse

Once approved, the extracted products and technologies become available to enrichment. Low-confidence extractions can be flagged for review.

Documents themselves always belong to the organization that uploaded them. Whether the data extracted from them is shared more broadly is controlled by your sharing settings, below.
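The review and flagging steps above can be sketched roughly as follows. Everything named here is hypothetical: the `Extraction` shape, the `REVIEW_THRESHOLD` value of 0.8, and the decision strings `"confirm"`, `"adjust"`, and `"reject"` are illustrative assumptions, not documented MerchantOps behavior.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 0.8  # assumed cutoff; the real threshold is not documented here


@dataclass
class Extraction:
    kind: str          # "product" or "technology"
    name: str
    confidence: float  # score attached during extraction


def review_mapping(
    proposed: dict[str, str],
    decision: str,
    overrides: Optional[dict[str, str]] = None,
) -> Optional[dict[str, str]]:
    """Apply a reviewer's decision to a proposed column-to-field mapping."""
    if decision == "confirm":
        return dict(proposed)
    if decision == "adjust":
        # Reviewer corrections take precedence over the proposal.
        return {**proposed, **(overrides or {})}
    if decision == "reject":
        return None  # nothing from this document is accepted
    raise ValueError(f"unknown decision: {decision!r}")


def split_by_confidence(
    extractions: list[Extraction],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[list[Extraction], list[Extraction]]:
    """Partition extractions into (confident, flagged_for_review)."""
    confident = [e for e in extractions if e.confidence >= threshold]
    flagged = [e for e in extractions if e.confidence < threshold]
    return confident, flagged
```

For example, a reviewer who accepts the proposal but corrects one column would call `review_mapping(proposed, "adjust", {"Colour": "color"})`; the column name here is made up for the example.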

Per-org sharing

The Lakehouse can be shared across organizations, and what you share is configurable. By default, product and technology data is shared, while MAP pricing data is kept private. Changing a sharing setting affects future writes only — data already contributed keeps the sharing decision it was written with. Manage this at Lakehouse sharing settings.
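The "future writes only" rule is easiest to see if each record carries the sharing decision that was in force when it was written. The sketch below assumes exactly that; `OrgLakehouse`, `Record`, and the setting keys are hypothetical names, while the default values mirror the defaults stated above.

```python
from dataclasses import dataclass, field

# Defaults as described above: product and technology data shared,
# MAP pricing data private.
DEFAULT_SHARING = {"product": True, "technology": True, "map_pricing": False}


@dataclass
class Record:
    kind: str
    payload: dict
    shared: bool  # stamped once, at write time


@dataclass
class OrgLakehouse:
    sharing: dict = field(default_factory=lambda: dict(DEFAULT_SHARING))
    records: list = field(default_factory=list)

    def write(self, kind: str, payload: dict) -> Record:
        # The record copies the current setting instead of referencing it.
        record = Record(kind, payload, shared=self.sharing[kind])
        self.records.append(record)
        return record

    def set_sharing(self, kind: str, shared: bool) -> None:
        # Only the setting changes; existing records are left untouched.
        self.sharing[kind] = shared
```

Under this model, turning product sharing off and then writing a new product yields a private record, while products written earlier remain shared.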

How enrichment works

How enrichment consults the Lakehouse before the open web.

MAP policies

The pricing data the Lakehouse can hold, kept private by default.

Sharing settings

Choose what your organization shares into the Lakehouse.

Uploading products

Adding sellable products to your catalog, as opposed to the Lakehouse.