Abstract

Accurately estimating substitution patterns in e-commerce is difficult because most demand models rely on hand-coded product attributes that are often missing or incomplete. I propose a multimodal-embedding approach that replaces those attributes with low-dimensional features extracted from product images and text by pre-trained deep-learning models. Principal components of these embeddings enter a mixed logit alongside price, allowing visual and textual similarity to discipline cross-price elasticities. Applied to 3,478 Amazon purchases in a 25-item Headsets category, adding just two image principal components from a ResNet-50 encoder lowers the Akaike Information Criterion by 296 points relative to a price-only logit and reduces the out-of-sample mean absolute error of market-share forecasts by 22%. Diversion ratios become more concentrated, raising the category-level Herfindahl–Hirschman Index from 0.073 to 0.088 (+21%), which reveals tighter competition within visually defined sub-segments such as mid-range gaming headsets. These results demonstrate that information already present in product pages can materially improve demand estimation, even when no structured attributes are available.
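The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's estimation code: the embedding matrix is synthetic random data standing in for ResNet-50 pooled image features, and the logit coefficients are made-up placeholder values rather than estimated parameters. It shows only the mechanics of reducing high-dimensional embeddings to two principal components and entering them, alongside price, into a logit share formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: 25 products x 2048 dims, standing in for
# ResNet-50 pooled features of product images (real features would come
# from running each image through the pre-trained encoder).
J, D = 25, 2048
E = rng.normal(size=(J, D))

# PCA via SVD of the centered embedding matrix; keep the top two
# components, matching the two image principal components in the abstract.
E_centered = E - E.mean(axis=0)
_, _, Vt = np.linalg.svd(E_centered, full_matrices=False)
pcs = E_centered @ Vt[:2].T   # shape (25, 2)

# Logit utility: price plus the two image PCs. Coefficient values here
# are purely illustrative, not estimates from the paper.
price = rng.uniform(20.0, 150.0, size=J)
beta_price = -0.02
beta_pc = np.array([0.5, -0.3])
v = beta_price * price + pcs @ beta_pc

# Logit choice probabilities (softmax with a max-shift for stability).
shares = np.exp(v - v.max())
shares /= shares.sum()
```

The resulting `shares` vector sums to one across the 25 products; in the paper's mixed-logit setting, the coefficients on price and the principal components would be estimated from purchase data (with random heterogeneity) rather than fixed as above.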
