2021-SIGIR-Path-based Deep Network for Candidate Item Matching in Recommenders (overview)

Path-based Deep Network for Candidate Item Matching in Recommenders

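2021-SIGIR-Alibaba, Ant Group

The paper describes the two mainstream matching (recall) approaches used in industry and their characteristics, and proposes PDN to unify the two.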

A matching stage is expected to retrieve a small fraction of relevant items with low latency and computational cost.
A ranking stage aims to refine the ranking of these relevant items in terms of the user's interest with more complex models.

This paper focuses on the matching stage.

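item-based CF (item-to-item collaborative filtering) estimates the relevance between two items from item co-occurrence patterns.
- Drawbacks: a traditional inverted index can hardly meet personalization needs; it only considers item co-occurrence and uses no side information, so it suffers from sparsity.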
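The item-based CF idea above can be illustrated with a minimal sketch. It assumes a cosine-style normalization of co-occurrence counts, which is one common choice and is not taken from the paper; the function names and the toy histories are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_similarity(user_histories):
    """Item-to-item similarity from co-occurrence counts (assumed cosine-style)."""
    item_count = defaultdict(int)   # number of histories containing each item
    pair_count = defaultdict(int)   # number of histories containing both items
    for items in user_histories:
        unique = sorted(set(items))
        for item in unique:
            item_count[item] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    sim = defaultdict(dict)
    for (a, b), count in pair_count.items():
        score = count / math.sqrt(item_count[a] * item_count[b])
        sim[a][b] = score
        sim[b][a] = score
    return sim


def build_inverted_index(sim, top_k=2):
    """For each item, keep its top_k most similar items (the inverted index)."""
    return {
        item: sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        for item, neighbors in sim.items()
    }


# Toy interaction histories (hypothetical data).
histories = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]]
sim = cooccurrence_similarity(histories)
index = build_inverted_index(sim)
```

Item "d" never co-occurs with "b" or "c", so the index holds no score for those pairs. This is the sparsity problem noted above, and the lookup is identical for every user, which is why personalization is hard.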
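EBR (embedding-based retrieval) usually uses a two-tower model. It embeds the features of u and i separately to represent the user and the item, turning the problem into nearest-neighbor search in the embedding space.
- Drawbacks: a two-tower model can hardly incorporate item co-occurrence explicitly; a user is always represented by a single embedding vector, which is ill-suited to encoding the user's diverse interests.
To capture both the diverse and the personalized interests of users, multiple strategies are usually combined (collaborative-filtering inverted indexes and EBR strategies with various network structures) (i.e., multi-channel recall?). These models run in parallel, and the candidate scores are computed on different scales, so the scores are hard to fuse directly.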
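A minimal sketch of the retrieval step in EBR, assuming the two towers have already produced fixed vectors and that relevance is the inner product. The vectors and item names are hypothetical, and a production system would use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_vec, item_vecs, k):
    """Brute-force top-k items by inner product with the user embedding."""
    scored = [(item, dot(user_vec, vec)) for item, vec in item_vecs.items()]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return scored[:k]


# Hypothetical outputs of the item tower and the user tower.
item_vecs = {
    "shoes": [1.0, 0.0],
    "boots": [0.9, 0.1],
    "novel": [0.0, 1.0],
}
user_vec = [0.8, 0.2]

top = retrieve(user_vec, item_vecs, k=2)
```

Because the user is a single point in the space, the retrieved items cluster around one direction ("shoes", "boots"), and a secondary interest such as "novel" is pushed out of the top-k.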

The match score between u and the target item consists of:

the direct u-i score
the scores of n 2-hop paths (u, history item, target item)
- TrigNet: the weight of the history item for u
- SimNet: the relevance of the history item to the target item
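The composition above can be sketched as follows. The notes only say that the score is made up of a direct term and n path terms; multiplying the TrigNet and SimNet outputs per path and summing everything is an assumption made here for illustration, not necessarily the paper's exact merge and aggregation functions. The numeric inputs stand in for the outputs of the two networks.

```python
def pdn_score(direct_score, trigger_weights, similarity_weights):
    """Combine a direct u-i score with n two-hop path scores.

    trigger_weights[j]:    stand-in for TrigNet output (history item j -> user)
    similarity_weights[j]: stand-in for SimNet output (history item j -> target)
    Merge (product) and aggregation (sum) are assumptions of this sketch.
    """
    if len(trigger_weights) != len(similarity_weights):
        raise ValueError("each path needs one trigger and one similarity weight")
    path_scores = [t * s for t, s in zip(trigger_weights, similarity_weights)]
    return direct_score + sum(path_scores), path_scores


# Three history items, hence three 2-hop paths (hypothetical values).
total, paths = pdn_score(
    direct_score=0.1,
    trigger_weights=[0.5, 0.2, 0.8],
    similarity_weights=[0.4, 0.0, 0.5],
)
```

A path contributes only when the history item matters to the user and is also related to the target item. The second path above contributes nothing because its similarity weight is zero.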

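Four feature fields: user field, user behavior field, item co-occurrence field, item field
user id, item id, age id, brand id, monthly sales, stay time, the statistical correlation between items…
All dense features are discretized and converted into one-hot features.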
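A minimal sketch of discretizing a dense feature into a one-hot vector. The bucket boundaries for monthly sales are hypothetical, since the notes do not say how the buckets are chosen.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Index of the bucket that value falls into, given sorted boundaries."""
    return bisect_right(boundaries, value)


def one_hot(index, size):
    """One-hot vector of the given size with a 1 at position index."""
    vec = [0] * size
    vec[index] = 1
    return vec


# Hypothetical boundaries: [<10), [10, 100), [100, 1000), [>=1000)
SALES_BOUNDARIES = [10, 100, 1000]


def encode_monthly_sales(value):
    """Discretize a dense monthly-sales value into a 4-way one-hot feature."""
    bucket = bucketize(value, SALES_BOUNDARIES)
    return one_hot(bucket, len(SALES_BOUNDARIES) + 1)


encoded = [encode_monthly_sales(v) for v in (5, 10, 250, 5000)]
```

After this step a dense feature can go through the same embedding lookup as an id feature such as user id or brand id.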


A shallow network over the selection-bias features (position feature, hour feature); it is added during training and dropped online.
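A minimal sketch of the training-only bias branch. It assumes the bias output is added to the match logit before a sigmoid; the way the two are combined, the linear form of the bias network, and the weights are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bias_logit(position, hour, w_position=-0.3, w_hour=0.01):
    """Shallow (here linear) network over selection-bias features.

    The weights are hypothetical; in practice they are learned.
    """
    return w_position * position + w_hour * hour


def predict_click(match_logit, bias=0.0, training=False):
    """Use the bias branch only during training; serve with the match logit alone."""
    logit = match_logit + bias if training else match_logit
    return sigmoid(logit)


b = bias_logit(position=3, hour=20)
p_train = predict_click(1.0, bias=b, training=True)
p_serve = predict_click(1.0, bias=b, training=False)
```

During training the bias branch absorbs the effect of display position and time of day, so the main network can learn a match score that does not depend on them. At serving time these features are unavailable for candidates that have not been shown, so the branch is dropped.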
