Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models
Sequential recommendation aims to estimate the dynamic user preferences and
sequential dependencies among historical user behaviors. Although
Transformer-based models have proven to be effective for sequential
recommendation, they suffer from the inference inefficiency problem stemming
from the quadratic computational complexity of attention operators, especially
for long behavior sequences. Inspired by the recent success of state space
models (SSMs), we propose Mamba4Rec, which is the first work to explore the
potential of selective SSMs for efficient sequential recommendation. Built upon
the basic Mamba block which is a selective SSM with an efficient hardware-aware
parallel algorithm, we design a series of sequential modeling techniques to
further promote model performance while maintaining inference efficiency.
Through experiments on public datasets, we demonstrate how Mamba4Rec
effectively tackles the effectiveness-efficiency dilemma, outperforming both
RNN- and attention-based baselines in terms of both effectiveness and
efficiency. The code is available at https://github.com/chengkai-liu/Mamba4Rec
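As a rough illustration of the idea (not the authors' implementation), a Mamba4Rec-style recommender can be sketched as an item-embedding layer followed by a selective-SSM block and a position-wise feed-forward network. The sketch below assumes the open-source mamba_ssm package; the layer names and hyperparameters are illustrative, and the Mamba kernels require a CUDA device to run.

import torch
import torch.nn as nn
from mamba_ssm import Mamba  # selective SSM block with a hardware-aware parallel scan

class Mamba4RecSketch(nn.Module):
    """Illustrative only: item embeddings -> Mamba block -> next-item scores."""
    def __init__(self, num_items, d_model=64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model, padding_idx=0)
        self.mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, item_seq):                      # (batch, seq_len) of item ids
        h = self.item_emb(item_seq)                   # (batch, seq_len, d_model)
        h = self.norm1(h + self.mamba(h))             # selective SSM with residual connection
        h = self.norm2(h + self.ffn(h))               # position-wise feed-forward
        return h[:, -1] @ self.item_emb.weight.T      # score every item from the last position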
Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference
While increasingly deep networks are still in general desired for achieving
state-of-the-art performance, for many specific inputs a simpler network might
already suffice. Existing works exploited this observation by learning to skip
convolutional layers in an input-dependent manner. However, we argue their
binary decision scheme, i.e., either fully executing or completely bypassing
one layer for a specific input, can be enhanced by introducing finer-grained,
"softer" decisions. We therefore propose a Dynamic Fractional Skipping (DFS)
framework. The core idea of DFS is to hypothesize layer-wise quantization (to
different bitwidths) as intermediate "soft" choices to be made between fully
utilizing and skipping a layer. For each input, DFS dynamically assigns a
bitwidth to both weights and activations of each layer, where fully executing
and skipping could be viewed as two "extremes" (i.e., full bitwidth and zero
bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive
power during input-adaptive inference, enabling finer-grained
accuracy-computational cost trade-offs. It presents a unified view to link
input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive
experimental results demonstrate the superior tradeoff between computational
cost and model expressive power (accuracy) achieved by DFS. More visualizations
also indicate a smooth and consistent transition in the DFS behaviors,
especially the learned choices between layer skipping and different
quantizations when the total computational budgets vary, validating our
hypothesis that layer quantization could be viewed as intermediate variants of
layer skipping. Our source code and supplementary material are available at
\url{https://github.com/Torment123/DFS}
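To make the idea concrete, here is a toy sketch (under assumptions, not the paper's code) of a layer whose bitwidth is chosen per input, with zero bits acting as a full skip and the residual path preserving the identity. The real DFS gates are trained jointly with the backbone; the hard argmax below is a simplification.

import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, bits):
    """Uniform quantization of a tensor clipped to [-1, 1] at the given bitwidth."""
    if bits == 0:
        return torch.zeros_like(x)
    levels = 2 ** bits - 1
    x = torch.clamp(x, -1.0, 1.0)
    return torch.round((x + 1) / 2 * levels) / levels * 2 - 1

class FractionalSkipLayer(nn.Module):
    def __init__(self, channels, choices=(0, 2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.choices = choices
        # tiny gate: globally pooled features -> one logit per bitwidth choice
        self.gate = nn.Linear(channels, len(choices))

    def forward(self, x):
        logits = self.gate(x.mean(dim=(2, 3)))              # (batch, num_choices)
        bits = self.choices[int(logits.argmax(dim=1)[0])]   # hard choice (sketch: one decision per batch)
        if bits == 0:                                        # zero bitwidth == skip the layer entirely
            return x
        w_q = quantize(self.conv.weight, bits)               # quantized weights
        out = F.conv2d(quantize(x, bits), w_q, self.conv.bias, padding=1)
        return x + out                                       # residual path keeps the identity mapping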
ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation
Large language models (LLMs) have been flourishing in the natural language
processing (NLP) domain, and their potential for recommendation has attracted
much attention. Despite the intelligence shown by recommendation-oriented
finetuned models, LLMs struggle to fully understand user behavior patterns due
to their innate weakness in interpreting numerical features and the overhead of
long contexts: the temporal relations among user behaviors, the subtle
quantitative signals among different ratings, and the various side features of
items are not well explored. Existing works only fine-tune a single LLM on the
given text data without introducing this important information, leaving these
problems unsolved. In this paper, we propose ELCoRec to Enhance Language
understanding with Co-Propagation of numerical and categorical features for
Recommendation. Concretely, we propose to inject preference understanding
capability into the LLM via a GAT expert model, where user preference is better
encoded by propagating, in parallel, the temporal relations, rating signals, and
various side information of historical items. The parallel propagation mechanism
can stabilize heterogeneous features and offer an informative user preference
encoding, which is then injected into the language model via soft prompting at
the cost of a single token embedding. To further capture the user's recent
interests, we propose a
novel Recent interaction Augmented Prompt (RAP) template. Experiment results
over three datasets against strong baselines validate the effectiveness of
ELCoRec. The code is available at
https://anonymous.4open.science/r/CIKM_Code_Repo-E6F5/README.md
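A rough sketch of the injection path described above (names, dimensions, and the pooling choice are assumptions, and PyTorch Geometric is assumed for the GAT): the GAT expert encodes the user's history graph, and the pooled preference vector is projected into the LLM's embedding space and prepended as a single soft-prompt token.

import torch
import torch.nn as nn
from torch_geometric.nn import GATConv   # assumes PyTorch Geometric is installed

class GATExpert(nn.Module):
    def __init__(self, in_dim, hidden_dim, llm_dim):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden_dim, heads=2, concat=False)
        self.gat2 = GATConv(hidden_dim, hidden_dim, heads=2, concat=False)
        self.proj = nn.Linear(hidden_dim, llm_dim)    # map to the LLM's hidden size

    def forward(self, node_feats, edge_index):
        h = torch.relu(self.gat1(node_feats, edge_index))
        h = self.gat2(h, edge_index)
        user_pref = h.mean(dim=0, keepdim=True)        # pool nodes -> one preference vector
        return self.proj(user_pref)                    # (1, llm_dim): a single soft token

def prepend_soft_prompt(token_embeddings, soft_token):
    """Inject the preference encoding at the cost of one token embedding."""
    # token_embeddings: (seq_len, llm_dim); soft_token: (1, llm_dim)
    return torch.cat([soft_token, token_embeddings], dim=0)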
M-scan: A Multi-Scenario Causal-driven Adaptive Network for Recommendation
We primarily focus on the field of multi-scenario recommendation, which poses
a significant challenge in effectively leveraging data from different scenarios
to enhance predictions in scenarios with limited data. Current mainstream
efforts mainly center around innovative model network architectures, with the
aim of enabling the network to implicitly acquire knowledge from diverse
scenarios. However, such implicit learning is uncertain because explicit
modeling is absent, leading not only to training difficulties but also to
incomplete user representations and suboptimal performance. Furthermore,
through causal graph analysis, we have discovered that the scenario itself
directly influences click behavior, yet existing approaches incorporate data
from other scenarios when training on the current scenario, which introduces
prediction biases once click behaviors from other scenarios are directly used
to train the model. To address these problems, we propose the Multi-Scenario
Causal-driven Adaptive Network (M-scan). This model
incorporates a Scenario-Aware Co-Attention mechanism that explicitly extracts
user interests from other scenarios that align with the current scenario.
Additionally, it employs a Scenario Bias Eliminator module utilizing causal
counterfactual inference to mitigate biases introduced by data from other
scenarios. Extensive experiments on two public datasets demonstrate the
efficacy of our M-scan compared to the existing baseline models.
Comment: This paper has been accepted by WWW'24.
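For illustration, the co-attention step can be sketched roughly as follows (a simplified, assumed formulation rather than the paper's exact architecture): the current-scenario representation queries the other-scenario behaviors so that only interests aligned with the current scenario are extracted.

import torch
import torch.nn as nn

class ScenarioCoAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # query from the current-scenario context
        self.k = nn.Linear(dim, dim)   # keys from other-scenario behaviors
        self.v = nn.Linear(dim, dim)

    def forward(self, cur_scenario_repr, other_behaviors):
        # cur_scenario_repr: (batch, dim); other_behaviors: (batch, n, dim)
        q = self.q(cur_scenario_repr).unsqueeze(1)                       # (batch, 1, dim)
        k, v = self.k(other_behaviors), self.v(other_behaviors)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return (attn @ v).squeeze(1)   # other-scenario interests aligned with the current scenario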
Large Language Models Make Sample-Efficient Recommender Systems
Large language models (LLMs) have achieved remarkable progress in the field
of natural language processing (NLP), demonstrating impressive abilities in
producing human-like text for various tasks. This opens up
new opportunities for employing them in recommender systems (RSs). In this
paper, we specifically examine the sample efficiency of LLM-enhanced
recommender systems, which pertains to the model's capacity to attain superior
performance with a limited quantity of training data. Conventional
recommendation models (CRMs) often need a large amount of training data because
of the sparsity of features and interactions. Hence, we propose and verify our
core viewpoint: Large Language Models Make Sample-Efficient Recommender
Systems. We propose a simple yet effective framework (i.e., Laser) to validate
the viewpoint from two aspects: (1) LLMs themselves are sample-efficient
recommenders; and (2) LLMs, as feature generators and encoders, make CRMs more
sample-efficient. Extensive experiments on two public datasets show that Laser
requires only a small fraction of training samples to match or even surpass
CRMs that are trained on the entire training set, demonstrating superior sample
efficiency.
Comment: Accepted by Frontiers of Computer Science.
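As a minimal sketch of the second aspect (LLMs as feature generators and encoders), the snippet below uses a frozen Hugging Face text encoder as a stand-in for the language model; the model name and the pooling strategy are placeholders, not the framework's actual choices.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder encoder
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def text_feature(text: str) -> torch.Tensor:
    """Encode a textual user/item description into a dense feature vector."""
    toks = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = encoder(**toks).last_hidden_state        # (1, seq_len, hidden)
    return hidden.mean(dim=1).squeeze(0)              # mean-pooled text embedding

# The conventional recommendation model then consumes [id embeddings ; text_feature]
# as its input vector, which is where the sample-efficiency gain is expected to come from.
item_vec = text_feature("A 1995 animated comedy about toys that come to life.")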
Towards Efficient and Effective Unlearning of Large Language Models for Recommendation
The significant advancements in large language models (LLMs) give rise to a
promising research direction, i.e., leveraging LLMs as recommenders (LLMRec).
The efficacy of LLMRec arises from the open-world knowledge and reasoning
capabilities inherent in LLMs. LLMRec acquires the recommendation capabilities
through instruction tuning based on user interaction data. However, in order to
protect user privacy and optimize utility, it is also crucial for LLMRec to
intentionally forget specific user data, which is generally referred to as
recommendation unlearning. In the era of LLMs, recommendation unlearning poses
new challenges for LLMRec in terms of \textit{inefficiency} and
\textit{ineffectiveness}. Existing unlearning methods require updating billions
of parameters in LLMRec, which is costly and time-consuming. Moreover, they
inevitably hurt the model's utility during the unlearning process. To this end, we
propose \textbf{E2URec}, the first \underline{E}fficient and
\underline{E}ffective \underline{U}nlearning method for LLM\underline{Rec}. Our
proposed E2URec enhances the unlearning efficiency by updating only a few
additional LoRA parameters, and improves the unlearning effectiveness by
employing a teacher-student framework, where we maintain multiple teacher
networks to guide the unlearning process. Extensive experiments show that
E2URec outperforms state-of-the-art baselines on two real-world datasets.
Specifically, E2URec can efficiently forget specific data without affecting
recommendation performance. The source code is at
\url{https://github.com/justarter/E2URec}.
Comment: Accepted by Frontiers of Computer Science.
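A loose sketch of the teacher-student unlearning objective implied by the abstract (the specific losses and teacher construction here are assumptions): with only the LoRA parameters trainable, the student is pulled toward a "forgetting" teacher on the data to be removed and toward a "remembering" teacher on the data to keep.

import torch.nn.functional as F

def unlearning_loss(student_logits_forget, forget_teacher_logits,
                    student_logits_retain, retain_teacher_logits):
    # push predictions on forgotten data toward a teacher that never saw that data
    forget_kl = F.kl_div(F.log_softmax(student_logits_forget, dim=-1),
                         F.softmax(forget_teacher_logits, dim=-1),
                         reduction="batchmean")
    # keep predictions on the remaining data close to the original model's behavior
    retain_kl = F.kl_div(F.log_softmax(student_logits_retain, dim=-1),
                         F.softmax(retain_teacher_logits, dim=-1),
                         reduction="batchmean")
    return forget_kl + retain_kl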
Behavior-Dependent Linear Recurrent Units for Efficient Sequential Recommendation
Sequential recommender systems aim to predict the users' next interaction through user behavior modeling with various operators like RNNs and attention. However, existing models generally fail to achieve the three golden principles for sequential recommendation simultaneously, i.e., training efficiency, low-cost inference, and strong performance. To this end, we propose RecBLR, an Efficient Sequential Recommendation Model based on Behavior-Dependent Linear Recurrent Units, to accomplish the impossible triangle of the three principles. By incorporating gating mechanisms and behavior-dependent designs into linear recurrent units, our model significantly enhances user behavior modeling and recommendation performance. Furthermore, we unlock parallelizable training as well as inference efficiency by designing a hardware-aware scanning acceleration algorithm with a customized CUDA kernel. Extensive experiments on real-world datasets with varying lengths of user behavior sequences demonstrate RecBLR's remarkable effectiveness in simultaneously achieving all three golden principles - strong recommendation performance, training efficiency, and low-cost inference - while exhibiting excellent scalability to datasets with long user interaction histories.
Comment: Accepted to CIKM 2024.
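The behavior-dependent linear recurrence can be sketched as below (the sequential loop is shown purely for clarity; the paper's contribution includes a hardware-aware parallel scan with a customized CUDA kernel, and the gate design here is an assumption, not the authors' exact formulation).

import torch
import torch.nn as nn

class BehaviorDependentLRU(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.decay_gate = nn.Linear(dim, dim)   # how much of the past state to keep
        self.input_gate = nn.Linear(dim, dim)   # how much of the current behavior to write

    def forward(self, x):                        # x: (batch, seq_len, dim) behavior embeddings
        batch, seq_len, dim = x.shape
        h = x.new_zeros(batch, dim)
        outputs = []
        for t in range(seq_len):
            a = torch.sigmoid(self.decay_gate(x[:, t]))   # behavior-dependent decay
            b = torch.sigmoid(self.input_gate(x[:, t]))   # behavior-dependent input gate
            h = a * h + b * x[:, t]                       # linear recurrent update
            outputs.append(h)
        return torch.stack(outputs, dim=1)                # hidden state at every position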
Dual Dynamic Inference: Enabling More Efficient, Adaptive and Controllable Deep Inference
State-of-the-art convolutional neural networks (CNNs) yield record-breaking
predictive performance, yet at the cost of high-energy-consumption inference,
which prohibits their wide deployment in resource-constrained Internet of
Things (IoT) applications. We propose a dual dynamic inference (DDI) framework
that highlights the following aspects: 1) we integrate both input-dependent and
resource-dependent dynamic inference mechanisms under a unified framework in
order to fit the varying IoT resource requirements in practice. DDI is able to
both constantly suppress unnecessary costs for easy samples and halt
inference for all samples when hard resource constraints are enforced; 2) we
propose a flexible multi-grained learning to skip (MGL2S) approach for
input-dependent inference which allows simultaneous layer-wise and channel-wise
skipping; 3) we extend DDI to complex CNN backbones such as DenseNet and show
that DDI can be applied towards optimizing any specific resource goals
including inference latency or energy cost. Extensive experiments demonstrate
the superior inference accuracy-resource trade-off achieved by DDI, as well as
the flexibility to control such trade-offs compared to existing peer methods.
Specifically, DDI can achieve up to 4 times computational savings with the same
or even higher accuracy as compared to existing competitive baselines.
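A toy sketch of the multi-grained (layer-wise plus channel-wise) skipping idea follows; the hard thresholds are used only for readability, while the actual MGL2S gates are learned with the backbone, and the block design is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGrainedSkipBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.layer_gate = nn.Linear(channels, 1)           # decide: skip the whole layer?
        self.channel_gate = nn.Linear(channels, channels)  # decide: which channels to compute?

    def forward(self, x):
        ctx = x.mean(dim=(2, 3))                            # cheap global descriptor of the input
        if torch.sigmoid(self.layer_gate(ctx)).mean() < 0.5:
            return x                                        # layer-wise skip: identity path only
        mask = (torch.sigmoid(self.channel_gate(ctx)) > 0.5).float()   # channel-wise skipping
        out = F.relu(self.conv(x)) * mask[:, :, None, None]
        return x + out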
ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
With large language models (LLMs) achieving remarkable breakthroughs in
natural language processing (NLP) domains, LLM-enhanced recommender systems
have received much attention and are being actively explored. In this
paper, we focus on adapting and empowering a pure large language model for
zero-shot and few-shot recommendation tasks. First and foremost, we identify
and formulate the lifelong sequential behavior incomprehension problem for LLMs
in recommendation domains, i.e., LLMs fail to extract useful information from
the textual context of a long user behavior sequence, even if the context length
is far from reaching the context limitation of LLMs. To address such an issue
and improve the recommendation performance of LLMs, we propose a novel
framework, namely Retrieval-enhanced Large Language models (ReLLa) for
recommendation tasks in both zero-shot and few-shot settings. For zero-shot
recommendation, we perform semantic user behavior retrieval (SUBR) to improve
the data quality of testing samples, which greatly reduces the difficulty for
LLMs to extract the essential knowledge from user behavior sequences. As for
few-shot recommendation, we further design retrieval-enhanced instruction
tuning (ReiT) by adopting SUBR as a data augmentation technique for training
samples. Specifically, we develop a mixed training dataset consisting of both
the original data samples and their retrieval-enhanced counterparts. We conduct
extensive experiments on a real-world public dataset (i.e., MovieLens-1M) to
demonstrate the superiority of ReLLa compared with existing baseline models, as
well as its capability for lifelong sequential behavior comprehension.
Comment: Under Review.
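A small sketch of the retrieval step (SUBR) described above, assuming the behavior texts and the target-item text have already been embedded by some semantic encoder; the function name and top-k choice are illustrative.

import torch
import torch.nn.functional as F

def subr(target_emb: torch.Tensor, behavior_embs: torch.Tensor, k: int = 10):
    """Return indices of the k behaviors most semantically similar to the target item."""
    # target_emb: (dim,); behavior_embs: (n_behaviors, dim) precomputed text embeddings
    sims = F.cosine_similarity(behavior_embs, target_emb.unsqueeze(0), dim=-1)
    topk = torch.topk(sims, k=min(k, behavior_embs.size(0))).indices
    return topk.sort().values   # keep the retrieved behaviors in chronological order

# The retrieved behaviors are then verbalized into the prompt (zero-shot), or used to build
# retrieval-enhanced training samples for instruction tuning (ReiT, few-shot).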
DisCo: Towards Harmonious Disentanglement and Collaboration between Tabular and Semantic Space for Recommendation
Recommender systems play important roles in various applications such as
e-commerce, social media, etc. Conventional recommendation methods usually
model the collaborative signals within the tabular representation space.
Despite their strengths in personalization modeling and efficiency, such methods
omit the latent semantic dependencies. Methods that introduce semantics into
recommendation have therefore emerged, injecting knowledge from the semantic
representation space where general language understanding is compressed. However, existing
semantic-enhanced recommendation methods focus on aligning the two spaces,
during which the representations of the two spaces tend to get close while the
unique patterns are discarded and not well explored. In this paper, we propose
DisCo to Disentangle the unique patterns from the two representation spaces and
Collaborate the two spaces for recommendation enhancement, where both the
specificity and the consistency of the two spaces are captured. Concretely, we
propose 1) a dual-side attentive network to capture the intra-domain patterns
and the inter-domain patterns, 2) a sufficiency constraint to preserve the
task-relevant information of each representation space and filter out the
noise, and 3) a disentanglement constraint to prevent the model from discarding
the unique information. These modules strike a balance between disentanglement
and collaboration of the two representation spaces to produce informative
pattern vectors, which could serve as extra features and be appended to
arbitrary recommendation backbones for enhancement. Experiment results validate
the superiority of our method against different models and the compatibility of
DisCo over different backbones. Various ablation studies and efficiency
analyses are also conducted to justify each model component.
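To give a flavor of the balance between collaboration and disentanglement (the concrete constraints in the paper may differ; the two functions below are assumptions): cross-space attention lets one space borrow patterns from the other, while a simple decorrelation-style penalty discourages the two pattern vectors from collapsing onto each other.

import torch
import torch.nn.functional as F

def disentanglement_loss(tabular_vec, semantic_vec):
    # penalize high similarity so each space keeps its unique patterns
    return F.cosine_similarity(tabular_vec, semantic_vec, dim=-1).pow(2).mean()

def dual_side_attention(query, keys, values):
    # cross-space attention, e.g. a tabular-space query attending over semantic-space tokens
    attn = torch.softmax(query @ keys.transpose(-2, -1) / keys.size(-1) ** 0.5, dim=-1)
    return attn @ values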
