RecSys for B2B platform

I am working on RecSys to generate product recommendations for ABI’s B2B platform, BEEs. Some of the challenges in the project include building AutoML for hyper-parameter selection, distributed model training, feature store integration, building a Python library of curated ML models with default configs, and deploying models on cloud-native compute, among many others. I am super excited to work on this workstream with an amazing team.

  • Cross validation: How to perform cross validation for RecSys. How to link statistical metrics with business KPIs. Determining the weighting between model goodness of fit and business KPIs. How to create a scoring function that can compare different models during cross validation. Managing the splitting strategy to ensure that models are comparable.
  • Model selection: A single model, a market-based model, or a hybrid model that combines two or more models? Time/sequence-based models (LSTM/GRU)?
  • Hyper-parameter tuning: Which hyper-parameter tuning framework is preferable, given that it must support GPU (Wide and Deep), Spark (ALS) and CPU (SAR etc.) workloads?
  • KPIs: Evaluate existing KPIs such as MAP@K and NDCG@K, and improve them if possible.
  • Hybrid model or mixture of models: What type of hybrid - sequential, parallel or weighted? As of now, two use cases (conceptually).
  • AutoML: Example of AutoML for multi-country setup (including hybrid model, hyper-parameter tuning) with recommended tech stack.
  • Model drift, data drift, retraining and model monitoring: How to build a framework that integrates with the Python library to detect model drift and data drift, flag retraining requirements, and monitor generated results for both online and offline models.
  • Others: Backtesting, A/B testing, linking online and offline evaluation.
  • Codespaces: How a developer can use Codespaces for CPU-based workflows in day-to-day development. Managing multiple environments based on Spark/GPU/CPU dependencies using devcontainers. Can the same image be used in AML/ADB?
  • AML + VS Code/Codespaces integration: Attaching AML compute to VS Code as a terminal and Jupyter kernel. Running experiments in AML without leaving VS Code/Codespaces. Triggering multiple concurrent jobs in AML from VS Code that scale across multiple nodes to run different models and return results to cloud storage in a fan-out/fan-in pattern. (This is not always the same as distributed model training: some of our models are classical models that we run many times as an embarrassingly parallel workload. We also want to use all the cores within a node via joblib, so the autoscaling we expect is at node level for a given threshold.) MLflow integration with VS Code and AML.
  • ADB + VS Code/Codespaces integration: Running experiments in ADB without leaving VS Code/Codespaces.
  • Debugging: Using VS Code visual debugger in a distributed workflow in AML & ADB.
  • Observability: Monitoring aggregated logs from different nodes in VS Code.
  • Testing: How to run property-based testing for ML models in distributed compute environments.
  • Library: Managing multiple dependencies such as PySpark and GPU- and CPU-level system dependencies. Using JIT compilation within and across models while taking the execution infrastructure into account. Making the library infrastructure-agnostic.
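
To make the cross-validation and KPI questions above a little more concrete, here is a minimal Python sketch of per-user AP@K and NDCG@K with binary relevance (MAP@K is the mean of AP@K over users), plus one possible way to fold a statistical metric and a business KPI into a single score. The function names, the AP@K normalisation by min(|relevant|, K), and the weighted-average `composite_score` are illustrative assumptions, not the scoring used in the project.

```python
import math
from typing import Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@K for one user. Averaging this over users gives MAP@K."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: normalise by the best achievable number of hits in the top K.
    return precision_sum / min(len(relevant), k)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@K for one user with binary relevance (gain is 1 for a relevant item)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def composite_score(stat_metric: float, business_kpi: float, weight: float = 0.5) -> float:
    """Hypothetical scoring function: a convex combination of two scores.

    Both inputs are assumed to be scaled to [0, 1] beforehand; `weight` is the
    share given to the statistical metric.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * stat_metric + (1.0 - weight) * business_kpi


# Example: one user's top-3 recommendations against two purchased items.
recommended = ["a", "b", "c", "d"]
purchased = {"a", "c"}
ap = average_precision_at_k(recommended, purchased, k=3)
ndcg = ndcg_at_k(recommended, purchased, k=3)
score = composite_score(ndcg, business_kpi=0.4, weight=0.25)
```

A single weight keeps models comparable across folds only if every model is scored on the same splits and with the same scaling of the business KPI; how to choose that weight is exactly the open question in the list above.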

If you are excited about solving the challenges mentioned above, feel free to reach out to me.

Aritra Biswas
Senior ML Engineer

My research interests include computational statistics, causal inference, simulation and mathematical optimization.
