Knowledge Sharing
At Hyletic, we believe in the power of knowledge sharing to foster growth, innovation, and collaboration within our organization. Our knowledge sharing initiatives aim to create a culture of continuous learning, where employees can freely exchange ideas, insights, and expertise to drive collective success.
Early in the formation of the ENG Team, we started an issue to share articles, books, and resources within our team and across Hyletic. We are moving some of these resources to a shared handbook page, which we aim to update frequently.
Our approach to performance problems
We are life-long learners. Every new challenge we take on teaches us lessons, and we document them because we think other teams can benefit.
Table of Contents
- Knowledge Sharing
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team structure
- Fails
Data Quality
- Reliable and Scalable Data Ingestion at Airbnb
Airbnb
2016
- Monitoring Data Quality at Scale with Statistical Modeling
Uber
2017
- Data Management Challenges in Production Machine Learning (Paper)
Google
2017
- Automating Large-Scale Data Quality Verification (Paper)
Amazon
2018
- Meet Hodor — Gojek’s Upstream Data Quality Tool
Gojek
2019
- An Approach to Data Quality for Netflix Personalization Systems
Netflix
2020
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper)
Facebook
2020
Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb
2018
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
Airbnb
2020
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions
Netflix
2020
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand
DoorDash
2020
- Revolutionizing Money Movements at Scale with Strong Data Consistency
Uber
2020
- Zipline - A Declarative Feature Engineering Framework
Airbnb
2020
- Automating Data Protection at Scale, Part 1 (Part 2)
Airbnb
2021
- Real-time Data Infrastructure at Uber
Uber
2021
Data Discovery
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code)
Apache
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code)
WeWork
- Discovery and Consumption of Analytics Data at Twitter
Twitter
2016
- Democratizing Data at Airbnb
Airbnb
2017
- Databook: Turning Big Data into Knowledge with Metadata at Uber
Uber
2018
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code)
Netflix
2018
- Amundsen — Lyft’s Data Discovery & Metadata Engine
Lyft
2019
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code)
Lyft
2019
- DataHub: A Generalized Metadata Search & Discovery Tool (Code)
LinkedIn
2019
- Amundsen: One Year Later
Lyft
2020
- Using Amundsen to Support User Privacy via Metadata Collection at Square
Square
2020
- Turning Metadata Into Insights with Databook
Uber
2020
- DataHub: Popular Metadata Architectures Explained
LinkedIn
2020
- How We Improved Data Discovery for Data Scientists at Spotify
Spotify
2020
- How We’re Solving Data Discovery Challenges at Shopify
Shopify
2020
- Nemo: Data discovery at Facebook
Facebook
2020
- Exploring Data @ Netflix (Code)
Netflix
2021
Feature Stores
- Distributed Time Travel for Feature Generation
Netflix
2016
- Building the Activity Graph, Part 2 (Feature Storage Section)
LinkedIn
2017
- Fact Store at Scale for Netflix Recommendations
Netflix
2018
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb
2018
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code)
Gojek
2019
- Michelangelo Palette: A Feature Engineering Platform at Uber
Uber
2019
- The Architecture That Powers Twitter's Feature Store
Twitter
2019
- Accelerating Machine Learning with the Feature Store Service
Condé Nast
2019
- Feast: Bridging ML Models and Data
Gojek
2020
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression
DoorDash
2020
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed
LinkedIn
2020
- Building a Feature Store
Monzo Bank
2020
- Butterfree: A Spark-based Framework for Feature Store Building (Code)
QuintoAndar
2020
- Building Riviera: A Declarative Real-Time Feature Engineering Framework
DoorDash
2021
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory
Uber
2021
- ML Feature Serving Infrastructure at Lyft
Lyft
2021
Classification
- Prediction of Advertiser Churn for Google AdWords (Paper)
Google
2010
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper)
LinkedIn
2011
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper)
Walmart
2014
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper)
NAVER
2016
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google
2017
- Discovering and Classifying In-app Message Intent at Airbnb
Airbnb
2019
- Teaching Machines to Triage Firefox Bugs
Mozilla
2019
- Categorizing Products at Scale
Shopify
2020
- How We Built the Good First Issues Feature
GitHub
2020
- Testing Firefox More Efficiently with Machine Learning
Mozilla
2020
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper)
Microsoft
2020
- Scalable Data Classification for Security and Privacy (Paper)
Facebook
2020
- Uncovering Online Delivery Menu Best Practices with Machine Learning
DoorDash
2020
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging
DoorDash
2020
- Deep Learning: Product Categorization and Shelving
Walmart
2021
- Large-scale Item Categorization for e-Commerce (Paper)
DianPing, eBay
2021
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb
Airbnb
2017
- Using Machine Learning to Predict the Value of Ad Requests
Twitter
2020
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code)
Netflix
2020
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment
DoorDash
2020
Forecasting
- Engineering Extreme Event Forecasting at Uber with RNN
Uber
2017
- Forecasting at Uber: An Introduction
Uber
2018
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber
Uber
2018
- Under the Hood of Gojek’s Automated Forecasting Tool
Gojek
2019
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video)
Google
2020
- Retraining Machine Learning Models in the Wake of COVID-19
DoorDash
2020
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code)
Atlassian
2020
- Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code)
Uber
2021
- Managing Supply and Demand Balance Through Machine Learning
DoorDash
2021
- Greykite: A flexible, intuitive, and fast forecasting library
LinkedIn
2021
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper)
Amazon
2003
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2)
Netflix
2012
- How Music Recommendation Works — And Doesn’t Work
Spotify
2012
- Learning to Rank Recommendations with the k-Order Statistic Loss (Paper)
Google
2013
- Recommending Music on Spotify with Deep Learning
Spotify
2014
- Learning a Personalized Homepage
Netflix
2015
- Session-based Recommendations with Recurrent Neural Networks (Paper)
Telefonica
2016
- Deep Neural Networks for YouTube Recommendations
YouTube
2016
- E-commerce in Your Inbox: Product Recommendations at Scale (Paper)
Yahoo
2016
- To Be Continued: Helping you find shows to continue watching on Netflix
Netflix
2016
- Personalized Recommendations in LinkedIn Learning
LinkedIn
2016
- Personalized Channel Recommendations in Slack
Slack
2016
- Recommending Complementary Products in E-Commerce Push Notifications (Paper)
Alibaba
2017
- Artwork Personalization at Netflix
Netflix
2017
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper)
Twitter
2017
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper)
Pinterest
2017
- How 20th Century Fox uses ML to predict a movie audience (Paper)
20th Century Fox
2018
- Calibrated Recommendations (Paper)
Netflix
2018
- Food Discovery with Uber Eats: Recommending for the Marketplace
Uber
2018
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper)
Spotify
2018
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper)
Alibaba
2019
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper)
Alibaba
2019
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper)
Alibaba
2019
- Personalized Recommendations for Experiences Using Deep Learning
TripAdvisor
2019
- Powered by AI: Instagram’s Explore recommender system
Facebook
2019
- Marginal Posterior Sampling for Slate Bandits (Paper)
Netflix
2019
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
2019
- Music recommendation at Spotify
Spotify
2019
- Using Machine Learning to Predict what File you Need Next (Part 1)
Dropbox
2019
- Using Machine Learning to Predict what File you Need Next (Part 2)
Dropbox
2019
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED)
LinkedIn
2019
- Temporal-Contextual Recommendation in Real-Time (Paper)
Amazon
2020
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper)
Amazon
2020
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper)
Alibaba
2020
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper)
Alibaba
2020
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper)
Alibaba
2020
- Controllable Multi-Interest Framework for Recommendation (Paper)
Alibaba
2020
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper)
Alibaba
2020
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper)
Alibaba
2020
- For Your Ears Only: Personalizing Spotify Home with Machine Learning
Spotify
2020
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months
Spotify
2020
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper)
Spotify
2020
- The Evolution of Kit: Automating Marketing Using Machine Learning
Shopify
2020
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)
LinkedIn
2020
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)
LinkedIn
2020
- Building a Heterogeneous Social Network Recommendation System
LinkedIn
2020
- How TikTok recommends videos #ForYou
ByteDance
2020
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper)
Google
2020
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper)
Google
2020
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper)
Google
2020
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper)
Tencent
2020
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper)
Home Depot
2020
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper)
IKEA
2020
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
Pinterest
2020
- Multi-task Learning for Related Products Recommendations at Pinterest
Pinterest
2020
- Improving the Quality of Recommended Pins with Lightweight Ranking
Pinterest
2020
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity
DoorDash
2020
- How We Built a Matchmaking Algorithm to Cross-Sell Products
Gojek
2020
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper)
Twitter
2021
- Self-supervised Learning for Large-scale Item Recommendations (Paper)
Google
2021
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper)
ByteDance
2021
- Using AI to Help Health Experts Address the COVID-19 Pandemic
Facebook
2021
- Advertiser Recommendation Systems at Pinterest
Pinterest
2021
- On YouTube's Recommendation System
YouTube
2021
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code)
Amazon
2016
- How Lazada Ranks Products to Improve Customer Experience and Conversion
Lazada
2016
- Ranking Relevance in Yahoo Search (Paper)
Yahoo
2016
- Learning to Rank Personalized Search Results in Professional Networks (Paper)
LinkedIn
2016
- Using Deep Learning at Scale in Twitter’s Timelines
Twitter
2017
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper)
Etsy
2017
- Powering Search & Recommendations at DoorDash
DoorDash
2017
- Applying Deep Learning To Airbnb Search (Paper)
Airbnb
2018
- In-session Personalization for Talent Search (Paper)
LinkedIn
2018
- Talent Search and Recommendation Systems at LinkedIn (Paper)
LinkedIn
2018
- Food Discovery with Uber Eats: Building a Query Understanding Engine
Uber
2018
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper)
Alibaba
2018
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
2018
- Semantic Product Search (Paper)
Amazon
2019
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Airbnb
2019
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper)
LinkedIn
2019
- The AI Behind LinkedIn Recruiter Search and recommendation systems
LinkedIn
2019
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs
LinkedIn
2019
- The Secret Sauce Behind Search Personalisation
Gojek
2019
- Neural Code Search: ML-based Code Search Using Natural Language Queries
Facebook
2019
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper)
Alibaba
2019
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search
Alibaba
2019
- Understanding Searches Better Than Ever Before (Paper)
Google
2019
- How We Used Semantic Search to Make Our Search 10x Smarter
Tokopedia
2019
- Query2vec: Search query expansion with query embeddings
GrubHub
2019
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search
Baidu
2019
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper)
Amazon
2020
- Managing Diversity in Airbnb Search (Paper)
Airbnb
2020
- Improving Deep Learning for Airbnb Search (Paper)
Airbnb
2020
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences
LinkedIn
2020
- Understanding Dwell Time to Improve LinkedIn Feed Ranking
LinkedIn
2020
- Ads Allocation in Feed via Constrained Optimization (Paper, Video)
LinkedIn
2020
- AI at Scale in Bing
Microsoft
2020
- Query Understanding Engine in Traveloka Universal Search
Traveloka
2020
- Bayesian Product Ranking at Wayfair
Wayfair
2020
- COLD: Towards the Next Generation of Pre-Ranking System (Paper)
Alibaba
2020
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
2020
- Driving Shopping Upsells from Pinterest Search
Pinterest
2020
- GDMix: A Deep Ranking Personalization Framework (Code)
LinkedIn
2020
- Bringing Personalized Search to Etsy
Etsy
2020
- Building a Better Search Engine for Semantic Scholar
Allen Institute for AI
2020
- Query Understanding for Natural Language Enterprise Search (Paper)
Salesforce
2020
- Things Not Strings: Understanding Search Intent with Better Recall
DoorDash
2020
- Query Understanding for Surfacing Under-served Music Content (Paper)
Spotify
2020
- Embedding-based Retrieval in Facebook Search (Paper)
Facebook
2020
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper)
JD
2020
- QUEEN: Neural query rewriting in e-commerce (Paper)
Amazon
2021
- Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper)
Amazon
2021
- Seasonal relevance in e-commerce search (Paper)
Amazon
2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
2021
- How We Built A Context-Specific Bidding System for Etsy Ads
Etsy
2021
- Pre-trained Language Model based Ranking in Baidu Search (Paper)
Baidu
2021
- Stitching together spaces for query-based recommendations
Stitch Fix
2021
- Deep Natural Language Processing for LinkedIn Search Systems (Paper)
LinkedIn
2021
- Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset (Paper, Code)
Seznam
2021
Embeddings
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper)
Sears
2017
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper)
Alibaba
2018
- Embeddings@Twitter
Twitter
2018
- Listing Embeddings in Search Ranking (Paper)
Airbnb
2018
- Understanding Latent Style
Stitch Fix
2018
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper)
LinkedIn
2018
- Personalized Store Feed with Vector Embeddings
DoorDash
2018
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper)
Moshbit
2019
- Machine Learning for a Better Developer Experience
Netflix
2020
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code)
Google
2020
- Embedding-based Retrieval at Scribd
Scribd
2021
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper)
Yahoo
2016
- Smart Reply: Automated Response Suggestion for Email (Paper)
Google
2016
- Building Smart Replies for Member Messages
LinkedIn
2017
- How Natural Language Processing Helps LinkedIn Members Get Support Easily
LinkedIn
2019
- Gmail Smart Compose: Real-Time Assisted Writing (Paper)
Google
2019
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper)
Amazon
2019
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want
Stitch Fix
2019
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code)
LinkedIn
2020
- SmartReply for YouTube Creators
Google
2020
- Using Neural Networks to Find Answers in Tables (Paper)
Google
2020
- A Scalable Approach to Reducing Gender Bias in Google Translate
Google
2020
- Assistive AI Makes Replying Easier
Microsoft
2020
- AI Advances to Better Detect Hate Speech
Facebook
2020
- A State-of-the-Art Open Source Chatbot (Paper)
Facebook
2020
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs
Facebook
2020
- Deep Learning to Translate Between Programming Languages (Paper, Code)
Facebook
2020
- Deploying Lifelong Open-Domain Dialogue Learning (Paper)
Facebook
2020
- Introducing Dynabench: Rethinking the way we benchmark AI
Facebook
2020
- How Gojek Uses NLP to Name Pickup Locations at Scale
Gojek
2020
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper)
Baidu
2020
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code)
Google
2020
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo)
Salesforce
2020
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code)
Salesforce
2020
- Applying Topic Modeling to Improve Call Center Operations
RICOH
2020
- WIDeText: A Multimodal Deep Learning Framework
Airbnb
2020
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code)
Facebook
2021
- How we reduced our text similarity runtime by 99.96%
Microsoft
2021
- Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models)
Facebook
2021
Sequence Modelling
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper)
Sutter Health
2015
- Deep Learning for Understanding Consumer Histories (Paper)
Zalando
2016
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper)
Sutter Health
2016
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper)
Telefonica
2017
- Deep Learning for Electronic Health Records (Paper)
Google
2018
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper)
Alibaba
2019
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper)
Alibaba
2020
- How Duolingo uses AI in every part of its app
Duolingo
2020
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video)
Facebook
2020
Computer Vision
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
Dropbox
2017
- Categorizing Listing Photos at Airbnb
Airbnb
2018
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb
Airbnb
2019
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors
Deepomatic
- Making machines recognize and transcribe conversations in meetings using audio and video
Microsoft
2019
- Powered by AI: Advancing product understanding and building new shopping experiences
Facebook
2020
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper)
Google
2020
- Machine Learning-based Damage Assessment for Disaster Relief (Paper)
Google
2020
- RepNet: Counting Repetitions in Videos (Paper)
Google
2020
- Converting Text to Images for Product Discovery (Paper)
Amazon
2020
- How Disney Uses PyTorch for Animated Character Recognition
Disney
2020
- Image Captioning as an Assistive Technology (Video)
IBM
2020
- AI for AG: Production machine learning for agriculture
Blue River
2020
- AI for Full-Self Driving at Tesla
Tesla
2020
- On-device Supermarket Product Recognition
Google
2020
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper)
Google
2020
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
2020
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper)
Google
2020
- Vision-based Price Suggestion for Online Second-hand Items (Paper)
Alibaba
2020
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model)
Facebook
2021
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper)
Alibaba
2021
- Identifying Document Types at Scribd
Scribd
2021
- Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper)
Walmart
2021
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper)
Alibaba
2018
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper)
Alibaba
2018
- Reinforcement Learning for On-Demand Logistics
DoorDash
2018
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
2018
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper)
Alibaba
2019
- Productionizing Deep Reinforcement Learning with Spark and MLflow
Zynga
2020
- Deep Reinforcement Learning in Production (Part 1) (Part 2)
Zynga
2020
- Building AI Trading Systems
Denny Britz
2020
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments
Netflix
2019
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code)
LinkedIn
2019
- Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video)
Swedbank, Hopsworks
2019
- Preventing Abuse Using Unsupervised Learning
LinkedIn
2020
- The Technology Behind Fighting Harassment on LinkedIn
LinkedIn
2020
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper)
Ant Financial
2020
- How Does Spam Protection Work on Stack Exchange?
Stack Exchange
2020
- Auto Content Moderation in C2C e-Commerce
Mercari
2020
- Blocking Slack Invite Spam With Machine Learning
Slack
2020
- Cloudflare Bot Management: Machine Learning and More
Cloudflare
2020
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine
SENER
2020
- Using Anomaly Detection to Monitor Low-Risk Bank Customers
Rabobank
2020
- Fighting fraud with Triplet Loss
OLX Group
2020
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative)
Facebook
2020
- How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4
Facebook
2020
Graph
- Building The LinkedIn Knowledge Graph
LinkedIn
2016
- Scaling Knowledge Access and Retrieval at Airbnb
Airbnb
2018
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper)
Pinterest
2018
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
2019
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper)
Alibaba
2019
- Contextualizing Airbnb by Building Knowledge Graph
Airbnb
2019
- Retail Graph — Walmart’s Product Knowledge Graph
Walmart
2020
- Traffic Prediction with Advanced Graph Neural Networks
DeepMind
2020
- SimClusters: Community-Based Representations for Recommendations (Paper, Video)
Twitter
2020
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper)
Alibaba
2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
2021
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper)
JPMorgan Chase
2021
Optimization
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3)
Lyft
2016
- The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED)
Grab
2017
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
Uber
2018
- Next-Generation Optimization for Dasher Dispatch at DoorDash
DoorDash
2020
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning
Thyssen Krupp AG
2020
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper)
Amazon
2020
- Optimizing DoorDash’s Marketing Spend with Machine Learning
DoorDash
2020
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper)
Rakuten
2013
- Using Machine Learning to Index Text from Billions of Images
Dropbox
2018
- Extracting Structured Data from Templatic Documents (Paper)
Google
2020
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video)
Amazon
2020
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper)
Alibaba
2020
- Information Extraction from Receipts with Graph Convolutional Networks
Nanonets
2021
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google
2019
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel
2019
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
2019
- Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM
2019
Generation
- Better Language Models and Their Implications (Paper)
OpenAI
2019
- Image GPT (Paper, Code)
OpenAI
2020
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI
2020
- Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar
2020
- Unit Test Case Generation with Transformers
Microsoft
2021
Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google
2020
- The Machine Learning Behind Hum to Search
Google
2020
Validation and A/B Testing
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google
2010
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google
2015
- Twitter Experimentation: Technical Overview
Twitter
2015
- It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix
2016
- Building Pinterest’s A/B Testing Platform
Pinterest
2016
- Experimenting to Solve Cramming
Twitter
2017
- Building an Intelligent Experimentation Platform with Uber Engineering
Uber
2017
- Scaling Airbnb’s Experimentation Platform
Airbnb
2017
- Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit
2017
- Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber
2018
- Under the Hood of Uber’s Experimentation Platform
Uber
2018
- Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook
2018
- Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab
2018
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better
2019
- Detecting Interference: An A/B Test of A/B Tests
LinkedIn
2019
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber
2020
- Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka
2020
- Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix
2020
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix
2020
- Experimentation with Resource Constraints
Stitch Fix
2020
- Computational Causal Inference at Netflix (Paper)
Netflix
2020
- Key Challenges with Quasi Experiments at Netflix
Netflix
2020
- Making the LinkedIn experimentation engine 20x faster
LinkedIn
2020
- Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn
2020
- How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify
2020
- Improving Experimental Power through Control Using Predictions as Covariate
DoorDash
2020
- Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash
2020
- Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash
2020
- Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash
2020
- Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash
2020
- Spotify’s New Experimentation Platform (Part 1) (Part 2)
Spotify
2020
- Interpreting A/B Test Results: False Positives and Statistical Significance
Netflix
2021
- Interpreting A/B Test Results: False Negatives and Power
Netflix
2021
- Running Experiments with Google Adwords for Campaign Optimization
DoorDash
2021
- The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
DoorDash
2021
- Experimentation Platform at Zalando: Part 1 - Evolution
Zalando
2021
- Designing Experimentation Guardrails
Airbnb
2021
- Network Experimentation at Scale (Paper)
Facebook
2021
- Universal Holdout Groups at Disney Streaming
Disney
2021
Model Management
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
2018
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
2019
- Runway - Model Lifecycle Management at Netflix
Netflix
2020
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit
2020
- ML Model Monitoring - 9 Tips From the Trenches
Nubank
2021
Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook
2020
- How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox
2020
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber
2021
Ethics
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn
2020
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn
2020
Infra
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook
2020
- Elastic Distributed Training with XGBoost on Ray
Uber
2021
MLOps Platforms
- Meet Michelangelo: Uber’s Machine Learning Platform
Uber
2017
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
2018
- Big Data Machine Learning Platform at Pinterest
Pinterest
2019
- Core Modeling at Instagram
Instagram
2019
- Open-Sourcing Metaflow - a Human-Centric Framework for Data Science
Netflix
2019
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit
2020
- Real-time Machine Learning Inference Platform at Zomato
Zomato
2020
- Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
Lyft
2020
- Building Flexible Ensemble ML Models with a Computational Graph
DoorDash
2021
- LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft
2021
- "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper)
Coveo
2021
- MLOps at GreenSteam: Shipping Machine Learning
GreenSteam
2021
- Evolving Reddit’s ML Model Deployment and Serving Architecture
Reddit
2021
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio
2012
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google
2014
- Rules of Machine Learning: Best Practices for ML Engineering
Google
2018
- On Challenges in Machine Learning Model Management
Amazon
2018
- Machine Learning in Production: The Booking.com Approach
Booking
2019
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking
2019
- Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank
2019
- Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge
2020
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook
2020
- The problem with AI developer tools for enterprises
Databricks
2020
- Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber
2021
- Tuning Model Performance
Uber
2021
- Maintaining Machine Learning Model Accuracy Through Monitoring
DoorDash
2021
- Building Scalable and Performant Marketing ML Systems at Wayfair
Wayfair
2021
- Our approach to building transparent and explainable AI systems
LinkedIn
2021
- 5 Steps for Building Machine Learning Models for Business
Shopify
2021
Team structure
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix
2016
- Building The Analytics Team At Wish
Wish
2018
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix
2019
- Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix
- Analytics at Netflix: Who We Are and What We Do
Netflix
2020
- Building a Data Team at a Mid-stage Startup: A Short Story
Erikbern
2021
Fails
- When It Comes to Gorillas, Google Photos Remains Blind
Google
2018
- 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate
2020
- An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor
Harrisburg University
2020
- It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI
2020
- A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom
2020
- More in awful-ai
P.S. Want a summary of ML advancements? Get up to speed with survey papers 👉 ml-surveys