You’ve trained machine learning models on your data, but how do you put them into production? When you have tens of thousands of model versions, each written in any mix of frameworks (R, Java, Ruby, scikit-learn, Caffe, TensorFlow on GPUs, etc.) and exposed as REST API endpoints, and your users love to chain algorithms and run ensembles in parallel… how do you keep latency under 20 ms on just a few servers?
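To make the ensemble scenario concrete, here is a minimal sketch that fans one input out to several model endpoints concurrently and averages their scores. The endpoint URLs and the {"input": ..., "score": ...} payload shape are hypothetical placeholders, not any particular serving API:

```python
# A minimal sketch, assuming hypothetical REST model endpoints that accept
# {"input": text} and return {"score": float}. Not any specific serving API.
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINTS = [
    "https://models.example.com/sentiment-bert/v2",
    "https://models.example.com/sentiment-cnn/v7",
    "https://models.example.com/sentiment-lstm/v3",
]

def score_one(url: str, text: str) -> float:
    """POST the input to a single model endpoint and return its score."""
    resp = requests.post(url, json={"input": text}, timeout=0.5)
    resp.raise_for_status()
    return resp.json()["score"]

def ensemble(text: str) -> float:
    """Query all endpoints in parallel and average their scores."""
    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        scores = list(pool.map(lambda url: score_one(url, text), ENDPOINTS))
    return sum(scores) / len(scores)

if __name__ == "__main__":
    print(ensemble("I loved this talk"))
```

Fanning out in parallel keeps the ensemble's latency close to that of its slowest member rather than the sum of all members, which is what makes tight latency budgets feasible at all.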
AI has been a hot topic lately, with constant advances in what is possible, but there has been far less discussion of the infrastructure and scaling challenges that come with it. At Algorithmia, we’ve built, deployed, and scaled thousands of algorithms and machine learning models using every kind of framework (from scikit-learn to TensorFlow). We’ve seen many of the challenges teams face in this area, and in this talk I’ll share some insights into the problems you’re likely to encounter and how to approach solving them.
In brief, we’ll examine the need for, and implementations of, a complete “Operating System for AI”: a common interface through which different algorithms can be used and combined, and a general architecture for serverless machine learning that is discoverable, versioned, scalable, and shareable.
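As a sketch of what such a common interface looks like in practice, the snippet below chains two algorithms through Algorithmia’s Python client, where every algorithm, regardless of the framework behind it, is invoked through the same versioned call; the algorithm names and versions are hypothetical placeholders:

```python
# A minimal sketch of the common interface, using Algorithmia's Python client.
# Every algorithm is addressed by a versioned "owner/name/x.y.z" path and
# invoked with .pipe(); the algorithm names below are hypothetical.
import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

# Chain two algorithms: the first one's output becomes the second one's input.
entities = client.algo("nlp/ExtractEntities/1.0.0").pipe(
    "Algorithmia was founded in Seattle."
).result
sentiment = client.algo("nlp/SentimentByEntity/2.1.0").pipe(entities).result

print(sentiment)
```

Because every model sits behind the same call signature, chaining and ensembling become composition of uniform function calls rather than framework-specific glue code.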
A full-stack developer with two decades of industry experience, Jon Peck now focuses on bringing scalable, discoverable, and secure machine-learning microservices to developers across a wide variety of platforms via Algorithmia.com.

Speaker at: DeveloperWeek 2018+19, SeattleJS...