Salesforce is open sourcing the machine learning technology behind its Einstein AI platform.
Branded TransmogrifAI, the AutoML library is written in Scala on top of Apache Spark. It lets developers train a machine learning model in fewer than 10 lines of code, and is aimed at those looking to predict customer behaviour without having to use a large data set for training.
“It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to hand-tuned models with almost 100x reduction in time,” the project states on its website.
In a lengthy Medium post last week, Shubha Nabar, senior director of data science on the Salesforce Einstein team, wrote: “Three years ago when we set out to build machine learning capabilities into the Salesforce platform, we learned that building enterprise-scale machine learning systems is even harder.”
The key for Salesforce when it was developing Einstein was to be able to deliver smart insights and recommended actions without pooling all of its customers’ data together. This posed a serious challenge for the vendor before it acquired a number of specialist machine learning companies, including MetaMind, whose founder Richard Socher is now chief scientist at Salesforce.
“Up to that point, if you couldn’t see or normalise the data, you can’t apply the intelligence,” Salesforce CEO Marc Benioff said. “We have massive amounts of data, petabytes and petabytes, so we have the data that we need and the answer is that we can now operate on that data without interfering with the trust relationship with our customers.”
Expanding on this, Nabar writes: “We have to build customer-specific machine learning models for any given use case. Even if we could build global models, it makes absolutely no sense to do so because every customer’s data is unique, with different schemas, different shapes, and different biases introduced by different business processes.
“In order to make machine learning truly work for our customers, we have to build and deploy thousands of personalised machine learning models trained on each individual customer’s data for every single use case.
“The only way to achieve this without hiring an army of data scientists is through automation. Most [AutoML] solutions today are either focused very narrowly on a small piece of the entire machine learning workflow, or are built for unstructured, homogenous data for images, voice and language.
“But we needed a solution that could rapidly produce data-efficient models for heterogeneous structured data at massive scale.”
The end product is a way of building a single, modular machine learning workflow that can be trained separately on each customer’s smaller, more personalised set of data, producing many customer-specific models from the same code.
Nabar explained: “With just a few lines of code, a data scientist can automate data cleansing, feature engineering, and model selection to arrive at a performant model from which she can explore and iterate further.
“TransmogrifAI has been transformational for us, enabling our data scientists to deploy thousands of models in production with minimal hand tuning and reducing the average turn-around time for training a performant model from weeks to just a couple of hours.
“While this level of automation has been essential for us to scale for enterprise purposes, we believe that every business today has more machine learning use cases than it has data scientists, and automation is key to bringing the power of machine learning within reach.”
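To make the idea concrete, the three steps Nabar lists (data cleansing, feature engineering and model selection) can be sketched in a few dozen lines. The example below is a toy illustration only: it is written in plain Python rather than Scala, it does not use TransmogrifAI or Spark, and every name in it (`clean_and_encode`, `auto_train`, the two candidate models and the made-up customer rows) is a hypothetical stand-in rather than anything taken from Salesforce's library.

```python
from statistics import mean

def clean_and_encode(rows):
    """Build a featurizer: mean-impute numeric columns, one-hot encode the rest."""
    columns = sorted({key for row in rows for key in row})
    encoders = []
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill = mean(present) if present else 0.0  # data cleansing: fill gaps
            encoders.append(lambda row, c=col, f=fill:
                            [float(row[c]) if row.get(c) is not None else f])
        else:
            levels = sorted(set(present))  # feature engineering: one-hot encoding
            encoders.append(lambda row, c=col, ls=levels:
                            [1.0 if row.get(c) == level else 0.0 for level in ls])
    return lambda row: [x for enc in encoders for x in enc(row)]

def majority_class(X, y):
    """Baseline candidate: always predict the most common training label."""
    label = max(sorted(set(y)), key=y.count)
    return lambda x: label

def nearest_centroid(X, y):
    """Candidate: predict the label whose mean feature vector is closest."""
    centroids = {}
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    def predict(x):
        return min(centroids, key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[label])))
    return predict

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def auto_train(rows, labels, holdout_every=4):
    """Featurize, score each candidate on a holdout split, refit the winner."""
    # For brevity the featurizer is fitted on all rows, holdout included.
    featurize = clean_and_encode(rows)
    X = [featurize(row) for row in rows]
    train = [i for i in range(len(X)) if i % holdout_every != 0]
    valid = [i for i in range(len(X)) if i % holdout_every == 0]
    candidates = {"majority_class": majority_class,
                  "nearest_centroid": nearest_centroid}
    scores = {}
    for name, fit in candidates.items():
        model = fit([X[i] for i in train], [labels[i] for i in train])
        scores[name] = accuracy(model, [X[i] for i in valid],
                                [labels[i] for i in valid])
    best = max(scores, key=scores.get)      # model selection
    final = candidates[best](X, labels)     # refit the winner on all rows
    return best, scores, lambda row: final(featurize(row))

# Made-up data for one hypothetical customer: 1 = churned, 0 = stayed.
rows = [
    {"plan": "free", "logins": 1}, {"plan": "free", "logins": None},
    {"plan": "pro", "logins": 30}, {"plan": "pro", "logins": 25},
    {"plan": "free", "logins": 2}, {"plan": "pro", "logins": 28},
    {"plan": "free", "logins": 3}, {"plan": "pro", "logins": None},
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

best, scores, predict = auto_train(rows, labels)
print(best, scores, predict({"plan": "pro", "logins": 27}))
```

Because `auto_train` infers column types from the rows it is given, the same call could in principle be repeated for each customer's data set without per-customer code, which is the pattern the article describes. A production system would add far more than this sketch shows, including proper cross-validation, many more candidate models and hyperparameter search.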
Explaining why Salesforce decided to bring this project to the open source community, Nabar says: “Salesforce has been a long-time user and contributor to Apache Spark, and we are excited to continue to build TransmogrifAI alongside the community.
“Machine learning has the potential to transform how businesses operate, and we believe that barriers to adoption can only be lowered through an open exchange of ideas and code. By working in the open we can bring together diverse perspectives to continue to push the technology forward and make it accessible to everyone.”