Machine learning models, artificial intelligence (AI) systems that identify relationships among hundreds, thousands, or even millions of data points, are rarely easy to architect. Data scientists spend weeks or months not only preprocessing the data on which the models are to be trained, but also extracting useful features (i.e., the individual measurable properties of that data from which a model learns), narrowing down algorithms, and ultimately building (or attempting to build) a system that performs well not just within the confines of a lab, but in the real world.
Salesforce’s new toolkit aims to ease that burden somewhat. On GitHub today, the San Francisco-based cloud computing company published TransmogrifAI, an automated machine learning library that performs feature engineering, feature selection, and model training on structured data (the kind of searchable, neatly categorized data found in spreadsheets and databases) in just three lines of code.
It’s written in Scala and built on top of Apache Spark (some of the same technologies that power Salesforce’s AI platform, Einstein), and it was designed from the ground up for scalability. It can process datasets ranging from dozens to millions of rows, and it runs either on a cluster of machines via Spark or on an off-the-shelf laptop.
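To make the three stages concrete, here is a minimal, hypothetical sketch in plain Python. It does not use TransmogrifAI’s actual Scala API or Spark. The function names (`featurize`, `select_features`, `auto_train`) and the specific techniques (one-hot encoding of text columns, dropping zero-variance features, and choosing between a majority-class baseline and a nearest-centroid classifier by leave-one-out accuracy) are illustrative assumptions, far simpler than what a production AutoML library does.

```python
from statistics import mean, pvariance

# Hypothetical stand-ins for the three stages; NOT TransmogrifAI's API.

def featurize(rows):
    """Feature engineering: pass numbers through, one-hot encode strings.
    Column types are inferred from the first row (a simplifying assumption)."""
    columns = sorted(rows[0])
    categories = {c: sorted({r[c] for r in rows})
                  for c in columns if isinstance(rows[0][c], str)}
    names = []
    for c in columns:
        names += [f"{c}={v}" for v in categories[c]] if c in categories else [c]
    vectors = []
    for r in rows:
        vec = []
        for c in columns:
            if c in categories:
                vec += [1.0 if r[c] == v else 0.0 for v in categories[c]]
            else:
                vec.append(float(r[c]))
        vectors.append(vec)
    return names, vectors

def select_features(names, vectors, min_variance=1e-12):
    """Feature selection: drop columns that never vary (they carry no signal)."""
    keep = [j for j in range(len(names))
            if pvariance([v[j] for v in vectors]) > min_variance]
    return [names[j] for j in keep], [[v[j] for j in keep] for v in vectors]

def fit_majority(X, y):
    """Baseline model: always predict the most common label."""
    label = max(sorted(set(y)), key=y.count)
    return lambda x: label

def fit_nearest_centroid(X, y):
    """Predict the label whose mean feature vector is closest (squared Euclidean)."""
    centroids = {label: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
                 for label in set(y)}
    def predict(x):
        return min(sorted(centroids),
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict

def loo_accuracy(fit, X, y):
    """Leave-one-out accuracy: train on all rows but one, test on the held-out row."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += model(X[i]) == y[i]
    return hits / len(X)

def auto_train(rows, target):
    """Run all three stages and return the best-scoring candidate model."""
    labels = [r[target] for r in rows]
    raw = [{k: v for k, v in r.items() if k != target} for r in rows]
    names, X = select_features(*featurize(raw))
    candidates = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
    scores = {name: loo_accuracy(fit, X, labels) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores, names, candidates[best](X, labels)

# Toy churn table (made-up data for illustration only).
rows = [
    {"plan": "pro", "seats": 50, "region": "us", "churned": "no"},
    {"plan": "pro", "seats": 40, "region": "us", "churned": "no"},
    {"plan": "free", "seats": 2, "region": "us", "churned": "yes"},
    {"plan": "free", "seats": 3, "region": "us", "churned": "yes"},
    {"plan": "pro", "seats": 45, "region": "us", "churned": "no"},
    {"plan": "free", "seats": 1, "region": "us", "churned": "yes"},
]
best, scores, names, model = auto_train(rows, target="churned")
print(best, scores, names)
# prints: nearest_centroid {'majority': 0.0, 'nearest_centroid': 1.0} ['plan=free', 'plan=pro', 'seats']
```

On this toy table, the constant `region` column is discarded during feature selection and the nearest-centroid model beats the baseline. The numeric `seats` column is not rescaled, so it dominates the distance calculation; a real library would normalize features and search a much larger space of models.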
Mayukh Bhaowal, director of product management for Salesforce Einstein, told VentureBeat in a phone interview that TransmogrifAI essentially transforms raw datasets into custom models. It’s the evolution of Salesforce’s in-house machine learning library, which allowed the Einstein team to deploy custom models for enterprise clients in just…