Machine learning platforms: How they perform and why it matters

Every time you buy a book Amazon recommends, query Siri on your iPhone, or trust the traffic prediction function in Google Maps to keep you rolling along, you’re putting your faith in machine learning (ML). That’s the technology behind personal assistants, traffic forecasts, shopping basket analysis, recommendation engines, and a host of other applications, such as fraud-detection software and robotics.

Here’s the good news about machine learning: “It’s the biggest part of what we call the Fourth Industrial Revolution,” says Professor of Information Systems Asim Roy. Here’s the bad news: Machine learning has become so crucial and complex, you’ve practically got to be an expert to make use of it, which leaves the technology inaccessible to many who could benefit from it.

Because of this, the race is on to automate machine learning via platforms that simplify the process of leveraging this data-driven, decision-making technology. Roy and a team of ASU master’s students evaluated these platforms to see how they performed. As it turns out, none of the ML platforms rose to the top of the pile, but some showed supremacy in certain types of problem-solving.

For data crunchers, Roy’s research may help in picking a platform to buy and use. For the rest of us – those who rely on machine learning daily – Roy offers insight into our digital helpers.

Thinking inside the box

What is machine learning? Wikipedia defines it as “a field of computer science that uses statistical techniques to progressively improve the performance of the computer program in detecting patterns in data.” Roy explains the idea by pointing to Einstein’s E = mc² equation. “He had been looking at data and then he reduced it to a little formula that says it all and describes how things happen,” Roy notes. “Machine learning goes from data to a formula. It’s essentially boiling the data down.”
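To make “data to a formula” concrete, here is a minimal sketch in Python. It is not drawn from Roy’s study: the five data points are made up, and the helper name fit_line is hypothetical. It uses ordinary least squares, one of the simplest ways to boil a set of observations down to a line of the form y = slope * x + intercept.

```python
# Toy illustration of going "from data to a formula".
# The observations below are invented for this sketch.

def fit_line(xs, ys):
    """Ordinary least squares fit for one input variable.

    Returns (slope, intercept) for the line y = slope * x + intercept.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Five noisy measurements go in and two numbers come out. Real platforms fit far richer formulas, but the direction of travel is the same.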

The formulas used for machine learning are algorithms, and they can perform various functions. Initially, machine learning involved algorithms called decision trees, which are mathematical representations of tree-like graphs that identify possible outcomes. The graph typically starts with one node that branches out to reflect other possibilities. “Random forest” analysis involves using multiple decision trees for classification of items – for example, this behavior is fraud or this behavior isn’t fraud – and other tasks.
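The sketch below, in Python, shows only the voting idea behind a random forest: several decision trees each classify a transaction, and the majority label wins. It rests on loud assumptions. The trees are written by hand rather than learned from data, and every feature name and threshold (amount, new_device, foreign_country, purchases_last_hour) is hypothetical. A real random forest would train each tree on a random sample of the data.

```python
from collections import Counter

# Each "tree" is a hand-written chain of yes/no questions.
# A real platform would learn these splits from data; the
# features and thresholds here are hypothetical.

def tree_amount(txn):
    if txn["amount"] > 1000:
        return "fraud" if txn["new_device"] else "ok"
    return "ok"


def tree_location(txn):
    if txn["foreign_country"]:
        return "fraud" if txn["amount"] > 500 else "ok"
    return "ok"


def tree_velocity(txn):
    if txn["purchases_last_hour"] >= 5:
        return "fraud"
    return "fraud" if txn["new_device"] and txn["foreign_country"] else "ok"


FOREST = [tree_amount, tree_location, tree_velocity]


def forest_predict(txn):
    """Let every tree vote and return the most common label."""
    votes = Counter(tree(txn) for tree in FOREST)
    return votes.most_common(1)[0][0]


suspicious = {"amount": 1500, "new_device": True,
              "foreign_country": True, "purchases_last_hour": 1}
routine = {"amount": 40, "new_device": False,
           "foreign_country": False, "purchases_last_hour": 1}

print(forest_predict(suspicious))
print(forest_predict(routine))
```

With three trees and two labels, a vote can never tie. That is one reason forests often use an odd number of trees for yes-or-no questions.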

Another type of algorithm – the support vector machine – also models data for classification and regression analysis. So do logistic regression algorithms. All of these, plus others, are the types of algorithms collected in the machine learning platforms Roy tested.

All together now

From early ML algorithms, brain-like machine learning evolved. Known as neural networks, these algorithms garner most of Roy’s research attention. Neural nets are based on the belief that the brain uses parallel processing to analyze things and make decisions. “You have cells or neurons in the brain and they do simple computations, but they do it in parallel. That’s what makes the brain so powerful,” says Roy, who also admits that this view of the brain is still theoretical. Even so, this view underpins another set of algorithms used by the machine learning platforms that Roy and his team evaluated.

These platforms consist of collections of algorithms that attempt to simplify tasks such as determining a customer’s creditworthiness or ferreting out fraud. “IBM, SAS, Microsoft … they are all making attempts at simplifying things so that all you need to do is point to data and tell the machine to do a task,” Roy says. He adds: “Total automation is not there yet. Just grappling with the systems is huge, and you still have to understand how the algorithms work.”

Roy likens this to a world where cars are so complex, they can only be driven by a small group of highly trained specialists. “You couldn’t deploy millions of cars on the road if you need mechanics to drive them,” he says. “You need to get simplicity down to a level where the machine learning is more like Excel, software you can train a host of people to use. Unless you get to that simplified form, we can’t deploy machine learning on a wide scale.”

Four … societal transformation

Why do we need to deploy machine learning in a big way? Because, as Roy notes, it’s foundational to what many technologists call the Fourth Industrial Revolution.

During the First Industrial Revolution, people harnessed water and steam power to run machinery. The Second Industrial Revolution progressed to electric power and enabled mass production. The Third Industrial Revolution brought us information technology and process automation. Building on that, the Fourth Industrial Revolution is “blurring the lines between the physical, digital, and biological spheres,” says Klaus Schwab, founder and executive chair of the World Economic Forum in Geneva.

“There are three reasons why today’s transformations represent not merely a prolongation of the Third Industrial Revolution but rather the arrival of a fourth and distinct one: velocity, scope, and systems impact,” Schwab wrote in 2016. “The speed of current breakthroughs has no historical precedent. When compared with previous industrial revolutions, the Fourth Industrial Revolution is evolving at an exponential rather than a linear pace. Moreover, it is disrupting almost every industry in every country. And the breadth and depth of these changes herald the transformation of entire systems of production, management, and governance.”

What does the Fourth Industrial Revolution bring us? Think robotics, self-driving cars, 3D printing, as well as huge leaps forward in bio- and nanotechnology. Ahead, Schwab and others see synthetic biology to support our bodies, machines to replace us in our jobs, algorithms to guide us in business decisions, smart buildings, super smart electric grids, and more.

It’s all just beyond our reach. “With artificial intelligence still in its infancy and data analytics still highly dependent on human oversight, we are still some way away from truly autonomous, self-optimizing systems,” notes an EY blog on the topic.

A versus B versus R

To reach that next step in machine learning, several companies are turning to system platforms, such as those Roy and his students examined. These platforms contain collections of algorithms designed to take the human hand-holding out of ML and let the machines decide what approaches to take. This, Roy notes, is one approach to automation.

Do the platforms work? That’s what Roy and his team attempted to find out. The team tested the performance and accuracy of several leaders in the ML platform world, including SAS, IBM’s SPSS, Microsoft Azure ML, Apache Spark ML, Python, and R. Each of the systems under review was tested against 29 problems. Was there a winner? Not really. “They all appear to be about the same in terms of performance,” Roy says.
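The article does not spell out how the team scored the platforms, so the Python sketch below is only one simple way such a comparison could be tallied, not the study’s method. The platform names, the four problems, and every accuracy figure are made up for illustration and are not results from the research. Each platform is ranked on each problem, and the ranks are then averaged.

```python
from statistics import mean

# Made-up accuracy scores for illustration only; these are NOT
# results from the study. Keys are hypothetical platforms and
# each list position is a hypothetical test problem.
scores = {
    "platform_a": [0.91, 0.78, 0.85, 0.66],
    "platform_b": [0.90, 0.80, 0.84, 0.67],
    "platform_c": [0.92, 0.77, 0.86, 0.65],
}


def average_ranks(scores):
    """Rank platforms on each problem (1 = best), then average the ranks.

    Ties are not given special treatment: tied platforms keep the
    order in which they appear in the dictionary.
    """
    names = list(scores)
    n_problems = len(next(iter(scores.values())))
    totals = {name: 0 for name in names}
    for i in range(n_problems):
        ordered = sorted(names, key=lambda name: scores[name][i], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / n_problems for name in names}


ranks = average_ranks(scores)
for name in scores:
    print(f"{name}: mean accuracy {mean(scores[name]):.4f}, "
          f"average rank {ranks[name]:.2f}")
```

In this toy table each platform wins some problems and loses others, so the average ranks come out identical. That is the kind of pattern behind a verdict of “about the same.”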

Still, his students think the paper that outlines their research results will be of value for data scientists, and they should know.

Shiban Qureshi is a data scientist with Progressive Leasing, a firm that supports merchants with leasing options. He’s used machine learning for a variety of tasks such as deciding if an applicant should get credit and, if so, how much. He also uses it for fraud detection, forecasting loan delinquencies, and profitability analyses.

“Data and business metrics are core aspects of the credit industry,” Qureshi says. “During early days, people who applied for credit were approved or denied based on gut feelings. Machine learning has made decision-making better and faster.”

Kartikeya Pande, another master’s student who contributed to Roy’s platform study, is an analytics consultant who currently uses SAS to predict buyer behaviors based on demographic characteristics.

Like Qureshi, Pande sees ML as vital to “any industry now.” He says it helps business professionals “move away from instinct-driven decisions or a limited scorecard-based evaluation to a more comprehensive one. ML helps to incorporate a large number of criteria and surmountable transactions to arrive at a decision.”

Pande joins Qureshi in noting that other data scientists could make better platform choices after looking at the 41-page write-up of this platform comparison research. Other evaluation sources – such as Gartner or Forrester reports – use subjective, word-of-mouth views to rank ML platforms. Roy et al. used rigorous statistical analysis to see the actual performance.

“Our study is the first of its kind to compare the classification of algorithms across different platforms,” Pande says. Qureshi, who uses R and DataRobot himself, agrees. He says the research will help other practitioners “make a more informed choice about which algorithms and platforms to adopt.”

The paper was accepted by the INFORMS Journal on Computing and featured in Datanami, a news portal that covers emerging trends and solutions in Big Data. “This research has huge implications for the machine learning marketplace,” Roy says. “So, it became a big story almost instantaneously. Datanami’s Editor in Chief Alex Woodie wanted to cover the article, and it now ranks among the most read on the site.”

Roy and his master’s student team also have an invitation to do a summary article for Data Science Central.

By Betsy Loeff