River for Online Machine Learning in Python

Sep 30, 2021 Posted by Dr. Nidhi Arora in Artificial Intelligence, Programming | 2 comments


River is a Python library for online machine learning. The library lets you train machine learning models on streaming data.

Introduction

Traditional machine learning algorithms, from something as simple as linear regression to strong learners such as XGBoost, usually process data in batches. This means that they look at the complete dataset to fit the model. When new data becomes available, the model has to be fitted again from scratch on both the old and the new data.

Re-training the model raises several challenges. Holding all the data can require a lot of memory, which slows the training process down. In other cases, the limit is the data storage infrastructure. And in applications that keep generating new data, retrieving the older data can be almost impossible.
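To make the memory argument concrete, here is a minimal sketch in plain Python that contrasts recomputing a statistic over all the data with updating it one value at a time. The `RunningMean` class and the sample values are made up for illustration; they are not taken from any library.

```python
import statistics


class RunningMean:
    """Mean updated one value at a time; stores only a count and the current mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        # Shift the current mean towards x by 1/n of the difference.
        self.mean += (x - self.mean) / self.n
        return self


old_data = [2.0, 4.0, 6.0]
new_data = [8.0, 10.0]

# Batch approach: the old data must still be available to recompute the mean.
batch_mean = statistics.fmean(old_data + new_data)

# Incremental approach: each value is seen once and can then be discarded.
running = RunningMean()
for x in old_data + new_data:
    running.update(x)

print(batch_mean, running.mean)
```

Both approaches arrive at the same mean, but the incremental one never needs to hold more than two numbers in memory.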

One way to deal with these challenges is to train the model online on streaming data. The continuously generated data is treated as a stream, which is why the approach is called stream learning or incremental learning. It is well suited to IoT applications, in which sensors collect data in real time.
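The idea of learning from a stream can be sketched without any library. The example below is an illustration under stated assumptions: `sensor_stream` is a hypothetical generator standing in for a real sensor, and `OnlineLinearRegression` is a hand-written linear model trained by stochastic gradient descent, one observation at a time.

```python
import random


def sensor_stream(n, seed=0):
    """Hypothetical sensor: yields (features, target) pairs one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        # Assumed ground truth for the simulation: y = 3x + 1 plus small noise.
        y = 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)
        yield {"x": x}, y


class OnlineLinearRegression:
    """Linear regression updated by one gradient step per observation."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_one(self, x):
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # Gradient of the squared error for this single observation.
        error = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error


model = OnlineLinearRegression(lr=0.1)
for x, y in sensor_stream(5000):
    model.learn_one(x, y)  # the observation is not stored anywhere

print(model.weights, model.bias)
```

The model sees each reading exactly once, so memory use stays constant no matter how long the stream runs.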

What is Online Machine Learning?

Online machine learning is a technique for training machine learning models in applications where training on the whole dataset is computationally impractical, or where the data arrives over time in sequential order. Because the data is in motion and keeps changing, the model has to capture the behavior of the stream and process each piece of data as soon as it is available. The method is useful in settings where the algorithm must dynamically adapt to new patterns that emerge in the data over time.
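Adapting to new patterns can be illustrated with a very small example. The sketch below assumes a stream whose level jumps from 10 to 20 halfway through, and compares an exponentially weighted average, which gives more weight to recent values, with a plain average over the whole stream. The class name and the numbers are invented for illustration only.

```python
class ExponentialMovingAverage:
    """Average that gives more weight to recent observations."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # higher alpha forgets the past faster
        self.value = None

    def update(self, x):
        if self.value is None:
            self.value = x
        else:
            self.value += self.alpha * (x - self.value)
        return self.value


# Hypothetical stream whose level shifts halfway through.
stream = [10.0] * 100 + [20.0] * 100

ema = ExponentialMovingAverage(alpha=0.1)
for x in stream:
    ema.update(x)

overall_mean = sum(stream) / len(stream)

print(round(ema.value, 2), overall_mean)
```

The exponentially weighted average ends up close to the new level of 20, while the average over the whole stream stays at 15 and reflects neither the old pattern nor the new one.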

River: The Online Machine Learning Library

River is a Python package for online machine learning. It provides an array of incremental learning algorithms, both supervised and unsupervised. It is the result of merging two earlier packages, Creme and Scikit-Multiflow.

Like Creme, River has an API similar to that of Scikit-learn, which is why it is also known as the Scikit-learn of online machine learning. It supports a wide range of ML estimators and transformers built specifically for streaming data, including naïve Bayes, tree-ensemble models, factorization machines, linear models, and many more. A complete listing of algorithms is available in the River documentation.

Some of the libraries and frameworks used for training models on data at rest and on data in motion are as follows:

Model Training on Data at Rest	Model Training on Data in Motion
TensorFlow, Scikit-learn, PyTorch, Caffe, Spark	Creme, Scikit-Multiflow, River, SOA, Spark Streaming

Thanks to River, it is now possible to deal with data on the go through online learning, as opposed to offline learning.

By: Dr. Nidhi Arora

Nidhi Arora is a technophile and a fervent scholar with deep expertise in web technologies. She is a dedicated enthusiast of exploring the fundamentals of high-end technologies and has written many research papers in national and international journals.

2 Responses to “River for Online Machine Learning in Python”

Mark Ruffle says:

October 26, 2021 at 11:42 am

Thanks for posting this article. I will go ahead and read more on it.

Ketki Iyer says:

October 26, 2021 at 11:40 am

This is really useful. We have been struggling a lot with streaming data.
