ClowdFlows - data mining workflows on the cloud

ClowdFlows is an open-source, cloud-based platform for the composition, execution, and sharing of interactive machine learning and data mining workflows. The basic idea behind ClowdFlows is to simplify the creation of analysis pipelines by encapsulating complex data analysis steps in simple operations. This approach hides the implementation details of data-parallel pipelines from the user.

The platform core is written in Python using the Django framework. The GUI (graphical user interface) is implemented in HTML and JavaScript, since it is rendered and used in a browser. To distribute processing across several machines, headless instances of ClowdFlows can be started as workers using the Celery distributed task queue with its Django integration, and connected to a message broker such as RabbitMQ.
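The worker arrangement can be pictured with a small stand-in. The sketch below does not use Celery or RabbitMQ; it swaps them for Python's standard `queue.Queue` and threads, purely to show the pattern of one shared task queue feeding several headless workers. All names (`worker`, `run_tasks`, `square`) are hypothetical and not part of ClowdFlows.

```python
import queue
import threading


def worker(tasks, results):
    """Pull tasks from the shared queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        task_id, func, arg = item
        results[task_id] = func(arg)  # each task writes to its own key
        tasks.task_done()


def run_tasks(func, args, num_workers=3):
    """Distribute func(arg) calls over num_workers threads via one queue."""
    tasks = queue.Queue()  # stand-in for the message broker (assumption)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for task_id, arg in enumerate(args):
        tasks.put((task_id, func, arg))
    for _ in threads:
        tasks.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    # Return results in submission order, whichever worker ran them.
    return [results[i] for i in range(len(args))]


def square(x):
    """Hypothetical stand-in for an expensive workflow step."""
    return x * x


print(run_tasks(square, [1, 2, 3, 4]))
```

In an actual deployment the queue would be the message broker and each worker a separate process or machine, but the flow of submitting tasks, letting workers consume them, and collecting results is the same.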

The feature that sets ClowdFlows apart from comparable open-source tools such as RapidMiner, Weka, KNIME, and Orange is its web-based architecture. At run time, the ClowdFlows platform resides on a server (or on a cluster of machines), while its GUI, which is used to construct workflows and issue execution commands, is served as a web application accessible from any modern web browser.

ClowdFlows is flexible and easy to extend: new packages and workflow components are added by writing Python functions, which can be simple or complex. Current packages include Weka algorithms, Orange algorithms, text mining, decision support, natural language processing, and inductive logic programming, among others.
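As a minimal sketch of what such a component might look like, assume each workflow component is a plain Python function that receives its inputs as a dictionary and returns its outputs as a dictionary. The function names, dictionary keys, and the hand-wired chaining below are illustrative assumptions, not the actual ClowdFlows package API.

```python
from collections import Counter


def tokenize(input_dict):
    """Hypothetical component: split raw text into lowercase tokens."""
    text = input_dict["text"]
    return {"tokens": text.lower().split()}


def count_tokens(input_dict):
    """Hypothetical component: count how often each token occurs."""
    return {"counts": Counter(input_dict["tokens"])}


def run_chain(components, initial_inputs):
    """Run components in order, feeding each one's outputs to the next."""
    data = dict(initial_inputs)
    for component in components:
        data = component(data)
    return data


result = run_chain(
    [tokenize, count_tokens],
    {"text": "data mining on the cloud mining"},
)
print(result["counts"]["mining"])
```

Keeping every component to the same dictionary-in, dictionary-out shape is what lets a workflow editor connect arbitrary components without knowing what they compute.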

Work anywhere

ClowdFlows is a cloud application, so you can work anywhere, at any time. No installation is required, and because it is a web application, your work is saved on the server rather than on the client machine.

Web services in workflows

WSDL web services can be used as workflow components: simply enter the WSDL URL of a web service and use its operations as workflow elements. Connect them with other web services or with the provided workflow elements.
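To give a rough idea of what importing a service involves, the sketch below lists the operations declared in a WSDL 1.1 document, which is the information a tool would need before offering them as workflow elements. It uses only Python's standard XML parser on an inline, heavily trimmed sample; the sample document, its operation names, and the `list_operations` helper are hypothetical, and this is not how ClowdFlows itself is known to do it.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, heavily trimmed WSDL document used only for illustration.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleService">
  <portType name="ExamplePortType">
    <operation name="BuildClassifier"/>
    <operation name="ApplyClassifier"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Return the operation names declared in the WSDL's portType elements."""
    root = ET.fromstring(wsdl_text)
    path = "{%s}portType/{%s}operation" % (WSDL_NS, WSDL_NS)
    return [op.get("name") for op in root.findall(path)]


operations = list_operations(SAMPLE_WSDL)
print(operations)
```

A real import would fetch the document from the entered URL and also read each operation's input and output messages, so that the resulting workflow element exposes matching connection points.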

Data mining algorithms

Use the power of data mining through Weka’s algorithms, which have been exposed as WSDL web services. Construct trees, build models, and experiment.

