Orange Data Mining



  • Orange is an open-source tool for data visualization, data mining, and machine learning. It provides a scriptable environment for quickly prototyping new algorithms and testing schemes, built as a group of Python-based modules that sit on top of a core library.
  • Orange is also a set of graphical widgets that use methods from the core library and the Orange modules and provide a convenient user interface. The modules cover a variety of tasks such as pretty-printing of decision trees, bagging, boosting, attribute subset selection, and many more.

  • Orange is intended both for experienced users and analysts in data mining and machine learning who want to create and test their own algorithms while reusing as much code as possible, and for those just entering the field, who can write short Python scripts for data analysis. It is used in bioinformatics, genomic research, biomedicine, and teaching.

  • Orange employs a component-based approach for fast prototyping.
  • It offers a flexible environment for developers, analysts, and data mining specialists. For example, Orange's top-down induction of decision trees is a technique built from numerous components, any of which can be prototyped in Python and used in place of the original one. Orange's core objects and Python modules cover numerous data mining tasks that range from data preprocessing to modeling and evaluation.
  • The guiding principle of Orange is to cover the techniques and perspectives of data mining and machine learning.
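
To make the component idea concrete, here is a minimal sketch in plain Python 3 (no Orange required). It is not Orange's tree learner: the function names `info_gain`, `pure_or_empty`, `build_tree`, and `classify`, and the toy voting rows, are assumptions made up for illustration. The point is only that the split-scoring and stopping components are ordinary functions that can be swapped for our own.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Default scoring component: entropy reduction from splitting on attr."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def pure_or_empty(rows, labels, attrs):
    """Default stopping component: node is pure or no attributes are left."""
    return len(set(labels)) <= 1 or not attrs


def build_tree(rows, labels, attrs, score=info_gain, stop=pure_or_empty):
    """Top-down induction; 'score' and 'stop' are replaceable components."""
    if stop(rows, labels, attrs):
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attrs, key=lambda a: score(rows, labels, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     rest, score, stop)
    return (best, branches)


def classify(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]  # raises KeyError for unseen values
    return tree


# Hypothetical toy data: two votes per member, party label as the class.
rows = [
    {"vote1": "y", "vote2": "n"},
    {"vote1": "y", "vote2": "y"},
    {"vote1": "n", "vote2": "n"},
    {"vote1": "n", "vote2": "y"},
    {"vote1": "n", "vote2": "n"},
]
labels = ["inc", "inc", "bjp", "bjp", "bjp"]

tree = build_tree(rows, labels, ["vote1", "vote2"])

# Swap in our own stopping component: never split nodes with under 10 rows.
stump = build_tree(rows, labels, ["vote1", "vote2"],
                   stop=lambda r, l, a: len(l) < 10 or pure_or_empty(r, l, a))
```

With the default components the tree splits on `vote1`; with the replacement stopping rule the whole tree collapses to a single majority-class leaf, without touching `build_tree` itself.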

Orange Widgets

  • Orange widgets provide a graphical user interface to Orange's data mining and machine learning techniques.
  • Widgets communicate through tokens that are passed from the sender widget to the receiver widget.

  • For example, the Classification Tree widget builds a classification model and sends it to a widget that displays the tree graphically. An evaluation widget may receive a data set from the File widget and other objects, such as learners, from other widgets.
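
The token passing between widgets can be pictured with a small sketch in plain Python 3. This is not Orange's widget framework: the classes `Widget`, `FileWidget`, `MajorityModelWidget`, and `ViewerWidget` are hypothetical stand-ins, and the "model" is just the most frequent label, chosen to keep the example short.

```python
class Widget:
    """Hypothetical widget: receives a token, processes it, forwards the result."""

    def __init__(self, name):
        self.name = name
        self.receivers = []

    def connect(self, receiver):
        """Link this widget's output to another widget's input."""
        self.receivers.append(receiver)
        return receiver  # allows chaining: a.connect(b).connect(c)

    def send(self, token):
        for receiver in self.receivers:
            receiver.receive(token)

    def receive(self, token):
        self.send(self.process(token))

    def process(self, token):
        return token  # default: pass the token through unchanged


class FileWidget(Widget):
    """Source widget: holds a data set and emits it as a token."""

    def __init__(self, name, rows):
        super().__init__(name)
        self.rows = rows

    def load(self):
        self.send(self.rows)


class MajorityModelWidget(Widget):
    """Stand-in for a learner widget: its 'model' is the most frequent label."""

    def process(self, rows):
        labels = [label for _, label in rows]
        return max(set(labels), key=labels.count)


class ViewerWidget(Widget):
    """Sink widget: records every token it is shown."""

    def __init__(self, name):
        super().__init__(name)
        self.shown = []

    def process(self, token):
        self.shown.append(token)
        return token


# Hypothetical data: (vote, party) pairs.
file_widget = FileWidget("File", [("y", "inc"), ("n", "bjp"), ("y", "inc")])
viewer = ViewerWidget("Viewer")
file_widget.connect(MajorityModelWidget("Model")).connect(viewer)

file_widget.load()  # the data token flows File -> Model -> Viewer
```

After `load()` the viewer has received exactly one token, the model produced by the middle widget; none of the widgets needs to know what lies further down the chain.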

Orange Scripting

  • Orange interfaces to Python, an easy-to-use scripting language with clear and powerful syntax and a broad set of additional libraries.

To see how Python and Orange work together, consider a simple script that reads a data set and prints the number of instances and attributes. We will use a classification data set called "voting" from the UCI Machine Learning Repository that records sixteen key votes of each Member of Parliament (MP) in the Parliament of India and labels each MP with a party membership.

import orange

# Load the data set (legacy Orange 2.x scripting API)
data1 = orange.ExampleTable('voting.tab')
print('Instances: %d' % len(data1))
print('Attributes: %d' % len(data1.domain.attributes))

If we store this script in script.py and run it with the shell command "python script.py", making sure that the data file is in the same directory, we get:

 Instances: 543
 Attributes: 16

Let us extend our script so that it uses the same data to build a naïve Bayesian classifier and prints the classification of the first five instances:

# Calling the learner with the data returns a trained classifier
model = orange.BayesLearner(data1)

for i in range(5):
    print(model(data1[i]))

Producing the classification model is easy: we called Orange's object (BayesLearner) and gave it the data set. It returned another object (a naïve Bayesian classifier) that, when given an instance, returns the label of the most probable class.
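
The same "learner returns a classifier" pattern can be sketched in plain Python 3 without Orange. This is an illustration under our own assumptions, not Orange's BayesLearner: the names `naive_bayes_learner` and `NaiveBayesClassifier`, the Laplace smoothing, and the toy rows are all made up for the example.

```python
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Callable model: given an instance, returns the most probable label."""

    def __init__(self, class_counts, value_counts, values_per_attr):
        self.class_counts = class_counts
        self.value_counts = value_counts
        self.values_per_attr = values_per_attr

    def __call__(self, instance):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -1.0
        for label, count in self.class_counts.items():
            score = count / total  # class prior
            for j, value in enumerate(instance):
                # Laplace smoothing keeps unseen values from zeroing the product
                seen = self.value_counts[(j, value, label)]
                score *= (seen + 1) / (count + len(self.values_per_attr[j]))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def naive_bayes_learner(rows, labels):
    """Learner: counts attribute values per class and returns a classifier."""
    class_counts = Counter(labels)
    value_counts = Counter()            # (attribute index, value, label) -> count
    values_per_attr = defaultdict(set)  # attribute index -> set of seen values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, value, label)] += 1
            values_per_attr[j].add(value)
    return NaiveBayesClassifier(class_counts, value_counts, values_per_attr)


# Hypothetical toy data: two votes per member, party label as the class.
rows = [("y", "n"), ("y", "y"), ("n", "n"), ("n", "y"), ("y", "n")]
labels = ["inc", "inc", "bjp", "bjp", "inc"]

toy_model = naive_bayes_learner(rows, labels)  # learner(data) -> classifier
first = toy_model(("y", "n"))                  # classifier(instance) -> label
second = toy_model(("n", "y"))
```

As in the Orange script, training and prediction are two separate calls: the learner is given the data once, and the object it returns can then be applied to any number of instances.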

Output

 inc
 inc
 inc
 bjp
 bjp

To find out what the correct classifications were, we can print the original labels of our five instances:

for i in range(5):
    # Compare the predicted label with the label stored in the data
    print('%s originally %s' % (model(data1[i]), data1[i].getclass()))

What we discover is that the naïve Bayesian classifier has misclassified the third instance:

 inc originally inc
 inc originally inc
 inc originally bjp
 bjp originally bjp
 bjp originally bjp
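
The comparison above can be summarized in a few lines of plain Python 3. The two lists simply repeat the five predicted and original labels printed above; the helper names `accuracy` and `misclassified` are our own and are not part of Orange.

```python
# Labels copied from the output above (first five instances only).
predicted = ["inc", "inc", "inc", "bjp", "bjp"]
original = ["inc", "inc", "bjp", "bjp", "bjp"]


def accuracy(predicted, original):
    """Fraction of instances whose predicted label matches the original."""
    hits = sum(1 for p, o in zip(predicted, original) if p == o)
    return hits / len(original)


def misclassified(predicted, original):
    """Zero-based indices of the instances that were labeled incorrectly."""
    return [i for i, (p, o) in enumerate(zip(predicted, original)) if p != o]


acc = accuracy(predicted, original)        # 4 of 5 correct
wrong = misclassified(predicted, original)  # index 2 is the third instance
```

Five instances are far too few to judge a classifier, so this only illustrates the bookkeeping, not the quality of the model.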

All classifiers implemented in Orange are probabilistic, that is, they estimate the class probabilities. This also holds for the naïve Bayesian classifier, so we may want to know by how much it missed in the third case:

# Ask for the class probabilities of the third instance (index 2)
n = model(data1[2], orange.GetProbabilities)
print('%s : %s' % (data1.domain.classVar.values[0], n[0]))

Here we take into account that Python's indices start at 0, and that the classification model returns a probability vector when the classifier is called with the argument orange.GetProbabilities. Our model estimated a very high probability for inc:

 inc : 0.878529638542
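
To illustrate how one classifier object can return either a label or a probability vector, here is a small sketch in plain Python 3. It does not reproduce Orange's internals: the flags `GET_VALUE` and `GET_PROBABILITIES`, the class `ToyClassifier`, and the likelihood numbers are all hypothetical.

```python
GET_VALUE, GET_PROBABILITIES = 0, 1  # hypothetical result-type flags


class ToyClassifier:
    """Returns a label by default, or a probability vector on request."""

    def __init__(self, class_values, priors, likelihoods):
        self.class_values = class_values  # fixes the order of the vector
        self.priors = priors
        self.likelihoods = likelihoods

    def __call__(self, instance, result_type=GET_VALUE):
        scores = []
        for label in self.class_values:
            score = self.priors[label]
            for value in instance:
                score *= self.likelihoods[label][value]
            scores.append(score)
        total = sum(scores)
        probabilities = [s / total for s in scores]  # normalize to sum to 1
        if result_type == GET_PROBABILITIES:
            return probabilities
        return self.class_values[probabilities.index(max(probabilities))]


# Made-up numbers: probability of a "y" or "n" vote given the party,
# shared by all votes to keep the example short.
toy = ToyClassifier(
    class_values=["inc", "bjp"],
    priors={"inc": 0.5, "bjp": 0.5},
    likelihoods={"inc": {"y": 0.8, "n": 0.2}, "bjp": {"y": 0.3, "n": 0.7}},
)

instance = ("y", "y", "n")
label = toy(instance)                     # most probable class
probs = toy(instance, GET_PROBABILITIES)  # vector ordered like class_values
```

Here `probs[0]` is the probability of the first class value, just as `n[0]` in the Orange script refers to the first value of the class variable.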

