Session

Data Mining Using Orange and Python

Mitchell Smith, Chief Software Architect, Array BioPharma

Track: Python
Date: Wednesday, July 26
Time: 5:20pm - 6:05pm
Location: F151

Orange is an open source data mining package written in C++ and best accessed through Python. It is extensible: new components can be created in either Python or C++. This talk covers the basics of Orange and details its usage through real-world examples. Topics include:
  • How to create predictive models in Orange such as naive Bayes, k-nearest neighbors, support vector machines, logistic regression, classification trees with boosting or bagging, etc.
  • How to utilize model validation techniques such as cross-validation, random sampling, etc.
  • How to preprocess data for modeling using filters, feature subset selection, data categorization, etc.
  • How to create your own learners and classifiers
  • How to load data from tab-delimited and C4.5 files
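
As a rough illustration of two of the topics above (writing your own learner and classifier, and validating a model with cross-validation), here is a minimal sketch in plain Python. It deliberately does not use Orange's API: the names MajorityLearner, MajorityClassifier, and cross_validate, along with the toy "active"/"inactive" data, are hypothetical and chosen only to show the pattern in which a learner is called on training data and returns a classifier.

```python
import random
from collections import Counter


class MajorityClassifier:
    """Classifier (hypothetical): always predicts the label it was built with."""

    def __init__(self, label):
        self.label = label

    def __call__(self, features):
        # The features are ignored; a real classifier would use them.
        return self.label


class MajorityLearner:
    """Learner (hypothetical): called with training rows, returns a classifier."""

    def __call__(self, rows):
        counts = Counter(label for _, label in rows)
        return MajorityClassifier(counts.most_common(1)[0][0])


def cross_validate(learner, rows, folds=5, seed=0):
    """Return the mean accuracy of `learner` over k folds of `rows`."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    accuracies = []
    for i in range(folds):
        # Every folds-th row (offset i) is held out; the rest is training data.
        test = shuffled[i::folds]
        train = [row for j, row in enumerate(shuffled) if j % folds != i]
        classifier = learner(train)
        correct = sum(classifier(features) == label for features, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds


# Made-up toy data: (features, label) pairs, 30 "active" and 10 "inactive".
rows = [((i,), "active") for i in range(30)]
rows += [((i,), "inactive") for i in range(10)]

accuracy = cross_validate(MajorityLearner(), rows, folds=5)
print(f"Mean cross-validated accuracy: {accuracy:.2f}")
```

A majority-class learner like this one is mainly useful as a baseline: any model worth keeping should beat its cross-validated accuracy on the same folds.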

Real-world examples will be taken from data mining work in the pharmaceutical domain. A basic knowledge of data mining techniques is assumed.