Session
Data Mining Using Orange and Python
Mitchell Smith, Chief Software Architect, Array BioPharma
Track: Python
Date: Wednesday, July 26
Time: 5:20pm - 6:05pm
Location: F151
Orange is an open-source data mining package written in C++ but best accessed through Python. It is extensible: new components can be created in either Python or C++. This talk covers the basics of Orange and details its usage through real-world examples. Topics include:
- How to create predictive models in Orange such as naive Bayes, k-nearest neighbors, support vector machines, logistic regression, classification trees with boosting or bagging, etc.
- How to utilize model validation techniques such as cross-validation, random sampling, etc.
- How to preprocess data for modeling with filters, feature subset selection, data categorization, etc.
- How to create your own learners and classifiers
- How to load data from tab-delimited and C4.5 files
Real-world examples will be taken from data mining work in the pharmaceutical domain. A basic knowledge of data mining techniques is assumed.
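
As a rough illustration of two of the topics above, writing your own learner and classifier and checking a model with cross-validation, here is a minimal sketch in plain Python. It does not use Orange's API; the names MajorityLearner, MajorityClassifier, and cross_validate are hypothetical, and the toy "active"/"inactive" data is invented for the example. The only idea it borrows is the general pattern of a learner being a callable that takes training data and returns a classifier.

```python
import random
from collections import Counter


class MajorityClassifier:
    """Hypothetical classifier: always predicts one fixed label."""

    def __init__(self, label):
        self.label = label

    def __call__(self, features):
        # The features are ignored; this is only a baseline model.
        return self.label


class MajorityLearner:
    """Hypothetical learner: called with training rows, returns a classifier."""

    def __call__(self, rows):
        # Each row is a (features, label) pair; pick the most common label.
        counts = Counter(label for _, label in rows)
        return MajorityClassifier(counts.most_common(1)[0][0])


def cross_validate(learner, rows, folds=5, seed=0):
    """Return the mean accuracy of `learner` over k disjoint folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # shuffle a copy, reproducibly
    scores = []
    for i in range(folds):
        # Every folds-th row (offset i) is held out; the rest is training data.
        test = rows[i::folds]
        train = [row for j, row in enumerate(rows) if j % folds != i]
        classifier = learner(train)
        correct = sum(classifier(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


# Invented toy data: 30 "active" rows and 10 "inactive" rows.
data = [((i,), "active" if i % 4 else "inactive") for i in range(40)]
accuracy = cross_validate(MajorityLearner(), data, folds=5)
print(accuracy)
```

Because 30 of the 40 toy rows are labeled "active", this baseline always predicts "active" and its cross-validated accuracy is 0.75; any real model would be expected to beat that figure.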