#SC-04-08 More Data Mining with Weka

by Foundations Academy
Course level: All Levels
Categories: Science
Duration: 18h
Total Enrolled: 4
Last Update: September 27, 2021

About Course

The University of Waikato

Description

Learn how to process, analyse, and model large data sets

On this course, led by the University of Waikato where Weka originated, you’ll be introduced to advanced data mining techniques and skills.

Following on from the first Data Mining with Weka course, you’ll learn to process a dataset with 10 million instances and mine a 250,000-word text dataset.

You’ll analyse a supermarket dataset representing 5,000 shopping baskets and learn about filters for preprocessing data, attribute selection, classification, clustering, association rules, and cost-sensitive evaluation.

You’ll also explore learning curves and how to automatically optimize learning parameters.

What topics will you cover?

Running large-scale data mining experiments
Constructing and executing knowledge flows
Processing very large datasets
Analyzing collections of textual documents
Mining association rules
Preprocessing data using a range of filters
Automatic methods of attribute selection
Clustering data
Taking account of different decision costs
Producing learning curves
Optimizing learning parameters in data mining

Who will you learn with?

Ian Witten

I grew up in Ireland, studied at Cambridge, and taught computer science at the Universities of Essex in England and Calgary in Canada before moving to paradise (aka New Zealand) 25 years ago.

Who developed the course?

The University of Waikato

Sitting among the top 3% of universities world-wide, The University of Waikato prepares students to think critically and to show initiative in their learning.

What Will I Learn?

Compare the performance of different mining methods on a wide range of datasets
Demonstrate how to set up learning tasks as a knowledge flow
Solve data mining problems on huge datasets
Apply equal-width and equal-frequency binning for discretizing numeric attributes
Identify the advantages of supervised vs unsupervised discretization
Evaluate different trade-offs between error rates in 2-class classification
Classify documents using various techniques
Debate the correspondence between decision trees and decision rules
Explain how association rules can be generated and used
Discuss techniques for representing, generating, and evaluating clusters
Perform attribute selection by wrapping a classifier inside a cross-validation loop
Describe different techniques for searching through subsets of attributes
Develop effective sets of attributes for text classification problems
Explain cost-sensitive evaluation, cost-sensitive classification, and cost-sensitive learning
Design and evaluate multi-layer neural networks
Assess the volume of training data needed for mining tasks
Calculate optimal parameter values for a given learning system
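
To make the two binning schemes named in the list above concrete, here is a minimal Python sketch. It is an illustration only, not Weka's implementation: the function names and the sample data are assumptions, and the equal-frequency version simply splits the sorted values by rank, so tied values may end up in different bins.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything goes into bin 0.
        return [0] * len(values)
    # The maximum value would fall into bin k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign values to k bins that hold roughly equal numbers of instances."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical numeric attribute with one extreme value.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_bins(ages, 3))
print(equal_frequency_bins(ages, 3))
```

With a skewed attribute like this one, equal-width binning puts almost every instance into the first bin, while equal-frequency binning spreads the instances evenly across the bins.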

Topics for this course

14 Lessons, 18h

Hello again

This practical course on more advanced data mining follows on from Data Mining with Weka. You'll become an expert Weka user, and pick up many new techniques and principles of data mining along the way.

What will you learn? (00:03:05)

About this course

Welcome! Please introduce yourself

First, install Weka

Well, are you ready for this?

What are Weka’s other interfaces for?

Each week we’ll focus on a couple of “Big Questions” relating to data mining. This is the first Big Question for this week.

Exploring the Experimenter

You can use the Experimenter to measure the performance of classification algorithms on datasets, or to determine whether one classifier performs better (or runs faster) than another. In the Explorer, such comparisons can be tedious.

Comparing classifiers

The Experimenter can be used to compare classifiers. The "null hypothesis" is that they perform the same. To show that one is better than the other, we must *reject* this hypothesis at a given level of statistical significance.
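
As an illustration of the hypothesis test described above, here is a minimal Python sketch of a plain textbook paired t-test on per-fold accuracies. The accuracy figures are made up, and the function name is hypothetical. The Experimenter has its own built-in tester, which this sketch does not attempt to reproduce.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equally long lists of scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical percent-correct figures from 10 matched runs of two classifiers.
acc_a = [81.2, 79.5, 83.0, 80.1, 82.4, 78.9, 81.7, 80.6, 82.0, 79.8]
acc_b = [79.0, 78.8, 80.5, 79.9, 80.0, 78.1, 79.5, 80.2, 79.9, 78.0]

t = paired_t_statistic(acc_a, acc_b)

# Two-sided critical value of Student's t at the 5% level with 9 degrees of freedom.
CRITICAL_T = 2.262
reject_null = abs(t) > CRITICAL_T
print(f"t = {t:.3f}, reject null hypothesis: {reject_null}")
```

If the statistic exceeds the critical value in magnitude, the null hypothesis that the two classifiers perform the same is rejected at the 5% level.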

The Knowledge Flow interface

The Knowledge Flow interface is an alternative to the Explorer. You can lay out filters, classifiers, and evaluators on a 2D canvas and connect them up in different ways. Data and classification models flow through the diagram!

Using the Command Line

You can do everything the Explorer does (and more) from the command line. One advantage is that you get more control over memory usage. To access the definitive source of Weka documentation you need to learn to use JavaDoc.

Can Weka process big data?

This week's second Big Question!

Working with big data

The Explorer can handle pretty big datasets, but it has limits. However, the Command Line Interface does not: it works incrementally whenever it can. Some classifiers can handle arbitrarily large datasets.
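
To show what working incrementally can mean, here is a minimal Python sketch of a classifier that is updated one instance at a time, so it never needs the whole dataset in memory. It is a simplified Naive Bayes for nominal attributes with add-one smoothing, written for illustration only. The class name, method names, and the tiny in-memory stream (a stand-in for reading a very large file row by row) are all assumptions and are not Weka code.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Naive Bayes for nominal attributes, trained one instance at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.value_counts = defaultdict(int)  # (label, attr index, value) -> count
        self.attr_values = defaultdict(set)   # attr index -> distinct values seen

    def update(self, attributes, label):
        """Fold a single instance into the counts, then forget it."""
        self.class_counts[label] += 1
        for i, value in enumerate(attributes):
            self.value_counts[(label, i, value)] += 1
            self.attr_values[i].add(value)

    def predict(self, attributes):
        """Return the label with the highest smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(attributes):
                count = self.value_counts.get((label, i, value), 0)
                # Add-one (Laplace) smoothing avoids log(0) for unseen values.
                score += math.log((count + 1) / (n + len(self.attr_values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def instance_stream():
    """Hypothetical stand-in for reading a huge dataset row by row."""
    yield ("sunny", "hot"), "no"
    yield ("sunny", "mild"), "no"
    yield ("rainy", "mild"), "yes"
    yield ("overcast", "hot"), "yes"
    yield ("rainy", "cool"), "yes"


model = IncrementalNaiveBayes()
for attributes, label in instance_stream():
    model.update(attributes, label)

print(model.predict(("rainy", "cool")))
```

The memory used depends on the number of distinct classes and attribute values, not on the number of instances, which is why a learner of this kind can in principle keep up with arbitrarily long streams.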

Student Feedback

4.8 (Total 4 Ratings)

Ngo Thanh Danh

4 years ago

The three parts of the course I like most are conducting large-scale data mining experiments, building and executing knowledge flows, and dealing with large datasets. I think they are all very good.

Praneeth Kumar

4 years ago

In this course you analyze a supermarket dataset representing 5,000 shopping baskets and learn about filters for preprocessing data, attribute selection, classification, clustering, association rules, and cost-sensitive evaluation. It's a great course.

Snehal Patel

4 years ago

This course follows on from the first Data Mining with Weka course. Here we are supported to process a dataset with 10 million instances and mine a 250,000-word text dataset. The development of the technology is amazing.

Barry Mosley

4 years ago

This course, led by the University of Waikato, the birthplace of Weka, introduces advanced data mining techniques and skills.

$49

Material Includes

Official Certificate

Requirements

Before the course starts, download the free Weka software. It runs on any computer, under Windows, Linux, or Mac. It has been downloaded millions of times and is being used all around the world.

Target Audience

This course is aimed at anyone who deals in data professionally or is interested in furthering their professional or academic skills in data science.
This course follows on from Data Mining with Weka and it’s recommended that you complete that course first unless you already have a rudimentary knowledge of Weka.
As with the previous course, it involves no computer programming, although you need some experience with using computers for everyday tasks.
High school maths is more than enough; some elementary statistics concepts (means and variances) are assumed.