Top 10 Data Mining Algorithms – C4.5

What does C4.5 do?

C4.5 constructs a classifier in the form of a decision tree. It is a supervised learning method: the training data must already be labeled with classes.
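To make "a classifier in the form of a decision tree" concrete, here is a minimal sketch of how such a tree could be represented and used once it has been built. The tree, the attribute names (outlook, windy) and the class labels are made up for illustration; this is not output from a real C4.5 run.

```python
# Hypothetical hand-written tree: internal nodes are dicts that test one
# attribute, leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}


def classify(node, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify(tree, {"outlook": "sunny", "windy": False}))
```

Classifying a new record is just a walk from the root to a leaf, which is also why the result is easy to explain to a person.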

How is C4.5 different from other decision tree systems?

  1. C4.5 uses information gain, normalized as the gain ratio, to decide which attribute to test first at each decision node.
    1. Information gain measures how much splitting on an attribute reduces the uncertainty (entropy) about the class, so the most informative attribute is chosen first.
  2. It applies a single-pass pruning process to mitigate overfitting.
  3. It can work with both continuous and discrete data.
  4. Finally, it has its own mechanism for handling incomplete data, so records with missing values can still be used.
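The attribute-selection step can be sketched in a few lines of Python. This is only an illustration of the entropy, information gain and gain ratio calculations on a made-up four-row data set, not a full C4.5 implementation: it leaves out pruning, continuous attributes and missing values. The function names and the toy data are assumptions chosen for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:  # attribute takes a single value, so it cannot split
        return 0.0
    return information_gain(rows, labels, attr) / split_info


# Made-up toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

best = max(["outlook", "windy"], key=lambda a: gain_ratio(rows, labels, a))
print(best)
```

Dividing by the split entropy penalizes attributes that break the data into many small groups, which plain information gain tends to favor.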

Why use C4.5?

Arguably, the best selling point of decision trees is their ease of interpretation and explanation. They are also quite fast and widely used, and their output is human readable.
