Data mining is the process of extracting useful patterns and information from large amounts of data. There are many different methods that serve different purposes, each with its own pros and cons. Here are some examples we learned about in class:
Classification
This method is used to sort data into predefined groups, or classes, so that information can be categorised, or 'classified', automatically. It is a supervised method: the model must be trained on examples that have already been labelled with the correct class. One common technique is the decision tree. This is very similar to a flow chart and works through a series of yes or no questions to reach a conclusion. Classification can also be used to detect spam and fraudulent emails. How cool is that?
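To show how a decision tree learns its yes or no questions, here is a minimal sketch in Python. It assumes each email is described by three hypothetical true/false features (`has_link`, `unknown_sender`, `says_free`) and picks each question using Gini impurity, which is one common splitting rule (real libraries offer others). The labelled training emails are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_question(rows, labels, features):
    """Return the yes/no feature whose split gives the purest groups, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        yes = [lab for row, lab in zip(rows, labels) if row[f]]
        no = [lab for row, lab in zip(rows, labels) if not row[f]]
        if not yes or not no:
            continue  # this question does not split the data at all
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:
            best, best_score = f, score
    return best

def build_tree(rows, labels, features, max_depth=3):
    """Recursively build a tree of yes/no questions; leaves hold a class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    f = best_question(rows, labels, features)
    if f is None:
        return majority
    rest = [x for x in features if x != f]
    yes = [(r, lab) for r, lab in zip(rows, labels) if r[f]]
    no = [(r, lab) for r, lab in zip(rows, labels) if not r[f]]
    return {
        "question": f,
        "yes": build_tree([r for r, _ in yes], [lab for _, lab in yes], rest, max_depth - 1),
        "no": build_tree([r for r, _ in no], [lab for _, lab in no], rest, max_depth - 1),
    }

def classify(tree, row):
    """Follow the answers down the tree until a leaf (a label) is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["question"]] else tree["no"]
    return tree

# Hypothetical labelled training emails (supervised learning needs these labels).
FEATURES = ["has_link", "unknown_sender", "says_free"]
emails = [
    {"has_link": True, "unknown_sender": True, "says_free": True},
    {"has_link": True, "unknown_sender": True, "says_free": False},
    {"has_link": False, "unknown_sender": True, "says_free": True},
    {"has_link": True, "unknown_sender": False, "says_free": False},
    {"has_link": False, "unknown_sender": False, "says_free": False},
    {"has_link": False, "unknown_sender": False, "says_free": True},
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

tree = build_tree(emails, labels, FEATURES)
print(tree)  # {'question': 'unknown_sender', 'yes': 'spam', 'no': 'ham'}
print(classify(tree, {"has_link": False, "unknown_sender": True, "says_free": False}))  # spam
```

On this toy data a single question is enough, because "unknown sender" separates the spam from the genuine emails perfectly. With messier data the tree keeps asking further questions, up to `max_depth`.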
Clustering
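This is where data is grouped into clusters, so that points in the same cluster have similar values. Unlike classification, it does not need labelled training data. A popular method, k-means, starts with a chosen number of cluster centres, assigns each data point to its nearest centre, and then moves each centre to the mean value of the points assigned to it, repeating these steps until the clusters stop changing. K-means is often used to handle large data sets (see diagram 1).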
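The k-means loop described above (assign each point to the nearest centre, then move each centre to the mean of its points) can be sketched in Python like this. It is a minimal version for 2D points, assuming we simply use the first `k` points as the starting centres; real implementations usually pick starting centres more carefully. The points are made-up example data.

```python
import math

def kmeans(points, k, iterations=100):
    """Group points into k clusters; returns (centres, clusters)."""
    # Assumption: use the first k points as the initial centres.
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points
        # (an empty cluster keeps its old centre).
        new_centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break  # nothing moved, so the clusters have settled
        centres = new_centres
    return centres, clusters

points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centres, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

Note that `k` has to be chosen in advance, which is one of the main drawbacks of k-means.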
Prediction
This method takes existing data and uses it to 'forecast' future results. It can identify trends and has many applications in modern society, such as predicting weather patterns or making sales projections. It works by looking for patterns and sequences in large historical data sets and using them to estimate the most likely next value. Pattern recognition has become very effective thanks to the huge growth in raw data and advances in data mining capabilities.
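One of the simplest ways to turn a trend into a forecast is to fit a straight line through past values and extend it by one step. The sketch below does this with ordinary least squares; the monthly sales figures are hypothetical, and real forecasting (weather in particular) uses far more complex models than a straight line.

```python
def fit_trend(values):
    """Fit y = a + b*t by least squares, where t = 0, 1, 2, ... (needs 2+ values)."""
    n = len(values)
    t_mean = (n - 1) / 2          # mean of 0, 1, ..., n-1
    y_mean = sum(values) / n
    # Slope: how much y changes on average for each step in t.
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean       # intercept: the line passes through the means
    return a, b

def forecast_next(values):
    """Extend the fitted trend line one step past the last observed value."""
    a, b = fit_trend(values)
    return a + b * len(values)

monthly_sales = [100, 110, 120, 130]  # hypothetical data with a steady upward trend
print(forecast_next(monthly_sales))   # 140.0
```

A straight line only captures a steady trend; seasonal patterns, such as higher sales every December, need a model that looks at longer sequences.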
Neural Networks
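This is a method loosely modelled on the way neurons in the human brain are connected. It is built from nodes (artificial neurons) joined by weighted connections that pass signals from one layer to the next. The layout consists of an input layer, one or more hidden layers, and an output layer. Neural networks have many applications and often outperform other methods when dealing with unstructured data such as images, audio and text (see diagram 2).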
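To see how signals flow from the input layer through a hidden layer to the output layer, here is a minimal forward pass in Python. Each node takes a weighted sum of its inputs, adds a bias and applies a sigmoid function. The weights here are hand-picked for illustration so that the network computes XOR (output 1 when exactly one input is 1); in practice a network learns its weights from training data, usually with backpropagation.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each node gets a weighted sum of every input plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

# Hand-picked (hypothetical) weights: hidden node 1 behaves like OR,
# hidden node 2 like NAND, and the output node combines them like AND.
HIDDEN_W = [[20, 20], [-20, -20]]
HIDDEN_B = [-10, 30]
OUTPUT_W = [[20, 20]]
OUTPUT_B = [-30]

def forward(x1, x2):
    """Pass two inputs through the hidden layer and then the output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(forward(x1, x2)))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

XOR is a classic example because no single node can compute it on its own; the hidden layer is what makes it possible.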
Outlier Detection
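These methods look through a data set for anomalies: values that are very different from the rest of the data. One simple approach uses the standard deviation, flagging any value that lies more than a chosen number of standard deviations from the mean. Because this check is automated, it can be applied to very large data sets, so unusual figures can be identified and examined quickly. Removing or correcting genuine errors helps clean the data set, which can lead to more accurate results.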
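The standard deviation approach can be sketched in a few lines of Python using the built-in `statistics` module. Each value's distance from the mean is measured in standard deviations (its z-score), and anything beyond a threshold is flagged. The threshold of 2 and the sensor readings are assumptions for this example; 3 is also a common choice, but on a small data set one extreme value inflates the standard deviation enough that it may not reach 3.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 12, 11, 13, 12, 11, 50]  # hypothetical data with one suspicious value
print(find_outliers(readings))           # [50]
```

Flagged values should be examined rather than deleted automatically, since an outlier is sometimes the most interesting point in the data (for example, a fraudulent transaction).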