We are generating data all the time. Unbeknownst to most people, data is stored every time we use a credit card, post on social media, or run a web search. The specific details of your search history or Amazon purchases may be interesting on their own, but the patterns within this data also hold a lot of information. This brings us to the field of data mining.
Data mining, also known as “Knowledge Discovery in Data,” is the purposeful identification of implicit patterns that are found within large databases.
“Data mining can be considered as a task of performing advanced analysis on data,” said Carson Leung, computer science professor at the University of Manitoba.
Leung’s Database and Data Mining Laboratory at the U of M focuses on developing ways to detect frequently occurring patterns and abnormal items within large databases. Leung’s research attempts to incorporate more human control over the data mining algorithms, letting the humans do the hard thinking and letting the computers do the hard work.
Data mining is a multidisciplinary field, incorporating elements from fields like artificial intelligence, machine learning, data visualization, and statistics.
Data mining tasks include detecting relationships between different variables, clustering similar data into groups, and summarizing the data concisely. These tasks are often far too laborious for a person to do by hand, but computers handle them easily.
These tasks also allow data miners to predict likely outcomes. Governments and corporations can learn a great deal more about you as an individual or as a member of a larger group through data mining.
Data mining is what makes it possible for Walmart to analyze its millions of daily transactions.
When you shop for groceries at a supermarket, the store will often record which items customers purchase most frequently. That sort of information is easy to find and retrieve, but data mining allows the store to determine the relationships between individual purchases.
“For example, store managers may find that customers often purchase bread and butter, which you may expect,” said Leung. “However, there are some other less obvious patterns too.”
These stores can utilize customer behavioural patterns to increase their sales.
“On the one hand, if many customers purchase items A and B frequently together, the store may place them together for the customer’s convenience,” said Leung. “On the other hand, the store may put these items apart so that customers would likely walk between the two items and potentially purchase more items along the way.”
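To make the idea of items "frequently purchased together" concrete, here is a minimal Python sketch that counts how often each pair of items appears in the same basket. The sample baskets, the function name `frequent_pairs`, and the `min_count` threshold are all made up for illustration. This is plain pair counting, not the algorithms developed in Leung's lab.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def frequent_pairs(transactions, min_count):
    """Return item pairs that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that ("bread", "butter") and ("butter", "bread") count as one pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

print(frequent_pairs(transactions, min_count=3))
```

With this sample data, bread and butter is the only pair bought together in at least three baskets. Lowering `min_count` brings out the less obvious pairs.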
Going a step further, stores can monitor customer behaviour on an individual basis through loyalty cards, which let them analyze each shopper's purchases and direct personalized promotions to targeted customers.
The grocery example focuses on how frequently a behaviour occurs, but data mining can also be used to analyze behavioural sequences, meaning the order in which behaviours happen.
The amount of data being stored about us is equal parts beneficial and concerning. You may wonder why every single purchase you make with a credit card has to be monitored, but data mining allows for the detection of anomalies. Credit card companies can use data mining to spot fraud by detecting uncharacteristic purchases.
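One simple way to picture "uncharacteristic purchases" is to compare a new charge against a cardholder's past spending. The Python sketch below flags a charge that sits more than a chosen number of standard deviations from the historical average. The purchase amounts, the function name `flag_unusual`, and the threshold of 3 are assumptions for illustration only. Real fraud detection systems are far more involved, and nothing here describes any particular company's method.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amount, threshold=3.0):
    """Return True if new_amount is far from the typical past purchase.

    history: past purchase amounts (needs at least two values).
    threshold: how many standard deviations counts as "unusual" (assumed value).
    """
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past purchase was identical, so any different amount stands out.
        return new_amount != average
    return abs(new_amount - average) / spread > threshold

# Hypothetical purchase history for one cardholder, in dollars.
past_purchases = [23.50, 41.00, 18.75, 36.20, 29.90, 44.10, 25.00]

print(flag_unusual(past_purchases, 35.00))    # an ordinary purchase
print(flag_unusual(past_purchases, 1200.00))  # far outside the usual range
```

The same idea extends beyond amounts: a purchase in an unusual location or at an unusual time could be scored against past behaviour in a similar way.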
Facebook uses data mining to find people you may know.
“If you put on your profile that you are an undergraduate student here at the U of M who started in a particular year, then Facebook can mine their databases and find people with similar backgrounds,” said Leung.
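As a rough illustration of finding "people with similar backgrounds," the Python sketch below counts how many profile fields two people share and keeps the closest matches. The profiles, field names, and the `similar_profiles` function are hypothetical. This is not a description of how Facebook's system works.

```python
def similar_profiles(me, others, min_shared=2):
    """Return (name, shared_field_count) for profiles sharing enough fields with me."""
    matches = []
    for other in others:
        # Count fields (other than the name) where both profiles hold the same value.
        shared = sum(
            1 for field, value in me.items()
            if field != "name" and other.get(field) == value
        )
        if shared >= min_shared:
            matches.append((other["name"], shared))
    # Show the strongest matches first.
    return sorted(matches, key=lambda match: match[1], reverse=True)

# Hypothetical profiles; the start years are made-up sample values.
me = {"name": "You", "school": "U of M", "level": "undergraduate", "start_year": 2012}
others = [
    {"name": "Alex", "school": "U of M", "level": "undergraduate", "start_year": 2012},
    {"name": "Sam", "school": "U of M", "level": "graduate", "start_year": 2012},
    {"name": "Jo", "school": "Elsewhere", "level": "undergraduate", "start_year": 2010},
]

print(similar_profiles(me, others))
```

In this sample, Alex matches on all three fields and Sam on two, while Jo shares only one and is left out.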
Data mining can be used in other scientific fields as well. By analyzing the patterns of variation within our DNA sequences, we can figure out how they relate to things like disease.