Monday, 6 January 2025

Data Mining Methods

Data mining is the process of extracting information from data. There are many different methods that serve different purposes, each with their own pros and cons. Here are some examples we learned about in class:

Classification

This method is used to sort data into different groups. It categorises information in order to 'classify' data, and it is supervised, meaning the model is trained on examples that already have labels. One technique used for this is a decision tree. This is very similar to a flow chart and goes through a series of yes or no questions to draw conclusions. Classification is also able to detect spam and fraudulent emails. How cool is that?
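To show the flow chart idea, here is a minimal sketch of a decision tree written by hand. The questions, field names, and the 3-link threshold are all made up for illustration; a real decision tree would learn its questions from labelled training data rather than have them typed in.

```python
# Hypothetical hand-written decision tree that sorts emails into "spam" or "not spam".
# The questions and the threshold of 3 links are assumptions for illustration only.

def classify_email(email):
    """Walk through yes/no questions, like following a flow chart."""
    if email["sender_known"]:
        return "not spam"
    if email["mentions_prize"]:
        return "spam"
    if email["link_count"] > 3:
        return "spam"
    return "not spam"

# Made-up example emails
emails = [
    {"sender_known": True, "mentions_prize": False, "link_count": 1},
    {"sender_known": False, "mentions_prize": True, "link_count": 0},
    {"sender_known": False, "mentions_prize": False, "link_count": 5},
    {"sender_known": False, "mentions_prize": False, "link_count": 0},
]

labels = [classify_email(e) for e in emails]
print(labels)
```

Each email ends up in exactly one group, which is the whole point of classification.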

Clustering

This is where data is grouped into clusters, with the points in each cluster having similar values. One common method, k-means, starts with a set of cluster centres, assigns each data point to the nearest centre, and then moves each centre to the mean value of its points, repeating until the groups settle. K-means is often used to handle large data sets. (see diagram 1).
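Here is a rough sketch of k-means on a single column of numbers, just to show the "group by nearest mean" idea. The data, the starting centres, and the fixed number of rounds are assumptions; real tools choose starting centres more carefully and work with many columns at once.

```python
def k_means(points, centres, rounds=10):
    """Group 1-D points around the nearest centre, then move each centre to its group's mean."""
    for _ in range(rounds):
        clusters = [[] for _ in centres]
        for p in points:
            # Find the index of the centre closest to this point.
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (leave it alone if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

points = [1, 2, 3, 10, 11, 12]          # made-up data
centres, clusters = k_means(points, centres=[1, 12])
print(centres)
print(clusters)
```

With this data the low numbers and the high numbers settle into two separate clusters.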

Prediction

This method takes existing data and uses it to 'forecast' future results. This can identify trends and has many applications in modern society, like predicting weather patterns or sales projections. It operates by looking at sequences within large data sets to confidently predict the next value. Pattern recognition has become very effective due to the influx of raw data and progression of data mining capabilities.
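One simple way to 'forecast' is to fit a straight line through past values and extend it by one step. This is only a sketch with made-up sales figures, and a straight-line trend is my own assumption; real prediction models are usually far more involved.

```python
def fit_line(ys):
    """Least-squares straight line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

monthly_sales = [100, 110, 120, 130]    # made-up figures
slope, intercept = fit_line(monthly_sales)

# Extend the trend one step past the last known month.
next_month = slope * len(monthly_sales) + intercept
print(next_month)
```

Because the made-up sales rise by the same amount each month, the forecast simply continues that pattern.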

Neural Networks

This is a method that is meant to simulate the inner workings of the human brain. It creates pathways using nodes. The layout consists of an input layer, one or more hidden layers, and an output layer. Neural networks have multiple applications and often outperform other methods when dealing with unstructured data. (see diagram 2).
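The three layers can be sketched as a tiny network that passes numbers forward from input to output. The weights, biases, and the choice of the sigmoid function here are made up for illustration; a real network would learn its weights during training.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each node takes a weighted sum of the inputs, adds a bias, and applies sigmoid."""
    return [sigmoid(sum(w * i for w, i in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]

# Made-up weights: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

inputs = [1.0, 0.5]                                     # input layer
hidden = layer(inputs, hidden_weights, hidden_biases)   # hidden layer
output = layer(hidden, output_weights, output_biases)   # output layer
print(hidden, output)
```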

Outlier Detection

This operates by looking at data sets and finding anomalies within the data. This can be done by using standard deviation to flag values that sit far from the mean, but on a large scale. Outlying figures can be identified and examined quickly. This can help clean a data set, resulting in more accurate results.
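A small sketch of the standard deviation approach: flag any value that sits more than a chosen number of standard deviations from the mean. The readings and the cut-off of 2 standard deviations are assumptions for illustration, not a fixed rule.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)      # population standard deviation
    if sd == 0:
        return []                       # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]   # made-up data with one odd value
outliers = find_outliers(readings)
print(outliers)
```

Here the single very large reading is the only value flagged, so it can be checked before it skews any results.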




References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-14


Implications of Big Data for Individuals

Big Data is an effective tool and has its uses. However, some of the effects have lasting impact on individuals. Here are some of the implications we learned about in class:

News sources

Unreliable news websites contain biased information or may be misinformed. This can be used to create fearmongering and generate a large number of clicks. Individuals may come across these sites in their spare time and become misinformed, especially young people who are impressionable as well as older people who aren't as familiar with technology. This directly targets people at the source, and makes it hard to distinguish true from false. This misleads people and spreads inaccurate information.

Digital Exclusion

This is a phenomenon in which certain people are unable to access technologies, typically affecting people with disabilities, the elderly, and those with poor finances. This means there are not as many opportunities available for these people, especially nowadays when most things, like job applications, are online. Overall, this can result in limited access to educational materials, health services, and social interaction. This is not cool.

Lack of freedom

Being constantly monitored can induce a 'chilling effect' in which people act with less freedom. This is because they are aware their movements are being recorded. This can alter behaviour, and people don't necessarily act as they would normally. People are limited in their choices and do not like to be restricted. The surge of Big Data over the years has accelerated so much that it tracks everything we do on a daily basis, including our habits. It knows everything about us. This is daunting. As a result, everyone acts with a little bit more caution. It's weird to imagine what the world would look like today without the acceleration of Big Data and surveillance.

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-11

Implications of Big Data for Society

In class we looked at the potential implications on society caused by Big Data. While Big Data can be a beneficial tool, it can be unpredictable and cause issues when utilised. Its relatively new implementation can have adverse effects on society as a whole. Here are a few examples that we looked at:

Automated decision making

Algorithms make assumptions and aren't always accurate. When deciding an outcome, automation can overlook certain aspects. Consider an automated system conducting an online job interview, or marking an exam. Artificial intelligence in particular can be poor at making deductions based on user input, and even when trained it still makes errors. Look at the YouTube video at the end for a funny example where an AI has to collect user input/clues and determine a target country. These errors can hinder society by hiring the wrong people for the job or giving incorrect passes and fails. These decisions can also lead to filter bubbles or echo chambers, in which users only see information they already agree with. This leads to fragmentation and lots of small groups with opposing ideas, in contrast to one big group where everyone is on the same page.

Cultural shifts

The use of Big Data is changing how people behave and how they socialise within their groups. Big Data can be weaponised to sculpt societies to behave a certain way. The release of Facebook in the country of Myanmar sparked controversy when it promoted hate speech against local ethnic groups in the region. This was actively promoted, perhaps in pursuit of profit, clicks, and user interactions. This caused violence and resulted in thousands taking refuge in neighbouring countries. This shows that Big Data can be used to shape society and can be dangerous. This is not cool.

Instability

Many uses of Big Data are controversial, even with the presence of the GDPR. Most people using Big Data are large companies or incredibly wealthy individuals. With such a powerful tool at their disposal they can exert their influence in many ways, whether it is providing better opportunities to certain groups in society, oppression, or using data maliciously. This can weaken the social structure while maintaining complete control, resulting in social instability.

References:

https://youtu.be/iOfYZ-wMfNA?si=-f0Yhj4KmhzEE2Qi
https://www.bbc.co.uk/news/blogs-trending-45449938
https://ilearn.fife.ac.uk/course/view.php?id=9751#section-10

Sunday, 15 December 2024

Applications of Big Data in Business

In class we learned how businesses big and small utilise Big Data in a business environment.

Amazon, one of the biggest retailers in the world, relies on Big Data to promote products. This is done by comparing searches and clicks on similar products, as well as using geographical information to target region-specific requirements. For example, someone who bought a notebook will most likely require stationery. As for the region, it will promote warm winter clothing in areas with cold climates, while it would be less likely to promote shorts or tank tops. This is done through algorithms that compare other users' buying habits. There are many factors at play, but the basic ones would be age, sex, occupation, and lifestyle. Reviews are a big part as well. Poorly rated products are less likely to be promoted. Big Data is used to maximise profits and encourage consumers to buy more.
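The idea of comparing other users' buying habits can be sketched by counting which products are bought alongside a given item. This is not Amazon's actual algorithm, which is not described in class; the baskets and the `recommend` function are made up for illustration.

```python
from collections import Counter

# Made-up purchase histories, one set of products per customer.
baskets = [
    {"notebook", "pens"},
    {"notebook", "pens", "ruler"},
    {"notebook", "ruler"},
    {"shorts", "sunglasses"},
]

def recommend(item, baskets, top_n=2):
    """Suggest the products most often bought together with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})
    # Sort by how often each product appears (most first), then by name to break ties.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return ranked[:top_n]

suggestions = recommend("notebook", baskets)
print(suggestions)
```

Someone buying a notebook gets stationery suggested, while the summer items never come up because nobody bought them together with a notebook.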

Facebook also uses Big Data. Likes, comments, and cookies provide valuable data that can be extracted. This alone can be used to build a picture of the type of person who is behind the keyboard, even if little external information is given. This can be quite intimidating, especially if it is used maliciously. However, Facebook mostly uses this data to target advertisements and make recommendations. Furthermore, they use it to analyse trends in user behaviour.

Google utilises Big Data in Google Maps. The algorithms identify ideal routes to take with the shortest journey time. This is imperfect, however, as in some cases it leads to roads that have been closed off temporarily.
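Finding the route with the shortest journey time can be sketched with Dijkstra's algorithm, a classic shortest-path method. I am not claiming this is what Google Maps actually runs; the road map, place names, and journey times below are all made up.

```python
import heapq

def shortest_time(roads, start, goal):
    """Dijkstra's algorithm: return the smallest total travel time, or None if unreachable."""
    queue = [(0, start)]                 # (minutes so far, place)
    best = {start: 0}
    while queue:
        time, place = heapq.heappop(queue)
        if place == goal:
            return time
        if time > best.get(place, float("inf")):
            continue                     # a quicker way to this place was already found
        for neighbour, minutes in roads.get(place, {}).items():
            new_time = time + minutes
            if new_time < best.get(neighbour, float("inf")):
                best[neighbour] = new_time
                heapq.heappush(queue, (new_time, neighbour))
    return None

# Made-up one-way roads with journey times in minutes.
roads = {
    "Home": {"A": 5, "B": 2},
    "A": {"Shop": 4},
    "B": {"A": 1, "Shop": 10},
}
quickest = shortest_time(roads, "Home", "Shop")
print(quickest)
```

If a road is closed, it has to be removed from `roads` before searching; otherwise the answer can still send you down it, which is the same kind of problem described above.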

Smaller businesses can also utilise Big Data, not just the global giants. In fact, they should use it to gain the upper hand against competitors. Whether it is to find out trends in customer spending habits or find ways to better promote their business, it can be an effective tool in gaining the advantage in a business environment where competition is tight. 

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-7

Applications of Big Data in Society

In class we learned about how Big Data is used in Society today.

People in society naturally generate a lot of data, whether it is through social media, e-commerce or the websites they access. This information can be useful to government institutions, allowing them to track citizens' behaviour and influence them in different ways.

One application would be the GDELT project (Global Database of Events, Language and Tone), which monitors media from across the world and maps a number of themes in society. This tracks human sentiments and creates a network of events happening on a global scale.

Education would be another factor. Big Data can look at students' test results and highlight those who need extra support. This might seem insignificant. However, multiple students dropping out of education will contribute to a larger societal issue. By providing the extra support highlighted by these algorithms, these numbers can be reduced. Furthermore, the same can be applied to the courses themselves, as those with low pass rates may indicate the need for further funding and development.

Crime is another application in which Big Data has been implemented. Records of previous crimes and their locations are vital information. Comparing this data on a graph can show key areas where crime rates are higher and more police are required. This is currently a work in progress, however, as critics have argued that predictive policing is racist and targets certain communities. Fraud detection would be another application, as algorithms can be trained to look for certain anomalies online.

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-9

Applications of Big Data in Science

In class we learned about how Big Data can be used in science. 

Weather forecasting has become more accurate and efficient due to the rise of Big Data. The time between forecasting has been reduced from over a week to just under 3 days. This is because Big Data uses predictive algorithms to learn weather patterns, giving insight into weather forecasts much sooner than alternative methods. It will only get more accurate over time as the algorithms gather more data, particularly on extreme weather patterns which are not as predictable. 

Healthcare has also benefitted from the rise of Big Data. It allows people to receive diagnoses much faster, and it also helps to uncover new links between symptoms and illness. For example, a new study showed a link between the retina and diabetes. When an algorithm is shown multiple pictures of eyes taken with a powerful camera, it can distinguish who is most likely to be vulnerable to illnesses like diabetes. How cool is that? Big Data also played a crucial role during the COVID-19 pandemic, where it was used to track outbreaks and predict the number of cases in certain regions.

Another application would be in the military, particularly in the development of new technologies and analysis of patterns. DARPA utilises Big Data in managing their defence against cyber attacks, encryption, and in their engineering. It plays a crucial part particularly in the defence of countries, but has many other uses. Data from satellites can provide vital information on the battlefield if analysed correctly. Furthermore, Big Data can also analyse social media platforms to judge morale.

References:

DARPA and Data: a Portfolio Overview. (n.d.). Docslib. https://docslib.org/doc/626715/darpa-and-data-a-portfolio-overview
https://ilearn.fife.ac.uk/course/view.php?id=9751#section-8

Wednesday, 4 December 2024

Characteristics of Big Data

In class we learned about the key components of Big Data and what variables apply to it. Each of these measures analyses the effectiveness of data sets, and is essential in having accurate, reliable data.

Big Data can be measured by using 'The 7 V's of Big Data' which include:

  • Volume
  • Velocity
  • Variety
  • Veracity
  • Value
  • Variability
  • Visualisations

Volume refers to the size of the data sample and the scale which it represents. Data projections have shown a dramatic increase in volume, growing exponentially every year, which brings us onto the next point:

Velocity is the rate at which new data is being generated. Institutions must constantly upgrade and ensure they have the capacity to store data being put into their systems. Bigger measures for data storage are being used in today's world to accommodate the increasing velocity, with exabytes and zettabytes becoming more common on the large scale.

Variety is the different types of data collected. It can be structured, semi-structured, or unstructured. These refer to how easy the data is to analyse, with structured being the easiest and most effective for analysis while unstructured is incredibly difficult to analyse. Variety can also refer to the sources from which data is acquired. Some examples include but are not limited to science, business, and government statistics.

Veracity is the term that represents the accuracy of data. In other words, it's a test of how reliable the data is. This is crucial when analysing data because if it isn't accurate, the end result will not be useful whatsoever. Big Data should always use data sets that are as accurate and relevant as possible. After all, if the data cannot be trusted, then why should we use it? No data set is 100% accurate, however recent measures have ensured that it is as close to 100% as it has ever been.

Value refers to how useful data is, and how organisations can use the data after its value is extracted. If the data can be used, it automatically has value, however data can be used in different applications and therefore some data may be more valuable than others. Data can be used in many different ways, but a business, for example, could find value in customer data showing what products should be targeted, how to improve products, and providing valuable feedback on certain products, to name a few.

Variability is similar to veracity but slightly different. It looks at the consistency of data and the real meaning behind it. Some data may have a different meaning than what is originally intended. If inconsistencies are not found it can greatly impact the accuracy of results. 

Visualisations refers to the way data can be displayed and represented. This is commonly done through charts and graphs and makes information more readable in contrast to looking at data in a table. The format in which data is displayed makes data easy to comprehend.

We made a poster in class to represent all of these terms. How cool is this? (Admittedly it could be a little better)


References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-5

Additional Comments

Overall I have really enjoyed learning about Big Data and never realised how important of a role it plays in everyday life. From learning ab...