Sunday, 12 January 2025

Traditional Statistics

In class we learned about statistics. Traditional statistics are methods of analysing data using measures such as averages and standard deviation. They are used to describe and test variables numerically. Using statistics, data can be compared and put into a table so that value can be extracted from it. There are two types of statistics:

Descriptive

These are used to summarise data. They use measures of central tendency to find 'typical' or expected results, known as averages. The mean is calculated by adding up all the values and then dividing by the number of values. The median is found by sorting the values into ascending or descending order and choosing the value in the middle. If there is an even number of values, the two middle values are added together and divided by 2. The mode can also be used, where the value that appears most often in the sample is chosen as the average. Each measure of central tendency has its uses and they may give different results, but all of them give data analysts insight into the contents of their data.
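The three averages described above can be sketched in Python as follows. This is a minimal illustration using a made-up list of scores; Python's built-in `statistics` module also provides `mean`, `median` and `mode` functions that give the same results.

```python
from collections import Counter


def mean(values):
    # Add up all the values, then divide by how many values there are
    return sum(values) / len(values)


def median(values):
    # Sort the values, then take the one in the middle
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # Even number of values: add the two middle values and divide by 2
    return (ordered[middle - 1] + ordered[middle]) / 2


def mode(values):
    # The value that appears most often in the sample
    return Counter(values).most_common(1)[0][0]


scores = [12, 7, 4, 10, 7, 8]  # made-up sample data
print(mean(scores), median(scores), mode(scores))  # 8.0 7.5 7
```

Here the three averages all differ slightly, which is why it helps to know which one is being reported.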

Measures of variability are another type of descriptive statistic. These include the range, which is found by subtracting the smallest value from the largest value. Standard deviation is another measure of variability, which shows how far values 'deviate' on average from the mean: a small standard deviation means the values are bunched together, while a large one means they are spread out. Measures of variability focus on how much the data values differ from one another.
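A minimal sketch of the range and standard deviation, using a made-up sample. It computes the population standard deviation (dividing by the number of values), which matches Python's `statistics.pstdev`; the sample version, `statistics.stdev`, divides by one less than the number of values and is used when the data is only part of a larger population.

```python
import math


def data_range(values):
    # Largest value minus smallest value
    return max(values) - min(values)


def population_std_dev(values):
    # Average squared distance from the mean, then square-rooted
    m = sum(values) / len(values)
    variance = sum((x - m) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with a mean of 5
print(data_range(sample), population_std_dev(sample))  # 7 2.0
```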

Inferential

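Inferential statistics use probability to judge how reliable results are and how much of what we see could be down to random chance. The P value is used to draw conclusions about a wider population from a sample: it is the probability of getting results at least as extreme as the ones observed if there were really no effect (the null hypothesis). If the P value is less than 0.05, results are conventionally treated as statistically significant, meaning they are unlikely to be explained by chance alone. A P value of 0.05 means that, if there were no real effect, a result this extreme would turn up about 5% of the time by chance. Confidence intervals are used to estimate a range of values that is likely to contain the true value for the whole population, such as a 95% confidence interval around an average. For example, the P value of me passing this course is 0.01. How cool is that?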
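To show what a P value actually measures, here is a sketch of an exact one-sided test for a hypothetical coin that lands heads 9 times out of 10. The null hypothesis is that the coin is fair, and the P value is the probability of getting 9 or more heads if that were true. The scenario and the function name are made up for illustration.

```python
from math import comb


def one_sided_binomial_p_value(heads, flips, p=0.5):
    # Probability of getting at least `heads` heads in `flips` tosses,
    # assuming the null hypothesis (a fair coin, p = 0.5) is true
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


p_value = one_sided_binomial_p_value(9, 10)
print(round(p_value, 4))  # 0.0107
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

The result is 11/1024, which is below 0.05, so it would conventionally be called significant. It does not mean there is only a 1% chance that the coin is fair; it means a fair coin would rarely produce a result this extreme.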

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-6

Value of Data (Including Future Value)

In class we learned about what makes Big Data so valuable. The short answer is that it can provide useful information when extracted, but only if the right tools are used. On the surface, unstructured data has no value until it is extracted and compared.

We learned about Meta, the company that owns many social media platforms such as Facebook, Instagram, and WhatsApp. The company's physical assets are estimated to be worth between 30 and 40 billion dollars. However, with the inclusion of intangible assets, the total value of the company skyrockets to between 600 and 700 billion dollars. Obviously this figure isn't entirely data, but a lot of this wealth is generated by user data. Companies will pay for this data and still make a profit from it.

The value of data arises when it is sold to advertisers and data miners, who consolidate it into databases and use it to build demographic profiles. Once the value is extracted, targeted adverts can be shown to users who have shown interest in similar products. The irony is that a lot of these adverts are shown on sites like Facebook or Instagram, causing the process to repeat itself.

While data may not hold physical value, the information it gives when properly extracted does. It's true what they say: knowledge is power. Forecasts have predicted that the Big Data market alone will be worth more than 500 billion dollars by 2027, roughly the same as the GDP of the United Arab Emirates. How cool is that?

Data has many applications, and its value can be generated and used in:

  • Business (E-commerce, marketing, fraud detection, inventory management)
  • Society (Education, government, crime)
  • Science (Healthcare, astronomy, AI, new technologies)

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-7

Friday, 10 January 2025

Reasons for the Growth of Data

During class we studied the reasons behind Big Data and how it emerged. Multiple factors contributed to this 'data explosion', but here are a few examples:

E-commerce

Online shopping is becoming more common with companies like Amazon, Shein, and eBay. This has led to a flood of user data based on customers' shopping habits. This was especially true during the COVID-19 pandemic, when many people did most of their shopping online. This contributed massively to the rapid growth of Big Data and changed the way businesses market their products.

Automation and Digitisation

People traditionally used paper in schools, universities and the workplace, but over time we have become increasingly reliant on technology. This has generated a lot of new data and allows us to apply Big Data in new ways. The shift is happening on a national level in many countries around the world as they move their services online. Estonia, a small country in the Baltic region of Europe, is widely described as one of the first countries to become almost fully digital and paperless, and as one of the most digitally advanced societies. Almost every official document is submitted electronically, and even their voting system is done digitally! How cool is that? With the climate crisis looming and the growing automation of data, more countries are likely to make the switch, which will generate a lot more data.

Social Media and Accessibility

Similarly, the world has become more connected through social media. This is primarily done through apps like Facebook, Instagram, TikTok, and Snapchat, among others. These small-scale interactions are part of a larger network that links user interests and behaviour. The digitisation of recent years means an increasing number of people use social media every single day. This has contributed massively to the growth of Big Data as more and more people sign up for accounts and interact on these apps.

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-4

Thursday, 9 January 2025

Applications of Big Data

Today in class we covered the different ways data can be visualised. We used two different applications to display this information.

Word cloud

Below is a word cloud I created from all the lyrics of 'She's Electric' by Oasis. Bigger words are the ones that appear more frequently throughout the song. This was very quick to make and is a very effective way of displaying information.
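The idea behind a word cloud, that a word's size reflects how often it appears, can be sketched by counting words and scaling a font size. This is an assumption about how such tools work in general, not a description of how wordclouds.com does it. The example uses placeholder text rather than the song lyrics, and real tools also handle the layout and may filter out very common words.

```python
import re
from collections import Counter


def word_sizes(text, max_size=60):
    # Count each word, ignoring capital letters and punctuation
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    top = max(counts.values())
    # Font size proportional to frequency: the most common word gets max_size
    return {word: max_size * count / top for word, count in counts.items()}


# Placeholder text, not the actual song lyrics
print(word_sizes("Data data DATA big big cloud"))
# {'data': 60.0, 'big': 40.0, 'cloud': 20.0}
```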


Machine Learning/Teachable AI

I personally find machine learning super exciting and interesting, and being able to create our own AI in class was a blast. It uses classification to recognise which category an input belongs to and give a matching response. For our example, we created an AI that could recognise whether someone was wearing glasses or not and give a text-to-speech response saying "glasses" or "no glasses." I wanted to recreate this myself with a simpler example of my own. This time the AI would distinguish between apples and oranges, using sample images that were also created with AI image generation. I then generated brand new images of an apple, an orange, and a mango to test the accuracy of the model. The results were surprisingly good and are listed below.

Some of the samples used for apples and oranges.

Apple test.

Orange test.

Mango test.

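Overall, this was a lot of fun and I was surprised to see how accurate the results were. It's interesting how the model classifies the mango as mostly apple but still gives a small percentage to orange. Because the model was only trained on two classes, it has to split every image between apple and orange, even for a fruit it has never seen. How cool is that?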
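Teachable Machine's real model learns from the images themselves and is far more complex, so the sketch below swaps in a much simpler technique, a nearest-centroid classifier, to show why a two-class model always splits its answer between its two classes. The two features (hue and skin texture) and all of their values are hypothetical, and the softmax step that turns distances into percentages is one common choice, not necessarily what Teachable Machine uses.

```python
import math

# Hypothetical features for each training image: (hue, skin_texture), scaled 0-1.
# These numbers are made up for illustration, not measured from real images.
training = {
    "apple": [(0.05, 0.1), (0.10, 0.2), (0.15, 0.1)],
    "orange": [(0.60, 0.8), (0.55, 0.9), (0.65, 0.7)],
}


def centroid(points):
    # The average position of a class's training examples
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


centroids = {label: centroid(points) for label, points in training.items()}


def classify(features, sharpness=10):
    # Closer to a class centre means a higher score; sharpness (an arbitrary
    # choice) makes the split more decisive. Softmax turns scores into percentages.
    scores = {label: -sharpness * math.dist(features, c) for label, c in centroids.items()}
    total = sum(math.exp(s) for s in scores.values())
    return {label: 100 * math.exp(s) / total for label, s in scores.items()}


mango_like = (0.30, 0.25)  # hypothetical features for an unseen fruit
for label, percent in classify(mango_like).items():
    print(f"{label}: {percent:.1f}%")
```

With these made-up numbers the "mango-like" input lands much closer to the apple centre, so apple gets most of the percentage and orange a small share, similar in spirit to the mango result above.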

References:

https://www.wordclouds.com/

https://deepai.org/machine-learning-model/text2img

https://teachablemachine.withgoogle.com/models/574549pNR/

Monday, 6 January 2025

Data Mining Methods

Data mining is the process of extracting information from data. There are many different methods that serve different purposes, each with their own pros and cons. Here are some examples we learned about in class:

Classification

This method is used to sort data into different groups, or 'classes', so that new information can be categorised. It is a supervised method, meaning it must be trained on examples that are already labelled. One example of this method is a decision tree. This is very similar to a flow chart and goes through a series of yes or no questions to reach a conclusion. Classification can also be used to detect spam and fraudulent emails. How cool is that?
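The spam example can be sketched as a tiny decision tree written by hand as a series of yes/no questions. The questions, field names and threshold are hypothetical; in real classification the tree would be learned from labelled example emails rather than written out manually.

```python
def is_spam(email):
    # Each "if" is one yes/no question in the tree (hypothetical rules)
    if email["sender_in_contacts"]:
        return False  # known sender: not spam
    if not email["contains_link"]:
        return False  # unknown sender but no link: probably fine
    if email["mentions_prize"]:
        return True  # unknown sender + link + prize: spam
    return email["exclamation_marks"] > 3  # otherwise, lots of '!!!!' means spam


suspicious = {"sender_in_contacts": False, "contains_link": True,
              "mentions_prize": True, "exclamation_marks": 5}
friendly = {"sender_in_contacts": True, "contains_link": True,
            "mentions_prize": False, "exclamation_marks": 0}
print(is_spam(suspicious), is_spam(friendly))  # True False
```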

Clustering

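This is where data is grouped into clusters, with the points in each cluster having similar values. Unlike classification, it does not need labelled examples. K-means is a popular clustering method that is often used on large data sets. It starts with a chosen number of cluster centres, assigns each data point to its nearest centre, then moves each centre to the mean value of its points, repeating until the clusters stop changing (see diagram 1).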
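The k-means steps described above can be sketched for one-dimensional data as follows. The starting centres are picked by hand here for a predictable result; real implementations usually choose them randomly or with a smarter method, and work with many dimensions.

```python
def k_means(points, centres, max_iterations=10):
    # Simple 1-D k-means; `centres` are the starting guesses
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centre
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of the points in its cluster
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # nothing moved, so the clusters are stable
        centres = new_centres
    return centres, clusters


centres, clusters = k_means([1, 2, 3, 10, 11, 12], centres=[1, 2])
print(centres, clusters)  # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]
```

Even with poor starting centres (1 and 2), the points settle into the two obvious groups after a few rounds.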

Prediction

This method takes existing data and uses it to 'forecast' future results. It can identify trends and has many applications in modern society, such as predicting weather patterns or sales projections. It works by looking for patterns in sequences within large data sets to estimate the most likely next value. Pattern recognition has become very effective thanks to the influx of raw data and advances in data mining capabilities.
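A very simple form of prediction is fitting a straight line to past values and extending it forward. This least-squares sketch uses made-up monthly sales figures; real forecasting models are usually far more sophisticated. Python 3.10 and later also offer `statistics.linear_regression` for the fitting step.

```python
def fit_line(xs, ys):
    # Least-squares line of best fit: y = slope * x + intercept
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


months = [1, 2, 3, 4, 5]
sales = [10, 12, 13, 16, 18]  # made-up monthly sales figures

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # extend the trend to month 6
print(round(slope, 2), round(forecast, 1))
```

With these numbers the trend is about 2 extra sales per month, giving a forecast of roughly 19.8 for month 6.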

Neural Networks

This method is loosely modelled on the inner workings of the human brain. It is made up of connected nodes (artificial neurons) arranged in layers: an input layer, one or more hidden layers, and an output layer. Neural networks have many applications and often outperform other methods when dealing with unstructured data such as images, audio, and text (see diagram 2).
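A single pass through the input, hidden and output layers (a 'forward pass') can be sketched as below. The weights and biases are hypothetical hand-picked numbers; a real network learns them during training, and the sigmoid function is just one common choice for squashing each node's output.

```python
import math


def sigmoid(z):
    # Squashes any number into the range 0 to 1
    return 1 / (1 + math.exp(-z))


def layer(inputs, weights, biases):
    # Each node: weighted sum of its inputs plus a bias, passed through sigmoid
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


# Hypothetical hand-picked weights; a real network learns these in training
hidden_weights = [[0.5, -0.6], [0.9, 0.2], [-0.3, 0.8]]  # 3 hidden nodes, 2 inputs each
hidden_biases = [0.0, -0.1, 0.2]
output_weights = [[1.2, -0.7, 0.5]]  # 1 output node fed by the 3 hidden nodes
output_biases = [-0.2]

inputs = [1.0, 0.5]  # input layer
hidden = layer(inputs, hidden_weights, hidden_biases)  # hidden layer
output = layer(hidden, output_weights, output_biases)  # output layer
print(output)  # a single value between 0 and 1
```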

Outlier Detection

This method looks through data sets to find anomalies. One way of doing this is to use the standard deviation: values that lie more than a set number of standard deviations from the mean are flagged, and this can be done on a large scale. Unusual figures can then be identified and examined quickly. This can help clean a data set, resulting in more accurate results.
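The standard-deviation approach can be sketched as a simple check: any value more than a chosen number of standard deviations from the mean is flagged. The readings and the default threshold of 2 are made-up choices for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    # Flag values more than `threshold` standard deviations from the mean
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * std_dev]


readings = [10, 11, 9, 10, 12, 10, 11, 50]  # made-up data with one odd value
print(find_outliers(readings))  # [50]
print(find_outliers(readings, threshold=3))  # []
```

One limitation is that a large outlier also inflates the standard deviation itself, which is why the stricter threshold of 3 misses the 50 in this small sample.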




References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-14


Implications of Big Data for Individuals

Big Data is an effective tool and has its uses. However, some of its effects have a lasting impact on individuals. Here are some of the implications we learned about in class:

News sources

Unreliable news websites may contain biased or inaccurate information. This can be used for fearmongering and to generate a large number of clicks. Individuals may come across these sites in their spare time and become misinformed, especially impressionable young people and older people who aren't as familiar with technology. This targets people directly at the source and makes it hard to tell what is true from what is false, misleading people and spreading inaccurate information.

Digital Exclusion

This is a phenomenon in which certain people are unable to access technologies, typically affecting people with disabilities, the elderly, and those on low incomes. This means fewer opportunities are available to these people, especially now that so much, such as job applications, is done online. Overall, this can result in limited access to educational materials, health services, and social interaction. This is not cool.

Lack of freedom

Being constantly monitored can induce a 'chilling effect', in which people act with less freedom because they are aware their movements are being recorded. This can alter behaviour, so people don't necessarily act as they normally would. People feel limited in their choices and do not like to be restricted. Big Data has grown so fast over the years that it now tracks much of what we do on a daily basis, including our habits; it can feel as if it knows everything about us. This is daunting. As a result, everyone acts with a little more caution. It's weird to imagine what the world would look like today without the acceleration of Big Data and surveillance.

References:

https://ilearn.fife.ac.uk/course/view.php?id=9751#section-11

Implications of Big Data for Society

In class we looked at the potential implications of Big Data for society. While Big Data can be a beneficial tool, it can be very unpredictable and cause issues when used. Because it is relatively new, it can have adverse effects on society as a whole. Here are a few examples that we looked at:

Automated decision making

Algorithms make assumptions and aren't always accurate. When deciding an outcome, an automated system can overlook important details. Consider an automated system conducting an online job interview or marking an exam. Artificial intelligence in particular can be poor at making deductions based on user input, and even when trained it still makes errors. Look at the YouTube video in the references below for a funny example where an AI has to collect user input/clues and determine a target country. These errors can harm society by leading to the wrong people being hired or exams being wrongly passed or failed. Automated decisions about what to show users can also lead to filter bubbles or echo chambers, in which users mostly see information that matches what they already like or believe. This leads to fragmentation into lots of small groups with opposing ideas, in contrast to one big group where everyone is on the same page.
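The filter-bubble effect can be illustrated with a deliberately naive, hypothetical recommender that ranks topics by how often the user has already clicked them. If the user keeps clicking the top suggestion, that topic's count keeps growing and the feed never changes. Real recommendation systems are far more complex; this only shows the feedback loop.

```python
from collections import Counter


def recommend(click_history, topics, n=3):
    # Hypothetical recommender: show the topics the user has clicked most often
    counts = Counter(click_history)
    return sorted(topics, key=lambda t: counts[t], reverse=True)[:n]


topics = ["sport", "politics", "music", "science", "cooking"]
history = ["music", "sport", "music"]  # made-up starting clicks

for day in range(1, 6):
    feed = recommend(history, topics)
    history.append(feed[0])  # the user clicks the top suggestion each day
    print(f"Day {day}: {feed}")

print(Counter(history))
```

After five days "music" dominates the click history, and topics like science and cooking never get a chance to appear at the top of the feed.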

Cultural shifts

The use of Big Data is changing how people behave and how they socialise within their groups. Big Data can be weaponised to sculpt societies to behave a certain way. When Facebook spread across Myanmar, it sparked controversy after the platform was used to promote hate speech against local ethnic groups, particularly the Rohingya Muslim minority. This content was actively promoted, perhaps in pursuit of profit, clicks, and user interactions. It contributed to violence that led to hundreds of thousands of people taking refuge in neighbouring countries. This shows that Big Data can be used to shape society and can be dangerous. This is not cool.

Instability

Many uses of Big Data are controversial, even with the GDPR in place. Most of those using Big Data at scale are large companies or incredibly wealthy individuals. With such a powerful tool at their disposal, they can exert their influence in many ways, whether by providing better opportunities to certain groups in society, through oppression, or by using data maliciously. This can weaken the social structure while they maintain complete control, resulting in social instability.

References:

https://youtu.be/iOfYZ-wMfNA?si=-f0Yhj4KmhzEE2Qi
https://www.bbc.co.uk/news/blogs-trending-45449938
https://ilearn.fife.ac.uk/course/view.php?id=9751#section-10

Additional Comments

Overall, I have really enjoyed learning about Big Data and never realised how important a role it plays in everyday life. From learning ab...