Big Data: What? Why? How?

Prakash Agarwal
5 min read · Sep 16, 2020

Let’s first understand: what is data? (In Rancho’s style.)

Everything we see, touch or experience is data. Confused?

You are reading this article: that’s data. You will clap after reading it: that’s data. You will share this article on LinkedIn, Facebook or WhatsApp: that will also create data. If you tell someone about it on a call, that’ll be data too.

This means everything around us is data, and every person may be creating tens of GB of data every day. From working on a laptop, to watching a video on YouTube, to even talking with someone, everything is data.

Now, many of you already know: what is a GB (gigabyte)?

It’s a unit to measure the quantity of data, right! But when we deal with such huge chunks of data, it’s not practical to measure only in GB. So there is a whole ladder of units, from bit to kilobyte (KB) to gigabyte (GB) to petabyte (PB), and on to exabyte (EB), zettabyte (ZB), yottabyte (YB) and informal names such as brontobyte beyond that…
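To get a feel for how this ladder of units works, here is a minimal Python sketch. It assumes the decimal convention, where each unit is 1000 times the previous one (storage vendors usually count this way; some tools use 1024 instead). The function name `human_readable` is made up for this illustration.

```python
# Decimal (SI) byte units; each one is 1000 times larger than the one before.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the number is small enough,
        # or at the last unit in the list if the number is enormous.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


print(human_readable(100 * 10**9))  # 100.0 GB
print(human_readable(5 * 10**15))   # 5.0 PB
```

So the same number of bytes just climbs up the ladder until it becomes readable for a human.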

Now, as we go further, data is coming in huge quantities. By huge, I mean that a normal human, or even a normal machine, can’t read it, store it or retrieve it. It would take far too long to retrieve even a small piece of information. That’s where BIG DATA comes onto the scene.

Yes, this problem of huge data accumulation is known as BIG DATA. You heard it right: BIG DATA is not a technology but a problem.

You may ask: so what? What is the problem if data is huge? We have hardware to store it. It’s no big deal, right?

I thought the same. But it’s not only about storing the data, it’s about being able to use the data at the time of need. What is the use of such huge data if you can’t work on it? When data comes in, it is always in RAW form, and we have to apply many operations like data cleansing, filtering, classification and real-time retrieval to use it optimally. Also, it costs a huge amount of money just to store this kind of data, on top of the cost of computing resources.

And there are more problems like:

…and these problems are huge when we work with a huge amount of data. So, to handle and solve them, we found solutions in technologies like:

These are the main technologies companies are working on.

This is the statement of one of the top researchers of Google Cloud and Data System.

For big MNCs like Google, Facebook, Amazon etc., BIG DATA is the biggest problem today.

They are ruling the data world, and for that, instant data storage and retrieval speed is very important. One mistake and they will be out of the race, and out of business too. So here comes the important part: the credibility of data and data storage. They work on petabytes and exabytes of data daily to make the user/customer experience better on their platforms.

How are they able to process such huge data? How did they overcome the problem?

As we discussed before: with BIG DATA technologies like Hadoop and distributed storage systems.

A Google Data Center

Here is the simple definition of distributed data storage:

A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion.

Simply put, if you have one file of 100 GB, instead of storing it on one piece of hardware, you can divide the file into 100 blocks of 1 GB each, store one block on each of 100 systems, and connect them all to one master system/server. The other 100 systems/servers are known as slave systems/servers.

How does it work? The master is always receiving the data and distributing it among the slaves. So we don’t need to purchase more storage devices for our master system; the slave systems store all the data.
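The master's job can be sketched in a few lines of Python. This is only a rough illustration, not how Hadoop or any real system does it: the names `split_into_blocks`, `assign_blocks` and `worker-0` … `worker-99` are hypothetical, the workers stand in for the slave systems, and the round-robin placement with an optional `replicas` count is an assumption added to show the "replicated fashion" from the definition.

```python
def split_into_blocks(file_size_gb: int, block_size_gb: int) -> list[int]:
    """Return the size of each block in GB; the last block may be smaller."""
    full, rest = divmod(file_size_gb, block_size_gb)
    return [block_size_gb] * full + ([rest] if rest else [])


def assign_blocks(num_blocks: int, workers: list[str], replicas: int = 1) -> dict[int, list[str]]:
    """Master-side plan: put each block on `replicas` distinct workers, round-robin."""
    if replicas > len(workers):
        raise ValueError("need at least as many workers as replicas")
    plan = {}
    for block_id in range(num_blocks):
        # Consecutive workers (wrapping around) hold the copies of this block.
        plan[block_id] = [workers[(block_id + r) % len(workers)] for r in range(replicas)]
    return plan


# Hypothetical cluster: 100 workers, one 100 GB file, 1 GB blocks, 2 copies each.
workers = [f"worker-{i}" for i in range(100)]
blocks = split_into_blocks(100, 1)
plan = assign_blocks(len(blocks), workers, replicas=2)

print(len(blocks))  # 100
print(plan[0])      # ['worker-0', 'worker-1']
print(plan[99])     # ['worker-99', 'worker-0']
```

Notice that the master only keeps the plan (which block lives where); the actual data sits on the workers.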

Now, let’s talk about speed and processing:

Distribution also reduces retrieval and processing time. If retrieving 10 GB of data from one system takes about one minute, then retrieving 1 GB from each of 10 systems in parallel (still 10 GB in total) takes only about 6 seconds in the ideal case. The time becomes roughly 1/10th as the storage is distributed across 10 systems.
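The arithmetic behind this can be written down as a tiny Python sketch. It assumes an idealized case: every system reads at the same rate, all reads run fully in parallel, and there is no network or coordination overhead (real clusters always have some). The function name and the rate of 6 seconds per GB (10 GB in about one minute) are assumptions for illustration.

```python
def ideal_retrieval_seconds(total_gb: float, seconds_per_gb: float, nodes: int) -> float:
    """Ideal time to read `total_gb` split evenly across `nodes` parallel readers."""
    per_node_gb = total_gb / nodes       # each node only reads its own share
    return per_node_gb * seconds_per_gb  # all nodes finish at the same moment


SECONDS_PER_GB = 6.0  # assumption: one system needs about one minute for 10 GB

print(ideal_retrieval_seconds(10, SECONDS_PER_GB, nodes=1))   # 60.0
print(ideal_retrieval_seconds(10, SECONDS_PER_GB, nodes=10))  # 6.0
```

With N systems, the ideal time is simply the single-system time divided by N.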

Also, there are many technologies like Hadoop which make this process even simpler.

We will deep dive into the tech side of technologies like Hadoop, Spark, Kafka, BigQuery etc. in upcoming stories, so don’t forget to follow me for more such articles.

If you have any doubts or feedback, you can contact me on LinkedIn at the given profile:

Thank you so much! Till Next Time…

JAI HIND!

Written by Prakash Agarwal

Technical Writer | Content Creator | Storyteller | Engineer | Investor