Big Data, Cloud Computing and Edge Computing
4.1 Big Data
Big Data refers to datasets too large or complex for traditional data-processing software to handle. It is commonly characterised by the 5 Vs:
| V | Description | Example |
|---|---|---|
| Volume | Massive scale (terabytes → exabytes) | Facebook: 4 petabytes of data/day |
| Velocity | Speed of data generation and processing | Twitter: 500 million tweets/day |
| Variety | Multiple formats (text, images, video, sensor data) | Healthcare: EHR + imaging + genomics |
| Veracity | Accuracy and trustworthiness of data | Social media noise, fake reviews |
| Value | Business/societal insight extracted | Fraud detection, personalised medicine |
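To make the Volume and Velocity figures more concrete, the short sketch below converts the daily totals quoted in the table into per-second rates. It assumes the load is spread evenly over 24 hours and that a petabyte is the decimal 10^15 bytes; real traffic is bursty, so these are averages only.

```python
# Convert the daily figures from the 5 Vs table into average per-second rates.
# Assumptions (not from the source): evenly spread load, decimal petabytes.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total: float) -> float:
    """Average rate per second for a quantity quoted per day."""
    return daily_total / SECONDS_PER_DAY


tweets_per_second = per_second(500_000_000)   # Velocity example
bytes_per_second = per_second(4 * 10**15)     # Volume example (4 PB/day)

print(f"{tweets_per_second:,.0f} tweets/s")       # 5,787 tweets/s
print(f"{bytes_per_second / 10**9:.1f} GB/s")     # 46.3 GB/s
```

Even as averages, rates of this order are beyond what a single server can ingest, which is why the technologies below distribute both storage and processing.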
Big Data Technologies:
- Hadoop (2006): Open-source distributed storage and processing framework (HDFS + MapReduce); enables parallel processing across thousands of servers
- Apache Spark: Fast, in-memory data processing engine; up to 100× faster than Hadoop MapReduce for workloads that fit in memory; supports near-real-time stream processing
- NoSQL Databases: Non-relational; handle semi-structured and unstructured data. Examples: MongoDB (document store), Cassandra (wide-column store), Redis (key-value store), Neo4j (graph database)
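The MapReduce model used by Hadoop can be illustrated with a word count, the customary introductory example. The sketch below is plain single-process Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and only mimic the three stages that a real cluster would run in parallel across many servers.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence on a small in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


documents = ["big data big insight", "data at the edge"]
counts = word_count(documents)
print(counts["big"], counts["data"], counts["edge"])  # 2 2 1
```

Because each call to `map_phase` depends on only one line and each call to `reduce_phase` on only one key, both stages can be split across machines without coordination, which is what lets the model scale out.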
Applications:
- Healthcare: Analysing EHR data + genomics + imaging → precision medicine (IBM Watson Health)
- Finance: Fraud detection across millions of daily transactions
- Government: UIDAI processes 50 million+ Aadhaar verifications daily using big data infrastructure
- Agriculture: Analysis of satellite imagery + weather data + soil sensors → precision farming
4.2 Cloud Computing
Cloud computing delivers computing resources (servers, storage, databases, networking, software, analytics) over the Internet with on-demand availability, scalability, and pay-per-use pricing.
Deployment Models:
| Model | Description | Example |
|---|---|---|
| Public cloud | Resources owned/managed by third-party provider | AWS, Microsoft Azure, Google Cloud |
| Private cloud | Dedicated cloud for single organisation | NIC Cloud (India's government cloud — MeghRaj) |
| Hybrid cloud | Combination of public and private | Most large enterprises |
| Multi-cloud | Using multiple cloud providers | Using AWS + Azure + GCP together |
Service Models:
| Model | What Customer Controls | Provider Controls | Examples |
|---|---|---|---|
| IaaS (Infrastructure) | OS, middleware, data, applications | Hardware, virtualisation, network | AWS EC2, Azure VMs, Google Compute Engine |
| PaaS (Platform) | Data, applications | OS, runtime, middleware, hardware | Google App Engine, Heroku, Azure App Service |
| SaaS (Software) | User data only (configuration) | Everything else | Gmail, Salesforce, Office 365, Zoom |
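The division of responsibility in the table can also be written as a small lookup, which is a handy way to memorise it. This is only a sketch: the layer names follow the table, "runtime" is left out because the table lists it for PaaS only, and the identifiers `STACK`, `CUSTOMER_LAYERS` and `responsibility` are made up for illustration.

```python
# Layers of the stack, from what the user sees down to the physical network.
STACK = ("applications", "data", "middleware", "os",
         "virtualisation", "hardware", "network")

# Layers the customer controls under each service model (from the table above).
CUSTOMER_LAYERS = {
    "IaaS": {"applications", "data", "middleware", "os"},
    "PaaS": {"applications", "data"},
    "SaaS": {"data"},  # user data and configuration only
}


def responsibility(model: str) -> dict[str, str]:
    """Return who controls each layer ('customer' or 'provider') for a model."""
    customer = CUSTOMER_LAYERS[model]
    return {layer: "customer" if layer in customer else "provider"
            for layer in STACK}


for model in ("IaaS", "PaaS", "SaaS"):
    managed = [layer for layer, who in responsibility(model).items()
               if who == "customer"]
    print(model, "-> customer controls:", ", ".join(managed))
```

Reading the output from IaaS to SaaS shows the customer's share shrinking at each step while the provider's grows.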
India's GovCloud — MeghRaj
- National Cloud of India under NIC (National Informatics Centre); launched 2014
- Hosts government applications: GSTN (GST portal), Aadhaar systems, government e-mail
- DigiSakshee — cloud-based election management infrastructure
4.3 Edge Computing
Edge computing processes data at or near the source (edge of the network) rather than sending all data to a central cloud data centre.
Why edge computing?
- Latency: A cloud round trip typically takes around 50–200 ms, whereas edge processing can respond in under 5 ms; this is critical for autonomous vehicles, industrial robots, AR/VR, and remote surgery
- Bandwidth: Only relevant/processed data sent to cloud, reducing costs
- Privacy: Sensitive data (patient health, factory floor) processed locally, not sent to cloud
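The bandwidth and privacy points can be sketched as an edge node that summarises raw sensor readings locally and forwards only a compact payload. Everything here is illustrative: the function `summarise_at_edge`, the temperature values and the 90.0 threshold are assumptions, and the step that actually transmits the payload to a cloud service is deliberately omitted.

```python
from statistics import mean


def summarise_at_edge(readings: list[float], threshold: float) -> dict:
    """Reduce a batch of raw readings to a small summary plus any anomalies.

    Only this summary would be sent upstream; the raw readings stay local.
    """
    if not readings:
        raise ValueError("readings must not be empty")
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "anomalies": [r for r in readings if r > threshold],
    }


# Hypothetical batch of temperature readings from one machine.
readings = [70.1, 70.4, 69.8, 95.2, 70.0, 70.3]
payload = summarise_at_edge(readings, threshold=90.0)
print(payload["count"], payload["max"], payload["anomalies"])  # 6 95.2 [95.2]
```

The payload stays a handful of numbers however many readings are in the batch, unless many readings exceed the threshold, so the saving grows with the sampling rate.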
Edge vs. Fog vs. Cloud:
- Cloud: Centralised, powerful, higher latency than fog or edge
- Fog: Intermediate nodes between edge and cloud (Cisco's concept)
- Edge: Processing at the device or local server level
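One rough way to think about the three tiers is to pick the most centralised one that still meets an application's latency budget. The sketch below uses the figures quoted earlier in this section (edge under 5 ms, cloud up to about 200 ms); placing fog between the two is an assumption made for illustration, and `choose_tier` is a hypothetical helper, not part of any real framework.

```python
EDGE_LATENCY_MS = 5           # edge figure quoted in this section
CLOUD_WORST_CASE_MS = 200     # upper end of the cloud round-trip range


def choose_tier(latency_budget_ms: float) -> str:
    """Pick a processing tier for a given end-to-end latency budget.

    Assumption: fog nodes serve budgets that the cloud's worst case
    cannot meet but that do not need on-device processing.
    """
    if latency_budget_ms < EDGE_LATENCY_MS:
        return "edge"
    if latency_budget_ms < CLOUD_WORST_CASE_MS:
        return "fog"
    return "cloud"


for budget in (2, 20, 500):
    print(f"{budget} ms budget -> {choose_tier(budget)}")
```

In practice the choice also depends on bandwidth cost, data sensitivity and available hardware, so latency alone gives only a first approximation.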
