Big Data Computing – Introduction to Big Data

Big data is the term for data sets so large and complex that they become difficult to process using traditional database management tools or data processing applications. The significant challenges include capture, storage, search, sharing, transfer, analysis, and visualization.

Big Data

The trend toward larger data sets is due to the additional information that can be derived from analyzing a single extensive collection of related data, compared to separate smaller sets with the same total amount of data.

From a usability perspective, there are many scenarios where big data technology can find correlations to spot trends: assessing the quality of research from research datasets, preventing infections using historical disease data, linking legal citations through citation networks, combating crime, determining real-time roadway traffic conditions, etc.

Source of Data Generation

  • People: More people carry data-generating devices such as mobile phones with social-networking access: sharing or uploading photos and videos, web browsing, Twitter tweets, YouTube videos, log data, etc.
  • Machines and sensors: Radar-generated data, surveillance data captured from sources such as cameras, IoT devices such as smart meters, RFID tags, GPS devices, etc. Actual examples of sensor data generation:
    • A flight generates 240 terabytes of flight data in 6-8 hours of flying.
    • The largest AT&T database holds one of the greatest volumes of data stored in a single database (312 terabytes)
  • Organizations: Transaction data generated by various organizations during the exchange of business events, e-commerce transactions, etc.

The major use of these data is analysis and visualization: understanding or extracting patterns and forecasting future events from the existing data.

Problem in Existing Systems

Traditional RDBMS queries are not sufficient to extract helpful information from such massive volumes of data. Searching with conventional tools to find out whether a particular topic was trending would take so long that the result would be meaningless by the time it was computed.
Big data offers a solution: store these data in novel ways to make them more accessible, and provide new methods of performing analysis on them.

Application of Big Data

  • Petabytes of customer review data used for sentiment analysis to improve the customer experience in customer-centric businesses.
  • Traffic management systems that identify traffic scenarios and manage signals accordingly, subject to constraints such as high priority for emergency services (e.g., ambulances).

Criteria for Big Data (3V’s)

The 3V’s are Volume, Velocity, and Variety. Big data is characterized by these three V’s, explained as follows:

  • Volume: Data beyond petabytes enters the big data domain
  • Variety: Data is not only from a single domain, but it is from multiple domains in multiple formats such as image, text, video, audio, etc.
  • Velocity: The rate at which these data are generated and fed into processing.

In this way, the three V’s together characterize data as big data. Big data can be in the form of transactions, interactions, or observations.
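As an illustration, the 3V criteria can be sketched as a simple check in Python. The thresholds below (one petabyte, a minimum event rate, a minimum number of formats) are arbitrary assumptions chosen for illustration; there is no standardized cutoff that makes data "big":

```python
from dataclasses import dataclass

PETABYTE = 10**15  # bytes

@dataclass
class Dataset:
    size_bytes: int        # Volume
    events_per_sec: float  # Velocity
    formats: set           # Variety, e.g. {"text", "image", "video"}

def is_big_data(d, min_bytes=PETABYTE, min_rate=5_000, min_formats=2):
    """Heuristic 3V check; all thresholds are illustrative, not standardized."""
    return (d.size_bytes >= min_bytes
            and d.events_per_sec >= min_rate
            and len(d.formats) >= min_formats)

# Roughly a year of tweets at 12 TB/day (figure from the Volume section);
# the event rate here is a made-up placeholder.
tweets = Dataset(size_bytes=12 * 10**12 * 365,
                 events_per_sec=6_000,
                 formats={"text", "image", "video"})
print(is_big_data(tweets))  # True: passes all three illustrative thresholds
```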

Characteristics of Big Data

Volume (Scale)

In the present scenario, almost all enterprises generate ever-growing data of all types, gradually aggregating into terabytes or even petabytes. Examples of big data:

  • Turn 12 terabytes of Tweets created every day into improved product sentiment analysis.
  • Convert 350 billion annual meter readings to predict power consumption better.
  • The EarthScope is the world’s largest science project, designed to track North America’s geological evolution. The observatory records data over 2.8 million square miles, amassing 67 terabytes of data, which are used to analyze seismic slips in the San Andreas Fault.

Data volume increased 44x from 2009 to 2020 in actual magnitude, from 0.8 zettabytes to 35 zettabytes. This shows that data volume has grown exponentially.
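The 44x figure follows directly from the two magnitudes quoted above:

```python
# Growth in global data volume, 2009 vs. 2020 (zettabytes)
start_zb, end_zb = 0.8, 35
growth = end_zb / start_zb
print(round(growth, 2))  # 43.75, i.e. roughly 44x
```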

Velocity (Speed)

Sometimes two minutes is too late! For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise to maximize its value.

  • Scrutinize 5 million trade events created each day to identify potential fraud
  • Analyze 500 million daily call detail records in real-time to predict customer churn faster.
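As a minimal sketch of streaming fraud screening, the toy rule below flags any trade that is far larger than the recent moving average. The rule, window size, and factor are illustrative assumptions; real fraud systems use far richer models, but the shape is the same: score each event as it arrives, not after batch loading.

```python
from collections import deque

def flag_anomalies(events, window=5, factor=3.0):
    """Flag a trade amount if it exceeds `factor` times the moving
    average of the last `window` amounts; a toy stand-in for
    real-time fraud scoring on an event stream."""
    recent = deque(maxlen=window)
    flags = []
    for amount in events:
        # Only score once the window is full of history.
        if len(recent) == window and amount > factor * (sum(recent) / window):
            flags.append(amount)
        recent.append(amount)
    return flags

stream = [100, 110, 95, 105, 90, 5000, 100]
print(flag_anomalies(stream))  # [5000]: the outlier trade is flagged
```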

Big data applications drive organizations to retain their customers and keep their businesses running. Thus, volume and velocity together create the core big data computation challenges.

Data is being generated fast and needs to be processed quickly for online data analytics. If a delay occurs in computation and processing, organizations miss opportunities: these operations are done in real time with deadlines, after which a late decision is of no use.

Examples:

  • E-Promotions: Based on your current location, your purchase history, and what you like, send promotions right now for the store next to you.
  • Healthcare monitoring: Sensors monitoring your activities and body detect abnormal measurements that require an immediate reaction.
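The healthcare monitoring case above can be sketched as a per-reading range check. The "normal" ranges here are illustrative assumptions for the example, not clinical guidance:

```python
# Assumed normal ranges (illustrative only, not clinical guidance).
NORMAL = {"heart_rate": (50, 110), "temp_c": (35.0, 38.0), "spo2": (92, 100)}

def check_vitals(reading):
    """Return the vitals that fall outside their normal range,
    so a monitoring loop can trigger an immediate alert."""
    alerts = {}
    for vital, value in reading.items():
        low, high = NORMAL[vital]
        if not (low <= value <= high):
            alerts[vital] = value
    return alerts

print(check_vitals({"heart_rate": 135, "temp_c": 36.6, "spo2": 97}))
# {'heart_rate': 135}: only the out-of-range vital is reported
```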

Real-time or fast data: These data are generated by social media and networks, scientific instruments, mobile devices, and sensor networks. Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data quickly and in a scalable fashion.

Real-time analytic/Decision requirement: The real-time analytical or decision-making criteria can be understood with the following examples: 

  • Product recommendations that are relevant and compelling
  • Improving the marketing effectiveness of promotion while it is still in play
  • Learning why customers switch to competitors and their offers; in time to counter
  • Friend invitations to join a game or activity that expands businesses
  • Preventing fraud as it is occurring, and preventing it more proactively

Variety (Type)

Big data can be any data: structured data, such as tabular data; unstructured data, such as text, sensor data, audio, and video; and semi-structured data, such as web data, log files, etc.

More complex data types:

  • Relational data (tables/transaction/legacy data)
  • Text data (web)
  • Semi-structured data (XML)
  • Graph data 
    • Social network, semantic web (RDF)
  • Streaming data
  • Big public data, such as online, weather, and finance data, etc.
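One way to see the variety problem is a pipeline that normalizes structured (CSV) and semi-structured (JSON, XML) inputs into a common record shape. This sketch uses only the Python standard library; the field names and sample blobs are hypothetical:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def extract_records(blob, fmt):
    """Normalize records from different source formats into plain dicts."""
    if fmt == "csv":    # structured (tabular)
        return list(csv.DictReader(io.StringIO(blob)))
    if fmt == "json":   # semi-structured
        return json.loads(blob)
    if fmt == "xml":    # semi-structured (tree)
        return [dict(el.attrib) for el in ET.fromstring(blob)]
    raise ValueError(f"unsupported format: {fmt}")

csv_blob = "name,age\nAda,36\n"
json_blob = '[{"name": "Alan", "age": 41}]'
xml_blob = '<people><person name="Grace" age="45"/></people>'

records = (extract_records(csv_blob, "csv")
           + extract_records(json_blob, "json")
           + extract_records(xml_blob, "xml"))
print([r["name"] for r in records])  # ['Ada', 'Alan', 'Grace']
```

Once every source is reduced to the same record shape, downstream analysis no longer needs to know which format each record came from.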

Dealing with Volume

When the data is enormous, we require methods to deal with it; these are mentioned below:

  • Distill big data down to small information
  • Parallel and automated analysis
  • Automation requires standardization
  • Standardize by reducing the variety
    • Format, standards, and structure
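The distill-and-parallelize idea above is essentially map-reduce: each chunk of raw data is independently reduced to a small summary (the map step), and the summaries are then merged (the reduce step). A sequential sketch in Python, on a toy corpus; in a real deployment each chunk would be mapped on a different worker in parallel:

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: distill raw text lines into a small per-chunk word count."""
    return Counter(word for line in lines for word in line.lower().split())

def combine(a, b):
    """Reduce step: merge two partial counts into one."""
    return a + b

# Split a (toy) corpus into chunks, as a distributed store would.
corpus = ["big data big value", "data needs structure", "big structure"]
chunks = [corpus[i:i + 2] for i in range(0, len(corpus), 2)]

summary = reduce(combine, (map_chunk(c) for c in chunks), Counter())
print(summary["big"])  # 3: the big data distilled to a small summary
```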

Different Dimensions of Big Data

Different dimensions of big data make computation and analysis of big data difficult:

  • Valence represents connectedness: more connected means more scope of learning from data.
  • Velocity represents speed
  • Variety represents complexity
  • Veracity represents quality; if there is a lot of noise, the quality of decisions goes down.
  • Volume represents size

Value: At the heart of all these dimensions is the value we gain from big data.

References

  1. Big Data Computing, by Prof. Rajiv Misra, IIT Patna.
