What is distributed data processing (DDP)?

Distributed data processing (DDP) is the processing of data by several interconnected computers that share the work over a network.

A website is usually hosted on an online server. Cluster hosting is now also available, in which website data is stored on several clusters (remote computers). When a visitor opens the website, the pages are loaded from a server that is near that visitor. Google also uses distributed processing. It operates servers in many regions, so a request to a Google site is typically answered by a server close to the user rather than by one central machine.
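
The idea of serving each visitor from a nearby server can be sketched in a few lines. This is only an illustration, not how any particular hosting provider or Google routes traffic. The server names and latency figures are invented for the example, and `nearest_server` is a hypothetical helper.

```python
def nearest_server(latencies_ms):
    """Return the name of the server with the lowest measured latency."""
    if not latencies_ms:
        raise ValueError("no servers available")
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical round-trip times (milliseconds) from one visitor to each server.
visitor_latencies = {"us-east": 180.0, "europe-west": 95.0, "asia-south": 40.0}
print(nearest_server(visitor_latencies))  # prints "asia-south"
```

Real systems usually make this choice through DNS or network routing rather than in application code, but the principle is the same: pick the closest healthy server.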

Distributed data processing diagram

In many distributed setups, one main server controls the other computers in the network and assigns work to them. Distributed processing depends on a fast network connection, because tasks such as querying a database send requests and results back and forth between machines.
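
The role of the main server can be sketched as a function that hands out tasks to the other computers. This is a minimal sketch that assumes simple round-robin assignment; the task and node names are made up, and a real scheduler would also consider each node's load and health.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(tasks, workers):
    """Main server logic: hand out tasks to workers in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment = defaultdict(list)
    # cycle() repeats the worker list; zip() stops when the tasks run out.
    for task, worker in zip(tasks, cycle(workers)):
        assignment[worker].append(task)
    return dict(assignment)


plan = assign_tasks(["q1", "q2", "q3", "q4", "q5"], ["node-a", "node-b"])
print(plan)  # {'node-a': ['q1', 'q3', 'q5'], 'node-b': ['q2', 'q4']}
```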

Advantages of distributed data processing (DDP)

Inexpensive:

Some companies buy mainframes or supercomputers for large-scale processing, but these machines can cost hundreds of thousands of dollars or more. Relying on one mainframe or supercomputer also centralizes processing, so if that machine malfunctions, all of the company's data is at risk. By contrast, connecting personal computers in different locations can save money, because each one costs roughly a thousand dollars. Because the data is distributed as well, adding and removing nodes (computers) is easy. One way to build such a system is Beowulf cluster technology. In a Beowulf cluster, networked commodity computers are assigned processing tasks over network switches and routers.

Easy to replace remote computers:

Microsoft Windows Server has a feature called Failover Clustering that keeps services running when a computer fails. If a computer (node) in the cluster fails or becomes corrupted, its workload is automatically taken over by another node.
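
The general failover idea can be sketched as follows. This is not the Windows Server Failover Clustering API; it is a generic illustration in which the `Node` class and the `handle_with_failover` function are hypothetical names chosen for the example.

```python
class Node:
    """A hypothetical cluster node that is either healthy or failed."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def handle_with_failover(nodes, request):
    """Try each node in order and return the first successful response."""
    for node in nodes:
        try:
            return node.handle(request)
        except ConnectionError:
            continue  # this node failed, so try the next one
    raise RuntimeError("all nodes failed")


cluster = [Node("primary", healthy=False), Node("standby")]
print(handle_with_failover(cluster, "report-42"))  # standby handled report-42
```

The caller never sees the failure of the primary node; it only sees an error if every node in the list is down.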

Optimized processing:

Managing data on a dedicated online server helps avoid slow processing. A personal computer also runs other tasks, and those tasks consume processor power. An online server, however, is dedicated to one type of processing, so more of its capacity is available for that work. A database server handles only database queries, and a file server only stores files. Data processing is optimized as a result.

Easy to expand:

If your company needs more data processing capacity than expected, you can attach more computers to the distributed network.

Parallel processing:

Adding or removing computers does not have to interrupt the flow of data through the network. Data held on different computers is processed in parallel, which means that every node works on its own share of the data at the same time as the others.
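
A small sketch of parallel processing: the data is split into one chunk per node, each chunk is processed concurrently, and the partial results are combined. The sketch assumes that threads on one machine stand in for separate computers, and the function names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Work done by one node: here, simply sum its share of the numbers."""
    return sum(chunk)


def parallel_total(data, node_count):
    """Split data into one chunk per node, process the chunks concurrently, combine."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Node i receives every node_count-th item, starting at index i.
    chunks = [data[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_sums = list(pool.map(process_chunk, chunks))
    return sum(partial_sums)


print(parallel_total(list(range(1, 101)), node_count=4))  # prints 5050
```

Changing `node_count` changes how the work is divided but not the final result, which mirrors the point that nodes can be added or removed without changing the outcome.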

Better performance:

Overall performance improves, because data is filtered and processed more quickly in a distributed environment.

Backup of data:

Data can be backed up from any computer connected to the network. A user can take a backup at any time, work with that data locally, and then upload the changes to the server.

Local data synchronization:

Every computer on the network can keep a local copy of important data. Suppose a company has several branch offices, and all branch computers are linked to the main office. Each branch computer holds a local copy of the data. Office users edit and update that data and then upload it to the main server, so the data is synced and available to all computers. Working with local data is easy and fast, and when users consider their work complete, for example at the end of the day, they can sync it with the main server.
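
The upload step can be sketched like this. The sketch assumes that every record carries a version counter and that the newer version wins; the article does not specify a conflict rule, so this is only one possible choice, and the record names are invented.

```python
def sync_to_main(main_store, local_store):
    """Upload local records to the main server, keeping the newest version.

    Each store maps a record key to a (version, value) pair. The version
    counter is an assumption of this sketch.
    """
    for key, (version, value) in local_store.items():
        current = main_store.get(key)
        if current is None or version > current[0]:
            main_store[key] = (version, value)
    return main_store


main = {"invoice-1": (1, "draft")}
branch = {"invoice-1": (2, "approved"), "invoice-2": (1, "draft")}
sync_to_main(main, branch)
print(main)  # {'invoice-1': (2, 'approved'), 'invoice-2': (1, 'draft')}
```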

Data recovery:

If data such as a database is lost on one computer, it can be recovered from another interconnected computer, for example the main database server.
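
Recovery from the main server can be sketched as copying back whatever the local computer no longer has. This is a simplified illustration that treats each store as a dictionary; `recover_missing` and the sample records are hypothetical.

```python
def recover_missing(local_store, main_store):
    """Copy records the local computer has lost back from the main server.

    Returns the list of keys that were restored.
    """
    recovered = []
    for key, value in main_store.items():
        if key not in local_store:
            local_store[key] = value
            recovered.append(key)
    return recovered


main_server = {"customers": ["Ann", "Raj"], "orders": [101, 102]}
damaged_pc = {"customers": ["Ann", "Raj"]}  # the "orders" data was lost
print(recover_missing(damaged_pc, main_server))  # prints ['orders']
```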

Disadvantages of distributed data processing (DDP)

Complexity:

A DDP network is more difficult to design, administer, and troubleshoot than a single computer.

Planning data synchronization is difficult:

Correct data synchronization is difficult to design and implement. Updates are sometimes applied in the wrong order, so administrators need to plan for this before building a distributed network.
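
One common way to guard against updates applied in the wrong order is to attach a sequence number to each update and apply them in that order. The sketch below assumes such sequence numbers exist and that all updates are available before they are applied; both are simplifications.

```python
def apply_in_order(updates):
    """Apply (sequence_number, key, value) updates in sequence order."""
    state = {}
    for _, key, value in sorted(updates, key=lambda update: update[0]):
        state[key] = value
    return state


# The update with sequence number 2 arrived before the one with number 1.
arrived = [(2, "balance", 150), (1, "balance", 100), (3, "status", "closed")]
print(apply_in_order(arrived))  # {'balance': 150, 'status': 'closed'}
```

Without the sort, the older value 100 would overwrite the newer value 150.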

Data security:

If an unauthorized computer is connected to a distributed network, it can degrade the performance of the other computers, and data can be lost or compromised.
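
One way to reduce this risk is to have every node sign its messages with a shared secret, so that the network can reject messages from computers that do not know the secret. The sketch below uses HMAC from Python's standard library; the secret values and message are placeholders, and real deployments would also need encryption and safe key distribution.

```python
import hashlib
import hmac


def sign(message: bytes, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for the message."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authorized(message: bytes, signature: str, secret: bytes) -> bool:
    """Accept the message only if the signature matches the shared secret."""
    expected = sign(message, secret)
    return hmac.compare_digest(expected, signature)


shared_secret = b"example-secret"  # placeholder value for illustration
message = b"update record 17"

trusted_signature = sign(message, shared_secret)
rogue_signature = sign(message, b"wrong-secret")

print(is_authorized(message, trusted_signature, shared_secret))  # True
print(is_authorized(message, rogue_signature, shared_secret))    # False
```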

Examples of distributed data processing

  • Hosting a website on an online server
  • Online photo editing tools
  • Airline ticketing system
  • Processing user data by mobile companies
  • Dropbox, Google Drive, MSN drive, Google Photos
  • Report generation from satellite data
  • Weather forecast system

