MapReduce vs Parallel DBMS

In the article “MapReduce and Parallel DBMSs: Friends or Foes?”, Michael Stonebraker, Daniel J. Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin argue that cloud computing tasks are best handled by combined systems in which a MapReduce (MR) system sits upstream of a parallel relational database management system (DBMS), and that interfaces between the two systems should be developed.

Cloud computing is a new technology that relies on a large number of processors working in parallel to perform calculations (Stonebraker et al., 2010). These processors sit in interconnected commodity computers that are treated as a cluster, and each such computer is called a node of the cluster (Stronberg, 1986). The tools for cluster programming include MR and parallel DBMSs. One widely held opinion is that the extreme scalability of MR gives it a huge competitive advantage over a parallel DBMS. Moreover, Facebook has built its data warehouse solely on MR technology. Nevertheless, parallel DBMSs already scale well enough to meet the needs of current customers. Although any program that contains parallel processing can be written “as either a set of database queries or a set of MR jobs”, some classes of tasks are considered better suited to an MR model than to a parallel DBMS.
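The equivalence quoted above can be illustrated with a small sketch that computes the same aggregate twice: once as a SQL query and once as a map step followed by a reduce step. This is only an illustration under stated assumptions: the sample data and all function names are hypothetical, an in-memory SQLite database stands in for a parallel DBMS, and a single-process loop stands in for an MR cluster.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (url, revenue) pairs, not taken from the article.
VISITS = [("a.com", 1.0), ("b.com", 2.5), ("a.com", 0.5),
          ("c.com", 4.0), ("b.com", 1.5)]


def total_revenue_sql(rows):
    """Express the aggregation as a single SQL GROUP BY query."""
    conn = sqlite3.connect(":memory:")  # SQLite is a stand-in, not a parallel DBMS
    conn.execute("CREATE TABLE visits (url TEXT, revenue REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    result = dict(conn.execute(
        "SELECT url, SUM(revenue) FROM visits GROUP BY url"))
    conn.close()
    return result


def map_phase(row):
    """Map step: emit one (key, value) pair per input record."""
    url, revenue = row
    yield url, revenue


def reduce_phase(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def total_revenue_mr(rows):
    """Express the same aggregation as map, group by key, then reduce."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_phase(row):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Calling `total_revenue_sql(VISITS)` and `total_revenue_mr(VISITS)` should yield the same per-URL totals, which is the sense in which the two formulations are interchangeable.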

Typically, an MR system is supposed to transform “raw data into useful information that is consumed by another storage system”. In this respect an MR system resembles an extract-transform-load (ETL) system. In practice, ETL for a modern DBMS is performed by separate products, and no ETL system is used “to do DBMS services”.

Analytical problems encountered in data mining require “multiple passes over the data”, so they cannot be programmed as “single SQL aggregate queries”. Instead, finding a numerical solution requires “a complex data flow program”, and the authors conclude that the MR model should be used in this case.

Since MR systems do not require a schema for their data, they can work with data that have a varying number of attributes. In the relational model such data can be described by tables “with many attributes”, where attributes that a specific record does not need are assigned NULL values. Row-based DBMSs read whole records from such tables, unused attributes included (Stonebraker et al., 2010). A column-based DBMS, in turn, reads only the attributes that a query needs (Abadi, 2007). The authors believe that analytical queries on such data should be run on a column-based system, and that an MR system should be used when ETL has to be performed on the data.

The startup time of an MR system is significantly shorter than that of a DBMS, because an MR system is much easier to install (Pavlo et al., 2009). Besides, MR systems work with raw data by default, while DBMSs first need to transform the data into the required formats. MR systems are therefore better suited than DBMSs to quick approximate analyses of transient data. Finally, most MR systems are available for free, while parallel DBMSs are expensive. Hence, MR systems are a better fit for “users with limited budgets” than parallel DBMSs (Stonebraker et al., 2010).

A comparison of the real-life performance of MR systems and DBMSs can test these arguments.

Pavlo et al. (2009) conducted a study in which the open source project Hadoop was chosen as a typical representative of MR systems. All parallel DBMSs of acceptable quality were commercial, and Vertica and DBMS-X were chosen as typical representatives of parallel column-store and row-based relational databases respectively. Performance was compared on three “tasks of increasing complexity”. In two of them Hadoop was expected to outperform the chosen databases. Nevertheless, the results indicate that, once the data are loaded, Vertica and DBMS-X solve all three tasks much faster than Hadoop does. At the same time, loading data into these databases takes much longer than loading data into Hadoop. It should be mentioned that Google's version of the MR system may be faster than Hadoop, but it was not available for the study. The poor performance of Hadoop in the study can be explained by its inefficient architecture.

The architectures of an MR system and a parallel DBMS differ because the first is designed to perform “complex analytics and ETL tasks”, while the second is designed to perform “efficient querying of large data sets”. These technologies should therefore complement each other, with an MR system placed upstream of a parallel DBMS. Since finding a numerical solution to an analytical problem often requires running a query on a large data set, interfaces between the two systems need to be developed (Stonebraker et al., 2010).
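The upstream arrangement described above can be sketched as a two-stage pipeline: an MR-style job cleans and aggregates raw records, and its output is loaded into a database that then answers queries. This is a minimal sketch under stated assumptions: the log format, table name, and function names are hypothetical, a single-process loop stands in for an MR cluster, and in-memory SQLite stands in for a parallel DBMS.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw input: "day user path status", with one malformed line.
RAW_LOG = [
    "2010-01-01 alice /home 200",
    "2010-01-01 bob /home 404",
    "malformed line",
    "2010-01-02 alice /cart 200",
    "2010-01-02 alice /home 200",
]


def map_parse(line):
    """Map step (extract + transform): keep successful, well-formed hits."""
    parts = line.split()
    if len(parts) == 4 and parts[3] == "200":
        day, _user, path, _status = parts
        yield (day, path), 1


def reduce_count(key, values):
    """Reduce step: count the hits for each (day, path) key."""
    day, path = key
    return day, path, sum(values)


def run_etl(lines):
    """Upstream stage: an MR-style job that turns raw lines into clean records."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_parse(line):
            groups[key].append(value)
    return sorted(reduce_count(key, values) for key, values in groups.items())


def load_and_query(records):
    """Downstream stage: load the clean records and answer a query with SQL."""
    conn = sqlite3.connect(":memory:")  # stand-in for a parallel DBMS
    conn.execute("CREATE TABLE page_hits (day TEXT, path TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO page_hits VALUES (?, ?, ?)", records)
    rows = conn.execute(
        "SELECT path, SUM(hits) FROM page_hits GROUP BY path ORDER BY path"
    ).fetchall()
    conn.close()
    return rows
```

In this sketch `load_and_query(run_etl(RAW_LOG))` plays the role of the interface between the two systems: the MR stage decides what counts as clean data, and the database only ever sees structured records.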

Personal Reflection

I do not agree with the evidence the authors use to argue that complex analytics should be handled by MR systems. Specifically, they state that solving an analytical task numerically requires multiple passes over the data and that these passes cannot be structured as “single SQL aggregate queries” (Stonebraker et al., 2010). I have experience solving numerically a system of linear algebraic equations with a tridiagonal matrix, a problem that can be classed as complex analytics. To solve it I used the Thomas algorithm, which makes two passes over the input data. Nevertheless, this algorithm can be presented as a superposition of single passes over the data, each of which can be performed in parallel on many processors (Karniadakis & Kirby II, n.d.).
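For reference, the two passes of the Thomas algorithm can be sketched as follows. This is a minimal serial version, not the parallel reformulation mentioned above. The function name and argument layout are assumptions made for illustration: all four lists have the same length, `lower[0]` and `upper[-1]` are ignored, and the matrix is assumed to need no pivoting (for example, because it is diagonally dominant).

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] is the sub-diagonal entry of row i (lower[0] is ignored),
    diag[i] the diagonal entry, upper[i] the super-diagonal entry
    (upper[-1] is ignored), and rhs[i] the right-hand side.
    """
    n = len(diag)
    c_prime = [0.0] * n  # modified super-diagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Pass 1: forward elimination, sweeping from the first row to the last.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Pass 2: back substitution, sweeping from the last row to the first.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

For example, `thomas_solve([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1])` solves a 3-by-3 system whose exact solution has every unknown equal to 1.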

Thus, further investigation is needed to find out whether all data mining algorithms can be adapted for parallel computation and whether these computations can be performed by parallel DBMSs. If both assumptions turn out to be true, DBMSs could be used for all complex analytics in a cloud computing system.

Other than that, the article presents a thorough analysis of the strengths and weaknesses of MR and parallel DBMS technologies. I completely agree with the authors' conclusion that, in a combined system, an MR subsystem should perform ETL for a parallel DBMS.
