Replication (computing)

Terminology and Models of Replication
– Replication in computing can refer to data replication or computation replication.
– Data replication involves storing the same data on multiple storage devices.
– Computation replication involves executing the same computing task multiple times.
– Computational tasks can be replicated in space or in time.
– Replication in space refers to executing tasks on separate devices.
– Replication in time refers to executing tasks repeatedly on a single device.
– Replication in space or in time is often linked to scheduling algorithms.
– Access to a replicated entity is typically uniform with access to a single non-replicated entity.
– The replication itself should be transparent to an external user.
– Three widely cited models for data replication are transactional replication, state machine replication, and virtual synchrony.
– Transactional replication is used for replicating transactional data, such as a database.
– State machine replication assumes that the replicated process is a deterministic finite automaton and that atomic broadcast of every event is possible.
– Virtual synchrony involves a group of processes that cooperate to replicate in-memory data or coordinate actions.
– State machine replication is commonly implemented as a replicated log built from successive rounds of a consensus algorithm such as Paxos.
– Virtual synchrony defines a distributed entity called a process group.
– Database replication can be used on many database management systems (DBMS).
– Multi-master replication allows updates to be submitted to any database node; the updates then ripple through to the other servers.
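
The state machine model above can be illustrated with a minimal sketch. It assumes a toy key-value state machine and replaces atomic broadcast with a simple loop that hands every command to every replica in the same order; the names CounterReplica and broadcast are hypothetical and do not come from any real library. A real system would need a consensus protocol such as Paxos to agree on that order.

```python
from dataclasses import dataclass, field


@dataclass
class CounterReplica:
    """Toy deterministic state machine (hypothetical, for illustration only)."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        # Each command is a tuple: (operation, key, value).
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")


def broadcast(log, replicas):
    # Stand-in for atomic broadcast: every replica receives every
    # command, and all replicas receive them in the same total order.
    for command in log:
        for replica in replicas:
            replica.apply(command)


log = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]
replicas = [CounterReplica() for _ in range(3)]
broadcast(log, replicas)

# Because apply() is deterministic and the order is identical,
# all replicas end in the same state.
states = [replica.state for replica in replicas]
```

The design point is that replicas never exchange state directly: agreement on the order of inputs is enough, provided the state machine is deterministic.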

Replication in Distributed Systems
– Replication transparency is achieved when data is replicated between database servers and users cannot tell which server they are using.
– Replication becomes more complex as it scales up horizontally (more data replicas) and vertically (replicas located at greater physical distances).
– Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access protocol.
– Replication in disk storage aims to prevent damage from failures or disasters.
– Replication is one of the oldest and most important topics in distributed systems.
– Replication protocols aim to ensure that all replicas see the same events in equivalent orders, so that the replicas stay in consistent states.
– Replication transparency may not always be achieved due to constraints imposed by the CAP theorem or PACELC theorem.
– Various data consistency models have been developed to serve as Service Level Agreements (SLAs) between service providers and users.

Cross-Site Replication
– The latency that can be tolerated determines how far apart sites can be, or which type of replication can be used.
– Write operations can be handled asynchronously or synchronously.
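– Synchronous replication guarantees zero data loss because a write is considered complete only once both local and remote storage have acknowledged it, but waiting for the remote acknowledgment decreases overall performance.
– Asynchronous replication considers a write complete as soon as local storage acknowledges it, which improves performance, but recent writes may be lost if local storage fails before they reach the remote site.
– Semi-synchronous replication typically considers a write complete once local storage has acknowledged it and the remote server has received or logged it; the remote write itself happens asynchronously, so performance is better than in synchronous mode, but durability is not guaranteed if local storage fails.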
– Replication is used in distributed fault-tolerant file systems.
– Some commercial synchronous replication systems continue operating locally when the remote replica fails.
– Wide-area network (WAN) optimization techniques can address latency limitations.
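
The difference between synchronous and asynchronous write handling can be sketched as follows. This is a minimal in-memory illustration, not the behavior of any particular product: the Replica and Primary classes, the mode strings, and the flush method are all hypothetical names chosen for the example, and network transfer is reduced to a direct method call.

```python
from collections import deque


class Replica:
    """Hypothetical storage node holding key-value data in memory."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        return True  # acknowledgment


class Primary:
    """Hypothetical primary that replicates writes to one remote replica."""

    def __init__(self, remote, mode):
        if mode not in ("sync", "async"):
            raise ValueError(f"unknown mode: {mode}")
        self.local = Replica()
        self.remote = remote
        self.mode = mode
        self.pending = deque()  # writes not yet shipped (async only)

    def write(self, key, value):
        self.local.write(key, value)
        if self.mode == "sync":
            # Synchronous: complete only after the remote acknowledges.
            return self.remote.write(key, value)
        # Asynchronous: acknowledge now, ship the write later.
        self.pending.append((key, value))
        return True

    def flush(self):
        # Ship queued writes to the remote replica in order.
        while self.pending:
            self.remote.write(*self.pending.popleft())


sync_remote = Replica()
Primary(sync_remote, "sync").write("a", 1)

async_remote = Replica()
async_primary = Primary(async_remote, "async")
async_primary.write("a", 1)
remote_before_flush = dict(async_remote.data)     # remote lags behind
at_risk = list(async_primary.pending)             # lost if primary fails now
async_primary.flush()
```

In the asynchronous case, everything still in the pending queue is exactly the data that would be lost if the primary failed at that moment; the synchronous case has no such window, at the cost of waiting for the remote write.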

File-Based Replication
– File-based replication operates at the logical level (individual data files) rather than at the storage block level.
– Different software-based methods are used for file-based replication.
– Synchronous and asynchronous modes are available for file-level replication.
– File-level replication allows for informed decisions based on file location and type.
– Only changed data is replicated, reducing bandwidth usage.
– In capture with a kernel driver, a driver (specifically a filter driver) intercepts calls to filesystem functions and captures file operations as they occur.
– Captured operations are transmitted to another machine for replication.
– Synchronous mode waits for replication acknowledgment, while asynchronous mode does not.
– File-level replication allows for more granular data transmission.
– Batch replication involves comparing and synchronizing source and destination file systems.
– Rsync is a notable implementation of batch replication.
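
Batch replication as described above (compare source and destination, then transfer only what differs) can be sketched in a few lines. This is an illustrative assumption-laden sketch, not rsync's actual algorithm: file systems are modeled as dictionaries mapping paths to byte contents, whole-file SHA-256 digests stand in for change detection, and the function names plan_sync and apply_sync are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    # Content fingerprint used to detect changed files.
    return hashlib.sha256(content).hexdigest()


def plan_sync(source, destination):
    """Compare two path -> bytes mappings and decide what must change."""
    to_copy = sorted(
        path
        for path, content in source.items()
        if path not in destination
        or digest(destination[path]) != digest(content)
    )
    to_delete = sorted(path for path in destination if path not in source)
    return to_copy, to_delete


def apply_sync(source, destination):
    """Bring destination in line with source, touching only differences."""
    to_copy, to_delete = plan_sync(source, destination)
    for path in to_copy:
        destination[path] = source[path]
    for path in to_delete:
        del destination[path]
    return to_copy, to_delete


source = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"}
destination = {"a.txt": b"alpha", "b.txt": b"beta v1", "old.txt": b"stale"}
copied, deleted = apply_sync(source, destination)
```

Here the unchanged file a.txt is never transferred, which is the bandwidth saving the list above refers to. Whether files missing from the source are deleted at the destination is a policy choice; this sketch deletes them.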

Performance and Optimization
– Measurement of achieved performance levels of web applications.
– Data replication strategies with performance objectives.
– Dangers of replication and a solution.
– Chain replication for high throughput and availability.
– Object storage on CRAQ for read-mostly workloads.
– WANdisco’s active replication scheme.
– Spread Toolkit supporting virtual synchrony model.
– C-Ensemble and Quicksilver as alternatives to Spread Toolkit.
– Modern multi-primary replication protocols optimizing for failure-free operation.
– ITTIA DB SQL™ Users Guide on replication conflict resolution.

Replication (computing) (Wikipedia)


Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
