Understanding Distributed Locking Requirements
In distributed systems, concurrent access to shared resources must be controlled, much as locks prevent race conditions among threads in a single process. A distributed lock ensures safe coordination across multiple nodes and clients. Ideally, such a mechanism should provide:
- Mutual Exclusion: Only one client may hold the lock for a given resource at any time.
- Tolerance to Failures: The lock service should remain available even if some nodes fail (favoring availability and partition tolerance, the AP side of CAP).
- No Deadlock: Locks must eventually be released, even if a client crashes or loses connectivity while holding one.
Distributed Lock via Relational DBMS
A simple table-based locking strategy can be implemented by leveraging uniqueness constraints. Example schema:
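CREATE TABLE system_lock (
  -- BIGINT avoids exhausting the auto-increment range under frequent insert/delete
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  -- string key, matching the String parameter used by the Java code
  lock_key VARCHAR(128) NOT NULL,
  owner_id VARCHAR(64) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY uk_lock_key (lock_key)
) ENGINE=InnoDB;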
lock_key represents a protected resource. Inserting a record attempts to acquire the lock; deletion attempts to release it.
Core logic:
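// Acquire: the unique index on lock_key lets only one INSERT per key succeed.
boolean acquireLock(String key, String clientId) {
    try {
        jdbcTemplate.update(
            "INSERT INTO system_lock(lock_key, owner_id) VALUES (?, ?)",
            key, clientId
        );
        return true;
    } catch (DuplicateKeyException e) {
        // Another client already holds the lock for this key.
        return false;
    }
}
// Release: the owner_id condition prevents a client from deleting a lock it does not hold.
boolean releaseLock(String key, String clientId) {
    int affected = jdbcTemplate.update(
        "DELETE FROM system_lock WHERE lock_key = ? AND owner_id = ?",
        key, clientId
    );
    return affected > 0;
}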
To simulate blocking, clients retry periodically in a loop with exponential backoff.
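The retry loop can be sketched as follows. This is a minimal illustration, not the article's implementation: the class name BackoffLockRetry, its parameters, and the random jitter are assumptions added here, and tryLock stands in for a call such as acquireLock(key, clientId).

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a non-blocking tryLock until it succeeds or attempts run out. */
final class BackoffLockRetry {

    /** Upper bound for the wait before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    /** Returns true once tryLock succeeds, false after maxAttempts failures or on interruption. */
    static boolean acquireWithRetry(BooleanSupplier tryLock, int maxAttempts, long baseMs, long maxMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (tryLock.getAsBoolean()) {
                return true;
            }
            if (attempt == maxAttempts - 1) {
                break; // no point sleeping after the final attempt
            }
            long cap = delayMs(attempt, baseMs, maxMs);
            try {
                // Assumption: a random wait in [0, cap] so that competing clients do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }
}
```

A caller might write acquireWithRetry(() -> acquireLock(key, clientId), 10, 50, 2000). Bounding the number of attempts keeps a caller from waiting forever on a lock whose holder never releases it.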
Caveats:
- Deadlock prevention: Requires periodically cleaning stale entries (e.g., delete records older than 2 minutes).
- Throughput limitation: Typically restricted to ~1,000 ops/sec due to I/O latency and single-database constraint.
- High availability: Achieved via master-slave replication and failover using virtual IPs.
Distributed Lock via Redis
Redis provides native atomicity, making it ideal for lock implementation. The canonical command:
SET lock-key client-id NX PX 30000
- NX: Set only if absent.
- PX: Set expiry in milliseconds.
Acquire & release logic (Lua script wrapped for atomicity):
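// Acquire: SET ... NX PX is atomic; returns the owner id on success, null otherwise.
String acquire(String key, String id, long ttlMs) {
    Object result = redis.eval(
        "return redis.call('set', KEYS[1], ARGV[1], 'NX', 'PX', ARGV[2])",
        Arrays.asList(key), Arrays.asList(id, String.valueOf(ttlMs)));
    return "OK".equals(result) ? id : null;
}

// Release: compare and delete inside one Lua script. A separate GET followed by DEL
// could delete a lock that expired and was re-acquired by another client in between.
boolean release(String key, String id) {
    Object result = redis.eval(
        "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
        Arrays.asList(key), Arrays.asList(id));
    return Long.valueOf(1L).equals(result);
}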
Handling stale locks: The TTL releases a lock whose holder has crashed. A holder that needs more time than the TTL renews its lease, for example with a background task that runs at intervals equal to one-third of the TTL.
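A renewal task along these lines might look like the sketch below. The class LeaseRenewer, the renew callback, and RENEW_SCRIPT are assumptions introduced for illustration. The callback is expected to run the script through whatever Redis client is in use and return true while the lock is still owned.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical lease renewer: extends the lock's TTL at one-third-of-TTL intervals. */
final class LeaseRenewer implements AutoCloseable {

    /** Lua script an implementation could run: extend the TTL only if we still own the key. */
    static final String RENEW_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] "
      + "then return redis.call('pexpire', KEYS[1], ARGV[2]) else return 0 end";

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lease-renewer");
            t.setDaemon(true); // do not keep the JVM alive just for renewals
            return t;
        });

    /** One-third of the TTL, with a floor of 1 ms so very small TTLs still schedule. */
    static long renewalIntervalMs(long ttlMs) {
        return Math.max(1, ttlMs / 3);
    }

    LeaseRenewer(long ttlMs, BooleanSupplier renew) {
        long interval = renewalIntervalMs(ttlMs);
        scheduler.scheduleAtFixedRate(() -> {
            if (!renew.getAsBoolean()) {
                // An exception suppresses all later runs of a fixed-rate task,
                // so renewals stop once the lock is no longer ours.
                throw new IllegalStateException("lock no longer owned");
            }
        }, interval, interval, TimeUnit.MILLISECONDS);
    }

    /** Stop renewing; call this before releasing the lock. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Renewing at a third of the TTL leaves room for roughly two missed attempts before the lock expires. The renewer should be closed before the lock is released so that a late renewal cannot race with the release.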
Drawbacks:
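- Data held in memory can be lost on a crash. Even with replication, asynchronous replication may drop recent writes: if the primary fails before the lock key reaches a replica, the promoted replica has no record of the lock and a second client can acquire it.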
- Not ideal when strict consistency is required.
Distributed Lock via ZooKeeper
ZooKeeper implements ordering semantics via ephemeral sequential nodes. To lock on the path /resource/lock:
- Create /resource/lock/lock- with the Ephemeral and Sequential flags.
- Retrieve all children of /resource/lock and identify the smallest node.
- Check whether this client created the smallest node; if so, the lock is acquired.
- Otherwise, register a watcher on the immediate predecessor node. It is triggered only when the predecessor is deleted.
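The decision made in the last three steps can be sketched as a pure function over the child list. The class LockQueue and its method names are hypothetical. The sketch assumes node names end in the 10-digit zero-padded counter that ZooKeeper appends to sequential nodes, and it leaves out the ZooKeeper client calls that create the node and set the watch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper that decides whether a client holds the lock or which node it should watch. */
final class LockQueue {

    /** Parses the 10-digit sequence suffix, e.g. "lock-0000000007" -> 7. */
    static long sequenceOf(String node) {
        return Long.parseLong(node.substring(node.length() - 10));
    }

    /**
     * Returns empty if ownNode has the smallest sequence number (lock acquired),
     * otherwise the immediate predecessor that should be watched.
     */
    static Optional<String> nodeToWatch(List<String> children, String ownNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(LockQueue::sequenceOf));
        int index = sorted.indexOf(ownNode);
        if (index < 0) {
            // Our ephemeral node is gone, e.g. the session expired; the caller must start over.
            throw new IllegalStateException("own node not found among children: " + ownNode);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }
}
```

Watching only the immediate predecessor, rather than the parent node, means a release wakes exactly one waiter instead of every client in the queue.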
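Apache Curator's InterProcessMutex recipe implements this algorithm:
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

CuratorFramework client = CuratorFrameworkFactory.builder()
    .connectString("zk-host:2181")
    // retry the connection up to 3 times, starting from a 1000 ms base sleep
    .retryPolicy(new ExponentialBackoffRetry(1000, 3))
    .build();
client.start();
InterProcessMutex mutex = new InterProcessMutex(client, "/locks/resource1");
mutex.acquire(); // blocks until the lock is held
try {
    // critical section
} finally {
    mutex.release();
}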
Advantages:
- Ephemeral nodes are deleted automatically when the client's session ends, so a lock is released after a client crash or a network partition that outlasts the session timeout.
- Strong consistency via ZAB consensus protocol ensures no split-brain.
- Watchers provide real-time lock-granted notifications (no polling needed).
Limitations:
- Network timeouts can misclassify transient failures as permanent disconnects, potentially leading to premature lock loss.
- Higher latency than Redis due to disk-based persistence and consensus overhead.
Comparative Overview
| Dimension | DB | Redis | ZooKeeper |
|---|---|---|---|
| Performance | Low (~1k ops/s) | High (memory-only) | Moderate (disk + consensus) |
| Deadlock Safety | TTL-based cleanup (application layer) | TTL + renewal | Ephemeral nodes auto-cleanup |
| High Availability | Sync/multi-master + VIP | Cluster or Redis Sentinel | ZAB quorum-based clusters |
| Consistency | Strong (ACID) | Eventual (async replication) | Strong (ZAB ensures quorum writes) |
| Lock Notification | Poll-based | Poll-based | Watcher-driven |
Practical Considerations
Distributed locks cannot match the safety guarantees of local primitives such as synchronized blocks, because they are exposed to external failures, especially network glitches. Best practices include:
- Deploy clients and lock service within the same low-latency network zone.
- Ensure monitors and alarm systems detect lock acquisition failures.
- Choose wisely: Redis suits high-throughput scenarios where near-real-time availability matters more than absolute correctness; ZooKeeper suits high-safety use cases like payment workflows.
Posted on May 26 at 17:50