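# Distributed Lock Usage Guide

## Overview

A Redis-based distributed lock that gives processes in a distributed environment mutually exclusive access to shared resources.

## Features

- ✅ **Unique identifier**: each lock instance carries a UUID, so one instance cannot release a lock held by another
- ✅ **Automatic renewal**: long-running tasks can have their lock renewed automatically
- ✅ **Timeout protection**: every lock has a timeout and is released automatically when it expires, which prevents deadlocks
- ✅ **Retry mechanism**: failed acquisition attempts can be retried automatically
- ✅ **Multiple usage styles**: context manager, manual control, and decorator
- ✅ **Thread and process safe**: built on atomic Redis operations
- ✅ **Lua scripts**: release and extend operations run atomically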
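
The unique-identifier and Lua-script features can be illustrated with a minimal sketch. This is not the actual `core.distributed_lock` implementation. It assumes a redis-py style async client that exposes `set(..., nx=True, ex=...)` and `eval(script, numkeys, *keys_and_args)`, and the helper names `try_acquire`, `release`, and `extend` are hypothetical. Here `extend` resets the remaining lifetime to `seconds`; the real `lock.extend()` may define "additional time" differently.

```python
import uuid
from typing import Optional

# Delete the key only if it still holds our token (compare-and-delete).
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""

# Reset the TTL only if the key still holds our token.
EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("expire", KEYS[1], ARGV[2])
end
return 0
"""


async def try_acquire(redis, lock_name: str, timeout: int = 30) -> Optional[str]:
    """Return a fresh token if the lock was taken, otherwise None."""
    token = uuid.uuid4().hex
    # SET key token NX EX timeout: succeeds only when the key does not exist.
    acquired = await redis.set(lock_name, token, nx=True, ex=timeout)
    return token if acquired else None


async def release(redis, lock_name: str, token: str) -> bool:
    """Release the lock only if this token still owns it."""
    return await redis.eval(RELEASE_SCRIPT, 1, lock_name, token) == 1


async def extend(redis, lock_name: str, token: str, seconds: int) -> bool:
    """Set the remaining TTL to `seconds` only if this token still owns the lock."""
    return await redis.eval(EXTEND_SCRIPT, 1, lock_name, token, seconds) == 1
```

Doing the comparison and the `del` or `expire` inside one script matters: with a separate `GET` followed by `DEL`, the lock could expire and be taken by another instance between the two commands, and the `DEL` would then remove someone else's lock.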

## Quick Start

### Option 1: Context manager (recommended)

```python
from core.redis import get_redis
from core.distributed_lock import DistributedLock

async def example():
    redis = await get_redis()
    
    async with DistributedLock(redis, "my_resource"):
        # Run the operation that needs mutual exclusion
        await do_something()
    # The lock is released automatically on exit
```

### Option 2: Convenience function

```python
from core.distributed_lock import distributed_lock

async def example():
    async with distributed_lock("my_resource", timeout=60):
        # Run the operation that needs mutual exclusion
        await do_something()
```

### Option 3: Decorator

```python
from core.distributed_lock import distributed_lock_decorator

# Use the function name as the lock name
@distributed_lock_decorator()
async def my_function():
    await do_something()

# Specify a custom lock name
@distributed_lock_decorator("custom_lock_name")
async def my_function():
    await do_something()

# Pass additional parameters
@distributed_lock_decorator("custom_lock", timeout=60, auto_renewal=True)
async def long_running_task():
    await do_something()
```
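
To make the decorator behavior concrete, here is a rough sketch of how a decorator of this shape can be built on top of an async context manager. It is not the module's actual code: `lock_decorator` and its `lock_factory` argument are hypothetical names, and the sketch only assumes that `lock_factory(name, **kwargs)` returns an async context manager.

```python
import functools
from typing import Any, Awaitable, Callable, Optional


def lock_decorator(
    lock_factory: Callable[..., Any],
    lock_name: Optional[str] = None,
    **lock_kwargs: Any,
):
    """Wrap an async function so its body runs inside `lock_factory(name, **kwargs)`."""

    def decorator(func: Callable[..., Awaitable[Any]]):
        # Fall back to the function name when no explicit lock name is given.
        name = lock_name or func.__name__

        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            async with lock_factory(name, **lock_kwargs):
                return await func(*args, **kwargs)

        return wrapper

    return decorator
```

One consequence of defaulting to the function name in a design like this: two functions with the same name in different modules would share a lock, so an explicit lock name is the safer choice when names might collide.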

### Option 4: Manual control

```python
from core.redis import get_redis
from core.distributed_lock import DistributedLock

async def example():
    redis = await get_redis()
    lock = DistributedLock(redis, "my_resource", timeout=30)
    
    if await lock.acquire():
        try:
            await do_something()
        finally:
            await lock.release()
```

## Advanced Features

### 1. Automatic renewal

Enable automatic renewal for long-running tasks:

```python
async with DistributedLock(
    redis,
    "long_task",
    timeout=60,
    auto_renewal=True,
    renewal_interval=20,  # Renew every 20 seconds
):
    # While renewals keep succeeding, the lock does not expire even if the task runs longer than 60 seconds
    await long_running_task()
```
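
The renewal mechanism can be pictured as a background task that periodically extends the lock while the protected code runs. The sketch below illustrates that idea only and is not the library's implementation: `auto_renew` and `_renew_forever` are hypothetical names, and `extend` stands for any coroutine function that returns `True` while the lock is still owned.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Awaitable, Callable


async def _renew_forever(extend: Callable[[], Awaitable[bool]], interval: float) -> None:
    """Call `extend()` every `interval` seconds until it fails or the task is cancelled."""
    while True:
        await asyncio.sleep(interval)
        if not await extend():
            # Ownership was lost (key expired or taken over); stop renewing.
            return


@asynccontextmanager
async def auto_renew(extend: Callable[[], Awaitable[bool]], interval: float):
    """Keep a lock alive in the background while the `async with` body runs."""
    task = asyncio.create_task(_renew_forever(extend, interval))
    try:
        yield task
    finally:
        # Stop renewing as soon as the protected block finishes.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
```

The interval has to be comfortably shorter than the lock timeout, otherwise the key can expire between two renewals. The documented default of `timeout/3` leaves room for one or two delayed renewals before the lock is lost.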

### 2. Retry mechanism

Retry automatically when the lock cannot be acquired:

```python
lock = DistributedLock(
    redis,
    "resource",
    timeout=30,
    retry_times=10,      # Retry up to 10 times
    retry_delay=0.5,     # Wait 0.5 seconds between retries
)

if await lock.acquire():
    # Lock acquired
    pass
```
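
As a rough model of what these two parameters control, the loop below makes one initial attempt and then up to `retry_times` further attempts, sleeping `retry_delay` seconds between them. It is a sketch under that assumption, not the library's code; `acquire_with_retry` and `try_once` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def acquire_with_retry(
    try_once: Callable[[], Awaitable[bool]],
    retry_times: int = 0,
    retry_delay: float = 0.1,
) -> bool:
    """Make one attempt plus up to `retry_times` retries, sleeping between attempts."""
    for attempt in range(retry_times + 1):
        if await try_once():
            return True
        # Do not sleep after the final attempt.
        if attempt < retry_times:
            await asyncio.sleep(retry_delay)
    return False
```

Under this model, `retry_times=10` with `retry_delay=0.5` waits at most about 10 × 0.5 = 5 seconds (plus Redis round trips) before giving up, which is a useful number to compare against how long the current holder is expected to keep the lock.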

### 3. Extending a lock manually

Extend the lock's hold time manually while the task is running:

```python
lock = DistributedLock(redis, "resource", timeout=30)

if await lock.acquire():
    try:
        await partial_work()
        
        # Extend the lock's hold time
        await lock.extend(additional_time=30)
        
        await more_work()
    finally:
        await lock.release()
```

### 4. Checking lock state

```python
lock = DistributedLock(redis, "resource")

# Check whether any instance holds the lock
is_locked = await lock.is_locked_by_anyone()

# Check whether the current instance holds the lock
is_mine = await lock.is_locked_by_me()
```

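## Parameters

### DistributedLock parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `redis` | Redis | required | Redis client instance |
| `lock_name` | str | required | Name of the lock (resource identifier) |
| `timeout` | int | 30 | Lock timeout, in seconds |
| `retry_times` | int | 0 | Number of retries after a failed acquisition |
| `retry_delay` | float | 0.1 | Delay between retries, in seconds |
| `auto_renewal` | bool | False | Whether to enable automatic renewal |
| `renewal_interval` | int | `timeout/3` | Interval between automatic renewals, in seconds |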

## Use Cases

### 1. Preventing duplicate runs of a scheduled task

```python
@distributed_lock_decorator("daily_report_task")
async def generate_daily_report():
    # Even if several instances trigger at the same time, only one runs the task
    await generate_report()
```

### 2. Inventory deduction

```python
async def decrease_inventory(product_id: int, quantity: int):
    async with distributed_lock(f"inventory:{product_id}"):
        # The check and the update run under the lock, so concurrent deductions cannot interleave
        inventory = await get_inventory(product_id)
        if inventory >= quantity:
            await update_inventory(product_id, inventory - quantity)
            return True
        return False
```

### 3. Cache refresh

```python
async def get_or_refresh_cache(key: str):
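    # Try the cache first
    data = await redis.get(key)
    if data:
        return data
    
    # Cache miss: take a lock to prevent a cache stampede
    async with distributed_lock(f"cache_refresh:{key}", retry_times=5):
        # Check the cache again (another process may have refreshed it)
        data = await redis.get(key)
        if data:
            return data
        
        # Load from the database and update the cache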
        data = await load_from_database(key)
        await redis.set(key, data, ex=3600)
        return data
```

### 4. Distributed job scheduling

```python
async def process_job(job_id: str):
    lock_name = f"job:{job_id}"
    
    lock = DistributedLock(
        redis,
        lock_name,
        timeout=300,
        auto_renewal=True,
        retry_times=0,  # No retries: skip the job if another instance is already processing it
    )
    
    if await lock.acquire(blocking=False):
        try:
            await process(job_id)
        finally:
            await lock.release()
    else:
        # The job is already being processed by another instance
        pass
```

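## Best Practices

### 1. Choose an appropriate timeout

- Set the timeout longer than the task's expected run time
- Enable automatic renewal for tasks whose run time is hard to predict
- Avoid very long timeouts, so that a failure does not leave the resource locked for a long time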

### 2. Use meaningful lock names

```python
# ✅ Good
async with distributed_lock(f"order:{order_id}:payment"):
    await process_payment(order_id)

# ❌ Bad
async with distributed_lock("lock1"):
    await process_payment(order_id)
```

### 3. Use retries sensibly

```python
# Retry when the lock must be acquired
lock = DistributedLock(
    redis,
    "critical_resource",
    retry_times=10,
    retry_delay=0.5,
)

# Do not retry when the work is optional
lock = DistributedLock(
    redis,
    "optional_task",
    retry_times=0,
)
```

### 4. Exception handling

```python
try:
    async with distributed_lock("resource"):
        await risky_operation()
except RuntimeError:
    # Failed to acquire the lock (a RuntimeError raised by risky_operation() also lands here)
    logger.error("Failed to acquire lock")
except Exception as e:
    # Any other exception
    logger.error(f"Operation failed: {e}")
```

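### 5. Avoid deadlocks

- Always make sure the lock gets released (use `try-finally` or a context manager)
- Set a reasonable timeout
- Avoid waiting for another lock while holding one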
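
When holding more than one lock at a time cannot be avoided, a common complementary technique (lock ordering, not something this module is known to provide) is to always acquire the locks in one fixed global order, so that two callers can never each hold a lock the other is waiting for. The sketch below assumes `lock_factory(name)` returns an async context manager; `acquire_in_order` is a hypothetical helper.

```python
from contextlib import AsyncExitStack, asynccontextmanager
from typing import Any, Callable, Iterable


@asynccontextmanager
async def acquire_in_order(lock_factory: Callable[[str], Any], names: Iterable[str]):
    """Acquire several named locks in sorted order and release them in reverse."""
    async with AsyncExitStack() as stack:
        # Sorting gives every caller the same acquisition order;
        # set() drops duplicates so no lock is requested twice.
        for name in sorted(set(names)):
            await stack.enter_async_context(lock_factory(name))
        yield
```

With the convenience function shown earlier, a call might look like `async with acquire_in_order(distributed_lock, [f"account:{a}", f"account:{b}"]): ...`. If acquiring a later lock fails, `AsyncExitStack` releases the locks that were already taken.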

## Running the Example

The project ships a complete usage example that can be run directly:

```bash
# Make sure Redis is running
# Set environment variables (if needed)
export REDIS_HOST=localhost
export REDIS_PORT=6379

# Run the example
python -m core.distributed_lock_example
```

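## Notes

1. **Redis dependency**: the lock depends on Redis, so make sure the Redis service is available
2. **Clock synchronization**: keep node clocks synchronized across the distributed environment
3. **Network latency**: account for the effect of network latency on lock timeouts
4. **Resource cleanup**: use the context manager to make sure locks are released correctly
5. **Lock granularity**: choose a suitable granularity, neither too coarse nor too fine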

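## Troubleshooting

### Problem 1: The lock is never released

**Cause**: the program exited abnormally before releasing the lock

**Solution**:

- Use the context manager or `try-finally`
- Set a reasonable timeout so the lock expires on its own

### Problem 2: Failing to acquire the lock

**Causes**:

- Another instance currently holds the lock
- The timeout is set too short
- Too few retries

**Solution**:

- Increase the number of retries and the retry delay
- Check for deadlocks
- Increase the timeout

### Problem 3: The lock expires early

**Cause**: the task runs longer than the lock timeout

**Solution**:

- Increase the timeout
- Enable automatic renewal
- Extend the lock manually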

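## Performance Considerations

- Each acquire or release costs 1-2 Redis operations
- Automatic renewal issues Redis operations periodically
- Monitor Redis performance in high-concurrency scenarios
- Size the connection pool appropriately

## Summary

A distributed lock is an important building block of a distributed system. Used correctly, it can:

- ✅ Keep data consistent
- ✅ Prevent duplicate execution
- ✅ Control concurrent access
- ✅ Improve system reliability

Choosing the right usage style and parameters delivers these guarantees with the least performance overhead.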