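name: pdb
description: >
  Fetch and analyze protein structures from the RCSB PDB. Use this skill when you need to: (1) download a structure by PDB ID, (2) search for similar structures, (3) prepare a target for binder design, (4) extract specific chains or domains, (5) retrieve structure metadata.

  For sequence lookup, use uniprot. For binder design workflows, use binder-design.
license: MIT
category: utilities
tags: [database, structure, fetch]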

PDB Database Access

Note: This skill uses the RCSB PDB web API directly. No Modal deployment is needed; all operations run locally over HTTP requests.

Fetching Structures

By PDB ID

# Download the PDB file
curl -o 1alu.pdb "https://files.rcsb.org/download/1ALU.pdb"

# Download the mmCIF file
curl -o 1alu.cif "https://files.rcsb.org/download/1ALU.cif"

Using Python

from Bio.PDB import PDBList

pdbl = PDBList()
pdbl.retrieve_pdb_file("1ABC", pdir="structures/", file_format="pdb")

Using the RCSB API

import requests

def fetch_pdb(pdb_id: str, format: str = "pdb") -> str:
    """Fetch a structure from the RCSB PDB."""
    url = f"https://files.rcsb.org/download/{pdb_id}.{format}"
    response = requests.get(url)
    response.raise_for_status()
    return response.text

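def fetch_fasta(pdb_id: str) -> str:
    """Fetch the sequences of an entry in FASTA format."""
    url = f"https://www.rcsb.org/fasta/entry/{pdb_id}"
    response = requests.get(url)
    response.raise_for_status()  # fail on unknown IDs instead of returning an error page
    return response.text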

# Usage example
pdb_content = fetch_pdb("1ALU")
with open("1ALU.pdb", "w") as f:
    f.write(pdb_content)
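
The fetch helpers above pass the ID straight into the URL, so a malformed ID only surfaces as an HTTP error. A minimal sketch of checking the ID before any request is made is shown below. The names `PDB_ID_RE` and `build_download_url` are hypothetical, and the pattern assumes the classic 4-character ID form (a digit followed by three alphanumerics).

```python
import re

# Assumption: classic 4-character PDB IDs, a digit followed by three alphanumerics.
PDB_ID_RE = re.compile(r"[0-9][A-Za-z0-9]{3}")

def build_download_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Validate a PDB ID and return its files.rcsb.org download URL."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_RE.fullmatch(pdb_id):
        raise ValueError(f"Not a 4-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"Unsupported format: {fmt!r}")
    # Upper-case the ID to match the download URLs used in this document
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"

print(build_download_url("1alu"))
print(build_download_url("1ALU", "cif"))
```

Raising `ValueError` early keeps typos such as a gene name in place of an ID from turning into a confusing 404 later in a pipeline.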

Structure Preparation

Selecting Chains

from Bio.PDB import PDBParser, PDBIO, Select

class ChainSelect(Select):
    def __init__(self, chain_id):
        self.chain_id = chain_id

    def accept_chain(self, chain):
        return chain.id == self.chain_id

# Extract chain A
parser = PDBParser()
structure = parser.get_structure("protein", "1abc.pdb")
io = PDBIO()
io.set_structure(structure)
io.save("chain_A.pdb", ChainSelect("A"))

Trimming to the Binding Region

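import numpy as np
from Bio.PDB import PDBParser, PDBIO, Select

def trim_around_residues(pdb_file, center_residues, buffer=10.0):
    """Trim a structure to the region around the given residues."""
    parser = PDBParser()
    structure = parser.get_structure("protein", pdb_file)

    # Collect atom coordinates of the center residues.
    # Note: residue numbers are matched in every chain and model.
    center_coords = []
    for res in structure.get_residues():
        if res.id[1] in center_residues:
            center_coords.extend([a.coord for a in res.get_atoms()])

    if not center_coords:
        raise ValueError("None of the center residues were found in the structure")

    center = np.mean(center_coords, axis=0)

    # Keep residues with at least one atom within `buffer` angstroms of the center
    class RegionSelect(Select):
        def accept_residue(self, res):
            for atom in res.get_atoms():
                if np.linalg.norm(atom.coord - center) < buffer:
                    return True
            return False

    io = PDBIO()
    io.set_structure(structure)
    io.save("trimmed.pdb", RegionSelect())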

Searching the PDB

RCSB Search API

import requests

query = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {
            "value": "EGFR kinase domain"
        }
    },
    "return_type": "entry"
}

response = requests.post(
    "https://search.rcsb.org/rcsbsearch/v2/query",
    json=query
)
response.raise_for_status()  # fail early on HTTP errors
results = response.json()
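
The `results` dictionary still has to be reduced to a list of IDs before anything can be downloaded. The sketch below assumes the response carries a `result_set` list whose items have `identifier` and `score` fields. The helper name `extract_identifiers` and the sample response are hypothetical, with placeholder IDs rather than real search output.

```python
def extract_identifiers(results: dict, min_score: float = 0.0) -> list:
    """Return identifiers from a Search API response, best score first."""
    hits = results.get("result_set", [])
    # Drop weak hits, then order the rest by descending score
    kept = [h for h in hits if h.get("score", 0.0) >= min_score]
    kept.sort(key=lambda h: h.get("score", 0.0), reverse=True)
    return [h["identifier"] for h in kept]

# Hypothetical response with placeholder IDs, shaped like the JSON described above
sample = {
    "total_count": 3,
    "result_set": [
        {"identifier": "1ABC", "score": 0.8},
        {"identifier": "2DEF", "score": 1.0},
        {"identifier": "3GHI", "score": 0.3},
    ],
}

print(extract_identifiers(sample, min_score=0.5))  # ['2DEF', '1ABC']
```

Using `.get("result_set", [])` means a response without hits yields an empty list instead of a `KeyError`.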

By Sequence Similarity

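query = {
    "query": {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
            "value": "MKTAYIAKQRQISFVK...",
            "sequence_type": "protein",
            "evalue_cutoff": 1e-10,
            "identity_cutoff": 0.9
        }
    },
    # Sequence hits are reported per polymer entity
    "return_type": "polymer_entity"
}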

Structure Analysis

Getting Chain Information

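from Bio.PDB import PDBParser

def get_structure_info(pdb_file):
    """Summarize each chain's ID, length, and residue number range (all models are counted)."""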
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("protein", pdb_file)

    info = {
        "chains": [],
        "total_residues": 0
    }

    for model in structure:
        for chain in model:
            residues = list(chain.get_residues())
            info["chains"].append({
                "id": chain.id,
                "length": len(residues),
                "first_res": residues[0].id[1],
                "last_res": residues[-1].id[1]
            })
            info["total_residues"] += len(residues)

    return info

Finding Interface Residues

def find_interface_residues(pdb_file, chain_a, chain_b, distance=4.0):
    """Find residues at the interface between two chains."""
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("complex", pdb_file)

    interface_a = set()
    interface_b = set()

    for res_a in structure[0][chain_a].get_residues():
        for res_b in structure[0][chain_b].get_residues():
            for atom_a in res_a.get_atoms():
                for atom_b in res_b.get_atoms():
                    if atom_a - atom_b < distance:
                        interface_a.add(res_a.id[1])
                        interface_b.add(res_b.id[1])

    return interface_a, interface_b
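
The contact criterion used above (any atom pair closer than `distance`) can be illustrated without Biopython. The sketch below applies the same rule to plain coordinate tuples and stops checking a residue pair at its first close atom pair. The function `interface_pairs` and the toy coordinates are hypothetical; for large complexes a spatial index such as `Bio.PDB.NeighborSearch` is likely a better fit than nested loops.

```python
import math

def interface_pairs(coords_a, coords_b, distance=4.0):
    """Return residue numbers on each side with any atom pair closer than `distance`.

    coords_a and coords_b map residue number -> list of (x, y, z) atom coordinates.
    """
    interface_a, interface_b = set(), set()
    for res_a, atoms_a in coords_a.items():
        for res_b, atoms_b in coords_b.items():
            # any() stops at the first close atom pair for this residue pair
            if any(math.dist(p, q) < distance for p in atoms_a for q in atoms_b):
                interface_a.add(res_a)
                interface_b.add(res_b)
    return interface_a, interface_b

# Hypothetical toy coordinates in angstroms
chain_a = {10: [(0.0, 0.0, 0.0)], 11: [(20.0, 0.0, 0.0)]}
chain_b = {50: [(3.0, 0.0, 0.0)], 51: [(40.0, 0.0, 0.0)]}

print(interface_pairs(chain_a, chain_b))  # ({10}, {50})
```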

Common Tasks for Binder Design

Target Preparation Checklist

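Download the structure: curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb"
Identify the target chain
Remove waters and ligands (if needed)
Trim to the binding region plus a buffer
Identify potential hotspots
Renumber residues if needed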
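
The chain-selection and water/ligand-removal steps of the checklist can be sketched on raw PDB text. The example assumes fixed-width PDB records with the chain ID in column 22, and it keeps only `ATOM` records, so anything stored as `HETATM` (waters, ligands, and also modified residues) is dropped. The names `clean_pdb_text` and `pdb_line` are hypothetical, and `pdb_line` only builds toy records for the demonstration.

```python
def clean_pdb_text(pdb_text: str, chain_id: str) -> str:
    """Keep ATOM records of one chain; drop HETATM records and other chains."""
    kept = []
    for line in pdb_text.splitlines():
        # Column 22 (index 21) holds the chain ID in fixed-width PDB records
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"

def pdb_line(record, serial, name, resname, chain, resseq):
    """Toy helper: build the fixed-width prefix of a PDB coordinate record."""
    return f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"

# Hypothetical four-record input: two chain A atoms, one chain B atom, one water
sample = "\n".join([
    pdb_line("ATOM", 1, "N", "MET", "A", 1),
    pdb_line("ATOM", 2, "CA", "MET", "A", 1),
    pdb_line("ATOM", 3, "N", "GLY", "B", 1),
    pdb_line("HETATM", 4, "O", "HOH", "A", 101),
])

cleaned = clean_pdb_text(sample, "A")
print(cleaned)
```

If the target contains modified residues that must be kept, a Biopython `Select` subclass that filters on the residue's hetero flag is probably the safer route.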

Troubleshooting

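Structure not found: check the PDB ID format (4 characters)
Multiple models: use the first model for design
Missing residues: check the structure for gaps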
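
One simple way to check for gaps is to look for jumps in a chain's residue numbering. The sketch below takes a list of residue numbers, for example the `res.id[1]` values of one chain, and reports each jump. The name `find_gaps` is hypothetical, and a jump is only a hint: some entries use non-contiguous numbering on purpose.

```python
def find_gaps(residue_numbers):
    """Return (last_before_gap, first_after_gap) pairs where numbering jumps."""
    ordered = sorted(set(residue_numbers))
    # Compare each residue number with its successor
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]

print(find_gaps([1, 2, 3, 7, 8, 12]))  # [(3, 7), (8, 12)]
```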

Next step: use the structure for design with boltzgen (recommended) or rfdiffusion.
