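name: pdb
description: >
  Fetch and analyze protein structures from RCSB PDB. Use this skill when you need to:
  (1) download a structure by PDB ID, (2) search for similar structures,
  (3) prepare a target for binder design, (4) extract specific chains or domains,
  (5) retrieve structure metadata.
  For sequence lookup, use uniprot. For binder design workflows, use binder-design.
license: MIT
category: utilities
tags: [database, structure, fetch]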
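PDB Database Access
Note: This skill calls the RCSB PDB web API directly. No Modal deployment is needed; all operations run locally over HTTP requests.
Fetching Structures
By PDB ID
# Download a PDB file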
curl -o 1alu.pdb "https://files.rcsb.org/download/1ALU.pdb"
# Download mmCIF
curl -o 1alu.cif "https://files.rcsb.org/download/1ALU.cif"
Using Python
from Bio.PDB import PDBList
pdbl = PDBList()
pdbl.retrieve_pdb_file("1ABC", pdir="structures/", file_format="pdb")
Using the RCSB API
import requests

def fetch_pdb(pdb_id: str, format: str = "pdb") -> str:
    """Fetch a structure from RCSB PDB as text ("pdb" or "cif")."""
    url = f"https://files.rcsb.org/download/{pdb_id}.{format}"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on an unknown ID or unavailable format
    return response.text

def fetch_fasta(pdb_id: str) -> str:
    """Fetch the entry's sequences in FASTA format."""
    url = f"https://www.rcsb.org/fasta/entry/{pdb_id}"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text

# Usage example
pdb_content = fetch_pdb("1ALU")
with open("1ALU.pdb", "w") as f:
    f.write(pdb_content)
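The helpers above place `pdb_id` directly into the URL, so a malformed ID only surfaces as an HTTP error. A minimal sketch of checking the ID first, assuming classic four-character IDs (one digit followed by three alphanumerics). `build_download_url` is a hypothetical helper name, not part of any library, and extended-format IDs would need a different pattern:

```python
import re

# Assumption: classic PDB IDs only (one digit, then three alphanumerics).
_PDB_ID_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")
_FORMATS = {"pdb", "cif"}


def build_download_url(pdb_id, fmt="pdb"):
    """Return the files.rcsb.org download URL for a validated PDB ID."""
    pdb_id = pdb_id.strip()
    if not _PDB_ID_RE.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in _FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Upper-case the ID to match the URLs used in the curl examples.
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


url = build_download_url("1alu")
```

Validating before the request separates a typo in the ID from a genuine download failure, such as a large entry that is not distributed in the legacy PDB format.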
Structure Preparation
Selecting Chains
from Bio.PDB import PDBParser, PDBIO, Select

class ChainSelect(Select):
    """Keep only the chain with the given ID."""
    def __init__(self, chain_id):
        self.chain_id = chain_id

    def accept_chain(self, chain):
        return chain.id == self.chain_id

# Extract chain A
parser = PDBParser()
structure = parser.get_structure("protein", "1abc.pdb")
io = PDBIO()
io.set_structure(structure)
io.save("chain_A.pdb", ChainSelect("A"))
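Trimming to the Binding Region
import numpy as np
from Bio.PDB import PDBParser, PDBIO, Select

def trim_around_residues(pdb_file, center_residues, buffer=10.0):
    """Trim a structure to the region around the given residues.

    Keeps every residue with at least one atom within `buffer` angstroms of
    the centroid of `center_residues` and writes the result to trimmed.pdb.
    """
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("protein", pdb_file)

    # Collect atom coordinates of the center residues.
    # Residue numbers are matched in every chain and model, not just one.
    center_coords = []
    for res in structure.get_residues():
        if res.id[1] in center_residues:
            center_coords.extend([a.coord for a in res.get_atoms()])
    if not center_coords:
        raise ValueError("none of center_residues were found in the structure")
    center = np.mean(center_coords, axis=0)

    # Keep residues that have any atom inside the buffer
    class RegionSelect(Select):
        def accept_residue(self, res):
            for atom in res.get_atoms():
                if np.linalg.norm(atom.coord - center) < buffer:
                    return True
            return False

    io = PDBIO()
    io.set_structure(structure)
    io.save("trimmed.pdb", RegionSelect())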
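The selection rule used for trimming (keep a residue if any of its atoms lies within the buffer of the centroid of the center residues) can be illustrated without Biopython. The following is a dependency-free sketch on made-up coordinates; `centroid` and `residues_within` are hypothetical helper names, and the numbers are illustrative rather than taken from a real structure:

```python
import math


def centroid(points):
    """Mean of a non-empty list of (x, y, z) tuples."""
    if not points:
        raise ValueError("no coordinates given")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def residues_within(residue_atoms, center, buffer=10.0):
    """Residue numbers with at least one atom closer than `buffer` to `center`.

    residue_atoms maps residue number -> list of (x, y, z) atom coordinates.
    """
    return sorted(
        resnum
        for resnum, atoms in residue_atoms.items()
        if any(math.dist(atom, center) < buffer for atom in atoms)
    )


# Illustrative coordinates only, not from a real structure.
residue_atoms = {
    10: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    11: [(2.0, 0.0, 0.0)],
    50: [(30.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    90: [(40.0, 0.0, 0.0)],
}
center = centroid(residue_atoms[10] + residue_atoms[11])
kept = residues_within(residue_atoms, center, buffer=10.0)
print(kept)  # [10, 11, 50]
```

Residue 50 is kept because one of its atoms is 8 units from the centroid, even though its other atom is far away. Residue 90 has no atom inside the buffer and is dropped.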
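Searching the PDB
RCSB Search API
import requests

query = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {
            "value": "EGFR kinase domain"
        }
    },
    "return_type": "entry"
}
response = requests.post(
    "https://search.rcsb.org/rcsbsearch/v2/query",
    json=query,
    timeout=30
)
response.raise_for_status()
# A search with no hits may come back with an empty body instead of JSON.
results = response.json() if response.content else {"result_set": []}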
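Once `results` is available, the matching IDs still have to be pulled out of it. A minimal sketch, assuming the response body carries a `result_set` list whose items each have an `identifier` field. `extract_identifiers` is a hypothetical helper, and the sample body below is illustrative, with made-up scores:

```python
def extract_identifiers(results):
    """Collect the identifiers from a search response body (a dict)."""
    return [hit["identifier"] for hit in results.get("result_set", [])]


# Illustrative response body only; the IDs are reused from the examples
# in this document and the scores are made up.
sample = {
    "total_count": 2,
    "result_set": [
        {"identifier": "1ALU", "score": 1.0},
        {"identifier": "1ABC", "score": 0.8},
    ],
}
ids = extract_identifiers(sample)
print(ids)  # ['1ALU', '1ABC']
```

Using `results.get("result_set", [])` means a response without hits yields an empty list rather than a `KeyError`.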
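By Sequence Similarity
query = {
    "query": {
        "type": "terminal",
        "service": "sequence",
        "parameters": {
            "value": "MKTAYIAKQRQISFVK...",  # placeholder: supply the full sequence
            "sequence_type": "protein",
            "evalue_cutoff": 1e-10,
            "identity_cutoff": 0.9
        }
    },
    # Sequence hits are polymer entities rather than whole entries
    "return_type": "polymer_entity"
}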
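Building the sequence query by hand is error-prone when the sequence comes from a FASTA file with line breaks. A hedged sketch of a builder follows. `build_sequence_query` is a hypothetical helper, and the `sequence_type`, `request_options.paginate` and `return_type` fields reflect my understanding of the v2 search schema, so check them against the API documentation before relying on them:

```python
def build_sequence_query(sequence, evalue_cutoff=1e-10, identity_cutoff=0.9, rows=25):
    """Build a sequence-similarity query dict for the RCSB search API."""
    # Remove whitespace and line breaks left over from FASTA formatting.
    sequence = "".join(sequence.split()).upper()
    if not sequence.isalpha():
        raise ValueError("sequence must contain letters only")
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "value": sequence,
                "sequence_type": "protein",  # assumed field name
                "evalue_cutoff": evalue_cutoff,
                "identity_cutoff": identity_cutoff,
            },
        },
        # Assumed pagination options: return the first `rows` hits.
        "request_options": {"paginate": {"start": 0, "rows": rows}},
        "return_type": "polymer_entity",
    }


# Short illustrative fragment; a real search needs the full sequence.
query = build_sequence_query("mkta yiak\nqrq")
```

The resulting dict can be posted to the same search endpoint as the full-text query, using `requests.post(..., json=query)`.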
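Structure Analysis
Getting Chain Information
def get_structure_info(pdb_file):
    """Summarize each chain: ID, residue count, first and last residue number."""
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("protein", pdb_file)
    info = {
        "chains": [],
        "total_residues": 0
    }
    # Multi-model files (e.g. NMR ensembles) list each chain once per model.
    # Counts include waters and ligands (HETATM records) assigned to the chain.
    for model in structure:
        for chain in model:
            residues = list(chain.get_residues())
            if not residues:
                continue
            info["chains"].append({
                "id": chain.id,
                "length": len(residues),
                "first_res": residues[0].id[1],
                "last_res": residues[-1].id[1]
            })
            info["total_residues"] += len(residues)
    return info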
Finding Interface Residues
def find_interface_residues(pdb_file, chain_a, chain_b, distance=4.0):
    """Find residues at the interface between two chains (first model only)."""
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("complex", pdb_file)
    interface_a = set()
    interface_b = set()
    for res_a in structure[0][chain_a].get_residues():
        for res_b in structure[0][chain_b].get_residues():
            for atom_a in res_a.get_atoms():
                for atom_b in res_b.get_atoms():
                    # Subtracting two Bio.PDB atoms gives their distance in angstroms
                    if atom_a - atom_b < distance:
                        interface_a.add(res_a.id[1])
                        interface_b.add(res_b.id[1])
    return interface_a, interface_b
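Common Tasks for Binder Design
Target Preparation Checklist
- Download the structure:
curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb"
- Identify the target chain
- Remove waters and ligands (if needed)
- Trim to the binding region plus a buffer
- Identify potential hotspots
- Renumber residues if needed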
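The chain-selection and water/ligand-removal steps of the checklist can be sketched directly on PDB text, without Biopython. This assumes fixed-column legacy PDB records, with the record name in columns 1-6 and the chain ID in column 22. `keep_protein_atoms` is a hypothetical helper, and the sample records use made-up coordinates:

```python
def keep_protein_atoms(pdb_text, chain_id=None):
    """Keep ATOM records (optionally for one chain) and drop everything else."""
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skips HETATM (waters, ligands), headers, CONECT, ...
        if chain_id is not None and line[21:22] != chain_id:
            continue  # column 22 (index 21) holds the chain ID
        kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


# Illustrative records only; coordinates are made up.
sample = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  GLY B   5      30.100  21.200   4.800  1.00 12.10           C",
    "HETATM    3  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O",
])
chain_a_only = keep_protein_atoms(sample, chain_id="A")
```

Note that modified residues such as selenomethionine (MSE) are stored as HETATM records, so this filter would drop them too. Review the HETATM records before removing them wholesale.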
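Troubleshooting
- Structure not found: check the PDB ID format (4 characters)
- Multiple models: use the first model for design
- Missing residues: check the structure for gaps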
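Checking for gaps can be as simple as looking for jumps in a chain's residue numbers, for example the `res.id[1]` values collected from one chain. A minimal sketch follows. `find_gaps` is a hypothetical helper, and a jump in numbering is only a hint: author numbering can skip values on purpose, insertion codes are ignored here, and residues missing at the termini are not detected:

```python
def find_gaps(residue_numbers):
    """Return (before, after) pairs where consecutive residue numbers jump by more than 1."""
    ordered = sorted(set(residue_numbers))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


# Illustrative numbering only.
gaps = find_gaps([1, 2, 3, 7, 8, 12])
print(gaps)  # [(3, 7), (8, 12)]
```

Each pair marks the residues on either side of a possible gap, which is where a design tool may see a chain break.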
Next step: use the structure for design with boltzgen (recommended) or rfdiffusion.