Atoms Quality Report

  • Files scanned: 189
  • Lines scanned: 21280
  • Total issues: 3487

Issue Distribution

  • screenshot_marker_in_canon: 1122
  • screenshot_marker_in_result: 1057
  • case_action_is_enumeration: 628
  • generic_result: 542
  • doc_action_fragmented: 56
  • result_too_long: 54
  • doc_action_too_short: 22
  • missing_action: 3
  • invalid_json: 3
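
As a quick consistency check, the per-category counts in this report should add up to the reported total. The sketch below uses only the figures from this report; the variable names are illustrative.

```python
# Per-category issue counts copied from this report's issue distribution.
issue_counts = {
    "screenshot_marker_in_canon": 1122,
    "screenshot_marker_in_result": 1057,
    "case_action_is_enumeration": 628,
    "generic_result": 542,
    "doc_action_fragmented": 56,
    "result_too_long": 54,
    "doc_action_too_short": 22,
    "missing_action": 3,
    "invalid_json": 3,
}

# The sum should match the reported total of 3487.
total = sum(issue_counts.values())

# Share of all issues that are screenshot-marker issues (both variants).
screenshot_total = sum(
    count for name, count in issue_counts.items()
    if name.startswith("screenshot_marker_")
)
screenshot_share = screenshot_total / total

print(total)                      # 3487
print(f"{screenshot_share:.1%}")  # 62.5%
```

The two screenshot-marker categories together account for roughly five in eight of all reported issues, so they dominate the overall count.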

Distribution by File

  • build/v4.40.0/case_atoms.jsonl: 333 (case_action_is_enumeration:161, screenshot_marker_in_canon:61, generic_result:56, screenshot_marker_in_result:54, result_too_long:1)
  • build/v4.21.9/case_atoms.jsonl: 259 (screenshot_marker_in_canon:127, screenshot_marker_in_result:119, case_action_is_enumeration:11, generic_result:2)
  • build/v4.21.3/case_atoms.jsonl: 244 (screenshot_marker_in_canon:120, screenshot_marker_in_result:111, generic_result:12, case_action_is_enumeration:1)
  • build/v4.21.1/case_atoms.jsonl: 178 (screenshot_marker_in_canon:86, screenshot_marker_in_result:85, generic_result:6, result_too_long:1)
  • build/v4.23.5/case_atoms.jsonl: 165 (screenshot_marker_in_canon:69, screenshot_marker_in_result:49, result_too_long:29, generic_result:14, case_action_is_enumeration:4)
  • build/v4.19.3/case_atoms.jsonl: 155 (screenshot_marker_in_canon:71, screenshot_marker_in_result:70, generic_result:11, result_too_long:2, case_action_is_enumeration:1)
  • build/v4.32.0/case_atoms.jsonl: 143 (generic_result:55, screenshot_marker_in_result:34, screenshot_marker_in_canon:34, case_action_is_enumeration:20)
  • build/v4.27.0/case_atoms.jsonl: 139 (case_action_is_enumeration:42, generic_result:35, screenshot_marker_in_result:31, screenshot_marker_in_canon:31)
  • build/v4.26.5/case_atoms.jsonl: 135 (case_action_is_enumeration:49, generic_result:33, screenshot_marker_in_canon:27, screenshot_marker_in_result:26)
  • build/v4.21.7/case_atoms.jsonl: 103 (screenshot_marker_in_result:48, screenshot_marker_in_canon:48, generic_result:7)
  • build/v4.14.5/case_atoms.jsonl: 97 (screenshot_marker_in_result:43, screenshot_marker_in_canon:43, generic_result:11)
  • build/v4.22.3/case_atoms.jsonl: 82 (screenshot_marker_in_canon:28, generic_result:27, screenshot_marker_in_result:27)
  • build/v4.33.0/case_atoms.jsonl: 82 (case_action_is_enumeration:31, generic_result:31, screenshot_marker_in_result:10, screenshot_marker_in_canon:10)
  • build/v4.17.5/case_atoms.jsonl: 75 (screenshot_marker_in_canon:27, screenshot_marker_in_result:26, case_action_is_enumeration:14, generic_result:5, result_too_long:3)
  • build/v4.42.0/case_atoms.jsonl: 75 (case_action_is_enumeration:41, generic_result:19, screenshot_marker_in_canon:9, screenshot_marker_in_result:6)
  • build/v4.18.9/case_atoms.jsonl: 69 (screenshot_marker_in_canon:27, screenshot_marker_in_result:25, generic_result:17)
  • build/v4.25.5/case_atoms.jsonl: 67 (generic_result:23, screenshot_marker_in_result:16, screenshot_marker_in_canon:16, case_action_is_enumeration:12)
  • build/v4.43.0/case_atoms.jsonl: 64 (generic_result:23, case_action_is_enumeration:19, screenshot_marker_in_result:11, screenshot_marker_in_canon:11)
  • build/v4.44.0/case_atoms.jsonl: 64 (case_action_is_enumeration:46, generic_result:8, result_too_long:8, screenshot_marker_in_canon:2)
  • build/v4.17.1/case_atoms.jsonl: 60 (screenshot_marker_in_result:30, screenshot_marker_in_canon:30)
  • build/v4.57.3/case_atoms.jsonl: 53 (screenshot_marker_in_result:19, screenshot_marker_in_canon:19, case_action_is_enumeration:10, generic_result:5)
  • build/v4.26.0/case_atoms.jsonl: 51 (screenshot_marker_in_canon:20, screenshot_marker_in_result:19, generic_result:9, case_action_is_enumeration:3)
  • build/v4.21.2/case_atoms.jsonl: 50 (screenshot_marker_in_result:24, screenshot_marker_in_canon:24, generic_result:2)
  • build/v4.35.0/case_atoms.jsonl: 50 (case_action_is_enumeration:18, screenshot_marker_in_result:12, screenshot_marker_in_canon:12, generic_result:8)
  • build/v4.18.3/case_atoms.jsonl: 49 (screenshot_marker_in_result:22, screenshot_marker_in_canon:22, generic_result:5)
  • build/v4.46.0/case_atoms.jsonl: 49 (generic_result:18, screenshot_marker_in_result:11, screenshot_marker_in_canon:11, case_action_is_enumeration:9)
  • build/v4.21.5/case_atoms.jsonl: 48 (screenshot_marker_in_result:22, screenshot_marker_in_canon:22, generic_result:4)
  • build/v4.20.9/case_atoms.jsonl: 43 (screenshot_marker_in_result:15, screenshot_marker_in_canon:15, generic_result:10, result_too_long:3)
  • build/v4.23.0/case_atoms.jsonl: 39 (screenshot_marker_in_result:15, screenshot_marker_in_canon:15, generic_result:9)
  • build/v4.19.1/case_atoms.jsonl: 37 (generic_result:13, screenshot_marker_in_result:12, screenshot_marker_in_canon:12)
  • build/v4.30.10/case_atoms.jsonl: 37 (case_action_is_enumeration:37)
  • build/v4.32.10/case_atoms.jsonl: 36 (case_action_is_enumeration:33, generic_result:3)
  • build/v4.37.0/case_atoms.jsonl: 35 (generic_result:21, case_action_is_enumeration:14)
  • build/v4.22.1/case_atoms.jsonl: 33 (screenshot_marker_in_result:15, screenshot_marker_in_canon:15, generic_result:3)
  • build/v4.47.0/case_atoms.jsonl: 33 (screenshot_marker_in_result:13, screenshot_marker_in_canon:13, generic_result:4, case_action_is_enumeration:3)
  • build/v4.28.0/case_atoms.jsonl: 26 (generic_result:17, case_action_is_enumeration:5, screenshot_marker_in_result:2, screenshot_marker_in_canon:2)
  • build/v4.24.0/case_atoms.jsonl: 22 (screenshot_marker_in_canon:11, generic_result:8, screenshot_marker_in_result:3)
  • build/v4.18.7/case_atoms.jsonl: 21 (screenshot_marker_in_result:10, screenshot_marker_in_canon:10, generic_result:1)
  • build/v4.20.3/case_atoms.jsonl: 19 (screenshot_marker_in_result:9, screenshot_marker_in_canon:9, generic_result:1)
  • build/v4.29.0/case_atoms.jsonl: 17 (generic_result:5, screenshot_marker_in_result:4, screenshot_marker_in_canon:4, case_action_is_enumeration:4)
  • build/v4.38.0/case_atoms.jsonl: 17 (case_action_is_enumeration:17)
  • build/v4.39.0/case_atoms.jsonl: 14 (case_action_is_enumeration:14)
  • build/v4.25.5/doc_atoms.jsonl: 13 (doc_action_fragmented:8, doc_action_too_short:5)
  • build/v4.20.5/case_atoms.jsonl: 11 (screenshot_marker_in_result:5, screenshot_marker_in_canon:5, generic_result:1)
  • build/v4.31.10/case_atoms.jsonl: 11 (screenshot_marker_in_result:4, screenshot_marker_in_canon:4, case_action_is_enumeration:3)
  • build/v4.39.0/doc_atoms.jsonl: 11 (doc_action_too_short:11)
  • build/v4.33.0/doc_atoms.jsonl: 8 (doc_action_fragmented:8)
  • build/v4.32.0/case_atoms_model.jsonl: 5 (case_action_is_enumeration:5)
  • build/v4.46.0/doc_atoms_model.jsonl: 5 (doc_action_fragmented:5)
  • build/v4.32.0/doc_atoms.jsonl: 4 (doc_action_too_short:4)

Recommendations

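  • invalid_json, missing_*: fix the input data first, then discuss retrieval quality.
  • screenshot_marker_*, case_action_is_enumeration: these indicate that script-only extraction did not complete semantic cleaning; switch to model distillation.
  • doc_action_fragmented, doc_action_too_short: these indicate that PDF chunking breaks content apart; training documents must go through model-based reassembly.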
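
The first recommendation (fix the input data before evaluating retrieval) could be enforced with a small pre-flight check on each JSONL file. The following is a minimal sketch, not the scanner that produced this report: the function name `scan_jsonl_lines` is hypothetical, and the assumption that each record is a JSON object with an `action` field is inferred only from the issue name `missing_action`.

```python
import json
from collections import Counter


def scan_jsonl_lines(lines):
    """Count lines that fail to parse, or that lack a non-empty 'action' field.

    Assumption: each atom is one JSON object per line with an 'action' key.
    The real schema is not shown in the report.
    """
    issues = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            issues["invalid_json"] += 1
            continue
        # Treat non-object records and empty/absent actions as missing.
        if not isinstance(record, dict) or not str(record.get("action") or "").strip():
            issues["missing_action"] += 1
    return issues


# Illustrative sample lines (made up for this sketch, not taken from the build files).
sample = [
    '{"action": "click Save", "result": "record saved"}',
    '{"result": "record saved"}',
    '{"action": "click Save", "result": ',
]
found = scan_jsonl_lines(sample)
print(dict(found))
```

Running a check like this before any retrieval evaluation keeps malformed records from being mistaken for retrieval failures.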