Codebase Inspection
Inspect codebases with pygount: LOC, languages, ratios.
Skill metadata
| Source | Bundled (installed by default) |
| Path | skills/github/codebase-inspection |
| Version | 1.0.0 |
| Author | Hermes Agent |
| License | MIT |
| Tags | LOC, Code Analysis, pygount, Codebase, Metrics, Repository |
| Related skills | github-repo-management |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Codebase Inspection with pygount
Analyze repositories for lines of code, language breakdown, file counts, and code-vs-comment ratios using pygount.
When to Use
- User asks for a LOC (lines of code) count
- User wants a language breakdown of a repo
- User asks about codebase size or composition
- User wants code-vs-comment ratios
- General “how big is this repo” questions
Prerequisites
    pip install --break-system-packages pygount 2>/dev/null || pip install pygount
1. Basic Summary (Most Common)
Get a full language breakdown with file counts, code lines, and comment lines:
    cd /path/to/repo
    pygount --format=summary \
      --folders-to-skip=".git,node_modules,venv,.venv,__pycache__,.cache,dist,build,.next,.tox,.eggs,*.egg-info" \
      .
IMPORTANT: Always use --folders-to-skip to exclude dependency/build directories; otherwise pygount will crawl them and take a very long time or hang.
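The long exclusion list above is easy to mistype, so one option is to assemble it from individual directory names. The following is a minimal sketch, not part of the skill itself: `join_skip_list` is a hypothetical helper name, and the script only prints the resulting command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas, which is the
# form the --folders-to-skip value takes in the command above.
join_skip_list() {
  # In POSIX sh, "$*" joins arguments on the first character of IFS.
  # The subshell keeps the IFS change from leaking out.
  ( IFS=,; printf '%s\n' "$*" )
}

# Directory names copied from the summary command; extend per project.
skip_list="$(join_skip_list .git node_modules venv .venv __pycache__ .cache dist build)"

# Print the command rather than running it, so this sketch also works
# on machines where pygount is not installed.
echo "pygount --format=summary --folders-to-skip=\"$skip_list\" ."
```

Patterns such as `*.egg-info` should be quoted when passed to the helper so the shell does not expand them first.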
2. Common Folder Exclusions
Adjust based on the project type:
    # Python projects
    --folders-to-skip=".git,venv,.venv,__pycache__,.cache,dist,build,.tox,.eggs,.mypy_cache"
    # JavaScript/TypeScript projects
    --folders-to-skip=".git,node_modules,dist,build,.next,.cache,.turbo,coverage"
    # General catch-all
    --folders-to-skip=".git,node_modules,venv,.venv,__pycache__,.cache,dist,build,.next,.tox,vendor,third_party"
3. Filter by Specific Language
    # Only count Python files
    pygount --suffix=py --format=summary .
    # Only count Python and YAML
    pygount --suffix=py,yaml,yml --format=summary .
4. Detailed File-by-File Output
    # Default format shows per-file breakdown
    pygount --folders-to-skip=".git,node_modules,venv" .
    # Sort by code lines, largest first (the first tab-separated field is the count)
    pygount --folders-to-skip=".git,node_modules,venv" . | sort -t$'\t' -k1,1nr | head -20
5. Output Formats
    # Summary table (default recommendation)
    pygount --format=summary .
    # JSON output for programmatic use
    pygount --format=json .
    # Pipe-friendly: Language, file count, code, docs, empty, string
    pygount --format=summary . 2>/dev/null
6. Interpreting Results
The summary table columns:
- Language — detected programming language
- Files — number of files of that language
- Code — lines of actual code (executable/declarative)
- Comment — lines that are comments or documentation
- % — percentage of total
Special pseudo-languages:
- __empty__ — empty files
- __binary__ — binary files (images, compiled, etc.)
- __generated__ — auto-generated files (detected heuristically)
- __duplicate__ — files with identical content
- __unknown__ — unrecognized file types
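When a user asks for a code-vs-comment ratio, the Code and Comment columns can be combined by hand. The sketch below shows the arithmetic on made-up numbers: the `counts` rows are hypothetical and are not real pygount output, `comment_ratio` is an illustrative helper name, and the whitespace-separated row layout (language, files, code, comment) is an assumption for this example only. Pseudo-language rows, whose names start with `__`, are skipped.

```shell
#!/bin/sh
# Hypothetical rows: language, files, code lines, comment lines.
counts='Python 42 3000 1000
Markdown 5 0 400
__binary__ 3 0 0'

# Illustrative helper: read rows on stdin and print comment lines as a
# percentage of (code + comment) lines, ignoring __pseudo__ languages.
comment_ratio() {
  awk '$1 !~ /^__/ { code += $3; comment += $4 }
       END {
         total = code + comment
         if (total > 0) printf "%.1f\n", 100 * comment / total
         else print "n/a"
       }'
}

ratio="$(printf '%s\n' "$counts" | comment_ratio)"
echo "comment share: ${ratio}%"   # 1400 of 4400 lines, so 31.8%
```

Because Markdown is counted entirely as comments, documentation-heavy repositories will show a higher comment share than the source code alone would suggest.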
Pitfalls
- Always exclude .git, node_modules, venv — without --folders-to-skip, pygount will crawl everything and may take minutes or hang on large dependency trees.
- Markdown shows 0 code lines — pygount classifies all Markdown content as comments, not code. This is expected behavior.
- JSON files show low code counts — pygount may count JSON lines conservatively. For accurate JSON line counts, use wc -l directly.
- Large monorepos — for very large repos, consider using --suffix to target specific languages rather than scanning everything.
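For the JSON pitfall, a raw line count can be taken with `find` and `wc -l` while still skipping dependency directories. This is a rough sketch under stated assumptions: `count_json_lines` is a hypothetical helper name, and the pruned directory names are the three from the first pitfall, not an exhaustive list.

```shell
#!/bin/sh
# Hypothetical helper: total newline count across all *.json files under
# the given directory (default: current directory), without descending
# into .git, node_modules, or venv.
count_json_lines() {
  find "${1:-.}" \( -name .git -o -name node_modules -o -name venv \) -prune \
    -o -type f -name '*.json' -exec cat {} + | wc -l | tr -d ' '
}

# Count JSON lines in the current directory tree.
count_json_lines .
```

Unlike pygount, this counts every line, including blank ones, so the two numbers are not directly comparable.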