Arxiv
Search arXiv papers by keyword, author, category, or ID.
Skill metadata
| Field | Value |
|---|---|
| Source | Bundled (installed by default) |
| Path | skills/research/arxiv |
| Version | 1.0.0 |
| Author | Hermes Agent |
| License | MIT |
| Tags | Research, Arxiv, Papers, Academic, Science, API |
| Related skills | ocr-and-documents |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
arXiv Research
Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies: just curl.
Quick Reference
| Action | Command |
|---|---|
| Search papers | `curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"` |
| Get specific paper | `curl "https://export.arxiv.org/api/query?id_list=2402.03300"` |
| Read abstract (web) | `web_extract(urls=["https://arxiv.org/abs/2402.03300"])` |
| Read full paper (PDF) | `web_extract(urls=["https://arxiv.org/pdf/2402.03300"])` |
Searching Papers
The API returns Atom XML. Parse it with grep/sed, or pipe it through python3 for clean output.
Basic search
```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"
```
Clean output (parse XML to readable format)
```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f'   Authors: {authors}')
    print(f'   Published: {published} | Categories: {cats}')
    print(f'   Abstract: {summary}...')
    print(f'   PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"
```
Search Query Syntax
| Prefix | Searches | Example |
|---|---|---|
| `all:` | All fields | `all:transformer+attention` |
| `ti:` | Title | `ti:large+language+models` |
| `au:` | Author | `au:vaswani` |
| `abs:` | Abstract | `abs:reinforcement+learning` |
| `cat:` | Category | `cat:cs.AI` |
| `co:` | Comment | `co:accepted+NeurIPS` |
Boolean operators
```
# AND (default when using +)
search_query=all:transformer+attention

# OR
search_query=all:GPT+OR+all:BERT

# AND NOT
search_query=all:language+model+ANDNOT+all:vision

# Exact phrase (double quotes URL-encoded as %22)
search_query=ti:%22chain+of+thought%22

# Combined
search_query=au:hinton+AND+cat:cs.LG
```
Sort and Pagination
| Parameter | Options |
|---|---|
| `sortBy` | `relevance`, `lastUpdatedDate`, `submittedDate` |
| `sortOrder` | `ascending`, `descending` |
| `start` | Result offset (0-based) |
| `max_results` | Number of results (default 10, max 30000) |
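The query prefixes, boolean operators and the parameters in the table above can also be assembled programmatically. The sketch below uses only `urllib.parse`; `build_search_query` and `build_query_url` are hypothetical helper names (not part of any arXiv library), and it assumes a single boolean operator per query.

```python
from urllib.parse import quote_plus, urlencode

API_URL = "https://export.arxiv.org/api/query"
SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDER = {"ascending", "descending"}
OPERATORS = {"AND", "OR", "ANDNOT"}


def build_search_query(clauses, op="AND"):
    """Join (prefix, term) pairs such as ("ti", "large language models")
    with one boolean operator. quote_plus turns spaces into '+' and double
    quotes into %22, so '"chain of thought"' becomes an exact phrase."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    return f"+{op}+".join(f"{prefix}:{quote_plus(term)}" for prefix, term in clauses)


def build_query_url(clauses, op="AND", sort_by=None, sort_order="descending",
                    start=0, max_results=10):
    """Assemble a full API URL from search clauses plus sort/paging parameters."""
    params = {"start": start, "max_results": max_results}
    if sort_by is not None:
        if sort_by not in SORT_BY or sort_order not in SORT_ORDER:
            raise ValueError("invalid sortBy/sortOrder value")
        params["sortBy"] = sort_by
        params["sortOrder"] = sort_order
    # search_query is built by hand so ':' and '+' stay literal, matching the
    # curl examples; the remaining parameters are plain key=value pairs.
    return f"{API_URL}?search_query={build_search_query(clauses, op)}&{urlencode(params)}"


# Latest 10 papers in cs.AI:
print(build_query_url([("cat", "cs.AI")], sort_by="submittedDate"))
# https://export.arxiv.org/api/query?search_query=cat:cs.AI&start=0&max_results=10&sortBy=submittedDate&sortOrder=descending
```

To page through a large result set, increase `start` by `max_results` on each call and wait between requests.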
```bash
# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
```
Fetching Specific Papers
```bash
# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"

# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"
```
BibTeX Generation
After fetching metadata for a paper, generate a BibTeX entry:
```bash
curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None:
    sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f'  title = {{{title}}},')
print(f'  author = {{{authors}}},')
print(f'  year = {{{year}}},')
print(f'  eprint = {{{raw_id}}},')
print(f'  archivePrefix = {{arXiv}},')
print(f'  primaryClass = {{{primary}}},')
print(f'  url = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"
```
Reading Paper Content
After finding a paper, read it:
```
# Abstract page (fast: metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])

# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])
```
For local PDF processing, see the `ocr-and-documents` skill.
Common Categories
| Category | Field |
|---|---|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.LG` | Machine Learning |
| `cs.CR` | Cryptography and Security |
| `stat.ML` | Machine Learning (Statistics) |
| `math.OC` | Optimization and Control |
| `physics.comp-ph` | Computational Physics |
Full list: https://arxiv.org/category_taxonomy
Helper Script
The `scripts/search_arxiv.py` script handles XML parsing and provides clean output:
```bash
python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345
```
No dependencies: it uses only the Python standard library.
Semantic Scholar (Citations, Related Papers, Author Profiles)
arXiv doesn’t provide citation data or recommendations. Use the Semantic Scholar API for that: it is free, needs no key for basic use (1 req/sec), and returns JSON.
Get paper details + citations
```bash
# By arXiv ID
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300?fields=title,authors,citationCount,referenceCount,influentialCitationCount,year,abstract" | python3 -m json.tool

# By DOI (a bare Semantic Scholar paper ID also works in this position)
curl -s "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example?fields=title,citationCount"
```
Get citations OF a paper (who cited it)
```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
```
Get references FROM a paper (what it cites)
```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool
```
Search papers (alternative to arXiv search, returns JSON)
```bash
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool
```
Get paper recommendations
```bash
curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool
```
Author profile
```bash
curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool
```
Useful Semantic Scholar fields
`title`, `authors`, `year`, `abstract`, `citationCount`, `referenceCount`, `influentialCitationCount`, `isOpenAccess`, `openAccessPdf`, `fieldsOfStudy`, `publicationVenue`, `externalIds` (contains arXiv ID, DOI, etc.)
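When calling Semantic Scholar from Python instead of curl, the same URLs and POST body can be built with the standard library. This is a sketch: `paper_url` and `recommendations_request` are hypothetical helper names, and the paths and JSON keys are copied from the curl examples above; other parameters should be checked against the Semantic Scholar API documentation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

GRAPH_URL = "https://api.semanticscholar.org/graph/v1"
RECS_URL = "https://api.semanticscholar.org/recommendations/v1/papers/"


def paper_url(paper_id, fields, endpoint="", limit=None):
    """Build a Graph API URL for a paper, e.g. arXiv:2402.03300/citations.

    paper_id is an identifier such as "arXiv:2402.03300" or "DOI:10.1234/example";
    endpoint is "" (paper details), "citations", or "references".
    """
    # Keep ':' and '/' literal so IDs like DOI:10.1234/example stay readable.
    path = f"{GRAPH_URL}/paper/{quote(paper_id, safe=':/')}"
    if endpoint:
        path += f"/{endpoint}"
    params = {"fields": ",".join(fields)}
    if limit is not None:
        params["limit"] = limit
    # safe=',' keeps the comma-separated field list unescaped, as in the curl examples.
    return f"{path}?{urlencode(params, safe=',')}"


def recommendations_request(positive, negative=()):
    """Prepare (but do not send) the POST request for paper recommendations."""
    body = json.dumps({"positivePaperIds": list(positive),
                       "negativePaperIds": list(negative)}).encode("utf-8")
    return Request(RECS_URL, data=body, method="POST",
                   headers={"Content-Type": "application/json"})
```

Passing either result to `urllib.request.urlopen` sends it; `json.load` on the response gives the same data that `python3 -m json.tool` pretty-prints.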
Complete Research Workflow
- Discover: `python scripts/search_arxiv.py "your topic" --sort date --max 10`
- Assess impact: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"`
- Read abstract: `web_extract(urls=["https://arxiv.org/abs/ID"])`
- Read full paper: `web_extract(urls=["https://arxiv.org/pdf/ID"])`
- Find related work: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"`
- Get recommendations: POST to the Semantic Scholar recommendations endpoint
- Track authors: `curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"`
Rate Limits
| API | Rate | Auth |
|---|---|---|
| arXiv | ~1 req / 3 seconds | None needed |
| Semantic Scholar | 1 req / second | None (100/sec with API key) |
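A simple way to stay under the rates in the table above is to enforce a minimum gap between consecutive requests on the client side. The `Throttle` class below is a hypothetical, single-threaded sketch; the intervals (3 s for arXiv, 1 s for unauthenticated Semantic Scholar) are taken from the table, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, None before the first one

    def wait(self):
        """Sleep just long enough that min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


arxiv_throttle = Throttle(3.0)   # ~1 request / 3 seconds
s2_throttle = Throttle(1.0)      # 1 request / second without an API key
```

Call `arxiv_throttle.wait()` immediately before each arXiv request (for example, before every `urllib.request.urlopen` call while paging with `start`), and `s2_throttle.wait()` before each Semantic Scholar request.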
- arXiv returns Atom XML: use the helper script or a parsing snippet for clean output
- Semantic Scholar returns JSON: pipe it through `python3 -m json.tool` for readability
- arXiv IDs: old format (`hep-th/0601001`) vs. new format (`2402.03300`)
- PDF: `https://arxiv.org/pdf/{id}`; abstract: `https://arxiv.org/abs/{id}`
- HTML (when available): `https://arxiv.org/html/{id}`
- For local PDF processing, see the `ocr-and-documents` skill
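The two ID formats and the URL patterns listed above can be checked and generated with a couple of regular expressions. The patterns below are an approximation written for this sketch, not arXiv's official grammar: new-style IDs are `YYMM.NNNN` or `YYMM.NNNNN`, old-style IDs are `archive[.SUBJECT]/YYMMNNN`, both with an optional `vN` suffix. `id_format` and `arxiv_urls` are hypothetical helper names.

```python
import re

# New-style IDs (used since April 2007): YYMM.NNNN or YYMM.NNNNN, optional vN.
NEW_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style IDs: archive[.SUBJECT]/YYMMNNN, optional vN, e.g. hep-th/0601001.
OLD_ID = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def id_format(arxiv_id):
    """Return "new" or "old", or raise ValueError for anything else."""
    if NEW_ID.match(arxiv_id):
        return "new"
    if OLD_ID.match(arxiv_id):
        return "old"
    raise ValueError(f"not an arXiv ID: {arxiv_id!r}")


def arxiv_urls(arxiv_id):
    """Map a validated ID to its abstract, PDF, and HTML URLs."""
    id_format(arxiv_id)  # raises on malformed input
    return {
        "abs": f"https://arxiv.org/abs/{arxiv_id}",
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
        "html": f"https://arxiv.org/html/{arxiv_id}",  # only exists for some papers
    }
```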
ID Versioning
- `arxiv.org/abs/1706.03762` always resolves to the latest version
- `arxiv.org/abs/1706.03762v1` points to a specific, immutable version
- When generating citations, preserve the version suffix you actually read to prevent citation drift (a later version may change the content substantially)
- The API `<id>` field returns the versioned URL (e.g., `http://arxiv.org/abs/1706.03762v7`)
Withdrawn Papers
Papers can be withdrawn after submission. When this happens:
- The `<summary>` field contains a withdrawal notice (look for “withdrawn” or “retracted”)
- Metadata fields may be incomplete
- Always check the summary before treating a result as a valid paper
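The summary check described above can be automated as a first-pass filter over an API response. This sketch is a keyword heuristic only: `flag_withdrawn` is a hypothetical helper, and it will also flag a legitimate paper whose abstract merely discusses retractions, so flagged entries still need a manual look.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.w3.org/2005/Atom"}
WITHDRAWAL_MARKERS = ("withdrawn", "retracted")


def flag_withdrawn(atom_xml):
    """Return (arxiv_id, looks_withdrawn) for each <entry> in an arXiv API response."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall("a:entry", NS):
        entry_id = entry.findtext("a:id", default="", namespaces=NS)
        arxiv_id = entry_id.strip().split("/abs/")[-1]
        # Case-insensitive search of the summary for a withdrawal notice.
        summary = entry.findtext("a:summary", default="", namespaces=NS).lower()
        results.append((arxiv_id, any(m in summary for m in WITHDRAWAL_MARKERS)))
    return results
```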