A Discourse plugin that detects when a URL is pasted into the **topic title** field and offers to scrape the page, extracting the article content (à la browser Reader Mode) and populating the **composer body** with a clean Markdown rendering.
---
## Features
- 🔗 Detects a bare URL typed/pasted into the topic title
- 📄 Extracts article content using a Readability-style heuristic (no external API needed)
- ✍️ Populates the topic body with clean Markdown: heading, byline, description, full article text
- 🛡️ SSRF protection: blocks requests to private/loopback addresses
- ⚙️ Configurable: auto-populate mode, allowlist/blocklist, timeout, content length cap
- 🌐 Works with most article-style pages (news, blogs, documentation)
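The configurability above maps naturally onto Discourse's `config/settings.yml` convention. A hypothetical sketch (all setting names and defaults here are illustrative, not the plugin's actual keys):

```yaml
plugins:
  bookmark_url_enabled:            # master switch (hypothetical name)
    default: true
    client: true                   # exposed to the JS initializer
  bookmark_url_auto_populate:      # fill the composer without prompting
    default: false
    client: true
  bookmark_url_allowed_domains:    # empty list = allow all public hosts
    default: ""
    type: list
  bookmark_url_timeout_secs:       # fetch timeout
    default: 10
  bookmark_url_max_content_length: # content length cap, in characters
    default: 100000
```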
## How it works

`initializers/bookmark-url.js` hooks into the `composer-editor` component and observes the `composer.model.title` property via Ember's observer system. When the title matches a bare URL pattern, the plugin's server side:
1. **Fetches** the HTML via `Net::HTTP` with a browser-like User-Agent (follows one redirect).
2. **Extracts metadata** from Open Graph / Twitter Card / standard `<meta>` tags.
3. **Finds the content node** by trying a list of known semantic selectors (`article`, `[role=main]`, `.post-content`, etc.), then falling back to a text-density scoring algorithm over all `<div>` and `<section>` elements.
4. **Cleans the node**: removes nav, ads, scripts, and hidden elements; strips non-essential attributes; makes relative URLs absolute.
5. **Converts to Markdown** using the `reverse_markdown` gem.
For sites that require JavaScript rendering, replace the `fetch_html` method with a call to a headless browser service (e.g. Browserless, Splash) or a third-party extraction API (Diffbot, Mercury Parser API).
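A drop-in replacement might look like this sketch, which follows the shape of Browserless's `/content` endpoint; the `BROWSERLESS_URL`/`BROWSERLESS_TOKEN` variables and the exact request shape are assumptions — check your provider's docs:

```ruby
require "net/http"
require "json"
require "uri"

# Fetch fully rendered HTML from a headless browser service instead of
# making a raw HTTP request. Raises KeyError if the service isn't configured.
def fetch_rendered_html(url)
  endpoint = URI("#{ENV.fetch("BROWSERLESS_URL")}/content?token=#{ENV["BROWSERLESS_TOKEN"]}")
  res = Net::HTTP.post(endpoint, { url: url }.to_json,
                       "Content-Type" => "application/json")
  raise "render failed: #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  res.body
end
```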