Webpage Metadata Checker API
Extract metadata, Open Graph tags, images, and more from any webpage. Build rich link previews with ready-to-use components for React, Vue, Svelte, and vanilla JavaScript.
Test the API with any URL. No API key required for this demo playground.
These are free-tier categories available in the playground. Free users have access to 17 categories total.
Everything you need to extract and display webpage metadata
Extract comprehensive data from any webpage with our 22 specialized categories. Each category is optimized for specific use cases and provides detailed, structured responses.
PageSight extracts comprehensive data from any webpage through intelligent parsing and analysis. Categories are processed efficiently to deliver fast, accurate results. All relative URLs are automatically resolved to absolute URLs, and responses are cached to reduce API calls and improve response times.
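As a rough illustration of the relative-to-absolute resolution described above, the standard WHATWG `URL` constructor can do the work. This is a minimal sketch, not PageSight's actual code; the `resolveUrl` helper name and the example URLs are assumptions made for illustration.

```javascript
// Hypothetical helper: resolve a possibly relative URL against the URL of
// the page it was found on. Returns null when the value cannot be parsed.
function resolveUrl(value, pageUrl) {
  try {
    return new URL(value, pageUrl).href;
  } catch {
    return null;
  }
}

const pageUrl = "https://example.com/blog/post";
console.log(resolveUrl("/images/cover.png", pageUrl));      // https://example.com/images/cover.png
console.log(resolveUrl("thumb.jpg", pageUrl));              // https://example.com/blog/thumb.jpg
console.log(resolveUrl("//cdn.example.com/a.js", pageUrl)); // https://cdn.example.com/a.js
console.log(resolveUrl("https://other.org/x", pageUrl));    // https://other.org/x
```

Returning null instead of throwing keeps one malformed attribute from failing a whole extraction run.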
Extracts basic HTML metadata including title, description, keywords, author, viewport settings, and all meta tags from the page head.
Parses all <meta> tags, the <title> element, canonical links, and the HTML lang attribute using Cheerio.
Extracts Open Graph protocol tags used for rich social media previews on Facebook, LinkedIn, and other platforms.
Parses all <meta property="og:*"> tags and resolves relative URLs to absolute URLs for images and media.
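The Open Graph step can be approximated without any dependencies. The sketch below swaps the HTML parser for plain regular expressions so it runs as-is, which makes it less robust than a real parser such as Cheerio (for example, it will misread a `>` inside an attribute value). The function names and the choice to keep only the first value per property are assumptions, not PageSight's behavior.

```javascript
// Properties whose values are URLs and should be made absolute.
const OG_URL_PROPERTY = /^og:(?:url|(?:image|video|audio)(?::(?:url|secure_url))?)$/;

// Hypothetical helper: read one quoted attribute from a tag string.
function readAttr(tag, name) {
  const re = new RegExp("\\s" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", "i");
  const m = tag.match(re);
  if (!m) return null;
  return m[1] !== undefined ? m[1] : m[2];
}

// Collect og:* properties from <meta> tags, resolving URL-valued ones.
function extractOpenGraph(html, pageUrl) {
  const result = {};
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const property = (readAttr(tag, "property") || "").toLowerCase();
    const content = readAttr(tag, "content");
    if (!property.startsWith("og:") || content === null) continue;
    if (property in result) continue; // assumption: keep the first occurrence only
    let value = content;
    if (OG_URL_PROPERTY.test(property)) {
      try {
        value = new URL(content, pageUrl).href;
      } catch {
        // Leave unparseable URLs untouched.
      }
    }
    result[property] = value;
  }
  return result;
}
```

Calling `extractOpenGraph(html, "https://example.com/post/1")` on a page whose `og:image` is `/img/a.png` would yield `https://example.com/img/a.png` for that property.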
Extracts Twitter Card metadata for rich Twitter link previews with images, titles, and descriptions.
Parses all <meta name="twitter:*"> tags and handles Twitter Card types (summary, summary_large_image, app, player).
Extracts all favicon variants including standard favicons, Apple touch icons, and shortcut icons with their sizes and types.
Searches for <link rel="icon">, <link rel="apple-touch-icon">, and <link rel="shortcut icon"> tags, and checks for the default /favicon.ico.
Extracts all images from the page including src, alt text, dimensions, loading attributes, and accessibility metrics.
Finds all <img> tags, extracts attributes (src, alt, width, height, loading, srcset), and counts images with/without alt text for accessibility analysis.
Analyzes robots.txt file to extract crawl rules, disallowed/allowed paths, sitemap locations, and crawl delays.
Fetches /robots.txt, parses user-agent rules, disallow/allow directives, sitemap declarations, and crawl-delay settings.
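To make the parsing step concrete, here is a minimal sketch of a robots.txt parser that groups rules by user agent and collects sitemap declarations. It takes the file contents as a string (fetching is left out), and the output shape and function name are assumptions for illustration rather than the API's real response format.

```javascript
// Hypothetical parser: turns robots.txt text into groups of rules.
function parseRobotsTxt(text) {
  const groups = [];   // one entry per block of User-agent lines
  const sitemaps = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "sitemap") {
      sitemaps.push(value); // sitemaps are global, not tied to a group
      continue;
    }
    if (field === "user-agent") {
      // Consecutive User-agent lines share a single group of rules.
      if (!lastWasAgent) {
        current = { userAgents: [], allow: [], disallow: [], crawlDelay: null };
        groups.push(current);
      }
      current.userAgents.push(value);
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (!current) continue; // rule lines before any User-agent are ignored
    if (field === "allow") current.allow.push(value);
    else if (field === "disallow") current.disallow.push(value);
    else if (field === "crawl-delay" && value !== "" && Number.isFinite(Number(value))) {
      current.crawlDelay = Number(value);
    }
  }
  return { groups, sitemaps };
}
```

Keeping sitemap lines outside the groups mirrors how they apply to the whole site rather than to a single crawler.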
Extracts XML sitemap data including URLs, last modification dates, change frequencies, priorities, and sitemap indexes.
Fetches sitemap.xml (or variants), parses XML structure, extracts URL entries, and handles sitemap index files.
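The sitemap handling can be sketched in the same spirit. The example below uses regular expressions instead of a real XML parser so that it stays dependency-free; it distinguishes a sitemap index from a URL set and decodes the five predefined XML entities, but ignores CDATA and namespaces with prefixes. All names are illustrative assumptions.

```javascript
// Decode the five predefined XML entities (ampersand last to avoid double decoding).
function decodeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Read the text content of the first <name>...</name> element, or null.
function readTag(body, name) {
  const m = body.match(new RegExp("<" + name + "\\b[^>]*>([\\s\\S]*?)</" + name + ">", "i"));
  return m ? decodeXml(m[1].trim()) : null;
}

// Hypothetical parser: handles both <urlset> and <sitemapindex> documents.
function parseSitemap(xml) {
  const isIndex = /<sitemapindex[\s>]/i.test(xml);
  const entryTag = isIndex ? "sitemap" : "url";
  const entryRe = new RegExp("<" + entryTag + "\\b[^>]*>([\\s\\S]*?)</" + entryTag + ">", "gi");
  const entries = [];
  for (const match of xml.matchAll(entryRe)) {
    const body = match[1];
    const priority = readTag(body, "priority");
    entries.push({
      loc: readTag(body, "loc"),
      lastmod: readTag(body, "lastmod"),
      changefreq: readTag(body, "changefreq"),
      priority: priority === null ? null : Number(priority),
    });
  }
  return { type: isIndex ? "index" : "urlset", entries };
}
```

For an index file, each entry's `loc` points at another sitemap that would be fetched and parsed the same way.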
Analyzes page content structure including headings hierarchy, links (internal/external), text content, word count, and content elements.
Extracts all headings (h1-h6), parses all links with internal/external classification, counts paragraphs/lists/blockquotes, and analyzes text content.
Extracts JSON-LD, Microdata, and RDFa structured data including schema types, properties, and semantic markup.
Parses <script type="application/ld+json"> for JSON-LD, [itemtype] attributes for Microdata, and identifies all schema.org types present.
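For the JSON-LD part of this category, a small sketch can show the general idea: find the `application/ld+json` script blocks, parse each one, and walk the result to collect every `@type`. It covers JSON-LD only (not Microdata or RDFa), uses a regular expression as a stand-in for a real HTML parser, and its function names and return shape are assumptions.

```javascript
// Recursively collect every string found under an "@type" key.
function collectTypes(node, types) {
  if (Array.isArray(node)) {
    for (const item of node) collectTypes(item, types);
    return;
  }
  if (node === null || typeof node !== "object") return;
  const type = node["@type"];
  if (typeof type === "string") types.add(type);
  else if (Array.isArray(type)) {
    for (const t of type) if (typeof t === "string") types.add(t);
  }
  for (const value of Object.values(node)) collectTypes(value, types);
}

// Hypothetical extractor: returns parsed JSON-LD items plus the types they use.
function extractJsonLd(html) {
  const re = /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const items = [];
  for (const match of html.matchAll(re)) {
    try {
      const data = JSON.parse(match[1]);
      if (Array.isArray(data)) items.push(...data);
      else items.push(data);
    } catch {
      // Malformed JSON-LD is skipped rather than failing the whole page.
    }
  }
  const types = new Set();
  collectTypes(items, types);
  return { items, types: [...types].sort() };
}
```

Walking nested values means types inside `@graph` arrays or embedded objects such as an article's author are reported too.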
Analyzes technical aspects of the page including HTML version, doctype, element counts, and SEO technical indicators.
Analyzes HTML structure, counts scripts/stylesheets/forms, checks for SEO elements (H1, meta description, canonical), and extracts technical metadata.
Captures a full-page screenshot of the website rendered in mobile viewport (375x667) for visual analysis.
Uses Playwright with mobile viewport settings, waits for the page to load, and captures a full-page PNG screenshot.
Captures a full-page screenshot of the website rendered in desktop viewport (1920x1080) for visual analysis.
Uses Playwright with desktop viewport settings, waits for fonts and critical rendering, and captures a full-page PNG screenshot.
Measures page performance metrics including load times, resource counts, and Core Web Vitals indicators.
Uses Playwright to measure navigation timing and resource timing, then calculates performance scores.
Analyzes accessibility features including ARIA attributes, semantic HTML usage, alt text coverage, and WCAG compliance indicators.
Scans HTML for ARIA attributes, semantic elements, form labels, heading hierarchy, and accessibility best practices.
Analyzes security headers, HTTPS configuration, content security policy, and security-related meta tags.
Extracts HTTP security headers (CSP, HSTS, X-Frame-Options), checks HTTPS configuration, and analyzes security meta tags.
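A header check of this kind might look like the sketch below. It takes the page URL and a plain object of response headers (however they were fetched) and reports which of the three headers named above are present. The function name and result shape are assumptions, and a real audit would also inspect the header values.

```javascript
// The three headers named in the description: CSP, HSTS, X-Frame-Options.
const SECURITY_HEADERS = [
  "content-security-policy",
  "strict-transport-security",
  "x-frame-options",
];

// Hypothetical audit: header names are compared case-insensitively.
function auditSecurity(pageUrl, responseHeaders) {
  const headers = {};
  for (const [name, value] of Object.entries(responseHeaders)) {
    headers[name.toLowerCase()] = value;
  }
  return {
    https: new URL(pageUrl).protocol === "https:",
    present: SECURITY_HEADERS.filter((name) => name in headers),
    missing: SECURITY_HEADERS.filter((name) => !(name in headers)),
  };
}

const report = auditSecurity("https://example.com/", {
  "Strict-Transport-Security": "max-age=31536000",
  "X-Frame-Options": "DENY",
});
console.log(report.missing); // [ 'content-security-policy' ]
```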
Extracts social media links, sharing buttons, and social platform integrations from the page.
Finds social media links (Facebook, Twitter, LinkedIn, etc.), detects sharing widgets, and extracts social meta tags.
Detects analytics and tracking scripts including Google Analytics, Facebook Pixel, and other tracking tools.
Scans for analytics script tags, detects common analytics platforms (GA, GTM, Facebook Pixel), and extracts tracking IDs.
Comprehensive link analysis including internal/external links, nofollow attributes, anchor text, and link structure.
Extracts all <a> tags, classifies internal vs external, analyzes rel attributes (nofollow, noopener), and extracts anchor text.
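The classification logic can be illustrated with a short sketch. It assumes a link is internal when its hostname exactly matches the page's hostname (so subdomains count as external), skips non-HTTP schemes such as mailto:, and again uses regular expressions in place of a real HTML parser. None of these choices are confirmed PageSight behavior.

```javascript
// Hypothetical helper: read one quoted attribute from an attribute string.
function attrValue(attrs, name) {
  const re = new RegExp("(?:^|\\s)" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", "i");
  const m = attrs.match(re);
  if (!m) return null;
  return m[1] !== undefined ? m[1] : m[2];
}

// Extract <a> tags and classify each link relative to the page URL.
function classifyLinks(html, pageUrl) {
  const pageHost = new URL(pageUrl).hostname;
  const links = [];
  for (const match of html.matchAll(/<a\b([^>]*)>([\s\S]*?)<\/a>/gi)) {
    const href = attrValue(match[1], "href");
    if (href === null) continue;
    let url;
    try {
      url = new URL(href, pageUrl);
    } catch {
      continue; // unparseable href
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    const rel = (attrValue(match[1], "rel") || "").toLowerCase().split(/\s+/).filter(Boolean);
    links.push({
      href: url.href,
      internal: url.hostname === pageHost, // assumption: exact hostname match
      nofollow: rel.includes("nofollow"),
      noopener: rel.includes("noopener"),
      // Anchor text with nested tags removed and whitespace collapsed.
      text: match[2].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim(),
    });
  }
  return links;
}
```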
Extracts form elements including input fields, form actions, methods, validation attributes, and form structure.
Finds all <form> elements, extracts inputs, selects, textareas, form actions/methods, and analyzes form validation.
Extracts media elements including videos, audio files, embedded content, and media metadata.
Finds <video>, <audio>, <iframe>, and embedded media elements, extracts sources, dimensions, and media attributes.
Identifies technologies used on the website including CMS, frameworks, libraries, and server technologies.
Analyzes HTML comments, script sources, meta tags, and HTTP headers to detect technologies like WordPress, React, Vue, etc.
Analyzes server infrastructure including hosting provider, CDN, DNS, SSL certificates, and server headers.
Extracts HTTP headers (Server, X-Powered-By), analyzes DNS records, checks SSL certificates, and identifies hosting/CDN providers.
Note: Free tier users can access 7 categories. Premium users have access to all 22 categories and can request up to 3 categories per API call. All responses are cached for 24 hours by default (Premium: customizable cache duration).
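Because a single call accepts at most 3 categories, a client that wants more has to split its request. The sketch below shows one way to batch a category list; the category identifiers in the example are placeholders, since the real identifiers are not given here.

```javascript
// Limit taken from the note above: up to 3 categories per API call.
const MAX_CATEGORIES_PER_CALL = 3;

// Split a list of categories into de-duplicated batches of at most `size`.
function batchCategories(categories, size = MAX_CATEGORIES_PER_CALL) {
  const unique = [...new Set(categories)];
  const batches = [];
  for (let i = 0; i < unique.length; i += size) {
    batches.push(unique.slice(i, i + size));
  }
  return batches;
}

// Placeholder identifiers, for illustration only.
const wanted = ["meta", "opengraph", "twitter", "favicons", "images", "robots", "sitemap"];
console.log(batchCategories(wanted).length); // 3
```

Each batch would then be sent as one request, and repeated requests for the same URL within the cache window should not need to be re-sent by the client.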
Choose the plan that fits your needs
Start extracting webpage metadata in minutes. No credit card required.