A link audit starts with a simple question: what links are actually on this page? It sounds trivial until you try to count them manually on a page with navigation menus, footer links, in-content links, sidebar widgets, and dynamically generated related-post blocks. Even a medium-sized content page can have 80 to 150 links when you count everything. Extracting them all systematically is the foundation of any serious internal linking or broken-link project.
Why link extraction matters for SEO
Search engines discover pages by following links. If a page on your site has no internal links pointing to it, crawlers may find it only through your sitemap—and may not crawl it frequently. Conversely, pages that receive many internal links from high-authority pages on your site tend to rank better because link equity flows through your internal structure.
Link extraction lets you see exactly what you have built versus what you intended to build. You might think a cornerstone piece of content is well-supported by internal links, but an extraction audit might reveal that most of those links come from low-traffic blog posts rather than your highest-traffic landing pages. That is actionable information. Similarly, an extraction run across your entire site can surface links to URLs that no longer exist, pages that have been redirected but whose internal links were never updated, and external links to domains that have changed ownership or content since you first linked to them.
What types of links to extract
A thorough extraction should capture more than just the raw URLs. For each link, you want:
- The destination URL — the value of the href attribute, either absolute or relative
- The anchor text — the visible, clickable text of the link (or the alt text of a linked image)
- The link type — internal (same domain) or external (different domain)
- The rel attribute — whether the link carries nofollow, sponsored, ugc, or no modifier at all
- The context — where on the page the link appears (navigation, body content, footer, sidebar)
Most link extractor tools return at least the URL and anchor text. That is enough for most audits. If you need the rel attribute values, look for a tool that parses the full anchor tag rather than just the href value.
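As a rough illustration of the fields listed above, the sketch below turns one raw link into an audit record with the resolved URL, link type, and rel tokens. The function name describeLink and the record shape are assumptions made for this example, not part of any particular tool. "Internal" is approximated as "same hostname as the page", which means subdomains count as external.

```javascript
// Hypothetical helper: build one audit record from a raw link.
// "Internal" here means "same hostname as the page", a simplification.
function describeLink(rawHref, anchorText, relValue, pageUrl) {
  const page = new URL(pageUrl);
  let resolved;
  try {
    resolved = new URL(rawHref, page); // resolves relative hrefs against the page
  } catch (err) {
    return { href: rawHref, text: anchorText.trim(), type: 'invalid', rel: [] };
  }
  const isWeb = resolved.protocol === 'http:' || resolved.protocol === 'https:';
  let type = 'other'; // mailto:, tel:, and similar non-web schemes
  if (isWeb) {
    type = resolved.hostname === page.hostname ? 'internal' : 'external';
  }
  // rel is a space-separated token list, e.g. "nofollow sponsored"
  const rel = (relValue || '').toLowerCase().split(/\s+/).filter(Boolean);
  return { href: resolved.href, text: anchorText.trim(), type, rel };
}
```

An empty rel array corresponds to "no modifier at all" in the list above. The context field (navigation, body, footer) is left out because it depends on each site's markup.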
Extracting links manually vs. with a tool
If you are comfortable with browser developer tools, you can extract links manually. Open the page, right-click anywhere, and choose "Inspect" to open DevTools. In the Console tab, run:
Array.from(document.querySelectorAll('a[href]'))
.map(a => ({ href: a.href, text: a.innerText.trim() }));
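This returns an array of objects, one per anchor element that has an href, each holding the resolved URL and the anchor text. You can copy the result from the console and paste it into a spreadsheet. The snippet reads the DOM as it exists at the moment you run it, so it picks up links that JavaScript has already inserted, but it will not capture links inside iframes or content that has not loaded yet, such as lazy-loaded blocks further down the page.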
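To make the copy-and-paste step less fiddly, one option is to convert the array to tab-separated text first. The helper below is a minimal sketch. The name linksToTsv is made up for this example, and the commented usage assumes the copy() console utility that Chrome and Firefox DevTools provide.

```javascript
// Hypothetical helper: turn link records into tab-separated text for a spreadsheet.
function linksToTsv(links) {
  // Tabs and line breaks inside a value would break the columns, so collapse them.
  const clean = (value) => String(value).replace(/\s+/g, ' ').trim();
  const rows = links.map((link) => `${clean(link.href)}\t${clean(link.text)}`);
  return ['href\ttext', ...rows].join('\n');
}

// In the DevTools console, where `document` and `copy()` are available:
// const links = Array.from(document.querySelectorAll('a[href]'))
//   .map(a => ({ href: a.href, text: a.innerText }));
// copy(linksToTsv(links));
```

Pasting the clipboard contents into a spreadsheet should then give one link per row, with the URL and anchor text in separate columns.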
For repeated audits or pages with heavy JavaScript rendering, a dedicated link extractor is faster and more reliable. You paste the HTML source or the URL, and the tool returns a structured list you can filter and export.
What to look for in the extracted data
Once you have your link list, the audit begins. Here are the most common issues to check for:
404 links. These are links pointing to URLs that return a Not Found error. They provide a poor user experience and waste crawl budget. Every internal link that returns a 404 should either be updated to point to the correct destination or removed.
Redirect chains. A redirect chain occurs when a link points to a URL that redirects to a second URL, which redirects again before reaching the final destination. Each hop in the chain adds latency and may dilute link equity. Internal links that lead into a redirect chain should be updated to point directly to the final destination URL.
Links to competitor sites. An extraction of outbound external links sometimes surfaces links added by content contributors or pulled in through embedded third-party widgets that point to competitors. These are easy to miss without a systematic audit.
Duplicate anchor text. If 12 different links on a single page all use the anchor text "click here" or "learn more," those anchors give search engines almost no information about the destination pages. Descriptive anchor text—"how to reduce image file size" rather than "read more"—improves both usability and the SEO signal sent through the link.
Thin or missing anchor text. Image links with no alt text pass no textual context to search engines about the linked page. Every linked image should have an alt attribute that describes either the image or the destination page.
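The two anchor-text checks above can be sketched as a single pass over the extracted list. This is an illustration under stated assumptions: the function name auditAnchorText and the set of generic phrases are invented for the example, and each record's text field is assumed to already hold the alt text for linked images, so an empty value really does mean missing anchor text.

```javascript
// Assumed list of generic phrases; extend it for your own site.
const GENERIC_ANCHORS = new Set(['click here', 'learn more', 'read more', 'here', 'more']);

// Hypothetical helper: group anchor-text problems in a list of { href, text } records.
function auditAnchorText(links) {
  const counts = new Map();
  const missing = [];
  for (const link of links) {
    const text = (link.text || '').trim().toLowerCase();
    if (text === '') {
      missing.push(link.href); // e.g. a linked image with no alt text
      continue;
    }
    counts.set(text, (counts.get(text) || 0) + 1);
  }
  const toRecord = ([text, count]) => ({ text, count });
  const generic = [...counts].filter(([text]) => GENERIC_ANCHORS.has(text)).map(toRecord);
  const repeated = [...counts].filter(([, count]) => count > 1).map(toRecord);
  return { missing, generic, repeated };
}
```

Comparison is case-insensitive, so "Click here" and "click here" count as the same anchor.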
How to act on your findings
After completing the extraction, prioritize fixes in this order. First, fix or redirect all internal 404 links—these have the most immediate negative impact on crawl efficiency and user experience. Second, update redirect chains to point directly to final destination URLs. Third, review external links pointing to off-topic or low-quality domains and add nofollow where appropriate. Fourth, improve anchor text diversity on pages where the same generic phrase is overused.
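If your findings live in a list, the priority order above can be applied with a simple sort. The issue labels used here ('broken', 'redirect-chain', 'external-review', 'anchor-text') are assumed names for the four categories, not a standard vocabulary.

```javascript
// Priority order taken from the paragraph above; the issue labels are assumed names.
const FIX_ORDER = ['broken', 'redirect-chain', 'external-review', 'anchor-text'];

// Hypothetical helper: return a copy of the findings sorted by fix priority.
function sortByFixPriority(findings) {
  const rank = (issue) => {
    const index = FIX_ORDER.indexOf(issue);
    return index === -1 ? FIX_ORDER.length : index; // unknown issues go last
  };
  // Copy first so the caller's array is not reordered in place.
  return [...findings].sort((a, b) => rank(a.issue) - rank(b.issue));
}
```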
For ongoing link health, run an extraction audit at least quarterly on your most important pages. A page that was link-healthy six months ago may now contain broken external links if any destination sites changed their URL structure.
The FixTools Extract Links tool at /text-tools/extract-links parses the full HTML of any page and returns a structured list of every link, including the anchor text and the resolved href. Paste in raw HTML or a URL, filter by internal or external, and export the results for use in a spreadsheet audit. Pair it with the Meta Tags Analyzer to check both linking structure and on-page metadata in the same session.
Frequently asked questions
What is the fastest way to extract all links from a webpage?
Pasting the page's HTML source code into a link extractor tool is the fastest method for a one-off audit. Open the page in a browser, press Ctrl+U (or Cmd+U on Mac) to view the source, copy the full HTML, and paste it into the tool. The extractor will parse every href attribute and return a clean list in seconds. Keep in mind that the page source is the HTML as the server delivered it, so links that JavaScript adds after the page loads will not appear in it.
How do I find broken links on my website?
Extract all links from each page, then check each URL for a response code. Links returning 404 (Not Found) or 410 (Gone) are broken. Links returning 301 or 302 are redirects, which add a small latency penalty and dilute link equity if they form chains. A link audit tool or a script using fetch requests can check response codes in bulk.
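A script along those lines might look like the sketch below, which follows redirects one hop at a time so that chains become visible. It assumes Node.js 18 or later, where fetch called with redirect: 'manual' exposes the 3xx status and the Location header. Browsers return an opaque response instead, so this will not work from a page script. The function name traceLink, the verdict labels, and the injectable fetchImpl parameter are choices made for this example.

```javascript
// Follow redirects hop by hop and classify the result.
// Assumes Node.js 18+; some servers reject HEAD, in which case try GET instead.
async function traceLink(url, fetchImpl = fetch, maxHops = 10) {
  const hops = [];
  let current = url;
  for (let i = 0; i <= maxHops; i++) {
    const res = await fetchImpl(current, { method: 'HEAD', redirect: 'manual' });
    hops.push({ url: current, status: res.status });
    const location = res.headers.get('location');
    if (res.status < 300 || res.status >= 400 || !location) break;
    current = new URL(location, current).href; // Location may be relative
  }
  const finalStatus = hops[hops.length - 1].status;
  let verdict = 'ok';
  if (finalStatus === 404 || finalStatus === 410) verdict = 'broken';
  else if (finalStatus >= 400) verdict = 'error';
  else if (hops.length > 2) verdict = 'redirect-chain'; // two or more redirects
  else if (hops.length === 2) verdict = 'redirect'; // a single redirect
  return { hops, finalStatus, verdict };
}
```

To check a list in bulk, you could run it over every URL with Promise.all, ideally in small batches so you do not flood the target server.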
What is the difference between internal and external links for SEO?
Internal links connect pages within your own domain and help search engines discover and crawl your site. They also distribute PageRank between your own pages, so strategic internal linking can lift the ranking of important pages. External links point to other domains. Outbound external links can add credibility to your content, but choose them carefully so you do not link to low-quality sites.
What does nofollow mean on a link, and why does it matter?
The nofollow value of the rel attribute (rel='nofollow') asks search engines not to pass link equity through that link. Google introduced it in 2005 to combat comment spam, and since 2019 has treated it as a hint rather than a strict directive, alongside the more specific rel='sponsored' for paid links and rel='ugc' for user-generated content. Links you cannot vouch for should still carry nofollow. During a link audit, identifying which of your outbound links are followed versus nofollowed helps you manage your site's link equity intentionally.
How many internal links should a page have?
There is no hard limit. Google's older webmaster guidelines suggested keeping a page to fewer than 100 links, and later versions relaxed this to a "reasonable number" (a few thousand at most). More important than quantity is relevance: link to pages that are genuinely related to the content the user is reading. Pages buried deep in your site with few internal links pointing to them are harder for crawlers to find and rank.
O. Kimani
Software Developer & Founder, FixTools
Building FixTools — a single destination for free, browser-based productivity tools. Every tool runs client-side: your files never leave your device.