Dynamic list of external links

Purpose

Articles with inline links are convenient when looking for references while reading a document. They can unfortunately also be quite bothersome at times. If the article is long, finding a link you previously skipped can be a pain if you're in a hurry (or, like me, impatient). Further, if you print the document, the printout remains the same, no matter how hard you tap the paper with your finger.

Providing a list of external links at the end of each article can therefore make your visitors much happier. Making a manual list requires time, however, and makes maintaining the article highly annoying. An automated solution is therefore much preferred.

The following script will find all the active hyperlinks in a string, extract a list with the links and their descriptions, and create a list with the links at the end of the string. The string can come from a database or a file - though be sure that the string does not include your menu or other links which are not part of the article.

The Code

  1. $article = "...";
  2. if(preg_match_all("/<a href=\"(http:\/\/|https:\/\/|ftp:\/\/)([^\"]*)\"[^>]*?>([^<]*)<\/a>/i", strip_tags($article, '<a>'), $external_links, PREG_SET_ORDER)) {
  3.     $article .= "\n\n<h2>External Links</h2>\n";
  4.     $article .= "<ul>\n";
  5.     foreach($external_links as $value) {
  6.         $article .= sprintf("<li>%s: <a href=\"%s%s\" rel=\"nofollow\">%s%s</a></li>\n", ucfirst($value[3]), $value[1], $value[2], $value[1], $value[2]);
  7.     }
  8.     $article .= "</ul>\n";
  9. }

How it works

Line 1 is fairly simple: $article = "..."; just loads the article into the variable $article.

Line 2 looks a lot more daunting, but it's fairly simple. We'll break it up into smaller parts, starting from the inside of the if statement.

preg_match_all is a built-in PHP function that finds all the matches for a regular expression. Here it is called with four arguments:

"/<a href=\"(http:\/\/|https:\/\/|ftp:\/\/)([^\"]*)\"[^>]*?>([^<]*)<\/a>/i"
This is the regular expression. I won't go into the details of how it works. In short, it searches for the beginning of an anchor tag (<a href=\"). It then checks whether the link is external and saves the scheme ((http:\/\/|https:\/\/|ftp:\/\/)), matches and saves the rest of the URL up to the closing quote (([^\"]*)\"), skips any other attributes of the anchor tag ([^>]*?>), and matches and saves the link text (([^<]*)). Note that the pattern expects href to be the first attribute of the anchor tag.
strip_tags($article, '<a>')
This is where the article string is passed to the function. Notice that strip_tags() is applied first. This is because the anchor tag might contain other tags, which would otherwise stop the link text from matching. Since we also want the format of each link in the list to be the same, we remove all tags other than the anchor tag for this call (they are not permanently removed; $article itself is left unchanged).
$external_links
This is the name of the array we will output the matches to.
PREG_SET_ORDER
This isn't strictly necessary to make this work, but it groups the results into one subarray per link. Each subarray contains the full match, the scheme, the rest of the URL, and the link text, which makes the loop much easier to write.
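
To see what these four arguments produce together, here is a small sketch run on a made-up article fragment. The sample text and URL are assumptions for illustration only; the pattern is the one from the script.

```php
<?php
// Hypothetical sample article; the URL and link text are made up for illustration.
$sample = '<p>See <a href="https://example.com/docs" title="Docs"><em>the</em> manual</a> '
        . 'and the <a href="/about">about page</a>.</p>';

$pattern = "/<a href=\"(http:\/\/|https:\/\/|ftp:\/\/)([^\"]*)\"[^>]*?>([^<]*)<\/a>/i";

// strip_tags() keeps only the <a> tags, so the nested <em> no longer blocks the match.
$count = preg_match_all($pattern, strip_tags($sample, '<a>'), $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $matches describes one link:
// [0] whole tag, [1] scheme, [2] rest of the URL, [3] link text.
print_r($matches);
```

Only the first link is reported, since the relative link to /about does not start with one of the three schemes.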

The preg_match_all function is placed inside an if statement. The way this works is that preg_match_all returns the number of matches it found. If there are no matches (i.e. no links), it returns 0, which PHP treats as false, so the rest of the script is skipped. (It returns false only if an error occurs, which has the same effect here.) If there are matches, the code inside the if statement is executed, using the resulting arrays.
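
The return value is easy to check for yourself. A minimal sketch, using two made-up input strings:

```php
<?php
$pattern = "/<a href=\"(http:\/\/|https:\/\/|ftp:\/\/)([^\"]*)\"[^>]*?>([^<]*)<\/a>/i";

// No external links: preg_match_all() returns 0 and $links is an empty array.
$none = preg_match_all($pattern, '<p>No links here.</p>', $links, PREG_SET_ORDER);
var_dump($none);   // int(0)

// One external link: the return value is the number of full matches.
$one = preg_match_all($pattern, '<a href="http://example.com/">example</a>', $links, PREG_SET_ORDER);
var_dump($one);    // int(1)
```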

Lines 3 and 4 merely set up the heading and begin the list. They use an h2 heading, which can of course be changed to match your site. You can also add classes and make other changes.

Lines 5 to 7 loop through the array using a foreach loop. This passes each subarray into the $value variable, which can be accessed as an array. It then adds the link for each subarray as a list item, marking each with the text of the link along with a clickable URL. Notice that the link text is changed so that the first character is converted to upper-case using ucfirst($value[3]). This is done since many of the links would otherwise have a lower-case initial letter.
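
The list item built inside the loop can be tried in isolation. The sketch below uses a hand-built subarray in the shape PREG_SET_ORDER produces; its values are made up for illustration.

```php
<?php
// A hand-built subarray in the shape PREG_SET_ORDER produces (values are made up).
$value = [
    '<a href="https://example.com/docs">the manual</a>', // [0] whole match
    'https://',                                          // [1] scheme
    'example.com/docs',                                  // [2] rest of the URL
    'the manual',                                        // [3] link text
];

// Same format string as in the script: link text first, then the URL as a clickable link.
$item = sprintf(
    "<li>%s: <a href=\"%s%s\" rel=\"nofollow\">%s%s</a></li>\n",
    ucfirst($value[3]), $value[1], $value[2], $value[1], $value[2]
);

echo $item;
// <li>The manual: <a href="https://example.com/docs" rel="nofollow">https://example.com/docs</a></li>
```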

Lines 8 and 9 close the list and the if statement, and you're done.
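
If you want to reuse the script for several articles, one option is to wrap it in a function. The following is only a sketch: the function name append_external_links is my own invention and not part of the original script, and the sample input is made up.

```php
<?php
// Hypothetical helper name; the original script runs inline rather than in a function.
function append_external_links($article)
{
    $pattern = "/<a href=\"(http:\/\/|https:\/\/|ftp:\/\/)([^\"]*)\"[^>]*?>([^<]*)<\/a>/i";

    // Search a copy of the article in which only <a> tags remain.
    if (preg_match_all($pattern, strip_tags($article, '<a>'), $external_links, PREG_SET_ORDER)) {
        $article .= "\n\n<h2>External Links</h2>\n";
        $article .= "<ul>\n";
        foreach ($external_links as $value) {
            $article .= sprintf(
                "<li>%s: <a href=\"%s%s\" rel=\"nofollow\">%s%s</a></li>\n",
                ucfirst($value[3]), $value[1], $value[2], $value[1], $value[2]
            );
        }
        $article .= "</ul>\n";
    }

    // Articles without external links are returned unchanged.
    return $article;
}

$result = append_external_links('<p>Read the <a href="http://example.com/">example site</a>.</p>');
echo $result;
```

An article without any external links comes back exactly as it went in, so the function can be applied to every article without checking first.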