Monday, October 5, 2009

Detailed explanation of using XSL to scrape ISO country codes

I apologize if it wasn't clear how I scraped the country codes from the ISO website in my previous article. Here are the detailed steps. You'll need three things: manual editing (I know it's primitive; otherwise you are more than welcome to write your own HTML tag stripper to pick out the elements in the web page), an XML file and an XSL file.
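For anyone who would rather skip the manual editing, the HTML tag stripper mentioned above could look roughly like the sketch below, built on Python's standard `html.parser` module. This is not the approach used in this article, and it rests on assumptions: `TableCellParser` and `scrape_country_codes` are hypothetical names, the country table is assumed to use plain `<tr>`/`<td>` (or `<th>`) rows with exactly two cells, the header row is assumed to start with "Country names", and no tables are nested inside cells.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper: assumes plain <tr>/<td> rows and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse whitespace (including &nbsp;) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_country_codes(html):
    """Return (country name, alpha-2 code) pairs, skipping the header row."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return [
        (row[0], row[1])
        for row in parser.rows
        if len(row) == 2 and row[0] != "Country names"
    ]


page = (
    "<table>"
    "<tr><th>Country names</th><th>ISO 3166-1-alpha-2 code</th></tr>"
    "<tr><td>AFGHANISTAN</td><td>AF</td></tr>"
    "<tr><td>ALBANIA</td><td>AL</td></tr>"
    "</table>"
)
print(scrape_country_codes(page))  # [('AFGHANISTAN', 'AF'), ('ALBANIA', 'AL')]
```

Fed the saved source of the whole page instead of the small sample string, it should return the (name, code) pairs directly, with no XML or XSL step, as long as the assumptions above hold.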

[1] Copy the table element:
- From the HTML source of the ISO page, copy the entire table element (from <table> to </table>) that contains the country names and codes
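Step [1] is a manual copy, but once the page source is saved as text, the same cut can be sketched as a plain text search. This is a rough illustration only: `extract_table` is a hypothetical name, and it assumes the marker text "ISO 3166-1-alpha-2 code" first appears inside the wanted table, and that this table neither sits inside nor contains another table.

```python
import re

# Hypothetical marker: text assumed to appear first inside the wanted table.
MARKER = "ISO 3166-1-alpha-2 code"


def extract_table(page_source, marker=MARKER):
    """Return the <table>...</table> markup that encloses `marker`.

    Plain text search, not an HTML parser: assumes the table is not nested
    inside another table and does not contain one.
    """
    pos = page_source.find(marker)
    if pos == -1:
        raise ValueError("marker text not found in the page source")
    # Last <table ...> opening tag before the marker (case-insensitive).
    starts = [m.start() for m in re.finditer(r"<table\b", page_source[:pos], re.IGNORECASE)]
    # First </table> closing tag after the marker.
    end = re.compile(r"</table\s*>", re.IGNORECASE).search(page_source, pos)
    if not starts or end is None:
        raise ValueError("marker is not inside a <table> element")
    return page_source[starts[-1]:end.end()]


page = (
    '<html><body><p>Intro</p><TABLE border="1">'
    "<tr><td>Country names</td><td>ISO 3166-1-alpha-2 code</td></tr>"
    "<tr><td>AFGHANISTAN</td><td>AF</td></tr>"
    "</TABLE><p>Footer</p></body></html>"
)
print(extract_table(page))
```

The case-insensitive patterns matter because older pages often wrote tags in upper case, as in the sample string.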

[2] Editing the XML file:
- Open an empty file
- Save it as "countries.xml"
- Add these two lines at the top of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="countries.xsl"?>
- Paste the table element you copied earlier
- Remove the table row that contains "Country names" and "ISO 3166-1-alpha-2 code"
- The screenshot is as below:
[Screenshot of the edited countries.xml]

[3] Editing the XSL file:
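- Copy the XSL code (with its XPath expressions) I demonstrated in this article
- Paste it into a new file and save it as "countries.xsl" in the same folder as countries.xml, since the href in the xml-stylesheet line is a relative path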
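Steps [2] and [3] can also be approximated in Python. This is a sketch under stated assumptions, not the stylesheet from the earlier article: `build_countries_xml` and `transform_countries` are hypothetical names, the pasted table is assumed to be well-formed XML with plain `tr`/`td` rows, and because Python's standard library has no XSLT processor, `transform_countries` uses ElementTree's limited XPath subset to make the selections a stylesheet would typically make with `xsl:for-each` and `xsl:value-of`. The tab-separated output format is also an assumption.

```python
import xml.etree.ElementTree as ET

# The two header lines from step [2].
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet type="text/xsl" href="countries.xsl"?>\n'
)


def build_countries_xml(table_fragment):
    """Step [2]: drop the header row and prepend the XML header lines.

    Hypothetical helper; the fragment must already be well-formed XML
    (no unclosed <br> tags, no HTML-only entities such as &nbsp;).
    """
    table = ET.fromstring(table_fragment)
    # ElementTree has no parent pointers, so map each child to its parent.
    parent_of = {child: parent for parent in table.iter() for child in parent}
    header_rows = [
        tr for tr in table.iter("tr")
        if "Country names" in "".join(tr.itertext())
    ]
    for tr in header_rows:
        parent_of[tr].remove(tr)
    return HEADER + ET.tostring(table, encoding="unicode") + "\n"


def transform_countries(xml_text):
    """Approximate step [3]: one 'NAME<TAB>CODE' line per table row.

    Not XSLT: ElementTree's XPath subset stands in for a stylesheet's
    <xsl:for-each select="//tr"> and <xsl:value-of select="td[1]"/>.
    """
    root = ET.fromstring(xml_text)  # the xml-stylesheet PI is skipped
    lines = []
    for row in root.findall(".//tr"):
        name = row.find("td[1]")
        code = row.find("td[2]")
        if name is None or code is None:
            continue  # skip rows without two cells
        # Like xsl:value-of, concatenate all text inside each cell.
        lines.append("".join(name.itertext()).strip() + "\t"
                     + "".join(code.itertext()).strip())
    return "\n".join(lines)


fragment = (
    "<table>"
    "<tr><td>Country names</td><td>ISO 3166-1-alpha-2 code</td></tr>"
    "<tr><td>AFGHANISTAN</td><td>AF</td></tr>"
    "<tr><td>ALBANIA</td><td>AL</td></tr>"
    "</table>"
)
xml_text = build_countries_xml(fragment)
print(transform_countries(xml_text))
```

Writing the string returned by `build_countries_xml` to countries.xml (with `encoding="utf-8"`) gives a file that a browser could render through countries.xsl; the Python transform only mirrors the selections such a stylesheet would make.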

[4] You are all set!
- Here is the sample output of transforming the XML with the XSL stylesheet (for example, by opening countries.xml in a web browser):
