Perl for Java Programmer: Generating a country list
Introduction
Almost every business application requires a country list as a dictionary data. Even simple registration form might contain country input field. And if you’re going to store billing or shipping information beyond one country you definitely have to have this important dictionary in your system. Firstly, you should decide which countries will be in your list. Depends on your goals it might be short list of the largest countries in the world or full list with all Islands, Territories and even Antarctica. Let’s review common sources for filling your country list dictionary.
Existing country lists
If you have billing or shipping integration with 3rd parties like PayPal, you can get this list from the register page html source code. Open a page with the registration information on the any trusted web portal, then find in the source code country list data (e.g. “View -> Page Source” in FF):
<option value="DZ">Algeria</option>
<option value="AD">Andorra</option>
<option value="AO">Angola</option>
<option value="AI">Anguilla</option>
………
Now you can copy&paste this information and then parse/use it in any convenient way. As you can see, the value in this list is a two character identifier which is called ‘ISO 3166-1 2 Letter Code‘ . It is very useful as a unique identifier and can work as a primary key in the countries DB table.
However there are lot’s of sites which already have well-proven lists, I’d suggest take this list directly from the original source – United Nations published
official list, which is republished in the many places like List of countries and territories or ISO country codes.
Generating SQL for country table
Let’s take one of that list and paste everything to the Excel sheet, then save it in the CSV format. The result should be similar to ISOCodes.csv and contains data separated by commas:
AFG,AF,Afghanistan
AGO,AO,Angola
AIA,AI,Anguilla
ALA,AX,Aland Islands
………
Once we have data stored in Perl readable format, we can parse it and generate SQL code:
# PERL MODULE
use Text::CSV::Simple;
# This script generetes sql code for country table
print <<SQL_END;
DROP TABLE IF EXISTS country;
CREATE TABLE country (
short_code varchar(2) NOT NULL COMMENT ‘ISO 3166-1 2 Letter Code’,
name varchar(100) NOT NULL,
PRIMARY KEY (short_code),
UNIQUE KEY UNQ_COUNTRY_NAME (name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SQL_END
my $tplSQL = "INSERT INTO country (short_code, name) VALUES (\"%s\",\"%s\");\n";
my $tplXML = "<country short_code=\"%s\" name=\"%s\" />\n";
my $csvFile="H:/workspace/update/data/ISOCodes.csv";
my $parser = Text::CSV::Simple->new();
my @data = $parser->read_file($csvFile);
print "– SQL DUMP\n";
foreach(@data) {
printf $tplSQL, @$_[1], @$_[2];
};
print "– Data for DBUnit tests\n";
foreach(@data) {
printf $tplXML, @$_[1], @$_[2];
};
I’d prefer not to populate data directly to DB, but generate a script which can be edited and invoked many times. This script doesn’t work with command line parameters and use hard coded path because I’m not a Perl programmer who develops a multi-purpose tool:) I’m just a Java developer who needed a valid country list. After executing this script you’ll get SQL code and DBUnit xml snippets.
For those who are curious
Unfortunately, there are no one common countries list, however there are some official lists supported by different organizations like UN, bank communities, post offices, phone companies. When you’re choosing a proper list, take in account countries which are disappeared like USSR or was divided like Serbia and Montenegro. Choose a primary key from large variety of existent (ISO 2,3 characters, number, phone code,…) which satisfies your needs. If you have a unique key, you can get another useful information about chosen country from other sources afterwards. To get answers on those questions, I’d recommend to skim through How many countries are there in the world? and then google on interested keywords.
This is a part of Perl for Java Programmer series. Previous post was Perl for Java programmer: Installation. Stay tuned!
