Code of the day
Secure Web site access with Perl
Write a Perl script to automate Web-based logins
Perl and its LWP module make it a breeze to automate Web site access; too bad the breeze becomes a storm when the Web site requires a username and password for access. Fortunately, you can use Perl modules to calm the storm.
Hard stuff first
If you plan to communicate with a secure Web site, your session URL will start with HTTPS rather than HTTP. Unfortunately, the LWP (Library for WWW in Perl) module doesn’t support HTTPS on its own. To establish communication over a secure HTTP session you’ll need to install a module called Crypt::SSLeay. This module is easily found at CPAN (see Resources), but since I develop on Windows and need a precompiled build, that doesn’t really help me.
Nearly all Perl programmers on Windows use Perl from ActiveState. The package comes precompiled and installs like any other Windows application. The best part of Perl from ActiveState is the Perl Package Manager (PPM). Simply type ppm at the C:\Perl\Bin prompt and ppm starts. From there, you can search for any Perl module already compiled for Windows and install it in a snap. Unfortunately, many of the modules in the default ActiveState repositories are very old, and some are simply not available, which is the case with Crypt::SSLeay. Try a search for Crypt::SSLeay from the ppm prompt and you get a nice little error message: No matches for 'Crypt::SSLeay'; see 'help search'.
But don’t despair — Crypt::SSLeay already compiled for Windows does exist. You just need to look in a different module repository.
Finding and installing Crypt::SSLeay
I have no idea why Crypt::SSLeay isn’t available from ActiveState. I do know that you can find it in a Canadian repository and install the module from the ppm prompt. Instead of typing install Crypt::SSLeay you need to type:
install http://theoryx5.uwinnipeg.ca/ppms/Crypt-SSLeay.ppd
Type the command correctly and the installation takes off without a hitch. Make a typographical mistake, however, and you get another error message:
Error: Failed to download <your typographical error here>
Crypt::SSLeay installs everything you need automatically with the exception of two DLLs. During installation you’ll be prompted to add libeay32.dll and ssleay32.dll. Answer yes when prompted; you need both of these files. With that you have the hardest part out of the way (finding and installing Crypt::SSLeay for Windows, that is) and you’re ready to start writing code.
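Before writing any login code, it can be worth confirming that Perl can actually load the modules you just installed. The following is a minimal sketch rather than part of the installation procedure: the helper name module_available is made up for this illustration, and it only checks that a module loads, not that an HTTPS request will succeed.

```perl
use strict;
use warnings;

# Return true if the named module can be loaded, false otherwise.
# (module_available is a hypothetical helper, not part of LWP.)
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Crypt::SSLeay -> Crypt/SSLeay.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two modules this article depends on so far.
for my $module (qw(LWP::UserAgent Crypt::SSLeay)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? "installed" : "NOT installed";
}
```

If Crypt::SSLeay shows up as NOT installed, go back to the ppm prompt and repeat the install command before moving on.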
Make your life easier
Sending a username and password to a secure site is the next hurdle. While you can achieve this goal using just LWP, it seems more intuitive to write a script that interacts with a page the way you would with a regular browser, or at least as close as possible.
I got my next break after I wrote some scripts and posted snippets of them on the listserv libwww@perl.org, looking for help. Someone wrote back to me and said “Hey, it would be a lot easier if you just used WWW::Mechanize.” So off I went to CPAN (again) to investigate their advice. One quick read of the documentation and the mystery of logging onto a secure Web site was solved. The WWW::Mechanize module allows you to interact with a Web site much like you would with a Web browser. It allows you to follow links and fill out forms. The module was exactly what I needed, and you need it too. Here’s how to get it.
- Put aside your code and open a command window (you know, the one that takes you back to the good old days of DOS).
- Change to your C:\Perl\bin directory and type ppm. The Perl Package Manager starts and leaves you at the ppm prompt ppm>.
- At the ppm prompt type search WWW::Mechanize. The search returns a couple of matches. You want the one that simply says WWW::Mechanize (in my search that is the first match in the list).
- To install the module, type install 1 (if your search associates WWW::Mechanize with a different number, enter that number instead of 1).
WWW::Mechanize in action
Once the installation is complete, head over to CPAN and read the documentation for the WWW::Mechanize module (see Resources). You’ll also find some great code snippets and useful cookbook examples with the online documentation. To get you started, I’ve written a quick WWW::Mechanize example. The script in Listing 1 searches CPAN for WWW::Mechanize and dumps the resulting page to a file named out.htm.
Listing 1: Using WWW::Mechanize
1. #!c:\perl\bin\perl
2. use strict;
3. use WWW::Mechanize;
4. my $url = "http://www.cpan.org";
5. my $searchstring = "WWW::Mechanize";
6. my $outfile = "out.htm";
7. my $mech = WWW::Mechanize->new();
8. $mech->get($url);
9. $mech->follow_link(text => "CPAN modules, distributions, and authors", n => 1);
10. $mech->form_name('f');
11. $mech->field(query => $searchstring);
12. $mech->click();
13. my $output_page = $mech->content();
14. open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
15. print $out $output_page;
16. close($out);
The script is straightforward and probably self-explanatory, but here is a quick run-down of each line:
- Lines 2 and 3 are the all-important use statements. use strict forces you to declare all variables and reduces the risk of Perl mistaking your intentions, for example by treating a mistyped bareword as a string. use WWW::Mechanize allows you to use the module previously installed.
- Line 4 assigns the URL used later in the script to $url. Want to go to a different Web site? Start by changing $url.
- Line 5 is what gets searched for at the declared URL.
- Line 6 assigns a filename to the final output file.
- Lines 7 and 8 create a new instance of WWW::Mechanize and then call the get method for that instance using the URL previously assigned.
- Line 9 assumes the page was received and follows a known link on that page (obviously you can put more error checking here but, for now, I just want to demonstrate how to retrieve a page). The link page is retrieved. Since I previously followed these steps with a standard browser I know that my next page provides a search field in a form named “f”.
- Line 10 references the form named “f” on the page.
- Line 11 assigns the search string to the form field query.
- Line 12 is the virtual button click, as if you were interacting with the page yourself.
- Lines 13, 14, 15, and 16 assign the content of the returned page to $output_page, open a simple output file, write the contents to the file, and close the file.
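The note on Line 9 mentions that more error checking is possible. One way that might look is sketched below, assuming the WWW::Mechanize object is created with autocheck => 0 so that failures come back as return values instead of exceptions. The function names fetch_search_results and save_page are invented for this illustration, and the output file is written as UTF-8, which is also an assumption.

```perl
use strict;
use warnings;

# Run the same steps as Listing 1, but stop with a clear message as soon as
# one of them fails. $mech is expected to be a WWW::Mechanize object created
# with autocheck => 0. (fetch_search_results is a hypothetical helper.)
sub fetch_search_results {
    my ($mech, $url, $searchstring) = @_;

    $mech->get($url);
    $mech->success
        or die "GET $url failed: " . $mech->res->status_line . "\n";

    # follow_link returns undef when no matching link is on the page.
    defined $mech->follow_link(text => "CPAN modules, distributions, and authors", n => 1)
        or die "Expected link not found on $url\n";

    # form_name returns undef when the page has no form with that name.
    defined $mech->form_name('f')
        or die "No form named 'f' on " . $mech->uri . "\n";

    $mech->field(query => $searchstring);
    $mech->click();
    $mech->success
        or die "Search request failed: " . $mech->res->status_line . "\n";

    return $mech->content();
}

# Write a page to disk, reporting the reason if the file cannot be written.
# (save_page is a hypothetical helper.)
sub save_page {
    my ($outfile, $content) = @_;
    open(my $out, '>:encoding(UTF-8)', $outfile)
        or die "Cannot open $outfile: $!\n";
    print {$out} $content;
    close($out) or die "Cannot close $outfile: $!\n";
    return;
}

# Typical use (needs WWW::Mechanize and network access):
#   use WWW::Mechanize;
#   my $mech = WWW::Mechanize->new(autocheck => 0);
#   save_page("out.htm",
#       fetch_search_results($mech, "http://www.cpan.org", "WWW::Mechanize"));
```

Each check names the step that failed, which is usually enough to tell whether the site changed its link text, renamed its form, or simply refused the request.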
That’s it for the basic usage of WWW::Mechanize; now let’s move on to using it with a secure Web site.
Find a secure site and log in
In Listing 2 you see a script example where I’ve tried to log into a Web-based e-mail account at Yahoo!® mail. Test this script out for yourself and see how it runs. (Obviously, you’ll need a Web-based e-mail account for this test.)
Listing 2: Logging into a secure site
1. #!c:\perl\bin\perl
2. use strict;
3. use WWW::Mechanize;
4. use HTTP::Cookies;
5. my $outfile = "out.htm";
6. my $url = "https://mail.yahoo.com/";
7. my $username = "your_email_username_here";
8. my $password = "your_account_password_here";
9. my $mech = WWW::Mechanize->new();
10. $mech->cookie_jar(HTTP::Cookies->new());
11. $mech->get($url);
12. $mech->form_name('login_form');
13. $mech->field(login => $username);
14. $mech->field(passwd => $password);
15. $mech->click();
16. my $output_page = $mech->content();
17. open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
18. print $out $output_page;
19. close($out);
Notice that most of the script is the same as the first one shown in Listing 1; the differences are as follows:
- Line 4 tells the script to use cookies. Secure sites use cookies for authentication purposes. Exactly how the cookie process works is beyond the scope of this article. For now just know that you need cookie support to log into a secure Web site.
- Line 6 is the URL to the secure Web site.
- Lines 7 and 8 are the username and password for the Yahoo! mail account. Obviously, I didn’t include my real username and password. You can easily substitute your account information in these lines so the script works for you.
- Line 10 creates a new cookie instance for the previously created WWW::Mechanize instance.
- Line 12 selects the form by the name it has on the page that the URL lands on.
- Lines 13 and 14 set the login and passwd fields to the username and password values previously defined. The rest of the script is the same as the one in Listing 1.
Keep in mind that I found the form name and the fields login and passwd by browsing to mail.yahoo.com and examining the source of the HTML page that URL lands on.
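Reading page source by eye works, but the same information can be pulled out programmatically. The sketch below is one possible approach, assuming the HTML::Form module (the form parser used by WWW::Mechanize) is installed. The function name describe_forms, the sample markup, and the example.com URL are all invented for illustration; the markup is not the real Yahoo! login page.

```perl
use strict;
use warnings;
use HTML::Form;

# Return one hash per <form> in $html, giving its name, method, action, and
# the names and types of its named input fields.
# (describe_forms is a hypothetical helper.)
sub describe_forms {
    my ($html, $base_url) = @_;
    my @described;
    for my $form (HTML::Form->parse($html, $base_url)) {
        my @fields = map  { +{ name => $_->name, type => $_->type } }
                     grep { defined $_->name } $form->inputs;
        push @described, {
            name   => $form->attr('name'),
            method => $form->method,
            action => $form->action->as_string,
            fields => \@fields,
        };
    }
    return @described;
}

# A made-up page standing in for the real login page source.
my $html = <<'HTML';
<html><body>
<form name="login_form" method="post" action="/login">
  <input type="text" name="login">
  <input type="password" name="passwd">
  <input type="submit" value="Sign In">
</form>
</body></html>
HTML

for my $form (describe_forms($html, "https://example.com/")) {
    printf "form %s (%s %s)\n",
        $form->{name} // '(unnamed)', $form->{method}, $form->{action};
    printf "  %-10s %s\n", $_->{type}, $_->{name} for @{ $form->{fields} };
}
```

In a live script you could pass $mech->content and $mech->uri to the same function right after the get call, then use the reported names in form_name and field.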
A quick and easy solution
The page returned to the script in Listing 2 made no reference to a failed login. The page simply said it could not redirect the browser and to click here to continue. So I simply added one more line of code to the script after Line 15, and the click here option takes my script to its final destination, as shown in Listing 3.
1. #!c:\perl\bin\perl
2. use strict;
3. use WWW::Mechanize;
4. use HTTP::Cookies;
5. my $outfile = "out.htm";
6. my $url = "https://mail.yahoo.com/";
7. my $username = "your_email_username_here";
8. my $password = "your_account_password_here";
9. my $mech = WWW::Mechanize->new();
10. $mech->cookie_jar(HTTP::Cookies->new());
11. $mech->get($url);
12. $mech->form_name('login_form');
13. $mech->field(login => $username);
14. $mech->field(passwd => $password);
15. $mech->click();
16. $mech->follow_link(text => "click here", n => 1);
17. my $output_page = $mech->content();
18. open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
19. print $out $output_page;
20. close($out);
As long as the script continues to retrieve a redirect page, and Yahoo! mail continues to supply the same redirect failure message, my quick and easy solution does the trick. Obviously, not all secure Web sites respond like Yahoo! does. Be prepared to do a little detective work of your own to get your login script working.
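Because the fix depends on the redirect page appearing, a slightly more defensive variant follows the link only when it is actually there. This is a sketch under assumptions, not the script from the listings: the function name follow_if_present is invented, and the pattern you pass in has to match whatever link text the site really uses.

```perl
use strict;
use warnings;

# If the current page contains a link whose text matches $pattern, follow it
# and return true; otherwise leave $mech on the current page and return false.
# $mech is a WWW::Mechanize object. (follow_if_present is a hypothetical helper.)
sub follow_if_present {
    my ($mech, $pattern) = @_;
    my $link = $mech->find_link(text_regex => $pattern);
    return 0 unless $link;
    $mech->get($link->url_abs);
    return 1;
}

# Typical use, in place of the unconditional follow_link call:
#   follow_if_present($mech, qr/click here/i);
#   my $output_page = $mech->content();
```

If the site stops sending the intermediate page, the script then continues with whatever page the login returned instead of failing on a missing link.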
Secure login checklist
I’ve shown you a Perl script that solves the mystery of logging into a secure Web site. To summarize, here is a checklist of must-haves for building successful, secure Web site login scripts with Perl:
- Start with Crypt::SSLeay: Logging into a secure site is usually done over HTTPS. You need this module to make it possible. You can find it already compiled for Windows from the TheoryX server in Canada (see Resources).
- Add WWW::Mechanize: Make your life easier and use this module, which allows you to write code that mimics Web site interaction by easily following links and filling out forms (a critical part of logging into a secure site).
- Use cookies: Secure Web transactions use cookies. You need to turn them on with a use statement to get them to work automatically in your script.
- Enable debugging: When things aren’t working as expected, enable debugging with use LWP::Debug qw(+); This statement sends a flood of information to your screen; however, if you are patient, the output is very helpful.
- Make an output file: Dump the final output, or the output after each page retrieval point in your script, to a simple HTML file and examine it. The contents of the file provide a clear picture of what the script got in return for its “get” requests.
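A note on the debugging item: newer libwww-perl releases mark LWP::Debug as deprecated, so the statement above may print nothing there. A commonly used alternative is to attach handlers that dump each request and response. The sketch below assumes a libwww-perl version that provides add_handler on the user agent; the wrapper name enable_tracing is invented for this illustration.

```perl
use strict;
use warnings;

# Attach handlers that print each outgoing request and each incoming response.
# $ua can be an LWP::UserAgent or a WWW::Mechanize object (a subclass of it).
# (enable_tracing is a hypothetical helper.)
sub enable_tracing {
    my ($ua) = @_;
    $ua->add_handler(request_send  => sub { shift->dump; return });
    $ua->add_handler(response_done => sub { shift->dump; return });
    return $ua;
}

# Typical use:
#   my $mech = WWW::Mechanize->new();
#   enable_tracing($mech);
#   $mech->get($url);   # headers and the start of each body go to STDOUT
```

The output is less of a flood than LWP::Debug produced, and it shows exactly which cookies and form values were sent with each request.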
Apply this checklist to your scripts and you’ll soon be automating access to secure Web sites with Perl.
Resources
Get products and technologies
- ActiveState: Download Perl for Windows for free.
- CPAN: Find (almost) all the Perl modules you could ever want, including WWW::Mechanize.
- TheoryX: Try this repository for hard-to-find Perl modules such as Crypt::SSLeay.
- Software Evaluation Kit (SEK): Download free trial versions of IBM middleware products that run on Linux or Windows.