Making The Right Web Connection

by Anthony Frey

So, you've got your old reliable enterprise database and the decision has been made to develop Web access for its information. This could be the reason that you decided you needed Internet access in the first place, or even more likely, you finally want to provide Web-based access to your corporate information. So, where should you begin?

We've seen an incredible array of Web-to-DBMS middleware packages introduced lately. In an attempt to bring order to these products, we've broken them down into Perl scripting, embedded queries and dynamic HTML generation, and tested a few examples of each. First we established a small networking information database on two different data sources, Oracle Workgroup Server 7.2 and Microsoft SQL Server 6.5. We chose Intergraph's TD-300 as our Web server (Editor's Choice in our May 1 issue, page 46). Our choice of data access middleware in this case was ODBC. After setting each package up in our Wisconsin lab, we developed a Web site for each product to see how the various product types stacked up.

Issues Related to Web-DB Connectivity Before we get into each of these database access methods, let's talk about some of the issues you should be concerned about when setting out to connect your database to the Web. Obviously you are going to need a data access method, or the middleware between your Common Gateway Interface (CGI) program and your database. Technically you can use any type of middleware that you might use for a standard client/server application. However, you'll find that the CGI environment you select generally narrows your choices. If you're developing on the Windows NT platform, ODBC is the de facto data access standard.

Performance between your Web server and database server is crucial. Because of the data access requirements, the CGIs that many of these products are implemented with can no longer be thought of as lightweight "scripts." Every time an HTML page containing a database query is accessed, it must establish and close a session with the database server. This is in addition to the overhead of the query itself. A couple of products have worked around this by using a session identifier (possibly the same one you use to maintain state) to cache database sessions for queries that are developed across HTML pages.

Also essential for most database applications is a method of maintaining state information during a remote user's session. If you are defining a transaction across a series of Web pages, how do you tie the user you authenticated in the first page with a table you are about to update on the following pages? This, of course, is an inherent problem of the HTTP protocol. The most common answer of course is a session identifier or "magic cookie." A good Web development package should provide some assistance with this.

How flexible is the query engine? The Microsoft Internet Database Connector (MS-IDC) only allows a single query per MS-IDC file. NetScheme InterMart Toolkit doesn't allow inserts or updates. The more complex your data model is, the more flexibility you will need. Finally, you're going to want to consider what these methods offer in the way of HTML page maintenance. This is not an issue for DBMS access alone, but it is a part of the environment you will be working with when you choose a package.

Perl Scripts and Oracle WebServer In the early stages of Web development, Perl scripts were the most common method of accessing your data. At the time, data sources being accessed weren't always RDBMSes. Often they were tabular structured data stored on a server's file system. Perl is well suited to these environments due to its text-processing capabilities. But Perl scripts are also great because they provide a fairly sophisticated environment and a rich set of libraries from which you can build data access routines and maintain state information.

We tried this approach with two freely available packages, Oraperl and ODBC Extensions for Win32 Perl (both are readily found on the Internet by searching for oraperl and ntodbc at your favorite search site). Oraperl is an exception to our ODBC middleware choice, since it requires a local instance of Oracle to run (some versions have been extended for SQL*Net). These are both general-purpose packages for data access, but again the Perl environment allows relatively easy HTML generation.

The likable characteristic of this method is that you have complete procedural control over your data access and HTML generation. A large drawback, however, is having to write a large number of specialized scripts--usually one or more for each page. You will have to come up with your own libraries for generating HTML and maintaining state information. Make no mistake about it, this is programming. It's not simple and it's not quick, but you can certainly get the job done.

We've included Oracle with Perl scripts because the flavor of development is much the same. Oracle has integrated an HTTP server into its Workgroup Server and added an HTML generation library to its standard procedural language PL/SQL. This places you into the Oracle environment where you can make full use of the stored packages, procedures and triggers you've already developed for existing Oracle-based application. This method is a good choice if you already have a large investment in Oracle development.

Embedded Queries This approach seems to be the method of choice among most of the new tools we've seen. This basic technique allows you to dynamically generate HTML pages by embedding queries and additional logic within an extended HTML syntax. These pages are not referenced directly as a hyperlink. Instead, you call the provided CGI, which preprocesses the page, performs the database queries and carries out the logic before returning the HTML text. It's important to remember that the HTML text is dynamic and not the actual queries.

Microsoft's Internet Database Connector, available with the Microsoft Internet Information Server, is a good example of embedded queries. Other products include WebBase from ExperTelligence, WebDBC from Nomad Development and Cold Fusion from Allaire.

Generally, embedded queries are the most logical Web-to-database development method because there's a one-to-one relationship between what you're writing and the final output. This is unlike Perl scripts, which require you to programmatically generate the HTML output. The entire embedded query process feels like a natural extension of HTML writing.

For example, let's look at how we implemented our Web site with the Cold Fusion package. Our simple data model included three tables: IP addresses, machines and contacts. Using the Cold Fusion administrator, we assigned a user name and password to an ODBC data source. Administering access in this way is better than specifying the information in each query file, the method employed by MS-IDC and WebBase, because you have a single point of maintenance for authentication information.

Web pages with database queries are typically designed using two pages--a normal HTML page (see IPADDR.HTM) that sets up the query and a template page (see IPADDR.DBM) that contains the extended HTML markup. Cold Fusion uses the DBM extension to label extended HTML database markup files. The IPADDR.HTM file asks for two IP addresses to specify a range of addresses to display. These are assigned the identifiers lowip and highip, respectively. The Cold Fusion CGI and template file is specified in the ACTION field of the form tag, ACTION="/cgi-bin/ dbml.exe?template=ipaddr. dbm". When the submit button on the form is activated, these parameters are passed as variables to the CGI, which are subsequently used as parameters for the query.

Notice the <DBQUERY Name=<P> "qryIPaddr" tag in the IPADDR.DBM template page. This identifier represents the row set that will be returned by the query. The DBTABLE tag uses this row set identifier to loop through the rows and produce the HTML table. You can also specify multiply queries with Cold Fusion, something the MS-IDC doesn't allow. Cold Fusion's extensions are also fairly robust, allowing procedural logic within the HTML page. This allows you to conditionally modify the HTML output based on input parameters or query results.

With HTML you have to rethink some of your usual techniques of writing typical client/server application features. In a form-building 4GL you might create a list box that contains the detail items of a master-detail relationship and the detail items need to be selected for further editing. Using HTML, however, you need to present the detail output in a HTML table, wrapping each item in a hyperlink to a CGI that processes the row selection.

What about data validation? Although this has always been cumbersome with HTML forms, Cold Fusion is no exception; it is slightly easier. If our IP addresses were stored as four separate fields we could check to make sure each octet was in the range 0..255 by appending "_range" to the input widget name, for example,<INPUT TYPE="text" NAME="ipa_range" MIN=0 MAX=255 SIZE="20">. The range validation is then carried out on the server side by the Cold Fusion CGI.

Both Cold Fusion and WebDBC have ISAPI DLLs available to narrow the performance gap between the Web server and the gateway. Despite being an HTTP server itself, WebBase was noticeably slower in its delivery of HTML documents and query template processing. We expected the opposite to be true and suspect it's a result of a less-optimized HTTP server. Therefore, we found the DLLs to have comparable, if not better, performance.

Dynamic HTML Table Generation This last method takes dynamic HTML generation a step further. In this case, HTML coding and static queries are entirely eliminated. Instead, you use a modeling application to pick the tables, views and stored procedures that you want visible to the Web browser. The CGI uses this model file to generate the entire HTML tree on the fly. The result is dynamically generated HTML pages of database tables, replete with headers, footers, navigation links and even predefined query pages. Typically there is an additional server component that communicates with the CGI to perform the actual database query. Both NetScheme's Intermart Toolkit and DataCraft's DataWave extension to the 4th Dimension RDBMS currently use this technique.

The Dynamic Table Generation method requires no HTML coding whatsoever. With these tools you can publish your existing database schemas almost instantly. If your data model includes a foreign key relationship, your drill-down detail is automatically linked and generated. Maintaining state becomes a non-issue as session IDs are automatically embedded in the generated HTML pages. This also allows them to easily cache user connections.

Obviously, the trade-off here is flexibility. NetScheme has removed the tedious work of developing repetitive Web pages, but in the version we tested it's strictly publish only. We couldn't define pages with any inserts, updates, deletes or with any custom features. DataWave is more complex as it sits on top of the full-featured 4th Dimension client/server development environment. It will export any 4D application in its entirety, and it will allow you to define custom headers and footers for the resulting Web pages. If you can live with the feeling that this isn't quite what you wanted, but it satisfies your users, then this is the way to go.

In terms of performance, this method has a stroke for and against it. On one hand, the database connection can be maintained for the life of the session ID, eliminating the overhead associated with logging in a user. On the other hand, HTML must be regenerated for each page and session.

Making a Choice Between configuring ODBC data sources, installing the CGIs, coordinating access rights and debugging cryptic HTML error messages, configuration of Web-to-DBMS connectivity can be extremely difficult. And to top it off, generally you can't use HTML editors because you'll make heavy use of extended HTML.

As always, the first step is to define your requirements. One of these methods is going to be more appropriate for your application than others. We use three general categories to narrow which tools to use for any given project: highly interactive database access, heavy transactional database access and database publishing (which roughly correspond to using Perl scripts, embedded queries or dynamic table generation). It's also important to remember that you aren't completely limited to using only one of these methods. If you're going to use a dynamic HTML generator, for example, but need a complex task accomplished, then by all means throw in a Perl script.

Anthony Frey can be reached at afrey@nwc.com
.


Normal HTML Page: IPADDR.HTM

<HTML>
<HEAD>
<TITLE> IP Address Query </TITLE>
</HEAD>
<BODY>
<H4>Search For IP Addresses</H4>
<FORM METHOD+POST ACTION="/cgi-bin/dbml.exe?template=lpadder.dbm">
<PRE>
Low IP Address: <INPUT TYPE="text" NAME="lowip" SIZE="20">
High IP Address: <INPUT TYPE="text" NAME="highip" SIZE="20">
<INPUT TYPE="submit" VALUE="Search">
</PRE>
</FORM>
<HR>
</BODY>
</HTML>


Template: IPADDR.DBM

<!-- Define query to lookup IP Address range-->
<DBQUERY Name="qryIPaddr" DataSource="NetDB" MAXROWS=10
SQL="SELECT ipaddr, hwaddr, contact_id FROM tblIPaddresses
WHERE ipaddr > '#lowip#' AND ipaddr <'#highip#'
ORDER BY ipaddr>
<!-- Format HTML results --> <HTML>
<HEAD>
<TITLE>IP Address Search Results</TITLE> </HEAD>
<BODY>
<H1>IP Address Search Results</H1> <HR>
<!-- Table display of selected IP addresses -->
<DBTABLE Query="qryIPaddr" COLHEADERS>
<DBCOL Header="IP Address" Width=18 Text="#ipaddr#">
<DBCOL Header="HW Address" Width=20 Text="#hwaddr#">
<DBCOL Header="Contact ID" Width=15 Text="#contact_id#">
</DBTABLE>
</BODY>
</HTML>