HomeProject ConsultingOpen Source ProjectsResumeContact Me

DotBase - Internationalization Database

DotBase Overview

DotBase is an ambitious database internationalization library that lets you store all or part your database as multilanguage fields. This can reduce the database schema size and the associated application code complexity of a multilanguage project by as much as 90%. Less code means less bugs and greater application flexibility.

Every database field becomes multilingual and multivalued while still running on your current rdbms. Perfect for i18n migration or new projects.

How It Works

Dotbase stores virtual fields associated with each "real" record as a collection of named attributes. The attributes are structured as arbitrary collections of hashtables and arrays nested in any combination and to any depth.

The DotBase class retrieves and stores all or any part of the nested attribute structure, translating them into hashtables and arrays in memory and back out again to the database. Only the exact subset or field of interest is read or written so it is very efficient.

For example, suppose we have some fields that are not language sensitive, and some fields that need storage in multiple languages. They could be arranged in a memory structure like this:

{
  rec
  {
    Name          Koh-I-Noor
    Address1      Westermarkt 29
    City          Amsterdam
    Cards
    [
      Visa
      MasterCard
      AmericanExpress
      DinersClub
    ]
  }
  lang
  {
    en
    {
      ShortDesc   Best Indian food in town
      LongDesc    Koh-I-Noor is a family business that has been...
    }
    nl
    {
      ShortDesc   Best Indir van de stad
      LongDesc    Dit familiebedrijf heeft naam gemaakt de goode...
    }
  }
}

In this example the language insensitive fields are stored in the "rec" hashtable as "rec.Name" and so forth. The "Cards" field is multivalued and is stored in the "rec.Cards" array. The array can be handled as an entire array or individual array elements can be addressed as "rec.Cards.0", "rec.Cards.1"...

The language sensitive fields are stored in the "lang" hashtable, organized in subtables, one hashtable per language such as English "lang.en" and Dutch "lang.nl".

For example, suppose we only want to read the name and the description fields for the current user interface display language:

  // Read the data into memory structures.
  string    name = business.GetString( "rec.Name"           );
  Hashtable desc = business.GetHash(   "lang", display_lang );

  // Use the memory structures.
  NameField.Text = name;
  ShortDesc.Text = desc["ShortDesc"];
  LongDesc.Text  = desc["LongDesc" ];

To update the name and language sensitive descriptions in the database after user input:

  // Write the data out to the database.
  business.PutObj( name, "rec.Name"           );
  business.PutObj( desc, "lang", display_lang );

DotBase will figure out exactly what has changed and only add, update or delete those attributes which now differ from the database copy.

Simplified Database Schema

The main benefit of using DotBase is that it can vastly simplify the database schema table count and make the application to database interface code trivial to implement. Here are several categories of schema simplification.

Multilanguage Simplification

The usual practice is to break out multilanguage fields into their own tables with each record tagged with the language code of the language being stored. With DotBase, multilanguage fields occupy DotBase hashtables and are read and written with simple function calls.

Multivalued Data Simplification

The usual practice is to break out multivalued associated data into separate tables. For example, if a restaurant can accept a set of credit cards, a "restaurant_credit_card" table will hold multiple records, one record per credit card accepted at each eating establishment.

These multivalue associated tables can be absorbed into DotBase array elements and read and written with a single function call. "DotBase" gets its name from the dot notation as in "rec.Cards.3".

  ArrayList CreditCards = restaurant.GetArray( "rec.Cards" );

  foreach ( object cardcode in CreditCards )
  {
    // Display and update the CreditCards array.
  }

  restaurant.PutObj( CreditCards, "rec.Cards" );

Subtype Data Simplification

Often we find that a physical schema object can represent objects of different subtypes. For example a "Business" object can have a type field saying whether each record is a "Restaurant", a "Hotel", or a "Nightclub".

Some data fields such as name and address are common to all of the "Business" objects. Other data fields only relate to one of the subtypes. For example "Hotels" will have daily rates, "Restaurants" will have cuisine types, and "Nightclubs" will have descriptions for the entertainment, event or theme on each weeknight.

The usual practice is to store subtype specific data fields in separate tables such as a "HotelInfo" table, a "NightClubInfo" table and so on.

With DotBase each individual record essentially has its own little private schema. "Hotel" records can store hotel associated data. "Restaurant" records can store restaurant appropriate data.

All of the traditional associated subtype info database tables can be folded into DotBase hashtables and arrays.

Simplified Application Code

When there are far fewer tables, there is a lot less application code and sometimes very little SQL statements to be seen anywhere. Less application code means reduced implementation cost. Less application code also means less bugs and lowered cost of maintenance.

Less application code also delivers a far more flexible application. You can make changes that would otherwise be uneconomic because it there is less code to change.

For example, to add a new DotBase field, just start using it by name: "rec.MothersMaidenName". There is no need to make any database schema changes and none of the other application code needs to be aware that you have just added another DotBase named attribute.

DotBase and Traditional Tables

The main structural objects of a database still need traditional database tables to hold the record ids that DotBase needs to retrieve the attributes associated with each record. In other words you still need to store the structural skeleton of the core database objects as normal database tables.

These core tables can hold data fields that need to be indexed such as "LastName" or "LoginId". Once the record or records of interest are identified DotBase can efficiently read and write the associated named attribute structures.

DotBase Efficiency

All DotBase data is stored in a single table with a single index. Because if the nested nature of the hash/array structure, each DotBase "Get" call retrieves all of its data with a single fully indexed SQL query. This is very efficient.

For databases that support clustered indices this is especially fast because each record's associated data is all clustered together in one or two physical database disk storage blocks.

With this in mind, lay out the DotBase structures and arrays according to which fields will generally be retrieved as a unit without reading too much data that is not of interest.

DotBase Status

The latest version of DotBase forms the core of an ASP.NET multilanguage wedding information site. That site will be complete in January of 2004. When I have time I will package it up and polish the documentation for public consumption in C# and Java. If anyone is interested in DotBase for a current project, contact me.



Copyright 2003 WebbySoft Ltd, All Rights Reserved
Design: kikker.com