
A Journey From QBasic Random Access Files to PostgreSQL

As a practicing developer, I have a good bit of experience working with software that needs to store data. In this article I am going to recount a brief history of the world of databases (the long version would take a book) as I personally experienced it while working on code for the last 28 years. This should be informative to readers who want to understand just how powerful modern database systems are in comparison.

My first exposure was building my own database file using the random access file option in QBasic, which was one of the three file IO modes supported by the language. With this mode you defined a structure for the record and could then read and write records like an array indexed by number. The file itself was a raw collection of bytes. You could fetch record 12, for example, by multiplying 12 by the fixed width of a record. You had to design your structure carefully, and maybe add some “future use” fields; otherwise you needed to rewrite the entire file any time there was a change in the data definition. I used this to build an accounting and receipt printing program for my lawn care business, Lawn Jockeys! I had started cutting grass at age 11, and by 14, when I started to learn how to program, this business gave my programming a real purpose. I used this software to track all my customers and income, and to provide the data necessary for my tax returns at the time.

To understand how this sort of format worked, you can view the GitHub copy of a Visual Basic program I wrote circa 1996 that used the same technique. In CONVERT.BAS a record type is defined for the old format and new format:

Type PDtype
     CLast_Name As String * 15
    CFirst_Name As String * 15
       cInitial As String * 1
       cAddress As String * 40
          CCity As String * 15
         CState As String * 2
      CZip_Code As String * 10
   cBus_Address As String * 40
      CBus_City As String * 15
     CBus_State As String * 2
  CBus_Zip_Code As String * 10
    cHome_Phone As String * 13
     cBus_Phone As String * 20
         cPager As String * 13
           cFax As String * 13
        CE_Mail As String * 100
End Type

Global Const PD20Size = 324

The specific size in bytes was very important because, when the file was opened for reading or writing, the records on the physical disk were one contiguous stream of bytes. For a given index i, reading a record fetched exactly that many bytes (324 above), starting at byte location start = i * 324 and ending just before stop = (i + 1) * 324. The 324 bytes of that record were all that needed to be loaded into RAM. Writing worked similarly: jump to the start position and overwrite the 324 bytes for that record on disk. When QuickBASIC was first introduced this was exceedingly memory efficient, a critical advantage during the 1980s and early 1990s when RAM was at a premium. When you had 640K of RAM for the OS and the program, and maybe a 30-100 MB hard disk for data, keeping the data on disk and out of RAM was vital. The random access method let you read and write records directly on disk without ever loading the entire file into memory, which was essential given the limited system resources.

The record size had to be given to Basic when opening the file:

Open ConFile For Random As FreeNum Len = PD20Size
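
For readers who never used this style of file IO, here is a minimal sketch of the same idea in Ruby rather than Basic (the file path and method names are mine; only the 324-byte record width comes from the program above): seek to the record index times the record width, then read or write exactly one record's worth of bytes.

RECORD_SIZE = 324  # fixed record width, matching PD20Size above

# Read record i (0-based) by seeking directly to its first byte.
def read_record(path, i)
  File.open(path, "rb") do |file|
    file.seek(i * RECORD_SIZE)   # jump straight to the record on disk
    file.read(RECORD_SIZE)       # only these 324 bytes are loaded into RAM
  end
end

# Overwrite record i in place without rewriting the rest of the file.
def write_record(path, i, bytes)
  raise ArgumentError, "record must be #{RECORD_SIZE} bytes" unless bytes.bytesize == RECORD_SIZE

  File.open(path, "r+b") do |file|
    file.seek(i * RECORD_SIZE)
    file.write(bytes)
  end
end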

This all worked quite well, but by 1996 this technique was starting to show its age. However, the version of Visual Basic this code example was built on already had an alternative in the form of Microsoft's single-file database component.

The next step along the journey was the Microsoft Jet database engine, which was included in Visual Basic 3. It had a built-in database control where form elements could be bound to fields in a Microsoft Access database, and it shipped with a sample database called biblio.mdb. While this technology was good for building business software that ran on a PC in the 1990s, scaling was limited because only one program/user could access the database file at a time. In modern open source terms, this is the same access paradigm as using a SQLite database for a single-user application.

However, the late 1990s was when the Internet really started to boom, and the need to build web applications started to supplant the traditional approach to building these applications. The king of the open source solutions was the combination of MySQL 3 and PHP! MySQL was originally released in 1995, and the amazing part about MySQL was that it was a real multi-connection database, unlike Microsoft Access, and most importantly it was extremely affordable compared to Microsoft SQL Server, which had to be licensed and then run on a Windows server.

MySQL was my go-to throughout the early 2000s. I had previously done some work with PostgreSQL, which was first released in 1996, but it only really caught my attention when I started to work with Ruby on Rails, which used database migrations to apply changes to the database schema. With MySQL, when a Rails migration had an error it could leave the database in an inconsistent state. But with PostgreSQL the database would roll back as if nothing had ever happened. This was because PostgreSQL supported transactions around not only data statements but also data definition changes! This was a magical experience that led me to favor PostgreSQL.

As an aside, a Rails migration is a database change request written in the Ruby programming language that lives with the application in its revision control system. So you can create new tables or make changes with a migration like:

# frozen_string_literal: true

class CreateContacts < ActiveRecord::Migration
  def change
    create_table(:contacts) do |t|
      t.string :last_name, limit: 15
      t.string :first_name, limit: 15
      t.string :initial, limit: 1
      t.string :address, limit: 40
      t.string :city, limit: 15
      t.string :state, limit: 2
      t.string :zip_code, limit: 10
      t.string :business_address, limit: 40
      t.string :business_city, limit: 15
      t.string :business_state, limit: 2
      t.string :business_zip_code, limit: 10
      t.string :home_phone, limit: 13
      t.string :business_phone, limit: 20
      t.string :pager_phone, limit: 13
      t.string :fax_phone, limit: 13
      t.string :email, limit: 100

      t.timestamps
    end

    add_index :contacts, :email,                unique: true
    add_index :contacts, :first_name
    add_index :contacts, :last_name
  end
end

Notice that this data structure is the same as the PDtype in my random access file program, but the database is going to give us a unique index constraint on the email field and searchable indexes on first and last name. Unlike random access, making changes is a breeze. To add a new field, one just needs to add a new migration like:

# frozen_string_literal: true

class AddRolesToContacts < ActiveRecord::Migration
  def change
    add_column :contacts, :roles, :string, array: true, default: []  # PostgreSQL string array column
  end
end

Or remove a field, such as the pager field since no one has a pager anymore:

# frozen_string_literal: true

class RemovePagerFromContacts < ActiveRecord::Migration
  def change
    remove_column :contacts, :pager_phone
  end
end

And the beauty of PostgreSQL in this context is that it automatically provides safety: if something had gone wrong with that second migration, the database would have remained in a consistent state! Ruby on Rails is not the only framework with migrations. Most of the major development toolchains have these now, including Django and others.
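
A contrived way to see that safety for yourself is a hypothetical migration written only to fail partway through. On PostgreSQL, Rails wraps each migration in a transaction, so the add_column below is rolled back along with everything else; on MySQL the new column would already be committed by the time the error surfaced.

# frozen_string_literal: true

# Hypothetical migration that fails on purpose, to illustrate transactional DDL.
class AddNicknameToContacts < ActiveRecord::Migration
  def change
    add_column :contacts, :nickname, :string, limit: 20
    # Simulate something going wrong after the schema change has run.
    # On PostgreSQL the whole migration rolls back: no new column, no schema_migrations row.
    raise "simulated failure"
  end
end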

During the next decade, the 2010s, I built more and more applications using PostgreSQL. I also grew to really appreciate its deep feature set for indexes, native JSON support, and more. PostgreSQL is a free and open source database and markets itself as “The World’s Most Advanced Open Source Relational Database.” This slogan is consistent with my experience working with it.
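
As one example of those deeper features, a migration along these lines (the preferences column is a made-up example, not from my original program) stores schemaless JSON right alongside the relational data and puts a GIN index on it so the documents stay queryable:

# frozen_string_literal: true

# Hypothetical example: a jsonb column with a GIN index on the contacts table.
class AddPreferencesToContacts < ActiveRecord::Migration
  def change
    add_column :contacts, :preferences, :jsonb, null: false, default: {}
    # The GIN index supports containment queries such as: preferences @> '{"newsletter": true}'
    add_index :contacts, :preferences, using: :gin
  end
end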

The 2010s were also the decade that saw the rise in popularity of various NoSQL databases like MongoDB and Google Datastore. These databases have their uses. Part of the promise was that one need not design the database schema up front; the application can simply define the data it needs and then store and retrieve it efficiently. However, depending on the application there was still the business need to query and report on the data. While there certainly are cases where this type of database makes sense, it has been my professional experience that even somewhat messily designed SQL databases are easier to query for reporting purposes. A database certainly need not be in Boyce-Codd normal form to be queried.

One of our clients faced significant challenges with their Google Datastore-based application, necessitating a batch process to convert data to MySQL for compliance reporting—a considerable inconvenience. The eventual deprecation of their technology stack prompted us to migrate their database to PostgreSQL. This transition not only accommodated all their data but also facilitated efficient reporting through the use of SQL views, showcasing PostgreSQL’s versatility.

As of 2024, my decades of programming experience have solidified my preference for PostgreSQL. Its capacity to manage both large datasets and JSON data, coupled with its powerful search capabilities, both indexed and full-text, makes it an exceptional tool. It is open source, cost-effective, and equipped to handle an extensive array of requirements. Developers would be wise to delve into their database’s features and leverage them to the fullest. I’ve observed many applications that underutilize their databases, treating them as mere storage units or hastily bolting on auxiliary search tools, when most business applications require only a single, well-utilized database.

With modern Docker-based development, it has never been easier to develop and test with the same database that you run in production. By adding the following to my docker-compose.yml file:

# Docker configuration for PostgreSQL
database:
  image: postgres:13.1  # Specify the PostgreSQL version
  volumes:
    - postgres:/var/lib/postgresql/data  # Persist database data
  environment:
    - POSTGRES_PASSWORD=devonlypassword-NOPRODUCTION  # Development-only password
    - POSTGRES_DB=greatappdatabase  # Database name
  ports:
    - "5432:5432"  # Default PostgreSQL port


I ensure that my application under development has a specific version of PostgreSQL (in this case 13.1) available. This approach guarantees that all my programming is tested against the real database, and the automated tests can run against it too. This strategy effectively eliminates the problems caused by a dev environment that is insufficiently like production.
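
From the application side, connecting to that container is just a matter of matching those values. Here is a minimal sketch with ActiveRecord, assuming the compose service above is running locally and the pg gem is installed (the connection details come from the compose file; everything else is mine):

require "active_record"  # from the activerecord gem

ActiveRecord::Base.establish_connection(
  adapter:  "postgresql",
  host:     "localhost",                     # the published 5432 port
  port:     5432,
  username: "postgres",                      # default user created by the postgres image
  password: "devonlypassword-NOPRODUCTION",  # matches POSTGRES_PASSWORD above
  database: "greatappdatabase"               # matches POSTGRES_DB above
)

# Prove we are talking to the real thing.
puts ActiveRecord::Base.connection.select_value("SELECT version()")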

One final aside: I do have some academic experience with databases as well. My undergraduate student research on detecting SQL injection attacks became my first publication with the Association for Computing Machinery (ACM). Then, after graduating from Georgia Tech, I contributed to the security chapter of the 7th edition of the Fundamentals of Database Systems textbook authored by my professor Sham Navathe and published by Addison-Wesley.

In closing, I urge developers to persistently enhance their SQL skills and ensure their database schema is managed by a reliable migration system integrated within the application’s source repository. This approach not only maintains consistency but also fortifies the application’s integrity over time.