TNG: Anatomy of a GEDCOM
If you have ever opened a GEDCOM file in your word processor, you have probably been faced with a seeming jumble of numbers, abbreviations, and bits and pieces of data. There are no blank lines and no indentation in a GEDCOM file. That is because GEDCOM is a specification for exchanging information from one computer to another, and the files were never really intended to be read as text.

GEDCOM files basically take your family information and put it in an outline format. There are no photographs, audio or video files in a GEDCOM. The GEDCOM file is as basic as you can get and is the lowest common denominator for exchanging family tree data between programs. It is minimalist, but it works.

The rest of this article might be boring or confusing, or both.  In either case, feel free to skip past and just go to the next article.

A GEDCOM file is a plain-text file, without any fonts, formatting, tables or illustrations. It is as basic as you are going to get, which also means that it should be readable by just about any software tool: text editors, word processors, and, importantly, family tree applications.

If you open one up, you will see a column of digits on the left, followed by a space, then a three- or four-character code (the descriptive tag), another space and then information (data). Each row is called a ‘data element’ and holds one bit of information on an ancestor or family. Sequential rows form a record on an individual or family, comprising name, birth date, birth location and so on. The starting row for each record has an index reference for that person (I1, I2, I3 and so on) or for that family (F1, F2, F3 and so on), placed between the digit and the tag. The index reference uses the at sign ‘@’ at each end as ‘brackets’ for identification, as in @I1@.
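To make that row layout concrete, here is a minimal Python sketch that splits one row into its parts. The names `GedcomLine` and `parse_line` are my own inventions for illustration, and the sample rows are made up rather than taken from a real tree. It assumes well-formed rows and is not a full GEDCOM reader.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # the digit(s) at the start of the row
    xref: Optional[str]   # index reference such as "@I1@", only on a record's starting row
    tag: str              # descriptive tag such as NAME or BIRT
    value: str            # the data, or "" when the row carries none


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM row into level, optional @...@ reference, tag and data."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@") and parts[1].endswith("@"):
        # Starting row of a record: the index reference sits before the tag.
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        return GedcomLine(level, parts[1], rest[0], rest[1] if len(rest) > 1 else "")
    # Ordinary row: level, tag, then whatever data follows.
    return GedcomLine(level, None, parts[1], parts[2] if len(parts) > 2 else "")


# Hypothetical sample rows, invented for this example.
for row in ["0 @I1@ INDI", "1 NAME Jane /Example/", "1 BIRT", "2 DATE 1 JAN 1900"]:
    print(parse_line(row))
```

Splitting on the first two spaces only is deliberate: the data part may itself contain spaces, as a name or a date does.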

Records in a GEDCOM file are arranged in groups of lines that hold information about one individual (INDI) or one family (FAM), and each line in a record has a level number. The first line of every record is numbered zero (0) to show that it is the beginning of a new record. Within that record, a line with a higher level number is a subdivision of the nearest line above it that has the next lower number. For example, the birth of an individual may be given level number one (1), and further information about the birth (date, place, etc.) would be given level two (2).

After the level number, you will see a descriptive tag, which refers to the type of data contained in that line. Most tags are obvious, such as BIRT for birth and PLAC for place, but some are a little more obscure, such as BARM for Bar Mitzvah.
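The level numbers are what turn a flat list of rows into an outline. As a rough sketch of the idea, the Python below nests each row under the nearest earlier row with a lower level. The function names `build_tree` and `show` and the sample record are assumptions made up for this illustration, and the code expects a well-formed file that starts with a level-0 row.

```python
# Hypothetical sample record, invented for this example.
SAMPLE = """\
0 @I1@ INDI
1 NAME Jane /Example/
1 BIRT
2 DATE 1 JAN 1900
2 PLAC Springfield
"""


def build_tree(text):
    """Nest GEDCOM rows by level number; returns the list of level-0 records."""
    records = []
    stack = []  # stack[i] is the most recent row seen at level i
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level_str, _, rest = raw.strip().partition(" ")
        level = int(level_str)
        node = {"line": rest, "children": []}
        del stack[level:]  # forget rows at this level or deeper
        if level == 0:
            records.append(node)  # a zero starts a new record
        else:
            stack[-1]["children"].append(node)  # attach to the row one level up
        stack.append(node)
    return records


def show(node, depth=0):
    """Print a record as an indented outline."""
    print("  " * depth + node["line"])
    for child in node["children"]:
        show(child, depth + 1)


for record in build_tree(SAMPLE):
    show(record)
```

Printed this way, the DATE and PLAC rows appear indented under BIRT, which is the outline that the level numbers encode.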

The version of the GEDCOM standard covered here is Release 5.5, which remains widely supported, and the PDF document on GEDCOM 5.5 is available online. There you can find all of the descriptive tags and their meanings.

Here is a sample of GEDCOM file content, taken from my own Benedict family line.

[Image: GEDCOM 5.5 format]

The data on a line can also be a pointer, such as @I5@, which indicates a related individual, family or source within the same GEDCOM file. For example, a family record (FAM) will contain pointers to the individual records (INDI) for the husband, wife and children.

As you can see, a GEDCOM is basically a connected database of records with pointers which keep all of the relationships straight.
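To show how the pointers tie records together, here is a small Python sketch that indexes records by their @...@ reference and then follows the pointers in a family record to the names of its members. The helper names (`index_records`, `name_of`, `describe_family`), the `_RECORD` marker and the sample family are all made up for this example. It assumes a tidy file and ignores many details of the standard.

```python
# Hypothetical sample data, invented for this example.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Example/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Sample/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Ann /Example/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
"""


def index_records(text):
    """Map each index reference (e.g. '@I1@') to its list of (tag, value) rows."""
    records = {}
    current = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            if parts[1].startswith("@") and len(parts) == 3:
                # '0 @ID@ TAG' starts a new record; remember its type.
                current = records.setdefault(parts[1], [("_RECORD", parts[2])])
            else:
                current = None  # level-0 rows without a reference are skipped
        elif current is not None:
            current.append((parts[1], parts[2] if len(parts) > 2 else ""))
    return records


def name_of(records, pointer):
    """Follow a pointer to an individual record and return its NAME value."""
    for tag, value in records.get(pointer, []):
        if tag == "NAME":
            return value
    return "(unknown)"


def describe_family(records, fam_id):
    """List (role, name) pairs for the members a family record points to."""
    roles = {"HUSB": "husband", "WIFE": "wife", "CHIL": "child"}
    return [(roles[tag], name_of(records, value))
            for tag, value in records[fam_id] if tag in roles]


records = index_records(SAMPLE)
for role, name in describe_family(records, "@F1@"):
    print(role, "->", name)
```

Note that the individuals point back at the family too (FAMS for a spouse, FAMC for a child), so the relationship can be followed from either end.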

While you should now be able to decipher a GEDCOM with a text editor, you will still find it much easier to read with the appropriate software. There are freeware (no licence fee) utilities that can show you the contents of a GEDCOM file and even validate it against the standard.

Are there other data format standards for family tree information? Sure, but none as omnipresent as GEDCOM 5.5. For example, FamilySearch, a service of the LDS Church, attempted to introduce GEDCOM X as the next generation of GEDCOM in 2012. It was supposed to fill the gaps in the aging GEDCOM 5.5, but never gathered enough support to go mainstream. Another is Gramps XML, which is built on XML, an industry standard for data interchange. XML is thorough and detailed in design, but its files tend to be bloated compared to the 5.5 format, by a factor of 5 to 10.

[Image: GEDCOM JSON format]

Another potential upcoming standard is GEDCOM X JSON. JSON (pronounced “Jason”) is very popular with web developers for moving data between remote servers on the internet and the web browser on your computer. JSON is compact and easily read by all browsers.

Do you need to know all of this to work with GEDCOM?

No, but you might have been curious about it.
