Principles of Modeling: the Reproducibility Principle

A year or so ago, I watched a few episodes of a Dutch television program with an interesting format. The name of the series was (or is, I have no idea if it still runs) “Sterren op het doek” (“Stars on Canvas”). Every episode featured a Dutch celebrity, three painters, and an interviewer. For the program, the three painters each painted a picture of the celebrity, who was interviewed while posing – and because posing takes a long time, the interviews were typically quite deep and had enough material for an interesting first 15 minutes of the program. After that, the painters were given two weeks to finish their paintings. And at the end of the program, the painters each revealed their painting to the celebrity, who then got to pick one to keep; the remaining two portraits were auctioned off and the proceeds were given to charity.

What made this program intriguing was the enormous differences between the finished paintings. Not only did each painter have his or her own unique style, they also all chose to capture and emphasize different aspects of the physical appearance and different personality traits of the celebrity. They all made portraits that captured the celebrity really well, and yet they were as different as one could imagine. Even though the painters all received the same “input” (watching the celebrity pose and listening to the interview), they produced radically different “output” (paintings), while still all following the same task. It was also interesting to discuss the results with my wife. We often disagreed on which painting we liked best, and the celebrity would then sometimes pick the one we both disliked! Again, three people who, with the same input, generated vastly different output. Because when it comes to art, picking the “best” is a matter of taste.

But what’s good in art, and in this television format, is not necessarily good elsewhere. When I fly somewhere, I don’t want the result of the pre-flight security check to depend on the engineer assigned to that task. For a given “input” (the current technical state of the plane), I want the “output” (‘clear to go’ or ‘needs repairs’) to be the same, regardless of the engineer who inspects the plane. And luckily, the airline companies are aware of this, so their safety inspection engineers use checklists that list exactly what details of the engine have to be inspected, and what value ranges are or are not acceptable.

Data models, like paintings, often end up as decoration on office walls. But they serve a very different purpose. For a painting, hanging on the wall is the end goal: pleasing the eye of all beholders. For a data model, being stuck to the wall is just one step in a longer process toward the end product: a working database application that serves some business need. If a painting fails to please the beholder, that is simply a difference in taste. If a data model fails to serve the business needs, a mistake has been made – and mistakes like this can cost companies large sums of money, or, in some cases, even cost lives.

I want data models to be like the safety inspection on planes: dependent on the input only, not on the person who carries out the task. That brings me to the third principle of modeling, after the Jargon Principle and the Concreteness Principle: The Reproducibility Principle.

The Reproducibility Principle

“The analyst shall perform each of his or her tasks in such a way that repeating the same task with the same input shall invariably yield the same result.”

Note that the wording of the Reproducibility Principle (which, again, is put in my own words because I don’t recall the original wording, only the meaning) does not state who repeats the task, and that is on purpose. This principle should still hold if someone else repeats the same task; the result of an information analysis should not depend on who does the analysis.

Easier said than done?

While writing this, I can almost hear the angry outcry from all over the world: “Yeah, right! I can work in a way that almost always ensures consistent results, but some of my co-workers are blithering idiots who simply don’t get it – how can you seriously expect them to produce the same quality analysis that I make?” That’s a fair point, and maybe a weak spot in the wording of the Reproducibility Principle – but not a weak spot in the principle itself! Hang on, I’m sure it will become clearer after a few paragraphs.

But even with coworkers whose experience and knowledge match your own, you’ll still often have different opinions about a data model. I experienced this once, many years ago, when I attended a class on data modeling. The students had to draw a data model for a given scenario and then present it in class. To my surprise, several models that were very different from each other were all praised by the teacher. That just didn’t make sense to me. How can two models that don’t represent the same information both be correct? Clearly, the teacher of that class did not know or care about the Reproducibility Principle! If that is how information modeling is taught, it’s impossible to expect modelers to suddenly all start producing the same results for a given input.

Does this mean that the Reproducibility Principle is good in theory but impossible to achieve in practice for information and process analysis? No! It just means that the currently accepted methods of doing information and process analysis are not equipped for this principle. They lack recipes. Not recipes for pie, as you’d find in a cookbook, but recipes for the steps in making a model. These modeling recipes should be just like cookbook recipes, in that they tell you exactly what ingredients you need, and how and in what order to process and combine those ingredients to get the required result: a delicious apple pie, or a completely correct data or process model.

The only way to implement the Reproducibility Principle is to have strict recipes that govern each and every conclusion an analyst ever infers from the information given to him or her by the subject matter expert. With recipes, I know for sure that if I give the same information to my co-worker, the result will either be the same, or it will be different – and in the latter case, we can sit together and go over the steps in the recipe until we see where one of us made a mistake.

And those blithering idiots who just don’t get it? Having recipes will not suddenly turn them into superstar modelers. They’ll still be idiots; they’ll still not get it. But with recipes, we no longer have to pit their bad insights against our good insights and still end up disagreeing. Now we can simply point out exactly where they failed to apply the recipe correctly, and even our bosses will agree. The blithering idiot coworkers will still be a royal pain, but fighting them becomes easier if your company agrees to implement the Reproducibility Principle by supplying recipes for everything you have to do in your role as information or process analyst.

Putting my money where my mouth is

Look at me, all blathering in abstract terms about how information modeling does not follow the Reproducibility Principle – and all this time, I myself am not applying the Concreteness Principle that I advertised in an earlier blog post. Time to change that!

First, I’d like to start on the positive side. The job of creating a data model is not entirely dependent on the experience and insight of the analyst; some of the steps involved actually have very good recipes. Normalizing a data model is a fine example. For a given set of attributes (columns) and functional dependencies, the steps are clearly described. Simply apply them one by one, and the end result is always a nice, normalized data model. And if you do make a mistake somewhere (which does happen; I never said that the steps are easy, only that they are well prescribed!), your teacher (if you are still in class) or a coworker can tell you exactly where you went wrong.
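To make that a bit more concrete (the Concreteness Principle again), here is a minimal sketch in Python of the mechanical core that such a normalization recipe rests on: computing the closure of an attribute set under a list of functional dependencies, and using that closure to find dependencies that violate Boyce-Codd normal form. The relation and dependencies are invented for illustration, and this is only a demonstration that the step is purely mechanical, not a complete decomposition algorithm.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of
    functional dependencies, each given as a (determinant, dependent) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependent in fds:
            # If the determinant is already in the closure, the dependent must be too.
            if set(determinant) <= result and not set(dependent) <= result:
                result |= set(dependent)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Yield every dependency whose determinant does not determine the whole relation."""
    for determinant, dependent in fds:
        if not set(relation) <= closure(determinant, fds):
            yield determinant, dependent

# Hypothetical example: one wide 'order lines' relation.
relation = {"order_id", "product_id", "quantity", "customer_id", "customer_city"}
fds = [
    ({"order_id", "product_id"}, {"quantity"}),
    ({"order_id"}, {"customer_id"}),
    ({"customer_id"}, {"customer_city"}),
]

for det, dep in bcnf_violations(relation, fds):
    print(f"{det} -> {dep} violates BCNF")
```

Run it and it flags the order_id and customer_id dependencies – exactly the ones a decomposition recipe would tell you to split out into their own tables. Two analysts applying this procedure to the same input cannot end up with different answers.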

But where does that list of attributes and functional dependencies come from? Now we get into the realm of the bad and the ugly – none of the mainstream analysis methods have strict recipes for this. If I ask how to find the attributes for my model, the answer is always a variation of “talk to the subject matter expert”. Yeah, sure, I get that. But what questions do I ask? How do I ensure that no important attributes are missed in our conversation? And how do I separate the wheat from the chaff, so that I don’t waste time on attributes that are not relevant at all? The only answer I have ever gotten to those questions is “experience”. And that is not an answer that I, as a fan of the Reproducibility Principle, like.

For functional dependencies, there is a similar problem. Given a list of columns, how do I find the dependencies? Again, there is no answer that satisfies me. Most people will tell me that these dependencies are “obvious”, that you “just see” them. And indeed, in 99.9% of all cases, people will agree on the dependencies. It’s not those 99.9% that I’m concerned about; my concern is the remaining 0.1% – those few cases where the functional dependencies are not obvious are where errors are introduced. Errors that are sometimes caught quickly, costing only a few man-months of work. Or sometimes not caught until it’s too late, bringing down entire companies – or, worse, killing people! – as a result.
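One way to drag those non-obvious cases into the open is to combine the Concreteness Principle with a trivial mechanical check: ask the domain expert for concrete example rows and test the claimed dependency against them. The sketch below is hypothetical (the column names and rows are invented), and a passing check proves nothing about rows you have not seen; but a failing check hands you an exact counterexample to put in front of the expert.

```python
def fd_counterexample(rows, determinant, dependent):
    """Return a pair of example rows that contradict the functional dependency
    determinant -> dependent, or None if the examples contain no contradiction."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        if key in seen and seen[key][0] != value:
            # Same determinant values, different dependent values: the FD does not hold.
            return seen[key][1], row
        seen[key] = (value, row)
    return None

# Hypothetical example rows supplied by the domain expert.
rows = [
    {"employee": "Smith", "department": "Sales",   "office": "London"},
    {"employee": "Jones", "department": "Sales",   "office": "Leeds"},
    {"employee": "Brown", "department": "Finance", "office": "London"},
]

# Does department -> office hold? The first two rows say no.
conflict = fd_counterexample(rows, ["department"], ["office"])
if conflict:
    print("Counterexample found:", conflict)
```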

Bottom line

After reading the above, you’ll understand that I only consider a modeling method good if it includes strict “recipe” procedures. And thinking back to my previous posts on the principles of modeling, you’ll also understand that I think those procedures should not only tell you what questions to ask the domain expert, but also tell you to use concrete examples in the jargon of the domain expert when asking them – and how to do just that.

Unfortunately, most design methods fail to embrace these principles. The only method I ever encountered that does fully embrace all these principles is NIAM – a method that, as far as I know, has only been documented in a Dutch book that has never been translated. (There are some more or less closely related methods that are documented better, but they all either lack full support for the Modeling Principles, or focus on the wrong details). Uptake of this method in the Netherlands is minimal, and virtually non-existent in the rest of the world.

However, that has never stopped me from using it, extending it, and improving it. With all the changes I have made, the method I now use can be said to have evolved from NIAM into a personal (though obviously still NIAM-derived) method. And I am proud to announce that this personal method will be the subject of a seminar that I will be giving on March 29 in London, on the first day of SQLBits X – a conference that everyone who can attend, should. Thanks, SQLBits, for giving me a platform where I can share my experience with this modeling method with others.


3 Comments

  • Dm Unseen AKA M. Evers
    December 23, 2011 11:30

    Hi Hugo,

Which specific version of NIAM do you refer to? And what do you like specifically about this NIAM version (binary, n-ary, nested, verbalisation)? What are the deficits of current NIAM derivatives like ORM, FCO-IM and CogNIAM?

  • Hugo Kornelis
    December 23, 2011 13:45

    Hi Martijn,

    I refer to the version Dr. Ir. Nijssen originally published in his book "Universele Informatiekunde". I don’t know this version by any other name than just "NIAM". This version has its shortcomings and deficiencies (which is why I improved upon it).

    The current derivatives you mention all have their strengths, but I like none of them as much as my own method, for different reasons:

ORM – from reading Terry Halpin’s original book, it’s clear that he does not embrace the Principles of Modeling. He does give a procedure with the steps to make a model, but it doesn’t tell you what concrete examples to present to the domain expert to find, for instance, the uniqueness constraints.
FCO-IM – this method adds lots of extra information to the model that is only required to enable an exact reconstruction of the original verbalization of the atomic facts. I fail to see the importance of this, as verbalized atomic facts are just one of many steps between the original representation (concrete examples in the domain expert’s jargon) and the end result (the conceptual model). I don’t think spending extra time to enable exact reconstruction of a representation that the end users will not normally use is a good use of my time.
CogNIAM – this method is, as far as I know, published even less than NIAM. I haven’t seen the latest version; the last time I spoke with Nijssen (a few years back now), he was considering moving back from grouped notation to elementary notation. That would be an improvement, since recipes for finding constraints are easier to make on elementary fact types. In the few publications I have seen on CogNIAM (and its predecessor, Kenniskunde), I also noticed that Nijssen no longer gives recipes for the various steps, which I consider a shortcoming.

This all being said – the differences between these derivatives can be divided into two categories: differences in the procedure (the recipes), and differences in the representation. But it’s easy to apply the procedures of one of these derivatives and use the representation of another. My seminar at SQLBits will focus completely on the procedure and the recipes. I use NIAM’s IGD diagrams to represent the results, but you can easily use the same procedure and recipes with ORM diagrams, FCO-IM diagrams, CogNIAM diagrams, or even (if you are a bit masochistic) SBVR Structured English.

  • Dm Unseen AKA M. Evers
    December 23, 2011 14:46

    Hi Hugo,

Thanks for the info. I’m using FCO-IM mostly. While I’m not an expert on verbalisation or all the procedures, I do know partially why FCO-IM works the way it does, and that it is biased towards easy transformation and verbalisation (also after transformation!). I heavily rely on the transformation part of FCO-IM to produce all kinds of physical schemas (not just normal forms). I’ve never investigated pure NIAM because of this. Too bad I won’t be at SQLBits X (I attended both the 2011 spring & fall editions).

Also, I see that Universele Informatiekunde is out of print. However, there is a free course book and a free (for students) CASE tool for FCO-IM, and both are available in English. (There are even courses available in NL.) Note that I currently use FCO-IM with DWH/BI projects and very little with OLTP stuff.

