Saturday, August 31, 2013

Book Review: Information Dashboard Design by Stephen Few

The guru of dashboard design, Stephen Few, has finally released the much-anticipated second edition of his book "Information Dashboard Design: Displaying Data for At-a-Glance Monitoring". The book is quite valuable from my perspective, as he lays out the design principles for good dashboards.

What is a dashboard?

While most people are familiar with Stephen Few's pioneering work on bullet graphs and other visualizations, I think an equally important contribution is his definition of a dashboard. He defines a dashboard as follows:

"A dashboard is a visual display of the most important information needed to achieve one or more objectives that has been consolidated on a single screen so it can be monitored at a glance."

In the book, he goes on to describe common pitfalls, some background theory on how the human brain perceives information, and how to run a dashboard design project. An equally important concept for a novice like me was the data-ink ratio, and I can see how some of my previous efforts could have been improved substantially.

There are many important sections in the book on how to start and optimize a dashboard. Coupled with Ralph Kimball's suggestions on the design of drill-downs, I think it provides excellent design principles and patterns for how to think about dashboards.

In summary, I think all the information given is extremely valuable, and this book will be sitting on my shelves for a long time.

Tuesday, July 30, 2013

Combating Food Waste

I read recently that most households throw away anywhere from 15 to 30% of their food, and almost 50% of salad leaves.

These are certainly jaw-dropping statistics, and I wonder if we can figure out a way to reduce the waste. For those of us who like to cook, maybe there is a way to keep track of food that is about to spoil and incorporate those ingredients into our daily cooking.

Equally important could be uncooked food nearing expiry that we donate to a food drive and that can be picked up from our kerbs. Already-cooked food could be picked up from our homes as well, with its hours to expiry listed. Volunteer organizations could collect it and provide meals to hungry folks. There of course needs to be a goodwill system that recognizes dependable and honest households versus people who try to pass off already spoilt food into these drives. A recycling fee could be paid to households that have been consistently honest about the quality and condition of their food.


Sunday, July 28, 2013

Understanding the complexity in systems using simplified mechanisms

As the systems we design and build become more feature rich, the underlying logic grows bigger and bigger, and as a result the complexity of these systems also increases. The traditional approach of continually adding features to products has left us with systems whose complexity is understood by very few people, and sometimes by nobody at all.

This causes trouble not only in testing these large systems; more often than not, even the architects and developers are at a loss to understand which pieces of the system interact with one another.

In open source systems, this complexity is somehow managed by adding more eyeballs to the code base and through constant refactoring by self-motivated, and often unpaid, developers. In commercially developed software, however, the complexity can kill the product or substantially shorten its life.

Luckily, there are techniques that can help everyone understand and tackle the underlying complexity. One of these is the Design Structure Matrix (DSM), where all components of a system are placed on the rows and columns of an N x N matrix and the interactions between these components are mapped in the resulting grid. Optimization techniques then involve pulling interacting elements spatially closer to one another on the grid, so that they can be understood as one subsystem rather than random components interacting with one another.
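To make the idea concrete, here is a minimal sketch in Python of building such a matrix and naively reordering it; the component names and dependencies are invented for illustration, and real DSM tools use proper partitioning and clustering algorithms.

    # Minimal sketch of a Design Structure Matrix (DSM): components on both axes,
    # a 1 at [i][j] meaning component i depends on (interacts with) component j.
    # Component names are illustrative, not from any real system.
    components = ["UI", "Auth", "Billing", "Reports", "Scheduler"]
    deps = {
        "UI": {"Auth", "Reports"},
        "Billing": {"Auth", "Scheduler"},
        "Reports": {"Billing"},
        "Scheduler": set(),
        "Auth": set(),
    }

    n = len(components)
    dsm = [[1 if components[j] in deps[components[i]] else 0 for j in range(n)]
           for i in range(n)]

    # A very naive reordering pass: place components with many interactions next
    # to each other, which is only meant to show the intent of DSM clustering.
    order = sorted(range(n), key=lambda i: -sum(dsm[i]) - sum(row[i] for row in dsm))

    print("      " + "  ".join(components[i][:4].ljust(4) for i in order))
    for i in order:
        print(components[i][:4].ljust(6) + "  ".join(str(dsm[i][j]).ljust(4) for j in order))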

Let's hope these techniques become more accessible in the days to come, so that we don't end up in a new Dark Age of automation where nobody really understands why and how our software and hardware components interact with one another.


Thursday, July 18, 2013

Principles of Dashboard Design

I was thinking about the ideal strategy for designing dashboards, until I came across a couple of blog entries by the master himself, Ralph Kimball. In these posts, titled "Drill Down to Ask Why" (first and second), he gives the mantras of good dashboard design.

Here they are, in the order explained by the Master (and mentioned to him by his colleague):

1. Publish reports. Provide standard operational and managerial “report cards” on the current state of a business.

2. Identify exceptions. Reveal the exceptional performance situations to focus attention.

3. Determine causal factors. Seek to understand the “why” or root causes behind the identified exceptions.

4. Model alternatives. Provide a backdrop to evaluate different decision alternatives.

5. Track actions. Evaluate the effectiveness of the recommended actions and feed the decisions back to both the operational systems and DW, against which stage one reporting will be conducted, thereby closing the loop.

More on Asking Why?
Taking the example of an airfare planner looking for the reasons behind poor yield numbers in their data, Ralph provides the following illustrations:

1. Give me more detail. Run the same yield report, but break down the high-level routes by dates, time of day, aircraft type, fare class and other attributes of the original yield calculation.

2. Give me a comparison. Run the same yield report, but this time compare to a previous time period or to competitive yield data if it is available.

3. Let me search for other factors. Jump to non-yield databases, such as a weather database, a holiday/special events database, a marketing promotions database or a competitive pricing database, to see if any of these exogenous factors could have played a role.

4. Tell me what explains the variance. Perform a data mining analysis, perhaps using decision trees, examining hundreds of marketplace conditions to see which of these conditions correlates most strongly with the drop in yield (explaining the variance in data mining terminology).

5. Search the Web for information about the problem. Google or Yahoo! the Web for “airline yield 2008 versus 2007.”

I think the above provides a great structure for a real enterprise level executive dashboard.
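As a small illustration of the first two drill-down steps ("give me more detail" and "give me a comparison"), here is a sketch in Python using pandas. The column names (route, fare_class, flight_date, yield_pct) are hypothetical stand-ins for whatever the yield data mart actually exposes.

    # Rough sketch of drill-down steps 1 and 2 using pandas; data is made up.
    import pandas as pd

    df = pd.DataFrame({
        "route":       ["JFK-LAX", "JFK-LAX", "ORD-SFO", "ORD-SFO"],
        "fare_class":  ["Y", "J", "Y", "J"],
        "flight_date": pd.to_datetime(["2008-06-01", "2008-06-01", "2007-06-01", "2007-06-01"]),
        "yield_pct":   [61.0, 74.5, 58.2, 70.1],
    })

    # High-level report: yield by route.
    summary = df.groupby("route")["yield_pct"].mean()

    # "Give me more detail": the same report, broken down by additional attributes.
    detail = df.groupby(["route", "fare_class"])["yield_pct"].mean()

    # "Give me a comparison": the same report, this year versus the prior year.
    df["year"] = df["flight_date"].dt.year
    comparison = df.pivot_table(index="route", columns="year", values="yield_pct", aggfunc="mean")

    print(summary, detail, comparison, sep="\n\n")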

Wednesday, May 15, 2013

Multi-tenant architectures in on-premise settings

When discussing cloud computing, one often hears the term "Multi-Tenant Architecture". The term is seen as the standard by which one should judge whether a product is truly a cloud solution or simply software hosted in a remote data center. The expression implies the ability of an enterprise application to be used effectively by multiple enterprises at the same time, without any of them being aware of the others.

As mentioned, multi-tenant architectures are supposedly one of the underpinnings of modern SaaS (Software as a Service) or cloud architectures. However, multi-tenancy is increasingly being asked for in many on-premise deployments.

The reasons for this ask range from the convincing to the absurd. The trend is particularly noticeable in organizations where:

1. After an era in which corporate IT had no say (often for good reason) in business departments running their own mini IT shops, IT has now become "a strategic expense". IT is trying to consolidate the various home-grown systems, but the needs have become so divergent that completely different functional and non-functional requirements are being sought.

2. Alternatively, IT departments in large organizations have bought multi-tenant software products without real business analysis. Here the promise was that client departments would use their own resources to understand the processes they want to automate and then implement such solutions in a self-service model.

3. Finally, there are organizations that have been sold multi-million dollar data centres with the promise of virtualization, without a real business case for it.

Regardless of the mechanism, the reality is that IT departments are looking for applications that can be rolled out in this self-service way.

For our purposes, on-premise multi-tenant architectures are desired wherever there is a self-service software delivery model: multiple groups, departments or even organizations use a shared environment and would like to sandbox their interactions and system configurations to their own usage scenarios.

Multi-tenant, as the name implies, means multiple distinct uses. A good way to validate whether an architecture is truly multi-tenant is of course to ask the question: "How many user groups are running distinct configurations, their own databases and their own configuration policies, and yet are on the latest release of the software?"

Achieving multi-tenancy is a whole different ball game altogether. The first step is what could be called virtualized multi-tenancy, where multiple environments and their associated user groups have their own configurations, but in sandboxed virtual environments.

The next level is to allow multiple deployments within the same environment, completely isolated from one another through dedicated instances of the runtime environment.

True multi-tenancy would imply that the same runtime process serves everyone while still allowing completely isolated operation. The way to achieve it is perhaps through configurations that can be altered and spawned at any time. Resource constraints could be set to prevent monopolization of resources by a single instance or configuration, and a software-based memory architecture would allow completely sandboxed operation. All this requires a lot of hard upfront planning, design and build work.

Given the above, I would think that a functionally complete multi-tenant architecture requires the following (a rough sketch follows the list):

1. A Configuration driven administration model
2. An ability to spawn and kill configuration based instances that are completely sand-boxed from one another.
3. An ability to set resource constraints on usage
4. A configuration authoring tool
5. A set of policies that dictate what can and cannot be done
6. A starting persistence and object model that can be tweaked to a degree
7. A standardized higher-level SDK that can be developed against without having to access lower-level APIs
8. An ability to monitor and report on usage, and optionally bill for it
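Here is a toy sketch in Python of what points 1 to 3 (configuration-driven administration, sandboxed instances, resource constraints) might look like. Every name in it (TenantConfig, spawn, kill, the quota fields) is invented for illustration; a real platform would back this with process or VM isolation, metering and a policy engine.

    from dataclasses import dataclass, field

    @dataclass
    class TenantConfig:
        tenant_id: str
        db_schema: str                      # each tenant gets its own schema/database
        feature_flags: dict = field(default_factory=dict)
        max_memory_mb: int = 512            # resource constraint (point 3)
        max_requests_per_min: int = 1000

    class TenantRuntime:
        """A sandboxed runtime instance created from a configuration (point 2)."""
        def __init__(self, config: TenantConfig):
            self.config = config
            self.alive = True

        def handle(self, request: str) -> str:
            if not self.alive:
                raise RuntimeError("instance was killed")
            return f"[{self.config.tenant_id}/{self.config.db_schema}] handled {request}"

    registry: dict[str, TenantRuntime] = {}

    def spawn(config: TenantConfig) -> TenantRuntime:
        registry[config.tenant_id] = TenantRuntime(config)
        return registry[config.tenant_id]

    def kill(tenant_id: str) -> None:
        registry.pop(tenant_id).alive = False

    # Two tenants on the same release, with completely separate configurations.
    spawn(TenantConfig("acme", "acme_db", {"reports": True}))
    spawn(TenantConfig("globex", "globex_db", {"reports": False}, max_memory_mb=2048))
    print(registry["acme"].handle("GET /invoices"))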

Based on the above, some conventional platforms are evolving to support multi-tenant models. Others have a long way to go.

Wednesday, May 1, 2013

Design principles of Google dashboards

Google Analytics is one of the most heavily used BI dashboards on the planet. The design principles it uses are straightforward but provide a very rich user experience.

At the highest level, the dashboard is made up of metrics, dimensions and visualizations (widgets).

Google Analytics collects various metrics about visits to an enabled site. These include site usage (new visit, visit duration, bounce flag, pages visited), e-commerce goals that the user may have set up, and AdSense metrics such as revenue, CPM and CTR.

The dimensions are grouped under headings of Audience, Traffic Sources, Content and Conversions.

Visualizations are provided as widgets, which can be Standard or Real-time. Standard widgets include the metric value reported as a plain number, a Timeline that shows trends over time, a Geomap showing locations, and Table, Pie and Bar widgets.

Each widget is optionally predisposed towards a primary dimension: the Metric widget has no primary dimension, the Timeline uses the time hierarchy, the Geomap uses the location hierarchy, the Table takes a slew of dimensions and allows cross-tabulation between two metrics, the Pie uses a grouping dimension, and the Bar plots one metric grouped by a primary dimension and optionally pivoted by a second dimension.
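To make the metric/dimension/widget decomposition concrete, here is a tiny sketch in Python. The class and field names are mine for illustration only; they are not the Google Analytics API.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Widget:
        kind: str                                 # "metric", "timeline", "geomap", "table", "pie", "bar"
        metric: str                               # e.g. "visits", "bounce_rate", "adsense_revenue"
        primary_dimension: Optional[str] = None   # e.g. "date", "country", "traffic_source"
        pivot_dimension: Optional[str] = None     # only used by table/bar style widgets

    dashboard = [
        Widget("metric",   "visits"),                                    # a plain number
        Widget("timeline", "visits", primary_dimension="date"),          # trend over time
        Widget("geomap",   "visits", primary_dimension="country"),       # location hierarchy
        Widget("pie",      "visits", primary_dimension="traffic_source"),
        Widget("bar",      "adsense_revenue",
               primary_dimension="content_group", pivot_dimension="device"),
    ]

    for w in dashboard:
        print(w)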


Saturday, April 27, 2013

Measuring the Green Economy


Measuring how green an economy is, is perhaps one of the toughest aspects of understanding the development trends that will affect us in the future. Luckily, there are several sources this can be measured from. I recently read an OECD book called Eco-Innovation in Industry: Enabling Green Growth that provided some excellent thoughts on the subject. It suggested the following indicators as possible KPIs for the growth of the green economy.


Operating Performance Indicator (OPI) | Management Performance Indicator (MPI) | Environment Condition Indicator (ECI)
Raw material used per unit of product (kg/unit) | Environmental costs or budget ($/year) | Contaminant concentrations in ambient air (ug/m3)
Energy used annually per unit of product (MJ/1000/product) | Percentage of environmental targets achieved (%) | Frequency of photochemical smogs (per year)
Energy conserved (MJ) | Number of employees trained (% trained/to be trained) | Contaminant concentration in ground or surface water (mg/L)
Number of emergency events or unplanned shutdowns in a year | Number of audit findings | Change in groundwater level (m)
Hours of preventive maintenance (hours/year) | Number of audit findings addressed | Number of coliform bacteria per liter of potable water
Average fuel consumption of vehicle fleet (l/100km) | Time spent to correct audit findings (person-hours) | Contaminant concentration in surface soil (mg/kg)
Hazardous waste generated per unit of product (kg/unit) | Time spent to respond to environmental incidents (person-hours per year) | Area of contaminated land rehabilitation (hectares/year)
Emissions of specific pollutants to air (tonnes CO2/year) | Number of complaints from public or employees (per year) | Population of a specific species of animals within a defined area (per m2)
Wastewater discharged per unit of product (1000 litres/unit) | Number of suppliers contacted for environmental management (per year) | Number of hospital admissions for asthma during smog season (per year)
Hazardous waste eliminated by pollution prevention (kg/year) | Cost of pollution prevention projects ($/year) | Number of fish deaths in a specific watercourse (per year)
Number of days air emission limits were exceeded (days/year) | Number of management-level staff with specific environmental responsibilities | Employee blood lead levels (μg/100 ml)

In addition, the Economics and Statistics Administration of the US Department of Commerce did some interesting research on measuring the green economy. The report uses the North American Industry Classification System (NAICS) codes to identify green industry services and products. The NAICS codes selected by the study are included in the annexures within the report.

These can be used by government portals to understand trends in the green economy, and for developing performance metrics dashboards, either within an organization or for a local, state or federal government agency.






Thursday, April 18, 2013

Focusing on what matters: the portfolio approach

Recently I was thinking about which of my competing priorities I should focus on to make sure it matters in the end, and I was reminded of the BCG matrix that I had read about many years ago.

I wanted to think about whether I should focus on the standard stuff or give more priority to new ideas that perhaps have a better future. The BCG (Boston Consulting Group) matrix, as most people already know, is a tool meant to help organizations decide which products to fund. The matrix used to be popular, until people came to think its recommendations made no sense and suggested an incorrect strategy. As I found out, that is not quite so.

The matrix defines four project types based on the four cells of a 2x2 grid: growth potential on the vertical axis and market share potential on the horizontal axis. The top-left quadrant indicates initiatives with both high growth potential and high market share potential.

To explain, the quadrants, starting from the top left and proceeding anti-clockwise, are:

1. Stars (top left): projects or products with high growth potential and high market share (I would also add mind share within organizations).

2. Cash cows (bottom left): products with high market share but low future growth potential. Think of these as things that are successful today, but where new investment may not produce new growth.

3. Dogs (bottom right): projects and products that are frankly not going anywhere; also called cash traps.

4. Problem children or question marks (top right): finally, the category of things that have a lot of growth potential but are relatively nascent, with low market and mind share currently.

The traditional interpretation of this matrix, which was also echoed by its author, Bruce Henderson, in his original writing, was to kill the dogs and milk the cows to fund the stars.

Essentially, it implies taking the benefit (cash flows) from high market share, low growth initiatives to fund initiatives with high growth and high market share potential. After some time, people of course found this objectionable, as it meant diverting cash from low growth segments to high growth segments, almost a parasitic existence of one on the other. It was not sustainable unless the cash flows (read benefits) from the low growth segment were large enough to sustain the growth in the new segment.

However, this needs to be interpreted in the context of other, equally important concepts presented by the same author in his other writings. Taken on its own, the matrix may not explain his thinking fully.

First, the author said that market share, not margins, is the most important thing to focus on. The rationale, based on his analysis of many companies in the late sixties and seventies, was that margins improve as the entity gains experience in a given market and product. However, those margins are only sustained and improved as long as the company maintains market share; in the absence of market share, margins become impossible to defend. Put in terms of the matrix, the left half is where one should focus.

In a separate piece of writing, the author also said that building market share requires new investment to fund growth. In high growth segments, market share cannot be sustained unless continuous investment is made, so there is a propensity for stars to become question marks unless funding proportional to the growth of the market can be sustained.

On the other hand, projects deemed pets (no market share and no growth potential) should either be killed or invested in to such an extent that they become market leaders in terms of share. Bottom line, according to the author: entities need to maintain market share at all costs, irrespective of the growth potential of the market, and let margins take care of themselves.

These, in my humble opinion, are very important insights that apply equally to companies building product portfolios and to individuals deciding where to focus.

Saturday, April 13, 2013

What is Big Data?


Someone I know asked me what Big Data is and whether I could explain it in a way they could understand. This person understands traditional data architectures but does not deal with technology on a day-to-day basis; of late, they are more into strategy consulting and business development for large organizations.

The Big Problem

I explained that Big Data is the entire practice of handling large amounts of data that is growing every minute in a fashion never experienced before. I gave the example of smart meters being installed by electricity distribution companies in our homes. A typical large distribution company supplying power to around a million homes has a million meters sending status updates (consumption, availability, etc.) every 15 minutes. That adds up to (1M x 4) = 4 million new records an hour and (4M x 24) = 96 million data points a day. Multiply that over a year and you are looking at (365 x 96M) = 35.04 billion data points.

The above problem is still finite, since we can predict how much the data will grow over a given period. Look at social media, on the other hand, and we cannot even predict the rate at which the data will grow: a single event can trigger a thousand tweets or blog posts, and no one can figure out what they mean as an overall trend or sentiment.

Of course, eventually the question becomes: how do you make sense of this data? Most people are not even able to handle these datasets in traditional data architectures. To see why, we first need to understand why traditional database architectures do not scale; then I will describe how the new Big Data architectures resolve these problems.

Limitations of Traditional Database Architectures with Big Data

Out of date Indices and Query Plans

Traditional databases were designed and optimized for a certain size and growth rate for each entity; the science was called volumetrics. Based on the relative sizes of different entities, the variability in the data and the type of query to be performed, it was more efficient to execute one query using a strategy different from another (these strategies are called query plans). Database indices were then designed to return results really fast, based on the relative sizes of tables, the variability of the queried or joined attributes within each entity and, of course, the nature of the analysis. With Big Data, the data churns so fast that it is impossible to keep re-analyzing indices and coming up with different query plans for fast analysis.

Computational overheads in Minimizing Storage on Disk

Another problem is the storage of data. In normalized models, data is made up of primary entities, lookup tables and link tables. Typically, the data entry forms in these applications are designed so that, upon insert, the database receives coded values from the input forms. When this is not the case, the application has to fire multiple database queries to convert user inputs into the coded values held in lookup tables. These were strategies to minimize the storage of data on disk.

From a computational point of view, a record insert in the traditional case is made up of one insert plus N index-based queries on lookup tables. Where the data entry forms are populated from pick lists that the user chooses from, these become full table scans on the lookup tables. Ideally, an application can cache these values on startup; however, where dimensional models are involved there is the concept of slowly changing dimensions, in which the lookup tables themselves are being updated and the caches eventually need refreshing.

In Big Data scenarios, we are faced with two problems. First, when dealing with unstructured data, the concept of lookup tables is simply not possible. Second, for structured data, we still need to trade off the computational overhead of performing lookups on insertion against our ability to validate the lookup values, and to come up with a finite list of lookups in the first place. If lookups are something we want to apply to both structured and unstructured data, we need to introduce some control over when data is parsed, so that we can improve storage, reduce computational overhead before storage and improve our chances of efficient retrieval later.
To re-emphasize: the challenge is to keep storing the data efficiently, so that it uses minimal space on disk and, of course, remains available for analysis. The further challenge is how to do this when you cannot use traditional constructs like lookup tables for referential integrity and index-based searches.

Re-stating the problem

In a nutshell, Big Data is the entire practice around the storage, retrieval, query and analysis of large datasets that grow over time at a pace that makes traditional database architectures inefficient and obsolete.

The Big Data architecture

Storage and Retrieval using hashcode

The primary tactic is to look for approaches that allow a dataset to index itself, or at least become more efficient at handling itself. Programmers have long dealt with this problem: most programming languages have native data structures for handling multiple data elements in memory. These include arrays (structures in which we can store N elements per dimension), lists (single-dimensional arrays that grow as we add elements), maps (collections in which elements are accessed through a key rather than an index) and sets (collections containing only unique values) that grow and sometimes sort themselves. Several of these constructs, maps and sets in particular, rely on the generation of an integer value called a hashcode.

A hashcode is an integer value computed for each entry. Importantly, two values that are supposed to be equal must return the same hashcode. So if we say that the text "Orange" is the same as "orange" and "ORANGE", these should all return the same hashcode. Hashcodes help in comparing, ordering, sorting and indexing values inside hash-based data structures. Just as importantly, hashcode computation is lightweight and lends itself to many algorithmic implementations.
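As a minimal sketch of this idea in Python: if we decide that "Orange", "orange" and "ORANGE" are the same value, equality and the hashcode have to be defined consistently, and then any hash-based structure treats them as one entry.

    # A value type whose equality (and therefore hashcode) ignores case, so
    # "Orange", "orange" and "ORANGE" all land in the same bucket of a set/dict.
    class CaseInsensitiveText:
        def __init__(self, text: str):
            self.text = text

        def __eq__(self, other):
            return isinstance(other, CaseInsensitiveText) and self.text.lower() == other.text.lower()

        def __hash__(self):
            return hash(self.text.lower())

    values = {CaseInsensitiveText("Orange"), CaseInsensitiveText("orange"), CaseInsensitiveText("ORANGE")}
    print(len(values))                                                                    # 1
    print(hash(CaseInsensitiveText("Orange")) == hash(CaseInsensitiveText("ORANGE")))     # True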

Introducing Immutability

Another important benefit of hashcode-based data structures is that they promote the concept of immutability. Immutability essentially means that the system never discards or overwrites (mutates) a value. If the system encounters a certain value, it fills a position in memory with that value and never overwrites it. If you wrote a function that said let A = 99.17, let B = 0.83, and then computed A = A + B, an immutability-based architecture would not discard the old A of 99.17; it would keep it in memory. It would actually hold three values in memory, say X = 99.17, Y = 0.83 and Z = 100.0: at the beginning of your little function A is bound to X (99.17), and at the end A is re-bound to Z (100.0). The eventual advantage of such an architecture is that if your application encounters hundreds of millions of rows of data containing a field whose value ranges, for example, from Excellent to Poor, the actual memory used is determined not by the number of rows but by the number of distinct values (Excellent to Poor). Compare this to lookup tables in a traditional database and you can see the benefit.
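A small Python sketch of the storage idea described above (this is value interning, one way to realize the "store each distinct value once and only rebind references" behaviour; the function names are mine):

    # Every distinct value is stored once and shared; "updating" a variable just
    # rebinds it to another stored value rather than overwriting anything.
    pool: dict[str, str] = {}

    def intern_value(value: str) -> str:
        """Return the single shared copy of a value, storing it on first sight."""
        return pool.setdefault(value, value)

    # Imagine hundreds of millions of rows whose rating field only ranges over a few labels.
    ratings = ["Excellent", "Good", "Average", "Poor", "Excellent", "Poor"]
    stored = [intern_value(r) for r in ratings]

    print(len(stored))   # 6 references in the row store...
    print(len(pool))     # ...but only 4 distinct values actually held in memory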

Big Data architectures are primarily built on data structures that store simple values or key-value pairs based on hashcodes and immutability.

Computation and Analysis

Now, to perform computation and analysis on this new data paradigm, users of Big Data needed a new construct, especially so that computation over large data could leverage large compute clusters where traditional index-based models could not be rolled out. The answer was a Java-based framework that could receive a computation task and distribute it across a large-scale deployment, making use of the existing constructs of hash-based data structures. Apache Hadoop, inspired by Google's published MapReduce and GFS papers and developed in the open source community (with Yahoo! as an early major contributor), is perhaps the most important implementation that can take a problem, distribute it among a large set of processing nodes and collect the results in a way that makes sense. The programming model is called MapReduce: a problem is broken up into as many parallel computation tasks as the cluster allows and distributed over the cluster; once the individual results are computed, they are combined and reduced to generate the final result. It is important to note that MapReduce will only outperform the index-based architectures of yesteryear as long as the data is changing so fast that maintaining index-based data warehouses is not feasible.
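This is not Hadoop itself, but the shape of a MapReduce job can be sketched with plain Python: a map phase that emits key-value pairs, a shuffle that groups them by key, and a reduce phase that folds each group into a result. The meter IDs and readings below are made up.

    from collections import defaultdict

    readings = [
        ("meter-001", 1.2), ("meter-002", 0.7),
        ("meter-001", 1.5), ("meter-002", 0.9), ("meter-003", 2.1),
    ]

    # Map phase: emit (meter_id, kwh) pairs -- in Hadoop this runs in parallel on many nodes.
    mapped = [(meter_id, kwh) for meter_id, kwh in readings]

    # Shuffle phase: group all values for the same key together.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: fold each group into a single result (total consumption per meter).
    totals = {key: sum(values) for key, values in groups.items()}
    print(totals)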

The challenge in using MapReduce is that it is a programming API, and one needs to write programs to perform any sort of calculation.

Simplified Hadoop Programming Models

Apache Hive, contributed to Apache by Facebook, is data warehousing software built on top of Hadoop. What this means is that users can write SQL-like scripts to declare data structures and analyze data residing on distributed file systems over large clusters. Under the hood, Hive uses Hadoop and Hadoop-compatible distributed file systems.

And finally, Apache Pig, an ETL-like platform, uses a language called Pig Latin and has built-in loaders and operators that can read from multiple formats and perform Hadoop MapReduce computations using ETL-like constructs.

There are many more Hadoop frameworks, and new ones are coming up each day. I have perhaps only described the two that are the most popular.

Summary

To summarize, my take is that Big Data means architectures that allow the storage, retrieval, query and analysis of large volumes of rapidly changing data using large-scale distributed clusters. In the real world, there is only a limited (though growing) class of problems that can be solved with Big Data architectures, and we will still need traditional relational architectures for a long time to come.



Wednesday, March 27, 2013

Planting trees rather than crops

As I thought about the million things each of us has to worry about, I realized that many of the things I have initiated perhaps need a different approach to management. I need to move to a model where things can grow on their own without needing constant supervision.

In the short term, we tend to go about initiatives that we can initiate, cultivate, grow and harvest. The problem is that there are only so many things one can handle this way. This is akin to planting crops that we need to farm and tend to everyday. The bigger problem is that once harvested, you have to start all over again. This provides quicker gratification, but you are caught in an endless cycle of planting, growing and harvesting.

In comparison, planting trees is a different model. You plant the seed, water it and give it space to flourish. The payback is not immediate but once the tree has flourished, you can reap the rewards. More importantly, they live longer and provide benefits for a much longer time.

Now the task of figuring out, which trees to plant.........

Friday, March 22, 2013

Top 3 new demands for a professional

I have been thinking about the role of a professional, how it has changed from the past and how it will continue to change in the future. This is something I of course have to think about every now and then, based on how my own role fits into the equation.

I will state these in a reverse order from the least important to the most important.

#3 : Skill is still very important
In every profession, the key difference between an amateur and a pro was, and still is, skill. How well you do a job is still very important, perhaps more important than ever, because in today's world of transactional relationships, if you cannot do your job you will not get compensated.

#2: Attribution is important.
In today's copy-paste heaven, it is important that people know who did the work. Whether it is by having your name on your work or by making sure everyone knows through word of mouth, attribution is more important than ever before.

#1 Defining what needs to be done for yourself and for others.
This is perhaps the most important job. Everyone is struggling today to define their work. If you can do that not only for yourself, but also for others, that is leadership. People should be able to understand their own work based on your actions, thought processes and outputs.

Let me know what you think .....

Friday, March 15, 2013

Design Patterns Oversold

When Christopher Alexander wrote "A Pattern Language" in 1977, it was part of a multi-volume series that tried to understand the sophistication of design in everyday buildings. It was trying to capture something that had been learnt over centuries, if not millennia.

He could not have anticipated the passion with which the software engineering discipline embraced the idea and then took over the conversation. Today, software engineers have carried design patterns to the extreme.

Somewhere along the way the meaning or rather the semantics of the word was altered from something that had been observed, to something that is closer in meaning to a blueprint or reference architecture.

Rather than capturing the underlying structure and beauty of a solution perfected over time by many generations of designers, repeated without formal training, handed down from practitioner to practitioner, and finally discovered and rationalized by a master, the term has been substituted to mean the opposite.

Design patterns in software now mean something created by ivory-tower architects and handed down to practitioners for consumption, to prevent them from doing the unthinkable: thinking for themselves.

Wednesday, March 13, 2013

Architecture of Enterprise Content Management systems

There is a variety of document management systems available on the market today. Developing a distributed document capture and management system needs several basic components (a rough pipeline sketch follows the list).

1. A document capture and delivery module, which is typically an optical sensor such as a scanner or camera.

2. An optional PDF component that converts scanned documents to PDF format.

3. An optional OCR module to convert captured graphics to searchable and extractable text

4. A job management system that can split or merge documents to a standard structure

5. A document categorization and relationship system. The categorization aspect is called a taxonomy, whereas the relationship rules that build semantic knowledge are called a document ontology.

6. A business rules engine that can map documents into the created ontology based on rules around content, logged-in user, location, format and timestamp.

7. An analytics engine that can track metrics and can provide capabilities for analysis.

8. A record management system for storing records such that their circulation can be controlled, their authenticity verified, and their content searched.

9. A workflow or business process modelling component that allows document approvals to be routed through departments or organizations.

10. An email/ electronic documents module for managing and storing emails and all sorts of digital content such as word processing documents, spreadsheets, media files and others such that they can be stored, searched, archived and accessed from multiple environments.
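As promised above, here is a toy sketch in Python of how some of these components might chain together for a single scanned document. Every function here is a placeholder I invented for illustration; real ECM suites expose these steps as separate services.

    def capture(path: str) -> bytes:
        return b"...scanned image bytes..."              # 1. capture module

    def to_pdf(image: bytes) -> bytes:
        return b"%PDF-1.7 ..."                           # 2. optional PDF conversion

    def ocr(pdf: bytes) -> str:
        return "INVOICE 2013-001 Total: 4,200"           # 3. optional OCR

    def classify(text: str, user: str) -> dict:
        # 5/6. taxonomy + rules: pick a category based on content and context
        category = "invoice" if "INVOICE" in text else "correspondence"
        return {"category": category, "submitted_by": user}

    def store(pdf: bytes, metadata: dict) -> str:
        return f"doc://archive/{metadata['category']}/0001"   # 8. record management

    image = capture("/scanner/tray1")
    pdf = to_pdf(image)
    text = ocr(pdf)
    meta = classify(text, user="jdoe")
    print(store(pdf, meta))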

Thursday, March 7, 2013

The new software


Just like people can look at most buildings and estimate the decade/century of construction, so it is with software.

Any software is written based on tools and technologies of that time. Software also becomes obsolete very quickly as new technologies emerge and underlying hardware capabilities or access devices and mechanisms change.

Every rewrite is thus an opportunity to make the software more elegant and more functionally coherent.

Wednesday, March 6, 2013

The endless striving


Many of us deal with specifications that are half-baked, poorly thought through and sometimes cobbled together from buzzwords and copy-paste jobs. The spec writers, in my opinion, disrespect the institutions that entrust them with acquiring a platform, a solution or a technology.

In commercial dealings, sales staff who get paid based on the number of deals they close rush customers and prospects into releasing these poorly articulated needs documents, to move the "opportunities" along the pipeline.

How should architects react to these? That is the question that comes to my mind. Should we educate the customer into understanding what they want, or should we simply answer the questions to make sure we pass the test and get the highest marks in any formal or informal evaluation? Should we penalize the customers for making wrong choices, or the institutions that hired these executives for the poor choices they made?

My instinct had been to try and serve the institution, understand the organization, the desire behind the specification and articulate it as such. However, this has not worked consistently. The traditional sales thinking is to give the customer what they have asked for instead of what they need.

I am increasingly also seeing the reverse during execution, where delivery organizations try to do as little as possible, to maximise their margins by delivering barely enough compared to the initial ask or the need of the customer.

Increasingly, the shrill demands of sales and executives to meet numbers and margins, and ever-shortening timelines, drown out any meaningful protest.

However, in my humble opinion, this is a mistake. Architects are supposed to provide the voice of reason, and the ability to withstand these pressures is the ultimate qualification for the job.

Monday, March 4, 2013

Observations on design of software delivery organization


Of late I have been observing an interesting pattern develop in the organization I work for, in terms of the organizational design of project teams. This has been going on for the past few years. I am not sure how typical it is of other mid-sized software development shops, or whether it is unique to my organization.

The constraints are these: there is a group of people (ex-developers) who want to become managers; another group that wants to become technical leadership; and then there are distributed development teams that typically have under five years of experience.

The challenge is that the manager group is typically made up of people not respected for their technical skills and so cannot lead or guide their teams. They in fact are not even able to counsel their teams on what skills to develop or how to solve a problem. The teams actually respect technical leaders, who themselves do not want the hassle of managing people.

So the firm has found a convenient arrangement: make the developers report to managers for day-to-day work assignment, and rope in the technical leads during projects to guide the teams. The managers typically perform administrative functions and do not really manage or lead their teams.

Eventually the teams become proficient, and people start leaving as one person receives all the rewards and others find greener pastures. The managers get the credit for the teams but are in fact clueless. The technical leads are slowly replaced by the new stars and have to figure out what they want to do next. The new stars then have to decide whether to stay in the teams, become managers removed from the actual work, or become technical leads without any real teams. A dead end for everyone.

Sunday, March 3, 2013

Handling functional and non-functional requirements in Integrated Solutions

Most architects I know squirm when told by business users that system X needs to be integrated with system Y and perhaps system Z. This is because integration is such a loose word that one could drive a truck through it. Understanding the scenarios is perhaps the easiest way to understand the scope, the feasibility and ultimately the cost.

Functional Integration Requirements should be Use Case Driven

Integration between systems is primarily functional, that is, use case driven. Once all the use cases or scenarios involving the integrated components are known, architects can dig deeper and evaluate the available and required interfaces on the target system. However, it is also important to consider non-functional architectural drivers, especially in integration scenarios.

Non-Functional Requirements should span Primary and Integrated Components

When thinking about non-functional requirements, we need to consider the integrated solution and not just the newly developed component. This is important, since we might run into a situation where integration with a legacy system is required to serve an architectural driver, in terms of availability, stability or recoverability, that cannot be met without a system overhaul. Alternatively, design strategies need to be considered to achieve these non-functional requirements.

Think about non-functional requirements for integration - functionally

Like all good non-functional requirements, these integration scenarios too need to be expressed in functional terms with measurable acceptance conditions to ensure these can be designed for and ultimately tested in production.

Thursday, February 28, 2013

Reasons for boom in asset management in traditional oil economies


The primary oil-exporting economies of the world are in transition. The traditional view is that the change is happening because their reserves are being exhausted, hence the need to diversify. While that is probably one of the key reasons, I don't think it is the whole story.

Traditionally, these economies operated as a combination of an export economy for their global customers and a welfare state for their citizens. In an earlier era, the world had a very simple operating model: these countries had the world's largest supplies of something everyone wanted to buy. Like an Apple Store on the morning of a new iPhone launch, they decided how the queue should form outside their gates. But then things started to change.

The changing customer segments

On the one hand, increased globalization and the resulting demand for energy from emerging economies paying top dollar have meant that demand for their product has started increasing and diversifying. On the other, the goal of energy self-reliance by their biggest customer (the United States) has started to look real. All of this makes it more necessary to operate in an international environment, more like a company with multiple trading partners.

But perhaps most importantly, new demographic, social and economic realities are having a critical impact on these economies.

The demographic shift

These countries now have a larger population of young people who are well connected and increasingly well informed, but who do not have the professional skills or the inclination to join a global and mobile workforce. Since these economies were traditionally one-dimensional, there are limited business opportunities for this workforce to be absorbed. This is leading to some discontent.

The leaders of these economies realize that they need to diversify fast to create a local economy that can absorb these people. But this requires a local economy that has at least surpassed a minimum critical mass, and for that, there is a need for infrastructure where professionals from other countries can come and contribute until the local population is ready to take over.

This has meant widespread investment in local infrastructure, which brings a different problem. Historically, these countries provided decent infrastructure free of cost; but with a larger percentage of consumers now being non-citizens, that approach is no longer sustainable.

The need to recover costs

There is a need to levy user charges so that operating costs and limited capital costs can be recovered. Some form of user charge is also needed as a demand-control measure, since historically the region has had some of the highest wastage rates in the world.

Unfortunately, levying user charges is not straightforward. How do you convince a consumer that what they are paying is only their fair share, and that they are not subsidizing poor efficiencies in the service delivery organization?
Delivery organizations also need to prove that they are only adding new capacity when and where it is needed.
They further need to prove that the assets they already own are being operated and maintained to maximize their service life: that there are no unplanned service disruptions or expensive repairs, and that they do not repair things at a cost greater than replacement, especially when the repair will not pay off in terms of overall cost.

How do you prove that the service organizations have done their part?

This is an interesting challenge for service delivery organizations trying to move to the new model. Luckily, the consulting business has come up with the next best thing: an ever-increasing alphabet soup of accreditations and certifications that will prove to the regulators, and eventually the leaders, that the delivery organizations are operating efficiently.

Now everyone is happy...

Sunday, February 24, 2013

Understanding Land databases


Land administration is a major area of concern in many jurisdictions around the world. What seems like a fairly straightforward domain actually involves many datasets that do not integrate well with each other.

Types of databases

These include the following:
1. A title or ownership database - a record of the ownership of land or property. In most areas, it maintains a reference to the piece of land through some unique identifier, a legal description of the title (which may refer to a lot number in a specific subdivision plan), as well as a record of ownership and transfers. Where the land organization maintains a separate document library, it will also refer to the document and filing number, or a book and page number.

2. A cadastral database - a record of the geometry of land. A cadastral database may maintain its own numbering system for identifying a unique piece of land, and typically records data in terms of plans. Each plan has a boundary denoting the study area as well as control points established for accuracy. Each plan then has a set of surveyed lines and points (also called line points), which are used to establish the accuracy of the lines. Finally, from these come the parcel boundaries, which are derived from the survey lines.

3. An assessment or valuation database - keeps track of properties from a valuation perspective. Valuation or assessment maps are prepared in jurisdictions where taxes are collected based on the uses of, and improvements to, a piece of land. The argument is that a five-star hotel built on a property should be charged more than a piece of land that is being farmed. Property taxes are also levied so that people don't simply buy land for speculative purposes and leave it vacant, but are pushed to put it to some economic use. Based on this need, assessment maps are prepared and then used by taxation authorities to track changes in usage and improvements to each piece of land.

Other information in primary datasets

Typically, sales history is tied to the title database. The history of structures is tied to the valuation database, since structures indicate improvements to the land. The original subdivision plans are tied to the cadastral database, since these include lot and parcel numbers.

Tied to all the above datasets are documents that include all sorts of records and plans around land transactions. Since these databases do not share a consistent view of a parcel or property, most land administration organizations maintain cross-reference tables, so that as long as the user has one piece of information, the other datasets can be accessed.
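A minimal sketch in Python of what such a cross-reference looks like in practice; all identifiers below are made up for illustration.

    # One row per property, linking the identifiers used by the title, cadastral
    # and assessment databases.
    cross_reference = [
        {"title_no": "T-10023", "cadastral_parcel": "PLAN88-LOT4", "assessment_roll": "AR-55-0192"},
        {"title_no": "T-10024", "cadastral_parcel": "PLAN88-LOT5", "assessment_roll": "AR-55-0193"},
    ]

    def lookup(**known):
        """Given any one identifier, return the matching identifiers in the other systems."""
        (field, value), = known.items()
        return [row for row in cross_reference if row.get(field) == value]

    # A clerk who only knows the cadastral parcel can still reach the title record.
    print(lookup(cadastral_parcel="PLAN88-LOT4"))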

Saturday, February 23, 2013

Analytics and Business Intelligence in real life

There is a lot of hype around analytics and Big Data in the media and technology journals. It is rivalled perhaps only by 3D printing, which is touted as the resurgence of American manufacturing.

However, the hype cycle does not match the ground reality. I recently spoke to a customer, who is also a good friend, about the whole idea of analytics and BI and its place in the real world. He thought for a while and said, "The problem is... what do you do with it?" In one sentence he highlighted the main problem with analytics and BI: most organizations are not equipped to take business decisions based on even very insightful BI dashboards.

It's great if you are fresh out of business school, know all your statistical models like the back of your hand, and can derive actionable insights into what you need to do, and of course then have the will and the conviction to act on them. For most organizations, however, it takes continuous professional development, reinforcement and knowledge transfer to learn how to act on the data that the business intelligence dashboard or analytics platform is giving them.

The second, more worrisome aspect of this problem is: do people truly understand the systems they are dealing with? Do they have systems thinking? As almost all of us were taught as kids, "every action has an equal and opposite reaction". Systems thinking, or business dynamics, teaches how most actions trigger two feedback loops of different strengths: one accelerates the rate of change in the direction of our action, while the other counters the action and tries to bring the system back to equilibrium. There is also a lag before these feedback loops kick in. In business dynamics terminology, these loops are called reinforcing (accelerating our action) and balancing (reacting against it).
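A tiny Python sketch of the two loops just described, with an arbitrary stock, an arbitrary intervention and a delayed balancing reaction; the numbers are made up, the point is only the delayed counter-reaction.

    stock = 100.0
    action = 5.0          # our intervention, applied every period
    history = []

    for t in range(20):
        reinforcing = 0.03 * stock                 # growth proportional to the stock
        # the balancing loop only "sees" the stock as it was a few periods ago
        lagged_stock = history[-4] if len(history) >= 4 else stock
        balancing = -0.05 * (lagged_stock - 100.0)
        stock += action + reinforcing + balancing
        history.append(stock)

    print([round(x, 1) for x in history])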

Study after study has shown that when people do not truly understand the systems they are dealing with, they cause massive failures. Most accidents such as Chernobyl can be attributed to these.

I am sure that unless people are given richer mental models and system views of the systems they are dealing with, the result could be worse than inaction: disastrous consequences triggered by a desire to act. After all, the road to hell is paved with good intentions.

Friday, February 1, 2013

Going about building the Multi-perspective dashboard.

In my previous post on the topic, I had expressed my thoughts on conceptualizing a multi-perspective multi-organization dashboard. I had presented the ideas of operational, strategic and policy dashboards.

Tying together operational, strategic and policy dashboards

The key question then becomes how to tie all of these together. My personal understanding at this point is that each of these is related, or needs to be related, to the others.

Essentially, based on the definitions provided in the previous post, the only way these can be compared and eventually tied together is by looking at different time horizons. The operational dashboard looks at real-time data to decide whether something urgently needs attention.

The policy level directs my attention at two levels: how I am doing against my long-term goals, and whether there is something I did in the recent past where I deviated from my principles or from the means I have adopted to achieve my ends.

Last but not least, the strategy level measures whether the approach I have adopted towards my goals is effective. These should all be driven from the same data points, but should present the data differently.

Going across organizations

In a multi-organization dashboard, the three perspectives mean different things to different people. Whereas for a utility an operational dashboard implies real-time information, probably acquired from various sensors, and a policy dashboard implies adherence to norms set by a regulatory body, the regulatory body would see the utility's policy-level KPIs aggregated and reported as part of its own operational dashboard.

Also varying is the grain of the data. For a utility, minute-level or 15-minute interval data would be reported at the operational level, whereas the regulatory body may see day-level or week/month-level data as its reporting grain. To me, all these aspects need to be storyboarded by the architects at the start of the project to make sure we can achieve the outcomes desired by the different stakeholders in such multi-organization dashboards.
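A quick sketch in Python (pandas) of the grain point: the same synthetic 15-minute utility readings, rolled up to the daily totals a regulator's dashboard might consume.

    import pandas as pd
    import numpy as np

    idx = pd.date_range("2013-02-01", periods=4 * 24 * 7, freq="15min")   # one week of 15-minute data
    consumption = pd.Series(np.random.uniform(0.2, 1.5, len(idx)), index=idx)

    utility_view = consumption                         # operational grain: 15 minutes
    regulator_view = consumption.resample("D").sum()   # reporting grain: daily totals

    print(utility_view.head(3))
    print(regulator_view)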

This is my take at this point, and lets see how this changes going forward.

Saturday, January 26, 2013

Design of Multi-perspective Multi-Organization KPI dashboards

Many organizations are now pursuing the implementation of dashboards that track key performance indicators. The challenge is how to take this concept across multiple organizations and up to a higher-level goal that serves a cross-cutting concern, from short-term as well as medium- and long-term perspectives. For instance, how does an environmental ministry see where it needs to act today and tomorrow, and how it is doing with respect to its goal of promoting sustainable development?

Dashboard organization

I feel that the first thing is to break it up into operational, strategic and policy dashboards. I will explain these one by one in the following sections.

Strategic dashboards

At a high level, organizations have goals and a set of strategies. Strategies, in an organizational context, are, just as the English word suggests, ways to achieve a certain goal. The reason I mention this is that I have seen many executives use strategies and goals interchangeably.

A strategic dashboard, then, should present at the top level the goal, or list of goals, that the organization is trying to meet (to provide context), supported by the actual strategies that serve the overall goal. To ensure that we can measure progress, we should express each strategy in quantitative terms. We can then list how the overall quantitative strategy will be achieved through improvements in supporting measures across departments and organizations. This may of course need detailed models of the demand- and supply-side equations to make sure we will be effective in achieving the strategy.
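A small example in Python of expressing a goal and one strategy quantitatively, with supporting measures per department. Everything here is invented to illustrate the structure, not a real ministry's targets.

    strategic_dashboard = {
        "goal": "Promote sustainable development",
        "strategies": [
            {
                "name": "Reduce water losses",
                "target": {"measure": "non_revenue_water_pct", "from": 35, "to": 25, "by": 2016},
                "supporting_measures": {
                    "Utility Ops":   "leaks repaired per month",
                    "Metering Dept": "share of connections with working meters",
                },
            },
        ],
    }

    for s in strategic_dashboard["strategies"]:
        t = s["target"]
        print(f'{s["name"]}: {t["measure"]} {t["from"]} -> {t["to"]} by {t["by"]}')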

Operational dashboards

The next level is of course operational. Operational measures are easy to describe and measure, as they tend to track things in real time or over the last 24 hours. They draw our attention to what is important right now.
In a multi-organization setup, what counts as operational may differ for each organization. A utility may be looking at its smart meter infrastructure feeds as its operational measures, while a higher-level organization, such as a regulatory board, may be looking at customer satisfaction as its operational data.
Within each operational dashboard, the main thing needed is to ensure that whatever we are measuring can be related to the other measures. It is also important to ensure that the key performance indicators defined are not going to work at cross-purposes with each other; if they are, care should be taken to define a third measure that prompts the executive to figure out what is actually going on.

Policy dashboards

Now that we have covered the operational and strategic dashboards, we are still missing one key perspective: policy. Policy, as the English word suggests, is a guideline, procedure or process that must be adhered to in carrying out operations and implementing strategy. This ensures that strategy implementation does not violate the basic tenets the organization believes in.

However, there is another way in which I have seen policy used: policy as a container for the goals themselves. For example, the policy dashboard may list all the policy objectives the organization has adopted for itself.

Again, in a multi-organization setup, the policies (both goals and guidelines) may conflict within an organization or across organizations. This needs to be addressed, because an aggregate dashboard will highlight the fact.

All of these together to me determine the different perspectives an organization needs to take into account while designing decision support dashboards.

Saturday, January 12, 2013

Quality of life KPIs - Measuring Human Development


Governments around the world are struggling to figure out how to truly develop their societies and chart their way in doing so. It is clear that traditional measures of development, such as Gross Domestic Product (GDP) and per capita income, are not adequate to reflect quality of life, or even development itself. Hence, KPIs are being defined to measure these better. In this article, I will explore some of these indicators.

Human Development Index

We start with the Human Development Index, an indicator published by the UNDP and created by Mahbub ul Haq and Amartya Sen. Its methodology was revamped in the 2010/2011 reports. The Human Development Index is calculated as follows.
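The equation image has not survived here; as I recall the post-2010 UNDP methodology (treat this as my recollection rather than a quotation), the index is the geometric mean of three dimension indices, each scaled between 0 and 1:

    HDI = (I_{Health} \cdot I_{Education} \cdot I_{Income})^{1/3}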


We can see that Human Development Index is made up of several components, one of which is Life Expectancy, which is explained in the following section.

Method for calculating Life Expectancy

Life expectancy is calculated as follows (http://en.wikipedia.org/wiki/Life_expectancy):
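The formula image is missing here, but the relationship actually used in the life table below can be stated directly and verified against the table's own columns: expectation of life at the start of an age interval is the person-years lived beyond that age divided by the survivors at that age,

    e_x = \frac{T_x}{l_x}

For example, 8,104,253 / 100,000 = 81.0, the published life expectancy at birth for the total population.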



It turns out life expectancy is actually calculated from actuarial (life) tables published by the health department. I am taking New York State as the example for these posts. Life tables for New York State are available at http://www.health.ny.gov/statistics/vital_statistics/2010/table03.htm and are reproduced below.

Total Population

Age      q         l         d        L         T           E
< 1      0.00485   100,000   485      99,661    8,104,253   81.0
1-4      0.00071   99,515    71       397,918   8,004,592   80.4
5-9      0.00044   99,444    44       497,110   7,606,674   76.5
10-14    0.00058   99,400    58       496,855   7,109,564   71.5
15-19    0.00181   99,342    180      496,260   6,612,709   66.6
20-24    0.00328   99,162    325      494,998   6,116,449   61.7
25-29    0.00335   98,837    331      493,358   5,621,451   56.9
30-34    0.00366   98,506    360      491,630   5,128,093   52.1
35-39    0.00495   98,146    486      489,515   4,636,463   47.2
40-44    0.00804   97,660    785      486,338   4,146,948   42.5
45-49    0.01298   96,875    1,258    481,230   3,660,610   37.8
50-54    0.01961   95,617    1,875    473,398   3,179,380   33.3
55-59    0.02923   93,742    2,740    461,860   2,705,982   28.9
60-64    0.04298   91,002    3,911    445,233   2,244,122   24.7
65-69    0.06461   87,091    5,627    421,388   1,798,889   20.7
70-74    0.09678   81,464    7,884    387,610   1,377,501   16.9
75-79    0.15219   73,580    11,198   339,905   989,891     13.5
80-84    0.23704   62,382    14,787   274,943   649,986     10.4
85+      1.00000   47,595    47,595   375,043   375,043     7.9

Males

Age      q         l         d        L         T           E
< 1      0.00507   100,000   507      99,645    7,856,910   78.6
1-4      0.00074   99,493    74       397,824   7,757,265   78.0
5-9      0.00048   99,419    48       496,975   7,359,441   74.0
10-14    0.00065   99,371    65       496,693   6,862,466   69.1
15-19    0.00270   99,306    268      495,860   6,365,773   64.1
20-24    0.00488   99,038    483      493,983   5,869,913   59.3
25-29    0.00496   98,555    489      491,553   5,375,930   54.5
30-34    0.00518   98,066    508      489,060   4,884,377   49.8
35-39    0.00637   97,558    622      486,235   4,395,317   45.1
40-44    0.01024   96,936    992      482,200   3,909,082   40.3
45-49    0.01625   95,944    1,559    475,823   3,426,882   35.7
50-54    0.02447   94,385    2,310    466,150   2,951,059   31.3
55-59    0.03755   92,075    3,458    451,730   2,484,909   27.0
60-64    0.05329   88,617    4,723    431,278   2,033,179   22.9
65-69    0.07796   83,894    6,540    403,120   1,601,901   19.1
70-74    0.11701   77,354    9,051    364,143   1,198,781   15.5
75-79    0.18279   68,303    12,485   310,303   834,638     12.2
80-84    0.28237   55,818    15,761   239,688   524,335     9.4
85+      1.00000   40,057    40,057   284,647   284,647     7.1

Females

Age      q         l         d        L         T           E
< 1      0.00462   100,000   462      99,677    8,326,446   83.3
1-4      0.00067   99,538    67       398,018   8,226,769   82.6
5-9      0.00040   99,471    40       497,255   7,828,751   78.7
10-14    0.00051   99,431    50       497,030   7,331,496   73.7
15-19    0.00088   99,381    87       496,688   6,834,466   68.8
20-24    0.00164   99,294    163      496,063   6,337,778   63.8
25-29    0.00178   99,131    176      495,215   5,841,715   58.9
30-34    0.00218   98,955    215      494,238   5,346,500   54.0
35-39    0.00358   98,740    353      492,818   4,852,262   49.1
40-44    0.00592   98,387    583      490,478   4,359,444   44.3
45-49    0.00988   97,804    966      486,605   3,868,966   39.6
50-54    0.01501   96,838    1,454    480,555   3,382,361   34.9
55-59    0.02153   95,384    2,054    471,785   2,901,806   30.4
60-64    0.03376   93,330    3,151    458,773   2,430,021   26.0
65-69    0.05330   90,179    4,806    438,880   1,971,248   21.9
70-74    0.08057   85,373    6,878    409,670   1,532,368   17.9
75-79    0.12926   78,495    10,146   367,110   1,122,698   14.3
80-84    0.20735   68,349    14,172   306,315   755,588     11.1
85+      1.00000   54,177    54,177   449,273   449,273     8.3

1 Age - Age interval of life stated in years
2 q - probability of dying during the stated years
3 l - number of survivors at the beginning of the age interval
4 d - number of persons dying during the age interval
5 L - person years lived during the age interval
6 T - person years beyond the exact age at the beginning of the age interval
7 E - expectation of life at the age at the beginning of the age interval
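As a cross-check, here is a small Python sketch (mine, not the health department's) that reproduces the E column of the Total Population table above from its l and L columns, using T as the reverse cumulative sum of L and E = T / l.

    ages = ["<1", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39",
            "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85+"]
    l = [100000, 99515, 99444, 99400, 99342, 99162, 98837, 98506, 98146, 97660,
         96875, 95617, 93742, 91002, 87091, 81464, 73580, 62382, 47595]
    L = [99661, 397918, 497110, 496855, 496260, 494998, 493358, 491630, 489515, 486338,
         481230, 473398, 461860, 445233, 421388, 387610, 339905, 274943, 375043]

    # T_x: person-years lived beyond age x = reverse cumulative sum of L.
    T = []
    running = 0
    for person_years in reversed(L):
        running += person_years
        T.append(running)
    T.reverse()

    for age, lx, Tx in zip(ages, l, T):
        print(f"{age:>6}  E = {Tx / lx:5.1f}")
    # The first line prints 81.0, matching the published life expectancy at birth.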

Education Index

The next component of the Human Development Index is the Education Index. This was changed in 2011 to be the square root of the product of the Mean Years of Schooling Index and the Expected Years of Schooling Index, divided by 0.951.
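Restating that sentence as an equation (the 0.951 normalizer is from the text above; the mean-years goalpost of 13.2 is my recollection of the 2011 report):

    EI = \frac{\sqrt{MYSI \times EYSI}}{0.951}, \qquad MYSI = \frac{\text{Mean Years of Schooling}}{13.2}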


The Expected Years of Schooling Index is defined as follows:
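The formula image is missing; as I recall the 2011 definition (treat the 20.6-year goalpost as my recollection):

    EYSI = \frac{\text{Expected Years of Schooling}}{20.6}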


The schooling statistics for the State of New York are available on its website (http://www.p12.nysed.gov/irs/statistics/public/).


The next component of the Human Development Index is the Income Index, described below.

Income Index

The Income Index is defined as follows:
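The formula image is missing; the 2011 definition, as best I recall it (with goalposts of roughly $100 and $107,721 GNI per capita, PPP), is:

    II = \frac{\ln(\text{GNI per capita}) - \ln(100)}{\ln(107{,}721) - \ln(100)}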


where GNI stands for Gross National Income. Gross National Income consists of personal consumption expenditures, gross private investment, government consumption expenditures, net income from assets abroad (net income receipts), and gross exports of goods and services, after deducting two components: gross imports of goods and services, and indirect business taxes. GNI is similar to gross national product (GNP), except that GNP does not deduct indirect business taxes. (Source: http://en.wikipedia.org/wiki/Gross_national_income)

It is calculated using the following formula
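The formula image is missing here. If "it" refers to GNI, the components listed above can be restated as an equation (my restatement of the text, not the original figure):

    GNI = C + I + G + NI_{abroad} + X - M - T_{indirect}

where C is personal consumption, I gross private investment, G government consumption, NI_{abroad} net income from assets abroad, X and M gross exports and imports, and T_{indirect} indirect business taxes.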


It turns out the GDP data for New York State is available from the data center of New York State's economic development agency, Empire State Development (http://esd.ny.gov/NYSDataCenter/GrossDomesticProduct.html).


The data is derived from the US Bureau of Economic Analysis's data, which is available at its website (http://www.bea.gov/national/index.htm#gdp).


This website also contains an Interactive Data tool that provides data for download.

My plan over the next few posts is to cover all the measures of social development.