Benad's Web Site

Introduction

Unlike software architecture, it feels difficult for me to define what is software design other than by its qualities. Software architecture is a well-known concept: Large software systems are built by layers, with clearly defined interfaces between the strata, so from the point of view of your code running attop this tower, the "architecture" provides an environment with its restrictions and benefits.

Software design is about qualities some code would exhibit within this environment. Like good design in general, defining good code design is difficult. Note that I'm using code and software interchangeably: I'm not trying to describe the qualities of software from a user's perspective, but rather the qualities exhibited by the code when placed in the context of it's production, execution and maintenance.

I struggled for years to find way to describe what made some software design's good the same way we commonly struggle describing good design in general. I could look at some code and its behavior and almost instantly recognize its beautiful design, but couldn't tell why.

The practice of Software Engineering spent decades identifying what are good software qualities, finding ways of measuring them, and hopefully building repeatable processes to produce software of such quality. Since good design is difficult to measure and only tangentially affect the end product's quality, the tools and methodologies of Software Engineering tend to overlook, apart from a few design patterns, code design altogether.

Now, I'll stop being vague. Good software design share many characteristics (goals) with good computer science (correctness, speed, predictability) and software architecture (stability against failures, isolation, scalability, traceability, security), but by focusing on qualities unique to design it will make it easier to grasp the still-vague idea of what is software design in the first place.

Consistency and Patterns

Programming gives a lot of freedom to the software developer in the area of mnemonics. Indeed, while operators, primitives and other syntactic structures have fixed representations, user-defined structures are often represented using certain keywords the developer can define (within certain limits, i.e. what characters are allowed in the user-defined name, uniqueness, and so on). From this arises the first kind of design pattern: Consistency. Commonly known as "naming convention" or "name notation", it arises almost naturally when a programmer has to write code of a considerable size or when working with other programmers.

Some conventions for naming don't seem like so, for example the simple choice of building up the names based on some (human) language. While many programming languages are English-centric, some modern languages allow characters beyond non-accented English characters, allowing programmers to name everything in their preferred tongue.

Beyond language, there are conventions about capitalization and certain keywords to identify the variable name as being part of a certain type. As such, by simply looking at some identifier's name, you can determine what is the type it identifies, be it an actual type in the programming language (variable, function and so on) or a higher-level design construct (constants, accessor functions and so on). While this is useful, care must be taken to not be too specific when augmenting the name with type information, since renaming the variable each time its type slightly changes can be time-consuming.

Now, naming conventions do not feel like such "design", but you'd be surprised by how much impact it has on the overall usability of maintaing the source code. Actually, well named identifiers provide a kind of basic documentation, to the point where the code can be considered "self-documented" (compared against full sentences in comments or externally written documentation). It is not surprising, then, that casual programmers with little care about code maintenance, especially those from a background of mathematics or natural sciences, fill their code with generic identifiers such as "x", "y" and the infamous "foo" and "bar". Actually, one of the great difficulties when reverse-engineering code from a lower level (assembly or VM byte code) to a higher-level programming language is that those representative identifier names are lost, all replaced with automatically generated names ("x1", "x2"…), making the understanding of what those identifiers are supposed to represent difficult, even in the context of their use in the code.

This is the simplest form of what I call "software conception", that is the act of associating a cognitive concept to an abstract idea, with the purpose of making it easier to grasp to humans, while at the same time having inherent pragmatic qualities in the context of running software. This isn't something new to software design: Mathematicians still have to "name" newly defined mathematical concepts, and the way to "name" them can make a concrete difference when used by other mathematicians. For example, while differential calculus was proven simultaneously by both Newton and Leibniz, Leibniz's chosen notation stood the test of time by simply being easier to use, regardless of Newton's claim that he proved it first and that his notation should prevail.

Software conception goes well beyond "naming". The repeated act of programming software solutions led over time to the awareness of some useful patterns that became well-known in the literature. Object-oriented programming is one of such patterns deemed so useful that it became the central paradigm of new programming languages. The "gang of four" book titled "Design Patterns: Elements of Reusable Object-Oriented Software" collected many proven patterns built in object-oriented languages, listing the various advantages and disadvantages for each. But these patterns aren't simply used to make the software better, but also as some basis to conceptually describe the ever-increasing abstractedness of high-level software structures. As object-oriented design gained acceptance in large-scale business software, this ultimately led to descriptive languages of such patterns such as UML.

Now, the greatest disadvantage to the rise of the use of design patterns is their over-use, in a feeble attempt to use such patterns to automatically generate code, led to an absurd level of abstractions and redirections that significantly impact the code's performance. Nowadays you can still stumble on some "enterprisey" code that have class names ending with "AbstractFactorySingletonProxy", and updating a row's field in some relational database would imply nested function calls that average in the hundred, with as many objects allocated for each such change. While this seemed good in the early 2000s from a hardware vendor's perspective, it lead many programmers away from most software design patterns towards a myriad of higher-level, type-less scripting languages that are often slower and more difficult to maintain. There is a great loss in this, because when used sparingly those design patterns do make programming (and understanding) much easier at some small performance cost.

It should be noted that software design patterns go well beyond what was described in the "group of four" book, and also well beyond within the sphere of object-oriented programming. It is also quite possible to create new patterns specific to a small domain, such as defining usage patterns for an application programming interface, but this leads one's to the difficultly of defining those new concepts in such a way that they are clearly understood by users while still being useful and practical for a computer. Such "software conception" is not currently well understood and thus led to many usability aberrations we now merely accept in software we use every day.

Convenience

One direct and clear side effect of using consistent patterns is that it makes it possible for some software to manipulate code automatically based on those patterns. The earliest and simplest example is as simple as using file extensions to identify the contents of source code files and intermediate compiled object code, making it possible for a code building tool to infer from those extensions the steps needed to compile the code in executable form.

Looking at things upside-down, what if some software would be able to generate those patterns in the first place, as a result of automatically generating code from some set of rules? This would not only enforce design rules, but also would conveniently save some programmer's work.

This leads us to another important software design quality: Convenience. Anything that can help the programmer write less code will in effect help the programmer make less mistakes (assuming that whatever makes things more convenient doesn't introduce bugs on its own). Since a large part of a programmer's work is (should be?) writing code, then the greatest convenience would be tools that could either write that code automatically, or not having to write that code at all.

That distinction between tools that generate code or avoids it is clearer when you consider that convenience is most needed for code that doesn't require much intelligence: Integration code, also called "glue code". Early examples of that are tools integrate some object-oriented language with a relational database. With some object-relational mapping tools, code is generated in the target programming language that implement the tedious task of mapping the object's data members with their relational equivalent. With others, if the target language supports run-time introspection of the objects' structures, the mapping is done transparently during the code's execution by a third-party library. The choice between code generation and run-time dynamism when mapping between two domains is essentially a choice between speed and flexibility. Either way, the work needed on the programmer's part is kept to a minimum.

There are risks involved in using tool to generate code automatically. The most obvious one is that since generated code is often not made to be read by programmers, it can be excessively difficult to debug such code. Also, generated code may be inflexible to changes and may require to be regenerated each time the environment it runs in changes; not doing so could lead to very difficult to fix integration bugs.

Tools that exist to simplify integration between two different domains also could introduce a level of complexity that would dwarf convenience if the two domains don't map well, even in cases when dynamic code is supposed to map the two automatically. A few examples include mapping a SQL-based relational database to a collection-based object-oriented programming language, or mapping simple objects to a document-based service interface like SOAP.

The difficulty in mapping two domains as fully as possible is that each time you want to mask one domain's functionality with another's that are conceptually far apart you will end up with something that is difficult to grasp. Put simply, you'll end up with complex hacks that few will can understand (if it can be understood correctly), and so few that will want to use. Over time, this led to another trend in software design to simplify as much as possible those interfaces.

Simplification

After all, good design also applies to programming interfaces. While many of their design considerations follow good software design practices in general, the fact that they present an opaque layer to another conceptual domain brings in new considerations.

One of them is how much of the domain's complexity must be minimally used. Standards like the aforementioned SOAP and SQL require their users to know a great deal of the other domain to even minimally use it. Also, to make use of the domain in an efficient way, one may have to learn a great deal about the domain's internal design and implementation, which may overly burden the programmer with unnecessary domain-specific knowledge.

This is what lead to a push to use simpler interfaces in distributed systems. The UNIX text-based input/output is such an example of a simple interface. But as domains increase in complexity by feature-creep, there is also a push to break down those complex software systems into multiple parts thus unburdening their users to fully grasp them when interfacing with them.

Simplifying interfaces is actually the most difficult form of software design. It isn't simply about cramming as many features as possible in as few interface points or paradigms. That would be roughly the equivalent of stuffing binary data as Base64 within a SOAP request that in turn contains another application-specific XML structure.

By having a simpler interface, it will likely reduce the need to produce large amounts of integration code, which in turn would require code generation or run-time mapping. That way, having a minimal amount of "glue code" becomes acceptable. Another indirect effect of having a simpler interface is making it more likely to have a greater acceptance amongst developers. Not only that, but implementations of the interface are more likely to be interchangeable. In effect, a simpler, unambiguous interface makes it more likely to become a de facto standard that increases productivity rather than hinder it with unnecessary knowledge.

Almost the same can be said of programming languages themselves. As an attempt to increase productivity, domain-specific patterns are "bubbled up" to the programming language itself as new syntactic shortcuts (unlike doing so out of programming patterns). In the long term, the language becomes rife with obscure syntax that makes its learning even so more difficult. While excessively simpler programming languages might introduce a programming work burden on its own, simplifying a language to implicitly hide annoyances of the domain greatly helps reduce the burden while still making the domain details available if there's a need for it. Another indirect effect of a simpler language is that by hand-picking only a few software design patterns syntactically available in the language you can push developers consistently towards better design.

Conclusion

Up to this point, I only hinted at to what is good software design, without going too much into detail. Software design can have a great impact on the entire range of software development, from the OS interface, to a library's API, to an automated compilation tool, to variable names, to the way you name files on disk. It is easy to understate how much good design can have on software development, especially considering how much can be done in spite of horrible design. After all, with an infinite number of monkeys in an infinite amount of time...

Good software design is different than just "problem solving", as it tries to make something that avoid future problems. It is also different than being proficient at writing a lot of good code, since it tries to reduce the amount of code in the first place and do so while expecting programming errors. It is also different than optimizing for code performance, since it also considers the programmer's future performance in understanding the code, even if at times it may create sub-optimal code for the computer.

In effect, good software design often goes against our current metrics and expectations of how to write software so much that its effects are often surprising, if not considered "miraculous". While too much of the software development commercial market depends on expediency, there is great value in writing beautiful software that will stand the test of time.