Software design


Why seek good design? ==> To tackle complexity.
Why is design a continuous activity (it cannot all be done upfront)? Because software is malleable, unlike a physical system.
Complexity indicators:
- How long does it take to understand the existing system?
- How long does it take to add a new feature?
- What is the probability of a regression when fixing a bug?

Complexity symptoms:
- Change amplification (a small change requires lots of modifications)
- Higher cognitive load (a small task requires spending more time learning)
- The worst: you have enough time, but no way to figure out what to modify or what to learn in order to complete the task.

Two causes of complexity:
- Dependencies & leaky abstractions (e.g. cause change amplification)
- Obscurity & over-abstraction (e.g. lead to higher cognitive load)


Two approaches to fighting complexity:
- Tradition-based: use common patterns and names + avoid special cases + speak the code jargon (i.e. develop a coding tradition that is obvious to everyone involved).
- Modular design: separate and encapsulate complexity in modules.

A module can be a procedure, method, class, component, subsystem ...
Modules have an interface & an implementation.
Good modules have a simple/minimal interface and hide complex/maximal functionality.
Metaphor: the benefit provided by a module is its functionality; the cost of a module (relative to other subsystems) is its interface.
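
As an illustration, here is a minimal TypeScript sketch (hypothetical names, not from the original notes) of a deep module: one small method in the interface, while caching, expiry and fallback logic stay hidden in the implementation.

class CachedLookup<V> {
  private cache = new Map<string, { value: V; expiresAt: number }>();
  constructor(private load: (key: string) => V, private ttlMs: number) {}

  // The whole interface: one method. Everything else is hidden from callers.
  get(key: string): V {
    const hit = this.cache.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // fresh cache hit
    const value = this.load(key);                            // fall back to the loader
    this.cache.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}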

Single-responsibility syndrome: in system design, it is advised to break large entities up into smaller parts, each part encapsulating one responsibility.
Excessive use of this advice can lead to many (unnecessary) shallow objects with a "combinatorial explosion" of interfaces.

Prefer a right-responsibility guideline instead of single-responsibility.

How to create deep modules?
  • Information hiding (cf. David Parnas):
    1. Identify all pieces of knowledge and separate them in isolation.
    2. Each piece of knowledge corresponds to a design decision.
    3. One design decision should not be reflected in multiple modules.
    4. Encapsulate each design decision in one separate module.
    5. The design decision doesn't appear in the module interface.
    6. This leads to fewer dependencies between modules.
Example: HTTP request/response manipulation.
Maybe-wrong alternative: two separate objects, RequestReader & ResponseWriter.
If the content format (a piece of knowledge) changes, both objects have to change as well.

Good alternative: merge the two objects into one (sketched here as a minimal TypeScript outline):
class HTTP {
  // Reading and writing share knowledge of the content format,
  // so a format change stays inside this single module.
  read(request: string): void { /* parse the incoming request */ }
  write(response: string): void { /* serialize the outgoing response */ }
}

General theme: Information hiding can often be improved by making a module slightly larger.

Information leakage (the opposite of information hiding) occurs when a design decision is reflected in multiple modules.
Other sources of leakage:
  • When a design decision is disclosed via the module interface.
  • Temporal decomposition: the structure of the system corresponds to the time order in which operations are executed.

Beware of information over-hiding, which is often caused by false abstraction.
When details are important, it's better to make them explicit and avoid an obscure design.
Example: general-purpose vs special-purpose approach, e.g. the text editor backspace (a sketch follows).
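
A hedged sketch of this idea (hypothetical names): the text module exposes a general-purpose delete over a range, and the special-purpose backspace behaviour is built on top of it in the UI layer.

class TextDocument {
  private text = "";
  insert(pos: number, s: string): void {
    this.text = this.text.slice(0, pos) + s + this.text.slice(pos);
  }
  delete(start: number, end: number): void {
    this.text = this.text.slice(0, start) + this.text.slice(end);
  }
  getText(): string { return this.text; }
}

// Special-purpose behaviour layered on the general-purpose mechanism:
function backspace(doc: TextDocument, cursor: number): number {
  if (cursor > 0) doc.delete(cursor - 1, cursor);
  return Math.max(0, cursor - 1);
}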

Merge two modules (or create a higher-level module on top of them) if:
  • They share information.
  • They are used together all the time.
  • They overlap conceptually.
  • It is hard to understand one without looking at the other.
  • Merging eliminates duplication.

Separate two modules if:
  • One provides a general-purpose mechanism and the other is special-purpose (the special-purpose module should probably sit on top of the general-purpose one).
  • In general, lower layers of a system tend to be more general-purpose, and the upper layers more special-purpose.
  • The two modules have different abstractions.

Part of complexity is generated by special-case code.
Exceptions are a significant source of special cases. It is advised to reduce exception handling by one of the following (a short sketch follows the list):

  • Best way: define the special case out of existence, so the corner case is eliminated and captured as part of the main-case code (e.g. via polymorphism).
  • Mask exceptions at a low level (e.g. safe functions; compare rm on Linux vs Windows).
  • Aggregate exceptions in one generic handler.
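
A hedged TypeScript sketch of the first approach (hypothetical function, not from the notes): by defining out-of-range indices as "return the overlapping part", the corner case disappears and callers need no try/catch.

function substringOf(s: string, start: number, end: number): string {
  const from = Math.max(0, start);
  const to = Math.min(s.length, end);
  return from < to ? s.slice(from, to) : ""; // "invalid" ranges simply yield ""
}

console.log(substringOf("design", 3, 100)); // "ign" -- no exception to handle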

Commenting Code

Any piece of code should answer two questions: "what" the problem is, and "how" the problem got resolved. When you read program code, you are actually reading the solution to a problem (the "how"); this is why we care about readability. However, most of the time it is more practical to know "what" a program is supposed to do, for a few reasons:
- I don't care how a library was implemented. My only concern is whether the library satisfies my requirement or not (the "what").
- Even if the code is easy to read, any reader could make incorrect assumptions or misleading interpretations. Also, the implementation (the "how") may cover only part of my real requirements (the "what").
- Understanding what a program is supposed to do is different from program readability.

Therefore, we need comments to:
- Provide a high-level description of what a program is doing.
- Describe the meaning of the inputs/results of a program.
- Give the rationale for a particular design decision.
- Serve as another form of abstraction that hides the complexity of a program.

The overall idea behind comments is to capture information that was in the mind of the designer but couldn't be represented in the code.   
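
A small hedged example (hypothetical names) of a comment that captures the "what" and the rationale rather than restating the "how":

interface Transaction { amountCents: number; settled: boolean; }

/**
 * Returns the account balance in cents, counting only settled transactions.
 * Pending transactions are excluded so that reports match the figures
 * confirmed by the payment provider (the rationale behind this decision).
 */
function settledBalanceCents(transactions: Transaction[]): number {
  return transactions
    .filter(t => t.settled)
    .reduce((sum, t) => sum + t.amountCents, 0);
}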

Notes

  • Cohesion --> Single responsibility / Interface segregation
  • Inheritance --> Liskov substitution
  • Encapsulation --> Open/closed principle
  • Loose coupling --> Dependency inversion
  • Sometimes it's more significant to test behaviour rather than implementation. In such cases, it's not necessary to write unit tests for each individual internal structure (for instance, we want to test a public function, but not its nested private functions); see the sketch after this list.
  • A TDD principle: considering how to test your code before you write it is a kind of design.
  • TDD motivation: meet requirements + code quality (not only a safety net).
  • Unit tests sit between two other processes: acceptance testing and development. Therefore, analysis & design are derived from unit testing, which is itself derived from acceptance testing (which is derived from the big picture of the behaviour we want).
  • In the industry, legacy code is often used as a slang term for difficult-to-change code that we don't understand.
  • Adding a new behaviour is subtly accompanied by changes in existing ones.
  • Refactoring: keep the same behaviour + change the structure --> improve maintainability.
  • Optimization: keep the same behaviour + change the resources involved --> performance.
  • Regression tests are useful to check that the system still works properly after changes, but you probably get this feedback pretty late.
  • Amazing note: unit tests run fast. If they don't run fast, they aren't unit tests.
  • Much legacy code work involves breaking dependencies so that change can be easier.
When you have to make a change in a legacy code base, here are steps you can follow.
  • Identify change points.
  • Find test points.
  • Break dependencies.
  • Write tests.
  • Make changes and refactor.
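
A hedged sketch of the "test behaviour, not implementation" note above (hypothetical names): only the public method is exercised; the private helper is covered indirectly.

class PriceCalculator {
  totalCents(items: { priceCents: number; qty: number }[]): number {
    return items.reduce((sum, item) => sum + this.lineTotal(item), 0);
  }
  private lineTotal(item: { priceCents: number; qty: number }): number {
    return item.priceCents * item.qty;
  }
}

// Behaviour-level check; no separate test targets lineTotal().
const calc = new PriceCalculator();
console.assert(calc.totalCents([{ priceCents: 250, qty: 2 }]) === 500);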

 

Design methods

  • Two big categories of design methodology: formal design and systematic design.
  • Systematic design classification:
    • Level-oriented design: focuses on functions/features.
      • Top-down (decomposition): best suited when the problem and its environment are well defined.
      • Bottom-up (composition): when the problem is ill-defined, the approach should mainly be bottom-up or mixed.
    • Data-flow-oriented design (structured design): (e.g. SADT) focuses on an input-output transformation, which is partitioned into input-output sub-transformations, etc. Converts the outcome of structured analysis (e.g. DFDs) into a structured design. Two conversion methods:
      • Transform analysis: a simple transformation input => transform => output.
      • Transaction analysis: an event triggers many dependent or independent transformations.
    • Data-structure-oriented design: (e.g. Jackson System Development) a special focus on inputs and outputs as hierarchical data. The outcome is a design defined by a set of collaborating data structures.
    • Object-oriented design: focuses on units (objects). Each unit couples operations to data (modular design). Often begins with a bottom-up approach (then uses decomposition to reach modular units).


Appendix (personal thoughts, might be completely useless):
- Level: a stage of a construction or deconstruction process (e.g. levels of abstract thinking). Stage N may or may not know about stages N+1 and N-1.
- Hierarchy: a relationship where the actors involved know about each other (e.g. family-member relations are hierarchical).
- Transaction: a set of operations/decisions that must occur to lead to an "agreement" between 2 or more "actors".
- Object vs data structure:
    * First, they are not on the same level of comparison.
    * The object view tries to depict the real world directly, in terms of entities that collaborate.
      Each entity is defined by who it is (identity) and what it can do (abilities).
    * The data-structure view assumes the existence of data as an abstract thing (hence not mapped
      directly to the real world); it then tries to find how the data can be shaped following a specific law or pattern.


Domain driven design

Timeline: (diagram not reproduced in these notes)

- Tackling complexity in the heart of software (domain complexity, not the technical one)
- Engage with domain experts (use their terms, culture, concepts ...)
- Hold separate workshops (spread over time) with domain experts. It's preferable to involve more than one software designer in the discussions (typically two: a solution designer and an architect)
- Split the domain into subdomains; define boundaries and connections
- Focus on the subdomains involved in the scope, and shape a first high-level model for each.
- Unify all communication means (business, technical, specs, design, code, tests ...) under (one) ubiquitous language per bounded context
- Each subdomain should be bound to a context, so that names, entities, classes, code, schema ... don't expand into / overlap with other subdomains' bounded contexts
- A subdomain is a problem-domain concept; its bounded context is a solution-domain concept.
- The shared kernel is where shared concepts/entities/code live (cross-cutting concerns)

- Identify objects in a bounded context:
  • Entities: have identity; mutable or read-only.
  • Values: no identity, immutable, no side effects.
  • Domain services: not part of an Entity/Value; consolidate entities and values; stateless.
- An entity might have a complex identity -e.g. inferring the entity from secret information-
- Entities should have minimal responsibility and delegate business logic to value objects or services.
- Avoid equality on entities (it's not always trivial what to check: identifier or properties).
- I see two ways for Entity-Value collaboration (a small sketch follows):
  • Entity wraps values: entities are the orchestrators/controllers.
  • Value wraps entities: entities are hidden at a lower level.
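
A hedged TypeScript sketch of the Entity/Value distinction (hypothetical domain): the value object is immutable and compared by content; the entity is identified by its id and delegates logic to values.

// Value object: no identity, immutable, equality by content.
class Money {
  constructor(readonly amountCents: number, readonly currency: string) {}
  add(other: Money): Money {
    if (other.currency !== this.currency) throw new Error("currency mismatch");
    return new Money(this.amountCents + other.amountCents, this.currency);
  }
  equals(other: Money): boolean {
    return this.amountCents === other.amountCents && this.currency === other.currency;
  }
}

// Entity: identified by its id; mutable state; business logic delegated to values.
class Account {
  private balance = new Money(0, "EUR");
  constructor(readonly id: string) {}
  deposit(amount: Money): void { this.balance = this.balance.add(amount); }
  currentBalance(): Money { return this.balance; }
}
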
- To reduce object relationships (complexity): classify objects into clusters. Each cluster has a root and children. Clusters are related through their roots.
In DDD, these clusters are called "Aggregates".
- Aggregates are consistency boundaries that enforce invariants (sketched below).
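
A hedged sketch of an aggregate root enforcing an invariant (hypothetical rule: an order has at most 10 lines); children are only reachable through the root, which keeps the cluster consistent.

class OrderLine {
  constructor(readonly sku: string, readonly qty: number) {}
}

class Order {
  private lines: OrderLine[] = [];
  constructor(readonly id: string) {}
  addLine(sku: string, qty: number): void {
    if (this.lines.length >= 10) throw new Error("an order cannot exceed 10 lines");
    this.lines.push(new OrderLine(sku, qty));
  }
  lineCount(): number { return this.lines.length; }
}
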
- Repository: represents all objects of a certain type as a conceptual set. It acts like a collection, except with more elaborate querying capabilities; e.g. if Account is an aggregate root, then:

interface AccountRepository {
  add(account: Account): void;
  set(account: Account): void;      // replace/update an existing account
  remove(account: Account): void;
  query(criterion: unknown): Account[];
}
- Repositories focus on aggregate roots
- An aggregate is either:
  • Persistence-ignorant: in-memory consistency only (cluster state)
  • Persistence-aware: in-memory and database consistency. The point here is to delegate database consistency to a Repository.
- Events: domain events (communicated inside a bounded context) vs application events (cross bounded contexts); a small sketch follows.
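
A hedged sketch of the two kinds of events (hypothetical names): the domain event is consumed inside its bounded context, while the application event is published across contexts.

interface DomainEvent { occurredAt: Date; }

// Consumed within the Sales context only.
class OrderPlaced implements DomainEvent {
  constructor(readonly orderId: string, readonly occurredAt: Date = new Date()) {}
}

// Crosses bounded contexts, e.g. from Sales to Shipping.
interface ApplicationEvent { contextOfOrigin: string; payload: unknown; }

const orderPlacedForShipping: ApplicationEvent = {
  contextOfOrigin: "sales",
  payload: { orderId: "o-42" },
};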
