Engineer in the white spaces
A system consists of interdependent programs. We call the arrangement of these programs and their relationships "architecture". When we diagram these systems, we often represent individual programs or servers as simplistic little rectangles, connected by arrows.
One little arrow might mean, "Synchronous request/reply using SOAP-XML over HTTP." That's quite a lot of information for one glyph to carry. There's not usually enough room to write all that, so we label the arrow with either "XML over HTTP"---from an internal perspective---or "SKU Lookup"---for the external perspective.
That arrow bridging programs looks like a direct contact, but it isn't. The white space between the boxes is filled with hardware and software components. This substrate may contain:
- Network interface cards
- Network switches
- Firewalls
- IDS and IPS
- Message queues or brokers
- XML transformation engines
- FTP servers
- "Landing zone" tables
- Metro-area SoNET rings
- MPLS gateways
- Trunk lines
- Oceans
- Cable-finding fishing trawlers
There will always be four or five computers between program A and B, running their software for packet switching, traffic analysis, routing, threat analysis, and so on. As the architect bridging between those programs, you must consider this substrate.
I saw one arrow labeled "Fulfillment". One server was inside my client's company, the other server was in a different one. That arrow, critical to customer satisfaction, unpacked to a chain of events that resembled a game of "Mousetrap" more than a single interface. Messages went to message brokers that dumped to files, which were picked up by a periodic FTP job, and so on. That one "interface" had more than twenty steps.
It's essential to understand that static and dynamic loads that arrow must carry. Instead of just "SOAP-XML over HTTP", that one little arrow should also say, "Expect one query per HTTP request and send back one response per HTTP reply. Expect up to 100 requests per second, and deliver responses in less than 250 milliseconds 99.999% of the time."
There's more we need to know about that arrow.
- What if the caller hits it too often? Should the receiver drop requests on the floor, refuse politely, or make the best effort possible?
- What should the caller do when replies take more than 250 milliseconds? Should it retry the call? Should it wait until later, or assume the receiver has failed and move on without that function?
- What happens when the caller sends a request with version 1.0 of the protocol and gets back a reply in version 1.1? What if it gets back some HTML instead of XML? Or an MP3 file instead of XML?
- What happens when one end of the interface disappears for a while?
Answering these questions is the essence of engineering the white spaces.