Design for Scalability
Common Techniques
Server Farm (Real-Time Access)
- If there is a large number of independent (potentially concurrent) requests, you can use a server farm, which is basically a set of identically configured machines fronted by a load balancer.
- The application itself needs to be stateless so that requests can be dispatched purely based on load conditions and not on other factors (see the sketch after this list).
- This strategy is even more effective when combined with cloud computing, since adding more VM instances to the farm is just an API call.
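A minimal sketch of the dispatch side, assuming a hypothetical list of server addresses; because the application is stateless, any machine can serve any request and rotation alone is enough:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin load balancer sketch over a farm of stateless servers.
public class RoundRobinBalancer {
    private final List<String> servers;            // host:port of each farm machine
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Pick the next server purely by rotation; no session affinity is
    // needed because the servers hold no per-request state.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```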
Data Partitioning
- Spread your data across multiple databases so that the data-access workload can be distributed across multiple servers.
- By nature, data is stateful, so there must be a deterministic mechanism to dispatch each data request to the server that hosts the data (see the sketch after this list).
- The data-partitioning mechanism also needs to take the data access pattern into consideration. Data that needs to be accessed together should stay on the same server. A more sophisticated approach can migrate data continuously as the access pattern shifts.
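A minimal sketch of deterministic dispatch, assuming a hypothetical list of shard URLs:

```java
import java.util.List;

// Hash-based partitioning sketch: a deterministic mapping from a record
// key to the database server that hosts it.
public class Partitioner {
    private final List<String> shards;   // e.g. JDBC URLs, one per database server

    public Partitioner(List<String> shards) {
        this.shards = shards;
    }

    // The same key always maps to the same shard, so any node can route
    // a data request without extra lookup state.
    public String shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }
}
```

A simple modulo scheme like this reshuffles most keys when a server is added or removed; consistent hashing is the usual way to reduce that migration cost.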
Map/Reduce (Batch Parallel Processing)
- The algorithm itself needs to be parallelizable. This usually means the steps of execution should be relatively independent of each other.
- Google's Map/Reduce is a good framework for this model; Hadoop is an open-source Java implementation. The sketch after this list shows the model itself in plain Java.
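A word-count sketch of the model using plain Java streams (not the Hadoop API): the map step emits results per line independently, and the reduce step merges them per key, which is exactly why both phases parallelize:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Word count in the map/reduce style: map emits (word, 1) pairs,
// reduce sums the counts per word.
public class WordCount {
    public static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()                                // map phase runs per line
                .flatMap(line -> Arrays.stream(line.split("\\s+")))  // emit one entry per word
                .collect(Collectors.groupingBy(w -> w,               // reduce phase groups by word
                        Collectors.counting()));                     // and sums the counts
    }

    public static void main(String[] args) {
        // counts: be=2, to=2, or=1, not=1 (map iteration order unspecified)
        System.out.println(count(List.of("to be or not to be")));
    }
}
```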
Content Delivery Network (Static Cache)
- This is common for static media content. The idea is to create many copies of the content, distributed geographically across servers.
- Each user request is routed to the server replica in closest proximity (sketched below).
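A minimal routing sketch, assuming latency measurements per replica are available; real CDNs typically make this decision through DNS or anycast rather than application code:

```java
import java.util.Map;

// Pick the replica "closest" to the user, here approximated by the
// smallest measured network latency in milliseconds.
public class NearestReplica {
    public static String route(Map<String, Double> latencyMsByReplica) {
        return latencyMsByReplica.entrySet().stream()
                .min(Map.Entry.comparingByValue())   // lowest latency wins
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no replicas"));
    }
}
```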
Cache Engine (Dynamic Cache)
- This is typically implemented as a lookup cache: check the cache before doing the expensive computation or data access, as in the sketch after this list.
- Memcached and EHCache are two popular caching packages.
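A minimal lookup-cache sketch illustrating the idea (not the Memcached or EHCache APIs):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Lookup cache: compute and store a value only on a cache miss.
public class LookupCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // expensive computation or DB lookup

    public LookupCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent runs the loader at most once per missing key,
        // even under concurrent access.
        return cache.computeIfAbsent(key, loader);
    }
}
```

A production cache also needs eviction and expiry policies, which the packages above provide.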
Resource Pool
- DB sessions and TCP connections are expensive to create, so reuse them across multiple requests (see the pool sketch after this list).
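A minimal pool sketch, generic over the resource type; production code would normally use an existing connection-pool library rather than rolling its own:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Resource pool: expensive objects are created once up front and
// recycled across requests instead of being re-created per request.
public class Pool<T> {
    private final BlockingQueue<T> idle = new LinkedBlockingQueue<>();

    public Pool(Supplier<T> factory, int size) {
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());     // pay the creation cost once
        }
    }

    public T acquire() throws InterruptedException {
        return idle.take();              // blocks until a resource is free
    }

    public void release(T resource) {
        idle.add(resource);              // return for reuse by the next request
    }
}
```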
Asynchronous Processing
- A long-running service call is better handled using an asynchronous processing model. This is typically done in one of two ways: callback and polling (both sketched after this list).
- In callback mode, the caller needs to provide a response handler when making the call. Some kind of coordination may be required between the calling thread and the callback thread.
- In polling mode, the call itself returns a "future" handle immediately. The caller can go off and do other things, and later poll the "future" handle to see if the response is ready. In this model the caller creates no extra callback thread, so no extra thread coordination is needed.
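Both styles can be sketched with Java's CompletableFuture; callService here is a hypothetical stand-in for the remote call. In both cases a pool thread runs the call itself; the difference is how the caller learns about the result:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncStyles {
    // Hypothetical stand-in for an expensive remote service call;
    // supplyAsync runs it on a background pool thread.
    static CompletableFuture<String> callService() {
        return CompletableFuture.supplyAsync(() -> "response");
    }

    public static void main(String[] args) throws Exception {
        // Callback mode: register a response handler up front. It runs on
        // the callback thread, so shared state needs coordination.
        CompletableFuture<Void> done = callService()
                .thenAccept(resp -> System.out.println("callback got " + resp));

        // Polling mode: keep the "future" handle, do other work, and
        // check later whether the response is ready.
        CompletableFuture<String> future = callService();
        while (!future.isDone()) {
            Thread.sleep(10);   // ... do other useful work here ...
        }
        System.out.println("poll got " + future.get());

        done.join();            // for the demo, wait for the callback output
    }
}
```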
Implementation Design Considerations
- Use efficient algorithms and data structures. Analyze the time (CPU) and space (memory) complexity of logic that executes frequently (i.e., hot spots). For example, carefully decide whether a hash table or a binary tree should be used for lookup.
- Analyze your concurrent access scenarios where multiple threads access shared data. Carefully analyze the synchronization points and make sure the locking is fine-grained enough. Also watch for any possibility of deadlock and how you would detect or prevent it. Consider using lock-free data structures (e.g., Java's concurrent package provides several; see the sketch after this list).
- Analyze the memory usage patterns in your logic. Determine where new objects are created and where they become eligible for garbage collection. Be aware of creating a lot of short-lived temporary objects, as they put a high load on the garbage collector.
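A minimal sketch of the lock-free structures mentioned above, from java.util.concurrent:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Lock-free structures: both classes use compare-and-swap internally,
// so threads never block on a lock and these operations cannot deadlock.
public class LockFreeExample {
    private static final AtomicLong counter = new AtomicLong();
    private static final Queue<String> queue = new ConcurrentLinkedQueue<>();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            counter.incrementAndGet();   // atomic, no synchronized block needed
            queue.offer(Thread.currentThread().getName());
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.get() + " items: " + queue);
    }
}
```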