2.2 Memory

The memory of a modern computer has a vast capacity, measured in billions of uniquely addressable locations. Memory can store both data and code. It therefore seems to make sense to connect the ALU directly to memory, so that the ALU reads its input values from memory and writes its results straight back.

There are two problems with this approach.

2.2.1 Physical limitations

Because the capacity of memory is so large, it cannot be located inside the processor. For those who want to know why, the limit comes from the number of transistors that fit on a single silicon die and, more importantly, from the routing resources available on that die.

As a result, memory is implemented as a separate module that is connected to a processor by copper traces on a printed circuit board. Modern PCs also have a north bridge that interfaces memory to the processor.

The physical separation of the processor and memory imposes certain physical limitations. Inductance and capacitance are unavoidable byproducts of any copper trace on a circuit board. These properties delay the propagation of a signal change from one end of a copper trace to the other.

As a result, there is a limit on how quickly we can move data to and from memory.

Here is a side note on terminology. Modern PC memory is DDR2 (double data rate 2), which is effectively quad data rate: for each cycle of the base memory clock, there are up to four memory transfers. Memory is rated in two equivalent ways. In the DDR2 designation, the number after "DDR2" gives the number of memory transfers per second, in millions. For example, DDR2-1066 means the memory module is capable of 1066 million transfers per second.

The other rating is the PC2 designation used to describe memory modules. Because a memory module moves 8 bytes per transfer, the PC2 number is roughly eight times the DDR2 number. A DDR2-1066 module is also known as a PC2-8500 module because it can transfer about 8500 million (8.5 billion) bytes per second.
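The relationship between the two ratings is simple arithmetic, sketched below. The function and constant names are made up for this illustration; only the 8 bytes per transfer and the DDR2-1066 figure come from the text above.

```python
# Hypothetical helper for illustration; not part of any library.
BYTES_PER_TRANSFER = 8  # a memory module moves 8 bytes per transfer


def peak_megabytes_per_second(ddr2_rating):
    """Peak transfer rate in millions of bytes per second.

    ddr2_rating is the number after "DDR2-", i.e. millions of
    transfers per second under ideal (burst) conditions.
    """
    return ddr2_rating * BYTES_PER_TRANSFER


# DDR2-1066: 1066 million transfers/s * 8 bytes = 8528 million bytes/s,
# which the PC2 name rounds to 8500.
print(peak_megabytes_per_second(1066))  # 8528
```

The exact product is 8528, so the PC2-8500 label is a rounded figure and not an exact multiple.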

Note that these figures assume the best possible conditions (reading consecutive locations in a burst). When the memory access pattern is random, the transfer rate can drop significantly, back to the base clock rate of the memory. A DDR2-1066 module then manages at most about 266 million transfers per second. Once address latching and other overhead are added, only a fraction of those 266 million transfers per second is actually achieved.

A processor, on the other hand, is rated by the frequency of its core, typically between 2GHz and 3GHz. Simple instructions such as add and subtract take a single clock cycle. This means the ALU can potentially perform 2 to 3 billion additions per second.

What this all means is that memory is roughly ten times too slow to keep the ALU fed. This is the first problem with using only memory to supply values to the ALU and to store its results.
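The "roughly ten times" figure can be checked with a back-of-the-envelope calculation. The sketch below uses hypothetical function names and assumes the numbers quoted above: a 2GHz to 3GHz core doing one ALU operation per clock, and a DDR2-1066 module falling back to its base clock (one quarter of its rating) on random accesses.

```python
# Illustrative only: the names are invented for this sketch.

def base_clock_mhz(ddr2_rating):
    """Base memory clock in MHz, assuming four transfers per base clock cycle."""
    return ddr2_rating / 4


def alu_to_memory_ratio(core_ghz, ddr2_rating):
    """How many single-cycle ALU operations fit into one random memory transfer."""
    alu_ops_per_second = core_ghz * 1e9
    transfers_per_second = base_clock_mhz(ddr2_rating) * 1e6
    return alu_ops_per_second / transfers_per_second


for core_ghz in (2.0, 3.0):
    ratio = alu_to_memory_ratio(core_ghz, 1066)
    print(f"{core_ghz} GHz core vs DDR2-1066: about {ratio:.1f}x")
```

For these inputs the ratio comes out at about 7.5 for a 2GHz core and about 11.3 for a 3GHz core. This is still optimistic for memory, because the sketch ignores address latching and other overhead.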

2.2.2 Opcode size

If all operations connect memory to the ALU, then each instruction must be able to specify any one of the 4 billion locations in memory for the first value, another for the second value, and a third for where to store the result.

32 bits (binary digits) are needed to specify one of 4 billion locations. An add instruction therefore needs 96 bits just to say where to get its two input values and where to store the result, so instructions grow large. This, in turn, taxes performance, because more memory cycles are spent fetching instructions. It also makes programs (code) much bigger.
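The bit counts above follow directly from the number of addressable locations. Here is a minimal sketch, with hypothetical function names, that assumes a three-address instruction (two sources and one destination) as described in this section.

```python
# Illustrative helpers; the names are assumptions for this sketch.

def address_bits(locations):
    """Smallest number of bits that can name every one of `locations` addresses."""
    return (locations - 1).bit_length()


def operand_bits(locations, operands=3):
    """Bits spent on addresses alone when every operand is a memory location."""
    return operands * address_bits(locations)


FOUR_BILLION = 4_000_000_000

print(address_bits(FOUR_BILLION))  # 32
print(operand_bits(FOUR_BILLION))  # 96
```

The 96 bits cover only the three addresses. The bits that say which operation to perform come on top of that.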