Artificial Intelligence Compendium: AI25031 Neural Networks. V01 071125

🧠 1. What “Scattering” Means in Neural Networks

In a neural network, information isn’t stored in a single, localized spot (like a database cell).

Instead, it’s distributed across many parameters (weights) and across the activity of many neurons, a property known as distributed representation.

• Each neuron contributes partially to representing many different inputs.

• Each piece of input data (like a pixel or word) affects many neurons across multiple layers.

• So, information becomes “scattered” (distributed) across the whole network.

This distributed encoding makes neural networks:

• Robust: small damage to the weights degrades performance gradually instead of erasing a specific memory.

• Expressive: features can be combined, so n neurons can represent far more than n distinct patterns.
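
A toy sketch of the robustness claim, assuming each stored item is represented as a fixed pattern over a set of units. The patterns here are random +1/-1 vectors chosen only for illustration (no network is trained), and the helper names `decode` and `damage` are hypothetical. Zeroing a few units leaves every distributed pattern recognizable by nearest-match decoding, while in a one-unit-per-item (localist) code, losing one unit erases one item entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_units = 10, 64

# Distributed code: each item is a random +1/-1 pattern spread over all 64 units.
codes = rng.choice([-1.0, 1.0], size=(n_items, n_units))

def decode(pattern, codebook):
    """Return the index of the stored pattern with the highest dot product."""
    return int(np.argmax(codebook @ pattern))

def damage(codebook, dead_units):
    """Simulate damage by zeroing the given units in every stored pattern."""
    damaged = codebook.copy()
    damaged[:, dead_units] = 0.0
    return damaged

# Knock out 8 of the 64 units, then try to recognize every item.
dead = rng.choice(n_units, size=8, replace=False)
damaged_codes = damage(codes, dead)
distributed_ok = [decode(damaged_codes[i], codes) == i for i in range(n_items)]

# Localist code: one dedicated unit per item (one-hot). Knock out unit 3.
onehot = np.eye(n_items)
damaged_onehot = damage(onehot, [3])
localist_ok = [decode(damaged_onehot[i], onehot) == i for i in range(n_items)]

print("distributed:", sum(distributed_ok), "of", n_items, "items recovered")
print("localist:   ", sum(localist_ok), "of", n_items, "items recovered")
```

Losing 8 of 64 units lowers every pattern’s match score, but the correct item still scores highest, which is the gradual degradation the “Robust” point describes.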

⚙️ 2. How Data Becomes Distributed

During training, data is not stored directly; instead, it shapes the weights through gradient updates computed by backpropagation.

Let’s say you train a neural network on images of cats and dogs:

1. The image (as pixel data) is fed into the first layer.

2. Each neuron computes a weighted sum of its inputs → applies an activation function → passes the result to the next layer.

3. The output is compared to the correct label (“cat” or “dog”).

4. The error is backpropagated to compute a gradient for every weight, and each weight is nudged in the direction that reduces the error.
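
The four steps above can be sketched in NumPy as repeated training steps of a tiny two-layer network. The dataset is synthetic (random 16-pixel “images” with a made-up labelling rule standing in for cat vs dog), and the layer sizes, learning rate, and names such as `train_step` are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for "cat vs dog": 200 random 16-pixel "images", labelled
# 1 ("dog") when the first 8 pixels are brighter on average than the last 8.
X = rng.normal(size=(200, 16))
y = (X[:, :8].mean(axis=1) > X[:, 8:].mean(axis=1)).astype(float)

# Two-layer network: 16 inputs -> 8 tanh hidden units -> 1 sigmoid output.
params = {
    "W1": rng.normal(scale=0.5, size=(16, 8)), "b1": np.zeros(8),
    "W2": rng.normal(scale=0.5, size=(8, 1)),  "b2": np.zeros(1),
}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(p, X, y, lr=0.5):
    # Steps 1-2: forward pass (weighted sum -> activation -> next layer).
    h = np.tanh(X @ p["W1"] + p["b1"])
    prob = sigmoid(h @ p["W2"] + p["b2"]).ravel()
    # Step 3: compare predictions with the labels (binary cross-entropy).
    eps = 1e-12
    loss = -np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps))
    # Step 4: backpropagate the error with the chain rule...
    dz2 = (prob - y)[:, None] / len(y)        # dLoss/d(output pre-activation)
    dz1 = (dz2 @ p["W2"].T) * (1 - h ** 2)    # back through the tanh layer
    grads = {"W2": h.T @ dz2, "b2": dz2.sum(axis=0),
             "W1": X.T @ dz1, "b1": dz1.sum(axis=0)}
    # ...then nudge every weight against its gradient (plain gradient descent).
    for name, g in grads.items():
        p[name] -= lr * g
    return loss

losses = [train_step(params, X, y) for _ in range(300)]
print(f"loss at step 1: {losses[0]:.3f}, at step 300: {losses[-1]:.3f}")
```

Nothing in `params` stores an image; the examples leave their mark only through the accumulated gradient updates.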

Each update slightly changes thousands or millions of weights, meaning:

A single training example “touches” nearly every part of the network in some way.

Thus, knowledge about cats and dogs is scattered across many weight matrices, not stored in one location.
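
To make “touches nearly every part” concrete, the sketch below backpropagates a single example through a randomly initialized 784→128→10 tanh network and counts how many weights receive a nonzero gradient. The sizes, the random input, and the function name `fraction_touched` are hypothetical; biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: a 28x28 image flattened to 784 pixels, 128 hidden units, 10 classes.
n_in, n_hidden, n_out = 784, 128, 10
W1 = rng.normal(scale=0.05, size=(n_in, n_hidden))
W2 = rng.normal(scale=0.05, size=(n_hidden, n_out))

def fraction_touched(x, label):
    """Backpropagate ONE example's cross-entropy loss through a tanh network and
    return the fraction of weights whose gradient is nonzero."""
    h = np.tanh(x @ W1)                     # hidden activations
    logits = h @ W2
    prob = np.exp(logits - logits.max())
    prob /= prob.sum()                      # softmax over the 10 classes
    dlogits = prob.copy()
    dlogits[label] -= 1.0                   # d(cross-entropy)/d(logits)
    dW2 = np.outer(h, dlogits)
    dz1 = (W2 @ dlogits) * (1.0 - h ** 2)   # chain rule back through tanh
    dW1 = np.outer(x, dz1)
    touched = np.count_nonzero(dW1) + np.count_nonzero(dW2)
    return touched / (dW1.size + dW2.size)

x = rng.normal(size=n_in)                   # one random stand-in "image"
frac = fraction_touched(x, label=3)
print(f"{frac:.1%} of the {W1.size + W2.size} weights get a nonzero update")
```

With tanh, every hidden unit passes some gradient back, so essentially all weights move. With ReLU activations, hidden units that output 0 for this input pass back zero gradient, so their weights are left unchanged by this particular example, which is why “nearly” every part is the accurate phrasing.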

💾 3. The “Underlying Medium”

The “underlying medium” can mean two things:

A. Software Level (Model Parameters)

• The “medium” is the set of tensors holding the weights and activations.

• Data is scattered mathematically — every connection (weight) holds a tiny fraction of learned information.
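
At the software level, the “medium” is nothing more than arrays of floating-point numbers. This sketch builds the weight and bias tensors of a hypothetical 784→128→10 network in float32 and counts how many numbers (and bytes) they hold; the layer sizes and the helper `memory_gb` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes for a small image classifier: 784 -> 128 -> 10.
layer_sizes = [784, 128, 10]

# The whole "medium": one weight matrix and one bias vector per layer, in float32.
params = {}
for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:]), start=1):
    params[f"W{i}"] = rng.normal(scale=0.01, size=(n_in, n_out)).astype(np.float32)
    params[f"b{i}"] = np.zeros(n_out, dtype=np.float32)

n_params = sum(p.size for p in params.values())
n_bytes = sum(p.nbytes for p in params.values())
print(n_params, "parameters,", n_bytes, "bytes")

def memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

print(memory_gb(175e9, 2), "GB for 175 billion 16-bit weights")
```

The same arithmetic leads to the hardware picture: a model with 175 billion parameters (the published size of GPT-3) stored in 16-bit floats needs about 350 GB for its weights alone, before activations or optimizer state, which is more than a single GPU’s memory and is why the weights are spread across many chips.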

B. Hardware Level (Physical Medium)

• On GPUs/TPUs, these weights and activations live as floating-point numbers in the accelerators’ on-board memory (e.g., GPU VRAM or HBM).

• During computation, data is sharded and parallelized across multiple processing cores.

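• Example: Large models like GPT are split across many GPU chips: tensor parallelism divides individual matrix operations among chips, while data parallelism gives each chip a copy of the model and a different slice of the training batch.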

• So, even physically, data and computations are distributed across the underlying silicon.
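
Tensor parallelism can be simulated on one machine by splitting a weight matrix into shards, with NumPy arrays standing in for separate GPUs (`n_devices` and the matrix sizes are hypothetical). Splitting by columns lets each device produce a slice of the output that is concatenated; splitting by rows lets each device produce a partial sum that is added (the role an all-reduce plays across real GPUs). Both reproduce the unsplit matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices = 4                        # hypothetical number of GPUs

x = rng.normal(size=(8, 512))        # a batch of 8 input activation vectors
W = rng.normal(size=(512, 1024))     # one layer's weight matrix
y_single = x @ W                     # reference: the whole layer on one device

# Column-wise split: each device holds 1024 / 4 = 256 output columns of W,
# computes its slice of the output, and the slices are concatenated.
W_col_shards = np.split(W, n_devices, axis=1)
y_colsplit = np.concatenate([x @ shard for shard in W_col_shards], axis=1)

# Row-wise split: each device holds 128 input rows of W (and the matching
# 128 input features of x); the partial products are summed.
x_shards = np.split(x, n_devices, axis=1)
W_row_shards = np.split(W, n_devices, axis=0)
y_rowsplit = sum(xs @ ws for xs, ws in zip(x_shards, W_row_shards))

print(np.allclose(y_colsplit, y_single), np.allclose(y_rowsplit, y_single))  # True True
```

Real systems add the communication between devices (gathering slices or summing partial results), which is where much of the engineering cost of large-model training lies.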

🕸️ 4. Analogy

Think of a hologram:

• Each piece of the hologram contains information about the whole image.

• If you break it, each fragment still shows a blurry version of the full image.

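That’s roughly how a neural network’s internal representations work: information is distributed across the whole network, so losing part of it degrades the result gradually rather than erasing it.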
