I toyed with the idea of vectorising this, but it seemed too hard.
In essence, SAL (System Abstraction Layer) starts the machine with one processor running (the boot processor) and the others asleep.
The boot processor jumps to _start in head.S, which checks the variable task_for_booting_cpu to find out whether it is running on the boot processor.
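The real check in head.S is IA-64 assembly. The C sketch below shows only the decision it makes. It assumes, as the description implies, that task_for_booting_cpu is NULL on the first pass and is pointed at a secondary's idle task before that secondary is woken. struct task, enum start_path and choose_start_path() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task structure. */
struct task { int cpu; };

/*
 * Assumed NULL when the machine first boots; the boot processor points
 * it at a secondary's idle task before waking that secondary.
 */
static struct task *task_for_booting_cpu = NULL;

enum start_path { PATH_BOOT, PATH_SECONDARY };

/* Which way should the CPU currently executing _start go? */
static enum start_path choose_start_path(void)
{
    if (task_for_booting_cpu == NULL)
        return PATH_BOOT;        /* boot processor: continue into platform setup */
    return PATH_SECONDARY;       /* woken secondary: head for start_secondary */
}
```

The same code path serves every processor. Only the contents of one shared variable tell a CPU which role it has.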
After that, it jumps into platform-specific code and gets the machine ready to go. Eventually we fall into smp_init(), which starts the other processors. For each of the other processors in the system we call do_boot_cpu(), which sends a wakeup IPI (inter-processor interrupt) to that processor.
The other processor wakes up and again jumps into _start. This time, however, task_for_booting_cpu is set to its idle thread, so it knows it is not the boot processor. It jumps into start_secondary, which follows much the same path but skips the platform setup. Eventually it calls smp_callin to signal to the boot processor that it is alive and sitting in its idle thread.
The boot processor waits a few seconds for each CPU to check in as alive before assuming the worst and moving on. Once all the CPUs are online, the system is pretty much booted.
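The handshake between smp_init, do_boot_cpu, start_secondary and smp_callin can be simulated in userspace C, with POSIX threads standing in for processors. This is a sketch under stated assumptions, not the kernel's code. Creating a thread replaces the wakeup IPI. idle_tasks, cpu_callin_map, cpu_threads, NR_CPUS and CALLIN_TIMEOUT_MS are illustrative names. The 5000 ms timeout is just one reading of "a few seconds".

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

#define NR_CPUS 4                 /* hypothetical machine size */
#define CALLIN_TIMEOUT_MS 5000    /* assumed value for "a few seconds" */

struct task { int cpu; };         /* hypothetical stand-in for an idle task */

static struct task idle_tasks[NR_CPUS];            /* one idle task per CPU */
static struct task *_Atomic task_for_booting_cpu;   /* NULL until a secondary is woken */
static atomic_int cpu_callin_map[NR_CPUS];          /* set to 1 by smp_callin() */
static pthread_t cpu_threads[NR_CPUS];              /* threads simulating CPUs */

/* Secondary: flag back to the boot processor that we are alive. */
static void smp_callin(int cpu)
{
    atomic_store(&cpu_callin_map[cpu], 1);
}

/* Secondary entry: find our idle task, skip platform setup, check in. */
static void *start_secondary(void *unused)
{
    (void)unused;
    struct task *idle = atomic_load(&task_for_booting_cpu);
    smp_callin(idle->cpu);
    return NULL;                  /* a real CPU would now sit in its idle loop */
}

/* Boot processor: hand over the idle task, "send the IPI", wait for callin. */
static int do_boot_cpu(int cpu)
{
    idle_tasks[cpu].cpu = cpu;
    atomic_store(&task_for_booting_cpu, &idle_tasks[cpu]);

    /* Creating a thread stands in for the wakeup IPI. */
    if (pthread_create(&cpu_threads[cpu], NULL, start_secondary, NULL) != 0)
        return -1;

    /* Poll once per millisecond for roughly CALLIN_TIMEOUT_MS. */
    struct timespec one_ms = { 0, 1000000 };
    for (int waited = 0; waited < CALLIN_TIMEOUT_MS; waited++) {
        if (atomic_load(&cpu_callin_map[cpu]))
            return 0;             /* the CPU checked in */
        nanosleep(&one_ms, NULL);
    }
    return -1;                    /* assume the worst and move on */
}

/* Boot processor: wake each other CPU in turn; returns the number online. */
static int smp_init(void)
{
    int online = 1;               /* the boot processor itself, CPU 0 */
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        if (do_boot_cpu(cpu) == 0)
            online++;
    return online;
}
```

In this sketch the boot processor waits for each CPU to call in before pointing task_for_booting_cpu at the next idle task. That is why one shared variable is enough to tell every secondary which task is its own.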