
The main issue with understanding Carry look-ahead adders is that, to find the current Carry value, you still need ALL the previous inputs. There's no ripple delay for this, since they are all inputs and are available at the same time, but there's a problem: the fan-in gets bigger and bigger the higher the bit position.
To remember how the current Carry value is formed, I just say that every Ci depends only on C0 and all the previous inputs (the input bits below position i), never on the other carries.
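Written out with the usual generate and propagate signals (G_i = a_i b_i and P_i = a_i XOR b_i, the standard textbook names, not something defined in these notes), the recurrence C_{i+1} = G_i + P_i C_i expands so that each carry contains only C_0 and input terms:

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every line is one OR of AND terms, so it is two gate levels deep no matter the position, but each new line has one more term and a longer last term.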
The fan-in problem comes from exactly this: since we need to reference ALL of the previous inputs, the gates for the higher carries need more and more inputs, so the carry logic becomes expensive for wide adders.
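A small software model can make the fan-in growth concrete. This is a sketch under some assumptions: the helper names (`to_bits`, `generate_propagate`, `lookahead_carry`, `ripple_carries`) are hypothetical, bits are little-endian (index 0 is the least significant), propagate is taken as XOR, and each AND term or the final OR is counted as a single gate. It computes Ci from the expanded sum of products, checks it against plain ripple-carry, and reports how many inputs the widest AND gate and the OR gate need:

```python
def to_bits(x, n):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(x >> k) & 1 for k in range(n)]

def generate_propagate(a_bits, b_bits):
    """Per-bit generate G_j = a_j AND b_j and propagate P_j = a_j XOR b_j."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    return g, p

def lookahead_carry(i, g, p, c0):
    """C_i from the expanded sum of products: uses only c0 and the inputs,
    never an earlier carry. Also returns the widest AND fan-in and the OR fan-in."""
    # One AND term per j < i: G_j AND P_{j+1} AND ... AND P_{i-1}
    terms = [[g[j]] + p[j + 1:i] for j in range(i)]
    # Final AND term: P_0 AND ... AND P_{i-1} AND c0
    terms.append(p[:i] + [c0])
    carry = int(any(all(t) for t in terms))  # the OR of all AND terms
    return carry, max(len(t) for t in terms), len(terms)

def ripple_carries(g, p, c0):
    """Reference: C_{j+1} = G_j OR (P_j AND C_j), one bit after another."""
    c = [c0]
    for gj, pj in zip(g, p):
        c.append(gj | (pj & c[-1]))
    return c

# Same carries as ripple-carry, checked exhaustively for 4-bit inputs.
N = 4
for x in range(2 ** N):
    for y in range(2 ** N):
        for c0 in (0, 1):
            g, p = generate_propagate(to_bits(x, N), to_bits(y, N))
            ref = ripple_carries(g, p, c0)
            assert all(lookahead_carry(i, g, p, c0)[0] == ref[i] for i in range(N + 1))

# Fan-in depends only on the position, so all-zero inputs are enough here.
g, p = [0] * 16, [0] * 16
for i in (1, 4, 8, 16):
    _, and_fan_in, or_fan_in = lookahead_carry(i, g, p, 0)
    print(f"C_{i}: widest AND gate has {and_fan_in} inputs, OR gate has {or_fan_in}")
```

For Ci both numbers come out as i + 1, so a 16-bit adder built this way would need 17-input gates for its last carry.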
To solve this, we group the inputs, usually 4 bits at a time, and calculate the Carry with look-ahead only within each group of 4. Then we just pass the resulting carry to the next group by means of ripple-carry (a delay applies here, once per group instead of once per bit).
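This is a mixed solution, but it works well for real-life hardware: the fan-in stays bounded by the group size, and the ripple delay only grows with the number of groups.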
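The grouped scheme can be sketched in software as well. This assumes 4-bit groups, little-endian bit order, and XOR propagate; the names `GROUP`, `cla_group` and `grouped_add` are hypothetical, and the model only checks the logic, not gate timing. Inside a group each carry is built from the group's carry-in and its own inputs; between groups the carry-out is simply handed to the next group, which is the ripple part:

```python
GROUP = 4  # bits per look-ahead group (4 is the usual choice)

def cla_group(a_bits, b_bits, c_in):
    """One look-ahead group. Every internal carry is built from c_in and the
    group's own inputs only (expanded sum of products), not rippled bit by bit.
    Returns (sum bits, carry-out)."""
    n = len(a_bits)
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    carries = []
    for i in range(n + 1):
        c = int(all(p[:i]) and c_in)             # P_{i-1} ... P_0 c_in
        for j in range(i):
            c |= int(g[j] and all(p[j + 1:i]))   # G_j P_{i-1} ... P_{j+1}
        carries.append(c)
    sums = [p[i] ^ carries[i] for i in range(n)]
    return sums, carries[n]

def grouped_add(x, y, width=16, c0=0):
    """Adds two width-bit numbers: look-ahead inside each GROUP-bit block,
    ripple-carry between blocks (carry-out of one block is the next carry-in)."""
    a = [(x >> k) & 1 for k in range(width)]
    b = [(y >> k) & 1 for k in range(width)]
    carry, result = c0, 0
    for start in range(0, width, GROUP):
        sums, carry = cla_group(a[start:start + GROUP], b[start:start + GROUP], carry)
        for k, bit in enumerate(sums):
            result |= bit << (start + k)
    return result, carry

s, c = grouped_add(0x1234, 0x0FFF)
print(hex(s), c)  # 0x2233 0
```

With width 16 there are four groups, so the carry passes through four group stages instead of sixteen bit stages, while no gate inside a group needs more than 5 inputs. A width that isn't a multiple of 4 just gets a smaller last group.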