
I use a different test scene now; think of a small Quake level with radiosity updated in real time.
Lightmap resolution is 10x10 cm, updating 1/4 of the texels per frame.
R9 280X: 6.2 ms
Titan: 15.3 ms
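
To make "1/4 of the texels per frame" concrete, here is a minimal C++ sketch of one way such a partial update could be scheduled. The round-robin pattern, `kPhases` and `texelsForFrame` are assumptions for illustration only; the post does not say how the quarter is actually chosen.

```cpp
#include <cstddef>
#include <vector>

// Assumption: the lightmap is split into 4 interleaved phases, so each
// texel is refreshed exactly once every 4 frames.
constexpr std::size_t kPhases = 4;

// Returns the indices of the texels refreshed on the given frame:
// texel i is picked when (i % kPhases) == (frame % kPhases).
std::vector<std::size_t> texelsForFrame(std::size_t texelCount, std::size_t frame) {
    std::vector<std::size_t> picked;
    for (std::size_t i = frame % kPhases; i < texelCount; i += kPhases) {
        picked.push_back(i);
    }
    return picked;
}
```

With this pattern every texel gets refreshed once within any 4 consecutive frames, so the per-frame cost is about a quarter of a full update.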

On ATI I buffer the traversal results in local memory so I can write them out in order.
On NV this is a slowdown, and it's still faster to write to random memory locations immediately.
This is the first time I've seen that picking different algorithms for different cards makes sense.
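
The two write strategies can be sketched on the CPU roughly like this. It is not the actual GPU kernel: `Result`, `writeDirect`, `writeStaged` and `groupSize` are hypothetical names, the staging vector only stands in for on-chip local memory, and sorting by destination is my assumption about what "in order" means.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical traversal result: destination texel index plus the value to store.
struct Result {
    std::uint32_t dst;
    float value;
};

// Variant A: write every result straight to its (possibly scattered) destination.
void writeDirect(const std::vector<Result>& results, std::vector<float>& lightmap) {
    for (const Result& r : results) {
        lightmap[r.dst] = r.value;
    }
}

// Variant B: stage one group of results in a small buffer, order them by
// destination, then write them out in ascending address order.
void writeStaged(const std::vector<Result>& results, std::vector<float>& lightmap,
                 std::size_t groupSize) {
    std::vector<Result> staging;
    staging.reserve(groupSize);
    for (std::size_t base = 0; base < results.size(); base += groupSize) {
        const std::size_t end = std::min(base + groupSize, results.size());
        staging.assign(results.begin() + base, results.begin() + end);
        // stable_sort keeps the original order of results that share a destination.
        std::stable_sort(staging.begin(), staging.end(),
                         [](const Result& a, const Result& b) { return a.dst < b.dst; });
        for (const Result& r : staging) {
            lightmap[r.dst] = r.value;
        }
    }
}
```

Both variants produce the same lightmap; only the order of the memory writes differs, which is the part that behaves differently on the two cards.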