Visiting Opteron and Celeron (Gladiator Fights in the Colosseum)

Llano do AVX

1

http://www.chip-architect.com/news/AMD-LLano-analysis.jpg
A few observations suggest that AMD's Llano could execute AVX instructions.

1) A reasonably large new block next to the FP register file.
2) Something that could be a new 3-way extra decoding stage in front of the FP units.
3) A large increase in the size of the reorder buffer (from 3x24 to 3x32 or 3x36).

-AVX code would be faster even if the 256 bit operations still run on
128 bit hardware, since many time slots in the FP units typically go unused.

-The AVX performance would be ultimately limited by the cache bandwidth
to/from the SSE/AVX units (32 byte/cycle versus 48 byte/cycle for Sandy
Bridge)

-The 256 bit operations would be split into independent 128 bit operations
which would explain the increase in size of the reorder buffer.

-The 3-way decode pack stage in front of the Integer units has also
grown, which suggests that something was added to the decoding units
(cache access for 2x128 bit words?)
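The cache bandwidth limit mentioned above can be put into rough numbers. This is a minimal sketch, assuming the 32 and 48 byte/cycle figures quoted in this post and a hypothetical kernel that moves three 256 bit operands per operation (two loads, one store). The function name is made up for illustration, and the model ignores how the bandwidth is split between loads and stores.

```python
AVX_BYTES = 32  # one 256 bit operand


def min_cycles_per_op(bytes_per_cycle, operands_per_op=3):
    """Lower bound on cycles per 256 bit operation when every operand
    has to travel between the L1 cache and the SSE/AVX units."""
    return operands_per_op * AVX_BYTES / bytes_per_cycle


llano = min_cycles_per_op(32)         # figure quoted for Llano
sandy_bridge = min_cycles_per_op(48)  # figure quoted for Sandy Bridge
print(llano, sandy_bridge, llano / sandy_bridge)
```

Under these assumptions a purely streaming kernel would be bandwidth-bound at 3 cycles per operation on Llano versus 2 on Sandy Bridge, regardless of how wide the FP hardware is.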

------------------------------

Some extra points:

The second level TLB units for the data cache have been doubled from
512 entries to 1024 entries.

There is extra integer logic. A good guess would be a faster version
of the Integer divider, one that can produce multiple result bits/cycle
like the ones in the Core2 and Nehalem architectures.
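To illustrate what "multiple result bits/cycle" buys, here is a minimal sketch of digit-by-digit division that retires a configurable number of quotient bits per iteration. It is a software model only: the function name and the one-iteration-per-cycle reading are assumptions, and real hardware dividers select their quotient digits differently.

```python
def divide(dividend, divisor, width=64, bits_per_step=1):
    """Unsigned digit-by-digit division.

    Returns (quotient, remainder, steps); each step produces
    `bits_per_step` quotient bits.
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")

    mask = (1 << bits_per_step) - 1
    quotient = remainder = steps = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next group of dividend bits.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & mask)
        # Select the quotient digit by repeated subtraction.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        quotient = (quotient << bits_per_step) | digit
        steps += 1
    return quotient, remainder, steps


for bits in (1, 2, 4):
    print(bits, divide(1000003, 97, bits_per_step=bits))
```

For a 64 bit dividend the model needs 64 steps at 1 bit/step, 32 at 2 bits/step and 16 at 4 bits/step, which is the kind of saving a faster divider would bring.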


2

It's not that "crippled", and certainly not by a factor of 2 (=256/128).
For example, if a SIMD FP add takes 4 clock cycles, then:

128 bit: A+B+C takes 8 clock cycles.
256 bit: A+B+C takes 9 clock cycles. (using pipelined 128 bit hardware)

128 bit: A+B+C+D takes 9 clock cycles.
256 bit: A+B+C+D takes 11 clock cycles. (using pipelined 128 bit hardware)

It all depends on how many unused time-slots there are due to the data
dependencies. A bigger bottleneck for Llano would be the L1 cache access
bandwidth: 32 bytes/cycle for Llano versus 48 bytes/cycle for Sandy Bridge.
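The cycle counts above can be reproduced with a small scheduling model. This is only a sketch under stated assumptions: a single pipelined 128 bit adder that accepts one operation per cycle, in-order issue, a 4-cycle latency, and A+B+C+D evaluated as (A+B)+(C+D). The names finish_time, sum3 and sum4 are made up for illustration.

```python
LATENCY = 4  # cycles per SIMD FP add, as assumed in the post


def finish_time(ops, halves):
    """Schedule adds on one pipelined 128 bit adder (one issue per cycle).

    ops: list of (name, source_names) in program order; sources that are
         not results of earlier ops are treated as ready at cycle 0.
    halves: 1 for 128 bit operations, 2 when each 256 bit operation is
            split into two independent 128 bit halves.
    Returns the cycle at which the last result is complete.
    """
    ready = {}      # (name, half) -> cycle at which that result is available
    next_issue = 0  # first cycle at which the adder can accept a new op
    for name, sources in ops:
        for half in range(halves):
            operands_ready = max(
                (ready.get((src, half), 0) for src in sources), default=0)
            issue = max(next_issue, operands_ready)
            ready[(name, half)] = issue + LATENCY
            next_issue = issue + 1
    return max(ready.values())


# A+B+C as a dependent chain: t = A+B, r = t+C
sum3 = [("t", ("A", "B")), ("r", ("t", "C"))]
# A+B+C+D as a tree (assumed): ab = A+B, cd = C+D, r = ab+cd
sum4 = [("ab", ("A", "B")), ("cd", ("C", "D")), ("r", ("ab", "cd"))]

for label, ops in (("A+B+C", sum3), ("A+B+C+D", sum4)):
    print(label, "128 bit:", finish_time(ops, 1),
          "256 bit:", finish_time(ops, 2))
```

The model gives 8 and 9 cycles for A+B+C, and 9 and 11 cycles for A+B+C+D. The split halves slot into cycles the adder would otherwise spend waiting on a dependency, so the 256 bit version costs far less than twice the time.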


3

OPTERON, come on, please run it through Google Translate! My Mendocino hangs when I start translating...
