GfxBench2D Client Updates for AmigaOS and Windows

Both the AmigaOS and Windows versions of the GfxBench2D client have been updated. There was a minor issue in which results for a graphics card whose name contains particular characters would fail to save. The reason is that file names cannot contain certain characters, so attempting to save a file whose name contains, say, a slash '/' character (which is used to separate directories in a path) would not work. This issue has now been fixed.
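
As an illustration of the kind of fix described above, here is a minimal sketch in C that replaces problem characters before a name is used as a file name. The function name sanitize_file_name() and the exact set of rejected characters are assumptions made for this example; they are not taken from the GfxBench2D source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters treated as unsafe in a file name. This set is an assumption:
 * '/' and ':' separate path components on AmigaOS, and the others are
 * rejected by Windows. The list GfxBench2D actually uses may differ. */
static const char UNSAFE_CHARS[] = "/\\:*?\"<>|";

/* Hypothetical helper: copies name into out (capacity out_size bytes,
 * including the terminator), replacing each unsafe character with '_'.
 * Output is truncated if it does not fit.
 * Returns the number of characters written, excluding the terminator. */
size_t sanitize_file_name(const char *name, char *out, size_t out_size)
{
    size_t i = 0;

    assert(name != NULL && out != NULL && out_size > 0);

    for (; name[i] != '\0' && i + 1 < out_size; i++) {
        unsigned char c = (unsigned char)name[i];

        /* Control characters are replaced too, as a precaution. */
        if (c < 0x20 || strchr(UNSAFE_CHARS, c) != NULL)
            out[i] = '_';
        else
            out[i] = (char)c;
    }
    out[i] = '\0';
    return i;
}
```

With this sketch, a made-up card name such as "Radeon HD 4650/4670" would be saved as "Radeon HD 4650_4670".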

The AmigaOS version also updates the RAM-to-VRAM and VRAM-to-RAM tests to take advantage of the Altivec unit in processors that have one. Altivec (a.k.a. VMX or Velocity Engine) is a vector processing unit that can perform operations on multiple numbers (i.e., vectors) simultaneously. The x86 equivalents are the SSE extensions, of which there are multiple versions. It just so happens that the Altivec unit can also be used to speed up non-DMA copies.

How Altivec Speeds up Transfers to/from VRAM

Why would using Altivec help? Computers tend to be more efficient at transferring data in large blocks than in individual bytes, or even individual 32-bit words (i.e., 4 bytes). In the case of PCI Express, there is a huge overhead for transferring data in tiny blocks. According to PLX Technology, efficiency drops to 60-70% when transferring blocks of 64 bytes, and drops off sharply below that (source). This puts CPU-based transfers at a significant disadvantage, unless the PCI Express controller has a means of merging multiple writes/reads into larger blocks.

Altivec vectors are 128 bits (16 bytes) long, whereas the main PowerPC arithmetic unit works on numbers of up to 64 bits (8 bytes). So, when Altivec reads data from or writes data to VRAM, it does so in blocks twice the size of those used by non-Altivec reads/writes. The larger block size results in more efficient data transfer.
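
To make the block-size argument concrete, here is a rough sketch of a copy loop that moves 16 bytes per iteration. It is not the routine GfxBench2D uses: the names copy_blocks16() and accesses_needed() are invented for this example, and it assumes that both buffers are 16-byte aligned and that the length is a multiple of 16. When built with Altivec support, each iteration uses one vec_ld() and one vec_st(); otherwise a portable fallback is used so that the sketch still builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: copies len bytes from src to dst in 16-byte blocks.
 * Assumes both pointers are 16-byte aligned and len is a multiple of 16. */
void copy_blocks16(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(((uintptr_t)d % 16) == 0 && ((uintptr_t)s % 16) == 0);
    assert(len % 16 == 0);

    for (i = 0; i < len; i += 16) {
#if defined(__ALTIVEC__)
        /* One 128-bit load and one 128-bit store per iteration. */
        vec_st(vec_ld(0, s + i), 0, d + i);
#else
        /* Portable fallback so the sketch builds anywhere; here the
         * compiler decides how wide the actual accesses are. */
        uint64_t w[2];
        memcpy(w, s + i, sizeof w);
        memcpy(d + i, w, sizeof w);
#endif
    }
}

/* Number of accesses needed to move len bytes at a given access width. */
size_t accesses_needed(size_t len, size_t width)
{
    return (len + width - 1) / width;
}
```

For a 4096-byte row of pixels, accesses_needed() gives 512 accesses at 8 bytes each but only 256 at 16 bytes each, which is the halving that the paragraph above describes.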

On the AmigaOne-X1000, this difference results in approximately double the transfer rate in either direction. To see the difference, compare this result from version 2.5 with this result from 2.6. Version 2.6's Copy to/from VRAM results are roughly twice those of 2.5, and four times the speed that WritePixelArray()/ReadPixelArray() currently manage.

GfxBench2D Version    RAM to VRAM (MiB/s)    VRAM to RAM (MiB/s)
2.6                   446.85                 24.18
2.5                   223.28                 12.28

This is still only a fraction of the speed that is possible with DMA (support for which is still on the to-do list for the RadeonHD graphics driver), but it is a big improvement over the non-Altivec copy routines.

EDIT (2012/06/18): It appears that the new, improved code is significantly slower on AmigaOne-XEs with 74xx processors. On other machines, though, the new code does improve the performance of the RAM to VRAM test. I have decided that I will not add special code that is specific to one particular motherboard and CPU combination. Developers may find these results of use as they try to optimise their code to work well on as many machines as possible.

EDIT 2 (2012/06/19): After discovering the cause of the problem, I decided to release an update that works around the issue after all.

Developers: Please Use WritePixelArray()/ReadPixelArray()

This might come as a shock given the results above, but I strongly urge developers to use WritePixelArray()/ReadPixelArray() for transferring data between RAM and VRAM. There are two reasons for this. Firstly, the same Altivec-enabled algorithm that GfxBench2D 2.6 uses will make its way into a future version of AmigaOS 4.x (hint: I used GfxBench2D as a test-bed for developing copy routines). More importantly, WritePixelArray()/ReadPixelArray() use DMA on the Sam460ex, and this will eventually be expanded to other platforms too. DMA is the only way to achieve high-efficiency transfers over PCI Express, and you will not be able to access DMA directly from your applications. There is no way that a CPU-based copy routine can come anywhere close to the performance of DMA. If you insist on using your own routines, then at least include a speed comparison test in your software, and use WritePixelArray()/ReadPixelArray() whenever they are faster.
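
A speed comparison test of the kind suggested above could look something like the following sketch. Everything in it is hypothetical: transfer_fn is an invented common signature, and the two stub functions merely stand in for a wrapper around WritePixelArray()/ReadPixelArray() and for a hand-written copy routine, so that the example is self-contained. It times both candidates with the standard clock() function and keeps the custom routine only if it was strictly faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by both candidate transfer routines. */
typedef void (*transfer_fn)(void *dst, const void *src, size_t len);

/* Stand-in for a wrapper around the system transfer call (assumption). */
void system_transfer_stub(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Stand-in for an application's own CPU copy routine (assumption). */
void custom_transfer_stub(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < len; i++)
        d[i] = s[i];
}

/* Runs fn `runs` times and returns the processor time used, in clock ticks. */
static clock_t time_transfer(transfer_fn fn, void *dst, const void *src,
                             size_t len, int runs)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < runs; i++)
        fn(dst, src, len);
    return clock() - start;
}

/* Returns custom_fn only if it was strictly faster; a tie goes to system_fn. */
transfer_fn pick_transfer(transfer_fn system_fn, transfer_fn custom_fn,
                          void *dst, const void *src, size_t len, int runs)
{
    clock_t t_system, t_custom;

    assert(system_fn != NULL && custom_fn != NULL && runs > 0);

    t_system = time_transfer(system_fn, dst, src, len, runs);
    t_custom = time_transfer(custom_fn, dst, src, len, runs);
    return (t_custom < t_system) ? custom_fn : system_fn;
}
```

Since clock() can be quite coarse, a real test would want a buffer and run count large enough to give a measurable time. Defaulting to the system routine on a tie means that an application automatically benefits when a later OS or driver update makes WritePixelArray()/ReadPixelArray() faster.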

Old Versions are Banned

Since the results of one of the tests have changed in the AmigaOS version, results from version 2.6 are no longer comparable with those from previous versions. Hence, results from all previous versions of GfxBench2D can no longer be uploaded to the server. Results from old versions are still available, but have been marked with an exclamation mark and a warning.


