Radeon HD 2000-4000 Series 2D Hardware Acceleration
Posted by Hans de Ruiter
It has been almost a year since I last posted an update to this driver development log. This was never intended, and my apologies to anyone who was hoping for more regular updates. Things have been rather hectic, and I had decided early on to only post updates when there is something concrete to show. Updates about how many lines of code have been written are boring. That's not to say that there have not been any milestones reached along the way; just that they have been "internal" milestones, which are less interesting to users. Anyway, an update is long overdue, so here it is.
I am pleased to announce that another major milestone has been reached. 2D blitter acceleration is now operational on R600/R700 chipsets (Radeon HD 2000-4000 series cards). A few 2D blitter functions are still to-do, but the most critical functions have been implemented, and R600/R700 cards are now finally usable (albeit minus compositing). In fact, my primary Amiga OS development system is now a Sam 460ex with a Sapphire Radeon HD 4650 Ultimate Edition (passively cooled, which is nice), with the old A1-XE now only used for additional testing, and the odd bit of 3D work. I'll write more about the new system later; what is key here is that the R600/R700 driver has reached a stage where I now use it every day, rather than only for testing.
There is more good news. Many have asked if the driver will be on the Sam 460ex AmigaOS 4.1 CD, and the answer is yes. I have reached an agreement with ACube Systems, and the driver will be included. Having used a Sam 460ex with a Radeon HD 4650, I can say that it is definitely a step up from my A1-XE. Having a real PCI-Express bus (4x) instead of using a PCI version of the card definitely makes a difference. With both the Sam 460ex and the upcoming A1-X1000, I am quite excited about 2011, and the possibilities that all this new hardware brings. I am eager to help bring 3D support to the new Radeon HD cards, which will be a giant leap forward in graphics capabilities.
The Long Road to Here
Getting R600/R700 acceleration up and running proved to be more effort than I had anticipated (surprise, surprise). The command processor is responsible for executing rendering commands sent to it by the CPU. Unlike the R500 and previous chipsets, the R600/R700 command processor has no "push" mode, in which commands can be sent by writing to a set of registers. To make matters worse, the command processor registers are missing from the documentation, so the code that AMD released (see the Linux drivers) was the documentation. So, with only source code for documentation, I had to implement "pull" mode command submission. In "pull" mode, the CPU writes commands to a ring-buffer in memory, and then tells the GPU's command processor that there are more command packets to read. The command processor then reads the commands directly from that memory. To prevent individual Tasks (or Threads/Processes) from blocking each other, Tasks generally do not write to the ring-buffer directly, but to indirect buffers. The driver then writes an entry to the ring-buffer telling the GPU to execute the submitted indirect buffer, which the GPU once again reads directly from memory. Using GART, the GPU can read the commands directly from main memory which, because it uses DMA, is faster than individual CPU writes.
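To make the "pull" mode idea more concrete, here is a minimal sketch of a ring-buffer in plain C. It is only a simulation in ordinary memory, not the driver's code: the names (Ring, ring_submit, ring_consume), the ring size, and the use of plain variables for the read and write pointers are all assumptions made for illustration. On real hardware the write pointer update would be a register write, and the consumer would be the GPU's command processor rather than a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DWORDS 16u              /* hypothetical size; must be a power of two */
#define RING_MASK   (RING_DWORDS - 1u)

typedef struct {
    uint32_t buf[RING_DWORDS];       /* stands in for the ring-buffer memory */
    uint32_t wptr;                   /* next slot the CPU will write */
    uint32_t rptr;                   /* next slot the (simulated) GPU will read */
} Ring;

static void ring_init(Ring *r)
{
    assert((RING_DWORDS & RING_MASK) == 0u); /* masking needs a power of two */
    r->wptr = 0u;
    r->rptr = 0u;
}

/* Free slots. One slot always stays empty so that "full" and "empty" differ. */
static uint32_t ring_free(const Ring *r)
{
    return (r->rptr - r->wptr - 1u) & RING_MASK;
}

/* CPU side: copy a packet into the ring, then publish the new write pointer.
 * Returns 0 on success, or -1 if the packet does not fit right now. */
static int ring_submit(Ring *r, const uint32_t *pkt, uint32_t n)
{
    assert(pkt != NULL);
    if (n > ring_free(r))
        return -1;
    for (uint32_t i = 0; i < n; i++)
        r->buf[(r->wptr + i) & RING_MASK] = pkt[i];
    r->wptr = (r->wptr + n) & RING_MASK;  /* real hardware: a register write */
    return 0;
}

/* Simulated GPU side: read up to max dwords, advancing the read pointer.
 * Returns the number of dwords consumed. */
static uint32_t ring_consume(Ring *r, uint32_t *out, uint32_t max)
{
    uint32_t n = 0u;
    while (r->rptr != r->wptr && n < max) {
        out[n++] = r->buf[r->rptr];
        r->rptr = (r->rptr + 1u) & RING_MASK;
    }
    return n;
}
```

In this sketch an indirect buffer would simply be another block of memory, with the ring entry carrying a reference to it instead of the commands themselves; the packet layout for that is hardware specific and is deliberately left out.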
The downside to "pull" mode is that it requires a lot more support code. It isn't just the ring-buffer code; there is also the indirect buffer management code, and then the 2D blitter acceleration running on top of that. To save time later, the R600/R700 command processor code has been implemented directly in the RadeonHD_RM.resource, which the 3D drivers (and anything else that needs it) will use. Internally, the 2D blitter acceleration also uses the RadeonHD_RM.resource to access the GPU. I will write more about the resource and technical details at a later date.
Rather than try to implement GART and the command processor code at the same time, I opted to put the ring-buffer and indirect buffers in VRAM on the graphics card. Performing one step at a time is good engineering practice, and helps isolate problems. For example, if something went wrong, was it caused by the GART code? The ring-buffer code? Or even both? It turned out that putting the command buffers in VRAM brought its own problems related to data coherency. In some cases the data had not made it to VRAM before the command processor tried to read it. This was just one of the many technical issues that got in the way and had to be overcome.
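As an illustration of the kind of ordering problem described above, the sketch below shows one common, generic way of making sure command data has landed before the consumer is told about it: write the data, issue a barrier, read the last word back, and only then update the write pointer. This is not necessarily how the driver solves it. The function name publish_commands is hypothetical, ordinary arrays stand in for VRAM and the write pointer register, and the C11 fence is only a stand-in for whatever barrier instruction the real platform needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n command dwords into cmd_buf, and only then advance the write pointer.
 * Returns the value read back from the last slot that was written. */
static uint32_t publish_commands(volatile uint32_t *cmd_buf,
                                 volatile uint32_t *wptr_reg,
                                 const uint32_t *cmds, uint32_t n)
{
    assert(cmd_buf != NULL && wptr_reg != NULL && cmds != NULL && n > 0u);

    for (uint32_t i = 0; i < n; i++)
        cmd_buf[i] = cmds[i];

    /* Keep the stores above ordered before everything that follows.
     * Assumption: real driver code would use a platform-specific barrier. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Reading back from the target is a common way to force earlier
     * posted writes across a bus to complete before continuing. */
    uint32_t readback = cmd_buf[n - 1u];

    /* Only now tell the consumer that n dwords are available. */
    *wptr_reg = n;
    return readback;
}
```

The key design point is the order of operations: the write pointer is the last thing to change, so the consumer can never be pointed at data that is still in flight.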
Possibly the biggest technical problem that I encountered was a direct result of using PCI versions of what is actually a PCI-Express graphics card. To get it connected to a PCI bus, manufacturers put a PCI-to-PCIe bridge on the board (a PEX 8111 or 8112, to be precise). The performance of the PCI cards was on the slow side. When I benchmarked the CPU's VRAM write and read speed, I got a shocking 0.7 MiB/s! Since not everything is HW accelerated, this read speed really hurts. Fortunately, there was a solution. The PEX 8111 and 8112 bridge chips have a "blind" prefetch feature that allows the bridge to prefetch blocks of data, thereby hiding the large latencies of individual reads from the PCI side of the bridge. Setting this up correctly helped boost the read speed back into an acceptable range. I have also made some changes to Picasso96 that improve performance. Having said that, I do hope that the A1's UBoot will be updated so that the Radeon HD card can be used in the 66 MHz slot rather than the 33 MHz slot. Any increase in bandwidth to the graphics card will help, particularly with R600+ cards, as their command packet sequences for 2D acceleration are longer than those of previous Radeon cards.
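For anyone curious how a read-speed figure like the one above can be obtained, here is a rough sketch of a read benchmark in C. It is not the benchmark I used; the function names are hypothetical, and the pointer passed in could be any memory region (on a real system it would point at a mapped VRAM aperture, which is an assumption this sketch does not set up). It uses the standard clock() function, so the timing resolution is coarse and large buffers are needed for meaningful numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Convert a byte count and a duration into MiB/s. */
static double mib_per_second(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/* Read every word once. The volatile qualifier stops the compiler from
 * optimising the reads away; the sum is returned so the result is used. */
static uint32_t read_all(const volatile uint32_t *src, size_t words)
{
    uint32_t sum = 0u;
    for (size_t i = 0; i < words; i++)
        sum += src[i];
    return sum;
}

/* Time one pass over the buffer. Returns elapsed processor time in seconds
 * (possibly 0.0 for small buffers) and stores the checksum via *checksum. */
static double time_reads(const volatile uint32_t *src, size_t words,
                         uint32_t *checksum)
{
    clock_t start = clock();
    *checksum = read_all(src, words);
    clock_t end = clock();
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

A caller would run time_reads over a large buffer, check that the returned time is greater than zero, and pass the buffer size in bytes together with that time to mib_per_second.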
The Road Ahead
I am sure that many are asking, "what's next?" Well, in no particular order, here are some of the things still to come:
- DVI output on Radeon HD 4000 series cards (it partially works, but is not yet finished)
- Support for R600/R700 VBlank interrupts
- R600/R700 compositing (and HW acceleration of the last few blitter functions)
- 3D drivers using Gallium 3D (requires finishing off the RadeonHD_RM.resource)
It has already been mentioned that Gallium 3D drivers for these cards will be coming, and I am looking forward to getting those up and running. This is no longer only a 2D driver project.
I wish you all a Merry Christmas, and a Happy New Year.
Comments
-
@Jeremy
I would like to support newer cards than the HD 4xxx series, but I don't know when I might have the time to work on that. My highest priority is to make the existing cards fully supported (as much as is practically possible), including 3D.
Hans
Posted by Hans, 26/12/2011 3:03pm (13 years ago)
-
Do you have any plans to update the P96 drivers to deal with any boards newer than the HD4xxx series?
Posted by Jeremy Kajikawa-Sutherland, 26/12/2011 10:15am (13 years ago)
-
Hi Chip,
Thanks.
Yes, I know that the newer cards support audio over HDMI. It's on my "would be nice to support" list, but is not a priority at present.
Hans
Posted by Hans, 09/02/2011 4:46pm (14 years ago)
-
Nice work Hans! Keep it up!
Are you aware that HD4xxx and some lower cards are capable of pushing a sound stream (LPCM, DTS, DD) through DVI? ;)
Posted by Chip, 09/02/2011 10:26am (14 years ago)
-
Thank you Hans
Posted by jacknife, 26/12/2010 5:45pm (14 years ago)