Archive for the ‘Linux’ Category
Pixman gets NEON support
Friday, June 12th, 2009
I’ve been working on NEON fastpaths for Pixman lately, and as I write, these are being pushed upstream, hopefully in time for Pixman’s next stable release. They complement some work already done in this area by engineers at ARM. Some ARM hardware does use 32-bit framebuffers, but hardware constraints still seem tight enough that 16-bit framebuffers remain common. So while the ARM guys focused mostly on 32-bit framebuffers and some internal operations, we focused firmly on 16-bit framebuffers.
For those who don’t know, Pixman is a backend library shared by Cairo and X.org, which takes care of various basic 2D graphics operations when there isn’t any specific GPU support for them. It gets pretty heavy use if you use the XRender protocol on a bare framebuffer, for example. So optimising Pixman for the latest ARM developments will make Gecko faster, along with any of those “fancy” compositing window managers that are all the rage these days.
Now the following operations are accelerated, all on RGB565 framebuffers (which may or may not be cached):
- Flat rectangular fills. (These also work on other framebuffer formats.)
- Copying 16-bit images around.
- Converting 24-bit xRGB images (e.g. a decoded JPEG) into the framebuffer format.
- Flat translucent rectangles.
- Compositing 32-bit ARGB images (e.g. a decoded PNG).
- Glyphs and strings thereof (8-bit alpha masks, with an overall colour that might be translucent).
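The operations in the list all boil down to a little per-pixel arithmetic. The C below is a scalar reference sketch of three of them, written under my own assumptions rather than taken from Pixman: the function names are hypothetical (modelled on Pixman’s format names such as a8r8g8b8 and r5g6b5), and source pixels are assumed to be premultiplied, as a8r8g8b8 is in Pixman and XRender. The real NEON fastpaths work on several pixels per instruction and may round slightly differently.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for a, b in 0..255, exact over that range. */
static inline uint32_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Keep the top 5/6/5 bits of 8-bit red, green and blue. */
static inline uint16_t pack_0565(uint32_t r, uint32_t g, uint32_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Widen r5g6b5 back to 8 bits per channel by replicating the high bits. */
static inline void unpack_0565(uint16_t p, uint32_t *r, uint32_t *g, uint32_t *b)
{
    uint32_t r5 = (p >> 11) & 0x1F, g6 = (p >> 5) & 0x3F, b5 = p & 0x1F;
    *r = (r5 << 3) | (r5 >> 2);
    *g = (g6 << 2) | (g6 >> 4);
    *b = (b5 << 3) | (b5 >> 2);
}

/* x8r8g8b8 -> r5g6b5: the padding byte is simply ignored. */
static inline uint16_t x8r8g8b8_to_r5g6b5(uint32_t p)
{
    return pack_0565((p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF);
}

/* Premultiplied a8r8g8b8 OVER one r5g6b5 pixel: d = s + d * (255 - sa) / 255.
   Assumes valid premultiplied input (no colour channel above alpha). */
static inline uint16_t over_0565(uint32_t s, uint16_t d)
{
    uint32_t sa = s >> 24, dr, dg, db;
    if (sa == 0xFF)                 /* opaque: plain conversion, dst not read */
        return x8r8g8b8_to_r5g6b5(s);
    if (s == 0)                     /* fully transparent: dst unchanged */
        return d;
    unpack_0565(d, &dr, &dg, &db);
    return pack_0565(((s >> 16) & 0xFF) + mul_un8(dr, 255 - sa),
                     ((s >> 8) & 0xFF) + mul_un8(dg, 255 - sa),
                     (s & 0xFF) + mul_un8(db, 255 - sa));
}

/* Converting an xRGB row (e.g. a decoded JPEG) into the framebuffer format. */
void convert_8888_0565(const uint32_t *src, uint16_t *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = x8r8g8b8_to_r5g6b5(src[i]);
}

/* Compositing a premultiplied ARGB row (e.g. a decoded PNG). */
void composite_over_8888_0565(const uint32_t *src, uint16_t *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = over_0565(src[i], dst[i]);
}

/* Glyphs: a solid, possibly translucent, premultiplied colour through an
   8-bit alpha mask (src IN mask), then OVER the framebuffer. */
void composite_over_n_8_0565(uint32_t color, const uint8_t *mask,
                             uint16_t *dst, int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t m = mask[i];
        uint32_t s = (mul_un8(color >> 24, m) << 24) |
                     (mul_un8((color >> 16) & 0xFF, m) << 16) |
                     (mul_un8((color >> 8) & 0xFF, m) << 8) |
                     mul_un8(color & 0xFF, m);
        dst[i] = over_0565(s, dst[i]);
    }
}
```

The two early-outs in over_0565 are worth having in any implementation: pixels in glyph masks and UI images tend to be fully opaque or fully transparent, and in both cases the destination pixel does not need to be blended.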
Most of the listed operations are now at least twice as fast as they were without NEON, and many come within spitting distance of available memory bandwidth on typical ARMv7 hardware. Using a benchmark of common operations (as issued by a common Web browser visiting a popular news portal), we measured an overall doubling in performance, despite the most common drawing operations being extremely tiny and therefore difficult to optimise.
In some cases on a more synthetic benchmark, the throughput is vastly greater than that, at least when running on an uncached framebuffer (which tends to hurt generic code very badly). The main performance techniques were to read from the framebuffer in big chunks (where required), preload source data into the cache, and then process data in decent-sized chunks per loop iteration. This essentially removes the performance advantage of a “shadowed framebuffer”, so you can now sensibly save memory by turning it off.
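The loop structure described in the paragraph above can be sketched in plain C. This is a hypothetical scalar illustration, not Pixman’s NEON code: CHUNK and PREFETCH_AHEAD are made-up tuning values, __builtin_prefetch is the GCC/Clang builtin standing in for ARM’s PLD preload, and the memcpy calls stand in for wide NEON loads and stores (a compiler is free to lower them however it likes). What matters is the shape: preload the source ahead of use, read a whole chunk of destination pixels at once, blend in a cached local buffer, write the chunk back at once, and finish with a scalar tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning knobs, not values taken from Pixman: eight 16-bit
   pixels fill one 128-bit NEON register; the prefetch distance is a guess. */
#define CHUNK 8
#define PREFETCH_AHEAD 64 /* in source pixels */

/* Rounded (a * b) / 255 for a, b in 0..255. */
static inline uint32_t mul_un8(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Premultiplied a8r8g8b8 OVER one r5g6b5 pixel (scalar reference). */
static inline uint16_t over_0565(uint32_t s, uint16_t d)
{
    uint32_t ia = 255 - (s >> 24);
    uint32_t r5 = (d >> 11) & 0x1F, g6 = (d >> 5) & 0x3F, b5 = d & 0x1F;
    uint32_t r = ((s >> 16) & 0xFF) + mul_un8((r5 << 3) | (r5 >> 2), ia);
    uint32_t g = ((s >> 8) & 0xFF) + mul_un8((g6 << 2) | (g6 >> 4), ia);
    uint32_t b = (s & 0xFF) + mul_un8((b5 << 3) | (b5 >> 2), ia);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

void composite_over_8888_0565_chunked(const uint32_t *src, uint16_t *dst,
                                      size_t n)
{
    size_t i = 0;
    for (; i + CHUNK <= n; i += CHUNK) {
        /* Preload source data well before it is needed; only prefetch
           addresses that are still inside the source buffer. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(src + i + PREFETCH_AHEAD, 0 /* read */, 0);

        /* One wide read of destination pixels instead of CHUNK separate
           (and, on an uncached framebuffer, slow) single-pixel reads. */
        uint16_t tmp[CHUNK];
        memcpy(tmp, dst + i, sizeof tmp);

        for (size_t k = 0; k < CHUNK; k++)
            tmp[k] = over_0565(src[i + k], tmp[k]);

        /* One wide write back to the framebuffer. */
        memcpy(dst + i, tmp, sizeof tmp);
    }
    /* Leftover pixels, one at a time. */
    for (; i < n; i++)
        dst[i] = over_0565(src[i], dst[i]);
}
```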
We also found some opportunities for reducing per-request overhead in both Pixman and X.org. Hopefully these improvements will also be integrated upstream in the near future.
Maemo5: SGX vs. pixman
Tuesday, March 24th, 2009
I finally found the time to check out the Maemo5 alpha on a BeagleBoard. I was mainly interested in the X.Org hardware acceleration, as they have implemented the EXA acceleration API using the PVR2D library for SGX. SGX is the 3D GPU from Imagination Technologies used in the OMAP3 CPU.
I followed the instructions and got it up and running in no time, without running into anything undocumented.
Some debug prints from the X.Org output:
PVR2D SGX EXA acceleration initialized
DRI2 initialized
I started with cairoperf and ran it using the xf86-video-omapfb driver (no EXA) and then with Nokia’s EXA-accelerated fbdev (yeah, they are using the old name).
The results were surprising: 13 relatively small speed-ups and 585 relatively big slowdowns.
I also tested with mx11mark, and the results were in line with the cairoperf results: the total score was 17 with x-v-o and only 13 with Nokia’s driver.
I wouldn’t mind if somebody proved my tests wrong…
Android on Zoom
Thursday, October 30th, 2008
I had never touched Android before, but it took only a day to get it running on a Zoom board. I think getting a complete new Linux setup running on a real device could be much harder, although there were some bumps along the way.
I mainly followed the instructions on omapzoom.org and on elinux.org. The JDK from my Debian Etch install didn’t seem to work, so I had to download the JDK from sun.com and set up some environment variables. Once those were set, the compilation succeeded without problems. I would have saved some hours if I had read the instructions properly in the first place.
The instructions were a bit vague on what to do after the compilation was finished. Eventually I created the card.img that acts as an MMC/SD card and launched the emulator:
./out/host/linux-x86/bin/emulator -sdcard card.img -system out/target/product/ldp1/ -kernel ./prebuilt/android-arm/kernel/kernel-qemu
The emulator runs the real ARM root file system image with qemu, so it fully matches the actual file system. The file system includes, e.g., ssh, so I could get on IRC from the terminal application, but it doesn’t include, e.g., cp, which seems a bit odd. The file system directory hierarchy looks completely weird and messy to me. It mounts a read-only ramdisk as root, with data, system and sdcard mounted under it.
Android provides the Android Debug Bridge (adb), which can send files to the root file system run by the emulator or give you a remote shell, among other things. I used it to get a more complete BusyBox onto the emulator and tarred (no compression) the /data and /system directories to the fake MMC/SD card. If I had had a Linux setup that understands YAFFS file systems, this would have been a bit easier, since I could have used the original images directly without copying the content from the running emulator.
I set up an EXT3 root file system on a real SD card from the ramdisk, system and data tarballs, replaced init.rc with init.omapldpboard.rc, and booted the Zoom board with the uImage built separately according to the instructions.
The desktop (or home) is something new, but the menus that open from the bottom actually seem quite traditional. The UI looks quite polished and is very responsive on the Zoom. Everything feels snappier than I would have expected. The Zoom doesn’t have WLAN, and a quick search didn’t turn up a way to set up the wired network, so I had no network connection.
I’m sure we’ll see interesting devices (in addition to G1) based on Android in the future.