Friday, 7 January 2011
Thursday, 27 January 2011
I have already made quite a few improvements to my mupen64 port since the video.
About half an hour after making it, I fixed almost all the missing screens/skies and the clipped Mario by adjusting the Z buffer range (the previous range was 0..1, now it is -1..1).
I then improved the emulation of the N64 gfx combiners (combiners are a sort of early, primitive pixel shader). I use 360 pixel shaders to emulate them. At first it was really slow because I used many switches and loops in them, and it seems doing that in a pixel shader isn't such a good idea. I got everything back to playable speeds by using mainly 3 techniques:
* a color lookup table to emulate combiner 'source' (ie vertex color/texture color/constant/...)
* a math formula that handles all the possible cases for the combiner operation (ie mul/add/sub/...)
* having different pixel shaders (a fast one that can only do simple things, an intermediate one, and a slow one that emulates everything) and switching between them when needed.
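To give an idea of how the first two techniques fit together, here is a minimal sketch in plain C rather than shader code. It assumes the usual N64 combiner equation, (A - B) * C + D, as the single formula; the source names, the table layout and the `combine` function are made up for illustration and are not the actual shader code from the port.

```c
/* Hypothetical source selectors; the real combiner has more inputs. */
enum { SRC_ZERO, SRC_ONE, SRC_TEXEL, SRC_SHADE, SRC_CONST, SRC_COUNT };

/* One combiner stage is just four indices into the source lookup table. */
typedef struct { int a, b, c, d; } CombinerStage;

/* Evaluate (A - B) * C + D for one color channel, clamped to [0, 1].
   The same expression covers mul, add and sub, so no switch is needed. */
static float combine(const CombinerStage *s, const float src[SRC_COUNT])
{
    float r = (src[s->a] - src[s->b]) * src[s->c] + src[s->d];
    if (r < 0.0f) r = 0.0f;
    if (r > 1.0f) r = 1.0f;
    return r;
}

/* Example stages (assumed encodings, chosen for illustration):
   mul: (texel - 0)     * shade + 0
   add: (texel - 0)     * 1     + const
   sub: (texel - const) * 1     + 0 */
static const CombinerStage STAGE_MUL = { SRC_TEXEL, SRC_ZERO,  SRC_SHADE, SRC_ZERO  };
static const CombinerStage STAGE_ADD = { SRC_TEXEL, SRC_ZERO,  SRC_ONE,   SRC_CONST };
static const CombinerStage STAGE_SUB = { SRC_TEXEL, SRC_CONST, SRC_ONE,   SRC_ZERO  };
```

The point is that selecting the operation becomes a matter of which table entries are read, so the per-pixel code has no branches apart from the final clamp.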
So now gfx emulation is quite fast, but the emu still ran slowly on floating point intensive scenes like the Mario head demo at the beginning of SM64. So I started looking at how mupen64 emulates floating point operations, and I quickly discovered that the whole floating point unit was running in interpreter mode, oops!
A few #defines later, the emu was up to 50% faster.
Next I had an idea: why not get the X360 GPU to render my current frame in the background instead of actively waiting for it to finish rendering?
Usually you do this:
/* resolve (and clear) */
/* wait for render finish */
Now I do this:
/* resolve (and clear) */
/* begin rendering in background */
and then I call Xe_Sync() at the last moment, right before beginning my next frame.
I got a huge speed boost with this: Super Mario 64 now runs at around 100fps ingame!
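A rough way to see where the gain comes from is a small timing model. The sketch below is only an illustration: the function names are hypothetical and the millisecond figures in the note after it are made up, not measurements from the emulator.

```c
/* Frame time when the CPU blocks until the GPU has finished rendering:
   the two costs simply add up (times in milliseconds). */
static double frame_ms_blocking(double cpu_ms, double gpu_ms)
{
    return cpu_ms + gpu_ms;
}

/* Frame time when the GPU renders frame N in the background while the CPU
   already emulates frame N+1. The sync at the start of the next frame only
   waits for whatever GPU work is left, so the slower side sets the pace. */
static double frame_ms_overlapped(double cpu_ms, double gpu_ms)
{
    return cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
}
```

With, say, 10 ms of CPU work and 8 ms of GPU work per frame, the blocking version takes 18 ms per frame while the overlapped one takes 10 ms.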
Tuesday, 28 June 2011
During the last few weeks, I worked a bit on improving my mupen64 port. Here are the things I did.
As SM64 started to work well, I switched to the Zelda ROM for my testing. It is a much more complex game to emulate graphics-wise, so obviously it was completely buggy and painfully slow on the first run. The biggest problem was that, unlike SM64, most of the rendering was done with my slowest pixel shader, and fixing the bugs would have made it even slower, so I decided to do a complete rewrite of the shader. This time I designed it around something I had just discovered: constant boolean registers, which allow flow control without a big performance hit. It took me a few tries to get it fast and accurate, but now that pixel shader is almost as fast as the old one on simple cases, while being more accurate and much faster on complex cases. I also made my old shader as accurate as I could; it is now used for some rare cases the new one can't emulate. With a few more fixes to libxenon and the emulator (implementing 2D rendering, for example), this makes Zelda reasonably fast and playable.
The next game was Mario Kart 64. This time it was fast and looking good on the first try, but it crashed after a few races with an 'out of memory' message. It turns out something very important was missing from the libxenon 3D driver: a way to free what you allocate! (textures/vertex buffers/...) So I replaced the very basic GFX memory allocator with a malloc-like one I found in the libxenon source code, and modified the emulator's texture cache to actually free old textures when needed.
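As an illustration of what "free old textures when needed" can look like, here is a minimal sketch of a texture cache with a fixed memory budget that evicts the least recently used entry until a new texture fits. All names, the slot count and the eviction policy are assumptions made for the example; the real cache in the emulator may work differently.

```c
#include <stddef.h>

#define MAX_TEXTURES 64  /* hypothetical slot count */

typedef struct {
    int      in_use;
    unsigned id;         /* hypothetical texture key */
    size_t   size;       /* bytes of GFX memory held */
    unsigned last_used;  /* frame number of the last use */
} CachedTexture;

typedef struct {
    CachedTexture slots[MAX_TEXTURES];
    size_t budget;       /* total bytes the cache may hold */
    size_t used;         /* bytes currently held */
} TextureCache;

static void cache_init(TextureCache *c, size_t budget)
{
    for (int i = 0; i < MAX_TEXTURES; i++) c->slots[i].in_use = 0;
    c->budget = budget;
    c->used = 0;
}

static int find_free_slot(const TextureCache *c)
{
    for (int i = 0; i < MAX_TEXTURES; i++)
        if (!c->slots[i].in_use) return i;
    return -1;
}

/* Free the least recently used texture; returns 0 if the cache is empty. */
static int cache_evict_oldest(TextureCache *c)
{
    int victim = -1;
    for (int i = 0; i < MAX_TEXTURES; i++) {
        if (!c->slots[i].in_use) continue;
        if (victim < 0 || c->slots[i].last_used < c->slots[victim].last_used)
            victim = i;
    }
    if (victim < 0) return 0;
    /* A real port would hand the memory back to the GFX allocator here. */
    c->used -= c->slots[victim].size;
    c->slots[victim].in_use = 0;
    return 1;
}

/* Add a texture, evicting old ones until it fits. Returns 1 on success. */
static int cache_add(TextureCache *c, unsigned id, size_t size, unsigned frame)
{
    if (size > c->budget) return 0;
    while (c->used + size > c->budget || find_free_slot(c) < 0)
        if (!cache_evict_oldest(c)) return 0;
    int i = find_free_slot(c);
    c->slots[i] = (CachedTexture){ 1, id, size, frame };
    c->used += size;
    return 1;
}

static int cache_has(const TextureCache *c, unsigned id)
{
    for (int i = 0; i < MAX_TEXTURES; i++)
        if (c->slots[i].in_use && c->slots[i].id == id) return 1;
    return 0;
}
```

Without the eviction step, every new texture only ever adds to `used`, which is the same situation as the 'out of memory' crash after a few races.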
I think it's time for a new video so here it is
Sorry for the lack of updates, I was busy with other stuff for some months.
As you can see on my github ( https://github.com/gligli/libxenon/commits/master ), I think libXenon has improved a lot lately, with lots of stuff added from Xell (NAND access, lwip & network code, ...), a new ELF loader, a unified ATA driver (that can access both HDD and DVD), the return of the opendir/readdir/... functions, and many bug fixes and smaller improvements in almost all drivers.
The reason for many of those changes is to be able to make most of Xell a regular libXenon app: Xell would be split into 2 stages, a first stage that recovers from the exploit, then decompresses and launches a second stage, which is a libXenon ELF. To maintain backwards compatibility, both stages would have to fit in the 256KB limit for a Xell binary.
Last but not least, I'm working on mupen64-360, my Wii64 port, these days. I already added sound and did some optimisations to try to get more speed.
I multithreaded a good part of the sound processing so it's done almost for free, and in fact anything that isn't multithreaded (RSP emulation) was already running in my version from January. I might be able to multithread the RSP too; it would probably give a nice speed boost.
I also redid the port of the Wii64 dynarec. I did my first port from the ps3 branch, and it seems it wasn't up to date with the trunk speed-wise. Now it's using Wii64 1.1 code. I also changed the way stores are handled in the dynarec, trying to generate more code and rely less on a (slow) generic C function to do the job.
By the way, the source code is now available on my github ( https://github.com/gligli/mupen64-360 ). Anybody who can compile it can try it, but please don't distribute binaries! It's not a good idea at all to release unofficial versions of someone else's work in progress, so I hope you can be responsible about this.
Here's a new video showing the progress on Mario64. The jerkiness is due to the video capture card; trust me, the game runs smoothly.