What is MERKÉN?

MERKÉN is a Game Boy demo that was released during Revision 2020. The music was created by Francisco “Foco” Cerda.

We got 2nd place in the Oldskool Compo category.


Merkén is also a delicious smoked chili pepper condiment traditional in Mapuche cuisine in Chile. I love it!

Picture by McKay Savage - CC BY 2.0


Why make this breakdown?

Personally I love when people share details of how they make things. I feel the information they share has a lot of educational value and can help and motivate other people to do the same.

Here is a small list of my favorite demosceners, articles and repos where people share and talk about their productions.


Why make a Game Boy Demo?

After releasing my first PC demo, Sacrificio, at Flashparty 2019, I became obsessed with releasing something for the DMG Game Boy. This handheld has been my favorite console of all time for many reasons, but the most important one is that it marked my childhood as a kid who grew up in the 90s.

For the last couple of years, I’ve been learning about the internals of the Game Boy and how to program the Sharp LR35902 in assembly. I was confident enough with my knowledge and skills that I could make a small production.

During March of this year Revision announced that they would be doing the whole event online and they would allow for remote entries. I saw this as an opportunity to participate in one of my favorite events of the year and decided to start programming something for the Oldskool Compo.


Tools

The tools for making this demo were:

You can also find the whole source code for this demo in my Github repo.

A lot of the data, like the sine waves and the scanline effect tables, was generated by a JavaScript script run with node.js. The script created the tables and I copy-pasted them into my source files.
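
As a rough illustration of that table-generation step, here is a minimal sketch in C. The original script was JavaScript and is not reproduced here, so everything in this block is an assumption: the function names, the 256-entry size and the db output layout are made up for the example. A real tool would simply call sin(); the sketch steps the angle with a fixed rotation instead so that it builds without any extra math library.

```c
#include <stdint.h>
#include <stdio.h>

#define SINE_TABLE_SIZE 256

/* Fill `table` with one full sine period scaled to +/- amplitude.
   Negative values end up as two's-complement bytes, which is what an
   8-bit add on a scroll register expects. (Hypothetical helper.) */
void build_sine_table(uint8_t table[SINE_TABLE_SIZE], int amplitude)
{
    /* sin and cos of one step of 2*pi/256 radians, rounded */
    const double step_sin = 0.024541228522912288;
    const double step_cos = 0.9996988186962042;
    double s = 0.0; /* sine of the current angle */
    double c = 1.0; /* cosine of the current angle */

    for (int i = 0; i < SINE_TABLE_SIZE; i++) {
        double scaled = s * amplitude;
        /* round to the nearest integer, away from zero on ties */
        int rounded = (int)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
        table[i] = (uint8_t)rounded;

        /* rotate (c, s) by one step to get the next angle */
        double next_s = s * step_cos + c * step_sin;
        c = c * step_cos - s * step_sin;
        s = next_s;
    }
}

/* Print the table as assembler `db` lines, 16 bytes per line. */
void print_db_table(const uint8_t table[SINE_TABLE_SIZE])
{
    for (int i = 0; i < SINE_TABLE_SIZE; i += 16) {
        printf("    db ");
        for (int j = 0; j < 16; j++) {
            printf("$%02X%s", (unsigned)table[i + j], j == 15 ? "\n" : ",");
        }
    }
}
```

Calling build_sine_table(table, 8) and then print_db_table(table) prints 16 db lines that can be pasted into a source file.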


How to Build

I use Windows as my main OS for development so I decided to use batch scripts to help me with my project building process.

The BUILD.BAT script is very simple, but it requires this fixed project structure.

ROOT_PATH/
    PROJECT_NAME/
        code/
            src/
            include/

In the src/ folder you put all the code that will be assembled, and in the include/ folder you put constants, macros and assets. You can then load them with the include preprocessor directive, like in C.

With that single requirement met, this batch script will assemble and link all source files and output a .GB file into a build/ folder in the root of the project.

@echo off

rem You need to install RGBDS https://github.com/rednex/rgbds
rem and set the path here:
set RGBDS_PATH=path\to\rgbds

set PROJECT_NAME=merken-revision2020
set ROOT_PATH=..\
set ASM=%RGBDS_PATH%\rgbasm
set LNK=%RGBDS_PATH%\rgblink
set FIX=%RGBDS_PATH%\rgbfix

rem Project Variables
set PROJECT_PATH=%ROOT_PATH%\%PROJECT_NAME%
set PROJECT_CODE=%PROJECT_PATH%\code
set PROJECT_INCLUDE=%PROJECT_CODE%\include\
set PROJECT_SRC=%PROJECT_CODE%\src

rem Output Variables
set OUTPUT_PATH=%PROJECT_PATH%\build
set OUTPUT_NAME=%OUTPUT_PATH%\%PROJECT_NAME%

rem Flags
set ASM_FLAGS=-i%PROJECT_INCLUDE%
set LNK_FLAGS=-m %OUTPUT_NAME%.map -n %OUTPUT_NAME%.sym -o %OUTPUT_NAME%.gb

if not exist "%RGBDS_PATH%" (
    echo ** Failed to build **
    echo You need to install RGBDS. 
    echo Download it from https://github.com/rednex/rgbds
    echo and then set the path in build.bat line 23.
    exit /b -1
)

rem Create the output directory
if not exist "%OUTPUT_PATH%" mkdir %OUTPUT_PATH%

rem Clear old build info
del "%OUTPUT_PATH%\*.o" /s /f /q
del "%OUTPUT_PATH%\*.map" /s /f /q
del "%OUTPUT_PATH%\*.sym" /s /f /q

echo Assembling

for %%I in (%PROJECT_SRC%\*.asm) do (
    echo    - %%~I
    %ASM% %ASM_FLAGS% -o %OUTPUT_PATH%\%%~nI.o %%~I
)
echo.
echo Linking
setlocal EnableDelayedExpansion
set OBJFILES=
for %%I in (%OUTPUT_PATH%\*.o) do (
    set OBJFILES=!OBJFILES! %OUTPUT_PATH%\%%~nI.o
)


%LNK% %LNK_FLAGS% %OBJFILES%

echo.
echo Checksum Fix

%FIX% -p0 -v %OUTPUT_NAME%.gb

echo    - %OUTPUT_NAME%.gb


The Entry Point

MAIN.ASM is where the entry point is located. It has a very simple structure. Here I initialize every common part of the demo, like the music; common variables, like fading; and scene control.

At the top of the file you can see a lot of boilerplate like setting sections for interrupts, even though I don’t use them, and the cartridge header. The fun part comes after that.


ONLY DMG, SRY

I originally wanted to make this demo only playable on DMG CPUs. By only DMG CPUs I mean the whole generation before Game Boy Color.

I had this piece of code that would act as a firewall if it detected a system that wasn’t what I wanted. In the end I chose to allow any system to play the demo, but I still left in the code that displays “ONLY DMG, SRY”.

; =======================================
;      **** DMG Firewall ****
; =======================================
; Only DMG cool kids can play this demo
; =======================================
; Check we're running on DMG-CPUs
; Accumulator should be $11 on CGB or AGB
    cp $11
    jr nz, .is_dmg
; =======================================
.not_dmg:
; Run my ONLY DMG, SRY code.
; =======================================

On systems like the CGB (Game Boy Color) and AGB (Game Boy Advance) the A register is initialized to $11. The code compares A against that value: if it doesn’t match, the program jumps to the .is_dmg code and starts executing the demo. Otherwise it would display this nice message:

This check was disabled, so the message is only visible if you go to the source code and add jp not_dmg after the line jr nz, .is_dmg.


Start Demo

As I mentioned before, MAIN.ASM does some initialization and handles the sequence of effects. It’s about 30 lines of assembly that put together all the different scenes.

In the .start code path I initialize the stack, set the fade color to the default gradient, initialize DMA control, and initialize and start the music.

.start:
    ld [is_dmg],a           ; Store a 1 or 0 to indicate if the system is DMG.
    ld sp, STACK_TOP        ; Set Stack Pointer
    ld a, %11100100          
    ld [fade_color], a      ; Set initial fade color to default gradient
    call init_dma           ; Load DMA subroutine to HRAM
    call dma_transfer       ; Initialize OAM to 0
    mInitializeMusic        ; Initialize Music
    mSelectSong 0           ; Initialize Song

Next comes the sequence of effects. You might notice that the order isn’t sequential; that is only because each number reflects the order of creation. After writing them I ordered them according to what I thought looked best.

Each call to fxN initializes and runs a specific effect. I’ll call these effect source files “modules”. The module itself is in charge of returning when its effect timer is up, and at that point execution continues with the next effect. When it reaches the last scene it loops forever, playing the music and the last effect.

.demo_run:
    call fx1
    call fx3
    call fx4
    call fx2
    call fx6
    call fx7
    call fx8
    call fx9
    call fx5 ;<-- The end


The Logo Intro


FX1.ASM contains the intro scene, where the Nintendo logo scrolls and reveals the MERKÉN logo with the bleeding eye.

Even though it might look like a very simple intro, there are some well-timed loading points that are crucial for it to work. The Game Boy has a 160x144 px screen, but the internal background map is 256x256 px, split into 8x8 px tiles. That makes the background map 32x32 tiles. This scrolling scene is around 64 tiles tall, which is 512 px, twice the height of the map. This means I need to load tiles while the screen is scrolling. You can see how that looks in VRAM.


Left: VRAM Tile Map | Right: VRAM Tile Data


All this loading happens frame by frame. Instead of loading everything in a single frame and producing an ugly hitch that is very noticeable with the music playing, we load a small chunk of tiles per frame.

The setup for this is at the initialization of the effect.

    ld [sp_save], sp            ; Save the original Stack Pointer
    ld sp, SAVE_STACK           ; Set the Stack Pointer to $C400 (temporary stack)
    ld de, logo_map_end-32      ; Set the end of the logo map
                                ;  We do this because we're loading backwards.
    ld hl, $9A20                ; Set the starting point for loading our tiles
    push hl                     ; Push both pointers to our temporary stack
    push de
    ld a, [sp_save]             
    ld l, a
    ld a, [sp_save+1]
    ld h, a
    ld sp, hl                   ; Restore the original Stack Pointer
    ld a,[SCY]                  ; We decrement SCY by one and turn it to $FF.
    dec a                       
    ld [SCY],a

The code that handles the loading is near the bottom of the file.

.load_line:
    ld a, [wait_for_scroll]         ; We need to check if 
    cp $01                          ;  we have to skip loading
    jr z, .end_load_line            ;  a line of tiles or not.
    ld [sp_save], sp                ; If not store original Stack Pointer
    ld sp, SAVE_STACK - 4           ; Set Stack Pointer to temporary stack
    pop de                          ; Pop source address (Logo tile map)
    pop hl                          ; Pop destination address (VRAM tile map)
    ld a, h                         
    cp $96                          
    jr z, .skip_store               ; Do a small bound check so we don't overflow
    ld bc, 20                       ; Set the amount of bytes we want to load
    mSetROMBank 2                   ; Our source tile indices are stored in BANK 2
    call safe_vram_memcpy           ; Copy tiles indices to VRAM.
    ld bc,-32                       ; Prepare our next source and dest addresses
    add hl,bc
    push hl
    ld h, d
    ld l, e
    add hl,bc
    ld d, h
    ld e, l
    push de                         ; Push new source and dest into temp stack
.skip_store:
    ld a, [sp_save]
    ld l, a
    ld a, [sp_save+1]
    ld h, a
    ld sp, hl                       ; Restore original Stack Pointer
    ld a, [loaded_line]             
    inc a
    ld [loaded_line], a             ; Increment and save the loaded map line 
.end_load_line:
    ret

At the top of this subroutine we check whether we need to wait for the scroll to reach a specific point. We have to wait for the screen to finish scrolling up to some point so that we have enough time to load the tiles that are left. This code runs on every frame, so every frame is a chance to load a chunk of tiles. This is very similar to what happens in scrolling games like Super Mario Land.
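
To make the idea of loading one row per frame more concrete, here is a small C model of it. It is only a sketch under stated assumptions: the RowStreamer struct and stream_one_row are hypothetical names, the source map is assumed to use 32-byte rows like the VRAM tile map, and the wrap-around of the destination row is an assumption of this sketch rather than something taken from FX1.ASM.

```c
#include <stdint.h>
#include <string.h>

#define MAP_STRIDE   32 /* bytes per row in the source map and in the tile map */
#define MAP_ROWS     32 /* rows in the 32x32 background map */
#define VISIBLE_COLS 20 /* tiles copied per row: the 160 px screen is 20 tiles wide */

/* State kept between frames, like the two pointers saved on the temporary stack. */
typedef struct {
    const uint8_t *src_rows; /* source tile indices, MAP_STRIDE bytes per row */
    int next_src_row;        /* next source row to copy, counting down to 0 */
    int next_dst_row;        /* destination row inside the background map */
} RowStreamer;

/* Copy at most one row per call, then step both positions up by one row.
   Returns 1 if a row was copied and 0 when nothing is left to load. */
int stream_one_row(RowStreamer *s, uint8_t tilemap[MAP_ROWS * MAP_STRIDE])
{
    if (s->next_src_row < 0) {
        return 0;
    }
    memcpy(&tilemap[s->next_dst_row * MAP_STRIDE],
           &s->src_rows[s->next_src_row * MAP_STRIDE],
           VISIBLE_COLS);
    s->next_src_row--;
    /* Assumption: wrap around the map so new rows reuse rows that have
       already scrolled off screen. */
    s->next_dst_row = (s->next_dst_row + MAP_ROWS - 1) % MAP_ROWS;
    return 1;
}
```

Calling stream_one_row once per frame spreads the cost of a tall image over many frames instead of paying for it all at once.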


Double Nintendo Logo?

If you look through the code you might notice that I actually load a “new” Nintendo logo over the one that’s already visible. This is because I am lazy. The Game Boy starts up with this background palette: %11111100. It maps every color index to the darkest shade except index 0 (the lowest two bits), which stays light. If I changed the palette to the traditional gradient (from dark to light), the Nintendo logo would be displayed as a light logo. So what I did was load the same logo but in dark colors; this way I didn’t have to worry about swapping the background palette in the middle of the screen. Of course this only happens on DMG systems, because CGB and AGB don’t keep the logo once they finish the boot sequence.
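
The two palette values are easier to compare once they are decoded. The helper below is a small sketch (bgp_shade is a hypothetical name, not part of the demo) that pulls the shade for one color index out of a BGP byte.

```c
#include <stdint.h>

/* BGP packs four 2-bit shades: bits 1-0 hold the shade for color index 0,
   bits 3-2 for index 1, bits 5-4 for index 2 and bits 7-6 for index 3.
   Shade 0 is the lightest and shade 3 the darkest. */
uint8_t bgp_shade(uint8_t bgp, unsigned color_index)
{
    return (uint8_t)((bgp >> ((color_index & 3u) * 2u)) & 3u);
}
```

With the boot palette %11111100 ($FC), indices 1 to 3 all decode to shade 3. With the gradient %11100100 ($E4), every index decodes to its own number, so index 1 becomes the light shade 1.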


El Matapacos


FX3.ASM and FX4.ASM contain the scene where the Matapacos appears.

This scene is split into two modules because I didn’t want to make a huge file for two effects. FX3.ASM handles loading, scrolling and doing the initial animation. FX4.ASM handles the scanline wave effect.

There aren’t many interesting things happening here. The only part worth mentioning is the scanline wobble, which is a very simple effect. I had a 256-byte table of precalculated sine wave values with an amplitude of 8, aligned to a 256-byte boundary (8 bits of alignment). Because the index fits exactly in one byte, it wraps around on its own and I can traverse the table without worrying about overflowing and reading outside the table’s memory. To animate the wave motion, a variable controls the offset at which reading starts in the table.

The effect is just a couple of lines of assembly and it’s called every frame.

    ld a,[offset]                   ; I load the offset value
    ld c, a                         ; I save the offset value in C for further use
    add a, 4                        ; Increment offset
    ld [offset], a                  ; Store the offset in memory
    ld b, 25                        ; Prepare to wait for scanline 25
.wait0:
    ld a, [LY]                      ; Load the current scanline
    cp b                            ; Compare it to the target scanline in B
    jr nz, .wait0                   ; If the LCD hasn't reached it we repeat
    ld h, HIGH(sine_wave_table8)    ; Load the high byte of the sine table address
    add a, c                        ; Add to A (our current scanline) the offset
    ld l, a                         ; Load L with A value
    ld a, [hl]                      ; Load to A the value in the sine table
    add a, $DD                      ; Add $DD to increment the displacement
    ld [SCY], a                     ; Store our displacement in the LCD scroll Y reg
    inc b                           ; Increment B and prepare 
                                    ;  to process the next scanline
    ld a, b                         ; Check if the LCD scanline has reached VBLANK
    cp $90
    jr z, .end
    jr .wait0                       ; If not repeat
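
The arithmetic inside that loop can be modelled in a few lines of C. This is a sketch, not the demo’s code: wobble_scy and displayed_bg_row are hypothetical helpers, and the bias parameter stands in for the $DD constant. The uint8_t casts imitate the 8-bit wrap-around of the A register.

```c
#include <stdint.h>

/* SCY value written for one scanline: a sine lookup indexed by the
   scanline plus the animation offset, then shifted by a constant bias. */
uint8_t wobble_scy(const uint8_t sine_table[256], uint8_t ly,
                   uint8_t offset, uint8_t bias)
{
    uint8_t index = (uint8_t)(ly + offset); /* wraps at 256, so it never leaves the table */
    return (uint8_t)(sine_table[index] + bias);
}

/* Background map row (in pixels) that the LCD draws on scanline `ly`
   when the scroll Y register holds `scy`. */
uint8_t displayed_bg_row(uint8_t ly, uint8_t scy)
{
    return (uint8_t)(ly + scy);
}
```

Changing SCY on every scanline changes which row of the background each line of the screen shows, and that is what produces the wobble.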


The Twister


FX2.ASM contains the twister effect. I personally like this effect since it looks more complex than it really is.

The visual effect is mostly a result of a specific image and a bunch of scanline sine waves. This is the image I used for generating the effect.


I was careful to make the image tileable so I could repeat it along the full tile map.

Here you can see how I laid out the image in VRAM and you can also see the overall movement of the LCD scroll.



The code that produces this effect isn’t very long. Like the wave effect in the Matapacos scene, it uses a sine wave table and offsets to animate it. These are the important parts.

    xor a                           ; Reset SCX to avoid residual 
    ld [SCX], a                     ;  offset on scroll X after prev frame
    ld a,[offset_x2]                ; Load the scanline scroll offset
    inc a                           ; Increment the offset
    ld [offset_x2],a                ; Store the offset in memory
    ld d,a                          ; Save it for future use and to avoid
                                    ;  memory access inside hot loop
.wait0:
    ld a,[LY]
    cp 0
    jr nz, .wait0                   ; Wait for scanline to reset to 0
    ld b, 0                         ; Set B to be the scanline where we will
                                    ;  apply the wave displacement
; =====================  
; START SCANLINE EFFECT
; =====================  
.test_limit:
    ld a,[LY]
    cp 110                          ; Check if we've reached the bottom of
    jr z, .next                     ;  the effect which is a couple of pixels
    jr nc, .next                    ;  before VBLANK.
.wait_ly:                           ; If we haven't we wait for 
    ld a,[LY]                       ;  the scanline to be equal to B
    cp b                            ;  our target scanline for displacement
    jr c, .wait_ly                  ;  stall while LY < B
    add a, d                        ; To the current scanline we add the scroll offset
    and WAVE1_DATA_SIZE             ; Mask the value so we don't overflow the table
    ld l, a
    ld h,HIGH(wave_data1_copy)
    ld a, [hl]                      ; Load to A the value of the 
                                    ;  offset address in the table
    add a, d                        ; Add the offset so we 
                                    ;  can do scrolling + displacement
    ld [SCX], a                     ; Store it in the LCD scroll X register.
    ld h,HIGH(wave_data2_copy)      ; Do the same thing with a different wave table
    ld a, [hl]                      ;  on the LCD scroll Y register. 
    add a, 50
    ld [SCY], a                     
    inc b                           ; Increment B and prepare for the next scanline
    jr .test_limit
.next:
; =====================  

What this code does is use two different wave tables to apply displacement to the X and Y scrolling of the LCD on every scanline.

+
=
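
The per-scanline math of the twister can be sketched in C as well. The names here (Scroll, twister_scroll, index_mask, y_bias) are hypothetical, and the sketch assumes that WAVE1_DATA_SIZE is a power-of-two-minus-one mask, which is how the assembly uses it.

```c
#include <stdint.h>

typedef struct {
    uint8_t scx;
    uint8_t scy;
} Scroll;

/* Both tables are read at the same masked index, as in the assembly:
   SCX gets wave 1 plus the running offset (scroll + displacement),
   SCY gets wave 2 plus a constant (50 in the demo). */
Scroll twister_scroll(const uint8_t *wave_x, const uint8_t *wave_y,
                      uint8_t index_mask, uint8_t ly,
                      uint8_t offset, uint8_t y_bias)
{
    uint8_t index = (uint8_t)((uint8_t)(ly + offset) & index_mask);
    Scroll s;
    s.scx = (uint8_t)(wave_x[index] + offset);
    s.scy = (uint8_t)(wave_y[index] + y_bias);
    return s;
}
```

Because the offset is added both to the table index and to SCX, the image scrolls sideways while the wave travels down the screen.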


Drowning Myself


FX6.ASM has the code for this effect. I am very fond of it because it was one of the first Game Boy scanline wobbles I made. I made it a bit more than a year ago, and in some sense it was the first step towards making a Game Boy demo.

As you may have noticed, there is a trend in these effects. Most, if not all, have some kind of scanline displacement, and this one is no different.

The code, like the other displacements, is very straightforward. We go through each selected scanline and apply a displacement with an offset based on a wave table. To be consistent with the rest of the effects, here is the commented code that produces the underwater effect.

    ld a,%11100100
    ld [BGP],a                  ; Set the background palette to default gradient
    xor a
    ld [SCX],a
    ld [SCY],a                  ; Reset LCD's scroll X and Y registers.
    ld a,[water_y]              ; Load water Y offset which handles the top wave
    inc a
    and TABLE_SIZE              ; Increment and mask it so it doesn't overflow
    ld [water_y],a              ; Store it in memory
    ld hl,wave
    add a,l
    ld l,a
    ld a,[hl]                   ; Use this offset to retrieve a wave value
    ld c,a                      ; Move it to C so it can be added to the 
    ld a,[control_y]            ;  water Y position. (control_y)    
    add a,c
    ld b,a                      ; Add it and save it in B for scanline check
.waity0:
    ld a,[LY]
    cp b
    jr nz,.waity0               ; Wait for scanline to be equal to B
    ld a,%10010000              ; Once we've reached the scanline == B
    ld [BGP],a                  ; We set the background palette to a light color
                                ; Now we can start the scanline displacement
.repeat_hwave:
    ld a,[LY]
    inc a
    ld b,a
    cp $90                      ; If the next scanline is in VBLANK we are done;
    jr nc,.end_horizontal_wave  ;  otherwise proceed with the displacement
.waity1:
    ld a,[LY]
    cp b
    jr nz,.waity1               ; Wait for scanline to equal to B
    ld l,a
    ld a,[water_x]              ; Same as before we load offset to wave table
    add a,l                     ; Add the current scanline
    and TABLE_SIZE              ; Mask it so we don't overflow
    ld l,a
    ld a,[hl]                   ; Load the value in the offset wave table
    ld [SCX],a                  ; Store it in the LCD scroll X register
    sub a, 10                   
    ld [SCY],a                  ; Apply the same displacement - 10 to scroll Y
    jr .repeat_hwave            ; Repeat until we reach VBLANK
.end_horizontal_wave:
    ld a,[water_x]
    inc a
    ld [water_x],a              ; Increment the offset to the wave table.


Bitmap Animation


FX7.ASM, FX8.ASM & FX9.ASM render these bitmap animations. These were the scenes that took the most time to produce, not because of their technical complexity, but because of all the steps needed to generate the animations.

I didn’t have a pipeline to create these animations, so it was mostly manual work: screen or camera recording, Photoshop, exporting frames and converting them to tilemaps.

For example, for the animation of my cat Shin I recorded a short video of him on my phone. Then I transferred it to my PC, opened it in Photoshop and reduced the number of frames by hand until I had 9 frames in total. After that I exported all frames as images at a resolution of 20x18 px. You may be asking yourself, why 20x18? Because the 160x144 px screen shows exactly 20x18 tiles of 8x8 px when scroll X and Y are set to 0. Knowing that, I can turn each pixel into a tile.

These are the exported frames from Photoshop.


The final step was transforming them from pixel data to Game Boy tile indices.

I created this gradient tileset that would be used to render the image on the Game Boy’s LCD.


This means that I needed to convert each pixel from R8G8B8 to an index no bigger than 16. I used the relative luminance of the pixel to get a value from 0 to 1 and multiplied it by the tileset tile count. Finally, I stored the result in a file I could embed into my ROM.

The code that did that in my tool was this:

uint32_t PixelIndex = (y * Width + x) * DEFAULT_CHANNEL_COUNT;
float R = (float)(ImageData[PixelIndex + 0]) / 255.0f;
float G = (float)(ImageData[PixelIndex + 1]) / 255.0f;
float B = (float)(ImageData[PixelIndex + 2]) / 255.0f;
float Luminance = R * 0.2126f + G * 0.7152f + B * 0.0722f;
if (Luminance > 1.0f) Luminance = 1.0f;
ubyte_t LumToTile = (ubyte_t)(floorf(Luminance * GradientCount));
*(CurrTile++) = Offset + LumToTile;
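
For readers who want to try the conversion in isolation, here is a self-contained sketch of the same idea. The function name pixel_to_tile and its parameters are assumptions, and so is the final clamp: it is added here so that a pure white pixel maps to the last gradient tile, which may or may not match how GradientCount is defined in the original tool.

```c
#include <stdint.h>

/* Map one RGB pixel to a tile index in [offset, offset + gradient_count - 1].
   Assumes gradient_count is at least 1. */
uint8_t pixel_to_tile(uint8_t r, uint8_t g, uint8_t b,
                      uint8_t offset, uint8_t gradient_count)
{
    /* Same luminance weights as the snippet above. */
    float luminance = (r / 255.0f) * 0.2126f
                    + (g / 255.0f) * 0.7152f
                    + (b / 255.0f) * 0.0722f;

    /* Truncation equals floor here because the value is never negative. */
    int step = (int)(luminance * (float)gradient_count);

    /* Assumption: clamp so white lands on the last tile, not one past it. */
    if (step > gradient_count - 1) {
        step = gradient_count - 1;
    }
    return (uint8_t)(offset + step);
}
```

With a 16-tile gradient starting at tile 0, black maps to tile 0, pure green to tile 11 and white to tile 15.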

This is the code for rendering and updating these animations on the system.

.render_frame:
    ld a,[current_frame]        ; Frames are 16 bit addresses to tilemaps
    ld d,a                      
    ld a,[current_frame+1]      ; So we load the high and low bytes into DE
    ld e,a
    ld hl,$9C00                 ; Set the start VRAM address to write
    ld c,18                     ; Set C to 18. C will be our counter
.render_line:
    mSetROMBank 4               ; First we set the ROM bank to 4 
    rept 20
    ld a,[de]                   ; Unroll this loop that loads
    ld [hl+],a                  ; a line of tilemap indices into VRAM
    inc de
    endr
    dec c
    ld a,c
    ld [temp_y],a               ; Decrement our counter C and save it
    ld bc,$000C                 ; Offset the VRAM write address by $0C (12)
    add hl,bc                   ; because each tile map row is 32 tiles wide
    ld a,[temp_y]   
    ld c,a                      ; Restore our counter in C
    jr nz,.render_line          ; If the counter isn't 0 repeat .render_line
    ld a,[frames_rendered]      
    inc a                       
    ld [frames_rendered],a      ; This will act as a delay for rendering 
    and $03                     ; the next frame
    jr nz,.render_frame         ; if we render everything too fast it looks bad
.change_frame:
    ld a,[current_index]        ; Here we change to the next frame
    inc a                       ; We increment the current frame index
    and %111111                 ; Mask it so we don't overflow the frame list
    ld [current_index],a
    ld h,high(frame_list)       ; Load high byte for the address of the frame list
    rl a                        ; Use the frame index as low byte but multiply by 2
    ld l,a
    ld a,[hl+]                  ; Store new frame low byte address
    ld [current_frame+1],a
    ld a,[hl+]
    ld [current_frame],a        ; Store new frame high byte address
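
The row copy in .render_line boils down to a blit with a stride. The C sketch below shows that part only; blit_frame and the constant names are hypothetical, and the ROM banking and frame delay are left out.

```c
#include <stdint.h>

enum { SCREEN_COLS = 20, SCREEN_ROWS = 18, MAP_COLS = 32 };

/* Copy one 20x18 frame of tile indices into a 32-column tile map.
   After each row the write position skips the 12 off-screen columns,
   which is the $0C that the assembly adds to HL. */
void blit_frame(const uint8_t frame[SCREEN_COLS * SCREEN_ROWS],
                uint8_t tilemap[MAP_COLS * SCREEN_ROWS])
{
    const uint8_t *src = frame;
    uint8_t *dst = tilemap;

    for (int row = 0; row < SCREEN_ROWS; row++) {
        for (int col = 0; col < SCREEN_COLS; col++) {
            *dst++ = *src++;
        }
        dst += MAP_COLS - SCREEN_COLS;
    }
}
```

The source frame is stored tightly packed (20 bytes per row), so only the destination pointer needs the extra step.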


3D Cylinder


FX5.ASM has this final scene with the credits. This one is one of my favorites. I like that I was able to mix two different effects into a single scene.

This was possible because of the simplicity of running the bitmap animation. I was able to have two different effects on a single screen by changing, for each effect, the bit that selects the background tile map in the LCD control register (LCDC). The bitmap animation displays tile map data from the address range $9C00-$9FFF and the cylinder uses $9800-$9BFF.

Here you can see how it looks in the different regions.


To display the rotating cylinder I used, again, a displacement table, this time with values that represent an arc. When these values reach the peak of the arc they have to be negated so that the cylinder effect is mirrored as a displacement on the LCD.

This is the code for displaying the cylinder effect. In summary, it waits for scanline 40, swaps the background display map bit on LCDC, and then goes through each scanline applying the displacement plus a scroll. It repeats until scanline 99 and then returns to the previous background display map.

    ld a, [scroll_x]
    ld c, a                     ; Save x scroll value for use inside the cylinder
    ld e,HIGH(wave_table)       ; Save the high byte of the wave table address
    ld b,40                     ; The cylinder effect starts at scanline 40
.wait_ly40:
    ld a,[LY]
    cp b
    jr nz,.wait_ly40            ; Wait for scanline 40
    ld a, [fade_line]           
    ld [BGP], a                 ; Set background palette to the first fade color
    ld a, [LCDC]
    res 3, a
    ld [LCDC], a                ; Swap background map to $9800-$9BFF
    ld a,c
    ld [SCX],a                  ; Set LCD scroll X register to our save X scroll
    ld a, b                     
.wait_ly:
    ld a,[LY]
    cp a,b
    jr nz,.wait_ly              ; Wait for the scanline to be equal to B
    ld h,e                      ; Set H to the high byte of the wave table address
    ld a,[hl]                   ; Load wave table displacement value
    add a,d                     ; D holds the Y scroll. We add it to the displacement
    ld [SCY],a                  ; Store it in the LCD scroll Y register
    ld h,HIGH(fade_table1)
    ld a,[hl]                   ; Apply the palette for that current scanline so
    ld [BGP],a                  ; we can make a fading effect along the cylinder
    inc l
    inc b
    ld a,b
    cp a, 99                    ; The effect should only be run until scanline 99
    jr nz,.wait_ly
    ld a,$ff
    ld [BGP],a                  ; We reset the background palette
    ld a, [LCDC]
    set 3, a
    ld [LCDC], a                ; Swap background map to $9C00-$9FFF
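
The tile map switch can be summarised with two tiny helpers. This is a simplified sketch with hypothetical names (bg_map_base, lcdc_for_scanline); it ignores the exact timing of the register writes within a scanline and only models which map is selected for which band of the screen.

```c
#include <stdint.h>

#define LCDC_BG_MAP_SELECT 0x08u /* bit 3 of LCDC */

/* Base address of the tile map the background is drawn from. */
uint16_t bg_map_base(uint8_t lcdc)
{
    return (lcdc & LCDC_BG_MAP_SELECT) ? 0x9C00u : 0x9800u;
}

/* LCDC value for a scanline: the band [band_start, band_end) shows the
   $9800 map (cylinder), every other line shows the $9C00 map (bitmap). */
uint8_t lcdc_for_scanline(uint8_t lcdc, uint8_t ly,
                          uint8_t band_start, uint8_t band_end)
{
    if (ly >= band_start && ly < band_end) {
        return (uint8_t)(lcdc & ~LCDC_BG_MAP_SELECT); /* like res 3, a */
    }
    return (uint8_t)(lcdc | LCDC_BG_MAP_SELECT);      /* like set 3, a */
}
```

With band_start 40 and band_end 99, the helper mirrors the split used in this scene.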


Fin

This was a very fun demo to make. It was my first time participating in Revision and the second time I’ve ever participated with a production.

I hope this breakdown can motivate other people to create their own Game Boy demos and participate in demoparties, or to just get started in programming this awesome system. One thing I love about the Game Boy is that programming for it is very accessible, since there is plenty of documentation and a community constantly working on tools.

I highly recommend looking into the GBDEV Community. You’ll find people with a lot of knowledge about the Game Boy.

I would also like to recommend watching this awesome talk by Michael Steil called The Ultimate Game Boy Talk to get familiar with the Game Boy hardware. I also suggest reading and using the gbdev Pan Docs for an in-depth understanding of the system.