Building a rhythm game using Haskell


Fumiaki Kinoshita (part-time employee of IIJ-II) fumiexcel@gmail.com

A Japanese translation is available. Thank you, @stefafafan!

Introduction

Rhythm games, also known as rhythm action games, are a very popular genre in Japan. Konami released Dance Dance Revolution (DDR) in 1998, and it became the most successful game in the genre. Another famous one, Taiko no Tatsujin (literally, "Expert of the Japanese Drum"), is immensely popular across a broad range of age groups. Today, rhythm games in various forms are released one after another.

However, there are few tutorials on creating this kind of game. Those that exist tend to be outdated, not written in English, or Windows-only.

This tutorial focuses on creating a rhythm game painlessly. Don't be surprised: we use Haskell to do it.

This tutorial has three parts.

  • Part I explains how to set up the environment required for Parts II and III.
  • In Part II, we build a very simple rhythm game using the Call engine.
  • Part III introduces the technical background (graphics and audio) that supports Part II.

I'd be happy if this tutorial sparks your curiosity about creating games.

Part I: Preparation

First, make sure that GHC is installed. The Haskell Platform is an easy way to install it.

On Unix or Mac, install libportaudio19:

$ sudo <your-package-manager> install libportaudio19

Note: Currently Call doesn't draw bitmaps correctly on Mac OS X. Please help me figure out what goes wrong.

The source code used in this tutorial is packaged as the rhythm-game-tutorial package. You can download and set it up with:

$ cabal update
$ cabal unpack rhythm-game-tutorial
$ cd rhythm-game-tutorial-<version>
$ cabal install --only-dependencies
$ cabal configure
$ cabal build

cabal install --only-dependencies installs a bunch of packages, including these vital ones:

  • objective provides an abstraction for stateful objects. It is not strictly necessary, but it significantly reduces the pain of managing state.
  • call is a cross-platform multimedia library. While it is small and simple, it covers the essentials for games: 2D/3D graphics, audio, and input handling for keyboard, mouse and gamepad.
  • bindings-portaudio provides low-level audio APIs.

On Windows

bindings-portaudio bundles the PortAudio source for ease of installation. Unfortunately, due to a GHC bug, it is sometimes unstable. If your platform is Windows x64, using the 32-bit version of GHC is safer.

$ cabal install bindings-portaudio -fBundle -fWASAPI

If it fails with something messy, please report it to me.

Part II: Creating a game

Here we bang! -- Wada-don, "Taiko no Tatsujin"

Now, think of a very simple game: there is a circle at the bottom of the window, and other circles approach it. You hit the space key exactly when an approaching circle overlaps the one at the bottom.

(Screenshot: tutorial-passive)

How do we implement this? We can derive the structure of the program by writing down its components:

  • Sound: music plays throughout the game.
  • Graphics: the picture depends on the time.
  • Interaction: the score changes when the player hits the space key.

We will explain these in order.

Playing music

Groove is important. It's time to play some music. Our first program is as follows (src/music-only.hs):

main = runSystemDefault $ do
  music <- prepareMusic "assets/Monoidal Purity.wav"
  playMusic music
  stand

Let's execute it:

$ dist/build/music-only/music-only

Can you hear the music? Note that it takes a moment to load the music.

Let's investigate the code. The following functions are provided by the Call engine.

runSystemDefault :: (forall s. System s a) -> IO a
stand :: System s ()

In Call, actions are performed in the System s monad. runSystemDefault converts System s into IO. stand does nothing except keep the program from terminating.

The signatures of prepareMusic and playMusic are as follows:

type Music s = Instance (StateT Deck (System s)) (System s)

prepareMusic :: FilePath -> System s (Music s)
playMusic :: Music s -> System s ()

These functions will be defined later.

Drawing a picture

Let's construct the graphical part of the game.

main = runSystemDefault $ do
  -- parseScore yields one Set Time per lane; take the first lane for renderLane
  allTimings <- liftIO $ (!! 0) . parseScore (60/160*4) <$> readFile "assets/Monoidal Purity.txt"
  linkPicture $ \_ -> renderLane allTimings <$> getTime
  stand

linkPicture :: (Time -> System s Picture) -> System s () is the only function provided by Call that actually draws something. linkPicture f repeatedly calls f and draws its result to the window. The argument of f is the time difference between frames, though it is often negligible.

Because of how the game works, we need to prepare the set of note timings. Let us introduce a notation for timings that is more readable than a plain list of decimals.

This notation consists of a number of packets separated by blank lines, representing a sequence of bars: each packet is one bar. Each packet contains several lines, one per lane. A line divides the bar evenly into as many steps as it has characters; '.' represents a note and '-' a rest.

----.-----------
.-----------.---
--------.-------

The implementation of the parser is not so interesting.

parseScore :: Time -> String -> [Set Time]
parseScore d = map (Set.fromAscList . concat . zipWith (map . (+)) [0,d..]) . Data.List.transpose . map (map f) . splitWhen (=="") . lines where
  f l = [t | (t, c) <- zip [0, d/fromIntegral (length l)..] l, c == '.']
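For readers who want to run the parser on its own, here is a minimal self-contained sketch. It assumes Time is Double (seconds) and swaps Data.List.Split.splitWhen (== "") for a hypothetical helper, splitOnBlankLines, so it needs only base and containers. It also assumes every packet has the same number of lines; otherwise transpose would misalign the bar offsets of the lanes.

```haskell
import qualified Data.List
import Data.Set (Set)
import qualified Data.Set as Set

-- Assumption: Time is seconds as a Double (a stand-in for Call's Time).
type Time = Double

-- Hypothetical helper with the same behaviour as splitWhen (== ""):
-- groups lines into packets separated by blank lines.
splitOnBlankLines :: [String] -> [[String]]
splitOnBlankLines ls = case break null ls of
  (packet, [])       -> [packet]
  (packet, _ : rest) -> packet : splitOnBlankLines rest

-- d is the length of one bar in seconds. Each packet is a bar and each
-- line in a packet is a lane; the result holds one Set per lane.
parseScore :: Time -> String -> [Set Time]
parseScore d =
    map (Set.fromAscList . concat . zipWith (map . (+)) [0, d ..]) -- shift bar k by k * d
  . Data.List.transpose                                            -- bars per lane
  . map (map noteTimes)                                            -- offsets within each bar
  . splitOnBlankLines
  . lines
  where
    -- Offsets (inside the bar) of the '.' characters of one line.
    noteTimes l = [t | (t, c) <- zip [0, d / fromIntegral (length l) ..] l, c == '.']
```

In GHCi, parseScore 2 "-.\n..\n\n.-\n--" evaluates to [fromList [1.0,2.0],fromList [0.0,1.0]]: two bars of 2 seconds each, and two lanes.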

Given the timings and the "life span" of circles, we can compute the positions of the visible circles at any given time.

phases :: Set Time -- ^ timings
    -> Time -- ^ life span
    -> Time -- ^ the current time
    -> [Float] -- ^ phase
phases s len t = map ((/len) . subtract t) -- 1 = just appeared, 0 = on the criterion, < 0 = passed
  $ Set.toList
  $ fst $ Set.split (t + len) s -- before the limit
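To see what phases actually returns, here is a standalone version, assuming Time = Double and returning [Double] rather than [Float]. Set.split keeps every timing strictly before t + len, including notes that have already passed, so those get negative phases and, with y = (1 - p) * 480, keep sliding below the criterion.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Assumption: Time is seconds as a Double.
type Time = Double

-- Phase of every note due before t + len:
-- 1 = the circle just appeared, 0 = exactly on the criterion,
-- negative = the note has already passed.
phases :: Set Time -> Time -> Time -> [Double]
phases s len t = map ((/ len) . subtract t)
  $ Set.toList
  $ fst (Set.split (t + len) s)
```

For example, phases (Set.fromList [1, 2, 3, 5]) 2 1.5 evaluates to [-0.25,0.25,0.75]: the note at 5 is not visible yet, and the note at 1 has already passed.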

Next, create a function to render circles. Since Picture is a monoid, we can use foldMap or mconcat to combine pictures. translate (V2 x y) shifts a picture by (x, y). bitmap b turns a Bitmap into a Picture.

unsafePerformIO, which has the type IO a -> a, is a rather uncanny function. Its use should be limited to passive, virtually constant operations such as getArgs or readBitmap.

{-# NOINLINE circle_png #-} -- keep GHC from inlining and repeating the read
circle_png :: Bitmap
circle_png = unsafePerformIO $ readBitmap "assets/circle.png"

circles :: [Float] -> Picture
circles = foldMap (\p -> V2 320 ((1 - p) * 480) `translate` bitmap circle_png)

renderLane passes the result of phases to circles. color changes the color of a picture.

renderLane :: Set Time -> Time -> Picture
renderLane ts t = mconcat [color blue $ circles (phases ts 1 t)
    , V2 320 480 `translate` color black (bitmap circle_png) -- criterion
    ]

Here is the updated main:

main = runSystemDefault $ do
  music <- prepareMusic "assets/Monoidal-Purity.wav"
  allTimings <- fmap (!!0) $ liftIO $ loadTimings "assets/Monoidal-Purity.txt"
  linkPicture $ \_ -> renderLane allTimings <$> getTime
  playMusic music
  stand

There is a serious problem in this program: the graphics and the music may drift apart if the program stutters. We need to use the musical time (the playback position of the music) instead of the real time.

Component: prepareMusic

Music is essential for rhythm games.

type Music s = Instance (StateT Deck (System s)) (System s)

prepareMusic :: FilePath -> System s (Music s)
prepareMusic path = do
  wav <- readWAVE path
  i <- new $ variable $ source .~ sampleSource wav $ Deck.empty
  linkAudio $ playbackOf i
  return i

readWAVE loads a sound from a .wav file. source .~ sampleSource wav $ Deck.empty is a bit tricky: it takes an empty Deck and sets its source to the samples of wav.

Deck is a utility for playing music. source is a Lens, a purely functional representation of an accessor. new $ variable $ v creates an instance (a stateful object) holding v, here the deck. Regard linkAudio $ playbackOf i as a boilerplate idiom for now.

Component: getPosition and playMusic

The implementation of getPosition and playMusic is as follows:

getPosition :: Music s -> System s Time
getPosition m = m .- use pos

playMusic :: Music s -> System s ()
playMusic m = m .- playing .= True

You may notice two new names: use and (.=). They come from the lens library, which provides types and utilities for dealing with various accessors.

pos and playing are lenses. Given a Lens' s a, you can get and modify a value of type a inside an s.

pos :: Lens' Deck Time
playing :: Lens' Deck Bool

use and (.=) are a getter and a setter that work in stateful monads.

use :: MonadState s m => Lens' s a -> m a
(.=) :: MonadState s m => Lens' s a -> a -> m ()

With lenses, we can easily access a specific part of a structure and manipulate it just like a "field" in OOP languages. However, the state of the deck is encapsulated in the music object, so these operations can't be applied to it directly. The (.-) operator, provided by the objective package, executes an action within the context held by its left operand.
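To demystify use and (.=), here is a minimal sketch of the van Laarhoven encoding that Lens' is built on, written without the lens package. ToyDeck is a hypothetical stand-in for Deck, and use/(.=) are simplified versions specialized to State from transformers, unlike lens's general MonadState ones.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.Trans.State.Strict (State, gets, modify, runState)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- The van Laarhoven shape of a Lens' s a.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Hypothetical stand-in for Deck: just a position and a playing flag.
data ToyDeck = ToyDeck { _pos :: Double, _playing :: Bool }
  deriving (Show, Eq)

pos :: Lens' ToyDeck Double
pos f d = fmap (\p -> d { _pos = p }) (f (_pos d))

playing :: Lens' ToyDeck Bool
playing f d = fmap (\b -> d { _playing = b }) (f (_playing d))

-- Reading picks the Const functor; writing picks Identity.
view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l x = runIdentity . l (const (Identity x))

-- Simplified use and (.=), working in State only.
use :: Lens' s a -> State s a
use l = gets (view l)

infix 4 .=
(.=) :: Lens' s a -> a -> State s ()
l .= x = modify (set l x)

-- Usage: start playback, seek, then read the position back.
demo :: State ToyDeck Double
demo = do
  playing .= True
  pos .= 12.5
  use pos
```

runState demo (ToyDeck 0 False) returns the position 12.5 together with ToyDeck 12.5 True. The same pair of functions works for any field, which is why one lens serves as both getter and setter.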

getPosition m returns the accurate time (in seconds) elapsed since the start of the music m.
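Why does musical time stay in sync where wall-clock time may not? The following toy model is an assumption for illustration, not Call's actual Deck: if the position only advances when the audio callback consumes frames, then anything that reads the position stays tied to the audio actually sent to the device, even if the render loop stutters. All names here are hypothetical.

```haskell
-- Toy playback state (hypothetical; not Call's Deck).
data Playback = Playback
  { sampleRate :: Double -- frames per second, e.g. 44100
  , position   :: Double -- seconds of music already handed to the device
  , isPlaying  :: Bool
  } deriving Show

-- Called once per audio buffer of the given number of frames.
onAudioBuffer :: Int -> Playback -> Playback
onAudioBuffer frames p
  | isPlaying p = p { position = position p + fromIntegral frames / sampleRate p }
  | otherwise   = p -- paused: the musical clock stands still

-- What the renderer would read instead of a wall clock.
getMusicalTime :: Playback -> Double
getMusicalTime = position
```

For example, 100 buffers of 441 frames at 44100 Hz advance the musical time by 1 second, no matter how long the rendering of each frame took.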

Putting them together, we get src/tutorial-passive.hs.

$ dist/build/tutorial-passive/tutorial-passive

It is not a game yet, though, simply because it has no score and no interaction.

Handling inputs

Now let's deal with input. We introduce two components, rate and handle.

rate :: Time -> Int
rate dt
  | dt < 0.05 = 4
  | dt < 0.1 = 2
  | otherwise = 1

handle :: Time -> Set Time -> (Int, Set Time)
handle t ts = case viewNearest t ts of
  Nothing -> (0, ts) -- The song is over
  Just (t', ts') -> (rate $ abs (t - t'), ts')

rate calculates a score from the time lag. handle returns the score together with the updated timings. viewNearest :: (Num a, Ord a) => a -> Set a -> Maybe (a, Set a) picks the value nearest to the given one and returns it along with the set without that value (or Nothing if the set is empty). Removing the nearest timing matters: if we forgot to remove it, mashing the button would keep scoring against the same note.
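viewNearest itself is not shown in the tutorial. Here is one possible implementation, a sketch based on Set.lookupLE and Set.lookupGE from containers, together with rate, handle and a hypothetical play helper that folds a list of key-press times through handle. Time = Double is an assumption.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

type Time = Double -- assumption

-- Nearest element to x, plus the set without it; Nothing if the set is empty.
viewNearest :: (Num a, Ord a) => a -> Set a -> Maybe (a, Set a)
viewNearest x s = case (Set.lookupLE x s, Set.lookupGE x s) of
  (Nothing, Nothing) -> Nothing
  (Just lo, Nothing) -> pick lo
  (Nothing, Just hi) -> pick hi
  (Just lo, Just hi)
    | x - lo <= hi - x -> pick lo
    | otherwise        -> pick hi
  where
    pick y = Just (y, Set.delete y s)

rate :: Time -> Int
rate dt
  | dt < 0.05 = 4
  | dt < 0.1  = 2
  | otherwise = 1

handle :: Time -> Set Time -> (Int, Set Time)
handle t ts = case viewNearest t ts of
  Nothing        -> (0, ts) -- the song is over
  Just (t', ts') -> (rate (abs (t - t')), ts')

-- Hypothetical helper: simulate a sequence of presses, summing the score.
play :: [Time] -> Set Time -> (Int, Set Time)
play presses ts0 = foldl step (0, ts0) presses
  where
    step (total, ts) t = let (sc, ts') = handle t ts in (total + sc, ts')
```

play [1.02, 2.08, 2.1] (Set.fromList [1, 2, 3]) gives a total of 7 and an empty set: 4 points for the first hit, 2 for the second, and only 1 for the third, because the note at 2 was already consumed and the nearest remaining note is at 3.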

data Chatter a = Up a | Down a

And the following code actually handles events:

linkKeyboard $ \ev -> case ev of
  Down KeySpace -> do
    t <- getPosition music      -- musical time, not wall-clock time
    ts <- timings .- get
    let (sc, ts') = handle t ts -- handle is pure, so bind its result with let
    timings .- put ts'
    score .- modify (+sc)
  _ -> return () -- Discard the other events

Note that a few variables have to be instantiated beforehand:

timings <- new $ variable (allTimings !! 0)
score <- new $ variable 0

Once linkKeyboard is called, the engine passes keyboard events (values of type Key) to the handler. Each Key is wrapped in Chatter to indicate whether it was pressed (Down) or released (Up). When the space key is pressed, the handler computes the time difference from the nearest timing and increments the score according to the accuracy.

Since we want to show players the current score, we need to load a Font. Call.Util.Text.simple generates a function that renders the supplied text.

text <- Text.simple defaultFont 24 -- text :: String -> Picture

Just add text (show sc) to renderGame. src/tutorial-active.hs is the updated, interactive source. It's a game, yay!

$ dist/build/tutorial-active/tutorial-active

(Screenshot: tutorial-active)

Extending the game

However, when you actually play it, you may feel dissatisfied, because the interaction is still poor. Showier effects would make it more exciting. Most rhythm games immediately show an evaluation of the accuracy of the most recent hit, so players can tell whether they are playing well or badly.

Thanks to the purely functional design, we can easily extend the game to multiple lanes (src/tutorial-extended.hs)!

(Screenshot: extended)

ix i is a lens (strictly speaking, a traversal) that points to the i-th element of a list. Each lane is rendered inside forM, and the results are arranged with translate.

Another interesting feature, transit, is convenient for creating animations.

type Effect m = Mortal (Request Time Picture) m ()

pop :: Monad m => Bitmap -> Effect m
pop bmp = Mortal $ transit 0.5 $ \t -> translate (V2 320 360)
  $ translate (V2 0 (-80) ^* t)
  $ color (V4 1 1 1 (realToFrac $ 1 - t))
  $ bitmap bmp

The argument t goes from 0 to 1 over 0.5 seconds. To instantiate the effect, put the object into a list held in a variable:

effects <- new $ variable []
effects .- modify (pop _perfect_png:)

Then effects .- gatherFst id (apprises (request dt)) returns a Picture, removing expired animations automatically. This benefits greatly from objective. Here is the complete linkPicture section:

linkPicture $ \dt -> do -- dt (time since the last frame) drives the effects below
  [l0, l1, l2] <- forM [0..2] $ \i -> renderLane <$> (timings .- use (ix i)) <*> getPosition music
  s <- score .- get
  ps <- effects .- gatherFst id (apprises (request dt))
  return $ translate (V2 (-120) 0) l0
    <> translate (V2 0 0) l1
    <> translate (V2 120 0) l2
    <> color black (translate (V2 240 40) (text (show s)))
    <> ps
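The effect machinery above (Mortal, transit, apprises, gatherFst) comes from objective and call and is not reproduced here. As a rough mental model only, here is a pure sketch with hypothetical names: each effect records when it started and how long it lasts, its progress t runs from 0 to 1, and stepping the list both renders the live effects and drops the expired ones.

```haskell
-- Hypothetical effect record (not the objective/call API).
data Effect = Effect
  { start    :: Double -- time the effect was spawned (seconds)
  , duration :: Double -- lifetime in seconds, e.g. 0.5 for pop
  , label    :: String -- which bitmap it shows, e.g. "perfect"
  }

-- Progress in [0, 1), or Nothing once the effect has expired.
progress :: Double -> Effect -> Maybe Double
progress now e
  | t >= 1    = Nothing
  | otherwise = Just (max 0 t)
  where
    t = (now - start e) / duration e

-- Mimics pop: rises 80 pixels and fades out as t goes from 0 to 1.
-- Returns (vertical offset, alpha).
popFrame :: Double -> (Double, Double)
popFrame t = (-80 * t, 1 - t)

-- Renders live effects and keeps only those that have not expired.
stepEffects :: Double -> [Effect] -> ([(String, (Double, Double))], [Effect])
stepEffects now es =
  ( [ (label e, popFrame t) | (e, Just t) <- tagged ]
  , [ e | (e, Just _) <- tagged ] )
  where
    tagged = [ (e, progress now e) | e <- es ]
```

At time 0.5, an effect spawned at 0 with a 0.5-second lifetime is gone, while one spawned at 0.25 is halfway through: 40 pixels up, at 50% opacity.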

Handling input is just as straightforward:

let touchLane i = do
      ((sc, obj), ts') <- handle <$> getPosition music <*> (timings .- use (ix i))
      effects .- modify (obj:)
      timings .- ix i .= ts'
      score .- modify (+sc)

linkKeyboard $ \ev -> case ev of
  Down KeySpace -> touchLane 1
  Down KeyF -> touchLane 0
  Down KeyJ -> touchLane 2
  _ -> return () -- Discard the other events

Moreover, with the LambdaCase GHC extension, you can replace \ev -> case ev of with \case.

The whole game fits in only 123 lines!

$ wc -l src/tutorial-extended.hs
123
$ dist/build/tutorial-extended/tutorial-extended

Part III: Technical background

Graphics

A monoid is, roughly, something composable that has an "empty" element, where composition is associative. A picture is a monoid: there is an empty picture, and pictures can be composed by overlaying them. The standard library base provides a typeclass for monoids (shown here in its classic form; since GHC 8.4 it also has Semigroup as a superclass):

class Monoid a where
  mempty :: a
  mappend :: a -> a -> a

Call uses a free monoid to represent pictures.

In de-CPSed form, it looks like this:

data Scene = Empty
  | Combine Scene Scene
  | Primitive Bitmap PrimitiveMode (Vector Vertex) -- draw a primitive
  | VFX (VFX Scene) -- apply visual effects
  | Transform (M44 Float) Scene -- transform `Scene` using a matrix

Its Monoid instance (together with the Semigroup instance that modern GHC requires) is trivial.

instance Semigroup Scene where
  (<>) = Combine

instance Monoid Scene where
  mempty = Empty

Using a free monoid, we can separate the drawing process from the description of a Scene. Think of drawScene :: Scene -> IO (), which calls concrete APIs to draw a Scene. For the empty picture, we do nothing; drawing Combine a b is equivalent to drawScene a >> drawScene b.

So drawScene can be implemented as follows:

drawScene Empty = return ()
drawScene (Combine a b) = drawScene a >> drawScene b
drawScene (Primitive b m vs) = drawPrimitive b m vs
drawScene (VFX v) = drawScene (applyVFX v)
drawScene (Transform mat s) = withMatrix mat (drawScene s)

where drawPrimitive, applyVFX and withMatrix are environment-dependent.
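Because a Scene is just data, the same value can be given different meanings. Here is a minimal sketch with a simplified, hypothetical Scene (named shapes and 2D translations only) and two interpreters: one produces a list of draw commands instead of calling a graphics API, and the other counts primitives.

```haskell
-- Simplified Scene for illustration: shapes are names, and the only
-- transformation is a 2D translation.
data Scene
  = Empty
  | Combine Scene Scene
  | Primitive String
  | Translate (Double, Double) Scene

instance Semigroup Scene where
  (<>) = Combine

instance Monoid Scene where
  mempty = Empty

-- Interpreter 1: a log of draw commands instead of real API calls.
drawCommands :: Scene -> [String]
drawCommands = go (0, 0)
  where
    go :: (Double, Double) -> Scene -> [String]
    go _ Empty = []
    go o (Combine a b) = go o a ++ go o b
    go (x, y) (Primitive name) = ["draw " ++ name ++ " at " ++ show (x, y)]
    go (x, y) (Translate (dx, dy) s) = go (x + dx, y + dy) s

-- Interpreter 2: the same description, a different meaning.
countPrimitives :: Scene -> Int
countPrimitives Empty = 0
countPrimitives (Combine a b) = countPrimitives a + countPrimitives b
countPrimitives (Primitive _) = 1
countPrimitives (Translate _ s) = countPrimitives s
```

For example, drawCommands (Translate (10, 0) (Primitive "circle") <> mempty <> Primitive "text") yields ["draw circle at (10.0,0.0)","draw text at (0.0,0.0)"], and the empty picture contributes nothing.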

In other words, free structures are a kind of DSL that encourages reusable programs. Andres Löh's "Monads for free!" is a great introduction to free structures.

Call gathers several kinds of transformation into the Affine class. Thanks to type families, we can use the same operations for both 2D and 3D. Normal is the type of the normal vector used by rotateOn: a 3-dimensional vector in 3D, but just a Float in 2D.

class Affine a where
  type Vec a :: *
  type Normal a :: *
  rotateOn :: Normal a -> a -> a
  scale :: Vec a -> a -> a
  translate :: Vec a -> a -> a
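To illustrate how associated types let one class serve both dimensions, here is a sketch with hypothetical P2 and P3 point types (not Call's types). In 2D, Normal is a single angle in radians; in 3D it is assumed to be an axis-angle vector whose direction is the rotation axis and whose length is the angle, applied with Rodrigues' rotation formula.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical point types for illustration.
data P2 = P2 Double Double deriving (Show, Eq)
data P3 = P3 Double Double Double deriving (Show, Eq)

class Affine a where
  type Vec a
  type Normal a
  rotateOn :: Normal a -> a -> a
  scale :: Vec a -> a -> a
  translate :: Vec a -> a -> a

-- 2D: a rotation is a single angle (radians).
instance Affine P2 where
  type Vec P2 = (Double, Double)
  type Normal P2 = Double
  rotateOn th (P2 x y) = P2 (x * cos th - y * sin th) (x * sin th + y * cos th)
  scale (sx, sy) (P2 x y) = P2 (sx * x) (sy * y)
  translate (dx, dy) (P2 x y) = P2 (x + dx) (y + dy)

-- 3D: assumed axis-angle encoding, rotated with Rodrigues' formula
--   v' = v cos th + (k x v) sin th + k (k . v) (1 - cos th)
instance Affine P3 where
  type Vec P3 = (Double, Double, Double)
  type Normal P3 = (Double, Double, Double)
  rotateOn (nx, ny, nz) p@(P3 x y z)
    | th == 0 = p
    | otherwise = P3 (x * c + cx * s + kx * d) (y * c + cy * s + ky * d) (z * c + cz * s + kz * d)
    where
      th = sqrt (nx * nx + ny * ny + nz * nz)            -- rotation angle
      (kx, ky, kz) = (nx / th, ny / th, nz / th)         -- unit axis k
      (cx, cy, cz) = (ky * z - kz * y, kz * x - kx * z, kx * y - ky * x) -- k x v
      d = (kx * x + ky * y + kz * z) * (1 - c)           -- (k . v)(1 - cos th)
      c = cos th
      s = sin th
  scale (sx, sy, sz) (P3 x y z) = P3 (sx * x) (sy * y) (sz * z)
  translate (dx, dy, dz) (P3 x y z) = P3 (x + dx) (y + dy) (z + dz)
```

Client code such as translate or rotateOn is written once against Affine, and the type checker picks the right Vec and Normal for each instance.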

Audio

Currently, there are few audio packages that work on common platforms and are easy to install. For now I chose PortAudio, which supports many backends. Humans are very sensitive to sound: even 20 milliseconds of latency is noticeable.

Thus, it is important to minimize audio latency. This is the main reason why call relies on callbacks. The call library aims to be small and concrete, leaving abstraction to objective.
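To make the 20 ms figure concrete: with a callback-style API, the audio device asks for one buffer of frames at a time, so a buffer alone adds at least frames / sample rate of delay. The sketch below computes only this buffer contribution (driver and hardware latency come on top); the helper names are hypothetical.

```haskell
-- Delay contributed by one buffer, in milliseconds.
bufferLatencyMs :: Int -> Double -> Double
bufferLatencyMs frames rate = fromIntegral frames / rate * 1000

-- Largest power-of-two buffer whose latency stays under the budget (ms).
-- Falls back to 1 frame if even that exceeds the budget.
largestBufferUnder :: Double -> Double -> Int
largestBufferUnder budgetMs rate =
  last (1 : takeWhile (\n -> bufferLatencyMs n rate < budgetMs) (iterate (* 2) 1))
```

At 44100 Hz, a 1024-frame buffer already adds about 23.2 ms, while largestBufferUnder 20 44100 returns 512 (about 11.6 ms).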

Acknowledgements

Special thanks to Kazuhiko Yamamoto for guidance on the architecture of this tutorial.