Scala for Game Server Development

At Studio Kingdom, we use the Scala programming language to bring our hit game CookieRun: Kingdom to life. Scala is the language used for developing the game servers. Even though it is not the most popular language out there, we think that it’s a perfect fit for us and game server development in general. Let’s see why.

In this article, we will discuss three aspects of game development where Scala particularly shines:

Game logic correctness is very important. There is nothing worse for a player than seeing their efforts wasted by a programming bug. Depending on the game, this logic can be particularly large and complex.
Games are fast-evolving, and players are always eager for new features. That means the game server codebase needs to constantly grow and evolve. Being able to deliver features at a steady pace is essential to staying on top.
Designing a multi-player game means dealing with concurrency and scaling issues. Game servers will receive a lot of concurrent requests, some of them dealing with the same entities.

Game logic correctness

The Power of types

Scala is a statically-typed language, which means that types are defined and known at compile-time. This prevents a whole range of runtime errors common with dynamically-typed languages (such as JavaScript’s TypeError). But compared to other statically-typed languages, Scala uses types to express a lot of different things.

A simple example of that is the type Option[A], which is used to represent an optional value of type A. Rather than returning an A that can be null and adding the proper null-handling logic everywhere at the risk of missing some places, the idiomatic Scala way is to return Option[A]. That forces the developer to always handle the case where A is missing, using pattern matching or functions such as getOrElse or fold.

// this returns Option[GnomeResearchLevel]
val currentLevel = getGnomeResearchLevel(researchDataId)

// we have to handle the case where there is no current level
// in this case, we take the initial level
val nextLevel = currentLevel.map(_.up).getOrElse(GnomeResearchLevel.initial)
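To make the alternatives concrete, here is a minimal self-contained sketch of the same logic written with getOrElse, fold and pattern matching. GnomeResearchLevel and getGnomeResearchLevel are hypothetical stand-ins for the real game types, and the lookup table is invented for illustration.

```scala
object GnomeResearchExample {
  // Hypothetical stand-in for the real GnomeResearchLevel type.
  final case class GnomeResearchLevel(value: Int) {
    def up: GnomeResearchLevel = GnomeResearchLevel(value + 1)
  }
  object GnomeResearchLevel {
    val initial: GnomeResearchLevel = GnomeResearchLevel(1)
  }

  // Hypothetical lookup: only research 42 has been started so far.
  private val levels = Map(42 -> GnomeResearchLevel(3))

  def getGnomeResearchLevel(researchDataId: Int): Option[GnomeResearchLevel] =
    levels.get(researchDataId)

  // map + getOrElse, as in the article's snippet
  def nextLevel(researchDataId: Int): GnomeResearchLevel =
    getGnomeResearchLevel(researchDataId).map(_.up).getOrElse(GnomeResearchLevel.initial)

  // fold takes the "missing" case first, then the "present" case
  def nextLevelFold(researchDataId: Int): GnomeResearchLevel =
    getGnomeResearchLevel(researchDataId).fold(GnomeResearchLevel.initial)(_.up)

  // pattern matching spells out both cases explicitly
  def nextLevelMatch(researchDataId: Int): GnomeResearchLevel =
    getGnomeResearchLevel(researchDataId) match {
      case Some(level) => level.up
      case None        => GnomeResearchLevel.initial
    }
}
```

All three functions behave identically; which one to use is mostly a matter of readability.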

We push this idea even further by using refined types to constrain the values a type can hold. For example, a “Kingdom Level” is an integer that can only be 1 or more. Instead of simply using Int, we use the type PosInt, which enforces that the value is positive. We know for sure that the value will never be 0, so we don’t need to handle that case or be careful with things like division by 0.

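import eu.timepit.refined.api.Refined
import eu.timepit.refined.auto._ // lets the compiler validate literals
import eu.timepit.refined.numeric.Positive

type PosInt = Int Refined Positive // alias for a positive Int

val a: PosInt = 1
val b: PosInt = 0 // will fail to compile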

The compiler is able to detect if static values are valid or not. For values coming from the client, an error will be raised immediately after parsing the client request if a value doesn’t respect the type constraint.
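For values that are only known at runtime, the check has to happen when the value enters the system. The refined library provides refineV for this; the sketch below does not use it and instead hand-rolls the same idea with the standard library only, so that it stays self-contained. The names PosIntSketch, from, parseKingdomLevel and splitEvenly are assumptions made for this illustration.

```scala
object PosIntSketch {
  // A hand-rolled positive integer. The constructor is private, so the only
  // way to obtain a PosInt is through `from`, which performs the check.
  final class PosInt private (val value: Int) {
    override def toString: String = s"PosInt($value)"
  }

  object PosInt {
    def from(raw: Int): Either[String, PosInt] =
      if (raw > 0) Right(new PosInt(raw))
      else Left(s"Expected a positive integer but got $raw")
  }

  // Hypothetical request parsing: an invalid value is rejected right here,
  // before any business logic runs.
  def parseKingdomLevel(raw: Int): Either[String, PosInt] = PosInt.from(raw)

  // Past the parsing boundary, dividing by a PosInt cannot divide by zero.
  def splitEvenly(total: Int, parts: PosInt): Int = total / parts.value
}
```

Once a request has been parsed into these types, the rest of the code can rely on the constraint without re-checking it.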

Using PosInt is great, but if a “Cookie Level” is also a PosInt, there is a risk that we pass the wrong level to a function. To avoid that, we use a newtype, which is basically a wrapper around our PosInt to give it a special meaning.

@newtype case class KingdomLevel(value: PosInt)

def receiveKingdomLevelRewards(level: KingdomLevel): List[Reward]

Now, the compiler will enforce that when we call receiveKingdomLevelRewards, we pass a KingdomLevel and nothing else.
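The effect can be reproduced without any library by using plain case classes as wrappers. This is only a sketch: unlike a real newtype, these wrappers allocate an object at runtime, they wrap a bare Int rather than a PosInt, and the reward rule is invented for illustration.

```scala
object NewtypeSketch {
  // Plain wrappers standing in for the @newtype definitions.
  final case class KingdomLevel(value: Int)
  final case class CookieLevel(value: Int)

  // Hypothetical reward rule: one reward, plus one more per ten kingdom levels.
  def receiveKingdomLevelRewards(level: KingdomLevel): List[String] =
    List.fill(1 + level.value / 10)("crystal")

  val rewards: List[String] = receiveKingdomLevelRewards(KingdomLevel(25))

  // Both of the following are rejected by the compiler with a type mismatch:
  // receiveKingdomLevelRewards(CookieLevel(25))
  // receiveKingdomLevelRewards(25)
}
```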

In summary, by using types that describe precisely what they hold, we avoid a whole range of silly bugs and improve the quality of our code.

Types as documentation

Just like returning an Option tells you that a function may or may not return a result, we use types that describe the kind of computation that a function runs. Here are a few examples:

Option[A]: a function that may or may not return a result of type A. It does not have any side effect.
Either[E, A]: a function that may either fail with an E or succeed with an A. It does not have any side effect.
ZIO[R, E, A]: a function that requires an environment R and may either fail with an E or succeed with an A. It may have some side effects (such as I/O). It is equivalent to R => Either[E, A] + the notion of side-effecting code.
ZPure[W, S1, S2, R, E, A]: a function that requires an environment R and an initial state S1 and may either fail with an E or succeed with an updated state S2 and an A along with a log with entries of type W. It does not have any side effect. It is equivalent to (R, S1) => (List[W], Either[E, (S2, A)]).

We use these various types to reflect exactly what a function is doing. You may have noticed that there are a lot of type parameters, but in practice we use type aliases to make signatures much shorter and more convenient. Here are a few examples.

// a function that increments a metric
// the return type means it is side-effecting, and can never fail
def incrementInFlightMessagesMetric: UIO[Unit]

// a function that converts protobuf data into kingdom level
// the return type means that it may fail with a non-empty list of errors
// EitherNel[E, A] = Either[NonEmptyList[E], A]
def extractKingdomLevel(raw: Protos): EitherNel[ExtractionError, KingdomLevel]

// a function that returns the current pvp (a.k.a. "Kingdom Arena" in the game) state of the user
// the return type means it may access the user state but not modify it
def getCurrentPvpState: KingdomReader[Pvp]

Not only do these types enforce the behavior of a function, but they give you a precise idea of what a function is doing just by reading its type signature. All possible errors are clearly declared in return types (as well as the absence of error, which is just as useful), so we are never surprised by an exception thrown unexpectedly. The same goes with side effects.
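As an illustration of how a return type can rule out a behavior, here is a minimal sketch of a read-only computation in the spirit of KingdomReader. It is not the actual definition (which is presumably an alias built on ZPure); KingdomState, Pvp and the arena rule are hypothetical.

```scala
object ReaderSketch {
  // Hypothetical, heavily simplified user state.
  final case class Pvp(trophies: Int)
  final case class KingdomState(kingdomLevel: Int, pvp: Pvp)

  // A read-only computation: it receives the state and returns a value,
  // but it has no way to hand back a modified state.
  final case class KingdomReader[A](run: KingdomState => A) {
    def map[B](f: A => B): KingdomReader[B] =
      KingdomReader(state => f(run(state)))
    def flatMap[B](f: A => KingdomReader[B]): KingdomReader[B] =
      KingdomReader(state => f(run(state)).run(state))
  }

  val getCurrentPvpState: KingdomReader[Pvp] = KingdomReader(_.pvp)
  val getKingdomLevel: KingdomReader[Int]    = KingdomReader(_.kingdomLevel)

  // Readers compose in a for-comprehension; the result is still read-only.
  val canEnterArena: KingdomReader[Boolean] =
    for {
      level <- getKingdomLevel
      pvp   <- getCurrentPvpState
    } yield level >= 5 && pvp.trophies >= 0
}
```

Because the type only ever produces an A, a reviewer can tell from the signature alone that calling getCurrentPvpState cannot change the user state.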

Make your own DSL

We’ve seen a few examples where we use types to constrain what a value can be or what a function can do, and rely on the compiler to catch any programming mistake early on. We can go even further and make the whole code constrained to only allow “valid” operations. We do that by creating custom DSLs for specific operations. Because Scala is both powerful and very concise, it excels at creating sub-languages. You can push it pretty far by making the code read almost like a spec document.

Without going that far, creating a DSL is useful when you want to restrict some parts of your code to only a small subset of operations. Let’s see an example: in our game, we support transactions between multiple entities. That means we can send commands to two different users (or a user and a guild), and commit all changes only if both commands are handled successfully.

We created a small DSL for transactions, with only the following operations:

identify an entity by ID: kingdom(userId), guild(guildId)
send a command to an entity: entity ! command
combine sending operations: >>> for sequential, &&& for parallel

We can then write simple transactions like this:

transaction { context =>
  (kingdom(context.userId) ! SendFriendRequest(friendId)) &&&
    (kingdom(friendId) ! ReceiveFriendRequest(context.userId))
}

The transaction block defines where our DSL can be used. We won’t be able to use “normal code” within the boundaries of this block, but only what our DSL allows. It gives us access to a context that contains information about the current user. We then send SendFriendRequest to the current user (which checks that this user is allowed to send a friend request, and saves the pending request), and ReceiveFriendRequest to the other user (which saves the received request). These 2 operations are done in parallel, and if either of them fails, nothing is persisted.

This little transaction system does not support other types of operations. If we tried to persist something to Redis or Kafka within the transaction block, it wouldn’t be rolled back in case of error. But because we use a custom DSL, it is actually impossible to call that kind of code (we would get a compiler error), because we are restricted to our DSL operations.
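One common way to build such a restricted language is to describe transactions as plain data and to interpret that data separately. The sketch below follows that approach under several assumptions: it is not our actual implementation, the interpreter runs both sides of &&& one after the other instead of in parallel, and the Handler type is a hypothetical stand-in for the messaging and persistence layers.

```scala
object TransactionSketch {
  // The only things a transaction can mention: entities and commands.
  sealed trait Entity
  final case class Kingdom(userId: String) extends Entity
  final case class Guild(guildId: String)  extends Entity

  sealed trait Command
  final case class SendFriendRequest(friendId: String)  extends Command
  final case class ReceiveFriendRequest(fromId: String) extends Command

  // A transaction is a small data structure describing what should happen.
  sealed trait Tx {
    def >>>(that: Tx): Tx = Sequential(this, that)
    def &&&(that: Tx): Tx = Parallel(this, that)
  }
  final case class Send(entity: Entity, command: Command) extends Tx
  final case class Sequential(first: Tx, second: Tx)      extends Tx
  final case class Parallel(left: Tx, right: Tx)          extends Tx

  def kingdom(userId: String): Entity = Kingdom(userId)
  def guild(guildId: String): Entity  = Guild(guildId)

  implicit class EntityOps(private val entity: Entity) {
    def !(command: Command): Tx = Send(entity, command)
  }

  // Hypothetical handler: returns a pending change, or an error message.
  type Handler = (Entity, Command) => Either[String, String]

  // Toy interpreter: pending changes are returned only if every step succeeded.
  def run(tx: Tx, handle: Handler): Either[String, List[String]] = tx match {
    case Send(entity, command)     => handle(entity, command).map(change => List(change))
    case Sequential(first, second) => both(first, second, handle)
    case Parallel(left, right)     => both(left, right, handle)
  }

  private def both(a: Tx, b: Tx, handle: Handler): Either[String, List[String]] =
    for {
      changesA <- run(a, handle)
      changesB <- run(b, handle)
    } yield changesA ++ changesB

  def friendRequest(userId: String, friendId: String): Tx =
    (kingdom(userId) ! SendFriendRequest(friendId)) &&&
      (kingdom(friendId) ! ReceiveFriendRequest(userId))
}
```

Since a Tx can only be built from Send, >>> and &&&, there is simply no way to express a Redis or Kafka write inside it. The parentheses around each send matter: in Scala, >>> binds tighter than !, so leaving them out would change how the expression is parsed.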

Most of our business logic is implemented using a DSL whose basic operations are:

succeed with a result
fail with a ValidationError
access the environment (that is a large object containing all the game configuration: the list of characters, levels, events, etc)
access the user state
lift an event (we use event sourcing), which will automatically update the user state

Thanks to this limited set of operations, it is impossible to do something unexpected in the middle of our business logic.
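To show how little is needed to support these five operations, here is a minimal sketch of such a program type written with the standard library only. It is an assumption-laden simplification (our real code builds on ZPure): Config, GuildState, the event and the error are hypothetical, and the state here is a guild rather than a user to keep the example small.

```scala
object ProgramSketch {
  // Hypothetical, simplified domain.
  final case class Config(maxMemberCount: Int)
  final case class GuildState(members: Set[String])
  sealed trait Event
  final case class GuildMemberJoined(userId: String) extends Event
  sealed trait ValidationError
  case object GuildFull extends ValidationError

  // Applying an event is the only way the state ever changes.
  def applyEvent(state: GuildState, event: Event): GuildState = event match {
    case GuildMemberJoined(id) => state.copy(members = state.members + id)
  }

  // A program reads the config and the state, and either fails or returns
  // the events it lifted, the resulting state and a result.
  final case class Program[A](
    run: (Config, GuildState) => Either[ValidationError, (List[Event], GuildState, A)]
  ) {
    def flatMap[B](f: A => Program[B]): Program[B] = Program[B] { (config, state) =>
      run(config, state).flatMap { case (events1, state1, a) =>
        f(a).run(config, state1).map { case (events2, state2, b) =>
          (events1 ++ events2, state2, b)
        }
      }
    }
    def map[B](f: A => B): Program[B] = flatMap(a => Program.succeed(f(a)))
  }

  object Program {
    def succeed[A](a: A): Program[A]            = Program[A]((_, s) => Right((Nil, s, a)))
    def fail[A](e: ValidationError): Program[A] = Program[A]((_, _) => Left(e))
    val inquireConfig: Program[Config]          = Program[Config]((c, s) => Right((Nil, s, c)))
    val getGuild: Program[GuildState]           = Program[GuildState]((_, s) => Right((Nil, s, s)))
    def liftEvent(e: Event): Program[Unit] =
      Program[Unit]((_, s) => Right((List(e), applyEvent(s, e), ())))
    def assertThat(condition: Boolean, e: ValidationError): Program[Unit] =
      if (condition) succeed(()) else fail[Unit](e)
  }

  import Program._

  def joinGuild(requesterId: String): Program[Unit] =
    for {
      config <- inquireConfig
      guild  <- getGuild
      _      <- assertThat(guild.members.size < config.maxMemberCount, GuildFull)
      _      <- liftEvent(GuildMemberJoined(requesterId))
    } yield ()
}
```

Running joinGuild("c").run(config, state) yields either a ValidationError or the list of lifted events together with the updated state, and nothing inside the for-comprehension can perform I/O.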

Evolving fast

Focus on what’s important

As we’ve just seen, writing DSLs constrains our code to a smaller subset of valid operations. It has another huge benefit when combined with the conciseness of Scala: it makes the code super short and readable.

Creating a DSL is a good way to hide the complexity of our program and focus on what really matters: the business logic. Our transaction example above actually involves complex low-level code. We need to locate and send messages to entities that may be hosted on different nodes, using a particular messaging protocol. The persistence layer also needs to combine all write operations into a single transaction that can be committed or rolled back, communicating with an actual database. But we don’t want to care about any of this when we write our business logic.

The general idea behind this is to separate the “technical” layers from the business logic so that both of them can be implemented and tested entirely separately. That allows implementing business logic, which represents the majority of the work of adding new features, without any kind of interference, which is much more efficient.

Here’s a simplified example of a business logic function used when a user joins a guild: we get some metadata from the game configuration, get the guild state and check that the request is valid. On success, we lift an event to record the fact that this user joined the guild. Note how we don’t care about messaging protocols, persistence, transactions, locks or whatever technical challenges actually exist under the hood.

val joinGuild: KingdomProgram[Unit] =
  for {
    metadata    <- inquireGuildMetadata
    requesterId <- inquireRequesterId
    memberCount <- getGuild(_.members.size)
    ()          <- assertThat(
                     memberCount < metadata.maxMemberCount,
                     ValidationError.GuildFull
                   )
    ()          <- liftEvent(GuildMemberJoined(requesterId))
  } yield ()

An interesting consequence of this is that it’s easy to onboard new developers to our team: because there are only a few operations available, they will be able to understand the DSL and start writing code easily, without necessarily understanding how the DSL is implemented. They can focus on learning about the domain rather than technical concepts.

Death to boilerplate!

Implementing new features usually means writing a lot of code: exposing new APIs to the client, implementing the persistence layer, converting data types between different layers, etc. One thing Scala is great at is keeping that boilerplate to a minimum, thanks to a powerful macro engine and the ability to write compiler plugins.

As it is commonly done in many languages, we generate server code from protobuf file definitions. However we don’t use the generated code in our business logic, because this generated code doesn’t use our refined types and we sometimes use more convenient data types than the ones generated (collections are usually an area where the choice of data type is quite significant). It means that we need to convert the proto-generated objects that we receive from the client into our own. We are talking about thousands of objects, which could become a huge source of boilerplate. However, thanks to Scala macros, these transformations are done automatically for the most part, and we can verify it at compile-time!

All we have to do is make sure we have an instance of Transformer for each data type. A macro is used to generate a Transformer automatically between 2 classes, as long as their field names match and we have a Transformer between each pair of corresponding field types.

Let’s see an example. We have a protobuf type defined like this:

message CraftProduction {
  string id = 1;
  repeated int32 cookie_data_ids = 2;
  int32 completed_job_count = 3;
}

The generated code will look like this:

case class CraftProduction(
  id: String,
  cookieDataIds: Seq[Int],
  completedJobCount: Int
)

But our own class with “enhanced” types is more like this:

case class CraftProduction(
  id: ActivityId,
  participants: NonEmptyList[CookieDataId],
  completedJobCount: Int
)

All IDs use more specific types such as ActivityId or CookieDataId so we don’t mix them up. We also use a NonEmptyList because there must be at least one participant. Finally, the cookieDataIds field is named participants because that makes more sense in the game server code.

We’ll need transformers for the following types:

a Transformer between String and ActivityId
a Transformer between Int and CookieDataId
a Transformer between Seq and NonEmptyList

Transformers for IDs are very simple and can be created in a generic way since IDs are just wrapper types. A Transformer between Seq and NonEmptyList is also simple: it fails if the Seq is empty, and otherwise wraps it into a NonEmptyList. The last problem we have is that the field cookieDataIds was renamed to participants.

In this case, we can create a transformer like this:

implicit val transformer =
  Transformer
    .define[entities.CraftProduction, protobuf.CraftProduction]
    .withFieldRenamed(_.participants, _.cookieDataIds)
    .buildTransformer

The macro is able to convert all fields automatically, and since we defined a mapping between the participants field and the cookieDataIds field, it is able to generate a transformer. Most of the time, fields are named the same way so we don’t need to write this kind of code at all. The great thing about this is that if anything is missing, we will get a compile error telling us which field couldn’t be transformed and why. We can be sure that if the code compiles, it will work as expected.
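To give an idea of what such a macro saves us from writing, here is a hand-written sketch of the protobuf-to-entity direction using a small Transformer type class. Everything in it is an assumption made for illustration: the trait signature, the String error type, the simplified NonEmptyList and the ProtoCraftProduction name are not the API of the library we use.

```scala
object TransformerSketch {
  // A conversion from A to B that may fail with an error message.
  trait Transformer[A, B] {
    def transform(a: A): Either[String, B]
  }

  def transformTo[A, B](a: A)(implicit t: Transformer[A, B]): Either[String, B] =
    t.transform(a)

  // Hypothetical wrapper types and a simplified non-empty list.
  final case class ActivityId(value: String)
  final case class CookieDataId(value: Int)
  final case class NonEmptyList[A](head: A, tail: List[A])

  final case class ProtoCraftProduction(id: String, cookieDataIds: Seq[Int], completedJobCount: Int)
  final case class CraftProduction(
    id: ActivityId,
    participants: NonEmptyList[CookieDataId],
    completedJobCount: Int
  )

  implicit val stringToActivityId: Transformer[String, ActivityId] =
    raw => Right(ActivityId(raw))
  implicit val intToCookieDataId: Transformer[Int, CookieDataId] =
    raw => Right(CookieDataId(raw))

  // Converts every element, then fails if the resulting list is empty.
  implicit def seqToNonEmptyList[A, B](
    implicit element: Transformer[A, B]
  ): Transformer[Seq[A], NonEmptyList[B]] =
    as =>
      as.foldRight[Either[String, List[B]]](Right(Nil)) { (a, acc) =>
        for { b <- element.transform(a); bs <- acc } yield b :: bs
      }.flatMap {
        case head :: tail => Right(NonEmptyList(head, tail))
        case Nil          => Left("expected at least one element")
      }

  // What a macro would derive for us, including the renamed field.
  implicit val protoToCraftProduction: Transformer[ProtoCraftProduction, CraftProduction] =
    proto =>
      for {
        id           <- transformTo[String, ActivityId](proto.id)
        participants <- transformTo[Seq[Int], NonEmptyList[CookieDataId]](proto.cookieDataIds)
      } yield CraftProduction(id, participants, proto.completedJobCount)
}
```

With thousands of message types, writing protoToCraftProduction by hand for each of them is exactly the boilerplate that the macro removes.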

Another example of relying on macros and code generation to cut down the boilerplate is our usage of GraphQL. Macros allow us to generate a GraphQL schema from our server data classes automatically, and a compiler plugin automatically generates client code powered by Scala.js (Scala code that is converted into JavaScript). We get both a client and a server from the same code with almost no effort!

Most of the macros and compiler plugins mentioned here were not even developed by us: they are available in open source libraries. The Scala OSS ecosystem is very dynamic and offers a lot of options for cutting down the boilerplate.

Safe refactoring

An important concept in the world of functional programming with Scala is referential transparency. A function is referentially transparent when any call to it can be replaced with its result without changing the behavior of the program.

Let’s see an example.

def add(a: Int, b: Int) = a + b

val sum = add(1, 2)

Everywhere in our code, we can use sum or add(1, 2) interchangeably. It won’t change anything in the code whether we use one or the other.

On the other hand:

object Test {
  var total = 0

  def add(a: Int): Int = {
    total = total + a
    total
  }
}

val sum = Test.add(2)

Using sum or Test.add(2) will lead to totally different behavior: sum always holds the same value, whereas each call to Test.add(2) triggers another modification of total and returns a different result. In this case we can say that add is NOT referentially transparent.

Why does this matter? If we are able to replace function calls with their values (and vice versa) interchangeably, we can do heavy refactorings without risking breaking the code’s behavior. In addition to that, static typing ensures that moving code around always makes sense, and the compiler will tell us if types don’t match. The ease of refactoring means that our code will be easy to maintain and extend in the future, allowing it to grow and evolve together with our feature set.
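As a small illustration, the mutable counter can be made referentially transparent by passing the running total in and returning the new one. The object name PureTotal is made up for this sketch.

```scala
object PureTotal {
  // The running total is an input and the new total is the output:
  // nothing outside the function is read or modified.
  def add(total: Int, a: Int): Int = total + a

  val first: Int  = add(0, 2)
  val second: Int = add(first, 2)

  // Replacing `first` with the call that produced it changes nothing.
  val inlined: Int = add(add(0, 2), 2)
}
```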

Referential transparency is not specific to Scala, but this style of programming where all functions are referentially transparent and use immutable data types is heavily encouraged and followed within the Scala community and its library ecosystem.

Multi-player challenges

Local concurrency

When building a game server for a multiplayer game, we usually need multiple concurrent requests to access the same data. We sometimes need to run tasks concurrently and gather the results. We also need to run tasks in the background for the whole lifecycle of our application. The list goes on. We need to be able to perform a lot of actions concurrently without exhausting the resources that we have.

Scala comes with the concept of Future, but there are even better alternatives offered by libraries such as ZIO (which is what we use), Cats Effect or Monix. What they have in common is that they are built on top of fibers, which are lightweight “green threads” managed by a runtime system. Unlike traditional threads, they are very cheap to create and make concurrent programming super simple.

These libraries provide a bunch of operators to let you run computations in parallel: you can fork a computation to run it in a different fiber, join a fiber to wait for its result, interrupt a fiber, race multiple fibers, etc. Most of the time, you don’t even need to manipulate them as you can simply use high level operators such as ZIO.foreachPar or ZIO#zipPar. There are many convenient data types built on top of fibers such as Promise, Schedule or Queue.

Let’s see a simple example. Whenever a player opens our game client, a gRPC stream is opened to the server and maintained for as long as the user uses the game. While this stream is alive, we want to regularly update the “last connection date” of the user in the database.

updateLastConnectionDate(userId)
  .repeat(Schedule.spaced(1.hour)) // run again one hour after each completion
  .fork                            // run in a background fiber

That’s it. Running this code means that the updateLastConnectionDate function will be called every hour. Because we use fork, the call returns immediately and the repetition runs in the background. By default, a fiber is terminated when its parent fiber is terminated, so that background process will automatically be stopped when our gRPC stream is disconnected. If we wanted to keep the fiber running even after the termination of its parent, we could have used .forkDaemon.

Another super useful data type from ZIO is the Hub. Unlike a Queue where each item is given to a single consumer, items that are published to a Hub are broadcast to all subscribers. Let’s imagine that we have global messages that we want to send to all connected users through that gRPC stream we just talked about. On each game server, we can start one fiber that consumes messages from Kafka (where global messages come from), and publishes each received message to a hub. Then, each connected gRPC stream can simply subscribe to this hub and it will receive all the messages.

def createHub(consumer: Consumer): UIO[Hub[Message]] =
  Hub.unbounded[Message].flatMap { hub => // create a hub with no max size
    consumer
      .read                                // read from Kafka
      .mapZIO(event => hub.publish(event)) // publish all messages to the hub
      .runDrain                            // consume the stream to the end
      .fork                                // do this in a separate fiber
      .as(hub)                             // return the hub
  }

// create a stream that will contain all messages published to the hub
val stream = ZStream.fromHub(hub)

In the end, our gRPC stream is just the combination of various streams coming from different sources like this one, merged together. This Hub data type is so convenient that we created our own variant, IndexedPubSub, that allows subscribing to particular keys only. Thanks to Hub but also Ref, Promise, Queue and a few others, local concurrency problems become really trivial with ZIO.
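The difference between a queue and a hub can be illustrated without ZIO. The sketch below is a deliberately naive, blocking approximation built on java.util.concurrent and is not how ZIO implements Hub; BroadcastHub and its methods are hypothetical names.

```scala
import java.util.concurrent.LinkedBlockingQueue

object BroadcastSketch {
  // Every subscriber owns a private queue, and publishing copies the message
  // into each of them. With a single shared queue, each message would instead
  // be taken by exactly one consumer.
  final class BroadcastHub[A] {
    private var subscribers: List[LinkedBlockingQueue[A]] = Nil

    def subscribe(): LinkedBlockingQueue[A] = synchronized {
      val queue = new LinkedBlockingQueue[A]() // unbounded, so put never blocks
      subscribers = queue :: subscribers
      queue
    }

    def publish(message: A): Unit = synchronized {
      subscribers.foreach(_.put(message))
    }
  }
}
```

In this sketch a subscriber only sees messages published after it subscribed, which is the behavior we want for a freshly connected gRPC stream.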

Distributed concurrency

In a real application, concurrency problems are often not only local but also distributed. We want to be able to scale out our game servers by simply adding more nodes. Having multiple nodes causes some problems: for example, if a guild only has 1 spot left but 2 players send a request to join that guild at the exact same time, only the first one should succeed and the other should fail.

A typical way to solve that is the single writer principle: at any given time, there should be only a single instance of each entity that can modify its state. In other words, there should be a single process handling messages for any given guild at any time. That way, this process will handle the 2 messages sequentially, accepting the first one and rejecting the second.

There are several ways to implement this, and the Scala ecosystem is rich with options. One of them is using Kafka consumer groups to ensure only a single consumer handles all messages of a given entity ID. In our case, because of the “synchronous” nature of our workflows (the client expects an instant response for every request), we went another route and use the actor sharding pattern. This pattern was popularized by the Akka library which is ideal for getting started (we are now using our own custom implementation that follows similar principles).

An actor is basically a tiny process that consumes messages sequentially from a queue (called “mailbox”). Sharding means that actors are distributed across a group of nodes and that you can send messages to an actor just by knowing its ID. This concept is known as location transparency, meaning that you can send messages to an actor without knowing where the actor is actually located (it might be on the same node or another one).

Following the DSL introduced earlier to send messages, 2 different nodes can call the following code concurrently:

guild(guildId) ! JoinGuild(inviteId)

Under the hood, the sharding system will find on which node the actor in charge of guildId is located, and send a request to that node containing the payload (JoinGuild). That node will forward those messages to the local actor handling this guild ID. The actor will handle the message that arrived first and respond with a success. It will then handle the second message and respond with a failure.
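The behavior described above can be modeled in a few lines. This is a conceptual sketch only, assuming a mailbox represented as a plain list and a hash-based routing rule; a real sharding system such as Akka’s (or our own) also handles rebalancing, node failures and network transport, none of which appears here.

```scala
object GuildActorSketch {
  sealed trait Reply
  case object Joined    extends Reply
  case object GuildFull extends Reply

  final case class JoinGuild(userId: String)
  final case class GuildState(members: Set[String], maxMembers: Int)

  // Handling one message: decide on a reply and compute the next state.
  def handle(state: GuildState, message: JoinGuild): (GuildState, Reply) =
    if (state.members.size < state.maxMembers)
      (state.copy(members = state.members + message.userId), Joined)
    else
      (state, GuildFull)

  // The actor drains its mailbox one message at a time, in arrival order,
  // so two requests for the last spot can never both succeed.
  def drainMailbox(initial: GuildState, mailbox: List[JoinGuild]): (GuildState, List[Reply]) =
    mailbox.foldLeft((initial, List.empty[Reply])) { case ((state, replies), message) =>
      val (next, reply) = handle(state, message)
      (next, replies :+ reply)
    }

  // Hypothetical routing rule: the same guild ID always maps to the same node,
  // which is what guarantees a single writer per guild.
  def nodeFor(guildId: String, nodes: Vector[String]): String =
    nodes(Math.floorMod(guildId.hashCode, nodes.size))
}
```

With one spot left and two JoinGuild messages in the mailbox, the first reply is Joined and the second is GuildFull, whichever node the requests originally came from.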

Thanks to a few powerful libraries, Scala makes it really easy to handle distributed and concurrent payloads without resorting to things like distributed locks that might clog response times and create issues like deadlocks. The actor pattern, whether it is implemented with Akka or ZIO, makes our life easier once again.

Through a handful of practical and simple examples, we’ve seen that Scala is a really interesting choice for building a game server. Combining its powerful type system with the best practices of functional programming, we are able to minimize the risk of errors and focus entirely on the business logic. Thanks to Scala’s rich ecosystem of open source libraries, we can cut down the boilerplate and resolve all kinds of concurrency problems.

Sounds attractive to you? Want to know more? We are hiring!
