Server split

From CatchChallenger wiki
Split of the master into control and external nodes, here the login and game servers

The server needs to be able to be split and work as a cluster, or to work as a single standalone server.

Each step also needs user input:

  1. Login/password (Login server), with character selection grouped by group of worlds (a group of game servers). The same character can evolve in one world and switch to another world (with the same inventory, ...) (Characters server)
  2. Game server selection, where the world can be the same but is mostly different, to vary the user experience (Game server)

If step 2 is useless (only one Characters server and/or Game server), the selection is skipped and its round trip (RTT) is saved.

To keep the network simple, only one port is open when the servers are grouped: if a single server acts as Login server, Characters server and Game server, only one port is open. This is easier to administrate, and it avoids inter-server communication and the performance lost to it.

The game server connects to the Login server (registers on it). A unique key and the login server id are passed to switch to the Game server. One server is the master; the game and login servers connect to it. This allows communication between them and locks the character being played to prevent double usage.
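The key handoff and the character lock described above can be sketched as follows. This is a minimal sketch under assumptions: the class and method names (`Master`, `issueKey`, `redeemKey`, `release`) are hypothetical, the key is bound to the login server id that requested it, and `std::mt19937_64` stands in for what should be a cryptographically secure random source in a real deployment.

```cpp
#include <cstdint>
#include <optional>
#include <random>
#include <unordered_map>
#include <unordered_set>

// Hypothetical master-side bookkeeping: one-time keys for the switch to the
// game server, plus a lock that prevents the same character being played twice.
class Master {
public:
    // Asked by a login server. Returns a one-time key, or nothing if the
    // character is already locked (played elsewhere).
    std::optional<std::uint64_t> issueKey(std::uint32_t loginServerId, std::uint32_t characterId) {
        if (locked_.count(characterId) != 0) return std::nullopt;
        std::uint64_t key;
        do { key = rng_(); } while (key == 0 || pending_.count(key) != 0);
        pending_[key] = Pending{loginServerId, characterId};
        locked_.insert(characterId);
        return key;
    }
    // Asked by the game server when the client presents the key and the
    // login server id. The key is consumed: it works only once.
    std::optional<std::uint32_t> redeemKey(std::uint32_t loginServerId, std::uint64_t key) {
        auto it = pending_.find(key);
        if (it == pending_.end() || it->second.loginServerId != loginServerId) return std::nullopt;
        const std::uint32_t characterId = it->second.characterId;
        pending_.erase(it);
        return characterId;
    }
    // Called when the player leaves the game server: the character is unlocked.
    void release(std::uint32_t characterId) { locked_.erase(characterId); }

private:
    struct Pending { std::uint32_t loginServerId; std::uint32_t characterId; };
    std::mt19937_64 rng_{std::random_device{}()}; // not a CSPRNG; illustration only
    std::unordered_map<std::uint64_t, Pending> pending_;
    std::unordered_set<std::uint32_t> locked_;
};
```

Keeping the lock on the master rather than on each game server is what lets it refuse a second login of the same character, whichever login server the request comes from.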

Login server switch


The client jumps to another server, like a 302 redirect in HTTP

The client disconnects from the login server and connects to the game server

  • Advantage: better performance, because the client connects directly to the game server (no extra RTT, no routing and filtering overhead). No connection limit (C10k, C10M)
  • Disadvantage: the game server IP/location is not kept hidden, and the game server must filter DDoS attacks directly (which can add overhead).


The client stays connected to the same server and the server acts as a proxy

The client keeps its connection to the login server, and the login server connects to the game server

  • Advantage: keeps the game server IP/location hidden, can act as a cache for the datapack (a CDN is preferred), filters DDoS attacks
  • Disadvantage: lower performance due to relay mode (extra RTT, routing and filtering overhead). The login server is forced to stay connected to the game server, so it can host at most 10000 connections.
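The relay work that proxy mode adds to the login server can be sketched at its simplest: every chunk read from the client socket is written again to the game server socket (and the same in the other direction). This is a minimal POSIX sketch; `relayOnce` and `writeAll` are hypothetical names, and a real proxy would also poll both directions and apply the filtering mentioned above.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Writes the whole buffer, retrying on partial writes and EINTR.
bool writeAll(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Relays one chunk from `from` (e.g. the client socket) to `to` (e.g. the
// game server socket). Returns the number of bytes relayed, 0 if `from` was
// closed, or -1 on error.
ssize_t relayOnce(int from, int to) {
    char buf[4096];
    ssize_t n;
    do { n = ::read(from, buf, sizeof buf); } while (n < 0 && errno == EINTR);
    if (n <= 0) return n;
    return writeAll(to, buf, static_cast<std::size_t>(n)) ? n : -1;
}
```

Every byte crosses the login server twice (one read, one write), which is the overhead and the connection limit listed in the disadvantage above.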


This server adapts the traffic between two incompatible networks and caches the content

The gateway adapts two unconnected networks, like the normal internet and an encrypted network (TOR, I2P, ...).

You can cascade gateways and networks: Internet -> TOR -> I2P -> Gnutella -> Other ... Each gateway automatically announces its place in the network, so that the cache is synced from the server to the client by passing through X gateways.

Why does it exist? Why use it?

  • The datapack can be downloaded through the TCP communication pipe or an HTTP mirror; the gateway unifies the output
  • It doesn't use the bandwidth of the subnet: some subnets like I2P can have very limited bandwidth, so this boosts the speed
  • Change autodetection: it caches the datapack (via HTTP or the internal protocol). With the integrity control in the protocol, a change is detected directly and the corresponding file is downloaded. This offloads the corresponding network (I2P for example) and optimises the datapack download.
  • The gateways jump by reconnecting to the game server, see above: Login server switch, reconnect mode
  • It can change and offload the encryption and compression
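The change autodetection can be sketched as a comparison of two file lists: what the gateway has cached and what the server announces. This is a minimal sketch assuming the integrity control yields one hash per file path; the `FileList` type, `planSync` function and the 64-bit hash are hypothetical, not the actual CatchChallenger protocol.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical file list: path -> content hash, as announced by the server
// (or as currently held in the gateway cache).
using FileList = std::map<std::string, std::uint64_t>;

struct SyncPlan {
    std::vector<std::string> download; // new or changed files, fetched once by the gateway
    std::vector<std::string> remove;   // files no longer part of the datapack
};

SyncPlan planSync(const FileList& cached, const FileList& server) {
    SyncPlan plan;
    for (const auto& [path, hash] : server) {
        auto it = cached.find(path);
        if (it == cached.end() || it->second != hash)
            plan.download.push_back(path);
    }
    for (const auto& entry : cached)
        if (server.find(entry.first) == server.end())
            plan.remove.push_back(entry.first);
    return plan;
}
```

Only the files in `download` cross the slow subnet once; clients behind the gateway are then served from its cache.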

The limits?

  • Because it acts as a man-in-the-middle (MITM) and intercepts the traffic (needed to rewrite the datapack traffic so that it is downloaded from the gateway), you need to trust the gateway. The login + password pair can't be intercepted, but the traffic can be modified.



Peer-to-peer architecture

In version 3 the master will be replaced by p2p, to be more scalable and highly available.

It runs in a trusted network, but it is preferable not to rely on that trust.

It does not share content; it shares locks and routes, and links the login servers with the game servers.

Exascale computing

The supported load follows the tree organisation: node power ^ tree depth. For example, if my code can support 100M clients per node, with a 2-level tree (100M game servers and one master server) you get 10 000 000 billion supported users. As you can see, scalability is not the only factor; the power of each node matters too.
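The capacity formula above can be checked with a few lines of integer arithmetic. This is a sketch: `treeCapacity` is a hypothetical helper, and it saturates at the 64-bit maximum instead of overflowing for very large trees.

```cpp
#include <cstdint>
#include <limits>

// Capacity of a tree of servers where every node handles `nodePower`
// children (clients, or lower-level servers): nodePower ^ depth.
std::uint64_t treeCapacity(std::uint64_t nodePower, unsigned depth) {
    std::uint64_t total = 1;
    for (unsigned i = 0; i < depth; ++i) {
        // Saturate instead of silently wrapping around on overflow.
        if (nodePower != 0 && total > std::numeric_limits<std::uint64_t>::max() / nodePower)
            return std::numeric_limits<std::uint64_t>::max();
        total *= nodePower;
    }
    return total;
}
```

With the numbers from the text, `treeCapacity(100000000, 2)` is 10^16, which is the 10 000 000 billion users quoted.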

For this kind of processing, take care of these CPU factors: in-order/out-of-order execution, L1/L2/memory bandwidth. Also take care of the network layer, network driver and network card.

With a single core:

  • I have never seen more than 65535 players on a single MMORPG server, so to minimise memory and network usage the server limit is 65535. Scaling is done by multiplying the server count. Average: network bandwidth 7MB/s, 64MB of memory, 80000 packets per second
  • 4 billion players implies a very large map; the logic slows down because it needs to scan large lists (map: O(n), player: O(log n)), it needs large resources on each node, and it can hit bottlenecks (hardware as well as software). Average: network bandwidth 400GB/s (yes, that needs several 1000Gbps interfaces), 4TB of memory, 4 800 000 000 packets per second
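The figures for 4 billion players are consistent with a linear extrapolation of the 65535-player averages. The sketch below makes that assumption explicit: `Load` and `extrapolate` are hypothetical names, and linear growth ignores the extra slowdown from scanning larger lists mentioned above, so real needs would be higher.

```cpp
// Hypothetical linear extrapolation: assumes bandwidth, memory and packet
// rate grow proportionally with the player count, starting from the
// averages measured on one 65535-player server.
struct Load {
    double bandwidthMBps;
    double memoryMB;
    double packetsPerSecond;
};

Load extrapolate(const Load& perServer, double serverPlayers, double targetPlayers) {
    const double factor = targetPlayers / serverPlayers;
    return {perServer.bandwidthMBps * factor,
            perServer.memoryMB * factor,
            perServer.packetsPerSecond * factor};
}
```

`extrapolate({7, 64, 80000}, 65535, 4e9)` gives about 427 000 MB/s, 3.9 TB and 4.88 billion packets per second. In bits, 427 GB/s is roughly 3.4 Tbit/s, hence more than one 1000Gbps interface.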

With multiple cores, running one server per core scales better than multi-threading.

MIMD is an important factor, but here we are talking about more than 2000 cores, as you have in GPGPU.


I had a lot of problems during the development phase getting a stable infrastructure. This helped to build good monitoring tools.

Example of status monitored and exposed on the site

Players online

The player count is mostly used by players to see which servers have people on them, but each game server can also have its own gameplay style. All the statistics are merged on the master server and dispatched to the login servers, which send them to the client.

Admin tools

  • Game server: a game server is detected as down unless it is registered and connected to the master and responds to the master's ping; a down game server is shown as offline
  • Login server: it authenticates the player (and also routes and proxies it); it is checked by a PHP script with protocol control (checks that it reads and responds correctly to the protocol; it doesn't respond if the master is disconnected)
  • Mirror: checks that it is up, that the pack, file list and files are accessible on all the servers, and for some files checks the content
  • Backup: checks that the backup is correctly done, and that the database is correctly streamed and replicated in real time to another datacenter
  • Other: other mixed checks, such as whether a bot can connect through the full chain: bot -> login -> game server
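The game server check can be sketched as a per-server "last pong" timestamp kept by the master. This is a minimal sketch: `GameServerMonitor` and its methods are hypothetical, and the timeout value is left to the caller because the text does not specify one.

```cpp
#include <chrono>
#include <cstdint>
#include <map>

using Clock = std::chrono::steady_clock;

// Hypothetical status tracker on the master: a game server counts as online
// only while it is registered and answered a ping recently enough.
class GameServerMonitor {
public:
    explicit GameServerMonitor(std::chrono::milliseconds timeout) : timeout_(timeout) {}

    void registerServer(std::uint32_t id, Clock::time_point now) { lastPong_[id] = now; }
    void unregisterServer(std::uint32_t id) { lastPong_.erase(id); }

    // Called when a ping reply arrives; ignored for unregistered servers.
    void onPong(std::uint32_t id, Clock::time_point now) {
        auto it = lastPong_.find(id);
        if (it != lastPong_.end()) it->second = now;
    }

    bool isOnline(std::uint32_t id, Clock::time_point now) const {
        auto it = lastPong_.find(id);
        return it != lastPong_.end() && now - it->second <= timeout_;
    }

private:
    std::chrono::milliseconds timeout_;
    std::map<std::uint32_t, Clock::time_point> lastPong_;
};
```

Passing `now` explicitly keeps the logic testable without real waiting; the monitoring page would simply call `isOnline(id, Clock::now())`.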


The login server acts as a computing multiplier, as a cache that limits master usage, and as a network bandwidth multiplier: it transmits the diff to stats clients and the full list (huge size) to new normal player clients. Computing power and network bandwidth are multiplied by the number of login servers.
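The "diff for stats clients, full list for new clients" split can be sketched as a comparison of two snapshots of the player counts. This is a sketch under assumptions: the snapshot layout (`Stats`, game server id to players online) and the `StatChange` record are hypothetical, not the real message format.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stats snapshot: game server id -> players online
// (16 bits is enough given the 65535-player limit per server).
using Stats = std::map<std::uint32_t, std::uint16_t>;

struct StatChange {
    std::uint32_t serverId;
    bool removed;           // the server left the list
    std::uint16_t players;  // new count when not removed
};

// Changes between two snapshots: sent to already-connected stats clients,
// while a new client receives the full `after` snapshot instead.
std::vector<StatChange> diffStats(const Stats& before, const Stats& after) {
    std::vector<StatChange> changes;
    for (const auto& [id, players] : after) {
        auto it = before.find(id);
        if (it == before.end() || it->second != players)
            changes.push_back({id, false, players});
    }
    for (const auto& entry : before)
        if (after.find(entry.first) == after.end())
            changes.push_back({entry.first, true, 0});
    return changes;
}
```

Each login server computes this once per update and fans it out to its own clients, which is how the master's work and bandwidth get multiplied.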

Cluster of masters

Example of p2p cluster

The needs are:

  • Minimise the bandwidth (no real-time propagation, to help with this)
  • Rebalance when a node is shut down or detected as down
  • P2P: save the node list
  • Request timeout
  • Unicast to the game server for the token
  • Check settings consistency
  • Broadcast to the login servers (new server)
  • Database part (the important part for now):
    • Check for duplicate game server unique keys via a DB lock
    • Lock the id via the database: SELECT pg_try_advisory_lock(id) FROM X WHERE id=1
    • Check time sync: TIME()
  • Security
    • Node type certified by a certificate
    • All messages authenticated
    • Relay attacks limited
    • Replay attacks forbidden
    • No MITM can emit orders/messages on the p2p network
    • No MITM can modify messages between two authenticated hosts
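One way to meet the "rebalance when a node is shut down" need is rendezvous (highest random weight) hashing. This is a swapped-in technique, not something the design above specifies: each key (for example a game server unique key) goes to the master node with the highest hash of node + key, so when a node disappears only the keys it owned move. The node names and the FNV-1a hash are illustrative assumptions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a, used here only as a simple deterministic hash.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Rendezvous hashing: the key belongs to the node with the highest score.
// Removing a node only moves the keys that node owned. Assumes non-empty
// node names; returns "" if `nodes` is empty.
std::string ownerOf(const std::string& key, const std::vector<std::string>& nodes) {
    std::string best;
    std::uint64_t bestScore = 0;
    for (const auto& node : nodes) {
        const std::uint64_t score = fnv1a(node + '/' + key);
        if (best.empty() || score > bestScore) {
            best = node;
            bestScore = score;
        }
    }
    return best;
}
```

Every node can compute the owner locally from the saved node list, so no extra messages are needed to agree on who handles what, which fits the bandwidth goal. In an adversarial setting a keyed hash would be preferable to plain FNV-1a.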

Example with partial link failure and network split, secure UDP p2p

  • The first sequence number is randomly initialised; each following one is the previous +1 (on overflow it wraps to 1, because 0 is reserved)
  • The first acknowledgement number is 0; each following one is the last remote sequence number received, which lets the peer flush its buffer up to that number
  • Packet fields:
    • sequence number: 8 bytes (64 bits)
    • acknowledgement number: 8 bytes (64 bits)
    • size: 2 bytes (16 bits)
    • request type: 1 byte
    • ED25519_SIGNATURE_SIZE: 64 bytes
    • ED25519_KEY_SIZE: 32 bytes

UDP p2p data packet
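The sequence rules and the fixed header fields above can be sketched as follows. Assumptions: big-endian byte order, a header made only of the four fixed fields (19 bytes), and the Ed25519 signature (64 bytes) computed over it by a crypto library that is not shown; `Header`, `encode`, `firstSequence` and `nextSequence` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical fixed header with the field sizes listed above.
struct Header {
    std::uint64_t sequence;
    std::uint64_t acknowledgement;
    std::uint16_t size;
    std::uint8_t requestType;
};

constexpr std::size_t kHeaderSize = 8 + 8 + 2 + 1;

// Serialises the header in big-endian order (byte order is an assumption).
std::array<std::uint8_t, kHeaderSize> encode(const Header& h) {
    std::array<std::uint8_t, kHeaderSize> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<std::uint8_t>(h.sequence >> (56 - 8 * i));
        out[8 + i] = static_cast<std::uint8_t>(h.acknowledgement >> (56 - 8 * i));
    }
    out[16] = static_cast<std::uint8_t>(h.size >> 8);
    out[17] = static_cast<std::uint8_t>(h.size);
    out[18] = h.requestType;
    return out;
}

// First sequence number: random, never 0 (0 is reserved).
std::uint64_t firstSequence(std::mt19937_64& rng) {
    std::uint64_t s;
    do { s = rng(); } while (s == 0);
    return s;
}

// Next sequence number: +1, wrapping from the maximum back to 1 (skipping 0).
std::uint64_t nextSequence(std::uint64_t s) {
    return s == std::numeric_limits<std::uint64_t>::max() ? 1 : s + 1;
}
```

Because the sequence number is inside the signed bytes, a replayed packet carries an already-acknowledged number and can be rejected, which is the "replay attack forbidden" requirement.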