Server split

From CatchChallenger wiki
Split of the master into a control node and external nodes, here the login and game servers

The server needs to be able to be split and work as a cluster, or to work as a single, standalone server.

Each step also needs user input, for example:

  1. Login/password (Login server), with character selection grouped by world group (group of game servers). The same character can evolve in one world and switch to another world (keeping the same inventory, ...) (Characters server)
  2. Game server selection, where the world can be the same but is mostly different, to change the user experience (Game server)

If step 2 is useless (only one Characters and/or Game server), the selection is skipped and the extra RTT is avoided.

To be clearer on the network side: when the servers are grouped, only one port is open. If a single server hosts the Login server, Characters server and Game server, only one port needs to be opened. It is easier to administer, and it skips the inter-server communication and its performance cost.

The game server connects to the Login server (registers on it). A unique key and the login server id are passed to the client to switch to the Game server. One server is the master; the game and login servers are connected to it. It allows communication between them and locks the character in play to prevent double usage.
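A minimal sketch of this handoff, assuming hypothetical structures and names (SwitchToken, MasterLockRegistry; the real CatchChallenger protocol is binary and differs): the login server issues a one-time key together with its id, and the master locks the character so a second game-server session with the same character is refused.

```cpp
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_map>

// Hypothetical token passed from the login server to the client,
// then presented by the client to the game server.
struct SwitchToken {
    uint64_t uniqueKey;     // one-time key issued by the login server
    uint32_t loginServerId; // which login server issued it
    uint32_t characterId;   // character the client wants to play
};

// Hypothetical master-side registry: locks a character while it is in play
// to prevent double usage from two game servers at once.
class MasterLockRegistry {
    std::unordered_map<uint32_t, uint64_t> lockedCharacters; // characterId -> key
public:
    bool tryLock(const SwitchToken &t) {
        if (lockedCharacters.count(t.characterId))
            return false; // already locked by another session -> refuse
        lockedCharacters[t.characterId] = t.uniqueKey;
        return true;
    }
    void unlock(uint32_t characterId) { lockedCharacters.erase(characterId); }
};

int main() {
    std::mt19937_64 rng(std::random_device{}());
    MasterLockRegistry master;

    // Login server side: issue the token after authentication.
    SwitchToken token{rng(), /*loginServerId=*/1, /*characterId=*/42};

    // Game server side: ask the master to lock the character before accepting.
    std::cout << "first connect: " << (master.tryLock(token) ? "accepted" : "refused") << "\n";
    std::cout << "double usage:  " << (master.tryLock(token) ? "accepted" : "refused") << "\n";
}
```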

Login server switch

Reconnect

The client jumps to another server, like a 302 redirect in HTTP

The client disconnects from the login server and connects to the game server

  • Advantage: more performance because the client connects directly to the game server (no extra RTT, no routing or filtering overhead). No connection limit (C10k, C10M)
  • Disadvantage: does not keep the game server ip/location unknown; the game server must filter DDoS attacks directly (which can add overhead).

Proxy

The client stays connected to the same server and that server acts as a proxy

The client keeps its connection to the login server, and the login server connects to the game server

  • Advantage: keeps the game server ip/location unknown, can act as a cache for the datapack (a CDN is preferred), filters DDoS attacks
  • Disadvantage: less performance due to the relay mode (extra RTT, routing and filtering overhead). Forces the login server to stay connected to the game server, so it can host at most around 10 000 connections.
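A minimal sketch of the relay mode, assuming plain POSIX sockets (the real login server is more involved): the login server copies bytes in both directions between the client socket and the game-server socket, which is where the extra hop and routing overhead come from. The socketpair() pair here only stands in for real network connections.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

// Relay loop: forward every packet between the client and the game server.
static void relay(int clientFd, int gameFd) {
    pollfd fds[2] = {{clientFd, POLLIN, 0}, {gameFd, POLLIN, 0}};
    char buf[4096];
    for (;;) {
        if (poll(fds, 2, -1) <= 0)
            return;
        for (int i = 0; i < 2; ++i) {
            if (!(fds[i].revents & POLLIN))
                continue;
            ssize_t n = read(fds[i].fd, buf, sizeof(buf));
            if (n <= 0)
                return;                   // one side closed: stop relaying
            write(fds[i ^ 1].fd, buf, n); // forward to the other side
        }
    }
}

int main() {
    // Self-contained demo: socketpairs stand in for the real sockets.
    int client[2], game[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, client); // client[1] = "client side"
    socketpair(AF_UNIX, SOCK_STREAM, 0, game);   // game[1]   = "game server side"

    if (fork() == 0) { relay(client[0], game[0]); _exit(0); } // the proxy process

    write(client[1], "hello", 5);      // client sends through the proxy
    char out[16] = {0};
    read(game[1], out, sizeof(out));   // game server receives it relayed
    printf("game server received: %s\n", out);
    return 0;
}
```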

Gateway

This server adapts the traffic between two incompatible networks and caches the content

The gateway is there to bridge two unconnected networks, for example the normal internet and an encrypted network (Tor, I2P, ...).

You can cascade gateways and networks: Internet -> TOR -> I2P -> Gnutella -> Other ... Each gateway automatically announces its place in the chain, so the cache can be synchronised from the server to the client across X gateways.

Why does it exist? Why use it?

  • The datapack can be downloaded through the TCP communication pipe or an HTTP mirror; the gateway unifies the output
  • It avoids using the bandwidth of the sub-network: some sub-networks like I2P can have very limited bandwidth, so this boosts the speed
  • Autodetection of changes: the gateway caches the datapack (fetched via HTTP or the internal protocol). Thanks to the integrity control in the protocol, a change is detected directly and only the corresponding file is downloaded. This offloads the underlying network (I2P for example) and optimises the datapack download (see the sketch after this list)
  • The gateway jumps to the game server by reconnect, see above: Login server switch, reconnect mode
  • It is able to change and offload the encryption and compression
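A minimal sketch of the change detection, assuming a simple per-file checksum manifest (FNV-1a here; the real protocol has its own integrity control, and the server announces its manifest over the network rather than from a local directory): only files whose checksum differs from the cached copy need to be re-downloaded from the sub-network.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Simple FNV-1a checksum, standing in for the real integrity control.
static uint64_t checksum(const fs::path &file) {
    std::ifstream in(file, std::ios::binary);
    uint64_t h = 1469598103934665603ull;
    char c;
    while (in.get(c))
        h = (h ^ static_cast<unsigned char>(c)) * 1099511628211ull;
    return h;
}

// Build a manifest: relative path -> checksum of the file content.
static std::map<std::string, uint64_t> buildManifest(const fs::path &root) {
    std::map<std::string, uint64_t> m;
    if (!fs::exists(root))
        return m;
    for (auto &entry : fs::recursive_directory_iterator(root))
        if (entry.is_regular_file())
            m[fs::relative(entry.path(), root).string()] = checksum(entry.path());
    return m;
}

int main() {
    // The gateway keeps a cached copy and compares it against the server's
    // manifest; only files whose checksum differs are re-downloaded.
    auto cached = buildManifest("datapack-cache");  // local cache on the gateway
    auto server = buildManifest("datapack-server"); // stand-in for the announced manifest

    std::vector<std::string> toDownload;
    for (auto &[file, sum] : server)
        if (!cached.count(file) || cached[file] != sum)
            toDownload.push_back(file);

    for (auto &file : toDownload)
        std::cout << "changed, download: " << file << "\n";
}
```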

The limits?

  • As a man-in-the-middle, the gateway intercepts the traffic (this is needed to redirect the datapack download to the gateway), so you need to trust the gateway. The login + password pair cannot be intercepted, but the traffic can be modified.

Arch

P2P

Peer-to-peer architecture

In version 3 the master will be replaced by P2P, to be more scalable and highly available.

It runs inside a trusted network, but it is better not to have to rely on that trust.

It does not share content; it shares locks, routes, and links the login servers with the game servers.

Exascale computing


https://en.wikipedia.org/wiki/Exascale_computing

The supported capacity in a tree organisation is: power of one node ^ tree depth. Example: if my code is able to support 100M clients per node, with a tree of depth 2 (100M game servers and one master server), you can support 10 000 000 billion users (10^16). As you can see, scalability is not the only factor; the power of each node matters too.
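A tiny sketch of that arithmetic: the supported capacity is the per-node power raised to the tree depth.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Capacity of the tree = clients one node can handle, raised to the tree depth.
    const double clientsPerNode = 100e6; // 100M clients per node (example above)
    const int treeDepth = 2;             // 100M game servers under one master
    const double capacity = std::pow(clientsPerNode, treeDepth);
    std::printf("supported users: %.0f\n", capacity); // 1e16 = 10 000 000 billion
}
```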

For this kind of processing, take care of these CPU factors: in-order/out-of-order execution, L1/L2/memory bandwidth. Also take care of the network layer, the network driver and the network card.

With a single core:

  • I have never seen more than 65535 players on a single MMORPG server, so to minimise the memory and network usage the server limit is 65535. Scaling is done by multiplying the number of servers. This means, on average: 7MB/s of network bandwidth, 64MB of memory, 80 000 packets per second (roughly 100 B/s, 1 KB of memory and 1.2 packets per second per player)
  • 4 billion players implies a very large map, the logic slows down because large lists must be scanned (map: O(n), player: O(ln(n))), large resources are needed on each node, and bottlenecks can appear (hardware as well as software). This means, on average: 400GB/s of network bandwidth (yes, that needs a 1000Gbps-class interface), 4TB of memory, 4 800 000 000 packets per second

With multiple cores, it scales better to run one server per core than to use multithreading.
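A minimal sketch of that approach, assuming Linux (fork() plus sched_setaffinity(); the server instance function is a hypothetical placeholder): one single-threaded server process is started and pinned per core, instead of one multithreaded process.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <thread>

// Placeholder for one single-threaded server instance (hypothetical).
static void runServerInstance(unsigned core) {
    std::printf("server instance pinned to core %u (pid %d)\n", core, getpid());
    // ... accept clients on its own port, capped at 65535 players ...
}

int main() {
    const unsigned cores = std::thread::hardware_concurrency();
    for (unsigned core = 0; core < cores; ++core) {
        if (fork() == 0) {
            // Child: pin this process to one core, then run one server on it.
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            sched_setaffinity(0, sizeof(set), &set);
            runServerInstance(core);
            _exit(0);
        }
    }
    while (wait(nullptr) > 0) {} // parent waits for all instances
}
```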

MIMD is an important factor, but then we are talking about more than 2000 cores, as you have in a GPGPU.

Monitoring

I had a lot of problems during the development phase getting a stable infrastructure. This helped to build good monitoring tools.

Example of statuses monitored and exposed on the site

Players online

The online player count is mostly used by players to see which server has people on it, but the game servers can also each have their own gameplay style. All the statistics are merged on the master server and dispatched to the login servers, which send them to the clients.

Admin tools

  • if a game server is detected as down (a server counts as up while it is registered and connected on the master and responds to the master's pings), it is shown in the offline count
  • the login server authenticates the player (and routes/proxies them too); it is checked by a PHP script with protocol control (the check verifies that it reads and responds to the protocol correctly, and it does not respond if the master is disconnected); a sketch of such a check follows this list
  • Mirror: checks that it is up, that the pack, the file list and the files are accessible on all the servers, and for some files checks the content
  • Backup: checks that the backup is correctly done, and that the database is properly streamed and replicated in real time to another datacenter
  • Other: other mixed checks, such as whether a bot can connect through the full chain: bot -> login -> game server
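A minimal sketch of such a protocol check, written in C++ rather than PHP, with a placeholder probe byte, reply check and port (the real script validates the actual CatchChallenger handshake): connect, send a probe, and require an answer within a timeout.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

// Health check: a server that refuses the connection, or that does not answer
// the protocol within the timeout (e.g. because the master is disconnected),
// counts as down. The probe and reply handling here are placeholders.
static bool loginServerIsUp(const char *ip, uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return false;

    timeval timeout{2, 0}; // 2 s: no answer within this delay counts as down
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    bool up = false;
    if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0) {
        const unsigned char probe[] = {0x01}; // placeholder protocol probe
        unsigned char reply[64];
        if (send(fd, probe, sizeof(probe), 0) > 0 &&
            recv(fd, reply, sizeof(reply), 0) > 0)
            up = true; // it answered; a stricter check would parse the reply
    }
    close(fd);
    return up;
}

int main() {
    // 127.0.0.1:9999 is a placeholder address, not the real login server port.
    std::printf("login server: %s\n", loginServerIsUp("127.0.0.1", 9999) ? "up" : "DOWN");
}
```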

Multiplier

The login server acts as a computing multiplier and as a cache to limit the master's load, and as a network bandwidth multiplier: it transmits the diff to stats clients and the full list (huge size) to newly connected normal player clients. Computing power and network bandwidth are thus multiplied by the number of login servers.
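A minimal sketch of that multiplier role, with a hypothetical name (StatsCache): the login server keeps the last player counts received from the master, pushes only the diff to already-subscribed clients, and sends the full list only to newly connected clients.

```cpp
#include <cstdint>
#include <iostream>
#include <map>

// Hypothetical stats cache on the login server: player counts per game server,
// received from the master and re-dispatched to many clients.
class StatsCache {
    std::map<uint32_t, uint16_t> current; // gameServerId -> players online
public:
    // New client: send the full (possibly huge) list once.
    std::map<uint32_t, uint16_t> fullList() const { return current; }

    // Update from the master: apply it and return only the diff, so the
    // master's bandwidth is multiplied by the login server, not consumed by it.
    std::map<uint32_t, uint16_t> applyUpdate(const std::map<uint32_t, uint16_t> &update) {
        std::map<uint32_t, uint16_t> diff;
        for (auto &[id, count] : update)
            if (!current.count(id) || current[id] != count)
                diff[id] = current[id] = count;
        return diff;
    }
};

int main() {
    StatsCache cache;
    cache.applyUpdate({{1, 120}, {2, 30}});
    std::cout << "full list size for a new client: " << cache.fullList().size() << "\n";
    auto diff = cache.applyUpdate({{1, 121}, {2, 30}}); // only server 1 changed
    std::cout << "diff size pushed to subscribed clients: " << diff.size() << "\n";
}
```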

Cluster of master

Cluster of master servers with query forwarding, replication by diff copy, and conflict management

The needs are:

  • Deterministic
  • Redundancy with automatic conflict management
  • Asynchronous
  • Each master routes a query to the master responsible for its id range (see the sketch after these lists)
  • Master and client have multiple connections
  • A client does not have the same power as a master; the client side is split into login and game servers
  • Save the ip/port of all known nodes, try to connect first to the last active node
  • Minimise the bandwidth (no real-time propagation requirement, which helps with this)
  • Low number of master nodes
  • Direct master-to-master communication may be blocked (reroute via another node)
  • Rebalance when a node is shut down or detected as down
  • Work over I2P
  • Benchmark pool via TCP ping for the connected clients
  • The client saves the master node list with the number of clients connected, the maximum, and the ping
  • The client needs the master node list with the hop count of each connection, to send the query over the best connection
  • A master node should preferably be able to act as both client and server, but can work in only one mode
  • If the route changes for a request in progress, resend it; duplicate requests are detected and dropped

Maybe:

  • Dynamic ranges, and transparently having one node act as multiple nodes
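A minimal sketch of the id-range routing and duplicate dropping listed above, with hypothetical names (MasterNode, MasterCluster) and made-up ranges: each master owns a contiguous id range, queries are forwarded to the owner, and a query resent after a route change is detected and dropped.

```cpp
#include <cstdint>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// One master node of the cluster and the id range it is responsible for.
struct MasterNode {
    std::string name;
    uint64_t rangeBegin, rangeEnd; // ids handled by this node, inclusive
};

class MasterCluster {
    std::vector<MasterNode> nodes;
    std::set<uint64_t> seenQueryIds; // duplicate requests are detected and dropped
public:
    explicit MasterCluster(std::vector<MasterNode> n) : nodes(std::move(n)) {}

    void route(uint64_t queryId, uint64_t targetId) {
        if (!seenQueryIds.insert(queryId).second) {
            std::cout << "query " << queryId << ": duplicate, dropped\n";
            return; // already handled (e.g. resent after a route change)
        }
        for (const auto &node : nodes)
            if (targetId >= node.rangeBegin && targetId <= node.rangeEnd) {
                std::cout << "query " << queryId << " (id " << targetId
                          << ") -> " << node.name << "\n";
                return;
            }
        std::cout << "query " << queryId << ": no master owns id " << targetId << "\n";
    }
};

int main() {
    MasterCluster cluster({{"master-a", 0, 999'999},
                           {"master-b", 1'000'000, 1'999'999}});
    cluster.route(1, 42);        // handled by master-a
    cluster.route(2, 1'500'000); // forwarded to master-b
    cluster.route(2, 1'500'000); // resent after a route change: dropped as duplicate
}
```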

Example with a partial link failure and a network split