Syncing data centres

TheRealAndyCook;4994317; said:
So let's say we have three data centres, each named after its physical location.

canada.site.com
usa.site.com
uk.site.com

When a user connects to site.com, it tries to forward them to the closest site possible.


So, Andy, who is Canadian, lands on canada.site.com

His request for the index goes to Apache; Apache queries the SQL server at sql.canada.site.com; the SQL data comes back to Apache and is visible in Andy's browser.


So, Jeff, who is American, lands on usa.site.com

Like Andy, his request for the index goes to Apache, then to sql.usa.site.com, and the SQL data comes back to Apache and is visible in Jeff's browser.


Now John has been using the site and is about to make an update.

John's data gets sent to the Apache server at uk.site.com; Apache tells the SQL server at sql.uk.site.com to update with the new data, and then John can see the data when he views the page.


HOWEVER, when Andy or Jeff visit the page, they will not see John's update: they are on different servers, so they are not loading from the same SQL server.

Similarly, if Andy or Jeff make an update, neither of the others will see it.


This is where we implement the syncing script. When John made his update, the Apache server at uk.site.com also sent out a "warning" to canada.site.com and usa.site.com that a change was made and where the change is located.

canada.site.com and usa.site.com then go and get the new data and update their respective SQL servers.


The whole update takes approximately two times the server-to-server latency plus the SQL-to-Apache latency. For example, with a 40 ms one-way hop between DCs and 2 ms from SQL to Apache, the change lands on the remote servers roughly 82 ms after the write. Since the databases are typically located on fibre backbones, the update MAY propagate faster than the original user gets the original page back.

(Taking that maths into account, a fast user in the States might actually see the change before the user making the change in the UK does.)
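A rough sketch of that notify-then-fetch flow, with made-up hostnames; the transport function is injected so the actual wire protocol stays out of the picture:

```python
import json

# Peers this node must warn after a local write (hostnames are illustrative).
PEERS = ["canada.site.com", "usa.site.com"]

def build_notice(change_id, table):
    """The 'warning' payload: says a change happened and where it lives."""
    return json.dumps({"change_id": change_id, "table": table})

def notify_peers(send, change_id, table):
    """Call send(host, payload) for every peer; return the hosts that
    failed, so a retry queue can pick them up later."""
    payload = build_notice(change_id, table)
    failed = []
    for host in PEERS:
        try:
            send(host, payload)
        except OSError:
            failed.append(host)
    return failed
```

Each peer, on receiving the notice, goes and fetches the new rows from the originating DC and applies them to its own SQL server.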


I assume you would be using GeoIP-enabled BIND for proper resolution. Depending on the frequency and size of database changes, your SQL connections could be set up using MySQL Cluster, where each DC would have a pair of MySQL data nodes that replicate across the wire as changes happen. You would also have a cluster manager server in each of the data centres, which would talk to all the data nodes and to each of the other cluster managers. The data nodes in a MySQL cluster do most of the heavy lifting, while the actual MySQL (SQL) nodes process requests through the data nodes; the MySQL node could reside on the LAMP server. In addition, the MySQL nodes in each data centre would communicate with the data node in that data centre, with the ability to fail over to one of the other DCs to provide reduced-performance access for the DC with the failed node. All of this is, of course, dependent on the quality and speed of your link between the data centres.
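A minimal sketch of what one DC's cluster config.ini might look like under that layout; all hostnames are made up for illustration:

```ini
[ndbd default]
NoOfReplicas=2                   # each DC keeps a replicated pair of data nodes

[ndb_mgmd]
HostName=mgm.canada.site.com     # cluster manager for this DC

[ndbd]
HostName=data1.canada.site.com   # first data node of the pair

[ndbd]
HostName=data2.canada.site.com   # second data node of the pair

[mysqld]
HostName=www.canada.site.com     # SQL node living on the LAMP server
```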

That leaves the data on the web servers. From the sounds of it, you're copying data from one DC to the other to keep them in sync. Here you could set up file servers in each data centre and set up replication between them. It's a bit tricky since you have three DCs to replicate; if you only had two, you could use something like DRBD. Again, all of this depends on the quality and speed of the link you would dedicate between the DCs.

From my experience, file replication across DCs can work well with minimal latency if you have a high-quality dedicated link. However, with a multi-primary clustered file system like the one described here, you would need to be able to quickly detect and restore a failed file server to mitigate possible split-brain damage.

With that said, if I had to set this up myself, depending on the actual content type, I would do something like this.

A central location houses the core file server and core SQL server. The US and UK locations would each house a small Varnish/Memcached cluster, providing caching of the most-used content for that location, including DB content; you would then only travel back to the central location for the obscure information. This assumes the sites are mostly dynamic PHP sites backed by MySQL. The Varnish/Memcached layer would be controlled by the main location: upon an update, only the affected section of the Varnish cache would be cleared, and the same for Memcached. For example, a typical WordPress site, once configured with Varnish and Memcached, never hits the Apache server unless the cache is flushed. MySQL is hit slightly more often due to the cache expiring, but this is manageable.
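The section-scoped invalidation described here could be sketched like this; a plain dict stands in for Varnish/Memcached, and the section/page key scheme is an assumption for illustration:

```python
class SectionCache:
    """Toy stand-in for a Varnish/Memcached layer whose entries are keyed
    by (section, page), so one section can be purged without touching the rest."""

    def __init__(self):
        self._store = {}

    def set(self, section, page, html):
        self._store[(section, page)] = html

    def get(self, section, page):
        return self._store.get((section, page))

    def purge_section(self, section):
        """On an update, clear only that section's pages; the rest stays warm."""
        for key in [k for k in self._store if k[0] == section]:
            del self._store[key]
```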
 
I'm slightly confused by your first paragraph.
However, this is meant to be a LAMP mesh.

It allows for scalability while not being hardware-, OS-, or setup-dependent.


When a change is made to a node, it sends the change to every other server.

When a node gets a change request, it performs the change and then replies back.

If a node doesn't get a reply within 5 minutes, it resends the change; if there's still no reply, it stores that as an update the server didn't complete.

A node will then ping the failed node every 15 minutes until it replies, at which point it repeats the update process.
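The retry rules above can be sketched as a small decision function. The timing constants come from the post; the function name and return values are illustrative:

```python
RESEND_AFTER = 5 * 60    # resend if no reply within 5 minutes
PING_EVERY = 15 * 60     # ping a failed node every 15 minutes

def next_action(elapsed, attempts, node_failed):
    """Decide what to do `elapsed` seconds after the last send (or ping).
    `attempts` counts how many times the change has been sent so far.
    Returns 'wait', 'resend', 'mark_failed', or 'ping'."""
    if node_failed:
        # node is known down: only ping on the 15-minute schedule
        return "ping" if elapsed >= PING_EVERY else "wait"
    if elapsed < RESEND_AFTER:
        return "wait"
    # first silent timeout: resend; second: record it as incomplete
    return "resend" if attempts < 2 else "mark_failed"
```

Once the failed node finally answers a ping, the stored updates are replayed through the normal path.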
 
I would be interested in seeing more technical details of your setup, and your script if you'd like to share it. I am a Level 2 admin at a dedicated hosting company; I have set up a few clusters, but never any cross-data-centre production clusters.

Do you run Heartbeat, or something newer like Pacemaker?

How are you directing traffic to the appropriate DC? Using GeoIP-enabled BIND, or is it done through a load balancer?

Have you thought about setting up caching? I have found that while caching doesn't improve my site's load time a whole lot, it increases the number of simultaneous connections I can sustain a hundredfold. For example, if I run an Apache Bench test of, say, 10,000 connections with 200 concurrent connections from a server inside the data centre (to remove the bandwidth bottleneck), I never see my server's load average spike above 0.1.

The following test didn't even bump my load above 0.0. Had this been a Gigabit connection within the data centre, I would have seen even faster load times; as you can see, I was restricted by the 100 Mbps port speed, whereas this same setup without the caching would have begun overloading the server well before it reached the bandwidth cap.


[*****@dssh ~]$ ab -n 10000 -c 200 http://*****/blog/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking ******** (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests


Server Software: ******
Server Hostname: *********
Server Port: 80

Document Path: /blog/
Document Length: 46919 bytes

Concurrency Level: 200
Time taken for tests: 44.171994 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 474247979 bytes (452.278117 megabytes)
HTML transferred: 469190000 bytes
Requests per second: 226.39 [#/sec] (mean)
Time per request: 883.440 [ms] (mean)
Time per request: 4.417 [ms] (mean, across all concurrent requests)
Transfer rate: 10484.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   67.1      2    3003
Processing:    10   17   11.4     17     267
Waiting:        1    2    1.4      3      14
Total:         11   21   68.0     19    3021

Percentage of the requests served within a certain time (ms)
  50%     19
  66%     20
  75%     21
  80%     21
  90%     22
  95%     23
  98%     24
  99%     25
 100%   3021 (longest request)
 
Am I correct to assume this is a tad more advanced than googling smut films?
 
It's nothing special really, but here is a more in-depth explanation.

Problem 1: server_a gets an update and needs to update the other servers.
server_a sends an HTTP request to every other server, something like this:

update.php?action=delete_post&id=12&api_key=1234567890

server_b then processes that command, deleting post 12 if the API key matches the server it was sent from.
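The receiving side of that request might look something like this sketch; the key table and handler wiring are assumptions, not the actual update.php:

```python
# Key registered per peer; in the post's example, server_a's key is 1234567890.
API_KEYS = {"server_a": "1234567890"}

def handle_update(sender, params, delete_post):
    """Apply an action such as delete_post only if the api_key matches
    the one on file for the sending server."""
    if API_KEYS.get(sender) != params.get("api_key"):
        return "rejected"
    if params.get("action") == "delete_post":
        delete_post(int(params["id"]))
        return "ok"
    return "unknown action"
```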


Problem 2: server_a AND server_b get a new post created at the same time.

Logic tells us that if we have a table with a slowly increasing ID column, two entries made at once would collide. To solve this, we allocate ID ranges to servers.

server_a might get post IDs 1-100, then server_b might get 101-200.

It does create a SMALL issue with data technically being entered out of order, but we should be sorting data by timestamp anyway, since we've stored that.
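The ID-range leasing could be sketched like this; the block size of 100 matches the example ranges, and the class name is illustrative:

```python
BLOCK = 100  # ids per leased block, matching the 1-100 / 101-200 example

class IdLease:
    """Each server draws new post IDs only from its own leased block,
    so simultaneous inserts on different servers can never collide."""

    def __init__(self, first_block):
        # server_a gets block 0 (ids 1-100), server_b block 1 (ids 101-200), ...
        self.next_id = first_block * BLOCK + 1
        self.last_id = (first_block + 1) * BLOCK

    def take(self):
        if self.next_id > self.last_id:
            raise RuntimeError("lease exhausted; request a new block")
        nid = self.next_id
        self.next_id += 1
        return nid
```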


Problem 3: a server goes offline.

I discussed this above, but after a server runs through its update.php, it sends a request back to the other server saying that it completed "delete_post 12". The master server then knows it no longer needs to tell that server to delete that post, and removes it from its list of updates.

(Updates are stored in another table in the SQL database; a small amount of overhead, but when you think about it, we're really accomplishing a lot here.)
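That bookkeeping boils down to something like the sketch below, with a dict standing in for the extra SQL table:

```python
class UpdateLedger:
    """Tracks which updates each peer has not yet confirmed; a row is
    dropped when the peer's 'completed' reply arrives."""

    def __init__(self):
        self.pending = {}  # server -> set of outstanding update descriptions

    def record(self, server, update):
        self.pending.setdefault(server, set()).add(update)

    def acknowledge(self, server, update):
        """Peer reports e.g. 'completed delete_post 12'; forget that row."""
        self.pending.get(server, set()).discard(update)

    def outstanding(self, server):
        """What still needs to be replayed when this peer comes back."""
        return sorted(self.pending.get(server, set()))
```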

Problem 4: latency.

The latency really comes down to whatever protocol we use to send the updates between servers. This is a small issue I've yet to test fully; cURL should be able to handle small requests, though I expect I could use FTP to send bigger files. However, I haven't known FTP to be the fastest protocol, so it'll really come down to how cURL performs for each server.

Of course, it would be possible to define how each client "gets" its updates. If we really wanted, we could serve updates via IMAP :WHOA: However, I'm looking for something to actually be useful; this isn't a science fair project.


bbortko;4996110; said:
Am I correct to assume this is a tad more advanced than googling smut films?

Only slightly
 
When I ran that same test on an LVS load-balanced pair of Nginx servers with a dedicated database server and Memcached for the database and PHP, I was able to push 1000 concurrent connections across the 2 servers with a load of 0.4 on the web servers and 0.1 on the LVS.
 
I assume you're talking about cURL, and the latency problem to overcome?
I'm not super worried about it; as long as updates don't take over a minute to propagate, I won't get worried. And, TBH, I wouldn't expect them to take over a few seconds, since I do plan on caching everything I can, so SQL performance should be fine.
 
TheRealAndyCook;4996127; said:
It's nothing special really, but here is a more in-depth explanation. […]

I would use rsync over FTP if the changes were not frequent.

I have a question about this:
update.php?action=delete_post&id=12&api_key=1234567890

Does this API key change? I see you said it must match the server, but if it doesn't change often, you could be at high risk of attack through this; unless, of course, you mean that the API key must match what's listed for the source IP the request comes from.

Problem 2 was common for one of our customers who had a web farm, and we did almost exactly what you are talking about. Recently I have been playing with MySQL Cluster, which is a separate package from MySQL Server. This would allow you to eliminate your #2 problem, but it requires more setup and would need low-latency, high-speed connections between data centres to replicate effectively.

As far as the clients getting updates, what I am currently working on might not be directly useful, but it might give you an idea. I am working on making the Plesk control panel fully clusterable with automatic fail-over. The problem I ran into with this is updating the system users across servers. The solution I came up with has two parts: first, a shared database that lists every node and keeps a history of all changes from the beginning of the cluster until the present. This database also lists which servers have received which updates.

So any time there is an update on one of the nodes, it records the change in the database and updates its own history there to indicate that it has this update. It then issues a check command to the other nodes; each node logs into the database, compares the updates against its own history, and applies any changes needed. This gave me the benefit of being able to simply add another server to the cluster: that server would check the database and auto-update itself.
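That shared-changelog scheme could be sketched as follows; in-memory structures stand in for the shared database, and all names are illustrative:

```python
class ChangeLog:
    """Central ordered history of every change, plus a per-node position
    marking how far each node has already applied it."""

    def __init__(self):
        self.history = []   # ordered list of change descriptions
        self.applied = {}   # node -> number of changes already applied

    def record(self, change):
        self.history.append(change)

    def check(self, node, apply):
        """Node compares the log against its own position and catches up.
        A brand-new node starts at position 0 and replays everything."""
        pos = self.applied.get(node, 0)
        for change in self.history[pos:]:
            apply(change)
        self.applied[node] = len(self.history)
```

Adding a server to the cluster then costs nothing extra: its first check replays the whole history.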
 
GhostShrimpMan;4996167; said:
Hey TRAC, you as good with phones?

Mobile? Or as in telco routing?


And, nfored, I didn't notice you'd updated your post, but I'm sure you got most of what you asked from my other post. Users get sent to whichever server by IP/load/latency... however, that system isn't in place right now; server selection is random at the moment.

Updates can only be done from trusted locations; the IP should match the API key.

Servers that aren't in a trusted location can, however, just get copies of the data, basically allowing them to be mirrors with constantly fresh data.


Leasing rows to different servers lets us scale without worrying about compatibility. A server just needs to be able to contact another server that has the latest lease table.


I do like your "changelog" idea...
Maybe I'll just send an "update" notice from the master to each node, then the node goes and checks to see what exactly has been changed, loading up an XML-like file with the changes.
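That XML-like change file might be built and consumed like this sketch; the element and attribute names are made up for illustration:

```python
import xml.etree.ElementTree as ET

def changes_to_xml(changes):
    """Master side: serialise (action, target) pairs into a change list."""
    root = ET.Element("changes")
    for action, target in changes:
        ET.SubElement(root, "change", action=action, target=str(target))
    return ET.tostring(root, encoding="unicode")

def xml_to_changes(doc):
    """Node side: after the 'update' notice, pull the file and read out
    what actually changed."""
    return [(c.get("action"), c.get("target"))
            for c in ET.fromstring(doc).findall("change")]
```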
 
MonsterFishKeepers.com