Recently one of my application servers developed an intermittent issue where the MariaDB instance running on it would randomly fail. Apache, however, would stay up, so when the Layer 7 HAProxy check hit my original status check page, it still got a valid response from my application and HAProxy kept the server in the pool even though its database was down.
My current cluster configuration has an instance of Apache and MariaDB Galera Cluster running on each application server. This allows for very fast database connections, since each application connects to a local instance of the database. MariaDB Galera Cluster handles the multi-master replication between the nodes. This setup has reduced my fear of an individual database server failing and bringing down the whole cluster.
Example HAProxy Config (/etc/haproxy/haproxy.cfg):
The important part here is line #35, "option httpchk GET /who.php"; the rest of this config file is pretty standard HAProxy load balancer configuration. HAProxy will request "who.php" on each backend server, and if it gets a valid response it will keep that backend server in the load balancing pool.
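As a rough illustration of the kind of backend section described above, here is a minimal sketch. The backend name, server names, and IP addresses are placeholders I made up, and the line numbering of the full file is not reproduced, so the check does not sit on line 35 here. The "inter", "rise", and "fall" values shown are HAProxy's documented defaults, written out explicitly.

```haproxy
# Hypothetical backend section; names and addresses are placeholders.
backend app_servers
    balance roundrobin
    # Layer 7 health check: request /who.php on every backend server
    option httpchk GET /who.php
    # Allow a request to be sent to another server if its assigned one is down
    option redispatch
    # "check" enables health checks; inter/rise/fall are shown at their defaults
    server app1 10.0.0.11:80 check inter 2000 rise 2 fall 3
    server app2 10.0.0.12:80 check inter 2000 rise 2 fall 3
```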
Example Apache/MySQL Status Check Page (who.php):
With this status check page, if the script can't connect to MySQL (be sure to supply valid credentials in the check), it will return a simple 404 error page. HAProxy will see this, mark the backend server as "down", and stop directing traffic to it. Using "option redispatch" in the HAProxy config will cause my users' requests to be directed to another available backend server.
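A minimal sketch of what such a page could look like, assuming the mysqli extension is available. The host, user name, and password are placeholders, and the helper functions health_status() and try_mysql() are hypothetical names chosen for this example, not part of any library. It returns 404 on failure to match the behaviour described above.

```php
<?php
// who.php: hypothetical health check page for HAProxy's "option httpchk".
// Host, user, and password below are placeholders; supply your own.

/**
 * Map the outcome of a connection attempt to an HTTP status code.
 * $connect is any callable returning true on success; false or an
 * exception counts as a failure.
 */
function health_status(callable $connect): int
{
    try {
        return $connect() ? 200 : 404;
    } catch (Throwable $e) {
        return 404;
    }
}

/** Try to open and immediately close a MySQL/MariaDB connection. */
function try_mysql(string $host, string $user, string $pass): bool
{
    // Make mysqli return false on failure instead of throwing.
    mysqli_report(MYSQLI_REPORT_OFF);
    $db = @mysqli_connect($host, $user, $pass);
    if ($db === false) {
        return false;
    }
    mysqli_close($db);
    return true;
}

// Only emit a response when served by a web server, so the functions
// above can also be loaded from the command line.
if (PHP_SAPI !== 'cli') {
    $code = health_status(fn () => try_mysql('127.0.0.1', 'healthcheck', 'change-me'));
    http_response_code($code);
    header('Content-Type: text/plain');
    echo $code === 200 ? "OK\n" : "MySQL unavailable\n";
}
```

HAProxy's HTTP check treats 2xx and 3xx responses as healthy by default, so the 404 is enough to fail the check.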
While the server is down, HAProxy will keep checking the "who.php" page. Once MySQL recovers and the script is no longer returning an error, HAProxy will automatically mark this backend as "up" and begin directing traffic to it again.
An important note: each time HAProxy calls this status check page there is some additional overhead, because each call opens a MySQL connection. While this connection isn't running an actual query, the response will definitely be slower than if the MySQL check didn't happen at all. Depending on your infrastructure, you may want to adjust HAProxy's "rise", "fall", and "inter" settings. "rise" and "fall" determine how many consecutive passed or failed checks must occur before HAProxy brings the server up or down. "inter" sets the number of milliseconds between status checks; the default is 2000 milliseconds (2 seconds). Raising "fall" lets a server ride out an occasional failed check that may not significantly impact the user experience, and raising "inter" slows the check frequency, which reduces the additional MySQL load. These values will need to be tuned to your specific application needs and cluster resiliency.
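To get a feel for how rise and fall interact, here is a simplified model that replays a series of check results and reports the server state after each one. It is only a sketch of the "consecutive checks" rule, not HAProxy's actual implementation, and replay_checks() is a hypothetical helper name. The defaults of rise 2 and fall 3 match HAProxy's documented defaults.

```php
<?php
/**
 * Simplified model of rise/fall (not HAProxy's real code).
 * $results is a list of booleans, true meaning the check passed.
 * Returns the server state ('UP' or 'DOWN') after each check.
 */
function replay_checks(array $results, int $rise = 2, int $fall = 3, bool $startUp = true): array
{
    $up = $startUp;
    $streak = 0; // consecutive results that contradict the current state
    $states = [];
    foreach ($results as $passed) {
        if ($passed === $up) {
            $streak = 0; // result agrees with the current state
        } else {
            $streak++;
            if ($streak >= ($up ? $fall : $rise)) {
                $up = !$up; // enough contrary results in a row: flip state
                $streak = 0;
            }
        }
        $states[] = $up ? 'UP' : 'DOWN';
    }
    return $states;
}

// One isolated failure is tolerated, three in a row take the server
// down, and two passes in a row bring it back.
$results = [true, false, true, false, false, false, true, true];
echo implode(' ', replay_checks($results)), "\n";
// prints "UP UP UP UP UP DOWN DOWN UP"
```

As a rough estimate, the time to notice a failure is on the order of fall × inter, so about 6 seconds with the defaults. Doubling inter halves the number of MySQL connections the checks open but also roughly doubles that detection time.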