Tuesday, 23 August 2016

How To Create Hot Backups of MySQL Databases with Percona XtraBackup on Ubuntu 14.04

Introduction

A very common challenge encountered when working with active database systems is performing hot backups—that is, creating backups without stopping the database service or making it read-only. Simply copying the data files of an active database will often result in a copy of the database that is internally inconsistent, i.e. it will not be usable or it will be missing transactions that occurred during the copy. On the other hand, stopping the database for scheduled backups makes database-dependent portions of your application unavailable. Percona XtraBackup is an open source utility that circumvents this issue by creating consistent full or incremental backups of running MySQL, MariaDB, and Percona Server databases—also known as hot backups.
As opposed to the logical backups that utilities like mysqldump produce, XtraBackup creates physical backups of the database files—it makes a copy of the data files. Then it applies the transaction log (a.k.a. redo log) to the physical backups, to backfill any active transactions that did not finish during the creation of the backups, resulting in consistent backups of a running database. The resulting database backup can then be backed up to a remote location using rsync, a backup system like Bacula, or DigitalOcean backups.
This tutorial will show you how to perform a full hot backup of your MySQL or MariaDB databases using Percona XtraBackup on Ubuntu 14.04. The process of restoring the database from a backup is also covered. The CentOS 7 version of this guide can be found here.

Prerequisites

To follow this tutorial, you must have the following:
  • Superuser privileges on an Ubuntu 14.04 system
  • A running MySQL or MariaDB database
  • Access to the admin user (root) of your database
Also, to perform a hot backup of your database, your database system must be using the InnoDB storage engine. This is because XtraBackup relies on the transaction log that InnoDB maintains. If your databases are using the MyISAM storage engine, you can still use XtraBackup but the database will be locked for a short period towards the end of the backup.

Check Storage Engine

If you are unsure of which storage engine your databases use, you can look it up through a variety of methods. One way is to use the MySQL console to select the database in question, then output the status of each table.
First, enter the MySQL console:
  • mysql -u root -p
Then enter your MySQL root password.
At the MySQL prompt, select the database that you want to check. Be sure to substitute your own database name here:
  • USE database_name;
Then print its table statuses:
  • SHOW TABLE STATUS\G
The engine should be indicated for each row in the database:
Example Output:
...
*************************** 11. row ***************************
           Name: wp_users
         Engine: InnoDB
...
Once you are done, leave the console:
  • exit
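Alternatively, you can query information_schema directly from the shell instead of paging through the full table status. This is a minimal one-liner; substitute your own database name:
  • mysql -u root -p -e "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'database_name';"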
Let's install Percona XtraBackup.

Install Percona XtraBackup

The easiest way to install Percona XtraBackup is to use apt-get.
Add the Percona repository key with this command:
  • sudo apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
Then add the Percona repository to your apt sources:
  • sudo sh -c "echo 'deb http://repo.percona.com/apt trusty main' > /etc/apt/sources.list.d/percona.list"
  • sudo sh -c "echo 'deb-src http://repo.percona.com/apt trusty main' >> /etc/apt/sources.list.d/percona.list"
Run this command to update your apt sources:
  • sudo apt-get update
Finally, you can run this command to install XtraBackup:
  • sudo apt-get install percona-xtrabackup
XtraBackup consists primarily of the XtraBackup program and the innobackupex Perl script, which we will use to create our database backups.
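To confirm the installation, you can print the version of the innobackupex script (the exact version string will vary):
  • innobackupex --version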

First Time Preparations

Before using XtraBackup for the first time, we need to prepare the system user and the MySQL user that XtraBackup will use. This section covers the initial preparation.

System User

Unless you plan on using the system root user, you must perform some basic preparations to ensure that XtraBackup can be executed properly. We will assume that you are logged in as the user that will run XtraBackup, and that it has superuser privileges.
Add your system user to the "mysql" group (substitute in your actual username):
  • sudo gpasswd -a username mysql
While we're at it, let's create the directory that will be used for storing the backups that XtraBackup creates:
  • sudo mkdir -p /data/backups
  • sudo chown -R username: /data
The chown command ensures that the user will be able to write to the backups directory.

MySQL User

XtraBackup requires a MySQL user that it will use when creating backups. Let's create one now.
Enter the MySQL console with this command:
  • mysql -u root -p
Supply the MySQL root password.
At the MySQL prompt, create a new MySQL user and assign it a password. In this example, the user is called "bkpuser" and the password is "bkppassword". Change both of these to something secure:
  • CREATE USER 'bkpuser'@'localhost' IDENTIFIED BY 'bkppassword';
Next, grant the new MySQL user reload, lock, and replication privileges to all of the databases:
  • GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'bkpuser'@'localhost';
  • FLUSH PRIVILEGES;
These are the minimum required privileges that XtraBackup needs to create full backups of databases.
When you are finished, exit the MySQL console:
  • exit
Now we're ready to create a full backup of our databases.

Perform Full Hot Backup

This section covers the steps that are necessary to create a full hot backup of a MySQL database using XtraBackup. After ensuring that the database file permissions are correct, we will use XtraBackup to create a backup, then prepare it.

Update Datadir Permissions

On Ubuntu 14.04, MySQL's data files are stored in /var/lib/mysql, which is sometimes referred to as the datadir. By default, access to the datadir is restricted to the mysql user. XtraBackup requires access to this directory to create its backups, so let's run a few commands to ensure that the system user we set up earlier—as a member of the mysql group—has the proper permissions:
  • sudo chown -R mysql: /var/lib/mysql
  • sudo find /var/lib/mysql -type d -exec chmod 770 "{}" \;
These commands ensure that all of the directories in the datadir are accessible to the mysql group, and should be run prior to each backup.
If you added your user to the mysql group in the same session, you will need to log in again for the group membership changes to take effect.
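You can confirm that the membership is active in your current session with the id command; mysql should appear in the list of groups it prints:
  • id -nG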

Create Backup

Now we're ready to create the backup. With the MySQL database running, use the innobackupex utility to do so. Run this command after updating the user and password to match your MySQL user's login:
  • innobackupex --user=bkpuser --password=bkppassword --no-timestamp /data/backups/new_backup
This will create a backup of the database at the location specified, /data/backups/new_backup:
innobackupex output
innobackupex: Backup created in directory '/data/backups/new_backup'
150420 13:50:10  innobackupex: Connection to database server closed
150420 13:50:10  innobackupex: completed OK!
Alternatively, you may omit the --no-timestamp option to have XtraBackup create a backup directory named after the current timestamp, like so:
  • innobackupex --user=bkpuser --password=bkppassword /data/backups
This will create a backup of the database in an automatically generated subdirectory, like so:
innobackupex output — no timestamp
innobackupex: Backup created in directory '/data/backups/2015-04-20_13-50-07'
150420 13:50:10  innobackupex: Connection to database server closed
150420 13:50:10  innobackupex: completed OK!
Either method that you decide on should output "innobackupex: completed OK!" on the last line of its output. A successful backup will result in a copy of the database datadir, which must be prepared before it can be used.
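You can list the contents of the backup directory to confirm that it contains a copy of the datadir along with XtraBackup's metadata files, such as xtrabackup_checkpoints and xtrabackup_logfile:
  • ls /data/backups/new_backup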

Prepare Backup

The last step in creating a hot backup with XtraBackup is to prepare it. This involves "replaying" the transaction log: committed transactions that are not yet in the data files are applied to the backup, and any uncommitted transactions are rolled back. Preparing the backup makes its data consistent and usable for a restore.
Following our example, we will prepare the backup that was created in /data/backups/new_backup. Substitute this with the path to your actual backup:
  • innobackupex --apply-log /data/backups/new_backup
Again, you should see "innobackupex: completed OK!" as the last line of output.
Your database backup has been created and is ready to be used to restore your database. Also, if you have a file backup system, such as Bacula, this database backup should be included as part of your backup selection.
The next section will cover how to restore your database from the backup we just created.

Perform Backup Restoration

Restoring a database with XtraBackup requires that the database be stopped and that its datadir be empty.
Stop the MySQL service with this command:
  • sudo service mysql stop
Then move or delete the contents of the datadir (/var/lib/mysql). In our example, we'll simply move it to a temporary location:
  • mkdir /tmp/mysql
  • mv /var/lib/mysql/* /tmp/mysql/
Now we can restore the database from our backup, "new_backup":
  • innobackupex --copy-back /data/backups/new_backup
If it was successful, the last line of output should say "innobackupex: completed OK!"
The restored files in datadir will probably belong to the user you ran the restore process as. Change the ownership back to mysql, so MySQL can read and write the files:
  • sudo chown -R mysql: /var/lib/mysql
Now we're ready to start MySQL:
  • sudo service mysql start
That's it! Your restored MySQL database should be up and running.
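To verify the restoration, you can connect to MySQL and list your databases:
  • mysql -u root -p -e "SHOW DATABASES;"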

Conclusion

Now that you are able to create hot backups of your MySQL database using Percona XtraBackup, there are several things that you should consider setting up.
First of all, it is advisable to automate the process so you will have backups created according to a schedule. Second, you should make remote copies of the backups, in case your database server has problems, by using something like rsync, a network file backup system like Bacula, or DigitalOcean backups. After that, you will want to look into rotating your backups (deleting old backups on a schedule) and creating incremental backups (with XtraBackup) to save disk space.
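As a starting point, here is a minimal sketch of a backup script assembled from the commands in this tutorial. The paths, user, and password are the example values from above; adjust them for your setup, and prefer storing the credentials in a ~/.my.cnf file rather than in the script itself:
#!/bin/bash
# mysql-backup.sh: create and prepare a full hot backup
# (sketch using this tutorial's example paths and credentials)
BACKUP_DIR=/data/backups/$(date +%F)

# Refresh the datadir permissions before each backup, as described above
chown -R mysql: /var/lib/mysql
find /var/lib/mysql -type d -exec chmod 770 "{}" \;

# Create the backup, then prepare it
innobackupex --user=bkpuser --password=bkppassword --no-timestamp "$BACKUP_DIR"
innobackupex --apply-log "$BACKUP_DIR"
You could then run it on a schedule from the root crontab (so the permission commands do not need sudo), for example nightly at 2:30 AM. The script's location here is just an assumption; place it wherever suits your system:
30 2 * * * /usr/local/bin/mysql-backup.sh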
Good luck!

Monday, 22 August 2016

Dockerizing a Node.js web app


The goal of this example is to show you how to get a Node.js application into a Docker container. The guide is intended for development, and not for a production deployment. The guide also assumes you have a working Docker installation and a basic understanding of how a Node.js application is structured.
In the first part of this guide we will create a simple web application in Node.js, then we will build a Docker image for that application, and lastly we will run the image as a container.
Docker allows you to package an application with all of its dependencies into a standardized unit, called a container, for software development. A container is a stripped-to-basics version of a Linux operating system. An image is software you load into a container.

Create the Node.js app

First, create a new directory where all the files will live. In this directory, create a package.json file that describes your app and its dependencies:
{
  "name": "docker_web_app",
  "version": "1.0.0",
  "description": "Node.js on Docker",
  "author": "First Last <first.last@example.com>",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.13.3"
  }
}
Then, create a server.js file that defines a web app using the Express.js framework:
'use strict';

const express = require('express');

// Constants
const PORT = 8080;

// App
const app = express();
app.get('/', function (req, res) {
  res.send('Hello world\n');
});

app.listen(PORT);
console.log('Running on http://localhost:' + PORT);
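Before building the image, you can sanity-check the app locally, assuming Node.js and npm are installed on your machine:
$ npm install
$ node server.js
Then, in another terminal, requesting http://localhost:8080 (for example with curl) should return the Hello world response. Stop the server with Ctrl+C before moving on.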
In the next steps, we'll look at how you can run this app inside a Docker container using the official Docker image. First, you'll need to build a Docker image of your app.

Creating a Dockerfile

Create an empty file called Dockerfile:
touch Dockerfile
Open the Dockerfile in your favorite text editor.
The first thing we need to do is define the image we want to build from. Here we will use the latest LTS (long term support) version argon of node available from the Docker Hub:
FROM node:argon
Next we create a directory to hold the application code inside the image; this will be the working directory for your application:
# Create app directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
This image comes with Node.js and NPM already installed, so the next thing we need to do is install your app's dependencies using the npm binary:
# Install app dependencies
COPY package.json /usr/src/app/
RUN npm install
To bundle your app's source code inside the Docker image, use the COPY instruction:
# Bundle app source
COPY . /usr/src/app
Your app binds to port 8080 so you'll use the EXPOSE instruction to have it mapped by the docker daemon:
EXPOSE 8080
Last but not least, define the command to run your app using CMD which defines your runtime. Here we will use the basic npm start which will run node server.js to start your server:
CMD [ "npm", "start" ]
Your Dockerfile should now look like this:
FROM node:argon

# Create app directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Install app dependencies
COPY package.json /usr/src/app/
RUN npm install

# Bundle app source
COPY . /usr/src/app

EXPOSE 8080
CMD [ "npm", "start" ]
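Optionally, you can create a .dockerignore file in the same directory so that your local node_modules directory and debug logs are not copied into the image by the COPY instruction. This is a common convention rather than a requirement of this tutorial:
node_modules
npm-debug.log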

Building your image

Go to the directory that has your Dockerfile and run the following command to build the Docker image. The -t flag lets you tag your image so it's easier to find later using the docker images command:
$ docker build -t <your username>/node-web-app .
Your image will now be listed by Docker:
$ docker images

# Example
REPOSITORY                      TAG        ID              CREATED
node                            argon      539c0211cd76    3 weeks ago
<your username>/node-web-app    latest     d64d3505b0d2    1 minute ago

Run the image

Running your image with -d runs the container in detached mode, leaving the container running in the background. The -p flag redirects a public port to a private port inside the container. Run the image you previously built:
$ docker run -p 49160:8080 -d <your username>/node-web-app
Print the output of your app:
# Get container ID
$ docker ps

# Print app output
$ docker logs <container id>

# Example
Running on http://localhost:8080
If you need to go inside the container you can use the exec command:
# Enter the container
$ docker exec -it <container id> /bin/bash

Test

To test your app, get the port of your app that Docker mapped:
$ docker ps

# Example
ID            IMAGE                                COMMAND    ...   PORTS
ecce33b30ebf  <your username>/node-web-app:latest  npm start  ...   49160->8080
In the example above, Docker mapped the 8080 port inside of the container to port 49160 on your machine.
Now you can call your app using curl (install if needed via: sudo apt-get install curl):
$ curl -i localhost:49160

HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html; charset=utf-8
Content-Length: 12
Date: Sun, 02 Jun 2013 03:53:22 GMT
Connection: keep-alive

Hello world
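When you are finished testing, you can stop and remove the container, substituting the container ID from docker ps:
$ docker stop <container id>
$ docker rm <container id>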
We hope this tutorial helped you get a simple Node.js application up and running on Docker.

Thursday, 11 August 2016

Storm vs. Spark Streaming: Side-by-side comparison


Overview

Both Storm and Spark Streaming are open-source frameworks for distributed stream processing. But there are important differences, as you will see in the following side-by-side comparison.

Processing Model, Latency

Although both frameworks provide scalability and fault tolerance, they differ fundamentally in their processing model. Whereas Storm processes incoming events one at a time, Spark Streaming batches up events that arrive within a short time window before processing them. Thus, Storm can achieve sub-second latency of processing an event, while Spark Streaming has a latency of several seconds.

Fault Tolerance, Data Guarantees

However, the tradeoff is in fault tolerance and data guarantees. Spark Streaming provides better support for stateful computation that is fault tolerant. In Storm, each individual record has to be tracked as it moves through the system, so Storm only guarantees that each record will be processed at least once, but allows duplicates to appear during recovery from a fault. That means mutable state may be incorrectly updated twice.

Spark Streaming, on the other hand, need only track processing at the batch level, so it can efficiently guarantee that each mini-batch will be processed exactly once, even if a fault such as a node failure occurs. [Actually, Storm's Trident library also provides exactly once processing. But, it relies on transactions to update state, which is slower and often has to be implemented by the user.]

Storm vs. Spark Streaming comparison.

Summary

In short, Storm is a good choice if you need sub-second latency and no data loss. Spark Streaming is better if you need stateful computation, with the guarantee that each event is processed exactly once. Spark Streaming programming logic may also be easier because it is similar to batch programming, in that you are working with batches (albeit very small ones).

Implementation, Programming API

Implementation

Storm is primarily implemented in Clojure, while Spark Streaming is implemented in Scala. This is something to keep in mind if you want to look into the code to see how each system works or to make your own customizations. Storm was developed at BackType and Twitter; Spark Streaming was developed at UC Berkeley.

Programming API

Storm comes with a Java API, as well as support for other languages. Spark Streaming can be programmed in Scala as well as Java.

Batch Framework Integration

One nice feature of Spark Streaming is that it runs on Spark. Thus, you can use the same (or very similar) code that you write for batch processing and/or interactive queries in Spark, on Spark Streaming. This reduces the need to write separate code to process streaming data and historical data.

Storm vs. Spark Streaming: implementation and programming API.

Summary

Two advantages of Spark Streaming are that (1) it is not implemented in Clojure :) and (2) it is well integrated with the Spark batch computation framework.

Production, Support

Production Use

Storm has been around for several years and has run in production at Twitter since 2011, as well as at many other companies. Meanwhile, Spark Streaming is a newer project; its only production deployment (that I am aware of) has been at Sharethrough since 2013.

Hadoop Distribution, Support

Storm is the streaming solution in the Hortonworks Hadoop data platform, whereas Spark Streaming is in both MapR's distribution and Cloudera's Enterprise data platform. In addition, Databricks is a company that provides support for the Spark stack, including Spark Streaming.

Cluster Manager Integration

Although both systems can run on their own clusters, Storm also runs on Mesos, while Spark Streaming runs on both YARN and Mesos.

Storm vs. Spark Streaming: production and support.

Summary

Storm has run in production much longer than Spark Streaming. However, Spark Streaming has the advantages that (1) it has a company dedicated to supporting it (Databricks), and (2) it is compatible with YARN.


Further Reading

For an overview of Storm, see these slides.

For a good overview of Spark Streaming, see the slides to a Strata Conference talk. A more detailed description can be found in this research paper.

Update: A couple of readers have mentioned this other comparison of Storm and Spark Streaming from Hortonworks, written in defense of Storm's features and performance.

April, 2015: Closing off comments now, since I don't have time to answer questions or keep this doc up-to-date.
