Monday 1 August 2011

PostgreSQL 9.1, meet MySQL

So having spent the last few months with my head buried deep in a project at work, I finally managed to get back to my previous hacking on SQL/MED at the weekend after the kids went away for a week (good $DEITY it's quiet here)! Within a couple of hours, I had my half-baked Foreign Data Wrapper for MySQL up and running, and am now able to create foreign table objects in PostgreSQL 9.1 that map either directly to tables in a MySQL database, or to queries run on the MySQL server.

Here's an example:

-- Create the required functions for the FDW.
CREATE FUNCTION mysql_fdw_handler()
RETURNS fdw_handler
AS '$libdir/mysql_fdw'
LANGUAGE C STRICT;

CREATE FUNCTION mysql_fdw_validator(text[], oid)
RETURNS void
AS '$libdir/mysql_fdw'
LANGUAGE C STRICT;

-- Create the data wrapper or "transport".
CREATE FOREIGN DATA WRAPPER mysql_fdw
HANDLER mysql_fdw_handler
VALIDATOR mysql_fdw_validator;

-- Create the foreign server, a pointer to the MySQL server.
CREATE SERVER mysql_svr
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (address '127.0.0.1', port '3306');

-- Create one or more foreign tables on the MySQL server. The first of
-- these maps to a remote table, whilst the second uses an SQL query.
CREATE FOREIGN TABLE employees (
id integer,
name text,
address text)
SERVER mysql_svr
OPTIONS (table 'hr.employees');

CREATE FOREIGN TABLE ex_staff (
id integer,
name text,
address text)
SERVER mysql_svr
OPTIONS (query 'SELECT * FROM hr.employees WHERE date_left IS NOT NULL');


-- Create a user mapping to tell the FDW the username/password to
-- use to connect to MySQL, for PUBLIC. This could be done on a per-
-- role basis.
CREATE USER MAPPING FOR PUBLIC
SERVER mysql
OPTIONS (username 'dpage', password '');

So let's try it out. Here's the test table in MySQL:

mysql> SELECT * FROM employees;
+----+-------------+----------------------------+------------+
| id | name | address | date_left |
+----+-------------+----------------------------+------------+
| 1 | Dave Page | 27 High Street, Georgetown | NULL |
| 2 | Fred Bloggs | 46 Mill Road, Klasborough | NULL |
| 3 | Fred Bloggs | 24 The Wharf, Westhampton | 2010-05-23 |
+----+-------------+----------------------------+------------+
3 rows in set (0.00 sec)

And here we are in PostgreSQL 9.1 beta 3:

postgres=# SELECT * FROM employees;
id | name | address
----+-------------+----------------------------
1 | Dave Page | 27 High Street, Georgetown
2 | Fred Bloggs | 46 Mill Road, Klasborough
3 | Fred Bloggs | 24 The Wharf, Westhampton
(3 rows)

postgres=# SELECT * FROM ex_staff;
id | name | address
----+-------------+---------------------------
3 | Fred Bloggs | 24 The Wharf, Westhampton
(1 row)

For the curious, here's what the EXPLAIN output looks like:

postgres=# EXPLAIN SELECT * FROM employees;
QUERY PLAN
----------------------------------------------------------------
Foreign Scan on employees (cost=10.00..13.00 rows=3 width=68)
Local server startup cost: 10
MySQL query: SELECT * FROM hr.employees
(3 rows)

Pretty neat huh? There are a couple of limitations in the current implementation:
  • No attempt is currently made to push down quals (WHERE clauses) to the MySQL server, so every row MySQL finds is returned to PostgreSQL and filtered there. There's no defined API for this in PostgreSQL yet, and it's not immediately clear how to build something more complex than the simple example I used in my Redis FDW that would be required for a remote relational database. That said, you can build WHERE clauses into the foreign table definition of course.
  • The MySQL C API doesn't seem to offer a simple way to either randomly access a result set, or at least reset the cursor to the first row, unless you copy the entire resultset to the client (PostgreSQL in this case). Because we need to be able to return to the first row if PostgreSQL calls the Rescan function, we therefore currently copy the entire resultset, rather than reading it from the server, on demand.
Aside from minor tweaks, this is probably about as far as I'll take this little project for now. I'll be talking about it at both Postgres Open 2011 in Chicago in September, and PGBR 2011 in São Paulo in November - hopefully I'll see you there.

The MySQL FDW source code is available on Github, and uses the PostgreSQL licence.

2 comments:

  1. It's great that you're doing this, but it won't actually be readier for production than DBI-Link, which preceded it by seven years, until it actually pushes quals and/or pulls only part of the result set into memory. The point of doing these things using server APIs is to have them do things that couldn't be done in a client such as DBI-Link.

    That said, it's very exciting to see this finally come closer to fruition. Kudos!

    ReplyDelete
  2. Have you tried the more direct HANDLER interface? This way you could use an index.

    ReplyDelete