Friday, February 27, 2015

MySQL: Another Ranking trick

I just read SQL: Ranking without self join, in which Shlomi Noach shares a nice MySQL-specific trick based on user-defined variables to compute rankings. 

Shlomi's trick reminds me somewhat of the trick I came across a little over a year ago to calculate percentiles. At that time, several people pointed out to me too that using user-defined variables in this way can be unreliable.

The problem with user-defined variables

So what is the problem exactly? Well, whenever a query assigns to a variable, and that same variable is read in another part of the query, you're on thin ice. That's because the result of the read is likely to differ depending on whether the assignment took place before or after the read. Not surprising when you think about it - the whole point of variable assignment is to change its value, which by definition causes a different result when subsequently reading the variable (unless you assigned the already assigned value of course, duh...). 

Now read that previous statement carefully - the word subsequently is all-important.

See, that's the problem. The semantics of a SQL SELECT statement is to obtain a (tabular) resultset - it does not specify an algorithm to construct that resultset. It is the job of the RDBMS to figure out an algorithm, and thus you can't be sure in what order individual expressions (including variable evaluation and assignment) are executed. 

The MySQL manual states it like this:

The order of evaluation for user variables is undefined and may change based on the elements contained within a given query. In SELECT @a, @a := @a+1 ..., you might think that MySQL will evaluate @a first and then do an assignment second, but changing the query (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may change the order of evaluation.

The general rule is never to assign a value to a user variable in one part of a statement and use the same variable in some other part of the same statement. You might get the results you expect, but this is not guaranteed.

So what good are these variables anyway?

On the one hand, this looks really lame: can't MySQL just figure out the correct order of doing the calculations? Well, that is one way of looking at it. But there is an equally valid reason not to do that. If the calculations were allowed to influence execution order, it would drastically reduce the number of ways available to optimize the statement. 

This begs the question: Why is it possible at all to assign values to user-defined variables? The answer is quite simple: you can use them to pass values between statements. My hunch is the variables were created in the olden days to overcome some limitations resulting from the lack of support for subqueries. Having variables at least enables you to execute a query and assign the result temporarily for use in a subsequent statement. For example, to find the student with the highest score, you can do:
mysql> select @score:=max(score) from score;
+--------------------+
| @score:=max(score) |
+--------------------+
|                 97 |
+--------------------+
1 row in set (0.00 sec)

mysql> select * from score where score = @score;
+----------+--------------+-------+
| score_id | student_name | score |
+----------+--------------+-------+
|        2 | Gromit       |    97 |
+----------+--------------+-------+
1 row in set (0.03 sec)
There is nothing wrong with this approach - problems start arising only when reading and writing the same variable in one and the same statement.

Another way - serializing the set with GROUP_CONCAT


Anyway, the percentile post I just linked to contains another solution for that problem that relies on GROUP_CONCAT. It turns out we can use the same trick here.

(Some people may like to point out that using GROUP_CONCAT is not without issues either, because it may truncate the list in case the pre-assigned string buffer is not large enough. I wrote about dealing with that limitation in several places, and I still recommend setting the group_concat_max_len server variable to the value of the max_allowed_packet server variable, like so:
SET @@group_concat_max_len := @@max_allowed_packet;
)

The best way to understand how it works is to think of the problem in a few steps. First, we make an ordered list of all the values we want to rank. We can do this with GROUP_CONCAT like this:
mysql> SELECT  GROUP_CONCAT(
    ->             DISTINCT score
    ->             ORDER BY score  DESC
    ->         )                   AS scores
    -> FROM    score
    -> ;
+-------------+
| scores      |
+-------------+
| 97,95,92,85 |
+-------------+
1 row in set (0.00 sec)

Now that we have this list, we can use the FIND_IN_SET function to look up the position of any particular value contained in the list. Because the list is ordered in descending order (due to the ORDER BY ... DESC), and contains only unique values (due to the DISTINCT), this position is in fact the rank number. For example, if we want to know the rank of all scores with the value 92, we can do:
mysql> SELECT FIND_IN_SET(92, '97,95,92,85');
+--------------------------------+
| FIND_IN_SET(92, '97,95,92,85') |
+--------------------------------+
|                              3 |
+--------------------------------+
1 row in set (0.00 sec)
So, the answer is 3 because 92 is the third entry in the list.

(If you're wondering how it's possible that we can pass the integer 92 as the first argument to FIND_IN_SET: the function expects string arguments, and automatically converts any non-string value we pass to a string. In the case of the integer 92, it is silently converted to the string '92'.)
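You can easily verify that yourself; both of the following calls should return 3:
SELECT FIND_IN_SET(92,   '97,95,92,85') AS int_argument
,      FIND_IN_SET('92', '97,95,92,85') AS string_argument;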

Of course, we aren't really interested in looking up ranks for individual numbers one at a time; rather, we'd like to combine this with a query on the scores table that does it for us. Likewise, we don't really want to manually supply the list of values as a string constant, we want to substitute that with the query we wrote to generate that list.
So, we get:
mysql> SELECT score_id, student_name, score
    -> ,      FIND_IN_SET(
    ->            score
    ->        ,  (SELECT  GROUP_CONCAT(
    ->                        DISTINCT score
    ->                        ORDER BY score  DESC
    ->                    )
    ->            FROM    score)
    ->        ) as rank
    -> FROM   score;
+----------+--------------+-------+------+
| score_id | student_name | score | rank |
+----------+--------------+-------+------+
|        1 | Wallace      |    95 |    2 |
|        2 | Gromit       |    97 |    1 |
|        3 | Shaun        |    85 |    4 |
|        4 | McGraw       |    92 |    3 |
|        5 | Preston      |    92 |    3 |
+----------+--------------+-------+------+
5 rows in set (0.00 sec)

Alternatively, if you think that subqueries are for the devil, you can rewrite this as a CROSS JOIN, like so:
SELECT      score_id, student_name, score
,           FIND_IN_SET(
                score
            ,   scores
            ) AS rank
FROM        score
CROSS JOIN (SELECT  GROUP_CONCAT(
                       DISTINCT score
                       ORDER BY score  DESC
                    ) AS scores
            FROM    score) scores

Now that we have a solution, let's see how it compares to Shlomi's original method. To do this, I am using the payment table from the sakila sample database.

First, Shlomi's method:
mysql> SELECT   payment_id
    -> ,        amount
    -> ,        @prev := @curr
    -> ,        @curr := amount
    -> ,        @rank := IF(@prev = @curr, @rank, @rank+1) AS rank
    -> FROM     sakila.payment
    -> ,       (SELECT @curr := null, @prev := null, @rank := 0) sel1
    -> ORDER BY amount DESC;
+------------+--------+----------------+-----------------+------+
| payment_id | amount | @prev := @curr | @curr := amount | rank |
+------------+--------+----------------+-----------------+------+
|        342 |  11.99 |           NULL |           11.99 |    1 |
.        ... .  ..... .          ..... .           ..... .    . .
|      15456 |   0.00 |           0.00 |            0.00 |   19 |
+------------+--------+----------------+-----------------+------+
16049 rows in set (0.09 sec)

Wow! It sure is fast :) Now, the GROUP_CONCAT solution, using a subquery:
mysql> SELECT payment_id, amount
    -> ,      FIND_IN_SET(
    ->            amount
    ->        ,  (SELECT  GROUP_CONCAT(
    ->                        DISTINCT amount
    ->                        ORDER BY amount  DESC
    ->                    )
    ->            FROM    sakila.payment)
    ->        ) as rank
    -> FROM   sakila.payment;
+------------+--------+------+
| payment_id | amount | rank |
+------------+--------+------+
|          1 |   2.99 |   15 |
.          . .   .... .   .. .
|      16049 |   2.99 |   15 |
+------------+--------+------+
16049 rows in set (0.14 sec)


(In case you're wondering why the results are different: this is because the result set for Shlomi's solution is necessarily ordered by ascending rank (or descending amount - same difference). To obtain the identical result, you need to add an ORDER BY clause to my query. But since the point was to calculate the ranks, I didn't bother. Of course, adding an ORDER BY could slow things down even more.)

Quite a bit slower, bummer. But at least we can't run into nasties with the user variables anymore. For this data set, I get about the same performance with the CROSS JOIN, but I should warn that I did not do a real benchmark.

Conclusion

Don't fall into the trap of reading and writing the same user-defined variable in the same statement. Although it seems like a great device and can give you very good performance, you cannot really control the order of reads and writes. Even if you can, you must check it again whenever you have reason to believe the query will be executed differently by the server. This is of course the case whenever you upgrade the server, but seemingly harmless changes like adding an index to a table may also change the order of execution.

In almost all cases where people want to read and write the same user variable within the same query, they are dealing with a kind of serialization problem. They are trying to maintain state in a variable in order to use it across rows. In many cases, the right way to do that is to use a self-join. But this may not always be feasible, as pointed out in Shlomi's original post. For example, rewriting the payment rank query using a self join is not going to make you happy. 
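(To illustrate the kind of rewrite I mean, here is a minimal self-join sketch that computes the same dense rank by counting the distinct higher amounts - and it is exactly the semi-cartesian join that tends to hurt on larger tables:)
SELECT     p1.payment_id
,          p1.amount
,          COUNT(DISTINCT p2.amount) + 1 AS rank
FROM       sakila.payment p1
LEFT JOIN  sakila.payment p2
ON         p2.amount > p1.amount
GROUP BY   p1.payment_id
,          p1.amount;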

Often, there is a way out. You can use GROUP_CONCAT to serialize a set of rows. Granted, you need at least one pass for that, and another one to do something useful with the result, but this is still a lot better than dealing with semi-cartesian self join issues.

Wednesday, February 25, 2015

Something awesome in InnoDB -- the insert buffer

The InnoDB insert buffer significantly reduces the disk IO required to support a change intensive workload when the database does not fit in the buffer pool. Eventually I must begin calling it the change buffer as it does more in MySQL 5.5.
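(If you want to experiment yourself, here is a minimal sketch, assuming MySQL 5.5 or later, of how to control and inspect the change buffer; the insert buffer activity appears in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of the status output:)
SET GLOBAL innodb_change_buffering = 'all';  -- 'inserts' buffers only inserts, 'none' disables buffering
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';
-- look for the INSERT BUFFER AND ADAPTIVE HASH INDEX section in:
SHOW ENGINE INNODB STATUS\G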

The insert buffer reduces the disk IO required to maintain secondary indexes. For many servers that I care about it reduces secondary index maintenance IO by a factor of 2 to 5 -- something between 2:1 and 5:1. Someone reported a server today where the savings was 16:1. On the benchmark I describe in this note, the insert buffer makes InnoDB 36X faster and that rate is likely to approach 80X faster if I let the test run to completion. I think it can do even better with smarter management of the LRU.

Benchmarks provide a simple way to demonstrate the performance. I ran the Insert Benchmark on two servers with InnoDB. Both servers used a 16G buffer pool and loaded 1B rows using iibench.py and one connection. Each server can do ~1600 reads/second across 8 SAS disks and ~200 reads/second to 1 disk.

With the insert buffer enabled the test completed in ~65 hours. The row insert rate over the last 1M rows was ~2500/second. The row insert rate over the 1M rows from 595M to 596M was ~4500/second.

Rows inserted per second with the insert buffer enabled

Rows inserted per second with the insert buffer enabled and log scale for the y-axis

With the insert buffer disabled the test is still running after 450 hours. It has inserted ~596M rows and the row insert rate over the last 1M rows was ~125/second. I expect the rate to drop to ~30/second at which point the insert buffer makes InnoDB 80X faster.

Rows inserted per second with the insert buffer disabled

Rows inserted per second with the insert buffer disabled and log scale for the y-axis

The poor performance from InnoDB with the insert buffer disabled isn't a bug in InnoDB. Any database engine that does secondary index maintenance via update-in-place at statement completion will have similar performance. Alas, while you can repeat my results with Percona Server, you cannot with official MySQL until feature request 59214 is fixed. The benefits from the insert buffer stop as soon as it gets full, and for change-intensive workloads it tends to stay full because the background IO thread does not merge changes from it fast enough.

Rows inserted per second for both

Rows inserted per second for both and log scale for the y-axis

High Rate insertion with MySQL and Innodb

I am again working with a system that needs a high insertion rate for data which generally fits in memory. The last time I worked with a similar system it used MyISAM and was built using multiple tables. Using multiple key caches was a good solution at that time, and we could get over 200K inserts/sec.
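(For reference, the multiple key cache technique looks roughly like this; the table name and cache size below are hypothetical:)
SET GLOBAL hot_cache.key_buffer_size = 256 * 1024 * 1024;  -- create a dedicated 256M key cache
CACHE INDEX t1 IN hot_cache;                               -- assign the MyISAM table's indexes to it
LOAD INDEX INTO CACHE t1;                                  -- optionally preload the indexes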
This time I worked with InnoDB tables. It was a different system with a different table structure, not to mention different hardware, so it can't be compared directly; still, it is nice to see that you can get numbers just as high with InnoDB too.
I will spare you all the experiments we went through and just share the final numbers. On an 8-core Opteron box we were able to achieve 275K inserts/sec, at which point the load started to become IO bound because of log writes and flushing of dirty buffers. I'm confident you can get to 400K+ inserts/sec on faster hardware and disks (say, better RAID or flash), which is a very cool number. Of course, mind you, this is in-memory insertion into a simple table; a table with long rows and a bunch of indexes will see lower numbers.
So what's the deal? First, MySQL 5.5 (frankly, I did not try Percona Server 5.1 in this case). With MySQL 5.1 and the InnoDB Plugin we could see 40%+ CPU wasted on mutex spinlocks (per oprofile), which went down to about 15% in MySQL 5.5.8 with 8 concurrent threads. This shows both that there are substantial gains and that there is room for more performance optimization. Dmitri has good suggestions on tuning MySQL 5.5, and this is what I used as a starting point. Using multiple buffer pools with innodb_buffer_pool_instances=8 was very important.
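(This setting is not dynamic, so it goes in my.cnf; a minimal sketch, with an example buffer pool size rather than the one used in this system:)
-- in my.cnf, [mysqld] section; requires a server restart:
--   innodb_buffer_pool_size      = 16G
--   innodb_buffer_pool_instances = 8
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';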
Second thing: partitioning. Unfortunately, MySQL 5.5 leaves a huge bottleneck for write workloads in place – there is a per-index rw-lock, so only one thread can insert an index entry at a time, which can be a significant bottleneck. We got 2x+ better performance by hash-partitioning the table by one of the columns, and I would expect the gains to be higher with more cores. PARTITION BY HASH(col) PARTITIONS 8 is what we used. This looks like a good workaround, but remember that partitioning can impact the performance of your SELECT queries dramatically.
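(A minimal sketch of such a table definition, with hypothetical table and column names; note that MySQL requires the partitioning column to be part of every unique key on the table:)
CREATE TABLE events (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
,   device_id INT UNSIGNED    NOT NULL
,   payload   VARCHAR(255)
,   PRIMARY KEY (id, device_id)  -- partitioning column included in the primary key
) ENGINE=InnoDB
PARTITION BY HASH(device_id) PARTITIONS 8;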
The inserts in this case are of course bulk inserts; using single-value inserts you would get much lower numbers. In fact, we used LOAD DATA INFILE, which is one of the ways to get great performance (the competing way is to use prepared multi-row INSERT statements).
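(Both variants, sketched against the hypothetical events table above; the CSV path is just an example:)
-- multi-row INSERT:
INSERT INTO events (device_id, payload)
VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- LOAD DATA INFILE from a CSV file:
LOAD DATA INFILE '/tmp/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
(device_id, payload);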
We need to try the new Percona Server 5.5 on our Cisco box to see if we can get to 500K inserts/sec – that would be a nice round number :)

MySQL Partitioning – can save you or kill you

I have wanted for a while to write about using MySQL partitioning for performance optimization, and I just got a relevant customer case to illustrate it. First you need to understand how partitions work internally. At a low level, partitions are separate tables. This means that when you do a lookup by the partitioning key you only look at one (or some) of the partitions, while lookups by other keys need to perform a lookup in all partitions and hence can be a lot slower. The gain for updates typically comes from having a smaller BTREE on the active partition(s), which allows it to fit in memory much better. Having potentially fewer levels in the BTREE is not that significant an issue.

So let's look at an example:
The access pattern for this table is to look up data by “uu”, which holds UUID values, along with a number of deletes by “id” and a bunch of inserts. The deletes are mainly clustered around the most recent id values.
The table (and its indexes) is much larger than the buffer pool.
The first problem was replication lag, which was mainly due to modifying the uu index. This is because UUID() spreads value prefixes very well, effectively giving almost uniform access to the whole BTREE. To solve this problem, partitioning was a good choice – PARTITION BY HASH (id div 10000000) PARTITIONS 32 – this splits the data into 32 partitions, placing sequential ranges of 10M id values in the same partition – very handy if you have very active access to values which have been added to the table recently.
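(A sketch of the corresponding DDL, with a hypothetical table name and assuming id is the primary key and the uu index is non-unique – MySQL requires every unique key to include the partitioning columns:)
ALTER TABLE events_by_uuid
    PARTITION BY HASH (id DIV 10000000) PARTITIONS 32;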
Using this trick, replication could be sped up about 10 times, as the couple of partitions which were actively used could fit in the buffer pool completely, so replication became CPU bound (single thread) instead of IO bound.
You could celebrate, but hey... you need to check the impact on the master too. The master, in its turn, was getting a lot of lookups by the uu value, which is not part of the partitioning key, and hence we're looking at 32 logical lookups, one per partition. True, only one of the partitions would contain the value, but many of them will require physical IO and going down to the leaf level to verify that the value does not exist, which reduced performance for random selects by UUID from 400 to 20 per second (from a single thread).
Decreasing the number of partitions made replication less efficient, but the number of selects the table could deliver increased, and there seems to be a reasonable middle ground which would allow replication to perform better than it does now, while selects are still delivered at the rate the system needs.
What is the takeaway? When you're creating partitions, think clearly about what you're trying to achieve. Partitioning is not some magic feature which just makes everything a lot faster. I've seen some people applying partitioning to basically all of their tables without much thought, and believe me, the results were not pretty.

Monday, February 23, 2015

Browser Detection With PHP Browscap: A Better Solution Than get_browser()

Basically, a long time ago, browscap.ini and browscap.dll were created by Microsoft as an easy way to get information about the user’s browser.
Browser detection these days is a mess! With over 100,000 known browser userAgents we need an easier way of getting info about the user’s browser without checking all of the known userAgents manually.
PHP has a built-in function for using browscap.ini, but unfortunately that function doesn’t always work well, because:
a) most servers either don’t have a browscap.ini file or have an outdated version
b) get_browser() really slows down your application (parsing architecture, caching, execution time, etc.)
c) the “browscap” directive has to be set in php.ini, which is often not possible on shared-hosting systems, and the database has to be kept up to date
Code example using get_browser():
<?php
echo $_SERVER['HTTP_USER_AGENT'] . "\n\n";
$browser = get_browser(null, true);
print_r($browser);
?>
Today we can find many different standalone and, of course, improved browser-detection classes in PHP, so I want to introduce one here:
phpbrowscap – Browscap PHP Project
Browscap is a standalone class for both PHP4 and PHP5 that gets around the limitations of get_browser() and manages the whole thing. It offers methods to update, cache, adapt and get details about every supplied user agent on a standalone basis.
Code example of Browscap:
<?php
require('Browscap.php');
$bc = new Browscap('cache');
$bc->localFile = 'php_browscap.ini';
$data = $bc->getBrowser();
?>

To update your browscap.ini file automatically, which is also necessary here, have a look at phcomp.co.uk/Packages/UpdateBrowsCap.html or use a cURL cronjob to fetch a new ini file.

Monday, February 9, 2015

How to remove Android Apps through ADB

ADB is a toolkit provided by Google to access your Android device from your desktop.  Before reading the steps below, we assume you know how to install a non-market application through ADB (if not, please see: How to Install Non-Market .Apk Application on an Android Phone).
A few reasons you may want to remove an APK via ADB are the following:
  • Removing the application via Manage Applications appears not to work, perhaps because the developer packaged the application poorly.
  • Convenience: if you are already at the adb shell and have run into an installation problem because the application already exists.
  • Geekness: you wanna be just too cool.
The following sequence of commands will walk you through how to remove an app using ADB.  We also warn that the following steps can harm your phone.  Do not remove APKs flippantly.
  1. Connect your phone to the computer and successfully log into the adb shell.  (refer to the adb setup)
  2. su (Login as root)
  3. mount -o remount,rw -t rfs /dev/stl5 /system (the ‘rw’ allows you to modify the system directory.)
  4. rm -r /system/app/[AppName].apk (or apps)
  5. … You may repeat #4 for other apps.
  6. mount -o remount,ro -t rfs /dev/stl5 /system (Set system to read only, ‘ro’, to prevent malicious activity.)
If you need to look up the AppName:
  1. cd /system/app
  2. ls *.apk (This will provide you with a list of APKs)
If you need to backup your app:
  1. cat [AppName].apk > /sdcard/[AppName].apk (This will backup the APK specified to your SD Card. Do this if you are unsure if it is safe to remove.)