跳至主要内容

SQL查询优化记录: MYSQL 中 FULL OUTER JOIN 的处理


从 FULL OUTER JOIN 到 UNION 再到 IFNULL

数据库中有一个记录对特定资源访问的表,名为tbl_navigation_logs(结构如下)。

idint(11)NOPRIauto_increment
customerint(11)YESMUL
sessionvarchar(100)NO
ipaddressint(11)NO
timedatetimeNO
uidvarchar(512)NO
portalint(11)YESMUL
titleint(11)YESMUL
itembigint(20)YESMUL
channelint(11)NO

generated by HeidiSQL

在开发业务逻辑的时候遇到一个分支查询:根据访问者来统计对某个特定资源的访问次数。仔细分析一下,一共有两种可能的情况——若customer字段非空,则根据 DISTINCT(`customer`, DATE(`time`))进行访问次数统计,即同一天内同一用户的若干次访问仅算作1次有效,这适用于登录用户的访问统计。否则,就根据DISTINCT(`session`)来统计访问次数。就业务逻辑而言,就是分别将这两种情况都统计出来,最后做一个加法求和。


第一种直观的思路,就是分别将两种分支情况用select语句选择出来,再做一个合并。这里,FULL OUTER JOIN 是合适的。可惜开发环境所使用的MySQL 5.1数据库不支持FULL OUTER JOIN的实现,只能用一个拐弯的UNION来实现……

select _t.* FROM
(
SELECT t1.vdate as `visitDate`,
(IFNULL(t1.vcount,0) + IFNULL(t2.vcount,0)) as `accessCount` FROM
(
SELECT DATE(`time`) AS `vdate`,
COUNT(DISTINCT `customer`, DATE(`time`)) AS `vcount`
FROM `tbl_navigation_log`
WHERE `customer` is not null
AND DATE_SUB(CURDATE(), INTERVAL 2 MONTH) < `time`
AND `title` = 47
GROUP BY `vdate` ORDER BY `vdate`
) AS `t1`
LEFT JOIN
(
SELECT DATE(`time`) AS `vdate`,
COUNT(DISTINCT `session`) AS `vcount`
FROM `tbl_navigation_log`
WHERE `customer` is null
AND DATE_SUB(CURDATE(), INTERVAL 2 MONTH) < `time`
AND `title` = 47
GROUP BY `vdate` ORDER BY `vdate`
) AS `t2`
ON `t1`.vdate = `t2`.vdate

UNION ALL

SELECT t2.vdate as `visitDate`,
(IFNULL(t1.vcount,0) + IFNULL(t2.vcount,0)) as `accessCount` FROM
(
SELECT DATE(`time`) AS `vdate`,
COUNT(DISTINCT `customer`, DATE(`time`)) AS `vcount`
FROM `tbl_navigation_log`
WHERE `customer` is not null
AND DATE_SUB(CURDATE(), INTERVAL 2 MONTH) < `time`
AND `title` = 47
GROUP BY `vdate` ORDER BY `vdate`
) AS `t1`
RIGHT JOIN
(
SELECT DATE(`time`) AS `vdate`,
COUNT(DISTINCT `session`) AS `vcount`
FROM `tbl_navigation_log`
WHERE `customer` is null
AND DATE_SUB(CURDATE(), INTERVAL 2 MONTH) < `time`
AND `title` = 47
GROUP BY `vdate` ORDER BY `vdate`
) AS `t2`
ON `t1`.vdate = `t2`.vdate WHERE `t1`.vdate is null

) AS _t order by _t.visitDate;


后来经过同事提醒,找到了另一种解决办法,利用MySQL自带的函数IFNULL构造一个非空字符串,再用DISTINCT来基数。


SELECT DATE(`time`) AS `vdate`,
COUNT(DISTINCT IFNULL(CONCAT('', `customer`), CONCAT('', `session`)))
AS `vcount`
FROM `tbl_navigation_log`
WHERE
DATE_SUB(CURDATE(), INTERVAL 2 MONTH) < `time`
AND `title` = 47
GROUP BY `vdate` ORDER BY `vdate`;


第二种方法仅查询单表一次,从效率上来看比第一种方法明显快了不少,实际的运行结果显示,至少快三倍。

评论