title: mysql踩坑系列之查询空格 tags:
- mysql
- padspace
- like
- trailing categories: mysql date: 2017-08-11 18:18:53
业务背景
某个雷电交加上线夜的凌晨 zzz:
测试小伙伴突然反馈了问题,某个业务去重失败导致db中重复插入数据。
知识背景
mysql作为传说中的开源界RDBMS翘楚,在官方文档上如是描述
11.4.1 The CHAR and VARCHAR Types
The
CHAR
andVARCHAR
types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.The
CHAR
andVARCHAR
types are declared with a length that indicates the maximum number of characters you want to store. For example,CHAR(30)
can hold up to 30 characters.The length of a
CHAR
column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. WhenCHAR
values are stored, they are right-padded with spaces to the specified length. WhenCHAR
values are retrieved, trailing spaces are removed unless the SQL mode is enabled.Values in
VARCHAR
columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of aVARCHAR
is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. See .In contrast to
CHAR
,VARCHAR
values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.For
VARCHAR
columns, trailing spaces in excess of the column length are truncated prior to insertion and a warning is generated, regardless of the SQL mode in use. ForCHAR
columns, truncation of excess trailing spaces from inserted values is performed silently regardless of the SQL mode.
VARCHAR
values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.
这是啥意思?就是说mysql中默认情况下查询的时候都会去掉末尾的空格匹配,这个和sql mode没有关系。
OK 那我们来使用一下正常的mysql服务器做一下测试
SELECT 'a' = 'a', 'a' = ' a', 'a' = 'a ', 'a' = ' a ', 'a' = 'a ', 'ab' = 'a b', 'ab' = 'ab ', 'ab' = 'Ab ', 'a' LIKE 'a', 'a' LIKE 'a ', 'a' LIKE ' a', 'a' LIKE ' a ', 'a' LIKE 'A', BINARY 'ab' = 'aB ', 'ab' = 'aB '\G复制代码
*************************** 1. 行 *************************** 'a' = 'a' : 1 'a' = ' a' : 0 'a' = 'a ' : 1 'a' = ' a ' : 0 'a' = 'a ' : 1 'ab' = 'a b' : 0 'ab' = 'ab ' : 1 'ab' = 'Ab ' : 1 'a' LIKE 'a' : 1 'a' LIKE 'a ' : 0 'a' LIKE ' a' : 0 'a' LIKE ' a ' : 0 'a' LIKE 'A' : 1 BINARY 'ab' = 'aB ': 0 'ab' = 'aB ' : 1 1 行于数据集 (0.01 秒) 复制代码
也就是说mysql在varchar char text的字段中匹配会忽略大小写和末尾空格
tips 这个知识点其实也知道,没有引起重视
踩坑原因
在做数据匹配时可能出现如下几种情况
- 原先db中存在名称末尾有空格和名称末尾有空格数据各一条
- 原先db中存在名称末尾无空格一条
- 原先db中存在末尾名称有空格一条
无论从sql中作为参数的有否空格我们都可以认为是没有空格(默认情况下等价)
那么第一个情况查询可能查出两条数据后面可能只能查出单条数据
所以某些情况利用一些方案做的去重可能存在问题,需要重视
解决方案
某些重复敏感的数据可以
- 去空格,对于db中已存在的无能为力。。。 空格到处有,哪能除的尽?
- 直接使用like 以前使用like都是做匹配的,一般要求做前缀匹配,比如通配符不允许出现最左侧等等,现在对于varchar char 等类型字段使用like直接匹配效果更逼真!
参考