This time we will talk about how to build a simple query for selecting data in a MySQL database, of which charset was set to utf8_general_ci but we need to distinguish between data with, lets say, accents and data without accents (or others UTF-8 characters, like German umlauts), and going a little further, if possible that not distinguish between words that begin with uppercase or lowercase but have accents. Let’s see how we could implement it.
Defining our problem
Let us explain a little better what is our problem. We have a MySQL database where we have a table, for example, “users”, with several fields: id, name, etc. This database was created with utf8_general_ci charset. But what does this mean? Obviously, “utf8” stands for the charset that our database will manage. In second place we have “general”, which indicates that we are not going to a specific set of UTF-8 characters for any specific language (we could use “utf8_spanish_ci” for the character set of the Spanish language, so we couldn not use Chineese characters in that case), but generically UTF-8. Finally, “ci” means “Case Insensitive”, so that in our searches, the MySQL engine will not distinguish between “UPPERCASE”, “Uppercase” or “uppercase”. What is not so well known is that being Case Insensitive implies no distinction between Spanish accents or German umlauts, so that it does not distinguish between “ESPAÑA” or “España”.
Read More »How to distinguish uppercase, lowercase and UTF-8 characters in MySQL