@rolandwalker
Contributor

@rolandwalker rolandwalker commented Jan 7, 2021

Description

xref #915

The utf8 character set in current MySQL versions is not actually standards-compliant. The standards-compliant UTF-8 character set is spelled utf8mb4, and that should be mycli's default.
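To illustrate the gap: MySQL's utf8 (utf8mb3) stores at most three bytes per character, so any code point outside the Basic Multilingual Plane cannot be represented. A minimal sketch in Python; `fits_utf8mb3` is a hypothetical helper for illustration only and is not part of mycli:

```python
def fits_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Accented Latin text stays within the 3-byte limit.
print(fits_utf8mb3("na\u00efve caf\u00e9"))

# U+1F600 (an emoji) encodes to 4 bytes, which only utf8mb4 can store.
print(fits_utf8mb3("emoji \U0001F600"))
```

Any string for which this check fails would need utf8mb4 on the connection to round-trip intact.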

WIP because this should be researched/tested for MariaDB and Percona.
Edit: tested on MariaDB. Researched for Percona: pages such as https://www.percona.com/blog/2018/04/10/migrating-database-charsets-to-utf8mb4/ have no suggestion of incompatibility.

Checklist

  • I've added this contribution to the changelog.md.
  • I've added my name to the AUTHORS file (or it's already there).

@rolandwalker rolandwalker changed the title WIP: Default to standards-compliant utf8mb4 character set Default to standards-compliant utf8mb4 character set Jan 8, 2021
@rolandwalker rolandwalker requested a review from amjith January 8, 2021 14:03
@gfrlv
Member

gfrlv commented Jan 10, 2021

I think this default should depend on the server version: mysql < 5.5 cannot handle utf8mb4.

@rolandwalker
Contributor Author

@pasenor excellent point.

@rolandwalker rolandwalker changed the title Default to standards-compliant utf8mb4 character set WIP Default to standards-compliant utf8mb4 character set Jan 11, 2021
@rolandwalker
Contributor Author

Wait, how would that work? We set the charset before we make the connection.

@gfrlv
Member

gfrlv commented Jan 11, 2021

Oops, indeed, that's messy. But we should be able to call set_charset() on the PyMySQL connection object inside our SQLExecute once we know the version.
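A rough sketch of that idea, under stated assumptions: the connection is first opened with the older utf8 charset, the connection object exposes `get_server_info()` and `set_charset()`, and the 5.5 cutoff is the one mentioned in this thread. The helper names below are hypothetical and do not reflect how SQLExecute is actually structured:

```python
import re

# Assumption taken from this thread: servers older than 5.5 lack utf8mb4.
MIN_UTF8MB4_VERSION = (5, 5)


def parse_server_version(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '5.1.73-log' -> (5, 1, 73)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized server version: {version!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def charset_for_server(version: str) -> str:
    """Pick utf8mb4 when the server is new enough, otherwise fall back to utf8."""
    if parse_server_version(version) >= MIN_UTF8MB4_VERSION:
        return "utf8mb4"
    return "utf8"


def upgrade_charset_if_supported(conn) -> str:
    """Switch an already-open connection to utf8mb4 if the server supports it.

    `conn` is assumed to have been opened with charset 'utf8'.
    """
    charset = charset_for_server(conn.get_server_info())
    if charset == "utf8mb4":
        conn.set_charset(charset)
    return charset
```

Connecting conservatively and upgrading afterwards avoids needing the server version before the handshake, at the cost of one extra round trip on modern servers.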

@dveeden
Contributor

dveeden commented Oct 27, 2022

Note that utf8 is considered an alias for utf8mb3 and both MySQL and MariaDB are actively doing work to eventually change the alias to map to utf8mb4.

Important Change: A previous change renamed character sets having deprecated names prefixed with utf8_ to use utf8mb3_ instead. In this release, we rename the utf8_ collations as well, using the utf8mb3_ prefix; this is to make the collation names consistent with those of the character sets, not to rely any longer on the deprecated collation names, and to clarify the distinction between utf8mb3 and utf8mb4. The names using the utf8mb3_ prefix are now used exclusively for these collations in the output of SHOW statements such as SHOW CREATE TABLE, as well as in the values displayed in the columns of Information Schema tables including the COLLATIONS and COLUMNS tables. (Bug #33787300)

The utf8mb4 character set was introduced in MySQL 5.5.4, which was not a G.A. release (5.5.8 was the first G.A. release). So utf8mb4 should be used for MySQL 5.5 and newer.

@amjith
Member

amjith commented Apr 20, 2023

@rolandwalker Is this PR still valid? Should this be merged?

@rolandwalker
Contributor Author

@amjith yes we should do something about it. Will review.

@j-bennet j-bennet marked this pull request as draft October 16, 2023 06:13
@rolandwalker rolandwalker force-pushed the RW/utf8mb4-default-charset branch from 02d03ef to ecf6826 Compare January 21, 2026 16:00
@rolandwalker rolandwalker marked this pull request as ready for review January 21, 2026 16:04
@rolandwalker rolandwalker self-assigned this Jan 21, 2026
@rolandwalker rolandwalker changed the title WIP Default to standards-compliant utf8mb4 character set Default to standards-compliant utf8mb4 character set Jan 21, 2026
@rolandwalker
Contributor Author

Reviving this old PR!

The concern earlier was mostly that MySQL 5.5 didn't support the utf8mb4 charset. But 5.5 was released in 2010. At this point we can set the standards-compliant utf8mb4 charset and document how to downgrade it for any users of 5.5 or earlier.

The PR has also been updated to add a default_character_set to ~/.myclirc. In general, we should find a path to move away from supporting ~/.my.cnf on top of ~/.myclirc.

 * default to standards-compliant utf8mb4 character set
 * create a default_character_set key in ~/.myclirc which overrides
   any setting in ~/.my.cnf (previously the only way to set a default)
 * document how to connect to ancient versions of MySQL which lack
   this character set
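
The precedence described in the commit message could be sketched as follows. This is an illustration only: `resolve_charset` and its parameters are hypothetical names, and the fall-through to ~/.my.cnf when ~/.myclirc leaves the key unset is an assumption, not something confirmed by the PR's diff:

```python
from typing import Optional

# Default proposed by this PR.
DEFAULT_CHARSET = "utf8mb4"


def resolve_charset(myclirc_value: Optional[str], mycnf_value: Optional[str]) -> str:
    """Return the charset to connect with.

    Order of precedence (assumed): ~/.myclirc, then ~/.my.cnf, then the default.
    Empty strings are treated the same as unset values.
    """
    for candidate in (myclirc_value, mycnf_value):
        if candidate:
            return candidate
    return DEFAULT_CHARSET


# A user on an ancient server sets default_character_set = utf8 in ~/.myclirc.
print(resolve_charset("utf8", "latin1"))

# Nothing configured anywhere: the new default applies.
print(resolve_charset(None, None))
```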
@rolandwalker rolandwalker force-pushed the RW/utf8mb4-default-charset branch from ecf6826 to 592b84d Compare January 21, 2026 16:20
5 participants