Conversation

@kartalbas
Contributor

Summary

This PR adds support for getting column names when using MySQL 5.7, which doesn't include column names in the binary log by default.

What Changed

  • Added new option use_column_name_cache to enable column name fetching
  • Fetches column names from INFORMATION_SCHEMA.COLUMNS when binlog metadata is missing
  • Uses in-memory caching to avoid repeated database queries
  • Only runs when explicitly enabled (opt-in feature)

Why This Is Needed

  • MySQL 5.7 does not support the binlog_row_metadata=FULL setting
  • Without this feature, column names are unavailable in binlog events
  • Many legacy applications still run on MySQL 5.7 and cannot upgrade due to operational constraints, infrastructure dependencies, or business requirements
  • This feature enables these applications to continue using mysql-replication without requiring a database upgrade
  • Applications need column names to properly process row change events

How It Works

  • When use_column_name_cache=True is set, the library queries INFORMATION_SCHEMA for column names
  • Results are cached to improve performance
  • Falls back gracefully if the query fails
  • Default behavior remains unchanged (feature is disabled by default)

Testing

Tested with MySQL 5.7 databases where binlog metadata is not available.

@julien-duponchelle
Owner

Sorry, you have a failure in the test:
pkt = <pymysql.protocol.MysqlPacket object at 0x00007f9b9fdcd2f0>

def create_binlog_packet_wrapper(pkt):
    return BinLogPacketWrapper(
        pkt,
        self.stream.table_map,
        self.stream._ctl_connection,
        self.stream.mysql_version,
        self.stream._BinLogStreamReader__use_checksum,
        self.stream._BinLogStreamReader__allowed_events_in_packet,
        self.stream._BinLogStreamReader__only_tables,
        self.stream._BinLogStreamReader__ignored_tables,
        self.stream._BinLogStreamReader__only_schemas,
        self.stream._BinLogStreamReader__ignored_schemas,
        self.stream._BinLogStreamReader__freeze_schema,
        self.stream._BinLogStreamReader__ignore_decode_errors,
        self.stream._BinLogStreamReader__verify_checksum,
        self.stream._BinLogStreamReader__optional_meta_data,
    )

E TypeError: __init__() missing 1 required positional argument: 'enable_logging'

test_basic.py:621: TypeError

(I know it's not easy to run the test :( )
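
For readers unfamiliar with this kind of failure, here is a small sketch of how it arises: a caller that passes arguments positionally breaks as soon as the constructor gains a required parameter. The classes below are hypothetical stand-ins, not the real BinLogPacketWrapper, and giving the new parameter a default is only one possible remedy (updating the caller is the other); which one the fix commit uses is not shown in this thread.

```python
class PacketWrapper:
    """Hypothetical stand-in whose constructor requires enable_logging."""

    def __init__(self, packet, optional_meta_data, enable_logging):
        self.packet = packet
        self.optional_meta_data = optional_meta_data
        self.enable_logging = enable_logging


class CompatiblePacketWrapper:
    """Same stand-in, but the newer parameter has a default value."""

    def __init__(self, packet, optional_meta_data, enable_logging=True):
        self.packet = packet
        self.optional_meta_data = optional_meta_data
        self.enable_logging = enable_logging


def call_old_style(cls):
    """Call cls the way an outdated test helper would, with two positionals."""
    try:
        cls("pkt", False)
    except TypeError as exc:
        # Return the message so the mismatch can be inspected.
        return str(exc)
    return "ok"
```

`call_old_style(PacketWrapper)` returns a message naming the missing `enable_logging` argument, while `call_old_style(CompatiblePacketWrapper)` returns `"ok"`.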

@kartalbas
Contributor Author

Thank you, it is fixed. Locally it fails with "FAILED pymysqlreplication/tests/test_basic.py::TestBasicBinLogStreamReader::test_event_validation - pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'localhost' ([WinError 10061] No connection could be made because the target machine actively refused it)")" because I don't have MySQL on this machine, but the required fix is in the commit. I hope it works now.

@kartalbas
Contributor Author

Can you please try to merge again? I'm still using my branch in my fork.

kartalbas and others added 2 commits November 16, 2025 12:07
- Query INFORMATION_SCHEMA for column names when not in binlog
- Module-level cache to prevent repeated queries
- Opt-in via use_column_name_cache parameter
- Handle both dict and tuple cursor types
- Backward compatible (disabled by default)
@julien-duponchelle force-pushed the feature/mysql57-column-name-cache branch from 242dfaa to 0cce0cf on November 16, 2025 11:08
@julien-duponchelle merged commit a94c6c2 into julien-duponchelle:main on Nov 16, 2025
@julien-duponchelle
Owner

I'm releasing 1.0.11 with it.

@julien-duponchelle
Owner

Thanks a lot!
