Skip to content

MDEV-37316#5073

Draft
ParadoxV5 wants to merge 1 commit into
mainfrom
MDEV-37316
Draft

MDEV-37316#5073
ParadoxV5 wants to merge 1 commit into
mainfrom
MDEV-37316

Conversation

@ParadoxV5
Copy link
Copy Markdown
Contributor

Just a draft/preview that also serves as CI upload and backup; not final yet, not even close…


What problem is the patch trying to solve?
If some output changed that is not visible in a test case, what was it looking like before the change and how it's looking with this patch applied?
Do you think this patch might introduce side-effects in other parts of the server?
Anything from release notes?

To prepare for MDEV-37321, this commit moves them to a
`Remote_event_stream` class outside of `sql/`.
The `.cc` file will switch to Connector/C unbeknownst to `rpl_mi.cc`.

Only the following steps are reörganized to reusable instance methods:
* Connection setup and startup
* VIO management
  * Abort when the thread is killed
  * Closing
    (already covered by Connector/C, but not by the current `libmysql`)

The following steps are currently not öptimal for moving
(have verbose changesets), so they are only wrapped, not reörganized:
* steps that run queries *with copy-pasted code*
* sending commands `COM_REGISTER_SLAVE` & `COM_BINLOG_DUMP`,
  including serializing their arguments
  * Connector/C does not have a formal API for serializing
    command arguments, especially variable-length strings.
  * Connector/C has a Binlog API that duplicates this step, but that
    API is not modular for the Server’s full replication capabilities.
* error checking

These require refactors and ideally an evaluation for `mariadb-binlog`.

To enable this migration, this commit additionally:
* Reserves the management of the IO thread’s VIO to the client library
* Draft-specific: omits MDEV-28114’s kludge failure
  recovery for Semi-Sync graceful self-KILL since
  * MDEV-32551 fixed the root bug on the ACK receiver, and
  * MDEV-39583 plans to remove this entire sub-component.

Draft-specific: Until Connector/C exports the pre-decompress (network)
byte count, `@@read_binlog_speed_limit` will measure the post-decompress
count even in `@@slave_compressed_protocol` for now.
@ParadoxV5 ParadoxV5 added MariaDB Corporation Replication Patches involved in replication labels May 13, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the Remote_event_stream class to encapsulate MariaDB Connector/C handles, refactoring the replication slave I/O thread and semi-sync logic to use this new abstraction instead of raw MYSQL pointers. Feedback highlights critical bugs, including a missing error check for cli_safe_read that could cause memory corruption, a buffer over-read vulnerability in semisync_ack, and a 1-byte offset error in read_event that breaks event parsing. Other recommendations include ensuring my_global.h is the first include, removing a redundant vio_close call, and using const char * in Connection_options for better ownership semantics.

Comment on lines +148 to +149
auto strlen= static_cast<size_t>(cli_safe_read(connector));
return {connector->net.read_pos, strlen};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The return value of cli_safe_read is not checked for errors. If it returns (unsigned long)-1 (the value of packet_error), strlen will be cast to a very large size_t, resulting in an invalid string_view. This will cause next().size() to be non-zero, bypassing error checks in the caller and potentially leading to memory corruption.

  unsigned long len= cli_safe_read(connector);
  if (unlikely(len == (unsigned long)-1))
    return {};
  return {connector->net.read_pos, static_cast<size_t>(len)};

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packet_error is 0 in libmysql.

Comment on lines +158 to +165
log_name.copy(&(payload[HEAD_SIZE]), sizeof(payload)-HEAD_SIZE);
NET *net= &connector->net;
//@deprecated: not required in Connector/C
net->pkt_nr_can_be_reset= true;
// Connector/C might not require resetting; better be safe until confirmed.
net_clear(net, false);
return my_net_write(net, reinterpret_cast<const unsigned char *>(payload),
HEAD_SIZE + (log_name.size()+1)) || net_flush(net);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

There is a buffer over-read risk in semisync_ack. The payload buffer has a fixed size based on FN_REFLEN, but my_net_write uses log_name.size() + 1 as the length without validation. If log_name.size() exceeds FN_REFLEN, my_net_write will read past the end of the payload array. Also, std::string_view::copy does not null-terminate.

  size_t copied= log_name.copy(&(payload[HEAD_SIZE]), sizeof(payload) - HEAD_SIZE - 1);
  payload[HEAD_SIZE + copied]= '\0';
  NET *net= &connector->net;
  //@deprecated: not required in Connector/C
  net->pkt_nr_can_be_reset= true;
  // Connector/C might not require resetting; better be safe until confirmed.
  net_clear(net, false);
  return my_net_write(net, reinterpret_cast<const unsigned char *>(payload),
    HEAD_SIZE + copied + 1) || net_flush(net);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If log_name.size() exceeds FN_REFLEN, my_net_write will read past the end of the payload array.

It will not de facto.

Also, std::string_view::copy does not null-terminate.

The array initialization does.

Comment thread sql/slave.cc
DBUG_RETURN(len - 1);
DBUG_PRINT("exit", ("len: %zu net->read_pos[4]: %d",
len, packet[4]));
DBUG_RETURN(packet);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The read_event function now returns the entire packet including the MySQL protocol status byte (at index 0). However, the rest of the replication logic in handle_slave_io expects event_buf and event_len to represent only the event data (excluding the status byte). Returning the full packet here causes a 1-byte offset error in event parsing.

  DBUG_RETURN(packet.substr(1));

Comment thread rpl/remote_event_stream.cc
Comment thread rpl/remote_event_stream.cc
Comment thread rpl/remote_event_stream.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

MariaDB Corporation Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

1 participant