-
Notifications
You must be signed in to change notification settings - Fork 843
Newsletter debug - add initial debug info on publish. #46068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.
Interested in more tips and information?
|
|
Thank you for your PR! When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:
This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖 Follow this PR Review Process:
If you have questions about anything, reach out in #jetpack-developers for guidance! Jetpack plugin: The Jetpack plugin has different release cadences depending on the platform:
If you have any questions about the release process, please ask in the #jetpack-releases channel on Slack. |
Code Coverage SummaryCannot generate coverage summary while tests are failing. 🤐 Please fix the tests, or re-run the Code coverage job if it was something being flaky. |
|
|
||
| while ( true ) { | ||
| $api_path = sprintf( | ||
| '/sites/%d/subscribers/?page=%d&per_page=%d&filter=email_subscriber', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're fetching all subscribers, do we need that extra email_subscriber filter? If we do, do you think you could explain why that's needed in the docblock for the method?
Since we already have other functions used to fetch subscribers in the codebase, I think that if we add a new one we need to be extra clear what it does, and how it differs from the others.
Alternatively, maybe that should be a parameter that could be passed to the method, this way this method can become the one method we use for everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're fetching all subscribers, do we need that extra email_subscriber filter? If we do, do you think you could explain why that's needed in the docblock for the method?
I think this limits it to only subscribers that would be sent emails, excluding people who are only subscribed via the reader with no emails. So we don't NEED to limit it here, but it didn't seem like the others would be very useful for our purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been updated. The fetch call has been moved to a helper class, is generalized and takes extra query params as an arg, and this method has been renamed to note email subscribers only while passing in that filter param.
| $page = 1; | ||
| $per_page = 100; // Maximum per page to minimize requests. | ||
|
|
||
| while ( true ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're potentially making multiple calls to WordPress.com to get all that data, I think this should be saved in a transient, to save any too frequent calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I generally agree we should limit calls, how long do we think that transient should last?
If we limit running this to posts being published, and return early if the meta is already set, then I think any extra calls that would be made would be triggered by a new separate post being published? (Or is there another means I am not thinking of?). If in that time between 2 posts being published the subscriber list changed, that transient would then also have us adding stale data on the meta that might not be very helpful for the debugger. If we are setting the timer for the transient much shorter to avoid that situation, then its unlikely to have any effect but also seems completely fine/appropriate to add it as a safety.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some transients, they are currently set at 5 minutes.
| 'email_subscribers' => isset( $subscriber_data['email_subscribers'] ) ? (int) $subscriber_data['email_subscribers'] : 0, | ||
| 'paid_subscribers' => isset( $subscriber_data['paid_subscribers'] ) ? (int) $subscriber_data['paid_subscribers'] : 0, | ||
| 'all_subscribers' => isset( $subscriber_data['all_subscribers'] ) ? (int) $subscriber_data['all_subscribers'] : 0, | ||
| 'subscriber_list' => isset( $subscriber_data['subscriber_list'] ) && is_array( $subscriber_data['subscriber_list'] ) ? $subscriber_data['subscriber_list'] : array(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a lot of subscribers, that will make for a huge amount of post meta, saved for each post. That seems like it would be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a main point of concern. With most newsletters there should be no issue, and leaving this out of rest and api responses is helpful, but once we start getting into extremes of tens to hundred of thousands of subscribers it starts becoming a bit more concerning.
Two possible options:
- We choose to support this for sites with up to X subscribers -- either not adding the subscriber list at all or noting it truncated/incomplete once it reaches the cutoff.
- Scrap the idea of adding this post meta. For debugging we could use the live subscribers list on the dashboard and filter out any subscribers that have joined since after the publish date. That does add more limitation to the debugger idea tho as any subscribers that unsubscribed, unsubbed & resubbed, or changed their paid tier status since the original publish wouldn't be reflected correctly in the tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with option 1 in this PR - we don't try to make the requests or add the list if there are more than so many (I started with 2000) subscribers.
Even if large amounts fit/work in the post_meta well, I wonder about the amount of requests we have to make to get the list. At 100 per_page we would be looping over 20 requests to get the 2000, if we wanted to support more that could be like 100 requests to support 10,000.
This does make me start to wonder if this is too much from this side, and if we should be adding the post_meta from the wpcom side. In retrospect it might make more sense to go that way, I asked about any technical concerns regarding that below - #46068 (comment)
| // First, get subscriber counts from stats endpoint. | ||
| $stats_path = sprintf( '/sites/%d/subscribers/stats', $site_id ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have logic to fetch from that endpoint in
Lines 58 to 67 in b84f02d
| if ( Jetpack::is_connection_ready() ) { | |
| $site_id = Jetpack_Options::get_option( 'id' ); | |
| $api_path = sprintf( '/sites/%d/subscribers/stats', $site_id ); | |
| $response = Client::wpcom_json_api_request_as_blog( | |
| $api_path, | |
| '2', | |
| array(), | |
| null, | |
| 'wpcom' | |
| ); |
Maybe we can take the opportunity to consolidate things into one central method? That would be helpful I think, given that we already fetch subscribers in multiple places in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense! I added a Jetpack_Subscriptions_Helper class with fetch_subscriber_stats and fetch_subscribers methods and transients.
|
|
||
| add_filter( 'jetpack_published_post_flags', array( $this, 'set_post_flags' ), 10, 2 ); | ||
|
|
||
| add_action( 'jetpack_published_post', array( $this, 'store_subscribers_when_sent' ), 10, 3 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Automattic/jetpack-vulcan I would have a question about this. Do we need to sync the new _jetpack_newsletter_subscribers_when_sent post meta when we do this, to ensure we have access to that data on WordPress.com (we will be using that post meta in a page on WordPress.com).
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - currently with this PR checked out and after creating a new post with Newsletter enabled, _jetpack_newsletter_subscribers_when_sent exists as a meta key for the post on the remote site, but not on the cache site.
If the post meta should be available on WordPress.com, it would need to be whitelisted
(in the Sync package and then in WPcom in the jetpack mu-plugin, in $post_meta_whitelist within sync/class.jetpack-sync-defaults.php)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately it looks like a Sync test needs updating for this PR - test_sends_publish_post_action. The failing assertion is due to another event being queued after the publish hook, so the most recent event isn’t jetpack_published_post anymore (but updated_option). Instead, it looks like we should assert that an event with action jetpack_published_post exists for the post ID, so we're testing for intent but not ordering.
@Addison-Stavlo I'd be happy to work on that in a separate PR so it passes for changes made here if it would be helpful, since it's a change that looks like it should happen anyway.
Edit to add - I've started on a PR for the Sync test here: #46105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Sync test PR is merged now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @coder-karen, that is all very helpful!
If the post meta should be available on WordPress.com, it would need to be whitelisted
(in the Sync package and then in WPcom in the jetpack mu-plugin, in $post_meta_whitelist within sync/class.jetpack-sync-defaults.php)
I am thinking that some potential post_meta items might make more sense to set on the wpcom side. Would I still set up the meta key in these same whitelists, or something different in that case? Im curious if there are other concerns here, as I remember some teams have had issues with setting on wpcom and syncing back in the past (but I think that was mostly for blog stickers at the time, I'm not sure about post_meta). If I add post_meta from the wpcom side would I expect it to sync back properly in the future? If I were to set some part of the new post meta from the jetpack side and a separate part from wpcom, might that create issues if I do not separate them into different post meta keys altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @coder-karen !
Sync is only one-way though, so if you plan on setting data on the WPcom side, then Sync isn't the right tool to make sure that is updated at the remote site too.
Fair! We will have to consider our options here. For our immediate purpose the main req is having the info on the wpcom side. However, future project shapings make it look possible that we may want it on the remote side in the future, but there could be other solutions for those goals as well.
? So if I set post_meta on the wpcom side, will it persist there even if that post is later sync'd or re-sync'd? I assume we would have to avoid enabling sync for that specific post meta key in that case to prevent anything from being overwritten, but would the sync still wipe/clear post meta keys on wpcom that aren't enabled for sync at all? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side Q - when a post is published, is it sync'd over almost immediately? Im wondering if its a fair expectation for the cache version to be there to set post_meta on that quickly after the post is published.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when a post is published, is it sync'd over almost immediately?
Generally, yes. That is, unless a site has a very high amount of site activity resulting in a processing lag, then that published post would join the queue. This generally should only happen for high levels of activity though.
So if I set post_meta on the wpcom side, will it persist there even if that post is later sync'd or re-sync'd
If a post has post meta added on the cache site that doesn't exist on the remote site, the problem that I see is checksums. The checksum process compares the remote and cache databases, and if they don't match then the cache site is modified to match the remote site. There is some more info on checksums here: P9dueE-2Dz-p2.
I'm not sure if there is a way around it, and if there is whether that would be good practice. @fgiannar would you be able to confirm this, in case I'm missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. So to recap and clarify my goals and status:
- I want to store some information about individual subscribers when posts are published/emails are sent.
- From Jetpack, I need to request subscriber info from wpcom to get the needed info. For sites with many subscribers that could add up to a lot of paginated requests just to get that info. So it doesn't really feel right to do this from Jetpack.
- I can set it on the wpcom side when emails are sent, but as you mention there would be other issues with syncing post meta with checksums, etc.
So its starting to sound like post meta may not be the right thing to use here for this part. We may either need to find another way to store that info, or make some smaller compromises with what we will be willing to support for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there!
I can set it on the wpcom side when emails are sent, but as you mention there would be other issues with syncing post meta with checksums, etc.
I can confirm that WPCOM-only post meta ARE NOT removed during Fix checksums or Full Sync so feel free to store these on the WPCOM side when emails are sent, as you suggested :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking @jeherve !
I think I addressed most of your feedback here:
- Extracted requests to a shared helper class
- added transients
- limited fetching/adding the meta if subscribers count is over a certain amount
- added the meta slug to sync
And responded inline to some of the comment discussions you created above. The main thing on my mind currently is this though. In retrospect it might make more sense to add some of this from the wpcom side instead 🤔
It seems like something we should be able to do, based on this discussion: |
Yep, we can explore elsewhere and hold on this here. We could potentially move this forward with just the stats numbers on the new post meta as a data point to reference. I am unsure how helpful it would actually be, but could see it possibly being useful in some cases. Either way, pausing on this for now. |
|
@jeherve I updated this PR to:
|
|
It seems i created more sync test failures somehow 🤔 I will need to look at these later. Im not quite sure what is going on, these seemed to be passing previously and I haven't made many changes how it is setup. |
|
With the sync tests, they're all still related to the new action. Also not sure why they didn't show up earlier, but I figured creating a PR based off this one with the tests might make more sense, to catch anything else. Created one at #46236 |
fgiannar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, many thanks for continuing to work on this PR!
I'm afraid that we are introducing a major performance bottleneck for Sync by making requests to WPCOM every time a post is published.
I was under the impression we agreed that _jetpack_newsletter_initial_debug_info would be instead set only on WPCOM when we process jetpack_published_post actions.
If that's the case, we don't need to add it in Sync's whitelist btw :)
|
Hi @fgiannar, thanks for your concern on the bottleneck. I would obviously like to clean that up as much as possible. Il recap thoughts since we last discussed:
For me, this was specifically concerning the requests to get any individual subscriber info, as they are limited in number per request and would even require pagination to get all the info we wanted for many sites. That seems like a huge bottleneck and is part of why I cancelled that idea and set this PR aside. In the meantime, we received requests for other debugging info to be set at the time of publish, and most of them seemed like they would make sense to set on the site side and enable with sync. So I thought I would revive this PR by adding those and removing the ridiculous paginated requests.
I could be missing another request somewhere, but I think the main one you refer to at this point must be requesting subscriber stats? I think everything else is just checking post_meta and options on the site. That request for subscriber stats is also now setup with a transient and shared with the dashboard widget that also requests it. I had assumed this request was much more reasonable than individual subscribers with pagination, but if this itself is causing a bottleneck, I think it would be fine to compromise and remove the 'stats' section of this debug info here and we can find other ways to handle that as. I think all the other info on the meta thats grabbed from options and other post meta at the time would still be very valuable from here though. wdyt? |
|
Also big thanks @coder-karen on all your help with the tests! |
|
Thanks for following up @Addison-Stavlo !
Yes, it's still an issue for Sync even if it's optimized to use a transient. In high traffic sites with a lot of editorial activity even fetching options and meta can be an issue when it comes to actions like publishing a post. Would you mind sharing here the options and meta you are interested in fetching from the remote site and are not available on WPCOM? I'd like to double check in case we already have this data available. For example, we already set the I wonder if we could use a similar approach to set some extra flags with the debugging info we need but I'd like to ensure we don't already have this info on WPCOM :) |
|
Ah, thanks again @fgiannar
👍 We can definitely remove that from this PR.
A couple first thoughts on this:
I wasn't trying to set things here specifically because they were not available on wpcom, but more because there are potential future applications where it could be beneficial to have some of this info saved on the site side and shared with standard sync behaviors - and also simply because it seemed more fitting to set meta for post/site values closer to the site. If the post_meta and option checks are blockers here, then we can compromise to work from the wpcom side instead - I just did not anticipate those being as concerning as they are. The wpcom side should work assuming all the info we need is sync'd over by the time first immediate delivery emails are sent, and that the meta set there would persist (which you already mentioned it would). The data I am looking to save on meta once (and not ever be overwritten once initially saved) is:
I think that exhausts what I am currently looking for here. If all of that is on the wpcom side at the time the immediate emails are sent I can probably set it all there if it doesn't yet exist for the post (hopefully in a way that handles both simple and jetpack together). Perhaps some of these may still need to be added to sync or to the flags that are sync'd in order to have everything on that side. So at this point, it sounds like even just getting these post_meta and option values with no wpcom requests is too much to do on publish here? If thats the case then next steps are looking to do all of these from the wpcom side, and adding extra handling to sync over whichever of these is not yet available there. |
|
Many thanks for providing the list of debug data we are interested in! I audited all of them and it looks like we already sync them:
We already sync this to WPCOM
This is already synced via the
This is already in the list of options we sync to WPCOM.
The result of
We already sync this to WPCOM
Here's a suggested flow: |
|
@fgiannar thanks so much for taking the time, thought, and suggested direction! That sounds like a reasonable way to move forward. I will look into that shortly. 😄 |
Fixes # https://linear.app/a8c/issue/NL-300/newsletter-logs-should-indicate-whether-a-post-was-published-as-post and https://linear.app/a8c/issue/NL-95/newsletters-add-post-meta-for-subscribers-list-at-time-of-first
Proposed changes:
_jetpack_newsletter_initial_debug_infometa when a post is published, populated with information about the subscriber list at the time (timestamp, numbers for different types of subscribers (email, paid, all)), whether the setting to disable emails is checked (email_to_subs_disabled), whether a larger check for sending emails passes/fails (will_send_to_subscribers), access level, whether categories are enabled, and categories set on the post (segmented by Newsletter categories and non-newsletter categories.Other information:
Jetpack product discussion
Does this pull request change what data or activity we track or use?
Testing instructions:
_jetpack_newsletter_initial_debug_infofor the post that was published.