Skip to content

Conversation

@somiaj
Copy link
Contributor

@somiaj somiaj commented Dec 27, 2025

Apply the spacing between paragraphs added by the Par block in in HTML to the previous block instead of inserting an empty div with a margin. This adds the margin from the Par block to the previous Indent div (including adding a div around an indent of zero), list, or list item block.

Based on the code from @dpvc in #1355.

Unsure if this should be extended to any other blocks.

@somiaj somiaj force-pushed the pgml-add-par-to-prev-block branch 2 times, most recently from 8a1ff45 to 5743412 Compare December 27, 2025 03:13
Comment on lines 1274 to 1276
for my $i (0 .. $n - 1) {
my $item = $stack->[$i];
my $next_par = $i + 1 < $n && $stack->[ $i + 1 ]{type} eq 'par' ? $stack->[ $i + 1 ] : undef;
Copy link
Member

@drgrice1 drgrice1 Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to

Suggested change
for my $i (0 .. $n - 1) {
my $item = $stack->[$i];
my $next_par = $i + 1 < $n && $stack->[ $i + 1 ]{type} eq 'par' ? $stack->[ $i + 1 ] : undef;
for my $i (0 .. $#$stack) {
my $item = $stack->[$i];
my $next_par = $i < $#$stack && $stack->[ $i + 1 ]{type} eq 'par' ? $stack->[ $i + 1 ] : undef;

and delete my $n = scalar(@$stack); above (line 1271). Perl is optimized for this and stores this number as a property of the array. So there is no reason to save the array length to yet another location. Also, perl offers the $# syntax for getting the index of the last entry in an array, so use that rather than adding or subtracting one from the array length.

@somiaj somiaj force-pushed the pgml-add-par-to-prev-block branch from 5743412 to fc008f1 Compare December 27, 2025 15:27
Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

I think that the essential thing is that the divs that were added between list items be fixed, since that is invalid HTML. Adding the style to other block items (such as the lists themself) instead of injecting an empty div after it is not particularly important since that is at least valid HTML. I am not entirely certain what benefit that provides either.

@somiaj
Copy link
Contributor Author

somiaj commented Dec 27, 2025

I am unsure on the benefit as well, but I do feel the empty divs for spacing is not ideal. I was also wondering if there were any accessibility issues with not using <p> tags or grouping text. But to me the PGML output is doesn't feel correct.

Paragraph one.
<div style="margin-top:1em"></div>
Paragraph two.
<div style="margin-top:1em"></div>
Paragraph three.

And this just feels better to me. But I couldn't figure out a way to do this, and assume that if the PGML parser was setup for this, <p> tags would have just been used in the first place.

<div style="margin-bottom:1em">
Paragraph one.
</div>
<div style="margin-bottom:1em">
Paragraph two.
</div>
<div style="margin:0">
Paragraph three.
</div>

@drgrice1
Copy link
Member

There is nothing wrong with using an empty div with a margin for spacing. Your second example does not add anything semantically. Furthermore, attempting to wrap the paragraph could cause structural problems, because it could result in something the author added not being terminated properly. Simply put, the empty div tag is less likely to cause problems, and works fine.

Using <p> tags really wouldn't work. Not only does that have the same problems as attempting to wrap with a div tag, but the content might have something in it that is not valid inside a <p> tag. There are way to many things that are not valid in a <p> tag. Even answers are not valid in a <p> tag anymore.

@dpvc
Copy link
Member

dpvc commented Dec 30, 2025

I agree with @drgrice1 that removing the <div> elements in general is not necessary. In PGML, the empty line acts as a separator, not a container, and that is why it is implemented this way.

As for the implementation itself, I had considered something like this in my suggestion, but because this is in the base PGML::Format class (not the HTML subclass), I thought it best to be as general as possible. That is why I passed the block and index rather than the next block as you are doing.

In fact, I had considered something more general yet. Were I writing this code today (rather than something like 20 years ago), I would replace the whole for ($item->{type}) {/.../ && do {...; last} ... } construct with something like

my $method = ucfirst($item->{type});
if (Value::can($self, $method)) {
  $string = $self->$method($item, $block, $i);
} else {
  PGML::Warning "Warning: unknown block type '$item->{type}' in " . ref($self) . "::format\n";
}

In fact, I might replace the main loop by

my $string, $di;
for (my $i = 0; $i < $#$stack; $i += 1 + ($di || 0)) {
  my $item = $self->{item} = $stack->[$i];
  $self->{nl} = (!defined($strings[-1]) || $strings[-1] =~ m/\n$/ ? "" : "\n");
  my $method = ucfirst($item->{type});
  if (Value::can($self, $method)) {
    ($string, $di) = $self->$method($item, $block, $i);
    push(@strings, $string) unless !defined $string || $string eq '';
  } else {
    PGML::Warning "Warning: unknown block type '$item->{type}' in " . ref($self) . "::format\n";
  }
}

Then the various methods could check the stack in any way they wanted to (not just know about a following par block), and items like bullet could test for the par and tell the loop to skip it by returning ($string, 1) rather than just $string. Then you don't have to modify the par item by adding a new property, and the Par method doesn't have to be changed. The other methods don't have to be changed, as $di will end up being undefined for them, and ($di || 0) will be 0 in that case.

This makes all the item implementations get the same data and work the same way. It allows the creation of new item types without having to modify the main loop. And it allows the item implementations to deal with the block's stack in whatever way they need to. Finally, it avoids all the pattern matching done and the awkward do {... ; last} constructions in the original loop.

Anyway, this would mean that you don't have to keep adding extra parameters to specific item implementations in the future, and all items are implemented in the same way.

@somiaj
Copy link
Contributor Author

somiaj commented Dec 30, 2025

@dpvc Thanks. I am looking at implementing what you stated, but unsure how to modify the Quote methods. That method is sent the previous string, not the full block. It appears it is doing a test to determine if it needs to open or close the quote.

@dpvc
Copy link
Member

dpvc commented Dec 30, 2025

Sorry, I missed the string being sent to the Quote method. You could add that to the $self->$method(...) call arguments. Alternatively, you could create a $state variable that holds any data needed to be passed to the item implementations. Something like

my $state = {
  block => $block,
  strings => [],
  i => 0,
};
while ($state->{i} < $#$stack) {
  my $item = $self->{item} = $stack->[$state->{i}++];
  $self->{nl} = (!defined($state->{strings}[-1]) || $state->{strings}[-1] =~ m/\n$/ ? "" : "\n");
  my $method = ucfirst($item->{type});
  if (Value::can($self, $method)) {
    my $string = $self->$method($item, $state);
    push(@{$state->{strings}}, $string) unless !defined $string || $string eq '';
  } else {
    PGML::Warning "Warning: unknown block type '$item->{type}' in " . ref($self) . "::format\n";
  }
}

Then Quote() can look in the $state->{strings} array, and Bullet() can use $state->{block} and increment $state->{i} if it needs to. Note that here $state->{i} will be pointing to the next item in the stack, as it is post-incremented when getting $item, so Bullet() can use $state->{i} rather than $state->{i} + 1.

Of course, this requires changes to the implementation functions that previously used $block and $i as arguments, but there were only a couple of those.

@somiaj
Copy link
Contributor Author

somiaj commented Dec 30, 2025

Implemented the suggested changes to the main PGML::format::string loop as suggested by @dvpc. One difference is I am treating the $block and $state as two different variables passed to all methods, this way no update is needed to the existing methods that just use the $block.

@somiaj somiaj requested a review from drgrice1 December 30, 2025 18:42
@somiaj somiaj force-pushed the pgml-add-par-to-prev-block branch from 92ac3ed to 094666c Compare December 30, 2025 18:44
@somiaj
Copy link
Contributor Author

somiaj commented Dec 30, 2025

Seems my change missed adding a new line character that makes one of the unit tests fails. Trying to track that down unless someone else sees where a newline wasn't added correctly.

my $self = shift;
my $item = shift;
return $self->nl . '<div style="margin-top:1em"></div>' . "\n";
return '<div style="margin-top:1em"></div>' . "\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test is failing because of the removal of $self->nl . here. This removal seems to be an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I should add that for the PGML::Format::html package, almost all of these new lines should be removed. I think that at some point there was an attempt to make the resulting output more human readable by adding these newlines. However, that also results in larger output that is sent. That is not desirable, and instead minimized html code should be the goal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yea it was an error, in trying to revert the Par function I missed that.

I could go remove some of those new lines if you would like, though maybe that should be a new PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, lets leave that for another pull request.

@somiaj somiaj force-pushed the pgml-add-par-to-prev-block branch from 094666c to a013860 Compare December 30, 2025 19:21
Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are fixing HTML validation issues here, there is one more that I have observed. On line 1638 (in this pull request at least), there is a span tag that contains an hr tag. It is invalid for a span to contain an hr. That can be easily fixed by changing that span to a div. That will not affect the resulting display, and is then valid HTML.

You could add that to this pull request if you like.

Apply the spacing between paragraphs added by the Par block in
in HTML to the previous block instead of inserting an empty div
with a margin. This adds the margin from the Par block to the
previous Indent div, list, or list item block.

Based on the code from @dpvc in openwebwork#1355.
As suggested by @dvpc, refactor the PGML::format::string loop to
call the appropriate method if it exists. Also all methods now have
access to the block and current state of the loop to use as needed.
@somiaj somiaj force-pushed the pgml-add-par-to-prev-block branch from e299780 to 154e463 Compare December 31, 2025 02:20
This was never used anywhere else (that I could find), so no
need to store the current item in the PGML::format::string loop.
Copy link
Member

@dpvc dpvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this as is, but regarding passing $block, I think it makes more sense to put that into the state and modify the few functions that use that. I almost put $item into that as well, but didn't want you to have to edit every function. But there are only a couple that use $block.

/tag/ && do { $string = $self->Tag($item); last };
my ($self, $block) = @_;
my $stack = $block->{stack};
my $state = { strings => [], i => 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On looking at this, with all the $state->{strings}[-1] references I'm wondering if it wouldn't be better to do

my @strings;
my $state = { strings => \@strings, i => 0 };

so that 1275 can go back to its original form, as can 1283 and 1284, and 1278 can use @strings.

. "</div>\n";
my $em = 2.25 * $item->{indent};
my $bottom = '0';
if (defined $block->{stack}[ $state->{i} ] && $block->{stack}[ $state->{i} ]{type} eq 'par') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to do

my $next = $block->{stack}[ $state->{i} ];
if (defined $next && $next->type eq 'par') {

to reduce the number of dereferences that need to be performed. Similarly for lines 1544 and 1558.

$bottom = '1em';
$state->{i}++;
}
return $self->nl . "<div style=\"margin: 0 0 $bottom ${em}em;\">\n" . $self->string($item) . $self->nl . "</div>\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really are trying to save individual characters being part of the HTML output, then remove the semi-colon after '${em}em' and the \n at the end, here.

my $list = $bullet{ $item->{bullet} };
my $margin = '0';
if (defined $block->{stack}[ $state->{i} ] && $block->{stack}[ $state->{i} ]{type} eq 'par') {
$margin = '0 0 1em 0';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if you want to reduce characters, leave off the final 0, which is implied by the second 0.

@somiaj
Copy link
Contributor Author

somiaj commented Dec 31, 2025

@dpvc thanks for the suggestions, I'll go make some of the changes. I party didn't include $block with the $state was that the $block wasn't changed (at least that I saw), and thought the $state contains the values that change between calls. But I can go update the few places that expect the block. I also don't mind updating more places if you think $item should be included.

All of the format methods now receive a single state hash
instead of the item, block, and state as separate inputs.

Reduce the number of times some variables are dereferenced
from the state hash.

Remove some white space to minimize HTML output. Leaving
newline characters `\n` alone for now.
@dpvc
Copy link
Member

dpvc commented Dec 31, 2025

I appreciate your making those changes. That alls looks good. One last thing I just thought of is that the new $self->$method() approach would potentially match functions other than the ones in the original list. In practice, that is not really a problem, as the item types don' include those, but it is a potential loophole with this approach. One solution is to only use capitalized function names for the ones that implement the items, and use all lower case for the others. That is almost the case now, with the exception of Escape() (though I haven't checked the subclasses). Do you think it would be good to chance Escape() to escape() to make the separation between the methods clear?

@drgrice1
Copy link
Member

I think that changing Escape to lower case would be good. Looking at the PGML::Format package and its subpackages, I see that the Escape method is the only method that does not correspond to an item type that is capitalized (the only other methods that do not correspond to item types are in the base class and are new, format, string, and nl).

So I am just up voting @dpvc's last suggestion!

This way only "item" methods are capitalized to distinguish
them from other methods which are not a block item type.
@somiaj
Copy link
Contributor Author

somiaj commented Dec 31, 2025

The method is now called escape. Do either of you happen to have a problem file which uses all of PGML syntax just to double check I didn't put a typo somewhere? I've tried to test things, but I don't have a nice problem file for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants