FIX: Ignore max excerpt length for div excerpts too (PR #13058)

We support two types of custom excerpts. It can be <div class="excerpt"> or <span class="excerpt">. We also ignore max excerpt length for custom excerpts. But we forgot to process div when ignoring max length.

GitHub

The regex is not correct. And wasn’t correct before. :stuck_out_tongue: It confuses character sets and capturing groups:

  CUSTOM_EXCERPT_REGEX = /<\s*(span|div)[^>]*class\s*=\s*['"]excerpt['"][^>]*>/

is it right to ignore? seems a bit odd. I would just truncate at 100

It was implemented like this when the feature was introduced. So it’s just documenting existent implementation.

I had the same doubts, but eventually, I feel that yes, it’s a good design decision.

We always truncate automatic excerpts. But when it comes to custom excerpts it’s better to give a user control of this. Probably the user decided to use a custom excerpt just to show that last word or even sentence that was truncated by automatic excerpt extraction.

A perfect catch :+1:

Since post.excerpt is calling PrettyText.excerpt, we don’t really need to test both every time? Just test PrettyText.excerpt and avoid doing any duplicate work?

Having read this, I think we’re using one variable length for two different purposes.

When you create an instance of the ExcerptParser, the length parameter is the maximum length of the excerpt to be created.

However, when the parser detects the user added a custom excerpt, it will change the value of that length parameter to ensure it’s properly picked up even if it’s at the end of the post.

While I like the idea of giving users the ability to provide a custom excerpt anywhere in the post, I do not like that it’s not limited in any sort of ways. That’s a recipe for abuse :scream:

What I reckon we should do here @AndrewPrigorshnev is when we detect a custom excerpt, we should change the html with the content of the custom excerpt instead of changing the length of the excerpt. That way we get the best of both worlds: users can add a custom excerpt anywhere in their post using a div or a span, and it’s limited according to the site setting.

However, when the parser detects the user added a custom excerpt, it will change the value of that length parameter to ensure it’s properly picked up even if it’s at the end of the post.

Not only for that, but to override max excerpt length from settings too. It was intentionally implemented in this way. Here is the post from the same topic, and it was committed back then in 2014 with the message “FEATURE: Allow manual excerpt to be specified anywhere in the post and override max excerpt length”.

I think this behavior makes sense. The main reasoning behind this change was to give users more control over excerpts:

“My major issue is that there is zero mod control here, you have an excerpt and can not control how it looks or when it stops.”

Let’s say my site has a limit for automatic excerpts in 100 chars, and I want to use 200 chars length excerpt for some important pinned topic. I may use a custom excerpt, and I wouldn’t want it to be truncated to 100 chars. I wouldn’t want to increase the limit for the whole site just because of this only topic, either.

Speaking of abuse, probably someone can start using tags <div class="excerpt">a looong message</div> to introduce clutter on the site. But in the same way, he can just start using four-letter words in his posts. Both problems are approachable by using standard moderation tools (flags, suspension, and so on). Maybe there could be another way to abuse it that I don’t see? I think the worst thing that could happen here is that someone includes the whole post in the excerpt.

We definitely can change this behavior. But it’s quite bigger than the change this PR is introducing. This would mean changing behavior that was intentionally introduced before and, in my opinion, makes sense. It would make this power user feature less powerful. Do we really want it?

It would make this power user feature less powerful. Also if someone out there uses long custom excerpts let say for categories, their excerpts will get truncated after this fix. And the only thing they will be able to do about it is to increase the excerpts limit for the whole site.

I would be fine if we’d have a modifier for the length of the custom excerpt, like 2x the value of the site setting. But there has to be a limit otherwise it will be abused…

It looked like a unit test + an integration test on one spec. I removed it from this and several other tests.

Probably it can be abused by somehow making the site parse a lot of very long excerpts.

if we’d have a modifier for the length of the custom excerpt, like 2x the value of the site setting

What if we use an absolute upper bound instead? We already have an upper bound for automatic excerpts, it’s impossible to set max excerpt length upper than 1000:

We can let custom excerpts bypass max excerpt length but always truncate all excerpts by 1000 chars.

I think we can merge this bug fix and then implement this abuse protection in another PR.

The title of this pull request changed from “FIX: Ignore max excerpt length for div excepts too” to "FIX: Ignore max excerpt length for div excerpts too

That’s an excellent idea :+1:

We can let custom excerpts bypass max excerpt length but always truncate all excerpts by 1000 chars.

:thinking: not following, are you saying

We no longer let custom excerpts bypass max excerpt length we truncate at 1000 chars.

Agree with @ZogStriP we need an upper limit otherwise people can sadly grief us and use this escape hatch to break rendering on front page.

:thinking: not following, are you saying

We no longer let custom excerpts bypass max excerpt length we truncate at 1000 chars.

Nope, not exactly. An example:

Let’s say topic excerpt maxlength set to 220 characters:

Then our algorithm for automatic excerpt extraction will take maximum first 220 characters of a post. The rest will be truncated.

If a user decides to use a custom excerpt (using <div class="excerpt"> or <span class="excerpt">) that excerpt would bypass this restriction. If he wants to add a few more words above the topic excerpt maxlength limit it would work. If he wants to use a custom excerpt of 500 characters for some important pinned topic it would work too.

But to prevent misuse and abuse we will truncate all excerpts by 1000 chars anyway:

  • It’s already implemented for automatic excerpts, users just can’t set topic excerpt maxlength to a value bigger than 1000:
  • And we will add this protection for custom excerpts too. So custom excerpts would bypass topic excerpt maxlength setting but would be truncated on 1000’th character anyway.

Again I’d like us to add this protection in a separate PR or it would be really hard to follow. We can firstly merge this PR since it’s just a bug fix, we just forgot to process div in one place and then we can merge another PR with misuse protection.

1 Like

It is a bit confusing but sure sounds fine to merge this as is

Custom excerpts are such a giant edge case, as long as we have some level of protection (as you described) I think we are ok

Fine, @SamSaffron thank you for bringing it up! And @ZogStriP thank you for this nice idea:

What I reckon we should do here @AndrewPrigorshnev is when we detect a custom excerpt, we should change the html with the content of the custom excerpt instead of changing the length of the excerpt.

it’ll help a lot with implementing this.