16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -9,10 +9,24 @@ All notable changes to this project will be documented in this file.
* Major improvements in build time for the `unescape_fast` feature (went from
8 seconds to 3 seconds on my laptop).
* Add `BARE_ENTITY_MAX_LENGTH` constant that contains the length of the longest
-entity without a semicolon (enabled with feature `entities`).
+entity without a semicolon (enabled with features `entities` or `unescape`).
* Clarify examples in documentation and README.
* Fix a few spelling mistakes in documentation.

+### Breaking changes
+
+* `unescape`: Use [hashify] to map entity byte strings to their expansions. This
+is faster than the old [phf] map, but still slower than [matchgen] in
+`unescape_fast`. Thanks to [xamgore] for the PR!
+* The `unescape` feature no longer automatically enables the `entities` feature.
+If you need the `ENTITIES` map, enable the `entities` feature.
+* Updated minimum supported Rust version (MSRV) to 1.74.1 to support [hashify].
+
+[hashify]: https://crates.io/crates/hashify
+[matchgen]: https://crates.io/crates/matchgen
+[phf]: https://crates.io/crates/phf
+[xamgore]: https://github.com/xamgore
+
## Release 1.0.6 (2025-04-26)

* Switch dependency from [paste], which is no longer maintained, to a new fork,
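The changelog entry above describes `BARE_ENTITY_MAX_LENGTH` as the length of the longest entity without a semicolon. Computing it takes only a single scan over the entity names. Below is a minimal std-only sketch of that scan. It assumes a hypothetical five-entry subset in place of `entities.json`, and the helper name `entity_lengths` is illustrative rather than part of htmlize.

```rust
/// Compute (min length, max length, max length of entities without a
/// trailing ';') over entity names such as "&amp;" or "&amp".
fn entity_lengths(names: &[&str]) -> (usize, usize, usize) {
    let mut min_len = usize::MAX;
    let mut max_len: usize = 0;
    let mut bare_max_len: usize = 0;
    for name in names {
        min_len = min_len.min(name.len());
        max_len = max_len.max(name.len());
        // "Bare" entities are the legacy names that may appear without ';'.
        if !name.ends_with(';') {
            bare_max_len = bare_max_len.max(name.len());
        }
    }
    (min_len, max_len, bare_max_len)
}

fn main() {
    // Hypothetical subset of the WHATWG entity list.
    let names = [
        "&amp;",
        "&amp",
        "&times;",
        "&times",
        "&CounterClockwiseContourIntegral;",
    ];
    let (min_len, max_len, bare_max_len) = entity_lengths(&names);
    println!("min={min_len} max={max_len} bare_max={bare_max_len}");
}
```

For this subset it prints `min=4 max=33 bare_max=6`. The bare maximum comes from `&times`, because every longer name in the list ends in `;`.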
14 changes: 13 additions & 1 deletion Cargo.lock


7 changes: 4 additions & 3 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "htmlize"
-version = "1.0.6"
+version = "2.0.0"
authors = ["Daniel Parks <oss-htmlize@demonhorse.org>"]
description = "Correctly encode and decode HTML entities in UTF-8"
homepage = "https://github.com/danielparks/htmlize"
@@ -11,15 +11,15 @@ keywords = ["html", "entities", "escape", "unescape", "decode"]
categories = ["web-programming", "encoding"]
license = "MIT OR Apache-2.0"
edition = "2021"
-rust-version = "1.60"
+rust-version = "1.74.1"

[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "docsrs"]

[features]
default = []
-unescape = ["entities", "_unescape_either"]
+unescape = ["_unescape_either", "dep:hashify", "dep:serde_json"]
unescape_fast = ["_unescape_either", "dep:matchgen", "dep:serde_json"]
entities = ["dep:phf", "dep:phf_codegen", "dep:serde_json"]
# Enable iai benchmarks
@@ -36,6 +36,7 @@ phf_codegen = { version = "0.11.1", optional = true }
serde_json = { version = "1.0", optional = true }

[dependencies]
+hashify = { version = "0.2.6", optional = true }
memchr = "2.5.0"
pastey = "0.1.0"
phf = { version = "0.11.1", default-features = false, optional = true }
7 changes: 4 additions & 3 deletions README.md
@@ -2,7 +2,7 @@

[![docs.rs](https://img.shields.io/docsrs/htmlize)][docs.rs]
[![Crates.io](https://img.shields.io/crates/v/htmlize)][crates.io]
-![Rust version 1.60+](https://img.shields.io/badge/Rust%20version-1.60%2B-success)
+![Rust version 1.74.1+](https://img.shields.io/badge/Rust%20version-1.74.1%2B-success)

Htmlize handles both encoding raw strings to be safely inserted in HTML, and
decoding HTML text with entities to get back a raw string. It closely follows
@@ -134,8 +134,8 @@ The `escape` functions are all available with no features enabled.
performance of the `unescape` version is already pretty good, so I don’t
recommend enabling this unless you really need it.

-* `unescape`: provide normal version of `unescape()`. This will
-automatically enable the `entities` feature.
+* `unescape`: provide normal version of `unescape()`. Enabling this will add a
+dependency on [hashify] and may slow builds by a few seconds.

* `entities`: build `ENTITIES` map. Enabling this will add a dependency
on [phf] and may slow builds by a few seconds.
@@ -225,6 +225,7 @@ additional terms or conditions.
[`unescape_bytes_in()`]: https://docs.rs/htmlize/1.0.6/htmlize/fn.unescape_bytes_in.html
[`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html
[official WHATWG spec]: https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state
+[hashify]: https://crates.io/crates/hashify
[phf]: https://crates.io/crates/phf
[features]: https://docs.rs/htmlize/1.0.6/htmlize/index.html#features
[iai]: https://crates.io/crates/iai
4 changes: 2 additions & 2 deletions benches/unescape.rs
@@ -46,7 +46,7 @@ fn benchmarks(c: &mut Criterion) {
util::benchmark_name!(
group,
"map",
-(Phf, ContextGeneral),
+(Map, ContextGeneral),
&name,
&input
);
@@ -70,7 +70,7 @@ fn benchmarks(c: &mut Criterion) {
util::benchmark_name!(
group,
"map",
-(Phf, ContextAttribute),
+(Map, ContextAttribute),
&name,
&input
);
4 changes: 2 additions & 2 deletions benches/unescape_iai.rs
@@ -16,12 +16,12 @@ macro_rules! iai_benchmarks {
$(
#[cfg(feature = "unescape")]
fn [<iai_map_unescape_ $name>]() -> Cow<'static, str> {
-unescape_in((Phf, ContextGeneral), black_box($input))
+unescape_in((Map, ContextGeneral), black_box($input))
}

#[cfg(feature = "unescape")]
fn [<iai_map_unescape_attribute_ $name>]() -> Cow<'static, str> {
-unescape_in((Phf, ContextAttribute), black_box($input))
+unescape_in((Map, ContextAttribute), black_box($input))
}

#[cfg(feature = "unescape_fast")]
116 changes: 99 additions & 17 deletions build.rs
@@ -11,22 +11,31 @@
//! }

fn main() {
-#[cfg(any(feature = "unescape_fast", feature = "entities"))]
+#[cfg(any(
+feature = "unescape_fast",
+feature = "unescape",
+feature = "entities"
+))]
let entities = load_entities("entities.json");

#[cfg(feature = "unescape_fast")]
generate_matcher_rs(&entities);

+#[cfg(feature = "unescape")]
+generate_unescape_entity_rs(&entities);
+
+#[cfg(any(feature = "unescape", feature = "entities"))]
+generate_entities_length_rs(&entities);
+
#[cfg(feature = "entities")]
generate_entities_rs(&entities);
}

/// Generate entities.rs file containing all valid HTML entities in a
-/// [`phf::Map`] along with a few useful constants. It also generates
-/// documentation with all entities in a table.
+/// [`phf::Map`]. It also generates documentation with a table of all the
+/// entities and their expansions.
#[cfg(feature = "entities")]
fn generate_entities_rs(entities: &[(String, String)]) {
-use std::cmp::{max, min};
use std::env;
use std::fs::File;
use std::io::{BufWriter, Write};
@@ -52,16 +61,8 @@ fn generate_entities_rs(entities: &[(String, String)]) {
/// -------------------------------|--------------------|------").unwrap();

let mut map_builder = phf_codegen::Map::<&[u8]>::new();
-let mut max_len: usize = 0;
-let mut min_len: usize = usize::MAX;
-let mut bare_max_len: usize = 0;
for (name, glyph) in entities {
map_builder.entry(name.as_bytes(), &format!("&{:?}", glyph.as_bytes()));
-max_len = max(max_len, name.len());
-min_len = min(min_len, name.len());
-if !name.ends_with(';') {
-bare_max_len = max(bare_max_len, name.len());
-}

// `{:28}` would pad the output inside the backticks.
let name = format!("`{name}`");
@@ -84,13 +85,42 @@ fn generate_entities_rs(entities: &[(String, String)]) {
writeln!(out, "/// {name:30} | {codepoints:18} | {glyph}",).unwrap();
}

-let map = map_builder.build();
+writeln!(out, "#[allow(clippy::unreadable_literal)]").unwrap();
+writeln!(
+out,
+"pub static ENTITIES: phf::Map<&[u8], &[u8]> = {};",
+map_builder.build()
+)
+.unwrap();
+}
+
+/// Generate `entities_length.rs` file containing constants with the minimum
+/// and maximum entity lengths.
+#[cfg(any(feature = "unescape", feature = "entities"))]
+fn generate_entities_length_rs(entities: &[(String, String)]) {
+use std::cmp::{max, min};
+use std::env;
+use std::fs::File;
+use std::io::{BufWriter, Write};
+use std::path::Path;
+
+let out_path =
+Path::new(&env::var("OUT_DIR").unwrap()).join("entities_length.rs");
+let mut out = BufWriter::new(File::create(out_path).unwrap());
+
+let mut max_len: usize = 0;
+let mut min_len: usize = usize::MAX;
+let mut bare_max_len: usize = 0;
+for (name, _) in entities {
+max_len = max(max_len, name.len());
+min_len = min(min_len, name.len());
+if !name.ends_with(';') {
+bare_max_len = max(bare_max_len, name.len());
+}
+}
writeln!(
out,
"\
-#[allow(clippy::unreadable_literal)]\n\
-pub static ENTITIES: phf::Map<&[u8], &[u8]> = {map};\n\
-\n\
/// Length of longest entity including ‘&’ and possibly ‘;’.\n\
pub const ENTITY_MAX_LENGTH: usize = {max_len};\n\
\n\
@@ -103,6 +133,54 @@ fn generate_entities_rs(entities: &[(String, String)]) {
.unwrap();
}

+/// Generate `expand_entity.rs` file containing a function that maps entity byte
+/// strings to their expansions.
+#[cfg(feature = "unescape")]
+fn generate_unescape_entity_rs(entities: &[(String, String)]) {
+use std::env;
+use std::fs::File;
+use std::io::{BufWriter, Write};
+use std::path::Path;
+
+let out_path =
+Path::new(&env::var("OUT_DIR").unwrap()).join("expand_entity.rs");
+let mut out = BufWriter::new(File::create(out_path).unwrap());
+
+writeln!(
+out,
+"\
+/// Get expansion or `None` for a candidate HTML entity byte string.\n\
+#[must_use]\n\
+#[allow(clippy::too_many_lines)]\n\
+fn expand_entity(candidate: &[u8]) -> Option<&[u8]> {{\n\
+hashify::map! {{\n\
+candidate,\n\
+&[u8],"
+)
+.unwrap();
+
+for (name, glyph) in entities {
+write!(
+out,
+"\n\
+b\"{name}\" => &["
+)
+.unwrap();
+for &byte in glyph.as_bytes() {
+write!(out, "{byte},").unwrap();
+}
+write!(out, "],").unwrap();
+}
+
+writeln!(
+out,
+"\n\
+}}\n\
+}}"
+)
+.unwrap();
+}
+
/// Generated matcher.rs file containing a function `entity_matcher()` that is
/// basically just a giant nested tree of `match` expressions to check if the
/// next bytes in an iterator are an HTML entity.
@@ -127,7 +205,11 @@ fn generate_matcher_rs(entities: &[(String, String)]) {
}

/// Load HTML entities as `vec![...("&gt;", ">")...]`.
-#[cfg(any(feature = "unescape_fast", feature = "entities"))]
+#[cfg(any(
+feature = "unescape_fast",
+feature = "unescape",
+feature = "entities"
+))]
fn load_entities<P: AsRef<std::path::Path>>(path: P) -> Vec<(String, String)> {
let input = std::fs::read(path.as_ref()).unwrap();
let input: serde_json::Map<String, serde_json::Value> =
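The new `generate_unescape_entity_rs()` turns each entity into one `b"name" => &[bytes],` arm of a `hashify::map!` invocation. The std-only sketch below reproduces only that text-building step, so the generated arms can be inspected without running a build script. The helper name `map_arms` and the two-entity input are hypothetical, and hashify itself is not used.

```rust
use std::fmt::Write;

/// Build the arms of a `hashify::map!`-style body: one arm per entity,
/// mapping the entity name (as a byte string literal) to the decimal
/// values of its UTF-8 expansion bytes.
fn map_arms(entities: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (name, glyph) in entities {
        write!(out, "\nb\"{name}\" => &[").unwrap();
        for byte in glyph.as_bytes() {
            write!(out, "{byte},").unwrap();
        }
        out.push_str("],");
    }
    out
}

fn main() {
    // Hypothetical subset; the real build script reads entities.json.
    let arms = map_arms(&[("&gt;", ">"), ("&times", "×")]);
    println!("{arms}");
}
```

Writing each expansion as a list of byte values, rather than as a string literal, means non-ASCII glyphs such as `×` (bytes 195 and 151) need no escaping in the generated source.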
5 changes: 0 additions & 5 deletions src/entities.rs

This file was deleted.

22 changes: 18 additions & 4 deletions src/lib.rs
@@ -67,8 +67,8 @@ assert!(htmlize::unescape("3 &times 4 &gt; 10") == "3 × 4 > 10");
//! performance of the `unescape` version is already pretty good, so I
//! don’t recommend enabling this unless you really need it.
//!
-//! * `unescape`: provide normal version of [`unescape()`]. This will
-//! automatically enable the `entities` feature.
+//! * `unescape`: provide normal version of [`unescape()`]. Enabling this will
+//! add a dependency on [hashify] and may slow builds by a few seconds.
//!
//! * `entities`: build [`ENTITIES`] map. Enabling this will add a dependency
//! on [phf] and may slow builds by a few seconds.
@@ -89,10 +89,11 @@ assert!(htmlize::unescape("3 &times 4 &gt; 10") == "3 × 4 > 10");
//!
//! # Minimum supported Rust version
//!
-//! Currently the minimum supported Rust version (MSRV) is **1.60**. Future
+//! Currently the minimum supported Rust version (MSRV) is **1.74.1**. Future
//! increases in the MSRV will require a major version bump.
//!
//! [official WHATWG spec]: https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state
+//! [hashify]: https://crates.io/crates/hashify
//! [phf]: https://crates.io/crates/phf
//! [iai]: https://crates.io/crates/iai
//! [benchmarks]: https://github.com/danielparks/htmlize#benchmarks
Expand Down Expand Up @@ -133,9 +134,22 @@ feature! {
pub use unescape::*;
}

+feature! {
+#![any(feature = "unescape", feature = "entities")]
+
+/// For some reason `rustdoc` doesn’t show the feature flags without `mod`.
+mod entities_length {
+include!(concat!(env!("OUT_DIR"), "/entities_length.rs"));
+}
+pub use entities_length::*;
+}
+
feature! {
#![feature = "entities"]

-mod entities;
+/// For some reason `rustdoc` doesn’t show the feature flags without `mod`.
+mod entities {
+include!(concat!(env!("OUT_DIR"), "/entities.rs"));
+}
pub use entities::*;
}
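With this change, the constants from `entities_length.rs` are available whenever `unescape` or `entities` is enabled. The sketch below shows one way such constants can bound an entity lookup, assuming a longest-match strategy. It is an illustration, not htmlize's actual unescape loop. The constant values, the four-entry table, and the `match`-based `expand_entity()` are hypothetical; that function stands in for the `hashify::map!` function that build.rs generates.

```rust
// Hypothetical stand-ins for the generated length constants, chosen to fit
// the tiny table below (the real generated values differ).
const ENTITY_MIN_LENGTH: usize = 3;
const ENTITY_MAX_LENGTH: usize = 7;

/// Stand-in for the generated `expand_entity()`: a plain `match` over four
/// entities instead of a `hashify::map!` over the full WHATWG list.
fn expand_entity(candidate: &[u8]) -> Option<&'static [u8]> {
    match candidate {
        b"&gt;" => Some(">".as_bytes()),
        b"&gt" => Some(">".as_bytes()),
        b"&times;" => Some("×".as_bytes()),
        b"&times" => Some("×".as_bytes()),
        _ => None,
    }
}

/// Find the longest entity at the start of `input`, trying candidate lengths
/// from ENTITY_MAX_LENGTH (capped at the input length) down to
/// ENTITY_MIN_LENGTH. Returns the matched length and the expansion bytes.
fn longest_entity(input: &[u8]) -> Option<(usize, &'static [u8])> {
    let longest = ENTITY_MAX_LENGTH.min(input.len());
    (ENTITY_MIN_LENGTH..=longest)
        .rev()
        .find_map(|len| expand_entity(&input[..len]).map(|expansion| (len, expansion)))
}

fn main() {
    let input = b"&times 4 &gt; 10";
    if let Some((len, expansion)) = longest_entity(input) {
        println!("{len} bytes -> {}", String::from_utf8_lossy(expansion));
    }
}
```

On `&times 4 &gt; 10` it reports a 6-byte match that expands to `×`, because `&times ` (with the trailing space) is not an entity. Capping candidates at the maximum length keeps each lookup to a fixed, small number of probes.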