Merged
Commits
41 commits
f87e095
fix a possible crash when retrieving the name of a character without …
Aethor Jul 12, 2025
8c5c5cb
fix a potential crash when computing layout of an empty network
Aethor Jul 12, 2025
0dc8394
WIP: work on renard gradio ui
Aethor Jul 12, 2025
1e1a0db
upgrade dependencies
Aethor Jul 18, 2025
c436aa3
Add a simple Makefile for running common tasks
Aethor Jul 18, 2025
001f2c1
support for dynamic parameter in renard.ui
Aethor Jul 18, 2025
9fc5fd1
update Makefile to include correct uv dep groups
Aethor Jul 18, 2025
ead4a89
add param description in renard.ui
Aethor Jul 18, 2025
8051098
greatly improve ui
Aethor Jul 19, 2025
eb49c10
ui is now an extra instead of a dependency group
Aethor Jul 19, 2025
781377e
fix a crash in renard.ui in case of empty graph
Aethor Jul 19, 2025
3891ff3
Add a mechanism to load example novels, add Pride and Prejudice from PG
Aethor Jul 21, 2025
3563dc0
fix the incorrect target `make ui`
Aethor Jul 21, 2025
41540ee
add a predefined example in Renard ui
Aethor Jul 21, 2025
209f34f
fix an issue where it was impossible to upload file in renard ui
Aethor Jul 21, 2025
e42a1fa
limit Pride&Prejudice example in the UI to 10 chapters
Aethor Jul 21, 2025
3d5290c
change Pride&Prejudice text_area label to reflect the content
Aethor Jul 21, 2025
2bc2212
correctly indicate that ConversationalGraphExtractor supports any lang
Aethor Sep 13, 2025
7cb79f8
support for AMD ROCm
Aethor Sep 14, 2025
595cc41
relation extraction prototype
Aethor Sep 14, 2025
6164dae
add `edge_label_kwargs` interface to :meth:`.PipelineState.plot_graph`
Aethor Sep 14, 2025
bbae55a
update install instructions in the README to add cuda/rocm builds
Aethor Oct 11, 2025
ecd793a
update deprecated character_extractor_kwargs argument for preconfigur…
Aethor Oct 11, 2025
da1e9ea
add T5RelationExtractor GPU support
Aethor Oct 11, 2025
31b8161
progress towards relation extraction
Aethor Dec 19, 2025
d5ff35d
WIP: moving toward use of t5gemma for relation extraction
Aethor Dec 26, 2025
079fb46
fix training for relation extraction model
Aethor Dec 28, 2025
dd6e0cf
fix sentiment analysis related tests
Aethor Dec 28, 2025
bc38e6f
fix the docstring of BlockBounds to update obsolete information
Aethor Dec 28, 2025
b2d65c0
ConversationalGraphExtractor now supports dynamic networks
Aethor Dec 28, 2025
091e329
update T5RelationExtractor to GenerativeRelationExtractor in preconfi…
Aethor Dec 28, 2025
9b3a750
change test flags names
Aethor Dec 28, 2025
d8f80ac
supply a default conversation_dist value for the preconfigured conver…
Aethor Dec 28, 2025
1bb2ac8
Add cpu and rocm64 extra
Aethor Jan 5, 2026
0a0bab0
improve format for relation extraction
Aethor Jan 5, 2026
359d152
do not use additional special tokens for relation extraction
Aethor Jan 6, 2026
fdf59b7
Add support for 'fra' language in ui
Aethor Jan 6, 2026
b7d3f4d
remove wrong import causing crash from renard.ui
Aethor Jan 6, 2026
2d295ed
always manually add special tokens when loading relation extraction d…
Aethor Jan 7, 2026
52eaada
update docs to indicate the in-dev status of GenerativeRelationExtractor
Aethor Jan 7, 2026
72c0423
bump version
Aethor Jan 7, 2026
7 changes: 7 additions & 0 deletions Makefile
@@ -0,0 +1,7 @@
.PHONY: test
test:
uv run --group dev python -m pytest tests

.PHONY: ui
ui:
uv run --extra ui python -m renard.ui
29 changes: 26 additions & 3 deletions README.md
@@ -9,11 +9,16 @@ Renard (Relationship Extraction from NARrative Documents) is a library for creat

# Installation

You can install the latest version using pip:
Currently, Renard supports Python>=3.9,<=3.12. You can install the
latest version using pip:

> pip install renard-pipeline

Currently, Renard supports Python>=3.9,<=3.12
If you have a GPU, there are accelerated versions for Nvidia CUDA and
AMD ROCm:

> pip install renard-pipeline[cuda128]
> pip install renard-pipeline[rocm63]


# Documentation
@@ -59,7 +64,25 @@ For more information, see `renard_tutorial.py`, which is a tutorial in the `jupy

> uv run python -m pytest tests

Expensive tests are disabled by default. These can be run by setting the environment variable `RENARD_TEST_ALL` to `1`.
Alternatively, the project Makefile has a test target:

> make test

Expensive tests are disabled by default. These can be run by setting the environment variable `RENARD_TEST_SLOW` to `1`.



# Renard UI

Since version 0.7, Renard has a web interface powered by Gradio. First, install the additional dependencies (`ui` is an extra, not a dependency group):

> uv sync --extra ui

Then, simply run:

> make ui

And open your browser at http://127.0.0.1:7860


# Contributing
10 changes: 4 additions & 6 deletions docs/contributing.rst
@@ -7,7 +7,7 @@ issue if you encounter a problem or want to discuss a specific
feature. If you want to contribute a patch:

1. Check that your code matches our code quality guidelines and that
all existing tests are passing with ``RENARD_TEST_ALL=1``.
all existing tests are passing with ``RENARD_TEST_SLOW=1``.
2. Create a GitHub pull request with your patch, explaining the
rationale behind it and giving a high level overview of your
code. Mention the relevant issue if applicable.
@@ -36,9 +36,7 @@ the ``tests`` directory. We use ``pytest`` to test code, and also use
``hypothesis`` when applicable. If you open a patch, make sure that
all tests are passing. In particular, do not rely on the CI, as it
does not run time costly tests! Check for yourself locally, using
``RENARD_TEST_ALL=1 python -m pytest tests``. Note that there are
``RENARD_TEST_SLOW=1 python -m pytest tests``. Note that there are
specific tests and environment variables for optional dependencies such
as *stanza* (``RENARD_TEST_STANZA_OPTDEP``). These must be explicitly
set to ``1`` if you want to test optional dependencies, as
``RENARD_TEST_ALL=1`` does not enable test on these optional
dependencies.
as *stanza* (``RENARD_TEST_OPTDEP_STANZA``). These must be explicitly
set to ``1`` if you want to test optional dependencies.
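
How a test could be gated on these variables can be sketched as follows. This is only an illustration built on the standard library's ``unittest``: the helper ``env_flag`` and the test class are hypothetical, and Renard's own test suite (which uses ``pytest``) may be organised differently.

```python
import os
import unittest


def env_flag(name: str) -> bool:
    """Return True only if the environment variable is set to exactly "1"."""
    return os.environ.get(name) == "1"


class ExampleTests(unittest.TestCase):
    # Hypothetical slow test: skipped unless RENARD_TEST_SLOW=1.
    @unittest.skipUnless(env_flag("RENARD_TEST_SLOW"), "slow tests disabled")
    def test_slow(self):
        self.assertTrue(True)

    # Hypothetical optional-dependency test: it needs its own flag,
    # independently of RENARD_TEST_SLOW.
    @unittest.skipUnless(
        env_flag("RENARD_TEST_OPTDEP_STANZA"), "stanza tests disabled"
    )
    def test_stanza(self):
        self.assertTrue(True)
```

With this layout, each flag enables only its own group of tests, which matches the requirement above that optional-dependency flags be set explicitly.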
6 changes: 5 additions & 1 deletion docs/installation.rst
@@ -5,7 +5,11 @@ Installation
Using Pip
=========

Simply use ``pip install renard-pipeline``.
For the simplest case, use ``pip install renard-pipeline``. By default, this installs the CPU version of PyTorch. If you want GPU support to accelerate inference:

- CUDA 12.8: ``pip install renard-pipeline[cuda128]``
- ROCm 6.3: ``pip install renard-pipeline[rocm63]``
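
After installing one of these variants, it can be useful to check which PyTorch build was actually picked up. The snippet below is a small sketch, not part of Renard: the function name ``describe_torch_build`` is hypothetical, while ``torch.version.cuda`` and ``torch.version.hip`` are attributes exposed by PyTorch itself.

```python
def describe_torch_build() -> str:
    """Describe the installed PyTorch build (CPU, CUDA or ROCm)."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.version.hip is set on ROCm builds and None otherwise.
    if getattr(torch.version, "hip", None):
        return f"ROCm build (HIP {torch.version.hip})"
    # torch.version.cuda is set on CUDA builds and None on CPU builds.
    if getattr(torch.version, "cuda", None):
        return f"CUDA build (CUDA {torch.version.cuda})"
    return "CPU build"


print(describe_torch_build())
```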


Note that for some modules, you might need to install additional
libraries:
65 changes: 44 additions & 21 deletions docs/pipeline.rst
@@ -75,17 +75,22 @@ For simplicity, one can use one of the preconfigured pipelines:

.. code-block:: python

from renard.pipeline.preconfigured import bert_pipeline
from renard.pipeline.preconfigured import co_occurrence_pipeline

with open("./my_doc.txt") as f:
text = f.read()

pipeline = bert_pipeline(
graph_extractor_kwargs={"co_occurrences_dist": (1, "sentences")}
)
pipeline = co_occurrence_pipeline()
out = pipeline(text)


The following preconfigured pipelines are available:

- :func:`.co_occurrence_pipeline`
- :func:`.conversational_pipeline`
- :func:`.relational_pipeline`


Pipeline Output: the Pipeline State
===================================

@@ -137,7 +142,7 @@ Tokenization
Tokenization is the task of cutting text into *tokens*. It is usually
the first task to apply to a text. Two tokenizers are available:

- :class:`.NLTKTokenizer`
- :class:`.NLTKTokenizer` is the tokenizer from NLTK.
- :class:`.StanfordCoreNLPPipeline` does contain a tokenizer as part
of its full NLP pipeline.

@@ -148,16 +153,19 @@ Named Entity Recognition
Named entity recognition (NER) detects entity occurrences in the
text. Three modules are available:

- :class:`.NLTKNamedEntityRecognizer`
- :class:`.BertNamedEntityRecognizer`
- :class:`.NLTKNamedEntityRecognizer` is a lightweight NER module from
NLTK, based on POS tagging and rules.
- :class:`.BertNamedEntityRecognizer` is a NER module employing a
finetuned BERT model.
- :class:`.StanfordCoreNLPPipeline` contains a NER model as part of
its full NLP pipeline.


Coreference Resolution
----------------------

- :class:`.SpacyCorefereeCoreferenceResolver`
- :class:`.SpacyCorefereeCoreferenceResolver` uses the spacy coreferee
module.
- :class:`.BertCoreferenceResolver`, using the Tibert library.
- :class:`.StanfordCoreNLPPipeline` can execute a coreference
resolution model as part of its pipeline.
@@ -166,14 +174,14 @@ Coreference Resolution
Quote Detection
---------------

- :class:`.QuoteDetector`
- :class:`.QuoteDetector` detects quotes using simple logic.


Sentiment Analysis
------------------

- :class:`.NLTKSentimentAnalyzer` leverages NLTK's Vader for sentiment
analysis
analysis.


Characters Extraction
@@ -183,21 +191,36 @@ Characters extraction (or alias resolution) extracts characters from
occurrences detected using NER. This is done by assigning each mention
to a unique character.

- :class:`.NaiveCharacterUnifier`
- :class:`.GraphRulesCharacterUnifier`
- :class:`.NaiveCharacterUnifier` assigns each mention with a unique
form to a character.
- :class:`.GraphRulesCharacterUnifier` uses a set of rules to assign
each mention to a character.
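
The naive strategy described above (one character per unique mention form) can be sketched in a few lines. This is a simplified illustration and not the code of :class:`.NaiveCharacterUnifier`: the ``Mention`` tuple and the ``naive_unify`` function are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mention representation: (surface form, token index).
Mention = Tuple[str, int]


def naive_unify(mentions: List[Mention]) -> Dict[str, List[Mention]]:
    """Group mentions by exact surface form: one character per form."""
    characters: Dict[str, List[Mention]] = defaultdict(list)
    for form, index in mentions:
        characters[form].append((form, index))
    return dict(characters)


mentions = [("Elizabeth", 0), ("Darcy", 5), ("Elizabeth", 9)]
print(naive_unify(mentions))
```

Under this scheme, "Elizabeth" and "Miss Bennet" would end up as two distinct characters, which is the kind of case a rule-based unifier is meant to handle.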


Relation Extraction
-------------------

- :class:`.GenerativeRelationExtractor` is currently in development
and should not be used.


Speaker Attribution
-------------------

- :class:`.BertSpeakerDetector`
- :class:`.BertSpeakerDetector` detects speakers using a finetuned BERT
model.


Graph Extraction
----------------

- :class:`.CoOccurrencesGraphExtractor`
- :class:`.ConversationalGraphExtractor`
- :class:`.CoOccurrencesGraphExtractor` extracts a graph of
co-occurrence between characters.
- :class:`.ConversationalGraphExtractor` extracts a conversational
graph, based either on conversations between characters or on
character mentions.
- :class:`.RelationalGraphExtractor` extracts a relational graph,
where the relation between each character is typed.
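
To make the idea of a co-occurrence graph concrete, here is a minimal sketch that counts how often two characters are mentioned close to each other. It is an assumption-laden illustration, not the implementation of :class:`.CoOccurrencesGraphExtractor`: the function name, the ``(name, token position)`` input format and the ``max_dist`` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def co_occurrence_edges(
    mentions: List[Tuple[str, int]], max_dist: int
) -> Counter:
    """Count character pairs mentioned within max_dist tokens of each other.

    Each mention is a (character name, token position) pair. The returned
    Counter maps an alphabetically sorted pair of names to an edge weight.
    """
    edges: Counter = Counter()
    for (name1, pos1), (name2, pos2) in combinations(mentions, 2):
        if name1 != name2 and abs(pos1 - pos2) <= max_dist:
            edges[tuple(sorted((name1, name2)))] += 1
    return edges


mentions = [("Elizabeth", 0), ("Darcy", 3), ("Jane", 40), ("Elizabeth", 42)]
print(co_occurrence_edges(mentions, max_dist=5))
```

The resulting weighted pairs can then be loaded into any graph library as weighted edges.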


Dynamic Graphs
@@ -240,8 +263,9 @@ When executing the above block of code, the output attribute
>>> out.character_network
[<networkx.classes.graph.Graph object at 0x7fd9e9115900>]

See :class:`.CoOccurrencesGraphExtractor` for more details on the
usage of the ``dynamic`` and ``dynamic_window`` arguments.
Both :class:`.CoOccurrencesGraphExtractor` and
:class:`.ConversationalGraphExtractor` support dynamic networks. See
their documentation for more details.

Plot and export functions work as one would expect
intuitively. :meth:`.PipelineState.plot_graph` allows visualizing the
@@ -255,10 +279,9 @@ dynamic graph to the Gephi format.
Custom Segmentation
-------------------

The ``dynamic_window`` parameter of
:class:`.CoOccurencesGraphExtractor` determines the segmentation of
the dynamic networks, in number of interactions. In the example above,
a new graph will be created for each 20 interactions.
The ``dynamic_window`` parameter determines the segmentation of the
dynamic networks, in number of interactions. In the example above, a
new graph will be created every 20 interactions.
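
The effect of a window expressed in number of interactions can be sketched as below. This is a simplified illustration that assumes non-overlapping windows. The ``split_by_window`` function and the ``Interaction`` type are hypothetical, and the actual graph extractors may segment interactions differently.

```python
from typing import List, Tuple

# Hypothetical interaction representation: a pair of character names.
Interaction = Tuple[str, str]


def split_by_window(
    interactions: List[Interaction], dynamic_window: int
) -> List[List[Interaction]]:
    """Split interactions into consecutive, non-overlapping windows."""
    if dynamic_window <= 0:
        raise ValueError("dynamic_window must be a positive integer")
    return [
        interactions[i : i + dynamic_window]
        for i in range(0, len(interactions), dynamic_window)
    ]


# 45 interactions with a window of 20: one graph would be built per chunk.
interactions = [("Elizabeth", "Darcy")] * 45
windows = split_by_window(interactions, dynamic_window=20)
print([len(w) for w in windows])
```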

While one can rely on the arguments of the graph extractor of the
pipeline to determine the dynamic window, Renard allows specifying a
95 changes: 80 additions & 15 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "renard-pipeline"
version = "0.6.5"
version = "0.7.0"
description = "Relationships Extraction from NARrative Documents"
authors = [
{name = "Arthur Amalvy", email = "arthur.amalvy@univ-avignon.fr"},
@@ -9,20 +9,23 @@ license = { text = "GPL-3.0-only" }
readme = "README.md"
requires-python = ">=3.9,<3.13"
dependencies = [
"torch>=2.0.0,!=2.0.1",
"transformers>=4.37",
"nltk>=3.9",
"tqdm>=4.62",
"networkx>=3.0",
"more-itertools>=10.5",
"nameparser>=1.1",
"matplotlib>=3.5",
"pandas>=2.0",
"pytest>=8.3.0",
"tibert>=0.5",
"grimbert>=0.1",
"datasets>=3.0",
"torch>=2.7.0",
"transformers>=4.57.1",
"nltk>=3.9.1",
"tqdm>=4.67.1",
"networkx>=3.2",
"more-itertools>=10.7",
"nameparser>=1.1.3",
"matplotlib>=3.9",
"pytest>=8.4.1",
"tibert>=0.5.2",
"grimbert>=0.1.5",
"datasets>=4.0.0",
"rank-bm25>=0.2.2",
"accelerate>=1.10.1",
"scikit-learn>=1.6.1",
"tiktoken>=0.12.0",
"protobuf>=6.33.2",
]

[build-system]
@@ -43,4 +46,66 @@ dev = [
"Sphinx>=4.3",
"sphinx-rtd-theme>=1.0.0",
"sphinx-autodoc-typehints>=1.12.0",
]
]

[project.optional-dependencies]
ui = [
"gradio>=4.44.1",
"pyvis>=0.3.2",
]
cpu = [
"torch>=2.7.1",
]
cuda128 = [
"torch>=2.7.1",
]
rocm63 = [
"torch>=2.7.1",
"pytorch-triton-rocm>=3.1.0",
]
rocm64 = [
"torch>=2.7.1",
"pytorch-triton-rocm>=3.1.0",
]

[tool.uv]
conflicts = [
[
{ extra = "cpu" },
{ extra = "cuda128" },
{ extra = "rocm63" },
{ extra = "rocm64" },
],
]

[tool.uv.sources]
torch = [
{ index = "pytorch-cpu", extra = "cpu" },
{ index = "pytorch-cuda128", extra = "cuda128" },
{ index = "pytorch-rocm63", extra = "rocm63" },
{ index = "pytorch-rocm64", extra = "rocm64" },
]
pytorch-triton-rocm = [
{ index = "pytorch-rocm63", extra = "rocm63" },
{ index = "pytorch-rocm64", extra = "rocm64" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cuda128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[[tool.uv.index]]
name = "pytorch-rocm63"
url = "https://download.pytorch.org/whl/rocm6.3"
explicit = true

[[tool.uv.index]]
name = "pytorch-rocm64"
url = "https://download.pytorch.org/whl/rocm6.4"
explicit = true
10 changes: 6 additions & 4 deletions renard/pipeline/character_unification.py
@@ -21,10 +21,14 @@ class Character:
mentions: List[Mention]
gender: Gender = Gender.UNKNOWN

def longest_name(self) -> str:
def longest_name(self) -> Optional[str]:
if len(self.names) == 0:
return None
return max(self.names, key=len)

def shortest_name(self) -> str:
def shortest_name(self) -> Optional[str]:
if len(self.names) == 0:
return None
return min(self.names, key=len)

def most_frequent_name(self) -> Optional[str]:
@@ -236,7 +240,6 @@ def __call__(

# * link nodes based on several rules
for name1, name2 in combinations(G.nodes(), 2):

# preprocess name when needed
pname1 = self._preprocess_name(name1)
pname2 = self._preprocess_name(name2)
@@ -294,7 +297,6 @@ def try_remove_edges(edges):
pass

for name1, name2 in combinations(G.nodes(), 2):

# preprocess names when needed
pname1 = self._preprocess_name(name1)
pname2 = self._preprocess_name(name2)
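
The first hunk of this file makes ``longest_name`` and ``shortest_name`` return ``None`` for a character without names, instead of letting ``max`` or ``min`` raise ``ValueError`` on an empty collection. The standalone sketch below reproduces that guard and shows how a caller might handle the ``None`` case. ``CharacterSketch`` and ``display_name`` are hypothetical names used for illustration only, and the real ``Character`` class has more fields.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass
class CharacterSketch:
    # Simplified stand-in for Character: only the names are kept.
    names: FrozenSet[str] = field(default_factory=frozenset)

    def longest_name(self) -> Optional[str]:
        # max() raises ValueError on an empty collection, hence the guard.
        if len(self.names) == 0:
            return None
        return max(self.names, key=len)

    def shortest_name(self) -> Optional[str]:
        if len(self.names) == 0:
            return None
        return min(self.names, key=len)


def display_name(character: CharacterSketch) -> str:
    """Callers now have to handle None, for example with a placeholder."""
    return character.longest_name() or "<unnamed>"


print(display_name(CharacterSketch(frozenset({"Liz", "Elizabeth"}))))
print(display_name(CharacterSketch()))
```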