feat(medcat): CU-869b9h7y6 Add faster linker #243

mart-r · 2025-11-25T16:15:43Z

This PR adds a faster linker to the mix.

This faster linker (primary_name_only_linker) is designed to link a name only if:
a) there is exactly one suitable concept, or
b) there is exactly one concept that considers the name a primary name.
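
The rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the two conditions are alternatives checked in order, and the function name `link_primary_name_only` and the `candidates` / `primary_names` structures are hypothetical stand-ins for whatever the concept database actually provides.

```python
from typing import Dict, List, Optional, Set


def link_primary_name_only(
    name: str,
    candidates: List[str],
    primary_names: Dict[str, Set[str]],
) -> Optional[str]:
    """Return a CUI for `name`, or None if the name is left unlinked.

    candidates:    hypothetical list of CUIs that `name` could refer to
    primary_names: hypothetical map of CUI -> names it treats as primary
    """
    # a) Only one suitable concept: nothing to disambiguate.
    if len(candidates) == 1:
        return candidates[0]
    # b) Several candidates, but exactly one treats this as a primary name.
    primary = [cui for cui in candidates
               if name in primary_names.get(cui, set())]
    if len(primary) == 1:
        return primary[0]
    # Otherwise disambiguation would be needed, so skip the name.
    return None
```

Names that fall through both checks are simply not linked, which is consistent with the recall drop reported in the results.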

This results in faster linking, but it is also likely to reduce model performance in cases where disambiguation is needed.

I ran a few performance / speed tests to look at the throughput and performance tradeoffs:

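| Dataset | Tokenizer | Linker | Precision | Recall | F1 | Time (s) |
|---|---|---|---|---|---|---|
| COMETA | spaCy | Vector context | 0.9245 | 0.4521 | 0.6072 | 68.16 |
| COMETA | spaCy | Faster linker | 0.9266 | 0.4225 | 0.5804 | 51.64 |
| COMETA | Regex | Vector context | 0.9130 | 0.4136 | 0.5693 | 30.54 |
| COMETA | Regex | Faster linker | 0.9205 | 0.4108 | 0.5681 | 6.21 |
| 2023 Linking Challenge | spaCy | Vector context | 0.5353 | 0.3337 | 0.4112 | 75.40 |
| 2023 Linking Challenge | spaCy | Faster linker | 0.5934 | 0.2873 | 0.3871 | 48.05 |
| 2023 Linking Challenge | Regex | Vector context | 0.4522 | 0.3162 | 0.3722 | 117.55 |
| 2023 Linking Challenge | Regex | Faster linker | 0.5091 | 0.2862 | 0.3664 | 82.61 |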

As we can see, for the COMETA dataset, there's a clear benefit in running the faster components (tested for both the regex tokenizer and this new faster linker). You can improve throughput by an order of magnitude! And the performance cost isn't that big (a drop of up to around 10% in recall, with precision essentially unchanged).
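
For reference, the COMETA tradeoff can be recomputed from the reported numbers with a few lines of Python. The helper names `speedup` and `relative_drop` are just illustrative; the inputs are the spaCy + vector context and regex + faster linker rows.

```python
def speedup(baseline_s: float, faster_s: float) -> float:
    """How many times faster the second configuration is."""
    return baseline_s / faster_s


def relative_drop(baseline: float, new: float) -> float:
    """Fractional decrease of a metric relative to the baseline."""
    return (baseline - new) / baseline


# COMETA: spaCy + vector context vs. regex + faster linker
cometa_speedup = speedup(68.16, 6.21)
cometa_recall_drop = relative_drop(0.4521, 0.4108)

print(f"speedup: {cometa_speedup:.1f}x")
print(f"relative recall drop: {cometa_recall_drop:.1%}")
```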

However, the Linking Challenge dataset shows that the situation is quite a bit more nuanced. In this case, the regex tokenizer results in slower execution than its spaCy counterpart. I'm not entirely sure what the underlying cause is here (because the regex tokenizer creates around 25% fewer tokens across the dataset). But it's a good example of having to tailor the config to the specific use case.

tomolopolis · 2025-11-25T16:15:48Z

Task linked: CU-869b9h7y6 Add simple/fast linker

mart-r added 4 commits November 25, 2025 13:18

CU-869b9h7y6: Add faster linker that only links to primary names

98e64e1

CU-869b9h7y6: Remove debug output

d72b4f9

CU-869b9h7y6: Add proper filtering as well as usage of single-possible CUI options

0839a24

CU-869b9h7y6: Add a simple test for the new linker

48396af
