
Conversation

@c-martinez
Contributor

Alternative way of converting counts to probabilities (as suggested in #13).

In the following example, entities such as Chiliometrum, Magnum opus, Litus, Occidens, and Septentrio are assigned a probability of 1.0 under the 'simple' count conversion, but receive different probabilities under the 'wilson' method.

import re
from semanticizest import Semanticizer

sem = Semanticizer('la.model', score='wilson', wilson_confidence=0.95)

text = """Area 389.434 km² Naxos est maxima Cycladum insula. Insulae orientali sunt litora ardua, in occidentem versus loca planiora patent, a septentrionibus ad meridiem montes granitici insulam transeunt, qui usque ad 1000 metra surgunt; quorum summa cacumina sunt Mons Iovis et Coronus."""
toks = re.findall(r'\w+', text)

for cand in sem.all_candidates(toks):
   print(cand)

However, the 'wilson' scoring is significantly slower than 'simple'.
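For context, the difference between the two scores can be sketched as follows. This is a minimal illustration of the Wilson score lower bound, not the PR's actual implementation; the function name `wilson_lower` is hypothetical, and the standard-library `statistics.NormalDist` is used here for the z-value where the implementation presumably relies on scipy.

```python
from statistics import NormalDist

def wilson_lower(count, total, confidence=0.95):
    """Lower bound of the Wilson score interval for count/total.

    Hypothetical helper: 'simple' scoring would just return count/total,
    which is 1.0 whenever count == total, regardless of how small total is.
    """
    if total == 0:
        return 0.0
    # Two-sided z-value for the given confidence (~1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = count / float(total)
    denom = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * ((phat * (1 - phat) + z * z / (4 * total)) / total) ** 0.5
    return (centre - margin) / denom

# An entity seen 5 times out of 5 no longer scores a flat 1.0,
# while the same ratio with far more evidence scores much higher.
print(wilson_lower(5, 5))      # roughly 0.57
print(wilson_lower(500, 500))  # much closer to 1.0
```

This is why the low-count entities mentioned above drop below 1.0: the lower bound penalises estimates backed by little evidence.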

Contributor

Should add scipy to requirements.txt.

