4 changes: 3 additions & 1 deletion Makefile
@@ -19,7 +19,9 @@ test: malayalam.a python

coverage-analysis: malayalam.a python
@python tests/coverage-test.py
sort -u tests/unanalyzed.lex --output=tests/unanalyzed.lex
wc tests/unanalyzed.lex

dataset:
pip install tqdm
python scripts/create-dataset.py
python scripts/create-dataset.py
10 changes: 10 additions & 0 deletions README.md
@@ -126,6 +126,16 @@ The analyser is being developed with lot of tests. To run tests :
$ make test
```

```bash
$ make coverage-analysis
```
runs the coverage tests and writes an `unanalyzed.lex` file under `tests/` listing the words the analyser could not analyse.
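The unanalyzed-word collection can be sketched as follows. This is a hypothetical simplification, not the project's actual code (which lives in `tests/coverage-test.py`); the `analyse` stub here just treats any word containing a Latin vowel as "analysed" for illustration.

```python
def analyse(word):
    # Stand-in for mlmorph's analyser: returns a non-empty list
    # when the word is "analysable" (here: contains a vowel).
    return [word] if any(c in "aeiou" for c in word) else []

def collect_unanalyzed(words, out_path):
    # Keep the words the analyser produced no analysis for,
    # and write them to out_path, one word per line.
    unanalyzed = [w for w in words if len(analyse(w)) == 0]
    with open(out_path, "w") as f:
        for w in unanalyzed:
            f.write(w + "\n")
    return unanalyzed

collect_unanalyzed(["malayalam", "xyz", "test"], "unanalyzed.lex")  # → ['xyz']
```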

## Dataset
```bash
$ make dataset
```
creates a `.csv` file with the words gathered from the `tests/coverage/*.txt` files.
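A minimal sketch of what this step does — the real logic is in `scripts/create-dataset.py`; the function name and the single-column CSV layout here are assumptions for illustration only.

```python
import csv
import glob
import os

def create_dataset(coverage_dir, out_csv):
    # Gather every whitespace-separated word from the coverage
    # corpora and write them out as one-column CSV rows.
    words = []
    for path in sorted(glob.glob(os.path.join(coverage_dir, "*.txt"))):
        with open(path) as f:
            for line in f:
                words.extend(line.split())
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word"])        # hypothetical header
        writer.writerows([w] for w in words)
    return len(words)
```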

## Citation

Please cite the following publication in order to refer to the mlmorph:
9 changes: 6 additions & 3 deletions tests/coverage-test.py
@@ -29,8 +29,9 @@ def test_total_coverage(self):
start = clock()
print("%40s\t%8s\t%8s\t%s" %
('File name', 'Words', 'Analysed', 'Percentage'))
for filename in glob.glob(os.path.join(CURR_DIR, "coverage", "*.txt")):
with open(filename, 'r') as file:
with open(os.path.join(CURR_DIR, "unanalyzed.lex"), "w") as unanFile:
for filename in glob.glob(os.path.join(CURR_DIR, "coverage", "*.txt")):
with open(filename, 'r') as file:
tokens_count = 0
analysed_tokens_count = 0
for line in file:
@@ -41,12 +42,14 @@
analysis = self.analyser.analyse(word, False)
if len(analysis) > 0:
analysed_tokens_count += 1
else:
unanFile.write(word+"\n")
percentage = (analysed_tokens_count/tokens_count)*100
total_tokens_count += tokens_count
total_analysed_tokens_count += analysed_tokens_count
print("%40s\t%8d\t%8d\t%3.2f%%" % (os.path.basename(
filename), tokens_count, analysed_tokens_count, percentage))
file.close()
file.close()
percentage = (total_analysed_tokens_count/total_tokens_count)*100
time_taken = clock() - start
print('%40s\t%8d\t%8d\t%3.2f%%' %