feat: Update SS with evals for Grade 5-12 #5

Merged
adnanrhussain merged 1 commit into main from ahussain/sentence_structure_all_grades on Feb 19, 2026

@adnanrhussain adnanrhussain commented Jan 10, 2026

Summary

Extends the Sentence Structure Evaluator (notebook) to support grades 5-12, beyond the existing grades 3-4 support.

Key Changes

  • Prompts: Added grade-specific rubrics for grades 5-12 with new sentence complexity definitions
  • Output Model: Extended with 7 new fields for advanced sentence analysis
  • Helper Function: Added get_rubric_for_grade() to dynamically select appropriate rubric
  • Testing: Added comprehensive test suite with multi-attempt validation
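
The helper's implementation is not shown in this PR. As a rough sketch of how a grade-to-rubric lookup like this could work, the following assumes one rubric per grade band; the constant names, the band boundaries, and the signature are all hypothetical. How the real helper treats grades outside these bands (the test script includes a grade 2 case) is not shown here.

```python
# Hypothetical placeholders; the real rubric prompt text lives in the notebook.
RUBRIC_GRADES_3_4 = "<rubric prompt for grades 3-4>"
RUBRIC_GRADES_5_12 = "<rubric prompt for grades 5-12>"

# (lowest grade, highest grade, rubric) bands. Boundaries are assumed
# from the PR summary, not taken from the actual code.
_RUBRIC_BANDS = [
    (3, 4, RUBRIC_GRADES_3_4),
    (5, 12, RUBRIC_GRADES_5_12),
]


def get_rubric_for_grade(grade: int) -> str:
    """Return the rubric whose grade band contains `grade`."""
    for low, high, rubric in _RUBRIC_BANDS:
        if low <= grade <= high:
            return rubric
    # Assumption: fail loudly instead of silently falling back to a default.
    raise ValueError(f"No rubric defined for grade {grade}")
```

Keeping the bands in a table means a new grade range only needs a new row rather than another branch.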

Documentation

  • Internally depends on PR-419 for documentation

Testing

Five test cases cover grades 2-6, each with an expected complexity score. Each case is run up to 5 times and stops at the first attempt that matches the expected score.
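
Because the script stops at the first match, a pass means "matched at least once". If a stricter consistency measure is wanted, one option is to run a fixed number of attempts and summarize how often they agree. A minimal sketch, where `summarize_attempts` is a hypothetical helper and not part of the notebook:

```python
from collections import Counter


def summarize_attempts(scores: list[str], expected: str) -> dict:
    """Summarize repeated predictions for a single test case."""
    if not scores:
        raise ValueError("scores must contain at least one attempt")
    counts = Counter(scores)
    modal_score, modal_count = counts.most_common(1)[0]
    return {
        # Fraction of attempts equal to the expected label.
        "match_rate": counts[expected] / len(scores),
        # Most frequent label and the fraction of attempts that produced it.
        "modal_score": modal_score,
        "agreement": modal_count / len(scores),
    }


# The two attempts recorded for the grade 5 case in the results below.
summary = summarize_attempts(
    ["Very Complex", "Exceedingly Complex"], "Exceedingly Complex"
)
print(summary["match_rate"])  # 0.5
```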

Please expand to review the test script

Test Script

# Test cases
test_cases = {
    0: {
        'grade': 2, 
        'excerpt': "The Roman Empire was a powerful empire that lasted for hundreds of years. It started as a small village in Italy and grew into a huge empire that controlled much of Europe, Asia, and Africa. The Roman Empire had many strong leaders like Julius Caesar and Augustus. These leaders helped the empire grow and become very powerful.\n \n\n The Roman Empire had a period of peace and prosperity called the Pax Romana. This time was good for the empire, but it didn't last forever. The empire started to have problems. The army became weaker, and the economy had problems. The empire was also attacked by groups of people called barbarians.\n \n\n The Roman Empire was divided into two parts: the Western Roman Empire and the Eastern Roman Empire. The Western Roman Empire eventually fell apart in 476 AD. The Eastern Roman Empire, also known as the Byzantine Empire, lasted for many more years. The Roman Empire left behind many things that we still use today, like the Roman alphabet and the calendar.",
        'score': 'Moderately Complex'
    },
    1: {
        'grade': 3, 
        'excerpt': "The hoisting gear consists of a double system of chains 13/16 in. in diameter placed side by side; each chain is anchored by an adjustable screw to the end of the jib, and, passing round the traveling carriage and down to the falling block, is taken along the jib over a sliding pulley which leads it on to the grooved barrel, 3 ft. 9 in. in diameter. In front of the barrel is placed an automatic winder which insures a proper coiling of the chain in the grooves. The motive power is derived from two cylinders 10 in. in diameter and 16 in. stroke, one being bolted to each side frame; these cylinders, which are provided with link motion and reversing gear, drive a steel crank shaft 2¾ in. in diameter; on this shaft is a steel sliding pinion which drives the barrel by a double purchase.",
        'score': 'Exceedingly Complex'
    },
    2: {
        'grade': 4,
        'excerpt': "Before corals bleach, they do not show many other signs of feeling stressed. So, if we want to understand a coral's health, we have to study its cells. Inside cells we have a lot of information, including DNA, RNA, and proteins. These molecules can help us find clues about the communication between the coral and the algae. But also, these molecules can teach us how to know when corals are stressed.\nWhen an organism is stressed, every cell in its body will react. Everything will do its best to survive! In response to stress, the cell will use its DNA to make RNA, so that it can then make proteins that will fight off the stress. If an organism has been stressed before, it can respond to the stress faster and better. Think of it like visiting a city: the first time you visit, you will need a map to find your hotel. The more often you visit the city, the less you will need the map because you will remember, and you will get back to the hotel faster.",
        'score': 'Very Complex'
    },
    3: {
        'grade': 5,
        'excerpt': "Mesopotamia, located in present-day Iraq, is known as the 'Cradle of Civilization' because it was home to some of the earliest civilizations in the world. The region got its name from the ancient Greek words for 'land between the rivers,' referring to the Tigris and Euphrates rivers. These rivers provided water for the fertile land, making it perfect for farming. The regular flooding of the rivers made the land around them ideal for growing crops, which helped people settle down and form permanent villages. These villages eventually grew into cities, where people developed many of the characteristics of civilization, like organized government, complex buildings, and different social classes.\n \n\n The first civilizations in Mesopotamia were the Sumerians, who lived around 5,000 years ago. They invented the world's first written language, called cuneiform, which they used to keep track of things like food supplies and trade. They also developed a system of numbers, which helped them with math and measurement. The Sumerians built impressive cities like Ur, Eridu, and Uruk, which had populations of over 50,000 people. These cities were centers of learning and culture, and they helped spread knowledge and ideas throughout the region.\n \n\n Over time, other civilizations rose and fell in Mesopotamia, including the Akkadians, Babylonians, and Assyrians. Each civilization made its own contributions to the development of human society. The Babylonians are famous for their code of laws, which was one of the first written legal systems in the world. The Assyrians were known for their powerful military and their impressive palaces. Mesopotamia's history is full of amazing inventions and innovations that shaped the world we live in today.\n \n\n The development of civilization in Mesopotamia was not just about the fertile land and the rivers. Changes in climate and the environment also played a role. People had to become more organized and work together to survive. This led to the development of complex societies and governments. Mesopotamia's story is a reminder of how human ingenuity and adaptability can lead to amazing achievements.\n \n\n The 'Cradle of Civilization' is a term that refers to the regions where the earliest known human civilizations emerged. Mesopotamia is a prime example of this, as it was a place where people learned to live together, build cities, and develop new technologies that changed the course of human history. The innovations that came from Mesopotamia, like writing, mathematics, and agriculture, continue to influence our lives today. By studying ancient Mesopotamia, we can learn about the origins of our own civilization and the challenges and triumphs of early humans.",
        'score': 'Exceedingly Complex'
    },
    4: {
        'grade': 6,
        'excerpt': "Benjamin Franklin was a very important person in American history. He was born in Boston, Massachusetts in 1706. He was one of 17 children. Franklin did not go to school for very long. He learned to be a printer from his brother. Franklin was a very smart man. He invented many things, like bifocals, the Franklin stove, and the lightning rod. He also started the first public library in Philadelphia. Franklin was a writer, too. He wrote a book called *Poor Richard's Almanack*. It had many famous sayings, like \"Lost Time is never found again.\"\n\nFranklin was also a politician. He helped write the Declaration of Independence. He was a diplomat, too. He helped the United States get help from France during the Revolutionary War. He was a very busy man! Franklin was a scientist, a writer, a politician, and an inventor. He was a very important person in American history.\n\nFranklin was a very interesting person. He was a scientist who did experiments with electricity. He was a writer who wrote a book of sayings. He was a politician who helped the United States become independent. He was a diplomat who helped the United States get help from other countries. He was a very busy man!\n\nFranklin was a very smart man. He was a self-taught man who learned a lot on his own. He was a very creative man who invented many things. He was a very kind man who helped others. He was a very important man who helped shape the United States.\n\nFranklin was a very influential person. He was a leader who helped people. He was a thinker who came up with new ideas. He was a writer who shared his thoughts with others. He was a scientist who helped people understand the world. He was a very important person who helped make the United States what it is today.",
        'score': 'Slightly Complex'
    }
}

# Function to run all test cases with up to 5 attempts and short-circuit on match
async def run_test_cases():
    print("="*80)
    print("SENTENCE STRUCTURE EVALUATOR - TEST RESULTS")
    print("="*80)
    
    for test_id, test_case in test_cases.items():
        grade = test_case['grade']
        excerpt = test_case['excerpt']
        expected_score = test_case['score']
        
        print(f"\n{'='*80}")
        print(f"Test Case {test_id} | Grade: {grade}")
        print(f"{'='*80}")
        print(f"Expected Score: {expected_score}")
        
        # Store all attempts
        attempts = []
        matched = False
        matched_on_attempt = None
        
        # Run up to 5 attempts
        for attempt_num in range(1, 6):
            print(f"\n--- Attempt {attempt_num} ---")
            
            # Run prediction
            result = await predict_text_complexity_level(excerpt, grade)
            actual_score = result['answer']
            
            # Store this attempt
            attempts.append({
                'attempt': attempt_num,
                'score': actual_score,
                'reasoning': result['reasoning']
            })
            
            print(f"Actual Score: {actual_score}")
            
            # Check if matched
            if actual_score == expected_score:
                matched = True
                matched_on_attempt = attempt_num
                print("✓ MATCH - Short-circuiting")
                break
            else:
                print("✗ MISMATCH")
        
        # Print summary
        print(f"\n{'-'*80}")
        print("SUMMARY:")
        print(f"Total Attempts: {len(attempts)}")
        if matched:
            print(f"✓ Matched on attempt {matched_on_attempt}")
        else:
            print("✗ No match after 5 attempts")
        
        print("\nAll Results:")
        for attempt_data in attempts:
            print(f"  Attempt {attempt_data['attempt']}: {attempt_data['score']}")

    print(f"\n{'='*80}")
    print("TEST COMPLETED")
    print(f"{'='*80}")

# Run the tests
await run_test_cases()
Please expand to review the test results

Test Results

================================================================================
SENTENCE STRUCTURE EVALUATOR - TEST RESULTS
================================================================================

================================================================================
Test Case 0 | Grade: 2
================================================================================
Expected Score: Moderately Complex

--- Attempt 1 ---
Actual Score: Moderately Complex
✓ MATCH - Short-circuiting

--------------------------------------------------------------------------------
SUMMARY:
Total Attempts: 1
✓ Matched on attempt 1

All Results:
  Attempt 1: Moderately Complex

================================================================================
Test Case 1 | Grade: 3
================================================================================
Expected Score: Exceedingly Complex

--- Attempt 1 ---
Actual Score: Exceedingly Complex
✓ MATCH - Short-circuiting

--------------------------------------------------------------------------------
SUMMARY:
Total Attempts: 1
✓ Matched on attempt 1

All Results:
  Attempt 1: Exceedingly Complex

================================================================================
Test Case 2 | Grade: 4
================================================================================
Expected Score: Very Complex

--- Attempt 1 ---
Actual Score: Very Complex
✓ MATCH - Short-circuiting

--------------------------------------------------------------------------------
SUMMARY:
Total Attempts: 1
✓ Matched on attempt 1

All Results:
  Attempt 1: Very Complex

================================================================================
Test Case 3 | Grade: 5
================================================================================
Expected Score: Exceedingly Complex

--- Attempt 1 ---
Actual Score: Very Complex
✗ MISMATCH

--- Attempt 2 ---
Actual Score: Exceedingly Complex
✓ MATCH - Short-circuiting

--------------------------------------------------------------------------------
SUMMARY:
Total Attempts: 2
✓ Matched on attempt 2

All Results:
  Attempt 1: Very Complex
  Attempt 2: Exceedingly Complex

================================================================================
Test Case 4 | Grade: 6
================================================================================
Expected Score: Slightly Complex

--- Attempt 1 ---
Actual Score: Slightly Complex
✓ MATCH - Short-circuiting

--------------------------------------------------------------------------------
SUMMARY:
Total Attempts: 1
✓ Matched on attempt 1

All Results:
  Attempt 1: Slightly Complex

================================================================================
TEST COMPLETED
================================================================================
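
The only mismatch above (test case 3, attempt 1) landed on an adjacent label. A short sketch of how exact matches could be told apart from near misses, assuming the four labels form an ordered scale from least to most complex; that ordering and the `level_distance` helper are assumptions, not part of the notebook:

```python
# Assumed ordering of the score labels, least to most complex.
COMPLEXITY_SCALE = [
    "Slightly Complex",
    "Moderately Complex",
    "Very Complex",
    "Exceedingly Complex",
]


def level_distance(actual: str, expected: str) -> int:
    """Number of scale steps between two labels (0 means exact match)."""
    return abs(COMPLEXITY_SCALE.index(actual) - COMPLEXITY_SCALE.index(expected))


# (first-attempt score, expected score) per test case, copied from the results above.
first_attempts = {
    0: ("Moderately Complex", "Moderately Complex"),
    1: ("Exceedingly Complex", "Exceedingly Complex"),
    2: ("Very Complex", "Very Complex"),
    3: ("Very Complex", "Exceedingly Complex"),
    4: ("Slightly Complex", "Slightly Complex"),
}

distances = {tid: level_distance(a, e) for tid, (a, e) in first_attempts.items()}
exact_on_first_attempt = sum(1 for d in distances.values() if d == 0)
print(f"{exact_on_first_attempt}/{len(distances)} exact on first attempt")  # 4/5
```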

Status

🔄 Under internal review and testing before release.

Breaking Changes

None - backwards compatible.

@adnanrhussain adnanrhussain requested a review from gary-mu January 10, 2026 00:52

@gary-mu gary-mu left a comment

Love this integration of grade 5-12.
2 comments:

  1. Can we also add Ariena as reviewer? She's not a member of this repo, so I can't add her
  2. Can we add test results? I think running the test passages and pasting the results would be sufficient.

@adnanrhussain adnanrhussain requested a review from aychi1 January 13, 2026 20:30

aychi1 commented Jan 16, 2026

Overall LGTM. Two small callouts:

  • Some duplicative fields between Gr 3-4 and 5-12 (e.g. num_compound vs num_compound_sentences, perc_simple_sentences vs. perc_simple, etc).
  • Prompt definition for simple sentence is not exactly the same between Noah's version and Wayne's version, and this isn't captured in the PR. Noah's version specifically mentions that simple sentences with relative clauses still count as simple. It likely won't make a big difference, but just documenting here that this is a known departure.

@aychi1 aychi1 left a comment

Overall LGTM. Commented on some non-blocking nits.

@adnanrhussain
Contributor Author

Note: Updated the PR description to include the test script used, and the test results validating the changes.

@adnanrhussain adnanrhussain marked this pull request as ready for review January 27, 2026 01:33
@adnanrhussain adnanrhussain force-pushed the ahussain/sentence_structure_all_grades branch from d76247f to 3ab2493 Compare January 28, 2026 07:11
@adnanrhussain adnanrhussain merged commit 43c343a into main Feb 19, 2026
1 check passed
@adnanrhussain adnanrhussain deleted the ahussain/sentence_structure_all_grades branch February 19, 2026 00:38