Error handling for Grobid when not responding #35
Sanakhamassi wants to merge 6 commits into ScienciaLAB:main
Pull request overview
Adds explicit error handling for Grobid failures so the Streamlit UI can surface a clear “please try later” message instead of failing ambiguously (issue #11).
Changes:
- Introduce GrobidServiceError and raise it when Grobid errors or returns non-200.
- Catch GrobidServiceError in the Streamlit upload/embedding flow and display an error message.
- Add a (currently redundant) guard in DocumentQAEngine for missing Grobid output.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| streamlit_app.py | Catches Grobid failures during embedding creation and shows a user-facing error. |
| document_qa/grobid_processors.py | Defines GrobidServiceError and raises it from Grobid processing on failure/non-200. |
| document_qa/document_qa_engine.py | Imports/raises GrobidServiceError when Grobid structure is missing. |
lfoppiano left a comment
There is a missing space; the rest looks fine. I did not test it, so please make sure you have tested it before merging/squashing.

    tmp_file = NamedTemporaryFile()
    tmp_file.write(bytearray(binary))
    st.session_state['binary'] = binary

    st.session_state['doc_id'] = hash = st.session_state['rqa'][model].create_memory_embeddings(
        tmp_file.name,
        chunk_size=chunk_size,
        perc_overlap=0.1
    )

Yes, @Sanakhamassi, here you need to either open the temporary file in a with block or handle it some other way.
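
The suggestion to use a with block could look roughly like the sketch below. It is only an illustration under stated assumptions: `file_size` is a hypothetical stand-in for `create_memory_embeddings` (any function that reads the file by path), and `process_upload` is an invented name, not code from this PR.

```python
from tempfile import NamedTemporaryFile


def file_size(path: str) -> int:
    """Hypothetical stand-in for create_memory_embeddings: reads the file by path."""
    with open(path, "rb") as handle:
        return len(handle.read())


def process_upload(binary: bytes) -> int:
    # The context manager closes and deletes the temporary file on exit,
    # even if the processing step raises an exception.
    with NamedTemporaryFile(suffix=".pdf") as tmp_file:
        tmp_file.write(binary)
        # Flush so that a reader opening the file by name sees every byte.
        tmp_file.flush()
        return file_size(tmp_file.name)
```

Note that reopening a NamedTemporaryFile by name while it is still open works on Unix but not on Windows, so this pattern assumes a Unix-like deployment.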

    st.session_state['doc_id'] = None
    st.session_state['loaded_embeddings'] = False
    st.session_state['uploaded'] = False
    st.error(f"{message} Please try later.")

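
The state reset shown in this hunk can be illustrated without Streamlit. The sketch below is an assumption-laden approximation: a plain dict stands in for `st.session_state`, the returned string stands in for the `st.error` call, and `load_document` and its `create_embeddings` argument are hypothetical names.

```python
class GrobidServiceError(Exception):
    """Minimal stand-in for the exception introduced by this PR."""


def load_document(state: dict, create_embeddings, path: str):
    """Return an error message for the user, or None on success."""
    try:
        state["doc_id"] = create_embeddings(path)
        state["loaded_embeddings"] = True
        state["uploaded"] = True
        return None
    except GrobidServiceError as exc:
        # Reset everything so the UI does not keep a half-loaded document.
        state["doc_id"] = None
        state["loaded_embeddings"] = False
        state["uploaded"] = False
        return f"{exc} Please try later."
```

Resetting all three keys together means a failed upload leaves the session in the same state as before any document was loaded.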

      if grobid_url:
    -     self.grobid_processor = GrobidProcessor(grobid_url)
    +     self.grobid_processor = GrobidProcessor(grobid_url, ping_server=False)


      try:
          pdf_file, status, text = self.grobid_client.process_pdf("processFulltextDocument",
                                                                  input_path,
                                                                  consolidate_header=True,
                                                                  consolidate_citations=False,
                                                                  segment_sentences=False,
                                                                  tei_coordinates=coordinates,
                                                                  include_raw_citations=False,
                                                                  include_raw_affiliations=False,
                                                                  generateIDs=True)
      except Exception as exc:
          raise GrobidServiceError("Grobid service did not respond.") from exc

      if status != 200:
    -     return
    +     raise GrobidServiceError(
    +         f"Grobid service returned status {status}.",
    +         status_code=status
    +     )

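
The error translation in this hunk can be reduced to a small, self-contained sketch. The class body here is an assumption inferred from the `status_code=` keyword in the diff, and `call_grobid` with its `client_call` parameter is a hypothetical wrapper, not the project's API.

```python
class GrobidServiceError(Exception):
    """Assumed shape: a message plus an optional HTTP status code."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


def call_grobid(client_call, input_path):
    """Run a hypothetical client_call(input_path) -> (path, status, text)."""
    try:
        _, status, text = client_call(input_path)
    except Exception as exc:
        # Chain the original exception so the root cause stays in the traceback.
        raise GrobidServiceError("Grobid service did not respond.") from exc

    # Raising instead of returning None makes the failure visible to callers.
    if status != 200:
        raise GrobidServiceError(
            f"Grobid service returned status {status}.", status_code=status
        )
    return text
```

Compared with the earlier bare `return`, raising lets the caller distinguish a service failure from a document that simply produced no output.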
lfoppiano left a comment
Better than before; however, there are a few changes that need to be made.
Related to issue #11