Skip to content

Town6: Add API endpoints for forms, resources, people, meetings, political businesses and parliament#2347

Open
somehowchris wants to merge 2 commits intoOneGov:masterfrom
somehowchris:town6-api-enrich-form-text-person-fields
Open

Town6: Add API endpoints for forms, resources, people, meetings, political businesses and parliament#2347
somehowchris wants to merge 2 commits intoOneGov:masterfrom
somehowchris:town6-api-enrich-form-text-person-fields

Conversation

@somehowchris
Copy link
Copy Markdown

Commit message

Town6: Add API endpoints for forms, resources, people, meetings, political businesses and parliament

Registers 8 new API endpoints. Enriches form endpoint with text field for internal forms. Aligns PersonApiEndpoint with the agency model: 20 data fields, respects hidden_people_fields.

TYPE: Feature
LINK: -

Checklist

  • I have performed a self-review of my code
  • I considered adding a reviewer
  • I have added tests for my changes/features

Copy link
Copy Markdown
Member

@Daverball Daverball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start. I'm not happy with the handling of ExternalLink however, they should be properly mixed into the pagination, so we don't lie about our items per page and/or how many pages there are.

There is a more general problem with models that make use of AccessExtension and UTCPublicationMixin (for some models you will only find those on the onegov.org specific subclass, but you still need to handle them in that case), since some of these collections can contain non-public entries and there is a mtan access which is technically public, but you shouldn't be able to see anything beyond what is displayed on the listing view. Take a look at NewsCollection/TopicCollection for how to handle access/publication restrictions.

Comment on lines +87 to +148
class PaginatedResourceCollection:

def __init__(
self,
resource_collection: ResourceCollection,
page: int = 0
) -> None:
self.resource_collection = resource_collection
self.session = resource_collection.session
self.page = page
self.batch_size = 25

def by_id(self, id: PKType) -> Resource | None:
return self.resource_collection.by_id(id) # type: ignore

def subset(self) -> Query[Resource]:
return self.resource_collection.query().order_by(Resource.title)

@cached_property
def cached_subset(self) -> Query[Resource]:
return self.subset()

@property
def page_index(self) -> int:
return self.page

def page_by_index(self, index: int) -> Self:
return self.__class__(
self.resource_collection, page=index
)

@property
def subset_count(self) -> int:
return self.cached_subset.count()

@property
def offset(self) -> int:
return self.page * self.batch_size

@property
def pages_count(self) -> int:
if not self.subset_count:
return 0
return max(1, -(-self.subset_count // self.batch_size))

@property
def batch(self) -> tuple[Resource, ...]:
return tuple(
self.cached_subset.offset(self.offset).limit(self.batch_size)
)

@property
def previous(self) -> Self | None:
if self.page > 0:
return self.page_by_index(self.page - 1)
return None

@property
def next(self) -> Self | None:
if self.page < self.pages_count - 1:
return self.page_by_index(self.page + 1)
return None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to get rid of most of this code by using the Pagination mixin and the original non-paginated collection class. You will run into some mypy errors, because libres/onegov.reservation models use their own base class. But that should go away once the SQLAlchemy 2.0 changes have been merged.

Comment on lines +533 to +534
FormOrExternalLink = FormDefinition | ExternalFormLink
ResourceOrExternalLink = Resource | ExternalResourceLink
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FormOrExternalLink = FormDefinition | ExternalFormLink
ResourceOrExternalLink = Resource | ExternalResourceLink
FormOrExternalLink: TypeAlias = FormDefinition | ExternalFormLink
ResourceOrExternalLink: TypeAlias = Resource | ExternalResourceLink

This way these type aliases will be automatically rewritten to type statements by Ruff, once we migrate to Python 3.12.

Comment on lines +552 to +564
def batch(self) -> dict[ApiEndpointItem[FormOrExternalLink],
FormOrExternalLink]:
result: dict[ApiEndpointItem[FormOrExternalLink],
FormOrExternalLink] = {}
for item in self.collection.batch:
endpoint_item = self.for_item(item)
if endpoint_item:
result[endpoint_item] = item
for ext in self.session.query(ExternalFormLink).all():
endpoint_item = self.for_item(ext)
if endpoint_item:
result[endpoint_item] = ext
return result
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a great solution, external links should be properly mixed into the collection and respect the pagination page and limits, instead of getting appended to every page. This could get pretty hairy if we cared about the order of entries, since for a SQL UNION both queries need to share the same columns.

Luckily we don't particularly care about the order of entries here, so you may be able to emulate this by writing a generic PaginatedSumCollection helper which takes N collections and then transparently dispatches to the correct underlying collection(s) it is a sum off, based on the current page. The only slightly tricky part with be handling the offsets and limits in the second collection, since the first page will be shortened by however many entries there are in the final page of the first collection, so you can't just rely on each collection implementing Pagination, you will need to write some helpers for modifying the query of each collection, rather than being able to rely on their subset_count attributes etc.

On the plus side: The two sub-collections won't need to implement Pagination, so you can reuse the existing collections and you can reuse the PaginatedSumCollection for resources.

It might also be worth cutting down the total number of required queries from 2N to N+1 by emitting a single query for getting the total counts of all the collections using something like:

queries = [
    collection.query().order_by(None).with_entities(
        func.count(text('1')).label('count')
    )
    for collection in self.collections
]
query = queries[0]
for extra in queries[1:]:
    query = query.union_all(extra)
return query.scalars()

This would return a list of all collection counts in the same order as our collections, so you can determine which collection(s) you need to emit a query for and their corresponding limits based on the current page.

Note that you won't be able to rely on subet() and cached_subset in PaginatedSumCollection, so even though those attributes will exist since you inherit from Pagination, they should never be called.

Copy link
Copy Markdown
Member

@Daverball Daverball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good.

While I like the generic approach, there's a little bit too much wrapping and re-wrapping of collections going on, I think we can peel back at least one layer without too much code duplication.

You also wrapped at least one collection that could've been used as-is (apart from that collection missing a filter it should've had). I didn't check all of them, but if the source collection already is a pagination, you should be using that directly and extend it if necessary.

Comment on lines +115 to +153
class FilteredCollection(Generic[T]):

def __init__(
self,
request: TownRequest,
collection: Any,
model_class: type[T],
query_transform: Callable[[Query[T]], Query[T]],
) -> None:
self.request = request
self.collection = collection
self.model_class = model_class
self.query_transform = query_transform

def query(self) -> Query[T]:
return self.query_transform(self.collection.query())

def by_id(self, id: PKType) -> T | None:
try:
if callable(getattr(self.collection, 'by_id', None)):
item = self.collection.by_id(id)
if item is not None:
return item if self.request.is_visible(item) else None

primary_key = sa_inspect(self.model_class).primary_key[0]
return self.query().filter(
primary_key == self._normalize_id(id)
).first()
except SQLAlchemyError:
return None

@staticmethod
def _normalize_id(id: PKType) -> PKType:
if isinstance(id, str):
try:
return UUID(id)
except ValueError:
return id
return id
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little bit contrived, don't all collections already have a by_id method? While this approach is more flexible, I don't think we need this flexibility. I'd rather keep the code and data representation more simple, without wrapping collections multiple times into proxy collections.

It might be simpler to just directly implement this filter into PaginatedCollection and PaginatedSumCollection, since the filters work the same for all of the API endpoints.

It should reduce down to a couple of line changes in each collection.

model_class: Any,
) -> Query[T]:
accesses = available_accesses(request)
if accesses and hasattr(model_class, 'access'):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check might be a little unreliable, if AccessMixin is only used on some of the submodels.

I think it's fine to check for the presence of meta instead. If access is never set this filter never will exclude anything, so apart from potentially degrading performance slightly, it is harmless. The filter getting skipped, when it shouldn't be, is a lot more harmful.


class SupportsQueryAndById(Protocol[T]):
def query(self) -> Query[T]: ...
def by_id(self, id: PKType) -> T | None: ...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This protocol is potentially too strict on PKType (parameters are contravariant, so you can only substitute for a callable which accepts a wider type, a callable that only accepts a narrow type would be rejected), I would probably just use Any here for now. If we want to make this more robust in the future we should switch to using a second type parameter IdT, like we already use in some places.

Comment on lines +277 to +278
if order_by is not None:
query = query.order_by(order_by)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if order_by is not None:
query = query.order_by(order_by)
if order_by is not None:
query = query.order_by(None).order_by(order_by)

The original collection may already have its own order by clause, so it's safer to first remove any potential existing order by clauses.

Comment on lines +781 to +790
def collection(self) -> Any:
return PaginatedCollection(
visible_collection(
self.request,
PoliticalBusinessCollection(self.request),
PoliticalBusiness,
),
page=self.page or 0,
batch_size=API_BATCH_SIZE,
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PoliticalBusinessCollection is already paginated and takes a request. I think it's better to directly use that collection and extend it with the access/publication filters, since those are also relevant for the page that displays political businesses. It's a bug, that this pagination isn't already handling that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants