MAINT: Adding simulated assistant role #1292
Role to use for API calls.
Maps simulated_assistant to assistant for API compatibility.
Use this property when sending messages to external APIs.
Doesn't this in part depend on the API itself? For example, OpenAI started using "developer" instead of "system" recently. But if you want to continue an existing conversation with a target that uses "system" then it should be possible to do so.
I don't think it makes sense for the target to have to decide what to do with a simulated assistant response vs. a real assistant response. IMO it'd be easy for a target to forget to do `if role == simulated_assistant then role = assistant`, which would likely be a bug.
I do think a target could format these differently from one another. But from the target's perspective, IMO these should always be treated the same. To take your system/developer example, for us that's always "system", and a target can deserialize it to whatever makes sense. It's the same with "assistant": I don't think a target should ever treat simulated assistant differently than assistant.
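To make the argument above concrete, here is a minimal sketch of the mapping being discussed. `PieceSketch`, `stored_role`, and `to_chat_payload` are hypothetical names used for illustration only; just the `api_role` name and the `simulated_assistant` → `assistant` mapping come from this PR, and the real implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class PieceSketch:
    """Hypothetical stand-in for a message piece; not the real class from this PR."""

    stored_role: str  # e.g. "system", "user", "assistant", "simulated_assistant"
    content: str

    @property
    def api_role(self) -> str:
        # Collapse the storage-only role into the role external APIs understand.
        if self.stored_role == "simulated_assistant":
            return "assistant"
        return self.stored_role


def to_chat_payload(pieces: list) -> list:
    # A target that only reads api_role has no simulated/real branch to forget.
    return [{"role": p.api_role, "content": p.content} for p in pieces]


history = [
    PieceSketch("system", "You are a helpful assistant."),
    PieceSketch("user", "Hi"),
    PieceSketch("simulated_assistant", "Hello! How can I help?"),
]
payload = to_chat_payload(history)
```

With this shape, a target that needs a different wire name (for example "developer" instead of "system") could still translate `api_role` in its own serializer, while the simulated/real distinction never reaches it.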
"""
Check if this is a simulated assistant response.
Simulated responses come from prepended conversations or generated
What if we are just branching off of an existing conversation? Then it's not really simulated but actually happened...
So in this PR, the place where we mark assistant responses as simulated is essentially when we pass prepended_conversations to attacks. Whether or not these were real responses that happened in the past, for the current attack conversation they are not turns that took place, but rather user-supplied ones.
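As a rough illustration of that point, the sketch below relabels the assistant turns of a user-supplied prepended conversation before it is attached to a new attack. `PieceSketch` and `mark_as_simulated_sketch` are hypothetical; the PR's actual `mark_messages_as_simulated()` helper may have a different signature and behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PieceSketch:
    """Hypothetical stand-in for a message piece; not the real class from this PR."""

    role: str
    content: str

    @property
    def is_simulated(self) -> bool:
        return self.role == "simulated_assistant"


def mark_as_simulated_sketch(pieces: list) -> list:
    # Relabel only assistant turns; user and system turns are returned unchanged.
    return [
        replace(p, role="simulated_assistant") if p.role == "assistant" else p
        for p in pieces
    ]


# Turns taken from an earlier run: real at the time, but user-supplied from the
# point of view of the new attack conversation.
earlier_run = [
    PieceSketch("user", "What is the capital of France?"),
    PieceSketch("assistant", "Paris."),
]
prepended = mark_as_simulated_sketch(earlier_run)
```

The original pieces are left untouched here, so the earlier conversation keeps its "assistant" role while the copy carried into the new attack is flagged as simulated.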
Adds a new `simulated_assistant` role to distinguish synthetic responses (prepended conversations, `SeedPrompt`s) from actual target responses. Behaves identically to assistant for API calls but is preserved in memory.

Key Changes
- `MessagePiece`/`Message`: Added `api_role` (maps `simulated_assistant` → `assistant`), `is_simulated`, and `get_role_for_storage()`. Deprecated `.role` getter.
- `mark_messages_as_simulated()` helper; `format_conversation_context()` labels simulated as "Assistant (simulated)"
- `SeedGroup.to_messages()` converts `assistant` → `simulated_assistant`
- `api_role`)
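The labeling behavior described for `format_conversation_context()` could look roughly like the sketch below. Only the "Assistant (simulated)" label is taken from the description; the function name `format_context_sketch`, the `(role, text)` tuple input, and the other labels are assumptions made for illustration.

```python
# Display labels per stored role. Only "Assistant (simulated)" comes from the
# PR description; the remaining labels are assumed for this sketch.
ROLE_LABELS = {
    "system": "System",
    "user": "User",
    "assistant": "Assistant",
    "simulated_assistant": "Assistant (simulated)",
}


def format_context_sketch(turns: list) -> str:
    # Each turn is a (stored_role, text) pair; unknown roles fall back to the raw name.
    return "\n".join(f"{ROLE_LABELS.get(role, role)}: {text}" for role, text in turns)


context = format_context_sketch(
    [
        ("user", "Hi"),
        ("simulated_assistant", "Hello! How can I help?"),
        ("user", "Tell me a joke."),
        ("assistant", "Why did the chicken cross the road?"),
    ]
)
```

A reader of the formatted context (a scorer or an adversarial model, say) can then tell which assistant turns were supplied up front and which came from the target.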