FlashAttention is now enabled by default in model parameter initialization for embedding and text generation. The unused SeqMax parameter has been removed from unit tests to simplify configuration. Minor formatting improvements were made in IContextParamsExtensions and NativeApi for consistency.
LLama/Native/NativeApi.cs
3 additions & 6 deletions
@@ -175,15 +175,12 @@ public static void llama_empty_call()
/// <param name="buf">A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)</param>
/// <param name="length">The size of the allocated buffer</param>
/// <returns>The total number of bytes of the formatted prompt. If is it larger than the size of buffer, you may need to re-alloc it and then re-apply the template.</returns>