Conversation

@0marperez
Contributor

@0marperez 0marperez commented Oct 17, 2025

Issue #

N/A

Description of changes

  • Project setup
  • Publishing config
  • Transfer manager client
  • Business metric
  • Transfer interceptors
  • Upload file operation
    • Concurrent uploads
    • MPU part buffering
    • Code generated IO
    • Code generated type converters

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@0marperez 0marperez added the no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly. label Oct 17, 2025
@0marperez 0marperez marked this pull request as ready for review October 17, 2025 15:14
@0marperez 0marperez requested a review from a team as a code owner October 17, 2025 15:14
Member

@lauzadis lauzadis left a comment

Nice start

@lauzadis lauzadis mentioned this pull request Oct 30, 2025
1 task

Comment on lines 115 to 116

"s3-transfer-manager-codegen", // TODO: Disable publishing ?
Contributor

Comment: 👍 Yes, this is in the right place. We don't want to publish this since we have no use case for it right now. We can scratch the TODO.

Member

This doesn't disable publication, right? It only disables API validation and docgen.

Contributor Author

This is configured in repo tools, right?

Comment on lines 40 to 71
val mpuUploadId = initiateTransfer(
    multipartUpload,
    transferContext,
    contentLength,
    uploadFileRequest,
    interceptors,
    client,
)

val uploadedParts = transferBytes(
    multipartUpload,
    contentLength,
    partSizeBytes,
    logger,
    uploadFileRequest,
    transferContext,
    mpuUploadId,
    interceptors,
    client,
    maxInMemoryParts,
    maxConcurrentPartUploads,
)

completeTransfer(
    multipartUpload,
    transferContext,
    uploadFileRequest,
    mpuUploadId,
    uploadedParts,
    interceptors,
    client,
)
Contributor

Style: The volume of arguments passed into these functions is too great. Can any of these be grouped into objects, derived from other parameters, etc.? This level of data coupling is an indicator that we might be better served modelling a base operation type which can be implemented for each operation type (e.g., UploadFile) or for each subtype (e.g., UploadFileSingle and UploadFileMultipart), which would reduce the amount of if (multipartUpload) calls.

Contributor Author

I left a TODO for this; once we write other operations, it'll be easier to create a common abstraction.
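
A rough sketch of one shape the grouping suggested in this thread could take. Every name here (`UploadParams`, `UploadOperation`, `SinglePartUpload`, `MultipartUpload`, `chooseOperation`) is hypothetical and not taken from the PR, the S3 calls are replaced by recorded step names, and the rule "multipart when the object is larger than one part" is an assumption made only for illustration.

```kotlin
// Hypothetical parameter object: bundles values that were passed one by one.
class UploadParams(
    val contentLength: Long,
    val partSizeBytes: Long,
    val maxConcurrentPartUploads: Int,
)

// Hypothetical base operation type with one implementation per upload strategy.
sealed interface UploadOperation {
    /** Runs the transfer phases and returns the steps taken, in order. */
    fun execute(params: UploadParams): List<String>
}

object SinglePartUpload : UploadOperation {
    override fun execute(params: UploadParams): List<String> =
        listOf("putObject(${params.contentLength} bytes)")
}

object MultipartUpload : UploadOperation {
    override fun execute(params: UploadParams): List<String> {
        // Ceiling division: the final part may be smaller than partSizeBytes.
        val parts = (params.contentLength + params.partSizeBytes - 1) / params.partSizeBytes
        return buildList {
            add("createMultipartUpload")
            for (n in 1..parts) add("uploadPart($n)")
            add("completeMultipartUpload")
        }
    }
}

// The single/multipart decision is made once here instead of in every helper.
// Assumption: an object larger than one part is uploaded as multipart.
fun chooseOperation(params: UploadParams): UploadOperation =
    if (params.contentLength > params.partSizeBytes) MultipartUpload else SinglePartUpload

fun main() {
    val params = UploadParams(
        contentLength = 20_000_000,
        partSizeBytes = 8_000_000,
        maxConcurrentPartUploads = 4,
    )
    println(chooseOperation(params).execute(params))
}
```

With this shape, each phase function receives one parameter object, and the `if (multipartUpload)` branches collapse into a single dispatch in `chooseOperation`.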

Comment on lines +137 to +139
) = produce(
    capacity = maxInMemoryParts,
) {
Contributor

Correctness: Isn't maxInMemoryParts supposed to limit the parts for the entire S3TM? This looks like it applies to individual objects but we'll be parallelizing multi-object transfers.

Contributor Author

I decided to use a Channel for each object while keeping track of the total parts in memory with a Semaphore.
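
A minimal sketch of the idea described in this reply: one queue per object, with a single shared semaphore capping the parts buffered across all objects. The PR uses a coroutine Channel and Semaphore; this sketch swaps in JDK threads, `LinkedBlockingQueue`, and `java.util.concurrent.Semaphore` so that it runs without kotlinx.coroutines. The function `simulateUploads` and its parameters are hypothetical.

```kotlin
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Transfers `partsPerObject` parts for each of `objects` objects and returns
// the highest number of parts that were buffered at the same time.
fun simulateUploads(objects: Int, partsPerObject: Int, maxInMemoryParts: Int): Int {
    val permits = Semaphore(maxInMemoryParts) // shared by every object
    val inMemory = AtomicInteger(0)
    val peak = AtomicInteger(0)

    val workers = (1..objects).flatMap { _ ->
        val queue = LinkedBlockingQueue<Int>() // one queue per object
        val producer = thread {
            for (part in 1..partsPerObject) {
                permits.acquire() // +1 part in memory; blocks at the global limit
                val now = inMemory.incrementAndGet()
                peak.updateAndGet { maxOf(it, now) }
                queue.put(part)
            }
        }
        val consumer = thread {
            repeat(partsPerObject) {
                queue.take() // stand-in for uploading the part
                inMemory.decrementAndGet()
                permits.release() // -1 part in memory
            }
        }
        listOf(producer, consumer)
    }
    workers.forEach { it.join() }
    return peak.get()
}

fun main() {
    // Three objects compete for two in-memory part slots.
    val peak = simulateUploads(objects = 3, partsPerObject = 5, maxInMemoryParts = 2)
    println("peak parts in memory: $peak")
}
```

Because the permit is taken before a part is buffered and returned only after it is consumed, the peak never exceeds `maxInMemoryParts`, regardless of how many objects are in flight.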

Comment on lines 11 to 30
internal val uploadFileConversions = listOf(
    ConversionMapping(
        source = TypeRef(
            "aws.sdk.kotlin.services.s3.model",
            "PutObjectResponse",
        ),
        destination = TypeRef(
            "aws.sdk.kotlin.hll.s3transfermanager.model",
            "UploadFileResponse",
        ),
        setOf(
            "bucketKeyEnabled",
            "checksumCrc32",
            "checksumCrc32C",
            "checksumCrc64Nvme",
            "checksumSha1",
            "checksumSha256",
            "checksumType",
            "eTag",
            "expiration",
Member

I don't think a data string will make this any more readable or maintainable. The internal spec already models this as a JSON list; we can make changes (which I believe are unlikely) by inspecting the diff of that file.

Comment on lines 23 to 30
/**
 * Preferred part size for multipart uploads.
 * If using this size would require more than 10,000 parts (the S3 limit),
 * the smallest possible part size that results in 10,000 parts is used instead.
 *
 * Default to 8,000,000 bytes.
 */
public val partSizeBytes: Long = builder.partSizeBytes
Member

This is my fault, since I left a comment asking to change it to just partSize to simplify the name. We are logging a warning when deviating from the configured part size. It's not a strong opinion, so I will let @0marperez make the decision.

#1712
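
The part-size rule in the quoted KDoc can be worked through with a small sketch. The function name `resolvePartSize` and the constant `MAX_PARTS` are hypothetical. The arithmetic follows the KDoc wording only and is not the PR's actual implementation.

```kotlin
const val MAX_PARTS = 10_000L // S3 part limit quoted in the KDoc

// Returns the preferred size unless it would need more than MAX_PARTS parts,
// in which case it returns the smallest size that fits within MAX_PARTS parts.
fun resolvePartSize(contentLength: Long, preferredPartSize: Long): Long {
    require(contentLength >= 0 && preferredPartSize > 0)
    // Ceiling of contentLength / MAX_PARTS.
    val smallestAllowed = (contentLength + MAX_PARTS - 1) / MAX_PARTS
    return maxOf(preferredPartSize, smallestAllowed)
}

fun main() {
    // 1 GB at 8,000,000 bytes per part needs 125 parts, so the preferred size is kept.
    println(resolvePartSize(1_000_000_000L, 8_000_000L))   // prints 8000000
    // 100 GB at 8,000,000 bytes per part would need 12,500 parts, so the size grows.
    println(resolvePartSize(100_000_000_000L, 8_000_000L)) // prints 10000000
}
```

The second call is the case where a warning about deviating from the configured part size, as mentioned in the comment, would apply.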

Contributor

@ianbotsf ianbotsf left a comment

Looking pretty good, down to the final sticking point!

Comment on lines +70 to +74
write("@PublishedApi")
write(
    "internal fun build(): #1L = #1L(this)",
    className,
)
Contributor

Question: Why is @PublishedApi necessary here? Is this getting inlined somewhere?

Contributor Author

It's getting inlined by the public invoke function. It used to be public, but you asked to make it internal. I used @PublishedApi because it's what we often use in the rest of the codebase.
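
A minimal sketch of why `@PublishedApi` is needed in this situation, assuming a builder shaped like the generated one. The class `UploadFileRequest` below is a reduced, hypothetical stand-in with two properties, not the generated type.

```kotlin
class UploadFileRequest private constructor(builder: Builder) {
    val bucket: String? = builder.bucket
    val key: String? = builder.key

    class Builder {
        var bucket: String? = null
        var key: String? = null

        // Internal, yet callable from public inline functions because of @PublishedApi.
        @PublishedApi
        internal fun build(): UploadFileRequest = UploadFileRequest(this)
    }

    companion object {
        // Public inline function: its body is copied into the caller's module,
        // so every declaration it touches must be public or @PublishedApi internal.
        inline operator fun invoke(block: Builder.() -> Unit): UploadFileRequest =
            Builder().apply(block).build()
    }
}

fun main() {
    val request = UploadFileRequest {
        bucket = "example-bucket"
        key = "report.csv"
    }
    println("${request.bucket}/${request.key}") // prints example-bucket/report.csv
}
```

Without the annotation, the compiler rejects the call to `build()` inside the public inline `invoke`, because a plain `internal` member cannot be referenced from code that is inlined into other modules.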

Comment on lines +205 to +218
executePhase(
    TransferPhase.BytesTransferred,
    localContext,
    interceptors,
) {
    localContext.s3Response = client.withTmBusinessMetric {
        it.uploadPart(localContext.s3Request as UploadPartRequest)
    }

    // -1 part in memory
    semaphore.release()

    localContext.transferredBytes = localContext.transferredBytes!! + partSize
}
Contributor

Question: executePhase only invokes the given block if no exception is thrown first, right? When an exception is thrown, what should happen to the part in memory and the semaphore? Presently it looks like they stay stuck and won't be resolved.

Contributor Author

I added a try/catch block around the consumer to make sure the parts don't stay in memory and we update the semaphore.

Contributor Author

I'm guessing your question is about users who wrap their S3 TM requests in try/catch and then continue making S3 TM calls?
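
The concern in this thread is a permit that is never returned when an upload throws. A minimal sketch of the release-on-every-path pattern follows, using `java.util.concurrent.Semaphore` as a stand-in for the coroutine semaphore in the PR. The helper `uploadPartReleasingPermit` is hypothetical.

```kotlin
import java.util.concurrent.Semaphore

// Hypothetical helper: runs `upload` and always hands the permit back,
// whether the upload succeeds or throws.
fun uploadPartReleasingPermit(permits: Semaphore, upload: () -> Unit) {
    try {
        upload()
    } finally {
        permits.release() // -1 part in memory, on success and on failure
    }
}

fun main() {
    val permits = Semaphore(2)

    permits.acquire() // +1 part in memory
    val result = runCatching {
        uploadPartReleasingPermit(permits) { error("simulated UploadPart failure") }
    }

    println(result.isFailure)           // prints true
    println(permits.availablePermits()) // prints 2
}
```

If the release sat only on the success path, the second line would print 1, and a caller that catches the exception and keeps using the same transfer manager would gradually run out of permits.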

Comment on lines 55 to 95
/**
 * The context around an [aws.sdk.kotlin.hll.s3transfermanager.S3TransferManager] transfer.
 * Used to track transfer progress or to modify in progress transfers.
 */
public data class TransferContext(
    override val s3Request: Any? = null,
    override val s3Response: Any? = null,
    override val tmRequest: Any? = null,
    override val tmResponse: Any? = null,
    override val transferableBytes: Long? = null,
    override val transferredBytes: Long? = null,
    override val transferableObjects: Long? = null,
    override val transferredObjects: Long? = null,
) : TransferInterceptorContext

/**
 * The context around an [aws.sdk.kotlin.hll.s3transfermanager.S3TransferManager] transfer.
 * Used to track transfer progress or to modify in progress transfers.
 */
public data class MutableTransferContext(
    override var s3Request: Any? = null,
    override var s3Response: Any? = null,
    override var tmRequest: Any? = null,
    override var tmResponse: Any? = null,
    override var transferableBytes: Long? = null,
    override var transferredBytes: Long? = null,
    override var transferableObjects: Long? = null,
    override var transferredObjects: Long? = null,
) : TransferInterceptorContext {
    internal fun immutableCopy() =
        TransferContext(
            s3Request,
            s3Response,
            tmRequest,
            tmResponse,
            transferableBytes,
            transferredBytes,
            transferableObjects,
            transferredObjects,
        )
}
Contributor

Correctness: We must not expose data class types in our public API. Can the implementations be internal instead?

Contributor Author

I changed these to regular classes. I think they need to be either public or internal with @PublishedApi, since they're exposed to the user in TransferInterceptor.
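
A reduced sketch of the change described in this reply: the same interface and the same two implementations, written as regular classes. Only two of the eight properties are kept to stay short, and the visibility modifiers are simplified, so this is an illustration and not the PR's final code.

```kotlin
// Read-only view handed to user code; a subset of the properties quoted above.
interface TransferInterceptorContext {
    val transferableBytes: Long?
    val transferredBytes: Long?
}

// Regular class rather than a data class: no generated copy() or componentN()
// becomes part of the public API, so properties can be added later without
// changing those generated signatures.
class TransferContext(
    override val transferableBytes: Long? = null,
    override val transferredBytes: Long? = null,
) : TransferInterceptorContext

class MutableTransferContext(
    override var transferableBytes: Long? = null,
    override var transferredBytes: Long? = null,
) : TransferInterceptorContext {
    // Snapshot, so later mutation is not visible through the copy.
    fun immutableCopy() = TransferContext(transferableBytes, transferredBytes)
}

fun main() {
    val mutable = MutableTransferContext(transferableBytes = 100L, transferredBytes = 0L)
    val snapshot = mutable.immutableCopy()
    mutable.transferredBytes = 40L

    println(snapshot.transferredBytes) // prints 0
    println(mutable.transferredBytes)  // prints 40
}
```

The usual objection to data classes in a public API is that adding a property changes the generated `copy` and `componentN` members, which breaks binary compatibility for existing callers.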

@0marperez 0marperez requested a review from ianbotsf November 12, 2025 19:31

3 participants