AWS S3 MD5 digest with multipart uploads
In version v2023.1.31.0 we added Verify support for multipart uploads. Some very interesting information can be found here:
In short:
- If an object is created by a Multipart Upload operation, the ETag is not an MD5 digest, regardless of the method of encryption.
- For multipart uploads, the ETag is the MD5 hex digest of the concatenated binary MD5 digests of all parts, followed by a dash and the number of parts.
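As a rough illustration of the rule above, here is a minimal Python sketch that computes such an ETag for data held in memory. The name `multipart_etag` and the `part_size` parameter are our own choices for this example and are not part of any AWS API.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Build a multipart-style ETag for `data` split into `part_size` chunks."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Binary (not hex) MD5 digest of every part, in upload order.
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    # MD5 of the concatenated part digests, then "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

The same bytes uploaded with a different part size give a different ETag, which is why the part size has to be known (or guessed) before a download can be verified.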
We added code to recalculate this MD5 hex digest from the size of the downloaded file and the number of parts taken from the ETag suffix of the S3 object. From these two values we could derive the size of each part and calculate its MD5 checksum. The part checksums were then concatenated and hashed to produce the final MD5 hex digest.
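The idea of deriving the part size from the file size and the part count can be sketched as follows. This is only an assumption about how such a calculation might look, not our actual implementation; `split_etag` and `candidate_part_sizes` are hypothetical helper names, and the 1 MiB step is the default described further down.

```python
MIB = 1024 * 1024


def split_etag(etag: str) -> tuple[str, int]:
    """Split 'hexdigest-N' into (hexdigest, N); a plain MD5 ETag yields N = 0."""
    digest, dash, count = etag.strip('"').partition("-")
    return digest, int(count) if dash else 0


def candidate_part_sizes(file_size: int, num_parts: int, step: int = MIB) -> list[int]:
    """Multiples of `step` that split `file_size` bytes into exactly `num_parts` parts."""
    if file_size <= 0 or num_parts <= 0:
        return []
    smallest = -(-file_size // num_parts)   # ceil(file_size / num_parts)
    size = -(-smallest // step) * step      # round up to a multiple of step
    sizes = []
    while -(-file_size // size) == num_parts:
        sizes.append(size)
        if num_parts == 1:
            break                           # any larger size also gives one part
        size += step
    return sizes
```

For a 100 MiB file whose ETag ends in `-13`, the only 1 MiB multiple that fits is 8 MiB, so a single pass over the file is enough in that case.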
By default, ‘Verify File after Copy / Move’ is enabled; it uses the ETag of the S3 object (file) as the MD5 hash against which the downloaded file is compared.
In version v2024.11.24.0 we optimized the MD5 Hash calculation for multipart uploads using the following information:
After a decent amount of reading, debugging and monitoring browser network tabs, these are the most commonly used part sizes (in bytes):
- 8388608 used by AWS CLI and Boto3 (= 8 MiB)
- 15728640 used by s3cmd (= 15 MiB)
- 17179870 used by the AWS S3 Browser Console (= custom size, about 16.4 MiB)
- Multiples of 1 MiB used by common uploaders (= Limagito File Mover default)
So if our default (= multiples of 1 MiB) does not return the correct result, we’ll try the other options too. If one of them is successful, it becomes our default for the next file we download.
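The fallback order described above could look roughly like the Python sketch below. It is an illustration under our own assumptions, not the File Mover code: `PartSizeGuesser`, `file_etag` and the `default_sizes` argument (the 1 MiB multiples to try) are hypothetical names, and "default for the next file" is interpreted here as "tried first next time".

```python
import hashlib
import os
from typing import Iterable, Optional

# Fixed part sizes in bytes from the list above, tried after the 1 MiB multiples.
FALLBACK_PART_SIZES = (8388608, 15728640, 17179870)


def file_etag(path: str, part_size: int) -> str:
    """Stream `path` in `part_size` chunks and build the multipart-style ETag."""
    part_digests = []
    with open(path, "rb") as fh:
        while chunk := fh.read(part_size):
            part_digests.append(hashlib.md5(chunk).digest())
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


class PartSizeGuesser:
    """Hypothetical helper that remembers the part size that matched last time."""

    def __init__(self) -> None:
        self.last_good: Optional[int] = None

    def verify(self, path: str, etag: str, default_sizes: Iterable[int]) -> bool:
        expected = etag.strip('"').lower()
        if "-" not in expected:
            raise ValueError("not a multipart ETag")
        num_parts = int(expected.rpartition("-")[2])
        file_size = os.path.getsize(path)
        # Last successful size first, then the defaults, then the fixed fallbacks.
        ordered = [self.last_good, *default_sizes, *FALLBACK_PART_SIZES]
        tried = set()
        for size in ordered:
            if not size or size in tried:
                continue
            tried.add(size)
            # Cheap filter: skip sizes that cannot produce the right part count.
            if -(-file_size // size) != num_parts:
                continue
            if file_etag(path, size) == expected:
                self.last_good = size
                return True
        return False
```

Checking the part count before hashing avoids reading the file again for sizes that cannot match; only plausible candidates cost a full pass over the download.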
#awss3 #filetransfer #managedfiletransfer
Best regards,
Limagito Team