Checking object integrity

Astran S3 uses checksum values to verify the integrity of data that you upload or download. In addition, you can request that another checksum value be calculated for any object that you store in Astran S3. You can choose a checksum algorithm to use when uploading your data.

When you upload your data, Astran S3 uses the algorithm that you've chosen to compute a checksum on the server side and validates it against the provided value before storing the object. The checksum is then stored as part of the object metadata. This validation works consistently across object sizes for both single part and multipart uploads. When you copy your data, however, Astran S3 copies the checksum from the source object to the destination object.

note

When you perform a single part or multipart upload, you can optionally include a precalculated checksum as part of your request, and use the full object checksum type. To use precalculated values with multiple objects, use the AWS CLI or AWS SDKs.

Using supported checksum algorithms

With Astran S3, you can choose a checksum algorithm to validate your data during uploads. The specified checksum algorithm is then stored with your object and can be used to validate data integrity during downloads. You can choose one of the following checksum algorithms to calculate the checksum value:

  • CRC-32 (CRC32)

When you upload an object, you specify the algorithm that you want to use:

  • When you use the Astran portal, every upload automatically calculates a CRC32 checksum. When Astran S3 receives the object, it calculates the checksum again by using the same algorithm. If the two checksum values don't match, Astran S3 generates an error. For your convenience, the Astran portal displays the checksum in hexadecimal, even though the Astran S3 API returns it in base64. See Verifying the checksum below for more information on how you can perform a check on your end.
  • When you use an SDK or the CLI, be aware of the following:
    • Set the ChecksumAlgorithm parameter to the algorithm that you want Astran S3 to use. If you already have a precalculated checksum, pass the checksum value to the AWS SDK, and the SDK includes the value in the request. If you don't pass a checksum value or don't specify a checksum algorithm, the SDK automatically calculates a checksum value for you and includes it with the request to provide integrity protections. If the checksum value that Astran S3 calculates doesn't match the value in the request, Astran S3 fails the request with a BadDigest error.
    • If you're using an up-to-date AWS SDK, the SDK chooses a checksum algorithm for you. However, you can override this checksum algorithm.
  • When you use the REST API, don't use the x-amz-sdk-checksum-algorithm parameter. Instead, use one of the algorithm-specific headers (for example, x-amz-checksum-crc32).
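For the REST API case, the value of the algorithm-specific header is the checksum in base64. The following is a minimal sketch of how that value could be computed for x-amz-checksum-crc32. The helper name crc32_header is hypothetical, and the sketch only builds the header; it doesn't sign or send a request.

```python
import base64
import zlib


def crc32_header(body: bytes) -> dict:
    """Build the algorithm-specific checksum header for a request body.

    Hypothetical helper for illustration only: it computes the header
    value and does not sign or send anything.
    """
    # CRC32 as a 4-byte big-endian value, then base64, which is the
    # representation the API uses for checksums.
    digest = zlib.crc32(body).to_bytes(4, "big")
    return {"x-amz-checksum-crc32": base64.b64encode(digest).decode("ascii")}
```

The returned dictionary can be merged into the headers of a PUT request built with whatever HTTP client you already use.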

To apply any of these checksum values to objects that are already uploaded to Astran S3, you can copy the object. If the source object doesn’t have a specified checksum algorithm or checksum value, the destination object will not have any checksum.

Full object and composite checksum types

In Astran S3, there are two types of supported checksums:

  • Full object checksums: A full object checksum is calculated based on all of the content of a multipart upload, covering all data from the first byte of the first part to the last byte of the last part.
note

All PUT requests require a full object checksum type.

  • Composite checksums: A composite checksum is calculated based on the individual checksums of each part in a multipart upload. Instead of computing a checksum based on all of the data content, this approach aggregates the part-level checksums (from the first part to the last) to produce a single, combined checksum for the complete object.
note

When an object is uploaded as a multipart upload, the entity tag (ETag) for the object is not an MD5 digest of the entire object. Instead, Astran S3 calculates the MD5 digest of each individual part as it is uploaded. The MD5 digests are used to determine the ETag for the final object. Astran S3 concatenates the bytes for the MD5 digests together and then calculates the MD5 digest of these concatenated values.
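The ETag calculation described in this note can be sketched as follows. This is an illustration under the assumption that you know the exact bytes of each part; the function name multipart_etag_digest is hypothetical, and any quoting or suffix that the service applies to the final ETag string isn't covered here, so the sketch returns only the hexadecimal digest.

```python
import hashlib


def multipart_etag_digest(parts):
    """Return the hex MD5 of the concatenated per-part MD5 digests.

    `parts` is an iterable of bytes objects, one per uploaded part,
    in part order. Hypothetical helper for illustration only.
    """
    # MD5 digest (raw 16 bytes) of each individual part.
    part_digests = [hashlib.md5(part).digest() for part in parts]
    # MD5 of the concatenated digests, not of the object data itself.
    return hashlib.md5(b"".join(part_digests)).hexdigest()
```

Because the digest depends on the part boundaries, the same object uploaded with a different part size produces a different value.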

Astran S3 supports the following full object and composite checksum algorithm types:

  • CRC-32 (CRC32): Supports both full object and composite algorithm types.

Single part uploads

Checksums of objects that are uploaded in a single part (using PutObject) are treated as full object checksums. When you upload an object in the Astran portal, you can choose the checksum algorithm that you want S3 to use and also (optionally) provide a precomputed value. Astran S3 then validates this checksum before storing the object and its checksum value. You can verify an object's data integrity when you request the checksum value during object downloads.

Multipart uploads

When you upload the object in multiple parts using the MultipartUpload API, you can specify the checksum algorithm that you want Astran S3 to use and the checksum type (full object or composite).

The following table indicates which checksum algorithm type is supported for each checksum algorithm in a multipart upload:

Checksum algorithm    Full object    Composite
CRC-32 (CRC32)        Yes            Yes

Using full object checksums for multipart upload

When creating or performing a multipart upload, you can use full object checksums for validation on upload. This means that you can provide the checksum algorithm for the MultipartUpload API, simplifying your integrity validation tooling because you no longer need to track part boundaries for uploaded objects. You can provide the checksum of the whole object in the CompleteMultipartUpload request, along with the object size.

When you provide a full object checksum during a multipart upload, the AWS SDK passes the checksum to Astran S3. S3 then calculates the checksum of the object on the server side and compares it to the received value. If the values match, Astran S3 stores the object. If the two values don't match, S3 fails the request with a BadDigest error. The checksum of your object is also stored in object metadata that you can use later to validate the object's data integrity.

For full object checksums, you can use the CRC-32 (CRC32) checksum algorithm in S3. Full object checksums in multipart uploads are only available for CRC-based checksums because they can linearize into a full object checksum. This linearization allows Astran S3 to parallelize your requests for improved performance. In particular, S3 can compute the checksum of the whole object from the part-level checksums. This type of validation isn't available for other algorithms, such as SHA and MD5. Because S3 has default integrity protections, if objects are uploaded without a checksum, S3 automatically attaches a full object CRC-32 checksum to the object.
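To illustrate why CRC-based checksums can be combined, the sketch below derives the CRC32 of a whole object from the CRC32 and length of each part, without rereading the data. It is a Python port of the technique used by zlib's crc32_combine function and is offered as an assumption about one possible approach, not as a description of how Astran S3 implements it. The names crc32_combine, part_summary, and full_object_crc32 are chosen for this example.

```python
import zlib


def _gf2_matrix_times(mat, vec):
    # Multiply a 32x32 GF(2) matrix (list of 32 column ints) by a vector.
    total = 0
    i = 0
    while vec:
        if vec & 1:
            total ^= mat[i]
        vec >>= 1
        i += 1
    return total


def _gf2_matrix_square(mat):
    # Square the matrix: applying the result equals applying `mat` twice.
    return [_gf2_matrix_times(mat, col) for col in mat]


def crc32_combine(crc1, crc2, len2):
    """CRC32 of A+B given crc32(A), crc32(B), and len(B) in bytes."""
    if len2 <= 0:
        return crc1
    # Operator that advances a CRC over one zero bit.
    odd = [0xEDB88320] + [1 << n for n in range(31)]
    even = _gf2_matrix_square(odd)   # two zero bits
    odd = _gf2_matrix_square(even)   # four zero bits
    # Apply len2 zero bytes to crc1 by repeated squaring.
    while True:
        even = _gf2_matrix_square(odd)
        if len2 & 1:
            crc1 = _gf2_matrix_times(even, crc1)
        len2 >>= 1
        if len2 == 0:
            break
        odd = _gf2_matrix_square(even)
        if len2 & 1:
            crc1 = _gf2_matrix_times(odd, crc1)
        len2 >>= 1
        if len2 == 0:
            break
    return crc1 ^ crc2


def part_summary(data):
    """Return the (crc32, length) pair for one part."""
    return zlib.crc32(data), len(data)


def full_object_crc32(summaries):
    """Fold (crc32, length) pairs, in part order, into one CRC32."""
    crc = 0  # CRC32 of the empty byte string
    for part_crc, part_len in summaries:
        crc = crc32_combine(crc, part_crc, part_len)
    return crc
```

Because only the per-part checksum and length are needed, parts can be checksummed independently and in parallel before being folded together in order.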

note

To initiate the multipart upload, you can specify the checksum algorithm and the full object checksum type. After you specify the checksum algorithm and the full object checksum type, you can provide the full object checksum value for the multipart upload.

Using part-level checksums for multipart upload

When objects are uploaded to Astran S3, they can be uploaded as a single object or uploaded in parts with the multipart upload process. You can choose a checksum type for your multipart upload. For multipart upload part-level checksums (or composite checksums), Astran S3 calculates the checksum for each individual part by using the specified checksum algorithm. You can use UploadPart to provide the checksum values for each part.

Astran S3 then uses the stored part-level checksum values to confirm that each part is uploaded correctly. When a checksum for the whole object is also provided, S3 uses the stored checksum values of each part to calculate the full object checksum internally and compares it with the provided checksum value. This minimizes compute costs because S3 can compute a checksum of the whole object from the checksums of the parts.

When the object is completely uploaded, you can use the final calculated checksum to verify the data integrity of the object.

When uploading a part of the multipart upload, be aware of the following:

  • For completed uploads, you can get an individual part's checksum by using the GetObject or HeadObject operations and specifying a byte range that aligns with a single part. If you want to retrieve the checksum values for individual parts of multipart uploads that are still in progress, you can use ListParts.
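If you want to compare local data against the part-level checksums that the service reports, you can compute the same values yourself. The sketch below assumes that every part except the last has the same size and that part checksums are CRC32 values in base64; the function name part_checksums and the part_size parameter are hypothetical.

```python
import base64
import zlib


def part_checksums(data: bytes, part_size: int) -> list:
    """Return the base64 CRC32 of each part, in part order.

    Assumes fixed-size parts (the last part may be shorter).
    Hypothetical helper for illustration only.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    checksums = []
    for offset in range(0, len(data), part_size):
        part = data[offset:offset + part_size]
        digest = zlib.crc32(part).to_bytes(4, "big")
        checksums.append(base64.b64encode(digest).decode("ascii"))
    return checksums
```

Each entry in the returned list can then be compared with the checksum reported for the part with the same position.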

Using trailing checksums

When uploading objects to Astran S3, you can either provide a precalculated checksum for the object or use an AWS SDK to automatically create trailing checksums for chunked uploads on your behalf. If you use a trailing checksum, Astran S3 calculates the checksum with your specified algorithm as it receives the chunked upload and uses it to validate the integrity of the object.

To create a trailing checksum when using an AWS SDK, populate the ChecksumAlgorithm parameter with your preferred algorithm. The SDK uses that algorithm to calculate the checksum for your object (or object parts) and automatically appends it to the end of your chunked upload request. This behavior saves you time because Astran S3 performs both the verification and upload of your data in a single pass.

Trailing checksum headers

To make a chunked content encoding request, Astran S3 requires clients to include several headers to correctly parse the request. Clients must include the following headers:

  • x-amz-decoded-content-length: This header indicates the plaintext size of the actual data that is being uploaded to Astran S3 with the request.
  • x-amz-content-sha256: This header indicates the type of chunked upload that is included in the request. For chunked uploads with trailing checksums, the header value is STREAMING-UNSIGNED-PAYLOAD-TRAILER for requests that don’t use payload signing.
  • x-amz-trailer: This header indicates the name of the trailing header in the request. If trailing checksums exist (where AWS SDKs append checksums to the encoded request bodies), the x-amz-trailer header value includes the x-amz-checksum- prefix and ends with the algorithm name. The following x-amz-trailer values are currently supported:
    • x-amz-checksum-crc32
note

You can also include the Content-Encoding header, with the aws-chunked value, in your request. While this header isn't required, including it can minimize HTTP proxy issues when transmitting encoded data. If another Content-Encoding value (such as gzip) applies to the request, the Content-Encoding header includes the aws-chunked value in a comma-separated list of encodings. For example, Content-Encoding: aws-chunked, gzip.
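The headers described in this section can be assembled as in the following sketch. It only builds the header set for an unsigned chunked upload with a CRC32 trailer; it doesn't frame the chunked body or send the request. The function name trailing_checksum_headers and its parameters are hypothetical.

```python
def trailing_checksum_headers(decoded_length: int, other_encodings=()) -> dict:
    """Build the headers for a chunked upload with a CRC32 trailer.

    `decoded_length` is the plaintext size of the data being uploaded.
    `other_encodings` lists any additional content encodings, such as
    "gzip". Hypothetical helper for illustration only.
    """
    encodings = ["aws-chunked", *other_encodings]
    return {
        # Size of the actual data, before chunked encoding is applied.
        "x-amz-decoded-content-length": str(decoded_length),
        # Chunked upload with a trailing checksum and no payload signing.
        "x-amz-content-sha256": "STREAMING-UNSIGNED-PAYLOAD-TRAILER",
        # Name of the trailing header that carries the checksum.
        "x-amz-trailer": "x-amz-checksum-crc32",
        # Optional, but can help avoid HTTP proxy issues.
        "Content-Encoding": ", ".join(encodings),
    }
```

For example, calling trailing_checksum_headers(1024, ["gzip"]) yields a Content-Encoding value of aws-chunked, gzip.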

Verifying the checksum

The Astran S3 API uses base64 encoding for all checksum algorithms. For convenience, the Astran portal displays checksums in hexadecimal instead, which is the most commonly used representation of a checksum.
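Both representations encode the same bytes, so you can convert between the value shown in the portal and the value returned by the API. The following sketch shows the conversion in both directions; the function names hex_to_base64 and base64_to_hex are chosen for this example.

```python
import base64


def hex_to_base64(hex_checksum: str) -> str:
    """Convert a hexadecimal checksum to its base64 representation."""
    return base64.b64encode(bytes.fromhex(hex_checksum)).decode("ascii")


def base64_to_hex(b64_checksum: str) -> str:
    """Convert a base64 checksum to lowercase hexadecimal."""
    return base64.b64decode(b64_checksum).hex()
```

For example, the hexadecimal value 0d4a1185 corresponds to the base64 value DUoRhQ==.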

If you need to verify that a file that you received has the expected checksum, here's how you can compute it for each algorithm.

CRC32

  • Base64 encoding:
    • If you have the crc32 binary installed on your computer, you can use the following command to retrieve the CRC32 checksum in base64 encoding:
      crc32 /path/to/file | xargs echo -n | xxd -r -p | base64
    • If you don't have the crc32 binary installed, you can use this Python one-liner:
      python3 -c "import zlib, base64; print(base64.b64encode(zlib.crc32(open('/path/to/file','rb').read()).to_bytes(4, 'big')).decode())"
  • Hexadecimal encoding:
    • If you have the crc32 binary installed on your computer, you can use the following command to retrieve the CRC32 checksum in hexadecimal encoding:
      crc32 /path/to/file
    • If you don't have the crc32 binary installed, you can use this Python one-liner:
      python3 -c "import zlib; print('{:08x}'.format(zlib.crc32(open('/path/to/file', 'rb').read()) & 0xffffffff))"
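The Python one-liners above read the whole file into memory, which may not be practical for large files. As an alternative, the following sketch computes the same CRC32 incrementally and returns both representations. The function names crc32_of_stream and crc32_of_file, and the 1 MiB default chunk size, are choices made for this example.

```python
import base64
import zlib


def crc32_of_stream(stream, chunk_size=1024 * 1024):
    """Compute the CRC32 of a binary stream without loading it all at once."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Passing the running value continues the CRC across chunks.
        crc = zlib.crc32(chunk, crc)
    return crc


def crc32_of_file(path):
    """Return (hexadecimal, base64) CRC32 representations for a file."""
    with open(path, "rb") as f:
        crc = crc32_of_stream(f)
    hex_value = "{:08x}".format(crc)
    b64_value = base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")
    return hex_value, b64_value
```

The first element of the returned pair can be compared with the value shown in the Astran portal, and the second with the value returned by the Astran S3 API.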