https://community.arm.com/groups/arm-mali-graphics/blog/2012/06/28/arm-unveils-details-of-astc-texture-compression-at-hpg-conference--part-1
The internal details of ARM's Adaptive Scalable Texture Compression (ASTC)
technology were launched this week at the High Performance Graphics conference in Paris, France.
Tom Olson presented his paper entitled "Adaptive Scalable Texture Compression", as
part of the session on texture and appearance on Wednesday 27 June 2012.
This is the first of two blog posts giving an overview of the ASTC
technology as presented in the paper.
Features and Quality
As a quick recap, ASTC offers a number of advantages over existing texture
compression schemes:
· It is flexible, allowing bit rates from 8 bits per pixel (bpp) down to less
  than 1 bpp. This allows content developers to fine-tune the tradeoff of
  space against quality.
· It supports from 1 to 4 color channels, together with modes for uncorrelated
  channels for use in mask textures and normal maps.
· It supports both low dynamic range (LDR) and high dynamic range (HDR) images.
· It supports both 2D and 3D images.
· All of these features are interoperable. You can choose any combination that
  suits your needs.
Despite all this flexibility, quality is universally better than existing
texture compression schemes for LDR images, and is comparable to the de facto
industry standard for HDR.
How Does It Work?
ASTC, like all current texture compression schemes, divides the image into
fixed-size blocks. These blocks cover a fixed-size "footprint" in the
texture image, and are encoded using a fixed number of bits. This feature makes
it possible to access texels quickly in any order, with a well-bounded cost for
that access.
This is in contrast to stream-based, variable-bitrate image formats such
as PNG, where the decoding process requires that you have decoded the previous
texels in the image. Obviously, this would be a problem if the texels you wish
to access are at the bottom right of the texture.
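To make the random-access property concrete, here is a minimal Python sketch that finds the compressed block holding a given texel. It assumes blocks are stored in row-major order with no padding and that each block occupies 16 bytes (128 bits); the function name and the storage layout are assumptions of this sketch, not something the post specifies.

```python
def block_byte_offset(x, y, image_width, block_w, block_h, block_bytes=16):
    """Byte offset of the compressed block that contains texel (x, y).

    Assumption: blocks are laid out row by row with no padding between them.
    """
    # Blocks per row, rounding up so a partially covered block still counts.
    blocks_per_row = (image_width + block_w - 1) // block_w
    block_x = x // block_w
    block_y = y // block_h
    return (block_y * blocks_per_row + block_x) * block_bytes


# The bottom-right texel of a 256x256 texture with 4x4 blocks is found
# with a handful of integer operations, without touching earlier blocks.
print(block_byte_offset(255, 255, 256, 4, 4))  # 65520
```

The cost of the lookup is the same for every texel, which is the "well-bounded cost" a stream format cannot offer.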
The 2D block footprints in ASTC range from 4x4 texels up to 12x12, and every
block is encoded in 128 bits regardless of its footprint. Dividing 128 bits by
the number of texels in the footprint gives bit rates from 8 bpp (128 bits /
16 texels) down to 0.89 bpp (128 bits / 144 texels).
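The arithmetic can be checked with a few lines of Python. The 6x6 and 8x8 footprints are included here only because they give the 3.56 bpp and 2 bpp rates mentioned in the image caption; the function name is illustrative.

```python
BLOCK_BITS = 128  # every ASTC block is 128 bits


def bits_per_texel(block_w, block_h):
    """Bit rate for a 2D footprint of block_w x block_h texels."""
    return BLOCK_BITS / (block_w * block_h)


for w, h in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    print(f"{w}x{h}: {bits_per_texel(w, h):.2f} bpp")
# 4x4: 8.00 bpp
# 6x6: 3.56 bpp
# 8x8: 2.00 bpp
# 12x12: 0.89 bpp
```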
Luoy
Each texel can have 4, 3, 2 or 1 components (RGBA, RGB, YA or Y), and each texel's components can be split across one or two planes.
Figure: Detail of an ASTC compressed image at 8 bpp, 3.56 bpp and 2 bpp
In the simplest case, the encoder analyses each block in isolation and
selects two colors which define the end points of a line in the color space.
The approximate colors of texels can then be reconstructed from these color
endpoints by interpolating between them. For each texel in the footprint, a
weight value is stored, and the weighted average calculated. The weight,
mathematically, is a value in the range 0 to 1, but for storage this is quantized
to a few bits. Selecting the endpoint colors and the weights to make an optimal
match to the texel colors in the original block is the job of the encoder.
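Luoy
The underlying assumption is that the colors in a block do not span the whole color space (and even if they do, the interpolated colors are simply coarser).
For each block, we select two endpoints in color space that define the block's color range. Each texel's color is then reconstructed by interpolation (linear interpolation, for example), so an interpolation weight must be computed for it. The weights can be computed in several ways; after interpolating, the result is compared against the original image and the weights with the smallest error are chosen. Encoding is exactly this process: find one or two sets of endpoints, then compute the best interpolation weights.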
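The endpoint-and-weight scheme described above can be sketched in Python as follows. This is a simplified illustration, not the ASTC algorithm: it uses floating-point arithmetic, takes the endpoints as given rather than searching for them, and picks each weight by projecting the texel onto the line between the endpoints. All function names are hypothetical.

```python
def quantize_weight(w, bits):
    """Map a weight in [0, 1] to an integer index stored in `bits` bits."""
    levels = (1 << bits) - 1
    return round(w * levels)


def best_weight(texel, e0, e1, bits):
    """Project a texel onto the line e0 -> e1 and quantize the position."""
    direction = [b - a for a, b in zip(e0, e1)]
    length_sq = sum(d * d for d in direction)
    if length_sq == 0:
        return 0  # endpoints coincide, so every weight gives the same color
    t = sum((c - a) * d for c, a, d in zip(texel, e0, direction)) / length_sq
    t = min(1.0, max(0.0, t))  # clamp to the segment between the endpoints
    return quantize_weight(t, bits)


def decode_texel(index, e0, e1, bits):
    """Reconstruct a color as the weighted average of the two endpoints."""
    w = index / ((1 << bits) - 1)
    return tuple((1 - w) * a + w * b for a, b in zip(e0, e1))


black, white = (0, 0, 0), (255, 255, 255)
index = best_weight((85, 85, 85), black, white, bits=2)
print(index, decode_texel(index, black, white, bits=2))
```

With 2-bit weights only four colors along the line are reachable, so a texel that lies off the line or between two steps is reproduced approximately; choosing endpoints that minimize this error over the whole block is the encoder's job.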
Most of the existing formats use similar methods, and it is possible to trace
the origins of this technique as far back as 1979. However, most schemes use a
fixed split between the number of bits used to represent the endpoint colors,
and the number of bits used to represent the color weights.
Some formats offer different precision at different bit rates, but the
number of bits for endpoints and weights is determined globally by the block
footprint.
Tom's previous blog post on
ASTC goes into some detail about the
constraints of each of the existing texture compression methods.
Luoy
That post compares the shortcomings of earlier texture compression schemes.
Trading Spaces
The "Adaptive" part of ASTC allows the encoder to tune the number of bits
assigned to each piece of data, on a block-by-block basis. There are sixteen
different color endpoint modes, any of which can be chosen for any block in the
image. If a block is largely gray, then it is possible to encode the color
endpoints more efficiently and devote the resulting extra bits to representing
the texel weights more accurately.
Luoy
"Adaptive" means that the split between endpoint bits and weight bits adapts per block (it does not mean that the block size adapts).
But here, apparently, is a problem. If we have a 4x4 texel footprint, and
we want to increase the amount of data in each weight, then it would seem that
the minimum increment would be one extra bit per texel. The resulting shift in
the balance between weights and endpoints would thus be 16 bits, which is
rather crude. For larger block footprints, the problem gets worse. For finer
control, we need a way to add less than one bit per texel.
Luoy
However, adjusting the split in whole bits per texel is rather coarse.
Bounded Integer Sequence Encoding
Fractional bits per pixel sounds implausible, or even impossible, but it is
not quite as strange as it initially sounds.
In principle we can choose our quantization of the texel weights (and the
color components of the endpoint colors) to use any number of values.
For the sake of illustration, let us assume that we can best represent
each texel in a particular block using one of five weight values - 0.0, 0.25,
0.5, 0.75 and 1.0. We can easily quantize these to the integer values 0..4. Two
bits is insufficient to represent these, as that would only represent four
values, so conventionally we would need to allocate 3 bits. We would then
either expand the quantization to use all eight possible 3-bit values, or leave
three of the values unused.
However, a combination of any three of these texels has one of 5^3 possible
values, or 125. This is very close to the number of values that
it is possible to encode in 7 bits (2^7 = 128). So if we can group the
texels into triplets, and find an appropriate encoding scheme for these base-5
values ("quints"), we can use just 7 bits, instead of the 9 we would
need for storing three bits per value. This is a significant saving, and has
the somewhat weird property of assigning a non-integer number of bits - 2.33 -
to each value.
Luoy
At 3 bits per texel, three texels need 9 bits. Taken as a group, three texels have 5^3 = 125 possible values,
so they need only 7 bits: 7 bits / 3 texels = 2.33 bits per texel.
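The counting argument can be demonstrated with a plain base-5 positional code in Python. This only shows that three quints fit in 7 bits; the bit layout ASTC actually uses for a quint triplet is different and is not reproduced here. The function names are hypothetical.

```python
def pack_quints(q0, q1, q2):
    """Pack three base-5 digits (0..4) into one integer in 0..124."""
    for q in (q0, q1, q2):
        if not 0 <= q <= 4:
            raise ValueError("each quint must be in the range 0..4")
    return q0 + 5 * q1 + 25 * q2


def unpack_quints(packed):
    """Recover the three base-5 digits from a packed value."""
    return packed % 5, (packed // 5) % 5, packed // 25


largest = pack_quints(4, 4, 4)
print(largest, largest.bit_length())  # 124 7
print(unpack_quints(pack_quints(3, 0, 2)))  # (3, 0, 2)
```

The largest packed value is 124, which fits in 7 bits with three of the 128 codes left unused.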
Similar reasoning shows that it is possible to pack base-3 values (trits)
in groups of five, each group taking 8 bits (3^5 = 243, 2^8 = 256), for 1.6 bits per value.
Luoy
For example, the three weights 0.0, 0.5 and 1.0 form a trit; 8 bits / 5 values = 1.6 bits per value.
The Bounded Integer Sequence Encoding (BISE) technique used in ASTC always
quantizes values to ranges which conform to one of three patterns: values from
0 up to 2^n - 1, using n bits; up to 3 x 2^n - 1, using n bits
and a trit; or up to 5 x 2^n - 1, using n bits and a quint. This allows
us to encode any ideal quantization range with much less waste than the
traditional whole-number-of-bits approach.
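As a rough illustration of the three patterns, the following Python sketch returns how many distinct values each pattern can represent and its amortized cost per value, using the 8-bits-per-5-trits and 7-bits-per-3-quints figures from above. The function name and the `kind` labels are made up for this example.

```python
from fractions import Fraction


def bise_range(n, kind="bits"):
    """Return (number of distinct values, amortized bits per value)."""
    if kind == "bits":
        return 1 << n, Fraction(n)
    if kind == "trit":
        return 3 << n, n + Fraction(8, 5)  # five trits share 8 bits
    if kind == "quint":
        return 5 << n, n + Fraction(7, 3)  # three quints share 7 bits
    raise ValueError(f"unknown kind: {kind}")


for n, kind in [(2, "bits"), (0, "quint"), (1, "trit"), (1, "quint")]:
    count, cost = bise_range(n, kind)
    print(f"{count} values: {float(cost):.2f} bits per value")
```

For instance, a 6-value range (one bit plus a trit) costs 2.6 bits per value, where a whole-number-of-bits scheme would need 3.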
When the number of values is not a multiple of three or five, we need to
avoid wastage at the end of the sequence. Thus, we have another constraint on
the chosen encoding. If the last few values in the sequence to encode are zero,
the last few bits in the encoded bit string must also be zero. Ideally, the
number of non-zero bits should be easily calculated and not depend on the
magnitudes of the previous encoded values.
This is a little tricky to arrange, but it is possible. This means that we
do not need to store any padding after the end of the bit sequence, as we can
happily assume that they are zero bits, safe in the knowledge that they will
not affect the decoding of the actual values.
With this constraint in place, and by interleaving the bits, trits and quints
appropriately, BISE encodes a sequence of length S (i.e. an array of S integer
values) using a fixed number of bits:
· For S values in the range 0 up to 2^n - 1, it uses nS bits.
· For S values in the range 0 up to 3 x 2^n - 1, it uses nS + ceiling(8S/5) bits.
· For S values in the range 0 up to 5 x 2^n - 1, it uses nS + ceiling(7S