Compound document files divide their data into streams and store these streams in different storages inside the file. In this way a compound document file supports a complete file system inside the file: the streams are like files in a real file system, and the storages are like subdirectories.
Sectors and Sector Identifiers
All streams of a compound document file are divided into small blocks of data, called sectors. The entire file consists of a header structure followed by a list of all sectors. The sector size is set in the header and is the same for every sector in the file.
The (zero-based) index of a sector is called its sector identifier (SecID). SecIDs are signed 32-bit integer values. If a SecID is not negative, it must refer to an existing sector.
If a SecID is negative, it has a special meaning:
- SecID -1 ("Free SecID"): the sector may exist in the file, but is not part of any stream.
- SecID -2 ("End Of Chain SecID"): there are no more SecIDs in the SecID chain.
- SecID -3 ("SAT SecID"): the sector is used by the SAT.
- SecID -4 ("MSAT SecID"): the sector is used by the MSAT.
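As a rough illustration of the rules above, the following Python sketch names the special SecIDs and turns a non-negative SecID into a file offset. The constant and function names are made up for this example, and the 512-byte header size is an assumption taken from the 0x200 term that appears in the offsets of the example later in this post.

```python
HEADER_SIZE = 0x200  # assumption: 512-byte header, as in the worked example

SPECIAL_SECIDS = {
    -1: "Free SecID",
    -2: "End Of Chain SecID",
    -3: "SAT SecID",
    -4: "MSAT SecID",
}


def describe_secid(secid):
    """Return the special name of a negative SecID, or 'sector' otherwise."""
    if secid >= 0:
        return "sector"
    if secid not in SPECIAL_SECIDS:
        raise ValueError("unknown special SecID %d" % secid)
    return SPECIAL_SECIDS[secid]


def sector_offset(secid, sector_size=0x200):
    """File offset of a sector: the header comes first, then all sectors."""
    if secid < 0:
        raise ValueError("SecID %d does not refer to a sector" % secid)
    return HEADER_SIZE + secid * sector_size


print(hex(sector_offset(0x2B)))  # 0x5800
```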
Sector Chains and SecID Chains
The list of all sectors used to store the data of one stream is called a sector chain; the corresponding list of SecIDs is its SecID chain. The different ways of building a SecID chain are detailed later.
Master Sector Allocation Table
The master sector allocation table (MSAT) is an array of the SecIDs of all sectors used by the sector allocation table (SAT), which in turn is needed to read any other stream in the file. The size of the MSAT (number of SecIDs) is equal to the number of sectors used by the SAT. The first 109 SecIDs of the MSAT are contained in the header. Following is an example:
file = 451e17dd6fcfcbf2c32bf89e99577d29, off = hdr.off76, that is, 0x4c:
04c: 0x0000002A
050: -1 -1 -1 -1
060: -1 -1 -1 -1
...
1f0: -1 -1 -1 -1
So for this file, the header part of the MSAT lists only 1 SAT sector (0x2A).
If additional sectors are needed, the header contains the SecID of the first (extra) sector used for the MSAT; otherwise it contains -2. For file 451e17dd6fcfcbf2c32bf89e99577d29, hdr.off68 (0x44) is -2.
An (extra) MSAT sector is formatted as (sec_size-4)/4 SecIDs, followed by the SecID of the next MSAT sector, or -2 if there is none.
The header also has a field at offset 72 that tells how many sectors are used by the MSAT. After building the MSAT we know this value ourselves; a reader that needs the value without building the MSAT can simply trust the field.
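The MSAT layout described above can be sketched in Python as follows. This is only a sketch under assumptions: the function name `read_msat` is invented here, the whole file is assumed to be in memory as `data`, values are assumed to be little-endian, and the header is assumed to be 512 bytes. It relies only on the header offsets 30 (ssz), 68 and 76 mentioned in this post.

```python
import struct

HEADER_SIZE = 0x200  # assumption: 512-byte header


def read_msat(data):
    """Return the SecIDs of all SAT sectors listed in the MSAT."""
    (ssz,) = struct.unpack_from("<H", data, 30)
    sector_size = 1 << ssz
    ids_per_sector = sector_size // 4

    # First part: 109 SecIDs stored in the header at offset 76.
    msat = list(struct.unpack_from("<109i", data, 76))

    # Extra part: chain of MSAT sectors, starting at the SecID at offset 68.
    (next_secid,) = struct.unpack_from("<i", data, 68)
    visited = set()
    while next_secid >= 0:  # any negative value (normally -2) ends the chain
        if next_secid in visited:
            raise ValueError("loop in MSAT chain at SecID %d" % next_secid)
        visited.add(next_secid)
        offset = HEADER_SIZE + next_secid * sector_size
        ids = struct.unpack_from("<%di" % ids_per_sector, data, offset)
        msat.extend(ids[:-1])  # (sec_size-4)/4 SecIDs of SAT sectors
        next_secid = ids[-1]   # the last SecID links to the next MSAT sector

    # Unused slots hold -1 (Free SecID); keep only real sectors.
    return [secid for secid in msat if secid >= 0]
```

For the sample file, where offset 68 holds -2 and offset 76 holds 0x2A followed by -1 values, this sketch would return a list with the single SecID 0x2A.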
Sector Allocation Table
The sector allocation table (SAT) is an array of SecIDs. The total number of sectors used by the SAT is stored at hdr's offset 44 (0x2C). This field looks informational only, like the field at offset 72, because the same number is known once the MSAT has been built. A SAT sector contains sec_size/4 SecIDs.
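Once the SecIDs of the SAT sectors are known from the MSAT, the SAT itself can be assembled by concatenating those sectors. A minimal sketch, assuming little-endian values, a 512-byte header and a hypothetical helper name `read_sat`:

```python
import struct

HEADER_SIZE = 0x200  # assumption: 512-byte header


def read_sat(data, sat_secids, sector_size=0x200):
    """Concatenate the SAT sectors (in MSAT order) into one list of SecIDs."""
    ids_per_sector = sector_size // 4  # a SAT sector holds sec_size/4 SecIDs
    sat = []
    for secid in sat_secids:
        offset = HEADER_SIZE + secid * sector_size
        sat.extend(struct.unpack_from("<%di" % ids_per_sector, data, offset))
    return sat
```

The resulting list is indexed by SecID, so `sat[n]` is the entry that belongs to sector `n`.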
Using the Sector Allocation Table
When building the SecID chain for a stream, the current position (array index) in the SAT array refers to the current sector, while the SecID contained at this position specifies the following sector in the chain. The position referring to the last sector of a stream contains the special End Of Chain SecID with the value -2. Sectors used by the SAT itself are marked with the special SAT SecID with the value -3. Finally, sectors used by the MSAT are marked with the special MSAT SecID with the value -4.
The entry point of a SecID chain has to be obtained elsewhere (from the header or from a directory entry).
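Following a chain through the SAT is a simple loop. The sketch below is an illustration rather than a reference implementation: `secid_chain` is a made-up name, `sat` is assumed to be a list indexed by SecID, and the loop check is an extra safety measure for damaged files.

```python
END_OF_CHAIN = -2


def secid_chain(sat, start):
    """Collect the SecIDs of one chain, beginning at the given entry point."""
    chain = []
    seen = set()
    secid = start
    while secid != END_OF_CHAIN:
        if not 0 <= secid < len(sat):
            raise ValueError("chain leaves the SAT at SecID %d" % secid)
        if secid in seen:
            raise ValueError("chain loops back to SecID %d" % secid)
        seen.add(secid)
        chain.append(secid)
        secid = sat[secid]  # the entry at this index names the next sector
    return chain


# Toy SAT with two chains: 0 -> 1 -> 2 and 3 -> 4; sector 5 is free.
toy_sat = [1, 2, -2, 4, -2, -1]
print(secid_chain(toy_sat, 0))  # [0, 1, 2]
print(secid_chain(toy_sat, 3))  # [3, 4]
```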
Short-Streams
Whenever a stream is shorter than a specific length (specified at header offset 56), it is stored as a short-stream. Short-streams do not directly use sectors to store their data, but are all embedded in a specific internal control stream, the short-stream container stream.
Short-Stream Container Stream
The SecID chain of the short-stream container stream is contained in the SAT, while its first sector has to be obtained from the root storage entry in the directory. The data of all sectors used by the short-stream container stream are concatenated in the order of its SecID chain. This stream is then virtually divided into short-sectors; the first short-sector (with SecID 0) is always located at offset 0 inside the short-stream container stream. The size of the container stream (in bytes, from which the number of sectors follows) is stored in the root storage entry.
Short-Sector Allocation Table
The short-sector allocation table (SSAT) is an array of (short) SecIDs and contains the SecID chains of all short-streams. The SSAT is used like the SAT, with the difference that its SecID chains refer to short-sectors in the short-stream container stream. The SecID of the first sector of the SSAT is contained in the header; the rest of its SecID chain is contained in the SAT.
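To make the mapping from a short SecID to a position in the file concrete, here is a small sketch. It assumes 512-byte sectors, 64-byte short-sectors and a 512-byte header, and the function name is hypothetical. The container chain used in the usage lines is the one from the sample file discussed later in this post.

```python
HEADER_SIZE = 0x200  # assumption: 512-byte header


def short_sector_offset(short_secid, container_chain,
                        sector_size=0x200, short_sector_size=0x40):
    """File offset of a short-sector inside the short-stream container."""
    # Offset inside the (virtually concatenated) container stream.
    linear = short_secid * short_sector_size
    # Which sector of the container chain, and where inside that sector.
    index, within = divmod(linear, sector_size)
    secid = container_chain[index]
    return HEADER_SIZE + secid * sector_size + within


# Container chain of the sample file: 2F:33, 3D:47, 49, 4D:4F, 59:5A.
chain = (list(range(0x2F, 0x34)) + list(range(0x3D, 0x48)) +
         [0x49, 0x4D, 0x4E, 0x4F, 0x59, 0x5A])
print(hex(short_sector_offset(0xA2, chain)))  # 0xb480
print(hex(short_sector_offset(0xA4, chain)))  # 0xb500
```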
Directory
The directory is an internal control stream that consists of an array of directory entries. Each directory entry refers to a storage or a stream. Directory entries are enumerated in order of their appearance in the stream. The zero-based index of a directory entry is called its directory entry identifier (DirID). There is a special directory entry at the beginning of the directory (with the DirID 0). It represents the root storage and is called the root storage entry.
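A directory entry can be decoded with a few `struct` calls. The sketch below assumes the 128-byte (0x80) entry layout used in the example of this post (name at offset 0, name length at 0x40, type at 0x42, color at 0x43, siblings and child at 0x44/0x48/0x4C, first SecID at 0x74, size at 0x78) and little-endian values. The function name and the dictionary keys are invented for illustration.

```python
import struct

ENTRY_SIZE = 0x80  # assumption: 128-byte directory entries


def parse_dir_entry(directory, dir_id):
    """Decode one entry of the (already concatenated) directory stream."""
    base = dir_id * ENTRY_SIZE
    raw = directory[base:base + ENTRY_SIZE]
    if len(raw) < ENTRY_SIZE:
        raise ValueError("directory entry %d is truncated" % dir_id)

    name_len, kind, color, left, right, child = struct.unpack_from(
        "<HBBiii", raw, 0x40)
    first_secid, size = struct.unpack_from("<iI", raw, 0x74)

    # The name is UTF-16; name_len counts bytes including the trailing zero.
    name_bytes = raw[:min(max(name_len - 2, 0), 0x40)]
    return {
        "name": name_bytes.decode("utf-16-le"),
        "type": kind,        # raw type byte
        "color": color,      # raw node color byte
        "left": left,        # DirID of left sibling, or -1
        "right": right,      # DirID of right sibling, or -1
        "child": child,      # DirID of the child root (storages), or -1
        "first_secid": first_secid,
        "size": size,
    }
```

The sibling and child DirIDs form a tree per storage, so listing a storage means starting at `child` and visiting `left` and `right` recursively.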
example
sample file = 451e17dd6fcfcbf2c32bf89e99577d29
HDR
24: REV = 3E
26: VER = 03
30: ssz = 09 // size of a sector is 2 ^ ssz = 512 bytes
32: sssz = 06 // size of a short sector is 2 ^ sssz = 64 bytes
40: number of directory sectors; this field is not supported for version 3 compound files
48 (0x30): SecID of first sector of the directory stream = 0x2B
56 (0x38): minimal normal stream size = 0x1000
60 (0x3C): SecID of first sector used by SSAT = 0x2D
64: total number of sectors used by SSAT
72: total number of sectors used by MSAT; the MSAT here takes no (0) extra sectors beyond its first 109 SecIDs, which are stored at offset 76
76 (0x4C): first 109 SecIDs of the MSAT
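The header fields listed above can be read in one pass. This is a sketch, not a validated parser: it assumes little-endian values, uses only the offsets named in this post, and the function name and dictionary keys are hypothetical.

```python
import struct


def parse_header(data):
    """Extract the header fields used in this post from the first 512 bytes."""
    rev, ver = struct.unpack_from("<HH", data, 24)
    ssz, sssz = struct.unpack_from("<HH", data, 30)
    sat_sectors, dir_first = struct.unpack_from("<ii", data, 44)
    (min_stream_size, ssat_first, ssat_sectors,
     msat_first, msat_sectors) = struct.unpack_from("<IiIiI", data, 56)
    return {
        "revision": rev,
        "version": ver,
        "sector_size": 1 << ssz,
        "short_sector_size": 1 << sssz,
        "sat_sectors": sat_sectors,          # offset 44
        "dir_first_secid": dir_first,        # offset 48
        "min_stream_size": min_stream_size,  # offset 56
        "ssat_first_secid": ssat_first,      # offset 60
        "ssat_sectors": ssat_sectors,        # offset 64
        "msat_first_secid": msat_first,      # offset 68
        "msat_sectors": msat_sectors,        # offset 72
    }
```

A stream whose size is below `min_stream_size` is then looked up through the SSAT, any other stream through the SAT.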
DIR
entry 0: 200+2B*200 = 0x5800
off0.name = L"Root Entry"
off40.name_num_of_bytes = 16
off42.type = RootStorage
off43.node_color = Black
off44.left_sibling = None
off48.right_sibling = None
off4c.children_root_entry = 03 (storage only, else -1)
//off50.GUID off60.UserFlag off64.TimeCreate off6c.TimeModi
off74.SectIdOfFirst|SectIdOfShortStreamContainer|0 = 2F
off78.num_of_bytes.StreamSize|ShortStreamContainerSize|0 = 0x2BC0 = 0x16 sectors
//off7c.not_used
entry 3(children_root_entry of the root storage): 200+2B*200 + 180= 0x5980
name = "\u0005SummaryInformation"
type = UserStream
color = Black
left_sibling = 2
right_sibling = 4
SectIdOfFirst = 0x1A
num_of_bytes.StreamSize = 0x1000 = 8 sects
entry 2(left_sibling of ent3): 5900
name = "WordDocument"
type = UserStream
color = Black
left_sibling = 5
right_sibling = None
SectIdOfFirst = 0
num_of_bytes = 0x1634
entry 4(right_sibling of ent3):200 + 2C*200 + 0 * 80 = 5A00
name = "\u0005DocumentSummaryInformation"
type = UserStream
color = Black
left_sibling = None
right_sibling = None
SectIdOfFirst = 22
num_of_bytes = 0x1000
entry 5(left_sibling of ent2):5A80
name = "Macros"
type = (User) Storage
color = Black
left_sibling = 1
right_sibling = 14
children_root_entry = 0D
entry 1(left_sibling of ent5): 5880
name = "1Table"
type = UserStream
color = Red
left_sibling = None
right_sibling = None
SectIdOfFirst = 0C
num_of_bytes = 1AA4 = 0x0E sects
entry 14(right_sibling of ent5): 200 + 5B*200 + 0 = B800
name = "\u0001CompObj"
// the file is broken (truncated) here
entry 0d(children_root_entry of ent5): 200 + 4c*200 + 80 = 9A80
name = "Tower"
type = (User) Storage
color = Black
left_sibling = 6
right_sibling = 12
children_root_entry = 0F
entry 6(left_sibling of 0d): 200+2C*200+100=5b00
name = "VBA"
type = (User) Storage
color = Black
left_sibling = -1
right_sibling = -1
children_root_entry = 09
entry 12(right_sibling of 0d): 200+58*200+100=B300
name = "PROJECTwm"
type = UserStream
color = Black
left_sibling = 13
right_sibling = None
SectIdOfFirst = A2
num_of_bytes = 4A // short stream
entry 13(left_sibling of 12): B380
name = "PROJECT"
type = UserStream
color = Red
left_sibling = None
right_sibling = None
SectIdOfFirst = A4
num_of_bytes = 226 // short stream
SAT: only one sector, at offset 5600
// Entries 00:0B allow several possible chains: [0,0C), [1,0C), ..., or [0B,0C); the start sector has to be decided elsewhere.
// If some SAT[x] == 0, the chain may also be x,0,1,...,0B.
01000000 02000000 03000000 04000000 05000000 06000000 07000000 08000000 09000000 0A000000 0B000000 FEFFFFFF
// 0C:19: a 0x1AA4 bytes stream named "1Table"
0D000000 0E000000 0F000000 10000000 11000000 12000000 13000000 14000000 15000000 16000000 17000000 18000000 19000000 FEFFFFFF
// 1A:21: a 0x1000 bytes stream named "\u0005SummaryInformation"
1B000000 1C000000 1D000000 1E000000 1F000000 20000000 21000000 -2
// 22:29: a 0x1000 bytes stream named "\u0005DocumentSummaryInformation"
23000000 24000000 25000000 26000000 27000000 28000000 29000000 -2
// Sector 2A is used by SAT
FDFFFFFF
// DIR sector chain: 2B,2C,2E,4C,58,5B
2B:2C000000
2C:2E000000
// SSAT: 2D,48
2D:48000000
2E:4C000000
// Short-stream container sector chain: 2F:33,3D:47,49,4D:4F,59:5A, in total 5+(48-3D)+1+3+2 = 0x16 sectors
// ShortStreamContainerSize: ceil(0x2BC0 / 0x200) = 0x16
2F:30000000 30:31000000 31:32000000 32:33000000 33:3D000000
// a stream named "Main" with 0x1095 bytes: 34:3C
34:35000000 35:36000000 36:37000000 37:38000000 38:39000000 39:3A000000 3A:3B000000 3B:3C000000 3C:FEFFFFFF
3D:3E000000 3E:3F000000 3F:40000000 40:41000000 41:42000000 42:43000000 43:44000000 44:45000000 45:46000000 46:47000000 47:49000000
48:FEFFFFFF
49:4D000000
4A:4B000000 4B:FEFFFFFF
4C:58000000
4D:4E000000 4E:4F000000 4F:59000000
50:51000000 51:52000000 52:53000000 53:54000000 54:55000000 55:56000000 56:57000000 57:4A000000
58:5B000000
59:5A000000 5A:FEFFFFFF
// 5B: last directory sector; it holds the entry named "\u0001CompObj"
// note: this sect has only 0x26 bytes (file is broken)
5B:FEFFFFFF
5C:FFFFFFFF 5D:FFFFFFFF ... 7F:FFFFFFFF
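The sector counts quoted next to the stream sizes in this example follow from a ceiling division of the stream size by the sector size. A tiny sketch (the helper name is made up) that reproduces them:

```python
def sectors_needed(num_bytes, sector_size=0x200):
    """Number of sectors needed to hold num_bytes (ceiling division)."""
    return (num_bytes + sector_size - 1) // sector_size


print(hex(sectors_needed(0x2BC0)))  # 0x16, short-stream container
print(hex(sectors_needed(0x1AA4)))  # 0xe, "1Table"
print(hex(sectors_needed(0x1000)))  # 0x8, the two summary streams
```

The same helper works for short streams when called with the short-sector size, for example `sectors_needed(0x226, 0x40)` for "PROJECT".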
short streams
short sector A2
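linear off = A2 * 40 = 2880
segmented off = 2880 / 200 = {14,80} (sector index 14 in the container chain, byte offset 80 inside it)
segmented addr = 80 + 200 + 59*200 = B480 (59 is the sector at index 14 of the container chain)
[B480] = "Main\0" L"Main\0" "bronco\0" L"bronco\0" "venus\0" L"venus\0" "Tower\0" L"Tower\0"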
short sectors A4:AC
linear off = 2880 + 80 = 2900
segmented off = {14,100}
segmented addr = 100 + 200 + 59*200 = B500
[B500] =
ID="{B306AC40-4018-4AF3-8665-D3ECA63C6A0D}"
Document=Main/&H00000000
Module=bronco
Package={AC9F2F90-E877-11CE-9F68-00AA00574A4F}
Module=venus
BaseClass=Tower
Name="Project"
HelpContextID="0"
VersionCompatible32="393222000"
CMG="EEEC3428CCF8BEFCBEFCBEFCBEFC"
DPB="DCDE06561A451B451B45"
GC="CAC8104CFF4DFF4D00"
[Host Extender Info]
&H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
[Workspace]
Main=44, 58, 775, 340,
bronco=66, 87, 797, 369,
venus=88, 116, 1016, 563,
Tower=0, 0, 0, 0, C, 22, 29, 1049, 465, Z
SSAT: sect 2D and sect 48
200 + 200*2D = 5C00 and 200 + 200*48 = 9200
// entry A2's addr = 9200+4*(A2-80) = 9288
9288:A3
928C:-2
// entry A4's addr = 9290
9290:A5
9294:A6
...
92AC:AC
92B0:-2
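The address of an SSAT entry can be computed the same way as in the notes above. The sketch below assumes 512-byte sectors, a 512-byte header and 4-byte entries; the function name is hypothetical, and the SSAT chain [2D, 48] is the one from this sample file.

```python
HEADER_SIZE = 0x200  # assumption: 512-byte header


def ssat_entry_offset(short_secid, ssat_chain, sector_size=0x200):
    """File offset of the SSAT entry that belongs to a short SecID."""
    ids_per_sector = sector_size // 4  # 0x80 entries per 512-byte sector
    index, slot = divmod(short_secid, ids_per_sector)
    return HEADER_SIZE + ssat_chain[index] * sector_size + 4 * slot


ssat_chain = [0x2D, 0x48]
print(hex(ssat_entry_offset(0xA2, ssat_chain)))  # 0x9288
print(hex(ssat_entry_offset(0xA4, ssat_chain)))  # 0x9290
print(hex(ssat_entry_offset(0xAC, ssat_chain)))  # 0x92b0
```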