File transfer protocol
Each downloaded file has a chunk map, which represents the file as a collection of fixed-size chunks of data and records whether each chunk is being downloaded, available or not available. The chunk map is stored in each ftFileCreator and is responsible for handing out the slices of data to request from the various servers, depending on what each server has available. Consequently, the file transfer protocol is able to handle downloads from partial files with multiple sources. Clients and servers need to exchange this information in both directions: clients need to know what is available, and servers need to know what the client has already downloaded in order to display it.
The GUI shows chunk maps in different forms:
In the download tab:
- the progress bar of the file being downloaded shows what is already downloaded for this file
- the progress bar of each source shows what is available for this file at the server's side
In the upload tab:
- the progress bar shows what the uploading client already has.
For this, the xProgressBar class was updated and now takes a structure in its constructor to retrieve the compressed chunk map to display and the file completion.
In the "Selected Transfer" tab, the status of each chunk is displayed with a color code, red marking the chunks that are being requested from the server. A blue bar shows the combined chunk availability from the different sources.
In addition, from the drop down menu in the download tab, one can dynamically change the chunk allocation strategy. The strategy is either Streaming (chunks are allocated in order, as available from each source) or Random (chunks are allocated randomly, according to availability at the source). In the left image above, the download was started as "random" and later set to "streaming", hence the distribution of downloaded chunks.
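The two strategies can be illustrated with a minimal sketch. The names `Strategy` and `pickChunk` are hypothetical and do not come from the RetroShare sources; the sketch only assumes that a chunk is a candidate when it is still needed locally and available at the source.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical names for illustration; not the actual RetroShare API.
enum class Strategy { Streaming, Random };

// needed[i]    : chunk i is not yet downloaded and not currently being requested
// available[i] : the source has chunk i
// Returns the index of the chunk to allocate, or -1 if the source has nothing useful.
int pickChunk(const std::vector<bool>& needed,
              const std::vector<bool>& available,
              Strategy strategy,
              std::mt19937& rng)
{
    assert(needed.size() == available.size());

    // Collect the chunks that are both needed and available, in file order.
    std::vector<int> candidates;
    for (std::size_t i = 0; i < needed.size(); ++i)
        if (needed[i] && available[i])
            candidates.push_back(static_cast<int>(i));

    if (candidates.empty())
        return -1;

    if (strategy == Strategy::Streaming)
        return candidates.front();   // lowest index first

    // Random: uniform choice among the candidates.
    std::uniform_int_distribution<std::size_t> dist(0, candidates.size() - 1);
    return candidates[dist(rng)];
}
```

Switching from Random to Streaming in the middle of a download simply changes which candidate is picked next, which is consistent with a scattered set of downloaded chunks followed by an in-order run.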
The handling of chunkmaps
The files are split into 1MB chunks. The chunk size could be made proportional to the total file size, at the cost of some additional coding and information transfer. The last chunk is usually smaller than 1MB; this case is handled.
The ChunkMap class is responsible for handing out the slices of data to request from a given peer. It is a member of ftFileCreator.
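As a small worked example of the chunking arithmetic, the sketch below computes the number of chunks and the size of a given chunk. It assumes that 1MB means 1024*1024 bytes; the constant and function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 1MB = 1024 * 1024 bytes. The name is illustrative.
const uint64_t CHUNK_SIZE = 1024 * 1024;

// Number of chunks needed to cover the file (rounded up).
uint32_t numberOfChunks(uint64_t fileSize)
{
    return static_cast<uint32_t>((fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE);
}

// Size in bytes of chunk number `chunk`; only the last chunk can be smaller.
uint64_t sizeOfChunk(uint64_t fileSize, uint32_t chunk)
{
    assert(chunk < numberOfChunks(fileSize));
    const uint64_t start = static_cast<uint64_t>(chunk) * CHUNK_SIZE;
    const uint64_t remaining = fileSize - start;
    return remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;
}
```

For a file of two and a half chunks, `numberOfChunks` returns 3 and the last chunk is half the nominal size.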
ChunkMap maintains two levels of data:
- the chunk level (chunks with a fixed 1MB size), with a map of who has which chunk and the local state of each chunk: downloaded, ongoing or outstanding.
- the slice level: each active chunk is cut into slices (basically a list of intervals) that are being downloaded, plus a remaining slice from which new candidates are cut. When ftFileCreator notifies it that a slice is complete, ChunkMap removes the corresponding active slice. When asked for a slice, ChunkMap chops a slice off the remaining part of the chunk to download, returns the slice's coordinates and assigns it a unique slice id (such as the slice offset), which is later used to report that the slice has been downloaded.
In addition, the ftFileCreator handles sub-slices (ftChunk class) that represent the actual pieces of data transferred between peers at the level of RsItem. Depending on the network bandwidth on the client and server sides, the slices handed out by the ChunkMap class are indeed unlikely to be transmitted in one piece.
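The slice-level bookkeeping for a single active chunk can be sketched as follows. `ActiveChunk`, `Slice`, `getSlice` and `sliceDone` are hypothetical names chosen for this example, and using the slice offset as its id follows the suggestion in the text rather than a confirmed implementation detail.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch of the slice bookkeeping for ONE active chunk.
struct Slice { uint64_t offset; uint32_t size; };

class ActiveChunk {
public:
    ActiveChunk(uint64_t offset, uint32_t size) : mNext(offset), mRemaining(size)
    {
        assert(size > 0);
    }

    // Chops up to `wanted` bytes off the remaining part of the chunk.
    // The slice offset doubles as its unique id.
    // Returns false when nothing is left to hand out.
    bool getSlice(uint32_t wanted, Slice& out)
    {
        if (mRemaining == 0 || wanted == 0)
            return false;
        out.offset = mNext;
        out.size = wanted < mRemaining ? wanted : mRemaining;
        mActive[out.offset] = out.size;   // remember the slice as "being downloaded"
        mNext += out.size;
        mRemaining -= out.size;
        return true;
    }

    // Called when the slice identified by `id` has been fully received.
    // Returns false if no such active slice exists.
    bool sliceDone(uint64_t id) { return mActive.erase(id) == 1; }

    // Complete once everything was handed out and every slice was acknowledged.
    bool complete() const { return mRemaining == 0 && mActive.empty(); }

private:
    uint64_t mNext;                        // start of the remaining part
    uint32_t mRemaining;                   // bytes not yet handed out
    std::map<uint64_t, uint32_t> mActive;  // slice id (offset) -> size
};
```

Keeping the remaining part as a single interval makes chopping a new slice a constant-time operation, while the map of active slices allows out-of-order completion.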
To manage peer data availability, the chunk map of the file being handled stores a compressed chunk map of what each peer has. These maps are checked when deciding which new chunk to process for a given source peer. Each map also carries a time stamp, which may trigger the response "map is too old" to the ftFileCreator. This flag is set at most once every 60 seconds, so that chunk map transfers do not happen too often. The response is forwarded to ftFileControl, which initiates a file map request to the corresponding sources. Once the availability map is full, the flag is set 10 times less often, to limit unnecessary transfers.
ChunkMap can be used without accounting for chunk availability. This is useful for cache files, where the source is never a partial source (although this should be discussed for the Channel service). This behavior is decided when the ChunkMap is constructed. In that case, the "map is too old" flag is never returned set and, as a result, chunk maps are never transmitted from servers to clients.
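The rate limiting of the "map is too old" response might look like the sketch below. The structure and function names are hypothetical; only the 60-second period and the factor of 10 come from the text, and recording the time stamp when the flag is returned is an assumption of this example.

```cpp
#include <cassert>
#include <ctime>

// Values taken from the text; the names are illustrative.
const time_t MAP_UPDATE_PERIOD = 60;  // seconds between two "too old" signals
const time_t FULL_MAP_FACTOR   = 10;  // a full map is refreshed 10 times less often

struct PeerMapInfo {
    time_t lastSignal;  // when "map is too old" was last returned (0 = never)
    bool   isFull;      // the peer is known to have every chunk
};

// Returns true ("map is too old") at most once per period,
// and records the time of the signal so the next one is delayed.
bool mapIsTooOld(PeerMapInfo& info, time_t now)
{
    assert(now >= info.lastSignal);
    const time_t period = info.isFull ? MAP_UPDATE_PERIOD * FULL_MAP_FACTOR
                                      : MAP_UPDATE_PERIOD;
    if (now - info.lastSignal < period)
        return false;
    info.lastSignal = now;
    return true;
}
```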
Except for the chunk maps in ftFileCreator, where chunks have 3 possible states, chunk maps are represented by a bit array defined in rstypes.h as CompressedChunkMap. This significantly reduces copy, allocation and transmission costs when handling chunk maps. For a 700MB file, the compressed chunk map has a size of 700/32+1, which is 22 unsigned ints (uint32_t).
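To make the size computation concrete, here is a minimal bit array with one bit per chunk packed into 32-bit words. `BitChunkMap` is a hypothetical stand-in and not the actual CompressedChunkMap from rstypes.h; the word count uses the n/32+1 formula quoted above, with integer division.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a compressed chunk map: one bit per chunk.
struct BitChunkMap {
    std::vector<uint32_t> words;

    // Size formula from the text: nbChunks/32 + 1 words (integer division).
    explicit BitChunkMap(uint32_t nbChunks) : words(nbChunks / 32 + 1, 0) {}

    // Marks chunk i as available.
    void set(uint32_t i)
    {
        assert(i / 32 < words.size());
        words[i >> 5] |= (1u << (i & 31));
    }

    // Tells whether chunk i is marked as available.
    bool get(uint32_t i) const
    {
        assert(i / 32 < words.size());
        return ((words[i >> 5] >> (i & 31)) & 1u) != 0;
    }
};
```

With 700 chunks, 700/32 is 21 in integer arithmetic, so the map holds 22 words, that is 88 bytes.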
Using the compressed chunk map requires knowing the exact number of chunks in the file. This information is usually available in the current context, mostly from the ChunkMap class.
Chunk maps are transferred as needed, encoded into a CompressedChunkMap class. This is done by deriving new methods in the DataSend and DataRecv classes for sending both chunk map requests and compressed chunk maps. The various scenarios of chunk map transfers are:
- a client downloading a file will ask each of its sources for the chunk map (see above)
- a server uploading a file will ask each of its clients for the chunk map. In this case the request is initiated by the GUI when it displays the chunk map of the client. If the GUI is not showing it, the request does not happen, as it is not needed.
As a result, chunk requests and chunk maps both circulate in both directions along tunnels and peer connections.
File transfer Flags
A new flag can be used in rsFiles::FileRequest in addition to the existing ones:
- RS_FILE_HINTS_NETWORK_WIDE: will ask the turtle router to handle tunnels for that file.
Because the download speed of RS is based on the size of the data slices requested from the server, one cannot directly ask for plain 1MB chunks. Consequently, the ChunkMap class keeps a list of 1MB chunks from which to pick data requests, and returns slices of the requested size. It also handles the completion of chunks when the ftFileCreator reports received data. Because of the delay between requesting data and receiving it, ChunkMap handles two lists of 1MB chunks: the chunks being sliced and requested from the server, and the chunks being completed. This is why, for a single source, two red chunks may be displayed in the GUI.
The ftController class saves the compressed chunk availability map for each file being downloaded. As a result, restarting a file download, e.g. when RS restarts, keeps the chunks that were already marked as done, but forgets the ones that were being downloaded at the time of quitting. As chunks are small, this is more than sufficient.
The search for a hash has been extended to also look in the list of ftFileCreator objects in use, which means that files being downloaded are also considered as possible sources for a requesting client. Additional methods of ftFileCreator have thus been made virtual in ftFileProvider, so that it responds with consistent values when asked, for instance, for a chunk availability map: if the provider is not also a creator, it responds with a plain map.
In addition, ftFileProvider now stores the availability chunk maps of its clients, so that they can be displayed. It also handles a time stamp to warn that a given chunk map should be updated. For now the display of uploads is only based on the last request, but with this it becomes possible to display all clients to which a given file is being uploaded.
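The virtual dispatch described above can be sketched with two simplified classes. `Provider` and `Creator` are stand-ins whose interfaces do not match the real ftFileProvider and ftFileCreator, and the sketch assumes that a "plain map" means every chunk is reported as available.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins; the real classes have different interfaces.
class Provider {
public:
    explicit Provider(uint32_t nbChunks) : mNbChunks(nbChunks) {}
    virtual ~Provider() {}

    // A complete file: every chunk is reported as available
    // (assumed meaning of "plain map").
    virtual std::vector<bool> availabilityMap() const
    {
        return std::vector<bool>(mNbChunks, true);
    }

protected:
    uint32_t mNbChunks;
};

class Creator : public Provider {
public:
    explicit Creator(uint32_t nbChunks) : Provider(nbChunks), mDone(nbChunks, false) {}

    // Records that a chunk has been fully downloaded.
    void markDone(uint32_t chunk)
    {
        assert(chunk < mNbChunks);
        mDone[chunk] = true;
    }

    // A file still being downloaded: only completed chunks are available.
    std::vector<bool> availabilityMap() const override { return mDone; }

private:
    std::vector<bool> mDone;
};
```

Code that answers a chunk map request can then hold a `Provider` reference and get a consistent answer whether the file is complete or still being downloaded.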
The file transfer protocol in RS cannot produce errors during normal operation, as chunks are only marked as received once they are completely written to disk. However, errors might still occur in rare cases due to external factors. These include, for instance, faults in computer memory or on the hard disk, and software errors such as an exceeded disk quota.
After each download, the downloaded data is hashed, and the hash is compared to the announced sha1 hash of the file. If the two hashes match, the download is complete. If not, the file is tagged to be CRC checked. For this, RS first asks a random source of the file for a CRC32 map of all chunks of the file. Once it is received, the CRC32 of each chunk of downloaded data is computed and compared to the reference. If a chunk is available but does not match the reference, it is removed, and the download continues.
The double sha1+crc32 mechanism might look heavy, but it is actually a good trade-off between the load on peers and the security of transfers: most of the time the sha1 hash confirms that the file is fine. In the rare cases where the hash does not match, a single source of the file computes and sends a CRC32 map of this file to the client peer. This computation is not cached, but it is extremely fast and, thanks to disk caching, does not really need to be.
The sha1 file hashing is costly, so it is performed in a separate thread. This makes it possible to display the "Checking..." status in the transfer tab.
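The per-chunk CRC32 comparison can be sketched as follows. The function names are hypothetical, the CRC is the standard CRC-32 (reflected polynomial 0xEDB88320) written bit by bit for brevity, and the example assumes the whole file is held in memory, which a real implementation would avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise for brevity.
uint32_t crc32Of(const unsigned char* data, std::size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Returns the indices of the chunks whose CRC32 differs from the reference map.
// These are the chunks to remove and download again.
std::vector<uint32_t> findBadChunks(const std::vector<unsigned char>& file,
                                    std::size_t chunkSize,
                                    const std::vector<uint32_t>& referenceCrcs)
{
    assert(chunkSize > 0);
    std::vector<uint32_t> bad;
    for (std::size_t c = 0; c < referenceCrcs.size(); ++c) {
        const std::size_t start = c * chunkSize;
        if (start >= file.size())
            break;                                   // no local data for this chunk
        const std::size_t len = std::min(chunkSize, file.size() - start);
        if (crc32Of(&file[start], len) != referenceCrcs[c])
            bad.push_back(static_cast<uint32_t>(c));
    }
    return bad;
}
```

Only the chunks returned by `findBadChunks` need to be requested again, so a single corrupted chunk does not force a complete re-download.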
- Cache transfers to oneself are short-circuited into direct copy.
- Files smaller than 1 chunk are never downloaded from more than 1 source. This is probably not an issue.
Data flow (to be completed)
ftServer | +-> ftController (cancel, getChunksDetail, getDownloads,move,complete) (ticks over current downloads) | +-> ftDataMultiplex (get/send packets) +-> p3turtle +-> ftSearch | +-> map<hash,ftFileControl> mDownloads | | | +-> ftFileCreator (used by FileCancel and FileDetails) | | | +-> ftTransferModule (setFileSources, [pause,cancel]Transfer). Attached to 1 File (ticks over peers) | | | | +-> queue, requests +-> mMultiplexor (called to send data requests) +-> [save/load]List | | +-> getCacheFile() | +-> transmits/gets from turtle router and pqiPeers <------------------------------+ | | +-> mFileCreator (public ftFileProvider) | | | | | +-> getMissingChunk, addFileData <-------------------------------------+ | | +-> ChunkMap <-------------------------------------|---------|-+ | +-> getFileData() (should read in availability map) | | | | | | | +-> online peers | | | +-> mFileSources (map<peerId,peerInfo) | | | +-> storeData -> mfileCreator | | | +-> recv data -> adjust speed, call store, | | | +-> query inactive -> loop( tickPeerTransfer(peerInfo) ) -> getChunk (to ftFileCreator) | | | -> requestData (to Multiplexor) -+ | +-> Check Hash -> if ok, Complete file | | | | | else | | | | | +---> Check CRC32. Request CRC32map ----------------------------------------------+ | Setup new chunks ------------------------------------------------+