From 39986a2ff606e9057a9bfe07ff8f6cc549916465 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 5 Jun 2018 09:15:30 -0400 Subject: [PATCH 01/23] multi-pack-index: add design document Signed-off-by: Derrick Stolee --- Documentation/technical/multi-pack-index.txt | 109 +++++++++++++++++++ 1 file changed, 109 insertions(+) create mode 100644 Documentation/technical/multi-pack-index.txt diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt new file mode 100644 index 00000000000000..d7e57639f70d70 --- /dev/null +++ b/Documentation/technical/multi-pack-index.txt @@ -0,0 +1,109 @@ +Multi-Pack-Index (MIDX) Design Notes +==================================== + +The Git object directory contains a 'pack' directory containing +packfiles (with suffix ".pack") and pack-indexes (with suffix +".idx"). The pack-indexes provide a way to lookup objects and +navigate to their offset within the pack, but these must come +in pairs with the packfiles. This pairing depends on the file +names, as the pack-index differs only in suffix with its pack- +file. While the pack-indexes provide fast lookup per packfile, +this performance degrades as the number of packfiles increases, +because abbreviations need to inspect every packfile and we are +more likely to have a miss on our most-recently-used packfile. +For some large repositories, repacking into a single packfile +is not feasible due to storage space or excessive repack times. + +The multi-pack-index (MIDX for short) stores a list of objects +and their offsets into multiple packfiles. It contains: + +- A list of packfile names. +- A sorted list of object IDs. +- A list of metadata for the ith object ID including: + - A value j referring to the jth packfile. + - An offset within the jth packfile for the object. +- If large offsets are required, we use another list of large + offsets similar to version 2 pack-indexes. + +Thus, we can provide O(log N) lookup time for any number +of packfiles. + +Design Details +-------------- + +- The MIDX is stored in a file named 'multi-pack-index' in the + .git/objects/pack directory. This could be stored in the pack + directory of an alternate. It refers only to packfiles in that + same directory. + +- The pack.multiIndex config setting must be on to consume MIDX files. + +- The file format includes parameters for the object ID hash + function, so a future change of hash algorithm does not require + a change in format. + +- The MIDX keeps only one record per object ID. If an object appears + in multiple packfiles, then the MIDX selects the copy in the most- + recently modified packfile. + +- If there exist packfiles in the pack directory not registered in + the MIDX, then those packfiles are loaded into the `packed_git` + list and `packed_git_mru` cache. + +- The pack-indexes (.idx files) remain in the pack directory so we + can delete the MIDX file, set core.midx to false, or downgrade + without any loss of information. + +- The MIDX file format uses a chunk-based approach (similar to the + commit-graph file) that allows optional data to be added. + +Future Work +----------- + +- Add a 'verify' subcommand to the 'git midx' builtin to verify the + contents of the multi-pack-index file match the offsets listed in + the corresponding pack-indexes. + +- The multi-pack-index allows many packfiles, especially in a context + where repacking is expensive (such as a very large repo), or + unexpected maintenance time is unacceptable (such as a high-demand + build machine). However, the multi-pack-index needs to be rewritten + in full every time. We can extend the format to be incremental, so + writes are fast. By storing a small "tip" multi-pack-index that + points to large "base" MIDX files, we can keep writes fast while + still reducing the number of binary searches required for object + lookups. + +- The reachability bitmap is currently paired directly with a single + packfile, using the pack-order as the object order to hopefully + compress the bitmaps well using run-length encoding. This could be + extended to pair a reachability bitmap with a multi-pack-index. If + the multi-pack-index is extended to store a "stable object order" + (a function Order(hash) = integer that is constant for a given hash, + even as the multi-pack-index is updated) then a reachability bitmap + could point to a multi-pack-index and be updated independently. + +- Packfiles can be marked as "special" using empty files that share + the initial name but replace ".pack" with ".keep" or ".promisor". + We can add an optional chunk of data to the multi-pack-index that + records flags of information about the packfiles. This allows new + states, such as 'repacked' or 'redeltified', that can help with + pack maintenance in a multi-pack environment. It may also be + helpful to organize packfiles by object type (commit, tree, blob, + etc.) and use this metadata to help that maintenance. + +- The partial clone feature records special "promisor" packs that + may point to objects that are not stored locally, but available + on request to a server. The multi-pack-index does not currently + track these promisor packs. + +Related Links +------------- +[0] https://bugs.chromium.org/p/git/issues/detail?id=6 + Chromium work item for: Multi-Pack Index (MIDX) + +[1] https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/ + An earlier RFC for the multi-pack-index feature + +[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/ + Git Merge 2018 Contributor's summit notes (includes discussion of MIDX) From e3ece9e84bfb0af480fefcf1c4b770b6fb0924c7 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 29 May 2018 10:10:42 -0400 Subject: [PATCH 02/23] multi-pack-index: add format details The multi-pack-index feature generalizes the existing pack-index feature by indexing objects across multiple pack-files. Describe the basic file format, using a 12-byte header followed by a lookup table for a list of "chunks" which will be described later. The file ends with a footer containing a checksum using the hash algorithm. The header allows later versions to create breaking changes by advancing the version number. We can also change the hash algorithm using a different version value. We will add the individual chunk format information as we introduce the code that writes that information. Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 49 +++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 70a99fd1423894..e060e693f41a76 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -252,3 +252,52 @@ Pack file entry: <+ corresponding packfile. 20-byte SHA-1-checksum of all of the above. + +== multi-pack-index (MIDX) files have the following format: + +The multi-pack-index files refer to multiple pack-files and loose objects. + +In order to allow extensions that add extra data to the MIDX, we organize +the body into "chunks" and provide a lookup table at the beginning of the +body. The header includes certain length values, such as the number of packs, +the number of base MIDX files, hash lengths and types. + +All 4-byte numbers are in network order. + +HEADER: + + 4-byte signature: + The signature is: {'M', 'I', 'D', 'X'} + + 1-byte version number: + Git only writes or recognizes version 1. + + 1-byte Object Id Version + Git only writes or recognizes version 1 (SHA1). + + 1-byte number of "chunks" + + 1-byte number of base multi-pack-index files: + This value is currently always zero. + + 4-byte number of pack files + +CHUNK LOOKUP: + + (C + 1) * 12 bytes providing the chunk offsets: + First 4 bytes describe chunk id. Value 0 is a terminating label. + Other 8 bytes provide offset in current file for chunk to start. + (Chunks are provided in file-order, so you can infer the length + using the next chunk position if necessary.) + + The remaining data in the body is described one chunk at a time, and + these chunks may be given in any order. Chunks are required unless + otherwise specified. + +CHUNK DATA: + + (This section intentionally left incomplete.) + +TRAILER: + + 20-byte SHA1-checksum of the above contents. From fe4303660359e8418a21ae5052b316b7b207b23e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 16 Nov 2017 11:14:09 -0500 Subject: [PATCH 03/23] multi-pack-index: add builtin This new 'git multi-pack-index' builtin will be the plumbing access for writing, reading, and checking multi-pack-index files. The initial implementation is a no-op. Signed-off-by: Derrick Stolee --- .gitignore | 3 ++- Documentation/git-multi-pack-index.txt | 36 ++++++++++++++++++++++++++ Makefile | 1 + builtin.h | 1 + builtin/multi-pack-index.c | 34 ++++++++++++++++++++++++ command-list.txt | 1 + git.c | 1 + 7 files changed, 76 insertions(+), 1 deletion(-) create mode 100644 Documentation/git-multi-pack-index.txt create mode 100644 builtin/multi-pack-index.c diff --git a/.gitignore b/.gitignore index 388cc4beee54fa..25633bc51551a1 100644 --- a/.gitignore +++ b/.gitignore @@ -99,8 +99,9 @@ /git-mergetool--lib /git-mktag /git-mktree -/git-name-rev +/git-multi-pack-index /git-mv +/git-name-rev /git-notes /git-p4 /git-pack-redundant diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt new file mode 100644 index 00000000000000..ec9982cbfc8079 --- /dev/null +++ b/Documentation/git-multi-pack-index.txt @@ -0,0 +1,36 @@ +git-multi-pack-index(1) +====================== + +NAME +---- +git-multi-pack-index - Write and verify multi-pack-indexes + + +SYNOPSIS +-------- +[verse] +'git multi-pack-index' [--object-dir=] + +DESCRIPTION +----------- +Write or verify a multi-pack-index (MIDX) file. + +OPTIONS +------- + +--object-dir=:: + Use given directory for the location of Git objects. We check + `/packs/multi-pack-index` for the current MIDX file, and + `/packs` for the pack-files to index. + + +SEE ALSO +-------- +See link:technical/multi-pack-index.html[The Multi-Pack-Index Design +Document] and link:technical/pack-format.html[The Multi-Pack-Index +Format] for more information on the multi-pack-index feature. + + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Makefile b/Makefile index e4b503d259906d..54610875ece02a 100644 --- a/Makefile +++ b/Makefile @@ -1047,6 +1047,7 @@ BUILTIN_OBJS += builtin/merge-recursive.o BUILTIN_OBJS += builtin/merge-tree.o BUILTIN_OBJS += builtin/mktag.o BUILTIN_OBJS += builtin/mktree.o +BUILTIN_OBJS += builtin/multi-pack-index.o BUILTIN_OBJS += builtin/mv.o BUILTIN_OBJS += builtin/name-rev.o BUILTIN_OBJS += builtin/notes.o diff --git a/builtin.h b/builtin.h index 4e0f64723ed8dd..70997d7ace5ba6 100644 --- a/builtin.h +++ b/builtin.h @@ -191,6 +191,7 @@ extern int cmd_merge_recursive(int argc, const char **argv, const char *prefix); extern int cmd_merge_tree(int argc, const char **argv, const char *prefix); extern int cmd_mktag(int argc, const char **argv, const char *prefix); extern int cmd_mktree(int argc, const char **argv, const char *prefix); +extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix); extern int cmd_mv(int argc, const char **argv, const char *prefix); extern int cmd_name_rev(int argc, const char **argv, const char *prefix); extern int cmd_notes(int argc, const char **argv, const char *prefix); diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c new file mode 100644 index 00000000000000..3161ddae86d689 --- /dev/null +++ b/builtin/multi-pack-index.c @@ -0,0 +1,34 @@ +#include "builtin.h" +#include "cache.h" +#include "config.h" +#include "parse-options.h" + +static char const * const builtin_multi_pack_index_usage[] = { + N_("git multi-pack-index [--object-dir=]"), + NULL +}; + +static struct opts_multi_pack_index { + const char *object_dir; +} opts; + +int cmd_multi_pack_index(int argc, const char **argv, + const char *prefix) +{ + static struct option builtin_multi_pack_index_options[] = { + OPT_FILENAME(0, "object-dir", &opts.object_dir, + N_("object directory containing set of packfile and pack-index pairs")), + OPT_END(), + }; + + git_config(git_default_config, NULL); + + argc = parse_options(argc, argv, prefix, + builtin_multi_pack_index_options, + builtin_multi_pack_index_usage, 0); + + if (!opts.object_dir) + opts.object_dir = get_object_directory(); + + return 0; +} diff --git a/command-list.txt b/command-list.txt index e1c26c1bb7e618..61071f8fa2b94d 100644 --- a/command-list.txt +++ b/command-list.txt @@ -123,6 +123,7 @@ git-merge-index plumbingmanipulators git-merge-one-file purehelpers git-mergetool ancillarymanipulators complete git-merge-tree ancillaryinterrogators +git-multi-pack-index plumbingmanipulators git-mktag plumbingmanipulators git-mktree plumbingmanipulators git-mv mainporcelain worktree diff --git a/git.c b/git.c index c2f48d53dd4aab..a7509fa5f7198d 100644 --- a/git.c +++ b/git.c @@ -505,6 +505,7 @@ static struct cmd_struct commands[] = { { "merge-tree", cmd_merge_tree, RUN_SETUP | NO_PARSEOPT }, { "mktag", cmd_mktag, RUN_SETUP | NO_PARSEOPT }, { "mktree", cmd_mktree, RUN_SETUP }, + { "multi-pack-index", cmd_multi_pack_index, RUN_SETUP_GENTLY }, { "mv", cmd_mv, RUN_SETUP | NEED_WORK_TREE }, { "name-rev", cmd_name_rev, RUN_SETUP }, { "notes", cmd_notes, RUN_SETUP }, From fac992bbd842bf2d7ffb2752dcd7e48f7f8b365c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 29 May 2018 10:43:58 -0400 Subject: [PATCH 04/23] multi-pack-index: add 'write' verb In anticipation of writing multi-pack-indexes, add a skeleton 'git multi-pack-index write' subcommand and send the options to a write_midx_file() method. Also create a skeleton test script that tests the 'write' subcommand. Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 22 +++++++++++++++++++++- Makefile | 1 + builtin/multi-pack-index.c | 17 +++++++++++++++-- midx.c | 7 +++++++ midx.h | 6 ++++++ t/t5319-multi-pack-index.sh | 10 ++++++++++ 6 files changed, 60 insertions(+), 3 deletions(-) create mode 100644 midx.c create mode 100644 midx.h create mode 100755 t/t5319-multi-pack-index.sh diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index ec9982cbfc8079..a62af1caca0ceb 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS -------- [verse] -'git multi-pack-index' [--object-dir=] +'git multi-pack-index' [--object-dir=] DESCRIPTION ----------- @@ -23,6 +23,26 @@ OPTIONS `/packs/multi-pack-index` for the current MIDX file, and `/packs` for the pack-files to index. +write:: + When given as the verb, write a new MIDX file to + `/packs/multi-pack-index`. + + +EXAMPLES +-------- + +* Write a MIDX file for the packfiles in the current .git folder. ++ +----------------------------------------------- +$ git multi-pack-index write +----------------------------------------------- + +* Write a MIDX file for the packfiles in an alternate object store. ++ +----------------------------------------------- +$ git multi-pack-index --object-dir write +----------------------------------------------- + SEE ALSO -------- diff --git a/Makefile b/Makefile index 54610875ece02a..f5636c711dbb96 100644 --- a/Makefile +++ b/Makefile @@ -890,6 +890,7 @@ LIB_OBJS += merge.o LIB_OBJS += merge-blobs.o LIB_OBJS += merge-recursive.o LIB_OBJS += mergesort.o +LIB_OBJS += midx.o LIB_OBJS += name-hash.o LIB_OBJS += notes.o LIB_OBJS += notes-cache.o diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 3161ddae86d689..6a7aa00cf2e7b1 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -2,9 +2,10 @@ #include "cache.h" #include "config.h" #include "parse-options.h" +#include "midx.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=]"), + N_("git multi-pack-index [--object-dir=] write"), NULL }; @@ -30,5 +31,17 @@ int cmd_multi_pack_index(int argc, const char **argv, if (!opts.object_dir) opts.object_dir = get_object_directory(); - return 0; + if (argc == 0) + goto usage; + + if (!strcmp(argv[0], "write")) { + if (argc > 1) + goto usage; + + return write_midx_file(opts.object_dir); + } + +usage: + usage_with_options(builtin_multi_pack_index_usage, + builtin_multi_pack_index_options); } diff --git a/midx.c b/midx.c new file mode 100644 index 00000000000000..32468db1a28752 --- /dev/null +++ b/midx.c @@ -0,0 +1,7 @@ +#include "cache.h" +#include "midx.h" + +int write_midx_file(const char *object_dir) +{ + return 0; +} diff --git a/midx.h b/midx.h new file mode 100644 index 00000000000000..dbdbe9f8732239 --- /dev/null +++ b/midx.h @@ -0,0 +1,6 @@ +#ifndef __MIDX_H__ +#define __MIDX_H__ + +int write_midx_file(const char *object_dir); + +#endif diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh new file mode 100755 index 00000000000000..ec3ddbe79cc13b --- /dev/null +++ b/t/t5319-multi-pack-index.sh @@ -0,0 +1,10 @@ +#!/bin/sh + +test_description='multi-pack-indexes' +. ./test-lib.sh + +test_expect_success 'write midx with no packs' ' + git multi-pack-index --object-dir=. write +' + +test_done From 5c4d6f07daeb2bc6ed414de8b071cf9ce6b10d80 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 May 2018 08:36:23 -0400 Subject: [PATCH 05/23] midx: write header information to lockfile As we begin writing the multi-pack-index format to disk, start with the basics: the 12-byte header and the 20-byte checksum footer. Start with these basics so we can add the rest of the format in small increments. As we implement the format, we will use a technique to check that our computed offsets within the multi-pack-index file match what we are actually writing. Each method that writes to the hashfile will return the number of bytes written, and we will track that those values match our expectations. Currently, write_midx_header() returns 12, but is not checked. We will check the return value in a later commit. Signed-off-by: Derrick Stolee --- midx.c | 50 +++++++++++++++++++++++++++++++++++++ t/t5319-multi-pack-index.sh | 4 ++- 2 files changed, 53 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 32468db1a28752..f85f2d334d4762 100644 --- a/midx.c +++ b/midx.c @@ -1,7 +1,57 @@ #include "cache.h" +#include "csum-file.h" +#include "lockfile.h" #include "midx.h" +#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ +#define MIDX_VERSION 1 +#define MIDX_HASH_VERSION 1 +#define MIDX_HEADER_SIZE 12 + +static char *get_midx_filename(const char *object_dir) +{ + return xstrfmt("%s/pack/multi-pack-index", object_dir); +} + +static size_t write_midx_header(struct hashfile *f, + unsigned char num_chunks, + uint32_t num_packs) +{ + unsigned char byte_values[4]; + + hashwrite_be32(f, MIDX_SIGNATURE); + byte_values[0] = MIDX_VERSION; + byte_values[1] = MIDX_HASH_VERSION; + byte_values[2] = num_chunks; + byte_values[3] = 0; /* unused */ + hashwrite(f, byte_values, sizeof(byte_values)); + hashwrite_be32(f, num_packs); + + return MIDX_HEADER_SIZE; +} + int write_midx_file(const char *object_dir) { + unsigned char num_chunks = 0; + char *midx_name; + struct hashfile *f = NULL; + struct lock_file lk; + + midx_name = get_midx_filename(object_dir); + if (safe_create_leading_directories(midx_name)) { + UNLEAK(midx_name); + die_errno(_("unable to create leading directories of %s"), + midx_name); + } + + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); + f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); + FREE_AND_NULL(midx_name); + + write_midx_header(f, num_chunks, 0); + + finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); + commit_lock_file(&lk); + return 0; } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index ec3ddbe79cc13b..50e80f8f2cde8c 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -4,7 +4,9 @@ test_description='multi-pack-indexes' . ./test-lib.sh test_expect_success 'write midx with no packs' ' - git multi-pack-index --object-dir=. write + test_when_finished rm -f pack/multi-pack-index && + git multi-pack-index --object-dir=. write && + test_path_is_file pack/multi-pack-index ' test_done From 700022db6ba353847e56ead3d29714bf164b0280 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 21 Jun 2018 10:45:21 -0400 Subject: [PATCH 06/23] multi-pack-index: load into memory Create a new multi_pack_index struct for loading multi-pack-indexes into memory. Create a test-tool builtin for reading basic information about that multi-pack-index to verify the correct data is written. Signed-off-by: Derrick Stolee --- Makefile | 1 + midx.c | 79 +++++++++++++++++++++++++++++++++++++ midx.h | 18 +++++++++ object-store.h | 2 + t/helper/test-read-midx.c | 31 +++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5319-multi-pack-index.sh | 11 +++++- 8 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 t/helper/test-read-midx.c diff --git a/Makefile b/Makefile index f5636c711dbb96..0b801d1b16b96f 100644 --- a/Makefile +++ b/Makefile @@ -717,6 +717,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o TEST_BUILTINS_OBJS += test-path-utils.o TEST_BUILTINS_OBJS += test-prio-queue.o TEST_BUILTINS_OBJS += test-read-cache.o +TEST_BUILTINS_OBJS += test-read-midx.o TEST_BUILTINS_OBJS += test-ref-store.o TEST_BUILTINS_OBJS += test-regex.o TEST_BUILTINS_OBJS += test-revision-walking.o diff --git a/midx.c b/midx.c index f85f2d334d4762..c1ff5acf853911 100644 --- a/midx.c +++ b/midx.c @@ -1,18 +1,97 @@ #include "cache.h" #include "csum-file.h" #include "lockfile.h" +#include "object-store.h" #include "midx.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 +#define MIDX_BYTE_FILE_VERSION 4 +#define MIDX_BYTE_HASH_VERSION 5 +#define MIDX_BYTE_NUM_CHUNKS 6 +#define MIDX_BYTE_NUM_PACKS 8 #define MIDX_HASH_VERSION 1 #define MIDX_HEADER_SIZE 12 +#define MIDX_HASH_LEN 20 +#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) static char *get_midx_filename(const char *object_dir) { return xstrfmt("%s/pack/multi-pack-index", object_dir); } +struct multi_pack_index *load_multi_pack_index(const char *object_dir) +{ + struct multi_pack_index *m = NULL; + int fd; + struct stat st; + size_t midx_size; + void *midx_map = NULL; + uint32_t hash_version; + char *midx_name = get_midx_filename(object_dir); + + fd = git_open(midx_name); + + if (fd < 0) + goto cleanup_fail; + if (fstat(fd, &st)) { + error_errno(_("failed to read %s"), midx_name); + goto cleanup_fail; + } + + midx_size = xsize_t(st.st_size); + + if (midx_size < MIDX_MIN_SIZE) { + error(_("multi-pack-index file %s is too small"), midx_name); + goto cleanup_fail; + } + + FREE_AND_NULL(midx_name); + + midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0); + + FLEX_ALLOC_MEM(m, object_dir, object_dir, strlen(object_dir)); + m->fd = fd; + m->data = midx_map; + m->data_len = midx_size; + + m->signature = get_be32(m->data); + if (m->signature != MIDX_SIGNATURE) { + error(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"), + m->signature, MIDX_SIGNATURE); + goto cleanup_fail; + } + + m->version = m->data[MIDX_BYTE_FILE_VERSION]; + if (m->version != MIDX_VERSION) { + error(_("multi-pack-index version %d not recognized"), + m->version); + goto cleanup_fail; + } + + hash_version = m->data[MIDX_BYTE_HASH_VERSION]; + if (hash_version != MIDX_HASH_VERSION) { + error(_("hash version %u does not match"), hash_version); + goto cleanup_fail; + } + m->hash_len = MIDX_HASH_LEN; + + m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS]; + + m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); + + return m; + +cleanup_fail: + free(m); + free(midx_name); + if (midx_map) + munmap(midx_map, midx_size); + if (0 <= fd) + close(fd); + return NULL; +} + static size_t write_midx_header(struct hashfile *f, unsigned char num_chunks, uint32_t num_packs) diff --git a/midx.h b/midx.h index dbdbe9f8732239..0e05051bca8f26 100644 --- a/midx.h +++ b/midx.h @@ -1,6 +1,24 @@ #ifndef __MIDX_H__ #define __MIDX_H__ +struct multi_pack_index { + int fd; + + const unsigned char *data; + size_t data_len; + + uint32_t signature; + unsigned char version; + unsigned char hash_len; + unsigned char num_chunks; + uint32_t num_packs; + uint32_t num_objects; + + char object_dir[FLEX_ARRAY]; +}; + +struct multi_pack_index *load_multi_pack_index(const char *object_dir); + int write_midx_file(const char *object_dir); #endif diff --git a/object-store.h b/object-store.h index d683112fd7bbab..13a766aea8ece9 100644 --- a/object-store.h +++ b/object-store.h @@ -84,6 +84,8 @@ struct packed_git { char pack_name[FLEX_ARRAY]; /* more */ }; +struct multi_pack_index; + struct raw_object_store { /* * Path to the repository's object store. diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c new file mode 100644 index 00000000000000..988a48716986c6 --- /dev/null +++ b/t/helper/test-read-midx.c @@ -0,0 +1,31 @@ +#include "test-tool.h" +#include "cache.h" +#include "midx.h" +#include "repository.h" +#include "object-store.h" + +static int read_midx_file(const char *object_dir) +{ + struct multi_pack_index *m = load_multi_pack_index(object_dir); + + if (!m) + return 1; + + printf("header: %08x %d %d %d\n", + m->signature, + m->version, + m->num_chunks, + m->num_packs); + + printf("object-dir: %s\n", m->object_dir); + + return 0; +} + +int cmd__read_midx(int argc, const char **argv) +{ + if (argc != 2) + usage("read-midx "); + + return read_midx_file(argv[1]); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 805a45de9c877d..1c3ab36e6c3ef0 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -27,6 +27,7 @@ static struct test_cmd cmds[] = { { "path-utils", cmd__path_utils }, { "prio-queue", cmd__prio_queue }, { "read-cache", cmd__read_cache }, + { "read-midx", cmd__read_midx }, { "ref-store", cmd__ref_store }, { "regex", cmd__regex }, { "revision-walking", cmd__revision_walking }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 7116ddfb94398d..6af8c08a6655ad 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -21,6 +21,7 @@ int cmd__online_cpus(int argc, const char **argv); int cmd__path_utils(int argc, const char **argv); int cmd__prio_queue(int argc, const char **argv); int cmd__read_cache(int argc, const char **argv); +int cmd__read_midx(int argc, const char **argv); int cmd__ref_store(int argc, const char **argv); int cmd__regex(int argc, const char **argv); int cmd__revision_walking(int argc, const char **argv); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 50e80f8f2cde8c..506bd8abb8383d 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -3,10 +3,19 @@ test_description='multi-pack-indexes' . ./test-lib.sh +midx_read_expect () { + cat >expect <<-EOF + header: 4d494458 1 0 0 + object-dir: . + EOF + test-tool read-midx . >actual && + test_cmp expect actual +} + test_expect_success 'write midx with no packs' ' test_when_finished rm -f pack/multi-pack-index && git multi-pack-index --object-dir=. write && - test_path_is_file pack/multi-pack-index + midx_read_expect ' test_done From 3315b480e663364754ebce86990208eff55021ef Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 May 2018 15:05:00 -0400 Subject: [PATCH 07/23] t5319: expand test data As we build the multi-pack-index file format, we want to test the format on real repositories. Add tests that create repository data including multiple packfiles with both version 1 and version 2 formats. The current 'git multi-pack-index write' command will always write the same file with no "real" data. This will be expanded in future commits, along with the test expectations. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 84 +++++++++++++++++++++++++++++++++++++ 1 file changed, 84 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 506bd8abb8383d..1240127ec1a0c8 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -18,4 +18,88 @@ test_expect_success 'write midx with no packs' ' midx_read_expect ' +generate_objects () { + i=$1 + iii=$(printf '%03i' $i) + { + test-tool genrandom "bar" 200 && + test-tool genrandom "baz $iii" 50 + } >wide_delta_$iii && + { + test-tool genrandom "foo"$i 100 && + test-tool genrandom "foo"$(( $i + 1 )) 100 && + test-tool genrandom "foo"$(( $i + 2 )) 100 + } >deep_delta_$iii && + { + echo $iii && + test-tool genrandom "$iii" 8192 + } >file_$iii && + git update-index --add file_$iii deep_delta_$iii wide_delta_$iii +} + +commit_and_list_objects () { + { + echo 101 && + test-tool genrandom 100 8192; + } >file_101 && + git update-index --add file_101 && + tree=$(git write-tree) && + commit=$(git commit-tree $tree -p HEADobj-list && + git reset --hard $commit +} + +test_expect_success 'create objects' ' + test_commit initial && + for i in $(test_seq 1 5) + do + generate_objects $i + done && + commit_and_list_objects +' + +test_expect_success 'write midx with one v1 pack' ' + pack=$(git pack-objects --index-version=1 pack/test Date: Wed, 20 Jun 2018 11:41:33 -0400 Subject: [PATCH 08/23] packfile: generalize pack directory list In anticipation of sharing the pack directory listing with the multi-pack-index, generalize prepare_packed_git_one() into for_each_file_in_pack_dir(). Signed-off-by: Derrick Stolee --- packfile.c | 101 +++++++++++++++++++++++++++++++++-------------------- packfile.h | 6 ++++ 2 files changed, 69 insertions(+), 38 deletions(-) diff --git a/packfile.c b/packfile.c index 7cd45aa4b2a0e0..ee1ab9b804d54c 100644 --- a/packfile.c +++ b/packfile.c @@ -738,13 +738,14 @@ static void report_pack_garbage(struct string_list *list) report_helper(list, seen_bits, first, list->nr); } -static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +void for_each_file_in_pack_dir(const char *objdir, + each_file_in_pack_dir_fn fn, + void *data) { struct strbuf path = STRBUF_INIT; size_t dirnamelen; DIR *dir; struct dirent *de; - struct string_list garbage = STRING_LIST_INIT_DUP; strbuf_addstr(&path, objdir); strbuf_addstr(&path, "/pack"); @@ -759,53 +760,77 @@ static void prepare_packed_git_one(struct repository *r, char *objdir, int local strbuf_addch(&path, '/'); dirnamelen = path.len; while ((de = readdir(dir)) != NULL) { - struct packed_git *p; - size_t base_len; - if (is_dot_or_dotdot(de->d_name)) continue; strbuf_setlen(&path, dirnamelen); strbuf_addstr(&path, de->d_name); - base_len = path.len; - if (strip_suffix_mem(path.buf, &base_len, ".idx")) { - /* Don't reopen a pack we already have. */ - for (p = r->objects->packed_git; p; - p = p->next) { - size_t len; - if (strip_suffix(p->pack_name, ".pack", &len) && - len == base_len && - !memcmp(p->pack_name, path.buf, len)) - break; - } - if (p == NULL && - /* - * See if it really is a valid .idx file with - * corresponding .pack file that we can map. - */ - (p = add_packed_git(path.buf, path.len, local)) != NULL) - install_packed_git(r, p); - } - - if (!report_garbage) - continue; - - if (ends_with(de->d_name, ".idx") || - ends_with(de->d_name, ".pack") || - ends_with(de->d_name, ".bitmap") || - ends_with(de->d_name, ".keep") || - ends_with(de->d_name, ".promisor")) - string_list_append(&garbage, path.buf); - else - report_garbage(PACKDIR_FILE_GARBAGE, path.buf); + fn(path.buf, path.len, de->d_name, data); } + closedir(dir); - report_pack_garbage(&garbage); - string_list_clear(&garbage, 0); strbuf_release(&path); } +struct prepare_pack_data { + struct repository *r; + struct string_list *garbage; + int local; +}; + +static void prepare_pack(const char *full_name, size_t full_name_len, + const char *file_name, void *_data) +{ + struct prepare_pack_data *data = (struct prepare_pack_data *)_data; + struct packed_git *p; + size_t base_len = full_name_len; + + if (strip_suffix_mem(full_name, &base_len, ".idx")) { + /* Don't reopen a pack we already have. */ + for (p = data->r->objects->packed_git; p; p = p->next) { + size_t len; + if (strip_suffix(p->pack_name, ".pack", &len) && + len == base_len && + !memcmp(p->pack_name, full_name, len)) + break; + } + + if (!p) { + p = add_packed_git(full_name, full_name_len, data->local); + if (p) + install_packed_git(data->r, p); + } + } + + if (!report_garbage) + return; + + if (ends_with(file_name, ".idx") || + ends_with(file_name, ".pack") || + ends_with(file_name, ".bitmap") || + ends_with(file_name, ".keep") || + ends_with(file_name, ".promisor")) + string_list_append(data->garbage, full_name); + else + report_garbage(PACKDIR_FILE_GARBAGE, full_name); +} + +static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +{ + struct prepare_pack_data data; + struct string_list garbage = STRING_LIST_INIT_DUP; + + data.r = r; + data.garbage = &garbage; + data.local = local; + + for_each_file_in_pack_dir(objdir, prepare_pack, &data); + + report_pack_garbage(data.garbage); + string_list_clear(data.garbage, 0); +} + static void prepare_packed_git(struct repository *r); /* * Give a fast, rough count of the number of objects in the repository. This diff --git a/packfile.h b/packfile.h index e0a38aba9321de..d2ad30300a2740 100644 --- a/packfile.h +++ b/packfile.h @@ -28,6 +28,12 @@ extern char *sha1_pack_index_name(const unsigned char *sha1); extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); +typedef void each_file_in_pack_dir_fn(const char *full_path, size_t full_path_len, + const char *file_pach, void *data); +void for_each_file_in_pack_dir(const char *objdir, + each_file_in_pack_dir_fn fn, + void *data); + /* A hook to report invalid files in pack directory */ #define PACKDIR_FILE_PACK 1 #define PACKDIR_FILE_IDX 2 From e5875f466e3a82730ac7ad9c319b9397cfa78821 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 May 2018 15:17:34 -0400 Subject: [PATCH 09/23] multi-pack-index: read packfile list When constructing a multi-pack-index file for a given object directory, read the files within the enclosed pack directory and find matches that end with ".idx" and find the correct paired packfile using add_packed_git(). Signed-off-by: Derrick Stolee --- midx.c | 48 ++++++++++++++++++++++++++++++++++++- t/t5319-multi-pack-index.sh | 15 ++++++------ 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/midx.c b/midx.c index c1ff5acf853911..f742d7ccd70783 100644 --- a/midx.c +++ b/midx.c @@ -1,6 +1,8 @@ #include "cache.h" #include "csum-file.h" +#include "dir.h" #include "lockfile.h" +#include "packfile.h" #include "object-store.h" #include "midx.h" @@ -109,12 +111,41 @@ static size_t write_midx_header(struct hashfile *f, return MIDX_HEADER_SIZE; } +struct pack_list { + struct packed_git **list; + uint32_t nr; + uint32_t alloc_list; +}; + +static void add_pack_to_midx(const char *full_path, size_t full_path_len, + const char *file_name, void *data) +{ + struct pack_list *packs = (struct pack_list *)data; + + if (ends_with(file_name, ".idx")) { + ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); + + packs->list[packs->nr] = add_packed_git(full_path, + full_path_len, + 0); + if (!packs->list[packs->nr]) { + warning(_("failed to add packfile '%s'"), + full_path); + return; + } + + packs->nr++; + } +} + int write_midx_file(const char *object_dir) { unsigned char num_chunks = 0; char *midx_name; + uint32_t i; struct hashfile *f = NULL; struct lock_file lk; + struct pack_list packs; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -123,14 +154,29 @@ int write_midx_file(const char *object_dir) midx_name); } + packs.nr = 0; + packs.alloc_list = 16; + packs.list = NULL; + ALLOC_ARRAY(packs.list, packs.alloc_list); + + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - write_midx_header(f, num_chunks, 0); + write_midx_header(f, num_chunks, packs.nr); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); commit_lock_file(&lk); + for (i = 0; i < packs.nr; i++) { + if (packs.list[i]) { + close_pack(packs.list[i]); + free(packs.list[i]); + } + } + + free(packs.list); return 0; } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 1240127ec1a0c8..54117a7f49563f 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -4,8 +4,9 @@ test_description='multi-pack-indexes' . ./test-lib.sh midx_read_expect () { + NUM_PACKS=$1 cat >expect <<-EOF - header: 4d494458 1 0 0 + header: 4d494458 1 0 $NUM_PACKS object-dir: . EOF test-tool read-midx . >actual && @@ -15,7 +16,7 @@ midx_read_expect () { test_expect_success 'write midx with no packs' ' test_when_finished rm -f pack/multi-pack-index && git multi-pack-index --object-dir=. write && - midx_read_expect + midx_read_expect 0 ' generate_objects () { @@ -65,13 +66,13 @@ test_expect_success 'write midx with one v1 pack' ' pack=$(git pack-objects --index-version=1 pack/test Date: Thu, 31 May 2018 16:22:03 -0400 Subject: [PATCH 10/23] multi-pack-index: write pack names in chunk The multi-pack-index needs to track which packfiles it indexes. Store these in our first required chunk. Since filenames are not well structured, add padding to keep good alignment in later chunks. Modify the 'git multi-pack-index read' subcommand to output the existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh to reflect this new output and the new expected number of chunks. Defense in depth: A pattern we are using in the multi-pack-index feature is to verify the data as we write it. We want to ensure we never write invalid data to the multi-pack-index. There are many checks that verify that the values we are writing fit the format definitions. This mainly helps developers while working on the feature, but it can also identify issues that only appear when dealing with very large data sets. These large sets are hard to encode into test cases. Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 6 + midx.c | 174 +++++++++++++++++++++++- midx.h | 2 + t/helper/test-read-midx.c | 7 + t/t5319-multi-pack-index.sh | 3 +- 5 files changed, 189 insertions(+), 3 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index e060e693f41a76..6c5a77475f4a65 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -296,6 +296,12 @@ CHUNK LOOKUP: CHUNK DATA: + Packfile Names (ID: {'P', 'N', 'A', 'M'}) + Stores the packfile names as concatenated, null-terminated strings. + Packfiles must be listed in lexicographic order for fast lookups by + name. This is the only chunk not guaranteed to be a multiple of four + bytes in length, so should be the last chunk for alignment reasons. + (This section intentionally left incomplete.) TRAILER: diff --git a/midx.c b/midx.c index f742d7ccd70783..ca7a32bf956596 100644 --- a/midx.c +++ b/midx.c @@ -17,6 +17,11 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) +#define MIDX_MAX_CHUNKS 1 +#define MIDX_CHUNK_ALIGNMENT 4 +#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) + static char *get_midx_filename(const char *object_dir) { return xstrfmt("%s/pack/multi-pack-index", object_dir); @@ -31,6 +36,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) void *midx_map = NULL; uint32_t hash_version; char *midx_name = get_midx_filename(object_dir); + uint32_t i; fd = git_open(midx_name); @@ -82,6 +88,33 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); + for (i = 0; i < m->num_chunks; i++) { + uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE + + MIDX_CHUNKLOOKUP_WIDTH * i); + uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 + + MIDX_CHUNKLOOKUP_WIDTH * i); + + switch (chunk_id) { + case MIDX_CHUNKID_PACKNAMES: + m->chunk_pack_names = m->data + chunk_offset; + break; + + case 0: + die(_("terminating multi-pack-index chunk id appears earlier than expected")); + break; + + default: + /* + * Do nothing on unrecognized chunks, allowing future + * extensions to add optional chunks. + */ + break; + } + } + + if (!m->chunk_pack_names) + die(_("multi-pack-index missing required pack-name chunk")); + return m; cleanup_fail: @@ -113,8 +146,11 @@ static size_t write_midx_header(struct hashfile *f, struct pack_list { struct packed_git **list; + char **names; uint32_t nr; uint32_t alloc_list; + uint32_t alloc_names; + size_t pack_name_concat_len; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -124,6 +160,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, if (ends_with(file_name, ".idx")) { ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); + ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names); packs->list[packs->nr] = add_packed_git(full_path, full_path_len, @@ -134,18 +171,89 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, return; } + packs->names[packs->nr] = xstrdup(file_name); + packs->pack_name_concat_len += strlen(file_name) + 1; packs->nr++; } } +struct pack_pair { + uint32_t pack_int_id; + char *pack_name; +}; + +static int pack_pair_compare(const void *_a, const void *_b) +{ + struct pack_pair *a = (struct pack_pair *)_a; + struct pack_pair *b = (struct pack_pair *)_b; + return strcmp(a->pack_name, b->pack_name); +} + +static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *perm) +{ + uint32_t i; + struct pack_pair *pairs; + + ALLOC_ARRAY(pairs, nr_packs); + + for (i = 0; i < nr_packs; i++) { + pairs[i].pack_int_id = i; + pairs[i].pack_name = pack_names[i]; + } + + QSORT(pairs, nr_packs, pack_pair_compare); + + for (i = 0; i < nr_packs; i++) { + pack_names[i] = pairs[i].pack_name; + perm[pairs[i].pack_int_id] = i; + } + + free(pairs); +} + +static size_t write_midx_pack_names(struct hashfile *f, + char **pack_names, + uint32_t num_packs) +{ + uint32_t i; + unsigned char padding[MIDX_CHUNK_ALIGNMENT]; + size_t written = 0; + + for (i = 0; i < num_packs; i++) { + size_t writelen = strlen(pack_names[i]) + 1; + + if (i && strcmp(pack_names[i], pack_names[i - 1]) <= 0) + BUG("incorrect pack-file order: %s before %s", + pack_names[i - 1], + pack_names[i]); + + hashwrite(f, pack_names[i], writelen); + written += writelen; + } + + /* add padding to be aligned */ + i = MIDX_CHUNK_ALIGNMENT - (written % MIDX_CHUNK_ALIGNMENT); + if (i < MIDX_CHUNK_ALIGNMENT) { + memset(padding, 0, sizeof(padding)); + hashwrite(f, padding, i); + written += i; + } + + return written; +} + int write_midx_file(const char *object_dir) { - unsigned char num_chunks = 0; + unsigned char cur_chunk, num_chunks = 0; char *midx_name; uint32_t i; struct hashfile *f = NULL; struct lock_file lk; struct pack_list packs; + uint32_t *pack_perm = NULL; + uint64_t written = 0; + uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; + uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -156,16 +264,76 @@ int write_midx_file(const char *object_dir) packs.nr = 0; packs.alloc_list = 16; + packs.alloc_names = 16; packs.list = NULL; + packs.pack_name_concat_len = 0; ALLOC_ARRAY(packs.list, packs.alloc_list); + ALLOC_ARRAY(packs.names, packs.alloc_names); for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); + if (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) + packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT - + (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT); + + ALLOC_ARRAY(pack_perm, packs.nr); + sort_packs_by_name(packs.names, packs.nr, pack_perm); + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - write_midx_header(f, num_chunks, packs.nr); + cur_chunk = 0; + num_chunks = 1; + + written = write_midx_header(f, num_chunks, packs.nr); + + chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES; + chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; + + cur_chunk++; + chunk_ids[cur_chunk] = 0; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + + for (i = 0; i <= num_chunks; i++) { + if (i && chunk_offsets[i] < chunk_offsets[i - 1]) + BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64, + chunk_offsets[i - 1], + chunk_offsets[i]); + + if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT) + BUG("chunk offset %"PRIu64" is not properly aligned", + chunk_offsets[i]); + + hashwrite_be32(f, chunk_ids[i]); + hashwrite_be32(f, chunk_offsets[i] >> 32); + hashwrite_be32(f, chunk_offsets[i]); + + written += MIDX_CHUNKLOOKUP_WIDTH; + } + + for (i = 0; i < num_chunks; i++) { + if (written != chunk_offsets[i]) + BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32, + chunk_offsets[i], + written, + chunk_ids[i]); + + switch (chunk_ids[i]) { + case MIDX_CHUNKID_PACKNAMES: + written += write_midx_pack_names(f, packs.names, packs.nr); + break; + + default: + BUG("trying to write unknown chunk id %"PRIx32, + chunk_ids[i]); + } + } + + if (written != chunk_offsets[num_chunks]) + BUG("incorrect final offset %"PRIu64" != %"PRIu64, + written, + chunk_offsets[num_chunks]); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); commit_lock_file(&lk); @@ -175,8 +343,10 @@ int write_midx_file(const char *object_dir) close_pack(packs.list[i]); free(packs.list[i]); } + free(packs.names[i]); } free(packs.list); + free(packs.names); return 0; } diff --git a/midx.h b/midx.h index 0e05051bca8f26..38af01fa3beac9 100644 --- a/midx.h +++ b/midx.h @@ -14,6 +14,8 @@ struct multi_pack_index { uint32_t num_packs; uint32_t num_objects; + const unsigned char *chunk_pack_names; + char object_dir[FLEX_ARRAY]; }; diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index 988a48716986c6..3f2d2cfa7807cd 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -17,6 +17,13 @@ static int read_midx_file(const char *object_dir) m->num_chunks, m->num_packs); + printf("chunks:"); + + if (m->chunk_pack_names) + printf(" pack-names"); + + printf("\n"); + printf("object-dir: %s\n", m->object_dir); return 0; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 54117a7f49563f..7512d55c927ed9 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -6,7 +6,8 @@ test_description='multi-pack-indexes' midx_read_expect () { NUM_PACKS=$1 cat >expect <<-EOF - header: 4d494458 1 0 $NUM_PACKS + header: 4d494458 1 1 $NUM_PACKS + chunks: pack-names object-dir: . EOF test-tool read-midx . >actual && From 8860947a09691debf70bc703e1d85ad4081b6afd Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 09:32:28 -0400 Subject: [PATCH 11/23] midx: read pack names into array Signed-off-by: Derrick Stolee --- midx.c | 17 +++++++++++++++++ midx.h | 1 + t/helper/test-read-midx.c | 5 +++++ t/t5319-multi-pack-index.sh | 17 ++++++++++++----- 4 files changed, 35 insertions(+), 5 deletions(-) diff --git a/midx.c b/midx.c index ca7a32bf956596..fcdf6553cea312 100644 --- a/midx.c +++ b/midx.c @@ -37,6 +37,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) uint32_t hash_version; char *midx_name = get_midx_filename(object_dir); uint32_t i; + const char *cur_pack_name; fd = git_open(midx_name); @@ -115,6 +116,22 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); + m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); + + cur_pack_name = (const char *)m->chunk_pack_names; + for (i = 0; i < m->num_packs; i++) { + m->pack_names[i] = cur_pack_name; + + cur_pack_name += strlen(cur_pack_name) + 1; + + if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) { + error(_("multi-pack-index pack names out of order: '%s' before '%s'"), + m->pack_names[i - 1], + m->pack_names[i]); + goto cleanup_fail; + } + } + return m; cleanup_fail: diff --git a/midx.h b/midx.h index 38af01fa3beac9..17b56172e3cc4c 100644 --- a/midx.h +++ b/midx.h @@ -16,6 +16,7 @@ struct multi_pack_index { const unsigned char *chunk_pack_names; + const char **pack_names; char object_dir[FLEX_ARRAY]; }; diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index 3f2d2cfa7807cd..76a60d78828307 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -6,6 +6,7 @@ static int read_midx_file(const char *object_dir) { + uint32_t i; struct multi_pack_index *m = load_multi_pack_index(object_dir); if (!m) @@ -24,6 +25,10 @@ static int read_midx_file(const char *object_dir) printf("\n"); + printf("packs:\n"); + for (i = 0; i < m->num_packs; i++) + printf("%s\n", m->pack_names[i]); + printf("object-dir: %s\n", m->object_dir); return 0; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 7512d55c927ed9..e8da082c64fa49 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -5,11 +5,18 @@ test_description='multi-pack-indexes' midx_read_expect () { NUM_PACKS=$1 - cat >expect <<-EOF - header: 4d494458 1 1 $NUM_PACKS - chunks: pack-names - object-dir: . - EOF + { + cat <<-EOF && + header: 4d494458 1 1 $NUM_PACKS + chunks: pack-names + packs: + EOF + if test $NUM_PACKS -ge 1 + then + ls pack/ | grep idx | sort + fi && + printf "object-dir: .\n" + } >expect && test-tool read-midx . >actual && test_cmp expect actual } From 824e913a00eae8ba5e45fb8f9a61d45a9bdeb27d Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 11:16:37 -0400 Subject: [PATCH 12/23] midx: sort and deduplicate objects from packfiles Before writing a list of objects and their offsets to a multi-pack-index, we need to collect the list of objects contained in the packfiles. There may be multiple copies of some objects, so this list must be deduplicated. It is possible to artificially get into a state where there are many duplicate copies of objects. That can create high memory pressure if we are to create a list of all objects before de-duplication. To reduce this memory pressure without a significant performance drop, automatically group objects by the first byte of their object id. Use the IDX fanout tables to group the data, copy to a local array, then sort. Copy only the de-duplicated entries. Select the duplicate based on the most-recent modified time of a packfile containing the object. Signed-off-by: Derrick Stolee --- midx.c | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++ packfile.c | 17 +++++++ packfile.h | 2 + 3 files changed, 147 insertions(+) diff --git a/midx.c b/midx.c index fcdf6553cea312..29f8de5ee6461e 100644 --- a/midx.c +++ b/midx.c @@ -4,6 +4,7 @@ #include "lockfile.h" #include "packfile.h" #include "object-store.h" +#include "packfile.h" #include "midx.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ @@ -182,12 +183,21 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, packs->list[packs->nr] = add_packed_git(full_path, full_path_len, 0); + if (!packs->list[packs->nr]) { warning(_("failed to add packfile '%s'"), full_path); return; } + if (open_pack_index(packs->list[packs->nr])) { + warning(_("failed to open pack-index '%s'"), + full_path); + close_pack(packs->list[packs->nr]); + FREE_AND_NULL(packs->list[packs->nr]); + return; + } + packs->names[packs->nr] = xstrdup(file_name); packs->pack_name_concat_len += strlen(file_name) + 1; packs->nr++; @@ -228,6 +238,119 @@ static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *p free(pairs); } +struct pack_midx_entry { + struct object_id oid; + uint32_t pack_int_id; + time_t pack_mtime; + uint64_t offset; +}; + +static int midx_oid_compare(const void *_a, const void *_b) +{ + const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a; + const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b; + int cmp = oidcmp(&a->oid, &b->oid); + + if (cmp) + return cmp; + + if (a->pack_mtime > b->pack_mtime) + return -1; + else if (a->pack_mtime < b->pack_mtime) + return 1; + + return a->pack_int_id - b->pack_int_id; +} + +static void fill_pack_entry(uint32_t pack_int_id, + struct packed_git *p, + uint32_t cur_object, + struct pack_midx_entry *entry) +{ + if (!nth_packed_object_oid(&entry->oid, p, cur_object)) + die(_("failed to locate object %d in packfile"), cur_object); + + entry->pack_int_id = pack_int_id; + entry->pack_mtime = p->mtime; + + entry->offset = nth_packed_object_offset(p, cur_object); +} + +/* + * It is possible to artificially get into a state where there are many + * duplicate copies of objects. That can create high memory pressure if + * we are to create a list of all objects before de-duplication. To reduce + * this memory pressure without a significant performance drop, automatically + * group objects by the first byte of their object id. Use the IDX fanout + * tables to group the data, copy to a local array, then sort. + * + * Copy only the de-duplicated entries (selected by most-recent modified time + * of a packfile containing the object). + */ +static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, + uint32_t *perm, + uint32_t nr_packs, + uint32_t *nr_objects) +{ + uint32_t cur_fanout, cur_pack, cur_object; + uint32_t alloc_fanout, alloc_objects, total_objects = 0; + struct pack_midx_entry *entries_by_fanout = NULL; + struct pack_midx_entry *deduplicated_entries = NULL; + + for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) + total_objects += p[cur_pack]->num_objects; + + /* + * As we de-duplicate by fanout value, we expect the fanout + * slices to be evenly distributed, with some noise. Hence, + * allocate slightly more than one 256th. + */ + alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 200 : 16; + + ALLOC_ARRAY(entries_by_fanout, alloc_fanout); + ALLOC_ARRAY(deduplicated_entries, alloc_objects); + *nr_objects = 0; + + for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { + uint32_t nr_fanout = 0; + + for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) { + uint32_t start = 0, end; + + if (cur_fanout) + start = get_pack_fanout(p[cur_pack], cur_fanout - 1); + end = get_pack_fanout(p[cur_pack], cur_fanout); + + for (cur_object = start; cur_object < end; cur_object++) { + ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + fill_pack_entry(perm[cur_pack], p[cur_pack], cur_object, &entries_by_fanout[nr_fanout]); + nr_fanout++; + } + } + + QSORT(entries_by_fanout, nr_fanout, midx_oid_compare); + + /* + * The batch is now sorted by OID and then mtime (descending). + * Take only the first duplicate. + */ + for (cur_object = 0; cur_object < nr_fanout; cur_object++) { + if (cur_object && !oidcmp(&entries_by_fanout[cur_object - 1].oid, + &entries_by_fanout[cur_object].oid)) + continue; + + ALLOC_GROW(deduplicated_entries, *nr_objects + 1, alloc_objects); + memcpy(&deduplicated_entries[*nr_objects], + &entries_by_fanout[cur_object], + sizeof(struct pack_midx_entry)); + (*nr_objects)++; + } + } + + free(entries_by_fanout); + return deduplicated_entries; +} + static size_t write_midx_pack_names(struct hashfile *f, char **pack_names, uint32_t num_packs) @@ -271,6 +394,8 @@ int write_midx_file(const char *object_dir) uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; + uint32_t nr_entries; + struct pack_midx_entry *entries = NULL; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -296,6 +421,8 @@ int write_midx_file(const char *object_dir) ALLOC_ARRAY(pack_perm, packs.nr); sort_packs_by_name(packs.names, packs.nr, pack_perm); + entries = get_sorted_entries(packs.list, pack_perm, packs.nr, &nr_entries); + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); @@ -365,5 +492,6 @@ int write_midx_file(const char *object_dir) free(packs.list); free(packs.names); + free(entries); return 0; } diff --git a/packfile.c b/packfile.c index ee1ab9b804d54c..3d652212c60633 100644 --- a/packfile.c +++ b/packfile.c @@ -196,6 +196,23 @@ int open_pack_index(struct packed_git *p) return ret; } +uint32_t get_pack_fanout(struct packed_git *p, uint32_t value) +{ + const uint32_t *level1_ofs = p->index_data; + + if (!level1_ofs) { + if (open_pack_index(p)) + return 0; + level1_ofs = p->index_data; + } + + if (p->index_version > 1) { + level1_ofs += 2; + } + + return ntohl(level1_ofs[value]); +} + static struct packed_git *alloc_packed_git(int extra) { struct packed_git *p = xmalloc(st_add(sizeof(*p), extra)); diff --git a/packfile.h b/packfile.h index d2ad30300a2740..b0eed44c0b1e35 100644 --- a/packfile.h +++ b/packfile.h @@ -69,6 +69,8 @@ extern int open_pack_index(struct packed_git *); */ extern void close_pack_index(struct packed_git *); +extern uint32_t get_pack_fanout(struct packed_git *p, uint32_t value); + extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *); extern void close_pack_windows(struct packed_git *); extern void close_pack(struct packed_git *); From 9b40260e5d12e48ce63c66edfd85ae0814321372 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 11:39:29 -0400 Subject: [PATCH 13/23] midx: write object ids in a chunk Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 4 +++ midx.c | 47 +++++++++++++++++++++++-- midx.h | 1 + t/helper/test-read-midx.c | 2 ++ t/t5319-multi-pack-index.sh | 4 +-- 5 files changed, 53 insertions(+), 5 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 6c5a77475f4a65..78ee0489c60cf3 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -302,6 +302,10 @@ CHUNK DATA: name. This is the only chunk not guaranteed to be a multiple of four bytes in length, so should be the last chunk for alignment reasons. + OID Lookup (ID: {'O', 'I', 'D', 'L'}) + The OIDs for all objects in the MIDX are stored in lexicographic + order in this chunk. + (This section intentionally left incomplete.) TRAILER: diff --git a/midx.c b/midx.c index 29f8de5ee6461e..3f113e1bebbf74 100644 --- a/midx.c +++ b/midx.c @@ -18,9 +18,10 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) -#define MIDX_MAX_CHUNKS 1 +#define MIDX_MAX_CHUNKS 2 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) static char *get_midx_filename(const char *object_dir) @@ -101,6 +102,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->chunk_pack_names = m->data + chunk_offset; break; + case MIDX_CHUNKID_OIDLOOKUP: + m->chunk_oid_lookup = m->data + chunk_offset; + break; + case 0: die(_("terminating multi-pack-index chunk id appears earlier than expected")); break; @@ -116,6 +121,8 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); + if (!m->chunk_oid_lookup) + die(_("multi-pack-index missing required OID lookup chunk")); m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); @@ -382,6 +389,32 @@ static size_t write_midx_pack_names(struct hashfile *f, return written; } +static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, + struct pack_midx_entry *objects, + uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + uint32_t i; + size_t written = 0; + + for (i = 0; i < nr_objects; i++) { + struct pack_midx_entry *obj = list++; + + if (i < nr_objects - 1) { + struct pack_midx_entry *next = list; + if (oidcmp(&obj->oid, &next->oid) >= 0) + BUG("OIDs not in order: %s >= %s", + oid_to_hex(&obj->oid), + oid_to_hex(&next->oid)); + } + + hashwrite(f, obj->oid.hash, (int)hash_len); + written += hash_len; + } + + return written; +} + int write_midx_file(const char *object_dir) { unsigned char cur_chunk, num_chunks = 0; @@ -428,7 +461,7 @@ int write_midx_file(const char *object_dir) FREE_AND_NULL(midx_name); cur_chunk = 0; - num_chunks = 1; + num_chunks = 2; written = write_midx_header(f, num_chunks, packs.nr); @@ -436,9 +469,13 @@ int write_midx_file(const char *object_dir) chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; - chunk_ids[cur_chunk] = 0; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + cur_chunk++; + chunk_ids[cur_chunk] = 0; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_HASH_LEN; + for (i = 0; i <= num_chunks; i++) { if (i && chunk_offsets[i] < chunk_offsets[i - 1]) BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64, @@ -468,6 +505,10 @@ int write_midx_file(const char *object_dir) written += write_midx_pack_names(f, packs.names, packs.nr); break; + case MIDX_CHUNKID_OIDLOOKUP: + written += write_midx_oid_lookup(f, MIDX_HASH_LEN, entries, nr_entries); + break; + default: BUG("trying to write unknown chunk id %"PRIx32, chunk_ids[i]); diff --git a/midx.h b/midx.h index 17b56172e3cc4c..4d3bceafc58058 100644 --- a/midx.h +++ b/midx.h @@ -15,6 +15,7 @@ struct multi_pack_index { uint32_t num_objects; const unsigned char *chunk_pack_names; + const unsigned char *chunk_oid_lookup; const char **pack_names; char object_dir[FLEX_ARRAY]; diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index 76a60d78828307..de6d452a7ced7d 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -22,6 +22,8 @@ static int read_midx_file(const char *object_dir) if (m->chunk_pack_names) printf(" pack-names"); + if (m->chunk_oid_lookup) + printf(" oid-lookup"); printf("\n"); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index e8da082c64fa49..4813610115783f 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -7,8 +7,8 @@ midx_read_expect () { NUM_PACKS=$1 { cat <<-EOF && - header: 4d494458 1 1 $NUM_PACKS - chunks: pack-names + header: 4d494458 1 2 $NUM_PACKS + chunks: pack-names oid-lookup packs: EOF if test $NUM_PACKS -ge 1 From 7c22d94a194e730a7be07ab5b6f4cbe195fadf9b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 12:34:12 -0400 Subject: [PATCH 14/23] midx: write object id fanout chunk Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 5 +++ midx.c | 53 +++++++++++++++++++++++-- midx.h | 1 + t/helper/test-read-midx.c | 4 +- t/t5319-multi-pack-index.sh | 16 ++++---- 5 files changed, 68 insertions(+), 11 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 78ee0489c60cf3..3215f7bfcdbda9 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -302,6 +302,11 @@ CHUNK DATA: name. This is the only chunk not guaranteed to be a multiple of four bytes in length, so should be the last chunk for alignment reasons. + OID Fanout (ID: {'O', 'I', 'D', 'F'}) + The ith entry, F[i], stores the number of OIDs with first + byte at most i. Thus F[255] stores the total + number of objects. + OID Lookup (ID: {'O', 'I', 'D', 'L'}) The OIDs for all objects in the MIDX are stored in lexicographic order in this chunk. diff --git a/midx.c b/midx.c index 3f113e1bebbf74..7a954eb0cd317a 100644 --- a/midx.c +++ b/midx.c @@ -18,11 +18,13 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) -#define MIDX_MAX_CHUNKS 2 +#define MIDX_MAX_CHUNKS 3 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) +#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256) static char *get_midx_filename(const char *object_dir) { @@ -102,6 +104,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->chunk_pack_names = m->data + chunk_offset; break; + case MIDX_CHUNKID_OIDFANOUT: + m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset); + break; + case MIDX_CHUNKID_OIDLOOKUP: m->chunk_oid_lookup = m->data + chunk_offset; break; @@ -121,9 +127,13 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); + if (!m->chunk_oid_fanout) + die(_("multi-pack-index missing required OID fanout chunk")); if (!m->chunk_oid_lookup) die(_("multi-pack-index missing required OID lookup chunk")); + m->num_objects = ntohl(m->chunk_oid_fanout[255]); + m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); cur_pack_name = (const char *)m->chunk_pack_names; @@ -389,6 +399,35 @@ static size_t write_midx_pack_names(struct hashfile *f, return written; } +static size_t write_midx_oid_fanout(struct hashfile *f, + struct pack_midx_entry *objects, + uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + struct pack_midx_entry *last = objects + nr_objects; + uint32_t count = 0; + uint32_t i; + + /* + * Write the first-level table (the list is sorted, + * but we use a 256-entry lookup to be able to avoid + * having to do eight extra binary search iterations). + */ + for (i = 0; i < 256; i++) { + struct pack_midx_entry *next = list; + + while (next < last && next->oid.hash[0] == i) { + count++; + next++; + } + + hashwrite_be32(f, count); + list = next; + } + + return MIDX_CHUNK_FANOUT_SIZE; +} + static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, struct pack_midx_entry *objects, uint32_t nr_objects) @@ -461,7 +500,7 @@ int write_midx_file(const char *object_dir) FREE_AND_NULL(midx_name); cur_chunk = 0; - num_chunks = 2; + num_chunks = 3; written = write_midx_header(f, num_chunks, packs.nr); @@ -469,9 +508,13 @@ int write_midx_file(const char *object_dir) chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + cur_chunk++; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE; + cur_chunk++; chunk_ids[cur_chunk] = 0; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_HASH_LEN; @@ -505,6 +548,10 @@ int write_midx_file(const char *object_dir) written += write_midx_pack_names(f, packs.names, packs.nr); break; + case MIDX_CHUNKID_OIDFANOUT: + written += write_midx_oid_fanout(f, entries, nr_entries); + break; + case MIDX_CHUNKID_OIDLOOKUP: written += write_midx_oid_lookup(f, MIDX_HASH_LEN, entries, nr_entries); break; diff --git a/midx.h b/midx.h index 4d3bceafc58058..8572cf0f4b3540 100644 --- a/midx.h +++ b/midx.h @@ -15,6 +15,7 @@ struct multi_pack_index { uint32_t num_objects; const unsigned char *chunk_pack_names; + const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const char **pack_names; diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index de6d452a7ced7d..f7c17b0940d342 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -22,10 +22,12 @@ static int read_midx_file(const char *object_dir) if (m->chunk_pack_names) printf(" pack-names"); + if (m->chunk_oid_fanout) + printf(" oid-fanout"); if (m->chunk_oid_lookup) printf(" oid-lookup"); - printf("\n"); + printf("\nnum_objects: %d\n", m->num_objects); printf("packs:\n"); for (i = 0; i < m->num_packs; i++) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 4813610115783f..1cf609958f221f 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -5,10 +5,12 @@ test_description='multi-pack-indexes' midx_read_expect () { NUM_PACKS=$1 + NUM_OBJECTS=$2 { cat <<-EOF && - header: 4d494458 1 2 $NUM_PACKS - chunks: pack-names oid-lookup + header: 4d494458 1 3 $NUM_PACKS + chunks: pack-names oid-fanout oid-lookup + num_objects: $NUM_OBJECTS packs: EOF if test $NUM_PACKS -ge 1 @@ -24,7 +26,7 @@ midx_read_expect () { test_expect_success 'write midx with no packs' ' test_when_finished rm -f pack/multi-pack-index && git multi-pack-index --object-dir=. write && - midx_read_expect 0 + midx_read_expect 0 0 ' generate_objects () { @@ -74,13 +76,13 @@ test_expect_success 'write midx with one v1 pack' ' pack=$(git pack-objects --index-version=1 pack/test Date: Fri, 1 Jun 2018 13:09:11 -0400 Subject: [PATCH 15/23] midx: write object offsets The final pair of chunks for the multi-pack-index file stores the object offsets. We default to using 32-bit offsets as in the pack-index version 1 format, but if there exists an offset larger than 32-bits, we use a trick similar to the pack-index version 2 format by storing all offsets at least 2^31 in a 64-bit table; we use the 32-bit table to point into that 64-bit table as necessary. We only store these 64-bit offsets if necessary, so create a test that manipulates a version 2 pack-index to fake a large offset. This allows us to test that the large offset table is created, but the data does not match the actual packfile offsets. The multi-pack-index offset does match the (corrupted) pack-index offset, so a future feature will compare these offsets during a 'verify' step. Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 15 +++- midx.c | 100 +++++++++++++++++++++++- midx.h | 2 + t/helper/test-read-midx.c | 4 + t/t5319-multi-pack-index.sh | 49 +++++++++--- 5 files changed, 155 insertions(+), 15 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 3215f7bfcdbda9..cab5bdd2ff0f88 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -311,7 +311,20 @@ CHUNK DATA: The OIDs for all objects in the MIDX are stored in lexicographic order in this chunk. - (This section intentionally left incomplete.) + Object Offsets (ID: {'O', 'O', 'F', 'F'}) + Stores two 4-byte values for every object. + 1: The pack-int-id for the pack storing this object. + 2: The offset within the pack. + If all offsets are less than 2^31, then the large offset chunk + will not exist and offsets are stored as in IDX v1. + If there is at least one offset value larger than 2^32-1, then + the large offset chunk must exist. If the large offset chunk + exists and the 31st bit is on, then removing that bit reveals + the row in the large offsets containing the 8-byte offset of + this object. + + [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'}) + 8-byte offsets into large packfiles. TRAILER: diff --git a/midx.c b/midx.c index 7a954eb0cd317a..e83110ae92b476 100644 --- a/midx.c +++ b/midx.c @@ -18,13 +18,18 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) -#define MIDX_MAX_CHUNKS 3 +#define MIDX_MAX_CHUNKS 5 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ +#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */ +#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */ #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256) +#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t)) +#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t)) +#define MIDX_LARGE_OFFSET_NEEDED 0x80000000 static char *get_midx_filename(const char *object_dir) { @@ -112,6 +117,14 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->chunk_oid_lookup = m->data + chunk_offset; break; + case MIDX_CHUNKID_OBJECTOFFSETS: + m->chunk_object_offsets = m->data + chunk_offset; + break; + + case MIDX_CHUNKID_LARGEOFFSETS: + m->chunk_large_offsets = m->data + chunk_offset; + break; + case 0: die(_("terminating multi-pack-index chunk id appears earlier than expected")); break; @@ -131,6 +144,8 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) die(_("multi-pack-index missing required OID fanout chunk")); if (!m->chunk_oid_lookup) die(_("multi-pack-index missing required OID lookup chunk")); + if (!m->chunk_object_offsets) + die(_("multi-pack-index missing required object offsets chunk")); m->num_objects = ntohl(m->chunk_oid_fanout[255]); @@ -454,6 +469,56 @@ static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, return written; } +static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed, + struct pack_midx_entry *objects, uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + uint32_t i, nr_large_offset = 0; + size_t written = 0; + + for (i = 0; i < nr_objects; i++) { + struct pack_midx_entry *obj = list++; + + hashwrite_be32(f, obj->pack_int_id); + + if (large_offset_needed && obj->offset >> 31) + hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++); + else if (!large_offset_needed && obj->offset >> 32) + BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!", + oid_to_hex(&obj->oid), + obj->offset); + else + hashwrite_be32(f, (uint32_t)obj->offset); + + written += MIDX_CHUNK_OFFSET_WIDTH; + } + + return written; +} + +static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset, + struct pack_midx_entry *objects, uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + size_t written = 0; + + while (nr_large_offset) { + struct pack_midx_entry *obj = list++; + uint64_t offset = obj->offset; + + if (!(offset >> 31)) + continue; + + hashwrite_be32(f, offset >> 32); + hashwrite_be32(f, offset & 0xffffffffUL); + written += 2 * sizeof(uint32_t); + + nr_large_offset--; + } + + return written; +} + int write_midx_file(const char *object_dir) { unsigned char cur_chunk, num_chunks = 0; @@ -466,8 +531,9 @@ int write_midx_file(const char *object_dir) uint64_t written = 0; uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1]; uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1]; - uint32_t nr_entries; + uint32_t nr_entries, num_large_offsets = 0; struct pack_midx_entry *entries = NULL; + int large_offsets_needed = 0; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -494,13 +560,19 @@ int write_midx_file(const char *object_dir) sort_packs_by_name(packs.names, packs.nr, pack_perm); entries = get_sorted_entries(packs.list, pack_perm, packs.nr, &nr_entries); + for (i = 0; i < nr_entries; i++) { + if (entries[i].offset > 0x7fffffff) + num_large_offsets++; + if (entries[i].offset > 0xffffffff) + large_offsets_needed = 1; + } hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); cur_chunk = 0; - num_chunks = 3; + num_chunks = large_offsets_needed ? 5 : 4; written = write_midx_header(f, num_chunks, packs.nr); @@ -516,9 +588,21 @@ int write_midx_file(const char *object_dir) chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE; cur_chunk++; - chunk_ids[cur_chunk] = 0; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_HASH_LEN; + cur_chunk++; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH; + if (large_offsets_needed) { + chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS; + + cur_chunk++; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + + num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH; + } + + chunk_ids[cur_chunk] = 0; + for (i = 0; i <= num_chunks; i++) { if (i && chunk_offsets[i] < chunk_offsets[i - 1]) BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64, @@ -556,6 +640,14 @@ int write_midx_file(const char *object_dir) written += write_midx_oid_lookup(f, MIDX_HASH_LEN, entries, nr_entries); break; + case MIDX_CHUNKID_OBJECTOFFSETS: + written += write_midx_object_offsets(f, large_offsets_needed, entries, nr_entries); + break; + + case MIDX_CHUNKID_LARGEOFFSETS: + written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries); + break; + default: BUG("trying to write unknown chunk id %"PRIx32, chunk_ids[i]); diff --git a/midx.h b/midx.h index 8572cf0f4b3540..e15966272ff4a5 100644 --- a/midx.h +++ b/midx.h @@ -17,6 +17,8 @@ struct multi_pack_index { const unsigned char *chunk_pack_names; const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; + const unsigned char *chunk_object_offsets; + const unsigned char *chunk_large_offsets; const char **pack_names; char object_dir[FLEX_ARRAY]; diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index f7c17b0940d342..8e19972e8939d1 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -26,6 +26,10 @@ static int read_midx_file(const char *object_dir) printf(" oid-fanout"); if (m->chunk_oid_lookup) printf(" oid-lookup"); + if (m->chunk_object_offsets) + printf(" object-offsets"); + if (m->chunk_large_offsets) + printf(" large-offsets"); printf("\nnum_objects: %d\n", m->num_objects); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 1cf609958f221f..4a4fa26f7a5225 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -6,27 +6,30 @@ test_description='multi-pack-indexes' midx_read_expect () { NUM_PACKS=$1 NUM_OBJECTS=$2 + NUM_CHUNKS=$3 + OBJECT_DIR=$4 + EXTRA_CHUNKS="$5" { cat <<-EOF && - header: 4d494458 1 3 $NUM_PACKS - chunks: pack-names oid-fanout oid-lookup + header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS + chunks: pack-names oid-fanout oid-lookup object-offsets$EXTRA_CHUNKS num_objects: $NUM_OBJECTS packs: EOF if test $NUM_PACKS -ge 1 then - ls pack/ | grep idx | sort + ls $OBJECT_DIR/pack/ | grep idx | sort fi && - printf "object-dir: .\n" + printf "object-dir: $OBJECT_DIR\n" } >expect && - test-tool read-midx . >actual && + test-tool read-midx $OBJECT_DIR >actual && test_cmp expect actual } test_expect_success 'write midx with no packs' ' test_when_finished rm -f pack/multi-pack-index && git multi-pack-index --object-dir=. write && - midx_read_expect 0 0 + midx_read_expect 0 0 4 . ' generate_objects () { @@ -76,13 +79,13 @@ test_expect_success 'write midx with one v1 pack' ' pack=$(git pack-objects --index-version=1 pack/test [] +corrupt_data () { + file=$1 + pos=$2 + data="${3:-\0}" + printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc +} + +# Force 64-bit offsets by manipulating the idx file. +# This makes the IDX file _incorrect_ so be careful to clean up after! +test_expect_success 'force some 64-bit offsets with pack-objects' ' + mkdir objects64 && + mkdir objects64/pack && + for i in $(test_seq 1 11) + do + generate_objects 11 + done && + commit_and_list_objects && + pack64=$(git pack-objects --index-version=2,0x40 objects64/pack/test-64 Date: Fri, 1 Jun 2018 15:21:13 -0400 Subject: [PATCH 16/23] config: create core.multiPackIndex setting The core.multiPackIndex config setting controls the multi-pack- index (MIDX) feature. If false, the setting will disable all reads from the multi-pack-index file. Read this config setting in the new prepare_multi_pack_index_one() which is called during prepare_packed_git(). This check is run once per repository. Add comparison commands in t5319-multi-pack-index.sh to check typical Git behavior remains the same as the config setting is turned on and off. This currently includes 'git rev-list' and 'git log' commands to trigger several object database reads. Currently, these would only catch an error in the prepare_multi_pack_index_one(), but with later commits will catch errors in object lookups, abbreviations, and approximate object counts. Signed-off-by: Derrick Stolee midx: prepare midxed_git struct Signed-off-by: Derrick Stolee --- Documentation/config.txt | 5 ++++ midx.c | 25 ++++++++++++++++++ midx.h | 5 ++++ object-store.h | 7 +++++ packfile.c | 6 ++++- t/t5319-multi-pack-index.sh | 51 +++++++++++++++++++++++++++---------- 6 files changed, 85 insertions(+), 14 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9984b..25f817ca427c45 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -908,6 +908,11 @@ core.commitGraph:: Enable git commit graph feature. Allows reading from the commit-graph file. +core.multiPackIndex:: + Use the multi-pack-index file to track multiple packfiles using a + single index. See link:technical/multi-pack-index.html[the + multi-pack-index design document]. + core.sparseCheckout:: Enable "sparse checkout" feature. See section "Sparse checkout" in linkgit:git-read-tree[1] for more information. diff --git a/midx.c b/midx.c index e83110ae92b476..4090cf4ca42fe5 100644 --- a/midx.c +++ b/midx.c @@ -1,4 +1,5 @@ #include "cache.h" +#include "config.h" #include "csum-file.h" #include "dir.h" #include "lockfile.h" @@ -177,6 +178,30 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) return NULL; } +int prepare_multi_pack_index_one(struct repository *r, const char *object_dir) +{ + struct multi_pack_index *m = r->objects->multi_pack_index; + struct multi_pack_index *m_search; + int config_value; + + if (repo_config_get_bool(r, "core.multipackindex", &config_value) || + !config_value) + return 0; + + for (m_search = m; m_search; m_search = m_search->next) + if (!strcmp(object_dir, m_search->object_dir)) + return 1; + + r->objects->multi_pack_index = load_multi_pack_index(object_dir); + + if (r->objects->multi_pack_index) { + r->objects->multi_pack_index->next = m; + return 1; + } + + return 0; +} + static size_t write_midx_header(struct hashfile *f, unsigned char num_chunks, uint32_t num_packs) diff --git a/midx.h b/midx.h index e15966272ff4a5..9bcfc82d2ec3b2 100644 --- a/midx.h +++ b/midx.h @@ -1,7 +1,11 @@ #ifndef __MIDX_H__ #define __MIDX_H__ +#include "repository.h" + struct multi_pack_index { + struct multi_pack_index *next; + int fd; const unsigned char *data; @@ -25,6 +29,7 @@ struct multi_pack_index { }; struct multi_pack_index *load_multi_pack_index(const char *object_dir); +int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); diff --git a/object-store.h b/object-store.h index 13a766aea8ece9..c2b162489a161f 100644 --- a/object-store.h +++ b/object-store.h @@ -105,6 +105,13 @@ struct raw_object_store { */ struct oidmap *replace_map; + /* + * private data + * + * should only be accessed directly by packfile.c and midx.c + */ + struct multi_pack_index *multi_pack_index; + /* * private data * diff --git a/packfile.c b/packfile.c index 3d652212c60633..5d4493dbf4f9a4 100644 --- a/packfile.c +++ b/packfile.c @@ -15,6 +15,7 @@ #include "tree-walk.h" #include "tree.h" #include "object-store.h" +#include "midx.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -935,10 +936,13 @@ static void prepare_packed_git(struct repository *r) if (r->objects->packed_git_initialized) return; + prepare_multi_pack_index_one(r, r->objects->objectdir); prepare_packed_git_one(r, r->objects->objectdir, 1); prepare_alt_odb(r); - for (alt = r->objects->alt_odb_list; alt; alt = alt->next) + for (alt = r->objects->alt_odb_list; alt; alt = alt->next) { + prepare_multi_pack_index_one(r, alt->path); prepare_packed_git_one(r, alt->path, 0); + } rearrange_packed_git(r); prepare_packed_git_mru(r); r->objects->packed_git_initialized = 1; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 4a4fa26f7a5225..601e28a2f06ff8 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -3,6 +3,8 @@ test_description='multi-pack-indexes' . ./test-lib.sh +objdir=.git/objects + midx_read_expect () { NUM_PACKS=$1 NUM_OBJECTS=$2 @@ -76,18 +78,35 @@ test_expect_success 'create objects' ' ' test_expect_success 'write midx with one v1 pack' ' - pack=$(git pack-objects --index-version=1 pack/test expect && + git -c core.multiPackIndex=true $1 >actual && + test_cmp expect actual +} + +compare_results_with_midx() { + MSG=$1 + test_expect_success "check normal git operations: $MSG" ' + midx_git_two_modes "rev-list --objects --all" && + midx_git_two_modes "log --raw" + ' +} + test_expect_success 'write midx with one v2 pack' ' - git pack-objects --index-version=2,0x40 pack/test [] corrupt_data () { file=$1 From d9af1051c7f8bf3c56e731962b4711777619d139 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 15:56:40 -0400 Subject: [PATCH 17/23] midx: read objects from multi-pack-index Signed-off-by: Derrick Stolee --- midx.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++- midx.h | 3 ++ packfile.c | 8 ++++- 3 files changed, 100 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index 4090cf4ca42fe5..182535933c33ed 100644 --- a/midx.c +++ b/midx.c @@ -5,7 +5,7 @@ #include "lockfile.h" #include "packfile.h" #include "object-store.h" -#include "packfile.h" +#include "sha1-lookup.h" #include "midx.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ @@ -151,6 +151,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->num_objects = ntohl(m->chunk_oid_fanout[255]); m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); + m->packs = xcalloc(m->num_packs, sizeof(*m->packs)); cur_pack_name = (const char *)m->chunk_pack_names; for (i = 0; i < m->num_packs; i++) { @@ -178,6 +179,94 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) return NULL; } +static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) +{ + struct strbuf pack_name = STRBUF_INIT; + + if (pack_int_id >= m->num_packs) + BUG("bad pack-int-id"); + + if (m->packs[pack_int_id]) + return 0; + + strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir, + m->pack_names[pack_int_id]); + + m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1); + strbuf_release(&pack_name); + return !m->packs[pack_int_id]; +} + +int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result) +{ + return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup, + MIDX_HASH_LEN, result); +} + +static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) +{ + const unsigned char *offset_data; + uint32_t offset32; + + offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH; + offset32 = get_be32(offset_data + sizeof(uint32_t)); + + if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) { + if (sizeof(offset32) < sizeof(uint64_t)) + die(_("multi-pack-index stores a 64-bit offset, but off_t is too small")); + + offset32 ^= MIDX_LARGE_OFFSET_NEEDED; + return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * offset32); + } + + return offset32; +} + +static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos) +{ + return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH); +} + +static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry *e, uint32_t pos) +{ + uint32_t pack_int_id; + struct packed_git *p; + + if (pos >= m->num_objects) + return 0; + + pack_int_id = nth_midxed_pack_int_id(m, pos); + + if (prepare_midx_pack(m, pack_int_id)) + die(_("error preparing packfile from multi-pack-index")); + p = m->packs[pack_int_id]; + + /* + * We are about to tell the caller where they can locate the + * requested object. We better make sure the packfile is + * still here and can be accessed before supplying that + * answer, as it may have been deleted since the MIDX was + * loaded! + */ + if (!is_pack_valid(p)) + return 0; + + e->offset = nth_midxed_offset(m, pos); + e->p = p; + + return 1; +} + +int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m) +{ + uint32_t pos; + + if (!bsearch_midx(oid, m, &pos)) + return 0; + + return nth_midxed_pack_entry(m, e, pos); +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir) { struct multi_pack_index *m = r->objects->multi_pack_index; diff --git a/midx.h b/midx.h index 9bcfc82d2ec3b2..377838c9ca2724 100644 --- a/midx.h +++ b/midx.h @@ -25,10 +25,13 @@ struct multi_pack_index { const unsigned char *chunk_large_offsets; const char **pack_names; + struct packed_git **packs; char object_dir[FLEX_ARRAY]; }; struct multi_pack_index *load_multi_pack_index(const char *object_dir); +int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); +int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); diff --git a/packfile.c b/packfile.c index 5d4493dbf4f9a4..bc763d91b9b64c 100644 --- a/packfile.c +++ b/packfile.c @@ -1902,11 +1902,17 @@ static int fill_pack_entry(const struct object_id *oid, int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) { struct list_head *pos; + struct multi_pack_index *m; prepare_packed_git(r); - if (!r->objects->packed_git) + if (!r->objects->packed_git && !r->objects->multi_pack_index) return 0; + for (m = r->objects->multi_pack_index; m; m = m->next) { + if (fill_midx_entry(oid, e, m)) + return 1; + } + list_for_each(pos, &r->objects->packed_git_mru) { struct packed_git *p = list_entry(pos, struct packed_git, mru); if (fill_pack_entry(oid, e, p)) { From 4404aedb6180322bab37f3d097c4be7d7a27e833 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 16:24:19 -0400 Subject: [PATCH 18/23] midx: use midx in abbreviation calculations Signed-off-by: Derrick Stolee --- midx.c | 11 +++++++++ midx.h | 3 +++ packfile.c | 6 +++++ packfile.h | 1 + sha1-name.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 91 insertions(+) diff --git a/midx.c b/midx.c index 182535933c33ed..4e014ff6e3b189 100644 --- a/midx.c +++ b/midx.c @@ -203,6 +203,17 @@ int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32 MIDX_HASH_LEN, result); } +struct object_id *nth_midxed_object_oid(struct object_id *oid, + struct multi_pack_index *m, + uint32_t n) +{ + if (n >= m->num_objects) + return NULL; + + hashcpy(oid->hash, m->chunk_oid_lookup + m->hash_len * n); + return oid; +} + static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) { const unsigned char *offset_data; diff --git a/midx.h b/midx.h index 377838c9ca2724..1b976df8735da6 100644 --- a/midx.h +++ b/midx.h @@ -31,6 +31,9 @@ struct multi_pack_index { struct multi_pack_index *load_multi_pack_index(const char *object_dir); int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); +struct object_id *nth_midxed_object_oid(struct object_id *oid, + struct multi_pack_index *m, + uint32_t n); int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); diff --git a/packfile.c b/packfile.c index bc763d91b9b64c..c0eb5ac885f5ff 100644 --- a/packfile.c +++ b/packfile.c @@ -961,6 +961,12 @@ struct packed_git *get_packed_git(struct repository *r) return r->objects->packed_git; } +struct multi_pack_index *get_multi_pack_index(struct repository *r) +{ + prepare_packed_git(r); + return r->objects->multi_pack_index; +} + struct list_head *get_packed_git_mru(struct repository *r) { prepare_packed_git(r); diff --git a/packfile.h b/packfile.h index b0eed44c0b1e35..046280caf372ff 100644 --- a/packfile.h +++ b/packfile.h @@ -45,6 +45,7 @@ extern void install_packed_git(struct repository *r, struct packed_git *pack); struct packed_git *get_packed_git(struct repository *r); struct list_head *get_packed_git_mru(struct repository *r); +struct multi_pack_index *get_multi_pack_index(struct repository *r); /* * Give a rough count of objects in the repository. This sacrifices accuracy diff --git a/sha1-name.c b/sha1-name.c index 60d9ef3c7e7108..7dc71201e68be9 100644 --- a/sha1-name.c +++ b/sha1-name.c @@ -12,6 +12,7 @@ #include "packfile.h" #include "object-store.h" #include "repository.h" +#include "midx.h" static int get_oid_oneline(const char *, struct object_id *, struct commit_list *); @@ -149,6 +150,32 @@ static int match_sha(unsigned len, const unsigned char *a, const unsigned char * return 1; } +static void unique_in_midx(struct multi_pack_index *m, + struct disambiguate_state *ds) +{ + uint32_t num, i, first = 0; + const struct object_id *current = NULL; + num = m->num_objects; + + if (!num) + return; + + bsearch_midx(&ds->bin_pfx, m, &first); + + /* + * At this point, "first" is the location of the lowest object + * with an object name that could match "bin_pfx". See if we have + * 0, 1 or more objects that actually match(es). + */ + for (i = first; i < num && !ds->ambiguous; i++) { + struct object_id oid; + current = nth_midxed_object_oid(&oid, m, i); + if (!match_sha(ds->len, ds->bin_pfx.hash, current->hash)) + break; + update_candidates(ds, current); + } +} + static void unique_in_pack(struct packed_git *p, struct disambiguate_state *ds) { @@ -177,8 +204,12 @@ static void unique_in_pack(struct packed_git *p, static void find_short_packed_object(struct disambiguate_state *ds) { + struct multi_pack_index *m; struct packed_git *p; + for (m = get_multi_pack_index(the_repository); m && !ds->ambiguous; + m = m->next) + unique_in_midx(m, ds); for (p = get_packed_git(the_repository); p && !ds->ambiguous; p = p->next) unique_in_pack(p, ds); @@ -527,6 +558,42 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data) return 0; } +static void find_abbrev_len_for_midx(struct multi_pack_index *m, + struct min_abbrev_data *mad) +{ + int match = 0; + uint32_t num, first = 0; + struct object_id oid; + const struct object_id *mad_oid; + + if (!m->num_objects) + return; + + num = m->num_objects; + mad_oid = mad->oid; + match = bsearch_midx(mad_oid, m, &first); + + /* + * first is now the position in the packfile where we would insert + * mad->hash if it does not exist (or the position of mad->hash if + * it does exist). Hence, we consider a maximum of two objects + * nearby for the abbreviation length. + */ + mad->init_len = 0; + if (!match) { + if (nth_midxed_object_oid(&oid, m, first)) + extend_abbrev_len(&oid, mad); + } else if (first < num - 1) { + if (nth_midxed_object_oid(&oid, m, first + 1)) + extend_abbrev_len(&oid, mad); + } + if (first > 0) { + if (nth_midxed_object_oid(&oid, m, first - 1)) + extend_abbrev_len(&oid, mad); + } + mad->init_len = mad->cur_len; +} + static void find_abbrev_len_for_pack(struct packed_git *p, struct min_abbrev_data *mad) { @@ -565,8 +632,11 @@ static void find_abbrev_len_for_pack(struct packed_git *p, static void find_abbrev_len_packed(struct min_abbrev_data *mad) { + struct multi_pack_index *m; struct packed_git *p; + for (m = get_multi_pack_index(the_repository); m; m = m->next) + find_abbrev_len_for_midx(m, mad); for (p = get_packed_git(the_repository); p; p = p->next) find_abbrev_len_for_pack(p, mad); } From 3d6d1158be79dd2979c36a4f8f7dc4fc9ec30645 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 5 Jun 2018 10:49:31 -0400 Subject: [PATCH 19/23] midx: use existing midx when writing new one Due to how Windows handles replacing a lockfile when there is an open handle, create the close_midx() method to close the existing midx before writing the new one. Signed-off-by: Derrick Stolee --- midx.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--- midx.h | 1 + 2 files changed, 111 insertions(+), 6 deletions(-) diff --git a/midx.c b/midx.c index 4e014ff6e3b189..bf2334acc6f2a2 100644 --- a/midx.c +++ b/midx.c @@ -179,6 +179,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) return NULL; } +static void close_midx(struct multi_pack_index *m) +{ + uint32_t i; + munmap((unsigned char *)m->data, m->data_len); + close(m->fd); + m->fd = -1; + + for (i = 0; i < m->num_packs; i++) { + if (m->packs[i]) { + close_pack(m->packs[i]); + free(m->packs); + } + } + FREE_AND_NULL(m->packs); + FREE_AND_NULL(m->pack_names); +} + static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; @@ -278,6 +295,29 @@ int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct mu return nth_midxed_pack_entry(m, e, pos); } +int midx_contains_pack(struct multi_pack_index *m, const char *idx_name) +{ + uint32_t first = 0, last = m->num_packs; + + while (first < last) { + uint32_t mid = first + (last - first) / 2; + const char *current; + int cmp; + + current = m->pack_names[mid]; + cmp = strcmp(idx_name, current); + if (!cmp) + return 1; + if (cmp > 0) { + first = mid + 1; + continue; + } + last = mid; + } + + return 0; +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir) { struct multi_pack_index *m = r->objects->multi_pack_index; @@ -326,6 +366,7 @@ struct pack_list { uint32_t alloc_list; uint32_t alloc_names; size_t pack_name_concat_len; + struct multi_pack_index *m; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -334,6 +375,9 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, struct pack_list *packs = (struct pack_list *)data; if (ends_with(file_name, ".idx")) { + if (packs->m && midx_contains_pack(packs->m, file_name)) + return; + ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names); @@ -419,6 +463,23 @@ static int midx_oid_compare(const void *_a, const void *_b) return a->pack_int_id - b->pack_int_id; } +static int nth_midxed_pack_midx_entry(struct multi_pack_index *m, + uint32_t *pack_perm, + struct pack_midx_entry *e, + uint32_t pos) +{ + if (pos >= m->num_objects) + return 1; + + nth_midxed_object_oid(&e->oid, m, pos); + e->pack_int_id = pack_perm[nth_midxed_pack_int_id(m, pos)]; + e->offset = nth_midxed_offset(m, pos); + + /* consider objects in midx to be from "old" packs */ + e->pack_mtime = 0; + return 0; +} + static void fill_pack_entry(uint32_t pack_int_id, struct packed_git *p, uint32_t cur_object, @@ -444,7 +505,8 @@ static void fill_pack_entry(uint32_t pack_int_id, * Copy only the de-duplicated entries (selected by most-recent modified time * of a packfile containing the object). */ -static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, +static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, + struct packed_git **p, uint32_t *perm, uint32_t nr_packs, uint32_t *nr_objects) @@ -453,8 +515,9 @@ static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, uint32_t alloc_fanout, alloc_objects, total_objects = 0; struct pack_midx_entry *entries_by_fanout = NULL; struct pack_midx_entry *deduplicated_entries = NULL; + uint32_t start_pack = m ? m->num_packs : 0; - for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) + for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) total_objects += p[cur_pack]->num_objects; /* @@ -471,7 +534,23 @@ static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { uint32_t nr_fanout = 0; - for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) { + if (m) { + uint32_t start = 0, end; + + if (cur_fanout) + start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); + end = ntohl(m->chunk_oid_fanout[cur_fanout]); + + for (cur_object = start; cur_object < end; cur_object++) { + ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + nth_midxed_pack_midx_entry(m, perm, + &entries_by_fanout[nr_fanout], + cur_object); + nr_fanout++; + } + } + + for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { uint32_t start = 0, end; if (cur_fanout) @@ -667,16 +746,34 @@ int write_midx_file(const char *object_dir) midx_name); } + packs.m = load_multi_pack_index(object_dir); + packs.nr = 0; - packs.alloc_list = 16; - packs.alloc_names = 16; + packs.alloc_list = packs.m ? packs.m->num_packs : 16; + packs.alloc_names = packs.alloc_list; packs.list = NULL; + packs.names = NULL; packs.pack_name_concat_len = 0; ALLOC_ARRAY(packs.list, packs.alloc_list); ALLOC_ARRAY(packs.names, packs.alloc_names); + if (packs.m) { + for (i = 0; i < packs.m->num_packs; i++) { + ALLOC_GROW(packs.list, packs.nr + 1, packs.alloc_list); + ALLOC_GROW(packs.names, packs.nr + 1, packs.alloc_names); + + packs.list[packs.nr] = NULL; + packs.names[packs.nr] = xstrdup(packs.m->pack_names[i]); + packs.pack_name_concat_len += strlen(packs.names[packs.nr]) + 1; + packs.nr++; + } + } + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); + if (packs.m && packs.nr == packs.m->num_packs) + goto cleanup; + if (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT - (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT); @@ -684,7 +781,8 @@ int write_midx_file(const char *object_dir) ALLOC_ARRAY(pack_perm, packs.nr); sort_packs_by_name(packs.names, packs.nr, pack_perm); - entries = get_sorted_entries(packs.list, pack_perm, packs.nr, &nr_entries); + entries = get_sorted_entries(packs.m, packs.list, pack_perm, packs.nr, &nr_entries); + for (i = 0; i < nr_entries; i++) { if (entries[i].offset > 0x7fffffff) num_large_offsets++; @@ -696,6 +794,9 @@ int write_midx_file(const char *object_dir) f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); + if (packs.m) + close_midx(packs.m); + cur_chunk = 0; num_chunks = large_offsets_needed ? 5 : 4; @@ -787,6 +888,7 @@ int write_midx_file(const char *object_dir) finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); commit_lock_file(&lk); +cleanup: for (i = 0; i < packs.nr; i++) { if (packs.list[i]) { close_pack(packs.list[i]); @@ -798,5 +900,7 @@ int write_midx_file(const char *object_dir) free(packs.list); free(packs.names); free(entries); + free(pack_perm); + free(midx_name); return 0; } diff --git a/midx.h b/midx.h index 1b976df8735da6..d4cde99473aa1d 100644 --- a/midx.h +++ b/midx.h @@ -35,6 +35,7 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid, struct multi_pack_index *m, uint32_t n); int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); +int midx_contains_pack(struct multi_pack_index *m, const char *idx_name); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); From 5169b1c53439ba067f13e1d24168c28aa7220206 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 7 Jun 2018 08:12:54 -0400 Subject: [PATCH 20/23] midx: use midx in approximate_object_count Signed-off-by: Derrick Stolee --- packfile.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/packfile.c b/packfile.c index c0eb5ac885f5ff..97e7812b6be638 100644 --- a/packfile.c +++ b/packfile.c @@ -861,10 +861,13 @@ unsigned long approximate_object_count(void) { if (!the_repository->objects->approximate_object_count_valid) { unsigned long count; + struct multi_pack_index *m; struct packed_git *p; prepare_packed_git(the_repository); count = 0; + for (m = get_multi_pack_index(the_repository); m; m = m->next) + count += m->num_objects; for (p = the_repository->objects->packed_git; p; p = p->next) { if (open_pack_index(p)) continue; From 7407381014d1ae2dbfcb87d29fa21b7045030018 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 1 Jun 2018 16:31:58 -0400 Subject: [PATCH 21/23] midx: prevent duplicate packfile loads The multi-pack-index, when present, tracks the existence of objects and their offsets within a list of packfiles. This allows us to use the multi-pack-index for object lookups, abbreviations, and object counts. When the multi-pack-index tracks a packfile, then we do not need to add that packfile to the packed_git linked list or the MRU list. We still need to load the packfiles that are not tracked by the multi-pack-index. Signed-off-by: Derrick Stolee --- packfile.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/packfile.c b/packfile.c index 97e7812b6be638..2c819a0ad8deb3 100644 --- a/packfile.c +++ b/packfile.c @@ -795,6 +795,7 @@ struct prepare_pack_data { struct repository *r; struct string_list *garbage; int local; + struct multi_pack_index *m; }; static void prepare_pack(const char *full_name, size_t full_name_len, @@ -805,6 +806,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len, size_t base_len = full_name_len; if (strip_suffix_mem(full_name, &base_len, ".idx")) { + if (data->m && midx_contains_pack(data->m, file_name)) + return; /* Don't reopen a pack we already have. */ for (p = data->r->objects->packed_git; p; p = p->next) { size_t len; @@ -839,6 +842,12 @@ static void prepare_packed_git_one(struct repository *r, char *objdir, int local struct prepare_pack_data data; struct string_list garbage = STRING_LIST_INIT_DUP; + data.m = r->objects->multi_pack_index; + + /* look for the multi-pack-index for this object directory */ + while (data.m && strcmp(data.m->object_dir, objdir)) + data.m = data.m->next; + data.r = r; data.garbage = &garbage; data.local = local; From 79c1ce26368f98173dc5d0cd0ce24114d31cf94e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 19 Jun 2018 13:08:18 -0400 Subject: [PATCH 22/23] packfile: skip loading index if in multi-pack-index Signed-off-by: Derrick Stolee --- packfile.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/packfile.c b/packfile.c index 2c819a0ad8deb3..e6ecf12ab581bf 100644 --- a/packfile.c +++ b/packfile.c @@ -469,8 +469,19 @@ static int open_packed_git_1(struct packed_git *p) ssize_t read_result; const unsigned hashsz = the_hash_algo->rawsz; - if (!p->index_data && open_pack_index(p)) - return error("packfile %s index unavailable", p->pack_name); + if (!p->index_data) { + struct multi_pack_index *m; + const char *pack_name = strrchr(p->pack_name, '/'); + + for (m = the_repository->objects->multi_pack_index; + m; m = m->next) { + if (midx_contains_pack(m, pack_name)) + break; + } + + if (!m && open_pack_index(p)) + return error("packfile %s index unavailable", p->pack_name); + } if (!pack_max_fds) { unsigned int max_fds = get_max_fd_limit(); @@ -521,6 +532,10 @@ static int open_packed_git_1(struct packed_git *p) " supported (try upgrading GIT to a newer version)", p->pack_name, ntohl(hdr.hdr_version)); + /* Skip index checking if in multi-pack-index */ + if (!p->index_data) + return 0; + /* Verify the pack matches its index. */ if (p->num_objects != ntohl(hdr.hdr_entries)) return error("packfile %s claims to have %"PRIu32" objects" From f3f936360e5717e9df135008702007e1e2929557 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jun 2018 14:17:15 -0400 Subject: [PATCH 23/23] midx: clear midx on repack If a 'git repack' command replaces existing packfiles, then we must clear the existing multi-pack-index before moving the packfiles it references. Signed-off-by: Derrick Stolee --- builtin/repack.c | 9 +++++++++ midx.c | 12 ++++++++++++ midx.h | 1 + t/t5319-multi-pack-index.sh | 9 +++++++++ 4 files changed, 31 insertions(+) diff --git a/builtin/repack.c b/builtin/repack.c index 6c636e159eaf2d..7f7cdc8b17dcd2 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -8,6 +8,7 @@ #include "strbuf.h" #include "string-list.h" #include "argv-array.h" +#include "midx.h" static int delta_base_offset = 1; static int pack_kept_objects = -1; @@ -174,6 +175,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) int no_update_server_info = 0; int quiet = 0; int local = 0; + int midx_cleared = 0; struct option builtin_repack_options[] = { OPT_BIT('a', NULL, &pack_everything, @@ -333,6 +335,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix) for_each_string_list_item(item, &names) { for (ext = 0; ext < ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; + + if (!midx_cleared) { + /* if we move a packfile, it will invalidated the midx */ + clear_midx_file(get_object_directory()); + midx_cleared = 1; + } + fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext].name); if (!file_exists(fname)) { diff --git a/midx.c b/midx.c index bf2334acc6f2a2..19b7df338ee76b 100644 --- a/midx.c +++ b/midx.c @@ -904,3 +904,15 @@ int write_midx_file(const char *object_dir) free(midx_name); return 0; } + +void clear_midx_file(const char *object_dir) +{ + char *midx = get_midx_filename(object_dir); + + if (remove_path(midx)) { + UNLEAK(midx); + die(_("failed to clear multi-pack-index at %s"), midx); + } + + free(midx); +} diff --git a/midx.h b/midx.h index d4cde99473aa1d..e3b07f15862a75 100644 --- a/midx.h +++ b/midx.h @@ -39,5 +39,6 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_name); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); +void clear_midx_file(const char *object_dir); #endif diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 601e28a2f06ff8..5ad6614465c127 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -141,6 +141,15 @@ test_expect_success 'write midx with twelve packs' ' compare_results_with_midx "twelve packs" +test_expect_success 'repack removes multi-pack-index' ' + test_path_is_file $objdir/pack/multi-pack-index && + git repack -adf && + test_path_is_missing $objdir/pack/multi-pack-index +' + +compare_results_with_midx "after repack" + + # usage: corrupt_data [] corrupt_data () { file=$1