
[BUG]: GitLab RepoLoader Infinite While Loops #4624

@jonathanortega2023

Description


How are you running AnythingLLM?

Docker (local)

What happened?

When using the GitLab RepoLoader, there is the potential for at least one infinite loop:
```
...
Unexpected response format for /api/v4/projects/developer%2Fdata-source/repository/branches: {message: '404 Project Not Found' }
Unexpected response format for /api/v4/projects/developer%2Fdata-source/repository/branches: {message: '404 Project Not Found' }
Unexpected response format for /api/v4/projects/developer%2Fdata-source/repository/branches: {message: '404 Project Not Found' }
...
```

I noticed the issue in `getRepoBranches()`, but it may also affect `fetchFilesRecursive()` and `fetchIssues()` under certain conditions.

```javascript
  /**
   * Retrieves all branches for the repository.
   * @returns {Promise<string[]>} An array of branch names.
   */
  async getRepoBranches() {
    if (!this.#validGitlabUrl() || !this.projectId) return [];
    await this.#validateAccessToken();
    this.branches = [];

    const branchesRequestData = {
      endpoint: `/api/v4/projects/${this.projectId}/repository/branches`,
    };

    let branchesPage = [];
    while ((branchesPage = await this.fetchNextPage(branchesRequestData))) {
      this.branches.push(...branchesPage.map((branch) => branch.name));
    }
    return this.#branchPrefSort(this.branches);
  }
```

This `while` loop keeps running for as long as `fetchNextPage()` resolves to a truthy value:

```javascript
    while ((branchesPage = await this.fetchNextPage(branchesRequestData))) {
      this.branches.push(...branchesPage.map((branch) => branch.name));
    }
```
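To make the loop condition concrete, here is a small stand-alone sketch of the same pattern. `makeStubFetcher` and `collectBranchNames` are hypothetical names used only for illustration; the stub resolves to each page in turn and then to `null`, which is how `fetchNextPage()` signals "no more pages". Note that an empty page (`[]`) does not stop the loop.

```javascript
// Hypothetical stub (not part of AnythingLLM): resolves to each page in turn,
// then to null, mimicking fetchNextPage()'s "no more pages" signal.
function makeStubFetcher(pages) {
  let i = 0;
  return async () => (i < pages.length ? pages[i++] : null);
}

// Same loop shape as getRepoBranches(). The assignment expression evaluates
// to the resolved value, so the loop continues for ANY truthy value,
// including an empty array, and stops only on a falsy one (null, undefined).
async function collectBranchNames(fetchNextPage) {
  const names = [];
  let page;
  while ((page = await fetchNextPage())) {
    names.push(...page.map((branch) => branch.name));
  }
  return names;
}

// The empty middle page is skipped over, not treated as the end.
collectBranchNames(
  makeStubFetcher([[{ name: "main" }, { name: "dev" }], [], [{ name: "release" }]])
).then((names) => console.log(names)); // [ 'main', 'dev', 'release' ]
```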
Here is `fetchNextPage()` itself:

```javascript
  /**
   * Fetches the next page of data from the API.
   * @param {Object} requestData - The request data.
   * @returns {Promise<Array<Object>|null>} The next page of data, or null if no more pages.
   */
  async fetchNextPage(requestData) {
    try {
      if (requestData.page === -1) return null;
      if (!requestData.page) requestData.page = 1;

      const { endpoint, perPage = 100, queryParams = {} } = requestData;
      const params = new URLSearchParams({
        ...queryParams,
        per_page: perPage,
        page: requestData.page,
      });
      const url = `${this.apiBase}${endpoint}?${params.toString()}`;

      const response = await fetch(url, {
        method: "GET",
        headers: this.accessToken ? { "PRIVATE-TOKEN": this.accessToken } : {},
      });

      // Rate limits get hit very often if no PAT is provided
      if (response.status === 401) {
        console.warn(`Rate limit hit for ${endpoint}. Skipping.`);
        return null;
      }

      const totalPages = Number(response.headers.get("x-total-pages"));
      const data = await response.json();
      if (!Array.isArray(data)) {
        console.warn(`Unexpected response format for ${endpoint}:`, data);
        return [];
      }

      console.log(
        `Gitlab RepoLoader: fetched ${endpoint} page ${requestData.page}/${totalPages} with ${data.length} records.`
      );

      if (totalPages === requestData.page) {
        requestData.page = -1;
      } else {
        requestData.page = Number(response.headers.get("x-next-page"));
      }

      return data;
    } catch (e) {
      console.error(`RepoLoader.fetchNextPage`, e);
      return null;
    }
  }
```
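The page-advancing logic at the end of `fetchNextPage()` may be a second way to loop forever, under assumptions not confirmed in this issue: that GitLab sends an empty `x-next-page` header on the last page, and that `x-total-pages` can be missing. In that case `Number(null)` gives `0`, which never equals the current page, so the `else` branch runs and sets `requestData.page = Number("")`, i.e. `0`; the next call's `if (!requestData.page)` then resets it to `1` and paging starts over. A minimal, hypothetical helper (`nextPageFrom` is not an AnythingLLM function) that stops instead of restarting could look like this:

```javascript
// Hypothetical helper (not part of AnythingLLM): compute the next page number
// from GitLab offset-pagination headers, or -1 when paging should stop.
// `headers` is anything with a get(name) method, e.g. a fetch Headers object.
// Assumption: on the last page, `x-next-page` is empty or absent.
function nextPageFrom(headers) {
  // Number("") and Number(null) are 0, Number(undefined) is NaN; none of
  // these is a valid page, so each one ends pagination instead of restarting.
  const next = Number(headers.get("x-next-page"));
  return Number.isInteger(next) && next > 0 ? next : -1;
}

// Usage, with plain Maps standing in for response headers:
console.log(nextPageFrom(new Map([["x-next-page", "3"]]))); // 3
console.log(nextPageFrom(new Map([["x-next-page", ""]]))); // -1
console.log(nextPageFrom(new Map())); // -1
```

Inside `fetchNextPage()`, the `totalPages` comparison could then be replaced by `requestData.page = nextPageFrom(response.headers);`, since the existing `if (requestData.page === -1) return null;` guard already handles the stop value.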

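When the response body is not an array (for example, the `{ message: '404 Project Not Found' }` error object in the log above), the function returns an empty array `[]`, which is TRUTHY in JavaScript! Because `requestData.page` is never advanced on this path, every call requests the same page again, gets the same error, and returns `[]` again, so the `while` loop in `getRepoBranches()` never exits.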

```javascript
      if (!Array.isArray(data)) {
        console.warn(`Unexpected response format for ${endpoint}:`, data);
        return [];
      }
```
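The failure can be reproduced without a GitLab instance. In this sketch, `fakeFetchNextPage` is a hypothetical, synchronous stand-in that mimics only the non-array branch of `fetchNextPage()`: like the original, it never advances `requestData.page` on that branch. It returns whatever `onBadFormat` is, so the current behaviour (`[]`) can be compared with one possible fix (`null`). The iteration cap exists only so the demo terminates; the real loop has no such guard.

```javascript
// Hypothetical stand-in for fetchNextPage(): always "receives" the 404 error
// object, never advances requestData.page, and returns `onBadFormat`.
function fakeFetchNextPage(requestData, onBadFormat) {
  if (requestData.page === -1) return null;
  if (!requestData.page) requestData.page = 1;
  const data = { message: "404 Project Not Found" }; // what GitLab returned
  if (!Array.isArray(data)) return onBadFormat;
  return data;
}

// Same while-condition as getRepoBranches(), plus a hard cap so the demo ends.
// Returns how many times the loop body ran.
function countIterations(onBadFormat, cap = 1000) {
  const requestData = { endpoint: "/api/v4/projects/x/repository/branches" };
  let page;
  let iterations = 0;
  while ((page = fakeFetchNextPage(requestData, onBadFormat))) {
    iterations++;
    if (iterations >= cap) break; // only this artificial cap stops the [] case
  }
  return iterations;
}

console.log(countIterations([])); // 1000: [] is truthy, so only the cap stops it
console.log(countIterations(null)); // 0: null is falsy, so the loop exits at once
```

One design caveat for the `null` variant: it makes a 404 indistinguishable from "no more pages", so the caller would silently get an empty branch list. Throwing, or returning an explicit error the UI can display, may be preferable if the connector should tell the user that the repository could not be read.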

Note: This tends to happen immediately, because the modal starts fetching as soon as the link is entered into the text box. If you also need to add an access token, the request has already failed by the time you enter it. You can work around this by entering the access token first.

Are there known steps to reproduce?

1. Have a non-public GitLab repository.
2. Paste the link to that repository into the GitLab connector.

Metadata


Labels: possible bug (Bug was reported but is not confirmed or is unable to be replicated.)
