ConfluenceLoader page_ids and label parameters not working #28179

avfranco-br · 2024-11-18T12:02:13Z


Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

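import os

from langchain_community.document_loaders import ConfluenceLoader

# Atlassian API token; the environment variable name is illustrative.
api_key = os.environ["CONFLUENCE_API_KEY"]

# Attempt 1: load only the pages labelled "dbr"
loader = ConfluenceLoader(
    url="https://learnitall.atlassian.net/wiki", username="me", api_key=api_key,
    space_key="space", include_attachments=False, limit=50,
    label="dbr",
)

# OR attempt 2: load only specific pages by ID
loader = ConfluenceLoader(
    url="https://learnitall.atlassian.net/wiki", username="me", api_key=api_key,
    space_key="space", include_attachments=False, limit=50,
    page_ids=["page"],  # placeholder; Confluence page IDs are numeric strings
)

documents = loader.load()
print(len(documents))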

Description

I am trying to load only the pages with a specific label (e.g. "dbr"). Instead, the loader returns every other page in the space as well, and the pages that do carry the label appear twice. I tried page_ids instead and saw the same behaviour.
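One way to confirm the duplication is to count how often each page ID appears in the loaded documents. This is a minimal sketch: duplicated_page_ids is a hypothetical helper, and it assumes each Document stores its Confluence page ID under metadata["id"], which is what ConfluenceLoader appears to set for the pages it processes.

```python
from collections import Counter
from typing import Any, Iterable, List


def duplicated_page_ids(docs: Iterable[Any]) -> List[str]:
    """Return the page IDs that appear more than once, sorted.

    Assumes each document has a ``metadata`` dict holding the Confluence
    page ID under the ``"id"`` key; documents without an ID are ignored.
    """
    counts = Counter(doc.metadata.get("id") for doc in docs)
    return sorted(pid for pid, n in counts.items() if pid is not None and n > 1)
```

Calling print(duplicated_page_ids(documents)) after loader.load() should print a non-empty list when the labelled pages are being returned twice.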

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 24.1.0: Thu Oct 10 21:05:14 PDT 2024; root:xnu-11215.41.3~2/RELEASE_ARM64_T8103
Python Version: 3.11.3 (main, Jun 5 2024, 16:36:41) [Clang 15.0.0 (clang-1500.3.9.4)]

Package Information

langchain_core: 0.3.19
langchain: 0.3.7
langchain_community: 0.3.7
langsmith: 0.1.129
langchain_experimental: 0.3.2
langchain_huggingface: 0.1.0
langchain_openai: 0.2.1
langchain_text_splitters: 0.3.0
langchainhub: 0.1.20
langgraph: 0.2.32


feijoes · 2024-11-18T14:35:02Z


@avfranco-br Maybe it's a version issue? I didn't find any problems with the code you provided; perhaps the pages don't have the expected label? You could also subclass ConfluenceLoader as CustomConfluenceLoader and override its _lazy_load method with your own filter (see the ConfluenceLoader class source):

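from typing import Any, Iterator
from langchain_community.document_loaders import ConfluenceLoader
from langchain_core.documents import Document

class CustomConfluenceLoader(ConfluenceLoader):
    def _lazy_load(self, **kwargs: Any) -> Iterator[Document]:
        yield from super()._lazy_load(**kwargs)  # replace with your own filtering logic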
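One possible filter for the _lazy_load override suggested above: keep each page only once and, optionally, only the pages whose IDs were explicitly requested. This is a sketch, not part of ConfluenceLoader's API; filter_pages is a hypothetical helper, and it assumes the page ID is stored in metadata["id"].

```python
from typing import Any, Iterable, Iterator, Optional, Set


def filter_pages(
    docs: Iterable[Any], allowed_ids: Optional[Set[str]] = None
) -> Iterator[Any]:
    """Yield each page once, optionally keeping only the given page IDs.

    Assumes every document carries its Confluence page ID in metadata["id"].
    Documents without an ID are passed through unchanged.
    """
    seen: Set[str] = set()
    for doc in docs:
        page_id = doc.metadata.get("id")
        if allowed_ids is not None and page_id not in allowed_ids:
            continue  # page was not requested explicitly
        if page_id is not None:
            if page_id in seen:
                continue  # duplicate copy of a page already yielded
            seen.add(page_id)
        yield doc
```

Inside the override it could wrap the parent generator, e.g. yield from filter_pages(super()._lazy_load(**kwargs), allowed_ids={"123456"}), where "123456" stands in for a real page ID. This only hides the symptom: when space_key is set, the whole space is still fetched before filtering.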


avfranco-br · Nov 18, 2024 (Author)


Thanks for your reply, @feijoes! I've installed the latest versions of the langchain_community and atlassian packages. Is there any other package I should look at? The label is correct, I'm afraid. Have you used this integration before? I'm asking because the loader returns every page in the space, plus a duplicate entry for each page that matches page_ids or label. Do you know whether this is the expected behaviour?

avfranco-br · Nov 18, 2024 (Author)

Hi @feijoes, removing space_key from the initialisation fixed the issue. Now only the pages that match page_ids or label are returned. Thanks again for your reply.
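Why removing space_key helps: as far as I can tell from the langchain_community source, ConfluenceLoader treats space_key, label/page_ids and cql as independent sources, fetches each one separately and yields the results one after another. The parameters therefore act as a union rather than an intersection, so space_key plus label returns the whole space and then the labelled pages a second time. If you still need to restrict a label search to one space, one option is a single CQL query. The helper below is a hypothetical sketch, and the query assumes the standard CQL fields type, space and label.

```python
def build_cql(space_key: str, label: str) -> str:
    """Build a CQL query selecting pages in one space that carry one label.

    Hypothetical helper: values containing a double quote are rejected
    with ValueError instead of being escaped.
    """
    for value in (space_key, label):
        if '"' in value:
            raise ValueError(f"unsupported character in {value!r}")
    return f'type = page AND space = "{space_key}" AND label = "{label}"'
```

The result would then be passed as ConfluenceLoader(url=..., username="me", api_key=..., cql=build_cql("space", "dbr"), include_attachments=False, limit=50), leaving space_key, label and page_ids unset so that only the CQL search runs.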

Answer selected by avfranco-br
