Wikipedia_Extractor

This repository helps you collect the content and links of all articles related to a particular keyword.

Usage:

The wiki_extractor.py script takes three parameters:

keyword:- The word or phrase for which related page links should be extracted.
num_urls:- The number of related pages to retrieve for the keyword.
output:- The name of the JSON file in which the results are stored.
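
For readers who want to see how these three parameters might be handled, here is a minimal sketch using Python's standard argparse module. It is not taken from wiki_extractor.py; the function name parse_args and the help texts are assumptions made for illustration, and only the three flag names come from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse the three documented flags (hypothetical helper, not the repo's code)."""
    parser = argparse.ArgumentParser(
        description="Collect Wikipedia pages related to a keyword."
    )
    parser.add_argument("--keyword", required=True,
                        help="word or phrase to find related pages for")
    parser.add_argument("--num_urls", type=int, required=True,
                        help="number of related pages to retrieve")
    parser.add_argument("--output", required=True,
                        help="name of the JSON file to write")
    return parser.parse_args(argv)


# Passing an explicit list mirrors the command-line form shown in this README.
args = parse_args([
    "--keyword=Indian Historical Events",
    "--num_urls=10",
    "--output=output.json",
])
print(args.keyword, args.num_urls, args.output)
```

Declaring num_urls with type=int means a non-numeric value is rejected before any request is made.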

C:\Users\username\Wikipedia_Extractor>python wiki_extractor.py --keyword="Indian Historical Events" --num_urls=10 --output="output.json"

Once the above command has run successfully, the output file is generated in the current directory in the following format:

[
    {
        "url":"URL for related page no: 1",
        "content":"Content of the related page no: 1"
    },
    {
        "url":"URL for related page no: 2",
        "content":"Content of the related page no: 2"
    }
    ......
    ,
    {
        "url":"URL for related page no: 10",
        "content":"Content of the related page no: 10"
    }
]
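
The output file can then be read back with the standard json module. The sketch below assumes the file has exactly the shape shown above (a list of objects with "url" and "content" keys); the helper name load_results is hypothetical and not part of this repository.

```python
import json


def load_results(path):
    """Load the extractor output and check each entry has the documented keys."""
    with open(path, encoding="utf-8") as handle:
        results = json.load(handle)
    for index, entry in enumerate(results):
        # Each entry is expected to carry the two keys shown in the README.
        missing = {"url", "content"} - set(entry)
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
    return results
```

For example, load_results("output.json") returns the list of pages, so len(load_results("output.json")) should match the num_urls value that was requested if the run returned that many pages.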

Future Scope:

Currently, only up to 10,000 related pages can be requested through the Wikipedia API. If anyone from the community knows how to extract more related pages, please open a PR; it would be really helpful.
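
As a starting point for contributors, the sketch below shows how paging through search results could look. It assumes the related pages come from the MediaWiki search endpoint (action=query with list=search), which this README does not confirm; the names iter_search_titles and fetch_json, the endpoint URL, and the User-Agent string are all illustrative assumptions. It does not lift the limit described above; it only shows where the continuation token is followed.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def fetch_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "Wikipedia_Extractor-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_search_titles(keyword, num_urls, fetch=fetch_json, batch_size=50):
    """Yield up to num_urls page titles, following the API's 'continue' token."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": str(min(batch_size, num_urls)),
        "format": "json",
    }
    yielded = 0
    while yielded < num_urls:
        data = fetch(params)
        for hit in data.get("query", {}).get("search", []):
            yield hit["title"]
            yielded += 1
            if yielded >= num_urls:
                return
        continuation = data.get("continue")
        if not continuation:
            return  # the API has no further results to offer
        # Merge the continuation values (such as sroffset) into the next request.
        params = {**params, **{k: str(v) for k, v in continuation.items()}}
```

Passing fetch as a parameter keeps the paging logic testable without network access, and it gives a single place to experiment with alternative ways of retrieving more pages.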

shyampatadia/Wikipedia_Extractor
