ArXASE

ArXiv.org Abstract Similarity Engine

What is arXiv?

The arXiv (pronounced "archive") is an online repository of electronic preprints, known as e-prints, of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance.

What we will do with it

Part 1

In this project, we will fetch preprints through the arXiv API, collect the abstract of each article, clean the text (tokenize, lowercase, remove stopwords, lemmatize), then extract the most important words with TF-IDF and discard the rest. We will then build a directed network graph, using the papers and the words of each paper as nodes. To distinguish the two kinds of nodes (papers and words), we will assume that paper-nodes always have a non-zero out-degree and word-nodes always have an out-degree of zero.
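The TF-IDF selection and the directed paper-to-word graph described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the notebook's actual code: the function names `top_tfidf_words` and `build_directed_graph`, the cut-off `k`, the particular TF-IDF variant (relative term frequency times the log of inverse document frequency), and the toy token lists are all hypothetical. It also assumes the abstracts have already been cleaned and that paper IDs never collide with words.

```python
import math
from collections import Counter


def top_tfidf_words(docs, k=2):
    """Return the k highest-scoring TF-IDF words for each paper.

    docs maps a paper ID to its list of cleaned tokens (assumed input shape).
    """
    n_docs = len(docs)
    # Document frequency: in how many papers does each word appear?
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    top = {}
    for paper_id, tokens in docs.items():
        if not tokens:
            top[paper_id] = []
            continue
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        top[paper_id] = ranked[:k]
    return top


def build_directed_graph(top_words):
    """Adjacency dict with edges paper -> word; word nodes get no out-edges."""
    graph = {}
    for paper_id, words in top_words.items():
        graph.setdefault(paper_id, set()).update(words)
        for word in words:
            graph.setdefault(word, set())
    return graph


# Toy, made-up token lists standing in for cleaned abstracts.
docs = {
    "p1": ["quantum", "entanglement", "quantum", "model"],
    "p2": ["neural", "model", "network"],
    "p3": ["quantum", "network", "graph"],
}
top = top_tfidf_words(docs, k=2)
graph = build_directed_graph(top)
```

In this representation a node is a paper exactly when its out-edge set is non-empty, which mirrors the out-degree convention above. A paper whose abstract yields no words would break that convention, so such papers would need to be dropped beforehand.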

Part 2

This graph will then be the basis for a new undirected graph of papers, where every word-node turns into edges between the papers that share it. Since we can't have multiple edges between the same two nodes, we will instead set the weight of the edge: the more words two papers have in common, the higher the weight.
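As a rough illustration of this projection, the sketch below turns a paper-to-words mapping into weighted undirected edges, where the weight is the number of shared words. The function name `project_to_paper_graph` and the sample data are assumptions made for the example; the notebook may build the graph differently.

```python
from itertools import combinations


def project_to_paper_graph(paper_words):
    """Collapse word nodes into weighted paper-paper edges.

    paper_words maps a paper ID to the set of words it points to.
    Returns a dict mapping a sorted pair (a, b) to the shared-word count.
    """
    weights = {}
    for a, b in combinations(sorted(paper_words), 2):
        shared = paper_words[a] & paper_words[b]
        if shared:  # only connect papers that share at least one word
            weights[(a, b)] = len(shared)
    return weights


# Made-up example: p2 and p3 share two words, the other pairs share one.
paper_words = {
    "p1": {"quantum", "entanglement"},
    "p2": {"quantum", "network"},
    "p3": {"network", "quantum"},
}
edge_weights = project_to_paper_graph(paper_words)
```

Comparing every pair of papers is quadratic in the number of papers. For a larger collection, iterating over each word and linking only the papers attached to it would avoid most of the empty comparisons.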

Part 3

Furthermore, we will run community detection on the undirected graph, which is useful for finding similar papers from different, and potentially unexpected, fields of study.
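The text does not say which community detection algorithm is used, so the following sketch substitutes a simple weighted label propagation purely for illustration. The function name `label_propagation`, the `max_iter` limit, and the toy edge list are all assumptions. Each paper repeatedly adopts the label that carries the most edge weight among its neighbours, with ties broken by the smallest label so the result is deterministic.

```python
from collections import defaultdict


def label_propagation(edge_weights, max_iter=100):
    """Group nodes of a weighted undirected graph by label propagation.

    edge_weights maps a pair (a, b) to a positive weight.
    Returns a list of node sets, one per community.
    """
    # Build a symmetric adjacency map from the edge list.
    adj = defaultdict(dict)
    for (a, b), weight in edge_weights.items():
        adj[a][b] = weight
        adj[b][a] = weight

    labels = {node: node for node in adj}  # every node starts in its own group
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            # Sum the edge weight behind each label seen among the neighbours.
            totals = defaultdict(float)
            for neighbour, weight in adj[node].items():
                totals[labels[neighbour]] += weight
            # Heaviest label wins; ties go to the smallest label.
            best = min(totals, key=lambda label: (-totals[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


# Two tightly linked triangles joined by one weak edge (made-up weights).
edges = {
    ("a", "b"): 3, ("a", "c"): 3, ("b", "c"): 3,
    ("d", "e"): 3, ("d", "f"): 3, ("e", "f"): 3,
    ("c", "d"): 1,
}
communities = label_propagation(edges)
```

Papers with no edges never appear in `edge_weights`, so they are left out of the result here. Whether isolated papers should form their own single-member communities is a design choice this sketch does not settle.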

Part 4

We will create a word cloud per community and analyse the frequency of each field of study per community.
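The counting behind this step can be sketched as follows. The sketch only computes the frequencies that a word cloud and a field breakdown would be drawn from; it does not render anything. The function name `summarize_communities`, the input shapes, and the use of arXiv category codes as the "field of study" are assumptions made for the example.

```python
from collections import Counter


def summarize_communities(communities, paper_words, paper_field):
    """Count words and fields of study inside each community.

    communities: list of sets of paper IDs
    paper_words: dict paper ID -> iterable of that paper's key words
    paper_field: dict paper ID -> field label (assumed: one label per paper)
    """
    summaries = []
    for members in communities:
        word_counts = Counter(
            word for paper in members for word in paper_words[paper]
        )
        field_counts = Counter(paper_field[paper] for paper in members)
        summaries.append({"words": word_counts, "fields": field_counts})
    return summaries


# Made-up data: one community of two papers from different fields.
communities = [{"p1", "p2"}, {"p3"}]
paper_words = {
    "p1": ["quantum", "network"],
    "p2": ["network", "graph"],
    "p3": ["neural"],
}
paper_field = {"p1": "quant-ph", "p2": "cs.SI", "p3": "cs.LG"}
summaries = summarize_communities(communities, paper_words, paper_field)
```

The `words` counter for each community is a word-to-frequency mapping, which is the usual input for a word cloud renderer, and the `fields` counter shows how mixed each community is.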
