Add financials etc
daoudclarke committed Dec 23, 2023
1 parent a2e7935 commit 4b581cf
Showing 1 changed file with 32 additions and 5 deletions.
37 changes: 32 additions & 5 deletions content/articles/indexing-a-billion.md
@@ -8,7 +8,7 @@
It's two years since we launched [Mwmbl](https://mwmbl.org), the open
source, non-profit search engine, on Boxing Day 2021. A good time to
take stock of where we are and where we're going.

# We've indexed over 100 million pages
## We've indexed over 100 million pages

Thanks to our volunteers, who crawl the web using the [Firefox
extension](https://addons.mozilla.org/en-GB/firefox/addon/mwmbl-web-crawler/)
@@ -21,7 +21,7 @@
Given that Mwmbl is still relatively unknown, it seems plausible that
we can reach our target of crawling three billion pages a day, to
refresh the entire index in one month.

## Indexing a billion pages in 2024
### Indexing a billion pages in 2024

Our index is currently still a titchy 40 gigabytes, so scaling this up
by a factor of 10 should be enough to reach this target, and is
@@ -31,12 +31,12 @@
building our own as [Marginalia Search](https://search.marginalia.nu/)
does. We think we will eventually need to index 100 billion pages in
order to meet our [search quality goals](https://book.mwmbl.org/page/roadmap/).
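
As a rough sanity check on these figures, here is a back-of-envelope sketch in Python. It assumes that index size grows linearly with page count, treats "over 100 million pages" as exactly 100 million, and reads 40 gigabytes as decimal gigabytes. The function names are made up for illustration and are not part of the Mwmbl codebase.

```python
# Figures quoted in the post; everything else here is an assumption.
PAGES_INDEXED = 100_000_000   # "over 100 million pages", rounded down
INDEX_BYTES = 40 * 10**9      # "40 gigabytes", taken as decimal GB


def projected_index_gb(target_pages: int) -> float:
    """Estimate index size in GB, assuming bytes per page stay constant."""
    bytes_per_page = INDEX_BYTES / PAGES_INDEXED
    return target_pages * bytes_per_page / 10**9


def days_to_crawl(total_pages: int, pages_per_day: int) -> float:
    """Days needed to crawl total_pages at a steady daily rate."""
    return total_pages / pages_per_day


if __name__ == "__main__":
    print(projected_index_gb(1_000_000_000))      # one billion pages
    print(projected_index_gb(100_000_000_000))    # 100 billion pages
    print(days_to_crawl(100_000_000_000, 3_000_000_000))
```

Under these assumptions a billion pages comes to about 400 GB, and crawling three billion pages a day covers 100 billion pages in roughly a month. Real index growth is unlikely to be exactly linear, so treat these as order-of-magnitude estimates.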

## Other crawling improvements planned for 2024
### Other crawling improvements planned for 2024

- Fix [Search queue does not have enough URLs from top domains](https://github.com/mwmbl/mwmbl/issues/140)
- Fix [SEO spam showing up in crawler](https://github.com/mwmbl/mwmbl/issues/141)

# Users can curate search results pages
## Users can curate search results pages

This is a big part of our goal to be a _community driven_ search
engine. Instead of ranking being determined by a secret
Expand All @@ -47,12 +47,39 @@ curations.
We currently have 54 registered users who have curated 241 search
results pages.

## Curation plans for 2024
### Curation plans for 2024

The curation user experience is still in an experimental phase. We
will continue experimenting throughout 2024 to find the best way for
users to contribute to improving search results. All suggestions
gratefully received.

## Financials

Our finances are managed by [Open Collective](https://opencollective.com/mwmbl)
where we currently have funds of $733.36. Our estimated annual budget
is $752.36 and we have spent $174.49. The biggest expense was
purchasing a PyCharm Professional license at $116.58. On top of that
we have a GitHub Copilot license (somehow we don't qualify for the
free version, despite being open source) and Mozilla's VPN, so that we
can crawl the web without getting blocked.

Most of our funding comes from one generous donor; we hope to have
more donors in the coming year.

## Incorporating

We are still unincorporated as an organisation. The amount
of funding we are getting is not enough to cover the costs of
accounting and other overheads involved in forming an official
organisation. The most likely way for us to incorporate is as a
[Company Limited by Guarantee](https://en.wikipedia.org/wiki/Company_limited_by_guarantee),
which is a standard vehicle for non-profits in the UK; see e.g.
[the OpenStreetMap Foundation](https://osmfoundation.org/wiki/Incorporation).

## Team

Mainly still just me (Daoud) with help from the community. Please join
our [Matrix chat](https://matrix.to/#/#mwmbl:matrix.org) if you are
interested in helping out!
