Intro

A repository containing important raw texts, some of them not yet proofread. Processed (e.g. transliterated) texts may be placed in other repos or sites.

"Raw" in raw_etexts means "mostly unencoded" text files (not binary files such as doc, pdf or rtf) whose contents can be searched with tools like grep. We therefore prefer files whose formatting, if any, does not interfere much with search.
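As an illustration of the kind of search plain-text files allow, here is a minimal sketch using grep. The function name search_texts is a hypothetical helper, not part of this repo.

```shell
# Search every file under a directory for a literal substring.
# -r: recurse into subdirectories
# -n: print the line number of each match
# -F: treat the pattern as a fixed string, not a regular expression
search_texts() {
  pattern="$1"
  dir="${2:-.}"   # default to the current directory
  grep -rnF -- "$pattern" "$dir"
}
```

For example, search_texts 'त्रिपा' vedaH would list every line under vedaH that contains that substring, including longer words that merely begin with it.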

Motivation

Websites with critical digitized texts have disappeared in the past, to the frustration of users.
It is easier to search multiple files encoded in plain text.
Part of an email sent by shrI vIranArAyaNa pANDurangi: "...problems in searching in wikisourse. it does not search to our wish. Suppose I wish to त्रिपादूर्ध्व wherever it occurs. If I search त्रिपा only it should search त्रिपादूर्ध्व. but it does not search it. it only searches त्रिपा. Hence I need to get all the veda and purana files in my computer. so I can put it in text files. text files search these things very fine."

Usage

Clone this repo: git clone https://github.com/sanskrit/raw_etexts.git --recursive --shallow-submodules
Initialize submodules if needed: git submodule update --init --recursive --depth 1
Update submodules if needed: git submodule foreach -q 'git checkout $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master) || git checkout -b $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master)'
If this does not cover all submodules, also try it without the -b option, and consider adding the --recursive option.
Check status: git submodule foreach -q 'git status'
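The update command above looks up each submodule's branch in .gitmodules and falls back to master when none is recorded. A minimal sketch of that lookup on its own, assuming git is installed; submodule_branch is a hypothetical helper name, not part of this repo.

```shell
# Print the branch recorded for a submodule in a .gitmodules file,
# or "master" when no branch is recorded for it.
submodule_branch() {
  gitmodules="$1"   # path to the .gitmodules file
  name="$2"         # submodule name, e.g. some/path
  git config -f "$gitmodules" "submodule.$name.branch" || echo master
}
```

Run from the top of the repo as submodule_branch .gitmodules some/path to see which branch the update command would try to check out for that submodule.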

Generating a catalog

On Linux: ./make_catalog.sh

Contribution

Fork this repo and send pull requests.
Raise issues.
Add a submodule, for example: git submodule add --depth 1 -- https://github.com/indic-dict/something.git some/path

If you're maintaining a lot of submodules, it may make sense to group them together (e.g. mixed/vishvAsa) so that they can easily be excluded from your local multi-file searches to avoid duplicate results.
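One way to exclude such a group from a local search is grep's --exclude-dir option (available in GNU grep). This is a sketch only; search_excluding is a hypothetical helper name.

```shell
# Search recursively for a literal substring, skipping any directory
# whose base name matches the first argument.
search_excluding() {
  excluded="$1"   # directory name to skip, e.g. mixed
  pattern="$2"
  dir="${3:-.}"   # default to the current directory
  grep -rnF --exclude-dir="$excluded" -- "$pattern" "$dir"
}
```

For example, search_excluding mixed 'त्रिपा' . searches the whole checkout but skips everything under mixed. Note that --exclude-dir matches on the directory's base name, so it would also skip any other directory called mixed.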

Conventions

Allowed formats, in order of decreasing preference: md (possibly with frontmatter), txt, itx, tex, html.
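Before sending a pull request, contributors may want to spot files outside these formats. A minimal sketch with find; find_unlisted_formats is a hypothetical helper and not a check this repo is known to run.

```shell
# List files under a directory whose extension is not one of the
# allowed formats (md, txt, itx, tex, html). Contents of .git
# directories are skipped.
find_unlisted_formats() {
  find "${1:-.}" -type f \
    ! -path '*/.git/*' \
    ! -name '*.md' ! -name '*.txt' ! -name '*.itx' \
    ! -name '*.tex' ! -name '*.html'
}
```

Run at the top of the repo, this will also list legitimate non-text files such as make_catalog.sh and .gitmodules, so it is better pointed at a single text folder, e.g. find_unlisted_formats some/path.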

Preferred naming conventions

OCR files are named xyz_ocr.md or xyz_ocr.txt.
While being proofread, they are renamed xyz_proofreading_.md or xyz_proofreading_.txt.
When fully proofread, they become xyz.md or xyz.txt.
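The naming scheme above makes a file's proofreading stage recoverable from its name alone. A minimal sketch, assuming only the three patterns listed; proofreading_stage is a hypothetical helper name.

```shell
# Report the proofreading stage implied by a file name.
# The more specific suffixes are tested first, because a plain
# *.md or *.txt pattern would also match them.
proofreading_stage() {
  case "$1" in
    *_ocr.md|*_ocr.txt)                     echo ocr ;;
    *_proofreading_.md|*_proofreading_.txt) echo proofreading ;;
    *.md|*.txt)                             echo proofread ;;
    *)                                      echo unknown ;;
  esac
}
```

For example, proofreading_stage xyz_ocr.md prints ocr, and proofreading_stage xyz.txt prints proofread.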

