Merge pull request #67 from barseghyanartur/dev
dev -> main
barseghyanartur authored Nov 19, 2023
2 parents 743ff1c + 872d656 commit 32d23a2
Showing 10 changed files with 92 additions and 29 deletions.
4 changes: 2 additions & 2 deletions .secrets.baseline
@@ -134,7 +134,7 @@
"filename": "README.rst",
"hashed_secret": "077d5a0e0f8bb517307a6e92a73b0a9aa959233c",
"is_verified": true,
"line_number": 578
"line_number": 582
}
],
"docs/_static/examples/recipes/sftp_storage_1.py": [
@@ -219,5 +219,5 @@
}
]
},
"generated_at": "2023-11-18T23:53:12Z"
"generated_at": "2023-11-19T23:07:25Z"
}
6 changes: 6 additions & 0 deletions CHANGELOG.rst
@@ -28,6 +28,12 @@ are used for versioning (schema follows below):
0.3.4 to 0.4).
- All backwards incompatible changes are mentioned in this document.

0.17.11
-------
2023-11-20

- Minor documentation fixes.

0.17.10
-------
2023-11-19
6 changes: 5 additions & 1 deletion Makefile
@@ -132,10 +132,14 @@ make_migrations:
	echo 'Applying migrations...'
	./manage.py migrate

make_release:
release:
	python setup.py sdist bdist_wheel
	twine upload dist/* --verbose

test_release:
	python setup.py sdist bdist_wheel
	twine upload --repository testpypi dist/* --verbose

migrate:
	cd examples/django_example/ && ./manage.py migrate "$$@"

18 changes: 11 additions & 7 deletions README.rst
@@ -71,6 +71,7 @@ faker-file
.. _gTTS: https://gtts.readthedocs.io/
.. _google-cloud-storage: https://pypi.org/project/google-cloud-storage/
.. _imgkit: https://pypi.org/project/imgkit/
.. _nltk: https://www.nltk.org/
.. _nlpaug: https://nlpaug.readthedocs.io/
.. _numpy: https://numpy.org/
.. _odfpy: https://pypi.org/project/odfpy/
@@ -85,6 +86,7 @@ faker-file
.. _python-pptx: https://python-pptx.readthedocs.io/
.. _reportlab: https://pypi.org/project/reportlab/
.. _tablib: https://tablib.readthedocs.io/
.. _textaugment: https://pypi.org/project/textaugment/
.. _tika: https://pypi.org/project/tika/
.. _transformers: https://pypi.org/project/transformers/
.. _wkhtmltopdf: https://wkhtmltopdf.org/
@@ -117,8 +119,8 @@ All licenses are mentioned below between the brackets.
requires either just `Pillow`_ (`HPND`), or a combination of
`imgkit`_ (`MIT`) and `wkhtmltopdf`_ (`LGPLv3`).
- ``MP3`` file support requires `gTTS`_ (`MIT`) or `edge-tts`_ (`GPLv3`).
- ``PDF`` file support requires either combination of `pdfkit`_ (`MIT`)
and `wkhtmltopdf`_ (`LGPLv3`), or `reportlab`_ (`BSD`).
- ``PDF`` file support requires either `Pillow`_ (`HPND`), or a combination of
`pdfkit`_ (`MIT`) and `wkhtmltopdf`_ (`LGPLv3`), or `reportlab`_ (`BSD`).
- ``PPTX`` file support requires `python-pptx`_ (`MIT`).
- ``ODP`` and ``ODT`` file support requires `odfpy`_ (`Apache 2`).
- ``ODS`` file support requires `tablib`_ (`MIT`) and `odfpy`_ (`Apache 2`).
@@ -131,9 +133,11 @@ All licenses are mentioned below between the brackets.
- ``GoogleCloudStorage`` storage support requires `pathy`_ (`Apache 2`)
and `google-cloud-storage`_ (`Apache 2`).
- ``SFTPStorage`` storage support requires `paramiko`_ (`LGPLv2.1`).
- ``AugmentFileFromDirProvider`` provider requires `nlpaug`_ (`MIT`),
`PyTorch`_ (`BSD`), `transformers`_ (`Apache 2`), `numpy`_ (`BSD`),
`pandas`_ (`BSD`), `tika`_ (`Apache 2`) and `Apache Tika`_ (`Apache 2`).
- ``AugmentFileFromDirProvider`` provider requires either a combination of
`textaugment`_ (`MIT`) and `nltk`_ (`Apache 2`) or a combination of
`nlpaug`_ (`MIT`), `PyTorch`_ (`BSD`), `transformers`_ (`Apache 2`),
`numpy`_ (`BSD`), `pandas`_ (`BSD`), `tika`_ (`Apache 2`) and
`Apache Tika`_ (`Apache 2`).

Documentation
=============
@@ -360,8 +364,8 @@ functions):
.. container:: jsphinx-toggle-emphasis

.. code-block:: python
:emphasize-lines: 7
:name: test_usage_examples_with_faker_raw_recommended_way
:emphasize-lines: 7
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
@@ -396,8 +400,8 @@ If you just need ``bytes`` back:
.. container:: jsphinx-toggle-emphasis

.. code-block:: python
:emphasize-lines: 6
:name: test_rst_readme_usage_examples_with_faker_raw_but_this_works_too
:emphasize-lines: 6
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
5 changes: 0 additions & 5 deletions docs/_static/examples/prismjs/sample.js

This file was deleted.

7 changes: 0 additions & 7 deletions docs/_static/examples/prismjs/sample.py

This file was deleted.

36 changes: 36 additions & 0 deletions docs/_static/examples/recipes/augment_file_from_dir_4.py
@@ -0,0 +1,36 @@
from faker import Faker
from faker_file.providers.augment_file_from_dir import (
    AugmentFileFromDirProvider,
)
from faker_file.providers.augment_file_from_dir.augmenters import (
    textaugment_augmenter,
)
from faker_file.providers.docx_file import DocxFileProvider
from faker_file.providers.eml_file import EmlFileProvider
from faker_file.providers.odt_file import OdtFileProvider
from faker_file.providers.txt_file import TxtFileProvider

FAKER = Faker()
FAKER.add_provider(DocxFileProvider)
FAKER.add_provider(TxtFileProvider)
FAKER.add_provider(EmlFileProvider)
FAKER.add_provider(OdtFileProvider)
FAKER.add_provider(AugmentFileFromDirProvider)

# Create files to test `augment_file_from_dir` with
FAKER.docx_file()
FAKER.eml_file()
FAKER.odt_file()
FAKER.txt_file()

# We assume that directory "/tmp/tmp/" exists and contains
# files of `DOCX`, `EML`, `EPUB`, `ODT`, `PDF`, `RTF` or `TXT`
# formats. Valid values for `action` are: "random_deletion",
# "random_insertion", "random_swap" and "synonym_replacement" (default).
augmented_file = FAKER.augment_file_from_dir(
    source_dir_path="/tmp/tmp/",
    text_augmenter_cls=textaugment_augmenter.EDATextaugmentAugmenter,
    text_augmenter_kwargs={
        "action": "synonym_replacement",
    },
)
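
Note on the `action` values named in the recipe's comment: they correspond to the four classic "easy data augmentation" operations. To make two of them concrete, here is a minimal pure-Python sketch of `random_swap` and `random_deletion`. It is an illustration only and is not the code that `textaugment` or `faker-file` runs; the helper signatures, the `rng` parameter and the sample sentence are assumptions made for this example.

```python
import random


def random_swap(words: list[str], n: int, rng: random.Random) -> list[str]:
    """Swap two randomly chosen positions, repeated n times (illustrative)."""
    result = list(words)
    if len(result) < 2:
        return result
    for _ in range(n):
        i, j = rng.sample(range(len(result)), 2)
        result[i], result[j] = result[j], result[i]
    return result


def random_deletion(words: list[str], p: float, rng: random.Random) -> list[str]:
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]


# Hypothetical input; a seeded generator keeps runs repeatable.
rng = random.Random(0)
sentence = "the quick brown fox jumps over the lazy dog".split()
swapped = random_swap(sentence, n=2, rng=rng)
deleted = random_deletion(sentence, p=0.3, rng=rng)
print(" ".join(swapped))
print(" ".join(deleted))
```

Swapping preserves the vocabulary and length of the text, while deletion shortens it; which one suits a test fixture depends on whether downstream code is sensitive to word order or to text length.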
27 changes: 26 additions & 1 deletion docs/recipes.rst
@@ -1,5 +1,11 @@
Recipes
=======
.. External references
.. _nlpaug: https://nlpaug.readthedocs.io/
.. _nltk: https://www.nltk.org/
.. _textaugment: https://pypi.org/project/textaugment/

When using with ``Faker``
-------------------------
When using with ``Faker``, there are two ways of using the providers.
@@ -613,7 +619,14 @@ however narrow that list by providing ``extensions`` argument:
:download:`here <_static/examples/recipes/augment_file_from_dir_2.py>`

----

Actual augmentation of texts is delegated to an abstraction layer of text
augmenters. Currently, two augmenters are implemented. The default one, based
on `textaugment`_ (which is in turn based on `nltk`_), is very lightweight
and fast, but produces less accurate results. The other, based on `nlpaug`_,
is far more sophisticated, at the cost of speed.

nlpaug augmenter
~~~~~~~~~~~~~~~~
By default ``bert-base-multilingual-cased`` model is used, which is
pretrained on the top 104 languages with the largest Wikipedia using a
masked language modeling (MLM) objective. If you want to use a different
@@ -640,6 +653,18 @@ Refer to ``nlpaug``
`docs <https://nlpaug.readthedocs.io/en/latest/example/example.html>`__
and check `Textual augmenters` examples.

textaugment augmenter
~~~~~~~~~~~~~~~~~~~~~
.. container:: jsphinx-download

    .. literalinclude:: _static/examples/recipes/augment_file_from_dir_4.py
        :language: python
        :lines: 5-7, 25-

    *See the full example*
    :download:`here <_static/examples/recipes/augment_file_from_dir_4.py>`


Using `raw=True` features in tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you pass ``raw=True`` argument to any provider or inner function,
10 changes: 5 additions & 5 deletions setup.py
@@ -4,19 +4,19 @@
from setuptools import find_packages, setup


def clean_readme(text):
def clean_readme(text: str) -> str:
    # Pattern to match ":emphasize-lines:" followed by digits
    emphasize_lines_pattern = r":emphasize-lines: \d+"
    text = re.sub(emphasize_lines_pattern, "", text)

    # Pattern to match ":name:" followed by any characters to the line end
    name_lines_pattern = r":name: .*$"
    text = re.sub(name_lines_pattern, "", text, flags=re.MULTILINE)
    # # Pattern to match ":name:" followed by any characters to the line end
    # name_lines_pattern = r":name: .*$"
    # text = re.sub(name_lines_pattern, "", text, flags=re.MULTILINE)

    return text


version = "0.17.10"
version = "0.17.11"

try:
readme = open(os.path.join(os.path.dirname(__file__), "README.rst")).read()
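
Note on the `clean_readme` change in this hunk: with the `:name:` substitution commented out, only the `:emphasize-lines:` option is stripped from the README text. The following standalone sketch reproduces that behaviour on a made-up snippet; the `sample` string is an assumption for illustration and is not taken from `README.rst`.

```python
import re


def clean_readme(text: str) -> str:
    # Remove only ":emphasize-lines: <digits>"; ":name:" options are kept.
    return re.sub(r":emphasize-lines: \d+", "", text)


# Hypothetical README fragment using the option order from this commit.
sample = (
    ".. code-block:: python\n"
    "    :name: test_example\n"
    "    :emphasize-lines: 7\n"
    "\n"
    "    from faker import Faker\n"
)
cleaned = clean_readme(sample)
print(cleaned)
```

The pattern matches a single run of digits, so a value such as `2-4` or `1,3` would be removed only partially; that is worth keeping in mind if a multi-line emphasis is ever added to the README.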
2 changes: 1 addition & 1 deletion src/faker_file/__init__.py
@@ -1,5 +1,5 @@
__title__ = "faker_file"
__version__ = "0.17.10"
__version__ = "0.17.11"
__author__ = "Artur Barseghyan <[email protected]>"
__copyright__ = "2022-2023 Artur Barseghyan"
__license__ = "MIT"
