Merge pull request #43 from jina-ai/docs-reader-lm-v2
docs: reader lm v2
zac-li authored Jan 15, 2025
2 parents 4ad0f90 + aa9de94 commit 42cc069
Showing 3 changed files with 25 additions and 26 deletions.
5 changes: 5 additions & 0 deletions examples/sample-reader-lm-v2-inference-input.json
@@ -0,0 +1,5 @@
{
"model": "ReaderLM-v2",
"prompt": "<html><head><title>Minimal Bullet Points</title></head><body><ul><li>hello</li><li>jina.ai</li></ul></body></html>",
"stream": false
}
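A request body in the same shape as the sample above can also be built in code. This is a minimal sketch, not the notebook's own method: `build_payload` and `MODEL_FIELD` are hypothetical names introduced for illustration, and the package-to-`model` mapping is taken from the note in `notebooks/Reader-LM.ipynb` changed by this commit.

```python
import json

# Marketplace package name -> value expected in the `model` field,
# as stated in the notebook note updated by this commit.
MODEL_FIELD = {
    "reader-lm-500m": "reader-lm-0.5b",
    "reader-lm-1500m": "reader-lm-1.5b",
    "reader-lm-v2": "ReaderLM-v2",
}


def build_payload(package_name, html, stream=False):
    """Hypothetical helper: return a request body shaped like the sample input file."""
    return {
        "model": MODEL_FIELD[package_name],
        "prompt": html,
        "stream": stream,
    }


html = (
    "<html><head><title>Minimal Bullet Points</title></head>"
    "<body><ul><li>hello</li><li>jina.ai</li></ul></body></html>"
)
payload = build_payload("reader-lm-v2", html)

# Serialize to the JSON text that would go into `input.json`.
payload_json = json.dumps(payload, indent=2)
print(payload_json)
```

Keeping the mapping in one dictionary avoids mismatches between the deployed package and the `model` value, which differ in spelling for all three models.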
1 change: 1 addition & 0 deletions examples/sample-reader-lm-v2-inference-output.json
@@ -0,0 +1 @@
{"id":"cmpl-45e7ff341f3d4084a33c0fe3ece009c6","object":"text_completion","created":1736918982,"model":"ReaderLM-v2","choices":[{"index":0,"text":"The provided HTML content is a simple list with two bullet points. Here's the minimal representation in Markdown:\n\n```markdown\n- hello\n- jina.ai\n```\n\nThis version only includes the essential elements of the original HTML, excluding any decorative or supplementary information. If you need to include additional text or formatting, please provide that as well.","logprobs":null,"finish_reason":"stop","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":61,"total_tokens":131,"completion_tokens":70}}
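In the sample output above, the completion wraps the converted Markdown in a fenced `markdown` block surrounded by explanatory prose. A minimal sketch of pulling out just the Markdown follows; `extract_markdown` is a hypothetical helper, and the assumption that the model always emits such a fence is not guaranteed by the source, so the function falls back to the full text when no fence is found. The `sample` dictionary is an abbreviated copy of the response shown above.

```python
import re

# Build the fence marker programmatically so this snippet can itself
# sit inside a fenced code block.
FENCE = "`" * 3


def extract_markdown(response):
    """Hypothetical helper: return the fenced Markdown from the first choice.

    Falls back to the whole completion text when no fenced block is present.
    """
    text = response["choices"][0]["text"]
    pattern = FENCE + r"markdown\n(.*?)" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else text.strip()


# Abbreviated copy of the sample response, keeping only the fields used here.
sample = {
    "choices": [
        {
            "index": 0,
            "text": (
                "Here's the minimal representation in Markdown:\n\n"
                + FENCE + "markdown\n- hello\n- jina.ai\n" + FENCE
                + "\n\nThis version only includes the essential elements."
            ),
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 61, "total_tokens": 131, "completion_tokens": 70},
}

markdown = extract_markdown(sample)
print(markdown)

# The usage block is self-consistent: prompt + completion tokens = total.
usage = sample["usage"]
tokens_add_up = usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```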
45 changes: 19 additions & 26 deletions notebooks/Reader-LM.ipynb
@@ -9,15 +9,18 @@
"This notebook shows you how to deploy the [Jina Reader-LM](https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown) models using Amazon SageMaker and perform inference with it:\n",
" - [reader-lm-500m](https://aws.amazon.com/marketplace/pp/prodview-nli7b6dueo424?sr=0-1&ref_=beagle&applicationId=AWSMPContessa)\n",
" - [reader-lm-1500m](https://aws.amazon.com/marketplace/pp/prodview-ms27ixcwq3wjk?sr=0-2&ref_=beagle&applicationId=AWSMPContessa)\n",
" - [reader-lm-v2](TBD)\n",
"\n",
"## Pre-requisites:\n",
"1. Ensure that IAM role used has **AmazonSageMakerFullAccess**\n",
"1. To deploy this ML model successfully, ensure that:\n",
" 1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: \n",
" 1. **aws-marketplace:ViewSubscriptions**\n",
" 1. **aws-marketplace:Unsubscribe**\n",
" 1. **aws-marketplace:Subscribe** \n",
" 2. or your AWS account has a subscription to this model."
"## Prerequisites:\n",
"\n",
"1. This notebook renders correctly in the Jupyter interface and can be run in either an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.\n",
"2. Ensure that the IAM role being used has **AmazonSageMakerFullAccess**.\n",
"3. To successfully deploy this ML model, ensure that:\n",
" 1. Either your IAM role has the following three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account:\n",
" - **aws-marketplace:ViewSubscriptions**\n",
" - **aws-marketplace:Unsubscribe**\n",
" - **aws-marketplace:Subscribe**\n",
" 2. Or, your AWS account already has a subscription to this model."
]
},
{
@@ -50,19 +53,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Install `jina-sagemaker` package \n",
"\n",
"\n",
"```bash\n",
"pip install --upgrade jina-sagemaker\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then get the model package ARN(s)."
"Next, install the `jina-sagemaker` package and get the model package ARN(s) using the code below."
]
},
{
@@ -71,6 +62,8 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install --upgrade jina-sagemaker\n",
"\n",
"import boto3\n",
"\n",
"region = boto3.Session().region_name\n",
@@ -83,10 +76,11 @@
"model_name_map = {\n",
" \"reader-lm-500m\": \"reader-lm-500m-186f9d30c561356c92a721dbf9540212\",\n",
" \"reader-lm-1500m\": \"reader-lm-1500m-cc71e40a204537f2a2fdbbd0a03d88e8\",\n",
" \"reader-lm-v2\": \"reader-lm-v2-tbd\",\n",
"}\n",
"\n",
"# Specify the model name, reader-lm-500m is picked here for example\n",
"model_package_name = model_name_map[\"reader-lm-500m\"]\n",
"# Specify the model name; `reader-lm-v2` is picked here as an example\n",
"model_package_name = model_name_map[\"reader-lm-v2\"]\n",
"\n",
"# Mapping for Model Packages\n",
"def get_arn_for_model(region_name, model_name):\n",
@@ -239,8 +233,7 @@
"\n",
"Create an input file `input.json` with the following content.\n",
"\n",
"Note that if `reader-lm-500m` is the used, then `\"reader-lm-0.5b\"` should be used for `model`, \n",
"if `reader-lm-1500m` is the used, then `\"reader-lm-1.5b\"` should be used for `model`. "
"Note: if using `reader-lm-v2`, set `model` to `ReaderLM-v2`; for `reader-lm-500m`, set `model` to `reader-lm-0.5b`; and for `reader-lm-1500m`, set `model` to `reader-lm-1.5b`."
]
},
{
@@ -334,7 +327,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.10.16"
}
},
"nbformat": 4,