This project builds a multiplatform iOS/macOS application that can:
- Read a linked webpage
- Upload a document (txt, pdf, or epub)
- Read the uploaded documents aloud using text-to-speech (TTS). The supported TTS synthesizers are:
  - Local Apple TTS
  - Amazon Polly Cloud TTS
  - Google Cloud TTS
  - Microsoft Azure TTS
For the cloud TTS options, you need to use your own AWS, Google Cloud, or Azure account. The setup steps below walk you through which resources to deploy. All of these cloud services are pay-as-you-go, so you pay only for what you actually read instead of paying a subscription to an app developer. The links below show the current pricing for each cloud service. You might also qualify for free-tier options, new-account credits, and free monthly quotas.
- Amazon Polly - https://aws.amazon.com/polly/pricing/
- Google Cloud - https://cloud.google.com/text-to-speech/pricing
- Microsoft Azure - https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/
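Since these services bill by the amount of text synthesized, you can get a rough idea of what a document will cost before reading it. The sketch below is only illustrative: `estimatedCost` is a hypothetical helper (not part of AetherVoice), and the rate of 16.00 per million characters is a placeholder, so look up the real rate for your provider and voice type on the pricing pages above.

```swift
import Foundation

/// Rough cost of synthesizing `text`, given a per-million-character rate.
/// The rate is a parameter because it differs by provider and voice type.
func estimatedCost(for text: String, ratePerMillionCharacters: Double) -> Double {
    // Assumption: billing is by character count. Providers may count
    // whitespace or markup differently, so treat this as an estimate.
    let characters = Double(text.count)
    return characters / 1_000_000 * ratePerMillionCharacters
}

// Hypothetical example: a 300,000-character book at a placeholder rate.
let book = String(repeating: "a", count: 300_000)
let cost = estimatedCost(for: book, ratePerMillionCharacters: 16.0)
print(String(format: "Estimated cost: %.2f", cost))
```

Free monthly quotas, where they apply, would reduce this figure further.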
Just build the project in Xcode, sign it for local development, and install it on your devices.
- Create an AWS account or use an existing one: https://aws.amazon.com/free
- Open the AWS Console and select a region using the region selector at the top right. Some voices are not available in certain regions; I recommend us-east-1 for access to all voices.
- Search for CloudFormation in the Console search bar -> Create Stack -> Upload a template -> use the template at AetherVoice/Dist/AmazonPollyCFN.yaml
- Wait for the stack to finish creating, then note down the value of identityPoolId in the Outputs tab of the stack.
- Provide the identityPoolId in the AWS configuration section of the app's Settings. (Note: the identityPoolId acts like a password to access your AWS account's Polly resources. Don't share it with anyone. The value will be securely stored in your keychain upon entering it)
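If you want to sanity-check the value you copied, here is a minimal sketch. It assumes the stack output is an Amazon Cognito identity pool ID, which has the form `region:guid`, so the region prefix should match the region you deployed to. `regionFromIdentityPoolId` is a hypothetical helper, not part of AetherVoice, and the sample ID is a made-up placeholder.

```swift
import Foundation

/// Returns the AWS region prefix of an identity pool ID, or nil if the
/// value does not look like "region:guid".
func regionFromIdentityPoolId(_ id: String) -> String? {
    let trimmed = id.trimmingCharacters(in: .whitespacesAndNewlines)
    let parts = trimmed.split(separator: ":", omittingEmptySubsequences: false)
    guard parts.count == 2, !parts[0].isEmpty else { return nil }
    // The part after the colon should parse as a UUID.
    guard UUID(uuidString: String(parts[1])) != nil else { return nil }
    return String(parts[0])
}

// Made-up placeholder, not a real identity pool.
let sampleId = "us-east-1:12345678-1234-1234-1234-123456789abc"
print(regionFromIdentityPoolId(sampleId) ?? "invalid identityPoolId")
```

A `nil` result usually means the value was truncated or copied with extra text around it.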
- Sign up for GCP or use an existing account: https://cloud.google.com
- Sign in to your Google Cloud Console.
- Create a new project named "AetherVoice" from the top bar, or use an existing project.
- Open the API Library and find the Text-to-Speech API.
- Select your project and click the "Enable" button.
- Visit the Credentials page.
- Click on "Create Credentials" and choose "API key". Your new API key will appear; click "Close" to save it.
- Click on the name of the new API key to open its settings page.
- Under "Application restrictions", select "iOS apps" and add the bundle identifier com.ract.AetherVoice. (You can change the bundle ID in Xcode if you want; if you do, enter the new bundle ID here instead.)
- Under "API restrictions", select "Restrict key" and choose "Google Cloud Text-to-Speech API" from the dropdown list.
- Click "Save" to apply the restrictions.
Sources in GCP Docs: Create API Keys; Restrict API key usage in iOS apps.
Provide the generated API key in the Settings -> GCP Configuration of the AetherVoice app.
(Note: the API key acts like a password to access your GCP account's Text-to-Speech API. Don't share it with anyone. The value will be securely stored in your keychain upon entering it)
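To see how the restricted key is used, here is a minimal sketch of a request to the Text-to-Speech REST endpoint. This is not AetherVoice's own networking code: `makeGoogleTTSRequest` is a hypothetical helper, `YOUR_API_KEY` is a placeholder, and the voice settings are example values. The sketch only builds the request and does not send it.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on non-Apple platforms
#endif

/// Builds (but does not send) a text:synthesize request.
func makeGoogleTTSRequest(text: String, apiKey: String, bundleId: String) throws -> URLRequest {
    let url = URL(string: "https://texttospeech.googleapis.com/v1/text:synthesize")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue(apiKey, forHTTPHeaderField: "X-Goog-Api-Key")
    // Keys restricted to iOS apps are checked against this header,
    // so it must match the bundle identifier added in the steps above.
    request.setValue(bundleId, forHTTPHeaderField: "X-Ios-Bundle-Identifier")
    let body: [String: Any] = [
        "input": ["text": text],
        "voice": ["languageCode": "en-US"],       // example language
        "audioConfig": ["audioEncoding": "MP3"],  // example encoding
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}

let googleRequest = try makeGoogleTTSRequest(
    text: "Hello from AetherVoice",
    apiKey: "YOUR_API_KEY",
    bundleId: "com.ract.AetherVoice"
)
print(googleRequest.url!.absoluteString)
```

When sent (for example with `URLSession`), a successful response is JSON whose `audioContent` field holds the base64-encoded audio. A rejected request is often a sign that the bundle identifier or API restriction does not match the key's settings.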
- Sign up for Azure or use an existing account: https://azure.microsoft.com/en-us/free/open-source
- Sign in to the Azure Portal and go to "Deploy a custom template".
- Click "Build your own template in the editor" and then load the AetherVoice/Dist/AetherVoiceFree_Azure_template.json file. Feel free to change the region if you want; the available voices depend on the region selected. The template uses the free tier, but you can change it to a standard or pay-as-you-go tier if you want.
- Save -> create a new resource group or use an existing one -> Review + Create -> Create
- Wait for the deployment to finish, then click "Go to resource".
- Under the "Keys and endpoint" section of the resource, note down either "Key 1" or "Key 2", along with the region.
Sources in Azure Docs: Deploy templates; TTS Pricing.
Provide the generated resource key and Azure region in the Settings -> Azure Configuration of the AetherVoice app.
(Note: the resource key acts like a password to access your Azure account's Text-to-Speech API. Don't share it with anyone. The value will be securely stored in your keychain upon entering it)
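The region matters because it is part of the Speech service hostname. Below is a minimal sketch of a text-to-speech REST request built from the key and region. It is not AetherVoice's actual implementation: `makeAzureTTSRequest` and `xmlEscaped` are hypothetical helpers, `YOUR_RESOURCE_KEY` is a placeholder, and the voice, output format, and `User-Agent` are example values. The request is built but not sent.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on non-Apple platforms
#endif

/// Escapes the characters that are special in XML text content.
func xmlEscaped(_ text: String) -> String {
    text.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// Builds (but does not send) a request carrying an SSML body.
func makeAzureTTSRequest(text: String, resourceKey: String,
                         region: String, voice: String) -> URLRequest? {
    // The region noted from the portal selects the regional endpoint.
    guard let url = URL(string: "https://\(region).tts.speech.microsoft.com/cognitiveservices/v1") else {
        return nil
    }
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue(resourceKey, forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
    request.setValue("application/ssml+xml", forHTTPHeaderField: "Content-Type")
    request.setValue("audio-16khz-128kbitrate-mono-mp3", forHTTPHeaderField: "X-Microsoft-OutputFormat")
    request.setValue("AetherVoiceExample", forHTTPHeaderField: "User-Agent") // placeholder name
    let ssml = "<speak version='1.0' xml:lang='en-US'>"
        + "<voice name='\(voice)'>\(xmlEscaped(text))</voice></speak>"
    request.httpBody = ssml.data(using: .utf8)
    return request
}

let azureRequest = makeAzureTTSRequest(
    text: "Tom & Jerry",
    resourceKey: "YOUR_RESOURCE_KEY",
    region: "eastus",
    voice: "en-US-JennyNeural" // example voice; availability varies by region
)
print(azureRequest?.url?.absoluteString ?? "invalid region")
```

The text is escaped before being placed into the SSML because document content can contain characters such as `&` or `<` that would otherwise make the XML invalid.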