[Share] After creating annotations, automatically remove spaces and garbled characters from the content #269
-
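Introduction

This script removes extra spaces and line breaks from annotation text, converts full-width letters and digits to half-width, and normalizes punctuation.

Usage

The script can be triggered automatically (event: Create Annotation) or manually (menu item: In Annotation Menu). Copy the following code in full into the "Data" field:

Version 1: Rule-based Processing (No networking required)

/**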
* Format Chinese Annotations
* @author wakewon
* @usage Create Annotation & In Annotation Menu
* @link https://github.com/windingwind/zotero-actions-tags/discussions/269
* @see https://github.com/windingwind/zotero-actions-tags/discussions/269
*/
if (!item) return;
const topItem = Zotero.Items.getTopLevel([item])[0];
const formatLang = ["", "zh", "zh-CN", "zh_CN"];
const lang = topItem.getField("language");
if (!formatLang.includes(lang)) return "[Action: Format Chinese Annotations] Skip due to language";
return await editAnnotation(item);
async function editAnnotation(annotationItem) {
  if (!annotationItem.isAnnotation()) return "[Action: Format Chinese Annotations] Not an annotation item";
  if (!annotationItem.annotationText) return "[Action: Format Chinese Annotations] No text found in this annotation";
  annotationItem.annotationText = await formatText(annotationItem.annotationText);
  return;
}
async function formatText(text) {
  // Half-width punctuation mapped to its Chinese (full-width) counterpart
  const punctuationMap = { '.': '。', ',': ',', '!': '!', '?': '?', ':': ':', ';': ';' };
  // Full-width ASCII variants sit 0xFEE0 above their half-width code points
  const fullWidthToHalfWidth = s => String.fromCharCode(s.charCodeAt(0) - 0xFEE0);
  return text
    .replace(/[\r\n]/g, '') // Remove all line breaks
    .replace(/[\uE5D2\uE5CF\uE5CE\uE5E5]/g, '') // Remove special characters
    .replace(/[A-Za-z0-9!"'()[]{}<>,.:;-]/g, fullWidthToHalfWidth) // Full-width to half-width
    .replace(/\s+/g, ' ') // Replace consecutive spaces with a single space
    .replace(/(?<=\d)\s+|\s+(?=\d)/g, '') // Remove spaces around digits
    .replace(/\s*(?=[.,:;!?"()\[\]。?!,、;:“”‘’()《》【】])|(?<=[.,:;!?"()\[\]。?!,、;:“”‘’()《》【】])\s*/g, '') // Remove spaces around punctuation
    .replace(/(\S)\s+(?=[\u4e00-\u9fa5])|(?<=[\u4e00-\u9fa5])\s+(\S)/g, '$1$2') // Remove spaces next to Chinese characters
    .replace(/([\u4e00-\u9fa5]+)([,.!?:;]+)/g, (m, c, p) => c + p.split('').map(ch => punctuationMap[ch]).join('')) // Use Chinese punctuation after Chinese text
    .replace(/([,.!?:;]+)([\u4e00-\u9fa5]+)/g, (m, p, c) => p.split('').map(ch => punctuationMap[ch]).join('') + c) // Use Chinese punctuation before Chinese text
    .replace(/\(([^()]*[\u4e00-\u9fa5][^()]*)\)|\[([^\[\]]*[\u4e00-\u9fa5][^\[\]]*)\]/g, (m, c1, c2) => c1 ? `(${c1})` : `【${c2}】`) // Use full-width brackets around content containing Chinese
    .replace(/([0-9a-zA-Z])\(/g, '$1' + String.fromCharCode(0xFF08)) // Full-width opening parenthesis after a digit or letter
    .replace(/\)([0-9a-zA-Z])/g, String.fromCharCode(0xFF09) + '$1') // Full-width closing parenthesis before a digit or letter
    .replace(/([a-zA-Z]+)([,.!?:;]+)([a-zA-Z]+)/g, (m, w1, p, w2) => w1 + p + ' ' + w2) // Add a space after punctuation between English words
    .replace(/(\S)\(/g, '$1 (') // Add space before parenthesis
    .replace(/\)([\u4e00-\u9fa5])/g, ') $1') // Add space after parenthesis
    .replace(/([,.!?:;)])(?!\s|(?<=\.)\d)/g, '$1 ') // Add a space after punctuation if not followed by a space or a digit after '.'
    .replace(/🔤(.*)/g, (match, p1) => p1.trim() ? `\n🔤${p1}` : '🔤'); // Add a newline before 🔤 if there is content after it
}

Version 2: Processing with AI (Requires networking and a valid OpenAI API key)
/**
* AI Normalize Punctuation
* This script standardizes punctuation in the selected text, handling both Chinese and English punctuation.
* It uses the OpenAI API for text processing.
*
* @usage In Annotation Menu
* @link https://github.com/windingwind/zotero-actions-tags/discussions/269
* @see https://github.com/windingwind/zotero-actions-tags/discussions/269
*/
/** { 👍 "openai" } service provider */
const SERVICE = "openai";
// OpenAI API configuration
const OPENAI = {
  API_KEY: "InputYourKeyHere", // Replace with your OpenAI API key.
  MODEL: "gpt-3.5-turbo", // Default model name; change as needed.
  API_URL: "https://api.openai.com/v1/chat/completions", // Request URL; change as needed.
};
if (!item) return;
const topItem = Zotero.Items.getTopLevel([item])[0];
const formatLang = ["", "zh", "zh-CN", "zh_CN"];
const lang = topItem.getField("language");
if (!formatLang.includes(lang)) return "[Action: AI Normalize Punctuation] Skip due to language";
return await normalizePunctuation(item);
async function normalizePunctuation(annotationItem) {
  if (!annotationItem.isAnnotation()) return "[Action: AI Normalize Punctuation] Not an annotation item";
  if (!annotationItem.annotationText) return "[Action: AI Normalize Punctuation] No text found in this annotation";
  const selectedText = annotationItem.annotationText;
  let result;
  let success;
  switch (SERVICE) {
    case "openai":
      ({ result, success } = await callOpenAI(selectedText));
      break;
    default:
      result = "Service Not Found";
      success = false;
  }
  if (success) {
    annotationItem.annotationText = `${result}`;
    return `Formatted Text: ${result}`;
  } else {
    return `Error: ${result}`;
  }
}
async function callOpenAI(text) {
  // The template literal is left unindented so the prompt text stays unchanged
  const prompt = `
Please standardize the punctuation in the following text, using Chinese punctuation for Chinese content. Return only the corrected text:
${text}
`;
  const data = {
    model: OPENAI.MODEL,
    messages: [
      { role: "system", content: "You are a helpful language assistant." },
      { role: "user", content: prompt }
    ],
    max_tokens: 1000,
    temperature: 0.2,
  };
  try {
    const xhr = await Zotero.HTTP.request(
      "POST",
      OPENAI.API_URL,
      {
        headers: {
          'Authorization': `Bearer ${OPENAI.API_KEY}`,
          'Content-Type': 'application/json; charset=utf-8',
        },
        body: JSON.stringify(data),
        responseType: "json",
      }
    );
    // Guard against a missing response body before reading choices
    const response = xhr ? xhr.response : null;
    if (xhr && xhr.status === 200 && response && response.choices && response.choices.length > 0) {
      return {
        success: true,
        result: response.choices[0].message.content.trim(),
      };
    } else {
      return {
        result: response && response.error ? response.error.message : 'Unknown error',
        success: false,
      };
    }
  } catch (error) {
    console.error('Error calling OpenAI API:', error);
    return {
      result: error.message,
      success: false,
    };
  }
}

Customized Usage

Skip items in specific languages

This script only processes annotations in PDFs whose parent item has a language field of zh, zh-CN, or zh_CN, or has no language information at all.
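
As a rough illustration of the language filter described above, the sketch below wraps the check in a small function so it can be tried outside Zotero. `shouldFormat` and the `extended` list are hypothetical names introduced here; the posted scripts inline the check as `formatLang.includes(lang)`. The codes "zh-TW" and "ja" are example values only, not something the scripts ship with.

```javascript
// Default list from the scripts: empty language field plus three Chinese codes.
const formatLang = ["", "zh", "zh-CN", "zh_CN"];

// Hypothetical helper (not part of the posted scripts).
// Array.prototype.includes compares exactly, so "zh-cn" or " zh" would NOT match.
function shouldFormat(lang, allowed = formatLang) {
  return allowed.includes(lang);
}

// Assumed customization: also process items tagged "zh-TW" or "ja".
const extended = [...formatLang, "zh-TW", "ja"];

console.log(shouldFormat("zh-CN"));           // true
console.log(shouldFormat("en"));              // false
console.log(shouldFormat("zh-TW"));           // false
console.log(shouldFormat("zh-TW", extended)); // true
```

In the scripts themselves, the equivalent change would be editing the `formatLang` array directly. Removing the empty string from it would make items without a language field be skipped as well.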
Turn off alert pop-ups

If you wish to turn off a particular pop-up alert, delete the double quotes and the content inside them after the corresponding return in the code.

Acknowledgements

This script is mainly based on #107 and #220, and most of the code was written with the help of gpt-4o. Thanks again to the authors of the original scripts, and to GPT for its strong support!
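
Coming back to the pop-up tip, the edit can be pictured with two versions of the same guard. This is only a sketch: `guardWithAlert` and `guardQuiet` are hypothetical wrappers added so the snippet runs on its own, whereas the posted scripts use a `return` at the top level of the action. It assumes, as the tip implies, that the plugin reports whatever string the script returns.

```javascript
const formatLang = ["", "zh", "zh-CN", "zh_CN"];

// Variant A (as posted): the string after `return` is the pop-up text.
function guardWithAlert(lang) {
  if (!formatLang.includes(lang)) return "[Action: Format Chinese Annotations] Skip due to language";
}

// Variant B (quiet): same check, but a bare `return`, so there is nothing to report.
function guardQuiet(lang) {
  if (!formatLang.includes(lang)) return;
}

console.log(guardWithAlert("en")); // [Action: Format Chinese Annotations] Skip due to language
console.log(guardQuiet("en"));     // undefined
```

The same edit applies to the other messages, such as "Not an annotation item" or "No text found in this annotation".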
-
Thanks a lot! This is awesome!
-
Thanks, this is great. Finally there is a convenient way to fix the extra-space problem.
-
Hello, how do I open the Edit Action window?
-
Describe the feature request

Is your feature request related to a problem? Please describe.
Thanks for developing Actions & Tags! The automation saves a lot of effort. Would it be possible to use Actions & Tags to automatically remove spaces and garbled characters from an annotation's content after the annotation is created?

Why do you need this feature?
Chinese text in PDFs often contains many spaces; even when a line itself has none, line breaks introduce them. Currently I can remove the spaces from selected text with tools such as Shortcuts or Quicker, but could this be done fully automatically with Actions & Tags? Thanks to the developer!