Query language? #38
@isoboroff When running the web service like so:
It reads the configuration file saved in the run directory, using its topics section to determine the query language and its retrieve section for the retrieval parameters. I think you're asking for the ability to override parts of the config on the command line. Is that right?
My main question is how queries are parsed. The answer seems to be the same way as in batch mode. I think that's just word tokens with no operators, right? I'm adapting my collection search tool, which currently uses Elasticsearch, to use the Patapsco web service, on the hypothesis that Patapsco tokenizes the languages I'm working with (Russian, Farsi, Chinese) better. Elasticsearch has a lot of web service functionality, such as highlighting, faceting and pagination, that is nice when building an interactive search tool, and it's also easy to use Lucene query syntax, which supports some common operators.
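For the pagination point, one client-side workaround is to ask the service for a longer ranked list and page through it in the browser. The thread doesn't say whether the Patapsco web service paginates on its own; this sketch assumes the response carries a `results` array (as in the JavaScript later in this thread) and that the list length is fixed server-side, apparently by the `number` field of the `retrieve` section. `paginate` is a hypothetical helper, not part of Patapsco.

```javascript
/**
 * Return one page of a ranked result list plus paging metadata.
 * Hypothetical helper: pages are 1-based, and `results` is whatever
 * array the search service returned (e.g. data.results).
 */
function paginate(results, page, pageSize) {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  const total = results.length;
  const pageCount = Math.max(1, Math.ceil(total / pageSize));
  // Clamp the requested page into [1, pageCount]
  const current = Math.min(Math.max(1, page), pageCount);
  const start = (current - 1) * pageSize;
  return {
    page: current,
    pageCount,
    total,
    items: results.slice(start, start + pageSize),
  };
}

// Usage with 25 made-up hits, 10 per page
const hits = Array.from({ length: 25 }, (_, i) => ({ doc_id: `doc${i + 1}` }));
const third = paginate(hits, 3, 10);
console.log(third.page, third.pageCount, third.items.map(h => h.doc_id).join(','));
// prints: 3 3 doc21,doc22,doc23,doc24,doc25
```

Because paging happens on the client, requests for later pages cost nothing extra, but nothing beyond the server's list length is reachable.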
Just adding the minimum configuration:
There is an error:
These fields, of course, don't make sense for interactive queries. Does this mean the query endpoint expects a JSON object like a batch query? (Edited: removed a bad stand-in config; I needed a basic "queries" section, which was missing.)
This is my JavaScript code:
// Searching using Patapsco
// targetLanguage, inputQuery, PATAPSCO_URL, document_list, possible_queries
// and buildDocumentList are assumed to be defined elsewhere in the page.
let lang = targetLanguage;
lang = 'zho'; // FIX ME remove for release
const url = PATAPSCO_URL + '/' + lang;
// Percent-encode the query so spaces, '#', '?' and '/' survive in the URL path
const myRequest = new Request(url + '/query/' + encodeURIComponent(inputQuery));
fetch(myRequest)
  .then(response => {
    console.log('Response:', response.status);
    if (!response.ok) {
      throw new Error('Network response was not OK');
    }
    return response.json();
  })
  .then(data => {
    console.log('Patapsco response');
    console.log(data);
    const results = data['results'] || [];
    // Remember the query text echoed back by the service, if any
    if (data.query && data.query.text) {
      document.getElementById('target-query').dataset.recent = data.query.text;
    }
    console.log(results);
    // Record [rank, doc_id] pairs, with ranks starting at 1
    results.forEach((result, i) => {
      document_list.push([(i + 1).toString(), result['doc_id']]);
    });
    console.log(document_list);
    possible_queries[inputQuery] = [inputQuery, document_list];
    console.log(possible_queries);
    buildDocumentList(document_list);
  })
  .catch(error => {
    document.getElementById('inner-hit-list').classList.remove('no-display');
    const findContainer = document.getElementById('find-document-container');
    findContainer.innerHTML = '<b>There was an error issuing the query... try again</b>';
    console.error('There has been a problem with your fetch operation:', error);
  });
}
Are you getting the error when setting up the web service?
On Mon, Mar 28, 2022 at 10:12 AM Ian Soboroff ***@***.*** wrote:
Just adding the minimum configuration:
topics:
  input:
    lang: fas
retrieve:
  name: bm25
  number: 10
There is an error:
patapsco.error.ConfigError: 3 validation errors in configuration
topics.input.format - missing field
topics.input.source - missing field
topics.input.path - missing field
These fields of course don't make sense for interactive queries. Does it mean that the query endpoint is expecting a JSON object like a batch query?
--
_________________________________________________
Dawn J. Lawrie Ph.D.
Senior Research Scientist
Human Language Technology Center of Excellence
Johns Hopkins University
810 Wyman Park Drive
Baltimore, MD 21211
***@***.***
https://hltcoe.jhu.edu/faculty/dawn-lawrie/
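The ConfigError quoted above appears to come from validating the configuration file as a whole, so the `topics.input` fields are required even when queries only arrive over HTTP. As a toy JavaScript illustration of that kind of required-field check (not Patapsco's own validator, which this thread doesn't show), the hypothetical `missingFields` helper walks a config object and reports absent dotted paths. The list of required paths is an assumption read off the error message.

```javascript
/**
 * Toy re-creation of a required-field check (not Patapsco's code).
 * Returns the dotted paths from `required` that are absent in `config`.
 */
function missingFields(config, required) {
  return required.filter(path => {
    let node = config;
    for (const key of path.split('.')) {
      // The path is missing as soon as it leaves the object tree
      if (node === null || typeof node !== 'object' || !(key in node)) {
        return true;
      }
      node = node[key];
    }
    return false;
  });
}

// The minimal configuration from the quoted comment, as a JS object
const minimalConfig = {
  topics: { input: { lang: 'fas' } },
  retrieve: { name: 'bm25', number: 10 },
};

// Assumed list of required fields, inferred from the error message
const required = [
  'topics.input.format',
  'topics.input.lang',
  'topics.input.source',
  'topics.input.path',
];

for (const path of missingFields(minimalConfig, required)) {
  console.log(`${path} - missing field`);
}
// prints:
// topics.input.format - missing field
// topics.input.source - missing field
// topics.input.path - missing field
```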
Frankly, I'm trying to run the web service and send some queries from the command line so I can understand the request and response formats. Your JS doesn't clarify the format of the query, and you appear to have a custom URL, which may mean you have a per-language proxy layer in there, or your own web service app.
I see in
@isoboroff Yes, processing of queries/topics in the web service is controlled by the configuration file used to create the index. Most people use term-based queries or PSQ. I added support for Lucene syntax, but it has to be enabled in the configuration and can't be combined with PSQ. The only documentation I have on this is here: https://github.com/hltcoe/patapsco/blob/master/docs/config.md#lucene-classic-query-parsing
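If an index is configured for Lucene classic query parsing, characters such as `+`, `-`, `:`, quotes and parentheses in text typed by a user become operators. An interactive client that wants such text searched literally would need to escape it first. A minimal sketch, assuming the service passes the string to a parser that follows Lucene's classic `QueryParser` escaping rules (a backslash before each special character, the same set Lucene's `QueryParser.escape` handles); whether Patapsco's Lucene mode accepts backslash escapes should be checked against the documentation linked above. `escapeLucene` is a hypothetical helper.

```javascript
// Characters that Lucene's classic QueryParser.escape() prefixes with a backslash
const LUCENE_SPECIAL = new Set([
  '\\', '+', '-', '!', '(', ')', ':', '^', '[', ']',
  '"', '{', '}', '~', '*', '?', '|', '&', '/',
]);

/** Escape user text so a Lucene classic query parser treats it as literal terms. */
function escapeLucene(text) {
  let out = '';
  for (const ch of text) {
    if (LUCENE_SPECIAL.has(ch)) out += '\\';
    out += ch;
  }
  return out;
}

console.log(escapeLucene('C++ (programming): intro'));
// prints: C\+\+ \(programming\)\: intro
```

Character escaping doesn't cover the uppercase words `AND`, `OR` and `NOT`, which the classic parser also treats as operators.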
Hi @dlawrie, your JS code looks elegant and concise. Could you share your JS project for beginners like me? Thanks!
I tested in a web browser by just typing the URL with the query at the end.
On Mon, Mar 28, 2022 at 10:27 AM Ian Soboroff ***@***.*** wrote:
Frankly, I'm trying run the web service and send some queries from the command line so I can understand the request and response formats.
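Since the query endpoint is a plain GET with the query text as the last path segment, the same request can be made from the command line. A minimal Node.js 18+ sketch (Node 18 ships a global `fetch`), assuming the URL shape `<base>/<lang>/query/<query>` and a JSON response with a `results` array of objects carrying `doc_id`, both copied from the browser code earlier in the thread. Whether the language segment belongs to Patapsco's own routes or to a per-language proxy isn't settled here, and the host and port are placeholders.

```javascript
// Hypothetical command-line client for a Patapsco-style query endpoint (Node.js 18+).
// URL shape and response fields are assumptions copied from the browser code above.

/** Build <base>/<lang>/query/<query>, percent-encoding each path segment. */
function buildQueryUrl(baseUrl, lang, query) {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/${encodeURIComponent(lang)}/query/${encodeURIComponent(query)}`;
}

/** Convert a JSON response into [rank, doc_id] pairs, ranks starting at 1. */
function toRankedList(data) {
  const results = data && Array.isArray(data.results) ? data.results : [];
  return results.map((r, i) => [i + 1, r.doc_id]);
}

/** Send one GET request and return the ranked list. */
async function search(baseUrl, lang, query) {
  const response = await fetch(buildQueryUrl(baseUrl, lang, query));
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return toRankedList(await response.json());
}

// Usage: node search.js <lang> <query text>
// Only runs when both arguments are given; the host and port are placeholders.
if (process.argv.length >= 4) {
  const [lang, query] = process.argv.slice(2, 4);
  search('http://localhost:5000', lang, query)
    .then(list => list.forEach(([rank, id]) => console.log(rank, id)))
    .catch(err => {
      console.error(err.message);
      process.exitCode = 1;
    });
}
```

Printing the raw `await response.json()` instead of the ranked list is a quick way to see every field the service actually returns.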
The plain-text query is parsed the same way the documents were parsed (i.e., normalized, stemmed or not, etc.). Does that answer the question?
On Fri, Mar 25, 2022 at 9:48 AM Ian Soboroff ***@***.*** wrote:
How does Patapsco parse queries? In particular, when you send a query to the web service, is it parsed as a Lucene query, or something else?
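The answer that queries are parsed the same way as documents means that one text-processing configuration is applied on both sides, so query tokens line up with index tokens. A toy sketch of that idea: `makeAnalyzer` is hypothetical and its options only loosely mirror the ones mentioned (normalization, optional stemming). The suffix-stripping "stemmer" and whitespace tokenizer are stand-ins, not Patapsco's processing. Whitespace splitting in particular would not work for Chinese, which needs a word segmenter.

```javascript
/**
 * Toy analyzer factory: one config object drives both indexing and querying.
 * Not Patapsco's pipeline; the "stemmer" just strips a trailing "s".
 */
function makeAnalyzer({ lowercase = true, stem = false } = {}) {
  return function analyze(text) {
    // Unicode compatibility normalization (e.g. full-width letters to ASCII)
    let t = text.normalize('NFKC');
    if (lowercase) t = t.toLowerCase();
    let tokens = t.split(/\s+/).filter(Boolean);
    if (stem) {
      tokens = tokens.map(w => (w.length > 3 && w.endsWith('s') ? w.slice(0, -1) : w));
    }
    return tokens;
  };
}

const analyze = makeAnalyzer({ lowercase: true, stem: true });
const docTokens = new Set(analyze('Ｒｉｖｅｒ Floods in Spring'));
const queryTokens = analyze('flood rivers');
// Every query token matches an index token because both sides used analyze()
console.log(queryTokens.every(tok => docTokens.has(tok))); // prints: true
```

If the query side skipped stemming or normalization, `rivers` would not match the indexed `river`. This is the kind of mismatch that processing queries with the index's own settings avoids.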
How does Patapsco parse queries? In particular, when you send a query to the web service, is it parsed as a Lucene query, or something else?
The context is that I'm thinking about ways to handle queries on a combined traditional and simplified Chinese corpus.
Are parameters of the retrieval in the web service controlled by the "queries" and "retrieve" clauses of the config file?