Merge pull request #310 from kbss-cvut/development
[3.3.0] Release
ledsoft authored Nov 8, 2024
2 parents 22ffaed + 5352045 commit c2f32e0
Showing 36 changed files with 394 additions and 180 deletions.
5 changes: 5 additions & 0 deletions .gitattributes
@@ -0,0 +1,5 @@
* text = auto
*.java text diff=java
*.png binary
*.jpg binary
*.xlsx binary
24 changes: 12 additions & 12 deletions README.md
@@ -31,29 +31,29 @@ See the [docs folder](doc/index.md) for additional information on implementation
This section briefly lists the main technologies and principles used (or planned to be used) in the application.

- Spring Boot 3, Spring Framework 6, Spring Security, Spring Data (paging, filtering)
- Jackson 2.13
- Jackson Databind
- [JB4JSON-LD](https://github.com/kbss-cvut/jb4jsonld-jackson) - Java - JSON-LD (de)serialization library
- [JOPA](https://github.com/kbss-cvut/jopa) - persistence library for the Semantic Web
- JUnit 5 (RT used 4), Mockito 4 (RT used 1), Hamcrest 2 (RT used 1)
- Servlet API 4 (RT used 3.0.1)
- JSON Web Tokens (CSRF protection not necessary for JWT)
- JUnit 5, Mockito 4, Hamcrest 2
- Jakarta Servlet API 4
- JSON Web Tokens
- SLF4J + Logback
- CORS (for separate frontend)
- Java bean validation (JSR 380)


## Ontology
## Ontologies

The ontology on which TermIt is based can be found in the `ontology` folder. For proper inference
functionality, `termit-model.ttl`, the
_popis-dat_ ontology model (http://onto.fel.cvut.cz/ontologies/slovnik/agendovy/popis-dat/model) and the SKOS vocabulary
model
(http://www.w3.org/TR/skos-reference/skos.rdf) need to be loaded into the repository used by TermIt (see `doc/setup.md`)
for details.
The ontology on which TermIt is based can be found in the `ontology` folder. It extends the
_popis-dat_ ontology (http://onto.fel.cvut.cz/ontologies/slovnik/agendovy/popis-dat). TermIt vocabularies and terms
use the SKOS vocabulary (http://www.w3.org/TR/skos-reference/skos.rdf).

Relevant ontologies need to be loaded into the repository for proper inference functionality. See [setup.md](doc/setup.md)
for more details.

## Monitoring

We use [JavaMelody](https://github.com/javamelody/javamelody) for monitoring the application and its usage. The data are
[JavaMelody](https://github.com/javamelody/javamelody) can be used for monitoring the application and its usage. The data are
available on the `/monitoring` endpoint and are secured using _basic_ authentication. Credentials are configured using
the `javamelody.init-parameters.authorized-users`
parameter in `application.yml` (see
14 changes: 5 additions & 9 deletions doc/implementation.md
@@ -43,23 +43,19 @@ follows:
Fulltext search currently supports multiple types of implementation:

* Simple substring matching on term and vocabulary label _(default)_
* RDF4J with Lucene SAIL
* GraphDB with Lucene connector

Each implementation has its own search query which is loaded and used by `SearchDao`. In order for the more advanced
implementations for Lucene to work, a corresponding Maven profile (**graphdb**, **rdf4j**) has to be selected. This
implementation for Lucene to work, a corresponding Maven profile (**graphdb**) has to be selected. This
inserts the correct query into the resulting artifact during build. If none of the profiles is selected, the default
search is used.

Note that in case of GraphDB, corresponding Lucene connectors (`label_index` for labels and `defcom_index` for
definitions and comments)
have to be created as well.
definitions and comments) have to be created as well.

### RDFS Inference in Tests

The test in-memory repository is configured to be a SPIN SAIL with RDFS inferencing engine. Thus, basically all the
inference features available in production are available in tests as well. However, the repository is by default left
empty (without the model or SPIN rules) to facilitate test performance (inference in RDF4J is really slow). To load the
The test in-memory repository is configured to be an RDF4J SAIL with an RDFS inferencing engine. The repository is by default left
empty (without the model) to facilitate test performance (inference in RDF4J is really slow). To load the
TermIt model into the repository and thus enable RDFS inference, call the `enableRdfsInference`
method available on both `BaseDaoTestRunner` and `BaseServiceTestRunner`. SPIN rules are currently not loaded as they
don't seem to be used by any tests.
method available on both `BaseDaoTestRunner` and `BaseServiceTestRunner`.
60 changes: 17 additions & 43 deletions doc/setup.md
@@ -6,7 +6,7 @@ This guide provides information on how to build and deploy TermIt.

### System Requirements

* JDK 11 or newer (tested up to JDK 11 LTS)
* JDK 17 or newer
* Apache Maven 3.5.x or newer


@@ -16,13 +16,11 @@ This guide provides information on how to build and deploy TermIt.

To build TermIt for **non**-development deployment, use Maven and select the `production` profile.

In addition, full text search in TermIt supports three modes:
In addition, full text search in TermIt supports two modes:
1. Default label-based substring matching
2. RDF4J repository with Lucene index
3. GraphDB repository with Lucene index
2. GraphDB repository with Lucene indexes

Options 2. and 3. have their respective Maven profiles - `rdf4j` and `graphdb`. Select one of them
or let the system use the default one.
Option 2. has its respective Maven profile - `graphdb`.

Moreover, TermIt can be packaged either as an executable JAR (using Spring Boot) or as a WAR that can be deployed in any Servlet API 4-compatible application server.
Maven profiles `standalone` (active by default) and `war` can be used to activate them respectively.
@@ -40,9 +38,9 @@ There is one parameter not used by the application itself, but by Spring - `spri
by the application:
* `lucene` - decides whether Lucene text indexing is enabled and should be used in full text search queries.
* `admin-registration-only` - decides whether new users can be registered only by application admin, or whether anyone can register.
* `no-cache` - disables EhCache which is used to cache lists of resources and vocabularies for faster retrieval.
* `no-cache` - disables Ehcache, which is used to cache lists of resources and vocabularies for faster retrieval, and persistence cache.

The `lucene` Spring profile is activated automatically by the `rdf4j` and `graphdb` Maven profiles. `admin-registration-only` and `no-cache` have to be added
The `lucene` Spring profile is activated automatically by the `graphdb` Maven profile. `admin-registration-only` and `no-cache` have to be added
either in `application.yml` directly, or one can pass the parameter to Maven build, e.g.:

* `mvn clean package -P graphdb "-Dspring.profiles.active=lucene,admin-registration-only"`
@@ -51,7 +49,7 @@ either in `application.yml` directly, or one can pass the parameter to Maven bui
#### Example

* `mvn clean package -B -P production,graphdb "-Ddeployment=DEV"`
* `clean package -B -P production,rdf4j,war "-Ddeployment=STAGE"`
* `mvn clean package -B -P production,graphdb,war "-Ddeployment=STAGE"`

The `deployment` parameter is used to parameterize log messages and JMX beans and is important in case multiple deployments
of TermIt are running in the same Tomcat.
@@ -74,20 +72,17 @@ or configure it permanently by setting the `MAVEN_OPTS` variable in System Setti

### System Requirements

* JDK 11 or later (tested with JDK 11)
* (WAR) Apache Tomcat 8.5 or 9.x (recommended) or any Servlet API 4-compatible application server
* JDK 17 or later
* (WAR) Apache Tomcat 10 or any Jakarta Servlet API 4-compatible application server
* _For deployment of a WAR build artifact._
* Do not use Apache Tomcat 10.x, it is based on the new Jakarta EE and TermIt would not work on it due to package namespace issues (`javax` -> `jakarta`)
* Do not use Apache Tomcat 9.x or older, it is based on the old Java EE and TermIt would not work on it due to package namespace issues (`javax` -> `jakarta`)

### Setup

Application deployment is simple - just deploy the WAR file (in case of the `war` Maven build profile) to an
application server or run the JAR file (in case of the `standalone` Maven build profile).

What is important is the correct setup of the repository. We will describe two options:

1. GraphDB
2. RDF4J
What is important is the correct setup of the repository.

#### GraphDB

@@ -99,16 +94,16 @@ In order to support inference used by the application, a custom ruleset has to b
4. Create the following Lucene connectors in GraphDB:
* *Label index*
* name: **label_index**
* Field name: **label**, **title**
* Property chain: **http://www.w3.org/2000/01/rdf-schema#label**, **http://purl.org/dc/terms/title**
* Field names: **prefLabel**, **altLabel**, **hiddenLabel**, **title**
* Property chains: **http://www.w3.org/2004/02/skos/core#prefLabel**, **http://www.w3.org/2004/02/skos/core#altLabel**, **http://www.w3.org/2004/02/skos/core#hiddenLabel**, **http://purl.org/dc/terms/title**
* Languages: _Leave empty (for indexing all languages) or specify the language tag - see below_
* Types: **http://www.w3.org/2004/02/skos/core#Concept**, **http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/pojem/slovník**
* Analyzer: Analyzer appropriate for the system language, e.g. **org.apache.lucene.analysis.cz.CzechAnalyzer**
* *Definition and comment index*
* name: **defcom_index**
* Field name: **definition**, **comment**, **description**
* Field name: **definition**, **scopeNote**, **description**
* Languages: _Leave empty (for indexing all languages) or specify the language tag - see below_
* Property chain: **http://www.w3.org/2004/02/skos/core#definition**, **http://www.w3.org/2000/01/rdf-schema#comment**, **http://purl.org/dc/terms/description**
* Property chain: **http://www.w3.org/2004/02/skos/core#definition**, **http://www.w3.org/2004/02/skos/core#scopeNote**, **http://purl.org/dc/terms/description**
* Types and Analyzer as above

Language can be set for each connector. This is useful in case the data contain labels, definitions, and comments in multiple languages. In this case,
@@ -117,34 +112,13 @@ there is a term with label `území`@cs and `area`@en. Now, if no language is sp
look as follows: `<em>území</em> area`, which may not be desired. If the connector language is set to `cs`, the result snippet will contain
only `<em>území</em>`. See the [documentation](http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html) for more details.

#### RDF4J

In order to support the inference used by the application, new rules need to be added to RDF4J because its own RDFS rule engine does not
support OWL stuff like inverse properties (which are used in the model).

For RDF4J 2.x:
1. Start by creating an RDF4J repository of type **RDFS+SPIN with Lucene support**
2. Upload SPIN rules from `rulesets/rules-termit-spin.ttl` into the repository
3. There is no need to configure Lucene connectors, it by default indexes all properties in RDF4J (alternatively, it is possible
to upload a repository configuration directly into the system repository - see examples at [[1]](https://github.com/eclipse/rdf4j/tree/master/core/repository/api/src/main/resources/org/eclipse/rdf4j/repository/config)
4. -----

For RDF4J 3.x:
1. Start by creating an RDF4J repository with RDFS and SPIN inference and Lucene support
* Copy repository configuration into the appropriate directory, as described at [[2]](https://rdf4j.eclipse.org/documentation/server-workbench-console/#repository-configuration)
* Native store with RDFS+SPIN and Lucene sample configuration is at [[3]](https://github.com/eclipse/rdf4j/blob/master/core/repository/api/src/main/resources/org/eclipse/rdf4j/repository/config/native-spin-rdfs-lucene.ttl)
2. Upload SPIN rules from `rulesets/rules-termit-spin.ttl` into the repository
3. There is no need to configure Lucene connectors, it by default indexes all properties in RDF4J
4. -----

#### Common

TermIt needs the repository to provide some inference. Beside loading the appropriate rulesets (see above), it is also
necessary to load the ontological models into the repository.

5. Upload the following RDF files into the newly created repository:
* `ontology/termit-glosář.ttl`
* `ontology/termit-model.ttl`
* `ontology/sioc-ns.rdf`
* `http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/model`
* `http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/glosář`
* `https://www.w3.org/TR/skos-reference/skos.rdf`
@@ -203,4 +177,4 @@ TERMIT_SECURITY_PROVIDER=oidc
TermIt will automatically configure its security accordingly
(it is using Spring's [`ConditionalOnProperty`](https://www.baeldung.com/spring-conditionalonproperty)).

**Note that termit-ui needs to be configured for mathcing authentication mode.**
**Note that termit-ui needs to be configured for matching authentication mode.**
9 changes: 9 additions & 0 deletions ontology/termit-glosář.ttl
@@ -640,3 +640,12 @@ termit-pojem:požadavek-na-změnu-hesla
termit:glosář ;
<http://www.w3.org/2004/02/skos/core#prefLabel>
"Password reset request"@en , "Požadavek na změnu hesla"@cs .

termit-pojem:má-adresu-modelovacího-nástroje
a <http://www.w3.org/2004/02/skos/core#Concept> ;
<http://www.w3.org/2004/02/skos/core#broader>
<https://slovník.gov.cz/základní/pojem/vlastnost> , <https://slovník.gov.cz/základní/pojem/typ-vlastnosti> ;
<http://www.w3.org/2004/02/skos/core#inScheme>
termit:glosář ;
<http://www.w3.org/2004/02/skos/core#prefLabel>
"Has modeling tool address"@en , "Má adresu modelovacího nástroje"@cs .
4 changes: 4 additions & 0 deletions ontology/termit-model.ttl
@@ -337,4 +337,8 @@ termit-pojem:koncový-stav-pojmu
termit-pojem:požadavek-na-změnu-hesla
a <https://slovník.gov.cz/základní/pojem/typ-objektu>, owl:Class .

termit-pojem:má-adresu-modelovacího-nástroje
a owl:AnnotationProperty , <https://slovník.gov.cz/základní/pojem/typ-vlastnosti> ;
rdfs:subPropertyOf <https://slovník.gov.cz/základní/pojem/vlastnost> .


22 changes: 11 additions & 11 deletions pom.xml
@@ -7,11 +7,11 @@
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.3</version>
<version>3.3.4</version>
</parent>

<artifactId>termit</artifactId>
<version>3.2.0</version>
<version>3.3.0</version>
<name>TermIt</name>
<description>Terminology manager based on Semantic Web technologies.</description>
<packaging>${packaging}</packaging>
@@ -28,10 +28,10 @@

<properties>
<java.version>17</java.version>
<org.apache.tika.tika-core.version>2.7.0</org.apache.tika.tika-core.version>
<org.mapstruct.version>1.6.0</org.mapstruct.version>
<org.apache.tika.tika-core.version>3.0.0</org.apache.tika.tika-core.version>
<org.mapstruct.version>1.6.2</org.mapstruct.version>
<org.springdoc.version>2.6.0</org.springdoc.version>
<cz.cvut.kbss.jopa.version>2.0.5</cz.cvut.kbss.jopa.version>
<cz.cvut.kbss.jopa.version>2.1.0</cz.cvut.kbss.jopa.version>
<cz.cvut.kbss.jsonld.version>0.15.0</cz.cvut.kbss.jsonld.version>

<!-- Default value for deployment type property which should otherwise specified on command line -->
@@ -119,7 +119,7 @@
<dependency>
<groupId>com.github.ledsoft</groupId>
<artifactId>jopa-spring-transaction</artifactId>
<version>0.3.0</version>
<version>0.3.1</version>
</dependency>

<!-- Spring -->
@@ -249,14 +249,14 @@
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.15.4</version>
<version>1.18.1</version>
</dependency>

<!-- Flexmark for handling Markdown -->
<dependency>
<groupId>com.vladsch.flexmark</groupId>
<artifactId>flexmark-all</artifactId>
<version>0.64.6</version>
<version>0.64.8</version>
</dependency>

<!-- Logging -->
@@ -273,7 +273,7 @@
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>5.2.2</version>
<version>5.3.0</version>
</dependency>

<!-- I18n - language tags and languages -->
@@ -287,7 +287,7 @@
<dependency>
<groupId>org.apache.velocity</groupId>
<artifactId>velocity-engine-core</artifactId>
<version>2.3</version>
<version>2.4</version>
</dependency>

<!-- Java Melody Monitoring -->
@@ -394,7 +394,7 @@
</build>
</profile>

<!-- Profiles for storages. Important for correct full text search functionality -->
<!-- Profile for GraphDB storage with Lucene connectors. Important for correct full text search functionality -->
<profile>
<id>graphdb</id>
<properties>
22 changes: 17 additions & 5 deletions profile/graphdb/query/fulltextsearch.rq
@@ -7,8 +7,9 @@ PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?entity ?label ?vocabularyUri ?state ?type ?snippetField ?snippetText ?score {
SELECT DISTINCT ?entity ?label ?description ?vocabularyUri ?state ?type ?snippetField ?snippetText ?score {
{
?search a inst:label_index .
}
@@ -17,12 +18,21 @@ SELECT DISTINCT ?entity ?label ?vocabularyUri ?state ?type ?snippetField ?snippe
?search a inst:defcom_index .
}
{
?entity rdfs:label ?label .
?entity skos:prefLabel ?label .
OPTIONAL {
?entity skos:definition ?definition .
}
OPTIONAL {
?entity skos:scopeNote ?scopeNote .
}
} UNION {
?entity dc:title ?label .
OPTIONAL {
?entity dc:description ?dcDescription .
}
}
?search :query ?wildCardSearchString ;
:snippetSize 2000 ;
:snippetSize 250 ;
:entities ?entity .
?entity a ?type ;
:score ?initScore ;
@@ -38,7 +48,9 @@ SELECT DISTINCT ?entity ?label ?vocabularyUri ?state ?type ?snippetField ?snippe
FILTER (?type = ?term || ?type = ?vocabulary)
FILTER NOT EXISTS { ?entity a ?snapshot . }
FILTER (lang(?label) = ?langTag)
BIND(IF(lcase(str(?snippetText)) = lcase(str(?splitExactMatch)), ?initScore * 2, IF(CONTAINS(lcase(str(?snippetText)), ?searchString), IF(?snippetField = "label", ?initScore * 1.5, ?initScore), ?initScore)) as ?exactMatchScore)
BIND(IF(?snippetField = "label", ?exactMatchScore * 2, IF(?snippetField = "definition", ?exactMatchScore * 1.2, ?exactMatchScore)) as ?score)
BIND(COALESCE(?definition, COALESCE(?scopeNote, ?dcDescription)) AS ?description)
FILTER (!BOUND(?description) || lang(?description) = ?langTag)
BIND(IF(lcase(str(?snippetText)) = lcase(str(?splitExactMatch)), ?initScore * 2, IF(CONTAINS(lcase(str(?snippetText)), ?searchString), IF(?snippetField = "prefLabel", ?initScore * 1.5, ?initScore), ?initScore)) as ?exactMatchScore)
BIND(IF(?snippetField = "prefLabel", ?exactMatchScore * 2, IF(?snippetField = "definition", ?exactMatchScore * 1.2, ?exactMatchScore)) as ?score)
}
ORDER BY desc(?score)
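
The two scoring `BIND` expressions in the updated query can be restated in plain Java to make the boost factors easier to follow. This is only an illustrative sketch and not code from TermIt: the class `SearchScoreSketch` and its method names are hypothetical, and it assumes that `searchString` is passed in already lower-cased, since the query compares it against a lower-cased snippet.

```java
import java.util.Locale;

// Hypothetical helper mirroring the score computation of the SPARQL query; not part of TermIt.
class SearchScoreSketch {

    // Mirrors the first BIND: doubles the score when the snippet equals the exact-match string,
    // and multiplies it by 1.5 when a prefLabel snippet merely contains the search string.
    public static double exactMatchScore(String snippetText, String snippetField,
                                         String splitExactMatch, String searchString,
                                         double initScore) {
        String text = snippetText.toLowerCase(Locale.ROOT);
        if (text.equals(splitExactMatch.toLowerCase(Locale.ROOT))) {
            return initScore * 2;
        }
        // Assumption: searchString is already lower-cased by the caller.
        if (text.contains(searchString) && "prefLabel".equals(snippetField)) {
            return initScore * 1.5;
        }
        return initScore;
    }

    // Mirrors the second BIND: prefLabel hits are boosted by 2, definition hits by 1.2.
    public static double score(String snippetText, String snippetField,
                               String splitExactMatch, String searchString,
                               double initScore) {
        double exact = exactMatchScore(snippetText, snippetField, splitExactMatch,
                                       searchString, initScore);
        if ("prefLabel".equals(snippetField)) {
            return exact * 2;
        }
        if ("definition".equals(snippetField)) {
            return exact * 1.2;
        }
        return exact;
    }

    public static void main(String[] args) {
        // An exact prefLabel match receives both boosts.
        System.out.println(score("Area", "prefLabel", "area", "area", 1.0));
        // A definition that only contains the search string gets the definition boost alone.
        System.out.println(score("an area of land", "definition", "area", "area", 1.0));
    }
}
```

Under these assumptions, a match in `prefLabel` always outranks a match in `definition` or any other indexed field with the same initial Lucene score, which appears to be the reason the field name checked in the query changed from `label` to `prefLabel` along with the connector configuration.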
11 changes: 11 additions & 0 deletions src/main/java/cz/cvut/kbss/termit/dto/ConfigurationDto.java
@@ -53,6 +53,9 @@ public class ConfigurationDto implements Serializable {
@OWLDataProperty(iri = Vocabulary.s_p_ma_oddelovac_verze)
private String versionSeparator;

@OWLAnnotationProperty(iri = Vocabulary.s_p_ma_adresu_modelovaciho_nastroje)
private String modelingToolUrl;

public String getLanguage() {
return language;
}
@@ -92,4 +95,12 @@ public String getVersionSeparator() {
public void setVersionSeparator(String versionSeparator) {
this.versionSeparator = versionSeparator;
}

public String getModelingToolUrl() {
return modelingToolUrl;
}

public void setModelingToolUrl(String modelingToolUrl) {
this.modelingToolUrl = modelingToolUrl;
}
}