Merge pull request #310 from kbss-cvut/development

[3.3.0] Release
kbss-cvut · Nov 8, 2024 · c2f32e0
2 parents 22ffaed + 5352045
commit c2f32e0
Showing 36 changed files with 394 additions and 180 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,5 @@
+* text=auto
+*.java text diff=java
+*.png binary
+*.jpg binary
+*.xlsx binary
diff --git a/README.md b/README.md
@@ -31,29 +31,29 @@ See the [docs folder](doc/index.md) for additional information on implementation
 This section briefly lists the main technologies and principles used (or planned to be used) in the application.
 
 - Spring Boot 3, Spring Framework 6, Spring Security, Spring Data (paging, filtering)
-- Jackson 2.13
+- Jackson Databind
 - [JB4JSON-LD](https://github.com/kbss-cvut/jb4jsonld-jackson) - Java - JSON-LD (de)serialization library
 - [JOPA](https://github.com/kbss-cvut/jopa) - persistence library for the Semantic Web
-- JUnit 5 (RT used 4), Mockito 4 (RT used 1), Hamcrest 2 (RT used 1)
-- Servlet API 4 (RT used 3.0.1)
-- JSON Web Tokens (CSRF protection not necessary for JWT)
+- JUnit 5, Mockito 4, Hamcrest 2
+- Jakarta Servlet API 4
+- JSON Web Tokens
 - SLF4J + Logback
 - CORS (for separate frontend)
 - Java bean validation (JSR 380)
 
 
-## Ontology
+## Ontologies
 
-The ontology on which TermIt is based can be found in the `ontology` folder. For proper inference
-functionality, `termit-model.ttl`, the
-_popis-dat_ ontology model (http://onto.fel.cvut.cz/ontologies/slovnik/agendovy/popis-dat/model) and the SKOS vocabulary
-model
-(http://www.w3.org/TR/skos-reference/skos.rdf) need to be loaded into the repository used by TermIt (see `doc/setup.md`)
-for details.
+The ontology on which TermIt is based can be found in the `ontology` folder. It extends the
+_popis-dat_ ontology (http://onto.fel.cvut.cz/ontologies/slovnik/agendovy/popis-dat). TermIt vocabularies and terms
+use the SKOS vocabulary (http://www.w3.org/TR/skos-reference/skos.rdf).
+
+Relevant ontologies need to be loaded into the repository for proper inference functionality. See [setup.md](doc/setup.md)
+for more details.
 
 ## Monitoring
 
-We use [JavaMelody](https://github.com/javamelody/javamelody) for monitoring the application and its usage. The data are
+[JavaMelody](https://github.com/javamelody/javamelody) can be used for monitoring the application and its usage. The data are
 available on the `/monitoring` endpoint and are secured using _basic_ authentication. Credentials are configured using
 the `javamelody.init-parameters.authorized-users`
 parameter in `application.yml` (see

diff --git a/doc/implementation.md b/doc/implementation.md
@@ -43,23 +43,19 @@ follows:
 Fulltext search currently supports multiple types of implementation:
 
 * Simple substring matching on term and vocabulary label _(default)_
-* RDF4J with Lucene SAIL
 * GraphDB with Lucene connector
 
 Each implementation has its own search query which is loaded and used by `SearchDao`. In order for the more advanced
-implementations for Lucene to work, a corresponding Maven profile (**graphdb**, **rdf4j**) has to be selected. This
+implementation for Lucene to work, a corresponding Maven profile (**graphdb**) has to be selected. This
 inserts the correct query into the resulting artifact during build. If none of the profiles is selected, the default
 search is used.
 
 Note that in case of GraphDB, corresponding Lucene connectors (`label_index` for labels and `defcom_index` for
-definitions and comments)
-have to be created as well.
+definitions and comments) have to be created as well.
 
 ### RDFS Inference in Tests
 
-The test in-memory repository is configured to be a SPIN SAIL with RDFS inferencing engine. Thus, basically all the
-inference features available in production are available in tests as well. However, the repository is by default left
-empty (without the model or SPIN rules) to facilitate test performance (inference in RDF4J is really slow). To load the
+The test in-memory repository is configured to be an RDF4J SAIL with an RDFS inferencing engine. The repository is by default left
+empty (without the model) to facilitate test performance (inference in RDF4J is really slow). To load the
 TermIt model into the repository and thus enable RDFS inference, call the `enableRdfsInference`
-method available on both `BaseDaoTestRunner` and `BaseServiceTestRunner`. SPIN rules are currently not loaded as they
-don't seem to be used by any tests.
+method available on both `BaseDaoTestRunner` and `BaseServiceTestRunner`.
diff --git a/doc/setup.md b/doc/setup.md
@@ -6,7 +6,7 @@ This guide provides information on how to build and deploy TermIt.
 
 ### System Requirements
 
-* JDK 11 or newer (tested up to JDK 11 LTS)
+* JDK 17 or newer
 * Apache Maven 3.5.x or newer
 
 
@@ -16,13 +16,11 @@ This guide provides information on how to build and deploy TermIt.
 
 To build TermIt for **non**-development deployment, use Maven and select the `production` profile.
 
-In addition, full text search in TermIt supports three modes:
+In addition, full text search in TermIt supports two modes:
 1. Default label-based substring matching
-2. RDF4J repository with Lucene index
-3. GraphDB repository with Lucene index
+2. GraphDB repository with Lucene indexes
 
-Options 2. and 3. have their respective Maven profiles - `rdf4j` and `graphdb`. Select one of them
-or let the system use the default one.
+Option 2. has its respective Maven profile - `graphdb`.
 
 Moreover, TermIt can be packaged either as an executable JAR (using Spring Boot) or as a WAR that can be deployed in any Servlet API 4-compatible application server.
 Maven profiles `standalone` (active by default) and `war` can be used to activate them respectively.
@@ -40,9 +38,9 @@ There is one parameter not used by the application itself, but by Spring - `spri
 by the application:
 * `lucene` - decides whether Lucene text indexing is enabled and should be used in full text search queries.
 * `admin-registration-only` - decides whether new users can be registered only by application admin, or whether anyone can register.
-* `no-cache` - disables EhCache which is used to cache lists of resources and vocabularies for faster retrieval.
+* `no-cache` - disables both Ehcache, which is used to cache lists of resources and vocabularies for faster retrieval, and the persistence cache.
 
-The `lucene` Spring profile is activated automatically by the `rdf4j` and `graphdb` Maven profiles. `admin-registration-only` and `no-cache` have to be added
+The `lucene` Spring profile is activated automatically by the `graphdb` Maven profile. `admin-registration-only` and `no-cache` have to be added
 either in `application.yml` directly, or one can pass the parameter to Maven build, e.g.:
 
 * `mvn clean package -P graphdb "-Dspring.profiles.active=lucene,admin-registration-only"`
@@ -51,7 +49,7 @@ either in `application.yml` directly, or one can pass the parameter to Maven bui
 #### Example
 
 * `mvn clean package -B -P production,graphdb "-Ddeployment=DEV"`
-* `clean package -B -P production,rdf4j,war "-Ddeployment=STAGE"`
+* `mvn clean package -B -P production,graphdb,war "-Ddeployment=STAGE"`
 
 The `deployment` parameter is used to parameterize log messages and JMX beans and is important in case multiple deployments
 of TermIt are running in the same Tomcat.
@@ -74,20 +72,17 @@ or configure it permanently by setting the `MAVEN_OPTS` variable in System Setti
 
 ### System Requirements
 
-* JDK 11 or later (tested with JDK 11)
-* (WAR) Apache Tomcat 8.5 or 9.x (recommended) or any Servlet API 4-compatible application server
+* JDK 17 or later
+* (WAR) Apache Tomcat 10 or any Jakarta Servlet API 4-compatible application server
   * _For deployment of a WAR build artifact._
-  * Do not use Apache Tomcat 10.x, it is based on the new Jakarta EE and TermIt would not work on it due to package namespace issues (`javax` -> `jakarta`)
+  * Do not use Apache Tomcat 9.x or older. It is based on the old Java EE, and TermIt would not work on it due to the package namespace change (`javax` -> `jakarta`).
 
 ### Setup
 
 Application deployment is simple - just deploy the WAR file (in case of the `war` Maven build profile) to an 
 application server or run the JAR file (in case of the `standalone` Maven build profile).
 
-What is important is the correct setup of the repository. We will describe two options:
-
-1. GraphDB
-2. RDF4J
+What is important is the correct setup of the repository.
 
 #### GraphDB
 
@@ -99,16 +94,16 @@ In order to support inference used by the application, a custom ruleset has to b
 4. Create the following Lucene connectors in GraphDB:
     * *Label index*
         * name: **label_index**
-        * Field name: **label**, **title** 
-        * Property chain: **http://www.w3.org/2000/01/rdf-schema#label**, **http://purl.org/dc/terms/title**
+        * Field names: **prefLabel**, **altLabel**, **hiddenLabel**, **title** 
+        * Property chains: **http://www.w3.org/2004/02/skos/core#prefLabel**, **http://www.w3.org/2004/02/skos/core#altLabel**, **http://www.w3.org/2004/02/skos/core#hiddenLabel**, **http://purl.org/dc/terms/title**
         * Languages: _Leave empty (for indexing all languages) or specify the language tag - see below_
         * Types: **http://www.w3.org/2004/02/skos/core#Concept**, **http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/pojem/slovník**
         * Analyzer: Analyzer appropriate for the system language, e.g. **org.apache.lucene.analysis.cz.CzechAnalyzer**
     * *Definition and comment index*
         * name: **defcom_index**
-        * Field name: **definition**, **comment**, **description**
+        * Field names: **definition**, **scopeNote**, **description**
         * Languages: _Leave empty (for indexing all languages) or specify the language tag - see below_
-        * Property chain: **http://www.w3.org/2004/02/skos/core#definition**, **http://www.w3.org/2000/01/rdf-schema#comment**, **http://purl.org/dc/terms/description**
+        * Property chain: **http://www.w3.org/2004/02/skos/core#definition**, **http://www.w3.org/2004/02/skos/core#scopeNote**, **http://purl.org/dc/terms/description**
         * Types and Analyzer as above
 
 Language can be set for each connector. This is useful in case the data contain labels, definitions, and comments in multiple languages. In this case,
@@ -117,34 +112,13 @@ there is a term with label `území`@cs and `area`@en. Now, if no language is sp
 look as follows: `<em>území</em> area`, which may not be desired. If the connector language is set to `cs`, the result snippet will contain
 only `<em>území</em>`. See the [documentation](http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html) for more details.
 
-#### RDF4J
-
-In order to support the inference used by the application, new rules need to be added to RDF4J because its own RDFS rule engine does not
-support OWL stuff like inverse properties (which are used in the model).
-
-For RDF4J 2.x: 
-1. Start by creating an RDF4J repository of type **RDFS+SPIN with Lucene support**
-2. Upload SPIN rules from `rulesets/rules-termit-spin.ttl` into the repository
-3. There is no need to configure Lucene connectors, it by default indexes all properties in RDF4J (alternatively, it is possible
-to upload a repository configuration directly into the system repository - see examples at [[1]](https://github.com/eclipse/rdf4j/tree/master/core/repository/api/src/main/resources/org/eclipse/rdf4j/repository/config)
-4. -----
-
-For RDF4J 3.x: 
-1. Start by creating an RDF4J repository with RDFS and SPIN inference and Lucene support
-    * Copy repository configuration into the appropriate directory, as described at [[2]](https://rdf4j.eclipse.org/documentation/server-workbench-console/#repository-configuration)
-    * Native store with RDFS+SPIN and Lucene sample configuration is at [[3]](https://github.com/eclipse/rdf4j/blob/master/core/repository/api/src/main/resources/org/eclipse/rdf4j/repository/config/native-spin-rdfs-lucene.ttl)
-2. Upload SPIN rules from `rulesets/rules-termit-spin.ttl` into the repository
-3. There is no need to configure Lucene connectors, it by default indexes all properties in RDF4J
-4. -----
-
-#### Common
-
 TermIt needs the repository to provide some inference. Beside loading the appropriate rulesets (see above), it is also
 necessary to load the ontological models into the repository.
 
 5. Upload the following RDF files into the newly created repository:
     * `ontology/termit-glosář.ttl`
     * `ontology/termit-model.ttl`
+    * `ontology/sioc-ns.rdf`
     * `http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/model`
     * `http://onto.fel.cvut.cz/ontologies/slovník/agendový/popis-dat/glosář`
     * `https://www.w3.org/TR/skos-reference/skos.rdf`
@@ -203,4 +177,4 @@ TERMIT_SECURITY_PROVIDER=oidc
 TermIt will automatically configure its security accordingly
 (it is using Spring's [`ConditionalOnProperty`](https://www.baeldung.com/spring-conditionalonproperty)).
 
-**Note that termit-ui needs to be configured for mathcing authentication mode.**
+**Note that termit-ui needs to be configured for matching authentication mode.**
diff --git a/ontology/termit-glosář.ttl b/ontology/termit-glosář.ttl
@@ -640,3 +640,12 @@ termit-pojem:požadavek-na-změnu-hesla
             termit:glosář ;
     <http://www.w3.org/2004/02/skos/core#prefLabel>
             "Password reset request"@en , "Požadavek na změnu hesla"@cs .
+
+termit-pojem:má-adresu-modelovacího-nástroje
+    a       <http://www.w3.org/2004/02/skos/core#Concept> ;
+    <http://www.w3.org/2004/02/skos/core#broader>
+            <https://slovník.gov.cz/základní/pojem/vlastnost> , <https://slovník.gov.cz/základní/pojem/typ-vlastnosti> ;
+    <http://www.w3.org/2004/02/skos/core#inScheme>
+            termit:glosář ;
+    <http://www.w3.org/2004/02/skos/core#prefLabel>
+            "Has modeling tool address"@en , "Má adresu modelovacího nástroje"@cs .
diff --git a/ontology/termit-model.ttl b/ontology/termit-model.ttl
@@ -337,4 +337,8 @@ termit-pojem:koncový-stav-pojmu
 termit-pojem:požadavek-na-změnu-hesla
     a                <https://slovník.gov.cz/základní/pojem/typ-objektu>, owl:Class .
 
+termit-pojem:má-adresu-modelovacího-nástroje
+        a                   owl:AnnotationProperty , <https://slovník.gov.cz/základní/pojem/typ-vlastnosti> ;
+        rdfs:subPropertyOf  <https://slovník.gov.cz/základní/pojem/vlastnost> .
+
 
diff --git a/pom.xml b/pom.xml
@@ -7,11 +7,11 @@
     <parent>
         <groupId>org.springframework.boot</groupId>
         <artifactId>spring-boot-starter-parent</artifactId>
-        <version>3.3.3</version>
+        <version>3.3.4</version>
     </parent>
 
     <artifactId>termit</artifactId>
-    <version>3.2.0</version>
+    <version>3.3.0</version>
     <name>TermIt</name>
     <description>Terminology manager based on Semantic Web technologies.</description>
     <packaging>${packaging}</packaging>
@@ -28,10 +28,10 @@
 
     <properties>
         <java.version>17</java.version>
-        <org.apache.tika.tika-core.version>2.7.0</org.apache.tika.tika-core.version>
-        <org.mapstruct.version>1.6.0</org.mapstruct.version>
+        <org.apache.tika.tika-core.version>3.0.0</org.apache.tika.tika-core.version>
+        <org.mapstruct.version>1.6.2</org.mapstruct.version>
         <org.springdoc.version>2.6.0</org.springdoc.version>
-        <cz.cvut.kbss.jopa.version>2.0.5</cz.cvut.kbss.jopa.version>
+        <cz.cvut.kbss.jopa.version>2.1.0</cz.cvut.kbss.jopa.version>
         <cz.cvut.kbss.jsonld.version>0.15.0</cz.cvut.kbss.jsonld.version>
 
         <!-- Default value for deployment type property which should otherwise specified on command line -->
@@ -119,7 +119,7 @@
         <dependency>
             <groupId>com.github.ledsoft</groupId>
             <artifactId>jopa-spring-transaction</artifactId>
-            <version>0.3.0</version>
+            <version>0.3.1</version>
         </dependency>
 
         <!-- Spring -->
@@ -249,14 +249,14 @@
         <dependency>
             <groupId>org.jsoup</groupId>
             <artifactId>jsoup</artifactId>
-            <version>1.15.4</version>
+            <version>1.18.1</version>
         </dependency>
 
         <!-- Flexmark for handling Markdown -->
         <dependency>
             <groupId>com.vladsch.flexmark</groupId>
             <artifactId>flexmark-all</artifactId>
-            <version>0.64.6</version>
+            <version>0.64.8</version>
         </dependency>
 
         <!-- Logging -->
@@ -273,7 +273,7 @@
         <dependency>
             <groupId>org.apache.poi</groupId>
             <artifactId>poi-ooxml</artifactId>
-            <version>5.2.2</version>
+            <version>5.3.0</version>
         </dependency>
 
         <!-- I18n - language tags and languages -->
@@ -287,7 +287,7 @@
         <dependency>
             <groupId>org.apache.velocity</groupId>
             <artifactId>velocity-engine-core</artifactId>
-            <version>2.3</version>
+            <version>2.4</version>
         </dependency>
 
         <!-- Java Melody Monitoring -->
@@ -394,7 +394,7 @@
             </build>
         </profile>
 
-        <!-- Profiles for storages. Important for correct full text search functionality -->
+        <!-- Profile for GraphDB storage with Lucene connectors. Important for correct full text search functionality -->
         <profile>
             <id>graphdb</id>
             <properties>

diff --git a/profile/graphdb/query/fulltextsearch.rq b/profile/graphdb/query/fulltextsearch.rq
@@ -7,8 +7,9 @@ PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX dc: <http://purl.org/dc/terms/>
+PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
 
-SELECT DISTINCT ?entity ?label ?vocabularyUri ?state ?type ?snippetField ?snippetText ?score {
+SELECT DISTINCT ?entity ?label ?description ?vocabularyUri ?state ?type ?snippetField ?snippetText ?score {
   {
     ?search a inst:label_index .
   }
@@ -17,12 +18,21 @@ SELECT DISTINCT ?entity ?label ?vocabularyUri ?state ?type ?snippetField ?snippe
     ?search a inst:defcom_index .
   }
   {
-    ?entity rdfs:label ?label .
+    ?entity skos:prefLabel ?label .
+    OPTIONAL {
+        ?entity skos:definition ?definition .
+    }
+    OPTIONAL {
+        ?entity skos:scopeNote ?scopeNote .
+    }
   } UNION {
     ?entity dc:title ?label .
+    OPTIONAL {
+        ?entity dc:description ?dcDescription .
+    }
   }
     ?search :query ?wildCardSearchString ;
-          :snippetSize 2000 ;
+          :snippetSize 250 ;
           :entities ?entity .
   ?entity a ?type ;
           :score ?initScore ;
@@ -38,7 +48,9 @@ SELECT DISTINCT ?entity ?label ?vocabularyUri ?state ?type ?snippetField ?snippe
     FILTER (?type = ?term || ?type = ?vocabulary)
     FILTER NOT EXISTS { ?entity a ?snapshot . }
     FILTER (lang(?label) = ?langTag)
-    BIND(IF(lcase(str(?snippetText)) = lcase(str(?splitExactMatch)), ?initScore * 2, IF(CONTAINS(lcase(str(?snippetText)), ?searchString), IF(?snippetField = "label", ?initScore * 1.5, ?initScore), ?initScore)) as ?exactMatchScore)
-    BIND(IF(?snippetField = "label", ?exactMatchScore * 2, IF(?snippetField = "definition", ?exactMatchScore * 1.2, ?exactMatchScore)) as ?score)
+    BIND(COALESCE(?definition, ?scopeNote, ?dcDescription) AS ?description)
+    FILTER (!BOUND(?description) || lang(?description) = ?langTag)
+    BIND(IF(lcase(str(?snippetText)) = lcase(str(?splitExactMatch)), ?initScore * 2, IF(CONTAINS(lcase(str(?snippetText)), ?searchString), IF(?snippetField = "prefLabel", ?initScore * 1.5, ?initScore), ?initScore)) as ?exactMatchScore)
+    BIND(IF(?snippetField = "prefLabel", ?exactMatchScore * 2, IF(?snippetField = "definition", ?exactMatchScore * 1.2, ?exactMatchScore)) as ?score)
 }
 ORDER BY desc(?score)
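
Reviewer note: the two score `BIND` expressions and the `COALESCE` for `?description` in the query above are dense, so here is a minimal Java sketch of the same decision logic, meant only as a reading aid. The class and method names (`SearchScoreSketch`, `exactMatchScore`, `score`, `description`) are hypothetical and not part of TermIt. `splitExactMatch` and `searchString` are taken as plain parameters because the diff does not show how the query binds them. `toLowerCase(Locale.ROOT)` only approximates SPARQL `lcase`, and `null` stands in for an unbound variable.

```java
import java.util.Locale;

// Hypothetical sketch mirroring the BIND expressions of fulltextsearch.rq; not TermIt code.
public class SearchScoreSketch {

    // First BIND: double the score on an exact match of the snippet, boost it by 1.5
    // when a prefLabel snippet merely contains the search string, else keep it unchanged.
    static double exactMatchScore(String snippetText, String snippetField,
                                  String splitExactMatch, String searchString,
                                  double initScore) {
        String snippet = snippetText.toLowerCase(Locale.ROOT);
        if (snippet.equals(splitExactMatch.toLowerCase(Locale.ROOT))) {
            return initScore * 2;
        }
        // The query does not lower-case ?searchString, so neither does this sketch.
        if (snippet.contains(searchString) && "prefLabel".equals(snippetField)) {
            return initScore * 1.5;
        }
        return initScore;
    }

    // Second BIND: weight matches by the field the snippet came from.
    static double score(String snippetField, double exactMatchScore) {
        if ("prefLabel".equals(snippetField)) {
            return exactMatchScore * 2;
        }
        if ("definition".equals(snippetField)) {
            return exactMatchScore * 1.2;
        }
        return exactMatchScore;
    }

    // COALESCE: the first bound value among definition, scope note and dc:description.
    static String description(String definition, String scopeNote, String dcDescription) {
        if (definition != null) {
            return definition;
        }
        if (scopeNote != null) {
            return scopeNote;
        }
        return dcDescription;
    }

    public static void main(String[] args) {
        double exact = exactMatchScore("area", "prefLabel", "area", "area", 1.0);
        System.out.println("exact prefLabel match: " + score("prefLabel", exact));
        double partial = exactMatchScore("protected area", "prefLabel", "area", "area", 1.0);
        System.out.println("partial prefLabel match: " + score("prefLabel", partial));
        System.out.println("description: " + description(null, "scope note", "dc description"));
    }
}
```

Under these assumptions, an exact `prefLabel` match ends up with four times the initial Lucene score, a partial `prefLabel` match with three times, and a `definition` match that is not exact with 1.2 times.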
diff --git a/src/main/java/cz/cvut/kbss/termit/dto/ConfigurationDto.java b/src/main/java/cz/cvut/kbss/termit/dto/ConfigurationDto.java
@@ -53,6 +53,9 @@ public class ConfigurationDto implements Serializable {
     @OWLDataProperty(iri = Vocabulary.s_p_ma_oddelovac_verze)
     private String versionSeparator;
 
+    @OWLAnnotationProperty(iri = Vocabulary.s_p_ma_adresu_modelovaciho_nastroje)
+    private String modelingToolUrl;
+
     public String getLanguage() {
         return language;
     }
@@ -92,4 +95,12 @@ public String getVersionSeparator() {
     public void setVersionSeparator(String versionSeparator) {
         this.versionSeparator = versionSeparator;
     }
+
+    public String getModelingToolUrl() {
+        return modelingToolUrl;
+    }
+
+    public void setModelingToolUrl(String modelingToolUrl) {
+        this.modelingToolUrl = modelingToolUrl;
+    }
 }