Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using _Formatted value when searching is not user friendly #315

Open
juchom opened this issue Sep 1, 2022 · 16 comments · May be fixed by #406
Open

Using _Formatted value when searching is not user friendly #315

juchom opened this issue Sep 1, 2022 · 16 comments · May be fixed by #406
Labels
enhancement New feature or request question Further information is requested

Comments

@juchom
Copy link
Contributor

juchom commented Sep 1, 2022

While I was working on the code I saw something that is not user friendly for C# and Java developers.

If you look at the documentation for Highlighting : https://docs.meilisearch.com/reference/api/search.html#attributes-to-highlight

Here is the json response of the server :

{
  "id": 50393,
  "title": "Kung Fu Panda Holiday",
  "poster": "https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
  "overview": "The Winter Feast is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal Winter Feast at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
  "release_date": 1290729600,
  "_formatted": {
    "id": 50393,
    "title": "Kung Fu Panda Holiday",
    "poster": "https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
    "overview": "The <em>Winter Feast</em> is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal <em>Winter Feast</em> at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
    "release_date": 1290729600
  }
}

The server returns the document and add an extra field _formatted with a modified copy of the document.

Let see the problem now with C#:

We're going to work with this unit test :

public async Task CustomSearchWithAttributesToHighlight()
{
var newFilters = new Settings
{
FilterableAttributes = new string[] { "name" },
};
var task = await _basicIndex.UpdateSettingsAsync(newFilters);
task.TaskUid.Should().BeGreaterOrEqualTo(0);
await _basicIndex.WaitForTaskAsync(task.TaskUid);
var movies = await _basicIndex.SearchAsync<FormattedMovie>(
"man",
new SearchQuery { AttributesToHighlight = new string[] { "name" } });
movies.Hits.Should().NotBeEmpty();
movies.Hits.First().Id.Should().NotBeEmpty();
movies.Hits.First().Name.Should().NotBeEmpty();
movies.Hits.First().Genre.Should().NotBeEmpty();
movies.Hits.First()._Formatted.Name.Should().NotBeEmpty();
}

We are working on the _basicIndex and here is the definition of it :

public async Task<Index> SetUpBasicIndex(string indexUid)
{
var index = DefaultClient.Index(indexUid);
var movies = new[]
{
new Movie { Id = "10", Name = "Gladiator" },
new Movie { Id = "11", Name = "Interstellar" },
new Movie { Id = "12", Name = "Star Wars", Genre = "SF" },
new Movie { Id = "13", Name = "Harry Potter", Genre = "SF" },
new Movie { Id = "14", Name = "Iron Man", Genre = "Action" },
new Movie { Id = "15", Name = "Spider-Man", Genre = "Action" },
new Movie { Id = "16", Name = "Amélie Poulain", Genre = "French movie" },
};
var task = await index.AddDocumentsAsync(movies);
// Check the documents have been added
var finishedTask = await index.WaitForTaskAsync(task.TaskUid);
if (finishedTask.Status != TaskInfoStatus.Succeeded)
{
throw new Exception("The documents were not added during SetUpBasicIndex. Impossible to run the tests.");
}
return index;
}

We are adding a collection of Movie and here is the definition :

public class Movie
{
public string Id { get; set; }
public string Name { get; set; }
public string Genre { get; set; }
}

Now, on line 86 of the test we are doing a Search query against Meilisearch with this code :

     var movies = await _basicIndex.SearchAsync<FormattedMovie>( 
         "man", 
         new SearchQuery { AttributesToHighlight = new string[] { "name" } }); 

We are querying an index that contains Movie documents, but we ask to map to another class FormattedMovie with this definition :

public class FormattedMovie
{
public string Id { get; set; }
public string Name { get; set; }
public string Genre { get; set; }
#pragma warning disable SA1300
public Movie _Formatted { get; set; }
}

We have to duplicate the Movie class with a FormattedMovie class in order to have the _Formatted property available.

This is not very user friendly, because users will have to create two objects in order to fully use Meilisearch.

There is no way in C# to create a specific object that will enhance a base object with a _Formatted property like this :

public class FormattedDocument<T> : T
{
    public T _Formatted { get; set; }
}

There is actually also a small risk of collision if a document contains a _formatted property.

I told Guillaume it would great to have a response with an original and formatted properties that contains both a version of the document.

{
   "original":{
      "id":50393,
      "title":"Kung Fu Panda Holiday",
      "poster":"https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
      "overview":"The Winter Feast is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal Winter Feast at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
      "release_date":1290729600
   },
   "formatted":{
      "id":50393,
      "title":"Kung Fu Panda Holiday",
      "poster":"https://image.tmdb.org/t/p/w1280/gp18R42TbSUlw9VnXFqyecm52lq.jpg",
      "overview":"The <em>Winter Feast</em> is Po's favorite holiday. Every year he and his father hang decorations, cook together, and serve noodle soup to the villagers. But this year Shifu informs Po that as Dragon Warrior, it is his duty to host the formal <em>Winter Feast</em> at the Jade Palace. Po is caught between his obligations as the Dragon Warrior and his family traditions: between Shifu and Mr. Ping.",
      "release_date":1290729600
   }
}

One more interesting thing in having a response like the one above is that if you want to add extra information like the global score of this result, or a score per field, you could enhance it easily without any risk of fields collisions.

But it's too late for breaking changes. So we need to find a way for developers (C# / Java) to enjoy this feature without too much pain.

Ping @gmourier @brunoocasali

@juchom
Copy link
Contributor Author

juchom commented Sep 2, 2022

I just checked the rust client for this issue, I'm no expert but according to the doc :

https://github.com/meilisearch/meilisearch-rust/blob/261fa529e795a831035c7fb30f27313ea9496af7/src/search.rs#L12-L25

This works thanks to serde, who has the ability to flatten structs.

The unit tests in Java are passing because the Movie has a _formatted property, which is not supposed to be defined in the definition of a movie.

https://github.com/meilisearch/meilisearch-java/blob/e157c2dbb014aa581a887585890dbf2ee1947c4d/src/test/java/com/meilisearch/sdk/utils/Movie.java#L5-L15

In C# we saw, that we create an index with a Movie class and query with a FormattedMovie class.

The way the response is sent actually does not play well with strongly type languages.

In C# and Java, the unit tests are not really well shaped, in rust this works because of the flattening feature of serde.

I think, everything is here about this issue.

@gmourier
Copy link
Member

gmourier commented Sep 2, 2022

Thank you, @juchom, for your research and this very well-detailed issue.

But it's too late for breaking changes.

Indeed, we want 0 breaking changes to not involve any significant and tedious changes until 1.0, especially on the search endpoint.

Nevertheless, this is something we may consider after having reached a v1.0. We know that we will have to make some breaking changes afterward, but it's unclear how we will organize all this to be the most convenient for the users and us in terms of management. We are working on it to elaborate strategies to handle this type of change in the future.

@brunoocasali
Copy link
Member

@juchom, thanks for the issue. Indeed very well explained, and I could feel the pain here 😞

Do you think it could be possible to work around this problem internally in the meilisearch-dotnet by adding a post json processing before the actual object creation?

return await responseMessage.Content
                .ReadFromJsonAsync<SearchResult<T>>(cancellationToken: cancellationToken).ConfigureAwait(false);

@juchom
Copy link
Contributor Author

juchom commented Sep 2, 2022

Good question ? Very good question...

Digging further, I found this issue opened in System.Text.Json in order to have a JsonPropertyFlatten attribute like in serde : dotnet/runtime#55120

Not sure this is going to happen soon and there is nothing that could help with it on Github.

We only have bad solutions right now,

The worst one, trying to create our own JsonPropertyFlatten and create an object like in rust client, not sur how long would it take to do it, test it, avoid too bad performance but it would look like something like this in the end.

public class SearchResult<T>
{
    [JsonPropertyFlatten]
    public T Result { get; set; }

    [JsonPropertyName("_formatted")]
    public T FormattedResult { get; set; }
}

Writing this, just made me realised, that the rust client is already patching the json response to map to the proposed json response thanks to serde's flatten attribute.

Second option, bad developer experience but it would work without side effect if we use custom json options write a clear documentation on how to deal with it.

In my case I'm going to wrap everything where needed and create the second class by hand.

// My indexed object
public class Movie 
{ 
    public string Id { get; set; } 
  
    public string Name { get; set; } 
  
    public string Genre { get; set; } 
} 

// The search result class, a bit like the unit tests, but we keep refactoring tools working
public class MovieSearchResult : Movie
{
    [JsonPropertyName("_formatted")]
    public Movie Formatted { get; set; } 
} 

Maybe a source generator could help doing this second class automatically for the developer, but I have never used them before and I'm not sure it would be good idea to offer this option if it works. (Andrew Lock has a very good blog about them : https://andrewlock.net/creating-a-source-generator-part-1-creating-an-incremental-source-generator/)

In my case I don't have a bunch of documents so doing it by hand is ok, but for people with bigger projects source generators is probably the best bet.

Concerning Java, I think code generators exist but I have no idea of what we can do. There are some project on github to flatten json objects, but it seems that it's not following the same pattern like serde.

In my case I want to be extra safe, I would go with writing a second inherited class and let System.Text.Json do the job.

Concerning Meilisearch server, here is the response object :

https://github.com/meilisearch/meilisearch/blob/c4453340707f0b8ed6168118dff2e4d2ae634f8d/meilisearch-lib/src/index/search.rs#L86-L94

/// A single result.
/// Contains the complete object, optionally the formatted object, and optionally an object that contains information about the matches.
#[derive(Deserialize, Debug)]
pub struct SearchResult<T> {
    /// The full result.
    #[serde(flatten)]
    pub result: T,
    /// The formatted result.
    #[serde(rename = "_formatted")]
    pub formatted_result: Option<Map<String, Value>>,
    /// The object that contains information about the matches.
    #[serde(rename = "_matchesPosition")]
    pub matches_position: Option<HashMap<String, Vec<MatchRange>>>,
}

Without #[serde(flatten)] this object looks like the proposed json response.

I would encourage you to think twice before using #[serde(flatten)], this is making complex transformations when serializing / deserializing object to json. This suppose that this transformation must be available in every langages with all the same rules.

Just to be clear, there is no arrogance in my last sentence, I have been caught in this trap once it was a GUID serialization / deserialization issue between Python and C#.

And again your are doing a really good job and Meilisearch is a really great product 👍

@alallema alallema added enhancement New feature or request question Further information is requested labels Sep 21, 2022
@FelixLorenz
Copy link

Thanks @juchom ! Came across this caveat and could immediately workaround it with your approach.

@Christoph-Wagner
Copy link

As the SDK does not support highlights, maybe the readme should mention it? It even gives an example using AttributesToHighlight without mentioning that this won’t actually work.

@FelixLorenz
Copy link

For me it works fine, I got highlighting in all "attrA", "attrB", "attrC".

var myIndexSettings = new Settings
{
        DisplayedAttributes = new string[] { "*" },
        SearchableAttributes = new string[] { "attrA", "attrB", "attrC" },
};

var result = await Index.SearchAsync<MyType>(
                  queryString,
                  new SearchQuery()
                  {
                      AttributesToHighlight = new string[] { "attrA", "attrB", "attrC" }, // will be returned formatted
                      AttributesToCrop = new string[] { "attrC" }, // will be returned formatted
                      HighlightPreTag = "<mark>",
                      HighlightPostTag = "</mark>",
                      CropLength = 60,
                  },
                  cancellationToken: cancellationToken
                  );

@Christoph-Wagner
Copy link

That is weird. Is that on the master build, or the currently released NuGet build?

If I call it with AttributesToHighlight = new string[] { "text" }, I get no formatting, but if I inspect the HttpResponseMessage, I see the "_formatted" object in JSON that has the expected highlighting.

@FelixLorenz
Copy link

FelixLorenz commented Feb 9, 2023

Did you implement juchom's solution with the search result class that inherits from the actual type you have an index for?
If not, the formatted json prop will be omitted during deserialization since there is no matching prop in your type.

@Christoph-Wagner
Copy link

No, that’s my whole point, that it’s not supported out of the box.

@juchom
Copy link
Contributor Author

juchom commented Feb 13, 2023

@Christoph-Wagner that's right, it's not supported directly because there is no clean strategy to support this json transformation made server side.

This issue has been discovered after the api freeze this is why it works like this for now and you have to chose what best fit your needs.

Now that v1 is out, maybe @gmourier or @brunoocasali may have some insight on this issue.

@ahmednfwela
Copy link
Collaborator

@juchom Recently while working on the meilisearch flutter ui kit, I tried an approach, where we wrapped all the returned results from meilisearch with a helper object that would understand all the extra fields added by meilisearch and parse them correctly.
Relevant code:
https://github.com/meilisearch/meilisearch-flutter/blob/58ccfc5a4cb4bea36fee0fa52817c70e5d3c4944/lib/src/models/result_container.dart#L3-L16

class MeilisearchResultContainer<T> {
  final T parsed;
  final Map<String, dynamic> src;
  final Map<String, dynamic>? formatted;
  final Searcheable<Map<String, Object?>> fromResult;
  final SearchQuery fromQuery;

  MeilisearchResultContainer({
    required this.src,
    required this.parsed,
    required this.fromQuery,
    required this.fromResult,
  }) : formatted = src['_formatted'];
}

it would also reference the source query and the parent search result, but these can be omitted of course.

this made handling meilisearch specific fields way easier

@juchom
Copy link
Contributor Author

juchom commented Jun 29, 2023

If the product team decide to keep the original object without transformation and make this change soon, then I think it will be better to wait.

Otherwise, it will be probably wise to use your work and provide a solution that makes dotnet devs life easier.

@brunoocasali
Copy link
Member

I can safely state that a v2 is not in our current plans. And also, the same behavior is being used in other features like the new vector store feature (_semanticSimilarity, _vectors), and ranking details (_rankingScore, _rankingScoreDetails).

So I think if the implementation made by @ahmednfwela is good enough, we should definitely merge it :)

Then in the future, when Meilisearch starts to plan a possible v2, we can decide to go for a meta: {} key that wraps all those keys at once or something similar if needed.

Are you ok with that @juchom and @ahmednfwela?

meili-bors bot added a commit that referenced this issue Mar 11, 2024
528: Implements show ranking scores in search query r=curquiza a=massijay

# Pull Request

## Related issue
Fixes #458 

## What does this PR do?
- Implements show ranking scores in search query
- Does not implement a user friendly way to retrieve the score in the response (impossible to do without making breaking changes, see also #315, #426) users have to add the field in their classes themselves

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Massimiliano Cristarella <[email protected]>
Co-authored-by: Massimiliano Cristarella <[email protected]>
Co-authored-by: Clémentine U. - curqui <[email protected]>
@MoienTajik
Copy link

I have the same issue. Any updates on this? There are workarounds to fix it, but as other people mentioned, the SearchAsync method offers highlighting. You would be surprised that it doesn't work out of the box. Fixing it using either a custom JSON converter or code generator is better than nothing, IMO.

@curquiza
Copy link
Member

curquiza commented Jul 10, 2024

Hello @MoienTajik
Sorry, it's huge work. @ahmednfwela started something 🙏 but since it's huge, we did not move forward recently, sorry.
Trying to see if it's worth continuing the work

#406

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants