
Detection process

Riley edited this page Mar 21, 2024 · 7 revisions

This page describes how an item is identified in the screenshot as well as what affects accuracy. To see the accuracy data, please read Test data results.

Finding a unique item in a screenshot

First, we find the guides:

[Image: guides]

There are two versions: one-line (unidentified item) and two-line (identified item). Both are tried and the best result wins, which also tells us whether the item is identified. An identified item on the ground is either corrupted, or one that the player identified, dropped again, and wants to record.

The start of the guide then tells us where to crop the item image. Since we don't know yet which item it is, we cannot crop out just the item. Instead we crop the maximum size, 104x208px, which also leaves some room around the item, just in case.

[Image: crop1image]

The next step is to crop out the item title. For unidentified items, it is just the base name and for identified items, it includes the item name. This image part is then processed using Tesseract OCR. Some preprocessing is required to improve the accuracy of OCR, such as adding a 1px white border around the image.

[Image: crop2]

The output from Tesseract is not ideal, so some cleanup is required. Only letters are allowed (which is why an item's file attribute doesn't contain apostrophes) and words shorter than 3 letters are discarded. Then, depending on whether the item is identified, we parse either just the item base or both the base and the name. The parsed base is compared against the list of known item bases, and the optional item name is compared against the list of items in items.csv. If the base isn't detected, the whole detection fails. If the item name isn't detected, the program continues as if the item were unidentified and uses the other methods.
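The cleanup rules above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_ocr_text` is hypothetical, and it assumes non-letter characters are simply deleted from each word (which matches the apostrophe remark, but is otherwise a guess).

```python
import re


def clean_ocr_text(raw: str, min_len: int = 3) -> list[str]:
    """Keep only letters in each word and drop words shorter than min_len."""
    words = []
    for token in raw.split():
        # Assumption: non-letters are removed, so "Kaom's" becomes "Kaoms".
        letters = re.sub(r"[^A-Za-z]", "", token)
        if len(letters) >= min_len:
            words.append(letters)
    return words


# Stray OCR noise such as "|" and "1l" is discarded by the length rule.
print(clean_ocr_text("Kaom's Heart |\nGlorious Plate 1l"))
```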

Finally, some "manual" corrections are required, such as translating "Ruy Ring" to "Ruby Ring". Tesseract is great software, but it isn't always 100% accurate.
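One simple way to combine the manual corrections with the base lookup is a correction table consulted before the check against the known bases. The sketch below is an assumption about the shape of that step: `OCR_CORRECTIONS` and `resolve_base` are hypothetical names, only the "Ruy Ring" entry comes from this page, and the lookup is assumed to be an exact match.

```python
from typing import Iterable, Optional

# Hypothetical correction table; only the "Ruy Ring" entry comes from the text above.
OCR_CORRECTIONS = {"Ruy Ring": "Ruby Ring"}


def resolve_base(parsed: str, known_bases: Iterable[str]) -> Optional[str]:
    """Apply a manual correction, then require an exact match with a known base."""
    corrected = OCR_CORRECTIONS.get(parsed, parsed)
    # None signals that the base wasn't detected, so the whole detection fails.
    return corrected if corrected in set(known_bases) else None


known = ["Ruby Ring", "Gold Ring"]
print(resolve_base("Ruy Ring", known))  # Ruby Ring
print(resolve_base("Rxy Rng", known))   # None
```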

At the end of this step, we have the following information:

  • Cropped item image
  • Base name
  • (Optional) item name
  • Whether the item is identified or not

Detection by item name

If the previous step got us the correct item name, the detection is done and we have a 100% accurate result.

Detection by item base

If the unique item is the only one for a given base, the detection is done and again we have a 100% accurate result.

Note

This is where the attributes enabled (drop-disabled items) and global (specific drops) come in handy to limit the number of possible uniques per base.
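The "only one unique for this base" check could look something like the sketch below. The rows are made-up placeholders in the spirit of items.csv, and the filtering rule (keep only items where both enabled and global are true) is an assumption about how the two attributes are applied.

```python
from typing import Optional

# Hypothetical rows; names, bases and flag values are placeholders.
ITEMS = [
    {"name": "Unique A", "base": "Base X", "enabled": True, "global": True},
    {"name": "Unique B", "base": "Base X", "enabled": False, "global": True},
    {"name": "Unique C", "base": "Base X", "enabled": True, "global": False},
    {"name": "Unique D", "base": "Base Y", "enabled": True, "global": True},
    {"name": "Unique E", "base": "Base Y", "enabled": True, "global": True},
]


def candidates_for_base(items: list[dict], base: str) -> list[dict]:
    """Uniques on this base that can drop globally (assumed filter)."""
    return [i for i in items if i["base"] == base and i["enabled"] and i["global"]]


def detect_by_base(items: list[dict], base: str) -> Optional[str]:
    """Return the item name only when exactly one candidate remains."""
    candidates = candidates_for_base(items, base)
    return candidates[0]["name"] if len(candidates) == 1 else None


print(detect_by_base(ITEMS, "Base X"))  # Unique A
print(detect_by_base(ITEMS, "Base Y"))  # None
```

With these placeholder rows, the two flags reduce "Base X" from three uniques to one, so no image matching is needed for it.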

Template matching

If neither the item name nor the "only base" method yields a result, we attempt to match the item against its artwork. Because we know the base name, we only need to match the unknown image against a smaller set of items. These items share the same base and have known dimensions, which lets us crop the unique item even further, from 104x208px to, say, 52x52px for rings, amulets and similar items. This greatly improves accuracy. If the detection fails, items.csv may have an incorrect width or height value.

Generating socket variants

For better accuracy, we generate sockets onto the item image. This is what an item's sockets and cols attributes are for. If the item can have sockets, we generate every socket count from 1 to sockets, each in 4 color variants: red, green, blue and white. This gives up to 24 variants for an item with 6 sockets.
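The variant count follows directly from the arithmetic: 6 socket counts times 4 colors is 24. A minimal sketch of the enumeration, assuming each variant uses a single color for all of its sockets (which is what the 24 figure implies) and using the hypothetical name `socket_variants`:

```python
SOCKET_COLORS = ("red", "green", "blue", "white")


def socket_variants(max_sockets: int) -> list[tuple[int, str]]:
    """List every (socket count, color) pair for counts 1..max_sockets."""
    return [
        (count, color)
        for count in range(1, max_sockets + 1)
        for color in SOCKET_COLORS
    ]


variants = socket_variants(6)
print(len(variants))  # 24
print(variants[:4])   # the four 1-socket variants
```

Actually drawing the sockets onto the artwork (where the cols attribute would matter) is left out of this sketch.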

All variants are then tried using OpenCV's template matching. The result is a value called min_val in the range [0, 1], where 0 is a perfect match and 1 means no match at all. We take the lowest min_val out of all tried variants and return it as the result.

Now we have exactly 1 result for each unique item for a given base. Again, the one with the lowest min_val wins.
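The two-level selection (lowest min_val per item, then lowest across items) can be sketched as below. The scores are invented for illustration and `best_match` is a hypothetical name; in the real tool the values would come from OpenCV's template matching.

```python
from typing import Optional


def best_match(scores_by_item: dict[str, list[float]]) -> Optional[tuple[str, float]]:
    """Pick the item whose best socket variant has the lowest min_val."""
    # Step 1: collapse each item's variants to its single lowest score.
    best_per_item = {
        name: min(scores) for name, scores in scores_by_item.items() if scores
    }
    if not best_per_item:
        return None
    # Step 2: the item with the lowest of those scores wins.
    winner = min(best_per_item, key=best_per_item.get)
    return winner, best_per_item[winner]


# Made-up scores: one list entry per tried socket variant.
scores = {"Item A": [0.42, 0.18, 0.35], "Item B": [0.22, 0.27]}
print(best_match(scores))  # ('Item A', 0.18)
```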

Histogram comparison

For some items the template matching doesn't provide accurate enough results. For such cases, we use OpenCV's histogram comparison. This gives us a value we call hist_val and the process of selecting the best result is the same as with min_val - the lowest wins.

The best examples of items that need histogram comparison are Berek's Pass, Berek's Grip and Berek's Respite. Since template matching only uses grayscale images, the subtle difference between the colored stones cannot be detected. A histogram, however, carries information about the colors in the image, which is why Two-Stone Rings are always matched using histogram comparison.
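A toy example shows why grayscale loses the distinction while a color histogram keeps it. Everything here is an assumption made for illustration: the pixel values are invented, grayscale is a plain channel average rather than a weighted conversion, and the distance is a chi-square formula, sum((h1 - h2)^2 / h1), chosen because lower means more similar. The page doesn't say which comparison method the tool uses.

```python
Pixel = tuple[int, int, int]


def color_histogram(pixels: list[Pixel], bins: int = 8) -> list[int]:
    """Concatenated per-channel (R, G, B) histograms."""
    hist = [0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1
    return hist


def gray_histogram(pixels: list[Pixel], bins: int = 8) -> list[int]:
    """Histogram of a simplified grayscale value (plain channel average)."""
    hist = [0] * bins
    for r, g, b in pixels:
        hist[((r + g + b) // 3) * bins // 256] += 1
    return hist


def chi_square(h1: list[int], h2: list[int]) -> float:
    """Chi-square distance; 0 means identical, lower is better."""
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a > 0)


# Invented "stones": different hue, same average brightness.
red_stone = [(200, 50, 50)] * 10
blue_stone = [(50, 50, 200)] * 10
unknown = [(60, 40, 210)] * 10

# Grayscale cannot tell the two stones apart, color histograms can.
print(chi_square(gray_histogram(red_stone), gray_histogram(blue_stone)))
print(chi_square(color_histogram(red_stone), color_histogram(blue_stone)))

# Lowest hist_val wins, as with min_val.
references = {"ring_a": red_stone, "ring_b": blue_stone}
hist_vals = {
    name: chi_square(color_histogram(ref), color_histogram(unknown))
    for name, ref in references.items()
}
print(min(hist_vals, key=hist_vals.get))
```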

Solaris Circlet

For Flamesight, Galesight and Thundersight, all previous methods fail. Template matching returns all three circlets with a similar min_val, and histogram comparison is impossible, since all three can have up to four sockets with different colors. The only constant parts of the images are the gems on the left and right sides of the circlet, and only the left one is unobstructed enough. Therefore, in this case we crop out the gem, compute the average color for each channel (R, G, B) and pick the correct item based on that:

[Image]
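The average-color decision can be sketched as a nearest-reference lookup. The reference colors below are placeholders and were not measured from the real artwork; the function names and the squared-distance rule are likewise assumptions.

```python
Pixel = tuple[int, int, int]

# Placeholder reference colors, NOT taken from the actual item art.
GEM_REFERENCES = {
    "Flamesight": (200.0, 60.0, 40.0),
    "Galesight": (60.0, 90.0, 200.0),
    "Thundersight": (210.0, 200.0, 80.0),
}


def average_color(pixels: list[Pixel]) -> tuple[float, ...]:
    """Mean of each channel (R, G, B) over the cropped gem pixels."""
    count = len(pixels)
    return tuple(sum(p[c] for p in pixels) / count for c in range(3))


def closest_reference(avg: tuple[float, ...], references: dict) -> str:
    """Name of the reference whose color has the smallest squared distance."""
    def distance(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(avg, references[name]))

    return min(references, key=distance)


# Invented gem crop; a real crop would come from the left side of the circlet.
gem_crop = [(190, 70, 50), (210, 50, 30), (200, 60, 45)]
print(closest_reference(average_color(gem_crop), GEM_REFERENCES))
```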

Picking aliased items

If the process finds that the identified item is an alias, the non-aliased version of an item is returned. For example, Agnerod West will always be detected as Agnerod South.
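Alias resolution amounts to a final lookup on the detected name. In this sketch the table and function name are hypothetical, and only the Agnerod entry comes from this page:

```python
# Hypothetical alias table; only this one entry is taken from the text above.
ALIASES = {"Agnerod West": "Agnerod South"}


def resolve_alias(detected_name: str) -> str:
    """Return the non-aliased item name, or the name unchanged if it has no alias."""
    return ALIASES.get(detected_name, detected_name)


print(resolve_alias("Agnerod West"))   # Agnerod South
print(resolve_alias("Agnerod South"))  # Agnerod South
```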

Debugging the detection process

There is a lot of information in the log (data/logs or terminal output). However, there is also a tool specifically written to help with the detection. Run it like this: python3 match.py <path to screenshot> --html. This will generate a web page with all known information about the matched item and possible results, with item images, their progression through the code and links to poewiki.net.
