Replace HoG with other features #64
I wrote the feature interface as a single function: features = esvm_features(I, sbin). To get the dimensionality of the features, call the function with no arguments: dim = esvm_features(). If you are working with a dense SIFT descriptor, think about what the output dimensionality will be in your case. If you associate a 128-D SIFT vector with every pixel in the image, then you are working with sbin=1. Here is some pseudocode of what you need to do to integrate SIFT into this framework:
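For instance, a minimal sketch of such a wrapper could look like the following (the function name esvm_features_dsift, the use of VLFeat's vl_dsift, and the way descriptors are scattered back into a cell grid are my assumptions, not code from this repository):

function out = esvm_features_dsift(I, sbin)
% Hypothetical dense-SIFT stand-in for esvm_features (illustrative only).
if nargin == 0
    out = 128;   % report feature dimensionality, mirroring dim = esvm_features()
    return;
end
if size(I, 3) == 3
    I = rgb2gray(I);
end
gray = im2single(I);
% one 128-D descriptor every sbin pixels
[frames, descrs] = vl_dsift(gray, 'Step', sbin, 'Size', sbin);
% scatter the 128 x N descriptors into an H x W x 128 cell grid so the
% sliding-window machinery can treat it like a HOG map
xs = frames(1, :); ys = frames(2, :);
xi = round((xs - min(xs)) / sbin) + 1;
yi = round((ys - min(ys)) / sbin) + 1;
out = zeros(max(yi), max(xi), 128, 'single');
for k = 1:numel(xi)
    out(yi(k), xi(k), :) = single(descrs(:, k));
end
end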
Hi, Tomasz, one more question about "M".
You'll have to look closely at the code: by default I was employing a method which is more powerful than Platt's calibration method. This method involves a "boosting" matrix which couples the activations of the different exemplars. It is a very simple equation, but I pulled it out of thin air. I never got learning to improve the result over my own simple heuristic. To learn more, I refer you to my doctoral dissertation, Section 3.3 Exemplar Co-occurrence Matrices. I'm not sure simply setting M.C = [] is enough; there is a mechanism for disabling the matrix part, but it will involve you poking around the code.
Thanks! I will read your doctoral dissertation. Another problem about replacing the features:
Since HMP features only return a 2-dimensional matrix, different from block-based SIFT and HoG, I found errors during
HOG is not typically used for image classification tasks, but instead for object detection tasks. When applying dense SIFT or GIST (or engineering your own feature) for image classification, you only have to think about a single function mapping the whole image to a feature vector. In object detection, you want to use something like a pyramid of cell-based features. HOG is the prime example of such a cell-based feature. Let a cell be a small square patch of sbin x sbin pixels. If I is an 80x80x3 image, the feature map is a grid of cells with F channels per cell. You can then slide templates of arbitrary sizes such as 4x4xF, 2x6xF, etc. over that grid. HOG computation is performed at the image pyramid level, while older vision systems would typically enumerate regions first, then compute f(subI) for each sub-image. It's not easy to just throw any feature you want at the problem. There is quite a bit of engineering in any reasonably sophisticated vision system. I would suggest just talking to a few vision PhD engineers, they should be able to help you! Good luck!
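As a concrete illustration of the cell-grid bookkeeping above (a sketch assuming esvm_features returns an H x W x F map; the numbers are only examples):

I = rand(80, 80, 3);                  % an 80x80x3 image
sbin = 8;                             % cell size in pixels
feat = esvm_features(I, sbin);        % roughly (80/sbin) x (80/sbin) x F cells
% slide a 4x4xF template over the cell grid by correlating each channel
w = randn(4, 4, size(feat, 3));
resp = zeros(size(feat, 1) - 3, size(feat, 2) - 3);
for c = 1:size(feat, 3)
    resp = resp + conv2(feat(:, :, c), rot90(w(:, :, c), 2), 'valid');
end
% resp(i, j) is the score of the 4x4 template anchored at cell (i, j)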
Hi, we came up with an idea to use HMP features with ESVM for object classification. We don't need sliding window detection in object recognition.
One question about this. Thanks a lot!
Hi Angela, In the exemplarSVM case I tried making templates which have roughly 100 bins.
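For reference, one way to pick such a template size is to fix a bin budget and the exemplar's aspect ratio (an illustrative computation, not the repository's exact initialization; exemplar_h and exemplar_w are assumed known):

aspect = exemplar_h / exemplar_w;       % aspect ratio of the exemplar box
target_bins = 100;                      % rough bin budget mentioned above
tw = max(1, round(sqrt(target_bins / aspect)));
th = max(1, round(aspect * tw));
fprintf('template: %d x %d cells (%d bins total)\n', th, tw, th * tw);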
In your single-feature-vector-per-image scenario, you should consider whether your images all have the same size or not. The notion of initializing a template is a bit different than in the object detection case (where objects can have dramatically different aspect ratios), so you'll have to think a bit about your problem formulation and modify the code accordingly. Cheers!
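If every image yields exactly one fixed-length descriptor, one low-effort way to fit it into the cell-based interface is to expose it as a 1x1xD feature map (a sketch; the helper hmp_extract and the wrapper name are assumptions):

function out = esvm_features_hmp(I, sbin)
% Hypothetical wrapper exposing a global HMP descriptor as a 1x1xD grid.
% sbin is unused here, since the descriptor covers the whole image.
d = hmp_extract(I);                       % assumed helper returning a D x 1 vector
out = reshape(single(d), [1, 1, numel(d)]);
end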
Hi Tomasz, since the HMP feature extractor is time-consuming, I am curious about how many times the feature extraction gets called. I traced the code to figure out the number of feature-extraction calls.
Hi Angela,
However, most people like to use heavy-weight computer vision features which might take 1-10 seconds (or 1-10 minutes) per image. In that case, it makes sense to store them on disk. I think you are venturing into territory which is outside the domain of object detection, and you might need to perform some serious surgery on the ExemplarSVM codebase to get the effect you want. Here are some details on hard-negative mining: the idea is to maintain in memory the examples which incur a non-zero loss for the SVM objective function. These are positive examples which score below +1 and negative examples which score above -1. In the ExemplarSVM case, there is only one positive example, which is fixed after initialization, and the negative examples are automatically mined. Let's say we start with 1 positive and 0 negatives, but we have a negative cache which can hold at most MAX_NEG negatives:
% some pseudocode for hard-negative mining
esvm = init_esvm();
for i = 1:length(images)
    % compute detections in this image
    dets = esvm_detect(esvm, images{i});
    % add negatives to the cache
    esvm = add_negatives(esvm, dets);
    % update the SVM using liblinear, libsvm, or your very own SVM library
    esvm = update_svm(esvm);
    % keep at most MAX_NEG negatives in the cache
    esvm = prune_negatives(esvm, MAX_NEG);
end
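A minimal sketch of what prune_negatives could do, assuming the cache stores the negatives' feature vectors as columns and that w and b are available (field names are illustrative, not the repository's):

function esvm = prune_negatives(esvm, MAX_NEG)
% Keep only negatives that still violate the margin (score > -1),
% then cap the cache at MAX_NEG entries, highest-scoring first.
scores = esvm.w(:)' * esvm.neg_features + esvm.b;   % one score per cached negative
keep = find(scores > -1);
[~, order] = sort(scores(keep), 'descend');
keep = keep(order(1:min(MAX_NEG, numel(keep))));
esvm.neg_features = esvm.neg_features(:, keep);
end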
Hi Tomasz, I read the code, and have a few questions:
I incorporated HMP features (1x1x112000) in esvm, and removed the pyramid by setting MAXLEVEL to 1 (so that there is no sliding window). However, the accuracy is lower than expected, and lower than with other features (HoG & SIFT). I want to figure out whether I am misusing the code or whether HMP features are just not appropriate for this algorithm.
I don't know what HMP is, nor what it captures. But please take a detailed look at your code and make sure the image being fed into the features is the full-sized image and not a down-sampled version. In other words, please make sure that whatever you did to disable the sliding window operation, you are keeping the highest resolution image and not the lowest resolution image. In my experience, the performance of a recognition/detection algorithm depends on 2 things: the interplay of features / learning algorithm, and how many years you spend working on the problem. There's something really nice about HOG+LinearSVM, probably because Navneet Dalal optimized HOG for use in this scenario. Don't be surprised that changing one of these components drastically reduced performance. If you really feel you need to use HMP, consider designing your own learning algorithm. Cheers and good luck! |
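One quick sanity check along these lines (illustrative; it only prints the sizes seen by the feature function):

fprintf('feature input : %d x %d pixels\n', size(I, 1), size(I, 2));
feat = esvm_features(I, sbin);
fprintf('feature output: %d x %d x %d\n', size(feat, 1), size(feat, 2), size(feat, 3));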
Hi, Tomasz, there are two details I would like to know more about:
Since each iteration of hard-negative mining keeps support vectors whose scores are greater than -1, will the initialized w and b influence which support vectors are chosen?
Your explanation above helps me understand this paper a lot!
Hi Angela,

1.) The initialization is just a simple mechanism for creating a mean-zero vector which can be used to detect the exemplar in its originating image. I made this initialization after I observed that all learned Exemplar-SVMs resembled the raw HOG features of the positive, but the learned hyperplane was mean-zero. It's just one of those tricks (among others) which I must have pulled out of thin air after endless nights of hacking at CMU. Yes, the initialization does affect the support vectors. In theory, if you make multiple passes over the data, the initialization will not matter. If you remember from ML101, a linear SVM is a convex problem, and if you remember from Felzenszwalb et al.'s DPM work, under a reasonable mining strategy, hard-negative mining will give you the same answer as loading all the data into memory.

2.) The boosting matrix does give a reasonable boost over raw ExemplarSVMs, even over per-exemplar calibrated ones. I do have to admit that I spent months trying to "learn" this boosting matrix, but I could never get the overfitting under control. This was yet another heuristic I pulled out of thin air (and after ~30 nights of trying it the ML way). The heuristic really does have a simple form. In the case of false positives, there are scenarios where a single detection window will have a large score because some image gradients accidentally lined up to look like the object of interest. In these scenarios, there is a single large "max score" but nearby windows score below -1. This intuition is not handled by non-maximum suppression, at least not by the Felzenszwalb et al. non-max suppression. That is why I favored ExemplarSVMs which have many high-scoring windows. You'll have to look at the code in detail to see exactly what I did. I hope this helps. Good luck! --Tomasz
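To make the initialization in 1.) concrete, here is a minimal sketch of a mean-zero initialization of that kind (illustrative, not the repository's exact code; I_exemplar denotes the exemplar's image window):

x = esvm_features(I_exemplar, sbin);   % features of the exemplar window
w = x - mean(x(:));                    % zero-mean template resembling the raw features
b = 0;                                 % bias, later refined by hard-negative mining
score = sum(w(:) .* x(:)) + b;         % the exemplar scores well against itself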
Hi,
I was trying to replace HoG features with dense SIFT features.
However, I found that esvm_detect.m changes the feature dimension, especially for HoG.
Simply replacing model.x with a SIFT feature will not work.
Is there any suggestion to help me develop the code (replace HoG with other features)?
Thanks.