Reading Time: 7 Min
Some of you know I have recently been experimenting a bit more with WebXR rather than WebVR, and when we talk about mobile Mixed Reality, ARKit and ARCore play a pivotal role in mapping and understanding the environment inside our applications.
I am planning to write a series of blog posts on how you can start developing WebXR applications now and play with them, starting with the basics and then moving on to its different features. But before that, I wanted to pen down this series on how "world mapping" actually works in ARCore and ARKit, so that we have a better understanding of the Mixed Reality capabilities of the devices we will be working with.
Mapping: feature detection and anchors
Creating apps that work seamlessly with ARCore/ARKit requires a little bit of knowledge about the algorithms working behind the scenes, and that involves knowing about Anchors.
What are Anchors?
Anchors are your virtual markers in the real world. As a developer, you anchor any virtual object to the real world, and that 3D model stays glued to that physical location. Anchors also get updated over time as the system learns new information about the environment. For example, if you anchor a Pikachu 2 ft away from you and then walk towards it, and the system realises the actual distance is 2.1 ft, it will compensate for that. In real life we have a static coordinate system where every object has its own x, y, z coordinates. Anchors in devices like HoloLens override the rotation and position of the transform component.
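As a toy sketch of that idea (this is not the actual ARCore/ARKit API, just hypothetical Python to illustrate the mechanism), an anchor is essentially a pose that the tracking system keeps correcting, with every attached virtual object following along:

```python
# Hypothetical illustration only -- not a real ARCore/ARKit API.
class Anchor:
    def __init__(self, position):
        self.position = position          # estimated world position (x, y, z) in metres
        self.attached = []                # virtual objects glued to this anchor

    def on_tracking_update(self, corrected_position):
        # The system re-estimated where this physical spot really is,
        # e.g. the "2 ft away" Pikachu turns out to be 2.1 ft away.
        self.position = corrected_position
        for obj in self.attached:
            obj.position = self.position  # virtual content follows the anchor

class Pikachu:
    def __init__(self):
        self.position = None

anchor = Anchor(position=(0.0, 0.0, -0.61))   # roughly 2 ft in front of the camera
pikachu = Pikachu()
anchor.attached.append(pikachu)
anchor.on_tracking_update((0.0, 0.0, -0.64))  # refined estimate: roughly 2.1 ft
print(pikachu.position)                       # the model moved with the anchor
```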
How do Anchors stick?
If we follow the documentation of Google ARCore, we see that an anchor is attached to something called "trackables", which are feature points and planes in an image. Planes are essentially clustered feature points. You can have a more in-depth look at what Google says an ARCore anchor does by reading their really nice Fundamentals. But for our purposes, we first need to understand what exactly these Feature Points are.
Feature Points: through the eyes of computer vision
Feature points are distinctive markers in an image that an algorithm can use to track specific things in that image. Normally any distinctive pattern, such as T-junctions or corners, is a good candidate. On their own they are not very useful for telling features apart and reliably marking a place in an image, so the neighbouring pixels are also analyzed and saved as a descriptor.
Now a good anchor should have reliable feature points attached to it. The algorithm must be able to find the same physical spot under different viewpoints. It should be able to accommodate changes in:
- Camera Perspective
- Rotation
- Scale
- Lighting
- Motion blur and noise
Reliable Feature Points
This is an open research problem with multiple solutions on the table, each with its own set of problems. One of the most popular algorithms stems from a paper by David G. Lowe in IJCV and is called Scale Invariant Feature Transform (SIFT). A follow-up work which claims even better speed was published at ECCV in 2006, called Speeded Up Robust Features (SURF) by Bay et al. Though both of them are patented at this point.
Microsoft HoloLens doesn't really need to do all this heavy lifting since it can rely on an extensive amount of sensor data, especially the depth data from its infrared sensors. ARCore and ARKit, however, don't enjoy those privileges and have to work with 2D images. Though we cannot say for sure which algorithm ARKit or ARCore actually uses, we can try to replicate the process with a patent-free algorithm to understand how it works.
D.I.Y Feature Detection and keypoint descriptors
To understand the process we will use an algorithm by Leutenegger et al. called BRISK. To detect features we must follow a multi-step process. A typical algorithm adheres to the following two steps:
- Keypoint Detection: Detecting keypoints can be as simple as just detecting corners, which essentially evaluates the contrast between neighbouring pixels. A common way to do that in a large-scale image is to blur the image to smooth out pixel contrast variation and then run edge detection on it. The rationale is that you normally want to detect a tree or a house as a whole to achieve reliable tracking, instead of every single twig or window. SIFT and SURF adhere to this approach. However, for real-time scenarios blurring adds a compute penalty which we don't want. In their paper "Machine Learning for High-Speed Corner Detection", Rosten and Drummond proposed a method called FAST, which analyzes the circular surroundings of each pixel p. If the brightness of a certain number of connected neighbouring pixels is lower/higher than that of p, the algorithm has found a corner (see the short FAST sketch after this list).
Image credits: Rosten E., Drummond T. (2006) Machine Learning for High-Speed Corner Detection. Computer Vision – ECCV 2006.
- Keypoint Descriptor: The primary property of the detected keypoints should be that they are unique. The algorithm should be able to find the same feature in a different image with a different viewpoint and lighting. BRISK does this by comparing the brightness of different pixel pairs surrounding the centre keypoint and concatenating the comparison results into a 512-bit binary string.
Image credits: S. Leutenegger, M. Chli and R. Y. Siegwart, “BRISK: Binary Robust invariant scalable keypoints”, 2011 International Conference on Computer Vision
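To get a feel for the corner-detection step described above, here is a small sketch using OpenCV's reference FAST detector (the file name and the threshold of 40 are placeholder choices of mine, not values from ARCore/ARKit):

```python
import cv2

# Load an image and work on intensity values only
img = cv2.imread("louvre.jpg")                 # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# FAST compares each pixel p with a circle of surrounding pixels; a corner is
# reported when enough contiguous neighbours are all brighter or all darker than p.
fast = cv2.FastFeatureDetector_create(threshold=40, nonmaxSuppression=True)
keypoints = fast.detect(gray, None)

print(f"FAST found {len(keypoints)} corner keypoints")
out = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("louvre_fast.jpg", out)
```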
Test out the code
To test out the algorithms we will use the reference implementations available in OpenCV. We can install it with Python bindings using pip.
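Something along these lines should do (BRISK ships in OpenCV's main features2d module, so the standard opencv-python wheel is enough; exact package names for your environment may vary):

```
pip install opencv-python
```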
As test images, I used two images I had previously captured for my talks at All Things Open and the TechSpeaker meetup. One is an evening shot of the Louvre, which should have enough corners as well as overlapping edges, and the other a picture of my friend in a beer garden, taken in portrait mode, to see how the algorithm fares against the blur already present in the image.
Visualizing the Feature Points
We use the small code snippet below to run BRISK on the two images above and visualize the feature points.
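A minimal sketch of such a snippet, using OpenCV's Python bindings (the image file names are placeholders):

```python
import cv2

# Load the JPEG and convert it to grayscale
img = cv2.imread("louvre.jpg")                      # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Initialize BRISK with the paper-suggested 4 octaves and an increased
# detection threshold of 70, keeping only a small number of reliable keypoints
brisk = cv2.BRISK_create(thresh=70, octaves=4)

# detectAndCompute returns the keypoints and their 512-bit binary descriptors
keypoints, descriptors = brisk.detectAndCompute(gray, None)
print(f"Detected {len(keypoints)} keypoints")

# Draw each keypoint with circle diameter and angle showing its size and orientation
out = cv2.drawKeypoints(
    img, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS,
)
cv2.imwrite("louvre_brisk.jpg", out)
```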
What the code does:
- Loads the JPEG into a variable and converts it to grayscale
- Initializes BRISK and runs it. We use the paper-suggested 4 octaves and an increased threshold of 70. This ensures we get a small number of highly reliable keypoints. As we will see below, we still get a lot of them.
- Uses detectAndCompute() to get two arrays from the algorithm holding both the keypoints and their descriptors
- Draws the keypoints at their detected positions, indicating keypoint size and orientation through circle diameter and angle; the DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS flag does that.
As you can see, most of the keypoints are visible at the castle edges and other visible edges of the Louvre, and almost none on the floor. With the portrait-mode picture of my friend it's more diverse, but it also shows some false positives like the reflection on the glass. Considering how BRISK works, this is normal.
Conclusion
In this first part of understanding what powers ARCore/ARKit, we looked at how basic feature detection works behind the scenes and tried our hand at replicating it. These spatial anchors are vital for virtual objects to "glue" to the real world. This whole demo and hands-on should help you understand why it's a good idea to design your apps so that users place objects in areas where the device has a chance to create anchors.
Placing a lot of objects on a single untextured, smooth plane may produce an inconsistent experience, since it is just too hard to detect keypoints on a plain surface (now you know why). As a result, the objects may sometimes drift away if tracking is not good enough.
If your app design encourages placing objects near the corners of a floor or table, the app has a much better chance of working reliably. Also, Google's guide on anchor placement is an excellent read.
Their short recommendation is:
- Release spatial anchors if you don’t need them. Each anchor costs CPU cycles which you can save
- Keep the object close to the anchor.
In our next post, we will see how we can use this knowledge to do basic SLAM.
Update: The next post lives here: https://blog.rabimba.com/2018/10/arcore-and-arkit-SLAM.html