
Questions

How do they generate Figure 3? In their Figure 3, the query is a body joint and the key is a regular mesh vertex. Are they essentially just visualizing the softmax attention scores?
What is the inner structure of the Transformer encoder? And what does "Norm" mean in their case? It seems to be a normalization layer.
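The two questions above can be made concrete with a minimal pure-Python sketch. It assumes METRO's encoder follows the standard Transformer layout: scaled dot-product self-attention, then "Add & Norm" (residual connection plus layer normalization), then a feed-forward sublayer, then another "Add & Norm". The identity projections (no learned W_Q, W_K, W_V) and the weightless ReLU feed-forward are simplifications, not the paper's configuration. `attention_weights` returns the softmax scores of one query token (e.g., a joint) over all key tokens (e.g., mesh vertices), which is the kind of quantity a Figure 3-style attention map would plot, if the reading of Figure 3 above is right.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query token over all key tokens."""
    d = len(query)
    return softmax([dot(query, k) / math.sqrt(d) for k in keys])

def attend(queries, keys, values):
    """Single-head self-attention with identity projections (simplification)."""
    out = []
    for q in queries:
        w = attention_weights(q, keys)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out

def layer_norm(x, eps=1e-5):
    """LayerNorm over one token's features (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def feed_forward(x):
    """Hypothetical stand-in for the position-wise FFN: elementwise ReLU, no weights."""
    return [max(0.0, xi) for xi in x]

def encoder_layer(tokens):
    """Post-norm layer: x = Norm(x + Attn(x)); x = Norm(x + FFN(x))."""
    attn = attend(tokens, tokens, tokens)
    x = [layer_norm([a + b for a, b in zip(t, h)]) for t, h in zip(tokens, attn)]
    return [layer_norm([a + b for a, b in zip(t, feed_forward(t))]) for t in x]

# Hypothetical toy tokens: token 0 plays the "joint", the others play "vertices".
tokens = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.9, 0.1, 0.4]]
weights = attention_weights(tokens[0], tokens)  # one softmax score per key; they sum to 1
encoded = encoder_layer(tokens)                 # same shape as tokens: 3 tokens x 3 features
```

A real encoder stacks several such layers, uses multiple heads, and learns the projection and feed-forward weights; the sketch only shows where the softmax scores and the normalization sit.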

Brief Introduction

The paper proposes a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. (Task description)
METRO uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. (Method)
Input: a single image and a mesh template. Output: the 3D coordinates of each body joint and each mesh vertex.
This is not a GAN-like model; it is trained with supervised (ground-truth) labels.
Human3.6M, 3DPW, FreiHAND (Datasets)
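The input/output contract above can be sketched as a data layout. This assumes, as an illustration rather than a confirmed detail from these notes, that the image is summarized by one global feature vector and that each template joint and vertex becomes one token by concatenating that feature with its 3D template coordinate; the encoder then regresses one (x, y, z) per token. The function names, feature size, and point counts are hypothetical, and the "encoder" is a placeholder that simply echoes the template.

```python
def build_tokens(image_feature, template_joints, template_vertices):
    """Hypothetical token layout: the global image feature concatenated with
    each template joint/vertex 3D coordinate, giving one input token per point."""
    return [list(image_feature) + list(point) for point in template_joints + template_vertices]

def split_outputs(predicted_xyz, num_joints):
    """Split one predicted (x, y, z) per token back into joints and mesh vertices."""
    return predicted_xyz[:num_joints], predicted_xyz[num_joints:]

# Toy sizes chosen only for illustration: a 4-d image feature, 2 joints, 3 vertices.
image_feature = [0.1, 0.7, 0.3, 0.9]
template_joints = [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0)]
template_vertices = [(0.1, 1.1, 0.0), (-0.1, 1.1, 0.0), (0.0, 0.4, 0.1)]

tokens = build_tokens(image_feature, template_joints, template_vertices)
# Placeholder for the trained encoder (hypothetical): echo each token's template coordinate.
predicted_xyz = [tuple(token[-3:]) for token in tokens]
joints_xyz, vertices_xyz = split_outputs(predicted_xyz, len(template_joints))
```

Because joints and vertices share one token sequence, self-attention can relate any joint to any vertex, which is what "jointly model vertex-vertex and vertex-joint interactions" refers to.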

Intuition

NLP

I feel ...

In natural language, the words in a sentence are correlated. In the sentence above, the word that follows "feel" is most likely an adjective describing an emotion. Does the same hold for a mesh, i.e., are its vertices and joints correlated in a similar way?

Inverse Kinematics

Kinematics is the study of motion without considering its causes, such as forces and torques. Inverse kinematics uses kinematic equations to determine the joint parameters (e.g., joint angles) that bring a robot's end effector to a desired position.
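To make the definition concrete, here is a minimal pure-Python sketch of analytic inverse kinematics for a hypothetical planar two-link arm with link lengths l1 and l2 (an illustration, not taken from the paper). Forward kinematics maps joint angles to the end-effector position; inverse kinematics goes the other way, using the law of cosines for the elbow angle and atan2 for the shoulder angle. It returns the solution with theta2 >= 0; a mirrored solution also exists.

```python
import math

def forward_kinematics(theta1, theta2, l1, l2):
    """End-effector position of a planar two-link arm given joint angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1, l2):
    """Joint angles (theta1, theta2 >= 0) that place the end effector at (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    if r > l1 + l2 or r < abs(l1 - l2):
        raise ValueError("target is out of reach")
    # Law of cosines gives the elbow angle; clamp to guard against rounding error.
    c2 = (r2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))
    # Shoulder angle: direction to the target minus the offset caused by the bent elbow.
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2

theta1, theta2 = inverse_kinematics(1.2, 0.8, 1.0, 1.0)
x, y = forward_kinematics(theta1, theta2, 1.0, 1.0)  # approximately (1.2, 0.8)
```

Round-tripping a reachable target through both functions recovers it up to floating-point error; an unreachable target raises ValueError.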