Questions
- How do they generate Figure 3? In Figure 3, the query seems to be a joint and the keys are the regular mesh vertices. Are they just visualizing the softmax attention scores?
- What is the inner structure of the Transformer encoder? And what is `norm` in their case? It looks like a normalization step (LayerNorm, as in a standard Transformer?).
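One plausible reading of both questions, not confirmed from the paper: Figure 3 could be the row of the attention matrix softmax(QKᵀ/√d) whose query is a joint token, with each vertex colored by its weight, and `norm` could be the LayerNorm of a BERT-style post-norm encoder layer (self-attention, residual + LayerNorm, feed-forward, residual + LayerNorm). The sketch below assumes a single head, random toy weights, ReLU instead of GELU, and no learnable LayerNorm gain or bias. All function names and sizes are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (both lists of lists)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(x, w_q, w_k):
    """softmax(Q K^T / sqrt(d)): row i holds how strongly token i attends to every token."""
    q, k = matmul(x, w_q), matmul(x, w_k)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return [softmax(r) for r in scores]

def layer_norm(row, eps=1e-5):
    """Normalize one token to zero mean and unit variance (no learnable gain or bias here)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def encoder_layer(x, w_q, w_k, w_v, w_1, w_2):
    """Hypothetical post-norm layer: h = LN(x + Attn(x)), out = LN(h + FFN(h))."""
    attn = attention_weights(x, w_q, w_k)
    mixed = matmul(attn, matmul(x, w_v))  # attention-weighted sum of value vectors
    h = [layer_norm(add(xi, mi)) for xi, mi in zip(x, mixed)]
    hidden = [[max(0.0, t) for t in row] for row in matmul(h, w_1)]  # ReLU
    ff = matmul(hidden, w_2)
    out = [layer_norm(add(hi, fi)) for hi, fi in zip(h, ff)]
    return out, attn

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
D_MODEL, D_FF, N_JOINTS, N_VERTICES = 4, 8, 2, 5  # toy sizes, not the paper's
tokens = rand_matrix(N_JOINTS + N_VERTICES, D_MODEL)  # joint tokens first, then vertex tokens
out, attn = encoder_layer(
    tokens,
    rand_matrix(D_MODEL, D_MODEL), rand_matrix(D_MODEL, D_MODEL), rand_matrix(D_MODEL, D_MODEL),
    rand_matrix(D_MODEL, D_FF), rand_matrix(D_FF, D_MODEL),
)
# Figure-3-style map under this assumption: joint 0 as the query, vertex tokens as the keys.
joint0_to_vertices = attn[0][N_JOINTS:]
```

Each attention row sums to 1 over all tokens, so the vertex slice is a relative map rather than a full distribution.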
Brief Introduction
- Proposed a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. (Task description)
- Uses a Transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. (Method)
- Input: a single image and a mesh template. Output: a 3D coordinate for each joint and each vertex.
- This is not a GAN-like model: it is trained with ground-truth labels (supervised).
- Human3.6M, 3DPW, FreiHAND (Dataset)
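A sketch of the input/output contract above, under assumptions not taken from the paper's code: a CNN yields one global image feature vector, which is concatenated with each template joint's and vertex's (x, y, z) to form one token per joint or vertex, and training is supervised with an L1 regression loss against ground-truth 3D coordinates. `build_tokens`, `l1_loss` and all numbers are hypothetical, and the encoder is replaced by a hand-made prediction.

```python
from typing import List

Vec = List[float]

def build_tokens(image_feature: Vec, template_joints: List[Vec],
                 template_vertices: List[Vec]) -> List[Vec]:
    """Concatenate the global image feature with every template joint/vertex (x, y, z).

    Returns one token per joint and per vertex, each of length len(image_feature) + 3.
    """
    return [image_feature + list(p) for p in template_joints + template_vertices]

def l1_loss(pred: List[Vec], target: List[Vec]) -> float:
    """Mean absolute error over all coordinates: a supervised loss, not an adversarial one."""
    total = sum(abs(p - t) for pv, tv in zip(pred, target) for p, t in zip(pv, tv))
    count = sum(len(tv) for tv in target)
    return total / count

feature = [0.1, 0.2, 0.3, 0.4]  # hypothetical 4-d image feature from a CNN backbone
joints = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0]]  # toy template joints
vertices = [[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0], [0.0, 0.6, 0.1]]  # toy template vertices
tokens = build_tokens(feature, joints, vertices)

# Stand-in for the encoder output: the template shifted by 0.1 along x.
pred = [[x + 0.1, y, z] for x, y, z in joints + vertices]
loss = l1_loss(pred, joints + vertices)  # compared against the template as fake ground truth
```

Here every token carries the same image feature, so only the appended template coordinate tells the encoder which joint or vertex a token stands for.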

Intuition
NLP
I feel ...
In natural language, the words in a sentence are correlated. In the sentence above, the word following "feel" is very likely an adjective describing an emotion. Does the same hold for the vertices of a mesh?
Inverse Kinematics
Kinematics is the study of motion without considering its causes, such as forces and torques. Inverse kinematics uses the kinematic equations to compute the joint parameters (e.g., angles) that bring a robot to a desired position.
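A worked toy example of inverse kinematics, not from the paper: a two-link planar arm with link lengths l1 and l2. Forward kinematics maps joint angles to the end-effector position, and inverse kinematics inverts it analytically with the law of cosines. The function names are hypothetical, and only the solution with theta2 >= 0 is returned.

```python
import math

def forward_kinematics(theta1, theta2, l1, l2):
    """End-effector (x, y) of a two-link planar arm for the given joint angles."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1, l2):
    """Joint angles that place the end effector at (x, y); one of the two branches."""
    # Law of cosines: cos(theta2) = (x^2 + y^2 - l1^2 - l2^2) / (2 * l1 * l2)
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target is out of reach")
    theta2 = math.acos(c2)
    # Direction to the target minus the angle the second link adds at the elbow.
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2

t1, t2 = inverse_kinematics(1.2, 0.8, 1.0, 1.0)
x, y = forward_kinematics(t1, t2, 1.0, 1.0)  # recovers (1.2, 0.8) up to rounding
```

Running the forward model on the returned angles reproduces the target, which is a quick sanity check for any IK solver.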