Calculating a 6D Estimation (3 Translations and 3 Rotations) from Two Orthogonal Detectors' Outputs (X, Y, Theta, Roll)

Hello everyone,
I am currently simulating a 2D-to-3D image registration pipeline to estimate 6D transformations (3 translations and 3 rotations) using two orthogonal X-ray detectors. From each detector, I can calculate the in-plane transformation (X, Y, Theta) and the out-of-plane rotation (Roll).
My question is whether it is possible to compute the full 6D transformation (3 translations and 3 rotations) using the information provided (X, Y, Theta, Roll).
I would greatly appreciate any insights or suggestions regarding this topic.
Thank you!

2 comments

What does "out-of-plane roll" mean? How can a 2D x-ray image "roll out" of the plane that it lives in? And what is the axis of rotation we are talking about? Is the roll a rotation about the pixel rows? The columns?
I have attached an image that I believe will help to understand the 3D coordinates and their relationship with the two 2D coordinates from the two detectors.
How can a 2D X-ray image "roll out" of the plane it resides in? When we have a 3D object and rotate it, both detectors can capture images (let's call them sets of DRR references) for out-of-plane rotations. It's important to note that the direction of axis xA in the coordinates of projection A is opposite to that of axis x in the 3D coordinates, while the direction of axis xB in the coordinates of projection B aligns with that of axis x in the 3D coordinates.
I believe (though I’m not entirely certain) that this problem can be addressed by defining another 3D coordinate system to match the outputs of the two orthogonal detectors. This system would relate the parameters (X, Y, Theta, Roll) to the real 6D estimation, which includes 3 translations and 3 rotations.
To achieve this, the 2D in-plane transformation (X, Y, Theta) can be estimated through 2D–2D image comparison, while the out-of-plane rotations (Roll) can be determined by best matching the X-ray image to the set of DRR references.

Sign in to comment.

Accepted Answer

From each detector, I can calculate the in-plane transformations (X, Y, Theta) and the out-of-plane rotations (Roll). ... My question is whether it is possible to compute the full 6D transformation (3 translations and 3 rotations) using the information provided (X, Y, Theta, Roll).
I don't know if it's possible mathematically, but if it is, you might be able to do a quick iterative solve with fsolve or lsqnonlin, rather than seek some closed-form algorithm.
The way I imagine that is sort of like what @Matt J said. Construct a fake 3D object with point markers. Then code a model function which takes a vector of unknown 6D pose parameters [p1,p2,...,p6] as input and maps it to the 4-tuple (X, Y, Theta, Roll) expected from each of the detectors:
function F = forwardModel(p)
    movedPoints3D = applyPose(Points3D, p);           % move the points in 3D
    movedPoints2D = projectToDetector(movedPoints3D); % project the points
    [X1,Y1,Theta1,Roll1] = getInplaneMotion(movedPoints2D,___); % use your algorithm for detector 1
    [X2,Y2,Theta2,Roll2] = getInplaneMotion(movedPoints2D,__);  % use your algorithm for detector 2
    F = [X1,Y1,Theta1,Roll1, X2,Y2,Theta2,Roll2];
end
This forwardModel() should be pretty fast since it just works with points, rather than full images. Now you solve for p with something like,
p = lsqnonlin(@(p) forwardModel(p) - Fmeas, p0); % p0 is the required initial guess
where Fmeas contains your prior calculations of [X1,Y1,Theta1,Roll1, X2,Y2,Theta2,Roll2] from the actual measured X-ray images.
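If it helps, here is a minimal sketch of what the hypothetical applyPose helper above could look like (an assumption, not your actual code: p = [tx ty tz roll pitch yaw] in mm/degrees, with a ZYX Euler rotation about the marker centroid — adapt to your own convention):

function movedPoints = applyPose(points, p)
% points: N-by-3 marker coordinates; p = [tx ty tz roll pitch yaw]
% Hypothetical helper: rotate about the centroid (ZYX order), then translate.
a  = deg2rad(p(4:6));
Rx = [1 0 0; 0 cos(a(1)) -sin(a(1)); 0 sin(a(1)) cos(a(1))];
Ry = [cos(a(2)) 0 sin(a(2)); 0 1 0; -sin(a(2)) 0 cos(a(2))];
Rz = [cos(a(3)) -sin(a(3)) 0; sin(a(3)) cos(a(3)) 0; 0 0 1];
c  = mean(points,1);
movedPoints = (points - c)*(Rz*Ry*Rx).' + c + p(1:3);
end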

20 comments

Thank you for your response.
Let me clarify that I found the relationship between the 3D rotations (roll(X), pitch(Y), yaw(Z)) and the 3D translations (LR, AP, SI), according to the output of two orthogonal detectors. (Please see "reconstructPose" function).
To test the model accuracy, I considered 17 tests (lines 325-391), of which 14 passed successfully; however, I encountered issues in tests 8 and 9, where I was unable to estimate the true direction of the AP.
To further investigate, I performed an additional 16 tests to examine the impact of both roll(X) and pitch(Y) on the translations [LR, AP, SI] (see pages 69-84 of Text3c.docx). The findings indicate that, at times, the algorithm converges to local minima, notably in cases 2, 4, 14, and 16.
Now I am encountering a local-minimum problem (in estimating the 3D translation), and I would greatly appreciate any insights or suggestions regarding this code.
I would like to mention two important points. First, I will post my code along with a detailed report (Text3c.docx). I believe this will help convey my thoughts effectively. Second, I conducted another 100 random sample tests and calculated the error between the true 6D values and the estimated 6D values. I found that the 3D rotation error is consistently under 1 mm. However, the 3D translation sometimes gets stuck in a local minimum, resulting in errors that exceed 1 mm.
Thank you in advance!
Sorry, I cannot understand the document. There is too much material. What exactly is the indication that a local minimum is reached?
If the ground truth rotation/translation is not found, it doesn't necessarily mean the minimum is non-global. It can mean simply that no exact solution to the inverse problem exists. fsolve will give you a least squares solution in that case.
Thank you for your response. Let me provide some details about the code. In lines 190-225 and lines 244-292, I define parameters related to the simulation and detector features. Following that, in line 231, I read the CT volume (CT_13.nii.gz).
My main goal is to perform 2D-3D image registration, so I need to generate X-ray projections from the CT. In lines 295-311, I create two X-ray projections in the AP and LAT directions. The core idea of my approach is in line 315, where I call the function "generateAccurateDRRSets." In this function, I rotate the CT volume (from -5 to 5 degrees) around the Z and Y axes and create two sets of DRRs (DRR_set_A and DRR_set_B). This information will assist me in estimating in-plane rotation, which I will discuss next (Phase 2).
In lines 325-389, I aim to move the CT in 6D, for example:
- `ground_truth.translation = [-2.3, 20, 10];` % Translation in mm [LR, AP, SI]
- `ground_truth.rotation = [1, -1.5, 2];` % Rotation in degrees [roll(X), pitch(Y), yaw(Z)]
After this, I generate a new set of X-ray shifts (line 394). Now, I can use my model "registerOneProjection" to estimate these movements based on the outputs from each detector (result_A and result_B). This function considers three phases:
1. Phase 1 (in-plane translation): Use SSD with multiresolution matching to estimate (X, Y, and Theta).
2. Phase 2 (in-plane rotation): Use optimized pattern intensity to estimate the Roll.
3. Phase 3 (refinement): Use optimized pattern intensity for both in-plane translation and roll.
Finally, I combine the projections (result_A and result_B) and estimate the 6D motion using "reconstructPose."
As I mentioned in my previous post, to test the model's accuracy, I conducted 17 tests (lines 325-391), of which 14 tests passed successfully. However, I encountered issues in tests 8 and 9, where I couldn't estimate the true direction of the AP. (You can test other cases by uncommenting them.) Additionally, I performed an extra 16 tests to examine the impact of both roll (X) and pitch (Y) on the translations [LR, AP, SI] (see pages 69-84 of Text3c.docx). The findings indicate that, at times, the algorithm converges to local minima, particularly in cases 2, 4, 14, and 16.
I tested another 300 different samples, and the results show that the accuracy of the 3D rotation estimation is within 1 mm. However, in some cases, I am facing challenges with estimating the 3D translation. So, I am encountering a local-minimum problem — possibly due to the "coordinateSearchRefinement" function, though I'm not entirely sure.
I would greatly appreciate any insights or suggestions regarding this code.
That is a very long summary of things I believe you already told us previously. I cannot see how it is responsive to my last comment.
You say that "I am facing challenges with estimating the 3D translation. So, I am encountering a local minima problem". But how do you know this is because of local minima? That is what I asked you before. Again, just because you don't get an estimate that is as accurate as you want, that doesn't prove that you are getting stuck in a local minimum. Global minima can be inaccurate as well.
I apologize for my misunderstanding.
In my recent tests with an additional 300 different samples (see attached file), the results indicate that the accuracy of the 3D rotation error is within 1 mm. However, the 3D translation error exceeds 1 mm. I suspect that the code may have a local minimum issue. What are your thoughts on this?
To verify if it is a local minimum, initialize the optimization with the ground truth values and with points close to the ground truth values and see if the iterations converge to ground truth.
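For example (a hedged sketch of that check — obj is assumed to be a handle onto your optimizedPatternIntensity cost over the 6-vector, and p_true the simulated ground truth; neither name is from your code):

p_true = [0 0 0 -2 3 0];              % e.g. Test 8: [LR AP SI roll pitch yaw]
for k = 1:5
    p0    = p_true + 0.1*randn(1,6);  % start very close to ground truth
    p_est = fminsearch(obj, p0);      % or fmincon/lsqnonlin, as in your code
    fprintf('start err %.3f -> final err %.3f\n', ...
            norm(p0 - p_true), norm(p_est - p_true));
end
% If these near-truth starts converge to p_true but your usual initialization
% does not, the usual start is landing in a different (local) basin.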
I conducted this test:
%Test 8:
ground_truth.translation = [0.0, 0.0, 0.0]; % y, x, z % Translation in mm [LR, AP, SI]
ground_truth.rotation = [-2.0, 3.0, 0.0]; % Rotation in degrees [roll(X), pitch(Y), yaw(Z)]
For the initial calculations using the function `multiresolutionMatchingMultiCandidate`, I obtained the following results:
1. When I set X (zero), Y (zero), and theta (estimated from the function), the output was:
- translation_3d = [-2.5665, 2.0605, 10.7125]
- rotation_3d = [-0.0230, 0.0220, 3.0769]
2. When I set X (zero), Y (zero), and theta (zero), the output was:
- translation_3d = [4.3688, -9.4341, 8.3020]
- rotation_3d = [-0.1060, 0.0750, 0.0413]
Additionally, I noticed that after moving my CT volume with a roll of -2 degrees and a pitch of +3 degrees, there is a shift in the X-ray projections (attached figure). Despite no movement in 6D, why do I observe some movement in the Y direction for each projection? Is it normal to see shifts in both the X and Y directions of the detectors when roll and pitch are applied?
@payam samadi I'm afraid I still don't understand the connection between your last comment and the previous ones by me. What is the purpose of setting X,Y, and theta? Aren't they constant input data to the estimation?
If you are implementing my Answer above, then you are iterating over a 6-vector [p1,...,p6] using fsolve. As @Matt J said, if you are wondering about whether these iterations are getting stuck in a local minimum, then the way to test that is to initialize fsolve with values very near to the simulated ground-truth values of [p1,...,p6]
If you are doing something else entirely, not using fsolve or the approach I have described, then I have no advice to give.
First, thanks for your time and response. Second, I have developed a method that does not involve iterating over a 6-vector [p1, ..., p6] using fsolve. The reason for my approach is that it estimates 6D from the outputs of two detectors, eliminating the need for further iteration or fsolve. However, the problem is that in some simulations, my model gets stuck in a local minimum.
Dear Matt,
I have one more question about my previous post.
I mentioned that I noticed that after moving my CT volume with a roll of -2 degrees and a pitch of +3 degrees, there is a shift in the X-ray projections (attached figure). Despite no movement in 3D translation, why do I observe some movement in the Y direction for each projection and small movement in the X direction? Is it normal to see shifts in both the X and Y directions of the detectors when only roll and pitch are applied?
Second, I have developed a method that does not involve iterating over a 6-vector [p1, ..., p6] using fsolve.
If you are not using iterative minimization, it is not clear what you mean when you say you are getting stuck in a local minimum. You need to be minimizing something in order for that to happen.
Is it normal to see shifts in both the X and Y directions of the detectors when only roll and pitch are applied?
Nothing in your posted images looks abnormal.
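For intuition, a toy computation (made-up point coordinates, ideal orthographic projections) shows why pure roll/pitch still shifts both detector images: any point off the rotation axis moves in all three world coordinates, so its projections move too.

P  = [30; 40; 50];                      % a point off-isocenter, mm (made up)
ar = deg2rad(-2);  ap = deg2rad(3);     % roll about X, pitch about Y
Rx = [1 0 0; 0 cos(ar) -sin(ar); 0 sin(ar) cos(ar)];
Ry = [cos(ap) 0 sin(ap); 0 1 0; -sin(ap) 0 cos(ap)];
Pm = Ry*Rx*P;                           % rotated point, no 3D translation
% Ideal AP detector sees (x,z); LAT detector sees (y,z) -- both change:
disp([P([1 3]).'; Pm([1 3]).'])         % AP projection before/after
disp([P([2 3]).'; Pm([2 3]).'])         % LAT projection before/after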
Thank you for your reply. In my phase 3, I utilized iterative refinement, specifically a function called "coordinateSearchRefinement". I developed my own objective function to minimize, "optimizedPatternIntensity". Additionally, I switched from "coordinateSearchRefinement" to gradient-based methods such as fminunc, fminsearch, fmincon, and lsqnonlin. However, I still encountered cases where the process got stuck in local minima.
I still encountered cases where the process got stuck in local minima.
Well, you think so, but you still haven't shown us proof of that, as far as I can tell. To prove that, you need to demonstrate that your minimization of optimizedPatternIntensity reaches a better solution when a different initial point is used.
To prove that, you need to demonstrate that your minimization of optimizedPatternIntensity reaches a better solution when a different initial point is used.
To prove this, you can test 17 different samples (Previous attached New.zip file, lines 325-391), and you will see that all tests passed except for samples 8 and 9. In my opinion, my code may be getting stuck at a local minimum.
I also tested this by changing the "coordinateSearchRefinement" function to use gradient-based methods, such as fmincon. I was able to estimate my Test 8 (lines 354-355); the final estimation is accurate. However, when I tested the other samples, some of them failed. So I didn't see significant differences between "coordinateSearchRefinement" and the gradient-based methods. Therefore, I believe the issue may be related to local minima, or maybe you have another opinion.
To prove this, you can test 17 different samples (Previous attached New.zip file, lines 325-391), and you will see that all tests passed except for samples 8 and 9. In my opinion, my code may be getting stuck at a local minimum.
I may be missing something, but I don't see what that proves. How was that test initialized? The only thing it seems to demonstrate is that you don't get an accurate result.
I may be missing something, but I don't see what that proves. How was that test initialized? The only thing it seems to demonstrate is that you don't get an accurate result.
The initial values for (x, y, theta) in the translation estimation are derived using the "multiresolutionMatchingMultiCandidate" function. This function employs the "evaluateSearchGridSSD" function to estimate these initial values, which in turn calls the "sumSquaredDifference" function.
For the initial value of (roll) in the out-of-plane estimation, I rely on the "estimateRoll" function. This function calls the "optimizedPatternIntensity" function to obtain the initial value. Note that the initial value from phase 1 (from "multiresolutionMatchingMultiCandidate") is used to estimate the initial value of roll.
You can find this information in phases 1 and 2, specifically on lines 833 and 840.
But the estimation of (x, y, theta, roll) is not what we are talking about. It is the 6DOF pose of the CT subject that you are trying to estimate, given (x, y, theta, roll), which are assumed known. That's what your original post says you are trying to do.
The issues we are discussing involve two different matters. I am trying to explain that my code is getting stuck in a local minimum during some tests, while you are suggesting that the initial values for these iterations might be incorrect.
As I explained before, my code does not require any initial values; it computes them through a series of phases.
Additionally, I don't need to perform more iterations to map the (x, y, theta, roll) to the 6D position. Instead, I have written a function called "reconstructPose," which relates to the geometry of the mapping output from each detector to the 6D position.
Quoting your original post: "My question is whether it is possible to compute the full 6D transformation (3 translations and 3 rotations) using the information provided (X, Y, Theta, Roll)."
But now you say, "I don't need to perform more iterations to map the (x, y, theta, roll) to the 6D position. I have written a function called "reconstructPose," which relates to the geometry of the mapping output from each detector to the 6D position."
If you already have a reconstructPose function which does the pose calculation to your satisfaction, is your original question still alive?
If you already have a reconstructPose function which does the pose calculation to your satisfaction, is your original question still alive?
No.
First, I apologize for any confusion caused by my previous question. At that time (posting this question), I was trying to understand the relationship between the outputs of the orthogonal detectors (X, Y, Theta, Roll) and a full 6D transformation (three translations and three rotations).
Second, after further analysis, I developed a function called "reconstructPose", which models the geometric mapping between the outputs of each detector and the corresponding 6D pose.
Third, the attached code (new.zip) is functional and produces correct results in most cases; however, in some scenarios (for example, test 8), the accuracy is not satisfactory (I mentioned my problem, which is getting stuck in a local minimum, of course, in my opinion).

Sign in to comment.

More Answers (1)

Matt J on 9 Jan 2026
Edited: Matt J on 9 Jan 2026
Speaking for myself, the provided Fig1.png does not help me understand how a 6DOF 3D pose maps to a 4DOF projected pose (X, Y, Theta, Roll). In theory, it is possible to get the 6DOF pose from a single projection view (traditional 3D-2D registration algorithms do this all the time), so I don't know why the algorithm can't do better than the 4 parameters (X, Y, Theta, Roll) per projection.
However, here is an idea regardless:
Once you have the (X, Y, Theta,Roll) quadruplet for each projection, you can simulate the projection of a 3D arrangement of fiducials. In other words, start with a hypothetical fiducial phantom whose known 3D fiducial locations are Pstart. Then apply a 3D pose change to Pstart that matches the specific 4 parameters (X, Y, Theta, Roll) seen by one of the detectors. Example, if Theta=45 deg in-plane, then rotate Pstart in 3D by 45 deg about the detector plane axis. Then, project the rotated 3D fiducials onto the detector plane to obtain projected coordinates, pa.
Do this again for the second detector to obtain pb
Once you have pa and pb, you can use the triangulate command in the Computer Vision Toolbox to triangulate the 3D world coordinates of the fiducials that represent what both projection views see simultaneously as a result of the motion. Call this Pmoved. Once you have Pmoved, you can register it to Pstart with 3D-3D point registration techniques to determine the final 6DOF pose transformation.
NOTE: One way to implement the final 3D-to-3D point registration step is with absor, downloadable from,
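As a hedged sketch of those last two steps (assumptions: pa/pb are M-by-2 projected fiducial coordinates; camMatrixA/camMatrixB are the camera projection matrices of the two views in the format your CV Toolbox release expects; absor follows its File Exchange interface of 3-by-N point lists returning regParams.R and regParams.t):

% Triangulate what both views see simultaneously (Computer Vision Toolbox):
Pmoved = triangulate(pa, pb, camMatrixA, camMatrixB);  % M-by-3 world coords
% 3D-3D point registration of the start/moved fiducial clouds with absor:
regParams = absor(Pstart.', Pmoved.');   % absor takes 3-by-N point lists
R6 = regParams.R;                        % final 6DOF rotation ...
t6 = regParams.t;                        % ... and translation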

Asked: 8 Jan 2026
Edited: 24 Jan 2026
