Attempting to find patterns within my data

Hello everyone,
I have an idea I'd like to implement, but I don't quite know how to.
I have created a script that builds a 154xm matrix (the more data points I add, the more columns are created). However, as it stands I just have a long list of numbers, and it would be impossible to interpret this data once I add more data points (giving a 154x100 matrix), so I want to write a program that can analyze the data for me.
It might just be easier for me to demonstrate what I want to do:
A =
8 1 2
2 4 4
5 2 1
6 1 3
1 1 1
Assume I have a 5x3 matrix. What I want to do is find a diagonal pattern that goes through my matrix at values below 2. In this example, if we scour each column and the elements in it, we can easily find the diagonal line that runs through the values below 2 in each column (I have zeroed out everything else and marked the pattern with 1s to demonstrate what I mean):
A =
0 0 0
0 0 0
0 0 1
0 1 0
1 0 0
Now I don't actually want to zero out my actual data (since multiple diagonal lines may exist), but I hope it's clear what I'm trying to do: this is the pattern I am looking for in my data. In a 5x3 matrix you can easily see it just by looking, but in a 154x100 matrix this becomes impossible to visualize.
If it helps, this is the script I am currently using to obtain my data:
predictions = load('Predictions2.txt');
experimental = load('Experimental.txt');
x = predictions(:,1);
err = predictions(:,2);   % renamed from 'error', which shadows MATLAB's built-in error() function
y = experimental(:,1);
z = zeros(1,6);
sizeval = 3; % in this example I am using 3 data points, so I will have 3 columns in my final matrix
b = zeros(sizeval,154);
d = (1:154); % this is simply for plotting purposes and is not used in any calculations
e = zeros(1,6);
for n = 1:154 % there are 154 predictions, so I am determining the RMSD of 1 data point (using 6 different parameters) against each prediction
    for j = 1:sizeval % each data point has 6 parameters; this loop calculates RMSDs for multiple data points
        for i = 1:6 % I am taking the RMSD between the prediction and experimental values
            xindex = i + 6*(n-1);
            yindex = i + 6*(j-1);
            z(i) = (x(xindex) - y(yindex))^2;
            e(1,i) = z(i) / err(xindex)^2;
            if e(1,i) > 1000
                e(1,i) = 0;
            end
        end
        % computed once per (data point, prediction) pair, after all 6 terms are filled in
        b(j,n) = sqrt((1/5)*sum(e,2)); % this is the output of my data, creating a 154xm (m being data points) matrix
    end
end
b'   % display once after the loops, instead of on every iteration of n
With an output like this:
ans =
3.9481 5.3775 5.1606
4.4432 3.6738 3.7466
2.7247 6.6981 6.7029
5.4045 4.2693 3.9113
1.3158 10.7013 10.4940
7.9002 6.2291 5.8123
2.2395 10.3191 10.1340
2.6847 9.3292 9.2099
7.5437 7.5024 7.2936
5.8558 8.5550 8.3015
1.6878 11.2286 11.0484
6.7887 8.6833 8.4203
12.6863 1.7771 0.9488
13.4256 4.2317 3.4892
2.3376 8.3851 8.2385
5.0820 5.3472 5.0439
10.3929 1.7875 1.3311
4.1463 3.4607 2.2643
6.0488 5.8100 5.6339
...

5 comments

Image Analyst on 2 Aug 2019
Why is a 154x100 matrix impossible to visualize? Simply use imshow(). It's a pretty tiny matrix so you might want to use the 'InitialMagnification' option with imshow().
Unfortunately you forgot to attach 'Predictions2.txt' and 'Experimental.txt'. Waiting for you to attach them...
Guillaume on 2 Aug 2019 (edited 2 Aug 2019)
In your example, you're looking at antidiagonal not diagonal. Are you just looking for antidiagonal, diagonal, or both?
Do the (anti)diagonals you want have to span the width of the whole matrix?
Sam Mahdi on 2 Aug 2019 (edited 2 Aug 2019)
To Image Analyst:
I didn't mean it's not possible to plot/graph. I meant it's impossible to decipher anything from it (I can't find the diagonal patterns I'm looking for just by glancing at it). The predictions and experimental files are just big txt files of a 924x2 matrix and a 600x1 matrix, respectively. It's just a list of numbers, so I thought it wouldn't be useful to post.
To Guillaume:
I'm actually looking for diagonals, but I assumed the process for determining either would be the same. And yes, they need to span the entire matrix (i.e. if there are 3 columns, they must go from one end of the matrix to the other, like in the example). However, since the number of columns does not equal the number of rows, there can be multiple antidiagonal (or diagonal) lines that span the columns:
A =
0 0 1
0 1 0
1 0 1
0 1 0
1 0 0
Or, if it's easier, I could set a threshold so my output only keeps the values below it (i.e. if output>3, output=0), and then simply search for any diagonal (or antidiagonal) of non-zero elements.
It's easy to get the 1's in A by doing:
[rows, columns] = find(A);
If you want each separate, contiguous grouping of 1's in A, then you can use bwlabel() and/or regionprops(), depending on exactly what you want. Post your larger matrix in the text files if you want an example.
You can certainly threshold:
A = b < someValue; % Produces a logical matrix. Or use > someValue.
Then you can skeletonize the lines/regions down to single-pixel-wide lines with bwmorph():
A = bwmorph(A, 'skel', inf);
imshow(A);
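A minimal sketch of the threshold-then-find step described above, using the 5x3 example from the question and its cut-off of 2. It sticks to core MATLAB; the bwmorph() skeletonization needs the Image Processing Toolbox and is left out here. Note that find() only lists the qualifying elements; by itself it does not check that they line up diagonally.

```matlab
% 5x3 example matrix from the question
A = [8 1 2
     2 4 4
     5 2 1
     6 1 3
     1 1 1];

threshold = 2;                 % the question's cut-off value
mask = A < threshold;          % logical matrix: true where A is below 2

% Row/column subscripts of every element below the threshold,
% listed in column-major order (down column 1, then column 2, ...)
[rows, columns] = find(mask);
disp([rows, columns])
```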


Accepted Answer

the cyclist on 2 Aug 2019
% The original data
A = [
8 1 2
2 4 4
5 2 1
6 1 3
1 1 1];
% Get the dimensions of A
[m,n] = size(A);
% Initialize the pattern matrix as all false. Will fill in valid
% antidiagonals as true.
pattern = false(m,n);
% Find the vector of linear indices that span the first possible
% antidiagonal
dvec = n : m-1 : n + (n-1)*(m-1);
% Work down all antidiagonals, and fill in "true" if the pattern is
% matched, updating the linear indices as we go.
for ni = n : m
pattern(dvec) = all(A(dvec)<2);
dvec = dvec + 1;
end
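The loop above can be made a little more reusable by evaluating the criterion once as a logical mask and looping a fixed m-n+1 times. This is a sketch, not part of the original answer, and the names `criterion` and `B` are illustrative. Running it on the 5x3 example Sam Mahdi posted in the comments above (criterion: the element equals 1) shows that more than one antidiagonal gets marked when there are more rows than columns.

```matlab
% 5x3 example with two antidiagonals (posted earlier in this thread)
B = [0 0 1
     0 1 0
     1 0 1
     0 1 0
     1 0 0];
criterion = (B == 1);          % any elementwise test works here, e.g. B < 2

[m, n] = size(B);              % assumes m >= n (at least as many rows as columns)
pattern = false(m, n);

% Linear indices of the top antidiagonal: (n,1), (n-1,2), ..., (1,n)
dvec = n : m-1 : n + (n-1)*(m-1);

% There are m-n+1 full-width antidiagonals; shift down one row each time
for k = 1 : m-n+1
    pattern(dvec) = all(criterion(dvec));
    dvec = dvec + 1;
end
disp(pattern)
```

Because no two antidiagonals share an element, writing `false` to a non-matching antidiagonal never erases a match found earlier.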

19 comments

Sam Mahdi on 2 Aug 2019 (edited 2 Aug 2019)
There are some portions of this I don't quite understand.
Pattern I assume is simply creating a matrix of all zeros in the shape of matrix A.
I don't understand what dvec is doing here. If we just plug it in, we should get 3 4 11 (although I don't understand why MATLAB gives 3 7 11), giving an output of 3 and 10. From what I understand, that simply creates a 1x2 matrix that is [3 10]. So I don't understand what's going on here.
Finally, the loop appears to be going through the last column and finding a diagonal that fits the threshold (pattern creates the matrix of zeros with the dimensions of A, I don't know what dvec does, and the right-hand side of the equation seems to search the entire matrix A looking for a diagonal of values less than 2). Due to the limitations of the matrix, you can only do 3 loops, so I don't understand why it is n:m (which would be 3:5) and not just 1:3.
Also, this sets the marked values of your matrix to ones (which I assume is because false=0 and true=1, so all this does is mark the values in your matrix that are below your threshold as either true or false). Is there any way to get the same thing, but set true = the actual value instead of 1?
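One way to keep the actual values instead of 1s is to multiply the data by the logical pattern, or to use the pattern as a logical index. This sketch uses a hypothetical variant of the question's matrix, changed so that the kept values are not all 1, and reuses the accepted answer's loop to build `pattern`. The NaN variant is an optional choice that keeps genuine zeros in the data distinguishable from "not on the pattern".

```matlab
% Hypothetical variant of the question's matrix (values below 2 are not all 1)
A = [8   1   2
     2   4   4
     5   2   0.5
     6   1.5 3
     0.2 1   1];
[m, n] = size(A);
pattern = false(m, n);
dvec = n : m-1 : n + (n-1)*(m-1);      % top antidiagonal
for k = 1 : m-n+1
    pattern(dvec) = all(A(dvec) < 2);
    dvec = dvec + 1;
end

values = A .* pattern;                 % original value on the pattern, 0 elsewhere

% Alternative: NaN everywhere off the pattern
values2 = nan(m, n);
values2(pattern) = A(pattern);
```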
the cyclist on 2 Aug 2019
I'll try to answer your questions.
First, you stated "What I want to do is find a diagonal pattern that goes through my matrix at values below 2." To be clear, that is exactly what my sample code does. Did you run it?
The variable pattern is initialized with all zeros (actually logical false), but ends up with a single anti-diagonal of ones (actually logical true), exactly as you asked for.
The variable dvec is a vector of linear indices. This is a method for indexing into an N-dimensional array with one index instead of N subscripts. (See this documentation for details.)
dvec is initialized with the values of the "top-most" anti-diagonal:
  • the 3rd element down in col 1
  • the 2nd element down in col 2
  • the 1st element in col 3
You'll maybe have to trust me here, but the linear indices of those elements are [3,7,11].
The code checks whether each of those elements meets your stated criterion -- "are these elements all less than 2?" -- and if so, then it fills in pattern accordingly.
Then the loop "moves down", and updates dvec to correspond to the next anti-diagonal:
  • the 4th element down in col 1
  • the 3rd element down in col 2
  • the 2nd element down in col 3
Trust me again, the linear index of those elements is [4,8,12]. And so on.
The algorithm's final iteration is at
  • the m'th (last) element down in col 1
  • the (m-1)th element down in col 2
  • the (m-2)th element down in col 3
In your example, that is the only anti-diagonal that meets the criterion.
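The "trust me" steps can be checked with sub2ind, which converts row/column subscripts into linear indices (for a matrix with m rows, index = row + (col-1)*m). A short check for the 5x3 example, collecting one antidiagonal per row of `allIdx` (a name chosen here for illustration):

```matlab
m = 5; n = 3;                     % size of the example matrix A
cols = 1 : n;                     % one element in each column
topRows = [3 2 1];                % top antidiagonal: (3,1), (2,2), (1,3)

allIdx = zeros(m-n+1, n);
for k = 0 : m-n                   % shift the antidiagonal down k rows
    idx = sub2ind([m n], topRows + k, cols);   % built-in conversion
    byHand = (topRows + k) + (cols - 1) * m;   % same formula written out
    assert(isequal(idx, byHand));
    allIdx(k+1, :) = idx;
    fprintf('shift %d: %s\n', k, mat2str(idx));
end
```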
Sam Mahdi on 2 Aug 2019 (edited 2 Aug 2019)
Thank you for the explanation. The code worked perfectly, actually. The problem is that this determines the antidiagonals; I tried to change it around to determine the diagonals instead, but because I didn't quite understand what dvec was doing (or where those values came from), I was unable to modify it.
I.e., now I want dvec to focus on the 1st element in column 1, the 2nd element in column 2, etc.
the cyclist on 2 Aug 2019 (edited 2 Aug 2019)
I realize from your prior comment that you did not understand the specification of dvec, because you don't understand some basic MATLAB syntax.
This code
x = 1 : 2 : 9
will give the vector
x = [1 3 5 7 9]
It is saying, "start at 1, and go by steps of 2 until you get to 9".
See this documentation for details.
the cyclist on 2 Aug 2019 (edited 2 Aug 2019)
This modification will find the diagonals:
% Find the vector of linear indices that span the first possible
% diagonal
dvec = 1 : m+1 : 1 + (n-1)*(m+1);
% Work down all diagonals, and fill in "true" if the pattern is
% matched, updating the linear indices as we go.
for ni = 1 : m-n+1
pattern(dvec) = all(A(dvec)<2);
dvec = dvec + 1;
end
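A self-contained run of the diagonal version on a small hypothetical matrix, made up for illustration, in which two diagonals satisfy the < 2 criterion. For a 5x3 matrix the first diagonal is dvec = 1:6:13, i.e. [1 7 13], which are the elements (1,1), (2,2) and (3,3).

```matlab
% Hypothetical 5x3 data with two qualifying diagonals
A = [1   5 9
     7   1 4
     0.5 6 1
     8   0 2
     2   9 1];
[m, n] = size(A);
pattern = false(m, n);

% Top diagonal: (1,1), (2,2), ..., (n,n); consecutive elements are m+1 apart
dvec = 1 : m+1 : 1 + (n-1)*(m+1);

for ni = 1 : m-n+1
    pattern(dvec) = all(A(dvec) < 2);
    dvec = dvec + 1;   % next diagonal starts one row lower
end
disp(pattern)
```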
Sam Mahdi on 2 Aug 2019
I understand how it creates a matrix; I just don't understand what dvec's matrix is doing. I.e., dvec= 3 7 11
means dvec= 3 10
I don't understand what a 1x2 matrix of [3 10] does here.
Guillaume on 2 Aug 2019
Note that it can be done without loops (not that there is anything wrong with loops). See other answer.
the cyclist on 2 Aug 2019
You are correct that dvec = [3 7 11], at least for the first iteration. I don't understand why you think this means that dvec is also [3 10]. That is not correct.
You really need to understand how linear indexing works, to be able to understand how my algorithm works. I posted a link to that documentation. I don't think I can explain it any better than that.
I understand that setting something to 1:1:5, makes an array [1 2 3 4 5] (i.e. an array of values from 1 to 5 in increments of 1). Therefore, if you set something to 3:7:11, then you get an array of [3 10].
I.e.
B=3:7:11
B =
3 10
Finally, I also don't understand why it's 3:7:11 and not 3:4:11.
>> dvec = n : m-1 : n + (n-1)*(m-1)
dvec =
3 7 11
>> n
n =
3
>> m-1
ans =
4
>> n + (n-1)*(m-1)
ans =
11
On these facts we seem to agree:
  • The value of n is 3
  • The value of m-1 is 4
  • The value of n + (n-1)*(m-1) is 11
So far, so good.
So, the specification of the variable dvec is
dvec = 3 : 4 : 11
which means "start with 3, increment by steps of 4, until you get to 11". So, that means you get a vector with elements 3,7,11. MATLAB notation for the vector with elements 3,7,11 is
dvec = [3 7 11]
Note that this is DIFFERENT from writing dvec = 3:7:11.
To be very clear on the MATLAB syntax:
[3 7 11] and 3:7:11 are NOT the same!
These are different syntax, and mean different things in MATLAB. Does that help?
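A quick way to see the difference at the command line: the colon form builds a range from start, step and end, while square brackets list the elements literally.

```matlab
a = [3 7 11];     % explicit list: exactly these three values
b = 3 : 4 : 11;   % start 3, step 4, stop at 11        -> 3 7 11
c = 3 : 7 : 11;   % start 3, step 7; 17 would pass 11  -> 3 10

disp(isequal(a, b))   % 1 (true)
disp(isequal(a, c))   % 0 (false): c has only two elements
```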
Sam Mahdi on 6 Aug 2019
Oooh, yes I understand now.
So far from what I understand:
We have created a matrix (pattern) of all zeros (false) the size of matrix A. We then create an array (dvec), which is used to index into matrix A to find every value less than 2, and the result is assigned to another matrix (pattern), filling in the values as true if the values of A along a diagonal are <2.
The only thing I don't understand is, how does an array of [3 7 11] mean "search the first column find if value less than 2, then search the 2nd column but one row down and find if value less than 2, etc.". I understand dvec+1 does the same diagonal search one row down.
The array [3 7 11] isn't really "searching". It specifies the location of the top (anti)diagonal.
Take a look at this 5x3 array.
[1 6 11
2 7 12
3 8 13
4 9 14
5 10 15]
Those numbers, in that array, are the linear indices of the array. The top-left location is "1", then we work our way down the first column, then start again at the top of the next column, and so on.
Now, see where [3 7 11] are? They are the top (anti)diagonal! Those indices are specifying the locations to check your criterion. Are the elements of A at those locations all less than 2?
In the next iteration of the loop, we check [4 8 12]. That's the next (anti)diagonal down. And next we get [5 9 13], which is the bottom, so we are done.
As I have mentioned, you need to understand how this linear indexing works, to understand the algorithm. Here is the link (again) for details.
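The same grid of linear indices can be produced with reshape. As a cross-check that uses a different technique from the loop, the antidiagonals of the grid can be read off with diag() applied to fliplr() of the grid; they come out bottom-to-top, so each is reversed before printing. The name `found` is just for collecting the results.

```matlab
m = 5; n = 3;
L = reshape(1:m*n, m, n)          % linear index of every element, filled column by column

% fliplr reverses the column order, turning antidiagonals into diagonals;
% diag(X, k) with k <= 0 reads the diagonal starting at row 1-k of column 1
found = zeros(m-n+1, n);
for k = 0 : -1 : -(m-n)
    antidiag = diag(fliplr(L), k).';   % returned as a column, transposed to a row
    found(1-k, :) = fliplr(antidiag);  % reversed: [3 7 11], [4 8 12], [5 9 13]
end
disp(found)
```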
Sam Mahdi on 6 Aug 2019 (edited 6 Aug 2019)
Oooh okay, I believe I understand now. So let's take the example of a diagonal.
dvec = 1 : m+1 : 1 + (n-1)*(m+1)
and assume my matrix A is 154x3; what I get is 1:155:311,
which then gives [1 156 311], corresponding to A(1,1), A(2,2) and A(3,3). Adding more columns adds another 155 to the array (i.e. 311+155 gives 466, which would correspond to A(4,4)), and by adding one on each iteration, you then look at the next row down: A(2,1), A(3,2), etc.
One final question then,
pattern(dvec) = all(A(dvec)<2)
I don't quite understand this final piece entirely. I'll start with the righthand side first.
A(dvec)<2 is looking at the diagonal values, and filtering if they are less than 2.
I don't exactly understand what "all" is doing here (since, based on the way dvec and the loop are set up, you will look at every single value).
pattern(dvec) is simply looking at the diagonal, using the matrix pattern (which is a matrix the size of A, all zeros).
I don't understand where we are saying "if the dvec values in matrix A are less than 2, then true".
You are slipping into your other error again, regarding the syntactical difference between
v = xi : dx : xf
and
v = [xi dx xf]
In your example, m = 154 and n = 3. Therefore
dvec = 1 : 155 : 311
Remember that this does NOT mean dvec = [1 155 311]. It means start at 1, increment by 155, until you get to 311. So, actually,
dvec = [1 156 311]
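The 154x3 numbers can be checked directly; ind2sub converts the linear indices back into row/column subscripts. The fourth-column case is included to confirm the "another 155" reasoning from the comment above.

```matlab
m = 154; n = 3;                         % the 154x3 case discussed above
dvec = 1 : m+1 : 1 + (n-1)*(m+1)        % 1:155:311 -> [1 156 311]
[r, c] = ind2sub([m n], dvec)           % rows [1 2 3], cols [1 2 3]: A(1,1), A(2,2), A(3,3)

% With a fourth column the end point moves by another m+1 = 155
n4 = 4;
dvec4 = 1 : m+1 : 1 + (n4-1)*(m+1)      % [1 156 311 466]; 466 is A(4,4)
```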
Regarding your "final question":
When I wrote
pattern(dvec) = all(A(dvec)<2);
that was a shortcut. It might have been clearer if I had instead written
if all(A(dvec)<2)
pattern(dvec) = true;
end
The if statement is checking your criterion: "Are all the values of A on this diagonal less than 2?" If so, then fill in that same diagonal of pattern with "true".
(The loop itself does not look at each value of the diagonal separately. That loop just identifies the entire diagonal vector.)
But note that my shortcut gives the same result, in slightly slicker (and probably slightly quicker) fashion.
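A small sketch showing what all() returns and that the shortcut and the explicit if-block produce the same pattern on the question's 5x3 example. The variable names `shortcut` and `explicit` are illustrative.

```matlab
disp(all([true true true]))    % 1: every element passes
disp(all([true false true]))   % 0: a single failure is enough

A = [8 1 2; 2 4 4; 5 2 1; 6 1 3; 1 1 1];
[m, n] = size(A);
shortcut = false(m, n);
explicit = false(m, n);
dvec = n : m-1 : n + (n-1)*(m-1);
for k = 1 : m-n+1
    shortcut(dvec) = all(A(dvec) < 2);   % one true/false copied to the whole antidiagonal
    if all(A(dvec) < 2)                  % explicit form from the comment above
        explicit(dvec) = true;
    end
    dvec = dvec + 1;
end
disp(isequal(shortcut, explicit))        % 1
```

The two agree because the antidiagonals never overlap, so the shortcut's writes of `false` never erase a `true` set on an earlier iteration.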
Sam Mahdi on 7 Aug 2019
Thank you, now I understand what's going on.
On a side note, I'm just curious how one comes up with these things. It would've personally taken me forever to come up with this code (hell, it took me a few days just to understand it). I'm looking to get into programming/coding, but I find that while I have all these ideas and the logic behind them, I don't quite know how to implement them (i.e. I can come up with the idea of creating a false matrix and then determining which elements on a diagonal are true, but it would've taken me forever to come up with the dvec equation you wrote).
Guillaume on 7 Aug 2019
It all comes down to practice and experience. It's clear that solving your problem boils down to shifting columns against each other. You could have done that with explicit loops over each column and row, worked out the index shift of the columns all at once as the cyclist did, or even computed the whole shift for everything at once, as I've done in my answer. Have you even looked at it?
the cyclist on 7 Aug 2019
What Guillaume said is all true.
"How do I learn a programming language (or programming in general) really well?" is a huge topic. In the case of MATLAB there are very good beginner-level materials out there, e.g. the MATLAB Onramp.
Things that I think help a person come up to speed more quickly:
  • Having real-world problems that one is trying to solve. In my experience, nothing motivates one to learn more than the need for a solution.
  • Trying to understand the core concepts of the language. For example, understanding the power of vectorization is key to using MATLAB well.
  • Not just blindly copying & pasting code (from here, Stack Overflow, etc), but instead trying to really understand what the algorithms are doing. [You seem to be trying that!] Remembering those techniques, for next time, helps you build up that "bag of tricks" for similar problems.
  • Really really trying hard to solve problems yourself before asking for help. In my experience, I remember better when I figured it out for myself. (There is of course a balance here, between the value of figuring it out, and the frustration of pounding your head against a wall.)
In the end, it really is the experience of doing, over and over again, that builds that expertise.
Sam Mahdi on 7 Aug 2019
To Guillaume:
No, sorry, I was trying to understand the cyclist's code first before I moved on to yours.
But thank you both for your help and feedback. I'm currently in a machine learning class that uses MATLAB, so I'm sort of learning linear algebra and all the things you can do with matrices and vectors/arrays as I go, as well as trying to apply it to what I'm doing (like my job above).


More Answers (1)

Guillaume on 2 Aug 2019 (edited 2 Aug 2019)
%demo data:
A = logical([
0 1 0
0 0 1
1 0 1
0 1 0
1 0 1
0 1 0
0 1 1])
Finding the start indices (in the first column) of diagonals:
indices = hankel(1:size(A, 1)+1-size(A, 2), size(A, 1)+1-size(A, 2):size(A, 1)) + (0:size(A, 2)-1) * size(A, 1);
isdiago = all(A(indices), 2);
diag_idx = indices(isdiago)
Finding the start index in the first column of antidiagonals:
indices = toeplitz(size(A, 2):size(A, 1), size(A, 2):-1:1) + (0:size(A, 2)-1) * size(A, 1)
isantidiag = all(A(indices), 2);
antidiag_idx = indices(isantidiag)
If you want the indices in all the columns, just repmat the isdiago, isantidiag across all columns of the respective indices.
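A runnable sketch of the answer above on its own demo data, including the repmat step from the last sentence. The antidiagonal index matrix is renamed `aindices` here (not in the original) so both results can be kept side by side. Adding the row vector `(0:n-1)*m` to a matrix relies on implicit expansion, available in MATLAB R2016b and later.

```matlab
A = logical([0 1 0
             0 0 1
             1 0 1
             0 1 0
             1 0 1
             0 1 0
             0 1 1]);
[m, n] = size(A);

% Each row of 'indices' holds the linear indices of one full-width diagonal:
% hankel supplies the row numbers, (0:n-1)*m adds the column offsets
indices = hankel(1:m+1-n, m+1-n:m) + (0:n-1) * m;
isdiago = all(A(indices), 2);
diag_idx = indices(isdiago)          % start rows in column 1: 3 and 5

% Same idea for antidiagonals, using toeplitz
aindices = toeplitz(n:m, n:-1:1) + (0:n-1) * m;
isantidiag = all(A(aindices), 2);
antidiag_idx = aindices(isantidiag)  % 5

% Mark every element: repmat the row selection across all n columns
pattern = false(m, n);
pattern(indices(repmat(isdiago, 1, n))) = true;
disp(pattern)
```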

Categories: Creating and Concatenating Matrices (MATLAB Help Center and File Exchange)

Asked: 2 Aug 2019

Commented: 7 Aug 2019
