Attempting to find patterns within my data

Question

0 votes

Hello everyone,

I have an idea I'd like to implement, but I don't quite know how.

I have created a script that produces a 154-by-m matrix (each data point I add creates another column). As it stands, the output is just a long list of numbers, and it will become impossible to interpret once I add more data points (giving, say, a 154x100 matrix), so I want to write a program that can analyze the data for me.

It might just be easier for me to demonstrate what I want to do:

Assume I have a 5x3 matrix. What I want to do is find a diagonal pattern that runs through the matrix along values below 2. In this example, if we scour each column and its elements, we can easily find the diagonal line that passes through a value below 2 in every column (I have zeroed out all the other values to demonstrate what I mean).

Now, I don't actually want to zero out my real data (since multiple diagonal lines may exist), but I hope it's clear what I'm trying to do. In the example, I have found the pattern I was looking for. In a 5x3 matrix you can easily see it by eye, but in a 154x100 matrix that becomes impossible.

If it helps, this is the script I am currently using to obtain my data:

predictions = load('Predictions2.txt');    % 924x2: predicted values and their errors
experimental = load('Experimental.txt');   % 600x1: experimental values
x = predictions(:,1);
error = predictions(:,2);   % note: this name shadows MATLAB's built-in error()
y = experimental(:,1);
sizeval = 3;   % in this example I am using 3 data points, so the final matrix has 3 columns
b = zeros(sizeval,154);
d = (1:154);   % only used for plotting, not in any calculation
e = zeros(1,6);
for n = 1:154   % 154 predictions; each is compared against every data point
    for j = 1:sizeval   % loop over the data points, 6 parameters each
        for i = 1:6   % squared, error-normalized deviation for each parameter
            xindex = i + 6*(n-1);
            yindex = i + 6*(j-1);
            e(i) = (x(xindex) - y(yindex))^2 / error(xindex)^2;
            if e(i) > 1000
                e(i) = 0;   % discard very large terms
            end
        end
        % RMSD-like score; the 1/5 factor is kept exactly as in the original script
        b(j,n) = sqrt((1/5)*sum(e));
    end
end
b'   % display the 154-by-sizeval result once, after the loops have finished

With an output like this:

ans =
9481    5.3775    5.1606
4432    3.6738    3.7466
7247    6.6981    6.7029
4045    4.2693    3.9113
3158   10.7013   10.4940
9002    6.2291    5.8123
2395   10.3191   10.1340
6847    9.3292    9.2099
5437    7.5024    7.2936
8558    8.5550    8.3015
6878   11.2286   11.0484
7887    8.6833    8.4203
6863    1.7771    0.9488
4256    4.2317    3.4892
3376    8.3851    8.2385
0820    5.3472    5.0439
3929    1.7875    1.3311
1463    3.4607    2.2643
0488    5.8100    5.6339
    ...

Comments

Sam Mahdi on 2 Aug 2019

Edited: Sam Mahdi on 2 Aug 2019

To ImageAnalyst:

I didn't mean it's not possible to plot/graph. I meant it's impossible to decipher anything from it (I can't find the diagonal patterns I'm looking for just by glancing at it). The predictions and experimental files are just big text files holding a 924x2 matrix and a 600x1 matrix, respectively. Each is just a list of numbers, so I thought it wouldn't be useful to post them.

To Guillaume:

I'm actually looking for diagonals, but I assumed the process for determining either would be the same. And yes, they need to span the entire matrix (i.e. if there are 3 columns, they must go from one end of the matrix to the other, like in the example). However, since the number of columns does not equal the number of rows, there can be multiple antidiagonal (or diagonal) lines that span the columns.

Or, if this is an easier method, I could set a threshold so my output only keeps the values below it (i.e. if output>3, output=0), and then simply search for any diagonal (or antidiagonal) made up of non-zero elements.

Image Analyst on 2 Aug 2019

It's easy to get the 1's in A by doing:

[rows, columns] = find(A);

If you want each separate, contiguous grouping of 1's in A, then you can use bwlabel() and/or regionprops(), depending on exactly what you want. Post your larger matrix in the text files if you want an example.

Image Analyst on 3 Aug 2019

You can certainly threshold:

A = b < someValue; % Produces a logical matrix. Or use > someValue.

Then you can skeletonize the lines/regions down to single pixel wide lines with bwmorph()

A = bwmorph(A, 'skel', inf);
imshow(A);
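
As a minimal sketch of the thresholding step, assuming a small made-up matrix b in place of the real 154-by-m output and a hypothetical cutoff someValue = 2 (neither comes from the original data), the logical mask and find() can be combined to list where the low values sit without changing b:

```matlab
% Small stand-in for the real RMSD matrix (illustrative values only).
b = [8 1 2
     2 4 4
     5 2 1
     6 1 3
     1 1 1];
someValue = 2;                 % hypothetical threshold
A = b < someValue;             % logical mask, true where b is below the threshold
[rows, columns] = find(A);     % row/column subscripts of the true entries
lowValues = b(A);              % the original values at those positions
disp([rows, columns, lowValues]);
```

This part uses only base MATLAB; bwlabel(), regionprops() and bwmorph() additionally need the Image Processing Toolbox.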

Answer 1

the cyclist on 2 Aug 2019

1 vote

% The original data
A =  [
     8     1     2
     2     4     4
     5     2     1
     6     1     3
     1     1     1];
% Get the dimensions of A
[m,n] = size(A);
% Initialize the pattern matrix as all false. Will fill in valid
% antidiagonals as true.
pattern = false(m,n);
% Find the vector of linear indices that span the first possible
% antidiagonal
dvec = n : m-1 : n + (n-1)*(m-1);
% Work down all antidiagonals, and fill in "true" if the pattern is
% matched, updating the linear indices as we go.
for ni = n : m
    pattern(dvec) = all(A(dvec)<2);
    dvec = dvec + 1;
end

Comments

Sam Mahdi on 2 Aug 2019

Edited: Sam Mahdi on 2 Aug 2019

There are some portions of this I don't quite understand.

I assume pattern simply creates a matrix of all zeros in the shape of matrix A.

I don't understand what dvec is doing here. If we just plug in the numbers, we should get 3 4 11 (although I don't understand why MATLAB gives 3 7 11), giving an output of 3 and 10, which, from what I understand, simply creates a 1x2 matrix [3 10]. So I don't understand what's going on here.

Finally, the loop appears to go through the last column and find a diagonal that fits the threshold (pattern creates the matrix of zeros with the dimensions of A, I don't know what dvec does, and the right-hand side of the equation seems to search the entire matrix A for a diagonal of values less than 2). Due to the limitations of the matrix, you can only do 3 loops, so I don't understand why the loop runs over n:m (which would be 3:5) and not just 1:3.

Also, this changes all the matching values of the matrix to ones (which I assume is because false=0 and true=1, so all this does is mark each value as true or false depending on whether it is below the threshold). Is there any way to get the same thing, but with true set to the actual value instead of 1?

the cyclist on 2 Aug 2019

I'll try to answer your questions.

First, you stated "What I want to do is find a diagonal pattern that goes through my matrix at values below 2." To be clear, that is exactly what my sample code does. Did you run it?

The variable pattern is initialized with all zeros (actually logical false), but ends up with a single anti-diagonal of ones (actually logical true), exactly as you asked for.

The variable dvec is a vector of linear indices. This is a method for indexing into an N-dimensional array with one index instead of N subscripts. (See this documentation for details.)

dvec is initialized with the values of the "top-most" anti-diagonal:

the 3rd element down in col 1
the 2nd element down in col 2
the 1st element in col 3

You'll maybe have to trust me here, but the linear indices of those elements are [3,7,11].
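
The [3,7,11] claim can be checked rather than taken on trust. A short sketch, assuming the 5x3 example size from the answer: sub2ind converts (row, column) subscripts to linear indices, and the colon expression start:step:stop used for dvec yields the same numbers. The names rowsTop, colsTop, viaSub2ind and viaColon are made up for this illustration.

```matlab
m = 5; n = 3;                          % size of the example matrix A
rowsTop = [3 2 1];                     % rows of the top-most antidiagonal
colsTop = [1 2 3];                     % matching columns
viaSub2ind = sub2ind([m n], rowsTop, colsTop);
viaColon = n : m-1 : n + (n-1)*(m-1);  % 3:4:11, i.e. start 3, step 4, stop 11
disp(viaSub2ind);
disp(viaColon);
```

MATLAB stores matrices column by column, so moving one column to the right and one row up changes the linear index by m-1, which is the step used in dvec.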

The code checks whether each of those elements meets your stated criterion -- "are these elements all less than 2?" -- and if so, then it fills in pattern accordingly.

Then the loop "moves down", and updates dvec to correspond to the next anti-diagonal:

the 4th element down in col 1
the 3rd element down in col 2
the 2nd element down in col 3

Trust me again, the linear index of those elements is [4,8,12]. And so on.

The algorithm's final iteration is at

the m'th (last) element down in col 1
the (m-1)th element down in col 2
the (m-2)th element down in col 3

In your example, that is the only anti-diagonal that meets the criterion.
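
On the earlier question of keeping the actual values instead of ones: one possible approach (a sketch, not part of the original answer) is to use pattern as a mask on A. The variable name masked is introduced here purely for illustration.

```matlab
A = [8 1 2
     2 4 4
     5 2 1
     6 1 3
     1 1 1];
[m, n] = size(A);
pattern = false(m, n);
dvec = n : m-1 : n + (n-1)*(m-1);      % linear indices of the top-most antidiagonal
for ni = n : m
    pattern(dvec) = all(A(dvec) < 2);  % mark the whole antidiagonal if it qualifies
    dvec = dvec + 1;                   % shift one row down
end
masked = A .* pattern;                 % original values on matching lines, 0 elsewhere
disp(masked);
```

If zero can be a legitimate value in the data, copying A and setting the entries at ~pattern to NaN would avoid the ambiguity.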

Sam Mahdi on 6 Aug 2019

Edited: Sam Mahdi on 6 Aug 2019

Oooh okay, I believe I understand now. So if we take the example of a diagonal:

dvec = 1 : m+1 : 1 + (n-1)*(m+1)

and assume my matrix A is 154x3, what I get is 1:155:311

which then gives [1 156 311], which would correspond to A(1,1), A(2,2) and A(3,3). Adding more columns adds another 155 to the array (i.e. 311+155 gives 466, which would then correspond to A(4,4)), and by adding one on each iteration, you then look at the next row down: A(2,1), A(3,2), etc.
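
The diagonal variant described here can be sketched the same way as the antidiagonal loop. This assumes a made-up 5x3 matrix containing one diagonal of ones and a threshold of 2; the loop bound m-n+1 is the number of full-length diagonals that fit in the matrix.

```matlab
% Made-up data: the only full diagonal below 2 starts at row 2.
A = [9 9 9
     1 9 9
     9 1 9
     9 9 1
     9 9 9];
[m, n] = size(A);
pattern = false(m, n);
dvec = 1 : m+1 : 1 + (n-1)*(m+1);      % top-most diagonal: A(1,1), A(2,2), A(3,3)
for k = 1 : m-n+1
    pattern(dvec) = all(A(dvec) < 2);  % true only if every element on the line is < 2
    dvec = dvec + 1;                   % move the whole diagonal one row down
end
disp(pattern);
```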

One final question then,

pattern(dvec) = all(A(dvec)<2)

I don't quite understand this final piece entirely. I'll start with the righthand side first.

A(dvec)<2 is looking at the diagonal values, and filtering if they are less than 2.

I don't exactly understand what "all" is doing here (since based on the way dvec is set up and loop, you will look at every single value)

pattern(dvec) is simply looking at the diagonal, using the matrix pattern (which is a matrix of size A of all zeros).

I don't understand where we are saying "if the dvec values in matrix A are less than 2, then true".
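
A small sketch may help separate the two halves of that line, using the 5x3 example matrix from the answer. all() collapses the element-wise comparison into a single true/false for the whole antidiagonal, and assigning that scalar to pattern(dvec) writes it to every indexed position (scalar expansion). The cutoff of 3 in the second half is used only to show a line where some, but not all, elements pass.

```matlab
A = [8 1 2; 2 4 4; 5 2 1; 6 1 3; 1 1 1];
pattern = false(size(A));

dvec = [5 9 13];                 % last antidiagonal: values 1, 1, 1
perElement = A(dvec) < 2;        % one logical per element on the line
wholeLine = all(perElement);     % single logical: does the entire line qualify?
pattern(dvec) = wholeLine;       % the scalar is written to all three positions

dvec = [4 8 12];                 % previous antidiagonal: values 6, 2, 4
partial = A(dvec) < 3;           % only the middle element passes
pattern(dvec) = all(partial);    % all() is false, so nothing is marked
disp(pattern);
```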

the cyclist on 7 Aug 2019

What Guillaume said is all true.

"How do I learn a programming language (or programming in general) really well?" is a huge topic. In the case of MATLAB there are very good beginner-level materials out there, e.g. the MATLAB Onramp.

Things that I think help a person come up to speed more quickly:

Having real-world problems that one is trying to solve. In my experience, nothing motivates one to learn more than the need for a solution.
Trying to understand the core concepts of the language. For example, understanding the power of vectorization is key to using MATLAB well.
Not just blindly copying & pasting code (from here, Stack Overflow, etc), but instead trying to really understand what the algorithms are doing. [You seem to be trying that!] Remembering those techniques, for next time, helps you build up that "bag of tricks" for similar problems.
Really really trying hard to solve problems yourself before asking for help. In my experience, I remember better when I figured it out for myself. (There is of course a balance here, between the value of figuring it out, and the frustration of pounding your head against a wall.)

In the end, it really is the experience of doing, over and over again, that builds that expertise.

Sam Mahdi on 7 Aug 2019

To Guillaume:

No, sorry, I was trying to understand the cyclist's code first before I moved on to yours.

But thank you guys for your help and feedback. I'm currently in a machine learning class that uses MATLAB, so I'm sort of learning linear algebra and all the things you can do with matrices and vectors/arrays as I go, as well as trying to apply it to what I'm doing (like my job above).

Answer 2

Guillaume on 2 Aug 2019

Edited: Guillaume on 2 Aug 2019

0 votes

%demo data:
A = logical([
1 0
0 1
0 1
1 0
0 1
1 0
1 1])

Finding the start index (in the first column) of diagonals:

indices = hankel(1:size(A, 1)+1-size(A, 2), size(A, 1)+1-size(A, 2):size(A, 1)) + (0:size(A, 2)-1) * size(A, 1);
isdiago = all(A(indices), 2);
diag_idx = indices(isdiago)

Finding the start index in the first column of antidiagonals:

indices = toeplitz(size(A, 2):size(A, 1), size(A, 2):-1:1) + (0:size(A, 2)-1) * size(A, 1)
isantidiag = all(A(indices), 2);
antidiag_idx = indices(isantidiag)

If you want the indices in all the columns, just repmat the isdiago, isantidiag across all columns of the respective indices.
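
To illustrate the repmat suggestion, here is a sketch on the same demo data for the diagonal case. It assumes MATLAB R2016b or later, since the + step relies on implicit expansion; the names allIdx and mask are made up for this example.

```matlab
A = logical([1 0; 0 1; 0 1; 1 0; 0 1; 1 0; 1 1]);
[m, n] = size(A);
% One row per candidate diagonal, one column of linear indices per matrix column.
indices = hankel(1:m+1-n, m+1-n:m) + (0:n-1) * m;
isdiago = all(A(indices), 2);              % true where the whole diagonal is set
diag_idx = indices(isdiago);               % start indices in the first column
allIdx = indices(repmat(isdiago, 1, n));   % indices of every element on those diagonals
mask = false(m, n);
mask(allIdx) = true;                       % logical map of the diagonals that were found
disp(mask);
```

The antidiagonal case would follow the same pattern with the toeplitz-based indices and isantidiag.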
