clusterData

version 1.1.0.1 (3.43 KB) by Brett Shoelson
Clusters an MxN array of data into an unspecified number (P) of bins.

Updated 01 Sep 2016

No a priori knowledge of the number of bins, or the distance between bins, is required. This approach relies on the relative difference between (sorted) elements of the data, and works well when the difference between clusters is bigger than the difference between elements within a cluster.
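
The gap-based idea described above can be illustrated with a minimal sketch. This is not the clusterData source: the sample data, the `gapFactor` threshold, and the use of the median gap as a reference are all assumptions made for illustration only.

```matlab
% Sketch only: split a vector wherever the gap between sorted neighbours
% is large relative to the typical gap. Not the clusterData implementation.
x = [1.0 1.2 0.9 5.1 5.3 4.8 9.9 10.2];    % made-up sample data
gapFactor = 3;                              % hypothetical threshold (not clusterData's sensitivity)

[xs, order] = sort(x(:));                   % sort, remembering original positions
gaps = diff(xs);                            % differences between sorted neighbours
isBreak = gaps > gapFactor * median(gaps);  % unusually large gaps separate clusters

labelsSorted = cumsum([1; isBreak]);        % cluster label for each sorted element
labels = zeros(size(labelsSorted));
labels(order) = labelsSorted;               % map labels back to the original order

nClusters = max(labels);                    % number of clusters found
clusters = accumarray(labels, x(:), [nClusters 1], @(v) {v});  % Px1 cell array
```

With these made-up values the within-group gaps are a few tenths while the between-group gaps are several units, so the split points are easy to find without specifying the number of bins in advance.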

SYNTAX:
CLUSTERS = clusterData(DATA);
Operates column-by-column. An optional input allows you to specify the sensitivity of each columnwise clustering. Additional outputs specify the index of the cluster to which each row of data belongs, and the bounds used to separate the clusters.

Each column may have a different interpretation. For instance, an Mx4 array of data may represent x-data in the first column, y- in the second, z- in the third, and t- in the fourth. Returns a Px1 cell array, CLUSTERS, specifying the data points in each of the P clusters detected.

The final clustering utilizes all columns.
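
One way to picture "cluster each column, then combine" is sketched below. It is an assumption about how such a combination could work, not the author's code: each column is labelled independently with a simple gap test (the `gapFactor` threshold is hypothetical), and every distinct combination of column labels is treated as one final cluster.

```matlab
% Sketch only: label each column on its own, then treat each distinct
% combination of column labels as one final cluster.
data = [1.0 10; 1.1 10.2; 1.2 50; 5.0 10.1; 5.1 50.3; 5.2 50.1];  % made-up Mx2 data
gapFactor = 3;                               % hypothetical threshold

[M, N] = size(data);
colLabels = zeros(M, N);
for c = 1:N
    [xs, order] = sort(data(:, c));          % sort this column
    gaps = diff(xs);
    isBreak = gaps > gapFactor * median(gaps);
    colLabels(order, c) = cumsum([1; isBreak]);  % per-column cluster labels
end

% Rows sharing the same label in every column end up in the same cluster.
[~, ~, finalLabels] = unique(colLabels, 'rows');
nClusters = max(finalLabels);
clusters = accumarray(finalLabels, (1:M)', [nClusters 1], @(r) {data(r, :)});
```

Here `clusters` is a Px1 cell array of row groups, mirroring the shape of the CLUSTERS output described above.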

NOTE: This submission incorporates, expands, and replaces my earlier submission ezCluster.

Cite As

Brett Shoelson (2020). clusterData (https://www.mathworks.com/matlabcentral/fileexchange/35014-clusterdata), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (22)

Royi Avital

I think you can generalize it by creating a variation that uses a pairwise distance matrix instead of the data.
Then your method will be similar to a greedy solution for the K-Medoids problem.

@sayanti:
clusterData works on vectors, not on matrices. Well, sort of. If you input a non-vector matrix, it clusters each column, and then clusters based on the columnwise clustering. What do your data represent?
Brett

How can I cluster a dataset that is a 1700x400 matrix?
Shall I run this code directly on my dataset?

Fritz

@nadjoua:
numel(clusters)?

nadjoua

Could you please tell me how I can obtain the number of clusters?
Thanks

tsan toso

Ah, thanks for the catch, Brett. I just got around to running the code.

Hi Tsan,

I didn’t spend a lot of time trying to understand your data, but I did manage to cluster them in less than 1 second, using clusterData. I noticed that your column 2 isn’t fully filled out. I think that’s why you’re seeing the long delay when you include column 2. If you were to exclude the pairs with missing values, it would process a lot faster. (In fact, I’m not sure how I treated missing variables. Maybe as NaNs.)

Let me know if the clustering you get with

[clusters,clusterInds,clusterBounds] = clusterData(Binningbydensity(1:3216,:));

works for you. (Those are the rows without missing column-two values.)

Cheers,
Brett
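
Dropping incomplete rows before clustering, as suggested above, might look like the following sketch. It assumes missing values are stored as NaN (the comment above is itself unsure of this); the sample matrix and variable names are hypothetical.

```matlab
% Sketch only: keep just the rows with no missing values before clustering.
data = [1 2; 3 NaN; 5 6; NaN 8];    % made-up data; NaN marks a missing value (assumption)

hasMissing = any(isnan(data), 2);   % true for rows containing at least one NaN
cleanData = data(~hasMissing, :);   % complete rows only
keptRows = find(~hasMissing);       % original row numbers of the rows kept

% With clusterData on the path, the cleaned array could then be passed in:
% [clusters, clusterInds, clusterBounds] = clusterData(cleanData);
```

Keeping `keptRows` makes it possible to map cluster results back to rows of the original array.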

@tsan: Hi Tsan,
It's difficult to comment without seeing your data, but it sounds like you could just create and analyze a vector of densities. clusterData will spit out the indices for the groupings. (You may need to tweak the sensitivity.)
Cheers,
Brett

tsan toso

Hi Brett,

Great code. I have a question about how I could use it for my purposes:

How would you recommend using the code if I want to bin a sample of data based on its density (number of occurrences / length of edge)? The lengths of the edges are determined by whether adjacent data groups have similar density: groups of similar density are binned together, but if the neighboring bin is 40% more or less dense, it requires another bin.

It seems like what your code is doing is grouping data based on how close they are to each other.

Thanks.

Hoi Wong

Haha. My data set is supposed to give me an array of numbers, but sometimes I got a singleton. That's how I found out. By the way, excellent submission!

Hoi Wong

Xiong

thank you for your submission!

@Hoi,
Hmmm. Well, that's clearly a "bug" in the sense that I could have dealt with that case more gracefully, but then--well, let's just say that I never anticipated that anyone would try to cluster a single scalar. :)

Hoi Wong

It seems like the program gets stuck (running forever) when I try to cluster a singleton, say clusterData(3).
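
Until the function handles this case itself, a caller-side guard is one possible workaround. The sketch below is an assumption about how a caller might sidestep the hang, not a fix to clusterData: it never calls the function for scalar input.

```matlab
% Sketch only: avoid calling clusterData on a single scalar.
data = 3;                           % the singleton case reported above

if isscalar(data)
    clusters = {data};              % a single value is trivially its own cluster
else
    clusters = clusterData(data);   % normal path; requires clusterData on the path
end
```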

Han, did you find some problem with the submission that led you to rate this so poorly? Do you have any comments to share that might help me understand why it merits a two-star rating?
Thanks,
Brett

Han

Deanna

Joel

Excellent submission

Venkat R

Very cool submission. I was searching for different options to find 'k' automatically in k-means. This submission does it nicely.

PLEASE NOTE that this code uses tildes for argument placeholders. As such, it will not work without modification on releases prior to R2009b. Feel free to edit the code, or upgrade to a newer MATLAB!!!
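
For readers on older releases, the kind of edit involved is small. The lines below are a generic illustration using `sort`, not an excerpt from clusterData; `unused` is a hypothetical variable name.

```matlab
x = [3 1 2];

% R2009b and later: the tilde discards the first output.
[~, idx] = sort(x);

% Earlier releases: assign the unwanted output to a throwaway variable instead.
[unused, idxOld] = sort(x);
```

Both calls return the same index vector; only the handling of the ignored output differs.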

Updates

1.1.0.1

Updated license

1.1.0.0

Modified the help to correct a doc bug. Higher sensitivity results in fewer clusters, not more. (No code change.)

MATLAB Release Compatibility
Created with R2011b
Compatible with any release
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired: Data clustering using Bat Algorithm