International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 1

ISSN 2229-5518

Implementation of Possibilistic Fuzzy C-Means Clustering Algorithm in Matlab

Neelam Kumari, Bhawna Sharma, Dr. Deepti Gaur
Dept.of Computer Science &IT,ITMUniversity,Gurgaon, INDIA. singh.neelam693@gmail.com bhawnash.6@gmail.com deepti_k_Gaur@yahoo.com

Abstract—Clustering means classifying the given observation data sets into subgroups or clusters. It is a process of grouping data objects into disjointed clusters so that the data in the same cluster are similar, yet data belonging to different clusters are different. Different fuzzy data clustering algorithms exist such as Fuzzy C- Means( FCM), Possibilistic C-Means(PCM), Fuzzy Possibilistic C-Means(FPCM) and Possibilistic Fuzzy C- Means(PFCM). In this paper we present the implementation of PFCM algorithm in Matlab and we test the algorithm on two different data sets.

Index Terms— Data clustering , Clustering algorithms, K-Means, FCM, PCM, FPCM, PFCM.

——————————  ——————————

I.INTRODUCTION

There are various algorithms of data clustering, every algorithm has its own advantages and disadvantages. Such as K-Means algorithm does not allow overlapping of clusters which is a major disadvantage of this algorithm. The FCM has problems dealing with noise and outliers. The
PCM has the problem of coincident clusters and the FPCM

{ ( )

∑ ∑( ) ‖ ‖

∑ ∑( ) } ( )

has difficulties when the data set is big because the typicality values will be very small. All the apparent problem of FPCM is that it imposes a constraint on the typicality values (sum of the typicalities over all data points to a particular cluster is 1).We relax the constraint on the typicality values but retain the column constraint on the membership values. The PFCM is a good clustering algorithm to perform classification tests because it possesses capabilities to give more importance to typicalities or membership values. PFCM is a hybridization of PCM and FCM that often avoids various problems of PCM,FCM and FPCM.
II. PFCM CLUSTERING ALGORITHM PFCM lead to optimize the following objective function:
subject to the constraints∑
Here a , b , m , and
is the objective function. U is the partition matrix. T is the typicality matrix. V is a vector of cluster centers, X is a set of all data points, x represents a data point, n is the number of data points and c is the number of cluster centers which are
described by s coordinates.‖ ‖ is any norm
used to calculate the distance between cluster
center and data set(also represented by ). Here we are using Euclidean Distance(calculated by using Equation 4).
The constants a and b define the relative importance of fuzzy membership and typicality values in the objective function. Note that, in Eq. 1 has the same meaning of membership as that in FCM. Similarly, has the same interpretation of typicality as in PCM. If we increase the importance (weight) of membership then that necessarily forces us to reduce the importance of typicality by the same amount. Also, we will see later that the optimal typicality values depend on the magnitude of b. So by constraining a b 1, we lose modeling

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 2

ISSN 2229-5518

flexibility.PFCM uses the objective functions of
PCM and FCM given in Eq.2 and Eq.3 respectively.
{ ( )
∑ ∑( ) ‖
‖ } ( )

{ ( ) ∑ ∑( ) ‖ ‖

∑ ∑( ) } ( )

[∑( ) ] ( ) If b , and for all i , then Eq.1 reduces to the FCM optimization problem in Eq.2; while converts it to the usual PCM model in Eq.3. Later, we will see that when b even if we do not set

for all i ,Equation 1 implicitly becomes equivalent to the FCM model. Like FPCM, under the usual conditions placed on c-means optimization problems, we get the first-order necessary conditions for extrema of .

( )

.Equation 7 shows that if we use a high value of b compared to a , then the centroids will be more influenced by the typicality values than the membership values. On the other hand, if we use a higher value of a then the centroids will be more influenced by the membership values. Thus, to reduce the effect of outliers, we should use a higher value for b than a. Similar effects can also be
obtained by controlling the choice of  . For
example, if we use a large value of m and a smaller value for  , then the effect of outliers on the
centroids will be reduced. However, a very large value of m will reduce the effect of memberships on the prototypes and the model will behave more like the PCM model.

III. EXPERIMENTAL ANALYSIS

We run the algorithm for the two data sets. The Matlab code for following six files for Possiblistic Fuzzy C-Means clustering algorithm are as follows:

pfcm.m

function[center,U,T,obj_fcn]=

pfcm(data,cluster_n,options)

(∑ ( )

( )

) ( )

( ) ( )

%pfcmDatasetclusteringusingpossiblistic fuzzy c-
%meansclustering.
[CENTER,U,OBJ_FCN]=FCM(DATA,N_CLUST ER) %findsN_CLUSTER
%number of clusters in the data set DATA.
%DATA is size M-by-N, where M is the number

∑ ( )

(7)

%of data points and N is the number of coordinates
%for each datapoint. The coordinates for each
%cluster center are returned in the rows of the
%matrix CENTER. The membership function
%matrix U contains the grade of membership of
PFCM behaves like FCM as the exponents grow without bound. That is, irrespective of the values of the constants a and b, all c centroids approach the
%each data point in each cluster. The values 0 and
%1 indicate no membership and full membership
%respectively. Grades between 0 and 1 indicate
%that the data point has partial membership in a
overall (grand) mean as

m   and  

%cluster. At each iteration, a possiblistic degree

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 3

ISSN 2229-5518

%same as in pcm T objective function is
%minimized to find the best location for the
%clusters and its values are returned in OBJ_FCN.

%[CENTER,..]=PFCM(DATA,N_CLUSTER,OPTIO

%NS)specifies a vector of options fort he

%clustering process:

% OPTIONS(1): exponent for the matrixUdefault:

%2.0)

% OPTIONS(2): maximum number of % iterations

%(default: 100)

% OPTIONS(3): minimum amount of

% improvement (default: 1e-5)

% OPTIONS(4): info display during

% iteration (default: 1)

% OPTIONS(5): user defind constant a

%(default: 1)

% OPTIONS(6): user defined constant b %should be greater than a (default:4)

% OPTIONS(7): user defined constant%nc

(default:nc=2.0)

% The clustering process stops when the maximum

% number of iterations is reached, or when

%objective function improvement between two

% consecutiveiterations is less than the minimum

% amount specified. Use NaN to select the default

% value.See also pinitf, tinitf, pdistfcm,pstepfcm ifnargin ~= 2 &nargin ~= 3,

error('Too many or too few input arguments!');end data_n = size(data, 1);
in_n = size(data, 2);

% Change the following to set default options default_options = [2;100;1e-;1;1;4;2];

ifnargin == 2,options = default_options;

else

% If "options" is not fully specified, pad it with

% default values.

if length(options) < 7, tmp = default_options;

tmp(1:length(options)) = options;options = tmp;

end

% If some entries of "options" are nan's, replace them

% with defaults.

nan_index = find(isnan(options)==1); options(nan_index) = default_options(nan_index); if options(1) <= 1,
error('The exponent should be greater than 1!');

end end

expo = options(1); max_iter = options(2); min_impro = options(3); display = options(4); a=options(5); b=options(6);
nc =options(7); ni=input('enter value of ni'); obj_fcn = zeros(max_iter, 1);
% Array for objective function center = input('initialize the centre');

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 4

ISSN 2229-5518

U = pinitf(cluster_n, data_n); % Initial fuzzy partition
T = tinitf(cluster_n,data_n); % Initial typicality matrix

for i = 1:max_iter,

[U,T,center, obj_fcn(i)] =
pstepfcm(data,center,U,T,cluster_n, expo,a,b,nc,ni);

if display,

fprintf('Iteration count = %d, obj. fcn = %f\n', i, obj_fcn(i));

end

% check termination condition if i > 1,

if abs(obj_fcn(i) - obj_fcn(i-1)) <min_impro,break;

end,endenditer_n = i; % Actual number of terations obj_fcn(iter_n+1:max_iter) = [];

tinitf.m function T = pinitf(cluster_n, data_n) T = rand(cluster_n, data_n);

col_sum = sum(T); u=col_sum(ones(cluster_n, 1), :); T = T./u;

pinitf.m function U = pinitf(cluster_n, data_n) U = rand(cluster_n, data_n);

col_sum = sum(U); u=col_sum(ones(cluster_n, 1), :); U = U./u;

pstepfcm.m

function [U_new,T_new,center_new,obj_fcn]=

pstepfcm(data,center,U,T,cluster_n, expo,a,b,nc,ni)

% PSTEPFCM One step in possiblisticfuzzy c-mean

% clusteringperforms one iteration of pfcm

% clustering, where DATA: matrix of data to be

% clustered. (Each row is a data point.)U:partition

% matrix. (U(i,j) is the MF value of data j in cluster

% j.) CLUSTER_N: number of clusters.EXPO:

% exponent (> 1) for the partition matrix.U_NEW:

%new partition matrix of fcmT_new: new partition

%matrix of pcmCENTER: center of clusters. (Each

% row is a center.) obj_fcn: objective function for

% partition U and T. Note that the situation of

% "singularity" (one of the data points isexactly the

% same as one of the cluster centers)

% isnotchecked.However, it hardly occurs in

% practice.

mf = U.^expo; % MF matrix after exponential

%modification tf=T.^nc;tfo=(1-T).^nc;

center_new = (a.*mf+b.*tf)*data./((ones(size(data,
2), 1)*sum(a.*mf'+b.*tf'))')
dist = distfcm(center, data)

% fill the distance matrix obj_fcn=sum(sum((dist.^2).*(a.*mf+b.*tf)))+sum(ni

*sum(tfo))

% objective function

tmp = dist.^(-2/(expo-1)); U_new=tmp./(ones(cluster_n,1)*sum(tmp)); tmpt=((b/ni)*dist.^2).^(1/(nc-1));
T_new = 1./(1+tmpt);

pdistfcm.m

function out = pdistfcm(center, data)

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 5

ISSN 2229-5518

%PDISTFCM Distance measure in possiblistic fuzzy

%c-mean clustering.OUT = PDISTFCM(CENTER,

%DATA) calculates the Euclidean distancebetween

%each row in CENTER and each row in DATA, and

%returns a distance matrix OUT of size M by N,

%where M and N are row dimensions of CENTER

%and DATA, respectively, and OUT(I, J) isthe

%distance between CENTER(I,:) and DATA(J,:). out = zeros(size(center, 1), size(data, 1));

% fill the output matrix

if size(center, 2) > 1,

index2 = find(U(2,:) == maxU);
line(dataset1(index1,1),dataset1(index1,2),'marker','*'
,'color','g);
line(dataset1(index2,1),dataset1(index2,2),'marker','*'
,'color','r');

%Plot the cluster centers plot([center([1 2],1)],[center([1

2],2)],'*','color','k')

holdoff;

for k = 1:size(center, 1),

out(k, :) = sqrt(sum(((data-ones(size(data,
1),1)*center(k,:)).^2)'));

end else

for k = 1:size(center, 1),

out(k, :) = abs(center(k)-data)';

end

dataset1.dat(data set-1)

end

%see also

centpfcm.m

The file dataset1.dat stores the whole data set on which PFCM clustering algorithm is to be applied.The result obtained after calling pfcm.m
%pfcm,dataset1,pinitf,tinitf,pdistfcm,pstepfcm. [center,U,T,obj_fcn] = pfcm(dataset1,2); plot(dataset1(:,1), dataset1(:,2),'o');
holdon;
maxU = max(U);

%Find data points with highest grade of membership

%in cluster1

index1 = find(U(1,:) == maxU);

% Find the data points with highest grade of

%membership in cluster2

function(implemented in matlab) for the above
data set-1 and data Set-2 is shown below:

OUTPUT:

Centpfcm center value of ni0.08

initialize the centre[0.07 0.36;0.40 0.99]

Iteration count = 1, obj. fcn = 747.953767

Iteration count = 2, obj. fcn = 115.889242

Iteration count = 3, obj. fcn = 114.865973

Iteration count = 4, obj. fcn = 112.480753

Iteration count = 5, obj. fcn = 106.975976

Iteration count = 6, obj. fcn = 97.427050

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 6

ISSN 2229-5518

Iteration count = 7, obj. fcn = 87.573956

Iteration count = 8, obj. fcn = 82.246424

Iteration count = 9, obj. fcn = 80.609839

Iteration count = 10, obj. fcn = 80.231624

Iteration count = 11, obj. fcn = 80.145442

Iteration count = 12, obj. fcn = 80.123254

Iteration count = 13, obj. fcn = 80.116857

Iteration count = 14, obj. fcn = 80.114881

Iteration count = 15, obj. fcn = 80.114251

Iteration count = 16, obj. fcn = 80.114046

Iteration count = 17, obj. fcn = 80.113980

Iteration count = 18, obj. fcn = 80.113958

Iteration count = 19, obj. fcn = 80.113951

Figure-1

fcmdata.dat(Data Set-2)

	1	2
1	0.21895919	0.71971108
2	0.67929641	0.31389837
3	0.83096535	0.47903993
4	0.053461635	0.77931485
5	0.68677271	0.52219568
6	0.091964891	0.71405761
7	0.65391896	0.16631787
8	0.70119059	0.422224
9	0.91032083	0.31883553
10	0.73608188	0.45760685
11	0.63263857	0.14380738
12	0.7226604	0.3206678
13	0.75335583	0.46373323

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 7

ISSN 2229-5518

loadfcmdata.dat
[center, U,T, obj_fcn] = pfcm(fcmdata, 2);
maxU = max(U);
index1 = find(U(1, :) == maxU); index2 = find(U(2, :) == maxU); line(fcmdata(index1, 1), fcmdata(index1, 2),

'linestyle',...'none','marker', 'o','color','g');

line(fcmdata(index2,1),fcmdata(index2,2),'linestyle',..

.'none','marker', 'x','color','r'); holdon plot(center(1,1),center(1,2),'kx','markersize',15,'Line Width',2) plot(center(2,1),center(2,2),'kx','markersize',15,'Line Width',2)

Centerpfcm

OUTPUT:

enter value of ni0.085
initialize the centre[0.07 0.36;0.40 0.99] Iteration count = 1, obj. fcn = 169.719992
Iteration count = 2, obj. fcn = 30.906949
Iteration count = 3, obj. fcn = 28.911804
Iteration count = 4, obj. fcn = 28.624657
Iteration count = 5, obj. fcn = 30.545798
Iteration count = 6, obj. fcn = 30.565587
Iteration count = 7, obj. fcn = 32.824230
Iteration count = 8, obj. fcn = 32.436485
Iteration count = 9, obj. fcn = 34.899085
Iteration count = 10, obj. fcn = 33.981948
Iteration count = 11, obj. fcn = 36.337822
Iteration count = 12, obj. fcn = 34.968145
Iteration count = 13, obj. fcn = 36.884404
Iteration count = 14, obj. fcn = 35.170103

%See also

Centerpfcm.m

Iteration count = 15, obj. fcn = 36.419926
Iteration count = 16, obj. fcn = 34.483684

%fcmdata,pfcm,pinitf,tinitf,pdistfcm,pstepfcm.

Iteration count=17,obj.fcn=35.006450
Iteration count= 18,obj. fcn=33.015857

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 8

ISSN 2229-5518

Iteration count = 19, obj. fcn = 32.888743
Iteration count = 20, obj. fcn = 31.018486
Iteration count = 21, obj. fcn = 30.451930
Iteration count = 22, obj. fcn = 28.824755
Iteration count = 23, obj. fcn = 28.097759
Iteration count = 24, obj. fcn = 26.787371
Iteration count = 25, obj. fcn = 26.062699
Iteration count = 27, obj. fcn = 24.435272
Iteration count = 28, obj. fcn = 23.771498
Iteration count = 29, obj. fcn = 23.218869
Iteration count = 30, obj. fcn = 22.764694
Iteration count = 31, obj. fcn = 22.360895
Iteration count = 32, obj. fcn = 22.057192
Iteration count = 33, obj. fcn = 21.795550
Iteration count = 34, obj. fcn = 21.606641
Iteration count = 35, obj. fcn = 21.456819
Iteration count = 36, obj. fcn = 21.351975
Iteration count = 37, obj. fcn = 21.275950
Iteration count = 38, obj. fcn = 21.224284
Iteration count = 39, obj. fcn = 21.189212
Iteration count = 40, obj. fcn = 21.165943
Iteration count = 41, obj. fcn = 21.150726
Iteration count = 42, obj. fcn = 21.140799
Iteration count = 43, obj. fcn = 21.134421
Iteration count = 44, obj. fcn = 21.130304
Iteration count = 45, obj. fcn = 21.127677
Iteration count = 46, obj. fcn = 21.125992
Iteration count = 47, obj. fcn = 21.124918
Iteration count = 48, obj. fcn = 21.124232
Iteration count = 49, obj. fcn = 21.123794
Iteration count = 50, obj. fcn = 21.123515
Iteration count = 51, obj. fcn = 21.123337
Iteration count = 52, obj. fcn = 21.123223
Iteration count = 53, obj. fcn = 21.123151
Iteration count = 54, obj. fcn = 21.123104
Iteration count = 55, obj. fcn = 21.123075
Iteration count = 56, obj. fcn = 21.123056
Iteration count = 57, obj. fcn = 21.123044
Iteration count = 58, obj. fcn = 21.123036

Figure-2

IV.RESULT
The result shown in figure-1 and figure-2 is quite good as compare to FCM and PCM. The PFCM algorithm overcome the problem of Overlapping of the coincident clusters.
V.CONCLUSION

International Journal of Scientific & Engineering Research, Volume 3, Issue 11, November-2012 9

ISSN 2229-5518

We have implemented the PFCM in Matlab taking all the considerations in mind that about the membership value and typicality.
REFERENCES

[1] Nikhil R. Pal,Kuhu Pal, James M. Keller, and James C. Bezdek,” A Possibilistic Fuzzy c-Means Clustering Algorithm,”IEEE Trans. on Fuzzy Systems, vol. 13, no. 4, pp. 517-530, Aug. 2005.

[2] Jian Cui, Qiang Li, Jun Wang and Da-Wei Zong,” Research on Selection Method of the Optimal Weighting Exponent and ClusteringNumber in Fuzzy C-Means Algorithm ,” International Conference on Intelligent Computation Technology and Automation, pp. 104-107, 2010.

[3] Jiang-She Zhang and Yiu-Wing Leung,” Improved Possibilistic C-Means ClusteringAlgorithms,” IEEE Trans. on Fuzzy Systems, vol. 12, no. 2, pp. 209-217, April 2004.

[4] S. Nefti and M. Oussalah,” Probabilistic-Fuzzy Clustering Algorithm,” IEEE lntemational Conference on Systems, Man and Cybemetics, pp. 4786-4791, 2004.

[5] Jiabin Deng, JuanLi Hu, Hehua Chi and Juebo Wu,” An Improved Fuzzy Clustering Method for Text Mining,” Second International Conference on Networks Security, Wireless Communications and Trusted Computing, pp. 65-69, 2010.

[6] Maryam hajiee,” A New Distributed Clustering Algorithm Based on K-means Algorithm,” 3rd International Conforence on Advanced Computer Theory and Engineering, pp. 408-411, 2010.

[7] Jiang Haiyan and DU Min,” Research on the Detection of Gold Immunochromatographic Assay by the Image Histogram Feature Vectors and Fuzzy C-means,”EightInternational Conference on Fuzzy Systems and Knowledge Discovery, pp. 467-471, 2011.

[8] JialunLin ,Xiaoling Li and Yuan Jiao,” Text Categorization

Research Based on Cluster Idea,” Second International Workshop on

Education Technology and Computer Science, pp.483-486, 2010.