ECAI 2017 – International Conference – 9th Edition
Electronics, Computers and Artificial Intelligence
29 June – 01 July, 2017, Targoviste, ROMÂNIA
978-1-5090-6458-8/17/$31.00 ©2017 IEEE
Clustering Based Data Mining in Wind Power Production
Florina Scarlatache, Gheorghe Grigoraș and Bogdan Constantin Neagu
Power System Department
“Gheorghe Asachi” Technical University of Iasi
Iasi, Romania
e-mail: [anonimizat]
Abstract – This paper presents a clustering-based data mining method for determining typical wind power profiles and for estimating the share of wind power in the total power required by the electrical power system over one year. The proposed method was tested on a real data set containing information about the power produced in Romania during one year (2016). The results demonstrate that the methodology can be used successfully for pattern discovery in wind power profiles.
Keywords – data mining; clustering; renewable energy; wind power.
I. INTRODUCTION
The Renewable Energy Directive establishes a common policy for the production and promotion of energy from renewable sources in Europe. By 2030, EU countries should meet the requirement of covering 27% of total energy needs from renewable sources. The Directive specifies national renewable energy targets for each country, taking into account its starting point and overall potential for renewable sources. These targets range from a minimum of 10% in Malta to a maximum of 49% in Sweden. By 2030, Romania must produce about 31% of its total energy from renewable energy sources (RES) [1].
The share of wind power in the total installed power capacity increased from 6% in 2005 to 16.7% in 2016, and wind remains first among renewables in the EU. With a total installed capacity of 153.7 GW, wind energy has now overtaken coal as the second largest form of power generation capacity in Europe. By 2030, wind could serve a quarter of the EU’s electricity needs and be the backbone of Europe’s energy system [2].
One of the most important aspects of the wind power share is the operating regime of the wind turbine, which is conditioned by two parameters: the wind speed v and the wind variations. The output power of a wind turbine varies as a function of wind speed, according to Fig. 1. Three operating states can be distinguished for a wind turbine: 1 – cut-in speed, 2 – rated output speed and 3 – cut-out speed [3], [4].
Cut-in speed – The speed at which the wind turbine first starts to rotate and generate power. Typically, this wind speed is between 3 and 4 m/s.
Rated output speed – As the wind speed increases above the cut-in speed, the output power rises rapidly. Typically, between 12 and 17 m/s, the output power reaches its limit, called the rated power output; the wind speed at which this happens is the rated output wind speed.
Cut-out speed – When the wind speed grows beyond the rated output speed, the forces on the turbine structure continue to rise and may damage the turbine. As a solution, a braking system is used to bring the rotor to a standstill. The speed at which this happens is called the cut-out speed and is usually around 25 m/s [5].
If the wind speed v lies between operating states 1 and 2, the wind turbine operates at partial load. Between states 2 and 3, the wind turbine operates at full load. When the wind speed v is lower than the cut-in speed or higher than the cut-out speed, the turbine is at a standstill.
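The mapping from wind speed to load state described above can be sketched as a small function. This is only an illustration: the function name operating_state and the default thresholds (3.5, 12 and 25 m/s) are assumptions picked from the typical ranges quoted in the text, and the treatment of speeds exactly equal to a threshold is also an assumption.

```python
def operating_state(v, cut_in=3.5, rated=12.0, cut_out=25.0):
    """Return the load state of a wind turbine for a wind speed v in m/s.

    The thresholds are illustrative defaults, not values from a specific turbine.
    """
    if v < cut_in or v > cut_out:
        return "standstill"      # below cut-in or above cut-out
    if v < rated:
        return "partial load"    # between operating states 1 and 2
    return "full load"           # between operating states 2 and 3


for speed in (2.0, 8.0, 15.0, 30.0):
    print(speed, operating_state(speed))
```

Real turbines have manufacturer-specific thresholds, so the defaults should be replaced by data-sheet values in any practical use.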
These three states define the P-v characteristic of the wind power plant. Determining the patterns of wind power plants as accurately as possible, for different countries or regions and for each season or year, can be useful when analysing the influence of connecting wind power plants to the electric network.
Figure 1. Typical wind turbine power output as a function of wind speed [5].
Most of the methods proposed in the literature for classification or load profiling address different problems encountered in power systems. Two types of methods appear in the literature: statistical methods [6], [7] and methods based on artificial intelligence techniques, such as fuzzy logic [8], [9], neural networks [10], data mining [11] and clustering techniques [12]-[18].
II. DATA MINING TECHNIQUES IN LARGE DATABASES
Since computers have made it possible to store ever more data, it is only natural to resort to computational techniques to help discover meaningful patterns and structures in massive volumes of data. Under these conditions, it is necessary to automate information (knowledge) discovery in order to assist the human operator [19], [20].
Several definitions are currently used for both data mining (DM) and knowledge discovery in databases (KDD). Although the two are sometimes used as equivalent terms, data mining is one of the steps in the knowledge discovery process [20].
Knowledge discovery, in general, is the nontrivial extraction from a database of information that is implicitly present in the data, previously unknown and potentially useful to the user [21]. KDD refers to the overall process of discovering useful knowledge from data. The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user, Figure 2 [21], [22].
Data selection. In this step, the goal and the tools of the data mining process are chosen: the data to be mined are identified, and appropriate input attributes and output information are chosen to represent the task.
Preprocessing. In real cases, different kinds of problems affect the quality of the database. The most relevant and frequent are communication problems, outages, equipment failures, etc. The result is a very large database with problems such as noise or missing values. Thus, the goal of this step is to eliminate noise from the database and, possibly, to generate specific data sequences in the set of preprocessed data. If necessary, special techniques are used to estimate the missing data.
Figure 2. Steps that compose the KDD process.
Data transformation. The following transformations must be made by a human operator: organizing data in the desired way, converting one type of data to another, defining new attributes, reducing the dimensionality of the data, removing outliers and normalizing. These kinds of transformations of the preprocessed data depend on the task of the human operator.
Data mining. The transformed data are subsequently mined, using one or more techniques to extract patterns of interest. The human operator can significantly aid the data mining method by correctly performing the preceding steps.
Result interpretation/validation. To understand the meaning of the synthesized knowledge and its range of validity, the data mining application tests its robustness using established estimation methods and unseen data from the database.
Incorporation of the discovered knowledge. In this last step, the results are presented to the human operator, who may check for and resolve potential conflicts with previously extracted knowledge and apply the newly discovered patterns.
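As an illustration of the preprocessing step, the sketch below estimates missing samples of a measured series by linear interpolation between the nearest known neighbours. The text does not name the estimation technique, so linear interpolation, the function name fill_missing and the use of None as the missing-value marker are all assumptions made for this example.

```python
def fill_missing(values):
    """Replace None entries by linear interpolation between known neighbours.

    Gaps at the start or end of the series copy the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


print(fill_missing([None, 2.0, None, None, 5.0, None]))
```

Other estimators (for example, averages over similar days) could be substituted without changing the rest of the KDD chain.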
As can be seen, data mining (DM) is an important step in the KDD process. DM involves fitting models to, or determining patterns from, observed data [19].
Many data mining tools are available in the literature; the most important are classification methods, cluster analysis, search for association rules, aggregation and approximation methods, time series analysis, dependency analysis and prediction analysis.
Another classification, based on the machine learning field, divides data mining methods into two groups: supervised and unsupervised learning methods.
In supervised learning methods, a functional model is built for one of the variables (the modelled variable). The model relates the modelled variable to the rest of the variables. Classification and regression methods are usually placed in this category.
In unsupervised learning, some knowledge about the variables is extracted from the database. This knowledge is represented by relationships between variables. The best-known methods in this category are clustering techniques [19]-[23]. The advantage of unsupervised learning methods is that interesting structures can be found directly from the data without any background knowledge.
Traditionally, clustering techniques are divided into hierarchical and partitioning methods [24]-[27]. While hierarchical algorithms build clusters gradually, partitioning algorithms learn clusters directly. Partitioning methods discover clusters by iteratively relocating points between subsets, or identify clusters as zones densely populated with data. These methods are categorized into probabilistic clustering and k-means methods.
III. K-MEANS CLUSTERING ALGORITHM
The k-means algorithm is the most popular clustering tool used in scientific and industrial applications. The name comes from representing each of the K clusters $C_i$ by the mean (or weighted average) $\mathbf{z}_i$ of its points, the so-called centroid. The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid:
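$$\min\{E\} = \min\left\{ \sum_{i=1}^{K} \sum_{\mathbf{x} \in C_i} d^{2}(\mathbf{x}, \mathbf{z}_i) \right\} \tag{1}$$
where $\mathbf{z}_i$ is the center of cluster $C_i$, while $d(\mathbf{x}, \mathbf{z}_i)$ is the Euclidean distance between points $\mathbf{x}$ and $\mathbf{z}_i$.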
The criterion function E measures the distance of each point from the center of the cluster to which the point belongs. More specifically, the algorithm begins by initializing a set of K cluster centers. Then, it assigns each object of the dataset to the cluster whose center is nearest and recomputes the centers. The process continues until the centers of the clusters stop changing.
The steps of the algorithm are the following [24], [28], [29]:
Step 1. Choose K initial cluster centres $\mathbf{z}_1^{(0)}, \mathbf{z}_2^{(0)}, \dots, \mathbf{z}_K^{(0)}$, for instance, at random among the points to be analyzed.
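Step 2. At the k-th iterative step, distribute the samples $\{\mathbf{x}\}$ among the K clusters by using the relation:
$$\mathbf{x} \in C_i^{(k)} \quad \text{if} \quad d(\mathbf{x}, \mathbf{z}_i^{(k)}) < d(\mathbf{x}, \mathbf{z}_j^{(k)}), \quad j = 1, 2, \dots, K; \; j \neq i \tag{2}$$
where $C_i^{(k)}$ denotes the set of samples whose cluster centre is $\mathbf{z}_i^{(k)}$.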
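Step 3. Compute the new cluster centres $\mathbf{z}_i^{(k+1)}$, $i = 1, 2, \dots, K$. The new cluster centre is given by equation (3), where $n_i$ is the number of objects in $C_i^{(k)}$.
$$\mathbf{z}_i^{(k+1)} = \frac{1}{n_i} \sum_{\mathbf{x} \in C_i^{(k)}} \mathbf{x}, \quad i = 1, 2, \dots, K \tag{3}$$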
Step 4. Repeat steps 2 and 3 until convergence is achieved, that is, until a pass through the training sample causes no new assignment. Clearly, the final clusters depend on the choice of the initial cluster centres and on the value of K.
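Steps 1 to 4 can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the implementation used in the paper: the function name kmeans, the seed and max_iter parameters, the tie-breaking rule (the lowest cluster index wins, whereas relation (2) uses a strict inequality) and the handling of empty clusters are all assumptions.

```python
import math
import random


def kmeans(points, K, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into K groups.

    Returns (centres, labels), where labels[n] is the cluster index of points[n].
    """
    rng = random.Random(seed)
    # Step 1: pick K distinct points at random as the initial centres.
    centres = rng.sample(points, K)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest centre (Euclidean distance).
        new_labels = [
            min(range(K), key=lambda i: math.dist(x, centres[i])) for x in points
        ]
        # Step 4: stop once a full pass causes no new assignment.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centre to the mean of the samples assigned to it.
        for i in range(K):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centre (assumption)
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels


# Toy data: two well-separated groups of three points each.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = kmeans(toy, K=2)
print(centres, labels)
```

Because the result depends on the initial centres, practical use usually repeats the run with several seeds and keeps the partition with the smallest value of the criterion E.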
To determine the optimal number of clusters $N_{c,opt}$, the following algorithm can be used [26], [29]:
1. Determine the maximum number of clusters $N_{c,max}$. The maximum number of clusters $N_{c,max}$ should be set to satisfy $2 \le N_{c,max} \le n$, where n is the number of clustered objects in the database.
2. For the set of objects in the database, apply the k-means clustering method with a given $N_c$ ($2 \le N_c \le N_{c,max}$).
3. Evaluate the quality of the partition given by the resulting cluster structure. In this paper, this is achieved through the global silhouette coefficient.
4. Increase the number of clusters $N_c$ up to $N_{c,max}$ to see whether the k-means method finds a better grouping of the data (repeat steps 2 and 3).
5. Report the number of clusters corresponding to the optimal value of the global silhouette coefficient.
Using this approach, each cluster can be represented by a so-called silhouette, which is based on a comparison of its tightness and separation. The silhouette validation technique calculates the silhouette width for each sample, the average silhouette width for each cluster and the overall average silhouette width for the whole database.
The average silhouette width is used to evaluate clustering validity and to decide on the optimal number of clusters.
$$SC = \frac{1}{N_c} \sum_{j=1}^{N_c} S_j \tag{4}$$
where $S_j$ is the local silhouette coefficient, defined as:
$$S_j = \frac{1}{r_j} \sum_{i=1}^{r_j} s_i \tag{5}$$
in which $s_i$ is the silhouette width index for the i-th object:
$$s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}} \tag{6}$$
$r_j$ – the number of objects in cluster j;
$a_i$ – the mean distance between object i and the objects of the same class j;
$b_i$ – the minimum mean distance between object i and the objects in the class closest to class j.
In relation (6), if object i is the only object of a cluster, then $s_i = 0$. The following interpretation of the silhouette coefficient is proposed in [30]: 0.71 – 1.0, a strong structure has been found; 0.51 – 0.70, a reasonable structure has been found; 0.26 – 0.50, the structure is weak and could be artificial; ≤ 0.25, no substantial structure has been found.
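Relations (4) to (6) can be checked on a toy example with the sketch below. It assumes at least two clusters and Euclidean distance; the function names, the toy points and the two candidate labelings are hypothetical. Note that relation (4) averages the per-cluster means $S_j$, which differs from averaging $s_i$ over all objects when the clusters have unequal sizes.

```python
import math


def silhouette_widths(points, labels):
    """Silhouette width s_i of every object, following relation (6)."""
    clusters = sorted(set(labels))
    widths = []
    for i, (x, own) in enumerate(zip(points, labels)):
        # Mean distance from object i to the objects of each cluster (itself excluded).
        mean_dist = {}
        for c in clusters:
            others = [y for j, (y, lab) in enumerate(zip(points, labels))
                      if lab == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(x, y) for y in others) / len(others)
        if own not in mean_dist:  # single-object cluster: s_i = 0
            widths.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        # Guard against coincident points, where a = b = 0 (assumption: s_i = 0).
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def global_silhouette(points, labels):
    """SC of relation (4): the mean of the per-cluster means S_j of relation (5)."""
    s = silhouette_widths(points, labels)
    per_cluster = []
    for c in sorted(set(labels)):
        members = [s_i for s_i, lab in zip(s, labels) if lab == c]
        per_cluster.append(sum(members) / len(members))
    return sum(per_cluster) / len(per_cluster)


toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# Hypothetical partitions for N_c = 2 and N_c = 3 (steps 2 to 4 of the selection loop).
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 2, 1, 1, 1]}
scores = {n_c: global_silhouette(toy, lab) for n_c, lab in candidates.items()}
best = max(scores, key=scores.get)  # step 5: number of clusters with the best SC
print(scores, best)
```

In a full run, the candidate labelings would come from a clustering method applied for each $N_c$ between 2 and $N_{c,max}$ instead of being written by hand.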
IV. CASE STUDY
The data set used for simulation contains information about the power produced in Romania during one year (2016), as recorded by the Romanian Transmission and System Operator (TSO) [31]. Each daily generated power curve is described by 24 × 6 hourly points that depict the behavior of the wind power plants during a day. The daily generated power was normalized using the daily energy as the normalization factor.
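The normalization can be illustrated as follows. The text does not state how the daily energy is computed, so this sketch assumes a rectangle-rule sum of the power samples multiplied by the sampling step; the function name, the step_hours parameter and the sample values are hypothetical.

```python
def normalize_by_daily_energy(power, step_hours=1.0):
    """Divide each power sample of one day by that day's energy.

    power      -- power samples of one day (e.g. in MW)
    step_hours -- time between samples in hours (assumed constant)
    """
    energy = sum(power) * step_hours  # rectangle-rule daily energy (e.g. MWh)
    if energy == 0:
        raise ValueError("daily energy is zero; the profile cannot be normalized")
    return [p / energy for p in power]


# Hypothetical day: no output for 12 hours, then 100 MW for 12 hours.
day = [0.0] * 12 + [100.0] * 12
profile = normalize_by_daily_energy(day)
print(sum(profile))
```

After this normalization, days with the same shape but different production levels coincide, so the clustering groups days by the shape of the profile rather than by its magnitude.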
To determine the characteristic wind power profiles for the year 2016, it is necessary to find the optimal number of patterns. To this end, the study proposed in this paper is based on the k-means algorithm.
First, the maximum number of patterns $N_{c,max}$ must be established. The value of $N_{c,max}$ was calculated with the relation $N_{c,max} = N$, where N represents the total number of wind power characteristics in the data set (N = 366).
Next, the k-means algorithm is applied with values of $N_c$ between 2 and $N_{c,max}$.
In the third step, the grouping quality is evaluated using the global silhouette coefficient (SG), Fig. 3.
It can be observed that the optimum number of patterns is $N_c = 4$. For this value, the silhouettes of the patterns are represented in Fig. 4.
A typical wind power profile is attached to each pattern, Figs. 5-8. The typical wind power profiles were obtained by averaging the hourly values.
In addition, Table I gives, for each pattern, the monthly distribution and the number of operation days of the wind power plants for the year 2016.
Figure 3. Variation of the SG coefficient.
Figure 4. The silhouettes of the patterns (K = 4).
Figure 5. The typical wind power profile for pattern G1.
Figure 6. The typical wind power profile for pattern G2.
Figure 7. The typical wind power profile for pattern G3.
Figure 8. The typical wind power profile for pattern G4.
Analysing the results, it can be remarked that the typical wind power profile corresponding to pattern G4 is the most frequent, covering 55.19% of the days, in comparison with the typical wind power profiles for patterns G1, G2 and G3, with 16.67%, 10.66% and 17.49%, respectively.
Of the total power produced in Romania in 2016, Table II, wind energy had a share of 10.04%, because for most of the year (202 days) the wind turbines were at a standstill. When the wind turbines operated at full load (39 days, especially in the winter season), the wind energy share was around 30%. In the case of partial load (125 days), the wind energy share varied between 10 and 30%.
TABLE I. THE NUMBER OF OPERATION DAYS SPECIFIC TO EACH PATTERN AND THE MONTHLY REPARTITION FOR YEAR 2016
Month        G1      G2      G3      G4      Total
January      7       6       8       10      31
February     10      6       6       7       29
March        8       3       5       15      31
April        4       2       5       19      30
May          5       2       3       21      31
June         3       0       5       22      30
July         3       1       3       24      31
August       3       3       6       19      31
September    2       0       3       25      30
October      4       4       6       17      31
November     4       4       7       15      30
December     8       8       7       8       31
Total        61      39      64      202     366
Total [%]    16.67   10.66   17.49   55.19   100
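The percentages in the last row of Table I and the day counts per load state follow directly from the pattern totals, as the short check below shows. The variable names are illustrative; the assignment of patterns to load states (G1 and G3 partial load, G2 full load, G4 standstill) is the one stated in the Conclusions.

```python
# Number of days per pattern, taken from the "Total" row of Table I.
days = {"G1": 61, "G2": 39, "G3": 64, "G4": 202}

total_days = sum(days.values())
share = {g: round(100 * n / total_days, 2) for g, n in days.items()}

# Grouping of the patterns by turbine load state.
load_state_days = {
    "partial load": days["G1"] + days["G3"],
    "full load": days["G2"],
    "standstill": days["G4"],
}

print(total_days, share, load_state_days)
```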
TABLE II. THE TOTAL POWER PRODUCED IN ROMANIA IN 2016 FROM DIFFERENT TYPES OF SOURCES, IN [GW] AND [%]
Source type     Power produced [GW]    Power produced [%]
Coal            16422.16               25.37
Hydrocarbons    10107.72               15.62
Hydro           18464.06               28.53
Nuclear         11503.03               17.77
Wind            6495.80                10.04
Photovoltaic    1278.39                1.98
Biomass         454.78                 0.70
Total           64725.94               100.00
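The percentage column of Table II can be reproduced from the first column; the sketch below recomputes the total and the share of each source, in particular the 10.04% share of wind. The dictionary name produced is illustrative, and the values are copied from the table.

```python
# Power produced per source type, first data column of Table II.
produced = {
    "Coal": 16422.16,
    "Hydrocarbons": 10107.72,
    "Hydro": 18464.06,
    "Nuclear": 11503.03,
    "Wind": 6495.80,
    "Photovoltaic": 1278.39,
    "Biomass": 454.78,
}

total_produced = sum(produced.values())
source_share = {s: round(100 * v / total_produced, 2) for s, v in produced.items()}

print(round(total_produced, 2), source_share["Wind"])
```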
CONCLUSIONS
In this paper, a k-means clustering algorithm is used to determine the typical wind power profiles from a data set of the wind power generated in Romania during one year.
The resulting typical wind power profiles describe the operating states of the wind turbines very well. Analysing the typical wind power profiles obtained, it can be noticed that for patterns G1 and G3 the wind turbines operate at partial load, for pattern G2 the wind turbines operate at full load, and for pattern G4 the turbines are at a standstill, because the power produced is low.
The k-means clustering method has been successfully used to establish the typical wind power profiles and, at the same time, to estimate the share of wind power in the total power produced in Romania in 2016.
REFERENCES
[1] G. Resch, C. Panzer, A. Ortner, “2030 RES targets for Europe - a brief pre-assessment of feasibility and impacts,” Vienna University of Technology, http://www.keepontrack.eu/contents/publicationsscenarioreport/kot–2030-res-targets-for-europe.pdf, Vienna, January 2014.
[2] “Wind in power. 2016 European statistics,” https://windeurope.org/about-wind/statistics/european/wind-in-power-2016/, February 2017.
[3] Ö.S. Mutlu, “Evaluating The Impacts Of Wind Farms On Power System Operation,” Journal of Naval Science and Engineering, vol. 6, no. 2, pp. 166-185, 2010.
[4] H. Holttinen, J. Kiviluoma, A. Forcione, M. Milligan, C.J. Smith, J. Dillon, M. O'Malley, J. Dobschinski, S. van Roon, N. Cutululis, A. Orths, P. Eriksen, E.M. Carlini, A. Estanqueiro, R. Bessa, L. Söder, H. Farahmand, J.R. Torres, B. Jianhua, J. Kondoh, I. Pineda and G. Strbac, “Design and operation of power systems with large amounts of wind power,” Final summary report, IEA WIND Task 25, http://www.vtt.fi/inf/pdf/technology/2016/T268.pdf, 2016.
[5] http://www.wind-powerprogram.com/turbine_characteristics.htm, online access May 2017.
[6] L. Chuan, A. Ukil, “Modeling and Validation of Electrical Load Profiling in Residential Buildings in Singapore,” http://www.ntu.edu.sg/home/aukil/papers/journal/2014_Ukil_Load-Profiling_preprint.pdf, 2014.
[7] Z. Lubosny, Wind Turbine Operation in Electric Power Systems. Berlin: Springer-Verlag, 2003.
[8] Gh. Grigoras, Gh. Cartina, “Improved Fuzzy Load Models by Clustering Techniques in Distribution Network Control,” International Journal on Electrical Engineering and Informatics, vol. 3, no. 2, pp. 207–216, 2011.
[9] D. Gerbec, S. Gašperič, I. Šmon, F. Gubina, “Determining the load profiles of consumers based on fuzzy logic and probability neural networks,” in IEE Proc. Generation, Transmission and Distribution, vol. 151, pp. 395–400, 2004.
[10] M. Sarlak, T. Ebrahimi, S.S. Karimi Madahi, “Enhancement the accuracy of daily and hourly short time load forecasting using neural network,” Journal of Basic and Applied Scientific Research, vol. 2, no. 1, pp. 247-255, 2012.
[11] A.H. Nizar, “Load profiling and data mining techniques in electricity deregulated market,” Power Engineering Society General Meeting, IEEE/PES, 2006.
[12] G. Chicco, “Overview and performance assessment of the clustering methods for electrical load pattern grouping,” Energy, vol. 42, no. 1, pp. 68-80, 2012.
[13] N. Mahmoudi-Kohan, M.P. Moghaddam, S.M. Bidaki, “Evaluating performance of WFA K-means and Modified Follow the leader methods for clustering load curves,” Proc. IEEE/PES Power System Conference and Exposition, 2009.
[14] Gh. Grigoras, M. Istrate, Fl. Scarlatache, “Electrical energy consumption estimation in water distribution systems using a clustering based method,” 5th International Conference Electronics, Computers and Artificial Intelligence, Pitesti, Romania, 2013.
[15] Fl. Scarlatache, Gh. Grigoraș, “Optimal Coordination of Wind and Hydro Power Plants in Power Systems,” Proceedings of 14th International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), pp. 689-694, 2014.
[16] Fl. Scarlatache, Gh. Grigoraș, “Influence Of Wind Power Plants On Power Systems Operation,” International Conference and Exposition on Electrical and Power Engineering (EPE), pp. 1010-1014, 2014.
[17] D. Comanescu, Gh. Grigoras, Gh. Cartina, Fl. Rotaru, “Determination of Typical Load Profiles in Hydro-Power Plant by Clustering Techniques,” Proceedings of 12th International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), pp. 1294–1297, 2010.
[18] Gh. Grigoras, M. Istrate, Fl. Scarlatache, “Electrical Energy Consumption Estimation in Water Distribution Systems Using a Clustering Based Method,” The 5th International Conference on Electronics, Computers and Artificial Intelligence (ECAI 2013), vol. 5, no. 4, pp. 27-33, ISSN: 1843–2115, 2013.
[19] K.H. Anders, “Data Mining for Automated GIS Data Collection,” http://www.ifp.unistuttgart.de/publications/phowo01/Anders.pdf, 2001.
[20] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, “From Data Mining to Knowledge Discovery in Databases,” AI Magazine, vol. 17, pp. 37-54, 1996.
[21] V. Torra, J. Domingo-Ferrer, A. Torres, “Data Mining Methods for Linking Data Coming from Several Sources,” http://www.iiia.csic.es/~vtorra/publications/unrestricted/conf UNECE. 2003.143.150.pdf, 2003.
[22] V. Devedzic, “Knowledge Discovery and Data Mining in Databases. Handbook on Software Engineering & Knowledge Engineering,” http://repository.binus.ac.id/20092/content/M0724/M072442942.pdf, 2009.
[23] C. Olaru, P. Geurts, L. Wehenkel, “Data mining tools and applications in power system engineering,” 13th Power Systems Computation Conference, Trondheim, Norway, June 28 – July 2, 1999.
[24] Gh. Cârțină, Gh. Grigoraș, E.C. Bobric, Clustering Techniques in Fuzzy Modeling. Applications in Power Systems, Iași: Venus, 2005.
[25] Gh. Grigoraș, Gh. Cârțină, The Impact of the Fuzzy Modeling in Electric Distribution Systems. Deutschland, Saarbrücken: Lambert Academic Publishing, 2012.
[26] I. Yatskiv, L. Gusarova, “The Methods of Cluster Analysis Results Validation,” Proc. of International Conference RelStat, 2004.
[27] A.K. Jain, M.N. Murty, P.J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
[28] Fl. Rotaru, G. Chicco, Gh. Grigoras, Gh. Cartina, “Two-stage distributed generation optimal sizing with clustering-based node selection,” International Journal of Electrical Power & Energy Systems, vol. 40, no. 1, pp. 120–129, 2012.
[29] W. El-Khattam, K. Bhattacharya, Y. Hegazy, M.M.A. Salama, “Optimal investment planning for distributed generation in a competitive electricity market,” IEEE Transactions on Power Systems, vol. 19, no. 3, pp. 1674–1684, 2004.
[30] P.J. Rousseeuw, “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987.
[31] http://www.transelectrica.ro/widget/web/tel/sen-grafic/-/SENGrafic_WAR_SENGraficportlet, online access March 2017.
