A Simulation-Based Approach for Energy-Efficient and Temperature-Aware Workload Scheduling in a Data Center
A SIMULATION-BASED APPROACH FOR
ENERGY-EFFICIENT AND TEMPERATURE-AWARE WORKLOAD SCHEDULING IN A DATA CENTER
The Rack-by-Rack Placement Method
LICENSE THESIS
2015
Graduate: Adelina-Mihaela BURNETE
A SIMULATION-BASED APPROACH FOR ENERGY-EFFICIENT AND TEMPERATURE-AWARE WORKLOAD SCHEDULING IN A DATA CENTER
The Rack-by-Rack Placement Method
Project proposal: Develop a virtual machine workload scheduler and analyze temperature-aware cooling systems for decreasing the energy consumption of a data center.
Project contents: Introduction, Project Objectives, Bibliographic Research, Analysis and Theoretical Foundation, Detailed Design and Implementation, Testing and Validation, User Manual, Conclusions, Bibliography, Appendix.
Place of documentation: Technical University of Cluj-Napoca, Computer Science Department
Consultants: Prof. dr. eng. Ioan SALOMIE, Assist. eng. Marcel ANTAL, Assist. eng. Claudia POP, Assist. eng. Dan VALEA
Date of issue of the proposal: November 1, 2014
Date of delivery: June 18, 2015
Declaration of Authenticity
of the Bachelor Thesis
The undersigned, ____________________________________________, identified with _______________ series _______ no. ___________________________,
CNP _______________________________________________, author of the thesis ________________________________________________________________ prepared for the bachelor examination at the Faculty of Automation and Computer Science, Specialization ________________________________________ of the Technical University of Cluj-Napoca, session _________________ of the academic year __________, hereby declare on my own responsibility that this thesis is the result of my own intellectual work, based on my own research and on information obtained from sources that have been cited in the text of the thesis and in the bibliography.
I declare that this thesis does not contain plagiarized content and that the bibliographic sources have been used in compliance with Romanian legislation and international copyright conventions.
I also declare that this thesis has not been previously presented before any other bachelor examination committee.
Should these declarations later be found to be false, I will bear the administrative sanctions, namely the annulment of the bachelor examination.
Table of Contents
Introduction
The main focus of this chapter is to provide an overview of the project's motivation and to emphasize the need for sustainable solutions to the high-impact problems the project addresses.
Project context
The project's main objective is to contribute to the further development of current research projects dealing with the complex and challenging issue of energy efficiency in Smart Data Centers. These Energy Efficient Adaptive Data Centers are envisioned as the next generation of enhanced, state-of-the-art data centers. The project aims not only to deliver an adequate performance level and quality of service, but also to reduce energy consumption and, with it, one of the most critical and alarming environmental challenges: the massive levels of carbon emissions.
Cloud computing faces many research challenges because it involves not only technological improvements of data centers, but also a major shift in the usage and provisioning of IT [5]. Enterprises therefore need to consider, besides the benefits, the risks and effects of cloud computing on their organizations and usage practices in order to make decisions about its adoption and use. Service Level Agreements (SLAs) have to be defined as the terms of engagement between the cloud providers and the services exploiting the cloud resources. These SLAs can be negotiated, optimized and further used to enforce the Quality of Service (QoS) characteristics.
Motivation
Over the last few years, the Cloud has become increasingly popular due to its wide-scale potential in areas such as online banking, social networking, e-commerce and e-government. Therefore, many powerful companies such as Amazon, eBay, Microsoft and IBM started regarding the Cloud as a market opportunity and are investing tremendous amounts of capital in building and operating data centers.
In [1], cloud computing is described as a highly scalable infrastructure used for running IT applications such as High Performance Computing (HPC), Web and enterprise applications, which require ever-increasing computational resources. This continuously rising need is handled through large-scale data centers that host numerous servers, storage and network systems.
However, the massive amounts of electricity needed to power and cool the servers have driven up energy costs and carbon dioxide emissions to alarming levels (Figure 1), so this growing demand for resources limits further performance improvement and has become a challenging and complex issue to overcome.
Figure 1. Cloud and Environmental Sustainability [1]
Therefore, due to the vast development of the Cloud and the increased usage of data centers, energy consumption has become a concern of global proportions. The U.S. Environmental Protection Agency stated that in 2006 the total energy consumption of data centers and their corresponding cooling systems represented 1.2% of the total U.S. energy consumption, and these figures are expected to double every five years. By 2010, the total energy consumed by data centers had reached 0.5% of global energy consumption. Moreover, the Environmental Protection Agency (EPA) [15] reported that the total energy consumed by U.S. data centers in 2011 exceeded 100 billion kWh. Hence, if this high demand for energy continues, by 2040 the energy consumption is expected to reach 20%, putting the world on track for a long-term global temperature increase of 3.6 °C. The key to reaching the climate goal of limiting the global temperature increase to 2 °C is limiting the carbon emission levels by about 25% by 2040.
Powerful companies such as Microsoft publicly declared that, if this increase continues, cloud computing energy costs would exceed hardware costs by 2015. This is why research in the field of energy management for data centers is of great importance, and why power consumption, together with cooling costs, should be viewed as the main target when designing data centers. Nonetheless, because energy consumption is not determined by hardware efficiency alone, but also depends on the resource management deployed on the infrastructure and on the applications running on the system, both hardware and software capabilities have to be taken into account when designing data centers.
Project Overview
The project aims at identifying the main challenges in achieving energy efficiency and proposes different approaches for attaining it.
One approach to efficient power and cooling management is virtualization. The workload can be viewed as a pool of unified resources, efficiently decoupled from the hardware. Consequently, physical machines can accommodate multiple computing environments, isolated from each other and encapsulated as virtual machines. In turn, physical machines (servers) are hosted in racks. The workload, defined as a virtual machine, is characterized by memory, storage and CPU.
Another approach is to use the existing hardware and place the racks in such a way that the lowest cooling power is achieved. By analyzing the power consumption resulting from the workload allocation and distribution, various cooling systems can be compared, in order to ultimately achieve the most efficient energy consumption.
The application monitors the received tasks, represented by changes in the virtual machines' states, and performs an analysis of the system changes, after which server and rack consolidations are performed. Moreover, a planning phase is developed in which the virtual machine allocation scheduling takes place. The resulting schedule can also be applied to the real OpenNebula infrastructure.
Project Objectives
The project is envisioned to be designed and developed in such a way that not only an efficient processing and utilization of the computing infrastructure is achieved, but the high-impact environmental problems are also addressed, by delivering various strategies for minimizing the power and cooling costs in data centers, thus contributing to the future growth of Cloud computing.
Main objective
The main purpose of the project is to design and develop an efficient way of reducing the energy consumption in data centers. The system should be able to easily adapt, scale and quickly respond to the various changes that occur in the data center. Moreover, the system should focus on load balancing and consolidation techniques for an efficient virtual machine migration across the infrastructure, such that the lowest values for power consumption and its corresponding cooling costs are achieved. Optimal resource usage and the corresponding optimization functions are also carefully analyzed and taken into consideration. In order to validate the efficiency of the obtained results, the proposed scheduling algorithm is compared and contrasted with existing bin-packing heuristics.
Furthermore, in order to reach the goal of an improved and efficient cooling power, and because measures taken in this area have cascading effects on energy savings, various data center layouts will be taken into account, tested and compared. Because racks tend to be the main energy wasters in a data center, they are the main focus when discussing energy savings in cooling power consumption. To reach this goal, several best-practice design guides will be taken into account for rearranging the hardware components in the most suitable and efficient manner.
Functional requirements
Virtual machine migration should be performed whenever the infrastructure supports it, that is, when servers still have available resources and a more efficient energy consumption can be achieved.
Virtual machine management: resource monitoring should be started whenever a virtual machine changes its current state from one of the available states PENDING, RUNNING, DEPLOY, SHUTDOWN and DONE into another.
Virtual machines should be deployed according to the most power-efficient schedule.
If a server becomes underutilized, its hosted virtual machines should be migrated to other servers when possible, and the server should then be turned off.
Servers should not become over-utilized or underutilized; these physical machines should be kept within a recommended utilization range (a minimal sketch of this check follows this list).
Servers should be turned on when there are still virtual machines waiting to be deployed and the other hosts have no available resources left.
The system chooses the most energy-efficient data center layout, according to the available hardware resources, whenever a more efficient cooling power can be achieved.
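As an illustration of the utilization-range requirement, the following minimal Python sketch classifies a host by its utilization ratio. The 20%/80% thresholds are illustrative assumptions (the 20% lower bound follows the underutilization discussion in the Bibliographic Research chapter), not values fixed by the thesis.

```python
# Minimal sketch of the utilization-range policy; thresholds are assumptions.
UNDER_UTILIZED = 0.20   # below this, the host is a consolidation candidate
OVER_UTILIZED = 0.80    # above this, some VMs must be migrated away

def classify_host(used_mips: float, total_mips: float) -> str:
    """Classify a host by its CPU utilization ratio."""
    utilization = used_mips / total_mips
    if utilization < UNDER_UTILIZED:
        return "UNDERUTILIZED"   # migrate its VMs away, then power it off
    if utilization > OVER_UTILIZED:
        return "OVERUTILIZED"    # offload VMs until back within the range
    return "OK"
```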
Non-functional requirements
Scalability
One of the biggest advantages of cloud computing and also a key requirement is scalability. Resources are expected to be easily scaled up or down, when required. When these changes occur, an up to standard performance level must be maintained.
Performance
The system is expected to provide a reasonable response time for the workload requirements.
Security
One of the most stringent requirements concerns data security. The user is assured that the system provides intrusion prevention for the instantiated virtual machines and that their data is not visible to other users of the same system.
Availability
Availability represents the ability of the system to respond to user requests.
Portability
The system should perform consistently across different platforms, with minimal disruption.
Metrics
In order to identify potential opportunities for further reducing the energy use in data centers, various efficiency metrics and benchmarks can be used. The Power Usage Effectiveness (PUE) and the Data Center Infrastructure Efficiency (DCiE) enable data center operators to quickly estimate the energy efficiency and compare the results with other data centers, in order to determine whether further improvements need to be made. It is important to mention that these two metrics do not define the overall efficiency of the entire data center, but only the efficiency of the supporting equipment within it.
The power usage effectiveness is defined as follows:

$$\mathrm{PUE} = \frac{\text{Total Facility Power}}{\text{IT Equipment Power}} \qquad (2.4.1)$$

Table 1. Data center efficiency based on PUE [6]
The Power Usage Effectiveness is defined as the ratio between the total power needed to run the data center facility and the total power drawn by all the IT equipment. In [6] it is stated that an average data center has a PUE of 2.0; however, several recent efficient data centers have been delivering a PUE as low as 1.1. When the PUE value is high, the cooling power consumption of the data center grows tremendously. Therefore, techniques for achieving a higher efficiency in cooling power consumption are a crucial necessity.
Data Center Infrastructure Efficiency (DCiE) is the reciprocal of PUE, being defined as the ratio between the total power drawn by all the IT equipment and the total power needed to run the data center facility:

$$\mathrm{DCiE} = \frac{1}{\mathrm{PUE}} = \frac{\text{IT Equipment Power}}{\text{Total Facility Power}} \times 100\% \qquad (2.4.2)$$

Table 2. Data center efficiency based on DCiE [6]
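As a worked example of the two metrics, the short Python sketch below computes both for a hypothetical facility; the power figures are illustrative, not measurements from the thesis.

```python
def pue(total_facility_power_kw: float, it_power_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT power (2.4.1)."""
    return total_facility_power_kw / it_power_kw

def dcie(total_facility_power_kw: float, it_power_kw: float) -> float:
    """Data Center infrastructure Efficiency: reciprocal of PUE, in percent (2.4.2)."""
    return 100.0 * it_power_kw / total_facility_power_kw

# A facility drawing 1800 kW in total for a 1000 kW IT load has
# PUE = 1.8 and DCiE ~ 55.6%.
print(pue(1800, 1000), dcie(1800, 1000))
```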
Bibliographic Research
Due to the ever-increasing popularity of data centers and the growth in the volume of servers and of their required cooling equipment and infrastructure, the overall electricity consumed has reached alarming levels. Therefore, the main focus has shifted to Green Cloud computing, which supports the future growth and sustainability of the Cloud. The Green Cloud is aimed at achieving an efficient processing and utilization of the computing infrastructure, along with the minimization of the overall energy consumption. To identify the open challenges in the area and facilitate future advancements, it is essential to synthesize and classify the research on power- and energy-efficient design conducted to date.
The next subchapters provide a more comprehensive understanding of the Cloud and of cloud computing.
Cloud computing characteristics
The main Cloud characteristics include broad network access, resource pooling, rapid scalability, market and service orientation. These main characteristics are depicted in Figure 2. The available service models are classified into SaaS (Software-as-a-Service), PaaS (Platform-as-a-Service), and IaaS (Infrastructure-as-a-Service), while the cloud deployment models are categorized into public, private, community, and hybrid Clouds.
Figure 2. Characteristics of cloud computing as depicted in [1]
Virtualization provides an efficient approach to managing resources: they can be viewed as a unified pool, and applications can be efficiently decoupled from the hardware.
Due to virtualization, Clouds gain a major benefit, namely scalability. Scalability is the ability of Clouds to scale resources up or down in a matter of minutes or seconds, in order to avoid over- or under-provisioning of the resources they lease.
The pay-per-use utility model refers to the fact that the pricing fluctuates according to the expected QoS (Quality of Service), which means that consumers are only required to pay for the services they use, while providers can capitalize on poorly utilized resources.
Clouds exhibit autonomic behavior in order to provide highly reliable services, fault tolerance and performance degradation management.
Service models
As depicted in Figure 3, the cloud computing system delivers several core services, namely infrastructure, platform, and software (applications) services, known in industry as SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service), which are made available to consumers as subscription-based services.
Figure 3. Cloud Computing Architecture [1]
Infrastructure-as-a-Service (IaaS)
When using the Infrastructure-as-a-Service model, the consumer is provided with the opportunity of deploying and running its desired software on the fundamental computing resources previously provided. These computing resources are known as processing, storage and network.
The consumer is not in control of the underlying cloud infrastructure, but is in control of the operating systems, storage and deployed applications. The IaaS model provides accessibility to virtual servers within minutes, and consumers benefit from pay-per-use billing. An example of IaaS is Amazon EC2, which provides users with a variety of resources such as CPU, memory, OS and storage to cater to their particular needs. API access to the infrastructure may also be offered as an option.
Software-as-a-Service (SaaS)
SaaS provides the user with the capability of using the provider's applications, available through an interface or a web browser and running on a cloud infrastructure. As in IaaS, the consumer neither controls nor manages the underlying cloud infrastructure; control and management of the OS, storage, network and servers are not granted either. Moreover, user-specific application configuration settings can be limited.
Platform-as-a-Service (PaaS)
PaaS provides the user with the capability of deploying onto the cloud infrastructure the desired software applications, provided that they are supported by the provider. As in SaaS, the consumer is not granted control and management of the underlying cloud infrastructure, including network, storage, OS and servers. On the other hand, the consumer is provided with control over the deployed applications.
Deployment models
The Cloud middleware is deployed on physical infrastructures and delivers various services to consumers. In literature, three commonly-used deployment models are defined. These are known as hybrid, public and private cloud as depicted in Figure 4. Also, there exists a fourth deployment model, known as the community cloud, but it is less-commonly used.
Figure 4. Cloud deployment model [1]
Private cloud
The private cloud is designed for exclusive use, which means that it is best suited for organizations that wish to maintain their own specialized environment to meet their demands. The authors in [1] give as an example the health care industry, which needs to keep its confidential data private. Because of this privacy, a limitation in scalability may appear. However, the consumers are provided with greater control over the infrastructure, which improves security.
Public cloud
Public clouds are designed to be available and open to the general public, and are most appealing for cutting IT costs. Some of the most popular public clouds are Amazon Web Services, Google AppEngine and Microsoft Azure. Besides being able to host individual services, the public cloud offers the possibility of using collections of services.
Hybrid cloud
The hybrid cloud emerged from combining the advantages of both the private and the public cloud. In this deployment model, organizations can outsource the information considered non-critical, while keeping their sensitive data private. Moreover, organizations can use the hybrid cloud as a private one and resort to the public model whenever they need to auto-scale their resources.
Community cloud
The community cloud is a special cloud environment which is shared and managed across several organizations, and can be managed by either third-party providers or organizational IT resources.
Virtualization
One of the most advantageous ways of reducing energy consumption is virtualization. Virtualization is a technique through which multiple independent virtual operating systems can be run on a single physical machine. Thus, hardware independence, isolation of the guest operating systems and encapsulation are provided. Through encapsulation, all virtual machines are grouped into a single resource pool, which can be altered or allocated dynamically. The simulated environment is called a virtual machine (VM).
By increasing the percentage of physical machine’s utilization, virtualization allows the same amount of processing to occur, but on a reduced number of servers. Consequently, because of the decreased number of necessary servers, the size and the consumption of the necessary cooling equipment will be drastically reduced.
The authors in [9] have divided the virtualization technique into different approaches, such as emulation, hypervisor-based, full, para- and hardware-assisted virtualization.
Emulation
Emulation is a virtualization approach which provides flexibility, as the hardware behavior is reproduced by a software program.
Hypervisor
Also known as the virtual machine monitor, the hypervisor is an intermediate layer between the operating system and the hardware, used to monitor the server's resources while taking the consumer's needs into consideration. The hypervisor controls the flow of instructions between the guest OS and the hardware, such as CPU, memory and storage.
Full virtualization
Full virtualization is a technique in which various operating systems and the applications they contain run on top of the virtual hardware. The hypervisor manages the guest operating systems represented by the virtual machines.
Para virtualization
Unlike in full virtualization, the guest OSs are aware of one another. Para virtualization has poor portability and compatibility, as it cannot support unmodified operating systems. In a para-virtualized environment, various operating systems can run simultaneously, as depicted in Figure 6.
Resource allocation
Migration
Some of the major benefits of virtualization include fault tolerance, performance isolation between applications that share the same host, and offering the means through which a virtual machine can be easily moved from one physical machine to another, namely by using hot or cold migration.
Figure 7. Virtual machine migration
Hot migration
In hot virtual machine migration, the VMs selected for reallocation are kept powered on and continue the execution started on the source physical machine. After the migration is completed, the execution continues on the target host without signaling or identifying any change.
The hot virtual machine migration can be divided into two categories: the first one is suspend/resume migration, which suspends the virtual machine’s state on the source physical machine and then resumes its state after the migration is completed. The second is known as live migration.
Live virtual machine migration
One of the most important features of virtualization is the live virtual machine migration which enables consolidation for achieving power efficiency and dynamic load balancing. There are various types of live migration which include memory migration and storage migration.
Live migration enables the reallocation of a running virtual machine from one physical host to another with little downtime, in order to achieve improved load balancing, or to achieve consolidation by emptying physical hosts so that they can be turned off, hence obtaining power savings.
The authors in [22] state that the main purpose of live migration is to minimize service disruption, enabling the VM to continue responding to requests as it migrates, while keeping it all transparent to users.
Cold migration
Another approach to virtual machine migration is cold migration, where the virtual machine selected for reallocation has to be powered off on the source host. A configuration file is dispatched from the source physical machine to the destination physical machine. After the transfer of the virtual machine is performed, it can be powered on again on the target host by using the configuration file.
Cold virtual machine migration is a fast and convenient way of reallocating virtual machines, but it has a major drawback: high downtime.
Resource management
Due to the ever-increasing requests for cloud computing services, offering customised services to a considerable number of consumers while meeting the users' quality expectations has become a great challenge [20]. Thus, Service Level Agreement (SLA)-oriented resource allocation must be enhanced and redesigned in order to deliver the expected results to all consumers [20].
By using Service Level Agreements (SLAs) and performance metrics, Cloud providers are given a feedback on how consumers expect their services to be delivered and can make further improvements where needed. Also, in order to detect SLA violations, Quality of Service (QoS) parameters have to be monitored and detected.
Quality of Service (QoS) parameters
The authors in [19] define several QoS metrics, namely response time, availability, reliability, cost and reputation. Each stakeholder has a clearly defined role as an actor: the consumers submit demands to be handled by the Cloud, while the Cloud providers provide the required services and bill the users according to their consumption levels. Resource management has to be carefully controlled so that resources do not become overloaded. A useful way to detect Service Level Agreement violations is to monitor the Quality of Service parameters; if violations occur, the providers incur penalties.
Service Level Agreement (SLA) negotiation
When a consumer uses the services of multiple Cloud providers, the aid of a third party, known as a broker, is required. As stated in [19], the broker acts as an intermediary between the users and the providers, so that the best Service Level Agreements (SLAs) are negotiated. At the consumer's demand, the broker can handle various tasks, such as searching for available Cloud services, validating that a vendor is trustworthy, and negotiating the best price and various SLAs with different providers [19]. In order to optimize their profit and not overload resources, the providers monitor the resource utilization. However, a consumer who wants its own resource monitoring can delegate to the broker the task of monitoring the fulfillment of the SLAs.
Consolidation
A major step towards achieving energy efficiency in data centers is the use of the consolidation technique.
The authors in [10] found that there is a tight connection between energy consumption and resource utilization. Also, after studying the inter-relationships between the two areas, they showed that optimal operating points exist for achieving energy efficiency.
Server underutilization is one of the major causes of energy waste. Servers are considered underutilized when they run below 20% utilization, spending much of their time idle. The idle power can reach 70% of the power the server consumes at full capacity. Because the idle power is not amortized effectively when the server is underutilized, the energy consumption per transaction reaches avoidably high values. On the other hand, when the server is over-utilized, performance degradation occurs. Hence, the best way of achieving energy savings is to keep the server utilization within range.
First, in order to achieve energy efficiency and avoid energy waste, the servers running at low utilization levels should be identified. The virtual machines running on these servers should be reallocated to other hosts that run within optimal parameters. Afterwards, the underutilized servers can be turned off and the unnecessary idle power consumption is eliminated. Likewise, for servers running at high utilization levels, workload should be migrated away until the host reaches its most efficient power consumption. Note that the reallocation to other servers should be done in such a way that their utilization levels remain within the optimal range.
Another way of choosing the most suitable host for the virtual machines that need to be reallocated is to use the Euclidean distance. This heuristic is developed on the premise that the problem can be seen as a two-dimensional bin-packing problem, where the servers represent the bins and the dimensions are the CPU and disk usage. The Euclidean distance is measured from the host's utilization to the optimal point in both dimensions. In both cases, if the workload cannot be allocated, a new server is turned on.
Despite the fact that the optimal energy points may vary significantly when each resource is considered separately, an overall optimal combination of CPU and disk utilization at which the power consumption is minimal has been found [10]: 70% CPU utilization and 50% disk utilization.
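A minimal sketch of this Euclidean-distance heuristic is given below; the host and VM representations are simplified assumptions for illustration, not the thesis implementation.

```python
import math

# Optimal operating point reported in [10]: 70% CPU, 50% disk utilization.
OPTIMAL = (0.70, 0.50)

def distance_to_optimal(cpu_util: float, disk_util: float) -> float:
    """Euclidean distance from a host's utilization to the optimal point."""
    return math.hypot(cpu_util - OPTIMAL[0], disk_util - OPTIMAL[1])

def pick_host(hosts, vm_cpu, vm_disk):
    """Pick the host whose utilization after placing the VM is closest to optimal.

    `hosts` is a list of (cpu_util, disk_util) tuples; returns the index of the
    chosen host, or None if the VM fits nowhere (a new server must be started).
    """
    best, best_dist = None, float("inf")
    for i, (cpu, disk) in enumerate(hosts):
        new_cpu, new_disk = cpu + vm_cpu, disk + vm_disk
        if new_cpu > 1.0 or new_disk > 1.0:   # VM does not fit on this host
            continue
        d = distance_to_optimal(new_cpu, new_disk)
        if d < best_dist:
            best, best_dist = i, d
    return best
```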
Load balancing
As stated before, because energy efficiency is a key aspect of utmost importance in data centers, various strategies have been designed and developed. One of these strategies is the load balancing technique, through which energy efficiency, better data center performance at realistic costs, fault tolerance, redundancy and scalability can be achieved by evenly distributing the workload across the servers in the most power-efficient way (Figure 7). It also helps avoid over-consolidation.
Figure 7. Load balancing system in cloud computing [17]
In the literature [18], various policies are defined and taken into account when talking about load balancing. The authors of [18] define transfer, selection, location and information policies. The transfer policy selects a job for transfer from a local node to a remote node. The selection policy chooses the processor involved in the load exchange. The location policy selects a destination node, and the information policy is responsible for collecting information.
The load balancing technique is mainly performed by scheduling algorithms.
Workload consolidation algorithms
Scheduling
The scheduling technique is a flexible and convenient mechanism through which tasks are efficiently distributed and executed. The scheduling process can be divided into three stages. The first stage is resource discovery and filtering, in which the broker discovers the existing resources and collects their status information. In the second stage, the data center broker decides which resource should be chosen for performing the given task. The third stage is task submission, in which the task is sent to the selected resource.
Round Robin (RR)
The Round Robin scheduling algorithm maintains a queue containing the incoming requests. Each request receives the same execution time slice and is executed in turn. If a request cannot be completed within its slice, it is stored back in the queue and waits for its next turn.
One of the major advantages of Round Robin is that a request can start executing without waiting for its predecessor to complete. However, when the requests carry a large load, the execution may take a long time to complete, and the request with the largest load will be the last to finish.
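The sketch below illustrates this queue-based rotation in Python; the request representation and the `run` callback are assumptions for illustration.

```python
from collections import deque

def round_robin(requests, quantum, run):
    """Minimal round-robin scheduling sketch (illustrative, not the thesis code).

    `requests` is a list of (name, remaining_time) pairs; `run(name, t)` would
    execute `name` for `t` time units. Each request gets an equal quantum, and
    unfinished requests rejoin the back of the queue to wait for the next turn.
    """
    queue = deque(requests)
    while queue:
        name, remaining = queue.popleft()
        run(name, min(quantum, remaining))
        if remaining > quantum:
            queue.append((name, remaining - quantum))  # back in the queue
```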
First Come First Serve (FCFS)
Despite the fact that the Round Robin and FCFS algorithms bear some similarities, such as using a queue for maintaining the requests, the differences between them are clear. Unlike in Round Robin, the requests in the First Come First Serve scheduling algorithm do not receive the same execution time. The algorithm selects a resource for the incoming task placed at the head of the queue. The major drawback of FCFS is that tasks with a relatively small execution time placed at the back of the queue still have to wait for all the tasks in front of them to finish. This may lead to long response times.
Generalized Priority Algorithm
In the Generalized Priority scheduling algorithm, the workload is prioritized in order to meet the users' needs. Parameters such as storage size, memory, bandwidth and scheduling policy have to be clearly defined. For example, the existing resources, namely the virtual machines, can be ranked and prioritized by their MIPS (million instructions per second) value.
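A minimal sketch of this MIPS-based ranking is shown below; the VM representation is a simplified assumption for illustration.

```python
def generalized_priority(vms):
    """Rank VMs by their MIPS rating, highest first (illustrative sketch).

    Each VM is a dict with at least a 'mips' key; higher-MIPS resources
    receive the incoming workload first.
    """
    return sorted(vms, key=lambda vm: vm["mips"], reverse=True)

# Example: the 2000-MIPS VM is ranked ahead of the 500-MIPS one.
ranked = generalized_priority([{"id": 1, "mips": 500}, {"id": 2, "mips": 2000}])
```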
Bin Packing heuristics
The problem of virtual machine placement and dynamic consolidation in data centers, leading to energy savings while meeting the performance requirements, has been investigated by various researchers. Their results [23] show that the multidimensional bin-packing formulation matches or even outperforms the other evaluated algorithms.
Srikantaiah et al. [24] have studied the effects of low utilization and over-utilization of resources on performance degradation and energy efficiency. Because the idle power is not amortized effectively, when a resource has a low utilization level the energy consumption per transaction reaches high values. However, a high resource utilization level also leads to performance degradation and higher execution times. The authors model the problem as a multidimensional bin-packing problem, in which the servers are seen as bins and each resource (CPU, disk, memory and network) is a dimension of the bin. The server's optimal utilization level gives the size of the bin. Their approach minimizes the number of active bins, which in turn decreases the energy consumption, because the idle servers are switched off. When there are no available resources left for the incoming virtual machines, a new server is turned on.
Power and Data Aware Best Fit Decreasing (PADBFD)
The authors in [25] have proposed a modification of a bin-packing heuristic, namely the Best Fit Decreasing (BFD) algorithm. Two more constraints, power and data, have been added to the BFD algorithm in order to optimize the energy consumption and resource utilization. Their proposed algorithm sorts the pool of virtual machines in decreasing order of CPU utilization. Each virtual machine is then allocated, in turn, to the most power-efficient physical machine.
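A condensed sketch of this placement loop is shown below. The helper functions `power_increase` and `fits` are hypothetical stand-ins for the power and resource models of [25], and the data structures are simplified for illustration.

```python
def power_aware_best_fit_decreasing(vms, hosts, power_increase, fits):
    """Sketch of a power-aware Best Fit Decreasing placement (illustrative).

    `vms` carry a 'cpu' demand; `power_increase(host, vm)` estimates the extra
    power a placement would cost; `fits(host, vm)` checks resource capacity.
    """
    placement = {}
    for vm in sorted(vms, key=lambda v: v["cpu"], reverse=True):
        candidates = [h for h in hosts if fits(h, vm)]
        if not candidates:
            continue  # in a real system, a new host would be powered on here
        best = min(candidates, key=lambda h: power_increase(h, vm))
        placement[vm["id"]] = best["id"]
        best["used_cpu"] += vm["cpu"]   # account for the new load
    return placement
```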
Sercon Algorithm
The authors in [26] have proposed a server consolidation algorithm that aims to use the minimum number of physical machines while also reducing the number of migrations. Sercon inherits some of the properties of the bin-packing heuristics First-Fit and Best-Fit.
First, the physical machines are sorted according to their load. The virtual machines placed on the least loaded host are selected for reallocation and are sorted in decreasing order of their weights. The algorithm tries to reallocate the VMs, one by one, onto the most loaded host. If the most loaded physical machine does not have enough resources to host a VM, the reallocation is attempted on the next node, and so on.
The Sercon server consolidation algorithm makes use of various constraints, such as choosing an optimal CPU utilization level, and of a metric for measuring the migration efficiency, computed in [26] as the ratio between the number of released nodes and the number of migrations performed:

$$\text{MigrationEfficiency} = \frac{\text{released nodes}}{\text{migrations}} \times 100\% \qquad (3.1)$$
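The following minimal sketch illustrates one consolidation pass in the Sercon style described above; it is a simplified, assumption-laden illustration, not the algorithm from [26] verbatim.

```python
def sercon_step(hosts, capacity):
    """One Sercon-style consolidation pass (simplified, illustrative sketch).

    `hosts` maps host id -> list of VM weights (fractions of `capacity`).
    VMs from the least loaded host are reallocated, heaviest first, onto the
    most loaded hosts that still fit them; if all fit, the source is emptied.
    """
    load = {h: sum(vms) for h, vms in hosts.items()}
    ordered = sorted(hosts, key=load.get, reverse=True)   # most loaded first
    source = ordered[-1]                                  # least loaded host
    moves, planned = [], dict(load)
    for vm in sorted(hosts[source], reverse=True):        # heaviest VMs first
        for target in ordered[:-1]:
            if planned[target] + vm <= capacity:
                planned[target] += vm
                moves.append((vm, source, target))
                break
        else:
            return []   # some VM fits nowhere: abandon, keep current placement
    return moves        # source host is emptied and can be switched off
```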
Energy models
In the literature, various power management strategies and designs are addressed in order to significantly increase power savings. Due to virtualization, power measurements cannot rely on hardware alone. This is why the authors in [28] have proposed a solution for virtual machine power metering which deduces the power consumption from resource usage at runtime:

$$P_{VM} = \alpha \cdot n_{LLCM} + \beta \cdot n_{B} \qquad (3.2)$$

In (3.2), $\alpha$ and $\beta$ are model-specific coefficients, while $n_{LLCM}$ and $n_{B}$ represent the normalized values of the number of last-level cache (LLC) misses and of the sum of bytes read or written.
In [29] the authors have proposed a metric for measuring the cooling efficiency:
(3.3)
Table 1. Cooling efficiency results and condition (adapted from [29])
In [6], the authors have described a metric named Energy Reuse Effectiveness (ERE) (3.4) for measuring the benefit of reusing energy from a data center, computed as the ratio between the total energy consumed by the data center minus the reused energy, and the IT equipment energy:

$$\mathrm{ERE} = \frac{\text{Total Energy} - \text{Reuse Energy}}{\text{IT Equipment Energy}} \qquad (3.4)$$
In [30], a method for estimating the power consumption at any specific processor utilization (n%) has been proposed (3.5); a standard formulation of such an estimate is the linear interpolation between the idle and the full-load power of the server:

$$P_{n\%} = P_{idle} + \frac{n}{100}\,(P_{max} - P_{idle}) \qquad (3.5)$$
Cooling techniques
Today, cooling is one of the most discussed topics in data center research circles. Due to the high popularity of data centers and the high cooling costs, ever more efficient cooling solutions are needed. The energy cost of cooling systems and computing devices, together with the hardware and infrastructure costs, make up the main cost of operating a data center [14]. The thermal distribution in a data center is mainly affected by the energy costs, represented by the overall power consumption, and by the cooling capacity of the data center (Table 3). Also, the authors in [23] state that the cooling system in a data center requires an additional 0.5-1 W for each watt consumed by the computing resources. An improper design of a data center may lead to various issues, such as system failures due to overheated servers or extra utility costs due to overcooled systems.
Table 3. Percent of Power Consumption of Data Center Devices [1]
The airflow distribution, temperature monitoring and management in a data center are performed by a computer room air conditioner, typically referred to as a CRAC unit. These CRAC units are the new and improved versions of the air-conditioning units formerly used to cool down data centers. The hardware equipment used in most state-of-the-art data centers takes its cold air intake at the front side of the racks, while the hot air is exhausted at the rear of the equipment.
In some cases, part of the exhausted air from the outlets is sucked into the inlets of other racks. This mixture of air leads to an increase in the energy costs of the cooling system. In order to avoid these extra energy costs, the server inlet temperature has to be kept within a specific range. Hence, a major step in reducing the cooling costs of a data center is to gain a deep understanding of hot air recirculation and to reduce it.
Furthermore, decreasing the energy consumption requires the design and development of cooling systems with effective air management.
Various cooling techniques have been developed in order to achieve an increased cooling efficiency and a decreased Power Usage Effectiveness in data centers. Some of these techniques involve the use of CRAC units, liquid cooling (which uses water and pipes to remove the heat produced by the IT equipment) or free air cooling. Moreover, when designed properly, air management becomes a key factor in reducing operating costs and even eliminating the mixture between the hot exhaust air and the cold air intake. When not dealt with properly, these heat-related issues may lead to increased power consumption, processing interruptions or failures.
Conclusions
The main focus of this chapter was to provide a more comprehensive understanding of the Cloud and cloud computing notions, while examining the recent research works that deal with the problem of energy efficiency, which has become the main goal of modern data centers. In order to obtain energy efficiency, an intelligent management of the computing resources has to be designed, while still meeting the performance requirements.
In the pursuit of further advancements, building on the strong foundation of prior research works is a crucial necessity. Nevertheless, there are still complex and challenging issues to overcome.
Analysis and Theoretical Foundation
This chapter's main focus is to describe the project's overall structure and design, to emphasize the technologies used and to present the motivation for the chosen solutions.
Modeling
To model our data center in a realistic manner, we use racks, servers and virtual machines. Each server hosts virtual machines according to its available resources; likewise, the servers (nodes) are grouped into racks according to the racks' available resources.
Furthermore, a workload generator has been designed, whose role is to create requests. Each request is represented by a virtual machine instance and the command it needs to execute.
Structure
In order to cope with uncertain and continuously changing environments, together with the ever-increasing complexity of software systems, researchers have turned to self-adaptivity. Self-adaptive software systems are capable of dealing with these environment variations, optimizing their performance even under changing operating conditions and facilitating future maintenance. A typical solution for achieving self-adaptive systems is a control loop. One of the most efficient ways of organizing a control loop in a self-adaptive system is by means of four components, known as Monitor, Analyze, Plan and Execute. Together, these components form the MAPE control loop.
The Monitoring Phase
The monitoring phase’s main goal is to collect the system’s raw context data. The collection of the system’s internal state, represented by the context data, will further lead to a programmatic representation of the system’s context state. The system’s self-awareness will be achieved after the monitoring phase’s execution.
The context data is gathered by connecting a workload generator to the monitoring module's queue. The workload generator creates requests consisting of the virtual machine instances, the number of instances and the command each of them should execute, such as CREATE, DEPLOY, SHUT_DOWN or DELETE. After the workload generator has created the requests, they are added to a queue; their correctness is verified, each request is popped, and each instance is turned into a specific programmatic representation.
The monitoring phase's main objective is to obtain system context awareness. After each instance is processed, the analysis phase is started.
The Analysis Phase
The analysis phase can be characterized by one attribute, namely detection. When discussing the analysis phase, the system policies have to be kept in mind. These system policies can be seen as a rule guideline that the system follows when taking its next steps. In the analysis phase, the system policies are evaluated in order to detect which rules are broken and in which context situation the system failed to obey them.
For our system, we have designed and implemented three different policies: the Virtual Machine Policy, the Server Policy and the Rack Policy. When the system detects that at least one policy has been broken, meaning that it is not in its ideal state, the planning phase is started. If no broken policy is detected, the system can be regarded as energy-aware and as having an ideal workload distribution.
The Planning Phase
The planning phase is started by the analysis phase when the system is determined not to be in its ideal state, meaning that at least one policy is broken.
This phase's main function is to prepare the system for dealing with the broken policies and re-enforcing them. It proposes a decision on the actions that need to be taken in order to cope with the detected issues. The first step of this phase is to identify whether, at any point in time, the system encountered a similar context situation, in which the same workload was used and the hosts had the same state as the current one. If a similar context situation is found, the previously found solution is proposed. Otherwise, the scheduling algorithm develops a new action plan, using the proposed approach for generating the most power-efficient sequence of actions, after which the execution phase is started.
The Execution Phase
The execution phase can be characterized by one attribute, namely action. This phase receives the prepared, most power-efficient solution from the previous phase (the planning phase) and executes it.
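As a structural illustration of the four phases above, the skeleton below wires them into a single loop; all class and method names are hypothetical, not taken from the thesis implementation.

```python
# Minimal MAPE control loop skeleton (illustrative sketch, assumed names).
class MapeLoop:
    def __init__(self, monitor, analyzer, planner, executor):
        self.monitor, self.analyzer = monitor, analyzer
        self.planner, self.executor = planner, executor

    def iterate(self, incoming_requests):
        context = self.monitor.collect(incoming_requests)   # Monitoring phase
        broken = self.analyzer.broken_policies(context)     # Analysis phase
        if not broken:
            return              # ideal state: energy-aware distribution
        plan = self.planner.plan(context, broken)           # Planning phase
        self.executor.execute(plan)                         # Execution phase
```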
Energy models
When working with data centers, energy efficiency is one of the most important criteria. This is why efficient power and cooling models for our data center have to be chosen.
Power Consumption Model
The power consumption of our data center is modelled by taking into account server components such as server utilization, server MIPS (million instructions per second) and the maximum power consumed at full load. The authors in [27] have conducted studies showing that an idle server can consume up to 70% of the power drawn by a fully utilized node. The power model that we have used is the resulting linear model:

$$P(u) = 0.7 \cdot P_{max} + 0.3 \cdot P_{max} \cdot u \qquad (4.3.1)$$

where $P_{max}$ is the maximum power consumed at full load and $u \in [0,1]$ is the current server utilization.
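A direct translation of (4.3.1) into Python is shown below; the 250 W example value is illustrative, not a measurement from the thesis.

```python
def server_power(p_max_watts: float, utilization: float, k_idle: float = 0.7) -> float:
    """Linear server power model sketch: idle power plus a utilization-
    proportional part. k_idle = 0.7 reflects the finding in [27] that an
    idle server can draw up to 70% of its full-load power."""
    utilization = min(max(utilization, 0.0), 1.0)
    return k_idle * p_max_watts + (1.0 - k_idle) * p_max_watts * utilization

# Example: a 250 W server at 40% utilization draws
# 0.7*250 + 0.3*250*0.4 = 205 W.
```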
Cooling Model
As determined by the authors in [39], the cooling required for a data center (DC) accounts for about 30% of its total operational cost. This is why reducing the energy consumed by the CRAC (Computer Room Air Conditioning) system is of major concern when electricity bills need to be reduced. There are many approaches used by data center operators to decrease the cooling energy. The one we have adopted models the cooling power as:

$$P_{cooling} = \frac{HeatRemoved}{COP} \qquad (4.3.2.1) \; [39]$$

where HeatRemoved is the heat removed by the system and COP is the coefficient of performance, a variable that depends on the CRAC system's supplied temperature.
The heat removed in (4.3.2.1) can be computed by:

$$HeatRemoved = \dot{m} \cdot c_p \cdot (T_{out} - T_{in}) \qquad (4.3.2.2)$$

where $\dot{m}$ is the air mass flow rate, $c_p$ is the specific heat of air (a constant value) and $T_{out} - T_{in}$ is the difference between the outlet and the inlet temperature of the CRAC system.
The cooling energy for each time period is the cooling power integrated over that period:

$$E_{cooling} = P_{cooling} \cdot \Delta t \qquad (4.3.2.3)$$
Furthermore, as will be explained in the upcoming sections, another useful way of decreasing the required cooling capacity is to reduce the volumetric air flow rate.
Coefficient of performance
The Coefficient of Performance, also known as COP, is a metric used to measure a CRAC system's cooling efficiency according to the supplied air temperature. COP is a dimensionless value, defined as the ratio between the heat removed by the unit (measured in watts) and the corresponding power consumed by the cooling system (measured in watts). In order to avoid overheating the servers, the inlet air temperature supplied to a node has to be bounded by thresholds.
The coefficient of performance can be calculated according to the CRAC unit's supplied temperature. A widely used model in the data center literature, fitted to measurements on a water-chilled CRAC unit, is:

$$COP(T_{sup}) = 0.0068\,T_{sup}^2 + 0.0008\,T_{sup} + 0.458 \qquad (4.3.3)$$
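The sketch below combines the COP model with equations (4.3.2.1)-(4.3.2.2); the quadratic COP form is the widely cited fitted model quoted above, used here as an illustrative assumption, and 1005 J/(kg·K) is the specific heat of air.

```python
def cop(t_supply_celsius: float) -> float:
    """CRAC coefficient of performance as a function of supplied air
    temperature (widely cited fitted model, assumed here)."""
    return 0.0068 * t_supply_celsius**2 + 0.0008 * t_supply_celsius + 0.458

def cooling_power(mass_flow_kg_s: float, t_out: float, t_in: float,
                  t_supply: float, cp_air: float = 1005.0) -> float:
    """Cooling power (W) = heat removed / COP, where the heat removed is
    m_dot * c_p * (T_out - T_in), following (4.3.2.1)-(4.3.2.2)."""
    heat_removed = mass_flow_kg_s * cp_air * (t_out - t_in)
    return heat_removed / cop(t_supply)

# Raising the supplied air temperature raises the COP, so the same heat
# load is removed with less cooling power.
```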
Cooling Systems
Because the cooling energy represents a major factor in energy consumption, an efficient rack placement is mandatory.
If the racks are placed randomly, several issues may occur. For example, one of the poorest IT equipment placements [11], which imposes major efficiency problems, is the parallel orientation, in which all the racks face the same direction. In this orientation, only the first row of racks receives cold inlet air from the CRAC (Computer Room Air Conditioner), while the others receive the hot air exhausted by the row in front. The 2011 version of the ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) Standard TC9.9 recommends a data center temperature in the range 18-27 °C [12]. In the parallel orientation, these recommended temperatures can be well exceeded, because the inlet temperature increases progressively from row to row.
Figure 8. Poor server row orientation which may lead to hot spots [11]
To overcome this problem, data center owners should consider rearranging the IT equipment in such a way that an alternating hot aisle/cold aisle layout is created. In this arrangement, the rack rows are grouped in pairs with their fronts facing each other.
Hot aisle/ Cold aisle
The hot aisle/cold aisle layout depicted in Figure 9 is considered a cooling best-practice design in data centers and, if properly arranged, can reduce energy losses and cooling costs. Furthermore, server life expectancy is enhanced [12] due to the improved airflow management.
Figure 9. Hot aisle/cold aisle approach [12]
However, despite the fact that this particular layout is a cooling best practice, the hot air and the cold air might still mix (as depicted in Figure 10), leading to a degradation of the cooling efficiency and to unacceptably high temperatures for the servers placed at the top of the racks and at the ends of the aisles. Increasing the airflow in the cold aisles may counter these variations, but this would require higher fan speeds and, correspondingly, higher fan power consumption. The authors in [13] have stated that aisle containment could improve cooling performance, saving up to 20% in chiller operating costs.
Figure 10. Air recirculation in an air-cooled data center [40]
Hot Aisle Containment (HACS)
While both hot and cold aisle containment have been proven to improve energy efficiency, the HACS (Hot Aisle Containment System), when compared to the CACS (Cold Aisle Containment System), has yielded a 43% improvement in annual cooling system energy cost, corresponding to a 15% reduction in annualized power usage effectiveness [12].
As depicted in Figure 11, the hot aisle containment system is arranged in the following manner: the rack rows are grouped in pairs with their backs facing each other, creating hot and cold aisles. The resulting hot aisle is contained, so that the cold air supplied by the CRAC unit does not mix with the air exhausted by the racks.
The cold aisle containment system may be more feasible when there is no accessible dropped ceiling plenum or when the headroom is low, making the hot aisle containment solution difficult or extremely expensive to use.
Due to containment, the inlet air reaches the front of the IT equipment without mixing with the air exhausted by the racks, which means that a uniform IT inlet air temperature is provided. With a uniform IT inlet air temperature, data center operators can increase the CRAC supplied air temperature without fearing hotspots, while still achieving energy efficiency.
Thermodynamics
As described in the literature, thermodynamics is the branch of physics which deals with the study of energy. More thoroughly, it describes the relationships between heat and other forms of energy, and how thermal energy affects matter.
Thermal energy is the energy a substance or a system has by virtue of its temperature (the energy of moving or vibrating molecules). In the context of engineering applications, thermodynamics focuses on two major objectives. The first is to describe the properties of matter in an equilibrium state, a state in which these properties show no tendency to change. Opposed to the first, the second objective is to describe the properties of matter that do tend to change, and to relate these changes to energy transfers. The principles of thermodynamics apply to almost every known device, and their importance in modern technology cannot be ignored.
In simple terms, energy is the ability of an object to do work.
Heat
Heat is one of the central quantities thermodynamics deals with. Heat can be described as the energy transferred between substances or systems due to the temperature difference between them. Being a form of energy, heat can be neither created nor destroyed, only transferred or converted into other forms of energy.
For example, the electrical energy of a light bulb can be converted to light, known as electromagnetic radiation. This radiation, when absorbed by a surface, is converted back to heat.
Temperature
According to the American Heritage Dictionary, temperature is "a measure of the average kinetic energy of the particles in a sample of matter, expressed in terms of units or degrees designated on a standard scale". The speed and the number of atoms or molecules in motion are the factors that influence the amount of heat transferred by a substance.
The atoms in motion are the ones that influence the temperature and the quantity of heat transferred: the temperature varies according to their speed, while the quantity of heat they transfer varies according to their number.
Specific heat
According to Wolfram Research, the specific heat capacity is the amount of heat required to raise the temperature of a unit mass of a substance by one degree.
Thermal Conductivity
As presented by the Oxford Dictionary, the thermal conductivity is "the rate at which heat passes through a specified material, expressed as the amount of heat that flows per unit time through a unit area with a temperature gradient of one degree per unit distance". The measurement unit for thermal conductivity is watts (W) per meter (m) per kelvin (K).
Heat Transfer
There are three different ways through which heat can be transferred. These methods are known as conduction, convection and radiation.
Conduction is defined as the transfer of energy which occurs through a solid material. In order for conduction to occur, the bodies have to be in direct contact, so that the molecules can transfer their energy across the interface.
The heat transfer to or from a fluid medium is known as convection. When in contact with a solid body, the gas or liquid molecules transmit or absorb heat and then move on, allowing other molecules to take their place and repeat the process. Efficiency can be improved by increasing the surface area to be heated or cooled.
The third approach through which heat can be transferred is by radiation. Radiation is defined as the emission of infrared photons (electromagnetic energy) that carry heat energy. The amount of heat emitted or absorbed by the matter is the one that determines the heat loss or gain.
Laws of thermodynamics
Initially, there were three laws describing the fundamental principles of thermodynamics. Later, a fourth one was formulated, now known as the Zeroth Law.
Zeroth Law
The Zeroth Law defines temperature as a fundamental and measurable property of matter. The law states that if two bodies are each in thermal equilibrium with a third body, then they are in equilibrium with each other.
First Law of Thermodynamics
The First Law of thermodynamics expresses the law of conservation of energy and mass: since heat is a form of energy, it is subject to conservation. The law states that the total increase in a system's energy equals the increase in its thermal energy plus the work done on the system.
The Second Law of Thermodynamics
The Second Law of thermodynamics states that heat can be transferred from a body at a lower temperature to a body at a higher temperature only if additional energy is supplied. This is why running air conditioners is expensive.
The Third Law of Thermodynamics
The Third Law of thermodynamics concerns the notion of "absolute zero". It states that the temperature of any pure substance in thermodynamic equilibrium approaches zero as its entropy approaches zero; in other words, the entropy of a pure crystal at absolute zero is zero.
In order to determine entropy, the temperature of absolute zero (0 K) has to be taken as the reference.
Entropy
The third law of thermodynamics introduces a new term, entropy. Entropy is, by definition, the waste energy (energy that is unable to do work) or, equivalently, a measure of the disorder in a system.
Conservation of energy
As stated by NASA, the conservation of energy is a fundamental concept of physics, tightly coupled to the conservation of mass and the conservation of momentum. As stated before, the energy in a closed system can be neither created nor destroyed; it can only be converted from one form to another, while the total amount of energy in the system remains constant. Hence, it can be stated that "the change in energy of an object due to a transformation is equal to the work done on the object or by the object for that transformation".
The first law of thermodynamics is the law of conservation of energy.
Law of conservation of energy
The law of conservation of energy has no theoretical proof. It is based only on experimental results and was discovered through the extensive efforts of researchers from various branches of science over a long period of time, so it can be said that this law was realized rather than discovered. Albert Einstein established the equivalence between mass and energy and strongly shaped the current form of the law of conservation of energy. He also proposed that the universe itself can be seen as a closed system.
The conservation of energy equation:
$E = P \cdot t = m \cdot c_p \cdot \Delta T$ (4.5.1)
Where:
P = electrical power of the fan [W]
t = time [s]
m = mass of the heated air [kg]
$c_p$ = specific heat [J/(kg·K)]
$\Delta T$ = temperature rise [K]
As aforementioned, if a certain amount of energy disappears, then exactly the same amount of energy in another form must be produced. For example [35], in the case of an electric fan, the consumed electrical energy is converted into mechanical energy and, ultimately, the electrical energy consumed can be recovered in terms of heat energy:
$E_{electrical} = E_{heat} = m \cdot c_p \cdot \Delta T$ (4.5.2)
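As a numerical illustration of equation (4.5.1), with values chosen purely for the example (they are not taken from the experiments in this thesis): a 100 W fan running for 60 s delivers $E = 100 \cdot 60 = 6000$ J; if this energy heats m = 5 kg of air with $c_p \approx 1005$ J/(kg·K), the resulting temperature rise is
$\Delta T = \frac{E}{m \cdot c_p} = \frac{6000}{5 \cdot 1005} \approx 1.2 \; K$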
Fan laws
Before increasing the fan speed, the implications of doing so must be considered, because fans are air movement systems whose structural elements are affected by the fan's speed. When working with real systems, the fan manufacturer should be consulted to determine the maximum safe fan speed, so that catastrophic failures are avoided. As depicted in Table 4, there are proportional relations between the fan speed and other fan quantities, such as the air flow, represented by CFM (cubic feet per minute), the pressure (P) and the power (HP).
Table 4. Basic Fan Laws (adapted from [37])
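The classical fan affinity laws, which Table 4 presumably summarizes, relate two operating points, 1 and 2, as follows (this is the standard formulation, not a transcription of the table):
$\frac{CFM_2}{CFM_1} = \frac{RPM_2}{RPM_1}, \qquad \frac{P_2}{P_1} = \left(\frac{RPM_2}{RPM_1}\right)^2, \qquad \frac{HP_2}{HP_1} = \left(\frac{RPM_2}{RPM_1}\right)^3$
that is, the air flow varies linearly with the fan speed, the pressure with its square and the power with its cube.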
Air Flow
The pressure differential between two points is what determines the flow of air. Flow passes from the area of higher energy to the area of lower energy, so a fan's airflow can be defined as the amount of air the fan displaces per unit of time.
Figure 11. Air flow
Typically, the measurement unit for volumetric air flow is CFM (cubic feet per minute). Also, according to the fan laws, the required air flow can be estimated from the amount of heat to be dissipated.
Air Density
In air movement systems [33], the fan wheel is the component that "does the work". With each revolution, the wheel discharges the same volume of air. In a fixed system, the same volume of air is discharged regardless of the air density (Figure 12).
Figure 12. A fan wheel is a constant volume device (adapted from [33])
If the number of revolutions per minute (RPM) is increased, the fan will discharge a greater volume of air in exact proportion to the change of speed. This is the first fan law. Thus, for a fixed air channel, the volumetric air flow rate and the velocity of the air are proportional to the speed of the fan [36]:
$\frac{\dot{V}_1}{\dot{V}_2} = \frac{RPM_1}{RPM_2}$ (4.5.3)
$\frac{v_1}{v_2} = \frac{RPM_1}{RPM_2}$ (4.5.4)
Air Mass Flow Rate
The law of conservation of mass states that matter can change from one form to another, but it cannot be created or destroyed; for any enclosed system, the mass must remain constant over time. Thus, it follows that the amount of air entering a duct is equal to the amount of air leaving it. The air mass flow rate is tightly correlated with the temperature of the air passing through the fan: since the fan moves a fixed volume, if the temperature of the air increases, the density of the air decreases and, hence, the air mass flow rate decreases.
Air Pressure
A well-known and important property of any gas is pressure. Air pressure is defined as the force exerted by the weight of air and can be divided into dynamic pressure and static pressure.
The dynamic pressure is defined as the property of a moving flow of gas exerted in the same direction as the airflow.
The static pressure is defined as the force exerted by a gas on duct walls.
First Fit Decreasing (FFD)
In recent years, various heuristics addressing the problem of virtual machine consolidation and reallocation in a data center, an NP-hard problem with a very large solution space, have been the subject of studies and research. Different algorithmic approaches, such as greedy heuristics, metaheuristic strategies and search methods, have been proposed to tackle this problem.
An NP-hard problem is a problem to which every problem in NP can be reduced in polynomial time; informally, it is at least as hard as the hardest problems in NP, and no polynomial-time algorithm is known for solving it exactly. The Turing machine is an abstract device believed to be able to simulate the logic of any computer algorithm.
Among the widely used greedy heuristics we can count First Fit Decreasing (FFD), First Fit (FF), Best Fit (BF), Best Fit Decreasing (BFD), Next Fit (NF) and many others. The First Fit Decreasing and Best Fit Decreasing algorithms are known to produce results very quickly.
First Fit Decreasing, also known as FFD, is a modification of the greedy First Fit algorithm and addresses the offline bin-packing problem. In our particular case, the bins can be seen as the hosts, the items to be packed are the virtual machines, and the bin costs correspond to the server power consumption. In order to solve the bin-packing problem, the FFD algorithm can be adapted so that it maps efficiently onto the current system: in the case of heterogeneous nodes, bins of different sizes are used, while in the case of homogeneous servers the bins have the same size.
The problem's solution space is very large, as it grows exponentially with the number of virtual machines and servers used. For example, if the number of servers is n and the number of virtual machines is m, the solution space contains $n^m$ possible placements (considering that no constraints are taken into account); 10 servers and 20 virtual machines already yield $10^{20}$ candidate allocations. Although this heuristic is very fast, it can only lead to near-optimal results if the servers in the data center are homogeneous.
The First Fit Decreasing algorithm searches for a local optimum on which to place the virtual machine. However, finding a local optimum does not always lead to finding the global optimum. If a forecast of future usage exists, the IaaS provider can sort the task workload in descending order and then allocate the virtual machines to the first host having available resources.
The First Fit Decreasing algorithm works as follows: first, it sorts in descending order the customer requests represented by the virtual machines forecasted for deployment; it then processes the list in order, looking for the first server that has enough resources and is fit to serve as a host for the virtual machine currently processed. The same steps are repeated for every virtual machine until the entire workload is processed.
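The following Java sketch illustrates the procedure just described. The Vm and Host types and the single-dimension capacity check are simplifications introduced for this example; the actual system works with multidimensional resources (cores, RAM, disk).

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class Vm {
    final String name;
    final double demand;                  // requested capacity (e.g., MIPS)
    Vm(String name, double demand) { this.name = name; this.demand = demand; }
}

final class Host {
    final double capacity;
    double used;
    Host(double capacity) { this.capacity = capacity; }
    boolean fits(Vm vm) { return used + vm.demand <= capacity; }
}

final class FirstFitDecreasing {
    // Sorts the VMs by decreasing demand, then places each one
    // on the first host that still has enough free capacity.
    static void schedule(List<Vm> vms, List<Host> hosts) {
        List<Vm> sorted = new ArrayList<Vm>(vms);
        Collections.sort(sorted, new Comparator<Vm>() {
            public int compare(Vm a, Vm b) { return Double.compare(b.demand, a.demand); }
        });
        for (Vm vm : sorted) {
            for (Host h : hosts) {
                if (h.fits(vm)) { h.used += vm.demand; break; }   // first fit wins
            }
            // if no host fits, the VM deployment fails ("failed" state in our model)
        }
    }
}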
Technologies
One of the technologies we have used in the development of our project is the Hibernate framework. Hibernate has provided us with data query and retrieval facilities and has relieved us of the common data persistence tasks, saving time and effort.
Hibernate is a Java object-relational mapping (ORM) and persistence framework. ORM, or object-relational mapping, can be defined as a programming technique that efficiently maps objects to relational database tables: the entities (classes) are mapped to tables, the entity instances are mapped to table rows, and the attributes of the instances are mapped to the table columns. The Hibernate framework thus provides the means for creating a virtual object database from within the application.
Persisting data is not a trivial task, and it usually represents one of the basic application requirements. Persistence is the process through which data from the programming environment is stored to the database and retrieved at any point in time. Databases are the most widely used media for storing data, due to the simplicity of accessing and manipulating data using SQL (Structured Query Language). By making use of the Hibernate framework, we are able to easily persist data from the Java environment to the database.
The Object-Relational Mapping acts as a bridge between the application and the relational database, by allowing communication between them. The application relies on the ORM for providing the query and persistence services.
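As an illustration of this mapping, a hypothetical entity could be annotated as follows (the class, table and column names are examples, not the exact ones used in the project):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "server")                  // the class is mapped to the "server" table
public class ServerEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;                     // each instance becomes a table row

    @Column(name = "name")
    private String name;                 // attribute mapped to a column

    @Column(name = "utilization")
    private double utilization;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public double getUtilization() { return utilization; }
    public void setUtilization(double utilization) { this.utilization = utilization; }
}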
Another tool we have used is MySQL Workbench, a unified visual tool for data modelling, management and representation. MySQL Workbench simplifies database design and maintenance and enables model-driven database design; it provided us with the means of easily visualizing and managing data, together with best-practice standards for data modelling. Model-driven database design is a useful approach for designing and creating well-performing databases, while offering the possibility of responding to evolving business requirements.
For visualizing the power consumption and the corresponding increase in cooling power consumption in real time, we have used the LiveGraph framework. LiveGraph is a plotter for real-time data visualization which dynamically updates the graph with every virtual machine that is allocated to a host. As a virtual machine is allocated to a node, the overall power consumption increases and the graph is updated. A second graph shows the corresponding increase in cooling power consumption, according to the cooling systems used and the temperature losses.
For an easier project management we have used GitHub. GitHub is a web-based repository hosting service which enables version control. By using this repository, we were able to easily collaborate, advance in the development of the project and perform changes without overwriting any part of the project.
Detailed Design and Implementation
The chapter's main goal is to provide a detailed view of the implementation and the architecture of the application. Each component of the system will be described thoroughly, so that the developed system can be easily maintained and further extended.
System architecture
In what follows, for providing a basis for future reasoning about the structure and the behavior of the system, the high level architecture of the application will be presented.
The system's high-level architecture has been developed using a MAPE (Monitor, Analyze, Plan, Execute) structure and is represented by four main components: the Monitoring Phase, the Analysis Phase, the Planning Phase and the Execution Phase.
Figure 13. System’s top level architecture represented with a MAPE structure
Application workflow
For a better understanding of the MAPE structure, the application's high-level workflow will be presented. At first, the monitoring phase waits for incoming requests to process. After requests have been sent by the workload generator, the monitoring phase adds the context data to a queue which will process the data. After the completion of the first module, the analysis phase is started. It first checks whether the current requests have already been processed by an earlier experiment, by searching for similar patterns in a history database. If such an experiment exists, the execution phase is started and the same virtual machine allocation as the one found in the history database is performed. Otherwise, the analysis phase enforces virtual machine, server and rack policies in order to achieve an optimal system. If these policies are broken, the scheduler from the planning phase is started in order to obtain a power-efficient system. After the completion of this phase, the execution module is started and the system is evaluated with different data center layouts.
The high level workflow of the application is presented in Figure 14.
Figure 14. Application’s high level workflow
Models
The organization of the models into groups is presented in the following diagram and further on explained.
Figure 15. Package diagram representing the data center model
Further on, the main packages will be described for a more thorough understanding:
Monitoring package
The monitoring package is composed of the ContextData, Monitoring and Queue classes. The Monitoring class waits for incoming raw context data provided by the workload generator and adds it to a queue. The Queue class processes the data by making use of threads.
Access package
The access package establishes the connection to the database.
Constants
The constants package keeps the list of permitted values explicitly enumerated for the policy types, states of the racks, servers and virtual machines.
Model
The model package encompasses the database entities represented by rack, server, virtual machine, CPU, RAM and HDD.
Energy
In the energy package, the migration efficiency, together with the cooling power consumption and power consumption estimates models are defined.
Analysis
The analysis package encompasses the analysis model and the model for searching for similar experiment patterns. In the analysis class, policies are enforced and, if they are broken, the planning phase is started. The search for similar experiment patterns checks whether an experiment has already been performed and starts either the planning phase or the execution phase, according to its result.
Policies
In the policies package, the server, rack and virtual machine policies are modelled.
Scheduling
The scheduling package comprises several classes: RBRP, the Rack by Rack virtual machine placement model; PABFD, the power-aware best fit decreasing model; and the rack, server and virtual machine processor models.
Execution
The execution package saves the virtual machine placement in two databases. The first database saves only the current virtual machine allocation onto servers, together with the utilization, power consumption estimates and cooling power consumption estimates for the entire system.
Cooling Systems
In the cooling systems package, the hot aisle containment and the parallel placement data center layouts are modelled.
GUI
In this package, the user-friendly interface has been modelled.
Use case
The data center operator is the one who handles the workload and simulation management. After receiving customer requests, the data center operator populates the workload generator with the received tasks and starts the simulation.
After the simulation is completed, the system provides the power consumption estimates, the cooling power consumption estimates, a power-efficient virtual machine placement result, and the percentage decrease in fan power consumption obtained after rearranging the hardware equipment in the most efficient manner.
All the variations in power consumption, cooling power consumption, number of virtual machines running and cooling system’s fan power consumption can be monitored carefully by the data center administrator with the help of the graphical user interface. After each virtual machine that is deployed or deleted, the estimates are recalculated in real time and the new results are displayed. Also, the data center operator can easily monitor all the changes by looking at the logger which displays all the operations and changes that occur.
The following use case has the data center operator as its actor, together with all their attributions and expected results.
Figure 16. Use case diagram
Monitoring phase implementation
The monitoring phase encompasses three main components: the monitoring class, the context data class and the queue class.
At first, the monitoring class waits for incoming requests produced by the workload generator. After requests are generated, the monitoring class collects the system's raw context data and adds it to a queue. The queue class pops the queued requests one at a time, verifies their correctness and makes use of threads to start the processing module. After each processed message, the current database state is updated accordingly and the analysis module is started.
Each request is popped in the form of instance type, instance hardware template, number of instances and the command that will be executed by the instance. Every multi-instance request is broken into pieces, and each single request is handled one by one. After all the data processing is done, the analysis phase is started.
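A minimal Java sketch of this producer-consumer arrangement is given below; the Request fields mirror the input format described later in the System input section, while the class itself is a simplified stand-in for the project's Monitoring and Queue classes.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class Request {
    final String instanceType, hardwareTemplate, command;
    final int instances;
    Request(String type, String template, int instances, String command) {
        this.instanceType = type; this.hardwareTemplate = template;
        this.instances = instances; this.command = command;
    }
}

final class MonitoringQueue {
    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<Request>();

    // called by the monitoring class when raw context data arrives
    void add(Request request) { queue.add(request); }

    // a worker thread pops the queued requests one at a time and processes them
    void startWorker() {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        Request r = queue.take();   // blocks until a request is available
                        process(r);                 // update database state, start analysis
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    private void process(Request r) {
        // a multi-instance request is broken up and handled one instance at a time
        for (int i = 0; i < r.instances; i++) {
            System.out.println("processing " + r.instanceType + " -> " + r.command);
        }
    }
}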
Once the monitoring platform has been built, the system can be monitored continuously: every state change can be captured and dealt with in an efficient manner.
Figure 17. Monitoring phase class diagram.
Rack by Rack Placement Algorithm (RBRP)
The RBRP algorithm is based on an existing method that sorts the virtual machines and the racks in decreasing order of MIPS and utilization, respectively. The RBRP algorithm likewise takes all the virtual machines that need to be deployed and sorts them in decreasing order, and sorts the racks in decreasing order of their utilization. The algorithm then iterates through the sorted list of virtual machines. First, all the ON servers are collected and it is checked whether at least one of them has enough resources to host the virtual machine. If at least one such server is found, the servers are sorted in decreasing order of utilization; this ensures that the minimum number of nodes is used. After the sorting is complete, the most power-efficient node is chosen to host the VM. Furthermore, when assigning virtual machines to servers, minimum and maximum utilization thresholds are taken into account: in order not to be considered underutilized or over-utilized, a server has to keep its capacity utilization between 20% and 80%.
If there are no ON servers, or the servers that are ON do not have enough resources to host the virtual machine, the only nodes available for placement are those in the OFF state. Therefore, the first rack from the decreasingly sorted rack list that still has OFF servers is selected; the first OFF server from that rack is turned ON and the virtual machine is allocated to it.
Figure 18. Scheduler class diagram
Improvements
Firstly take the ON servers
Instead of iterating every time through each rack and searching the rack's server list for a node suited to host the virtual machine, all the ON servers are collected at once. It is then validated whether any of these servers can host the VM and still remain within the accepted thresholds.
Sorting decreasingly the servers
Moreover, the servers are sorted in decreasing order of utilization, in order to find the fastest and most efficient placement for the virtual machine. This ensures that the smallest number of nodes is always used and that the least capacity is left unused, without exceeding the server utilization bounds.
Turn ON only one server at a time
Once again, this yields a faster way of finding the most efficient virtual machine placement. If no host for the VM is found among the servers that are already ON, then there are either no ON servers or the ON servers do not have enough resources. In this case, the first rack with available space from the decreasingly sorted rack list is selected, and its first OFF server is turned ON and chosen as the host for the VM.
Setting utilization bounds
By setting utilization bounds, it is ensured that there is no unnecessary power consumption.
Figure 19. Rack by Rack placement algorithm pseudocode
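The Java sketch below summarizes the placement loop described above. The Server and Rack types, the 80% utilization bound and the bookkeeping are simplified stand-ins for the project's actual classes; the final choice among ON servers is refined by PABFD, presented in the next subsection.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class Server {
    final double capacity;
    double used;
    boolean on;
    Server(double capacity) { this.capacity = capacity; }
    double utilizationAfter(double demand) { return (used + demand) / capacity; }
}

final class Rack {
    final List<Server> servers = new ArrayList<Server>();
    double utilization() {
        double used = 0, cap = 0;
        for (Server s : servers) { used += s.used; cap += s.capacity; }
        return cap == 0 ? 0 : used / cap;
    }
}

final class RackByRackPlacement {
    static final double MAX_UTIL = 0.80;   // upper utilization bound per server

    // vmDemands must already be sorted in decreasing order (e.g., by MIPS)
    static void schedule(List<Double> vmDemands, List<Rack> racks) {
        for (double demand : vmDemands) {
            // 1. collect every ON server that can host the VM within the bound
            List<Server> on = new ArrayList<Server>();
            for (Rack r : racks)
                for (Server s : r.servers)
                    if (s.on && s.utilizationAfter(demand) <= MAX_UTIL)
                        on.add(s);
            if (!on.isEmpty()) {
                // 2. prefer the most utilized server, so that as few nodes as
                //    possible are used (PABFD refines this choice)
                Collections.sort(on, new Comparator<Server>() {
                    public int compare(Server a, Server b) {
                        return Double.compare(b.used / b.capacity, a.used / a.capacity);
                    }
                });
                on.get(0).used += demand;
                continue;
            }
            // 3. no suitable ON server: turn ON the first OFF server of the
            //    first rack in the decreasingly sorted rack list
            Collections.sort(racks, new Comparator<Rack>() {
                public int compare(Rack a, Rack b) {
                    return Double.compare(b.utilization(), a.utilization());
                }
            });
            placement:
            for (Rack r : racks)
                for (Server s : r.servers)
                    if (!s.on) { s.on = true; s.used += demand; break placement; }
        }
    }
}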
When there are ON servers on which the virtual machine can be hosted, the server that will have the lowest power consumption after the allocation of the VM is chosen as its host. In order to decide which placement is the most power-efficient for the virtual machine, the PABFD algorithm is used.
PABFD – Power Aware Best Fit Decreasing
The Power Aware Best Fit Decreasing algorithm is adapted from an existing algorithm in the literature in order to reach the most power-efficient solution.
After the RBRP (Rack by Rack Placement) algorithm has collected the list of ON servers and sorted them in decreasing order of utilization, it tries to find the server which will have the lowest power consumption after hosting the virtual machine. As input, PABFD receives the list of ON servers sorted by utilization and the virtual machine to be allocated.
As output, PABFD (Power Aware Best Fit Decreasing) returns the server best suited, in terms of power consumption, to be the host of the virtual machine.
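A minimal Java sketch of this selection step is shown below; the linear power model inside powerAfter is a hypothetical stand-in, a common choice in the literature [23], for the project's actual power consumption estimate.

import java.util.List;

final class HostCandidate {
    final double capacity, idlePower, maxPower;
    double used;
    HostCandidate(double capacity, double idlePower, double maxPower) {
        this.capacity = capacity; this.idlePower = idlePower; this.maxPower = maxPower;
    }
    boolean canHost(double demand) { return (used + demand) / capacity <= 0.80; }
    // hypothetical linear power model: idle power plus a utilization-proportional part
    double powerAfter(double demand) {
        double u = (used + demand) / capacity;
        return idlePower + (maxPower - idlePower) * u;
    }
}

final class Pabfd {
    // Returns the ON server whose estimated power consumption is lowest
    // after hosting the VM, or null if none of them can host it.
    static HostCandidate bestHost(List<HostCandidate> sortedOnServers, double vmDemand) {
        HostCandidate best = null;
        double minPower = Double.MAX_VALUE;
        for (HostCandidate s : sortedOnServers) {
            if (!s.canHost(vmDemand)) continue;
            double p = s.powerAfter(vmDemand);
            if (p < minPower) { minPower = p; best = s; }
        }
        return best;
    }
}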
History Database
For our model, we have elaborated two databases. The first database model, presented in the previous sections, saves the current experiment, namely the states of the virtual machines, servers and racks, including the VM allocation, the estimated power consumption and the cooling power consumption.
The second database is the one which saves all the experiments ever performed. The history database’s components are:
The type of the algorithm that has been used for scheduling (RBRP, NURF, FFD).
The experiment input, which includes the virtual machine instances, their commands and their hardware template type (number of cores, amount of RAM and number of disks).
The virtual machine allocation.
The states of all servers at the end of the experiments, including their hardware template type, utilization, power consumption estimate and cooling power consumption estimate.
The states of all racks at the end of each experiment, also including their hardware template type, utilization, power consumption estimate and the cooling power consumption estimate.
By using a history database, a search for similar experiment patterns can be performed.
Similar experiment patterns search
When requests are sent to the analysis phase, a search for similar experiment patterns in the already performed experiments is started by following the next steps:
Check whether the history already contains an experiment with the same requests.
If a similar experiment has already been performed, the current server states are compared with the ones stored in history.
If the servers from the history experiment have exactly the same utilization as the current server states, the requested experiment has already been performed and there is no need to start the scheduler: the planning phase is skipped and the execution phase is started, where the virtual machine placement is performed according to the experiment found in history.
In this way, if a similar experiment pattern has been found for the current experiment that needs to be performed, a decrease in the execution time is obtained.
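A minimal Java sketch of this lookup is shown below, with the hypothetical PastExperiment and HistoryDao types standing in for the project's actual persistence classes.

import java.util.List;

final class PastExperiment {
    final String requestSignature;            // canonical form of the request list
    final List<Double> serverUtilizations;    // utilization of every server at the end
    PastExperiment(String signature, List<Double> utilizations) {
        this.requestSignature = signature; this.serverUtilizations = utilizations;
    }
}

interface HistoryDao {                        // hypothetical data-access interface
    PastExperiment findByRequests(String requestSignature);
}

final class SimilarExperimentSearch {
    // Returns a stored experiment matching both the requests and the current
    // server utilizations, in which case the planning phase can be skipped.
    static PastExperiment findReusable(HistoryDao dao, String requestSignature,
                                       List<Double> currentUtilizations) {
        PastExperiment past = dao.findByRequests(requestSignature);
        if (past != null && past.serverUtilizations.equals(currentUtilizations)) {
            return past;     // identical starting state: reuse the saved placement
        }
        return null;         // otherwise the scheduler (planning phase) must run
    }
}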
Figure 20. Search for similar experiment patterns flowchart
Cooling system simulation framework
For the cooling system simulation framework, two different data center layouts (Figure 21 and Figure 22) have been approached in order to evaluate and demonstrate that an efficient cooling system strategy is imperative when a reduction in cooling power consumption is desired.
In order to achieve these reductions in cooling power consumption, the power consumption of the fans used for cooling the IT equipment must be considered.
From [41] a cubic relationship between the fan power consumption and the fan speed is obtained:
$P_{fan} \propto \omega_{fan}^{3}$ (5.10.1)
where $\omega_{fan}$ is the fan speed.
From the conservation of energy applied in the context of an electric fan, we get:
$E_{electrical} = E_{heat}$ (5.10.2)
$P_{fan} \cdot t = m \cdot c_p \cdot \Delta T$ (5.10.3)
Figure 21. Same orientation with air recirculation
Figure 22. HACS – Hot Aisle Containment
Volumetric air flow rate
Volumetric air flow rate can be expressed as the ratio between the air mass flow rate and the air density:
$\dot{V} = \frac{\dot{m}}{\rho}$ (5.10.4)
Table. International System of Units: $\dot{V}$ = volumetric air flow rate [m^3/s]; $\dot{m}$ = air mass flow rate [kg/s]; $\rho$ = air density [kg/m^3]
Air velocity
Air velocity can be expressed as the ratio between the volumetric air flow rate and the cross-sectional area of the duct through which the air passes:
$v = \frac{\dot{V}}{A}$ (5.10.5)
Power Consumption
The air, having a mass flow rate $\dot{m}$, flows through the racks, entering at temperature $T_{in}$ and exiting at temperature $T_{out}$ due to the heat removed.
The relationship between a server’s power consumption and its inlet/outlet temperature can be presented as follows:
$P = \dot{m} \cdot c_p \cdot (T_{out} - T_{in})$ (5.10.6)
$T_{out} = T_{in} + \frac{P}{\dot{m} \cdot c_p}$ (5.10.7)
Table. International System of Units: P = server power consumption [W]; $\dot{m}$ = air mass flow rate [kg/s]; $c_p$ = specific heat of air [J/(kg·K)]; $T_{in}$, $T_{out}$ = inlet and outlet temperatures [K]
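The quantities in equations (5.10.4) to (5.10.7) translate directly into code. The Java sketch below is an illustration with assumed constant values for the air density and the specific heat of air; these constants and the example figures in main are not the exact ones used in the simulation.

final class AirFlowModel {
    static final double RHO = 1.2;       // air density [kg/m^3], assumed value
    static final double CP = 1005.0;     // specific heat of air [J/(kg*K)], assumed value

    // (5.10.4) volumetric flow rate from mass flow rate
    static double volumetricFlowRate(double massFlowRate) {
        return massFlowRate / RHO;                        // [m^3/s]
    }

    // (5.10.5) air velocity through a duct of cross-sectional area A [m^2]
    static double airVelocity(double volumetricFlowRate, double ductArea) {
        return volumetricFlowRate / ductArea;             // [m/s]
    }

    // (5.10.7) exhaust temperature of equipment dissipating power P [W]
    static double outletTemperature(double inletTemp, double power, double massFlowRate) {
        return inletTemp + power / (massFlowRate * CP);   // [degrees C]
    }

    public static void main(String[] args) {
        double mdot = 0.5;                                    // [kg/s], example value
        double tOut = outletTemperature(18.0, 3000.0, mdot);  // 3 kW of IT load, 18 C inlet
        // 18 + 3000 / (0.5 * 1005) = 23.97, just below the 25 C safety limit used later
        System.out.printf("outlet temperature: %.1f C%n", tOut);
    }
}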
Supplied Electrical Energy
As stated before, the law of conservation of energy says that if a certain amount of energy disappears, then the exact same amount of energy in another form must be produced. This means that in the case of the electric fan, the consumed electrical energy can be recovered in terms of heat energy.
The energy consumption can be defined in terms of power consumption over a certain period of time:
$E = P \cdot t$ (5.10.8)
$E = m \cdot c_p \cdot \Delta T$ (5.10.9)
Table. International System of Units: E = energy [J]; P = power [W]; t = time [s]
Cooling Power Model
From the previous equations, the following can be deduced:
$P_{cooling} = \dot{m} \cdot c_p \cdot \Delta T$ (5.10.10)
Table. International System of Units: $P_{cooling}$ = cooling power [W]; $\Delta T$ = temperature difference [K]
For easy access to the servers, a parallel rack placement should be used. However, not every parallel rack placement (Figure 21) leads to cooling energy savings. The equations (5.10.1) through (5.10.9) are used to compute the fan power consumption for these layouts.
In the case of the rack placement with the same orientation (Figure 21), a small fraction of the rack exhaust air is recirculated and mixes with the cool air supplied by the CRAC units at temperature $T_s$. For simulation purposes, it is assumed that the recirculated air mass flow rate, $\dot{m}_r$, and the cool inlet air flow rate, $\dot{m}_s$, fully mix.
Recirculated air mass flow rate and corresponding temperatures
By integrating the recirculated air mass flow rate into equation (5.10.6), it is obtained:
$T_{in} = \frac{\dot{m}_s \cdot T_s + \dot{m}_r \cdot T_{out}}{\dot{m}_s + \dot{m}_r}$ (5.10.11)
By combining equations (5.10.6) and (5.10.11) the newly obtained inlet and exhaust temperatures are given by:
$T_{in} = T_s + \frac{\dot{m}_r}{\dot{m}_s} \cdot \frac{P}{(\dot{m}_s + \dot{m}_r) \cdot c_p}$ (5.10.12)
$T_{out} = T_s + \frac{P}{\dot{m}_s \cdot c_p}$ (5.10.13)
When modelling the two cooling systems, namely the assumed current data center layout (with the parallel placement orientation) and the proposed, more efficient one, HACS (Hot Aisle Containment), the HACS, Parallel Placement Strategy, Air Mass and Volumetric Air Flow Rate classes were used. For simulating the decrease in fan power consumption, the Fan Power Consumption class was modelled. The latter receives as inputs the air mass flow rates and the volumetric air flow rates computed for the two cooling systems described above and returns as output the percentage decrease between the two data center layouts.
Figure 23. Cooling Systems class diagram
System input
The system input is represented by the customers' requests regarding the workload. It is sent by the workload generator to the monitoring phase in order to be processed and represented programmatically. An example of the system's input can be seen in Figure 24.
Figure 24. System input
The meaning of every parameter is presented in Figure 25.
Figure 25. Meaning of parameters
System output
The system output is represented by the virtual machine placement obtained after all the system phases have been performed: the virtual machines that could not be allocated (the VMs in the failed state), the virtual machines that have been created but not deployed, the shut-down virtual machines, and the server and rack states, including utilization, power consumption estimates and cooling power consumption estimates.
Graphical user interface
After the customer sends the requests to the data center operator for processing, the DC administrator populates the workload generator with the received tasks and starts the simulation. In order to do this, the data center operator clicks the Input Data button and then pushes Start.
Figure 26. Graphical User Interface
After the simulation is started, as each virtual machine is processed, real-time graphs show the power consumption estimates, the cooling power consumption estimates, the number of virtual machines deployed, the power-efficient virtual machine placement result, and the volumetric air flow rates for HACS (Hot Aisle Containment) and for Parallel Placement with 10%, 20%, 30%, 40% and 50% air loss.
By carefully monitoring the plots in Figure 27, the data center operator can observe, in real time, every change produced in the system.
Figure 27. Scaled Power Consumption, Scaled Cooling Power and Number of Virtual Machines (left) and the volumetric air flow loss produced by the fan for HACS and parallel placement with 10%, 20%, 30%, 40%, and 50% air loss (right)
For an easier monitoring of all the changes that occur in the power consumption, the cooling power consumption, the number of running virtual machines and the cooling fan power consumption, the plots and the real-time virtual machine placement interface in Figure 28 are used. After each virtual machine deployment or deletion, the estimates are recomputed in real time and the new results are displayed. As can be seen in Figure 28, a drop occurs in all the presented quantities when virtual machines are deleted. Also, after every change in the system's state, the results are recalculated and displayed in real time.
Figure 28. Power consumption plots, cooling fan volumetric air flow rate and server utilization after virtual machines have been deleted.
Furthermore, for an even easier way of monitoring the changes in power consumption and cooling power consumption, the first plot can be set to display only the changes in the number of virtual machines, only the power consumption, only the cooling power consumption, all three of them, or any combination of the three.
Figure 29. Power Consumption and the number of virtual machines deployed in real time
In the next figure, the data center model is represented. The 8 racks, each with 4 servers, are pictured, and after every virtual machine deployment or deletion the view is updated. The server utilization, expressed in intervals, is color-coded as depicted in the legend.
Figure 30. Various numbers of virtual machines hosted on servers
Testing and Validation
Case study description
For testing and validation, two approaches were used. The first compares the developed algorithm, RBRP (Rack by Rack Placement), with an existing bin-packing heuristic, FFD (First Fit Decreasing), using different numbers of virtual machines. The second compares different types of data center layouts, using the power consumption estimates obtained from the first case study.
Data Center Configuration
Rack hardware configuration
In order to create our data center system in a realistic manner, we have based the IT equipment design on real-world Dell PowerEdge rack server configurations. We have chosen our data center system to have 8 racks, each with 4 servers.
Server hardware configuration
In the developed system, 32 homogeneous servers are used, with the following configuration:
Table. Server configuration
The power consumption has been computed by using the following constants:
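As an illustration only, a common form of server power model in the literature (e.g., [23]), and not necessarily the exact constants or formula used in this project, is linear in the CPU utilization $u \in [0, 1]$:
$P(u) = P_{idle} + (P_{max} - P_{idle}) \cdot u$
where $P_{idle}$ and $P_{max}$ are the power draws of an idle and of a fully utilized server, respectively.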
Virtual Machine hardware configurations
The virtual machines used are modelled after the five virtual hardware templates available in OpenStack, an open-source cloud environment. These templates, also called "flavors", define sizes for various components such as RAM, disk and number of cores. The virtual hardware templates are stored in a repository to allow easy access and instantiation.
Table. Virtual hardware templates used
The virtual machine attributes (name, cores, memory, disk, MIPS) are described in what follows:
Table. Virtual machine attributes
In order to keep the system's model as close to reality as possible and to easily manage the customer requests represented by the virtual machine tasks, we have implemented a series of commands based on the OpenNebula cloud environment resource management documentation.
Table. Virtual machine commands
Testing scenarios
For testing, different scenarios have been approached:
Using the model of multidimensional bin packing, where servers are envisioned as bins and each resource is seen as a bin dimension.
Comparing the proposed Rack by Rack Placement algorithm with existing bin-packing heuristics (First Fit Decreasing).
Comparing and contrasting the system under different cooling layouts (HACS, Parallel Placement).
In the first scenario, the same hardware equipment, together with the same number of servers and the same virtual machine requests, was used for both the RBRP (Rack by Rack Placement) and FFD (First Fit Decreasing) algorithms.
Validation and Results
Power consumption estimates
In what follows, the power consumption estimates and the cooling power consumption estimates are compared and contrasted for both the RBRP and FFD algorithms.
The sample chosen for testing and validating the elaborated algorithm is presented in the following table. The number of deployed VMs column represents the number of virtual machines initially requested for deployment. The number of deleted VMs column represents the number of virtual machines requested to be deleted from the currently running virtual machines.
Table. Experimental sample
At first, the total power consumed in every experiment from the table above was compared between the RBRP (Rack by Rack Placement) and FFD (First Fit Decreasing) algorithms. The raw First Fit Decreasing algorithm was used in this case, which means that no improvements were added to it. Therefore, after virtual machines have been deleted, only the proposed algorithm, RBRP, turns off the underutilized servers and reallocates the virtual machines hosted by these nodes. In this case, an improvement of approximately 30% has been obtained.
Figure 31. Comparing total simulated power consumption for different experiments in the case of raw FFD and RBRP
However, in order to obtain results as accurate and as close to reality as possible, the First Fit Decreasing algorithm was improved: it now also turns off underutilized servers and reallocates the virtual machines previously hosted by these underutilized nodes to hosts having enough resources for them.
Despite the fact that the FFD (First Fit Decreasing) algorithm was improved, the RBRP (Rack by Rack Placement) algorithm still returns better results, as depicted in the next figure.
Figure 32. Simulated power consumption for different experiments in the case of improved FFD and RBRP
When comparing the number of servers that host the virtual machines for every experiment in turn, better results are once again obtained by the proposed Rack by Rack Placement algorithm.
Figure 33. Number of servers used for hosting the deployed virtual machines, for different experiments, in the case of improved FFD and RBRP
The same experiments were performed for simulating the cooling power consumption of RBRP and FFD. It can be observed that RBRP returns better results for the same performed experiments.
The obtained results are presented in the following Figure:
Figure 34. Simulated cooling power consumption for different experiments in the case of improved FFD and RBRP
The same experiments were used for comparing the number of released nodes. The number of released nodes refers to the number of underutilized servers that were turned off and whose hosted virtual machines were reallocated to servers with enough available space to host them.
Figure 35. Number of released nodes for different experiments in the case of improved FFD and RBRP
Cooling systems comparison
The second testing scenario consists of the comparison of two data center layouts for cooling systems: HACS (Hot Aisle Containment) and the Parallel Placement rack layout.
It is assumed that a data center’s current layout is a parallel arrangement, where racks have the same orientation. A Hot Aisle Containment is proposed for improving energy savings.
The supplied CRAC temperature for the two tested systems ranges between 18°C and 24°C. For safety reasons, despite the fact that the 2011 version of the ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) Standard TC9.9 recommends a data center temperature in the range of 18-27°C, the maximum allowed exhaust temperature chosen is 25°C.
Because in the Parallel Placement layout a fraction of the hot exhaust air gets mixed with the cold inlet air, the air losses for which the simulation tests were made are 1%, 10%, 20%, 30%, 40% and 50%.
Table. Percentage improvement when choosing HACS over the Parallel Placement data center layout
For each cooling system, the minimum air mass flow rate has been chosen such that the exhaust temperature does not exceed the 25°C safety limit. Figure 36 shows the difference in volumetric air flow rate between the two compared data center layouts when the air loss in the parallel placement is 10%.
Figure 36. Difference in volumetric air flow rate produced by the fan between HACS and Parallel Placement having 10% air loss
Figure 37. Difference in volumetric air flow rate produced by the fan between HACS and Parallel Placement having 1% air loss
In the following figure, the difference in volumetric air flow rate between HACS (Hot Aisle Containment) and Parallel Placement with 1%, 2% and 5% air loss is represented. As can be observed, the Hot Aisle Containment is the most efficient cooling system strategy.
Figure 38. Difference in volumetric air flow rate produced by the fan between HACS and Parallel Placement having different air losses
Figure 39. Difference in volumetric air flow rate between HACS and Parallel Placement with different air losses
User’s manual
Software Requirements
JDK 7
Java SE Development Kit 7 has to be downloaded and installed on the computer on which the system will be deployed.
Jar files
When developing the project, several JARs have been used: Hibernate and ejb-persistence. These JARs have to be added when deploying the project. In order to use the graphical user interface, the LiveGraph JARs have to be downloaded and added to the build path of the project.
Eclipse IDE
Eclipse IDE, preferably Luna or later, or any other IDE supporting Eclipse projects, has to be installed.
Running the system
The installation and deployment follow the next steps:
Download and install JDK 7 on the computer on which the system will be deployed.
Download the Eclipse Luna IDE from the official Eclipse page.
Install Eclipse Luna IDE on the computer on which the system will be deployed.
Open Eclipse IDE and choose a workspace directory on which the project will be saved.
Import project into Eclipse IDE
Go to File -> Import
Figure 40. Import project in Eclipse
Select Existing Projects into Workspace and click Next.
Click Browse and select the project you wish to import. Click Finish.
Add jars
Right click on the project.
Click Build Path
Click Configure Build Path. The following dialog will be opened:
Figure 41. Build path in Eclipse
Click Add External JARs and choose the JARs previously downloaded.
Click OK.
Run project.
Click on Input Data to add requests in the form presented in the System input (5.10) subchapter.
Select inlet temperature to be supplied by the CRAC unit.
Click Start.
Conclusions
In conclusion, the main issue identified and addressed by this project is the energy consumption problem in data centers. Due to the massive amounts of electricity needed to cool down the servers, to power the hardware equipment and to support large-scale data processing, enormous energy costs are reached. In order to deal with these problems, various approaches have been proposed.
First of all, extensive research was conducted in order to study: the cloud computing characteristics and how to apply them to data center capabilities; the main causes of power consumption, including the power consumed in order to cool down the hardware equipment; existing cooling strategies and how to incorporate them into a data center design; and, last but not least, existing solutions for virtual machine migration and efficient placement.
After the research plan had been formulated and the main objectives clearly defined, the objectives supporting the main addressed problem were determined:
Formulate a research plan for studying the cloud computing characteristics and how to integrate them into a data center’s capabilities
Study the main causes of power consumption in a data center and how to address them
Study existing heuristics for virtual machine migration and placement
Study the power consumed by the cooling units
Design, develop and implement a data center model by using all the previous research
Elaborate an efficient virtual machine placement algorithm for reducing the power consumption
Elaborate a solution for reducing the execution time of the simulation
Elaborate and extend a plan and a solution for reducing the cooling costs in a data center
Monitor the system continuously so that every change in the system’s state can be captured and dealt with efficiently
Simulate and validate the system’s efficiency by comparing with already existing solutions
Following the formulated research plan and the clearly defined main and supporting objectives, their design, elaboration and development phase was started.
Contributions
For the design and elaboration of the project, the previously defined objectives represented the main focus.
First of all, the data center model has been designed and developed by taking into account all the previous research conducted on the topic, so that all the DC's functionalities could not only be exposed, but also used.
Then, a solution with different components that work together for reducing the energy consumption in a data center has been elaborated.
First and foremost, a greedy-type algorithm for an efficient virtual machine placement on the servers hosted by the racks has been elaborated. The algorithm extends an existing algorithm that uses rack sorting, virtual machine sorting and the search for the most power-efficient server on which to place the VM.
The extended algorithm, RBRP (Rack by Rack Placement), is inspired by the model of multidimensional bin packing, where servers are envisioned as bins and each resource is seen as a bin dimension. Various improvements have been added to the existing algorithm. Utilization bounds are set for servers, so that the capacity constraints are obeyed and unnecessary power consumption is prevented. If a host becomes underutilized after the migration of a virtual machine, that server is turned off, which eliminates its idle power. In order to ensure that the minimum number of nodes is used and a reduction in power consumption is achieved, the servers are sorted in decreasing order, so that every virtual machine is placed on the most highly utilized server that does not exceed the node's utilization constraints. Computation time is reduced because RBRP does not have to search every time for the most power-efficient host: the algorithm first checks the ON servers and validates whether any of them has enough space left. If not, the most power-efficient server that can host the virtual machine is the first OFF server from the first rack, in the decreasingly sorted list of racks, that has enough resources. In this way, it is ensured that the racks are filled one by one and no empty spots remain. After the completion of the algorithm, if there are still underutilized servers, their hosted virtual machines are reallocated to other nodes which still have resources available.
The efficiency of the elaborated algorithm has been simulated, validated and compared with existing bin-packing heuristics, such as First Fit Decreasing, using different numbers of virtual machines with various weights.
In order to reduce the execution time of the algorithm, a solution for searching for similar experiment patterns has been elaborated. After each execution, the scheduler's result is saved in a history database. When starting another experiment, the system first checks whether the experiment has already been performed and, if so, the same result as the one found in the history database is provided.
Moreover, different cooling systems that could improve the data center's energy consumption have been analyzed. It has been shown that HACS, the Hot Aisle Containment System, is the most energy-efficient among the compared systems. In order to support the results, the law of conservation of energy, together with the fan laws, has been applied to the HACS system and to a typical data center layout that experiences air losses. The proposed data center layout for rearranging the hardware equipment brings up to 30% improvement in fan power consumption at 10% air loss.
Furthermore, a platform for continuously monitoring and dealing with every state change in an efficient manner has been elaborated.
Further improvements
It has been shown that air-based cooling systems using CRAC units provide energy savings when the hardware equipment is placed appropriately. One of the future improvements that could be approached is extending the proposed equations to address not only CRAC-based cooling systems, but also cooling systems that rely on water, ice-based cooling systems, or hybrid cooling systems. In this way, an efficient solution for rearranging the hardware equipment could be delivered to all data center operators.
Another improvement could be developing a cooling framework that uses different time scales in which the cooling system is turned ON, in order to diminish the electricity bills for cooling the IT equipment in a data center, without allowing servers to exceed a maximum temperature and thus overheat. Also, dynamically leveraging thermal and energy storage techniques could lead to improvements.
Moreover, another improvement that could be brought to the system is to also take into account the network transfer costs and the network layouts. Thus, when there are racks that are underutilized, their network devices can be turned OFF and energy savings occur.
Also, other techniques for virtual machine placement could be used. For example, bio-inspired or genetic algorithms could be developed and tested over various samples. Then, for each set of requests, the most power-efficient solution could be selected by comparing, across all available algorithms, the virtual machine allocations and the energy needed to power and cool down the hardware equipment.
Bibliography
[1] S. Kumar Garg and R. Buyya “Green Cloud Computing and Environmental Sustainability”, 2012.
[2] M. Bertoncini, B. Pernici, I. Salomie and Stefan Wesner “GAMES: Green Active Management of Energy in IT Service centres”, 2012.
[3] A. Uchechukwu, K. Li and Y. Shen, “Energy Consumption in Cloud Computing Data Centers”, 2014.
[4] J. Yuan, X. Miao, L. Li and X. Jiang, “An Online Energy Saving Resource Optimization Methodology for Data Center”.
[5] A. Khajeh-Hosseini, I. Sommerville and I. Sriram “Research Challenges for Enterprise Cloud Computing”.
[6] U.S. Department of Energy, “Best Practices Guide for Energy-Efficient Data Center Design”, 2011.
[7] C. Belady, A. Rawson, J. Pfleuger and T. Cader, “Green Grid Data Center Power Efficiency Metrics: Pue and Dcie”, 2008.
[8] D. Rani and R. K. Ranjan “A Comparative Study of SaaS, PaaS and IaaS in Cloud Computing”, 2014.
[9] M. Durairaj and P. Kannan, “A Study On Virtualization Techniques And Challenges In Cloud Computing”, 2014.
[10] S. Srikantaiah, A. Kansal and F. Zhao, “Energy Aware Consolidation for Cloud Computing”, 2011.
[11] D. Bouley and T. Brey, “Fundamentals of Data Center Power and Cooling Efficiency Zones”, 2009.
[12] J. Niemann, K. Brown, and V. Avelar “Impact of Hot and Cold Aisle Containment on Data Center Temperature and Efficiency”, 2014.
[13] Emerson, “Focused Cooling Using Cold Aisle Containment”
[14] Q. Tang, T. Mukherjee, S. Gupta and P. Cayton “Sensor-Based Fast Thermal Evaluation Model for Energy Efficient High-Performance Datacenters”, 2006.
[15] L. Li, W. Zheng, X. Wang and X. Wang, “Coordinating Liquid and Free Air Cooling with Workload Allocation for Data Center Power Minimization”, 2014.
[16] A. Velte, T. Velte and R. Elsenpeter, “Cloud Computing A Practical Approach”, TATA McGRAW-HILL Edition 2010.
[17] N. Haryani, D. Jagli, “Dynamic Method for Load Balancing in Cloud Computing”, IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 4, Ver. IV (Jul – Aug. 2014).
[18] N. Kansal and I. Chana “Cloud Load Balancing Techniques : A Step Towards Green Computing”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No 1, January 2012.
[19] S. Venticinque, R. Aversa, B. di Martino, M. Rak and D. Petcu “A Cloud Agency for SLA Negotiation and Management”, 2012.
[20] R. Buyya, S. Kumar Garg and R. Calheiros, “SLA-Oriented Resource Provisioning for Cloud Computing: Challenges, Architecture, and Solutions”, 2011.
[21] A. Beloglazov , R. Buyya, Y. Choon Lee and A. Zomaya “A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems”
[22] W. Hu, A. Hicks, L. Zhang, E. M. Dow, V. Soni, H. Jiang, R. Bull and J. Matthews, “A Quantitative Study of Virtual Machine Live Migration”.
[23] A. Beloglazov and R. Buyya , “Optimal Online Deterministic Algorithms and Adaptive Heuristics for Energy and Performance Efficient Dynamic Consolidation of Virtual Machines in Cloud Data Centers” , 2012.
[24] S. Srikantaiah, A. Kansal and F. Zhao, “Energy Aware Consolidation for Cloud Computing”, 2008.
[25] P. Kumar, D. Singh and A. Kaushik, “Power and Data Aware Best Fit Algorithm for Energy Saving in Cloud Computing”, 2014.
[26] A. Murtazaev and S. Oh, “Sercon: Server Consolidation Algorithm using live migration of virtual machines for Green Computing”, 2011.
[27] S. Esfandiarpoor, A. Pahlavan and M. Goudarzi “Virtual Machine Consolidation for Datacenter Energy Improvement”, 2013.
[28] A. Kansal, F. Zhao, Jie Liu, N. Kothari and A. Bhattacharya, “Joulemeter: Virtual Machine Power Measurement and Management”.
[29] Purkay Labs “Measure Server delta-T using AUDIT-BUDDY”.
[30] Intel, “The Problem of Power Consumption in Servers”.
[31] T. Cioara, I. Anghel, I. Salomie, G. Copil, D. Moldovan and B. Pernici, “A Context Aware Self-Adapting Algorithm for Managing the Energy Efficiency of IT Service Centres”, 2011.
[32] P. Vromant, D. Weyns and J. Andersson “On Interacting Control Loops in Self-Adaptive Systems”, 2011.
[33] The New York Blower Company, “Fan Laws and System Curves”.
[34] T. Leland “Basic Principles of Classical and Statistical Thermodynamics”
[35] http://chemistry.tutorvista.com/nuclear-chemistry/energy-conservation.html
[36] D. Shin, J. Kim, N. Chang, J. Choi, S. Chung, E. Chung “Energy-Optimal Dynamic Thermal Management for Green Computing”
[37] ComAir, “Establishing Cooling Requirements: Air Flow Vs Pressure”.
[38] http://www.javabeat.net/hibernate-ormobjectrelational-framework-an-introduction/
[39] Y. Zhang, Y. Wang, and X Wang “TEStore: Exploiting Thermal and Energy Storage to Cut the Electricity Bill for Datacenter Cooling”
[40] L. Li, W. Zheng, X. Wang, and X. Wang “Coordinating Liquid and Free Air Cooling with Workload Allocation for Data Center Power Minimization”
[41] J. Kim, M. Sabry, D. Atienza and K. Gross, “Global Fan Speed Control Considering Non-Ideal Temperature”.