Ukrainian Cluster project

Project Goal:

Development of a methodology for building a compute cluster from desktop computers using Microsoft Compute Cluster Server 2003 (CCS).
There are two options for building such a cluster. The first is based on a computer classroom; the second uses arbitrary desktop computers connected by a high-speed local network. The hardware must support x64 extensions, since this is one of the CCS requirements.
To understand the project, bear in mind that it is not a grid system but a cluster with a fully functioning MPI environment.

Project Importance:

  1. Personal computers are shut down for the greater part of their service life. With business hours from 9:00 to 18:00 and Saturdays and Sundays off, a computer is in use for only 45 of the 168 hours in a week, i.e. it sits idle for roughly 73% of the available time.
  2. The computing performance of personal computers has long since caught up with that of servers: in the overwhelming majority of cases, server processors are built on the same cores and run at the same frequencies as desktop processors.
  3. A local network can be built from inexpensive, high-speed Ethernet components. Gigabit Ethernet already satisfies the requirements for MPI-based cluster interconnects. Forecasts for 2008 anticipate further spread and price reduction of 10 Gb Ethernet and adoption of a 10 Gb standard for copper cabling, which will allow it to compete successfully with interconnects such as Myrinet and InfiniBand [1|http://www.networkworld.com/news/tech/2006/082106-10gbaset.html], [2|http://www.internetnews.com/stats/article.php/3661141]. 10 Gb Ethernet latencies are about 5-10 µs, while the latest InfiniBand releases achieve 1-5 µs, but the price of a mass-market Ethernet solution will be lower. The IEEE 100 Gb Ethernet standard now being prepared is forecast to be introduced starting in 2010 [3|http://www.infoworld.com/article/06/11/22/HN100gbits_1.html]. Currently, 42% of the Top500 clusters use Ethernet technology [4|http://www.top500.org/stats].
  4. The solution has a long-term, strategic perspective. The recently widespread TOE and RDMA technologies [5|http://www.networkworld.com/details/653.html], [6|http://www.networkworld.com/details/5221.html] (hardware protocol processing; the network card writes received data directly into memory, bypassing the processor) will relieve the processor of load during data transfer. Network latencies do not yet allow talk of a shared-memory environment (OpenMP), but manufacturers now prefer not to speed up the memory itself, instead speeding up the bus by doubling the amount of data transferred per cycle (DDR2, 3, 4 and so on). As a result, latencies between processor and memory stay the same or grow, while network latencies decrease. Given that data moves between processor and memory at 8-10 Gb/s today while the network runs at 10 Gb/s, one can forecast further convergence, up to the point where OpenMP systems can be built over the network. There will then be no need to build separate large systems: combining existing computing capacity will suffice.

Description of the solution:

First option.
This option is recommended when the cluster is built on top of a computer classroom in an academic institution. It is simple to install and operate, and requires neither special knowledge from the administrator nor investment in special hardware. Its downside is potential security problems, so it is not recommended to perform critical calculations or to store important data on the client computers. Technically, this option works as follows:
  1. One server is allocated (it must be permanently powered on); it runs the head cluster node and the cluster control software (.NET C#).
  2. At the preset time, the control software wakes the client computers over the network (using Wake-on-LAN) and boots the cluster switch software (CSS, written in MASM32) via PXE; a minimal sketch of the Wake-on-LAN magic packet is given after this list.
  3. The CSS switches the computer into cluster mode by replacing the MBR with the cluster's one.
  4. When it is time to switch the computers back into regular mode, the control software remotely puts the computational processes into standby (using remote command execution) and the cluster system into hibernation. Notably, the computational processes do not notice the moment of suspension and, as experiments show, their performance is not affected.
  5. After that, steps 2 and 3 are repeated, the only difference being that the CSS restores the client's MBR.
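
For illustration of step 2, here is a minimal C# sketch of the Wake-on-LAN magic packet the control software could broadcast to start a client. The class name, MAC address format and broadcast port are placeholders, not the project's actual code; a magic packet is six 0xFF bytes followed by the target MAC address repeated 16 times, sent as a UDP broadcast.

    using System;
    using System.Net;
    using System.Net.Sockets;

    // Minimal Wake-on-LAN sketch: 6 bytes of 0xFF followed by the target MAC
    // address repeated 16 times, sent as a UDP broadcast (port 9 is a common
    // convention; port 7 is also used).
    static class WakeOnLan
    {
        public static void Wake(string mac)   // e.g. "00-1A-2B-3C-4D-5E"
        {
            string[] parts = mac.Split('-', ':');
            byte[] macBytes = new byte[6];
            for (int i = 0; i < 6; i++)
                macBytes[i] = Convert.ToByte(parts[i], 16);

            byte[] packet = new byte[6 + 16 * 6];
            for (int i = 0; i < 6; i++)
                packet[i] = 0xFF;
            for (int i = 0; i < 16; i++)
                Buffer.BlockCopy(macBytes, 0, packet, 6 + i * 6, 6);

            using (UdpClient udp = new UdpClient())
            {
                udp.EnableBroadcast = true;
                udp.Send(packet, packet.Length, new IPEndPoint(IPAddress.Broadcast, 9));
            }
        }
    }

For the packet to have any effect, Wake-on-LAN must be enabled in the BIOS and network card settings of each client.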
The MBR-replacement approach provides an additional level of reliability: even if communication with the head node fails, the local OS can still be loaded.
At the administrator's discretion, the cluster partition of the hard disk may be marked as free space or as a special file system while the client OS is in use, i.e. it remains invisible to users and cannot be destroyed without administrator rights.
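
The MBR handling itself is performed by the MASM32 CSS in the pre-boot (PXE) environment. Purely to illustrate the idea, the hedged C# sketch below backs up the current 512-byte MBR of the first physical disk from within Windows, as one might do before it is replaced; the drive path, constants and method name are assumptions, and administrator rights are required.

    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    // Illustration only: back up the 512-byte MBR of \\.\PhysicalDrive0.
    // The real CSS replaces the MBR from the PXE/MASM32 environment, not from Windows.
    static class MbrBackup
    {
        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
        static extern SafeFileHandle CreateFile(
            string fileName, uint access, uint share, IntPtr security,
            uint creationDisposition, uint flags, IntPtr template);

        const uint GENERIC_READ = 0x80000000;
        const uint FILE_SHARE_READ_WRITE = 0x00000003;
        const uint OPEN_EXISTING = 3;

        public static void Backup(string backupPath)
        {
            SafeFileHandle disk = CreateFile(@"\\.\PhysicalDrive0", GENERIC_READ,
                FILE_SHARE_READ_WRITE, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero);
            if (disk.IsInvalid)
                throw new IOException("Cannot open the disk; administrator rights are required.");

            using (disk)
            using (FileStream raw = new FileStream(disk, FileAccess.Read))
            {
                byte[] mbr = new byte[512];   // one sector: boot code + partition table
                raw.Read(mbr, 0, mbr.Length);
                File.WriteAllBytes(backupPath, mbr);
            }
        }
    }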
This option has been implemented in the computer classroom of the Microsoft IT Academy at the University Computer Center and is currently used for scientific calculations.
Second option.
This option is universal and can be recommended for organizations with heightened security and stability requirements, such as banks. Its downside is the potential need to purchase iSCSI SAN (storage area network) hardware. Such hardware may become necessary if the number of computers used for clustering grows to the point where a software SAN implementation can no longer cope with the load. Technically, this option works as follows:
  1. One server is allocated (it must be permanently powered on); it runs the head cluster node and the cluster control software (.NET C#).
  2. At the preset time, the control software wakes the client computers over the network (using Wake-on-LAN) and, via a hardware (HBA) or software (emBoot [7|http://www.emboot.com/press-release-winboot_July2006.htm]) initiator, boots CCS from the SAN over iSCSI. Remote booting of Server 2003 over iSCSI is supported by Microsoft, as reflected in the release of its iSCSI initiator [8|http://www.internetnews.com/storage/article.php/3596531].
  3. When it is time to switch the computers back into regular mode, the control software remotely puts the computational processes into standby (using remote command execution) and the cluster system into hibernation (see the sketch after this list). Since each cluster node works with its own SAN partition, the hibernation files are stored there as well and are inaccessible to users.
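
As a sketch of the suspend step used by both options, the following minimal C# helper could be placed on a client and started by the control node through remote command execution; it calls the Win32 SetSuspendState function from powrprof.dll to put the node into standby or hibernation. The helper name and command-line convention are assumptions, not the project's actual code.

    using System;
    using System.Runtime.InteropServices;

    // Minimal client-side helper: the control node runs it remotely
    // (e.g. "suspend.exe hibernate") to park the machine.
    static class SuspendHelper
    {
        // Win32 API from powrprof.dll: hibernate = true writes memory to disk,
        // hibernate = false enters standby (suspend to RAM).
        [DllImport("powrprof.dll", SetLastError = true)]
        static extern bool SetSuspendState(bool hibernate, bool forceCritical, bool disableWakeEvent);

        static int Main(string[] args)
        {
            bool hibernate = args.Length > 0 && args[0] == "hibernate";
            // forceCritical = false: let applications handle the transition;
            // disableWakeEvent = false: keep wake events (and Wake-on-LAN) enabled.
            return SetSuspendState(hibernate, false, false) ? 0 : 1;
        }
    }

The control node could launch such a helper remotely, for example through WMI's Win32_Process.Create method or a tool such as psexec; the project description itself only states that remote command execution is used.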
If the computing task rarely uses the hard disk, or uses only a small amount of space on it, a software iSCSI SAN implementation can be used, for example the free mySAN server [9|http://www.nimbusdata.com/company/pr/20060814.htm]. The commercial emBoot may be used as the boot software. In the near future, while building and testing this option, we plan to develop a free analogue of this software with functionality limited to cluster booting, so that the initial configuration can be created without additional costs.
