Wednesday, April 28, 2010

How the HP Insight Control suite for Microsoft System Center integrates with the Microsoft System Center suite

Customers who use the Microsoft System Center suite, which consists mainly of four products (System Center Configuration Manager, System Center Operations Manager, System Center Virtual Machine Manager and System Center Data Protection Manager), can now leverage the HP Insight Control suite for System Center to deploy, monitor and control their server infrastructure from almost anywhere. With this first release of the HP Insight Control suite for System Center, HP delivers Operating System Deployment (OSD) capability, Performance and Resource Optimization (PRO), enhanced inventory, the HP Server Updates Catalog, an iLO Advanced license, licenses for HP Insight Power Manager and the Performance Management Pack, and HP-specific management packs for System Center Operations Manager to optimize deployment, virtualization and configuration of HP servers managed by Microsoft System Center. Note, however, that HP ICE for System Center does not integrate with or add any functionality to System Center Data Protection Manager.

HP ICE for System Center provides ProLiant server deployment functionality that lets a System Center Configuration Manager [SCCM] administrator configure ProLiant BIOS settings, BIOS boot order and iLO configuration from the SCCM console using a task sequence. The same task sequence can then be advertised together with an OS image and settings, resulting in deployment of a bare ProLiant server using a truly zero-touch method. That means you can purchase the ProLiant box, place it in the datacenter, and manage everything else via the System Center consoles. In short, HP ICE for System Center adds HP-specific capabilities for ProLiant server deployment.

Along with the operating system image you can also integrate driver packages or the ProLiant Support Pack, as well as other third-party software packages, Microsoft software such as Exchange 2007, or operating system roles such as DHCP/DNS, so that once the server is deployed it is up to date with all patches and software requirements. Within the SCCM console you can also run inventory reports that capture hardware inventory from ProLiant servers.

You can directly manage and monitor your ProLiant and BladeSystem servers from System Center Operations Manager by installing the ProLiant Server Management Pack and the BladeSystem Management Pack respectively. However, at this point in time, management of HP Integrity servers and VMware host servers from SCOM is not supported.

Hope you find this information useful; check here to learn more about Insight Control for Microsoft System Center.


Monday, April 26, 2010

How Cluster Shared Volumes Direct and Redirected IO work

Cluster Shared Volumes [CSV] is a great feature offered with 2008 R2 failover clustering. CSV allows the different nodes of a cluster to have concurrent access to the LUN where a highly available virtual machine's VHD is stored. So during a live migration/quick migration or a manual move/failover operation, the LUN (configured as a physical disk resource in the cluster) is no longer dismounted and remounted, as was the case up to Server 2008. In Server 2008 R2 all nodes have simultaneous access to the LUN and can read and write to it while it holds the VHDs of multiple highly available virtual machines. This allows multiple VHDs per LUN and removes the traditional one-VHD-per-LUN constraint: previously, clustered virtual machines could only fail over independently if each virtual machine had its own LUN, which made managing LUNs and clustered virtual machines more difficult. With CSV that limitation is gone. From the last blog we know how persistent reservation works for Cluster Shared Volumes and how all nodes are able to read and write to the CSV disk resource at the same time. In this blog we will see how it works at the file system level. But before we do that, let's have a quick recap of Cluster Shared Volumes below and here

To understand Direct IO, we will take a two-node cluster scenario. Node A is the coordinator node, as shown in Fig 1, and holds the CSV volume containing the VHD files of VM 1 and VM 2.

                                                             FIG 1

Node A sends read/write IO and metadata IO [file renames, attribute changes, new file creation, etc.] directly to storage, as shown by the red and yellow lines. At the same time I collected a perfmon trace and enabled the new counters for Cluster Shared Volumes. We can see in Fig 1 that we have read/write IO along with metadata IO; however, as both VMs are running on the coordinator node, no IO request is passed to the mini SMB redirector and no IO flows over the network. Once you move VM 2 to node B, its metadata IO goes via the network [see Fig 2]: the CSV redirector hands the request to the SMB mini redirector, which passes it from the client side to the server side, and the IO request flows to the CSV redirector on the coordinator, i.e. node A. The CSV redirector there passes the IO to ntfs.sys and returns the result to node B via the same channel. This is the main reason why you are asked to enable the SMB protocol on the CSV network.
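The routing described above can be sketched as a toy model. This is purely illustrative: the function and path names are my own, not actual Windows internals, and the real csvfilter.sys/SMB redirector interaction is far more involved.

```python
# Simplified model of how CSV routes an IO request from a cluster node.
# Illustrative only; names are invented and do not mirror Windows internals.

def route_io(io_type, node_is_coordinator, storage_path_ok=True):
    """Return the path an IO request takes in this simplified CSV model.

    io_type: "data" (read/write) or "metadata" (rename, create, attributes).
    """
    if node_is_coordinator:
        # The coordinator owns the NTFS mount: everything goes straight to NTFS.
        return "direct-to-ntfs"
    if io_type == "metadata":
        # Non-coordinator metadata IO is always forwarded over SMB
        # to the coordinator, which applies it via ntfs.sys.
        return "smb-to-coordinator"
    if storage_path_ok:
        # Non-coordinator data IO can still go directly down the storage path.
        return "direct-to-lun"
    # Storage path lost: fall back to redirected IO over the network.
    return "smb-to-coordinator"

print(route_io("metadata", node_is_coordinator=False))  # smb-to-coordinator
```

The third branch also previews the redirected-access behavior discussed later in this post: when a node loses its storage path, data IO falls back to the coordinator over the network.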

                                                                  FIG 2

The cluster disk manager [DCM] is responsible for managing CSV resources and for the read/write policy set on non-coordinator nodes for access to CSV disk resources. You can look at the cluster.log excerpt below and see that [DCM] is responsible for starting csvfilter.sys and creating the CSV disk resources.

From cluster.log

01255 00000724.00000388::2009/10/08-18:01:35.592 INFO [DCM] Cluster Shared Volume Root is C:\ClusterStorage
01257 00000724.00000388::2009/10/08-18:01:35.607 INFO [DCM] service/driver CSVFilter started
01258 00000724.00000388::2009/10/08-18:01:35.607 INFO [DCM] short name is C:\CLUSTE~1
01259 00000724.00000388::2009/10/08-18:01:35.607 INFO [DCM] Filter.CfsSetRootFolder (, RootFolder=\ClusterStorage\)
01260 00000724.00000388::2009/10/08-18:01:35.607 INFO [DCM] SetRoot message sent
01261 00000724.00000388::2009/10/08-18:01:35.607 INFO [DCM] Pnp CfsFilter Launching Filter Listener
01269 00000724.00000388::2009/10/08-18:01:35.607 INFO [DCM] db.CreateDcmDisk 'SR' 8e62ea00-9763-4c21-86d1-4a05708be24a-----SR is name of my CSV disk resource
02000 00000724.00000928::2009/10/08-18:01:41.957 INFO [DCM] FsFilterCanUseDirectIO is called for file:///?\Volume{f98a57c7-9ec6-11de-ad77-806e6f6e6963}\
02001 00000724.00000928::2009/10/08-18:01:41.957 INFO [DCM] PostOnline. CanUseDirectIO for Volume1 => true
02008 00000724.00000928::2009/10/08-18:01:41.957 INFO [DCM] ClearVolumeStates: resource 'SR' states
02011 00000724.00000928::2009/10/08-18:01:41.957 INFO [DCM] Reservation.SetMembership(SR,(1 2))
02019 00000724.000008b4::2009/10/08-18:01:41.972 INFO [DCM] volume 'Volume1' is already paused
02020 00000724.000008b4::2009/10/08-18:01:41.972 INFO [DCM] CreateLink C:\ClusterStorage\Volume1 => \\?\Volume{f98a57c7-9ec6-11de-ad77-806e6f6e6963}\

The interesting part is that when a CSV resource is created it is put in its own group, which has a GUID instead of a name. When you run the "cluster.exe group" command these GUID-based CSV groups are not displayed, because Microsoft does not want users to mess with them. These groups also have many limitations; for example, you cannot add any other resources to them. Running "cluster.exe res", however, will show you the CSV group's GUID name.

Now, as we know, if you are accessing a CSV disk resource in "Redirected access" mode you are bound to take a performance hit, as all read/write and metadata IO flows over the network from every non-coordinator node instead of going directly to the storage object. You can quantify the impact by capturing a baseline perfmon trace during Direct IO and another during Redirected IO.
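As a back-of-the-envelope sketch, the overhead can then be computed from the two baselines. The throughput figures below are invented for illustration; substitute the MB/s you actually measure from the CSV perfmon counters.

```python
# Rough way to quantify the redirected-IO penalty from two perfmon baselines.
# Sample numbers are hypothetical; use your own measured throughput.

def redirected_io_overhead(direct_mb_s, redirected_mb_s):
    """Percent of throughput lost when the same workload runs in redirected mode."""
    return round(100.0 * (direct_mb_s - redirected_mb_s) / direct_mb_s, 1)

print(redirected_io_overhead(180.0, 117.0))  # hypothetical samples -> 35.0
```

Capture both baselines with the same workload and duration, or the comparison is meaningless.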

In Redirected IO we leverage the SMB mini redirector and csvfilter.sys and send all read/write IO, along with NTFS metadata, over the network instead of down the storage path. CSV thus increases fault tolerance: even if the storage path to the LUN fails from a particular node, that node keeps accessing the LUN via the coordinator node by redirecting all IO over the network as seen above [though with performance constraints]. We will see more details in the next part of this series on Cluster Shared Volumes, including the options and best practices for backing up VHD files on Cluster Shared Volumes. Hope you found this information interesting and liked today's blog. Thanks for your time, and goodbye till the next blog.

Saturday, April 24, 2010

Manage and secure PCs anywhere with Microsoft's cloud-based Intune

Microsoft has released another cloud-based client management tool, which can be integrated with MDOP and leveraged to manage all your client machines from an IE browser virtually anywhere. Windows Intune delivers cloud-based management and security capabilities administered through a simple Silverlight-based Web console. The tool is best suited to the small business segment, where you do not have the resources to set up an on-premise desktop management solution but want the features of an enterprise solution at a much lower TCO.

This is what the Intune service allows you to do from a web console:

You need to install the Intune client agent on all the machines you want to manage. Once that is done, after 30 minutes or so the computers will appear in the unassigned computers group, and from there you can place them in the relevant group you have created.

You can configure the products for which you need updates, along with the update classifications, and if you want you can also set an auto-approval rule. This would otherwise need to be done either by using the Windows Update service on individual clients or by deploying a Windows Server Update Services [WSUS] server, but here we can configure our clients from anywhere, at any time.

You can choose what kinds of patches you want deployed and also configure auto-approval rules. The Intune service provides great alert monitoring capabilities: there are around 949 alerts that you can configure. If a particular client machine generates an alert, the administrator can receive an email (if configured) or simply see the alert in the browser.

Once you register for the Intune service beta you get only one admin user email from Microsoft; however, you can configure multiple administrator accounts to monitor your client machines. I have not yet checked what kind of delegation abilities are provided.

As mentioned above, the product lets you monitor your patch installations and pull various reports, such as updates/software, licenses purchased and licenses installed, and you can scope reporting to a particular computer group as required.

The Web console also allows you to create policies for the Intune client agent: when to schedule malware scans, which folders to exclude, how frequently to look for new updates, firewall settings, etc. You can also add license agreements and manage licensing information from the browser itself.

In short, the first version of Windows Intune looks promising for client machine management and is a great example of how the cloud can reduce operations and maintenance costs while simplifying access to information, making the life of a client PC administrator better. You can use MDOP tools with the Intune service to manage your clients in a better way: Windows Intune customers have access to download and use advanced PC management tools delivered through the Microsoft Desktop Optimization Pack (MDOP) [if you meet the criteria mentioned below]. This set of advanced PC management tools complements the capabilities of the Windows Intune cloud service and includes:

 Microsoft Diagnostics and Recovery Toolset (DaRT)

 Microsoft Advanced Group Policy Management (AGPM)

 Microsoft Application Virtualization (App-V)

 Microsoft Enterprise Desktop Virtualization (MED-V)

 Microsoft System Center Desktop Error Monitoring (DEM)

 Microsoft Asset Inventory Service (AIS) (Note: This functionality is included in the Windows Intune cloud service)

MDOP is available for test and evaluation if you meet one of the following criteria:

 You are a Windows Software Assurance (SA) customer. MDOP is available to SA customers through the Microsoft Volume License Service Center.

 You are a Microsoft MSDN or TechNet subscriber.

All that Intune can do for you [check video here]

Hope this was informative and gives you an overview of Windows Intune. Thanks, and see you again shortly: Microsoft has released System Center Service Manager 2010, which looks promising, and I will try to cover its features too and how it helps in implementing ITIL/MOF processes in your IT management.

Reducing Carbon Foot Print in IT

Carbon is fast becoming a currency that global organizations cannot ignore. With carbon cap-and-trade schemes either being planned or implemented by a growing number of national and regional authorities, the carbon impact of an organization's operations is becoming a measurable cost. And after the global recession, with the focus still very much on the bottom line, carbon impact management is closer than ever to becoming a universal boardroom issue.

But who, within an organization, is responsible for mitigating the carbon cost? Certainly, this question does not yet have a straightforward answer. But one thing is clear: however organizations choose to deal with the carbon question, CIOs are bound to be involved. This is because, even if they are not given primary responsibility for managing organizational carbon impact, CIOs will need to ensure carbon reporting systems remain up to date with legislative requirements. On top of that, there is a growing realization that ICT itself has a significant role to play in carbon impact management. The European Commission recently announced [1] that the information and communication technologies (ICT) sector should lead the transition to an energy-efficient economy. It called for Europe's ICT sector to:

• Agree on common energy consumption measures
• Outperform the EU's 2020 targets by 2015
• Make innovative use of ICT to make Europe a low-carbon economy

The EC said replacing 20 per cent of European business trips by video conferencing could save more than 22 million tons of CO2 per year. It also said that broadband facilitating increased use of online public services could save two per cent of total worldwide energy use by 2020. It is clear that CIOs need at least to know all the facts, if they are to make an informed decision about the role they will play. So where should they begin?

Carbon Emission and the Bottom line:-
The poster child in the war against carbon emissions has been 'green' energy. But while the likes of hydropower, biomass, wind and solar energy may have a knack for exciting the headline writers, none have yet become affordable mainstream technologies. And while the race to replace our reliance on 'dirty' fuels needs to go on, the place to look for short-term emission cuts is energy efficiency. Unglamorous it may be, but 40 per cent of the carbon reduction to be achieved by 2020 and beyond needs to come from precisely this source [2]. And central to the story of how business will meet those targets is IT.

When asked to provide an example of environmental irresponsibility, the airline industry is never far from people's lips. Yet global CO2 emissions from IT are roughly on a par with those pumped into the atmosphere by planes. What's more, the opportunities to use technology to cut emissions are vast: a recent report notes that IT could contribute as much as 15 per cent of global emissions reductions by 2020 [3]. Of course, the idea of cutting the energy consumption associated with IT is not new; it has featured in the CSR programs of big business for some time. What is new, however, is the growing realization that an energy-efficient approach to IT is far more than a PR tool. It is a bottom-line issue: it doesn't just look good on the corporate website, it can save serious sums of money. Consider the 15 per cent figure mentioned above. That equates to €600 billion in cost savings.

After Copenhagen:-
When world leaders met to discuss climate change at the Copenhagen Summit, carbon reduction was one of the topics under discussion. It played a central role in the Copenhagen Accord, which was the key output from the summit (for more details, see the Reference Section, "Copenhagen: implications for global business"). As part of the Accord, countries were invited to submit their own carbon reduction targets by the end of January 2010. Fifty-five did so. They include the US, all EU countries and China, as well as major emerging economies such as Brazil, Indonesia and India. Between them these nations emit 78 per cent of the world's greenhouse gases. So although there were notable absentees (Brazil was the only South American nation to volunteer a pledge, and just six out of 55 African countries did so), and although the combined country targets are insufficient to cap the temperature rise at the desired two degrees, this was still an important step on the path towards ultimately achieving a legally binding global agreement. (Though just when such a step will be taken is another matter.) Crucially, the Accord also provides for scrutiny to monitor whether or not countries meet their emissions reduction targets, a key point of difference between developing and developed countries throughout the negotiations. Unsurprisingly, this was an issue that the US was particularly keen on in relation to China. As The Economist puts it: "Unless China can be shown to live up to its promises, it will be very difficult to get a climate bill through America's Senate."

Local Action – A Snapshot of Carbon reduction activity around the globe:-

Under the UK's Carbon Reduction Commitment (CRC) scheme, companies will be required to measure all electricity, gas and oil use (excluding transport and travel). They will then purchase allowances equal to their annual emissions. Within that overall limit, individual organizations can decide on the most cost-effective way to reduce their emissions. As in most cap-and-trade schemes, they will then have the ability to buy extra allowances or invest in ways to cut the number of allowances they need to buy.
A league table is then created, with credits handed out based on that year's performance. An organization at the top of the table will receive repayments totaling more than it paid for its allowances in the first place, while those at the bottom will receive repayments that are less than the amount paid out. In other words, organizations in the bottom half will lose money. Analysys Mason estimates that for the biggest companies, being ranked at the bottom of the table could amount to a financial penalty of over £120,000 [7].
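The buy-allowances-then-get-repaid mechanics above can be sketched with a couple of lines of arithmetic. The allowance price, emission figure and repayment multipliers below are invented for the example and are not the scheme's actual numbers.

```python
# Illustrative sketch of the cap-and-trade mechanics described above.
# Price, tonnage and repayment figures are made up for the example.

def allowance_cost(annual_emissions_tonnes, price_per_tonne):
    """Upfront cost of buying allowances to cover a year's emissions."""
    return annual_emissions_tonnes * price_per_tonne

def net_position(paid, repayment):
    """Positive = the scheme repaid more than the allowances cost."""
    return repayment - paid

paid = allowance_cost(10_000, 12)       # 10,000 t at a notional 12 GBP/t
print(paid)                             # 120000
print(net_position(paid, 132_000))      # top of the table: +12000
print(net_position(paid, 108_000))      # bottom of the table: -12000
```

The point of the sketch is the asymmetry: the same upfront outlay becomes a bonus or a penalty depending only on league-table rank.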

How IT can cut Enterprise Cost:-
Several technologies are leading the way in helping businesses save hundreds of millions of pounds in the process of cutting their emissions.

Green data centers
One of the most exciting is data centre virtualization, a technology that slashes the number of servers required to run your organization. In the case of BT, data centres used to account for a significant chunk of the company's carbon emissions, yet the average server is utilized at a tiny percentage of its overall capacity. By 'virtualizing' these servers, that is, by asking each physical server to carry out an increased number of tasks at the same time, utilization shoots up and the number of servers needed slumps. In one of BT's data centres, the number dropped from 1,500 to around 100, saving £600,000. Large numbers of servers also create lots of heat, which in turn means power-hungry air conditioning systems. BT has redesigned its data centres to allow fresh air to cool the servers; air conditioning is only used on the rare occasions when the temperature reaches 28°C or above.

Flexible working and home working
Flexible working and home working are more familiar approaches to cutting power use (and improving productivity), yet many organizations still make little use of them. At a time of ongoing pressure to find cost savings, there is a strong argument for taking a fresh look at the efficient use and potential rationalization of building space. BT's own results make a persuasive case: by enabling over 13,700 employees to become home workers, the company saves €750 million a year in property management costs.

Conferencing and collaboration tools
Using conference calls to replace face-to-face meetings has had similarly dramatic effects within BT, saving the company an estimated £183 million in 2008 along with over 50,000 tonnes of CO2. Collaboration technologies have the same effect, allowing people to work together without travelling to the same location.

Tips to Decarbonize IT:-
CIOs are at the heart of the emissions reduction story. Information technology is not just a huge energy consumer but can also play a role in reducing energy use and cutting costs in other areas of the organisation. But what are the first steps CIOs can take today to ensure they are prepared to lead the way in cutting their carbon impact, as soon as it becomes a bottom-line issue within their organization?

1. Build environmental responsibility into your distributed organization
As organizations have globalised in recent decades, they have had to face the challenge of maintaining a common corporate culture, irrespective of geography. They now face the same challenge in their attempts to engender a sense of carbon impact responsibility across the business. Concerted and consistent education, support and encouragement are needed to ensure people are aware of what is expected of them, and that they actually exhibit the desired behavior. But this message needs to come from the top down – the CIO needs to lead the way.

2. Ensure the right systems and processes are in place
A group-wide function should be put in place to capture, manage and report all the data related to IT use in the business. Work with facilities management, environmental management and finance, because they will each potentially be affected by legislation as it is brought in around the world. With more countries looking to enshrine environmental commitments in law, there is an increasing need to display accountability at the highest level of the organization. Simply storing information on a spreadsheet will no longer suffice.

3. Redefine the workplace
The vast majority of action on climate change so far has focused on energy use. Surprisingly there has been very little emphasis on travel, whether between business locations, or from home to the traditional workplace. In the UK, for example, approximately 25% of all emissions are travel-related. Promoting technologies that can replace the need for travel (while also saving huge sums of money), by enabling people to meet virtually or work nomadically, is something that CIOs can be doing at board level.

Copenhagen: implications for global businesses
"It's very disappointing, I would say, but it is not a failure." So said Sergio Serra, Brazil's Climate Change Ambassador, at the conclusion of the Copenhagen Summit. This was a view echoed by many, from fellow politicians, to NGOs, to the legion of green pundits in the world's media. After the summit, there was a feeling that at least a global consensus on climate change had been reached. The Copenhagen Accord gives international backing for an immediate global move towards action on climate change. In some ways this is a bigger achievement than the binding commitments that resulted from Kyoto 12 years earlier, when the deal only affected developed countries. So what exactly did they agree to? There were three key components:
1. Backing for an overall limit on global warming of two degrees;
2. Agreement that all countries need to take action on climate change;
3. The provision of $30 billion of immediate short-term funding from developed countries over the next three years to kick-start emission reduction measures and help the poorest countries adapt to the impacts of climate change, as well as a commitment by developed countries to provide long-term financing of $100 billion a year by 2020.

Illustration: Comparison of emissions for a typical 200-server Windows network

This illustrative comparison shows the major impact virtualization and data centre energy efficiency can have on CO2 emissions.
1. Impact of virtualization
• Non-virtualized: 130 kW (~610 tons CO2)
• Virtualized: 24-50 kW (~112-130 tons CO2)
• Virtualization can reduce power consumption by between 66 per cent and 82 per cent.
(Assumption: average data centre Power Usage Effectiveness* of 2.4)
2. Impact of data centre efficiency
• The difference in energy consumption between an inefficient data centre and an efficient one can be as much as 50 per cent. (Assumption: PUE range from 3.2 to 1.6)
3. Combined impact: best- and worst-case CO2 emission scenarios
• Potential energy saving/CO2 saving ~91 per cent
* Note: PUE is the industry-standard energy efficiency measure for data centres.
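The arithmetic behind the illustration can be reproduced in a few lines. The CO2 factor below is derived from the illustration's own figures (~610 tons for 130 kW), so it already bakes in the stated PUE assumption; treat it as illustrative, not as a real grid emission factor.

```python
# Reproducing the virtualization-savings arithmetic from the illustration.
# The emission factor is implied by the illustration's own numbers.

NON_VIRT_KW, NON_VIRT_TONNES = 130.0, 610.0
tonnes_per_kw = NON_VIRT_TONNES / NON_VIRT_KW  # implied annual tons CO2 per kW of load

def annual_co2(kw):
    """Annual CO2 (tons) for a given power draw, using the implied factor."""
    return kw * tonnes_per_kw

def pct_power_saved(virt_kw):
    """Per cent power reduction versus the 130 kW non-virtualized baseline."""
    return round(100.0 * (1 - virt_kw / NON_VIRT_KW))

print(round(annual_co2(24)))  # best case: ~113 tons, close to the quoted ~112
print(pct_power_saved(24))    # 82 per cent, matching the upper figure
print(pct_power_saved(50))    # 62 per cent on raw power alone
```

Note that the 50 kW case works out to about 62 per cent on power draw alone; the quoted 66 per cent lower bound presumably folds in additional efficiency assumptions not spelled out in the illustration.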

2. Analysys Mason’s Carbon Reduction Commitment Brochure
4. EU Press Release


Wednesday, April 21, 2010

Cloud based Fix it center for automated troubleshooting

Microsoft has just released the Fix it Center Online beta, which is based on the automated troubleshooter feature built into Windows 7. As it is cloud-based, it supports the various operating system versions listed below. It is a great way of managing all your PCs from one location, with automated repair of known issues; if there is no troubleshooter for your problem yet, you are offered various related articles and community groups instead, and you also have the option of raising a support request from the same online portal if nothing else has helped. I think this tool is very good for individual customers and small business customers.

• Windows XP SP3

• Windows XP Pro (64-bit) SP2

• Windows Vista

• Windows 7

• Windows Server 2003 SP2

• Windows Server 2008

• Windows Server 2008 R2

You can use any computer with an Internet connection to get started with Fix it Center. Simply download the Fix it Center client and follow the on-screen instructions to complete the setup. You can install the client on as many PCs as you like. It is recommended to sign up for Fix it Center Online during setup so you can manage all your computers from a single location on the Internet while still viewing solutions specific to each PC. With automated troubleshooters, Fix it Center helps solve issues with your PC even if you're not sure what the exact problem is. It scans your device to diagnose and repair problems, then gives you the option to "Find and fix" or to "Find and report". With a single view of all your devices, it's easy to manage multiple devices, and you can even manage them remotely.

I downloaded the Fix it agent and ran it on my PC; as you can see below, it is running an inventory of my PC.

It asks me to create an online account, and I can use my existing Live/Hotmail ID.

I see some automated troubleshooters on the screen; the remaining ones are in the bottom right corner, where it says "having a different problem".
I can manage multiple PCs from the same console and also see the diagnostic history for all of them. I can find more solutions from the tab, and if I need to raise a support request to Microsoft I can do it from here and can also see my old support requests.
Though I see this tool as a great way of automating troubleshooting and managing small environments, I don't think I will run it on my servers as of now. Hope you like this new tool, as it automates troubleshooting and makes life easier for an individual end user or a small business user.

Sunday, April 18, 2010

How the Dynamic Memory feature of 2008 R2 SP1 works with Failover Clustering

The news of Win2k8 R2 SP1 has started coming online, and the most interesting feature is Dynamic Memory support for Hyper-V. Constraints on the allocation of physical memory represent one of the greatest challenges organizations face as they adopt new virtualization technology and consolidate their infrastructure. With Dynamic Memory, an enhancement to Hyper-V introduced in Windows Server 2008 R2 SP1, organizations can now make the most efficient use of available physical memory, allowing them to realize the greatest possible potential from their virtualization resources. Dynamic Memory allows memory on a host machine to be pooled and dynamically distributed to virtual machines as necessary: memory is added or removed based on current workloads, without service interruption. At a high level, Hyper-V Dynamic Memory is a memory management enhancement designed for production use that enables customers to achieve higher consolidation/VM density ratios.

Dynamic Memory is supported on guest machines running Windows Server 2003 SP2 Datacenter or Enterprise editions, Windows Server 2008 SP1 Datacenter or Enterprise editions, Windows Server 2008 R2 Datacenter or Enterprise editions, Windows Vista SP2 Enterprise or Ultimate editions, and Windows 7 Enterprise or Ultimate editions. In today's blog I am going to focus on how the Dynamic Memory feature combined with failover clustering is a win-win for everyone. I will take the example of a 3-node cluster running 2008 R2 Server Core with 16 GB of RAM per node, where each red VM's workload requires 4 GB of RAM and each grey VM's workload requires 3 GB.

With static memory assignment, if a cluster node goes down I either need a passive node or I need to reduce the density of VMs so that I can place the VMs from the failing node on my remaining cluster nodes. In the case shown above I have reduced the VM density on the cluster nodes, and by doing that I am actually losing the use of 5 GB of RAM on each cluster host. The upside is that when one node goes down, the other two nodes can take the load of the exiting node. So let's see what happens if one of my nodes actually goes down for a short interval of time.

Though my virtual machine resources do fail over to node A and node B, the yellow VM still cannot come online on node B until I reduce its static RAM so that it can at least start up there. This has to be a manual operation, and even then I cannot guarantee that the workload running in that VM will have sufficient performance. Ideally I should modify the RAM of all the VMs running on node B so that my yellow VM gets enough RAM to give satisfactory performance with the available resources during the short window in which one of my nodes is down. But the problem is that I cannot reduce the RAM of the other VMs on the fly without shutting them down, which is not feasible as they are in production. Here, Dynamic Memory, a feature of 2008 R2 SP1, brings in the magic.
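The static-memory placement problem above comes down to a bin-packing check. Here is a small sketch using first-fit placement; the numbers are made up to mirror the scenario (real cluster placement logic is more sophisticated than this).

```python
# Sketch of static-memory failover placement: can the VMs from a failed node,
# each pinned at a fixed RAM size, start on the surviving nodes' free RAM?
# First-fit heuristic with illustrative numbers; not the cluster service's logic.

def can_place(failed_vms_gb, free_per_node_gb):
    """Try to place each failed VM's static RAM on some surviving node."""
    free = list(free_per_node_gb)
    for vm in sorted(failed_vms_gb, reverse=True):  # biggest VMs first
        for i, f in enumerate(free):
            if f >= vm:
                free[i] -= vm
                break
        else:
            return False  # some VM cannot start anywhere
    return True

# Node C fails, hosting a 4 GB and a 3 GB VM. If nodes A and B each have
# 4 GB free, everything fits; if one node only has 3 GB free, a VM is stuck:
print(can_place([4, 3], [4, 4]))  # True
print(can_place([4, 4], [4, 3]))  # False - the second 4 GB VM has nowhere to go
```

The second call is exactly the yellow-VM situation: nothing fits until an administrator manually shrinks a static allocation somewhere.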
Now all my 2008 R2 Server Core nodes have SP1 installed, so the Dynamic Memory feature is available per VM. For every highly available VM running on the three nodes I have given 512 MB as the initial memory [customizable] and 4 GB as the maximum. Dynamic Memory lets us configure a virtual machine so that the amount of memory assigned to it is adjusted while it is running, based on the amount of memory it is actually using. This allows us to run more virtual machines on a given physical node and ensures that memory is always distributed optimally between running virtual machines. Note that some applications assign fixed amounts of memory based on what is available when the application first starts at operating system startup; such workloads perform better with a higher initial memory value than the 512 MB I assigned, so it depends on the workload running in the VM. In our scenario, if node C now goes down, the highly available virtual machine resources fail over to the other nodes and come online with the host RAM optimally shared between the virtual machines, with no manual intervention required. The great benefit of Dynamic Memory is that if a VM does not require RAM at one point in time, that RAM can be leveraged by other VMs and given back when required. This increases VM density, makes better use of RAM resources and gives much more control and optimization of RAM in a failover-cluster node-failure scenario.
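As a toy model of the initial/maximum behavior described above: each VM keeps its initial allocation, and spare host RAM is handed out up to each VM's demand, capped at its maximum. Hyper-V's actual balancer (pressure-driven, involving the VM worker process and a guest-side component) is far more elaborate; this just illustrates the idea.

```python
# Toy model of dynamic memory balancing on one host. Illustrative only;
# not Hyper-V's actual pressure-based algorithm.

def balance(host_gb, vms):
    """vms: dicts with 'initial', 'max', 'demand' in GB. Returns granted GB."""
    grants = [vm["initial"] for vm in vms]       # everyone gets initial memory
    spare = host_gb - sum(grants)                # RAM left to distribute
    for i, vm in enumerate(vms):
        want = min(vm["demand"], vm["max"]) - grants[i]
        give = max(0, min(want, spare))          # never exceed spare or max
        grants[i] += give
        spare -= give
    return grants

vms = [{"initial": 0.5, "max": 4, "demand": 4},
       {"initial": 0.5, "max": 4, "demand": 3},
       {"initial": 0.5, "max": 4, "demand": 2}]
print(balance(14, vms))  # [4.0, 3.0, 2.0] - all demands met within 14 GB
```

With a tighter host (say 6 GB after a node failure), the same call degrades gracefully instead of refusing to start a VM, which is the whole point of the failover story above.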
There is a lot more coming about Dynamic Memory, and soon we will also compare it with VMware's overcommit/ballooning feature once more light is thrown on how it works. Also, Xen very recently released a new version of its hypervisor with additional features like Transcendent Memory and Page Sharing in Xen 4.0 to enhance the performance and capabilities of the hypervisor's memory operations. Xen 4.0 now also supports live transactional synchronization of VM states between physical servers [very similar to the VMware Fault Tolerance feature]. Keep checking our blog for upcoming articles. Thanks for your time, and I hope this post has been a pleasure to read and an insight into the very nice Dynamic Memory feature coming in 2008 R2 SP1, which is yet to be released.

Friday, April 16, 2010

Are we molding new Technology to suit ourselves or getting molded by new Technology

The phone that you see on the right side is my favorite phone; I have purchased 3 of them in the last 6 years and am still using the same phone very happily. I once switched to a new-age smartphone, but within a month I gifted it to my mother, as I figured out that using it was actually becoming a disadvantage for me. How can a cool phone with all those great features and apps be a disadvantage? Well, everyone is a different personality, and I figured out in one month that instead of saving time, this phone was making me spend a lot of time clicking wasteful snaps [which I would not have clicked otherwise with my semi-DSLR camera], listening to much more music than I usually did, and wasting money and time downloading apps, songs, graphics, and games. I configured Outlook on my mobile only to be bogged down by emails even after office hours, and the thick line between my office time and home time started shrinking. Apps like Facebook, Twitter, and MSN kept me engaged because they were accessible in my hand 24 hours a day. There is nothing wrong with using social networking applications, clicking snaps, playing games, or listening to songs, and I do all of this today too, but I have a dedicated laptop with nice gaming hardware, a nice camera to pursue my photography hobby, and a good surround-sound home theatre for entertainment. The difference is that this arrangement gives me control of my time and mind, and I can still do everything I want without compromising on any of my experiences. Personally, with that old phone I need not worry about it being stolen [I have forgotten this phone many times, and people have returned it or it was found where it was left], nor do I worry about its wear and tear. In other words, it is an economical phone that does all that a phone should do, and on top of that it helps me manage my time without any time-management software in it.
However, this does not mean that you should not use those high-end phones, but you should know whether you really need them and are utilizing them, or whether they are just with you for your coolness quotient. It is fine if those phones are serving you, and not you serving them.

Now, why did I write all this?
Many times I have seen people seeking advice: shall we upgrade to the latest version? Shall we purchase the latest hardware? Shall we switch to this new technology that is buzzing in the market? Virtualization! Are we molding new technology to suit ourselves or getting molded by new technology? Do not switch to a new technology just because it worked for others in their environment. Make sure it works for you: do your own proof of concept, benchmarking, and baseline testing before you make any decision based only on sales presentations. There is no single technology solution that works for every IT enterprise, and that is where identifying your business needs and aligning them with your existing processes and architecture comes into the picture. I hope you liked this analogy and it made you think for a minute. Once again, thanks for staying with our blog.

Tuesday, April 13, 2010

Key Takeaways from My First Day of Tech Ed India 2010

I have just returned from the Hotel Lalit Ashok, Bangalore, where the Tech Ed 2010 event is happening this year, and I am going to talk about the key takeaways from the first day. It was a fabulous start with the Visual Studio 2010 launch event; however, as I am not a developer, I will talk more about the architecture track, where we discussed different cloud patterns and practices, followed by a session on the Dynamic Data Center Toolkit. After lunch we saw a session on System Center Service Manager, which lets enterprise admins follow ITIL practices and automatically generate an incident based on alerts from SCOM. A self-service portal can also be created where a user requests resources/applications, and after approval from his manager, those resources/applications automatically get deployed via SCCM. There was another session on provisioning virtual machines using System Center Virtual Machine Manager by creating hardware and software templates; SCVMM lets end users self-provision virtual machines as per their requirements. Then we jumped into another session where we saw the capability to manage Linux and UNIX servers from SCOM. It's a great feature that comes with SCOM 2007 R2: you get default management packs for UNIX and Linux flavors that enable you to generate alerts in SCOM and manage heterogeneous environments in your datacenter. One of the last sessions before the demo extravaganza was on protecting virtual machines with DPM 2010, where we specifically deep-dived into Cluster Shared Volumes [CSV]. We saw DPM backing up a virtual machine irrespective of live migration or storage migration happening on it. Best practices for using a hardware VSS provider to protect CSV LUNs were discussed, and we also saw a demo where a VM can be recovered on a Hyper-V host other than the one from which it was backed up.
During breaks I also managed to visit the Citrix partner demo tent, where two Citrix employees were kind enough to show me the Citrix XenDesktop and Citrix XenServer solutions, and the discussion stretched on as we compared the Hyper-V VDI and Citrix VDI solutions from a 100-ft view. The nice thing about XenServer is that the free edition contains features like live migration and XenMotion [synonymous with storage migration/vMotion]. You just need to install XenServer on a bare-metal machine and then install a client app on any other machine [synonymous with VMware vCenter and MS SCVMM] to manage the XenServer, and this again is free of cost. Yes, the high-availability features need to be bought, but I still think that's a great technology which Citrix is providing to SMB users for free. Citrix is already delivering HDX [High Definition Experience] technology, which Microsoft is about to roll out in 2008 R2 SP1 as RemoteFX. The new HDX MediaStream and HDX plug-and-play capabilities ensure that whether users are watching Windows multimedia or connecting multiple monitors, scanners, phones, or other devices, they get the same local-PC experience on a hosted virtualized desktop.

In the end, the demo extravaganza was fun, with some very cool technologies and features demoed, including Windows Phone 7 and the Surface computer. We saw some cool features of PowerPoint 2010, where you can do a lot of magic by embedding your videos in PowerPoint and editing them directly from PowerPoint, and the same for images. We saw how we can play with a website's HTML code using the IE 8 developer tools. Another cool feature was chatting/emailing in Office Communicator/Outlook in Indian local languages. We saw some great features of search in Bing and how it beats Google on those features. It was great fun, and that's all for now. Looking forward to day 2. Thanks, and goodbye till I catch you tomorrow.

Gaurav Anand

Monday, April 5, 2010

All about maintenance mode and chkdsk/defrag in failover clustering, including "cluster shared volumes"

Continuing from “storage architecture changes” and “how configuring storage with volume GUID works”, today we will talk about the different options for running chkdsk/defrag in maintenance mode in Failover Clustering 2008/R2. We will also see how putting a physical disk resource in maintenance mode differs from putting a cluster shared volume in maintenance mode. Chkdsk has always been a challenge, especially in environments where storage disks run into terabytes. I have seen many real-life scenarios where critical disk resources were not available for services and applications during production hours because chkdsk started running on them when they were mounted [brought online]. This happens because when we bring a disk resource online, we check the dirty bit on the file system, and if it is found dirty, we start chkdsk on it. What to do? Let's call Microsoft, and as recommended they say “it's not recommended to stop chkdsk while it's running.” OK, fair enough, but the same question remains: what to do? The disk resource size is in terabytes, and chkdsk may take hours if not days to finish. We cannot leave our production-critical apps and services waiting on chkdsk for days.

You have 2 options: lose production and wait for chkdsk to finish, or kill chkdsk from Task Manager at your own risk [assuming you have a full data backup]. And here comes the role of designing and architecting your highly available services. Prevention is always better than cure, isn't it? There is a pretty decent blog covering how to stop chkdsk from running on 2003 servers [including cluster servers]; today we will see what options we have for Failover Clustering 2008/R2. In failover clustering we have many more options for controlling chkdsk behavior.

DiskRunChkDsk property: determines whether the operating system runs chkdsk on a physical disk before attempting to mount the disk. Setting DiskRunChkDsk to FALSE causes the operating system to mount the disk without running chkdsk. With DiskRunChkDsk set to TRUE (the default), the operating system runs chkdsk first and, if errors are found, takes action based on the ConditionalMount property. The list below summarizes the possible DiskRunChkDsk values.

The setting can be changed using cluster.exe:
cluster res "<cluster disk name>" /priv DiskRunChkDsk=<value> [see below for the corresponding values; for example, 0 is the default]

Chkdsk options

0 (Default): Run Normal Check. If corrupt, run chkdsk to fix the problem. (Normal Check: open the files in the root of the volume and check the volume dirty bit.)

1: Run Verbose Check. If corrupt, run chkdsk to fix the problem. (Verbose Check: recursively open all files in the volume and check the volume dirty bit.)

2: Run Normal Check. If corrupt, run chkdsk to fix the problem. If not corrupt, run chkdsk in read-only mode on the volume in parallel, i.e. online will proceed (and might complete) while chkdsk runs in read-only mode against a snapshot of the volume.

3: Don't do any file-system check. Always run chkdsk on the volume.

4: Don't do any file-system check. Never run chkdsk; bring the disk online without any file-system check. Please note that this also disables the IsAlive/LooksAlive file-system checks.

5: Run Verbose Check. If corrupt, fail the online operation and don't run chkdsk. User intervention required.

6: Suppress volume creation/online/mounting during disk resource online. The disk is in offline read-write mode, i.e. readable/writable using raw block-level IOs. [I doubt this is supported.]

So let's say that as a standard maintenance task I need to run chkdsk every quarter, or I need to take a weekly backup; in that case I can leverage cluster maintenance mode for the maintenance activity. Administrators use tools such as chkdsk and VSS as part of weekly maintenance to ensure that disks are functional and there are no operational issues. These tools require exclusive access to the volume during their run; while they are in use, applications cannot read from or write to the disk. The administrator expects the disk maintenance to succeed without chkdsk failing and without a failover of the disk that chkdsk is run against. Under normal circumstances, a cluster disk resource will fail over when chkdsk (in fix-error mode), a VSS restore, or any other tool that locks or dismounts the volume is run against it. These tools fail partway through because the cluster disk resource fails its health check, which causes the cluster service to fail the disk over to the other node, so the node where the tools are running loses access to the disk.
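In practice the maintenance-mode switch is flipped with cluster.exe (or, for CSVs on 2008 R2, with the FailoverClusters PowerShell cmdlets). The sketch below shows the typical sequence; the resource name "Cluster Disk 1" and drive X: are placeholders of my own, and you should verify the exact switches against your build before relying on them:

```shell
REM Put a traditional physical-disk resource into maintenance mode,
REM run the maintenance tool, then turn maintenance mode back off:
cluster res "Cluster Disk 1" /maint:on
chkdsk /f X:
cluster res "Cluster Disk 1" /maint:off

REM For a CSV on 2008 R2, the FailoverClusters PowerShell module
REM offers equivalents (run from an elevated PowerShell session):
REM   Suspend-ClusterResource "SR"     (maintenance on)
REM   Resume-ClusterResource  "SR"     (maintenance off)
```

While maintenance is on, the disk stays "online" from the cluster's point of view even though the tool holds an exclusive lock on the volume.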

The following checks are performed on any disk that is online and that is managed by the cluster:

File system level checks: At the file system level, the Physical Disk resource type performs the following checks:
LooksAlive: By default, a brief check is performed every 5 seconds to verify that a disk is still available. The LooksAlive check determines whether a resource flag is set. This flag indicates that a device has failed. For example, a flag may indicate that periodic reservation has failed. The frequency of this check is user definable.
IsAlive: A complete check is performed every 60 seconds to verify that the disk and the file system, or systems, can be accessed. The IsAlive check effectively performs the same functionality as a dir command that you type at a command prompt. The frequency of this check is user definable.

Device-level checks: at the device level, the Clusdisk.sys driver checks the persistent reservation (PR) table on the LUN every 3 seconds to make sure that only the owning node has ownership and can access that drive.

Maintenance mode is a mechanism, provided through cluster.exe and the Failover Cluster API, that places the specified resource in a mode that disables health checking. After maintenance mode is enabled for a resource, the Resource Monitor will ignore health-check calls on the resource even though the resource is left online. This allows tools like chkdsk to run against a resource that is in maintenance mode. Administrators should note that while chkdsk is running, the disk resource is not available to applications even though the resource shows as online. When you put a disk in maintenance mode, the setting is an in-memory state and is not saved in the cluster registry hive; the change is not persistent. The next time the disk is brought offline and then back online, it reverts to its standard behavior [we will see this later in the article].

Maintenance mode will remain on until one of the following occurs:
- You turn it off.
- The node on which the resource is running restarts or loses communication with other nodes (which causes failover of all resources on that node).
- For a disk that is not in Cluster Shared Volumes, the disk resource goes offline or fails.
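The interplay between these health checks and maintenance mode can be sketched in a few lines of Python. This is a minimal illustration with invented class and method names, not cluster-service internals:

```python
# Minimal sketch (assumed names, NOT cluster-service source): how a resource
# monitor could schedule LooksAlive/IsAlive checks, and why maintenance mode
# keeps a disk "online" while chkdsk holds an exclusive lock on it.

class DiskResource:
    LOOKS_ALIVE_INTERVAL = 5    # seconds: brief flag check
    IS_ALIVE_INTERVAL = 60      # seconds: full file-system access check

    def __init__(self, name):
        self.name = name
        self.maintenance_mode = False   # in-memory only, not persisted
        self.failed_flag = False        # set when the device has failed

    def looks_alive(self):
        # Brief check: just inspect the failure flag.
        return not self.failed_flag

    def is_alive(self):
        # Full check: the real thing is equivalent to a 'dir' on the
        # volume; stubbed here with the same flag.
        return not self.failed_flag

    def health_check(self, now):
        """Called by the resource monitor; True means the disk stays online."""
        if self.maintenance_mode:
            return True                 # health checks are ignored entirely
        if now % self.IS_ALIVE_INTERVAL == 0:
            return self.is_alive()
        if now % self.LOOKS_ALIVE_INTERVAL == 0:
            return self.looks_alive()
        return True
```

With maintenance_mode set, even a disk whose checks would fail passes health_check, which is exactly why a tool that locks the volume no longer triggers a failover.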

OK, fair enough, lots of talking. I picked one of my cluster shared volumes, “SR”, and by default its DiskRunChkDsk value is 0 [as seen below]. I am going to put this CSV disk resource in maintenance mode and then run chkdsk on it. Ideally one should either save state or, preferably, properly shut down all the VMs whose VHDs are placed on this CSV before putting the disk resource into maintenance mode. You will get a message that all dependent services and applications will be brought offline and that the cluster shared volume won't be accessible from the C:\ClusterStorage namespace.

C:\>cluster.exe res SR /priv
Listing private properties for 'SR':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
D SR DiskRunChkDsk 0 (0x0)

As I didn't turn off my dependent VMs before doing this, the cluster service had to do it.

It also removed access through the \ClusterStorage\volume path, while still allowing the owner node to access the volume through its identifier (GUID). This action also suspends direct I/O from other nodes, allowing access only through the owner node. As I mentioned earlier, “when you put a disk in maintenance mode, the setting is an in-memory state and is not saved in the cluster registry hive; the change is not persistent,” so we do not see this setting in the registry, nor does the cluster.exe output change from 0 to 1 even after the disk is put in maintenance mode [though it's strange, as those properties should then either not be there or perhaps be displayed differently] [see below].


Now that my CSV disk resource is in maintenance mode, I need the GUID to run chkdsk on it. For a non-CSV disk this GUID is not required, as you will have a drive letter. I can fetch the GUID either from mountvol.exe or from PowerShell as shown below, though it's more reliable and easier to take it from PowerShell if you have multiple CSV disk resources per node.

PS C:\Users\Administrator.UTOGWE> get-clustersharedvolume "SR" | fc *
class ClusterSharedVolume
Name = SR
State = Online
OwnerNode =
class ClusterNode
Name = labw2k8hypv-1
State = Up
SharedVolumeInfo =
class ClusterSharedVolumeInfo
FaultState = 4
FriendlyVolumeName = C:\ClusterStorage\Volume1
Partition =
class ClusterDiskPartitionInfo
Name = \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}
DriveLetter =
DriveLetterMask = 0
FileSystem = NTFS
FreeSpace = 618369548288
MaintenanceMode = True
RedirectedAccess = False
Id = 8e62ea00-9763-4c21-86d1-4a05708be24a

Possible values for VolumeName along with current mount points are:

Now that I have the GUID with me, it's a piece of cake, as I just need to run the command for defrag or chkdsk. Here we go:

C:\>chkdsk \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}\
The specified volume name does not have a mount point or drive letter.
C:\>chkdsk /f \\?\Volume{ee372d39-9ec6-11de-b4ae-0017a4770008}
The type of the file system is NTFS.
Volume label is SR.
CHKDSK is verifying files (stage 1 of 3)...
256 file records processed.
File verification completed.
0 large file records processed.
0 bad file records processed.
0 EA records processed.
0 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
324 index entries processed.
Index verification completed.
0 unindexed files scanned.
0 unindexed files recovered.
CHKDSK is verifying security descriptors (stage 3 of 3)...
256 file SDs/SIDs processed.
Security descriptor verification completed.
34 data files processed.
Windows has checked the file system and found no problems.
786428927 KB total disk space.
182458068 KB in 43 files.
32 KB in 36 indexes.
0 KB in bad sectors.
90215 KB in use by the system.
65536 KB occupied by the log file.
603880612 KB available on disk.
4096 bytes in each allocation unit.
196607231 total allocation units on disk.
150970153 allocation units available on disk.

Well, this is one way, and there is another, easier way to do it via PowerShell. Another reason for you to fall in love with PowerShell.

PS C:\Users\Administrator.UTOGWE> get-help repair-clustersharedvolume
Run repair tools on a Cluster Shared Volume locally on a cluster node.
Repair-ClusterSharedVolume -ChkDsk [-VolumeName] <String> [-Parameters <String>] [<CommonParameters>]

Repair-ClusterSharedVolume -Defrag [-VolumeName] <String> [-Parameters <String>] [<CommonParameters>]

This cmdlet runs chkdsk.exe or defrag.exe on a CSV volume. It will turn maintenance on for the volume, move the cluster resource to the node running this cmdlet, run the tool, and then turn maintenance off for the volume. This cmdlet has to run locally on one of the cluster nodes. To run remotely, use PowerShell Remoting.

Once chkdsk/defrag has finished, you need to turn off maintenance mode and manually turn on the virtual machines dependent on the CSV disk resource.

I hope this article has given you an insight into the various disk-maintenance options we have with failover clustering, and into how we can take the right decisions while architecting our high-availability solution.

Gaurav Anand