
Mutexes and multiple machines

Guvna (Newbie)
Posted: 26 January 2006 at 1:57pm
Well, I could design and implement a solution for this, but I would be very surprised if Microsoft did not already have something like this, even if it's undocumented!

It may be that neither they nor anyone else has had to tackle a problem that requires this facility; it is unusual, I suppose.

What I want is a network-transparent extension to Mutexes. I used to work a lot on an OS called VOS (a derivative of Multics) that ran, and in fact still runs, on Stratus minicomputers (see www.stratus.com).

From my knowledge of that OS's internals, I recall they had a concept called a "shadow": for every system primitive (an event, etc.) there was a shadow version (i.e., an event data structure in the kernel had a flag called "shadow").

So in the kernel namespace (akin to that in NT) one could find events that were either "real" (referring to a local pathname) or "shadow" (referring to a remote pathname).

This was hidden from applications: the same API call, s$notify_event, worked on an event whether that event was local or remote, so the event API was independent of the event's locality (a full file pathname was always used as the event's unique identifier).

When a user application called s$notify_event, the OS would send a message to the other machines on the network; each receiving OS would see the message, scan its event table, and wake up all processes waiting on that (shadow) event.
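To make the shadow-event mechanism concrete, here is a minimal in-process Python simulation. It is not how VOS implemented it: the `Node` class, the `"<machine>/<rest>"` pathname convention, and the plain list standing in for the network are all assumptions made for illustration, and "sending a message" is just a direct method call.

```python
import threading

class Node:
    """One machine in a simulated network (hypothetical model, not VOS itself)."""

    def __init__(self, name, network):
        self.name = name
        self.network = network    # shared list standing in for the real network
        self.events = {}          # full pathname -> threading.Event
        network.append(self)

    def is_shadow(self, path):
        # Assumed convention "<machine>/<rest>": an event named after another
        # machine is a "shadow" entry on this node.
        return path.split("/", 1)[0] != self.name

    def event(self, path):
        # Look up or create this node's table entry for the pathname.
        return self.events.setdefault(path, threading.Event())

    def notify_event(self, path):
        # The same call for real and shadow events: deliver the notification
        # to this node and "send" it to every other node.
        for node in self.network:
            node.receive_notify(path)

    def receive_notify(self, path):
        # Receiving side: scan the event table and wake every waiter on the path.
        ev = self.events.get(path)
        if ev is not None:
            ev.set()
```

Because `notify_event` looks the same whether the event is real or a shadow, callers never need to know where an event lives, which is the locality independence described above.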

Now, I could ponder this idea, create a similar "shadow mutex", and possibly invent a system service to manage it (I can't easily add code to the OS, which is perhaps where it belongs). Note, though, that a Mutex is waited on with WaitForSingleObject (and related functions), so avoiding deadlocks, hangs, etc. requires some non-trivial design.

However, this is somewhat involved and not something to be knocked up without a lot of thought and testing.

So does anyone on SysInternals have any input on this?

H

lamie (Senior Member)
Posted: 27 January 2006 at 1:30pm
I don't think we need to get involved in any kind of kernel hacking here. You should be able to do the whole thing with user-land APIs.

You have a resource X which is modified by many different clients, but only by one at a time. When a client wants to access the resource, instead of asking the OS whether the resource is free, it asks the central controlling server. If the resource is in use, the server says no and the client gets on with other things. In other words, the client polls the server from a dedicated thread, just as it would if it were doing this locally.

The server contains the most recent copy of the resource and each time a client releases the lock on the resource it first uploads the modified resource to the server such that all clients gaining access to the resource are guaranteed to have the latest version of the resource.
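A minimal sketch of this central-server scheme in Python, assuming an in-process server object where a real deployment would put the calls behind some network RPC. `LockServer`, `try_acquire`, `release` and `acquire_by_polling` are hypothetical names, and the client polls from the calling thread rather than from a dedicated one to keep the example short.

```python
import threading
import time

class LockServer:
    """Central server holding the lock state and the latest resource copy (hypothetical)."""

    def __init__(self, initial):
        self._guard = threading.Lock()  # protects the fields below
        self._owner = None              # client id currently holding the lock
        self._version = 0
        self._data = initial

    def try_acquire(self, client):
        # Never blocks: answer "no" if another client holds the lock.
        with self._guard:
            if self._owner is None:
                self._owner = client
                return True, self._data, self._version
            return False, None, None

    def release(self, client, new_data):
        # The client uploads its modified copy before the lock is dropped,
        # so the next owner always starts from the latest version.
        with self._guard:
            if self._owner != client:
                raise RuntimeError(f"{client} does not hold the lock")
            self._data = new_data
            self._version += 1
            self._owner = None

def acquire_by_polling(server, client, interval=0.05, attempts=100):
    # Client side: poll the server, sleeping between attempts.
    for _ in range(attempts):
        ok, data, version = server.try_acquire(client)
        if ok:
            return data, version
        time.sleep(interval)
    raise TimeoutError("resource stayed busy")
```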

I can't see any reason why this would be unfeasible.

mkplante (Newbie)
Posted: 27 January 2006 at 2:21pm
Two things. First, a client-server architecture would probably make this easier; is that available? Second, how does the server know that a client has crashed, so that it can mark the mutex as abandoned (the next waiter then gets WAIT_ABANDONED, or WAIT_ABANDONED_0 + n from WaitForMultipleObjects)? On a single machine with local objects this is fairly simple to decide, since memory is local... How much of this can TCP really solve?

Edited by mkplante

Guvna (Newbie)
Posted: 27 January 2006 at 2:35pm
Well, I am not intending to do anything in the "kernel", so to speak. Your suggestion seems to assume there is some "central controlling server", which is not what I had in mind.

If neither the locker nor the lock is on that central server, then there is no reason for that server to play a role in the lock operation, is there?

Also, such models require one to designate a machine as the central server in some way, which is an encumbrance in my opinion.

Doubtless something can be engineered to do what I want, but I was hoping that something already existed that does this.

There are numerous complications that cannot be glossed over, and they make the issue non-trivial. For example, if a process on one machine acquires a "lock" on another machine (via some proxy process) and the locking process then dies or exits, the proxy needs to know.

I think a model that works is one where EVERY machine runs a lock proxy (service) process, and every machine is on an equal footing with every other (no central server).

If a process wants a lock (Mutex etc) and that lock is local, just get the lock in the usual way. If the lock is remote then contact the proxy on the machine where the lock (resource etc) is situated and wait for it to get that lock and send you a reply.
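The local-versus-remote routing in this peer model can be sketched as follows. This is a hypothetical Python model: the `Machine` class, its `proxy_*` methods, the `"<machine>/<mutex>"` naming convention and the `machines` dictionary (standing in for the network) are assumptions, and `threading.Lock` stands in for a Win32 mutex even though it has no owner and is not re-entrant.

```python
import threading

class Machine:
    """One peer; every machine runs its own lock proxy (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self._mutexes = {}               # stands in for this machine's named objects
        self._guard = threading.Lock()

    def local_mutex(self, name):
        with self._guard:
            return self._mutexes.setdefault(name, threading.Lock())

    # Proxy service: handles requests that arrive from other machines.
    def proxy_try_lock(self, name):
        return self.local_mutex(name).acquire(blocking=False)

    def proxy_unlock(self, name):
        self.local_mutex(name).release()

def try_get_lock(full_name, here, machines):
    """Non-blocking attempt: local locks directly, remote ones via the owner's proxy."""
    owner, name = full_name.split("/", 1)
    if owner == here.name:
        return here.local_mutex(name).acquire(blocking=False)  # "the usual way"
    return machines[owner].proxy_try_lock(name)              # a network request in reality
```

No machine is special here: only the machine hosting a given lock takes part in operations on it.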

The complication is that when one tries to get a Mutex that is busy, one often calls WaitForSingleObject; if the Mutex is remote, however, one needs to wait on a local Event that is somehow linked to a wait operation performed by the proxy.

In other words, I try to get a remote lock (via a proxy) and fail, so I decide to wait. Somehow we need to tell the proxy that we are waiting, and then have the proxy wait on the Mutex (which is local from the proxy's point of view).

But the proxy must be available for other requests all the time, so this proxied wait must run in a thread started in the proxy for the sole purpose of waiting on (and trying to get) the lock.

Of course, the original waiting process could die or crash, and the proxy thread (waiting on its behalf) needs to know this and stop waiting.
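The proxied wait described in the last few paragraphs can be sketched like this, assuming the proxy learns of the requester's death somehow (for instance from a dropped connection or a missed heartbeat) and signals it by setting `client_gone`. `ProxiedWait` and its fields are hypothetical; `done` plays the role of the local Event the requester waits on, and the thread waits in short slices so it notices a dead requester promptly.

```python
import threading
import time

class ProxiedWait:
    """A waiter thread the proxy starts for one remote wait request (hypothetical)."""

    def __init__(self, mutex, timeout):
        self.mutex = mutex                    # local from the proxy's point of view
        self.timeout = timeout                # seconds
        self.client_gone = threading.Event()  # set by the proxy if the requester dies
        self.done = threading.Event()         # the local Event the requester waits on
        self.result = None                    # "acquired", "timeout" or "cancelled"
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        deadline = time.monotonic() + self.timeout
        # Wait in short slices so a dead requester is noticed promptly.
        while not self.client_gone.is_set():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                self.result = "timeout"
                break
            if self.mutex.acquire(timeout=min(0.05, remaining)):
                if self.client_gone.is_set():
                    self.mutex.release()      # requester died meanwhile: give it back
                    self.result = "cancelled"
                else:
                    self.result = "acquired"
                break
        else:
            self.result = "cancelled"         # requester died while we were waiting
        self.done.set()
```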

This whole thing can get rather involved, but needs to be rock solid if it is to be deployed in a real world setting.


lamie (Senior Member)
Posted: 27 January 2006 at 2:41pm
If you want to go this way, then whoever has the lock on the resource has to notify all other stations so that they know whom to poll for the resource. I still think this is doable, but I think a central server model would work better. However, that is not to say that you need to dedicate one machine to always act as the server.

mkplante (Newbie)
Posted: 27 January 2006 at 7:25pm
Okay, tell me if this idea works or not:

The machine that is actually connected to the shared resource acts as the server (not the machine that currently owns it, since that changes, and may sometimes be no machine).  Call this Machine A.  If no one owns the resource, Machine B requests it from A and A grants it, with A and B keeping track of that fact.  Machine B sends keepalive packets to A every, say, 500ms, and A somehow acknowledges them (if it's not already obvious that A is still up).  C requests it from A, and A puts it on a queue.  A sends to C either a keepalive packet every 500ms to indicate that B still owns it, a message to indicate B released it and C now owns it (which must be acknowledged), or a message to indicate that the object is abandoned. 

Behavior specific to abandonment should be specific to the resource, since certain cleanup behavior might be required.  The reason I say "if it's not already obvious that A is still up," is that if A is physically connected to the resource protected by the mutex, chances are that B is continuously communicating with A already anyway.
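Here is one way to sketch Machine A's side of this protocol in Python. The class and method names are hypothetical, the caller passes the current time in explicitly so the logic can be checked without real delays, the periodic "B still owns it" status messages to queued clients are left out, and treating three missed 500 ms keepalives as a crash is an assumption rather than part of the proposal.

```python
from collections import deque

HEARTBEAT_INTERVAL = 0.5              # seconds: the 500 ms keepalive period proposed above
LEASE = 3 * HEARTBEAT_INTERVAL        # assumption: three missed keepalives means "crashed"

class ResourceServer:
    """Lock manager on machine A, the machine attached to the resource (hypothetical)."""

    def __init__(self):
        self.owner = None
        self.last_seen = 0.0
        self.waiting = deque()        # queued requesters, first come first served
        self.was_abandoned = False    # set until someone takes over an abandoned lock
        self.outbox = []              # (client, message) pairs A would send

    def request(self, client, now):
        if self.owner is None:
            self._grant(client, now)
        elif client != self.owner and client not in self.waiting:
            self.waiting.append(client)
            self.outbox.append((client, "queued"))

    def keepalive(self, client, now):
        if client == self.owner:
            self.last_seen = now

    def release(self, client, now):
        if client != self.owner:
            raise RuntimeError(f"{client} does not own the resource")
        self.owner = None
        self._next(now)

    def check(self, now):
        # Run periodically on A: an owner that has gone quiet is treated as crashed.
        if self.owner is not None and now - self.last_seen > LEASE:
            self.owner = None
            self.was_abandoned = True
            self._next(now)

    def _next(self, now):
        if self.waiting:
            self._grant(self.waiting.popleft(), now)

    def _grant(self, client, now):
        self.owner = client
        self.last_seen = now
        # Tell the new owner whether it inherited an abandoned resource,
        # so it can run whatever resource-specific cleanup is needed.
        self.outbox.append((client, "abandoned" if self.was_abandoned else "owned"))
        self.was_abandoned = False
```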

I'm sure there are flaws with this idea, as I just came up with it off the top of my head, but are they serious?

Guvna (Newbie)
Posted: 01 February 2006 at 12:23pm
Well, you are right, I think, about the "keep-alive" or heartbeat; however it is done, it does need to be done.

The queue is an interesting idea but probably not necessary, because a Mutex (which is what I want to model across machines) does not use a queue: multiple waiters just wait, and one, and only one, gets the Mutex when the current owner releases it.

A process (a thread, in actuality) could wait on a remote Mutex by waiting on an IP socket while the proxy does a "real" wait on its behalf; if and when that wait is satisfied or times out, a message would be sent over the socket indicating this.
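A minimal sketch of that socket handshake in Python, assuming `socket.socketpair()` as a stand-in for a real TCP connection between the waiting process and the proxy, and a made-up one-line reply format (`ACQUIRED` or `TIMEOUT`). `proxy_wait` and `wait_remote` are hypothetical names.

```python
import socket
import threading

def proxy_wait(conn, mutex, timeout):
    # Proxy side: do the "real" wait on the local mutex on the requester's
    # behalf, then report the outcome over the socket.
    got_it = mutex.acquire(timeout=timeout)
    conn.sendall(b"ACQUIRED\n" if got_it else b"TIMEOUT\n")

def wait_remote(sock):
    # Requester side: the waiting thread simply blocks reading the socket.
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(64)
        if not chunk:                  # proxy end closed: treat as a failed wait
            return "DISCONNECTED"
        reply += chunk
    return reply.strip().decode("ascii")
```

With a free mutex, running `proxy_wait` in a thread on one end of a `socketpair()` and calling `wait_remote` on the other end returns `"ACQUIRED"`; if the proxy end is closed without a reply, the waiter gets `"DISCONNECTED"` rather than hanging forever.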

Basically, one needs to write analogs of existing functions, like this:

CreateGlobalMutex
OpenGlobalMutex
WaitForSingleGlobalObject
WaitForMultipleGlobalObjects
ReleaseGlobalMutex

If any of these are called for a Mutex that is "local" then they pretty much just call the real Win32 method, but if they are called for a "remote" Mutex, then they interact (using some kind of FSM perhaps) with the remote proxy over a socket/sockets.
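A hedged sketch of the local/remote dispatch in Python, with snake_case stand-ins for CreateGlobalMutex, WaitForSingleGlobalObject and ReleaseGlobalMutex (OpenGlobalMutex and WaitForMultipleGlobalObjects are omitted). The `transport` object and its `request` method are hypothetical placeholders for the proxy protocol, `"<machine>/<name>"` naming is assumed, and `threading.Lock` stands in for a real Win32 mutex, which a Windows build would create with CreateMutex and wait on with WaitForSingleObject. The return codes reuse the Win32 values of WAIT_OBJECT_0 and WAIT_TIMEOUT.

```python
import threading

WAIT_OBJECT_0 = 0x00000000   # Win32 value
WAIT_TIMEOUT = 0x00000102    # Win32 value

class GlobalMutex:
    """Handle returned by create_global_mutex (hypothetical)."""

    def __init__(self, machine, name, local_lock=None):
        self.machine = machine
        self.name = name
        self.local_lock = local_lock   # only set when the mutex lives on this machine

class GlobalMutexApi:
    def __init__(self, this_machine, transport):
        self.this_machine = this_machine
        self.transport = transport     # hypothetical client for the remote proxies
        self._locals = {}
        self._guard = threading.Lock()

    def create_global_mutex(self, full_name):
        machine, name = full_name.split("/", 1)
        if machine != self.this_machine:
            return GlobalMutex(machine, name)
        with self._guard:
            lock = self._locals.setdefault(name, threading.Lock())
        return GlobalMutex(machine, name, lock)

    def wait_for_single_global_object(self, handle, timeout_ms=None):
        if handle.local_lock is not None:
            # Local: just the ordinary wait; timeout_ms=None means wait forever.
            timeout = -1 if timeout_ms is None else timeout_ms / 1000
            acquired = handle.local_lock.acquire(timeout=timeout)
            return WAIT_OBJECT_0 if acquired else WAIT_TIMEOUT
        # Remote: the proxy on the owning machine performs the real wait.
        return self.transport.request(handle.machine, "wait", handle.name, timeout_ms)

    def release_global_mutex(self, handle):
        if handle.local_lock is not None:
            handle.local_lock.release()
        else:
            self.transport.request(handle.machine, "release", handle.name, None)
```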

If this were done correctly, then one could take existing code that uses Mutexes and just replace the calls with the "Global" versions, rebuild and run.

The thing is, I really don't want to design and code all this if I can avoid it!


lamie (Senior Member)
Posted: 01 February 2006 at 2:40pm
Now that sounds like the best idea yet. I like the idea of the new function testing to see if the resource is local and then passing off to the usual win32 function. You could accomplish this with very little overhead and then make applications that would work both locally and on a distributed network.

I like it!