Lock and Mutex are very basic ideas in every language and OS.
Recently, I think for a while about Lock / Mutex/ SpinLock / Reentrant Lock / Distributed Lock, and find something interesting.

1. Mutex

Mutex(Lock) is easy, it locks() a critical code block and then will mutual exclusively do something, and after that will unlock() the critical code block, and everything goes on.

It keeps the critical code block running in the order as your wrote, but not the using the re-ordered sequence by Memory Ordering.

So what Memory_ordering will break then? Let’s call whatever it could break as invariant here right now.

l := sync.Mutex{}
l.Lock()
// critical code block, some `invariant`(s) holds and could be broken 
l.Unlock()

By holding the lock, means I need some invariant(s) hold, and probably I will break these invariant(s) in the next few lines of critical code block.
But don’t worry, after Unlock() invocation, that means:

I longer need these invariant(s) anymore, and
If I broke them in the critical code block, I have already restored these invariant(s)

So, Mutex is a tool to help you to confidently keep the invariant as what it should be after critical code block, by preventing the memory re-order, either by Compiler or by OS at the runtime.

2. SpinLock

2.1. Compare and Swap

Compare and Swap, or CAS, is an easy way to implement synchronization mechanism. Here is the logic:

// needs atomic implementation
function cas(p : pointer to int, old : int, new : int) returns bool {
    if *p ≠ old {
        return false
    }
    *p ← new
    return true
}

So it will compare if the current value is the old value:

if not, means, replaced by others, i.e., the lock is held by others, will just return false
if yes, means, not replaced by others yet, i.e., hold the lock, will just replace using the new value

Of course, the most import is the above cas function needs to be executed atomically, either using lock or implemented by hardware/CPU CompareAndSwap itself.

2.1.1. CAS and Lock-Free Data Structure

With CAS feature, we can implement some lock-free thread-safe data structure. Here is the Enqueue() method for a lock-free queue.

Enqueue(x)
	q ← new record
	q^.value ← x
	q^.next ← NULL
	p ← tail
	oldp ← p
	repeat
		while p^.next ≠ NULL
			p ← p^.next
	// iterate until p^.next == NULL, which mens, 
	// p is the very last one element, and swap, p.next = q
	until Compare&Swap(p^.next, NULL, q) 

	Compare&Swap(tail, oldp, q)
end

Dequeue()
	repeat
		p ← head
		if p^.next = NULL
			error queue empty
	// iterate until head == p, which means, 
	// p is the very head of this queue, and swap, head point to p.next
	until Compare&Swap(head, p, p^.next)

	return p^.next^.value
end

No lock used above, but the Queue data structure is thread safe
Enqueue keeps checking the invariant p.next == NULL, or tail not changed
Dequeue keeps checking the invariant head == p, or head not changed

So, again, essentially point out that, concurrency model needs to protect invariant keep as it should be.

In a summary, it use a retry logic in loop until the invariant is satisfied, and once the invariant is satisfied, it immediately change the value to a new one, under the condition that, the CAS step is atomic!

So, if you want to implement a lock free data structure, put the check-and-set procedure using a Loop-CAS implementation.

2.1.2. CAS and Optimistic Lock Update

With CAS feature, we can also implement optimistic lock.
For example, using SQL

1	Update colume_A = value, version = previous_value+1 where version = previous_value;

to implement a Compare and Swap semantics. Plus a loop, we are good to implement the Loop-CAS, i.e, lock-free Update.

Here is an example of Laravel, updated_at will be automatically updated each time of update request to the row.

function transfer($fromAccountId, $toAccountId, $balance)
{
  $fromQuery = Account::whereId($fromAccountId);
  if (! $fromQuery->exists()) {
    throw new InvalidAccountException();
  }
  $toQuery = Account::whereId($toAccountId);
  if (! $toQuery->exists()) {
    throw new InvalidAccountException();
  }
  // step 1, decrease money from $fromAccount
  do {
    // get the value of `updated_at`
    $fromAccount = $fromQuery->first();
    if ($fromAccount->balance < $amount) {
      throw new InsufficientBalanceException();
    }
    // if updated_at is not changed in the middle 
    // will change the Account in DB
    $updated = Account::whereId($fromAccountId)
      ->where('updated_at', '=', $fromAccount->updated_at)
      ->update(['balance' => $fromAccount->balance - $amount]);
  } while (! $updated);

  // step 2, add money into $toAccount
  do {
    $toAccount = $toQuery->first();    
    $updated = Account::whereId($toAccountId)
      ->where('updated_at', '=', $toAccount->updated_at)
      ->update(['balance' => $toAccount->balance + $amount]);
  } while (! $updated);

  // step 3, save transaction record 
  $transaction = new Transaction();
  $transaction->from_account_id = $fromAccountId;
  $transaction->to_account_id   = $toAccountId;
  $transaction->amount          = $amount;
  $transaction->save();
}

The good thing is: lock free, with higher performance.

But the bad thing is, or not that bad, just not as good as Pessimistic Lock, it’s kind of Eventual Consistency.
If another thread read data in the middle time of step 2, it will read an inconsistent data state between the fromAccount and the toAccount.
Because during each step1, step2, step3, it ensure the invariant only inside these steps (update_at is not changed), but not across these steps (toAccount.balance + fromAccount.balance == totalMoney).

2.2. SpinLock Implementation using CAS

If we put more eye into the Optimistic Lock example above, we can know step 2 is blocked by step 1, which is using a Loop-CAS model.

So what if we think step 1 as a Lock() method, which means, only step 1 is executed, step 2 could start. Here we got the idea of SpinLock() to use CAS.

package spinlock

import (
	"runtime"
	"sync"
	"sync/atomic"
)

type SpinLock uint32

func NewSpinLock() sync.Locker {
	var lock SpinLock
	return &lock
}
func (sl *SpinLock) Lock() {
	// keep checking, until sl == 0, i.e, nobody locks it,
	// and then change the sl = 1, i.e, lock it
	for !atomic.CompareAndSwapUint32((*uint32)(sl), 0, 1) {
		runtime.Gosched()
	}
}
func (sl *SpinLock) Unlock() {
	atomic.StoreUint32((*uint32)(sl), 0)
}

3. When to SpinLock? When to Mutex?

Differences?

SpinLock: As you can see, SpinLock idea comes from Loop-CAS, which is lock-free, but kinda busy checking.
Mutex: When Lock() invocation happens, however can’t get the lock, the thread/goroutine will be added to wait list of scheduler and thread/goroutine switch happens at this time.

So SpinLock thinks that I should very probably get the lock in my scheduling CPU slot, and in a aggressive way.

What about Mutex scheduling? In Golang implementation, it’s:

If Lock() happens and can’t get the lock, will be added to wait list,
For new young goroutine who already get the lock, will just let them run, since they already get the lock,
If the old goroutine has been wait for very long time, will give the lock to him and let him run, so it won’t wait for too long. By default, starvationThresholdNs is 1e6 nanoseconds (1ms). Normally it’s FIFO order, but if starvationThresholdNs meet, will enter starvation mode.

In a summary:

Under multi-core CPU, if you think that you will quickly get the lock, then please use SpinLock, so that it doesn’t need context switch to get the lock and execute the code.
1. the time to get the lock is short (short task or low concurrency)
2. the critical code block running time cost won’t be too much, i.e., can quickly do the shared variables computation and release the lock
3. the concurrency is not too much, or it will needs lots of context switches to release the lock
Under single-core CPU, never use SpinLock, since it will try to use up the CPU slot, until context switch happens and then back to get the lock. Context Switch always happens in 1 Core CPU. But the above SpinLock code implementation is good, since it always yield the CPU to the other goroutines by using runtime.Gosched(). So the SpinLock implementation here is a bit like a Mutex but lock-free.
Under multi-core CPU, if you think your critical code block can’t complete in the assigned CPU slot, please try to use Mutex, so that it won’t waste up much time on getting the lock.
1. the time to get the lock is long (long task or high concurrency)
2. the critical code block running time cost is little large, i.e, keeps computation on top of shared variables for a long time. So it’s important not to put I/O into critical code block.

To sum up:

Single-core: not use SpinLock, or at least yield the processor every time failed CAS.
Multi-core, short task or low concurrency: use SpinLock.
Multi-core, long task or high concurrency: use Mutex.
Don't use I/O in critical code block, this should be a common knowledge.

If high concurrency with heavy I/O, then better to use goroutine + channel solution.

4. Reentrant Lock or Recursive Lock, why not?

TBD

5. Distributed Lock

TBD

Ref: