Tuesday, May 29, 2012

GIST NOTES 11 - Advanced Concurrency and Non-blocking Threads



[DISCLAIMER: This is solely for non-commercial use. I don't claim ownership of this content. It is the crux of all my reading, study, and analysis. Some of it consists of excerpts from well-known books on the subject. Some of it is my own contemplation upon experiments with hand-coded samples written in an IDE or notepad.


I've created this mainly to reduce an entire book into a few pages of critical content that we should never forget. Even after years, you won't need to read the entire book again to recover its philosophy. I hope these notes will help you replay the entire book in your mind once again.]


[JDK 7]


======================
CONCURRENT PROGRAMMING
======================
>in concurrent programming there are two basic units of execution: processes and threads
>concurrency (or at least its effect) is possible even on single-processor systems, but it yields better performance on the multiprocessor systems that are common today

Processes
----------
>a process is a self-contained execution environment
>it has a complete, private set of basic run-time resources, particularly memory space
>processes are often seen as synonymous with programs or applications; however, an application can be a set of cooperating processes
>to facilitate communication between processes, most operating systems support Inter-Process Communication (IPC) resources
>such resources may be pipes or sockets
>IPC is not just for communication between processes on the same system, but also for processes running on different systems
>most JVM implementations run as a single process; a Java application can create additional processes using a ProcessBuilder object

Threads
-------
>threads are sometimes called lightweight processes (LWP)
>both processes and threads provide an execution environment
>but creating a new thread requires fewer resources than creating a new process
>threads exist within a process; every process has at least one thread
>threads share the resources (like memory and open files) of the process under which they run
>this makes for efficient but potentially problematic communication among threads

>every Java application has at least one application thread (along with several system threads, of course); execution always starts in the 'main' thread when the application is run

>one thread can interrupt another thread; in that case it is common practice for the interrupted thread to return from whatever it was doing

>suppose a thread hasn't received an InterruptedException for a long time; how can it check whether any interrupt has been issued for it?
>it can call Thread.interrupted(), which returns true if an interrupt is pending for this thread; if true, the thread can return from its work or waiting state; interrupted() also clears the interrupt status flag when called, so the thread can receive subsequent interrupt signals on that flag
>upon detecting an interrupt, the thread can throw InterruptedException in order to propagate the signal to a centralized exception handler
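A minimal sketch of this polling idiom, assuming a worker whose loop body never blocks; crunchNumbers() and cleanUp() are hypothetical placeholders:

public class PollingWorker implements Runnable {
    public void run() {
        // Thread.interrupted() returns the pending interrupt status and clears it
        while (!Thread.interrupted()) {
            crunchNumbers();            // one non-blocking chunk of work
        }
        cleanUp();                      // an interrupt arrived; exit cooperatively
    }

    private void crunchNumbers() { /* ... */ }
    private void cleanUp() { /* ... */ }
}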

Interrupt status flag
---------------------
>this flag is maintained internally in every thread
>[non-static] Thread.interrupt() sets this flag
>[static] Thread.interrupted() returns the flag and clears it (it checks the current thread, so only the thread itself can meaningfully use it)
>[non-static] Thread.isInterrupted() returns the flag leaving its status intact; this method is used by other threads to check on this thread
>By convention, any method that exits by throwing InterruptedException clears the interrupt flag on exit; however, another thread might immediately set the flag to true again

>join() and sleep() have overloaded versions that take a timeout; both throw InterruptedException if another thread calls interrupt() on this thread

>interrupt(), interrupted(), isInterrupted()

Thread Interference
-------------------
>corrupting the shared data (shared by multiple threads)

Memory Consistency Errors
-------------------------
>errors that occur due to inconsistent 'view' of the shared data by a thread

>to avoid this error, an understanding of the happens-before relationship is required
>this relationship ensures that memory writes are visible to every thread, because the write happens-before the other thread's read; one of the things that creates a happens-before relationship is synchronization
>two more such things are Thread.start() and Thread.join()
>Thread.start(): every statement before this call happens-before every statement of the new thread
>Thread.join(): every statement of the target thread (the one being joined) happens-before the statements that follow the join() in the joining thread
>the synchronized keyword on a constructor is illegal
>there is no need to synchronize constructors because only one thread has access to an object while it is being constructed
>but make sure the reference to the object under construction is not leaked prematurely; for example, adding the 'this' reference to a shared list inside the constructor will create concurrency issues
>synchronized methods are a simple way to avoid thread interference and memory consistency errors; a minimal example follows
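A minimal sketch of a synchronized counter, assuming all access goes through these methods; the increment is a read-modify-write that is not atomic on its own, so both methods lock on 'this':

public class SynchronizedCounter {
    private int count;                  // guarded by the intrinsic lock on "this"

    public synchronized void increment() {
        count++;                        // read-modify-write made atomic by the lock
    }

    public synchronized int get() {
        return count;                   // locking also guarantees visibility of the latest write
    }
}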

>final fields (which cannot change after construction) can be safely read through non-synchronized methods; however, synchronizing every method of a class can still lead to 'liveness' problems

Intrinsic lock or monitor lock
------------------------------
>these locks provide both the exclusive access and the happens-before relationship essential to visibility
>they are nothing but the per-object locks Java uses internally for synchronized methods and blocks
>synchronizing code that doesn't need it can cause 'liveness' problems
>synchronized blocks allow the use of multiple locks for different sections of code, and keep code that doesn't need synchronization outside the lock
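A hedged sketch of synchronized blocks with two separate lock objects (in the spirit of the Oracle tutorial); the field and lock names are illustrative:

public class TwoCounters {
    private long c1 = 0, c2 = 0;
    private final Object lock1 = new Object();   // guards c1 only
    private final Object lock2 = new Object();   // guards c2 only

    public void inc1() {
        synchronized (lock1) { c1++; }           // threads updating c2 are not blocked here
    }

    public void inc2() {
        synchronized (lock2) { c2++; }
    }
}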

Reentrant synchronization
-------------------------
>a thread can acquire a lock it already owns, again and again
>this facility is called reentrant synchronization
>it makes it easy to call another synchronized method for which the already-acquired lock is sufficient
>without reentrant synchronization, a thread could block against itself forever, and avoiding that would require additional precautions
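A minimal sketch of why reentrancy matters (similar in spirit to the subclass example from the JCIP book discussed later in these notes); without reentrancy, the call to super.doSomething() would block on the lock the thread already holds:

class Widget {
    public synchronized void doSomething() {
        // ...
    }
}

class LoggingWidget extends Widget {
    public synchronized void doSomething() {
        System.out.println("calling doSomething");
        super.doSomething();    // same lock ("this") acquired again; reentrancy lets this proceed
    }
}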

Atomic Action
-------------
>all steps happen at once; nobody can come in between

What are/aren't already atomic in java?
---------------------------------------
1.read or write for reference variables ARE
2.read or write for primitives except double and long ARE
3.read or write for volatile variables of all kinds ARE(reference, primitives including long & double)
4.increment/decrement operations ARE NOT atomic

>atomic operations cannot be interleaved, hence they can be used without fear of thread interference
>use of volatile variables reduces the risk of memory consistency errors
>because writing a volatile variable establishes a happens-before relationship with subsequent reads of that variable (what is all this mumbo jumbo about happens-before? Well, remember that nobody is allowed to keep stale cached data for volatile variables; so when someone writes a volatile variable, everyone who reads it afterwards will, by the very design and rules of volatile, get the latest update; hence any volatile write has a happens-before relationship with any upcoming read)
>not only are the latest writes to volatile variables visible to all threads, but the side effects of the code that performed the update are also visible to all threads

>using atomic variables is more efficient than synchronized code, but more of the responsibility for avoiding memory consistency errors is left to the user of the atomic variables, which is not the case with synchronized code; in a synchronized application, threads can be oblivious to how other threads might change the shared data, but with atomic variables every thread has to be aware of everyone else, their actions on the shared data, and how those actions can corrupt it

>so the choice between atomic variables and synchronized code depends on the size and complexity of the application
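A minimal sketch of a counter based on an atomic variable, assuming the count is the only piece of shared state (with more than one related state variable, an atomic type alone would not be enough):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    public void increment() {
        count.incrementAndGet();   // atomic read-modify-write, no lock needed
    }

    public int get() {
        return count.get();
    }
}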

Liveness: ability of a concurrent application to execute in a timely manner
--------

Liveness Problems: deadlock, starvation and livelock

Common one: deadlock - threads blocked by each other (everybody is waiting for everybody else)
Less common ones: starvation and livelock

Starvation: a thread is not able to gain regular access to shared resources and is unable to make progress; happens when a greedy thread hogs the shared resources; for example, when a synchronized method takes a long time to return and one thread invokes it frequently, other threads that need frequent access to synchronized methods on the same object will often be blocked

Livelock: here threads are not blocked as in deadlock; they are simply too busy responding to each other to resume work; they get tangled up in responses alone, never moving on to the actual work; similar to two people attempting to pass each other in a corridor; happens when both threads are blindly being nice to each other, not realizing that they are doing an awkward dance(eek!)

Guarded Blocks
--------------
>a certain block of code has to be entered only upon meeting a specific condition
>instead of waiting for the condition in a CPU-wasteful loop, use the wait()/notifyAll() mechanism of threads
>Note: There is a second notification method, notify, which wakes up a single thread. Because notify doesn't allow you to specify the thread that is woken up, it is useful only in massively parallel applications — that is, programs with a large number of threads, all doing similar chores. In such an application, you don't care which thread gets woken up.
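A minimal sketch of a guarded block: a single-slot 'drop' box (in the spirit of the Oracle tutorial), where the guard is always re-checked in a loop after wait() returns:

public class Drop {
    private String message;
    private boolean empty = true;      // the guard condition

    public synchronized String take() throws InterruptedException {
        while (empty) {
            wait();                    // release the lock and wait for a put()
        }
        empty = true;
        notifyAll();                   // wake producers waiting for the slot to empty
        return message;
    }

    public synchronized void put(String msg) throws InterruptedException {
        while (!empty) {
            wait();                    // wait for a consumer to take the current message
        }
        empty = false;
        message = msg;
        notifyAll();                   // wake consumers waiting for a message
    }
}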

A Strategy for Defining Immutable Objects(no need for synchronization for these)
-----------------------------------------
The following rules define a simple strategy for creating immutable objects. Not all classes documented as "immutable" follow these rules. This does not necessarily mean the creators of these classes were sloppy — they may have good reason for believing that instances of their classes never change after construction. However, such strategies require sophisticated analysis and are not for beginners.
Don't provide "setter" methods — methods that modify fields or objects referred to by fields.
Make all fields final and private.
Don't allow subclasses to override methods. The simplest way to do this is to declare the class as final. A more sophisticated approach is to make the constructor private and construct instances in factory methods.
If the instance fields include references to mutable objects, don't allow those objects to be changed:
Don't provide methods that modify the mutable objects.
Don't share references to the mutable objects. Never store references to external, mutable objects passed to the constructor; if necessary, create copies, and store references to the copies. Similarly, create copies of your internal mutable objects when necessary to avoid returning the originals in your methods.
--x--
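A minimal sketch of a class following this strategy; the class and its "mutator" that returns a new instance are illustrative:

public final class ImmutablePoint {            // final: no subclass can override behavior
    private final int x;
    private final int y;

    public ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // no setters; "modification" produces a new object instead
    public ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }
}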

java.util.concurrent.locks
--------------------------
>a framework for locking and waiting for conditions that is distinct from built-in synchronization and monitors
>Lock replaces use of synchronized methods and statements
>a Condition replaces the use of the Object monitor methods

Lock
----
The use of synchronized methods or statements provides access to the implicit monitor lock associated with every object, but forces all lock acquisition and release to occur in a block-structured way: when multiple locks are acquired they must be released in the opposite order, and all locks must be released in the same lexical scope in which they were acquired.

While the scoping mechanism for synchronized methods and statements makes it much easier to program with monitor locks, and helps to avoid many common programming errors involving locks, there are occasions where you need to work with locks in a more flexible way. For example, some algorithms for traversing concurrently accessed data structures require the use of "hand-over-hand" or "chain locking": you acquire the lock of node A, then node B, then release A and acquire C, then release B and acquire D and so on. Implementations of the Lock interface enable the use of such techniques by allowing a lock to be acquired and released in different scopes, and allowing multiple locks to be acquired and released in any order.

With this increased flexibility comes additional responsibility. The absence of block-structured locking removes the automatic release of locks that occurs with synchronized methods and statements. In most cases, the following idiom should be used:

     Lock l = ...;
     l.lock();
     try {
         // access the resource protected by this lock
     } finally {
         l.unlock();
     }

When locking and unlocking occur in different scopes, care must be taken to ensure that all code that is executed while the lock is held is protected by try-finally or try-catch to ensure that the lock is released when necessary.
Lock implementations provide additional functionality over the use of synchronized methods and blocks by providing a non-blocking attempt to acquire a lock (tryLock()), an attempt to acquire the lock that can be interrupted (lockInterruptibly()), and an attempt to acquire the lock that can timeout (tryLock(long, TimeUnit)).
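A hedged sketch of the timed-acquisition variant, assuming a ReentrantLock protecting some resource; unlike synchronized, the caller can give up instead of blocking forever:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLockAccess {
    private final Lock lock = new ReentrantLock();

    public boolean tryToUpdate() throws InterruptedException {
        if (!lock.tryLock(1, TimeUnit.SECONDS)) {
            return false;              // could not get the lock in time; caller can retry or back off
        }
        try {
            // access the resource protected by this lock
            return true;
        } finally {
            lock.unlock();             // always released, exactly as in the idiom above
        }
    }
}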

A Lock class can also provide behavior and semantics that is quite different from that of the implicit monitor lock, such as guaranteed ordering, non-reentrant usage, or deadlock detection. If an implementation provides such specialized semantics then the implementation must document those semantics.

Note that Lock instances are just normal objects and can themselves be used as the target in a synchronized statement. Acquiring the monitor lock of a Lock instance has no specified relationship with invoking any of the lock() methods of that instance. It is recommended that to avoid confusion you never use Lock instances in this way, except within their own implementation.

The java.util.concurrent package defines three executor interfaces:
-------------------------------------------------------------------
1.Executor, a simple interface that supports launching new tasks.
2.ExecutorService, a subinterface of Executor, which adds features that help manage the lifecycle, both of the individual tasks and of the executor itself.
3.ScheduledExecutorService, a subinterface of ExecutorService, supports future and/or periodic execution of tasks.

Runnable - a task that returns nothing
Callable - a task that returns something
Future - which helps to retrieve the return value of a Callable; helps to manage the status of Runnable and Callable tasks

ExecutorService - supports creation and management of Runnable/Callable through Future objects; also supports managing the shutdown of the executor
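A minimal sketch tying these pieces together: an ExecutorService runs a Callable, and the Future is used to fetch its result (the task itself is a trivial placeholder):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        Callable<Integer> task = new Callable<Integer>() {   // a task that returns something
            public Integer call() {
                return 6 * 7;
            }
        };

        Future<Integer> future = pool.submit(task);          // Future tracks status and result
        System.out.println("result = " + future.get());      // blocks until the task completes

        pool.shutdown();                                     // graceful shutdown of the executor
    }
}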

Creating Your Own Thread Pool
-----------------------------
>determining thread pool size (fixed or dynamic)
>job addition (single thread or multithreads add jobs)
>workers should be responsive to the manager all the time
>premature shutdown/graceful shutdown of all threads
>manager(or main thread) can wait for all threads to complete through join() method
>workers can wait on empty queue
>notification is necessary whenever a new job is added to the queue
>when a thread is interrupted out of wait(), and the wait() statement is wrapped in a try-catch within a synchronized block, the thread needs the lock on the object again before it can enter the catch block to handle the InterruptedException; if you put a sleep statement inside that catch block, the thread sleeps while holding the lock and no other thread can use it
>the above situation bites if the statements after the wait()-try-catch section are badly coded: when the manager interrupts all the workers to shut down, the first interrupted thread can hold the lock in the catch block and sleep forever, so no other thread can proceed to a graceful shutdown, and the thread pool will never shut down and never do any work either

>in the middle of an atomic operation, do not give up your lock to others; if you do, then when you (the thread) come back, start the atomic operation all over again (so be mindful of where you put the wait() call)

>Pthreads - POSIX standard threads, usually available on Unix systems; POSIX = Portable Operating System Interface

Guarded Block - a sensitive section of code that accesses shared data

Mutex - mutual exclusion mechanism to protect guarded blocks

Monitor - mechanism to protect guarded blocks using mutex objects such as locks

Semaphore - an alternate mechanism for protecting guarded blocks; it differs from a monitor in that a semaphore manages more than one identical resource (instead of the single lock of a monitor) and keeps handing them out to threads, each time keeping track of the number of remaining resources; once all of them are taken, it forces new threads to wait; it also doesn't insist on ownership (in a monitor, the thread that acquired the lock owns it completely and decides when to release it) the way monitors do

Dining philosopher problem - thread contention illustrated by shared forks

IPC Communication Through: sockets, signal handlers, shared memory, semaphores, and files


-----------------------------------------------------------------------------------------------
Java Concurrency In Practice by Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, and Doug Lea
-----------------------------------------------------------------------------------------------
[Reading this book gives you nightmares. It is as if everything you knew about Java for sure goes wrong in a concurrent environment]

Threads are more basic units of scheduling than processes.

Since threads share same heap/memory and resources from their process, they can communicate more efficiently.

Threads are useful in GUI applications for improving responsiveness of the user interface, and in server applications for improving resource utilization and throughput.

They simplify the implementation of the JVM (the GC usually runs in one or more dedicated threads).

java.nio exists for non-blocking I/O (it was developed so that single-threaded programs, which cannot afford to block, can do I/O).

[[[Frameworks (such as servlets, RMI, etc.) introduce concurrency into applications by calling application components from framework threads. Components invariably access application state, requiring that all code paths accessing that state be thread-safe.]]]

Use of the following could introduce concurrency issues into your SINGLE threaded program
-----------------------------------------------------------------------------------------
>Timer - TimerTasks run on the Timer's own thread and might later access shared data in parallel with your application
>Servlets and Java Server Pages (JSP) - servlets need to be thread-safe because web servers use multiple threads to handle a high volume of HTTP requests; even a servlet used by only one thread could still access application-wide objects such as the ServletContext (application-scoped) or shared session objects (HttpSession)
>RMI - remote object calls happen in their own threads created by RMI framework; so remote objects should ensure concurrent access for themselves as well as the data objects they expose
>Swing and AWT - GUI apps are inherently asynchronous; to ensure responsiveness of GUI at all times, Swing and AWT create a separate thread for handling user-initiated events, and to update the GUI visible to the user; Swing components(like JTable) are not thread-safe; instead thread-safety is achieved by confining all access to GUI to a single thread; any code which wants to control GUI should run under the single event-dispatcher thread

#often it is effective to place synchronization as close to shared data(or within shared data object) so that thread-safety is ensured everywhere
#it is also imperative to see what is atomic at every level of programming; though you use a thread-safe data structure, it may not provide atomicity at your application level, and hence synchronization is needed in your application level as well

>When designing thread-safe classes, good object-oriented techniques - encapsulation, immutability, and clear specification of invariants - are your best friends.

>A class is [Thread Safe] when it continues to behave correctly when accessed from multiple threads

>Thread safe classes encapsulate any needed synchronization so that clients need not provide their own

>stateless objects(which have no fields, relying only on method local variables) are always thread safe(because local variables on stack are specific to the executing thread and not shared)

>Stateless servlets are great because they don't require thread-safety code

>Race Condition: the possibility of incorrect results in the presence of unlucky timing; the correctness of a computation depends on the relative timing or interleaving of multiple threads by the runtime; in other words, getting the correct result depends on chance
>The most common type of race condition is check-then-act, where a potentially stale observation is used to make a decision on what to do next
>race condition is often confused with the related term data race; a data race happens when two threads share a non-final field without coordination - whenever a thread writes a variable that might next be read by another thread, or reads a variable that might last have been written by another thread; not all race conditions are data races and not all data races are race conditions, but both make an application behave unpredictably in a concurrent environment
>lazy initialization(initialize only when it is required, and only once) can cause race conditions if not synchronized [e.g. concurrency problem in singleton pattern]
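A minimal sketch of lazy initialization made safe with synchronization (the unsynchronized version is a textbook check-then-act race):

public class LazyInitSingleton {
    private static LazyInitSingleton instance;       // guarded by LazyInitSingleton.class

    private LazyInitSingleton() { }

    public static synchronized LazyInitSingleton getInstance() {
        if (instance == null) {                       // check-then-act, made atomic by the class lock
            instance = new LazyInitSingleton();
        }
        return instance;
    }
}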

>Compound Actions - sequences of operations that must be executed atomically in order to remain thread-safe

>when there is only one state variable (instance field or otherwise), use an atomic data type to ensure thread-safety; when your object has more than one state variable, an atomic data type alone is not enough

>to preserve state consistency, update related variables in a single atomic operation

>intrinsic locks (or monitor locks) in java act as mutexes(mutual exclusion locks)

>Reentrancy means that locks are acquired on a per-thread rather than per-invocation basis. [7] Reentrancy is implemented by associating with each lock an acquisition count and an owning thread. When the count is zero, the lock is considered unheld. When a thread acquires a previously unheld lock, the JVM records the owner and sets the acquisition count to one. If that same thread acquires the lock again, the count is incremented, and when the owning thread exits the synchronized block, the count is decremented. When the count reaches zero, the lock is released.
[7] This differs from the default locking behavior for pthreads (POSIX threads) mutexes, which are granted on a per-invocation basis.

>Reentrancy facilitates encapsulation of locking behavior, and thus simplifies the development of object-oriented concurrent code. Without reentrant locks, the very natural-looking code in Listing 2.7, in which a subclass overrides a synchronized method and then calls the superclass method, would deadlock.

>synchronization is also about visibility of data modifications across all threads (no surprises to any thread)

>statement reordering (executing in a different order from the code) can be done by the compiler, the JVM memory model, and the CPU to take advantage of caching data between operations and to improve performance; this reordering can produce the weirdest results in a concurrent environment where thread-safety is missing

>3.1.2. Non-atomic 64-bit Operations: When a thread reads a variable without synchronization, it may see a stale value, but at least it sees a value that was actually placed there by some thread rather than some random value. This safety guarantee is called out-of-thin-air safety. Out-of-thin-air safety applies to all variables, with one exception: 64-bit numeric variables (double and long) that are not declared volatile. The Java Memory Model requires fetch and store operations to be atomic, but for nonvolatile long and double variables, the JVM is permitted to treat a 64-bit read or write as two separate 32-bit operations. If the reads and writes occur in different threads, it is therefore possible to read a nonvolatile long and get back the high 32 bits of one value and the low 32 bits of another.[3] Thus, even if you don't care about stale values, it is not safe to use shared mutable long and double variables in multithreaded programs unless they are declared volatile or guarded by a lock.
[3] When the Java Virtual Machine Specification was written, many widely used processor architectures could not efficiently provide atomic 64-bit arithmetic operations.

>Locking is not just about mutual exclusion; it is also about memory visibility. To ensure that all threads see the most up to date values of shared mutable variables, the reading and writing threads must synchronize on a common lock.

>The Java language also provides an alternative, weaker form of synchronization, volatile variables, to ensure that updates to a variable are propagated predictably to other threads. When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.

>Use volatile variables only when they simplify implementing and verifying your synchronization policy; avoid using volatile variables when verifying correctness would require subtle reasoning about visibility. Good uses of volatile variables include ensuring the visibility of their own state, that of the object they refer to, or indicating that an important lifecycle event (such as initialization or shutdown) has occurred.

>[6] Debugging tip: For server applications, be sure to always specify the -server JVM command line switch when invoking the JVM, even for development and testing. The server JVM performs more optimization than the client JVM, such as hoisting out of a loop variables that are not modified in the loop; code that might appear to work in the development environment (client JVM) can break in the deployment environment (server JVM). For example, had we "forgotten" to declare the variable asleep as volatile in Listing 3.4, the server JVM could hoist the test out of the loop (turning it into an infinite loop), but the client JVM would not. An infinite loop that shows up in development is far less costly than one that shows up only in production.

Listing 3.4. Counting Sheep.
volatile boolean asleep;
...
while (!asleep)
    countSomeSheep();

>Locking can guarantee both visibility and atomicity; volatile variables can only guarantee visibility.

>You can use volatile variables only when all the following criteria are met:
  Writes to the variable do not depend on its current value, or you can ensure that only a single thread ever updates the value;
  The variable does not participate in invariants with other state variables; and
  Locking is not required for any other reason while the variable is being accessed.

>Listing 3.6. Allowing Internal Mutable State to Escape. Don't Do this.
class UnsafeStates {
    private String[] states = new String[] {
        "AK", "AL" ...
    };
    public String[] getStates() { return states; }
}

>Publishing states in this way is problematic because any caller can modify its contents. In this case, the states array has escaped its intended scope, because what was supposed to be private state has been effectively made public.
Publishing an object also publishes any objects referred to by its non-private fields. More generally, any object that is reachable from a published object by following some chain of non-private field references and method calls has also been published.
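A hedged sketch of the usual fix, returning a defensive copy so the private array itself never escapes (the class name is illustrative):

class SafeStates {
    private final String[] states = new String[] { "AK", "AL" /* ... */ };

    public String[] getStates() {
        return states.clone();     // callers get a copy; the internal array stays private
    }
}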

>From the perspective of a class C, an alien method is one whose behavior is not fully specified by C. This includes methods in other classes as well as overrideable methods (neither private nor final) in C itself. Passing an object to an alien method must also be considered publishing that object. Since you can't know what code will actually be invoked, you don't know that the alien method won't publish the object or retain a reference to it that might later be used from another thread. Whether another thread actually does something with a published reference doesn't really matter, because the risk of misuse is still present.[7] Once an object escapes, you have to assume that another class or thread may, maliciously or carelessly, misuse it. This is a compelling reason to use encapsulation: it makes it practical to analyze programs for correctness and harder to violate design constraints accidentally.
[7] If someone steals your password and posts it on the alt.free-passwords newsgroup, that information has escaped: whether or not someone has (yet) used those credentials to create mischief, your account has still been compromised. Publishing a reference poses the same sort of risk.

>A final mechanism by which an object or its internal state can be published is to publish an inner class instance, as shown in ThisEscape in Listing 3.7. When ThisEscape publishes the EventListener, it implicitly publishes the enclosing ThisEscape instance as well, because inner class instances contain a hidden reference to the enclosing instance.

Listing 3.7. Implicitly Allowing the this Reference to Escape. Don't Do this.
public class ThisEscape {
    public ThisEscape(EventSource source) {
        source.registerListener(
            new EventListener() {
                public void onEvent(Event e) {
                    doSomething(e);
                }
            });
    }
}

>an object is in a predictable, consistent state only after its constructor returns, so publishing an object from within its constructor can publish an incompletely constructed object. This is true even if the publication is the last statement in the constructor. If the this reference escapes during construction, the object is considered not properly constructed.[8]
[8] More specifically, the this reference should not escape from the thread until after the constructor returns. The this reference can be stored somewhere by the constructor as long as it is not used by another thread until after construction. SafeListener in Listing 3.8 uses this technique. Do not allow the this reference to escape during construction.

>Calling an overrideable instance method (one that is neither private nor final) from the constructor can also allow the this reference to escape. If you are tempted to register an event listener or start a thread from a constructor, you can avoid the improper construction by using a private constructor and a public factory method, as shown in SafeListener in Listing 3.8.

Listing 3.8. Using a Factory Method to Prevent the this Reference from Escaping During Construction.

public class SafeListener {
    private final EventListener listener;

    private SafeListener() {
        listener = new EventListener() {
            public void onEvent(Event e) {
                doSomething(e);
            }
        };
    }

    public static SafeListener newInstance(EventSource source) {
        SafeListener safe = new SafeListener();
        source.registerListener(safe.listener);
        return safe;
    }
}

>3.3. Thread Confinement
Accessing shared, mutable data requires using synchronization; one way to avoid this requirement is to not share. If data is only accessed from a single thread, no synchronization is needed. This technique, thread confinement, is one of the simplest ways to achieve thread safety. When an object is confined to a thread, such usage is automatically thread-safe even if the confined object itself is not [CPJ 2.3.2].
Swing uses thread confinement extensively. The Swing visual components and data model objects are not thread-safe; instead, safety is achieved by confining them to the Swing event dispatch thread. To use Swing properly, code running in threads other than the event thread should not access these objects. (To make this easier, Swing provides the invokeLater mechanism to schedule a Runnable for execution in the event thread.) Many concurrency errors in Swing applications stem from improper use of these confined objects from another thread.
Another common application of thread confinement is the use of pooled JDBC (Java Database Connectivity) Connection objects. The JDBC specification does not require that Connection objects be thread-safe.[9] In typical server applications, a thread acquires a connection from the pool, uses it for processing a single request, and returns it. Since most requests, such as servlet requests or EJB (Enterprise JavaBeans) calls, are processed synchronously by a single thread, and the pool will not dispense the same connection to another thread until it has been returned, this pattern of connection management implicitly confines the Connection to that thread for the duration of the request.
[9] The connection pool implementations provided by application servers are thread-safe; connection pools are necessarily accessed from multiple threads, so a non-thread-safe implementation would not make sense.
Just as the language has no mechanism for enforcing that a variable is guarded by a lock, it has no means of confining an object to a thread. Thread confinement is an element of your program's design that must be enforced by its implementation. The language and core libraries provide mechanisms that can help in maintaining thread confinement - local variables and the ThreadLocal class - but even with these, it is still the programmer's responsibility to ensure that thread-confined objects do not escape from their intended thread.

Stack Confinement
-----------------
>a special case of thread confinement
>an object can only be reached through local variables - this is called stack confinement
>also, for primitives, the language semantics ensure that primitive local variables are always stack confined
>stack confinement, however, lives only in the heads of the programmers, and hence has to be documented so that future maintainers do not let the stack-confined objects be published outside

ThreadLocal
-----------
>wrap global variables in ThreadLocal object(static private field) to confine it within a thread
>ThreadLocal maintains an internal map for each thread and serves the request according to the calling thread(that is current thread)

Listing 3.10. Using ThreadLocal to Ensure thread Confinement.
private static ThreadLocal<Connection> connectionHolder
        = new ThreadLocal<Connection>() {
            public Connection initialValue() {
                try {
                    return DriverManager.getConnection(DB_URL);
                } catch (SQLException e) {
                    throw new RuntimeException(e);   // getConnection declares the checked SQLException
                }
            }
        };

public static Connection getConnection() {
    return connectionHolder.get();
}

>when the thread exits, the thread-local copies specific to that thread become available for garbage collection

Immutability
------------
>Immutable objects are always thread safe

>[16] While it may seem that field values set in a constructor are the first values written to those fields and therefore that there are no "older" values to see as stale values, the Object constructor first writes the default values to all fields before subclass constructors run. It is therefore possible to see the default value for a field as a stale value.

>Using a static initializer is often the easiest and safest way to publish objects that can be statically constructed:
public static Holder holder = new Holder(42);

Static initializers are executed by the JVM at class initialization time; because of internal synchronization in the JVM, this mechanism is guaranteed to safely publish any objects initialized in this way [JLS 12.4.2].

>4.1. Designing a Thread safe Class

The design process for a thread safe class should include these three basic elements:
  Identify the variables that form the object's state;
  Identify the invariants that constrain the state variables;
  Establish a policy for managing concurrent access to the object's state.

>Encapsulating data within an object confines access to the data to the object's methods, making it easier to ensure that the data is always accessed with the appropriate lock held.

>Confinement makes it easier to build thread-safe classes because a class that confines its state can be analyzed for thread safety without having to examine the whole program.

>Priority inheritance is when a thread gets a temporary bump in priority because of the threads waiting on locks that it owns. With priority inheritance, if a low-priority thread owns a lock that a high-priority thread later wants, it is bumped to the higher priority. The reason is that the higher-priority thread is blocked waiting for this lock, so the low-priority thread needs to run at a higher priority in order to finish up and free the lock for the high-priority thread. The low-priority thread is "inheriting" the high priority of the high-priority thread.

>If a class is composed of multiple independent thread-safe state variables and has no operations that have any invalid state transitions, then it can delegate thread safety to the underlying state variables.

>If a state variable is thread-safe, does not participate in any invariants that constrain its value, and has no prohibited state transitions for any of its operations, then it can safely be published.

>If extending a class to add another atomic operation is fragile because it distributes the locking code for a class over multiple classes in an object hierarchy, client-side locking is even more fragile because it entails putting locking code for class C into classes that are totally unrelated to C. Exercise care when using client-side locking on classes that do not commit to their locking strategy. Client-side locking has a lot in common with class extension - they both couple the behavior of the derived class to the implementation of the base class. Just as extension violates encapsulation of implementation [EJ Item 14], client-side locking violates encapsulation of synchronization policy.

>Document a class's thread safety guarantees for its clients; document its synchronization policy for its maintainers.

>Because the synchronized collections commit to a synchronization policy that supports client-side locking, [1] it is possible to create new operations that are atomic with respect to other collection operations as long as we know which lock to use. The synchronized collection classes guard each method with the lock on the synchronized collection object itself.

>Listing 5.3. Iteration that may Throw ArrayIndexOutOfBoundsException.
for (int i = 0; i < vector.size(); i++)
    doSomething(vector.get(i));
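A hedged sketch of the usual client-side-locking fix: hold the Vector's own lock for the whole iteration so no other thread can change its size in between (doSomething() is a placeholder):

import java.util.Vector;

class SafeVectorIteration {
    static void processAll(Vector<String> vector) {
        synchronized (vector) {                    // same lock that Vector's own methods use
            for (int i = 0; i < vector.size(); i++)
                doSomething(vector.get(i));
        }
    }

    static void doSomething(String s) { /* ... */ }
}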

>using iterators does not obviate the need to lock the collection during iteration if other threads can concurrently modify it. The iterators returned by the synchronized collections are not designed to deal with concurrent modification, and they are fail-fast - meaning that if they detect that the collection has changed since iteration began, they throw the unchecked ConcurrentModificationException.

>These fail-fast iterators are not designed to be foolproof - they are designed to catch concurrency errors on a "good faith effort" basis and thus act only as early-warning indicators for concurrency problems. They are implemented by associating a modification count with the collection: if the modification count changes during iteration, hasNext or next throws ConcurrentModificationException. However, this check is done without synchronization, so there is a risk of seeing a stale value of the modification count and therefore that the iterator does not realize a modification has been made. This was a deliberate design tradeoff to reduce the performance impact of the concurrent modification detection code.[2]

>[2] ConcurrentModificationException can arise in single threaded code as well; this happens when objects are removed from the collection directly rather than through Iterator.remove()

>for-each loop also internally uses iterators to iterate the collection and hence for-each loop also can throw ConcurrentModificationException if not guarded by a lock

>but locking the whole iteration loop also hurts performance(with starvation and deadlock risk as well)

>possibly, the collection can be cloned in each thread for iteration (the cloning itself must be guarded) if that is less costly than having multiple threads concurrently iterate the same collection under a lock

>hidden iterator is possible when we print a collection variable; internally toString() method iterates over the elements and there is a possible ConcurrentModificationException while printing a collection as well

>If HiddenIterator wrapped the HashSet with a synchronizedSet, encapsulating the synchronization, this sort of error would not occur. Just as encapsulating an object's state makes it easier to preserve its invariants, encapsulating its synchronization makes it easier to enforce its synchronization policy.

>Replacing synchronized collections with concurrent collections can offer dramatic scalability improvements with little risk.

>While you can simulate the behavior of a Queue with a List - in fact, LinkedList also implements Queue - the Queue classes were added because eliminating the random-access requirements of List admits more efficient concurrent implementations.

>Concurrent collections, e.g. ConcurrentHashMap: instead of synchronizing every method on a common lock, restricting access to a single thread at a time, it uses a finer-grained locking mechanism called lock striping (see Section 11.4.3) to allow a greater degree of shared access.

>Tradeoff with concurrent collections: the iterator is not absolute; it doesn't give a snapshot view of the collection; parallel modifications while iterating are allowed; size() and isEmpty() only report an approximate state at that moment, which can change the next instant; basically the absoluteness (determinability) we had with synchronized collections is lost in concurrent collections

>Only if your application needs to lock the map for exclusive access [3] is ConcurrentHashMap not an appropriate drop-in replacement.

>Since a ConcurrentHashMap cannot be locked for exclusive access, we cannot use client-side locking to create new atomic operations such as put-if-absent; but these are already provided by the concurrent collections
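A minimal sketch of the built-in put-if-absent operation on ConcurrentMap (the key and value here are just illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentDemo {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<String, Integer>();

    Integer registerFirstHit(String page) {
        // atomic check-then-act provided by the map itself; returns the previous value, or null if absent
        return hits.putIfAbsent(page, 1);
    }
}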

>CopyOnWriteArrayList/CopyOnWriteArraySet: create a fresh copy of the backing array whenever the collection is modified
>Iterators for the copy-on-write collections retain a reference to the backing array that was current at the start of iteration, and since this will never change, they need to synchronize only briefly to ensure visibility of the array contents. As a result, multiple threads can iterate the collection without interference from one another or from threads wanting to modify the collection. The iterators returned by the copy-on-write collections do not throw ConcurrentModificationException and return the elements exactly as they were at the time the iterator was created, regardless of subsequent modifications.

>Obviously, there is some cost to copying the backing array every time the collection is modified, especially if the collection is large; the copy-on-write collections are reasonable to use only when iteration is far more common than modification. This criterion exactly describes many event notification systems: delivering a notification requires iterating the list of registered listeners and calling each one of them, and in most cases registering or unregistering an event listener is far less common than receiving an event notification.

>BlockingQueue simplifies the implementation of producer-consumer designs with any number of producers and consumers. One of the most common producer-consumer designs is a thread pool coupled with a work queue; this pattern is embodied in the Executor task execution framework

>If the producers consistently generate work faster than the consumers can process it, eventually the application will run out of memory because work items will queue up without bound. Again, the blocking nature of put greatly simplifies coding of producers; if we use a bounded queue, then when the queue fills up the producers block, giving the consumers time to catch up because a blocked producer cannot generate more work.

>Bounded queues are a powerful resource management tool for building reliable applications: they make your program more robust to overload by throttling activities that threaten to produce more work than can be handled.
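A minimal producer-consumer sketch on a bounded ArrayBlockingQueue; put() blocks when the queue is full and take() blocks when it is empty (the item strings and the endless consumer loop are only for illustration):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10);  // bounded to 10 items

        new Thread(new Runnable() {            // producer
            public void run() {
                try {
                    for (int i = 0; i < 100; i++)
                        queue.put("item-" + i);             // blocks while the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();     // restore the interrupt and exit
                }
            }
        }).start();

        new Thread(new Runnable() {            // consumer
            public void run() {
                try {
                    while (true)
                        System.out.println(queue.take());   // blocks while the queue is empty
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();
    }
}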

>if blocking queues don't fit easily into your design, you can create other blocking data structures using Semaphore

>The last BlockingQueue implementation, SynchronousQueue, is not really a queue at all, in that it maintains no storage space for queued elements. Instead, it maintains a list of queued threads waiting to enqueue or dequeue an element. In the dish-washing analogy, this would be like having no dish rack, but instead handing the washed dishes directly to the next available dryer. While this may seem a strange way to implement a queue, it reduces the latency associated with moving data from producer to consumer because the work can be handed off directly. (In a traditional queue, the enqueue and dequeue operations must complete sequentially before a unit of work can be handed off.) The direct handoff also feeds back more information about the state of the task to the producer; when the handoff is accepted, it knows a consumer has taken responsibility for it, rather than simply letting it sit on a queue somewhere - much like the difference between handing a document to a colleague and merely putting it in her mailbox and hoping she gets it soon. Since a SynchronousQueue has no storage capacity, put and take will block unless another thread is already waiting to participate in the handoff. Synchronous queues are generally suitable only when there are enough consumers that there nearly always will be one ready to take the handoff.

>The producer-consumer pattern also enables several performance benefits. Producers and consumers can execute concurrently; if one is I/O-bound and the other is CPU-bound, executing them concurrently yields better overall throughput than executing them sequentially.

>5.3.2. Serial Thread Confinement
The blocking queue implementations in java.util.concurrent all contain sufficient internal synchronization to safely publish objects from a producer thread to the consumer thread.

For mutable objects, producer-consumer designs and blocking queues facilitate serial thread confinement for handing off ownership of objects from producers to consumers. A thread-confined object is owned exclusively by a single thread, but that ownership can be "transferred" by publishing it safely where only one other thread will gain access to it and ensuring that the publishing thread does not access it after the handoff. The safe publication ensures that the object's state is visible to the new owner, and since the original owner will not touch it again, it is now confined to the new thread. The new owner may modify it freely since it has exclusive access.
Object pools exploit serial thread confinement, "lending" an object to a requesting thread. As long as the pool contains sufficient internal synchronization to publish the pooled object safely, and as long as the clients do not themselves publish the pooled object or use it after returning it to the pool, ownership can be transferred safely from thread to thread.
One could also use other publication mechanisms for transferring ownership of a mutable object, but it is necessary to ensure that only one thread receives the object being handed off. Blocking queues make this easy; with a little more work, it could also be done with the atomic remove method of ConcurrentMap or the compareAndSet method of AtomicReference.

>5.3.3. Deques and Work Stealing
Java 6 also adds another two collection types, Deque (pronounced "deck") and BlockingDeque, that extend Queue and BlockingQueue. A Deque is a double-ended queue that allows efficient insertion and removal from both the head and the tail. Implementations include ArrayDeque and LinkedBlockingDeque.
Just as blocking queues lend themselves to the producer-consumer pattern, deques lend themselves to a related pattern called work stealing. A producer-consumer design has one shared work queue for all consumers; in a work-stealing design, every consumer has its own deque. If a consumer exhausts the work in its own deque, it can steal work from the tail of someone else's deque. Work stealing can be more scalable than a traditional producer-consumer design because workers don't contend for a shared work queue; most of the time they access only their own deque, reducing contention. When a worker has to access another's queue, it does so from the tail rather than the head, further reducing contention.
Work stealing is well suited to problems in which consumers are also producers - when performing a unit of work is likely to result in the identification of more work. For example, processing a page in a web crawler usually results in the identification of new pages to be crawled. Similarly, many graph-exploring algorithms, such as marking the heap during garbage collection, can be efficiently parallelized using work stealing. When a worker identifies a new unit of work, it places it at the end of its own deque (or alternatively, in a work-sharing design, on that of another worker); when its deque is empty, it looks for work at the end of someone else's deque, ensuring that each worker stays busy.

>The put and take methods of BlockingQueue throw the checked InterruptedException, as do a number of other library methods such as Thread.sleep. When a method can throw InterruptedException, it is telling you that it is a blocking method, and further that if it is interrupted, it will make an effort to stop blocking early.

Interruption is a cooperative mechanism. One thread cannot force another to stop what it is doing and do something else; when thread A interrupts thread B, A is merely requesting that B stop what it is doing when it gets to a convenient stopping point - if it feels like it. While there is nothing in the API or language specification that demands any specific application-level semantics for interruption, the most sensible use for interruption is to cancel an activity. Blocking methods that are responsive to interruption make it easier to cancel long-running activities on a timely basis.

>When your code calls a method that throws InterruptedException, then your method is a blocking method too, and must have a plan for responding to interruption. For library code, there are basically two choices:
Propagate the InterruptedException. This is often the most sensible policy if you can get away with it - just propagate the InterruptedException to your caller. This could involve not catching InterruptedException, or catching it and throwing it again after performing some brief activity-specific cleanup.
Restore the interrupt. Sometimes you cannot throw InterruptedException, for instance when your code is part of a Runnable. In these situations, you must catch InterruptedException and restore the interrupted status by calling interrupt on the current thread, so that code higher up the call stack can see that an interrupt was issued, as demonstrated in Listing 5.10.
You can get much more sophisticated with interruption, but these two approaches should work in the vast majority of situations. But there is one thing you should not do with InterruptedException: catch it and do nothing in response. This deprives code higher up on the call stack of the opportunity to act on the interruption, because the evidence that the thread was interrupted is lost. The only situation in which it is acceptable to swallow an interrupt is when you are extending Thread and therefore control all the code higher up on the call stack.

Listing 5.10. Restoring the Interrupted Status so as Not to Swallow the Interrupt.
public class TaskRunnable implements Runnable {
    BlockingQueue queue;
    ...
    public void run() {
        try {
            processTask(queue.take());
        } catch (InterruptedException e) {
            // restore interrupted status
            Thread.currentThread().interrupt();
        }
    }
}

>5.5. Synchronizers
Blocking queues are unique among the collections classes: not only do they act as containers for objects, but they can also coordinate the control flow of producer and consumer threads, because take and put block until the queue enters the desired state (not empty or not full).
A synchronizer is any object that coordinates the control flow of threads based on its state. Blocking queues can act as synchronizers; other types of synchronizers include semaphores, barriers, and latches. There are a number of synchronizer classes in the platform library; if these do not meet your needs, you can also create your own

>Latches - one-time-use synchronizers; a latch is closed (blocks all threads) until a given condition happens; after the condition occurs, the latch is opened permanently (all threads can pass the gate)
>CountDownLatch - blocks all threads until a given counter (initialized to a positive value at construction) reaches zero
>in a thread pool, the master thread can use a countdown latch to wait for all workers to finish at once instead of waiting on each one sequentially; the master can also start all workers at once using a latch (see the sketch after the list of typical uses below)

  Ensuring that a computation does not proceed until resources it needs have been initialized. A simple binary (two-state) latch could be used to indicate "Resource R has been initialized", and any activity that requires R would wait first on this latch.
  Ensuring that a service does not start until other services on which it depends have started. Each service would have an associated binary latch; starting service S would involve first waiting on the latches for other services on which S depends, and then releasing the S latch after startup completes so any services that depend on S can then proceed.
  Waiting until all the parties involved in an activity, for instance the players in a multi-player game, are ready to proceed. In this case, the latch reaches the terminal state after all the players are ready.
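A hedged sketch of the start-gate/end-gate idiom (in the spirit of the book's TestHarness example); the work done by each worker is a placeholder:

import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        final int nWorkers = 3;
        final CountDownLatch startGate = new CountDownLatch(1);        // opened once by the master
        final CountDownLatch endGate = new CountDownLatch(nWorkers);   // counted down by each worker

        for (int i = 0; i < nWorkers; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        startGate.await();          // all workers released at the same instant
                        // ... do the actual work here ...
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        endGate.countDown();
                    }
                }
            }).start();
        }

        startGate.countDown();   // start all workers at once
        endGate.await();         // wait once for every worker instead of joining each thread
    }
}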

>FutureTask also acts like a latch


Tasks described by Callable can throw checked and unchecked exceptions, and any code can throw an Error. Whatever the task code may throw, it is wrapped in an ExecutionException and rethrown from Future.get. This complicates code that calls get, not only because it must deal with the possibility of ExecutionException (and the unchecked CancellationException), but also because the cause of the ExecutionException is returned as a Throwable, which is inconvenient to deal with.
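A hedged sketch of the usual unwrapping dance around Future.get; the policy for rethrowing the Throwable cause shown here is one reasonable choice, not the book's code:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class FutureResults {
    static <V> V getResult(Future<V> future) throws InterruptedException {
        try {
            return future.get();                        // blocks; may throw ExecutionException
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();             // what the task actually threw
            if (cause instanceof RuntimeException)
                throw (RuntimeException) cause;
            if (cause instanceof Error)
                throw (Error) cause;
            throw new IllegalStateException("Checked exception thrown by task", cause);
        }
    }
}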

5.5.3. Semaphores
Counting semaphores are used to control the number of activities that can access a certain resource or perform a given action at the same time [CPJ 3.4.1]. Counting semaphores can be used to implement resource pools or to impose a bound on a collection.
A Semaphore manages a set of virtual permits; the initial number of permits is passed to the Semaphore constructor. Activities can acquire permits (as long as some remain) and release permits when they are done with them. If no permit is available, acquire blocks until one is (or until interrupted or the operation times out). The release method returns a permit to the semaphore. [4] A degenerate case of a counting semaphore is a binary semaphore, a Semaphore with an initial count of one. A binary semaphore can be used as a mutex with non-reentrant locking semantics; whoever holds the sole permit holds the mutex.

[4] The implementation has no actual permit objects, and Semaphore does not associate dispensed permits with threads, so a permit acquired in
one thread can be released from another thread. You can think of acquire as consuming a permit and release as creating one; a Semaphore is not
limited to the number of permits it was created with.
Semaphores are useful for implementing resource pools such as database connection pools. While it is easy to construct
a fixed sized pool that fails if you request a resource from an empty pool, what you really want is to block if the pool is
empty and unblock when it becomes nonempty again. If you initialize a Semaphore to the pool size, acquire a permit
before  trying  to  fetch  a  resource  from  the  pool,  and  release  the  permit  after  putting  a  resource  back  in  the  pool,
acquire blocks until the pool becomes nonempty. This technique is used in the bounded buffer class in Chapter 12. (An
easier way to construct a blocking object pool would be to use a BlockingQueue to hold the pooled resources.)
Similarly,  you  can  use  a  Semaphore  to  turn  any  collection  into  a  blocking  bounded  collection,  as  illustrated  by
BoundedHashSet  in  Listing  5.14. The  semaphore  is  initialized  to  the  desired maximum  size  of  the  collection. The add
operation acquires a permit before adding the item into the underlying collection. If the underlying add operation does
not actually add anything, it releases the permit immediately. Similarly, a successful remove operation releases a permit,
enabling  more  elements  to  be  added.  The  underlying  Set  implementation  knows  nothing  about  the  bound;  this  is
handled by BoundedHashSet.
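
A hedged sketch of the bounded-set technique just described; it mirrors what Listing 5.14 is said to do, but is not necessarily the book's exact code:

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Semaphore;

// The semaphore, not the Set, enforces the bound.
public class BoundedHashSet<T> {
    private final Set<T> set;
    private final Semaphore sem;

    public BoundedHashSet(int bound) {
        this.set = Collections.synchronizedSet(new HashSet<T>());
        this.sem = new Semaphore(bound);
    }

    public boolean add(T o) throws InterruptedException {
        sem.acquire();                    // blocks if the set is already at its bound
        boolean wasAdded = false;
        try {
            wasAdded = set.add(o);
            return wasAdded;
        } finally {
            if (!wasAdded)
                sem.release();            // element was already present; give the permit back
        }
    }

    public boolean remove(Object o) {
        boolean wasRemoved = set.remove(o);
        if (wasRemoved)
            sem.release();                // removing an element frees up capacity
        return wasRemoved;
    }
}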

5.5.4. Barriers
We have seen how latches can facilitate starting a group of related activities or waiting for a group of related activities
to complete. Latches are single use objects; once a latch enters the terminal state, it cannot be reset.
Barriers are similar to latches in that they block a group of threads until some event has occurred [CPJ 4.4.3]. The key
difference  is  that  with  a  barrier,  all  the  threads must come  together  at  a  barrier  point  at  the same  time  in  order  to
proceed. Latches are for waiting for events; barriers are for waiting for other threads. A barrier implements the protocol
some families use to rendezvous during a day at the mall: "Everyone meet at McDonald's at 6:00; once you get there,
stay there until everyone shows up, and then we'll figure out what we're doing next."
CyclicBarrier  allows a  fixed number  of  parties  to rendezvous  repeatedly  at a  barrier  point and  is useful  in  parallel
iterative algorithms that break down a problem into a fixed number of independent subproblems. Threads call await
when they reach the barrier point, and await blocks until all the threads have reached the barrier point. If all threads
meet at the barrier point, the barrier has been successfully passed, in which case all threads are released and the barrier
is reset so it can be used again. If a call to await times out or a thread blocked in await is interrupted, then the barrier is
considered  broken  and  all  outstanding  calls  to  await  terminate  with  BrokenBarrierException.  If  the  barrier  is
successfully  passed, await  returns a unique  arrival index  for each thread, which can be used  to  "elect" a leader  that
takes some special action in the next iteration. CyclicBarrier also lets you pass a barrier action to the constructor;
this is a Runnable that is executed (in one of the subtask threads) when the barrier is successfully passed but before the
blocked threads are released.
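
A small illustrative sketch of CyclicBarrier use along the lines described above; the party count, the barrier action, and computeSubproblem are made-up placeholders:

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) {
        final int PARTIES = 3;                              // illustrative number of worker threads
        // The barrier action runs in one of the worker threads once per successful crossing.
        final CyclicBarrier barrier = new CyclicBarrier(PARTIES, new Runnable() {
            public void run() {
                System.out.println("all parties reached the barrier; merge results here");
            }
        });

        for (int i = 0; i < PARTIES; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        computeSubproblem();                // placeholder for one iteration's work
                        barrier.await();                    // wait for the other parties before proceeding
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (BrokenBarrierException e) {
                        // another party timed out or was interrupted; give up
                    }
                }
            }).start();
        }
    }

    private static void computeSubproblem() { /* ... */ }
}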

Another form of barrier is Exchanger, a two party barrier in which the parties exchange data at the barrier point [CPJ
3.4.3]. Exchangers are useful when the parties perform asymmetric activities, for example when one thread fills a buffer
with data and the other thread consumes the data from the buffer; these threads could use an Exchanger to meet and
exchange  a  full  buffer  for  an  empty  one.  When  two  threads  exchange  objects  via  an  Exchanger,  the  exchange
constitutes a safe publication of both objects to the other party.
The timing of the exchange depends on the responsiveness requirements of the application. The simplest approach is
that the filling task exchanges when the buffer is full, and the emptying task exchanges when the buffer is empty; this
minimizes  the  number  of  exchanges  but  can  delay  processing  of  some  data  if  the  arrival  rate  of  new  data  is
unpredictable. Another approach would be that the filler exchanges when the buffer is full, but also when the buffer is
partially filled and a certain amount of time has elapsed.
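
A minimal sketch of the fill/drain handoff described above, assuming a StringBuilder buffer and, for brevity, an exchange on every iteration; a real policy would exchange only when the buffer is actually full or empty:

import java.util.concurrent.Exchanger;

public class ExchangerDemo {
    public static void main(String[] args) {
        final Exchanger<StringBuilder> exchanger = new Exchanger<StringBuilder>();

        // Filling task: hands over its buffer and receives an empty one in return.
        new Thread(new Runnable() {
            public void run() {
                StringBuilder buffer = new StringBuilder();
                try {
                    for (int i = 0; i < 10; i++) {
                        buffer.append("data-").append(i).append('\n');    // pretend this fills the buffer
                        buffer = exchanger.exchange(buffer);              // swap full buffer for an empty one
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();

        // Emptying task: hands over its empty buffer, receives a full one, and consumes it.
        new Thread(new Runnable() {
            public void run() {
                StringBuilder buffer = new StringBuilder();
                try {
                    for (int i = 0; i < 10; i++) {
                        buffer = exchanger.exchange(buffer);              // receive the full buffer
                        System.out.print(buffer);                         // consume its contents
                        buffer.setLength(0);                              // empty it before the next exchange
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();
    }
}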

>JVM can't  exit  until  all the  (non daemon) threads have terminated, so  failing to
shut down an Executor could prevent the JVM from exiting.

>6.2.5. Delayed and Periodic Tasks
The Timer facility manages the execution of deferred ("run this task in 100 ms") and periodic ("run this task every 10
ms")  tasks.  However,  Timer  has  some  drawbacks,  and  ScheduledThreadPoolExecutor  should  be  thought  of  as  its
replacement.[6]  You  can  construct  a  ScheduledThreadPoolExecutor  through  its  constructor  or  through  the
newScheduledThreadPool factory.

[6] Timer does have support  for scheduling based on absolute, not  relative time, so that tasks can be sensitive to changes  in the system clock;
ScheduledThreadPoolExecutor supports only relative time.
A Timer creates only a single thread for executing timer tasks. If a timer task takes too long to run, the timing accuracy
of other TimerTasks can suffer. If a recurring TimerTask is scheduled to run every 10 ms and another TimerTask takes
40 ms to run, the recurring task either (depending on whether it was scheduled at fixed rate or fixed delay) gets called
four times in rapid succession after the long running task completes, or "misses" four invocations completely. Scheduled
thread pools address this limitation by letting you provide multiple threads for executing deferred and periodic tasks.
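
A short illustrative sketch of scheduling with a scheduled thread pool; the pool size, delays, and task bodies are arbitrary:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulingDemo {
    public static void main(String[] args) {
        // Two threads, so one long-running task cannot starve the other scheduled tasks.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        // Deferred task: run once after 100 ms.
        scheduler.schedule(new Runnable() {
            public void run() { System.out.println("deferred task"); }
        }, 100, TimeUnit.MILLISECONDS);

        // Periodic task: run every 10 ms, measured from the start of each run.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() { System.out.println("periodic task"); }
        }, 0, 10, TimeUnit.MILLISECONDS);
    }
}
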
Another problem with Timer is that it behaves poorly if a TimerTask throws an unchecked exception. The Timer thread
doesn't catch the exception, so an unchecked exception thrown from a TimerTask terminates the timer thread. Timer
also doesn't resurrect the thread in this situation; instead, it erroneously assumes the entire Timer was cancelled. In this
case, TimerTasks that are already scheduled but not yet executed are never run, and new tasks cannot be scheduled.
(This problem, called "thread leakage" is described in Section 7.3, along with techniques for avoiding it.)

>ScheduledThreadPoolExecutor  deals
properly with ill behaved tasks; there is little reason to use Timer in Java 5.0 or later.
If  you  need  to  build  your own  scheduling  service,  you  may  still  be  able  to  take  advantage  of  the  library  by  using  a
DelayQueue,  a  BlockingQueue  implementation  that  provides  the  scheduling  functionality  of
ScheduledThreadPoolExecutor. A DelayQueue manages a collection of Delayed objects. A Delayed has a delay time
associated  with  it:  DelayQueue  lets  you  take  an  element  only  if  its  delay  has  expired.  Objects  are  returned  from  a
DelayQueue ordered by the time associated with their delay.

>Future represents the lifecycle of a task and provides methods to test whether the task has completed or been
cancelled, retrieve its result, and cancel the task. Callable and Future are shown in Listing 6.11. Implicit in the
specification of Future is that task lifecycle can only move forwards, not backwards - just like the ExecutorService
lifecycle. Once a task is completed, it stays in that state forever.
The behavior of get varies depending on the task state (not yet started, running, completed). It returns immediately or
throws  an  Exception  if  the  task  has  already  completed,  but  if  not  it  blocks  until  the  task  completes.  If  the  task
completes  by  throwing  an  exception,  get  rethrows  it  wrapped  in  an  ExecutionException;  if  it  was  cancelled,  get
throws CancellationException.  If get throws ExecutionException, the underlying exception can be retrieved with
getCause.

>There  are  several ways  to  create a Future  to  describe a  task. The submit methods  in ExecutorService  all  return a
Future, so that you can submit a Runnable or a Callable to an executor and get back a Future that can be used to
retrieve the result or cancel the task. You can also explicitly instantiate a FutureTask for a given Runnable or Callable.
(Because FutureTask implements Runnable, it can be submitted to an Executor for execution or executed directly by
calling its run method.)
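
A minimal sketch of submitting a Callable and retrieving its result via Future; expensiveComputation is a placeholder:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService exec = Executors.newFixedThreadPool(2);
        Future<Integer> future = exec.submit(new Callable<Integer>() {
            public Integer call() throws Exception {
                return expensiveComputation();              // runs in a pool thread
            }
        });
        try {
            Integer result = future.get();                  // blocks until the task completes
            System.out.println("result = " + result);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();                 // the exception thrown by the task itself
            System.err.println("task failed: " + cause);
        } finally {
            exec.shutdown();
        }
    }

    private static Integer expensiveComputation() { return 42; }
}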

As  of  Java  6,  ExecutorService  implementations  can  override  newTaskFor  in  AbstractExecutorService  to  control
instantiation  of  the  Future  corresponding  to  a  submitted  Callable  or  Runnable.  The  default  implementation  just
creates a new FutureTask, as shown in Listing 6.12.
Listing 6.12. Default Implementation of newTaskFor in ThreadPoolExecutor.
protected <T> RunnableFuture<T> newTaskFor(Callable<T> task) {
    return new FutureTask<T>(task);
}

Submitting a Runnable or Callable to an Executor constitutes a safe publication (see Section 3.5) of the Runnable or
Callable from  the submitting  thread  to  the  thread  that  will eventually execute  the  task.  Similarly,  setting  the  result
value for a Future constitutes a safe publication of the result from the thread in which it was computed to any thread
that retrieves it via get.

>The primary challenge in executing tasks within a time budget is making sure that you don't wait longer than the time
budget  to  get  an  answer  or  find  out  that  one  is  not  forthcoming.  The  timed  version  of  Future.get  supports  this
requirement: it returns as soon as the result is ready, but throws TimeoutException if the result is not ready within the
timeout period.
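
A hedged sketch of the budgeted-get pattern just described; the method name, budget, and fallback value are illustrative:

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGetDemo {
    // 'future' is assumed to come from ExecutorService.submit, as in the previous sketch.
    static <V> V getWithinBudget(Future<V> future, long budgetMillis, V fallback)
            throws InterruptedException, ExecutionException {
        try {
            return future.get(budgetMillis, TimeUnit.MILLISECONDS);   // returns as soon as the result is ready
        } catch (TimeoutException e) {
            future.cancel(true);          // no longer interested; interrupt the task if it is running
            return fallback;              // degrade gracefully instead of waiting past the budget
        }
    }
}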

Cancelling a task
-----------------
>One such cooperative mechanism is setting a "cancellation requested" flag that the task checks periodically; if it finds
the  flag  set,  the  task  terminates  early.  PrimeGenerator  in  Listing  7.1,  which  enumerates  prime  numbers  until  it  is
cancelled,  illustrates  this  technique.  The  cancel  method  sets  the  cancelled  flag,  and  the  main  loop  polls  this  flag
before searching for the next prime number. (For this to work reliably, cancelled must be volatile.)
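
A sketch in the spirit of the PrimeGenerator described above; it may not match Listing 7.1 exactly:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class PrimeGenerator implements Runnable {
    private final List<BigInteger> primes = new ArrayList<BigInteger>();
    private volatile boolean cancelled;          // volatile, so the flag set by cancel() is visible to the task thread

    public void run() {
        BigInteger p = BigInteger.ONE;
        while (!cancelled) {                     // poll the flag before each unit of work
            p = p.nextProbablePrime();
            synchronized (this) {
                primes.add(p);
            }
        }
    }

    public void cancel() { cancelled = true; }

    public synchronized List<BigInteger> get() {
        return new ArrayList<BigInteger>(primes);
    }
}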

>Blocking  library methods  like Thread.sleep and Object.wait  try to detect when a  thread has been  interrupted and
return  early.  They  respond  to  interruption  by  clearing  the  interrupted  status  and  throwing  InterruptedException,
indicating  that  the  blocking  operation  completed  early  due  to  interruption.  The  JVM  makes  no  guarantees  on  how
quickly a blocking method will detect interruption, but in practice this happens reasonably quickly.

>If a thread is interrupted when it is not blocked, its interrupted status is set, and it is up to the activity being cancelled to
poll the interrupted status to detect interruption. In this way interruption is "sticky": if it doesn't trigger an
InterruptedException, evidence of interruption persists until someone deliberately clears the interrupted status.
Calling  interrupt  does  not  necessarily  stop  the  target  thread  from  doing  what  it  is  doing;  it  merely  delivers  the
message that interruption has been requested.
A good way to think about interruption is that it does not actually interrupt a running thread; it just requests that the
thread  interrupt  itself  at  the  next  convenient  opportunity.  (These  opportunities  are  called  cancellation  points.) Some
methods,  such  as  wait,  sleep,  and  join,  take  such  requests  seriously,  throwing  an  exception when  they  receive  an
interrupt request or encounter an already  set  interrupt  status upon  entry. Well behaved methods may  totally ignore
such  requests  so  long  as  they  leave  the  interruption  request  in  place  so  that  calling  code  can  do  something  with  it.
Poorly behaved methods swallow the interrupt request, thus denying code further up the call stack the opportunity to
act on it.
The static interrupted method should be used with caution, because it clears the current thread's interrupted status. If
you call interrupted and it returns true, unless you are planning to swallow the interruption, you should do something
with it: either throw InterruptedException or restore the interrupted status by calling interrupt again.
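
A small sketch of the restore-the-status option, assuming a task runner that is not the owner of its thread's interruption policy:

import java.util.concurrent.BlockingQueue;

public class TaskRunner implements Runnable {
    private final BlockingQueue<Runnable> queue;

    public TaskRunner(BlockingQueue<Runnable> queue) { this.queue = queue; }

    public void run() {
        try {
            Runnable task = queue.take();       // blocking call that may throw InterruptedException
            task.run();
        } catch (InterruptedException e) {
            // We do not own this thread's interruption policy, so we must not swallow
            // the interrupt: restore the status for code further up the call stack.
            Thread.currentThread().interrupt();
        }
    }
}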

>Interruption is usually the most sensible way to implement cancellation.

>Because  each  thread  has  its  own  interruption  policy,  you  should  not  interrupt  a  thread  unless  you  know  what
interruption means to that thread.
Critics have derided the Java interruption facility because it does not provide a preemptive interruption capability and
yet  forces  developers  to  handle  InterruptedException.  However,  the  ability  to  postpone  an  interruption  request
enables developers to craft flexible interruption policies that balance responsiveness and robustness as appropriate for
the application.

>Only code that implements a thread's interruption  policy may swallow an  interruption request. General purpose  task
and library code should never swallow interruption requests.

GUI APPLICATIONS
-----------------
>Single threaded GUI frameworks are not unique to Java; Qt, NextStep, MacOS Cocoa, X Windows, and many others are
also  single threaded.  This  is  not  for  lack  of  trying;  there  have  been  many  attempts  to  write  multithreaded  GUI
frameworks, but because of persistent problems with race conditions and deadlock, they  all eventually arrived at the
single threaded  event queue  model  in  which  a  dedicated  thread  fetches  events  off  a  queue  and  dispatches them  to
application defined event handlers. (AWT originally tried to support a greater degree of multithreaded access, and the
decision to make Swing single threaded was based largely on experience with AWT.)

>Multithreaded GUI  frameworks  tend  to  be  particularly  susceptible  to  deadlock,  partially  because  of  the  unfortunate
interaction  between  input  event  processing  and  any  sensible  object oriented  modeling  of  GUI  components.  Actions
initiated by the user tend to "bubble up" from the OS to the application - a mouse click is detected by the OS, is turned
into a "mouse click" event by the toolkit, and is eventually delivered to an application listener as a higher level event
such as a "button pressed" event. On the other hand, application initiated actions "bubble down" from the application
to the OS - changing the background color of a component originates in the application and is dispatched to a specific
component class and eventually into the OS for rendering. Combining this tendency for activities to access the same GUI
objects in the opposite order with the requirement of making each object thread safe yields a recipe for inconsistent
lock ordering, which leads to deadlock (see Chapter 10). And this is exactly what nearly every GUI toolkit development
effort rediscovered through experience.

>Another factor leading to deadlock in multithreaded GUI frameworks is the prevalence of the model-view-controller (MVC)
pattern. Factoring user interactions into cooperating model, view, and controller objects greatly simplifies implementing
GUI  applications,  but  again  raises  the  risk  of  inconsistent  lock  ordering.  The  controller  calls  into  the  model,  which
notifies the view that something has changed. But the controller can also call into the view, which may in turn call back
into  the  model  to  query  the  model  state.  The  result  is  again  inconsistent  lock  ordering,  with  the  attendant  risk  of
deadlock.
In his weblog,[1] Sun VP Graham Hamilton nicely sums up the challenges, describing why the multithreaded GUI toolkit is
one of the recurring "failed dreams" of computer science.
[1] http://weblogs.java.net/blog/kgh/archive/2004/10

>if you were to use multithreaded UI toolkits, things  will  mostly
work, but you will get occasional hangs (due to deadlocks) or glitches (due to races). This multithreaded approach works
best for people who have been intimately involved in the design of the toolkit.

>Single threaded  GUI  frameworks  achieve  thread  safety  via  thread  confinement;  all  GUI  objects,  including  visual
components and data models, are accessed exclusively from the event thread. Of course, this just pushes some of the
thread safety burden back onto the application developer, who must make sure these objects are properly confined.

>9.1.2. Thread Confinement in Swing
All Swing components (such as JButton and JTable) and data model objects (such as TableModel and TreeModel) are
confined to the event thread, so any code that accesses these objects must run in the event thread. GUI objects are kept
consistent not by synchronization, but by thread confinement. The upside is that tasks that run in the event thread need
not  worry  about  synchronization  when  accessing  presentation  objects;  the  downside  is  that  you  cannot  access
presentation objects from outside the event thread at all.

>The Swing  single thread  rule: Swing components and models should be created, modified, and queried only from the
event dispatching thread.

>As with all rules, there are a few exceptions. A small number of Swing methods may be called safely from any thread;
these are clearly identified in the Javadoc as being thread safe. Other exceptions to the single thread rule include:
  SwingUtilities.isEventDispatchThread, which determines whether the current thread is the event thread;
  SwingUtilities.invokeLater, which schedules a Runnable for execution on the event thread (callable from any thread; see the sketch after this list);
  SwingUtilities.invokeAndWait, which schedules a Runnable task for execution on the event thread and blocks the current thread until it completes (callable only from a non GUI thread);
  methods to enqueue a repaint or revalidation request on the event queue (callable from any thread); and
  methods for adding and removing listeners (can be called from any thread, but listeners will always be invoked in the event thread).
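
A minimal sketch that uses two of the exceptions above, isEventDispatchThread and invokeLater, to update a label safely from any thread; the label and message are placeholders:

import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class EventThreadUpdate {
    // Safe to call from any thread: the actual component update runs on the event thread.
    static void showStatus(final JLabel statusLabel, final String message) {
        if (SwingUtilities.isEventDispatchThread()) {
            statusLabel.setText(message);                 // already on the EDT; update directly
        } else {
            SwingUtilities.invokeLater(new Runnable() {   // queue the update onto the event thread
                public void run() {
                    statusLabel.setText(message);
                }
            });
        }
    }
}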

>In a GUI  application,  events  originate  in  the event thread and bubble up  to  application provided  listeners, which  will
probably perform some computation that affects the presentation objects. For simple, short running tasks, the entire
action can stay  in the event thread; for longer running  tasks, some of the processing should be offloaded to another
thread.

>In the simple case, confining presentation objects to the event thread is completely natural. Listing 9.3 creates a button
whose color changes randomly when pressed. When the user clicks on the button, the toolkit delivers an ActionEvent
in the event thread to all registered action listeners. In response, the action listener picks a new color and changes the
button's  background  color.  So  the  event  originates  in  the  GUI  toolkit  and  is  delivered  to  the  application,  and  the
application modifies the GUI in response to the user's action. Control never has to leave the event thread.

>When the view
receives an event indicating the model data may have changed, it queries the model for the new data and updates the
display. So in a button listener that modifies the contents of a table, the action listener would update the model and call
one  of  the  fireXxx methods,  which would  in  turn  invoke  the  view's  table model  listeners,  which  would update  the
view.  Again,  control  never  leaves  the  event  thread.  (The  Swing  data  model  fireXxx  methods  always  call  the model
listeners directly rather than submitting a new event to the event queue, so the fireXxx methods must be called only
from the event thread.)

>The  task  triggered when  the button  is  pressed  is composed  of  three  sequential  subtasks whose  execution  alternates
between the event thread and the background thread. The first subtask updates the user interface to show that a long
running  operation  has  begun  and  starts  the  second  subtask  in  a  background  thread.  Upon  completion,  the  second
subtask queues the third subtask to run again in the event thread, which updates the user interface to reflect that the
operation has completed. This sort of "thread hopping" is typical of handling long running tasks in GUI applications.

>9.3.1. Cancellation
Any task that takes long enough to run in another thread probably also takes long enough that the user might want to
cancel it. You could implement cancellation directly using thread interruption, but it is much easier to use Future, which
was designed to manage cancellable tasks.
When you call cancel on a Future with mayInterruptIfRunning set to true, the Future implementation interrupts
the thread that is executing the task if it is currently running. If your task is written to be responsive to interruption, it
can return early if it is cancelled. Listing 9.6 illustrates a task that polls the thread's interrupted status and returns early
on interruption.

>regardless of how thread pool is implemented it is the task that has to be coded to be responsive to interrupts if it has to be a cancellable task

>SwingUtilities is there to submit tasks to EventDispatchThread(EDT)
>SwingWorker is there to create cancellable, progress-reporting tasks that can run on third-party threads (other than the EDT)

>when GUI data model has to be updated in thread-safe manner use the same thread jumping technique with SwingWorker and SwingUtilities, to post model updates to EDT so that it can update the DataModel

>otherwise, EDT and background threads can use concurrent collections(such as ConcurrentHashMap, CopyOnWriteArrayList) to safely share data only when it suits the context

9.4.2. Split Data Models
----------------------------
>From the perspective of the GUI, the Swing table model classes like TableModel and TreeModel are the official
repository  for  data  to  be  displayed.  However,  these  model  objects  are  often  themselves  "views"  of  other  objects
managed by the application. A program that has both a presentation domain and an application domain data model is
said to have a split model design (Fowler, 2005).
In a split model design, the presentation model is confined to the event thread and the other model, the shared model,
is thread safe and may be accessed by both the event thread and application threads. The presentation model registers
listeners with the shared model so it can be notified of updates. The presentation model can then be updated from the
shared model  by  embedding  a  snapshot  of  the  relevant  state  in  the  update  message  or  by  having  the  presentation
model retrieve the data directly from the shared model when it receives an update event.
>this was the case with Brocade EFCM product which has server side model updating client side GUI via event notifications (the model in question was maintained at both the server and at the client GUI model)

>The snapshot approach is simple, but has limitations. It works well when the data model is small, updates are not too
frequent, and the structure of the two models is similar. If the data model is large or updates are very frequent, or if one
or both sides of the split contain information that is not visible to the other side, it can be more efficient to send
incremental updates instead of entire snapshots. This approach has the effect of serializing updates on the shared
model and recreating them in the event thread against the presentation model. Another advantage of incremental
updates is that finer grained information about what changed can improve the perceived quality of the display: if only
one vehicle moves, we don't have to repaint the entire display, just the affected regions.

>Consider a split model design when a data model must be shared by more than one thread and implementing a thread
safe data model would be inadvisable because of blocking, consistency, or complexity reasons.

>Borrowing  from  the approach taken by GUI frameworks, you can  easily  create a  dedicated thread  or  single threaded
executor for accessing the native library, and provide a proxy object that intercepts calls to the thread confined object
and submits them as tasks to the dedicated thread. Future and newSingleThreadExecutor work together to make this
easy; the proxy method can submit the task and immediately call Future.get to wait for the result. (If the class to be
thread confined implements an interface, you can automate the process of having each method submit a Callable to a
background thread executor and waiting for the result using dynamic proxies.)
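
A hedged sketch of that proxy idea using newSingleThreadExecutor and Future.get; NativeLibrary here is a hypothetical thread-confined class, not a real API:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NativeLibraryProxy {
    // Hypothetical thread-confined API; stands in for a wrapper around a native library.
    static class NativeLibrary {
        int query(String key) { return key.length(); }
    }

    private final NativeLibrary lib = new NativeLibrary();
    private final ExecutorService exec = Executors.newSingleThreadExecutor();  // the only thread allowed to touch 'lib'

    public int query(final String key) throws InterruptedException, ExecutionException {
        Future<Integer> result = exec.submit(new Callable<Integer>() {
            public Integer call() {
                return lib.query(key);          // runs only on the dedicated thread
            }
        });
        return result.get();                    // the caller blocks until the dedicated thread answers
    }
}
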
----x----

7.3.1. Uncaught Exception Handlers
----------------------------------
The previous section offered a proactive approach to the problem of unchecked exceptions. The Thread API also
provides the UncaughtExceptionHandler facility, which lets you detect when a thread dies due to an uncaught
exception. The two approaches are complementary: taken together, they provide defense in depth against thread
leakage.
When  a  thread  exits  due  to  an  uncaught  exception,  the  JVM  reports  this  event  to  an  application provided
UncaughtExceptionHandler  (see  Listing  7.24);  if no handler  exists, the  default behavior  is to  print the stack  trace to
System.err.[8]
[8] Before Java 5.0, the only way to control the UncaughtExceptionHandler was by subclassing ThreadGroup[ThreadGroup implements UncaughtExceptionHandler interface; we need access to uncaughtException(Thread, Throwable) method from there]. In Java 5.0 and later, you
can set an UncaughtExceptionHandler on a per thread basis with Thread.setUncaughtExceptionHandler, and can also set the
default  UncaughtExceptionHandler  with  Thread.setDefaultUncaughtExceptionHandler.  However,  only  one  of  these
handlers  is  called; first JVM  looks  for  a  per thread  handler,  then  for  a  ThreadGroup  handler.  The  default  handler  implementation  in
ThreadGroup delegates to its parent thread group, and so on up the chain until one of the ThreadGroup handlers deals with the uncaught
exception or it bubbles up to the top-level thread group. The top-level thread group handler delegates to the default system handler (if one exists;
the default is none) and otherwise prints the stack trace to the console.
Listing 7.24. UncaughtExceptionHandler Interface.
public interface UncaughtExceptionHandler {
    void uncaughtException(Thread t, Throwable e);
}
What  the handler should do  with an uncaught exception depends on your  quality of service  requirements. The most
common response is to write an error message and stack trace to the application log, as shown in Listing 7.25. Handlers
can also take more direct action, such as trying to restart the thread, shutting down the application, paging an operator,
or other corrective or diagnostic action.
Listing 7.25. UncaughtExceptionHandler that Logs the Exception.
public class UEHLogger implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        Logger logger = Logger.getAnonymousLogger();
        logger.log(Level.SEVERE,
                   "Thread terminated with exception: " + t.getName(),
                   e);
    }
}

In long running applications, always use uncaught exception handlers for all threads that at least log the exception.

To  set  an  UncaughtExceptionHandler  for  pool  threads,  provide  a  ThreadFactory  to  the  ThreadPoolExecutor
constructor. (As with all thread manipulation, only the thread's owner should change its UncaughtExceptionHandler.)
The standard thread pools allow an uncaught task exception to terminate the pool thread, but use a try-finally block
to be notified when this happens so the thread can be replaced. Without an uncaught exception handler or other failure
notification mechanism, tasks can appear to fail silently, which can be very confusing. If you want to be notified when a
task  fails  due  to  an  exception  so  that  you  can  take  some  task specific  recovery  action,  either  wrap  the  task  with  a
Runnable or Callable that catches the exception or override the afterExecute hook in ThreadPoolExecutor.
Somewhat  confusingly,  exceptions  thrown  from  tasks  make  it  to  the  uncaught  exception  handler  only  for  tasks
submitted with execute; for tasks submitted with submit, any thrown exception, checked or not, is considered to be
part  of  the  task's  return  status.  If  a  task  submitted  with  submit  terminates  with  an  exception,  it  is  rethrown  by
Future.get, wrapped in an ExecutionException.
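
A small illustrative sketch that wires the UEHLogger from Listing 7.25 into pool threads via a ThreadFactory; the class and thread names are made up:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class LoggingThreadFactory implements ThreadFactory {
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "worker");
        t.setUncaughtExceptionHandler(new UEHLogger());   // UEHLogger from Listing 7.25 above
        return t;
    }

    // Usage: exceptions from tasks submitted with execute() will reach UEHLogger.
    static ExecutorService newLoggingPool(int size) {
        return Executors.newFixedThreadPool(size, new LoggingThreadFactory());
    }
}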


7.4. JVM Shutdown
---------------------
The JVM can shut down in either an orderly or abrupt manner. An orderly shutdown is initiated when the last "normal"
(non daemon) thread terminates, someone calls System.exit, or by other platform specific means (such as sending a
SIGINT or hitting Ctrl-C). While this is the standard and preferred way for the JVM to shut down, it can also be shut
down abruptly by calling Runtime.halt or by killing the JVM process through the operating system (such as sending a
SIGKILL).
7.4.1. Shutdown Hooks
In an orderly shutdown, the JVM first starts all registered shutdown hooks. Shutdown hooks are unstarted threads that
are registered with Runtime.addShutdownHook. The JVM makes no guarantees on the order in which shutdown hooks
are started. If any application threads (daemon or nondaemon) are still running at shutdown time, they continue to run
concurrently  with  the  shutdown  process.  When  all  shutdown  hooks  have  completed,  the  JVM  may  choose  to  run
finalizers  if  runFinalizersOnExit  is  true,  and  then  halts.  The  JVM  makes  no  attempt  to  stop  or  interrupt  any
application threads that are still running at shutdown time; they are abruptly terminated when the JVM eventually halts.
If the shutdown hooks or finalizers don't complete, then the orderly shutdown process "hangs" and the JVM must be
shut down abruptly. In an abrupt shutdown, the JVM is not required to do anything other than halt the JVM; shutdown
hooks will not run.
Shutdown  hooks  should  be  thread safe:  they  must  use  synchronization  when  accessing  shared  data  and  should  be
careful to avoid deadlock,  just  like any other concurrent code. Further, they should not make assumptions about the
state of the application (such as whether other services have shut down already or all normal threads have completed)
or about why the JVM is shutting down, and must therefore be coded extremely defensively. Finally, they should exit as
quickly as possible, since their existence delays JVM termination at a time when the user may be expecting the JVM to
terminate quickly.
Shutdown  hooks  can  be  used  for  service  or  application  cleanup,  such  as  deleting  temporary  files  or  cleaning  up
resources  that  are  not automatically  cleaned up by  the OS.  Listing  7.26 shows how LogService  in  Listing  7.16  could
register a shutdown hook from its start method to ensure the log file is closed on exit.
Because shutdown hooks  all run concurrently,  closing the log  file could cause trouble for other shutdown hooks who
want to use the logger. To avoid this problem, shutdown hooks should not rely on services that can be shut down by the
application  or  other  shutdown  hooks. One  way  to  accomplish  this  is  to  use  a  single  shutdown  hook  for  all  services,
rather  than  one  for  each  service,  and  have  it  call  a  series  of  shutdown  actions.  This  ensures  that  shutdown  actions
execute sequentially in a single thread, thus avoiding the possibility of race conditions or deadlock between shutdown
actions. This technique can be used whether or not you use shutdown hooks; executing shutdown actions sequentially
rather than concurrently eliminates many potential sources of failure. In applications that maintain explicit dependency
information among services, this technique can also ensure that shutdown actions are performed in the right order.

Listing 7.26. Registering a Shutdown Hook to Stop the Logging Service.
public void start() {
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            try { LogService.this.stop(); }
            catch (InterruptedException ignored) {}
        }
    });
}

>All JVM created threads are daemon threads except for main thread

>Normal threads and daemon threads differ only in what happens when they exit. When a thread exits, the JVM
performs an inventory of running threads, and if the only threads that are left are daemon threads, it initiates an orderly
shutdown. When the JVM halts, any remaining daemon threads are abandoned - finally blocks are not executed,
stacks are not unwound - the JVM just exits.
>but System.exit() does not wait for all user threads to finish; it just waits for shutdown hooks to finish and terminates regardless of the presence of any user or daemon threads

Finalizers
-----------
Since finalizers can run in a thread managed by the JVM, any state accessed by a finalizer will be accessed by more than
one thread and therefore must be accessed with synchronization. Finalizers offer no guarantees on when or even if they
run,  and  they  impose  a  significant  performance  cost  on  objects  with  nontrivial  finalizers.  They  are  also  extremely
difficult to write correctly.[9] In most cases, the combination of finally blocks and explicit close methods does a better
job of resource management than finalizers; the sole exception is when you need to manage objects that hold resources
acquired by native methods. For these reasons and others, work hard to avoid writing or using classes with  finalizers
(other than the platform library classes) [EJ Item 6].

Chapter 8: Applying Thread Pools
================================
>Tasks that use ThreadLocal. ThreadLocal allows each thread to have its own private "version" of a variable. However,
executors are free to reuse threads as they see fit. The standard Executor implementations may reap idle threads when
demand  is  low  and  add  new  ones  when  demand  is  high,  and  also  replace  a  worker  thread  with  a  fresh  one  if  an
unchecked exception is thrown from a task. ThreadLocal makes sense to use in pool threads only if the thread local
value has a lifetime that is bounded by that of a task; ThreadLocal should not be used in pool threads to communicate
values between tasks.

8.1.1. Thread Starvation Deadlock
----------------------------------------
If tasks that depend on other tasks execute in a thread pool, they can deadlock. In a single threaded executor, a task
that submits another task to the same executor and waits for its result will always deadlock. The second task sits on the
work  queue  until  the  first  task  completes,  but  the  first  will  not  complete  because  it  is  waiting  for  the  result  of  the
second task. The same thing can happen in larger thread pools if all threads are executing tasks that are blocked waiting
for other tasks still on the work queue. This is called thread starvation deadlock, and can occur whenever a pool task
initiates an unbounded blocking wait for some resource or condition that can succeed only through the action of
another pool task, such as waiting for the return value or side effect of another task, unless you can guarantee that the
pool is large enough.
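
A deliberately broken sketch of thread starvation deadlock in a single-threaded executor - do not copy this pattern; it is here only to show the failure mode:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadDeadlockDemo {
    static final ExecutorService exec = Executors.newSingleThreadExecutor();

    public static void main(String[] args) {
        // The outer task occupies the only pool thread...
        exec.submit(new Runnable() {
            public void run() {
                Future<String> inner = exec.submit(new Callable<String>() {
                    public String call() { return "result"; }
                });
                try {
                    // ...and then blocks waiting for the inner task, which can never start: deadlock.
                    System.out.println(inner.get());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}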

Sizing Thread Pools
-------------------
To  size a thread pool properly, you need to understand your computing environment, your resource budget, and the
nature of your tasks. How many processors does the deployment system have? How much memory? Do tasks perform
mostly computation, I/O, or some combination? Do they require a scarce resource, such as a JDBC connection? If you
have  different  categories  of  tasks  with  very  different  behaviors,  consider  using multiple  thread  pools  so each can be
tuned according to its workload.
For compute intensive tasks, an Ncpu processor system usually achieves optimum utilization with a thread pool of Ncpu
+1  threads.  (Even  compute intensive  threads  occasionally  take  a  page  fault  or  pause  for  some  other  reason,  so  an
"extra" runnable thread prevents CPU cycles from going unused when this happens.) For tasks that also include I/O or
other blocking operations, you want a larger pool, since not all of the threads will be schedulable at all times. In order to
size the pool properly, you must estimate the ratio of waiting time to compute time for your tasks; this estimate need
not be precise and can be obtained through profiling or instrumentation. Alternatively, the size of the thread pool can
be tuned by running the application using several different pool sizes under a benchmark load and observing the level of
CPU utilization.
Given these definitions:
  N_cpu     = number of CPUs
  U_cpu     = target CPU utilization, 0 <= U_cpu <= 1
  W/C       = ratio of wait time to compute time
The optimal pool size for keeping the processors at the desired utilization is:
  N_threads = N_cpu * U_cpu * (1 + W/C)
You can determine the number of CPUs using Runtime:
int N_CPUS = Runtime.getRuntime().availableProcessors();
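
An illustrative calculation that plugs made-up numbers into the formula above; the utilization target and wait/compute ratio would normally come from profiling or instrumentation:

int nCpus = Runtime.getRuntime().availableProcessors();
double targetUtilization = 0.75;    // aim to keep the CPUs about 75% busy
double waitToComputeRatio = 2.0;    // tasks spend twice as long blocked as computing (assumed, from profiling)
int poolSize = (int) (nCpus * targetUtilization * (1 + waitToComputeRatio));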

>Of course, CPU cycles are not the only resource you might want to manage using thread pools. Other resources that can
contribute  to  sizing  constraints are memory,  file  handles, socket  handles, and database connections.  Calculating pool
size  constraints  for these types of resources  is  easier:  just add up how much of that resource each task requires and
divide that into the total quantity available. The result will be an upper bound on the pool size.
When tasks require a pooled resource such as database connections, thread pool size and resource pool size affect each
other.  If each  task  requires a connection, the  effective  size  of the  thread pool  is  limited by the connection pool  size.
Similarly, when the only consumers of connections are pool tasks, the effective size of the connection pool is limited by
the thread pool size.

>we can create different types of thread pools using Executors; but to vary the nature of a thread pool (size, queueing, creation of threads, etc.) you use ThreadPoolExecutor directly; in other words, ThreadPoolExecutor is how you configure a thread pool

core size - the pool's target size under normal load
max pool size - the pool can grow up to this size when it is busy, but it shrinks back toward the core size as threads become idle
keep alive time - the maximum time a thread may sit idle; after this, the thread is terminated if the pool is larger than the core size

>when ThreadPoolExecutor is created not all threads are created; as and when tasks are submitted, threads are created; unless you call prestartAllCoreThreads

> Developers are sometimes tempted to set the core size to zero so that the worker threads will eventually be torn down and therefore won't
prevent the JVM from exiting, but this can cause some strange seeming behavior in thread pools that don't use a SynchronousQueue for their
work queue (as newCachedThreadPool does). If the pool is already at the core size, ThreadPoolExecutor creates a new thread only if
the work queue is full. So tasks submitted to a thread pool with a work queue that has any capacity and a core size of zero will not execute until
the queue fills up, which is usually not what is desired. In Java 6, allowCoreThreadTimeOut allows you to request that all pool threads be
able to time out; enable this feature with a core size of zero if you want a bounded thread pool with a bounded work queue but still have all the
threads torn down when there is no work to do.
Listing 8.2. General Constructor for ThreadPoolExecutor.
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler) { ... }

>The newFixedThreadPool factory sets both the core pool size and the maximum pool size to the requested pool size,
creating  the  effect  of  infinite  timeout;  the  newCachedThreadPool  factory  sets  the  maximum  pool  size  to
Integer.MAX_VALUE and  the  core  pool  size  to  zero  with a timeout  of one minute,  creating  the  effect  of an  infinitely
expandable  thread  pool  that  will  contract  again when demand  decreases. Other combinations  are  possible  using  the
explicit ThreadPool-Executor constructor.

>For  very  large  or  unbounded  pools,  you  can  also  bypass  queuing  entirely  and  instead  hand  off  tasks  directly  from
producers  to  worker  threads  using  a  SynchronousQueue.  A  SynchronousQueue  is  not  really  a  queue  at  all,  but  a
mechanism  for  managing  handoffs  between  threads.  In  order  to  put  an  element  on  a  SynchronousQueue,  another
thread must already be waiting to accept the handoff. If no thread is waiting but the current pool size is less than the
maximum, ThreadPoolExecutor creates a new thread; otherwise the task is rejected according to the saturation
policy. Using a direct handoff is more efficient because the task can be handed right to the thread that will execute it,
rather than first placing it on a queue and then having the worker thread fetch it from the queue. SynchronousQueue is
a  practical choice  only  if  the  pool  is unbounded  or  if  rejecting  excess  tasks  is acceptable. The  newCachedThreadPool
factory uses a SynchronousQueue.

>The  newCachedThreadPool  factory  is  a  good  default  choice  for  an  Executor,  providing  better  queuing  performance
than a fixed thread pool.[5] A fixed size thread pool is a good choice when you need to limit the number of concurrent
tasks  for  resource management  purposes,  as  in  a  server  application  that  accepts  requests  from  network  clients  and
would otherwise be vulnerable to overload.

[5] This performance difference comes from the use of SynchronousQueue instead of LinkedBlockingQueue. SynchronousQueue
was replaced in Java 6 with a new non blocking algorithm that improved throughput in Executor benchmarks by a factor of three over the Java
5.0 SynchronousQueue implementation (Scherer et al., 2006).
Bounding either the thread pool or the work queue is suitable only when tasks are independent. With tasks that depend
on other tasks, bounded thread pools or queues can cause thread starvation deadlock; instead, use an unbounded pool
configuration like newCachedThreadPool.[6]
[6]  An  alternative  configuration  for  tasks  that  submit  other  tasks  and  wait  for  their  results  is  to  use  a  bounded  thread  pool,  a
SynchronousQueue as the work queue, and the caller runs saturation policy.

8.3.3. Saturation Policies
--------------------------
When  a  bounded  work  queue  fills  up,  the  saturation  policy  comes  into  play.  The  saturation  policy  for  a
ThreadPoolExecutor  can  be  modified  by  calling  setRejectedExecutionHandler.  (The  saturation  policy  is  also  used
when  a  task  is  submitted  to  an  Executor  that  has  been  shut  down.)  Several  implementations  of
RejectedExecutionHandler  are  provided,  each  implementing  a  different  saturation  policy:  AbortPolicy,
CallerRunsPolicy, DiscardPolicy, and DiscardOldestPolicy.
The default policy, abort, causes execute to throw the unchecked RejectedExecutionException; the caller can catch
this  exception  and  implement  its own  overflow  handling  as  it  sees  fit.  The  discard  policy  silently  discards  the  newly
submitted task if it cannot be queued for execution; the discard oldest policy discards the task that would otherwise be
executed  next  and  tries  to  resubmit  the  new  task.  (If  the  work  queue  is  a  priority  queue,  this  discards  the  highest
priority element, so the combination of a discard oldest saturation policy and a priority queue is not a good one.)

>The caller runs policy implements a form of throttling that neither discards tasks nor throws an exception, but instead
tries  to  slow  down  the  flow  of  new  tasks  by  pushing  some  of  the  work  back  to  the  caller.  It  executes  the  newly
submitted task not in a pool thread, but in the thread that calls execute. If we modified our WebServer example to use
a bounded queue and the caller runs policy, after all the pool threads were occupied and the work queue filled up the
next task would be executed in the main thread during the call to execute. Since this would probably take some time,
the main thread cannot submit any more tasks for at least a little while, giving the worker threads some time to catch
up on the backlog. The main thread would also not be calling accept during this time, so incoming requests will queue
up in the TCP layer instead of in the application. If the overload persisted, eventually the TCP layer would decide it has
queued  enough  connection  requests  and  begin  discarding  connection  requests  as  well.  As  the  server  becomes
overloaded, the overload is gradually pushed outward - from the pool threads to the work queue to the application to
the TCP layer, and eventually to the client - enabling more graceful degradation under load.

Listing 8.3. Creating a Fixed sized Thread Pool with a Bounded Queue and the Caller runs Saturation Policy.
ThreadPoolExecutor executor
    = new ThreadPoolExecutor(N_THREADS, N_THREADS,
                             0L, TimeUnit.MILLISECONDS,
                             new LinkedBlockingQueue<Runnable>(CAPACITY));
executor.setRejectedExecutionHandler(
    new ThreadPoolExecutor.CallerRunsPolicy());

8.3.4. Thread Factories
----------------------------
Whenever  a  thread  pool  needs  to  create  a  thread,  it  does  so  through  a  thread  factory  (see  Listing  8.5).  The  default
thread factory creates a new, nondaemon thread with no special configuration. Specifying a thread factory allows you to
customize the configuration of pool threads. ThreadFactory has a single method, newThread, that is called whenever a
thread pool needs to create a new thread.
There  are  a  number  of  reasons  to  use  a  custom  thread  factory.  You  might  want  to  specify  an
UncaughtExceptionHandler  for  pool  threads,  or  instantiate  an  instance  of  a  custom  Thread  class,  such  as  one  that
performs debug logging. You might want to modify the priority (generally not a very good idea; see Section 10.3.1) or
set the daemon status (again, not all that good an idea; see Section 7.4.2) of pool threads. Or maybe you just want to
give pool threads more meaningful names to simplify interpreting thread dumps and error logs.

Listing 8.5. ThreadFactory Interface.
public interface ThreadFactory {
    Thread newThread(Runnable r);
}

>The interesting customization takes place in MyAppThread, shown in Listing 8.7, which lets you provide a thread name,
sets a custom UncaughtExceptionHandler that writes a message to a Logger, maintains statistics on how many
threads have been created and destroyed, and optionally writes a debug message to the log when a thread is created or
terminates.
If your application takes advantage of security policies to grant permissions to particular codebases, you may want to
use  the  privilegedThreadFactory  factory  method  in  Executors  to  construct  your  thread  factory.  It  creates  pool
threads that have the same permissions, AccessControlContext, and contextClassLoader as the thread creating the
privilegedThreadFactory.  Otherwise,  threads  created  by  the  thread  pool  inherit  permissions  from  whatever  client
happens  to be  calling execute  or submit  at  the time a new  thread  is needed, which  could cause  confusing  security
related exceptions.

8.3.5. Customizing ThreadPoolExecutor After Construction
--------------------------------------------------------
Most of the options passed to the ThreadPoolExecutor constructors can also be modified after construction via setters
(such as the core thread pool size, maximum thread pool size, keep alive time, thread factory, and rejected execution
handler).  If  the  Executor  is  created  through  one  of  the  factory  methods  in  Executors  (except
newSingleThreadExecutor), you can cast the result to ThreadPoolExecutor to access the setters as in Listing 8.8.
Executors  includes  a  factory  method,  unconfigurableExecutorService,  which  takes  an  existing  ExecutorService
and wraps  it with one exposing only the methods of ExecutorService so  it cannot be further configured. Unlike the
pooled implementations, newSingleThreadExecutor returns an ExecutorService wrapped in this manner, rather than
a  raw  ThreadPoolExecutor.  While  a  single threaded  executor  is  actually  implemented  as  a  thread  pool  with  one
thread, it also promises not to execute tasks concurrently. If some misguided code were to increase the pool size on a
single threaded executor, it would undermine the intended execution semantics.
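
A sketch along the lines of the Listing 8.8 idea referenced above - casting a factory-produced executor back to ThreadPoolExecutor to tune it (uses java.util.concurrent.Executors, ExecutorService, and ThreadPoolExecutor):

ExecutorService exec = Executors.newCachedThreadPool();
if (exec instanceof ThreadPoolExecutor)
    ((ThreadPoolExecutor) exec).setCorePoolSize(10);         // tune the pool after construction
else
    throw new AssertionError("unexpected executor type");    // newSingleThreadExecutor, for example, cannot be cast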

8.4. Extending ThreadPoolExecutor
---------------------------------
ThreadPoolExecutor was designed for extension, providing several "hooks" for subclasses to override - beforeExecute,
afterExecute, and terminated - that can be used to extend the behavior of ThreadPoolExecutor.
The  beforeExecute  and  afterExecute  hooks  are  called  in  the  thread  that  executes  the  task,  and  can  be  used  for
adding logging, timing, monitoring, or statistics gathering. The afterExecute hook is called whether the task completes
by returning normally from run or by throwing an Exception. (If the task completes with an Error, afterExecute is not
called.) If beforeExecute throws a RuntimeException, the task is not executed and afterExecute is not called.
The terminated hook is called when the thread pool completes the shutdown process, after all tasks have finished and
all worker threads have shut down. It can be used to release resources allocated by the Executor during its lifecycle,
perform notification or logging, or finalize statistics gathering.
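
A hedged sketch of using the three hooks for per-task timing statistics; the class and field names are illustrative, not the book's listing:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TimingThreadPool extends ThreadPoolExecutor {
    private final ThreadLocal<Long> startTime = new ThreadLocal<Long>();
    private final AtomicLong totalTime = new AtomicLong();
    private final AtomicLong numTasks = new AtomicLong();

    public TimingThreadPool(int core, int max, long keepAlive, TimeUnit unit,
                            BlockingQueue<Runnable> queue) {
        super(core, max, keepAlive, unit, queue);
    }

    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        startTime.set(System.nanoTime());         // record when this task starts on this thread
    }

    protected void afterExecute(Runnable r, Throwable t) {
        try {
            long elapsed = System.nanoTime() - startTime.get();
            numTasks.incrementAndGet();
            totalTime.addAndGet(elapsed);
        } finally {
            super.afterExecute(r, t);
        }
    }

    protected void terminated() {
        try {
            System.out.printf("avg time per task: %dns%n",
                    numTasks.get() == 0 ? 0 : totalTime.get() / numTasks.get());
        } finally {
            super.terminated();
        }
    }
}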


>Sequential loop iterations are suitable for parallelization when each iteration is independent of the others and the work
done in each iteration of the loop body is significant enough to offset the cost of managing a new task.

Chapter 13: Explicit Locks
==========================

>Lock and ReentrantLock

Why create a new locking mechanism that is so similar to intrinsic locking? Intrinsic locking works fine in most situations
but has some functional limitations - it is not possible to interrupt a thread waiting to acquire a lock, or to attempt to
acquire a lock without being willing to wait for it forever. Intrinsic locks also must be released in the same block of code
in which they are acquired; this simplifies coding and interacts nicely with exception handling, but makes non-block-structured
locking disciplines impossible. None of these are reasons to abandon synchronized, but in some cases a
more flexible locking mechanism offers better liveness or performance.

>With intrinsic locks, a deadlock is fatal - the only way to recover is to restart the application,
and the only defense is to construct your program so that inconsistent lock ordering is impossible. Timed and polled locking offer another option: probabilistic deadlock avoidance.

>lockInterruptibly() is useful for implementing cancellable tasks
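
A small sketch of timed and interruptible lock acquisition with ReentrantLock; the method names and timeout are illustrative:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLocking {
    private final Lock lock = new ReentrantLock();

    // Polled/timed acquisition: give up (rather than risk deadlock) if the lock cannot be had in time.
    public boolean doWorkWithTimeout(long timeoutMillis) throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS))
            return false;                 // could not acquire within the budget; caller can retry or back off
        try {
            // ... access the state guarded by 'lock' ...
            return true;
        } finally {
            lock.unlock();                // explicit locks are never released automatically
        }
    }

    // Interruptible acquisition: a cancellable task can be interrupted while waiting for the lock.
    public void doCancellableWork() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            // ... work that should be abandoned if the task is cancelled ...
        } finally {
            lock.unlock();
        }
    }
}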

> When we started this book, ReentrantLock seemed the last word in lock scalability. Less than a year later, intrinsic locking gives it a good run for its money. Performance is not just a moving target, it can be a fast moving target.

>the comparative advantage of explicit locks over intrinsic went down in Java 6 from what it was in Java 5 because intrinsic locks improved in Java6

>With  a  fair  lock,  a  newly  requesting  thread  is  queued  if  the  lock  is  held  by  another  thread  or  if  threads  are  queued waiting for the lock; with a nonfair lock, the thread is queued only if the lock is currently held.[4]
[4] The polled tryLock always barges, even for fair locks.

 #otherwise there is no point in having tryLock() method if it can't barge in?!

>usually non-fair locks have higher throughput than fair lock because of reduced time spent in queueing threads
>concurrent collections have even higher throughput than non-fair locks

>auto lock release of intrinsic locking is a far greater advantage over the explicit locks; use explicit locks only when you can't achieve what you want with synchronized code

A warning cry
-------------
Under  Java  5.0,  intrinsic  locking  has  another  advantage  over  ReentrantLock:  thread  dumps  show  which  call  frames
acquired which locks and can detect and identify deadlocked threads. The JVM knows nothing about which threads hold
ReentrantLocks  and  therefore  cannot  help  in  debugging  threading  problems  using  ReentrantLock.  This  disparity  is
addressed in Java 6 by providing a management and monitoring interface with which locks can register, enabling locking
information for ReentrantLocks to appear in thread dumps and through other management and debugging interfaces.
The  availability  of  this  information  for  debugging  is  a  substantial,  if  mostly  temporary,  advantage  for  synchronized;
locking information in thread dumps has saved many programmers from utter consternation. The non block structured
nature  of  ReentrantLock  still means  that  lock  acquisitions  cannot  be  tied  to  specific  stack  frames,  as  they  can  with
intrinsic locks.
Future  performance  improvements  are  likely  to  favor  synchronized  over  ReentrantLock.  Because  synchronized  is
built into the JVM, it can perform optimizations such as lock elision for thread confined lock objects and lock coarsening
to eliminate synchronization with intrinsic locks (see Section 11.3.2); doing this with library based locks seems far less
likely.  Unless  you  are  deploying  on  Java  5.0  for  the  foreseeable  future  and  you  have  a  demonstrated  need  for
ReentrantLock's  scalability  benefits  on  that  platform,  it  is  not  a  good  idea  to  choose  ReentrantLock  over
synchronized for performance reasons.

ReadWrite Lock
--------------
>read write  locks  allow:  a  resource  can be  accessed  by  multiple  readers  or  a  single
writer at a time, but not both.

>The  interaction  between  the  read  and  write  locks  allows  for  a  number  of  possible  implementations.  Some  of  the
implementation options for a ReadWriteLock are:
Release preference. When a writer releases the write lock and both readers and writers are queued up, who should be
given preference - readers, writers, or whoever asked first?
Reader barging. If the lock is held by readers but there are waiting writers, should newly arriving readers be granted
immediate access, or should they wait behind the writers? Allowing readers to barge ahead of writers enhances
concurrency but runs the risk of starving writers.
Reentrancy. Are the read and write locks reentrant?
Downgrading. If a thread holds the write lock, can it acquire the read lock without releasing the write lock? This would
let a writer "downgrade" to a read lock without letting other writers modify the guarded resource in the meantime.
Upgrading. Can a read lock be upgraded to a write lock in preference to other waiting readers or writers? Most read
write lock implementations do not support upgrading, because without an explicit upgrade operation it is deadlock
prone. (If two readers simultaneously attempt to upgrade to a write lock, neither will release the read lock.)
ReentrantReadWriteLock provides reentrant locking semantics for both locks. Like ReentrantLock, a ReentrantReadWriteLock can be constructed as non-fair (the default) or fair. With a fair lock, preference is given to the thread that has been waiting the longest; if the lock is held by readers and a thread requests the write lock, no more readers are allowed to acquire the read lock until the writer has been serviced and releases the write lock. With a non-fair lock, the order in which threads are granted access is unspecified. Downgrading from writer to reader is permitted; upgrading from reader to writer is not (attempting to do so results in deadlock).
Like ReentrantLock, the write lock in ReentrantReadWriteLock has a unique owner and can be released only by the thread that acquired it. In Java 5.0, the read lock behaves more like a Semaphore than a lock, maintaining only the count of active readers, not their identities. This behavior was changed in Java 6 to keep track also of which threads have been granted the read lock. [6]
[6] One reason for this change is that under Java 5.0, the lock implementation cannot distinguish between a thread requesting the read lock for the first time and a reentrant lock request, which would make fair read-write locks deadlock prone.
Read-write locks can improve concurrency when locks are typically held for a moderately long time and most operations do not modify the guarded resources. ReadWriteMap in Listing 13.7 uses a ReentrantReadWriteLock to wrap a Map so that it can be shared safely by multiple readers and still prevent reader-writer or writer-writer conflicts.[7] In reality, ConcurrentHashMap's performance is so good that you would probably use it rather than this approach if all you needed was a concurrent hash-based map, but this technique would be useful if you want to provide more concurrent access to an alternate Map implementation such as LinkedHashMap.
[7] ReadWriteMap does not implement Map because implementing the view methods such as entrySet and values would be difficult and the "easy" methods are usually sufficient.

Chapter 14:Building Custom Synchronizers
========================================
>Every call to wait is implicitly associated with a specific condition predicate. When  calling wait regarding a particular condition predicate, the caller must already hold the lock associated with the condition queue, and that lock must also guard the state variables from which the condition predicate is composed.

>When control re-enters the code calling wait, it has reacquired the lock associated with the condition queue. Is the condition predicate now true? Maybe. It might have been true at the time the notifying thread called notifyAll, but could have become false again by the time you reacquire the lock. Other threads may have acquired the lock and changed the object's state between when your thread was awakened and when wait reacquired the lock. Or maybe it hasn't been true at all since you called wait. You don't know why another thread called notify or notifyAll; maybe it was because another condition predicate associated with the same condition queue became true. Multiple condition predicates per condition queue are quite common: BoundedBuffer uses the same condition queue for both the "not full" and "not empty" predicates.[9]

>For all these reasons, when you wake up from wait you must test the condition predicate again, and go back to waiting
(or  fail)  if  it  is not yet true. Since you can wake up repeatedly without your condition predicate being true, you must
therefore always call wait from within a loop, testing the condition predicate in each iteration. The canonical form for a
condition wait is shown in Listing 14.7.
Listing 14.7. Canonical Form for State dependent Methods.
void stateDependentMethod() throws InterruptedException {
    // condition predicate must be guarded by lock
    synchronized (lock) {
        while (!conditionPredicate())
            lock.wait();
        // object is now in desired state
    }
}

>When using condition waits (Object.wait or Condition.await):
  Always have a condition predicate - some test of object state that must hold before proceeding;
  Always test the condition predicate before calling wait, and again after returning from wait;
  Always call wait in a loop;
  Ensure  that  the  state  variables  making  up  the  condition  predicate  are  guarded  by  the  lock  associated  with  the condition queue;
  Hold the lock associated with the condition queue when calling wait, notify, or notifyAll; and
  Do not release the lock after checking the condition predicate but before acting on it.

>Missing signals: if you miss the signal that comes before waiting, you end up waiting forever. so check the condition before calling wait(while holding the lock) as shown in the previous code sample

>Whenever  you  wait  on  a  condition,  make  sure  that  someone  will  perform  a  notification  whenever  the  condition predicate becomes true.

>when two threads wait on different conditions on the same object, a single notify() might wake up the wrong thread and target thread may wait forever for the hijacked notify; so use notifyAll()

>Single notify can be used instead of notifyAll only when both of the following conditions hold:
  Uniform waiters. Only one condition predicate is associated with the condition queue, and each thread executes the same logic upon returning from wait; and
  One in, one out. A notification on the condition variable enables at most one thread to proceed.

#latch that lets multiple threads in is not a candidate for single notify
#bounded buffer where threads might wait for two different conditions(empty or full) again is not a candidate for single notify

>but a single notify() is more efficient than notifyAll() when it can be used

> If ten threads are waiting on a condition queue, calling notifyAll causes each of them to wake up and contend for the lock; then most or all of them will go right back to sleep. This means a lot of context switches and a lot of contended lock acquisitions for each event that enables (maybe) a single thread to make progress. (In the worst case, using notifyAll results in O(n^2) wakeups where n would suffice.) This is another situation where performance concerns support one approach and safety concerns support the other.

>in BoundedBuffer, a notification is performed every time an object is put into or removed from the buffer. This could be optimized by observing that a thread can be released from a wait only if the buffer goes from empty to not empty or from full to not full, and notifying only if a put or take effected one of these state transitions. This is called conditional notification. While conditional notification can improve performance, it is tricky to get right (and also complicates the implementation of subclasses) and so should be used carefully.
Single notification and conditional notification are optimizations. As always, follow the principle "First make it right, and then make it fast, if it is not already fast enough" when using these optimizations; it is easy to introduce strange liveness failures by applying them incorrectly.
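
A minimal sketch of conditional notification on a bounded buffer (this class is illustrative and assumed, not the book's BoundedBuffer): put notifies only on the empty-to-not-empty transition, take only on the full-to-not-full transition.

public class ConditionalBoundedBuffer<E> {
    private final Object[] items = new Object[100];
    private int head, tail, count;

    public synchronized void put(E v) throws InterruptedException {
        while (count == items.length)       // condition predicate: "not full"
            wait();
        boolean wasEmpty = (count == 0);
        items[tail] = v;
        tail = (tail + 1) % items.length;
        ++count;
        if (wasEmpty)                       // conditional notification: only the
            notifyAll();                    // empty -> not-empty transition can release a taker
    }

    @SuppressWarnings("unchecked")
    public synchronized E take() throws InterruptedException {
        while (count == 0)                  // condition predicate: "not empty"
            wait();
        boolean wasFull = (count == items.length);
        E v = (E) items[head];
        items[head] = null;
        head = (head + 1) % items.length;
        --count;
        if (wasFull)                        // only the full -> not-full transition matters to putters
            notifyAll();
        return v;
    }
}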

>AbstractQueuedSynchronizer, upon which most  of  the  state dependent  classes  in java.util.concurrent  are  built
(see  Section  14.4),  exploits  the  concept  of  exit  protocol.  Rather  than  letting  synchronizer  classes  perform  their own
notification,  it  instead  requires  synchronizer  methods  to  return  a  value  indicating  whether  its  action  might  have
unblocked one  or more  waiting  threads.  This  explicit  API  requirement makes  it  harder  to  "forget"  to  notify on  some
state transitions.

>Intrinsic  condition queues have  several drawbacks. Each  intrinsic  lock can have  only one  associated  condition queue,
which means that in classes like BoundedBuffer multiple threads might wait on the same condition queue for different
condition predicates, and the most common pattern for locking involves exposing the condition queue object. Both of
these factors make it impossible to enforce the uniform waiter requirement for using notifyAll. If you want to write a
concurrent object  with  multiple  condition  predicates,  or  you  want  to  exercise more  control  over  the  visibility  of  the
condition  queue,  the  explicit  Lock  and  Condition  classes  offer  a  more  flexible  alternative  to  intrinsic  locks  and
condition queues.

>A Condition is associated with a single Lock, just as a condition queue is associated with a single intrinsic lock; to create
a Condition, call Lock.newCondition on the associated lock. And just as Lock offers a richer feature set than intrinsic
locking, Condition offers a richer feature set than intrinsic condition queues: multiple wait sets per lock, interruptible
and uninterruptible condition waits, deadline based waiting, and a choice of fair or nonfair queueing.
Listing 14.10. Condition Interface.
public interface Condition {
    void await() throws InterruptedException;
    boolean await(long time, TimeUnit unit) throws InterruptedException;
    long awaitNanos(long nanosTimeout) throws InterruptedException;
    void awaitUninterruptibly();
    boolean awaitUntil(Date deadline) throws InterruptedException;
    void signal();
    void signalAll();
}

>Unlike intrinsic condition queues, you can have as many Condition objects per Lock as you want. Condition objects
inherit the fairness setting of their associated Lock; for fair locks, threads are released from Condition.await in FIFO
order.
Hazard warning: The equivalents of wait, notify, and notifyAll for Condition objects are await, signal, and signalAll. However, Condition extends Object, which means that it also has wait and notify methods. Be sure to use the proper versions - await and signal - instead!

> ReentrantLock  requires  that  the Lock  be  held  when  calling signal  or  signalAll,  but  Lock  implementations  are  permitted  to
construct Conditions that do not have this requirement.
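
A minimal sketch of a bounded buffer built on an explicit Lock with two Conditions, one per condition predicate, in the spirit of the ConditionBoundedBuffer the book describes (the exact code and names here are assumed):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionBoundedBufferSketch<E> {
    private final Lock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition();  // predicate: count < items.length
    private final Condition notEmpty = lock.newCondition();  // predicate: count > 0

    private final Object[] items = new Object[100];
    private int head, tail, count;

    public void put(E x) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length)
                notFull.await();             // waits only with other "not full" waiters
            items[tail] = x;
            tail = (tail + 1) % items.length;
            ++count;
            notEmpty.signal();               // uniform waiters + one-in-one-out: single signal is safe
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    public E take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0)
                notEmpty.await();
            E x = (E) items[head];
            items[head] = null;
            head = (head + 1) % items.length;
            --count;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}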

14.4. Anatomy of a Synchronizer
-------------------------------
>The  interfaces  of ReentrantLock and Semaphore have a  lot  in common. Both  classes  act  as a  "gate",  allowing only a limited  number  of  threads  to  pass  at  a  time;  threads  arrive  at  the  gate  and  are  allowed  through  (lock  or  acquire returns successfully), are made to wait (lock or acquire blocks), or are turned away (tryLock or tryAcquire returns false, indicating that the lock or permit did not become available in the time allowed). Further, both allow interruptible, uninterruptible, and timed acquisition attempts, and both allow a choice of fair or nonfair queueing of waiting threads.

>it is a common exercise to prove that a counting semaphore can be implemented using a lock (as in SemaphoreOnLock in Listing 14.12) and that a lock can be implemented using a counting semaphore.
In actuality, they are both implemented using a common base class, AbstractQueuedSynchronizer (AQS), as are many other synchronizers. AQS is a framework for building locks and synchronizers, and a surprisingly broad range of synchronizers can be built easily and efficiently using it. Not only are ReentrantLock and Semaphore built using AQS, but so are CountDownLatch, ReentrantReadWriteLock, SynchronousQueue,[12] and FutureTask.
[12] Java 6 replaces the AQS-based SynchronousQueue with a (more scalable) non-blocking version.

>Using AQS to build synchronizers offers several benefits. Not only does it substantially reduce the implementation effort, but you also needn't pay for multiple points of contention, as you would when constructing one synchronizer on top of another. In SemaphoreOnLock, acquiring a permit has two places where it might block: once at the lock guarding the semaphore state, and then again if a permit is not available. Synchronizers built with AQS have only one point where they might block, reducing context-switch overhead and improving throughput. AQS was designed for scalability, and all the synchronizers in java.util.concurrent that are built with AQS benefit from this.
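
For comparison, a hedged sketch of the kind of lock-based semaphore alluded to as SemaphoreOnLock (this is not the real java.util.concurrent.Semaphore, and the names are assumed); note the two places a thread can block, at lock.lock() and at permitsAvailable.await():

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SemaphoreOnLockSketch {
    private final Lock lock = new ReentrantLock();
    private final Condition permitsAvailable = lock.newCondition();
    private int permits;

    public SemaphoreOnLockSketch(int initialPermits) {
        this.permits = initialPermits;
    }

    public void acquire() throws InterruptedException {
        lock.lock();                       // first potential blocking point
        try {
            while (permits <= 0)
                permitsAvailable.await(); // second potential blocking point
            --permits;
        } finally {
            lock.unlock();
        }
    }

    public void release() {
        lock.lock();
        try {
            ++permits;
            permitsAvailable.signal();
        } finally {
            lock.unlock();
        }
    }
}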

AbstractQueuedSynchronizer
--------------------------
>For a class to be state dependent, it must have some state. AQS takes on the task of managing some of the state for the synchronizer  class:  it  manages  a  single  integer  of  state  information  that  can  be manipulated  through  the  protected getState, setState, and compareAndSetState methods.  This  can be used  to  represent  arbitrary  state;  for example, ReentrantLock uses it to represent the count of times the owning thread has acquired the lock, Semaphore uses it to represent the number of permits remaining, and FutureTask uses it to represent the state of the task (not yet started, running,  completed,  cancelled).  Synchronizers  can  also  manage  additional  state  variables  themselves;  for  example, ReentrantLock  keeps  track  of  the  current  lock  owner  so  it  can  distinguish  between  reentrant  and  contended  lock acquisition requests.

>Acquisition and release in AQS take the forms shown in Listing 14.13. Depending on the synchronizer, acquisition might be exclusive, as with Reentrant-Lock, or nonexclusive, as with Semaphore and CountDownLatch. An acquire operation has two parts. First, the synchronizer decides whether the current state permits acquisition; if so, the thread is allowed to  proceed,  and  if  not,  the  acquire  blocks  or  fails.  This  decision  is  determined  by  the  synchronizer  semantics;  for example, acquiring a lock can succeed if the lock is unheld, and acquiring a latch can succeed if the latch is in its terminal state.
The  second  part  involves  possibly  updating  the  synchronizer  state;  one  thread  acquiring  the  synchronizer  can  affect whether other threads can acquire it. For example, acquiring a lock changes the lock state from "unheld" to "held", and acquiring a permit from a Semaphore reduces the number of permits left. On the other hand, the acquisition of a latch by one thread does not affect whether other threads can acquire it, so acquiring a latch does not change its state.

Listing 14.13. Canonical Forms for Acquisition and Release in AQS.
boolean acquire() throws InterruptedException {
    while (state does not permit acquire) {
        if (blocking acquisition requested) {
            enqueue current thread if not already queued
            block current thread
        }
        else
            return failure
    }
    possibly update synchronization state
    dequeue thread if it was queued
    return success
}

void release() {
    update synchronization state
    if (new state may permit a blocked thread to acquire)
        unblock one or more queued threads
}

>A synchronizer supporting exclusive acquisition should implement the protected methods tryAcquire, tryRelease, and isHeldExclusively, and those supporting shared acquisition should implement tryAcquireShared and tryReleaseShared. The acquire, acquireShared, release, and releaseShared methods in AQS call the try forms of these methods in the synchronizer subclass to determine if the operation can proceed. The synchronizer subclass can use getState, setState, and compareAndSetState to examine and update the state according to its acquire and release semantics, and informs the base class through the return status whether the attempt to acquire or release the synchronizer was successful. For example, returning a negative value from tryAcquireShared indicates acquisition failure; returning zero indicates the synchronizer was acquired exclusively; and returning a positive value indicates the synchronizer was acquired nonexclusively. The tryRelease and tryReleaseShared methods should return true if the release may have unblocked threads attempting to acquire the synchronizer.
To simplify implementation of locks that support condition queues (like ReentrantLock), AQS also provides machinery
for constructing condition variables associated with synchronizers.

14.5.1. A Simple Latch
---------------------------
OneShotLatch in Listing 14.14 is a binary latch implemented using AQS. It has two public methods, await and signal,
that correspond to acquisition and release. Initially, the latch is closed; any thread calling await blocks until the latch is
opened. Once the latch is opened by a call to signal, waiting threads are released and threads that subsequently arrive
at the latch will be allowed to proceed.

Listing 14.14. Binary Latch Using AbstractQueuedSynchronizer.
// needs: import java.util.concurrent.locks.AbstractQueuedSynchronizer;
// @ThreadSafe is the JCiP annotation from net.jcip.annotations
@ThreadSafe
public class OneShotLatch {
    private final Sync sync = new Sync();

    public void signal() { sync.releaseShared(0); }

    public void await() throws InterruptedException {
        sync.acquireSharedInterruptibly(0);
    }

    private class Sync extends AbstractQueuedSynchronizer {
        protected int tryAcquireShared(int ignored) {
            // Succeed if latch is open (state == 1), else fail
            return (getState() == 1) ? 1 : -1;
        }

        protected boolean tryReleaseShared(int ignored) {
            setState(1);  // Latch is now open
            return true;  // Other threads may now be able to acquire
        }
    }
}
In OneShotLatch, the AQS state holds the latch state: closed (zero) or open (one). The await method calls acquireSharedInterruptibly in AQS, which in turn consults the tryAcquireShared method in OneShotLatch. The tryAcquireShared implementation must return a value indicating whether or not acquisition can proceed. If the latch has been previously opened, tryAcquireShared returns success, allowing the thread to pass; otherwise it returns a value indicating that the acquisition attempt failed. The acquireSharedInterruptibly method interprets failure to mean that the thread should be placed on the queue of waiting threads. Similarly, signal calls releaseShared, which causes tryReleaseShared to be consulted. The tryReleaseShared implementation unconditionally sets the latch state to open and indicates (through its return value) that the synchronizer is in a fully released state. This causes AQS to let all waiting threads attempt to reacquire the synchronizer, and acquisition will now succeed because tryAcquireShared returns success.
OneShotLatch is a fully functional, usable, performant synchronizer, implemented in only twenty or so lines of code. Of course, it is missing some useful features, such as timed acquisition or the ability to inspect the latch state, but these are easy to implement as well, since AQS provides timed versions of the acquisition methods and utility methods for common inspection operations.
OneShotLatch could have been implemented by extending AQS rather than delegating to it, but this is undesirable for several reasons [EJ Item 14]. Doing so would undermine the simple (two-method) interface of OneShotLatch, and while the public methods of AQS won't allow callers to corrupt the latch state, callers could easily use them incorrectly. None of the synchronizers in java.util.concurrent extends AQS directly; they all delegate to private inner subclasses of AQS instead.
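
A small usage sketch for OneShotLatch from Listing 14.14 (the worker threads and the sleep are purely illustrative):

public class OneShotLatchDemo {
    public static void main(String[] args) throws InterruptedException {
        final OneShotLatch latch = new OneShotLatch();

        // start a few workers that block on the latch until it is opened
        for (int i = 0; i < 3; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        latch.await();   // blocks while the latch is closed
                        System.out.println("released: " + Thread.currentThread().getName());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // restore interrupt status
                    }
                }
            }).start();
        }

        Thread.sleep(500); // give the workers time to queue up (demo only)
        latch.signal();    // open the latch once; current and future awaits now pass
    }
}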

14.6.1. ReentrantLock
----------------------
ReentrantLock supports only exclusive acquisition, so it implements tryAcquire, tryRelease, and isHeldExclusively; tryAcquire for the non-fair version is shown in Listing 14.15. ReentrantLock uses the synchronization state to hold the lock acquisition count, and maintains an owner variable holding the identity of the owning thread that is modified only when the current thread has just acquired the lock or is just about to release it.[14] In tryRelease, it checks the owner field to ensure that the current thread owns the lock before allowing an unlock to proceed; in tryAcquire, it uses this field to differentiate between a reentrant acquisition and a contended acquisition attempt.
[14] Because the protected state manipulation methods have the memory semantics of a volatile read or write and ReentrantLock is careful to
read the owner field only after calling getState and write it only before calling setState, ReentrantLock can piggyback on the memory
semantics of the synchronization state, and thus avoid further synchronization   see Section 16.1.4.
When a thread attempts to acquire a lock, tryAcquire first consults the lock state. If it is unheld, it tries to update the
lock state to indicate that it is held. Because the state could have changed since it was first inspected a few instructions
ago, tryAcquire uses compareAndSetState to attempt to atomically update the state to indicate that the lock is now
held and confirm that the state has not changed since last observed. (See the description of compareAndSet in Section
15.3.) If the lock state indicates that it is already held and the current thread is the owner of the lock, the acquisition count is incremented; if the current thread is not the owner of the lock, the acquisition attempt fails.
Listing 14.15. tryAcquire Implementation From Non-fair ReentrantLock.
protected boolean tryAcquire(int ignored) {
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, 1)) {
            owner = current;
            return true;
        }
    } else if (current == owner) {
        setState(c + 1);
        return true;
    }
    return false;
}
ReentrantLock  also  takes  advantage  of  AQS's  built in  support  for  multiple  condition  variables  and  wait  sets.
Lock.newCondition returns a new instance of ConditionObject, an inner class of AQS.

>semaphore-style (shared) synchronizers override only the ...Shared() versions of the AQS methods, i.e. tryAcquireShared()/tryReleaseShared()

14.6.2. Semaphore and CountDownLatch
---------------------------------------
Semaphore uses the AQS synchronization state to hold the count of permits currently available. The tryAcquireShared
method (see Listing 14.16) first computes the number of permits remaining, and if there are not enough, returns a value
indicating  that  the  acquire  failed.  If  sufficient  permits  appear  to  be  left,  it  attempts  to  atomically  reduce  the  permit
count  using  compareAndSetState.  If  that  succeeds  (meaning  that  the  permit  count  had  not  changed  since  it  last
looked), it returns a value indicating that the acquire succeeded. The return value also encodes whether other shared
acquisition attempts might succeed, in which case other waiting threads will also be unblocked.
The while loop terminates either when there are not enough permits or when tryAcquireShared can atomically update the permit count to reflect acquisition. While any given call to compareAndSetState may fail due to contention with another thread (see Section 15.3), causing it to retry, one of these two termination criteria will become true within a reasonable number of retries. Similarly, tryReleaseShared increases the permit count, potentially unblocking waiting threads, and retries until the update succeeds. The return value of tryReleaseShared indicates whether other threads might have been unblocked by the release.
CountDownLatch  uses AQS  in a  similar manner  to Semaphore:  the  synchronization  state  holds  the  current count. The
countDown  method  calls  release, which  causes  the  counter  to  be  decremented  and  unblocks  waiting  threads  if  the
counter  has  reached  zero;  await  calls  acquire,  which  returns  immediately  if  the  counter  has  reached  zero  and
otherwise blocks.
Listing 14.16. tryAcquireShared and tryReleaseShared from Semaphore.
protected int tryAcquireShared(int acquires) {
    while (true) {
        int available = getState();
        int remaining = available - acquires;
        if (remaining < 0
                || compareAndSetState(available, remaining))
            return remaining;
    }
}

protected boolean tryReleaseShared(int releases) {
    while (true) {
        int p = getState();
        if (compareAndSetState(p, p + releases))
            return true;
    }
}


14.6.3. FutureTask
-------------------
At first glance, FutureTask doesn't even look like a synchronizer. But Future.get has semantics that are very similar to those of a latch: if some event (the completion or cancellation of the task represented by the FutureTask) has occurred, then threads can proceed; otherwise they are queued until that event occurs.
FutureTask uses the AQS synchronization state to hold the task status (running, completed, or cancelled). It also maintains additional state variables to hold the result of the computation or the exception it threw. It further maintains a reference to the thread that is running the computation (if it is currently in the running state), so that it can be interrupted if the task is cancelled.

14.6.4. ReentrantReadWriteLock
-------------------------------
The interface for ReadWriteLock suggests there are two locks, a reader lock and a writer lock, but in the AQS-based implementation of ReentrantReadWriteLock, a single AQS subclass manages both read and write locking. ReentrantReadWriteLock uses 16 bits of the state for the write-lock count, and the other 16 bits for the read-lock count. Operations on the read lock use the shared acquire and release methods; operations on the write lock use the exclusive acquire and release methods.
Internally, AQS maintains a queue of waiting threads, keeping track of whether a thread has requested exclusive or shared access. In ReentrantReadWriteLock, when the lock becomes available, if the thread at the head of the queue was looking for write access it will get it, and if the thread at the head of the queue was looking for read access, all queued threads up to the first writer will get it.[15]

[15] This mechanism does not permit the choice of a reader-preference or writer-preference policy, as some read-write lock implementations do. For that, either the AQS wait queue would need to be something other than a FIFO queue, or two queues would be needed. However, such a strict ordering policy is rarely needed in practice; if the nonfair version of ReentrantReadWriteLock does not offer acceptable liveness, the fair version usually provides satisfactory ordering and guarantees nonstarvation of readers and writers.


====================
NON-BLOCKING THREADS
====================

Refer http://www.ibm.com/developerworks/java/library/j-jtp04186/

--


>no locking mechanism is used to execute a critical section; all threads are allowed to perform their task simultaneously; however, upon detection of outdated data in the middle of an atomic operation, the clients that requested the atomic operation should be prepared for an aborted attempt; that is, their entire operation will be cancelled and they have to try it again

>the data becoming outdated in the middle of an atomic sequence is detected by the 'Compare And Set' methodology: in the final step of committing the data, the data is checked once again for consistency; setting the data is always preceded by comparing the current data with the data read at the start of the atomic operation (other threads are allowed to butt in in between)

>usually CAS(Compare And Set) is provided by hardware instructions directly, and non-blocking algorithms build on these machine instructions

>however, from Java 5, atomic variables (data fields supporting atomic operations) are provided as Java atomic data types by the java.util.concurrent.atomic package; these atomic data types can now be used to build non-blocking algorithms in Java itself

>non-blocking algorithms are fundamentally something similar to source control conflict-merge operations; for example, in CVS or ClearCase, when two different copies are committed at the same time, one succeeds and one fails; the failed copy should update its source one more time and make its changes on top of it and try to commit again and so on; this principle is used in non-blocking thread implementations

>non-blocking threads should request/entrust the entire atomic operation to the data; sometimes it gets accepted, sometimes it gets rejected; upon rejection, the thread has to reupdate the data and based on the new data, it has to compile its atomic operation and try again

>in a locking mechanism, the overhead of context switching is involved whenever a thread has to wait for a lock; the thread is effectively suspended, doing no work, until it acquires the lock; in the non-blocking case, the thread at worst re-attempts to modify the data a couple of times instead of getting suspended; that is all; so throughput and performance are superior in non-blocking thread systems

>but! when contention for a resource is extremely intense, non-blocking threads keep attempting to modify the data and the rejection rate (attempts failed due to an update in the data) becomes very high, and it is very high for all threads; in this situation, blocking threads perform better because they simply wait for the lock and take turns nicely instead of creating chaos and increasing the rejection rate further

>non-blocking threads can be implemented only when atomic data instruction is supported by the hardware

>java atomic classes are built on top of machine-level atomic instructions which are very common in today's processors; but if the underlying platform does not support machine-level atomic instructions, then an internal locking mechanism is used and hence it may not be truly non-blocking

>java atomic classes are kind of extension on volatile variables

>we know that volatile variables are so volatile that care should be taken when reading and modifying them, because they can change at anytime

>atomic java classes provide methods which act like they are dealing with volatile variables

>some methods provide behavior 'read like you read a volatile variable' - get() method
>some provide the behavior 'write like you write a volatile variable' - set()
>'read and then write like you read and write (together, one after the other) a volatile variable' - compareAndSet() (a small counter sketch using these follows below)
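
A minimal sketch of these behaviors used together in the classic one-CAS retry loop, a non-blocking counter (the class name is illustrative; AtomicInteger.incrementAndGet() already performs this loop internally):

import java.util.concurrent.atomic.AtomicInteger;

public class CasCounterSketch {
    private final AtomicInteger value = new AtomicInteger(0);

    public int get() {
        return value.get();                            // read like a volatile read
    }

    public int increment() {
        int current;
        int next;
        do {
            current = value.get();                     // observe the current value
            next = current + 1;                        // compute the new value from it
        } while (!value.compareAndSet(current, next)); // commit only if nobody changed it; else retry
        return next;
    }
}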

>as long as your atomic operation consists of only one CAS (counters, stacks), your task ends in one do-while loop where you keep trying until the CAS succeeds; but what if your atomic operation is a little more complex (linked lists, queues with a tail pointer) and involves two or more CAS operations? Threads are safe only with respect to a single CAS, so things can go wrong between the two CAS operations of one thread. The solution is to make sure every other thread can clearly detect that some thread is in the midst of its multi-CAS update, by writing the algorithm so that the intermediate state is visible as an inconsistency in the underlying data (in the case of a tailed queue, any thread can see that the queue is currently being updated when the tail node has a non-null next reference, which means some thread is about to swing the tail to the actual new tail). Instead of waiting for that thread to finish, the thread that observes the inconsistency finishes the other thread's job and then continues with its own task. The original thread, which was in the midst of updating, still performs its second CAS but need not check whether it succeeded: if it failed, some other thread that came along in between or afterwards has already done the job, so either way the operation counts as a success; without this 'helping' convention the first thread could corrupt the queue.

>so the data can be in two states: the normal quiescent (consistent) state, and an intermediate state in the middle of a multi-CAS update; in the intermediate state, any thread should be able to see the inconsistency and should be able to clean up the data (finish the update) on its own

Usage of non-blocking algos:
----------------------------
>the GC uses them for concurrent and parallel collection
>the scheduler uses them to schedule processes and threads
>SynchronousQueue was replaced with a new non-blocking version (in Java 6)

New IDEA(well, old idea only; happens in atomic update while loops and CAS):
----------------------------------------------------------------------------
One way to achieve non-blocking concurrency is to appoint a gateway agent for the shared data and put all requests to read and modify it through that agent, using a queue to hold such requests; the agent performs the requests one by one from the queue (note that each request here is an atomic operation: a set of operations that must be performed together or cancelled altogether); any request that cannot be committed to the shared data (due to an inconsistency arising from one of the previous requests by someone else) is rejected and a notification is sent to the requester; the requester should always wait for the success or failure notification for his request in order to conclude his update/read operation; upon receiving a failure status, the requester has to place a fresh request.

Another thing that would make everything easier is to let the said agent broadcast updates to the shared data to all threads, so that any thread preparing to place a new request is well informed about the latest ongoing events.
--x--

Chapter 15: Atomic variables and non-blocking algorithms
========================================================
>Much  of  the  recent  research  on  concurrent  algorithms  has  focused  on  non blocking  algorithms,  which  use  low level atomic  machine  instructions  such  as  compare and swap  instead  of  locks  to  ensure  data  integrity  under  concurrent access. Non blocking algorithms are used extensively in operating systems and JVMs for thread and process scheduling, garbage collection, and to implement locks and other concurrent data structures.

>As of Java 5.0, it is possible to build efficient non blocking algorithms in Java using the atomic variable classes such as AtomicInteger and AtomicReference. Atomic variables can also be used as "better volatile variables" even if you are not developing non blocking algorithms. Atomic variables offer the same memory semantics as volatile variables, but with additional support for atomic updates   making  them  ideal  for  counters,  sequence  generators, and  statistics  gathering  while  offering  better  scalability  than lock based alternatives.

>Volatile  variables  are  a  lighter weight  synchronization  mechanism  than  locking  because  they  do  not  involve  context switches  or  thread  scheduling.  However,  volatile  variables  have  some  limitations  compared  to  locking:  while  they provide  similar  visibility  guarantees,  they  cannot  be  used  to  construct  atomic  compound  actions.  This  means  that volatile variables cannot be used when one variable depends on another, or when the new value of a variable depends on  its  old  value.  This  limits when  volatile  variables  are  appropriate,  since  they  cannot be used  to  reliably  implement common tools such as counters or mutexes.[2]

>Locking  has  a  few  other  disadvantages. When  a  thread  is  waiting  for  a  lock,  it  cannot  do  anything  else.  If  a  thread holding a  lock  is delayed (due to a page  fault, scheduling  delay, or the  like), then no thread that needs that  lock can make progress. This can be a serious problem if the blocked thread is a high priority thread but the thread holding the lock  is  a  lower priority  thread     a  performance  hazard  known  as  priority  inversion.  Even  though  the  higher priority thread should have precedence,  it must wait  until  the  lock  is  released, and  this  effectively downgrades  its  priority to that of the lower priority thread. If a thread holding a lock is permanently blocked (due to an infinite loop, deadlock, livelock, or other liveness failure), any threads waiting for that lock can never make progress.

>For lock based classes with fine grained operations (such as the synchronized collections classes, where most methods contain only a few operations), the ratio of scheduling overhead to useful work can be quite high when the lock is frequently contended.

15.2. Hardware Support for Concurrency
--------------------------------------
Exclusive locking is a pessimistic technique: it assumes the worst (if you don't lock your door, gremlins will come in and rearrange your stuff) and doesn't proceed until you can guarantee, by acquiring the appropriate locks, that other threads will not interfere. For fine-grained operations, there is an alternate approach that is often more efficient: the optimistic approach, whereby you proceed with an update, hopeful that you can complete it without interference. This approach relies on collision detection to determine if there has been interference from other parties during the update, in which case the operation fails and can be retried (or not). The optimistic approach is like the old saying, "It is easier to obtain forgiveness than permission", where "easier" here means "more efficient".
Processors designed for multiprocessor operation provide special instructions for managing concurrent access to shared variables. Early processors had atomic test-and-set, fetch-and-increment, or swap instructions sufficient for implementing mutexes that could in turn be used to implement more sophisticated concurrent objects. Today, nearly every modern processor has some form of atomic read-modify-write instruction, such as compare-and-swap or load-linked/store-conditional. Operating systems and JVMs use these instructions to implement locks and concurrent data structures, but until Java 5.0 they had not been available directly to Java classes.

15.2.1. Compare and Swap
------------------------
The approach taken by most processor architectures, including IA32 and Sparc, is to implement a compare-and-swap (CAS) instruction. (Other processors, such as PowerPC, implement the same functionality with a pair of instructions: load-linked and store-conditional.) CAS has three operands: a memory location V on which to operate, the expected old value A, and the new value B. CAS atomically updates V to the new value B, but only if the value in V matches the expected old value A; otherwise it does nothing. In either case, it returns the value currently in V. (The variant called compare-and-set instead returns whether the operation succeeded.) CAS means "I think V should have the value A; if it does, put B there, otherwise don't change it but tell me I was wrong." CAS is an optimistic technique: it proceeds with the update in the hope of success, and can detect failure if another thread has updated the variable since it was last examined. SimulatedCAS in Listing 15.1 illustrates the semantics (but not the implementation or performance) of CAS.
When multiple threads attempt to update the same variable simultaneously using CAS, one wins and updates the variable's value, and the rest lose. But the losers are not punished by suspension, as they could be if they failed to acquire a lock; instead, they are told that they didn't win the race this time but can try again. Because a thread that loses a CAS is not blocked, it can decide whether it wants to try again, take some other recovery action, or do nothing.[3] This flexibility eliminates many of the liveness hazards associated with locking (though in unusual cases it can introduce the risk of livelock; see Section 10.3.3).
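
The book's SimulatedCAS listing is not reproduced in these notes; the sketch below conveys the same semantics (synchronized is used here only to make the demonstration atomic; a real CAS is a single hardware instruction, not a lock):

public class SimulatedCASSketch {
    private int value;

    // compare-and-swap flavor: returns the value that was in 'value' before the call
    public synchronized int compareAndSwap(int expectedValue, int newValue) {
        int oldValue = value;
        if (oldValue == expectedValue)
            value = newValue;   // update only if the expectation still holds
        return oldValue;
    }

    // compare-and-set flavor: reports success instead of returning the old value
    public synchronized boolean compareAndSet(int expectedValue, int newValue) {
        return expectedValue == compareAndSwap(expectedValue, newValue);
    }

    public synchronized int get() {
        return value;
    }
}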

>writing a CAS function in java or any other high-level programming language is not the actual CAS with the actual performance benefits; you need to use CAS machine instructions (hardware support is necessary) in order to implement a real CAS

>At  first  glance,  the  CAS based  counter  looks  as  if  it  should  perform  worse  than  a  lock based  counter;  it  has  more operations  and  a  more  complicated  control  flow,  and  depends  on  the  seemingly  complicated  CAS  operation.  But  in reality, CAS based counters significantly outperform lock based counters if there is even a small amount of contention, and often even if there is no contention. The fast path for uncontended lock acquisition typically requires at least one CAS plus other lock related housekeeping, so more work is going on in the best case for a lock based counter than in the normal  case  for  the  CAS based  counter.  Since  the  CAS  succeeds  most  of  the  time  (assuming  low  to  moderate contention), the hardware will correctly predict the branch implicit in the while loop, minimizing the overhead of the more complicated control logic.
The language syntax for locking may be compact, but the work done by the JVM and OS to manage locks is not. Locking
entails traversing a relatively complicated code path in the JVM and may entail OS level locking, thread suspension, and context  switches.  In  the  best  case,  locking  requires  at  least  one  CAS,  so  using  locks  moves  the  CAS  out  of  sight  but doesn't save any actual execution cost. On the other hand, executing a CAS from within the program involves no JVM code, system calls, or scheduling activity. What looks like a longer code path at the application level is in fact a much shorter code path when JVM and OS activity are taken into account. The primary disadvantage of CAS is that it forces the  caller  to  deal  with  contention  (by  retrying,  backing  off,  or  giving  up),  whereas  locks  deal  with  contention automatically by blocking until the lock is available.[5]
[5] Actually, the biggest disadvantage of CAS is the difficulty of constructing the surrounding algorithms correctly.
CAS  performance  varies  widely  across  processors.  On  a  single CPU  system,  a  CAS  typically  takes  on  the  order  of  a handful  of  clock  cycles,  since  no  synchronization  across  processors  is  necessary.  As  of  this  writing,  the  cost  of  an uncontended CAS on  multiple CPU systems ranges from about ten to about 150  cycles; CAS performance  is a  rapidly moving  target  and  varies  not  only  across  architectures but  even  across  versions  of  the  same  processor.  Competitive forces will likely result in continued CAS performance improvement over the next several years. A good rule of thumb is that the cost of the "fast path" for uncontended lock acquisition and release on most processors is approximately twice the cost of a CAS.

15.2.3. CAS Support in the JVM
------------------------------
So, how does Java code convince the processor to execute a CAS on its behalf? Prior to Java 5.0, there was no way to do this short of writing native code. In Java 5.0, low-level support was added to expose CAS operations on int, long, and object references, and the JVM compiles these into the most efficient means provided by the underlying hardware. On platforms supporting CAS, the runtime inlines them into the appropriate machine instruction(s); in the worst case, if a CAS-like instruction is not available the JVM uses a spin lock. This low-level JVM support is used by the atomic variable classes (AtomicXXX in java.util.concurrent.atomic) to provide an efficient CAS operation on numeric and reference types; these atomic variable classes are used, directly or indirectly, to implement most of the classes in java.util.concurrent.

Atomic Classes
--------------
AtomicInteger bears a superficial resemblance to an extended Counter class, but offers far
greater scalability under contention because it can directly exploit underlying hardware support for concurrency.
There  are  twelve  atomic  variable  classes,  divided  into  four  groups:  scalars,  field  updaters,  arrays,  and  compound variables. The most commonly used atomic variables are the scalars: AtomicInteger, AtomicLong, AtomicBoolean, and AtomicReference.  All  support  CAS;  the  Integer  and  Long  versions  support  arithmetic  as  well.  (To  simulate  atomic variables  of  other  primitive  types,  you  can  cast  short  or  byte  values  to  and  from  int,  and  use  floatToIntBits  or doubleToLongBits for floating point numbers.)
The  atomic  array  classes  (available  in  Integer,  Long,  and  Reference  versions)  are  arrays  whose  elements  can  be updated atomically. The atomic array classes provide volatile access semantics to the elements of the array, a feature not  available  for  ordinary  arrays    a  volatile  array  has volatile  semantics  only  for  the  array  reference,  not  for  its elements. (The other types of atomic variables are discussed in Sections 15.4.3 and 15.4.4.)
While the atomic  scalar  classes extend Number, they do not extend the  primitive wrapper  classes such as Integer or Long.  In  fact,  they  cannot:  the  primitive  wrapper  classes  are  immutable  whereas  the  atomic  variable  classes  are mutable.  The  atomic  variable  classes  also  do  not  redefine  hashCode  or  equals;  each  instance  is  distinct.  Like  most mutable objects, they are not good candidates for keys in hash based collections.
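
A hedged sketch of the bit-conversion trick mentioned above for simulating an atomic variable of another primitive type, here a float stored as raw bits inside an AtomicInteger (the class is an assumption for illustration; note the comparison is bitwise, so 0.0f and -0.0f are distinct):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicFloatSketch {
    private final AtomicInteger bits;   // holds Float.floatToIntBits of the current value

    public AtomicFloatSketch(float initialValue) {
        bits = new AtomicInteger(Float.floatToIntBits(initialValue));
    }

    public float get() {
        return Float.intBitsToFloat(bits.get());
    }

    public void set(float newValue) {
        bits.set(Float.floatToIntBits(newValue));
    }

    public boolean compareAndSet(float expect, float update) {
        return bits.compareAndSet(Float.floatToIntBits(expect),
                                  Float.floatToIntBits(update));
    }
}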

15.3.2. Performance Comparison: Locks Versus Atomic Variables
-------------------------------------------------------------
To  demonstrate  the  differences  in  scalability  between  locks  and  atomic  variables,  we  constructed  a  benchmark comparing  several  implementations  of  a  pseudorandom  number  generator  (PRNG).  In  a  PRNG,  the  next  "random" number is a deterministic function of the previous number, so a PRNG must remember the previous number as part of its state.
Listings 15.4 and 15.5 show two implementations of a thread safe PRNG, one using ReentrantLock and the other using
AtomicInteger. The test driver invokes each repeatedly; each iteration generates a random number (which fetches and
modifies the shared seed state) and also performs a number of "busy work" iterations that operate strictly on thread
local data. This simulates typical operations that include some portion of operating on shared state and some portion of operating on thread local state.
Figures 15.1 and 15.2 show throughput with low and moderate levels of simulated work in each iteration. With a low
level  of  thread local computation,  the  lock  or  atomic  variable  experiences  heavy  contention;  with  more  thread local computation, the lock or atomic variable experiences less contention since it is accessed less often by each thread.
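
A minimal sketch in the spirit of the AtomicInteger-based PRNG used in the benchmark (the xorshift step and constants are illustrative, not the book's exact listing):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicPseudoRandomSketch {
    private final AtomicInteger seed;

    public AtomicPseudoRandomSketch(int initialSeed) {
        seed = new AtomicInteger(initialSeed);
    }

    // simple xorshift step; any deterministic next-seed function would do here
    private static int xorShift(int y) {
        y ^= (y << 6);
        y ^= (y >>> 21);
        y ^= (y << 7);
        return y;
    }

    public int nextInt(int n) {
        while (true) {
            int current = seed.get();
            int next = xorShift(current);
            if (seed.compareAndSet(current, next)) {   // only one thread wins each step
                int remainder = current % n;
                return remainder >= 0 ? remainder : remainder + n;
            }
            // lost the race: another thread advanced the seed; retry with the new value
        }
    }
}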

15.4. Non blocking Algorithms
-----------------------------
Lock-based algorithms are at risk for a number of liveness failures. If a thread holding a lock is delayed due to blocking I/O, a page fault, or other delay, it is possible that no thread will make progress. An algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; an algorithm is called lock-free if, at each step, some thread can make progress. Algorithms that use CAS exclusively for coordination between threads can, if constructed correctly, be both non-blocking and lock-free. An uncontended CAS always succeeds, and if multiple threads contend for a CAS, one always wins and therefore makes progress. Non-blocking algorithms are also immune to deadlock or priority inversion (though they can exhibit starvation or livelock because they can involve repeated retries).
We've seen one non-blocking algorithm so far: CasCounter. Good non-blocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tables; designing new ones is a task best left to experts.

15.4.2. A Non blocking Linked List
----------------------------------
The two non blocking algorithms we've seen so far, the counter and the stack, illustrate the basic pattern of using CAS to update a value speculatively, retrying if the update fails. The trick to building non blocking algorithms is to limit the scope of atomic changes to a single variable. With counters this is trivial, and with a stack it is straightforward enough, but for more complicated data structures such as queues, hash tables, or trees, it can get a lot trickier.
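
A hedged sketch of the two-CAS linked-queue insertion this section refers to (in the spirit of the book's LinkedQueue listing; the class here is a simplified assumption showing only put): the key move is that a thread which finds the tail trailing finishes the other thread's second CAS before retrying its own insertion.

import java.util.concurrent.atomic.AtomicReference;

public class LinkedQueueSketch<E> {
    private static class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next;

        Node(E item, Node<E> next) {
            this.item = item;
            this.next = new AtomicReference<Node<E>>(next);
        }
    }

    private final Node<E> dummy = new Node<E>(null, null);
    private final AtomicReference<Node<E>> head = new AtomicReference<Node<E>>(dummy);
    private final AtomicReference<Node<E>> tail = new AtomicReference<Node<E>>(dummy);

    public boolean put(E item) {
        Node<E> newNode = new Node<E>(item, null);
        while (true) {
            Node<E> curTail = tail.get();
            Node<E> tailNext = curTail.next.get();
            if (curTail == tail.get()) {                      // still looking at the real tail?
                if (tailNext != null) {
                    // intermediate state: help the other thread by advancing tail, then retry
                    tail.compareAndSet(curTail, tailNext);
                } else {
                    // quiescent state: first CAS links the new node
                    if (curTail.next.compareAndSet(null, newNode)) {
                        // second CAS swings the tail; if it fails, someone else already helped
                        tail.compareAndSet(curTail, newNode);
                        return true;
                    }
                }
            }
        }
    }
}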

15.4.3. Atomic Field Updaters
-----------------------------
Listing 15.7 illustrates the algorithm used by ConcurrentLinkedQueue, but the actual implementation is a bit different. Instead  of  representing  each  Node  with  an  atomic  reference,  ConcurrentLinkedQueue  uses  an  ordinary  volatile reference and updates it through the reflection based AtomicReferenceFieldUpdater, as shown in Listing 15.8.

Listing 15.8. Using Atomic Field Updaters in ConcurrentLinkedQueue.

// needs: import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
private static class Node<E> {
    private final E item;
    private volatile Node<E> next;

    public Node(E item) {
        this.item = item;
    }
}

private static AtomicReferenceFieldUpdater<Node, Node> nextUpdater =
        AtomicReferenceFieldUpdater.newUpdater(Node.class, Node.class, "next");

The atomic field updater classes (available in Integer, Long, and Reference versions) represent a reflection-based "view" of an existing volatile field so that CAS can be used on existing volatile fields. The updater classes have no constructors; to create one, you call the newUpdater factory method, specifying the class and field name. The field updater classes are not tied to a specific instance; one updater can be used to update the target field of any instance of the target class. The atomicity guarantees for the updater classes are weaker than for the regular atomic classes because you cannot guarantee that the underlying fields will not be modified directly; the compareAndSet and arithmetic methods guarantee atomicity only with respect to other threads using the atomic field updater methods (some thread might still update the field on a Node instance directly).
In ConcurrentLinkedQueue, updates to the next field of a Node are applied using the compareAndSet method of nextUpdater. This somewhat circuitous approach is used entirely for performance reasons. For frequently allocated, short-lived objects like queue link nodes, eliminating the creation of an AtomicReference for each Node is significant enough to reduce the cost of insertion operations. However, in nearly all situations, ordinary atomic variables perform just fine; in only a few cases will the atomic field updaters be needed. (The atomic field updaters are also useful when you want to perform atomic updates while preserving the serialized form of an existing class.)
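
A small self-contained usage sketch of a reflection-based field updater (the names and Node shape here are assumptions; note the target field must be volatile and accessible to the class creating the updater):

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

public class FieldUpdaterSketch {
    static class Node {
        volatile Node next;   // volatile, non-private so the enclosing class may create the updater
    }

    // one updater instance serves the 'next' field of every Node in the program
    private static final AtomicReferenceFieldUpdater<Node, Node> NEXT_UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(Node.class, Node.class, "next");

    public static void main(String[] args) {
        Node tail = new Node();
        Node newNode = new Node();

        // CAS the volatile field in place; no per-node AtomicReference object is allocated
        boolean linked = NEXT_UPDATER.compareAndSet(tail, null, newNode);
        System.out.println("linked = " + linked
                + ", tail.next == newNode: " + (tail.next == newNode));
    }
}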

15.4.4. The ABA Problem
-----------------------
The ABA problem is an anomaly that can arise from the naive use of compare and swap in algorithms where nodes can
be recycled (primarily in environments without garbage collection). A CAS effectively asks "Is the value of V still A?", and proceeds  with  the  update  if  so.  In  most  situations,  including  the  examples  presented  in  this  chapter,  this  is  entirely sufficient. However, sometimes we really want to ask "Has the value of V changed since I last observed it to be A?" For some  algorithms, changing V from A  to B and then back  to A  still counts  as a change  that  requires us  to  retry some algorithmic step.
This ABA problem can arise in algorithms that do their own memory management for link node objects. In this case, that the head of a list still refers to a previously observed node is not enough to imply that the contents of the list have not changed. If you cannot avoid the ABA problem by letting the garbage collector manage link nodes for you, there is still a relatively  simple  solution:  instead  of  updating  the  value  of  a  reference,  update  a  pair  of  values,  a  reference  and  a version  number.  Even  if  the  value  changes  from  A  to  B  and  back  to  A,  the  version  numbers  will  be  different.
AtomicStampedReference (and its cousin AtomicMarkableReference) provide atomic conditional update on a pair of variables. AtomicStampedReference updates an object-reference/integer pair, allowing "versioned" references that are immune[8] to the ABA problem. Similarly, AtomicMarkableReference updates an object-reference/boolean pair that is used by some algorithms to let a node remain in a list while being marked as deleted.[9]

[8] In practice, anyway; theoretically the counter could wrap.
[9] Many processors provide a double-wide CAS (CAS2 or CASX) operation that can operate on a pointer-integer pair, which would make this operation reasonably efficient. As of Java 6, AtomicStampedReference does not use double-wide CAS even on platforms that support it. (Double-wide CAS differs from DCAS, which operates on two unrelated memory locations; as of this writing, no current processor implements DCAS.)
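
A minimal sketch of the versioned-reference idea with AtomicStampedReference (the values and class name are illustrative):

import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedReferenceSketch {
    public static void main(String[] args) {
        // a reference and an integer "stamp" (version) that are always updated together
        AtomicStampedReference<String> top =
                new AtomicStampedReference<String>("A", 0);

        int[] stampHolder = new int[1];
        String observed = top.get(stampHolder);   // read reference and stamp atomically
        int observedStamp = stampHolder[0];

        // ...another thread could meanwhile change A -> B -> back to A, bumping the stamp,
        // so a plain reference comparison would be fooled but the stamp would not...

        // succeeds only if BOTH the reference and the stamp are still what we observed
        boolean swapped = top.compareAndSet(observed, "C", observedStamp, observedStamp + 1);
        System.out.println("swapped = " + swapped);
    }
}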

16.1.3. The Java Memory Model in 500 Words or Less
--------------------------------------------------
The Java Memory Model is specified in terms of actions, which include reads and writes to variables, locks and unlocks of monitors, and starting and joining with threads. The JMM defines a partial ordering [2] called happens before on all actions within the program. To guarantee that the thread executing action B can see the results of action A (whether or not A and B occur in different threads), there must be a happens before relationship between A and B. In the absence of a happens before ordering between two operations, the JVM is free to reorder them as it pleases.

The only order guarantees in JVM
--------------------------------
A data race occurs when a variable is read by more than one thread, and written by at least one thread, but the reads
and writes are not ordered by happens before. A correctly synchronized program is one with no data races; correctly
synchronized programs exhibit sequential consistency, meaning that all actions within the program appear to happen in
a fixed, global order.
The rules for happens before are:
  Program  order  rule.  Each  action  in  a  thread  happens before  every  action  in  that  thread  that  comes  later  in  the program order.
  Monitor lock rule. An unlock on a monitor lock happens before every subsequent lock on that same monitor lock.[3]
  Volatile variable rule. A write to a volatile field happens before every subsequent read of that same field.[4] (A small sketch of this rule follows after these lists.)
  Thread start rule. A call to Thread.start on a thread happens before every action in the started thread.
  Thread termination rule. Any action in a thread happens before any other thread detects that that thread has terminated, either by successfully returning from Thread.join or by Thread.isAlive returning false.
  Interruption rule. A thread calling interrupt on another thread happens before the interrupted thread detects the
interrupt (either by having InterruptedException thrown, or invoking isInterrupted or interrupted).
  Finalizer rule. The end of a constructor for an object happens before the start of the finalizer for that object.
  Transitivity. If A happens before B, and B happens before C, then A happens before C.

Other happens before orderings guaranteed by the class library include:
  Placing  an  item  in  a  thread safe  collection  happens before  another  thread  retrieves  that  item  from  the collection;
  Counting down on a CountDownLatch happens before a thread returns from await on that latch;
  Releasing a permit to a Semaphore happens before acquiring a permit from that same Semaphore;
  Actions  taken by  the  task  represented by a Future happens before  another  thread  successfully  returns from
Future.get;
  Submitting a Runnable or Callable to an Executor happens before the task begins execution; and
  A thread arriving at a CyclicBarrier or Exchanger happens before the other threads are released from that
same barrier or exchange point. If CyclicBarrier uses a barrier action, arriving at the barrier happens before
the barrier action, which in turn happens before threads are released from the barrier.
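
A small sketch of the volatile variable rule combined with program order and transitivity (the field names are illustrative): the write to data happens before the volatile write to ready, which happens before any volatile read that sees ready == true, which happens before the subsequent read of data, so the reader cannot see a stale value.

public class VolatilePublicationSketch {
    private int data;               // ordinary field, published via the volatile flag below
    private volatile boolean ready; // the volatile write/read supplies the happens-before edge

    public void writer() {          // runs in thread A
        data = 42;                  // (1) program order: happens before (2)
        ready = true;               // (2) volatile write
    }

    public void reader() {          // runs in thread B
        if (ready) {                // (3) volatile read that observes (2)
            // (1) -> (2) -> (3) -> this read, by transitivity, so 42 is guaranteed visible
            System.out.println(data);
        }
    }
}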


