GridSAT Application Description

Wahid Chrabakh, Dan Nurmi
{chrabakh,nurmi}@cs.ucsb.edu

Background

GridSAT is a parallel and complete satisfiability solver used to solve non-trivial SAT problems in a grid environment (a large number of widely distributed, heterogeneous resources). The application uses a parallel solver algorithm based on Chaff to attempt to solve SAT problems of the form: given a large, non-trivial Boolean expression, is there a variable assignment (and what are the variable values) under which the expression evaluates to TRUE? The system stands apart from other SAT solvers in that it was designed explicitly to run in grid environments and has built-in intelligent parallelism-scheduling mechanisms. As a result of this design, the system has been used to successfully and quickly solve several previously open problems by utilizing vast amounts of computational resources.

Computational Aspects

GridSAT is a distributed parallel satisfiability solver based on Chaff, a highly optimized complete sequential solver. GridSAT modifies the original sequential algorithm in order to make it possible to divide the search space among many clients. The sequential algorithm is based on the basic Davis-Putnam-Logemann-Loveland (DPLL) algorithm but includes new optimizations.
The basic algorithm uses a “decision” stack to keep track of its progress. The first level of the decision stack is populated only by variable assignments that are definite. The algorithm starts with an empty stack and the initial set of clauses that make up the SAT problem. It heuristically selects a variable and a Boolean value to assign to it; this is called a decision, and it adds a new level to the decision stack. The first variable in this level is the decision variable. Additional variable assignments are added to the current level as new values are deduced from previous decisions. The process continues until the algorithm runs into a conflict, where a variable is deduced to be both true and false. At this point the algorithm backtracks by shrinking the decision stack and continuing in a different part of the search space. The algorithm terminates when a solution is found in which all clauses evaluate to true, or when it deduces that no such solution exists. In the former case the problem is deemed “satisfiable”; in the latter case it is said to be “unsatisfiable”.

A major improvement to the basic algorithm is “learning”. This process enables the algorithm to deduce new clauses and store them in a local database. New clauses are generated after a conflict and are used in backtracking. The database of new clauses can grow indefinitely, but memory usage can be controlled by heuristically deleting some of the learned clauses, since they carry only redundant information used to speed up the search. GridSAT clients are also aware of the runtime conditions on the machine and will use only a configurable fraction of the available memory. This is accomplished by not allowing the clause database to outgrow the available memory.
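The DPLL loop with learning described above can be sketched in Python. This is a minimal illustrative toy, not GridSAT's actual Chaff-derived engine: it is recursive rather than stack-based, uses a trivial decision heuristic instead of Chaff's, and "learns" only the crude clause that blocks a failed partial assignment rather than analyzing the implication graph. Clauses are lists of non-zero integers (DIMACS style: `3` means variable 3 is true, `-3` means it is false).

```python
def unit_propagate(clauses, assignment):
    """Repeatedly assign literals forced by unit clauses.

    Mutates `assignment` (var -> bool) in place; returns it on success,
    or None when some clause is falsified (a conflict)."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None            # conflict: every literal is false
            if len(unassigned) == 1:   # unit clause forces a deduced assignment
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


def dpll(clauses, assignment=None, learned=None):
    """Return a satisfying assignment (dict) or None if unsatisfiable."""
    assignment = dict(assignment or {})
    learned = learned if learned is not None else []
    if unit_propagate(clauses + learned, assignment) is None:
        return None                    # conflict: backtrack
    free = {abs(l) for c in clauses for l in c} - set(assignment)
    if not free:
        return assignment              # every clause satisfied
    var = min(free)                    # trivial decision heuristic
    for value in (True, False):        # each branch is a new decision level
        result = dpll(clauses, {**assignment, var: value}, learned)
        if result is not None:
            return result
    # Both branches failed: no model extends `assignment`, so its negation
    # is an implied clause -- record it in the learned-clause database.
    learned.append([(-v if val else v) for v, val in assignment.items()])
    return None
```

A real solver would bound the learned database (as the text describes) and delete low-value clauses heuristically; here it simply grows for the duration of one `dpll` call.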
Since determining the difficulty of a given satisfiability instance before it is executed is equivalent to the halting problem, we require a heuristic for deciding when resources should be added to the problem for the purpose of gaining additional parallelism. If a problem instance runs for a specified “timeout” value (without exhausting the memory of the machine on which it is running with new clauses), we declare it suitable for additional concurrency. The goal of GridSAT, however, is to use additional resources only when needed, since the problem is tightly coupled. Thus, this timeout threshold is a scheduling parameter that can dramatically affect performance. A problem split (due to memory exhaustion or execution timeout) happens between two clients: the old client, which has a sub-problem it is currently solving, and a new client, which wishes to participate in solving the same sub-problem. At the end of the splitting process both clients have a new sub-problem to solve; together, however, the two new sub-problems cover the entire search space of the initial sub-problem. GridSAT allows new clients to join and leave the pool of clients at any time. Resources that have finished investigating their assigned sub-problem are vacated; they may, however, be assigned a new sub-problem in the future.

GridSAT also allows clients to exchange learned clauses. This is beneficial because clauses learned by one client can be integrated and used to prune parts of the search space in other clients. GridSAT exchanges only small clauses, for two reasons. First, smaller clauses have the greatest impact on pruning the search space: a learned clause with few literals can potentially prune large portions of the decision tree. Second, smaller clauses require less communication overhead. GridSAT clients incorporate new clauses immediately after they are received.
Therefore, even if many clauses are received, they are not stored separately, which would waste valuable memory resources. The size of shared clauses is determined statically at the start and is usually set between 3 and 10 literals. Since GridSAT clients are threaded, clause exchange is overlapped with computation.

Usage Model and Runtime Behavior

The GridSAT system uses a distinctive, application-driven mechanism for managing its pool of resources and its degree of parallelism during the lifetime of a single execution. At the beginning of each execution, GridSAT obtains resource information by using the Information Server module from the GrADS project. This module queries the MDS for a list of available resources and their associated capabilities, such as memory, number of CPUs, and CPU types and speeds. It also consults the NWS server to obtain an array of network measurements for the connectivity between hosts. It is also possible to provide resource specifications and connectivity using an input file. The user of GridSAT should have previously staged the application binary on the resources he wishes to use (GridSAT currently does not use the launcher or contract monitor because it is not an MPI program).

Once the resources are defined (and their qualities have been characterized using the NWS), the user starts the master program. The master launches a single solver task on one of the resources. This single task starts solving the entire problem, which amounts to a logical parameter sweep, until a threshold is reached in terms of the elapsed time or memory used since the computation began, or until memory is exhausted. The time threshold is set to two times the amount of time it would take to ‘split’ the search space and send the data to this client’s ‘best neighbor’, as calculated from NWS network latency and bandwidth measurements.
The ‘best neighbor’ is the resource from the node pool that, according to a configurable weighted sum of CPU, memory, and bandwidth, offers the fastest way to split the search space and continue the computation. Since CPU speed and memory size are correlated, previous GridSAT runs give CPU a dominating fraction of the weighted sum. The master program coordinates the two clients involved in splitting a sub-problem; its role is to introduce the old client and the new client to each other. The clients then communicate directly as peers during the splitting process and keep the master up to date about their progress. After splitting, both clients start solving their respective sub-problems. The newly added client uses the same ‘two times the split time’ method to determine when it should split, at which point it consults the master for its ‘best neighbor’, splits, and then continues.

Satisfiability problems can be arbitrarily hard and may take long periods of time even on large resource pools. The durations of the problems we investigated range from seconds to weeks. During this time, planned and unplanned resource and network failures are frequent. GridSAT clients take checkpoints to recover from such failures: clients are automatically restarted when a resource recovers and are assigned a sub-problem from the available list of checkpoints. GridSAT clients can be configured to take two types of checkpoints: light or heavy. Light checkpoints involve saving only the first level of the decision stack; their size is on the order of the number of variables, which is small and may reach tens of KBs. Checkpoints of this kind are managed by the master without overwhelming it. Heavy checkpoints, however, are much larger, on the order of hundreds of MBs. Such files cannot be managed by a single host and are best stored in a distributed storage environment such as IBP.
This solution also avoids the communication bottleneck that would exist if the checkpoints were stored at a central location.

The resources in the resource pool include individual machines, clusters, and batch systems. A batch system is represented in the resource list by a single machine, its head node. When GridSAT starts, it launches an application on the head node which submits a request to the batch scheduler. When the request reaches the head of the queue, the GridSAT clients are launched, connect to the master, and join the pool of clients. The set of clients obtained from a batch job is treated in the same way as other clients. Since batch jobs represent a computationally powerful set of well-connected nodes, GridSAT directs clients to migrate to run within this host set. This provides the distributed solver with a powerful resource environment and smaller communication overhead while splitting or exchanging clauses. If the batch job is terminated by the scheduler before a solution to the SAT problem is found, the master will notice that a number of clients have failed; the associated sub-problems may later be assigned to other clients using their respective checkpoints. Migration is currently implemented using static information indicating which sets of machine names are marked as destinations for migration.

GridSAT executions can take days or weeks, depending on the problem search space, the resource pool, and resource attributes. By using dynamic internal scheduling techniques, individual client checkpoints, and the ability to utilize new resources during the system’s lifetime, the GridSAT application accomplishes the goal of keeping volatile grid resources busy until the application completes.
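The scheduling policy described above — picking a ‘best neighbor’ via a CPU-dominated weighted sum and setting the split timeout to twice the estimated time to ship the split state to that neighbor — can be sketched as follows. This is an illustrative reconstruction, not GridSAT code: the `Resource` fields, the specific weights, and the state size are hypothetical stand-ins for the values the real system derives from MDS and NWS measurements.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cpu_mhz: float         # CPU speed (as reported by MDS, hypothetical units)
    memory_mb: float       # available memory (from MDS)
    bandwidth_mbps: float  # bandwidth to this host (from NWS)
    latency_s: float       # latency to this host (from NWS)

def score(r, w_cpu=0.7, w_mem=0.15, w_bw=0.15):
    # Configurable weighted sum with CPU dominating, reflecting the
    # observation that CPU speed and memory size are correlated.
    return w_cpu * r.cpu_mhz + w_mem * r.memory_mb + w_bw * r.bandwidth_mbps

def best_neighbor(pool):
    # The 'best neighbor' is the pool member with the highest weighted score.
    return max(pool, key=score)

def split_timeout(neighbor, state_bytes):
    # Time threshold = 2 x (estimated time to split and send the search-space
    # state to the best neighbor), from NWS latency/bandwidth measurements.
    transfer_s = neighbor.latency_s + state_bytes * 8 / (neighbor.bandwidth_mbps * 1e6)
    return 2.0 * transfer_s
```

A client would run the solver until `split_timeout(...)` seconds elapse (or memory is exhausted), then ask the master to broker a split with the selected neighbor.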