Recall the Vertex Cover problem from the video lectures: given an undirected graph $G = (V, E)$ (with no parallel edges), compute a minimum-size subset of vertices that includes at least one endpoint of every edge. Consider the following greedy algorithm, given a graph $G$: (1) initialize $S = \emptyset$; (2) while $S$ is not a vertex cover of $G$: (2a) let $F$ denote the edges with neither endpoint in $S$; (2b) let $e = (u, v)$ be some edge of $F$; (2c) add both endpoints of $e$ to $S$; (3) return $S$.

Consider the following statement: for every graph with $n$ vertices, this greedy algorithm returns a vertex cover with size at most $f(n)$ times that of an optimal (minimum-size) vertex cover. Which of the following is the smallest choice of the function $f(n)$ for which this statement is true?

[Hint: suppose the greedy algorithm picks an edge with endpoints $u$ and $v$. What can you say about every feasible solution to the problem?]

ANSWER: What the algorithm finds in the end is a matching (a set of edges no two of which share an endpoint) that is “maximal” (we can’t add any more edges to it and keep it a matching). Note that a “maximum” matching (we cannot do better) is also a “maximal” matching, but the converse is not necessarily true. See the following example:

[Figure: Matching in a Graph]

Option 1 is correct: $f(n) = 2$. Let $M$ be the maximal matching found by the algorithm, and let $S$ be the set of all endpoints of edges in $M$.

Claim: The output of the algorithm is feasible.

Proof: We prove this by contradiction: suppose there exists an edge $e = (u, v)$ such that neither $u$ nor $v$ is in $S$. Since $e$ does not share an endpoint with any of the edges in $M$, $M \cup \{e\}$ is a larger matching, which contradicts $M$ being a maximal matching.

Lemma: The given algorithm is a 2-approximation for minimum vertex cover, regardless of the choice of edge in step (2b).

Proof: Let $OPT$ be a minimum vertex cover of the graph $G$. The edges of $M$ are independent (no two share an endpoint); thus any feasible cover must take at least one vertex from every edge in $M$. This means that $|OPT| \ge |M|$, and then we have:

$$|S| = 2|M| \le 2 \cdot |OPT|$$

This technique of lower bounding the optimum is often key in proving approximation factors, as we are usually unable to compute the value of OPT.

This algorithm was discovered independently by Fanica Gavril and Mihalis Yannakakis.
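For concreteness, here is a minimal Python sketch of the algorithm (the function and variable names are my own, not from the lectures). Scanning the edges in any order and grabbing every edge whose endpoints are both still uncovered builds exactly the maximal matching described above:

```python
def greedy_vertex_cover(edges):
    """2-approximate vertex cover: greedily build a maximal matching
    and return the set of all matched endpoints."""
    cover = set()
    for u, v in edges:
        # Edge (u, v) is still uncovered: neither endpoint is in the cover.
        if u not in cover and v not in cover:
            cover.update((u, v))  # step (2c): add both endpoints
    return cover

# Example: the path 1-2-3-4. Greedy picks edges (1, 2) and (3, 4) and
# returns all 4 vertices, while the optimal cover {2, 3} has size 2.
print(greedy_vertex_cover([(1, 2), (2, 3), (3, 4)]))  # {1, 2, 3, 4}
```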

In the set cover problem, you are given $m$ sets $S_1, \ldots, S_m$, each a subset of a common set $U$ of size $n$. The goal is to select as few of these sets as possible, subject to the constraint that the union of the chosen sets is all of $U$. (You can assume that $S_1 \cup \cdots \cup S_m = U$.) For example, if the given sets are $\{1,2\}, \{2,3\}$, and $\{3,4\}$, then the optimal solution consists of the sets $\{1,2\}$ and $\{3,4\}$.

Here is a natural iterative greedy algorithm. First, initialize $C = \emptyset$, where $C$ denotes the collection of sets chosen so far. The main loop is: as long as the union $\bigcup_{S \in C} S$ of the sets chosen so far is not the entire set $U$:

  • Let $S_i$ be a set that includes the maximum-possible number of elements not in previously-chosen sets (i.e., that maximizes $|S_i \setminus \bigcup_{S \in C} S|$).
  • Add $S_i$ to $C$.

Consider the following statement: for every instance of the set cover problem (with $|U| = n$), this greedy algorithm returns a set cover with size at most $f(n)$ times that of an optimal (minimum-size) set cover. Which of the following is the smallest choice of the function $f(n)$ for which this statement is true?

[Hint: what’s the minimum-possible progress that the greedy algorithm can make in each iteration, as a function of the size of an optimal set cover and of the number of elements that have not yet been covered?]

  • 2
  • $O(\log n)$
  • $O(\sqrt{n})$
  • $O(n)$

ANSWER: Before giving the proof, we need one useful technical inequality.

Lemma: For all $x > 0$,

$$\left(1 - \frac{1}{x}\right)^x \le \frac{1}{e}$$

where $e$ is the base of the natural logarithm.

Proof: We use the fact that for any real $z$ (positive, zero, or negative), $1 + z \le e^z$. (For $z \ge 0$ this follows from the Taylor expansion $e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots \ge 1 + z$; for general $z$ it holds because $e^z$ is convex and $1 + z$ is its tangent line at $z = 0$.) Now, if we substitute $-\frac{1}{x}$ for $z$, we have $1 - \frac{1}{x} \le e^{-1/x}$. By raising both sides to the $x$th power, we have the desired result.
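A quick numeric sanity check of the lemma (a sketch I am adding, not part of the original argument):

```python
import math

# (1 - 1/x)^x increases toward its limit 1/e as x grows, and never exceeds it.
for x in [1.5, 2.0, 10.0, 100.0, 1e6]:
    print(f"x = {x:>9}: (1 - 1/x)^x = {(1 - 1/x) ** x:.6f}  (1/e = {1/math.e:.6f})")
```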

We will cheat a bit. Let $c$ denote the size of the optimum set cover, and let $g$ denote the size of the greedy set cover minus 1. We will show that $g \le c \ln n$. (We could really show that $g + 1 \le c \ln n$, but this is close enough and saves us some messy details.)

Let’s consider how many new elements we cover with each round of the algorithm. Initially, there are $n_0 = n$ elements to be covered. Let $n_i$ denote the number of elements remaining to be covered after $i$ iterations of the greedy algorithm. After $i - 1$ rounds, there are $n_{i-1}$ elements that remain to be covered. We know that there is a cover of size $c$ for these elements (namely, the optimal cover), and so by the pigeonhole principle there exists some set that covers at least $\frac{n_{i-1}}{c}$ of them. Since the greedy algorithm selects the set covering the largest number of remaining elements, it must select a set that covers at least this many elements. The number of elements that remain to be covered is at most

$$n_i \le n_{i-1} - \frac{n_{i-1}}{c} = n_{i-1}\left(1 - \frac{1}{c}\right)$$

Thus, with each iteration the number of remaining elements decreases by a factor of at least $\left(1 - \frac{1}{c}\right)$. If we repeat this $i$ times, we have

$$n_i \le n_0\left(1 - \frac{1}{c}\right)^i = n\left(1 - \frac{1}{c}\right)^i$$

How long can this go on? Since the greedy heuristic ran for $g + 1$ iterations, we know that just prior to the last iteration we must have had at least one remaining uncovered element, and so we have

$$1 \le n_g \le n\left(1 - \frac{1}{c}\right)^g = n\left(\left(1 - \frac{1}{c}\right)^c\right)^{g/c}$$

(In the last step, we just rewrote the expression in a manner that makes it easier to apply the above technical lemma.) By the above lemma we have

$$1 \le n\left(\frac{1}{e}\right)^{g/c}$$

Now, if we multiply both sides by $e^{g/c}$ and take natural logs, we find that $g$ satisfies:

$$e^{g/c} \le n \;\Longrightarrow\; \frac{g}{c} \le \ln n \;\Longrightarrow\; g \le c \ln n$$

Therefore (ignoring the missing “+1” as mentioned above), the greedy set cover is larger than the optimum set cover by a factor of at most $\ln n$. Since logarithms of different bases differ only by a constant factor, option 2 is correct.
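For concreteness, here is a minimal Python sketch of the greedy heuristic (the function and variable names are my own):

```python
def greedy_set_cover(universe, sets):
    """Repeatedly choose the set covering the most not-yet-covered elements.
    Returns the indices of the chosen sets."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set with maximum marginal coverage, breaking ties arbitrarily.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# The example from the problem statement: the optimum is {1,2} and {3,4}.
print(greedy_set_cover({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}]))  # [0, 2]
```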

Suppose you are given $m$ sets $S_1, \ldots, S_m$, each a subset of a common set $U$. The goal is to choose 2 of the $m$ sets, $S_i$ and $S_j$, to maximize the size $|S_i \cup S_j|$ of their union. One natural heuristic is to use a greedy algorithm: (i) choose the first set $S_i$ to be as large as possible, breaking ties arbitrarily; (ii) choose the second set $S_j$ to maximize $|S_i \cup S_j|$ (i.e., as the set that includes as many elements as possible that are not already in $S_i$), again breaking ties arbitrarily. For example, if the given sets are $\{1,2\}$, $\{2,3\}$, and $\{3,4\}$, then the algorithm might pick the set $\{1,2\}$ in the first step; if it does so, it definitely picks the set $\{3,4\}$ in the second step (for an objective function value of 4).

Consider the following statement: for every instance of the above problem, the greedy algorithm above chooses two sets $S_i, S_j$ such that $|S_i \cup S_j|$ is at least $c$ times the maximum size of the union of two of the given sets. Which of the following is the largest choice of the constant $c$ for which this statement is true?

  • 1
  • $\frac{1}{2}$
  • $\frac{2}{3}$
  • $\frac{3}{4}$

ANSWER: This is known as the Maximum Coverage Problem. We will prove the guarantee for the general version of the greedy algorithm that picks $k$ sets, and then specialize to the $k = 2$ case asked about here. Let $OPT$ denote the number of elements covered by an optimal choice of $k$ sets; let $x_i$ denote the number of newly covered elements in the $i$th iteration; let $y_i$ denote the total number of covered elements up to, and including, the $i$th iteration, i.e., $y_i = \sum_{j=1}^{i} x_j$; and let $z_i = OPT - y_i$ be the shortfall relative to the optimum after the $i$th iteration. Moreover, $x_0 = y_0 = 0$, and $z_0 = OPT$.

Claim 1: $x_{i+1} \ge \frac{z_i}{k}$

Proof: At each step, the algorithm selects the set whose inclusion covers the maximum number of uncovered elements. The optimal solution covers $OPT$ elements using $k$ sets, and at least $z_i$ of those elements are still uncovered after iteration $i$; hence some optimal set covers at least a $\frac{1}{k}$ fraction of these, i.e., at least $\frac{z_i}{k}$ new elements. The greedy choice covers at least as many, so $x_{i+1} \ge \frac{z_i}{k}$.

Claim 2: $z_{i+1} \le \left(1 - \frac{1}{k}\right)^{i+1} OPT$

Proof: By induction. For $i = 0$: $z_1 = z_0 - x_1 \le OPT - \frac{OPT}{k} = \left(1 - \frac{1}{k}\right) OPT$. We assume inductively that $z_i \le \left(1 - \frac{1}{k}\right)^i OPT$. Then

$$z_{i+1} = z_i - x_{i+1} \le z_i\left(1 - \frac{1}{k}\right) \le \left(1 - \frac{1}{k}\right)^{i+1} OPT,$$

where the first inequality uses Claim 1 and the second uses the inductive hypothesis.

Lemma: For every $k$, the algorithm is a $\left(1 - \frac{1}{e}\right)$-approximation of the optimal solution.

Proof: Using Claim 2, and the fact that $1 - \frac{1}{k} \le e^{-1/k}$ (shown earlier), we have $z_k \le \left(1 - \frac{1}{k}\right)^k OPT \le \frac{OPT}{e}$. Hence, $y_k = OPT - z_k \ge \left(1 - \frac{1}{e}\right) OPT$.

For the two-set problem here, though, we can do better by plugging $k = 2$ directly into Claim 2: $z_2 \le \left(1 - \frac{1}{2}\right)^2 OPT = \frac{OPT}{4}$, so $y_2 \ge \frac{3}{4} OPT$. This bound is tight: with the sets $\{1,3\}, \{1,2\}, \{3,4\}$ (all of size 2), greedy may pick $\{1,3\}$ first and end up covering only 3 elements, while the optimal pair $\{1,2\}, \{3,4\}$ covers 4. Therefore option 4 ($\frac{3}{4}$) is correct; the general-$k$ bound $1 - \frac{1}{e} \approx 0.63$ is weaker than what holds for $k = 2$.
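A minimal Python sketch of the two-step greedy, run on the tight example just described (names are my own):

```python
def greedy_two_sets(sets):
    """Pick the largest set, then the set maximizing the size of the union."""
    first = max(sets, key=len)
    second = max(sets, key=lambda s: len(first | s))
    return first | second

# All three sets have size 2, so greedy may pick {1, 3} first; either
# remaining set then yields a union of size 3, versus the optimum of 4.
print(greedy_two_sets([{1, 3}, {1, 2}, {3, 4}]))  # {1, 2, 3}: ratio 3/4
```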

Consider the following job scheduling problem. There are $m$ machines, all identical. There are $n$ jobs, and job $j$ has size $p_j$. Each job must be assigned to exactly one machine. The load of a machine is the sum of the sizes of the jobs that get assigned to it. The makespan of an assignment of jobs is the maximum load of a machine; this is the quantity that we want to minimize. For example, suppose there are two machines and 4 jobs with sizes 7, 8, 5, 6. Assigning the first two jobs to the first machine and the last two jobs to the second machine yields machine loads 15 and 11, for a makespan of 15. A better assignment puts the first and last jobs on the first machine and the second and third jobs on the second machine, for a makespan of 13.

Consider the following greedy algorithm. Iterate through the jobs $j = 1, 2, 3, \ldots, n$ one by one. When considering job $j$, assign it to the machine that currently has the smallest load (breaking ties arbitrarily). For example, in the four-job instance above, this algorithm would assign the first job to the first machine, the second job to the second machine, the third job to the first machine, and the fourth job to the second machine (for a suboptimal makespan of 14).

Consider the following statement: for every such job scheduling instance, this greedy algorithm computes a job assignment with makespan at most $c$ times that of an optimal (minimum-makespan) job assignment. Which of the following is the smallest choice of the constant $c$ that makes this statement true?

[Hint: let $A$ denote the average machine load and $B$ the maximum job size ($A = \frac{1}{m}\sum_j p_j$ and $B = \max_j p_j$). Try to relate both the optimal solution and the output of the greedy algorithm to $A$ and $B$.]

  • 2
  • $\frac{6}{5}$
  • $\frac{3}{2}$
  • 4

ANSWER: Let $P$ denote the optimal makespan. One lower bound on $P$ is given by $A$, because clearly we can’t do better than assigning every machine exactly the same load.

Lemma 1: $P \ge A$

However, there is one case where this lower bound is too weak to be useful: when the size of one job dominates all the others. For that case, an additional lower bound is given by Lemma 2.

Lemma 2: $P \ge B$

Consider a machine $M_i$ that attains the maximum load in our assignment, let $P_i$ be its final load, and let $j$ be the last job assigned to it. Just before job $j$ was assigned to $M_i$, the load on $M_i$ was the smallest of any machine; therefore, every machine had a load of at least $P_i - p_j$ at that point. Adding up the loads of all machines at that point, we have

$$m(P_i - p_j) \le \sum_{k=1}^{n} p_k \;\Longrightarrow\; P_i - p_j \le \frac{1}{m}\sum_{k=1}^{n} p_k = A \le P \quad \text{(using Lemma 1)}$$

Moreover, $p_j$ can be at most $B$; i.e., $p_j \le B \le P$ (using Lemma 2). After assigning job $j$ to machine $M_i$, its load is $P_i = (P_i - p_j) + p_j \le P + P = 2P$. Therefore, the greedy algorithm is a 2-approximation algorithm, and option 1 is correct.
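Here is a minimal Python sketch of the greedy algorithm (Graham's list scheduling), using a heap to find the least-loaded machine; the names are my own:

```python
import heapq

def greedy_makespan(sizes, m):
    """List scheduling: assign each job, in the given order,
    to the machine that currently has the smallest load."""
    loads = [0] * m  # a list of zeros is already a valid min-heap
    for p in sizes:
        least = heapq.heappop(loads)      # machine with the smallest load
        heapq.heappush(loads, least + p)  # assign job of size p to it
    return max(loads)

# The four-job example: greedy produces makespan 14, while the optimum is 13.
print(greedy_makespan([7, 8, 5, 6], 2))  # 14
```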

Consider the same makespan-minimization job scheduling problem studied in the previous problem. Now suppose that, prior to running the greedy algorithm in the previous problem, we first sort the jobs from biggest to smallest. For example, in the four-job instance discussed in the previous problem, the jobs would be considered in the order 8,7,6,5, and the greedy algorithm would then produce an optimal schedule, with makespan 13.

Consider the following statement: for every such job scheduling instance, the greedy algorithm (with this sorting preprocessing step) computes a job assignment with makespan at most $c$ times that of an optimal (minimum-makespan) job assignment. Which of the following is the smallest choice of the constant $c$ for which this statement is true?

  • $\frac{3}{2}$
  • 2
  • $\frac{6}{5}$
  • 4

ANSWER:

Lemma: If there are more than $m$ jobs, $P \ge 2p_{m+1}$.

Proof: If there are at most as many jobs as machines, the optimal assignment is trivial: every machine gets at most one job, and the optimal makespan is $B$. So suppose there are more than $m$ jobs, and consider the first $m + 1$ jobs in sorted order; since the jobs are sorted in decreasing order of size, each of them has size at least $p_{m+1}$. The optimal assignment places these $m + 1$ jobs on $m$ machines, so by the pigeonhole principle some machine receives at least two of them, and its load is at least $2p_{m+1}$. Hence $P \ge 2p_{m+1}$.

As before, we consider a machine $M_i$ that has the maximum load. If $M_i$ only holds a single job, then the schedule is optimal (its load is at most $B \le P$). So let’s assume that machine $M_i$ has at least two jobs, and let $p_j$ be the size of the last job assigned to it. Note that $j \ge m + 1$, since the algorithm assigns the first $m$ jobs to $m$ distinct machines. Thus $p_j \le p_{m+1} \le \frac{1}{2}P$ (by the Lemma above).

The rest of the proof is very similar to that of the previous one: $P_i = (P_i - p_j) + p_j \le P + \frac{1}{2}P = \frac{3}{2}P$. Thus, option 1 is correct.
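The sorted variant (often called longest processing time first, or LPT) is the same loop after sorting; a self-contained sketch with my own names:

```python
import heapq

def lpt_makespan(sizes, m):
    """Sort jobs from biggest to smallest, then greedily assign each
    to the machine that currently has the smallest load."""
    loads = [0] * m
    for p in sorted(sizes, reverse=True):
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + p)
    return max(loads)

# On the four-job example, the sorting step recovers the optimal makespan.
print(lpt_makespan([7, 8, 5, 6], 2))  # 13
```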

Which of the following statements is NOT true about the generic local search algorithm?

  • The generic local search algorithm is guaranteed to eventually converge to an optimal solution.
  • The output of the generic local search algorithm generally depends on the choice of the starting point.
  • The output of the generic local search algorithm generally depends on the choice of the superior neighboring solution to move to next (in an iteration where there are multiple such solutions).
  • In some cases, the generic local search algorithm can require an exponential number of iterations before terminating.

ANSWER: Option 1 is false. Local search is only guaranteed to eventually converge to a locally optimal solution, which at times can be quite far from a globally optimal solution.

Options 2, 3, and 4 are true; see lecture video 18.4.

Suppose we apply local search to the minimum cut problem. Given an undirected graph, we begin with an arbitrary cut (A,B). We check if there is a vertex v such that switching v from one group to the other would strictly decrease the number of edges crossing the cut. (Also, we disallow vertex switches that would cause A or B to become empty.) If there is such a vertex, we switch it from one group to the other; if there are many such vertices, we pick one arbitrarily to switch. If there are no such vertices, then we return the current locally optimal cut (A,B). Which of the following statements is true about this local search algorithm?

  • This local search algorithm is guaranteed to terminate in a polynomial number of iterations.
  • This local search algorithm is guaranteed to compute a minimum cut.
  • This local search algorithm is guaranteed to compute a cut for which the number of crossing edges is at most twice the minimum possible.
  • If this local search algorithm is guaranteed to terminate in a polynomial number of iterations, it would immediately imply P = NP.

ANSWER: Option 1 is true. Every iteration strictly decreases the number of crossing edges; since this number starts at most $m$ and can never drop below zero, there can be at most $m$ iterations.

Options 2 and 3 are false.

Option 4 is false. Local search is only guaranteed to converge to a locally optimal solution, whereas P = NP concerns finding globally optimal solutions in polynomial time. (Moreover, the minimum cut problem is solvable in polynomial time anyway, so no such implication could follow.)
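A minimal Python sketch of this local search (names are my own). Note how the termination argument shows up: each pass of the outer loop either makes one strict improvement, of which there can be at most $m$, or halts:

```python
def local_search_min_cut(edges, side):
    """Vertex-switching local search for minimum cut.
    side: dict mapping each vertex to 0 or 1 (the two groups A and B)."""
    def crossing():
        return sum(1 for u, v in edges if side[u] != side[v])

    improved = True
    while improved:
        improved = False
        for v in side:
            # Disallow a switch that would empty v's current group.
            if sum(1 for u in side if side[u] == side[v]) == 1:
                continue
            before = crossing()
            side[v] ^= 1            # tentatively switch v to the other group
            if crossing() < before:
                improved = True     # strict improvement: keep the switch
                break
            side[v] ^= 1            # otherwise undo it
    return side, crossing()

# A 4-cycle, starting from the worst cut (all 4 edges crossing): the local
# search reaches a locally (here also globally) optimal cut with 2 edges.
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(local_search_min_cut(edges, {1: 0, 2: 1, 3: 0, 4: 1}))
```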

In the maximum $k$-cut problem, the input is an undirected graph $G = (V, E)$, and each edge $e$ has a nonnegative weight $w_e$. The goal is to partition $V$ into $k$ non-empty pieces $A_1, \ldots, A_k$ to maximize the total weight of the cut edges (i.e., edges with endpoints in different $A_i$’s). The maximum cut problem (as studied in lecture) corresponds to the special case where $k = 2$.

Consider applying local search to the maximum $k$-cut problem: (i) start with an arbitrary $k$-cut; (ii) repeat: if possible, move a vertex from one set $A_i$ to another set $A_j$ so as to strictly increase the total weight of the cut edges; (iii) once no such move is possible, halt.

Consider the following statement: for every instance of the maximum $k$-cut problem, every local maximum has objective function value at least $f(k)$ times that of the maximum possible. Which of the following is the biggest choice of the function $f(k)$ for which this statement is true?

  • $\frac{1}{2}$
  • $1 - \frac{1}{k(k-1)}$
  • $\frac{1}{k}$
  • $\frac{k-1}{k}$

ANSWER: At a local maximum, consider any vertex $v$, lying in part $A_i$, and any other part $A_j$: the total weight of the edges from $v$ into its own part $A_i$ must be at most the total weight from $v$ into $A_j$, since otherwise moving $v$ to $A_j$ would strictly increase the weight of the cut. Summing this condition over the $k - 1$ other parts and then over all vertices shows that at most a $\frac{1}{k}$ fraction of the total edge weight is uncut (see the derivation below). Hence every local maximum cuts at least a $\frac{k-1}{k}$ fraction of the total edge weight, which is in turn at least $\frac{k-1}{k}$ times the optimum. Option 4 is correct.
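In symbols (a sketch of the argument: write $w(v, A_j)$ for the total weight of edges between $v$ and part $A_j$, $i(v)$ for the index of the part containing $v$, and $W$ for the total edge weight). Local optimality of vertex $v$ means $w(v, A_{i(v)}) \le w(v, A_j)$ for every $j \ne i(v)$; summing over the $k - 1$ choices of $j$ gives

$$(k-1)\,w(v, A_{i(v)}) \;\le\; \sum_{j \ne i(v)} w(v, A_j) \quad\Longrightarrow\quad k\,w(v, A_{i(v)}) \;\le\; \sum_{j=1}^{k} w(v, A_j),$$

and summing that over all vertices (each uncut edge is counted twice on the left, and each edge twice on the right) gives

$$2 \cdot (\text{uncut weight}) \;=\; \sum_{v} w(v, A_{i(v)}) \;\le\; \frac{1}{k} \sum_{v} \sum_{j=1}^{k} w(v, A_j) \;=\; \frac{2W}{k},$$

so the cut weight is at least $\left(1 - \frac{1}{k}\right)W \ge \frac{k-1}{k} \cdot OPT$.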

Suppose $X$ is a random variable that has expected value 1. What is the probability that $X$ is 2 or larger? (Choose the strongest statement that is guaranteed to be true.)

  • At most 100%
  • At most 25%
  • 0
  • At most 50%

ANSWER: Consider the following probability distribution for $X$:

$$P(X = x) = \begin{cases} 0.9 & \text{if } x = 2 \\ 0.1 & \text{if } x = -8 \\ 0 & \text{otherwise} \end{cases}$$

Then $E[X] = 0.9 \cdot 2 + 0.1 \cdot (-8) = 1$, yet $P(X \ge 2) = 0.9$. This counterexample eliminates the second, third, and fourth options. So, option 1 is correct.

Suppose $X$ is a random variable that is always nonnegative and has expected value 1. What is the probability that $X$ is 2 or larger? (Choose the strongest statement that is guaranteed to be true.)

  • At most 100%
  • At most 25%
  • 0
  • At most 50%

ANSWER: Using Markov’s Inequality, we have:

$$P(X \ge 2) \le \frac{E[X]}{2} = \frac{1}{2}$$

Therefore, option 4 is correct.
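A quick numeric check of both answers (a sketch; the two distributions are the counterexample above and a standard tightness example for Markov's inequality):

```python
def expect(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

# First quiz (no nonnegativity): mean 1, yet P(X >= 2) = 0.9.
counterexample = {2: 0.9, -8: 0.1}
print(expect(counterexample))                               # 1.0 (up to rounding)
print(sum(p for x, p in counterexample.items() if x >= 2))  # 0.9

# Second quiz (nonnegative): Markov's bound P(X >= 2) <= 1/2 is tight.
tight = {0: 0.5, 2: 0.5}
print(expect(tight))                               # 1.0
print(sum(p for x, p in tight.items() if x >= 2))  # 0.5
```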
