How randomness improves algorithms (2023) (quantamagazine.org)
50 points by kehiy 3 days ago | 20 comments
EdwardCoffin 9 hours ago [-]
It's unmentioned in the article, but Trevor Blackwell's PhD thesis, Applications of Randomness in System Performance Measurement [1] was advocating this in 1998:

This thesis presents and analyzes a simple principle for building systems: that there should be a random component in all arbitrary decisions. If no randomness is used, system performance can vary widely and unpredictably due to small changes in the system workload or configuration. This makes measurements hard to reproduce and less meaningful as predictors of performance that could be expected in similar situations.

[1] https://tlb.org/docs/thesis.pdf

hinkley 4 hours ago [-]
All else being equal, I like to have either a prime number of servers or a prime number of inflight requests per server. I’m always slightly afraid someone is going to send a batch of requests or tune a benchmark to run a number of times that divides exactly evenly into the system parallelism, and we won’t be testing what we think we are testing due to accidental locality of reference that doesn’t show up in the general population. Not unlike how you get uneven gear wear if you mesh two gears whose tooth counts share a large common factor, like a ratio of 3:1 or 2:3, so the same teeth keep meeting all the time.

But all else is seldom equal and Random 2 works as well or better.
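
A toy sketch in Python of the accidental-locality worry (numbers invented, and reading "Random 2" as the pick-the-less-loaded-of-two-random-servers trick):

    import random

    # 8 servers, batches of 4, and slot 0 of every batch is the expensive
    # request. Because 4 divides 8, strict round-robin sends slot 0 to the
    # same two servers forever ("the same teeth keep meeting"); choosing
    # the less loaded of two random servers spreads it out.
    SERVERS, BATCH = 8, 4
    COST = [10, 1, 1, 1]          # slot 0 is the heavy one

    round_robin = [0] * SERVERS
    random_two = [0] * SERVERS
    cursor = 0
    for _ in range(10_000):
        for slot in range(BATCH):
            round_robin[cursor % SERVERS] += COST[slot]
            cursor += 1
            a, b = random.randrange(SERVERS), random.randrange(SERVERS)
            pick = a if random_two[a] <= random_two[b] else b
            random_two[pick] += COST[slot]

    print("round-robin load spread:", max(round_robin) - min(round_robin))
    print("random-2 load spread:   ", max(random_two) - min(random_two))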

hinkley 4 hours ago [-]
If you squint a bit, many of the optimizations for quicksort are essentially a very short Monte Carlo simulation.
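
The classic example is the random pivot, something like this sketch (not production code):

    import random

    # Quicksort with a randomly chosen pivot: the point of the coin flip
    # is that no single fixed input (e.g. an already-sorted list) can
    # force the quadratic worst case every single run.
    def quicksort(xs):
        if len(xs) <= 1:
            return xs
        pivot = random.choice(xs)
        return (quicksort([x for x in xs if x < pivot])
                + [x for x in xs if x == pivot]
                + quicksort([x for x in xs if x > pivot]))

    print(quicksort([5, 3, 8, 1, 9, 2, 7]))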

Randomness seems to help more with intentionally belligerent users than anything else. There is no worst pattern because the same question asked twice yields a different result. For internal use it can help a little with batch processing. The last time I did that seriously I found a big problem with clustering of outlier workloads: one team of users had 8-10x the cost of any other team. But since this work involved populating several caches, I sorted by user instead of team and that sorted it. It also meant that one request for team A was more likely to arrive after another request had populated the team data in cache, instead of two of them concurrently asking the same questions. So it not only smoothed out server load, it also reduced the overall read intensity a bit. But shuffling the data worked almost as well and took hours instead of days.
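
Rough sketch of that batch problem (costs invented): group one team's expensive jobs together and one window of the run is far hotter than the rest; a shuffle flattens it.

    import random

    jobs = [("team_a", 10)] * 50 + [("other", 1)] * 950

    def peak_window(jobs, window=50):
        # cost of the busiest contiguous stretch of `window` jobs
        costs = [cost for _, cost in jobs]
        return max(sum(costs[i:i + window])
                   for i in range(len(costs) - window + 1))

    print("grouped: ", peak_window(jobs))   # one very hot window
    random.shuffle(jobs)
    print("shuffled:", peak_window(jobs))   # spikes spread out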

jvanderbot 4 hours ago [-]
Adversarial input is basically at the core of the analysis of these types of algorithms, yes.

Here's another crossover analogy. In game theory there are games where you can't play a deterministic (pure) strategy and hope to get the best outcome; you need randomized (mixed) ones. Otherwise the adversary could have anticipated your strategy and generated a pathological response.
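
Matching pennies is the textbook case; a throwaway Python sketch (my own example, not from the article):

    import random

    # An adversary that predicts the opponent's most common past move.
    # Any pure (deterministic) strategy is trivially learnable and loses
    # almost every round; a fair coin can only be matched half the time.
    def pure(history):
        return "heads"

    def mixed(history):
        return random.choice(["heads", "tails"])

    def exploit(player, rounds=5000):
        wins, history = 0, []
        for _ in range(rounds):
            guess = max(set(history), key=history.count) if history else "heads"
            move = player(history)
            wins += (guess == move)        # adversary wins on a match
            history.append(move)
        return wins / rounds

    print("adversary win rate vs pure: ", exploit(pure))    # ~1.0
    print("adversary win rate vs mixed:", exploit(mixed))   # ~0.5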

hinkley 4 hours ago [-]
I think this started, though, with systems doing incremental work and then passing the result on to an algorithm that does worse with partially sorted or inverted lists than with random ones. Adversarial input came later, particularly with the spread of the Internet.
jvanderbot 3 hours ago [-]
I assure you adversarial analysis predates the internet
sestep 9 hours ago [-]
Could the question mark in the HN version of the title be removed? It makes it read as a bit silly.
k_g_b_ 8 hours ago [-]
In my experience it's a common mistake of non-native English speakers, of native speakers of Slavic languages in particular. I see it often at work with titles starting with an interrogative word like "how".
pixelpoet 8 hours ago [-]
Guaranteed this is the case, I see it a lot too. They've done it twice before on previous submissions: https://news.ycombinator.com/item?id=44755116 and https://news.ycombinator.com/item?id=44785347

In case anyone is curious, the way to phrase it as a question would be, "How does randomness improve algorithms?"

jvanderbot 5 hours ago [-]
Weirdly, "Why randomness improves algorithms" is closer to the truth, and it also cannot be expressed correctly with a question mark.
prerok 31 minutes ago [-]
Indeed. Well, FWIW, the title translated into my native Slavic language would also make no sense with a question mark.

What's interesting is that both "How..." and "How does..." would translate into the same words, but with a dot or a question mark at the end they would mean two different things.

That said, that would be true for many languages.

egypturnash 6 hours ago [-]
It’s not there in the original title.
optimalsolver 8 hours ago [-]
Written by a shiba inu
furyofantares 5 hours ago [-]
I don't think 'random' is doing any of the work. These sound like they would work fine with a deterministic PRNG seeded at 0. They don't sound like they need to be looking at lava lamps or the like.

It's that there's a population of values (integers for factoring, nodes-to-delete for the graph) where we know a way to get a lot of information cheaply from most values, but we don't know which values, so we sample them.
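
A concrete instance of that pattern (my example, not necessarily the article's): Miller-Rabin primality testing, where at least 3/4 of the possible bases expose any composite number, but we don't know which ones ahead of time, so we sample a handful. Note the fixed seed; nothing here needs lava lamps.

    import random

    def probably_prime(n, rounds=20, rng=random.Random(0)):
        if n < 4:
            return n in (2, 3)
        if n % 2 == 0:
            return False
        d, r = n - 1, 0                    # write n-1 as d * 2^r with d odd
        while d % 2 == 0:
            d, r = d // 2, r + 1
        for _ in range(rounds):
            a = rng.randrange(2, n - 1)    # sample a base; most are witnesses
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(r - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False               # found a witness: n is composite
        return True                        # very probably prime

    print(probably_prime(2**61 - 1))       # a known Mersenne prime: True
    print(probably_prime(2**61 + 1))       # divisible by 3: False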

Which isn't to say the PRNG isn't doing work - maybe it is, maybe any straightforward iteration through the sample space has problems, failure values being clumped together, or similar values providing overlapping information.

If so that suggests to me that you can do better sampling than PRNG, although maybe the benefit is small. When the article talks about 'derandomizing' an algorithm, is it referring to removing the concept of sampling from this space entirely, or is it talking about doing a better job sampling than 'random'?

jvanderbot 5 hours ago [-]
I don't follow the question.

A pseudo-random sequence of choices is still sufficiently detached from the input. Random here means "I'm making a decision in a way that is sufficiently independent of the input that structuring the input adversarially won't cause worst-case performance," coupled with "the cost of this algorithm is expressed assuming truly random numbers."

That's the work Random is doing.

INB4 worst case: you can do worst-case analysis on randomized algorithms, but it's either worst case across any random choice or worst case in expectation, not worst case given a poor implementation of the RNG. Effectively, randomization sometimes serves to shake you out of an increasingly niche and unlikely series of bad decisions that is the crux of an adversarial input.

To wit

> In the rare cases where the algorithm makes an unlucky choice and gets bogged down at the last step, they could just stop and run it again.
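
A minimal Python sketch of that restart pattern (toy example, not the article's algorithm): the answer is always correct, only the running time is random.

    import random

    def las_vegas(attempt, is_good, rng=random.Random()):
        # keep drawing random candidates and cheaply verifying them;
        # the rare unlucky draw just costs one more iteration
        while True:
            candidate = attempt(rng)
            if is_good(candidate):
                return candidate

    # toy problem: find the index of a 1 in a bit vector that is mostly 1s
    bits = [1] * 900 + [0] * 100
    idx = las_vegas(lambda rng: rng.randrange(len(bits)),
                    lambda i: bits[i] == 1)
    print(idx, bits[idx])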

furyofantares 5 hours ago [-]
I think "How randomness improves algorithms" makes it sound kind of mystical and paradoxical.

Instead we could say "we know shortcuts that work for most values in a population, but we don't know how to figure out which values without expensive computation" and that's not very mystical or paradoxical.

And using random sampling is now the obvious way to deal with such a situation. But it's not the randomness doing the work; that's an implementation detail of how you do the sampling.

It very well can be that there isn't another obvious way to iterate through the sample space that isn't a disaster. Maybe bad values are clumped together so when you retry on failure you retry a lot. But if that's the case, there might also be a better way to sample than random - if bad values are clumped then there may be an intelligent way to sample such that you hit two bad values in a row less often than random.

My question is if that's what's being referred to as 'derandomizing' - taking one of these algorithms that uses random sampling and sample more intelligently. Or if they instead mean using what they learned from the (so-called) probabilistic algorithm to go back to a more traditional form.

jvanderbot 5 hours ago [-]
Well in fact we know (if P != NP) that for any randomized algorithm there's a good deterministic one. So it's not going to do the kind of game-changing, class-breaking work you're looking for. You know, where random is the only way.

How randomization helps is by making it much easier to design algorithms. E.g., verifying a solution is cheap, so proving that your random choice lands in some class of good choices, and building an algorithm that turns that into a solution, is still an interesting approach and opens up solutions to things we can't yet solve.

Derandomizing as presented apparently means proving for a few random but careful choices you're in the funnel towards the right solution, or (hopefully) can detect otherwise, and a few transformations or reasonably performant steps produce your solution.

furyofantares 4 hours ago [-]
I'm not looking for any kind of game changing class breaking work. I don't think I'm being understood here at all.

I must have put too much emphasis on PRNG or on maybe there's a more intelligent way to sample - you can ignore those comments. I just think it's framed poorly to make it sound more counterintuitive than it is.

lifeinthevoid 4 hours ago [-]
> Well in fact we know (if P != NP) that for any randomized algorithm there's a good deterministic one

Oh, that's cool, do you have a reference for that?

jvanderbot 4 hours ago [-]
TFA (edit: in the politest way)