The Paperwork Explosion

Machines should work; people should think.

The Paperwork Explosion, a trippy marketing film for IBM by Jim Henson

Cf. David Graeber - Of Flying Cars and the Declining Rate of Profit:

If we do not notice that we live in a bureaucratic society, that is because bureaucratic norms and practices have become so all-pervasive that we cannot see them, or, worse, cannot imagine doing things any other way.

Computers have played a crucial role in this narrowing of our social imaginations. Just as the invention of new forms of industrial automation in the eighteenth and nineteenth centuries had the paradoxical effect of turning more and more of the world’s population into full-time industrial workers, so has all the software designed to save us from administrative responsibilities turned us into part- or full-time administrators. In the same way that university professors seem to feel it is inevitable they will spend more of their time managing grants, so affluent housewives simply accept that they will spend weeks every year filling out forty-page online forms to get their children into grade schools. We all spend increasing amounts of time punching passwords into our phones to manage bank and credit accounts and learning how to perform jobs once performed by travel agents, brokers, and accountants.

Graeber further elaborates this theme in his book The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy. The book is really just a collection of essays and doesn’t totally hold together, but it’s worth a read.

Ideally, the amount of bureaucracy in the world pre- and post-computer should have been the same, just completed more quickly in the computerized world. In reality, however, computers made it practical to centralize the management of things that had previously been handled informally. Theoretically, this is good because one innovation in the center can be effortlessly distributed to the periphery, but this benefit comes with the Hayekian cost that the periphery is closer to the ground truth than the center, and there may not be sufficient institutional incentives to transmit that information to the center effectively. The result is a blockage that the center tries to solve by mandating that an ever-increasing number of reports be sent to it: a paperwork explosion.

Share memory by communicating

If you’ve learned about the Go programming language at all, you’ve probably come across the koan, “Don’t communicate by sharing memory; share memory by communicating.” It’s a snappy little bit of chiasmus, but what does it actually mean? The natural inclination is to say, “It means ‘channels good; mutexes bad.’” Certainly, that’s not too far off the mark as a first-order approximation of its meaning. But it’s actually a bit deeper than that.

Read more…

What Happens Next Will Amaze You

Maciej Cegłowski - What Happens Next Will Amaze You

Another great presentation by Maciej Cegłowski. This one is interesting because he has six concrete legal proposals for the internet:

  1. Right To Download
  2. Right To Delete
  3. Limits on Behavioral Data Collection
  4. Right to Go Offline
  5. Ban on Third-Party Advertising
  6. Privacy Promises

I think these ideas are great, and politicians should start trying to implement them in law.

(A seventh proposal, needed only in the US: sales tax law should be uniform for online stores, since they no longer need the weird special tax break they got back when collecting the tax was considered unenforceable.)

Also worth thinking about is his section on the importance of not giving up hope:

It’s easy to get really depressed at all this. It’s important that we not let ourselves lose heart.

If you’re over a certain age, you’ll remember what it was like when every place in the world was full of cigarette smoke. Airplanes, cafes, trains, private offices, bars, even your doctor’s waiting room—it all smelled like an ashtray. Today we live in a world where you can go for weeks without smelling a cigarette if you don’t care to.

The people in 1973 were no more happy to live in that smoky world than we would be, but changing it seemed unachievable. Big Tobacco was a seemingly invincible opponent. Getting the right to breathe clean air back required a combination of social pressure, legal action, activism, regulation, and patience.

It took a long time to establish that environmental smoke exposure was harmful, and even longer to translate this into law and policy. We had to believe in our capacity to make these changes happen for a long time before we could enjoy the results.

I use this analogy because the harmful aspects of surveillance have a long gestation period, just like the harmful effects of smoking, and reformers face the same kind of well-funded resistance. That doesn’t mean we can’t win. But it does mean we have to fight.

Pessimism is a kind of a luxury enjoyed by those who know that they won’t be hurt as deeply by the entrenchment of the unacceptable status quo. Let’s not give up on the internet yet.

Source: idlewords.com

How to scrape an old PHP (or whatever) site with wget for use in Nginx

If you’re like me, in your youth you once made websites with PHP that have uncool URLs like /index.php?seemed-like-a-good-idea=at-the-time. Well, time has passed and now you want to stop using Apache, MySQL, and PHP on your LAMP server, but you also don’t want to just drop your old website entirely off the face of the internet. How can you migrate your old pages to Nginx?

The simple solution is to use wget. It’s easy to install on pretty much any platform. (On OS X, try installing it with homebrew.) But there are a few subtleties to using it. You want to keep your ugly old URLs with ? in them working, even though you don’t want them to be dynamically created from a database any more. You also want to make sure Nginx serves your pages with the proper mime-type of text/html because if the mime-type is set incorrectly, browsers will end up downloading your pages instead of displaying them.

Here’s what to do.

First, use FTP or whatever to copy the existing site onto your local machine. (These are old sites, right? So you don’t have version control, do you? 😓) This step is to ensure you have all the images, CSS files, and other assets that were no doubt haphazardly scattered throughout your old site.

Next, go through and delete all the *.php files and any hidden .whatever files, so Nginx doesn’t end up accidentally serving up a file that contains your old Yahoo mail password from ten years ago or something because it seemed like a good idea to save your password in plaintext at the time.
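Something like this should do it (a sketch using find; print the list first and eyeball it before actually deleting anything):

    # Preview the PHP and hidden files that will be removed:
    find . -type f \( -name '*.php' -o -name '.*' \) -print
    # Once you're sure, delete them:
    find . -type f \( -name '*.php' -o -name '.*' \) -delete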

Now, cd into the same directory as your copy of the files on the server and use this command with wget to add scraped copies of your dynamic pages:

    wget \
         --recursive \
         --no-clobber \
         --page-requisites \
         --html-extension \
         --domains example.com \
         --no-parent \
             www.example.com/foobar/index.php/

Here’s what the flags mean:

  • --recursive is so you scrape all the pages that can be reached from the page you start with.
  • --no-clobber means you won’t replace the files you just fetched off the server.
  • --page-requisites is somewhat redundant, but it will fetch any asset files you may have missed in your copy from the server.
  • --html-extension is a bit of a wrinkle: it saves all the files it fetches with a .html extension. This is so that Nginx will know to serve your pages with the correct mimetype.
  • --domains example.com and --no-parent are so you only scrape a portion of the site that you want to scrape. In this case, the root of example.com would be left alone. Your case may be different.
  • The final argument is the address of the page to start fetching from.

wget will save these pages with two wrinkles that you’ll need to tell Nginx about. First, as mentioned, Nginx needs to know to ignore the .html on the end of the file names. Second, you’ll need to be able to serve up URLs with ? in the file name. To do both of those things, add this directive to the server block for your new thingie: try_files $uri $uri/index.html $request_uri.html =404;. try_files tells Nginx to try multiple files when serving a URL, in the order specified: $uri is the plain URL (e.g. for your CSS/JS/image assets); $uri/index.html serves up index pages, which wget creates whenever a URL ends in a slash; and $request_uri.html serves up files with ? in the middle and the final .html appended by wget.

Here’s a minimally complete Nginx configuration example:

    http {

        server {
            server_name www.example.com;
            server_name  example.com;

            listen 80;

            root /path/to/sites/example;
            error_page 404 403 /404.html;
            try_files $uri $uri/index.html $request_uri.html =404;
        }
    }

See the Nginx HTTP server boilerplate configs project for a complete example. (Note that this example assumes you have a 404.html to serve up for missing pages.)
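Once everything is deployed, you can spot-check that one of your old dynamic URLs still serves as a page rather than a download (substitute a real URL of yours for this made-up one). It should print something like Content-Type: text/html:

    curl -sI 'http://www.example.com/foobar/index.php?seemed-like-a-good-idea=at-the-time' \
        | grep -i '^content-type'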

Programming is pure logic

Programming is pure logic. It won’t work if there’s an inconsistency in the logic. The errors aren’t produced inside the system. They are produced from the outside. If the system doesn’t work, it’s definitely your fault.

The funny thing is that every programmer thinks their logic will work when they finish coding a program. It never does, but at that moment, everyone believes there’s no error in the logic they have written, and confidently hits the “Enter” key.

— Satoru Iwata in Hobo Mainichi with Shigesato Itoi and Shigeru Miyamoto

Source: 1101.com

What Do We Save When We Save the Internet?

Think about regret as if it were sin. Some regrets are mild, but acute. The regret associated with choosing the wrong supermarket checkout lane, or buying an outfit that you notice goes on sale the next week—these seem woeful. They chafe, but their pains are pin pricks that soon subside. These are venial regrets.

Regret is more severe when it steeps in sorrow rather than in misadventure, when it becomes chronic—mortal rather than venial. But counter-intuitively, mortal regrets are less noticeable than venial ones, because they burn slow and low instead of hot and fast: the regret of overwork and its deleterious effects on family. The regret of sloth in the face of opportunity. The regret of acquiescence to one’s own temperament.

Mortal regrets are tender, and touched inadvertently they explode with affective shrapnel. Venial regrets shout, “alas!” but mortal regrets whisper, “if only.”

Ian Bogost - What Do We Save When We Save the Internet?

Source: The Atlantic

On using Go channels like Python generators

I really like Go, and I really like Python, so it’s natural that I apply patterns I’ve learned in Python to Go… even when it’s maybe not appropriate.

For example, in Python, it’s common to use generators to process a stream of data. Take this simple (unrealistic, inefficient) odd number generator:

>>> def odds(from_, to):
...     for n in range(from_, to):
...         if n % 2:
...             yield n
... 
>>> print(*odds(0, 10), sep=", ")
1, 3, 5, 7, 9

How would you do something like that in Go? The natural approach is to use channels. Go’s channels are a language primitive that allows you to send values from one light-weight green-ish thread (“go-routine”) to another. Many other languages would make these into methods in a standard library threading package, but since they’re language primitives in Go, they get used more often, even in simple programs.

Here is an incorrect first attempt at simulating generators using channels:

package main

import "fmt"

func Odds(from, to int, c chan int) {
    if from%2 == 0 {
        from += 1
    }
    for i := from; i < to; i += 2 {
        c <- i
    }
}

func main() {
    c := make(chan int)
    go Odds(0, 10, c)
    for v := range c {
        fmt.Println(v)
    }
}

This program ends with fatal error: all goroutines are asleep - deadlock! because the loop in for v := range c never quits. It never quits because the channel is never closed. An even worse version of this would be one where go Odds(0, 10, c) was just Odds(0, 10, c). That ends with a deadlock before even printing anything.
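A minimal fix that keeps the clumsy signature is for the caller to close the channel itself once Odds has returned:

go func() {
    Odds(0, 10, c)
    close(c) // the range over c in main ends once the channel is closed
}()

That works, but it forces every caller to remember the same boilerplate.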

What is a non-terrible version of this code? We should move more of the responsibility into the Odds function. The Odds function should be in charge of creating the channel, spawning a go-routine, and closing the channel when the go-routine is done, so that the caller can be freed of that responsibility.

Here’s what that looks like:

package main

import "fmt"

func Odds(from, to int) <-chan int {
    c := make(chan int)
    go func() {
        defer close(c)
        if from%2 == 0 {
            from += 1
        }
        for i := from; i < to; i += 2 {
            c <- i
        }
    }()
    return c
}

func main() {
    for v := range Odds(0, 10) {
        fmt.Println(v)
    }
}

There are a couple of things to notice here. First, the caller doesn’t need to use the go statement because the Odds function now has an anonymous function inside of it that it uses to spawn a go-routine. (This use of an anonymous function looks a bit like JavaScript to me.)

Second, the caller doesn’t need to create its own channel to give to our pseudo-generator. Not only that, since the type of the channel is <-chan int, it’s perfectly clear that the channel is only for receiving ints, not for sending, and this is enforced at compile time.

This leads to the most important point: that it doesn’t deadlock at the end! That’s because the channel gets closed. Generally speaking, in Go the one doing the sending on a channel should also be the one to do the closing of the channel when it’s done sending. (Double closing a channel causes a runtime panic.) Since the Odds function returns a receive-only channel, the compiler actually enforces that the close cannot be done in the main function.

But notice how it gets closed: with defer close(c). In Python, the with statement is used for resource management, like remembering to close files and sockets. In Go, defer ensures that a given function call happens whenever the surrounding function ends (whether by a normal return or an exception-like panic). As a convention, it’s a good idea to defer the close of something as soon as it’s applicable.
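For example, here’s roughly how Python’s with open(...) as f: pattern translates into Go (a minimal sketch; example.txt is a made-up file name):

package main

import (
    "fmt"
    "io"
    "os"
)

// catFile copies a file to stdout, using defer to close it,
// much like Python's "with open(name) as f:" block.
func catFile(name string) error {
    f, err := os.Open(name)
    if err != nil {
        return err
    }
    defer f.Close() // runs when catFile ends, by normal return or panic
    _, err = io.Copy(os.Stdout, f)
    return err
}

func main() {
    if err := catFile("example.txt"); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}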

So far so good, but what about this scenario:

func main() {
    for v := range Odds(0, 10) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            return
        }
    }
}

The channel returned by Odds wasn’t exhausted, so it was never closed. Since the function returning was main, this isn’t a big deal in this case, but this is actually a somewhat subtle bug that can affect otherwise correct-looking code. It’s a memory leak, or more specifically, a go-routine leak. Python and Go are both garbage collected, but there’s a subtle difference here. In Python, if I write gen = odds(0, 10) and then next(gen), the resulting generator will be cleaned up shortly after gen goes out of scope. Similarly, in Go, the channel created by c := Odds(0, 10) will be garbage collected once all references to it go away. Go-routines, however, only go “out of scope” when they return or panic, because a go-routine can have side effects independent of whether anything in the main thread still holds a reference to it.
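You can make the leak visible with runtime.NumGoroutine. Here’s a sketch that reuses the second version of Odds above (with "fmt", "runtime", and "time" added to the imports):

func main() {
    for v := range Odds(0, 10) {
        if v > 5 {
            break // abandon the channel mid-stream
        }
    }
    time.Sleep(10 * time.Millisecond)           // give the scheduler a moment
    fmt.Println(runtime.NumGoroutine(), "left") // prints "2 left": main plus the sender stuck on c <- 9
}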

The standard way to prevent go-routine leaks is through the use of “quit channels.” However, this looks like it’s just going to re-introduce a lot of the complexity that we got rid of after the first, incorrect example: we need to make a new channel to pass around, figure out what value to send as a quit signal, etc.

A quit channel is more complicated than just letting a go-routine leak or run to exhaustion, but there are a few tricks we can take advantage of to make it less annoying:

package main

import "fmt"

func Odds(from, to int, quit <-chan struct{}) <-chan int {
    c := make(chan int)
    go func() {
        defer close(c)
        if from%2 == 0 {
            from += 1
        }
        for i := from; i < to; i += 2 {
            select {
            case <-quit:
                return
            case c <- i:
            }
        }
    }()
    return c
}

func main() {
    quit := make(chan struct{})
    defer close(quit)
    for v := range Odds(0, 10, quit) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            return
        }
    }
}

First of all, what is <-chan struct{}? struct{} is the empty struct: a type with no associated data (useful for, among other things, hanging methods off of). chan struct{} is therefore a channel that passes no data. You could also write chan bool, but that has two potential values, true or false; struct{} lets your readers know, “There’s no data being passed here.” The only reason for the channel to exist is to synchronize things. The arrow in front of <-chan struct{} further specifies that inside Odds the quit channel is receive-only: the caller is the one in charge of it, not the function.

Next, select (not to be confused with switch, which also has cases) is a statement in Go that lets you send or receive on multiple channels at once. If multiple channels are ready simultaneously, the runtime pseudo-randomly picks just one to use; otherwise, the statement blocks until one of the channels is ready. In this case, it means “either send the value of i on the output channel or return if the quit channel has something.”

When you close a channel, from then on anyone listening to that channel receives the zero value for its type (0 for int, "" for string, etc.) without blocking. (If a receiver needs to tell a close apart from a real send, the two-value form v, ok := <-c sets ok to false once the channel is closed.) Instead of using close, you could also send an empty struct:

func main() {
    quit := make(chan struct{})
    for v := range Odds(0, 10, quit) {
        fmt.Println(v)
        if v > 5 {
            fmt.Println("Tired of counting all day.")
            quit <- struct{}{}
        }
    }
}

close(quit) is less ugly than quit <- struct{}{}, but the real reason I think it’s better to use close gets at my next couple of points.

Why didn’t I put the make for the quit channel into Odds like I did for the chan int in the second example? It certainly could be done, but fundamentally a caller knows better than the function itself when the go-routine should quit, so the caller should have the responsibility. This lets us do shortcuts like this:

func main() {
    for v := range Odds(0, 10, nil) {
        fmt.Println(v)
    }
}

Since the caller knew the channel would be drained to exhaustion, it didn’t need to make a quit channel at all. In Go, a nil channel never delivers a value, but there’s nothing illegal or panic-inducing about a select case that listens to one (the case simply never fires), so Odds works just fine.

Furthermore, we can reuse the same quit channel for multiple functions. If we were sending the quit signal with quit <- struct{}{}, we’d have to send one message for each go-routine that shares the channel and needs to be stopped. However, because a closed channel is always “sending” its zero value, we can stop all of the go-routines listening to the same quit channel by calling close(quit) just once.

Put it all together and this approach allows for more advanced (and Python generator-like) usages of channels. The second Project Euler problem is “By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.” Here is an over-engineered solution using Go’s channels:

package main

import "fmt"

func sendOrFail(n int64, out chan<- int64, quit <-chan struct{}) bool {
    select {
    case out <- n:
        return true
    case <-quit:
        return false
    }
}

func FibGen(quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        var last, current int64 = 1, 2

        if ok := sendOrFail(last, out, quit); !ok {
            return
        }
        for {
            if ok := sendOrFail(current, out, quit); !ok {
                return
            }
            last, current = current, last+current
        }
    }()
    return out
}

func Evens(in <-chan int64, quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        for n := range in {
            if n%2 == 0 {
                if ok := sendOrFail(n, out, quit); !ok {
                    return
                }
            }
        }
    }()
    return out
}

func TakeUntil(max int64, in <-chan int64, quit <-chan struct{}) <-chan int64 {
    out := make(chan int64)
    go func() {
        defer close(out)
        for n := range in {
            if n > max {
                return
            }
            if ok := sendOrFail(n, out, quit); !ok {
                return
            }
        }
    }()
    return out
}

func main() {
    var total int64

    quit := make(chan struct{})
    defer close(quit)
    for n := range Evens(TakeUntil(4000000, FibGen(quit), quit), quit) {
        total += n
    }
    fmt.Println("Total:", total)
}

Compare to similar Python code:

>>> def fibgen():
...     last, current = 1, 2
...     yield last
...     while True:
...         yield current
...         last, current = current, last + current
... 
>>> def take_until(max, gen):
...    for n in gen:
...       if n > max:
...           return
...       yield n
... 
>>> def evens(gen):
...    for n in gen:
...       if n%2 == 0:
...           yield n
... 
>>> sum(evens(take_until(4000000, fibgen())))
4613732

The Go version isn’t as simple as a basic approach using Python generators. It’s 73 lines versus only 20 lines of Python. (15 lines are just for all of those closing }.) Still, Go has certain advantages, like being concurrent and type safe. In real code, you could imagine making and processing multiple network calls concurrently using techniques like this. The lowest level generator opens a socket to get some data from somewhere on the internet, then it pipes its results out to a chain of listeners that transform the data. In a case like that, Go’s easy concurrency could be an important advantage over Python.

In theory, it’s simple.

Laugh-Out-Loud Cats #2254

In theory, it’s simple:

  1. Identify problem.
  2. Identify tools available to fix problem.
  3. Apply tools to problem, fixing it (sometimes, if lucky).

In practice, however, I have a persistent habit of following this procedure:

  1. Become aware of a cool new tool.
  2. Apply tool to nothing, noodling around.
  3. Identify problem solved by tool (sometimes, if lucky).

I’m not the only one with this problem, of course. Indeed, the difference between human intelligence and machine intelligence is that humans have the ability not just to solve a problem but to create new problems to solve. One can hardly imagine a history of music that doesn’t start with kids noodling on guitars and only occasionally becoming fortunate enough to turn into the Ramones or whatever. Still, in philosophy this habit can be particularly dangerous: “Here’s a wonderfully elegant theory. Let’s mutilate reality until it fits the theory!”

Source: apelad.blogspot.com