Sunday, January 15, 2017

Stuff I'm reading, MLK Day edition

My, how time races by. I've been quite busy and not getting enough time to read.

Is there ever enough time to read?

  • CIDR 2017
    The biennial Conference on Innovative Data Systems Research (CIDR) is a systems-oriented conference, complementary in its mission to the mainstream database conferences like SIGMOD and VLDB, emphasizing the systems architecture perspective. CIDR gathers researchers and practitioners from both academia and industry to discuss the latest innovative and visionary ideas in the field.
  • Optimizing Space Amplification in RocksDB
    RocksDB is an embedded, high-performance, persistent key-value storage engine developed at Facebook. Much of our current focus in developing and configuring RocksDB is to give priority to resource efficiency instead of giving priority to the more standard performance metrics, such as response time latency and throughput, as long as the latter remain acceptable. In particular, we optimize space efficiency while ensuring read and write latencies meet service-level requirements for the intended workloads. This choice is motivated by the fact that storage space is most often the primary bottleneck when using Flash SSDs under typical production workloads at Facebook. RocksDB uses log-structured merge trees to obtain significant space efficiency and better write throughput while achieving acceptable read performance.
  • How and why the leap second affected Cloudflare DNS
    The root cause of the bug that affected our DNS service was the belief that time cannot go backwards. In our case, some code assumed that the difference between two times would always be, at worst, zero.
  • The Road to 2 Million Websocket Connections in Phoenix
    2 million is a figure we are pleased with. However, we did not quite max out the machine and we have not yet made any effort toward reducing the memory usage of each socket handler. In addition, there are more benchmarks we will be performing. This particular set of benchmarks was set exclusively around the number of simultaneous open sockets. A chat room with 2 million users is awesome, especially when the messages are broadcast so quickly. This is not a typical use case though.
  • Adaptive logging: optimizing logging and recovery costs in distributed in-memory databases
    This is a paper about the trade-offs between transaction throughput and database recovery time. Intuitively for example, you can do a little more work on each transaction (lowering throughput) in order to reduce the time it takes to recover in the event of failure. Recovery is based on information in logs, classically an ARIES-style write-ahead log, that records the values of data items.

    In the case of in-memory databases, you can also go the other way, and do a little less work when creating the logs (recording information for use in recovery) at the expense of longer recovery times, but gaining higher throughput. We can simplify recovery on the assumption that there is no need to undo the effects of uncommitted transactions – these existed solely in-memory and had not yet been persisted to disk.

  • Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads
    The idea behind this research is to provide an honest comparison for the two popular RDBMSs. Sveta and Alexander wanted to test the most recent versions of both MySQL and PostgreSQL with the same tool, under the same challenging workloads and using the same configuration parameters (where possible). However, because both PostgreSQL and MySQL ecosystems evolved independently, with standard testing tools (pgbench and SysBench) used for each database, it wasn’t an easy journey.
  • Bitpacking and Compression of Sparse Datasets
    It turns out that gzipping after bitpacking yields a 1000x compression. Even on its highest compression settings, gzip was leaving an 8x compression on the table when applied to the raw data. It turns out that if you know the structure of your own data, you can very easily do much, much better than a generic compression algorithm, on both speed and compression.
  • The Real Reason Your City Has No Money
    All of the programs and incentives put in place by the federal and state governments to induce higher levels of growth by building more infrastructure have made the city of Lafayette functionally insolvent. Lafayette has collectively made more promises than it can keep and it's not even close. If they operated on accrual accounting -- where you account for your long term liabilities -- instead of a cash basis -- where you don't -- they would have been bankrupt decades ago. This is a pattern we see in every city we've examined. It is a byproduct of the American pattern of development we adopted everywhere after World War II.
  • Software Copyright Litigation After Oracle v. Google
    Oracle America has factored into at least four cases so far. One of these cases settled, one is on appeal, and the other two likely will be appealed in the near future. The latter two cases also involve patent claims, so appeals will be heard by the CAFC. (The CAFC has nearly exclusive appellate jurisdiction over cases with patent claims.) One can assume that the plaintiffs added the patent claims to ensure CAFC jurisdiction.
  • This is Fine: Engineering War Stories (and What We Learned) in 2016
    In the past an engineer would be tasked with a project, crawl into a dark hole, and come out days, weeks or months later clutching their precious code. Sometimes this worked out really well but other times it was disastrous.
  • The Inside Story of BitTorrent’s Bizarre Collapse
    sometimes technologies are not products. And they’re not companies. They’re just damn good technologies.
  • “Side Hustle” as a Sign of the Apocalypse
    And WTF has happened to our culture when we just take it as fact that everyone needs to have multiple jobs and work as a cab driver and rent out every square inch of space in their apartment and be a task rabbit gopher who waits in line for tickets when they’re not walking dogs or temping and we all just chalk it up to “progress”??? In the old days, this meant your life was falling apart. Now it just means you’re part of “the sharing economy.”
  • The Chemistry Behind Your Home’s Water Supply
    We take for granted the water that comes out of the taps in our home when we turn them on – but a lot of work goes into getting it there. Chemistry, too, has a hand in making sure that the water is safe to drink. Here, we take a look at the water treatment process, and in particular the chemicals used to get clean drinking water to your tap.
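
The Cloudflare item is a nice reminder that subtracting two wall-clock readings can give a negative number. Here is a minimal Python sketch of that failure mode and two common defenses. This is my own illustration, not Cloudflare's code: the function names and the timestamps are made up, and the "clock stepped back" case is simulated with hard-coded values.

```python
import time


def elapsed_wall(start: float, end: float) -> float:
    """Naive difference of two wall-clock readings (hypothetical helper).

    The result is negative if the clock was stepped backwards between
    the two readings.
    """
    return end - start


def elapsed_clamped(start: float, end: float) -> float:
    """Same difference, clamped so callers never see a negative duration."""
    return max(0.0, end - start)


# Simulated readings (assumed values): the second reading is *earlier*
# than the first, as if the wall clock had been stepped back.
start, end = 100.9, 100.1

print(elapsed_wall(start, end) < 0)   # True: the naive difference is negative
print(elapsed_clamped(start, end))    # 0.0

# For measuring intervals inside one process, a monotonic clock avoids
# the problem entirely, because it is guaranteed not to go backwards.
t0 = time.monotonic()
t1 = time.monotonic()
print(t1 - t0 >= 0)                   # True
```

Clamping hides the symptom; the monotonic clock removes the cause, but it only works when both readings come from the same machine.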
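
The bitpacking item made me want to try the idea on toy data. The sketch below is an assumption-laden illustration, not the article's actual scheme: I invent a sparse list of 0/1 flags, store it once as comma-separated text and once packed 8 flags per byte, and gzip both. The helper names `bitpack` and `bitunpack` are hypothetical, and the sizes you get will depend on the data, so I make no claim about matching the article's ratios.

```python
import gzip
import random


def bitpack(flags):
    """Pack a sequence of 0/1 values into bytes, 8 flags per byte, MSB first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def bitunpack(packed, n):
    """Recover the first n flags from bytes produced by bitpack."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]


# Toy sparse dataset (assumed): roughly 0.1% of the flags are set.
random.seed(0)
n = 100_000
flags = [1 if random.random() < 0.001 else 0 for _ in range(n)]

raw = ",".join(map(str, flags)).encode()   # naive text representation
packed = bitpack(flags)                    # structure-aware representation

sizes = {
    "raw": len(raw),
    "raw+gzip": len(gzip.compress(raw, compresslevel=9)),
    "packed": len(packed),
    "packed+gzip": len(gzip.compress(packed, compresslevel=9)),
}
for name, size in sizes.items():
    print(f"{name:12s} {size:8d} bytes")

# The packing is lossless: unpacking returns the original flags.
print(bitunpack(packed, n) == flags)       # True
```

Packing first hands gzip a much smaller, more regular input, which is the article's point about knowing the structure of your own data.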
