I've managed to bring some order to my thoughts about interpretable models: how they're used in biological research, and why many published articles using machine learning fail to excite biologists in any meaningful way.


Genome sequencing datasets from consortia are staggeringly large. A recent release from the project included whole genome sequencing data from 15 708 samples.


Take high-quality whole genome sequencing data, say at 30X coverage. Encoding only the base calls, one person's entire genome would be ~1 GB. With quality controls and other information, this can grow to ~3 GB of raw sequencing data.

That means the raw data for this dataset alone would be ~46 TB.
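Here's a quick back-of-the-envelope check of those numbers as a small Python sketch. The 15 708 samples and ~3 GB per sample come from the post. The ~3.1 billion base haploid genome length and the 2-bit-per-base packing are my own assumptions, used to show where a figure like ~1 GB for the base calls could come from. The helper names are just illustrative.

```python
# Back-of-the-envelope storage estimate for the WGS release above.
# ASSUMPTIONS (not from the post): haploid human genome of ~3.1 billion
# bases, packed at 2 bits per base (A, C, G, T).

GENOME_BASES = 3.1e9      # approximate haploid genome length (assumed)
BITS_PER_BASE = 2         # 4 possible bases -> 2 bits each (assumed encoding)

N_SAMPLES = 15_708        # samples in the release (from the post)
RAW_GB_PER_SAMPLE = 3     # per-sample estimate incl. extra info (from the post)


def base_calls_gb(bases: float = GENOME_BASES, bits: int = BITS_PER_BASE) -> float:
    """Size in decimal GB of one genome's base calls at `bits` bits per base."""
    return bases * bits / 8 / 1e9


def dataset_size(n_samples: int = N_SAMPLES,
                 gb_per_sample: float = RAW_GB_PER_SAMPLE) -> tuple[float, float]:
    """Return the total size as (decimal TB, binary TiB).

    The binary figure treats each sample's "GB" as GiB, so dividing the
    total by 1024 gives TiB.
    """
    total_gb = n_samples * gb_per_sample
    return total_gb / 1000, total_gb / 1024


if __name__ == "__main__":
    print(f"base calls per genome: {base_calls_gb():.3f} GB")
    tb, tib = dataset_size()
    print(f"whole dataset: {tb:.1f} TB (decimal) or {tib:.1f} TiB (binary)")
```

The ~46 TB figure lines up with the binary reading (15 708 × 3 = 47 124, and 47 124 / 1024 ≈ 46). Using decimal units gives ~47 TB instead. Either way, it's tens of terabytes before any analysis even starts.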

@VictorVenema Fair point. I've seen those comics and related content about the decentralized tracing algorithms, like the DP3T protocol.


I have high hopes for those. And I understand that some of the permissions required by these apps are necessary, like accessing file storage to store the messages from nearby devices.

But requests for access to email addresses, location, etc. can easily slip under the radar for most apps in this initial burst.

> 16 of the 50 apps indicate that the user’s data will be made anonymous, encrypted and secured and will be transmitted online and reported only in an aggregated format.
> What is not clear is whether any of the data collected are protected by any laws or regulations such as the Health Insurance Portability and Accountability Act or electronic protected health information.

> In addition, some apps explicitly state that they will collect information about the person’s age, email address, phone number and postal code; the device’s location, unique device identifiers, mobile IP address and operating system; and the types of browsers used on the mobile device.

> We found that 30 of the 50 apps require permission for numerous types of access to users’ mobile devices. For example, some demand access to contacts, photos, media, files, location data, the camera, the device ID, call information, the WiFi connection, the microphone, full network access, the Google service configuration, and the ability to change network connectivity and audio settings, to name just a few types of access.

So it turns out that all those privacy advocates who warned that COVID contact tracing apps would be an easy way to breach people's privacy were totally right.


I've had a bit of trouble installing some crates on my machine because I didn't have a C compiler properly configured.

So I wrote a tutorial on how to install everything you need to get a C compiler working as part of your Rust toolchain on Windows.


`vd` (VisiData) is the tool I've been looking for for a long time.


Got tabular data that you want to view in the terminal? This is the way to do it.

