I've been a fan of Bruce Schneier for well over a decade. That includes following his blog (and before that the Crypto-Gram newsletter), as well as reading many of his books. One theme that he has returned to again and again is that there are a lot of bad security products. In particular, one segment of Applied Cryptography always stands out in my mind:
The simple-XOR algorithm is really an embarrassment; it's nothing more than a Vigenère polyalphabetic cipher. It's here only because of its prevalence in commercial software packages.And now we have our first CUDA project! Over the next few weeks, we'll use the massively parallel single-instruction multiple-thread streaming multiprocessors in the NVIDA GPU to implement one of the worst encryption algorithms since ROT13. With luck, this will provide the world with a fresh supply of snake oil products for Bruce to induct into the doghouse. I'm excited.
-- Applied Cryptography, p14
As usual, let's start with the code:
/*
* snake_oil1.cu
*
* CUDA program to do bad encryption
*
* compile with: nvcc -o snake_oil1 snake_oil1.cu
*
* resultsovercoffee@gmail.com
*/
#include <iostream>
#include <string>
#include <new>
#include <cstdlib>
using namespace std;
// function prototype for our CUDA kernel
__global__ void badcrypto(char *, char *, char *);
int main(int argc, char *argv[]) {
string in("cleartxt");
string key("password");
int size = key.size();
dim3 grid(1,1);
dim3 block(size);
//declare, allocate, and populate input & key on the device
char *d_in;
if (cudaMalloc(&d_in,size) != cudaSuccess){
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
if (cudaMemcpy(d_in,in.c_str(),size,cudaMemcpyHostToDevice)
!= cudaSuccess) {
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
char *d_key;
if (cudaMalloc(&d_key,size) != cudaSuccess){
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
if (cudaMemcpy(d_key,key.c_str(),size,cudaMemcpyHostToDevice)
!= cudaSuccess) {
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
// declare, allocate and zero output array on device
char *d_out;
if (cudaMalloc(&d_out,size) != cudaSuccess){
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
if (cudaMemset(d_out, 0 ,size) != cudaSuccess){
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
//launch kernel on device to generate results
badcrypto<<<grid,block>>>(d_in, d_out, d_key);
//declare and allocate an output array on the host
char *h_out = new (nothrow)char[size];
if (h_out == 0) {
cerr << "host allocation failed" << endl;
exit(EXIT_FAILURE);
}
//copy the device buffer back to the host
if (cudaMemcpy(h_out,d_out,size,cudaMemcpyDeviceToHost)
!= cudaSuccess) {
cerr << cudaGetErrorString(cudaGetLastError()) << endl;
exit(EXIT_FAILURE);
}
//print results
for (int i=0;i<size;i++) {
cout << (int)h_out[i] << " " ;
}
cout << endl;
//deallocate arrays
cudaFree(d_in);
cudaFree(d_key);
cudaFree(d_out);
delete[] h_out;
//bye
exit(EXIT_SUCCESS);
}
/*
* badcrypto
*
* kernel that "encrypts" data using xor. Don't do this.
*/
__global__ void badcrypto(char *in, char *out, char *key) {
out[threadIdx.x] = in[threadIdx.x] ^ key[threadIdx.x];
}
Wow, so much fail there. This should look vaguely like last week's exercise. We're defining two strings (the cleartex input in and the key) copying them to the device, doing a bitwise XOR, then copying the results back. Currently, this isn't going to be very useful to purveyors of bad software -- the strings are statically defined and the kernel is launched as a single thread block (limiting us to 512 threads on some GPUs). Additionally, the cleartext and key need to be the same length -- that's clearly not acceptable since without the ability to encrypt large amounts of data with a short key an ISV might accidentally implement a one-time-pad.
That's enough of a start. My plan for next few weeks is to first turn this into a usable app, then do some light optimization to better map it to GPU hardware. Given the lightweight nature of bitwise XOR, we probably won't see cosmic gains in performance, but it should be fun anyway.
No comments:
Post a Comment