### Randomly not so random

In a random sequence, all sequences are equally likely, even not so random ones.

Random random = new Random(441287210); for(int i=0;i<10;i++) System.out.print(random.nextInt(10)+" "); }prints

1 1 1 1 1 1 1 1 1 1and

Random random = new Random(-6732303926L); for(int i=0;i<10;i++) System.out.println(random.nextInt(10)+" "); }prints

0 1 2 3 4 5 6 7 8 9Lastly

public static void main(String ... args) { System.out.println(randomString(-229985452)+' '+randomString(-147909649)); } public static String randomString(int seed) { Random rand = new Random(seed); StringBuilder sb = new StringBuilder(); for(int i=0;;i++) { int n = rand.nextInt(27); if (n == 0) break; sb.append((char) ('`' + n)); } return sb.toString(); }prints

hello world

How do you find out the appropriate numbers for the result?

ReplyDeleteLol. Actually the period is only like 4 billion. You can brute force pretty much any sequence match to see if it exists in O(4Billion*k) where k is the size of the k-mer.

ReplyDeleteIf you have an English dictionary you can index the whole thing into the rand generator in O(4Billion * log(DictionarySize)). What would be uber-cool is come up with a compression scheme that takes generates k-mers of a string, then matches against the entire Rand() sequence; then replace parts of the string with pairs of Rand() seeds and generator lengths.

Actually how many GB is the entire period? 2^32*32 bits (assuming the back end of your PRNG is hitting every number exactly once) == 16 GB.

ReplyDeleteMultiply that by a factor of 4 and add a smidge to overlap a bit from back to front; and it is 100GB to store the suffix array for the entire period.

This is sexy! I'm going to figure this out

ReplyDeleteFunny, however one should expect that for every 1E9:th seed, all the ten digits would be the same. Thus, it is likely that there are about 18446744073 such long seeds and "441287210" is just one of them...

ReplyDelete