<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-37780335</id><updated>2011-04-21T21:15:07.183-07:00</updated><title type='text'>Compression code</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://ipcomp.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37780335/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://ipcomp.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Ian Parker</name><uri>http://www.blogger.com/profile/04935165705298640950</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://photos1.blogger.com/blogger/6837/844/1600/IanParker.0.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-37780335.post-116440483273172857</id><published>2006-11-24T13:46:00.000-08:00</published><updated>2006-12-07T06:32:04.293-08:00</updated><title type='text'></title><content type='html'>I feel this should be viewed in the context of the Hutter prize and it is important to examine how this will affect AI. The program given simply compresses a series of integers, although thw program is eventually intended to provode for compression based on constructing a probability table.&lt;br /&gt;&lt;br /&gt;The arrangement is in order of decreasing probability. Your first word in a string is simply the mean frequency of every word. The list for the second word is the probabilities BEARING IN MIND THAT THE FIRST WORD IS ******. I was assuming that the list was prepared using DCA. OK something else could be, my rationalizations like always feminine singular is the way I think. DCA if applied will bring out gender differences with eigenvctors containg la, el, los, las.&lt;br /&gt;DCA of course talks the language of probability, it does not attempt to parse. It might implicity parse under certain circumstances, in all European languages, with the exception of English there is gender and agreement. DCA will pick this up and could be described as a partial parsing.&lt;br /&gt;&lt;br /&gt;A diagram of what is happenning is given below.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://photos1.blogger.com/x/blogger/6837/844/1600/227517/Decode.jpg"&gt;&lt;img style="CURSOR: hand" alt="" src="http://photos1.blogger.com/x/blogger/6837/844/400/322090/Decode.jpg" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is the guts of it There are a few algorithms in here which are not really relevant. You can compile it into whatever system you wish. There are some I/O routines which the user will have to write for himself. They are fairly trivial.&lt;br /&gt;&lt;br /&gt;/* This program is designed to achieve the ultimate in compression. THere are codes like the Huffing code that lose very little&lt;br /&gt;entropy. This provides for the ultimate. It is a well known mathematical fact that the number of ways of arranging objects in holes&lt;br /&gt;is N!/(Pi(n(i)!(N-Sigman(i))!) This assumes that each site is occupied by 1 and 1 only object.&lt;br /&gt;The problem then is to enumerate each individual configuration. If each configuration can be numbered compression has been achieved.&lt;br /&gt;Consider N!/(n!(N-n)!) If we can evaluate that expression the complete expression can be evaluated. For we can work out the&lt;br /&gt;configuration for n(0), We are now left with N-n(0) places. Then n(1), N-n(0) and so on. The algorithm used is the following:-&lt;br /&gt;If we take the n[0]th poaition of number n[0] and place it somewhere we know that the number of arrangements of the remaining&lt;br /&gt;n[0]-1 will be P!/((n[0]-1)!(P-1-n[0])!)&lt;br /&gt;This gives us a scheme of numbering. If we calculate Sigma(P!/((n[0]-1)!(P-1-n[0])!)) as ball -&gt; P position we have a numbering scheme.&lt;br /&gt;*/&lt;br /&gt;class Factorial{&lt;br /&gt;/* This class is the class of factorials. arr starts off with 1,2,3.....n It is created in this form by Factorial(int).&lt;br /&gt;divide divides by an integer. It does this by finding the Highest common factor using Euclid's algorithm, and divides&lt;br /&gt;repeatedly until the number you are divising by becomes equal to 1. If division is by a factorial the numbers are taken&lt;br /&gt;in turn. In this way we get N!/(n!(N-n)!) in is simplest form.&lt;br /&gt;*/&lt;br /&gt;public:&lt;br /&gt;int n;//The number of numbers in the factorial&lt;br /&gt;int *arr;&lt;br /&gt;Factorial(int);//sets up factorial. Initially the numbers are sequential&lt;br /&gt;void divide(Factorial);//This divides by another factorial. It does this to limit the size of the numbers.&lt;br /&gt;void divide(int);//This divides by a single integer&lt;br /&gt;};&lt;br /&gt;class Big{&lt;br /&gt;/* This is the Big integer class which ios the guts of the program.&lt;br /&gt;n is the number of I*4 wordsin the integer&lt;br /&gt;*arr is the array of numbers&lt;br /&gt;We need to be able to perform the basic arithmetrical operations and in addition needs to have a remainder when dividing by a big&lt;br /&gt;integer, Basically we use these remainders are the numbers of N!/(n!(N-n)!). To make division easier the big integer we are&lt;br /&gt;dividing by is presented as a series of I84 integers. We divide by each in turn.&lt;br /&gt;*/&lt;br /&gt;public:&lt;br /&gt;int n;&lt;br /&gt;int *arr;&lt;br /&gt;void add(int);//Add I*4 integer&lt;br /&gt;void add(Big);//Add another&lt;br /&gt;void sub(Big);//Subtract big integer&lt;br /&gt;void mul(int);//Multiply by I*4 integer&lt;br /&gt;void mul(Big);//Muliply by big integer. In normal form, not in the special form for division.&lt;br /&gt;int div(int);//Divide by I*4 inter&lt;br /&gt;Big div(Big);//Big integer division&lt;br /&gt;Big(int);//create a big integer of length n&lt;br /&gt;Big();//create a big integer without arrays&lt;br /&gt;Big(int*,int);//Create a big integer from the form for division&lt;br /&gt;Big(Factorial);//Convert a factorial into a big integer&lt;br /&gt;void wr(CFile*);//Write big integer onto file&lt;br /&gt;void rd(CFile*);//read big integer from file&lt;br /&gt;int bits();&lt;br /&gt;boolean operator&lt;(Big a);//These are simply comparison operators boolean operator&gt;(Big a);&lt;br /&gt;};&lt;br /&gt;class Compress{&lt;br /&gt;/* This class contains the complete compressed information - sufficient for loss free compression.&lt;br /&gt;*/&lt;br /&gt;public:&lt;br /&gt;int n,max;&lt;br /&gt;// n is the number of differnt numbers in the string.&lt;br /&gt;// max is the highest number in the string, not necessarily the same as the above.&lt;br /&gt;int *fact,*trans;&lt;br /&gt;// fact contains the number of numbers in each non zero category&lt;br /&gt;// trans is a table for translating each number into its value.&lt;br /&gt;Big *b;//The big integer&lt;br /&gt;void ecode(int*,int);//Encode&lt;br /&gt;void rd(CFile*);//read and write files&lt;br /&gt;void wr(CFile*);&lt;br /&gt;Compress();&lt;br /&gt;int* dcode(int* n);//decode&lt;br /&gt;};&lt;br /&gt;class Entry{&lt;br /&gt;public:&lt;br /&gt;CString s;&lt;br /&gt;int q;&lt;br /&gt;unsigned int ind;&lt;br /&gt;};&lt;br /&gt;class HashArray{&lt;br /&gt;public:&lt;br /&gt;int size;&lt;br /&gt;int last[3];&lt;br /&gt;int places,nplaces;&lt;br /&gt;int nvar;&lt;br /&gt;int pos,posa,mvar;&lt;br /&gt;int *len;&lt;br /&gt;unsigned short **mat;&lt;br /&gt;int **res;&lt;br /&gt;void stats(int*);&lt;br /&gt;int in;&lt;br /&gt;int ksiz;&lt;br /&gt;Entry *array;&lt;br /&gt;int put(char*);&lt;br /&gt;int get(char*);&lt;br /&gt;boolean IsMatch(char*);&lt;br /&gt;void Del(char *);&lt;br /&gt;HashArray(int);&lt;br /&gt;void rd(CFile*);&lt;br /&gt;void wr(CFile*);&lt;br /&gt;void sort(int*);&lt;br /&gt;int init(int);&lt;br /&gt;int combine();&lt;br /&gt;void Vect(int*,int);&lt;br /&gt;void del();&lt;br /&gt;};&lt;br /&gt;Big encode(int*,int,int);&lt;br /&gt;void decode(Big,int*,int,int);&lt;br /&gt;boolean IsLetter(unsigned char);&lt;br /&gt;double Gosper(int);//This is a routine that gives the log of the factorial without actually evaluating it. It calculates entropy.&lt;br /&gt;double kern(int);&lt;br /&gt;void Diag(int*,int);&lt;br /&gt;void Dev(double*,double*,int);&lt;br /&gt;double *VecMat(int *vec);&lt;br /&gt;int decomp(unsigned char*,int*,int);&lt;br /&gt;int compr(int*,unsigned char*,int);&lt;br /&gt;&lt;br /&gt;Factorial::Factorial(int m){//This constructs a factorial of size m&lt;br /&gt;n=m;&lt;br /&gt;arr=new int[n];&lt;br /&gt;for(int i=0;i&lt;n;i++)arr[i]=i+1; k="euclid_algorithm(arr[i],m);//Find" i="1;i&lt;n;i++){" j="0;" arr="new" n="0;" m="=1)return;//When"&gt;&gt;1;&lt;br /&gt;}&lt;br /&gt;j=j+(n-i-1)*32;&lt;br /&gt;return j;&lt;br /&gt;}&lt;br /&gt;return j;&lt;br /&gt;}&lt;br /&gt;Big::Big(Factorial f){//Create a big integer in divide form from a factorial&lt;br /&gt;unsigned int *ar,q;&lt;br /&gt;int mm=1+f.n/2;//Inconcievable resultant integer as big as this&lt;br /&gt;ar=new unsigned int[mm];&lt;br /&gt;n=0;&lt;br /&gt;ar[n]=1;&lt;br /&gt;for(int i=1;i&lt;f.n;i++){ q="mult(ar[n],j);//Multiplies" k="(int)arr;" i="n;" j="f.arr[i];" arr="new" m="n;" kk="(int)b.arr;" ii="b.n;"&gt;Write(&amp;m,sizeof(int));&lt;br /&gt;f-&gt;Write(arr+i,m*sizeof(int));&lt;br /&gt;}&lt;br /&gt;void Big::rd(CFile *f){//Read a big inter from file ff&lt;br /&gt;f-&gt;Read(&amp;n,sizeof(int));&lt;br /&gt;arr=new int[n];&lt;br /&gt;f-&gt;Read(arr,n*sizeof(int));&lt;br /&gt;}&lt;br /&gt;boolean Big::operator&lt;(Big a){//These two functions allow us to use inequalities in the way they are used for ordinary unsigned // integers in C++ and Java for(int i=0;i&lt;n;i++){&gt;(unsigned int)a.arr[i])return false;&lt;br /&gt;}&lt;br /&gt;return false;&lt;br /&gt;}&lt;br /&gt;boolean Big::operator&gt;(Big a){&lt;br /&gt;for(int i=0;i&lt;n;i++){&gt;(unsigned int)a.arr[i])return true;&lt;br /&gt;if((unsigned int)arr[i]&lt;(unsigned int)a.arr[i])return false; } return false; } Compress::Compress(){//Creates an empty compress class n=0; } int *Compress::dcode(int *qq){ /* This subroutine decompresses a string. It does this by first looking for the positions of 1s then 2s etc. It decodes each individual string. *qq on exit contains the length of the string. */ int len=0; int *arr; int i,j,k,m; Big *mul;// Contains the big integers for multiplication for(i=0;i&lt;n;i++)len=len+fact[i]; k="fact[n-i-2];" i="0;i&lt;n-1;i++){" j="fact[n-1];//The" arr="new" m="j+k;" mul="new" max="0;"&gt;max)max=fact[i];//Get longest string&lt;br /&gt;int *pos=new int[max];&lt;br /&gt;for(i=n-1;i&gt;0;i--){//Loop is in inverse order&lt;br /&gt;Big d=b[0].div(mul[i-1]);//Divide. Remainder represents the configuration&lt;br /&gt;int mm=fact[n-1-i];&lt;br /&gt;decode(d,pos,mm,d.n);//Get string&lt;br /&gt;k=m=0;&lt;br /&gt;for(j=0;j&lt;len;j++){ k="=pos[m]){"&gt;=mm)break;//are we at end&lt;br /&gt;}&lt;br /&gt;k++;&lt;br /&gt;}&lt;br /&gt;delete d.arr;&lt;br /&gt;delete mul[i-1].arr;&lt;br /&gt;}&lt;br /&gt;delete pos;&lt;br /&gt;pos=new int[n];&lt;br /&gt;zero(pos,n*sizeof(int));&lt;br /&gt;i=0;&lt;br /&gt;max=0;&lt;br /&gt;while(i==0){//pos contains the translation table&lt;br /&gt;m=trans[max++];&lt;br /&gt;if(m&lt;1)continue; i="1;" k="0;k&lt;n;k++)i=" class="" i="0;i&lt;len;i++){"&gt;Read(&amp;max,sizeof(int));&lt;br /&gt;f-&gt;Read(&amp;n,sizeof(int));&lt;br /&gt;fact=new int[n];&lt;br /&gt;trans=new int[max+1];&lt;br /&gt;f-&gt;Read(fact,n*sizeof(int));&lt;br /&gt;f-&gt;Read(trans,(max+1)*sizeof(int));&lt;br /&gt;b-&gt;rd(f);&lt;br /&gt;}&lt;br /&gt;void Compress::wr(CFile *f){//write class to file f&lt;br /&gt;f-&gt;Write(&amp;max,sizeof(int));&lt;br /&gt;f-&gt;Write(&amp;n,sizeof(int));&lt;br /&gt;f-&gt;Write(fact,n*sizeof(int));&lt;br /&gt;f-&gt;Write(trans,(max+1)*sizeof(int));&lt;br /&gt;b-&gt;wr(f);&lt;br /&gt;}&lt;br /&gt;void Compress::ecode(int *str,int len){&lt;br /&gt;/* This subroutine encodes a string of length len in *str. In order to facilitate decoding it takes the highest number first&lt;br /&gt;*/&lt;br /&gt;int i,j,k,m,nbits=0;&lt;br /&gt;Big *mul,*val;&lt;br /&gt;b=new Big[1];&lt;br /&gt;max=0;&lt;br /&gt;for(i=0;i&lt;len;i++){&gt;max)max=str[i];&lt;br /&gt;}&lt;br /&gt;fact=new int[max];//Get numbers of each number in string&lt;br /&gt;trans=new int[max+1];&lt;br /&gt;n=0;&lt;br /&gt;zero(fact,max*sizeof(int));&lt;br /&gt;zero(trans,(max+1)*sizeof(int));&lt;br /&gt;for(i=0;i&lt;len;i++){ k="fact[n-i-2];" i="0;i&lt;npos;i++){" j="fact[n-1];;" arr="b1.arr;" n="size;" m="j+k;" mul="new" size="(nbits+31)/32;//Total" ma="npos;" bb="encode(xx,ka,mul[i].n);" mm="n-i-1;" ia="=mm)xx[ka++]=ja;" ka="0;" ja="0;int" nbits="nbits+val[i].bits();//Get" val="new" xx="new"&gt;b2)){//Compare. When the number of positions is greater than the big integer we have just passed position&lt;br /&gt;b2.sub(b3);//Subtract b3&lt;br /&gt;b3.mul(j-1);//Calculate number of permutations for the rest&lt;br /&gt;b3.div(j-ma);&lt;br /&gt;j++;&lt;br /&gt;}&lt;br /&gt;pos[npos-1-i]=j-2;&lt;br /&gt;zero(b3.arr,dim*sizeof(int));&lt;br /&gt;ma--;&lt;br /&gt;}&lt;br /&gt;delete b3.arr;&lt;br /&gt;}&lt;br /&gt;Big encode(int *pos,int npos,int dim){&lt;br /&gt;/* This one encodes. It is the analogue of the one above.&lt;br /&gt;*/&lt;br /&gt;Big b2(dim);&lt;br /&gt;Big b3(dim);&lt;br /&gt;int ma=npos;&lt;br /&gt;for(int i=0;i&lt;npos;i++){ i="0;i&lt;arr[0];i++){" j="ma+1;j&lt;na;j++){//Calculate" n="arr[0];" m="arr[1];" str="new" na="pos[npos-1-i]+2;"&gt;j)str[i]++;&lt;br /&gt;}&lt;br /&gt;millisec();&lt;br /&gt;c.ecode(str,n);&lt;br /&gt;arr[0]=millisec();&lt;br /&gt;stra=c.dcode(&amp;k);&lt;br /&gt;arr[1]=millisec();&lt;br /&gt;int_put(arr,2);&lt;br /&gt;for(i=0;i&lt;n;i++){ i="0;i&lt;m-1;i++){" n="arr[0];" m="arr[1];" str="new" nbins="m;"&gt;0;i--){&lt;br /&gt;j=f.arr[i];&lt;br /&gt;if(j==1)continue;&lt;br /&gt;m=q.div(j);&lt;br /&gt;}&lt;br /&gt;zero(q.arr,q.n*sizeof(int));&lt;br /&gt;i=0;&lt;br /&gt;k=nbins;&lt;br /&gt;while(nbins&gt;1){&lt;br /&gt;q.mul(nbins);&lt;br /&gt;j=ranmod(nbins);&lt;br /&gt;q.add(j);&lt;br /&gt;str[i++]=j;&lt;br /&gt;for(int ii=0;ii&lt;k;ii++){&gt;if(bins[ii]==0)continue;&lt;br /&gt;if(j==0){&lt;br /&gt;j=ii;&lt;br /&gt;break;&lt;br /&gt;}&lt;br /&gt;j--;&lt;br /&gt;}&lt;br /&gt;bins[j]--;&lt;br /&gt;if(bins[j]==0)nbins--;&lt;br /&gt;}&lt;br /&gt;goto loop;&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;EUCLID'S ALGORITHM&lt;br /&gt;int euclid_algorithm(int i,int j){//This algorithm impliments euclids algorithm for integers i and j&lt;br /&gt;int k=1;&lt;br /&gt;while(k!=0){&lt;br /&gt;k=i%j;&lt;br /&gt;i=j;&lt;br /&gt;j=k;&lt;br /&gt;}&lt;br /&gt;return i;&lt;br /&gt;}&lt;br /&gt;bcopy (utility function)&lt;br /&gt;/* This subroutine copies one string onto another.&lt;br /&gt;len bytes are copied from x to y*/&lt;br /&gt;void bcopy(void *x,void *y,int len){&lt;br /&gt;int i=len/4;&lt;br /&gt;int j,k;&lt;br /&gt;_asm{&lt;br /&gt;mov esi,x&lt;br /&gt;mov edi,y&lt;br /&gt;mov ecx,i&lt;br /&gt;rep movsd&lt;br /&gt;mov j,edi&lt;br /&gt;mov k,esi&lt;br /&gt;}&lt;br /&gt;int *xx=(int*)x;&lt;br /&gt;int *yy=(int*)y;&lt;br /&gt;if((len&amp;3)==0)return;&lt;br /&gt;_asm{&lt;br /&gt;mov edi,j&lt;br /&gt;mov esi,k&lt;br /&gt;xor eax,eax&lt;br /&gt;mov ecx,len&lt;br /&gt;and ecx,3&lt;br /&gt;rep movsb&lt;br /&gt;}&lt;br /&gt;}&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37780335-116440483273172857?l=ipcomp.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://ipcomp.blogspot.com/feeds/116440483273172857/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37780335&amp;postID=116440483273172857' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37780335/posts/default/116440483273172857'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37780335/posts/default/116440483273172857'/><link rel='alternate' type='text/html' href='http://ipcomp.blogspot.com/2006/11/i-feel-this-should-be-viewed-in.html' title=''/><author><name>Ian Parker</name><uri>http://www.blogger.com/profile/04935165705298640950</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://photos1.blogger.com/blogger/6837/844/1600/IanParker.0.jpg'/></author><thr:total>2</thr:total></entry></feed>
