Commit 5be8636faccb9d213de9498b2cf999e873091bbd

Removed orthogonal_sample.m (incorrect name for algorithm and needed only by SAVU)
  • orthogonal_sample.m 194 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  • Diff rendering mode:
  • inline
  • side by side

orthogonal_sample.m

1% orthogonal_sample.m - creates an orthogonal sample of 1:k*m in N dimensions
2%
3% USAGE:
4%
5% [S, m] = orthogonal_sample(N,k,n, [DEBUG])
6%
7% where the parameters are all scalar:
8%
9% N = number of dimensions
10% k = number of "large" subdivisions (subspaces) per dimension
11% n = number of samples to place in each subspace
12%
13% DEBUG: if this parameter exists, the results (projected into the first two dimensions
14% by discarding the other dimensions) will be plotted.
15%
16% Allowed ranges:
17%
18% N >= 1, N integer
19% k >= 1, k integer
20% n >= 1, n integer
21%
22% If k = 1, the algorithm reduces to Latin hypercube sampling.
23% If N = 1, the algorithm simply produces a random permutation of 1:k.
24%
25% Let m = n*k^(N-1) the number of bins for one variable in one subspace.
26% The total number of samples is always exactly k*m.
27% Each component of a sample can take on values 1, 2, ..., k*m.
28%
29% S (output) = samples; (k*m)-by-N matrix where each row is an N-tuple of numbers in 1:k*m.
30% m (output) = number of bins per parameter in one subspace.
31% (This is always n*k^(N-1), but is provided as output for convenience.)
32%
33% See e.g. Wikipedia for a description of both orthogonal and Latin hypercube sampling:
34% http://en.wikipedia.org/wiki/Latin_hypercube_sampling
35%
36% Briefly, this is a bit like designing the first stage of a Sudoku puzzle: each subspace must
37% have exactly the same number of samples, and no two samples may occur on the same hyperplane.
38%
39% Examples:
40%
41% N = 2 dimensions, k = 3 subspaces per axis, n = 1 sample per subspace. m will be n*k^(N-1) = 1 * 3^(2-1) = 3.
42% [S,m] = orthogonal_sample(2, 3, 1, 'show')
43%
44% Compare the previous example with this Latin hypercube that has 9 samples in total:
45% (We choose 9 because in the previous example, k*m = 3*3 = 9.)
46% [S,m] = orthogonal_sample(2, 1, 9, 'show')
47%
48% JJ 2010-09-23
49%
50function [S, m] = orthogonal_sample(N,k,n,DEBUG)
51 % Parameter parsing.
52 %
53 if nargin < 3 || N < 1 || k < 1 || n < 1
54 fprintf(1,'Invalid parameters. Please see "help orthogonal_sample" for instructions.\n');
55 S = [];
56 m = 0;
57 return;
58 end
59
60 if nargin > 3
61 debug_vis = 1;
62 else
63 debug_vis = 0;
64 end
65
66 % Bullet-proof the parameters.
67 if N - floor(N) ~= 0
68 fprintf(1,'orthogonal_sample.m: WARNING: non-integer N = %0.3g; discarding decimal part\n', N);
69 N = floor(N);
70 end
71 if k - floor(k) ~= 0
72 fprintf(1,'orthogonal_sample.m: WARNING: non-integer k = %0.3g; discarding decimal part\n', k);
73 k = floor(k);
74 end
75 if n - floor(n) ~= 0
76 fprintf(1,'orthogonal_sample.m: WARNING: non-integer n = %0.3g; discarding decimal part\n', n);
77 n = floor(n);
78 end
79
80 % Discussion.
81
82 % Proof that the following algorithm implements orthogonal sampling:
83 %
84 % * Orthogonal sampling has two properties: Latin hypercube sampling globally, and equal density in each subspace.
85 % * The independent index vector generation for each parameter guarantees the Latin hypercube property: some numbers will have been used, and removed from the index vectors, when the next subspace along the same hyperplane is reached. Thus, the same indices cannot be used again for any such subspace. This process continues until each index has been used exactly once.
86 % * The orthogonal sampling property is enforced by the fact that each subspace gets exactly one sample generated in one run of the loop. The total number of samples is, by design, divisible by the number of these subspaces. Therefore, each subspace will have the same sample density.
87
88 % Run time and memory cost:
89 %
90 % * Exactly k*m samples will be generated. This can be seen from the fact that there are k*m bins per parameter, and they all get filled by exactly one sample.
91 % * Thus, runtime is in O(k*m) = O( k * n*k^(N-1) ) = O( n*k^N ). (This isn't as bad as it looks. All it's saying is that a linear number of bins gets filled. This is much less than the total number of bins (k*m)^N - which is why orthogonal sampling is needed in the first place. Orthogonal sampling gives us a reduction in sample count by the factor (k*m)^(N-1).)
92 % * Required memory for the final result is (k*m)*N reals (plus some overhead), where the N comes from the fact that each N-tuple generated has N elements. Note that the index vectors also use up k*m*N reals in total (k*N vectors, each with m elements). Thus the memory cost is 2*k*m*N reals plus overhead.
93 % * Note that using a more complicated implementation that frees the elements of the index vectors as they are used up probably wouldn't help with the memory usage, because many vector implementations never decrease their storage space even if elements are deleted.
94 % * In other programming languages, one might work around this by using linked lists instead of vectors, and arranging the memory allocations for the elements in a very special way (i.e. such that the last ones in memory always get deleted first). By using a linked list for the results, too, and allocating them over the deleted elements of the index vectors (since they shrink at exactly the same rate the results grow), one might be able to bring down the memory usage to k*m*N plus overhead.
95 % * Finally, note that in practical situations N, k and m are usually small, so the factor of 2 doesn't really matter.
96
97 % Algorithm.
98
99 % Find necessary number of bins per subspace so that equal nonzero density is possible.
100 % A brief analysis shows that in order to exactly fill up all k*m bins for one variable,
101 % we must have k*m = n*k^N, i.e...
102 m = n*k^(N-1);
103
104 % Create index vectors for each subspace for each parameter. (There are k*N of these.)
105 I = zeros(N,k,m);
106 for i = 1:N
107 for j = 1:k
108 % We create random permutations of 1:m here so that in the sampling loop
109 % we may simply pick the first element from each vector. This is conceptually
110 % somewhat clearer than the alternative of picking a random element from an ordered vector.
111 %
112 % Note that the permutation property ensures that each integer in 1:m only appears once in each vector.
113 I(i,j,:) = randperm(m);
114 end
115 end
116
117 % Start with an empty result set. We will place the generated samples here.
118 S = [];
119
120 L = k*m; % number of samples still waiting for placement
121 Ns = k^N; % number of subspaces in total (cartesian product of k subspaces per axis in N dimensions)
122 while L > 0
123 % Loop over all subspaces, placing one sample in each.
124 for j = 1:Ns % index subspaces linearly
125 % Find, in each dimension, which subspace we are in.
126 pj = zeros(1,N); % multi-index (vector containing an index in each dimension) for this subspace
127 % Simple example: (N,k,n) = (2,3,1) --> pj = 1 1, 2 1, 3 1, 1 2, 2 2, 3 2, 1 3, 2 3, 3 3
128 pj = 1 + mod( floor( (j-1) ./ k.^(0:N-1) ), k );
129
130 % Construct one sample point.
131 % Note: we must use a for loop because ":" doesn't allow to specify two indices as the same.
132 % FIXME: better ways to do this?
133 %
134 s = zeros(1,N);
135 for i = 1:N
136 s(i) = I(i, pj(i), 1); % grab first element in the corresponding index vector
137 I(i,pj(i),:) = cat(3, I(i,pj(i),2:end), -1); % remove the used element, bump others toward the front
138 end
139
140 % Now s contains a sample from (1:m)^N. By its construction,
141 % the sample conforms to the Latin hypercube requirement.
142
143 % Compute base index along each dimension: the first element of the current subspace
144 % is at the multi-index (a+1) (where the +1 is scalar, added to each element of a)
145 %
146 a = (pj-1)*m;
147
148 % Add the new sample to the result set.
149 S = [ S ; a+s ];
150 end
151
152 % Note that we placed exactly Ns samples during the for loop.
153 L = L - Ns;
154 end
155
156 % Result visualization (for debug and illustrative purposes)
157 %
158 if debug_vis && N > 1
159 figure(1);
160 clf;
161
162 % Plot the picked points
163 plot(S(:,1), S(:,2),'bo');
164 axmax = k*m + 1;
165
166 hold on;
167
168 % Mark the subspaces onto the figure
169 for j = 1:k-1
170 xy = 0.5 + j*m;
171 plot( [xy, xy], [0.5, axmax - 0.5], 'k', 'LineWidth', 2.0 );
172 plot( [0.5, axmax - 0.5], [xy, xy], 'k', 'LineWidth', 2.0 );
173 end
174
175 % Mark bins
176 for j = 1:(k*m)-1
177 xy = 0.5 + j;
178 plot( [xy, xy], [0.5, axmax - 0.5], 'k');
179 plot( [0.5, axmax - 0.5], [xy, xy], 'k');
180 end
181
182 % Make a box around the area
183 plot( [0.5, axmax - 0.5], [0.5, 0.5], 'k', 'LineWidth', 2.0 );
184 plot( [0.5, axmax - 0.5], [axmax - 0.5, axmax - 0.5], 'k', 'LineWidth', 2.0 );
185 plot( [0.5, 0.5], [0.5, axmax - 0.5], 'k', 'LineWidth', 2.0 );
186 plot( [axmax - 0.5, axmax - 0.5], [0.5, axmax - 0.5], 'k', 'LineWidth', 2.0 );
187
188 hold off;
189
190 % Set the axes so that the extreme indices just fit into the view
191 axis( [0.5 axmax-0.5 0.5 axmax-0.5 ] );
192
193 end
194end