Commit a4d184bf authored by Karsten Loesing

Store raw descriptors as byte[], offset, and length.

Prior to this commit we read raw descriptor bytes from disk, split
them into one byte[] per contained descriptor, and stored those
copies together with descriptors.  We further copied descriptor
parts, like signatures or status entries, and stored those copies as
well.

Overall, we temporarily required up to 3 times the size of descriptor
files just to store raw descriptor contents: 1) the entire descriptor
file read to memory, 2) copies of all contained descriptors, and 3)
copies of contained descriptor parts.  After moving on to the next
descriptor file, 1) was freed, but 2) and 3) remained in memory.  This
was rather wasteful.

With this commit we store raw descriptors as a reference to the byte[]
containing the entire descriptor file plus offset and length of the
part containing one descriptor.  Similarly, we store raw descriptor
parts as a reference to the full descriptor plus offset and length of
the descriptor part.  This saves a lot of memory, and it avoids
unnecessary array copying.
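
As a rough illustration of the reference-plus-offset-and-length idea
(not code from this commit), the sketch below keeps one shared byte[]
and hands out views into it.  The class RawSlice and its methods are
hypothetical names made up for this example, and it addresses parts
relative to the enclosing slice, which is an assumption of the sketch
rather than what DescriptorImpl does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical view into a shared byte[]; not a class from the library. */
final class RawSlice {

  private final byte[] fileBytes; // shared with all other slices, never copied
  private final int offset;       // index of the first byte of this slice
  private final int length;       // number of bytes in this slice

  RawSlice(byte[] fileBytes, int offset, int length) {
    if (offset < 0 || length < 0 || offset + length > fileBytes.length) {
      throw new IndexOutOfBoundsException("offset=" + offset
          + " length=" + length + " array length=" + fileBytes.length);
    }
    this.fileBytes = fileBytes;
    this.offset = offset;
    this.length = length;
  }

  /** Returns a part of this slice that still shares the same array. */
  RawSlice part(int relativeOffset, int partLength) {
    if (relativeOffset < 0 || partLength < 0
        || relativeOffset + partLength > this.length) {
      throw new IndexOutOfBoundsException("relativeOffset=" + relativeOffset
          + " partLength=" + partLength + " length=" + this.length);
    }
    return new RawSlice(this.fileBytes, this.offset + relativeOffset,
        partLength);
  }

  /** Copies the bytes only when a caller explicitly asks for them. */
  byte[] copyBytes() {
    return Arrays.copyOfRange(this.fileBytes, this.offset,
        this.offset + this.length);
  }

  /** Decodes the slice without creating an intermediate byte[] copy. */
  String asAscii() {
    return new String(this.fileBytes, this.offset, this.length,
        StandardCharsets.US_ASCII);
  }
}
```

With such a view, a file containing two descriptors needs one array
and two small objects holding two ints each, rather than one array
plus two copies.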

This change is also a step towards not storing raw descriptor contents
in memory at all, but instead leaving contents on disk and accessing
parts as needed.  However, this commit does not take that step yet.

The original purpose of this commit was to prepare switching from the
platform's default charset to UTF-8 for #21932.  The idea was to
reduce access to DescriptorImpl#rawDescriptorBytes and add all methods
working on those bytes, including converting them to a String, to
DescriptorImpl.  This commit prepares that switch, yet it does not
take that step, either.  Switching to UTF-8 is mildly
backward-incompatible, so it'll have to wait until 2.0.0.
However, switching will be much easier based on the changes in this
commit.
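
To make the charset concern concrete, here is a small sketch (not
taken from this commit) contrasting decoding with the platform's
default charset and decoding with a fixed charset.  The class
CharsetSketch and its method names are hypothetical; only the
java.lang.String and java.nio.charset calls are standard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the library. */
final class CharsetSketch {

  /** Decodes with whatever charset this JVM defaults to. */
  static String decodeWithDefault(byte[] raw, int offset, int length) {
    return new String(raw, offset, length, Charset.defaultCharset());
  }

  /** Decodes with UTF-8 regardless of the platform. */
  static String decodeUtf8(byte[] raw, int offset, int length) {
    return new String(raw, offset, length, StandardCharsets.UTF_8);
  }
}
```

For pure ASCII input both methods return the same String.  They can
differ as soon as the input contains bytes outside the ASCII range,
which is why the result of the first method depends on where the code
runs.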

Many of the changes in this commit are interdependent, which makes it
difficult to split up this commit with reasonable effort.  Still, in
order to facilitate reviews, here is an explanation of changes made in
this commit from top to bottom:

Move all code for processing raw descriptor bytes, that is, a)
detecting the descriptor type, b) finding descriptor starts and ends,
and c) invoking the right DescriptorImpl subclass constructors, from
DescriptorImpl and its subclasses over to DescriptorParserImpl.

Include offset and length in the constructors of DescriptorImpl and
most of its subclasses.

Refer to directory and network status parts in RelayDirectoryImpl and
NetworkStatusImpl and its subclasses by offset and length rather than
passing copies of raw descriptors.

Provide two overloaded methods DescriptorImpl#newScanner() that
internally handle the byte[]-to-String conversion rather than leaving
this task to all DescriptorImpl subclasses.
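
A minimal sketch of such a scanner factory, written as a standalone
static method rather than as the actual DescriptorImpl#newScanner():
the class ScannerSketch, its method names, and the choice of US-ASCII
are assumptions for this example.  The commit itself still uses the
platform's default charset at this point.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical standalone variant; not the method from this commit. */
final class ScannerSketch {

  /** Creates a Scanner over a sub-range of raw without copying bytes. */
  static Scanner newScanner(byte[] raw, int offset, int length) {
    /* Assumption: a fixed charset, to keep this example reproducible. */
    return new Scanner(new ByteArrayInputStream(raw, offset, length),
        "US-ASCII");
  }

  /** Collects the newline-separated lines found in the given range. */
  static List<String> lines(byte[] raw, int offset, int length) {
    List<String> result = new ArrayList<>();
    try (Scanner scanner = newScanner(raw, offset, length)
        .useDelimiter("\n")) {
      while (scanner.hasNext()) {
        result.add(scanner.next());
      }
    }
    return result;
  }
}
```

Callers only pass offset and length, so the byte[]-to-String
conversion lives in a single place and can later be changed there.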

In DescriptorImpl, rather than storing a copy of raw descriptor bytes
per descriptor, store a reference to a potentially larger byte[],
containing all descriptors read from a given file, together with
offset and length.

Add various methods to DescriptorImpl that give access to raw
descriptor bytes and that internally handle issues like unified
character encoding.

Include an XXX21932 tag in all places where byte[] is currently
converted to String using the platform's default charset.

Update existing methods in DescriptorImpl to only access
rawDescriptorBytes within offset and offset + length.

In classes referenced from DescriptorImpl subclasses, like
DirSourceEntryImpl and NetworkStatusEntryImpl, rather than storing a
copy of raw descriptor bytes, store a reference to the parent
DescriptorImpl instance together with offset and length.

Change raw descriptor bytes in ExitListEntryImpl into a String,
because the byte[] we stored there was never read from disk but
generated by ourselves by calling String#getBytes() with the
platform's default charset.  We also never used raw bytes in
ExitListEntryImpl anyway.  Admittedly, we could use offset and length
there, too, but the amount of saved memory is likely not worth the
necessary code changes.

Remove redundant zero-length checks from DescriptorImpl subclasses
including ExitListImpl, NetworkStatusImpl, and RelayDirectoryImpl.
These checks are redundant, because we already performed the same
checks in DescriptorImpl#countKeys().

Move commonly used helper methods for finding the first index of a
keyword or splitting descriptors by keyword from DescriptorImpl
subclasses, like NetworkStatusImpl and RelayDirectoryImpl, to
DescriptorImpl.
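
The splitting helper can be pictured roughly as follows.  This is a
simplified sketch, not the splitByKey() method from this commit: the
class SplitSketch and the method splitByKeyword are hypothetical, the
keyword is a plain String rather than a Key enum value, only the
"keyword followed by a space" form is recognized, and trailing
newlines are not truncated.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified splitter; not the method from this commit. */
final class SplitSketch {

  /**
   * Returns {offset, length} pairs for the parts of the given range that
   * start at a line beginning with keyword followed by a space.  Bytes
   * before the first such line become part of the first pair.
   */
  static List<int[]> splitByKeyword(byte[] raw, int offset, int length,
      String keyword) {
    List<int[]> parts = new ArrayList<>();
    /* US-ASCII decodes one byte to one char, so String indexes can be
     * used as byte offsets relative to the start of the range. */
    String ascii = new String(raw, offset, length,
        StandardCharsets.US_ASCII);
    String marker = "\n" + keyword + " ";
    int from = 0;
    while (from < length) {
      int next = ascii.indexOf(marker, from);
      /* The next part starts right after the newline, or the range ends. */
      int to = next < 0 ? length : next + 1;
      parts.add(new int[] { offset + from, to - from });
      from = to;
    }
    return parts;
  }
}
```

The result contains only int pairs, so splitting a file into
descriptors, or a descriptor into entries, does not copy any
descriptor bytes.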

In test classes, replace the numerous invocations of DescriptorImpl
subclass constructors with local buildSomething() methods, so that
future changes to constructor signatures won't produce a diff as long
as this one.
parent fadcaa4b
# Changes in version 1.8.0 - ??
* Medium changes
- Store raw descriptor contents as offset and length into a
referenced byte[], rather than copying contents into a separate
byte[] per descriptor.
* Minor changes
- Turn keyword strings into enums and use the appropriate enum sets
and maps to avoid repeating string literals and to use more speedy
......
......@@ -5,34 +5,14 @@ package org.torproject.descriptor.impl;
import org.torproject.descriptor.BridgeExtraInfoDescriptor;
import org.torproject.descriptor.DescriptorParseException;
import org.torproject.descriptor.ExtraInfoDescriptor;
import java.util.ArrayList;
import java.util.List;
public class BridgeExtraInfoDescriptorImpl
extends ExtraInfoDescriptorImpl implements BridgeExtraInfoDescriptor {
protected static List<ExtraInfoDescriptor> parseDescriptors(
byte[] descriptorsBytes, boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
List<ExtraInfoDescriptor> parsedDescriptors = new ArrayList<>();
List<byte[]> splitDescriptorsBytes =
DescriptorImpl.splitRawDescriptorBytes(descriptorsBytes,
Key.EXTRA_INFO.keyword + SP);
for (byte[] descriptorBytes : splitDescriptorsBytes) {
ExtraInfoDescriptor parsedDescriptor =
new BridgeExtraInfoDescriptorImpl(descriptorBytes,
failUnrecognizedDescriptorLines);
parsedDescriptors.add(parsedDescriptor);
}
return parsedDescriptors;
}
protected BridgeExtraInfoDescriptorImpl(byte[] descriptorBytes,
boolean failUnrecognizedDescriptorLines)
int[] offsetAndLimit, boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
super(descriptorBytes, failUnrecognizedDescriptorLines);
super(descriptorBytes, offsetAndLimit, failUnrecognizedDescriptorLines);
}
}
......@@ -18,10 +18,11 @@ import java.util.TimeZone;
public class BridgeNetworkStatusImpl extends NetworkStatusImpl
implements BridgeNetworkStatus {
protected BridgeNetworkStatusImpl(byte[] statusBytes,
String fileName, boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
super(statusBytes, failUnrecognizedDescriptorLines, false, false);
protected BridgeNetworkStatusImpl(byte[] rawDescriptorBytes,
int[] offsetAndLength, String fileName,
boolean failUnrecognizedDescriptorLines) throws DescriptorParseException {
super(rawDescriptorBytes, offsetAndLength, failUnrecognizedDescriptorLines,
false, false);
this.setPublishedMillisFromFileName(fileName);
}
......@@ -55,7 +56,7 @@ public class BridgeNetworkStatusImpl extends NetworkStatusImpl
}
}
protected void parseHeader(byte[] headerBytes)
protected void parseHeader(int offset, int length)
throws DescriptorParseException {
/* Initialize flag-thresholds values here for the case that the status
* doesn't contain those values. Initializing them in the constructor
......@@ -71,7 +72,7 @@ public class BridgeNetworkStatusImpl extends NetworkStatusImpl
this.enoughMtbfInfo = -1;
this.ignoringAdvertisedBws = -1;
Scanner scanner = new Scanner(new String(headerBytes)).useDelimiter(NL);
Scanner scanner = this.newScanner(offset, length).useDelimiter(NL);
while (scanner.hasNext()) {
String line = scanner.next();
String[] parts = line.split("[ \t]+");
......@@ -154,19 +155,19 @@ public class BridgeNetworkStatusImpl extends NetworkStatusImpl
}
}
protected void parseDirSource(byte[] dirSourceBytes)
protected void parseDirSource(int offset, int length)
throws DescriptorParseException {
throw new DescriptorParseException("No directory source expected in "
+ "bridge network status.");
}
protected void parseFooter(byte[] footerBytes)
protected void parseFooter(int offset, int length)
throws DescriptorParseException {
throw new DescriptorParseException("No directory footer expected in "
+ "bridge network status.");
}
protected void parseDirectorySignature(byte[] directorySignatureBytes)
protected void parseDirectorySignature(int offset, int length)
throws DescriptorParseException {
throw new DescriptorParseException("No directory signature expected "
+ "in bridge network status.");
......
......@@ -6,9 +6,7 @@ package org.torproject.descriptor.impl;
import org.torproject.descriptor.BridgePoolAssignment;
import org.torproject.descriptor.DescriptorParseException;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;
......@@ -16,26 +14,11 @@ import java.util.TreeMap;
public class BridgePoolAssignmentImpl extends DescriptorImpl
implements BridgePoolAssignment {
protected static List<BridgePoolAssignment> parseDescriptors(
byte[] descriptorsBytes, boolean failUnrecognizedDescriptorLines)
protected BridgePoolAssignmentImpl(byte[] rawDescriptorBytes,
int[] offsetAndlength, boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
List<BridgePoolAssignment> parsedDescriptors = new ArrayList<>();
List<byte[]> splitDescriptorsBytes =
DescriptorImpl.splitRawDescriptorBytes(descriptorsBytes,
Key.BRIDGE_POOL_ASSIGNMENT.keyword + SP);
for (byte[] descriptorBytes : splitDescriptorsBytes) {
BridgePoolAssignment parsedDescriptor =
new BridgePoolAssignmentImpl(descriptorBytes,
failUnrecognizedDescriptorLines);
parsedDescriptors.add(parsedDescriptor);
}
return parsedDescriptors;
}
protected BridgePoolAssignmentImpl(byte[] descriptorBytes,
boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
super(descriptorBytes, failUnrecognizedDescriptorLines, false);
super(rawDescriptorBytes, offsetAndlength, failUnrecognizedDescriptorLines,
false);
this.parseDescriptorBytes();
this.checkExactlyOnceKeys(EnumSet.of(Key.BRIDGE_POOL_ASSIGNMENT));
this.checkFirstKey(Key.BRIDGE_POOL_ASSIGNMENT);
......@@ -44,8 +27,7 @@ public class BridgePoolAssignmentImpl extends DescriptorImpl
}
private void parseDescriptorBytes() throws DescriptorParseException {
Scanner scanner = new Scanner(new String(this.rawDescriptorBytes))
.useDelimiter(NL);
Scanner scanner = this.newScanner().useDelimiter(NL);
while (scanner.hasNext()) {
String line = scanner.next();
if (line.startsWith(Key.BRIDGE_POOL_ASSIGNMENT.keyword + SP)) {
......
......@@ -5,34 +5,14 @@ package org.torproject.descriptor.impl;
import org.torproject.descriptor.BridgeServerDescriptor;
import org.torproject.descriptor.DescriptorParseException;
import org.torproject.descriptor.ServerDescriptor;
import java.util.ArrayList;
import java.util.List;
public class BridgeServerDescriptorImpl extends ServerDescriptorImpl
implements BridgeServerDescriptor {
protected static List<ServerDescriptor> parseDescriptors(
byte[] descriptorsBytes, boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
List<ServerDescriptor> parsedDescriptors = new ArrayList<>();
List<byte[]> splitDescriptorsBytes =
DescriptorImpl.splitRawDescriptorBytes(descriptorsBytes,
Key.ROUTER.keyword + SP);
for (byte[] descriptorBytes : splitDescriptorsBytes) {
ServerDescriptor parsedDescriptor =
new BridgeServerDescriptorImpl(descriptorBytes,
failUnrecognizedDescriptorLines);
parsedDescriptors.add(parsedDescriptor);
}
return parsedDescriptors;
}
protected BridgeServerDescriptorImpl(byte[] descriptorBytes,
boolean failUnrecognizedDescriptorLines)
protected BridgeServerDescriptorImpl(byte[] rawDescriptorBytes,
int[] offsetAndLength, boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
super(descriptorBytes, failUnrecognizedDescriptorLines);
super(rawDescriptorBytes, offsetAndLength, failUnrecognizedDescriptorLines);
}
}
......@@ -6,7 +6,7 @@ package org.torproject.descriptor.impl;
import org.torproject.descriptor.Descriptor;
import org.torproject.descriptor.DescriptorParseException;
import java.io.UnsupportedEncodingException;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
......@@ -25,162 +25,146 @@ public abstract class DescriptorImpl implements Descriptor {
public static final String SP = " ";
protected static List<Descriptor> parseDescriptors(
byte[] rawDescriptorBytes, String fileName,
boolean failUnrecognizedDescriptorLines)
throws DescriptorParseException {
List<Descriptor> parsedDescriptors = new ArrayList<>();
if (rawDescriptorBytes == null) {
return parsedDescriptors;
protected byte[] rawDescriptorBytes;
/**
* The index of the first byte of this descriptor in
* {@link #rawDescriptorBytes} which may contain more than just one
* descriptor.
*/
protected int offset;
/**
* The number of bytes of this descriptor in {@link #rawDescriptorBytes} which
* may contain more than just one descriptor.
*/
protected int length;
/**
* Returns a <emph>copy</emph> of the full raw descriptor bytes.
*
* <p>If possible, subclasses should avoid retrieving raw descriptor bytes and
* converting them to a String themselves and instead rely on
* {@link #newScanner()} and related methods to parse the descriptor.</p>
*
* @return Copy of the full raw descriptor bytes.
*/
@Override
public byte[] getRawDescriptorBytes() {
return this.getRawDescriptorBytes(this.offset, this.length);
}
/**
* Returns a <emph>copy</emph> of raw descriptor bytes starting at
* <code>offset</code> and containing <code>length</code> bytes.
*
* <p>If possible, subclasses should avoid retrieving raw descriptor bytes and
* converting them to a String themselves and instead rely on
* {@link #newScanner()} and related methods to parse the descriptor.</p>
*
* @param offset The index of the first byte to include.
* @param length The number of bytes to include.
* @return Copy of the given raw descriptor bytes.
*/
protected byte[] getRawDescriptorBytes(int offset, int length) {
if (offset < this.offset || offset + length > this.offset + this.length
|| length < 0) {
throw new IndexOutOfBoundsException("offset=" + offset + " length="
+ length + " this.offset=" + this.offset + " this.length="
+ this.length);
}
byte[] result = new byte[length];
System.arraycopy(this.rawDescriptorBytes, offset, result, 0, length);
return result;
}
/**
* Returns a new {@link Scanner} for parsing the full raw descriptor
* using the platform's default charset.
*
* @return Scanner for the full raw descriptor bytes.
*/
protected Scanner newScanner() {
return this.newScanner(this.offset, this.length);
}
/**
* Returns a new {@link Scanner} for parsing the raw descriptor starting at
* byte <code>offset</code> containing <code>length</code> bytes using the
* platform's default charset.
*
* @param offset The index of the first byte to parse.
* @param length The number of bytes to parse.
* @return Scanner for the given raw descriptor bytes.
*/
protected Scanner newScanner(int offset, int length) {
/* XXX21932 */
return new Scanner(new ByteArrayInputStream(this.rawDescriptorBytes, offset,
length));
}
/**
* Returns the index within the raw descriptor of the first occurrence of the
* given <code>key</code>, or <code>-1</code> if the key is not contained.
*
* @param key Key to search for.
* @return Index of the first occurrence, or -1.
*/
protected int findFirstIndexOfKey(Key key) {
String ascii = new String(this.rawDescriptorBytes, this.offset, this.length,
StandardCharsets.US_ASCII);
if (ascii.startsWith(key.keyword + SP)
|| ascii.startsWith(key.keyword + NL)) {
return this.offset;
}
byte[] first100Chars = new byte[Math.min(100,
rawDescriptorBytes.length)];
System.arraycopy(rawDescriptorBytes, 0, first100Chars, 0,
first100Chars.length);
String firstLines = new String(first100Chars);
if (firstLines.startsWith("@type network-status-consensus-3 1.")
|| firstLines.startsWith(
"@type network-status-microdesc-consensus-3 1.")
|| ((firstLines.startsWith(
Key.NETWORK_STATUS_VERSION.keyword + SP + "3")
|| firstLines.contains(
NL + Key.NETWORK_STATUS_VERSION.keyword + SP + "3"))
&& firstLines.contains(
NL + Key.VOTE_STATUS.keyword + SP + "consensus" + NL))) {
parsedDescriptors.addAll(RelayNetworkStatusConsensusImpl
.parseConsensuses(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type network-status-vote-3 1.")
|| ((firstLines.startsWith(
Key.NETWORK_STATUS_VERSION.keyword + SP + "3" + NL)
|| firstLines.contains(
NL + Key.NETWORK_STATUS_VERSION.keyword + SP + "3" + NL))
&& firstLines.contains(
NL + Key.VOTE_STATUS.keyword + SP + "vote" + NL))) {
parsedDescriptors.addAll(RelayNetworkStatusVoteImpl
.parseVotes(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type bridge-network-status 1.")
|| firstLines.startsWith(Key.R.keyword + SP)) {
parsedDescriptors.add(new BridgeNetworkStatusImpl(
rawDescriptorBytes, fileName, failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type bridge-server-descriptor 1.")) {
parsedDescriptors.addAll(BridgeServerDescriptorImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type server-descriptor 1.")
|| firstLines.startsWith(Key.ROUTER.keyword + SP)
|| firstLines.contains(NL + Key.ROUTER.keyword + SP)) {
parsedDescriptors.addAll(RelayServerDescriptorImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type bridge-extra-info 1.")) {
parsedDescriptors.addAll(BridgeExtraInfoDescriptorImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type extra-info 1.")
|| firstLines.startsWith(Key.EXTRA_INFO.keyword + SP)
|| firstLines.contains(NL + Key.EXTRA_INFO.keyword + SP)) {
parsedDescriptors.addAll(RelayExtraInfoDescriptorImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type microdescriptor 1.")
|| firstLines.startsWith(Key.ONION_KEY.keyword + NL)
|| firstLines.contains(NL + Key.ONION_KEY.keyword + NL)) {
parsedDescriptors.addAll(MicrodescriptorImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type bridge-pool-assignment 1.")
|| firstLines.startsWith(Key.BRIDGE_POOL_ASSIGNMENT.keyword + SP)
|| firstLines.contains(NL + Key.BRIDGE_POOL_ASSIGNMENT.keyword + SP)) {
parsedDescriptors.addAll(BridgePoolAssignmentImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type dir-key-certificate-3 1.")
|| firstLines.startsWith(Key.DIR_KEY_CERTIFICATE_VERSION.keyword + SP)
|| firstLines.contains(
NL + Key.DIR_KEY_CERTIFICATE_VERSION.keyword + SP)) {
parsedDescriptors.addAll(DirectoryKeyCertificateImpl
.parseDescriptors(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type tordnsel 1.")
|| firstLines.startsWith("ExitNode" + SP)
|| firstLines.contains(NL + "ExitNode" + SP)) {
parsedDescriptors.add(new ExitListImpl(rawDescriptorBytes, fileName,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type network-status-2 1.")
|| firstLines.startsWith(
Key.NETWORK_STATUS_VERSION.keyword + SP + "2" + NL)
|| firstLines.contains(
NL + Key.NETWORK_STATUS_VERSION.keyword + SP + "2" + NL)) {
parsedDescriptors.add(new RelayNetworkStatusImpl(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type directory 1.")
|| firstLines.startsWith(Key.SIGNED_DIRECTORY.keyword + NL)
|| firstLines.contains(NL + Key.SIGNED_DIRECTORY.keyword + NL)) {
parsedDescriptors.add(new RelayDirectoryImpl(rawDescriptorBytes,
failUnrecognizedDescriptorLines));
} else if (firstLines.startsWith("@type torperf 1.")) {
parsedDescriptors.addAll(TorperfResultImpl.parseTorperfResults(
rawDescriptorBytes, failUnrecognizedDescriptorLines));
int keywordIndex = ascii.indexOf(NL + key.keyword + SP);
if (keywordIndex < 0) {
keywordIndex = ascii.indexOf(NL + key.keyword + NL);
}
if (keywordIndex < 0) {
return -1;
} else {
throw new DescriptorParseException("Could not detect descriptor "
+ "type in descriptor starting with '" + firstLines + "'.");
return this.offset + keywordIndex + 1;
}
return parsedDescriptors;
}
protected static List<byte[]> splitRawDescriptorBytes(
byte[] rawDescriptorBytes, String startToken) {
List<byte[]> rawDescriptors = new ArrayList<>();
String splitToken = NL + startToken;
String ascii;
try {
ascii = new String(rawDescriptorBytes, "US-ASCII");
} catch (UnsupportedEncodingException e) {
return rawDescriptors;
}
int endAllDescriptors = rawDescriptorBytes.length;
int startAnnotations = 0;
boolean containsAnnotations = ascii.startsWith("@")
|| ascii.contains(NL + "@");
while (startAnnotations < endAllDescriptors) {
int startDescriptor;
if (ascii.indexOf(startToken, startAnnotations) == 0) {
startDescriptor = startAnnotations;
} else {
startDescriptor = ascii.indexOf(splitToken, startAnnotations - 1);
if (startDescriptor < 0) {
break;
} else {
startDescriptor += 1;
}
}
int endDescriptor = -1;
if (containsAnnotations) {
endDescriptor = ascii.indexOf(NL + "@", startDescriptor);
/**
* Returns a list of two-element arrays containing offsets and lengths of
* descriptors starting with the given <code>key</code> in the raw descriptor
* starting at byte <code>offset</code> containing <code>length</code> bytes.
*
* @param key Key to search for.
* @param offset The index of the first byte to split.
* @param length The number of bytes to split.
* @param truncateTrailingNewlines Whether trailing newlines shall be
* truncated.
* @return List of two-element arrays containing offsets and lengths.
*/
protected List<int[]> splitByKey(Key key, int offset, int length,
boolean truncateTrailingNewlines) {
List<int[]> splitParts = new ArrayList<>();
String ascii = new String(this.rawDescriptorBytes, offset, length,
StandardCharsets.US_ASCII);
int from = 0;
while (from < length) {
int to = ascii.indexOf(NL + key.keyword + SP, from);
if (to < 0) {
to = ascii.indexOf(NL + key.keyword + NL, from);
}
if (endDescriptor < 0) {
endDescriptor = ascii.indexOf(splitToken, startDescriptor);
if (to < 0) {
to = length;
} else {
to += 1;
}
if (endDescriptor < 0) {
endDescriptor = endAllDescriptors - 1;
int toNoNewline = to;
while (truncateTrailingNewlines && toNoNewline > from
&& ascii.charAt(toNoNewline - 1) == '\n') {
toNoNewline--;
}
endDescriptor += 1;
byte[] rawDescriptor = new byte[endDescriptor - startAnnotations];
System.arraycopy(rawDescriptorBytes, startAnnotations,
rawDescriptor, 0, endDescriptor - startAnnotations);
startAnnotations = endDescriptor;
rawDescriptors.add(rawDescriptor);
splitParts.add(new int[] { offset + from, toNoNewline - from });
from = to;
}
return rawDescriptors;
}
protected byte[] rawDescriptorBytes;
@Override
public byte[] getRawDescriptorBytes() {
return this.rawDescriptorBytes;
return splitParts;
}
protected boolean failUnrecognizedDescriptorLines = false;
......@@ -193,23 +177,33 @@ public abstract class DescriptorImpl implements Descriptor {
: new ArrayList<>(this.unrecognizedLines);
}
protected DescriptorImpl(byte[] rawDescriptorBytes,
protected DescriptorImpl(byte[] rawDescriptorBytes, int[] offsetAndLength,
boolean failUnrecognizedDescriptorLines, boolean blankLinesAllowed)
throws DescriptorParseException {
int offset = offsetAndLength[0];
int length = offsetAndLength[1];
if (offset < 0 || offset + length > rawDescriptorBytes.length
|| length < 0) {
throw new IndexOutOfBoundsException("Invalid bounds: "
+ "rawDescriptorBytes.length=" + rawDescriptorBytes.length
+ " offset=" + offset + " length=" + length);
}
this.rawDescriptorBytes = rawDescriptorBytes;
this.offset = offset;
this.length = length;
this.failUnrecognizedDescriptorLines =
failUnrecognizedDescriptorLines;
this.cutOffAnnotations(rawDescriptorBytes);
this.cutOffAnnotations();
this.countKeys(rawDescriptorBytes, blankLinesAllowed);
}
/* Parse annotation lines from the descriptor bytes. */
private List<String> annotations = new ArrayList<>();
private void cutOffAnnotations(byte[] rawDescriptorBytes)
throws DescriptorParseException {
String ascii = new String(rawDescriptorBytes);
private void cutOffAnnotations() throws DescriptorParseException {
int start = 0;
String ascii = new String(this.getRawDescriptorBytes(),
StandardCharsets.US_ASCII);
while ((start == 0 && ascii.startsWith("@"))
|| (start > 0 && ascii.indexOf(NL + "@", start - 1) >= 0)) {
int end = ascii.indexOf(NL, start);
......@@ -220,13 +214,8 @@ public abstract class DescriptorImpl implements Descriptor {
this.annotations.add(ascii.substring(start, end));
start = end + 1;
}
if (start > 0) {
int length = rawDescriptorBytes.length;
byte[] rawDescriptor = new byte[length - start];
System.arraycopy(rawDescriptorBytes, start, rawDescriptor, 0,
length - start);
this.rawDescriptorBytes = rawDescriptor;
}
this.offset += start;
this.length -= start;
}
@Override
......@@ -246,16 +235,13 @@ public abstract class DescriptorImpl implements Descriptor {
if (rawDescriptorBytes.length == 0) {
throw new DescriptorParseException("Descriptor is empty.");
}
String descriptorString = new String(rawDescriptorBytes);
if (!blankLinesAllowed && (descriptorString.startsWith(NL)
|| descriptorString.contains(NL + NL))) {
throw new DescriptorParseException("Blank lines are not allowed.");
}
boolean skipCrypto = false;
Scanner scanner = new Scanner(descriptorString).useDelimiter(NL);
Scanner scanner = this.newScanner().useDelimiter(NL);
while (scanner.hasNext()) {
String line = scanner.next();
if (line.startsWith(Key.CRYPTO_BEGIN.keyword)) {
if (line.isEmpty() && !blankLinesAllowed) {