Mistake in computing number of exported functions #8

@laam-egg

This issue concerns the file src/thrember/features.py.

First, notice in ImportsInfo::process_raw_features:

# ...
# Number of libraries/imports
lengths = [len(imports), len(libraries)]

# Two separate elements: libraries (alone) and fully-qualified names of imported functions
return np.hstack([lengths, libraries_hashed, imports_hashed]).astype(np.float32)

As you can see, the feature vector starts with two counts: the number of imported functions and the number of imported libraries.

Now, go to ExportsInfo::process_raw_features:

# ...
exports_hashed = FeatureHasher(128, input_type="string").transform([raw_obj]).toarray()[0]
return np.hstack([np.array([len(exports_hashed)]), exports_hashed.astype(np.float32)])

So the first element of the feature vector is len(exports_hashed), which is always 128 (the output size of the FeatureHasher), instead of the number of exported functions, which is what I would expect by analogy with ImportsInfo.
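To illustrate the point, here is a minimal sketch that reuses the same hashing call outside the class. The helper name hashed_length and the sample export names are made up for this example; only the FeatureHasher call is taken from the snippet above.

```python
from sklearn.feature_extraction import FeatureHasher


def hashed_length(exports):
    # Same hashing call as in the snippet above; `exports` stands in for raw_obj.
    hashed = FeatureHasher(128, input_type="string").transform([exports]).toarray()[0]
    return len(hashed)


# Hypothetical export lists of very different sizes.
samples = [
    ["DllMain"],
    ["CreateWidget", "DestroyWidget", "DllMain"],
    [f"func_{i}" for i in range(500)],
]

lengths = [hashed_length(s) for s in samples]
print(lengths)  # [128, 128, 128]
```

The length depends only on the n_features argument passed to FeatureHasher, so it carries no information about the sample.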

Proposed remedy:

return np.hstack([np.array([len(raw_obj)]), exports_hashed.astype(np.float32)])

where len(raw_obj) is the number of exported functions (to see why, look at ExportsInfo::raw_features).
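A rough standalone sketch of the proposed remedy, assuming raw_obj is a plain list of export names. The function name process_exports and the sample names are hypothetical, and this is not the actual method from features.py.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher


def process_exports(raw_obj):
    # Hypothetical standalone version of the method with the proposed fix applied.
    exports_hashed = FeatureHasher(128, input_type="string").transform([raw_obj]).toarray()[0]
    # The leading element is now the number of exports, not the hash width.
    return np.hstack([np.array([len(raw_obj)]), exports_hashed.astype(np.float32)])


vec = process_exports(["CreateWidget", "DestroyWidget", "DllMain"])
print(vec.shape, vec[0])  # (129,) 3.0
```

The vector length stays at 129, so only the value of the first element changes.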
