Malware Analysis
Posted by: mangialardobr • 8/12/2014 • 3,824 words (16 pages)
JOURNAL OF LATEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012 1
Integrating Static and Dynamic Malware Analysis
Using Machine Learning
Reinaldo J Mangialado - IME and Julio Cesar Duarte, PhD, IME
Abstract—Malware analysis and classification systems use
static or dynamic techniques along with machine learning algorithms
that automate the classification and prediction
of malicious code. Both kinds of analysis have weaknesses that
evasion techniques can exploit, hampering
the identification and classification of malware. We propose
unifying static and dynamic analysis as a method of
collecting data from malware that lessens the chance of success of
evasion techniques. From the data collected in the analysis phase,
we use the C5.0 machine learning algorithm together with the
Random Forest ensemble classifier, implemented in the FAMA
machine learning framework, to perform classification and
prediction of malware. In the experiment, unified analysis
achieved accuracy rates above 90%, while
static and dynamic analysis alone had lower rates. Furthermore, we
categorized malware by type and obtained accuracy greater than
80%.
Index Terms—Malware. Static Analysis. Dynamic Analysis.
Unified analysis. Machine Learning.
I. INTRODUCTION
MALWARE is software developed by hackers to perform
harmful actions on a computer. Two techniques are used to
detect malware: static malware analysis and dynamic
malware analysis. Static analysis collects data
from the binary code without executing it, whereas dynamic
analysis collects data from the processes performed by the
malware while it runs. Both have limitations that can be
exploited by analysis-evasion techniques [1].
To hinder the success of evasion techniques, such as code
obfuscation (used to evade static analysis) and delayed code
execution (used to evade dynamic analysis), static and
dynamic malware analysis can be integrated.
Once data have been collected by static and dynamic analysis of
malware, they are organized into data vectors for the machine
learning algorithms. Machine learning automates the analysis
that would otherwise be done by a human analyst.
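As a minimal sketch of how the two kinds of collected data could be
organized into one data vector, the Python example below concatenates a
static and a dynamic report of the same sample. The feature names and the
default of 0.0 for missing attributes are assumptions made for
illustration; this section does not list the attributes actually used.

```python
from typing import Dict, List

# Hypothetical feature names, chosen only for illustration; the paper's
# actual attribute set is not given in this section.
STATIC_FEATURES = ["section_count", "import_count", "file_entropy"]
DYNAMIC_FEATURES = ["files_created", "registry_writes", "network_connections"]


def to_vector(report: Dict[str, float], names: List[str]) -> List[float]:
    """Turn one analysis report into a fixed-order numeric vector.

    Attributes missing from the report (for example, because an evasion
    technique hid them from that analysis) default to 0.0 (an assumption).
    """
    return [float(report.get(name, 0.0)) for name in names]


def unified_vector(static_report: Dict[str, float],
                   dynamic_report: Dict[str, float]) -> List[float]:
    """Concatenate the static and the dynamic vector of the same sample."""
    return (to_vector(static_report, STATIC_FEATURES)
            + to_vector(dynamic_report, DYNAMIC_FEATURES))


# Made-up reports for a single sample.
static_report = {"section_count": 5, "import_count": 42, "file_entropy": 7.1}
dynamic_report = {"files_created": 3, "network_connections": 2}

vector = unified_vector(static_report, dynamic_report)
print(vector)  # [5.0, 42.0, 7.1, 3.0, 0.0, 2.0]
```

Keeping a fixed feature order means every sample yields a vector of the
same length, so an attribute hidden from one analysis still leaves the
attributes from the other analysis available to the learner.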
After the characteristics of the files have been obtained by
static or dynamic analysis, a security expert labels each file as
malware or not malware, or according to its type of malware.
These labeled data are then used by machine learning algorithms,
which learn a generalized description of malware and apply
this knowledge to classify new, unseen instances of malware.
The results are analyzed, and the choice of features can be
adjusted to improve them. This process is illustrated
in Figure 1.
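The process described above (expert-labeled vectors, learning, and
classification of unseen instances) can be sketched in a few lines of
Python. The learner below is a one-level decision tree (a decision
stump), used only as a simple stand-in: it is not C5.0, Random Forest, or
the FAMA framework. The two features and all sample values are made up
for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Stump = Tuple[int, float, int]


def train_stump(samples: List[Vector], labels: List[int]) -> Stump:
    """Learn a one-level decision tree (a decision stump).

    Tries every (feature, threshold) pair and keeps the split with the
    fewest training errors. Returns (feature index, threshold, label
    predicted when the feature value is above the threshold).
    """
    best = (0, 0.0, 1)
    best_errors = len(samples) + 1
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            for label_above in (0, 1):
                errors = sum(
                    (label_above if s[feature] > threshold else 1 - label_above) != y
                    for s, y in zip(samples, labels)
                )
                if errors < best_errors:
                    best_errors = errors
                    best = (feature, threshold, label_above)
    return best


def predict(model: Stump, sample: Vector) -> int:
    """Classify one sample: 1 = malware, 0 = not malware."""
    feature, threshold, label_above = model
    return label_above if sample[feature] > threshold else 1 - label_above


def accuracy(model: Stump, samples: List[Vector], labels: List[int]) -> float:
    """Fraction of samples whose predicted label matches the expert label."""
    hits = sum(predict(model, s) == y for s, y in zip(samples, labels))
    return hits / len(samples)


# Hypothetical features: [file_entropy, registry_writes].
# Labels assigned by the expert: 1 = malware, 0 = not malware.
train_x = [[7.5, 12], [7.1, 9], [6.9, 15], [4.2, 1], [5.0, 0], [4.8, 2]]
train_y = [1, 1, 1, 0, 0, 0]

# Unseen instances, held out from training to measure generalization.
test_x = [[7.3, 4], [4.5, 3]]
test_y = [1, 0]

model = train_stump(train_x, train_y)
print(accuracy(model, test_x, test_y))  # 1.0
```

If the accuracy on unseen instances is poor, the feedback step in the
process corresponds to changing which features go into the vectors and
training again.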
Several works have addressed the issue of malware analysis.
The work in [3] uses a sandbox to perform
dynamic analysis of malware, along with machine learning
to automate the identification of malware. Data collection
was based on dynamic analysis, which makes it possible
to extract information about the processes performed by
files in the Windows PE32 format. Thirteen attributes
(characteristics) were selected, and their values for each
instance in the set of examples were used as input to the
machine learning process. The machine learning algorithms
used were Naive Bayes, SVM, J48 (C4.5), CART and Random Forest.
Fig. 1. Machine Learning Classification Process
Source: [2]
Malicious Executable Classification System (MECS) [4]
uses static analysis to detect malicious executable files, in
any format, without using any technique or system for the
removal of code obfuscation. The experiment used 1,971 benign
and 1,651 malicious files. Malware characteristics
were extracted from the byte sequences of the executables. These
sequences were converted into n-grams. The data were then
supplied to various machine learning algorithms: IBK, TFIDF,
Naive Bayes, Support Vector Machines (SVMs), Decision Trees,
and boosted versions of Naive Bayes, Decision Trees and SVMs.
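To make the conversion of byte sequences into n-grams concrete, the
Python sketch below counts overlapping byte n-grams and encodes a file as
presence or absence of each n-gram in a fixed vocabulary. The value of n,
the presence/absence encoding, and the sample bytes are assumptions for
illustration; this section does not state the settings used in [4].

```python
from collections import Counter
from typing import List


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping n-byte sequence in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def presence_vector(data: bytes, vocabulary: List[bytes], n: int) -> List[int]:
    """Encode a file as 1/0 flags: does each vocabulary n-gram occur in it?"""
    grams = byte_ngrams(data, n)
    return [1 if gram in grams else 0 for gram in vocabulary]


# Made-up six-byte "executable"; real inputs would be whole files on disk.
sample = bytes.fromhex("4d5a90004d5a")

grams = byte_ngrams(sample, 2)
print(grams[bytes.fromhex("4d5a")])  # 2

# Hypothetical vocabulary of n-grams selected beforehand.
vocabulary = [bytes.fromhex("4d5a"), bytes.fromhex("9000"), bytes.fromhex("ffff")]
print(presence_vector(sample, vocabulary, 2))  # [1, 1, 0]
```

Because the n-grams are taken directly from raw bytes, this kind of
feature extraction needs no knowledge of the file format, which matches
the claim that files in any format can be processed.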
Ipanda is a comprehensive, analysis-oriented malware
analysis tool [5]. It allows a comprehensive analysis of malware
and generates reports that include information about the
structure and behavior of the malware. Various techniques
...