Malware Analysis

Posted 8/12/2014  •  3,824 words (16 pages)

JOURNAL OF LATEX CLASS FILES, VOL. 11, NO. 4, DECEMBER 2012

Integrating Static and Dynamic Malware Analysis Using Machine Learning

Reinaldo J Mangialado - IME and Julio Cesar Duarte, PhD, IME

Abstract—Malware analysis and classification systems use static or dynamic techniques along with machine learning algorithms that automate the classification and prediction of malicious code. Both techniques have weaknesses that evasion techniques can exploit, hampering the identification and classification of malware. We propose unifying static and dynamic analysis as a data collection method that lessens the chance of success of evasion techniques. From the data collected in the analysis phase, we use the C5.0 machine learning algorithm with ensembled Random Forest classifiers, implemented in the machine learning framework FAMA, to classify and predict malware. In the experiment, unified analysis achieved accuracy rates above 90%, while static analysis and dynamic analysis on their own had lower rates. Furthermore, we categorized malware by type and obtained accuracy rates above 80%.

Index Terms—Malware, static analysis, dynamic analysis, unified analysis, machine learning.

I. INTRODUCTION

MALWARE is software developed by hackers to perform harmful actions on a computer. Two techniques are used to detect malware: static malware analysis and dynamic malware analysis. Static analysis collects data from the static binary code, whereas dynamic analysis collects data from the processes performed by the malware. Both have limitations that can be exploited by analysis evasion techniques [1].

To hinder the success of evasion techniques such as code obfuscation, used to evade static analysis, and delayed code execution, used to evade dynamic analysis, static and dynamic malware analysis can be integrated.

Once data have been collected by static and dynamic analysis of malware, they are organized into data vectors for the machine learning algorithms. Machine learning automates the analysis that would otherwise be done by an analyst.
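As a minimal sketch of how static and dynamic observations might be organized into a single data vector, the snippet below concatenates the two feature sets in a fixed order. The feature names and the `to_vector` helper are hypothetical, chosen only for illustration; the source does not list the attributes actually collected.

```python
from typing import Dict, List

# Hypothetical feature names, chosen only for illustration.
STATIC_FEATURES = ["imports_crypto_api", "has_packed_section", "section_count"]
DYNAMIC_FEATURES = ["creates_process", "writes_registry", "opens_network_socket"]


def to_vector(static: Dict[str, float], dynamic: Dict[str, float]) -> List[float]:
    """Concatenate static and dynamic features into one fixed-order vector.

    Features missing from a report default to 0.0, so every sample
    produces a vector of the same length.
    """
    vector = [float(static.get(name, 0.0)) for name in STATIC_FEATURES]
    vector += [float(dynamic.get(name, 0.0)) for name in DYNAMIC_FEATURES]
    return vector


# Made-up reports for a single sample.
sample_static = {"has_packed_section": 1, "section_count": 5}
sample_dynamic = {"writes_registry": 1}
unified = to_vector(sample_static, sample_dynamic)
print(unified)
```

Keeping the feature order fixed means that position i has the same meaning in every vector, which is what a learning algorithm needs in order to compare samples.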

After the characteristics of files have been obtained by static or dynamic analysis, a security expert labels each file as malware or not malware, or according to the type of malware. These data are then used by a machine learning algorithm that is able to learn a generalized description of malware and apply this knowledge to classify new, unseen instances of malware. The results are analyzed, and the choice of features can be adjusted to improve them. This process is illustrated in Figure 1.
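The label, learn, classify and evaluate loop described above can be sketched as follows. This is only an illustration: it uses a 1-nearest-neighbour rule as a simple stand-in, not the C5.0 or Random Forest classifiers named in the abstract, and the feature vectors and labels are made up.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, str]


def distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train: List[Labeled], sample: Vector) -> str:
    """Return the label of the training vector closest to `sample`."""
    _, label = min(train, key=lambda pair: distance(pair[0], sample))
    return label


def accuracy(train: List[Labeled], test: List[Labeled]) -> float:
    """Fraction of held-out samples whose prediction matches the expert label."""
    hits = sum(predict(train, vec) == label for vec, label in test)
    return hits / len(test)


# Toy, made-up data: (feature vector, expert-assigned label).
labeled_train = [
    ([1, 1, 0], "malware"),
    ([1, 0, 1], "malware"),
    ([0, 0, 0], "benign"),
    ([0, 1, 0], "benign"),
]
labeled_test = [([1, 1, 1], "malware"), ([0, 1, 1], "benign")]

print(predict(labeled_train, [1, 1, 1]))
print(accuracy(labeled_train, labeled_test))
```

Measuring accuracy on samples that were held out of training is what makes it possible to judge whether a change in the chosen features actually improves the results.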

Fig. 1. Machine Learning Classification Process
Source: [2]

Several works have been published addressing malware analysis. The work in [3] uses a sandbox technique to perform dynamic analysis of malware, along with machine learning to automate malware identification. Data collection was based on dynamic analysis, which makes it possible to extract information about the processes performed by files in Windows PE32 format. Thirteen attributes (characteristics) were selected, and their values for each instance in the example set were used as input to the machine learning process. The machine learning algorithms used were Naive Bayes, SVM, J48 (C4.5), CART and Random Forest.

Malicious Executable Classification System (MECS) [4] uses static analysis to detect malicious executable files, in any format, without using any technique or system for removing code obfuscation. The experiment used 1971 benign and 1651 malicious files. Malware characteristics were extracted from the byte sequences of the executables, and these sequences were converted into n-grams. The data were then supplied to various machine learning algorithms: IBk, TFIDF, Naive Bayes, Support Vector Machines (SVMs), Decision Trees, Boosted Naive Bayes, Boosted Decision Trees and Boosted SVMs.
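Converting a byte sequence into n-grams can be sketched as a sliding window over the file contents. The `byte_ngrams` helper is hypothetical, and the window size `n` is a free parameter here, since the text above does not state the value used in [4].

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping run of n consecutive bytes in `data`.

    Returns an empty Counter when the input is shorter than n bytes.
    """
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Made-up six-byte input; a real run would read the executable's bytes from disk.
sample = bytes.fromhex("4d5a90004d5a")
grams = byte_ngrams(sample, n=2)

for gram, count in grams.most_common():
    print(gram.hex(), count)
```

Each distinct n-gram can then serve as one attribute of the feature vector, for example as a flag recording whether it occurs in a given file.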

Ipanda is a comprehensive, analysis-oriented malware analysis tool [5]. It allows a comprehensive analysis of malware and generates reports that include information about the structure and behavior of the malware. Various techniques

...
