This module provides a create() function that returns a webpage object. This object lets you load and manipulate a web page.
var page = require("webpage").create();
The page variable then holds an object with many properties and methods, described below.
Note: almost all properties and methods are implemented, but some are not documented yet. Please help us document them ;-).
Properties list:
clipRect canGoBack canGoForward captureContent content cookies customHeaders event focusedFrameName frameContent frameName framePlainText frameTitle frameUrl framesCount framesName libraryPath navigationLocked offlineStoragePath offlineStorageQuota ownsPages pages pagesWindowName paperSize plainText scrollPosition settings title url viewportSize windowName zoomFactor
Functions list:
addCookie() childFramesCount() childFramesName() clearCookies() close() currentFrameName() deleteCookie() evaluateJavaScript() evaluate() evaluateAsync() getPage() go() goBack() goForward() includeJs() injectJs() open() openUrl() release() reload() render() renderBase64() sendEvent() setContent() stop() switchToFocusedFrame() switchToFrame() switchToChildFrame() switchToMainFrame() switchToParentFrame() uploadFile()
Callbacks list:
onAlert onCallback onClosing onConfirm onConsoleMessage onError onFilePicker onInitialized onLoadFinished onLoadStarted onNavigationRequested onPageCreated onPrompt onResourceRequested onResourceReceived onUrlChanged
Internal methods to trigger callbacks:
closing() initialized() javaScriptAlertSent() javaScriptConsoleMessageSent() loadFinished() loadStarted() navigationRequested() rawPageCreated() resourceReceived() resourceRequested() urlChanged()
clipRect is an object indicating the coordinates of the area to capture, used by the render() method. It contains four properties: top, left, width and height.
To modify it, assign a complete object to this property.
page.clipRect = { top: 14, left: 3, width: 400, height: 300 };
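The clipRect rule above can be wrapped in a small helper. This is a minimal sketch: captureRegion is a hypothetical name, not part of SlimerJS, and page is assumed to be an object returned by require("webpage").create(). Passing the page as a parameter also lets the helper be tried with a stand-in object outside SlimerJS.

```javascript
// captureRegion is a hypothetical helper, not part of the SlimerJS API.
// It replaces page.clipRect with a complete new object (instead of editing
// page.clipRect.top and friends in place), then renders that area to a file.
function captureRegion(page, rect, filename) {
  page.clipRect = {
    top: rect.top,
    left: rect.left,
    width: rect.width,
    height: rect.height
  };
  page.render(filename);
}

// Usage inside a SlimerJS script, once the page is loaded:
// captureRegion(page, { top: 14, left: 3, width: 400, height: 300 }, "area.png");
```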
canGoBack indicates whether there is a previous page in the navigation history. This is a boolean. Read-only.
canGoForward indicates whether there is a next page in the navigation history. This is a boolean. Read-only.
captureContent is an array of regular expressions matching the content types of resources whose content you want to retrieve. The content is then set on the body property of the response object received by your onResourceReceived callback.
webpage.captureContent = [ /css/, /image\/.*/ ];
This opt-in exists to avoid wasting memory when you don't need the body property, since resources like images or videos can take a lot of memory.
(SlimerJS only)
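To check which resources a captureContent list would cover before assigning it, the regular expressions can be tested locally. A minimal sketch, assuming that SlimerJS tests each regexp against the resource's content type (the exact matching rule is an assumption); matchesCaptureContent is a hypothetical helper:

```javascript
// matchesCaptureContent is a hypothetical helper for checking patterns locally.
// Assumption: SlimerJS fills response.body when one of the captureContent
// regexps matches the resource's content type.
function matchesCaptureContent(patterns, contentType) {
  return patterns.some(function (re) {
    return re.test(contentType);
  });
}

var patterns = [ /css/, /image\/.*/ ];
// In a SlimerJS script you would then assign: webpage.captureContent = patterns;
```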
content contains the source code of the current web page. You can set this property to the source code of an HTML page to replace the content of the current web page.
cookies is an array of all Cookie objects stored in the current profile that correspond to the current URL of the web page.
When you assign an array of Cookie objects to this property, the cookies are set for the current URL: their domain and path properties are changed to match it.
Note: modifying an object in the array does not modify the cookie. Retrieve the array, modify it, and then assign the array back to the cookies property. To modify a single cookie, you will probably prefer the addCookie() method.
If cookies are disabled, modifying this property does nothing.
Be careful about the inconsistent behavior of the expiry property.
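The read-modify-assign pattern described above, as a minimal sketch. setCookieValue is a hypothetical helper, and it assumes Cookie objects expose name and value properties:

```javascript
// setCookieValue is a hypothetical helper, not part of SlimerJS.
// It follows the documented pattern: read page.cookies, modify the array,
// then assign the whole array back. Editing the objects alone changes nothing.
function setCookieValue(page, name, value) {
  var cookies = page.cookies;
  var found = false;
  cookies.forEach(function (cookie) {
    if (cookie.name === name) {
      cookie.value = value;
      found = true;
    }
  });
  page.cookies = cookies; // this assignment is what updates the cookie store
  return found;
}
```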
customHeaders is an object defining additional HTTP headers that are sent with each HTTP request, for both pages and resources.
Example:
webpage.customHeaders = {
    "foo": "bar"
};
To define the user agent, use webpage.settings.userAgent instead.
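Because customHeaders is replaced as a whole object, adding one header without losing the others means copying the current value first. A minimal sketch; addCustomHeaders is a hypothetical helper, and it assumes that reading page.customHeaders returns the headers currently set:

```javascript
// addCustomHeaders is a hypothetical helper, not part of SlimerJS.
// It copies the current customHeaders, adds the new entries, and assigns a
// fresh object so that headers set earlier are kept.
// For the user agent, use page.settings.userAgent rather than a header here.
function addCustomHeaders(page, extraHeaders) {
  var merged = {};
  var current = page.customHeaders || {};
  Object.keys(current).forEach(function (key) {
    merged[key] = current[key];
  });
  Object.keys(extraHeaders).forEach(function (key) {
    merged[key] = extraHeaders[key];
  });
  page.customHeaders = merged;
  return merged;
}
```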
event is a read-only object that holds constants to use with sendEvent().
There is a modifier property containing constants for key modifiers:
page.event.modifier.shift
page.event.modifier.ctrl
page.event.modifier.alt
page.event.modifier.meta
page.event.modifier.keypad
There is also a key property containing constants for key codes.
Implemented. Documentation needed.
frameContent contains the source code of the current frame. You can set this property to the source code of an HTML page to replace the content of the current frame.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
offlineStoragePath indicates the path of the SQLite file where the content of window.localStorage is stored. Read-only.
Note: in PhantomJS, this is the path of a directory; the storage works differently than in Gecko. Unlike in PhantomJS, this property cannot be changed with the --local-storage-path command-line flag.
offlineStorageQuota contains the maximum size, in bytes, of the data a page can store in window.localStorage. The default is 5,242,880 (5 MB). Read-only.
To change this value, use the --local-storage-quota command-line flag.
ownsPages is a boolean indicating whether pages opened by the web page (with window.open()) should be children of the webpage object (true) or not (false). The default is true.
When it is true, child pages appear in the pages property.
pages is the list of child pages that the page currently has open via window.open().
If a child page is closed (by window.close() or by webpage.close()), it is automatically removed from this list.
Do not keep a reference to this array: you only get a copy, so you will not see later changes.
If ownsPages is false, this list does not include the child pages.
pagesWindowName is the list of window names (strings) of the child pages.
The window name is the name given to window.open().
The list only includes child pages that were created while ownsPages was true.
Not implemented.
plainText contains the content of the web page as plain text. For HTML pages, you only get the text of the page.
Read-only.
scrollPosition contains an object indicating the scrolling position. You can read or modify it. The object has two properties: top and left.
Example:
page.scrollPosition = { top: 100, left: 0 };
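Since scrollPosition is read and written as a { top, left } object, scrolling relative to the current position is a read followed by an assignment. A minimal sketch with a hypothetical scrollBy helper, not part of SlimerJS:

```javascript
// scrollBy is a hypothetical helper, not part of SlimerJS.
// It reads the current scrollPosition and assigns a new { top, left } object.
// The returned object is the requested position; the page may not be able
// to scroll that far.
function scrollBy(page, dx, dy) {
  var current = page.scrollPosition;
  var next = { top: current.top + dy, left: current.left + dx };
  page.scrollPosition = next;
  return next;
}
```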
settings lets you set some options for loading a page. Changing them after the page has loaded has no effect.
javascriptEnabled: false to disable JavaScript in web pages (default is true)
javascriptCanCloseWindows (not supported yet)
javascriptCanOpenWindows (not supported yet)
loadImages: false to disable image loading (default is true)
localToRemoteUrlAccessEnabled (not supported yet)
maxAuthAttempts (not supported yet)
password (not supported yet)
userAgent: string defining the user agent sent in HTTP requests. By default, it is something like "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 SlimerJS/0.7" (depending on the version of Firefox/XulRunner you use)
userName (not supported yet)
XSSAuditingEnabled (not supported yet)
webSecurityEnabled (not supported yet)
content of script elements, invisible elements, etc. Default: false. (SlimerJS only)
page.settings.userAgent = "My Super Agent / 1.0"
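Because settings only apply to the next load, they should be changed before open() is called. A minimal sketch; openLightweight is a hypothetical helper, and the user agent string is just the placeholder from the example above:

```javascript
// openLightweight is a hypothetical helper, not part of SlimerJS.
// settings must be changed before the load starts: changing them afterwards
// has no effect on the page being loaded.
function openLightweight(page, url, callback) {
  page.settings.loadImages = false;               // skip images
  page.settings.userAgent = "My Super Agent / 1.0";
  page.open(url, callback);
}
```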
title contains the title of the loaded page. Read-only.
url contains the current URL of the page. If nothing is loaded yet, it is an empty string. Read-only.
viewportSize lets you change the size of the viewport, i.e., the size of the window where the web page is displayed.
It is useful for testing how the web page is displayed in windows of different sizes.
viewportSize is an object with width and height properties, giving the size in pixels.
Note that changing this property triggers a reflow of the rendering, which is done asynchronously (this is how browser rendering engines work). For example, if you take a screenshot with webpage.render() right after setting viewportSize, you may not get the final result, because render() is called too early.
page.viewportSize = { width: 480, height: 800 };
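To compare the display at several window sizes, each viewportSize change needs a delay before render(), since the reflow is asynchronous. A minimal sketch; renderAtSizes and the screenshot file names are hypothetical, and the delay is a guess to tune for your pages, not a value guaranteed to be long enough:

```javascript
// renderAtSizes is a hypothetical helper, not part of SlimerJS.
// For each size: set viewportSize, wait (the reflow is asynchronous), render,
// then move on to the next size. done() is called after the last screenshot.
function renderAtSizes(page, sizes, delayMs, done) {
  var index = 0;
  function next() {
    if (index >= sizes.length) {
      done();
      return;
    }
    var size = sizes[index];
    page.viewportSize = { width: size.width, height: size.height };
    setTimeout(function () {
      page.render("screenshot-" + size.width + "x" + size.height + ".png");
      index += 1;
      next();
    }, delayMs);
  }
  next();
}

// Usage in a SlimerJS script, after the page is loaded:
// renderAtSizes(page, [{ width: 480, height: 800 }, { width: 1280, height: 1024 }], 500,
//               function () { slimer.exit(); });
```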
windowName contains the name of the window, i.e. the name given to window.open() if the page was opened with this method.
zoomFactor contains the zoom factor of the web page display. Setting this property decreases or increases the size of the page rendering: a value between 0 and 1 shrinks the page, a value greater than 1 enlarges it, and 1 means no zoom (normal size).
Note that changing its value refreshes the display of the page asynchronously. For example, if you call render() right after setting zoomFactor, the screenshot may not show the final result, because render() is called too early. After setting zoomFactor, you will probably have to put the following code in a callback given to window.setTimeout(), or call slimer.wait(500) (which is not compatible with PhantomJS).
addCookie() adds a cookie to the cookie storage of the current profile, for the current URL. The parameter is a Cookie object. The domain and path of the cookie are set to the domain and path of the current URL.
It returns true if the cookie was actually added. If cookies are disabled, the cookie is not added to the cookie database.
Be careful about the inconsistent behavior of the expiry property.
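The boolean returned by addCookie() makes it possible to detect cookies that were not stored. A minimal sketch; addCookies is a hypothetical helper, and only the name and value properties of the Cookie objects are assumed:

```javascript
// addCookies is a hypothetical helper, not part of SlimerJS.
// It relies on the documented return value of addCookie(): true only when the
// cookie was actually added (false, for example, when cookies are disabled).
// It returns the names of the cookies that were not added.
function addCookies(page, cookies) {
  var rejected = [];
  cookies.forEach(function (cookie) {
    if (!page.addCookie(cookie)) {
      rejected.push(cookie.name);
    }
  });
  return rejected;
}

// Usage: var failed = addCookies(page, [{ name: "lang", value: "en" }]);
```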
Implemented. Documentation needed.
Implemented. Documentation needed.
clearCookies() deletes all cookies corresponding to the current URL.
close() closes the web page, that is, the window displaying it. After closing, some methods cannot be used, and you should call open() or openUrl() to reuse the webpage object.
Implemented. Documentation needed.
deleteCookie() deletes all cookies that have the given name and correspond to the current URL.
It returns true if at least one cookie was deleted. It works only if cookies are enabled.
evaluateJavaScript() evaluates the given JavaScript source (a string) in the context of the loaded web page and returns the result of the evaluation.
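Since evaluateJavaScript() takes source code as a string, any value from your script must be embedded in that string. A minimal sketch that uses JSON.stringify() to produce a correctly quoted string literal; textOfSelector is a hypothetical helper, and it assumes, as described above, that the string is evaluated as-is and the value of the expression is returned (compatibility with PhantomJS was not checked):

```javascript
// textOfSelector is a hypothetical helper, not part of SlimerJS.
// JSON.stringify() turns the selector into a quoted, escaped string literal,
// so selectors containing quotes do not break the generated source code.
function textOfSelector(page, selector) {
  var source = "document.querySelector(" + JSON.stringify(selector) + ").textContent";
  return page.evaluateJavaScript(source);
}
```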
evaluate() executes the given function in the context of the loaded web page. This means the code of the function cannot access the objects and variables of your script: inside this function, the document and window objects belong to the loaded page, not to your script. In other words, you cannot use closures.
var page = require('webpage').create();
page.open("http://example.com", function (status) {
    var someContent = page.evaluate(function () {
        return document.querySelector("#aDiv").textContent;
    });
    console.log('The introduction: ' + someContent);
    slimer.exit();
});
You can pass additional arguments to evaluate(); they become the arguments of the function. For example, here the function receives “#aDiv” as its parameter:
var someContent = page.evaluate(function (selector) {
    return document.querySelector(selector).textContent;
}, "#aDiv");
Parameters can only be basic JavaScript objects or literal values. You cannot pass objects such as DOM elements. In other words, you cannot pass values that cannot be converted with toString() or serialized as JSON.
evaluate() returns the value returned by the function.
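The serialization rule can be approximated with a JSON check before calling evaluate(). This is a hypothetical helper, not part of SlimerJS, and it is only an approximation: it does not detect DOM elements, which must still be avoided.

```javascript
// canPassToEvaluate is a hypothetical, approximate check, not part of SlimerJS.
// A value is accepted only if JSON.stringify() turns it into a string.
// Functions (stringify returns undefined) and circular structures (stringify
// throws) are rejected. DOM elements are NOT detected by this check.
function canPassToEvaluate(value) {
  try {
    return typeof JSON.stringify(value) === "string";
  } catch (e) {
    return false; // e.g. circular references
  }
}
```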
evaluateAsync() is equivalent to evaluate(), but with some differences:
getPage() returns the child page that matches the given “window.name”.
Only children opened while ownsPages was true are checked.
go() navigates through the navigation history. Its integer parameter indicates how far to move forward (positive values) or backward (negative values) in the history.
webpage.go(-3);
webpage.go(-1); // equivalent to webpage.goBack()
webpage.go(1); // equivalent to webpage.goForward()
webpage.go(4);
goBack() displays the previous page in the navigation history.
goForward() displays the next page in the navigation history.
includeJs() loads the JavaScript file stored at the given URL into the current web page.
When the load is done, the given callback is called.
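Because includeJs() is asynchronous, code that depends on the included script has to run from its callback. A minimal sketch; useLibrary, the library URL and the window.myLib global are hypothetical:

```javascript
// useLibrary is a hypothetical helper, not part of SlimerJS.
// includeJs() loads the script into the web page, then calls its callback;
// only from that point can page code rely on the library.
function useLibrary(page, libraryUrl, pageFunction, callback) {
  page.includeJs(libraryUrl, function () {
    // pageFunction runs in the web page, where the library is now available.
    callback(page.evaluate(pageFunction));
  });
}

// Usage (hypothetical URL and global):
// useLibrary(page, "http://example.com/lib.js",
//            function () { return typeof window.myLib; },
//            function (result) { console.log(result); });
```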
injectJs() loads and executes the given JavaScript file in the context of the current script, so the loaded script has access to all variables of the current module.
If the given filename is a relative path, SlimerJS tries to resolve the full path from the current working directory (that is, the directory from which SlimerJS was launched). If the file is not found, SlimerJS tries to resolve it with the libraryPath.
Note: there is a limitation in SlimerJS. If the loaded script wants to modify a variable of the current script/module, it should use window.myvariable = '..' instead of myvariable = '..'.
open() opens a page in a virtual browser.
Since this operation is asynchronous, you cannot act on the page right after calling open(). To work with the loaded page, provide a callback or use the returned promise (not compatible with PhantomJS). The callback or the promise receives the string “success” if the page loaded successfully.
Example with a callback function:
page.open("http://slimerjs.org", function (status) {
    if (status === "success") {
        console.log("The title of the page is: " + page.title);
    }
    else {
        console.log("Sorry, the page is not loaded");
    }
});
Example with the returned promise (not compatible with PhantomJS):
page.open("http://slimerjs.org")
    .then(function (status) {
        if (status === "success") {
            console.log("The title of the page is: " + page.title);
        }
        else {
            console.log("Sorry, the page is not loaded");
        }
    });
To load two pages one after the other, nest the second open() call in the callback of the first:
page.open("http://example.com/page1", function (status) {
    // do something on the page...
    page.open("http://example.com/page2", function (status) {
        // do something on the page...
    });
});
With the promise, the code is flatter and easier to read (not compatible with PhantomJS):
page.open("http://example.com/page1")
    .then(function (status) {
        // do something on the page...
        return page.open("http://example.com/page2");
    })
    .then(function (status) {
        // do something on the page...
        // etc...
        return page.open("http://example.com/page3");
    });
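For an arbitrary list of URLs, the promise chain can be built in a loop instead of being written out by hand. A minimal sketch (promise API, so not compatible with PhantomJS); openSequentially is a hypothetical helper:

```javascript
// openSequentially is a hypothetical helper, not part of SlimerJS.
// It chains the promises returned by page.open(), so each URL is loaded only
// after the previous one. onPage(url, status) runs after each load; the
// returned promise resolves with the status of the last page.
function openSequentially(page, urls, onPage) {
  if (urls.length === 0) {
    throw new Error("openSequentially needs at least one URL");
  }
  function openAt(index) {
    return page.open(urls[index]).then(function (status) {
      onPage(urls[index], status);
      if (index + 1 < urls.length) {
        return openAt(index + 1); // start the next load only now
      }
      return status;
    });
  }
  return openAt(0);
}
```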
Other arguments:
The open() method accepts several arguments:
Remember that in all cases, the method returns a promise.
httpConf is an object; see webpage.openUrl below. operation, data and headers should have the same types of values as the corresponding properties of httpConf.
Note that open() actually calls openUrl().
openUrl(), like open(), loads a web page. The only difference is the number and type of arguments.
httpConf is an object with these properties:
httpConf is optional, and you can pass null instead of an object. The default method is 'get', without data and without specific headers.
settings is an object like webpage.settings; the given value actually updates webpage.settings. Pass null if you don't want to set new settings.
callback is a callback function, called when the page is loaded.
openUrl() returns a promise.
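As an example of the four arguments, here is a minimal sketch that sends a form as a POST request. postForm is a hypothetical helper; it assumes that httpConf uses the operation, data and headers properties mentioned for open(), and that the POST method is written "post", by analogy with the default 'get':

```javascript
// postForm is a hypothetical helper, not part of SlimerJS.
// Assumption: httpConf accepts operation, data and headers, with "post" as
// the method name. The fields are URL-encoded into the request body.
function postForm(page, url, fields, callback) {
  var body = Object.keys(fields).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]);
  }).join("&");
  var httpConf = {
    operation: "post",
    data: body,
    headers: { "Content-Type": "application/x-www-form-urlencoded" }
  };
  // settings is null: keep the current webpage.settings.
  return page.openUrl(url, httpConf, null, callback);
}
```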
release() is similar to close(). It is deprecated in PhantomJS; use webpage.close() instead.
reload() reloads the current web page.
render() takes a screenshot of the web page and stores it in the given file. You can limit the area to capture by setting the clipRect property.
By default, it determines the file format from the file extension. Only the jpg and png formats are supported (PDF and GIF may be supported in a future version).
The second parameter is an object containing options. Here are its possible properties:
format: the file format (the file extension is then ignored). Possible values: jpg, png, jpeg.
quality: the compression quality. A number between 0 and 1.
(zoomFactor is then ignored)
onlyViewport: (SlimerJS only) set to true if you only want a screenshot of the current viewport. The default is false: the screenshot has the size of the content, unless webpage.clipRect is set.
Note: because of a limitation of Gecko (see Mozilla bug 650418), plugin content such as Flash cannot be rendered in the screenshot (even if you can see it in the window).
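A minimal sketch of forcing JPEG output with the format and quality options, validating the quality range first; renderJpeg is a hypothetical helper, not part of SlimerJS:

```javascript
// renderJpeg is a hypothetical helper, not part of SlimerJS.
// It uses the documented render() options: format forces the file format
// (the extension is then ignored) and quality must be between 0 and 1.
function renderJpeg(page, filename, quality) {
  if (typeof quality !== "number" || quality < 0 || quality > 1) {
    throw new RangeError("quality must be a number between 0 and 1");
  }
  page.render(filename, { format: "jpg", quality: quality });
}
```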
renderBase64() takes a screenshot of the web page and returns it as a base64-encoded string. The format argument indicates the image format: jpg, png or jpeg.
You can limit the area to capture by setting the clipRect property.
Instead of giving the format, you can give an object containing options (SlimerJS only). See the render() function.
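The base64 string is convenient for building a data: URI, for example to embed the screenshot in an HTML report. A minimal sketch; screenshotDataUri is a hypothetical helper, not part of SlimerJS:

```javascript
// screenshotDataUri is a hypothetical helper, not part of SlimerJS.
// It wraps the base64 string from renderBase64() in a data: URI that can be
// used as the src of an <img> element.
function screenshotDataUri(page, format) {
  var mimeTypes = { png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg" };
  var mime = mimeTypes[format];
  if (!mime) {
    throw new Error("unsupported format: " + format);
  }
  return "data:" + mime + ";base64," + page.renderBase64(format);
}
```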
sendEvent() sends hardware-like events to the web page through the browser window, as a user does when typing on a keyboard or using a mouse. The browser engine (Gecko) then translates these events into DOM events in the web page.
So this method does not directly synthesize DOM events, which is why you cannot specify a DOM element as the target.
With this method, you can generate keyboard events and mouse events. The arguments depend on the type of event you want to generate.
The event type is given as the first argument.
Mouse events
You should indicate ‘mouseup’, ‘mousedown’, ‘mousemove’, ‘doubleclick’ or ‘click’ as event type.
Arguments arg1 and arg2 represent the mouse position in the window: arg1 is the horizontal coordinate (x) and arg2 is the vertical coordinate (y). These arguments are optional; to omit them, pass null.
The fourth argument is the pressed button: ‘left’, ‘middle’ or ‘right’.
The “modifier” argument is a combination of keyboard modifiers, i.e., a code indicating whether a key like ‘ctrl’ or ‘alt’ is pressed. Codes are available on the webpage.event.modifier object (shift, ctrl, alt, meta, keypad).
If no modifier key is pressed, just use 0.
// we send a click with ctrl+shift and the left button
var mod = page.event.modifier.ctrl | page.event.modifier.shift;
page.sendEvent('click', null, null, 'left', mod);
The targeted DOM element is the DOM element under the indicated coordinates.
Note that if the coordinates are outside the viewport of the window, the web page will not receive DOM events.
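Since sendEvent() works with window coordinates rather than DOM elements, clicking a given element means computing its position inside the page first. A minimal sketch; clickElement is a hypothetical helper. It passes the selector as an argument to evaluate() because the page function cannot use closures, and it assumes the element is already inside the viewport:

```javascript
// clickElement is a hypothetical helper, not part of SlimerJS.
// 1. evaluate() computes the element's box inside the page (the selector is
//    passed as an argument: the page function cannot see our variables).
// 2. sendEvent() clicks at the center of that box, in window coordinates.
function clickElement(page, selector, modifiers) {
  var rect = page.evaluate(function (sel) {
    var el = document.querySelector(sel);
    if (!el) {
      return null;
    }
    var r = el.getBoundingClientRect();
    return { left: r.left, top: r.top, width: r.width, height: r.height };
  }, selector);
  if (!rect) {
    return false; // no element matches the selector
  }
  var x = Math.round(rect.left + rect.width / 2);
  var y = Math.round(rect.top + rect.height / 2);
  page.sendEvent("click", x, y, "left", modifiers || 0);
  return true;
}
```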
Keyboard events
You should indicate ‘keyup’, ‘keypress’ or ‘keydown’ as event type.
The second parameter is a key code (from webpage.event.key), or a string of one or more characters.
You can also pass a modifier as the fifth argument, as described above for mouse events.
The third and fourth arguments are ignored for keyboard events; just pass null for them.
page.sendEvent('keypress', page.event.key.B);
page.sendEvent('keypress', "C");
page.sendEvent('keypress', "abc");
var mod = page.event.modifier.ctrl | page.event.modifier.shift;
page.sendEvent('keypress', page.event.key.A, null, null, mod);
When you give a string as a second parameter, if its length is more than one character:
The targeted DOM element is the DOM element that has the focus.
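Because keyboard events go to the focused element, typing into a given field means focusing it first. A minimal sketch; typeInto is a hypothetical helper, not part of SlimerJS:

```javascript
// typeInto is a hypothetical helper, not part of SlimerJS.
// The element is focused inside the page with evaluate() (selector passed
// as an argument), then the whole string is sent with a 'keypress' event.
function typeInto(page, selector, text) {
  var focused = page.evaluate(function (sel) {
    var el = document.querySelector(sel);
    if (!el) {
      return false;
    }
    el.focus();
    return document.activeElement === el;
  }, selector);
  if (focused) {
    page.sendEvent("keypress", text);
  }
  return focused;
}
```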
setContent() replaces the content of the current page with the given HTML source code. The URL argument is the address assigned to this new content.
stop() stops loading the page.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
A form may contain an <input type="file"> element. Because SlimerJS is a scriptable browser, you cannot interact with the file picker that opens when you click this element. uploadFile() lets you set the value of such elements.
The arguments are the CSS selector of the input element and the full path of the file. The file must exist. If the input element accepts several files, you can also pass an array of paths.
Note that a virtual file picker is opened when calling uploadFile(), and so the onFilePicker callback is called. If this callback exists and returns a filename, the filename given to uploadFile() is ignored.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
Implemented. Documentation needed.
This callback is called when the browser needs to open a file picker. This is the case when a click is made on an <input type="file"> element.
The callback receives the previously selected file and should return the path of the newly selected file. If the target element accepts several files, you can return an array of file paths.
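An onFilePicker callback can answer successive file-picker requests from a prepared list. A minimal sketch; filePickerFromList and the paths are hypothetical, and returning the previous file once the list is exhausted is an assumption about how to keep the current selection:

```javascript
// filePickerFromList is a hypothetical helper, not part of SlimerJS.
// It returns an onFilePicker callback that answers each request with the
// next entry of the list. An entry may be an array of paths for inputs that
// accept several files. Once the list is used up, the previous file is returned.
function filePickerFromList(entries) {
  var index = 0;
  return function (previousFile) {
    if (index < entries.length) {
      return entries[index++];
    }
    return previousFile;
  };
}

// Usage (hypothetical paths):
// page.onFilePicker = filePickerFromList(["/tmp/photo.jpg", ["/tmp/a.txt", "/tmp/b.txt"]]);
```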
onInitialized should be a function. It is called when the loading of the page is initialized, so before the content is loaded (before onLoadStarted). It receives no arguments.
Note: it does not seem to be called at the same step of the opening process as in PhantomJS. In PhantomJS, its implementation is a bit obscure: sometimes it is called twice, sometimes never, and sometimes only once, and we don't know why. We will try to match the same behavior in future versions. For the moment, in SlimerJS, it is called twice: once when the browser is ready to load the page (webpage.url is empty), and once when the content of the page is loaded (webpage.url is set, but resources are not loaded yet).
onLoadFinished is called when the page has finished loading (including its resources, such as images). It is also called each time a frame finishes loading.
It receives a string argument: “success” if the loading succeeded, or “fail” if a network error occurred.
The loading is considered successful as soon as a valid HTTP response (with a status code, etc.) is received. This means the callback receives “success” even for an HTTP 404 error, for example.
page.onLoadFinished = function (status) {
    console.log('Status: ' + status);
    // Do other things here...
};
In SlimerJS, you can receive additional arguments (that you don’t have in PhantomJS):
page.onLoadFinished = function (status, url, isFrame) {
    console.log('Loading of ' + url + ' is a ' + status);
    if (!isFrame) {
        // this is the main content
    }
};
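With the SlimerJS-only isFrame argument, a handler can ignore frame loads and react only to the main document. A minimal sketch; onMainPageLoaded is a hypothetical helper (PhantomJS does not pass url and isFrame):

```javascript
// onMainPageLoaded is a hypothetical helper, not part of SlimerJS.
// onLoadFinished is also called for frames; this wrapper ignores them and
// splits the main-document result into success and failure. Remember that
// an HTTP 404 still reports "success"; "fail" means a network error.
function onMainPageLoaded(page, onSuccess, onFailure) {
  page.onLoadFinished = function (status, url, isFrame) {
    if (isFrame) {
      return; // a frame finished loading, not the main page
    }
    if (status === "success") {
      onSuccess(url);
    } else {
      onFailure(url);
    }
  };
}
```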
onLoadStarted is called when the page starts loading or when a frame inside the page starts loading. Unlike in PhantomJS, in SlimerJS it receives arguments:
page.onLoadStarted = function (url, isFrame) {
    console.log('Loading of ' + url + ' starts.');
    if (!isFrame) {
        // this is the main content
    }
};
Note: it does not seem to be called at the same step of the opening process as in PhantomJS, whose implementation is a bit obscure and whose documentation does not match its actual behavior. In PhantomJS, it seems to be called before onInitialized, before the network process starts. We will try to match the same behavior in future versions.
Implemented. Documentation needed.
Implemented. Documentation needed.
onResourceReceived is invoked when the browser receives part of a resource. It can be called several times, with multiple chunks of data, while the resource loads. A resource can be the web page itself, or any other resource such as images, frames, CSS files, etc.
The only parameter received by the callback is an object containing the following information:
page.onResourceReceived = function (response) {
    console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};
Note about the body property: by default, the body property is filled only for the resource corresponding to the main HTML page. For other resources, it is empty.
If you want it filled for resources used in the page, list their content types in the captureContent property. This limitation exists to avoid wasting memory when you don't need the body property, since resources like images or videos can take a lot of memory.
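A minimal sketch that records the final HTTP status of each resource. collectStatuses is a hypothetical helper, and it assumes, as in PhantomJS, that the response object has url and status properties and that stage is "end" on the last call for a resource:

```javascript
// collectStatuses is a hypothetical helper, not part of SlimerJS.
// Assumptions: response.url and response.status exist, and response.stage
// is "end" on the last call for a given resource (PhantomJS convention).
function collectStatuses(page) {
  var statuses = {};
  page.onResourceReceived = function (response) {
    if (response.stage === "end") {
      statuses[response.url] = response.status;
    }
  };
  return statuses; // filled in as resources finish loading
}
```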
onResourceRequested is invoked when the browser starts loading a resource. A resource can be the web page itself, or any other resource such as images, frames, CSS files, etc.
The callback may accept two parameters:
page.onResourceRequested = function (requestData, networkRequest) {
    console.log('Request (#' + requestData.id + '): ' + JSON.stringify(requestData));
};
Properties of requestData are:
The networkRequest object has two methods:
abort(): cancels the request.
changeUrl(url): aborts the current request and immediately redirects to the given url.
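A minimal sketch that blocks requests by URL pattern, for example to skip ads or web fonts. blockRequests is a hypothetical helper; it assumes that requestData has a url property and that networkRequest has an abort() method, as in PhantomJS:

```javascript
// blockRequests is a hypothetical helper, not part of SlimerJS.
// Assumptions: requestData.url holds the requested URL and
// networkRequest.abort() cancels the request.
// It returns the list of blocked URLs, filled in as requests are made.
function blockRequests(page, patterns) {
  var blocked = [];
  page.onResourceRequested = function (requestData, networkRequest) {
    var url = requestData.url;
    var matches = patterns.some(function (re) {
      return re.test(url);
    });
    if (matches) {
      blocked.push(url);
      networkRequest.abort();
    }
  };
  return blocked;
}
```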
onUrlChanged is invoked when the main URL of the browser changes, i.e. when a new document is about to be loaded. The only argument to the callback is the new URL.
Example:
page.onUrlChanged = function (targetUrl) {
    console.log('New URL: ' + targetUrl);
};
To retrieve the old URL, use the onLoadStarted callback.
Calls the onInitialized callback, if it has been set.
Calls the onAlert callback with the given parameters, if it has been set.
Calls the onConsoleMessage callback with the given parameters, if it has been set.
Calls the onLoadFinished callback with the given parameters, if it has been set.
Calls the onLoadStarted callback with the given parameters, if it has been set.
Calls the onPageCreated callback with the given parameters, if it has been set.
Calls the onResourceReceived callback with the given parameters, if it has been set.
Calls the onResourceRequested callback with the given parameters, if it has been set.
Calls the onUrlChanged callback with the given parameters, if it has been set.